Mitigating Relative Over-Generalization in Multi-Agent Reinforcement Learning
cs.AI
stat.ML
Released Date: November 17, 2024
Authors: Ting Zhu (1), Yue Jin (2), Jeremie Houssineau (3), Giovanni Montana (4)
Affiliations: (1) Department of Statistics, University of Warwick, Coventry, UK; (2) Warwick Manufacturing Group, University of Warwick, Coventry, UK; (3) School of Physical & Mathematical Sciences, Nanyang Technological University, Singapore; (4) Alan Turing Institute, London, UK

| Setting | | MMQ | IDDPG | HyDDPG | I2Q* | I2Q |
|---|---|---|---|---|---|---|
| DG | | 19.55 ± 0.16 | 14.67 ± 4.61 | 19.47 ± 0.17 | 17.84 ± 3.01 | 4.23 |
| MPE Tasks | CN | -15.66 ± 1.75 | -30.91 ± 12.78 | -19.12 ± 1.56 | -51.24 ± 12.01 | -33.16 ± 11.82 |
| | CN+more penalty | -18.01 ± 2.24 | -60.25 ± 0.07 | -50.12 ± 13.24 | -60.55 ± 0.36 | -60.54 ± 0.70 |
| | CN+HT | -24.25 ± 0.49 | -40.34 ± 26.01 | -30.93 ± 6.92 | -56.04 ± 40.80 | -35.95 ± 15.83 |
| | CN+HA | -17.63 ± 4.12 | -45.76 ± 13.99 | -28.21 ± 9.71 | -60.49 ± 0.15 | -60.25 ± 0.03 |
| | PP | -35.28 ± 0.40 | -56.63 ± 0.11 | -49.40 ± 7.18 | -57.10 ± 0.33 | -56.67 ± 0.14 |
| Sequential Task | | -215.09 ± 59.97 | -295.31 ± 9.75 | -233.55 ± 53.21 | -300.53 ± 0.45 | -266.42 ± 43.37 |
| Half-Cheetah | | -134.09 ± 16.05 | -163.81 ± 9.61 | -152.66 ± 10.32 | -135.65 ± 4.60 | -140.94 ± 8.55 |