e-COP: Episodic Constrained Optimization of Policies
NeurIPS
Released Date: May 10, 2024
Authors: Akhil Agnihotri (1), Rahul Jain (2), Deepak Ramachandran (3), Sahil Singla (3)
Aff.: (1) University of Southern California; (2) Google DeepMind and USC; (3) Google DeepMind
PDF (OpenReview): https://openreview.net/pdf/e2087acb4d653ff52e076f87d71a30d1772bd69a.pdf

| Task | Metric | e-COP | FOCOPS [43] | PPO-L [30] | PCPO [39] | P3O [41] | CPO [3] | APPO [15] | IPO [26] |
|---|---|---|---|---|---|---|---|---|---|
| Humanoid | R | 1652.5 ± 13.4 | 1734.1 ± 27.4 | 1431.2 ± 25.2 | 1602.3 ± 10.1 | 1669.4 ± 13.7 | 1465.1 ± 55.3 | 1488.2 ± 29.3 | 1578.6 ± 25.2 |
| | C (20.0) | 17.3 ± 0.3 | 19.7 ± 0.6 | 18.8 ± 1.5 | 16.3 ± 1.4 | 20.1 ± 3.3 | 18.5 ± 2.9 | 20.0 ± 1.3 | 19.1 ± 2.5 |
| PointCircle | R | 110.5 ± 9.3 | 81.6 ± 8.4 | 57.2 ± 9.2 | 68.2 ± 9.1 | 89.1 ± 7.1 | 65.3 ± 5.3 | 91.2 ± 9.6 | 68.7 ± 15.2 |
| | C (10.0) | 9.8 ± 0.9 | 10.0 ± 0.4 | 9.8 ± 0.5 | 9.9 ± 0.4 | 9.9 ± 0.3 | 9.5 ± 0.9 | 10.2 ± 0.6 | 9.3 ± 0.5 |
| AntCircle | R | 198.6 ± 7.4 | 161.9 ± 22.2 | 134.4 ± 10.3 | 168.3 ± 13.3 | 182.6 ± 18.7 | 127.1 ± 12.1 | 155.5 ± 19.4 | 149.3 ± 33.6 |
| | C (10.0) | 9.8 ± 0.6 | 9.9 ± 0.5 | 9.6 ± 1.6 | 9.5 ± 0.6 | 9.8 ± 0.2 | 10.1 ± 0.7 | 10.0 ± 0.5 | 9.5 ± 1.0 |
| PointReach | R | 81.5 ± 10.2 | 65.1 ± 9.6 | 46.1 ± 14.8 | 73.2 ± 7.4 | 76.3 ± 6.4 | 89.2 ± 8.1 | 74.3 ± 6.7 | 49.1 ± 10.6 |
| | C (25.0) | 24.5 ± 6.1 | 24.8 ± 7.6 | 25.1 ± 6.1 | 24.9 ± 5.6 | 26.3 ± 6.9 | 33.3 ± 10.7 | 26.3 ± 8.1 | 24.7 ± 11.3 |
| AntReach | R | 70.8 ± 14.6 | 48.3 ± 5.6 | 54.2 ± 9.5 | 39.4 ± 5.3 | 73.6 ± 5.1 | 102.3 ± 7.1 | 61.5 ± 10.4 | 45.2 ± 13.3 |
| | C (25.0) | 24.2 ± 8.4 | 25.1 ± 11.9 | 21.9 ± 10.7 | 27.9 ± 12.2 | 24.8 ± 7.3 | 35.1 ± 10.9 | 24.5 ± 6.4 | 24.9 ± 9.2 |
| Grid | R | 258.1 ± 33.1 | 215.4 ± 45.6 | 276.3 ± 57.9 | 226.5 ± 29.2 | 201.5 ± 39.2 | 178.1 ± 23.8 | 184.4 ± 21.5 | 229.4 ± 32.8 |
| | C (75.0) | 71.3 ± 26.9 | 76.6 ± 29.8 | 71.8 ± 25.1 | 72.6 ± 16.5 | 79.3 ± 19.3 | 69.3 ± 19.8 | 79.5 ± 35.8 | 74.2 ± 24.6 |
| Bottleneck | R | 345.1 ± 52.6 | 251.3 ± 59.1 | 298.3 ± 71.2 | 264.2 ± 43.8 | 291.1 ± 26.7 | 388.1 ± 36.6 | 220.1 ± 30.1 | 279.3 ± 43.8 |
| | C (50.0) | 49.7 ± 15.1 | 46.6 ± 19.8 | 41.4 ± 17.6 | 49.8 ± 10.5 | 45.3 ± 8.2 | 54.3 ± 13.5 | 47.4 ± 12.3 | 48.2 ± 14.6 |
| Navigation | R | 217.6 ± 11.5 | n/a | 175.1 ± 3.7 | n/a | 153.5 ± 25.2 | n/a | 135.7 ± 19.2 | 164.1 ± 12.8 |
| | C1 (10.0) | 9.6 ± 1.5 | n/a | 9.9 ± 1.9 | n/a | 9.9 ± 1.7 | n/a | 9.9 ± 2.1 | 10.0 ± 0.5 |
| | C2 (25.0) | 23.7 ± 4.1 | n/a | 22.3 ± 2.1 | n/a | 24.5 ± 4.1 | n/a | 23.9 ± 3.8 | 24.6 ± 3.1 |