Proto Successor Measure: Representing the Space of All Possible Solutions of Reinforcement Learning
cs.LG
cs.AI
Released Date: November 29, 2024
Authors: Siddhant Agarwal¹, Harshit Sikchi¹, Peter Stone², Amy Zhang³
Aff.: ¹The University of Texas at Austin; ²The University of Texas at Austin, Sony AI; ³The University of Texas at Austin, Meta AI

| Environment | Task | Laplace | FB | HILP | PSM |
|---|---|---|---|---|---|
| Walker | Stand | 243.70 ± 151.40 | 902.63 ± 38.94 | 607.07 ± 165.28 | 872.61 ± 38.81 |
| | Run | 63.65 ± 31.02 | 392.76 ± 31.29 | 107.84 ± 34.24 | 351.50 ± 19.46 |
| | Walk | 190.53 ± 168.45 | 877.10 ± 81.05 | 399.67 ± 39.31 | 891.44 ± 46.81 |
| | Flip | 48.73 ± 17.66 | 206.22 ± 162.27 | 277.95 ± 59.63 | 640.75 ± 31.88 |
| | Average | 136.65 | 594.67 | 348.13 | 689.07 |
| Cheetah | Run | 96.32 ± 35.69 | 257.59 ± 58.51 | 68.22 ± 47.08 | 276.41 ± 70.23 |
| | Run Backward | 106.38 ± 29.4 | 307.07 ± 14.91 | 37.99 ± 25.16 | 286.13 ± 25.38 |
| | Walk | 409.15 ± 56.08 | 799.83 ± 67.51 | 318.30 ± 168.42 | 887.02 ± 59.87 |
| | Walk Backward | 654.29 ± 219.81 | 980.76 ± 2.32 | 349.61 ± 236.29 | 980.90 ± 2.04 |
| | Average | 316.53 | 586.31 | 193.53 | 607.61 |
| Quadruped | Stand | 854.50 ± 41.47 | 740.05 ± 107.15 | 409.54 ± 97.59 | 842.86 ± 82.18 |
| | Run | 412.98 ± 54.03 | 386.67 ± 32.53 | 205.44 ± 47.89 | 431.77 ± 44.69 |
| | Walk | 494.56 ± 62.49 | 566.57 ± 53.22 | 218.54 ± 86.67 | 603.97 ± 73.67 |
| | Jump | 642.84 ± 114.15 | 581.28 ± 107.38 | 325.51 ± 93.06 | 596.37 ± 94.23 |
| | Average | 601.22 | 568.64 | 289.75 | 618.74 |
| Pointmass | Reach Top Left | 713.46 ± 58.90 | 897.83 ± 35.79 | 944.46 ± 12.94 | 831.43 ± 69.51 |
| | Reach Top Right | 581.14 ± 214.79 | 274.95 ± 197.90 | 96.04 ± 166.34 | 730.27 ± 58.10 |
| | Reach Bottom Left | 689.05 ± 37.08 | 517.23 ± 302.63 | 192.34 ± 177.48 | 451.38 ± 73.46 |
| | Reach Bottom Right | 21.29 ± 42.54 | 19.37 ± 33.54 | 0.17 ± 0.29 | 43.29 ± 38.40 |
| | Average | 501.23 | 427.34 | 308.25 | 514.09 |
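
The Average rows appear to be the unweighted mean of the four per-task scores in each environment. The sketch below checks that reading for the Walker rows only, using the first number of each cell as copied from the table. The variable and function names are illustrative and are not taken from the paper or its code.

```python
# Per-task scores for Walker, copied from the table
# in the order Stand, Run, Walk, Flip (first number of each cell).
walker_means = {
    "Laplace": [243.70, 63.65, 190.53, 48.73],
    "FB": [902.63, 392.76, 877.10, 206.22],
    "HILP": [607.07, 107.84, 399.67, 277.95],
    "PSM": [872.61, 351.50, 891.44, 640.75],
}

# Values from the Walker "Average" row of the table.
reported_average = {"Laplace": 136.65, "FB": 594.67, "HILP": 348.13, "PSM": 689.07}


def task_average(values):
    """Unweighted mean over tasks (an assumption about how 'Average' is computed)."""
    return sum(values) / len(values)


computed = {method: task_average(v) for method, v in walker_means.items()}

# Largest absolute difference between recomputed and reported averages.
max_gap = max(abs(computed[m] - reported_average[m]) for m in computed)

# Method with the highest recomputed Walker average.
best_method = max(computed, key=computed.get)

for method, value in computed.items():
    print(f"{method}: computed {value:.2f}, reported {reported_average[method]:.2f}")
```

The recomputed values agree with the reported row to within rounding of the two-decimal inputs, which is consistent with a plain mean over tasks.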