Variational Delayed Policy Optimization
Venue: NeurIPS (Spotlight)
Released Date: May 13, 2024
Authors: Qingyuan Wu¹, Simon Sinong Zhan², Yixuan Wang², Yuhui Wang³, Chung-Wei Lin⁴, Chen Lv⁵, Qi Zhu², Chao Huang¹
Affiliations: ¹University of Southampton; ²Northwestern University; ³King Abdullah University of Science and Technology; ⁴National Taiwan University; ⁵Nanyang Technological University
PDF (OpenReview): https://openreview.net/pdf/d97a3ffd9d96cadb9fcf33a58bd18c416933fd62.pdf

| Task (Delays=5) | A-SAC | DC/AC | DIDA | BPQL | AD-SAC | VDPO (ours) |
|---|---|---|---|---|---|---|
| Ant-v4 | 0.42M | | | | | |
| HalfCheetah-v4 | 0.99M | 0.56M | 0.44M | | | |
| Hopper-v4 | 0.83M | 0.35M | 0.29M | 0.12M | 0.07M | |
| Humanoid-v4 | 0.67M | | | | | |
| HumanoidStandup-v4 | 0.64M | 0.35M | 0.10M | 0.09M | 0.14M | 0.14M |
| Pusher-v4 | 0.17M | 0.02M | 0.10M | 0.27M | 0.04M | 0.01M |
| Reacher-v4 | 0.61M | 0.10M | 0.90M | 0.44M | 0.77M | |
| Swimmer-v4 | 0.94M | 0.10M | 0.13M | 0.07M | | |
| Walker2d-v4 | 0.52M | 0.67M | 0.25M | | | |