
Published on May 13

Variational Delayed Policy Optimization

NeurIPS

Spotlight

Release Date: May 13, 2024

Authors: Qingyuan Wu¹, Simon Sinong Zhan², Yixuan Wang², Yuhui Wang³, Chung-Wei Lin⁴, Chen Lv⁵, Qi Zhu², Chao Huang¹

Affiliations: ¹University of Southampton; ²Northwestern University; ³King Abdullah University of Science and Technology; ⁴National Taiwan University; ⁵Nanyang Technological University

Paper (OpenReview): https://openreview.net/pdf/d97a3ffd9d96cadb9fcf33a58bd18c416933fd62.pdf

| Task (Delays=5) | A-SAC | DC/AC | DIDA | BPQL | AD-SAC | VDPO (ours) |
|---|---|---|---|---|---|---|
| Ant-v4 | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | 0.42M |
| HalfCheetah-v4 | $\times$ | $\times$ | $\times$ | 0.99M | 0.56M | 0.44M |
| Hopper-v4 | 0.83M | 0.35M | $\times$ | 0.29M | 0.12M | 0.07M |
| Humanoid-v4 | $\times$ | $\times$ | $\times$ | $\times$ | $\times$ | 0.67M |
| HumanoidStandup-v4 | 0.64M | 0.35M | 0.10M | 0.09M | 0.14M | 0.14M |
| Pusher-v4 | 0.17M | 0.02M | 0.10M | 0.27M | 0.04M | 0.01M |
| Reacher-v4 | $\times$ | 0.61M | 0.10M | 0.90M | 0.44M | 0.77M |
| Swimmer-v4 | $\times$ | 0.94M | 0.10M | $\times$ | 0.13M | 0.07M |
| Walker2d-v4 | $\times$ | $\times$ | $\times$ | 0.52M | 0.67M | 0.25M |