Reinforced Imitative Trajectory Planning for Urban Automated Driving
Subjects: cs.LG, cs.AI
Release Date: October 21, 2024
Authors: Di Zeng1, Ling Zheng1, Xiantong Yang1, Yinong Li1
Aff.: 1College of Mechanical and Vehicle Engineering, Chongqing University, Shazheng Street, Chongqing 40044, China.

| Hyperparameter | AVRL | QCMAE | (H)RITP |
|---|---|---|---|
| Dropout probability | 0.1 | 0.1 | 0.1 |
| Planning horizon | 80 | 80 | 80 |
| Historical time horizon | - | 50 | 50 |
| Hidden feature dimension | 64 | 64 | 64 |
| Number of modes | - | 6 | 6 |
| Exploration strength | - | - | 0.1 |
| Policy noise strength | - | - | 0.2 |
| Clipping boundary | - | - | 0.5 |
| Optimizer | Adam [45] | AdamW [46] | AdamW |
| Learning rate | 3e-7 | 5e-5 / 5e-4 / 5e-5 | 5e-5 |
| Learning rate scheduler | - | OneCycle [47] | - |
| Batch size | 1 | 4 | 4, 1 (for Data 2) |
| Discount factor | - | - | 0.99 |
| Target update rate | - | - | 0.005 |
| Update delay | - | - | 2 |
| Uncertainty-penalization strength | - | - | 1.5 |
| Number of stochastic forward passes | 10 | - | - |
| Max number of experiences in replay buffer | - | - | 1e4 |
| Total training epochs | 10 | 60 | - |
| Total training steps | - | - | 1e5 |
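The (H)RITP column's RL hyperparameters (discount factor 0.99, target update rate 0.005, update delay 2, policy noise 0.2, clipping boundary 0.5) follow the familiar TD3-style pattern of delayed actor updates, target policy smoothing, and Polyak-averaged target networks. As a minimal sketch, assuming these roles for the table's values (the function names below are illustrative, not from the paper):

```python
# Hedged sketch of how the table's RL hyperparameters are typically used
# in a TD3-style update loop; names are illustrative assumptions.

GAMMA = 0.99        # discount factor
TAU = 0.005         # target update rate (Polyak averaging coefficient)
POLICY_DELAY = 2    # update delay: actor updated every POLICY_DELAY critic steps
NOISE_STD = 0.2     # policy noise strength (target policy smoothing)
NOISE_CLIP = 0.5    # clipping boundary for the smoothing noise


def soft_update(target_params, online_params, tau=TAU):
    """Polyak-average online parameters into the target network."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


def smoothed_target_action(action, unit_noise, std=NOISE_STD, clip=NOISE_CLIP):
    """Add scaled, clipped noise to a target action (target policy smoothing)."""
    noise = max(-clip, min(clip, std * unit_noise))
    return action + noise


def td_target(reward, next_q, done, gamma=GAMMA):
    """One-step bootstrapped target: y = r + gamma * (1 - done) * Q'."""
    return reward + gamma * (1.0 - done) * next_q
```

For example, with a target weight of 0.0 and an online weight of 1.0, `soft_update` moves the target only to 0.005 per step, which is why the target network lags the online one smoothly.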