Marvel: Accelerating Safe Online Reinforcement Learning with Finetuned Offline Policy
cs.LG
cs.AI
Released Date: December 5, 2024
Authors: Keru Chen (1), Honghao Wei (2), Zhigang Deng (3), Sen Lin (3)
Aff.: (1) Xi'an Jiaotong University; (2) Washington State University; (3) University of Houston

| | VPA | BallCircle (random) | BallCircle (dataset) | CarRun (random) | CarRun (dataset) |
|---|---|---|---|---|---|
| Q-value | before | -0.2387 | -0.3852 | -0.1143 | -0.5078 |
| Q-value | after | 0.5661 | 0.8278 | -0.0125 | 0.8314 |
| Qc-value | before | -0.2521 | 0.1725 | -0.2431 | -0.4327 |
| Qc-value | after | 0.3579 | 0.8252 | 0.1254 | 0.4937 |