Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers
cs.LG
cs.AI
Release Date: October 31, 2024
Authors: Kai Yan, Alexander G. Schwing, Yu-Xiong Wang
Affiliation: University of Illinois Urbana-Champaign

| Dataset | TD3+BC | IQL | ODT | PDT | TD3 | DDPG+ODT | TD3+ODT (ours) |
|---|---|---|---|---|---|---|---|
| P-E-v1 | 47.88(-84.23) | 149.65(-3.63) | 121.82(+5.48) | 25.07(+25.56) | 61.56(-69.74) | 2.75(-129.63) | 120.65(-11.91) |
| P-C-v1 | 3.75(-10.8) | 78.12(+25.01) | 22.88(-24.0) | 14.05(+12.15) | 58.04(-17.39) | -1.41(-81.87) | 133.77(+58.05) |
| P-H-v1 | 26.77(+3.19) | 96.5(+27.79) | 27.55(-13.91) | 4.03(+0.38) | 38.58(-57.71) | 2.09(-92.56) | 107.1(+11.87) |
| H-E-v1 | 3.11(-0.02) | 126.54(+13.28) | 123.07(+12.79) | 98.95(+98.94) | 93.99(-31.41) | -0.24(-127.05) | 129.8(+6.34) |
| H-C-v1 | 0.33(+0.03) | 2.27(+0.58) | 0.84(+0.32) | 0.66(+0.66) | 0.07(-0.69) | 0.12(-0.83) | 126.39(+124.59) |
| H-H-v1 | 0.17(-0.3) | 16.12(+14.18) | 0.97(-0.13) | 32.76(+32.75) | -0.03(-1.11) | -0.06(-1.02) | 116.83(+115.82) |
| D-E-v1 | -0.34(-0.01) | 97.57(-7.92) | 50.26(+50.14) | 59.48(+59.41) | 76.92(-25.56) | 26.48(-78.72) | 103.13(-1.94) |
| D-C-v1 | -0.36(-0.01) | 23.8(+21.66) | 5.45(+5.37) | 1.38(+1.54) | 0.17(-4.8) | -0.01(-4.45) | 58.28(+53.31) |
| D-H-v1 | -0.33(-0.1) | 34.64(+29.65) | 10.61(+6.69) | 0.05(+0.22) | -0.14(-9.22) | 12.39(-12.33) | 65.24(+55.94) |
| R-E-v1 | -1.37(+0.22) | 105.78(+2.81) | 101.16(+2.11) | 66.57(+66.7) | 0.44(-106.67) | 0.26(-106.48) | 91.38(-16.19) |
| R-C-v1 | -0.3(+0.0) | 1.1(+0.97) | 0.06(+0.08) | -0.03(+0.04) | -0.19(-0.29) | -0.12(-0.32) | 0.36(+0.26) |
| R-H-v1 | -0.08(+0.1) | 1.6(+1.5) | 0.04(+0.05) | 0.04(+0.17) | -0.17(-0.28) | -0.1(-0.27) | 1.19(+0.99) |
| Average | 6.6(-7.66) | 61.14(+10.49) | 38.73(+3.75) | 25.25(+24.87) | 31.85(-29.63) | 3.51(-52.96) | 87.84(+33.09) |
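
The Average row can be checked against the per-dataset cells. Below is a minimal sketch, assuming every cell has the form `value(change)`. The helper names `parse_cell` and `column_mean` are hypothetical and introduced only for illustration. The sketch does not assume any particular meaning for the parenthesized number, which this excerpt does not define.

```python
import re

# "TD3+ODT (ours)" column, copied from the table rows P-E-v1 through R-H-v1.
TD3_ODT_CELLS = [
    "120.65(-11.91)", "133.77(+58.05)", "107.1(+11.87)",
    "129.8(+6.34)", "126.39(+124.59)", "116.83(+115.82)",
    "103.13(-1.94)", "58.28(+53.31)", "65.24(+55.94)",
    "91.38(-16.19)", "0.36(+0.26)", "1.19(+0.99)",
]

# Assumed cell layout: a number, then a signed number in parentheses.
CELL_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\(([+-]\d+(?:\.\d+)?)\)\s*$")


def parse_cell(cell):
    """Split a 'value(change)' cell into two floats (hypothetical helper)."""
    match = CELL_RE.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def column_mean(cells):
    """Return the mean value and mean parenthesized change over a column."""
    pairs = [parse_cell(cell) for cell in cells]
    count = len(pairs)
    mean_value = sum(value for value, _ in pairs) / count
    mean_change = sum(change for _, change in pairs) / count
    return mean_value, mean_change


mean_value, mean_change = column_mean(TD3_ODT_CELLS)
# Format the result the same way the table formats its cells.
print(f"{mean_value:.2f}({mean_change:+.2f})")
```

For the TD3+ODT column this prints `87.84(+33.09)`, which agrees with the Average row. The other columns can be checked the same way by swapping in their cells.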