Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers
Venue: NeurIPS (Spotlight)
Release date: May 8, 2024
Authors: Kai Yan¹, Alex Schwing, Yu-Xiong Wang¹
Affiliation: ¹University of Illinois Urbana-Champaign
PDF (OpenReview): https://openreview.net/pdf/ecce2f00d7d353921acab4fa6ec0c37938a55c2a.pdf

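Scores on D4RL MuJoCo datasets after online finetuning; the number in parentheses is the change relative to the score before online finetuning. Ho = Hopper, Ha = HalfCheetah, Wa = Walker2d, An = Ant; M = medium, MR = medium-replay, R = random. Average is the mean over the 12 datasets.

| Dataset | TD3+BC | IQL | ODT | PDT | TD3 | DDPG+ODT | TD3+ODT (ours) |
|---|---|---|---|---|---|---|---|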
| Ho-M-v2 | 60.24(+4.4) | 44.72(-21.3) | 97.84(+48.69) | 74.43(+72.21) | 88.98(+29.25) | 41.7(-13.18) | 89.07(+25.97) |
| Ho-MR-v2 | 99.07(+33.33) | 62.76(-7.63) | 83.29(+63.17) | 84.53(+82.23) | 93.72(+55.66) | 32.36(+9.9) | 95.65(+65.89) |
| Ho-R-v2 | 8.36(-0.35) | 20.42(+12.36) | 29.08(+26.92) | 35.9(+34.67) | 75.68(+73.69) | 25.12(+23.14) | 76.13(+74.15) |
| Ha-M-v2 | 51.29(+2.73) | 37.12(-10.35) | 42.27(+19.23) | 39.35(+39.55) | 70.9(+29.59) | 55.69(+14.71) | 76.91(+35.3) |
| Ha-MR-v2 | 56.5(+13.07) | 49.97(+6.84) | 41.45(+26.77) | 31.47(+31.8) | 69.87(+40.59) | 53.71(+24.91) | 73.27(+43.98) |
| Ha-R-v2 | 44.78(+31.12) | 47.85(+40.3) | 2.15(-0.09) | 0.74(+0.9) | 68.55(+66.3) | 34.56(+32.31) | 59.35(+57.1) |
| Wa-M-v2 | 85.34(+3.49) | 65.55(-15.12) | 75.57(+18.47) | 63.37(+63.3) | 90.49(+24.74) | 2.01(-69.54) | 97.86(+27.08) |
| Wa-MR-v2 | 83.28(+0.0) | 95.99(+28.78) | 77.2(+12.46) | 54.49(+54.18) | 100.88(+32.54) | 1.04(-60.59) | 100.6(+42.54) |
| Wa-R-v2 | 6.99(+5.86) | 10.67(+4.96) | 14.12(+9.82) | 15.47(+15.32) | 69.91(+66.31) | 2.91(-2.47) | 57.86(+53.27) |
| An-M-v2 | 129.11(+7.11) | 110.36(+14.26) | 88.1(-0.51) | 52.08(+48.47) | 125.67(+37.55) | 10.81(-75.52) | 132.0(+41.42) |
| An-MR-v2 | 129.33(+41.03) | 113.16(+24.24) | 85.64(+4.49) | 36.92(+32.41) | 133.58(+51.17) | 4.05(-87.7) | 130.23(+52.08) |
| An-R-v2 | 67.89(+33.47) | 12.28(+0.97) | 24.96(-6.44) | 14.88(+10.38) | 63.47(+32.02) | 4.93(-26.55) | 71.69(+40.31) |
| Average | 68.52(+14.6) | 55.9(+7.8) | 55.14(+18.58) | 41.97(+40.44) | 87.64(+44.95) | 22.87(-19.22) | 88.38(+46.59) |
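As a sanity check on the Average row, the short Python sketch below recomputes it as the unweighted mean of the 12 per-dataset entries for two columns, TD3 and TD3+ODT (ours). The names `COLUMNS`, `REPORTED_AVERAGE`, `column_average`, and `matches_reported`, and the 0.01 tolerance, are illustrative choices, not anything from the paper. The only assumption about the table itself is that "Average" is a plain arithmetic mean over the rows.

```python
from statistics import mean

# Per-dataset (score, change-in-parentheses) pairs copied from the table, in row order:
# Ho-M, Ho-MR, Ho-R, Ha-M, Ha-MR, Ha-R, Wa-M, Wa-MR, Wa-R, An-M, An-MR, An-R.
COLUMNS = {
    "TD3": [
        (88.98, 29.25), (93.72, 55.66), (75.68, 73.69),
        (70.9, 29.59), (69.87, 40.59), (68.55, 66.3),
        (90.49, 24.74), (100.88, 32.54), (69.91, 66.31),
        (125.67, 37.55), (133.58, 51.17), (63.47, 32.02),
    ],
    "TD3+ODT (ours)": [
        (89.07, 25.97), (95.65, 65.89), (76.13, 74.15),
        (76.91, 35.3), (73.27, 43.98), (59.35, 57.1),
        (97.86, 27.08), (100.6, 42.54), (57.86, 53.27),
        (132.0, 41.42), (130.23, 52.08), (71.69, 40.31),
    ],
}

# The "Average" row as printed in the table: (score, change).
REPORTED_AVERAGE = {"TD3": (87.64, 44.95), "TD3+ODT (ours)": (88.38, 46.59)}


def column_average(rows):
    """Unweighted mean of the scores and of the parenthesized changes."""
    return mean(s for s, _ in rows), mean(d for _, d in rows)


def matches_reported(name, tol=0.01):
    """True if the recomputed averages agree with the table to within rounding."""
    score, change = column_average(COLUMNS[name])
    rep_score, rep_change = REPORTED_AVERAGE[name]
    return abs(score - rep_score) <= tol and abs(change - rep_change) <= tol


for name in COLUMNS:
    score, change = column_average(COLUMNS[name])
    print(f"{name}: {score:.3f} (+{change:.3f}) matches={matches_reported(name)}")
```

The tolerance absorbs the two-decimal rounding in the table. For example, the TD3+ODT scores sum to 1060.62, giving a mean of 88.385, which the table prints as 88.38.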