Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
Categories: cs.LG, cs.AI
Released Date: December 9, 2024
Authors: Max Sobol Mark¹, Tian Gao², Georgia Gabriela Sampaio², Mohan Kumar Srirama¹, Archit Sharma², Chelsea Finn², Aviral Kumar¹
Affiliations: ¹Carnegie Mellon University; ²Stanford University

Each cell reports the score after offline RL → after online fine-tuning.

| Domain / Task | IDQL | DQL | DPPO | Cal-QL | PA-RL + Cal-QL (Ours) |
|---|---|---|---|---|---|
| CALVIN | 19 → 35 | 19 → 22 | 13 → 18 | 6 → 36 | 28 → 61 |
| Kitchen (-v0) | | | | | |
| complete | 65 → 72 | 70 → 44 | 55 → 76 | 19 → 57 | 59 → 90 |
| mixed | 60 → 70 | 56 → 57 | 45 → 75 | 37 → 72 | 67 → 77 |
| partial | 70 → 90 | 56 → 46 | 38 → 69 | 59 → 84 | 78 → 94 |
| Antmaze (-v2) | | | | | |
| large-diverse | 66 → 69 | 22 → 38 | 0 → 1 | 33 → 95 | 73 → 95 |
| large-play | 53 → 41 | 60 → 18 | 2 → 17 | 26 → 90 | 87 → 98 |
| medium-diverse | 83 → 86 | 14 → 70 | 43 → 95 | 75 → 98 | 88 → 98 |
| medium-play | 81 → 77 | 25 → 78 | 19 → 91 | 54 → 97 | 88 → 98 |
| Aggregate | 497 → 540 | 322 → 373 | 215 → 442 | 309 → 629 | 568 → 711 |
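
The Aggregate row appears to be the column-wise sum of the eight per-task rows, taken separately for the first and second value in each cell. The sketch below recomputes it from the numbers transcribed from the table. The names `METHODS`, `SCORES`, and `aggregate` are illustrative and not part of the paper or any released code.

```python
# Scores transcribed from the table above. Each tuple is
# (first value, second value) of a cell, in the column order of METHODS.
METHODS = ["IDQL", "DQL", "DPPO", "Cal-QL", "PA-RL + Cal-QL"]

SCORES = {
    "CALVIN":                 [(19, 35), (19, 22), (13, 18), (6, 36), (28, 61)],
    "Kitchen complete":       [(65, 72), (70, 44), (55, 76), (19, 57), (59, 90)],
    "Kitchen mixed":          [(60, 70), (56, 57), (45, 75), (37, 72), (67, 77)],
    "Kitchen partial":        [(70, 90), (56, 46), (38, 69), (59, 84), (78, 94)],
    "Antmaze large-diverse":  [(66, 69), (22, 38), (0, 1), (33, 95), (73, 95)],
    "Antmaze large-play":     [(53, 41), (60, 18), (2, 17), (26, 90), (87, 98)],
    "Antmaze medium-diverse": [(83, 86), (14, 70), (43, 95), (75, 98), (88, 98)],
    "Antmaze medium-play":    [(81, 77), (25, 78), (19, 91), (54, 97), (88, 98)],
}


def aggregate(scores):
    """Sum the first and second cell values per method across all tasks."""
    totals = {}
    for col, method in enumerate(METHODS):
        first = sum(row[col][0] for row in scores.values())
        second = sum(row[col][1] for row in scores.values())
        totals[method] = (first, second)
    return totals


AGGREGATE = aggregate(SCORES)

if __name__ == "__main__":
    for method, (first, second) in AGGREGATE.items():
        # The difference shows how much the summed score changes between
        # the first and second value of each cell.
        print(f"{method}: {first} -> {second} (change {second - first:+d})")
```

Running the script prints one line per method, which can be checked against the Aggregate row of the table.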