Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
cs.LG
cs.AI
cs.CL
Released Date: November 7, 2024
Authors: Joey Hong¹, Anca Dragan¹, Sergey Levine¹
Affiliation: ¹University of California, Berkeley

| | Language games | | | ALFWorld | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Method | Chess | Wordle | 20Q | Pick | Examine | Clean | Heat | Cool | Pick2 |
| ReAct | |||||||||
| SFT | |||||||||
| ILQL | |||||||||
| Q-SFT (ours) | |||||||||