
Published on November 7

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning

cs.LG
cs.AI
cs.CL

Release Date: November 7, 2024

Authors: Joey Hong¹, Anca Dragan¹, Sergey Levine¹

Aff.: ¹University of California, Berkeley

arXiv: http://arxiv.org/abs/2411.05193v1