notesum.ai

Published on November 25

Self-Generated Critiques Boost Reward Modeling for Language Models

cs.CL
cs.AI

Release Date: November 25, 2024

Authors: Yue Yu, Zhengxing Chen, Aston Zhang, Liang Tan, Chenguang Zhu, Richard Yuanzhe Pang, Yundi Qian, Xuewei Wang, Suchin Gururangan, Chao Zhang, Melanie Kambadur, Dhruv Mahajan, Rui Hou

arXiv: http://arxiv.org/abs/2411.16646v1