Published on November 7

Rethinking Bradley-Terry Models in Preference-Based Reward Modeling: Foundations, Theory, and Alternatives

cs.AI

Released Date: November 7, 2024

Authors: Hao Sun (1), Yunyi Shen (2), Jean-Francois Ton (3)

Affiliations: (1) University of Cambridge, Cambridge, UK; (2) Massachusetts Institute of Technology, Cambridge, MA, USA; (3) ByteDance Research, London, UK

arXiv: http://arxiv.org/abs/2411.04991v1