notesum.ai

Published on November 13

R3HF: Reward Redistribution for Enhancing Reinforcement Learning from Human Feedback

cs.CL
cs.AI

Release Date: November 13, 2024

Authors: Jiahui Li¹, Tai-wei Chang², Fengda Zhang¹, Kun Kuang¹, Long Chen³

Aff.: ¹Zhejiang University; ²Ant Group; ³HKUST

arXiv: http://arxiv.org/abs/2411.08302v1