notesum.ai

Published on November 2

Rule Based Rewards for Language Model Safety

cs.AI

Released Date: November 2, 2024

Authors: Tong Mu¹, Alec Helyar¹, Johannes Heidecke¹, Joshua Achiam¹, Andrea Vallone¹, Ian Kivlichan¹, Molly Lin¹, Alex Beutel¹, John Schulman¹, Lilian Weng¹

Aff.: ¹OpenAI

arXiv: http://arxiv.org/abs/2411.01111v1