
Published on November 4

Fantastic LLMs for Preference Data Annotation and How to (not) Find Them

Categories: cs.CL, cs.AI

Release Date: November 4, 2024

Authors: Guangxuan Xu¹, Kai Xu², Shivchander Sudalairaj², Hao Wang², Akash Srivastava²

Affiliations: ¹IBM Research; ²Red Hat Inc

arXiv: http://arxiv.org/abs/2411.02481v1


Reward Function	Chat	ChatHard	Safety	Reasoning	Overall
GPT-4-turbo	95.3	75.4	86.7	82.7	85.2
Claude-3.5-sonnet	96.4	74.0	81.6	84.7	84.2
RM-Mistral-7B	96.6	60.5	87.0	77.4	80.4
ArmoRM-Llama-3-8B	96.9	76.8	90.5	97.3	90.4
Generative reward	53.0	49.5	48.3	52.1	50.0
density ratio	92.2	60.5	82.4	73.8	77.2
density ratio (base)	89.9	65.6	62.8	71.9	71.9
density ratio (sft, base)	79.6	65.6	52.8	70.0	67.0
CDR (safety)	88.3	61.8	91.0	87.7	82.5
CDR (code/math)	91.6	60.1	89.9	89.7	83.0
CDR (chat-hard)	89.1	69.7	89.1	85.9	83.5
CDR (adaptive, chat-hard, oracle)	89.1	69.7	91.0	89.7	84.9
CDR (adaptive, oracle)	92.2	60.5	91.0	89.7	83.4
CDR (adaptive, router)	93.9	56.8	91.0	88.0	82.6