notesum.ai

Published: November 25, 2024

Category: cs.CL

Authors: Fangkai Jiao¹, Geyang Guo², Xingxing Zhang³, Nancy F. Chen¹, Shafiq Joty⁴, Furu Wei³

Affiliations: ¹Nanyang Technological University and I2R, A*STAR; ²Georgia Institute of Technology; ³Microsoft Research; ⁴Salesforce Research

Accuracy (%) on mathematical reasoning benchmarks. Indented "w/" rows report the base model listed above them after the named training stage.

Model	MATH	GSM8K	College Math
GPT-4o-2024-0512	78.7	95.8	46.7
GPT-4-Turbo-2024-0409	72.8	94.8	44.2
GPT-4-Turbo-1106-preview†	64.3	—	—
GPT-4-0613	55.0	93.5	39.0
NuminaMath-72B-CoT (Beeching et al., 2024)	67.1	91.7	39.8
Llama-3.1-8B-Instruct (Dubey et al., 2024)	47.5	84.5	27.5
Llama-3.1-70B-Instruct (Dubey et al., 2024)	68.1	95.5	41.8
Llama-3.1-8B-base (Dubey et al., 2024)	20.3 (4-shot)	56.7 (8-shot)	20.1 (4-shot)
  w/ SFT	53.8	85.1	34.6
  w/ PFPO-LLM Iter. 0	55.0	86.6	35.8
  w/ PFPO-Self Iter. 1	55.9	87.6	36.6
  w/ PFPO-Self Iter. 2	56.6	88.9	37.0
  w/ PFPO-Self Iter. 3	57.0	88.8	36.7
  w/ PFPO-Self Iter. 4	57.4	89.1	37.6
  w/ PFPO-Self Iter. 5	57.8	89.6	38.0
Mathstral-7B-v0.1 (Mistral AI Team, 2024b)	58.3	85.6	34.3
  w/ SFT	61.4	87.3	38.4
  w/ PFPO-LLM Iter. 0	66.7	90.0	41.3
  w/ PFPO-Self Iter. 1	67.8	90.8	42.0
  w/ PFPO-Self Iter. 2	68.6	90.3	42.2
  w/ PFPO-Self Iter. 3	68.2	90.4	42.3
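To read the PFPO rows as improvements rather than absolute scores, the following Python sketch recomputes each stage's gain over the matching SFT row, using only numbers transcribed from the table. The `RESULTS` layout and the helpers `gains_over_sft` and `best_stage` are hypothetical illustrations, not part of the authors' released code.

```python
# Scores transcribed from the results table: (MATH, GSM8K, College Math) accuracy in %.
BENCHMARKS = ("MATH", "GSM8K", "College Math")

RESULTS = {
    "Llama-3.1-8B-base": {
        "SFT": (53.8, 85.1, 34.6),
        "PFPO-LLM Iter. 0": (55.0, 86.6, 35.8),
        "PFPO-Self Iter. 1": (55.9, 87.6, 36.6),
        "PFPO-Self Iter. 2": (56.6, 88.9, 37.0),
        "PFPO-Self Iter. 3": (57.0, 88.8, 36.7),
        "PFPO-Self Iter. 4": (57.4, 89.1, 37.6),
        "PFPO-Self Iter. 5": (57.8, 89.6, 38.0),
    },
    "Mathstral-7B-v0.1": {
        "SFT": (61.4, 87.3, 38.4),
        "PFPO-LLM Iter. 0": (66.7, 90.0, 41.3),
        "PFPO-Self Iter. 1": (67.8, 90.8, 42.0),
        "PFPO-Self Iter. 2": (68.6, 90.3, 42.2),
        "PFPO-Self Iter. 3": (68.2, 90.4, 42.3),
    },
}


def gains_over_sft(runs):
    """Hypothetical helper: per-benchmark delta of every non-SFT stage vs. SFT, rounded to 0.1."""
    base = runs["SFT"]
    return {
        stage: tuple(round(score - ref, 1) for score, ref in zip(scores, base))
        for stage, scores in runs.items()
        if stage != "SFT"
    }


def best_stage(runs, benchmark):
    """Hypothetical helper: name of the non-SFT stage with the highest score on one benchmark."""
    idx = BENCHMARKS.index(benchmark)
    candidates = {stage: scores for stage, scores in runs.items() if stage != "SFT"}
    return max(candidates, key=lambda stage: candidates[stage][idx])


if __name__ == "__main__":
    for model, runs in RESULTS.items():
        print(model)
        for stage, deltas in gains_over_sft(runs).items():
            cells = ", ".join(f"{b} {d:+.1f}" for b, d in zip(BENCHMARKS, deltas))
            print(f"  {stage}: {cells}")
```

Computed this way, Llama-3.1-8B-base gains steadily up to Iter. 5, while Mathstral-7B-v0.1 peaks at different iterations per benchmark (MATH at Iter. 2, GSM8K at Iter. 1, College Math at Iter. 3), so later self-iterations are not uniformly better.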