notesum.ai
Published on November 25
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
cs.CL
cs.AI
Release Date: November 25, 2024
Authors: Zhiheng Xi¹, Dingwen Yang¹, Jixuan Huang¹, Jiafu Tang¹, Guanyu Li¹, Yiwen Ding¹, Wei He¹, Boyang Hong¹, Shihan Do, Wenyu Zhan¹, Xiao Wang¹, Rui Zheng¹, Tao Ji¹, Xiaowei Shi², Yitao Zhai², Rongxiang Weng², Jingang Wang², Xunliang Cai², Tao Gui¹, Zuxuan Wu¹, Qi Zhang¹, Xipeng Qiu¹, Xuanjing Huang¹, Yu-Gang Jiang¹
Aff.: ¹Fudan University; ²Meituan

| Critique Model | GSM8K Acc. | GSM8K Discrimin. | GSM8K Helpfulness | MATH Acc. | MATH Discrimin. | MATH Helpfulness |
|---|---|---|---|---|---|---|
| No Critic | | - | - | | - | - |
| GPT-3.5-Turbo | | | | | | |
| GPT-4-Turbo | | | | | | |
| GPT-4o | | | | | | |
| Critique Model-8B | | | | | | |
| Critique Model-70B | | | | | | |