Published on November 29
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
cs.CL
cs.AI
cs.LG
Release Date: November 29, 2024
Authors: Zicheng Lin, Tian Liang, Jiahao Xu, Xing Wang, Ruilin Luo, Chufan Shi, Siheng Li, Yujiu Yang, Zhaopeng Tu

| Method | GSM8K: Llama-3-8B | GSM8K: Llama-3-70B | GSM8K: DeepSeek-math-7B | GSM8K: Avg. | MATH500: Llama-3-8B | MATH500: Llama-3-70B | MATH500: DeepSeek-math-7B | MATH500: Avg. |
|---|---|---|---|---|---|---|---|---|
| Baseline | 56.4 | 80.4 | 64.1 | 67.0 | 16.8 | 42.2 | 31.4 | 30.1 |
| + SFT | 61.2 | 82.1 | 67.1 | 70.1 | 17.2 | 43.0 | 32.6 | 30.9 |
| + DPO (Rafailov et al., 2024) | 59.7 | 87.8 | 66.5 | 71.3 | 17.0 | 41.2 | 33.4 | 30.5 |
| + TokenDPO (Zeng et al., 2024) | 62.3 | 83.3 | 69.6 | 71.7 | 17.8 | 42.2 | 32.4 | 30.8 |
| + DPO (Rafailov et al., 2024) | 59.6 | 88.9 | 63.1 | 70.5 | 15.4 | 39.8 | 33.0 | 29.4 |
| + RPO (Pang et al., 2024) | 67.5 | 89.7 | 68.9 | 75.4 | 18.4 | 43.8 | 34.8 | 32.3 |
| + cDPO (Ours) | 67.9* | 90.8* | 72.9* | 77.2* | 19.6* | 45.6* | 35.0* | 33.4* |