Published on November 27
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS
cs.CL
Released Date: November 27, 2024
Authors: Jinyang Wu¹, Mingkuan Feng¹, Shuai Zhang¹, Feihu Che², Zengqi Wen², Jianhua Tao³
Aff.: ¹Institution 1; ²Institution 2; ³Institution 1 and Institution 2

| Model | Setting | MATH (Mathematics) | GSM8K (Mathematics) | SVAMP (Arithmetic) | StrategyQA (Commonsense) | Average |
|---|---|---|---|---|---|---|
| Qwen2.5-14B-instruct | Zero-shot CoT | 69.8 | 92.4 | 91.6 | 62.8 | 79.1 |
| Qwen2.5-14B-instruct | Few-shot CoT | 80.0 | 94.8 | 91.3 | 53.1 | 79.8 |
| Qwen2.5-14B-instruct | CoT+SC@4 | 76.2 | 94.0 | 91.0 | 69.7 | 82.7 |
| Qwen2.5-14B-instruct | Ours | 80.2 | 95.3 | 93.7 | 77.3 | 86.6 |
| Qwen2.5-7B-instruct | Zero-shot CoT | 64.8 | 86.2 | 91.3 | 52.8 | 73.7 |
| Qwen2.5-7B-instruct | Few-shot CoT | 75.5 | 91.6 | 92.3 | 67.6 | 81.7 |
| Qwen2.5-7B-instruct | CoT+SC@4 | 76.4 | 92.0 | 92.3 | 73.2 | 83.4 |
| Qwen2.5-7B-instruct | Ours | 79.6 | 92.8 | 93.0 | 76.0 | 85.4 |
| Qwen2-7B-instruct | Zero-shot CoT | 36.9 | 76.6 | 85.2 | 55.3 | 63.5 |
| Qwen2-7B-instruct | Few-shot CoT | 52.9 | 85.7 | 87.3 | 62.3 | 72.0 |
| Qwen2-7B-instruct | CoT+SC@4 | 55.6 | 87.7 | 90.3 | 65.5 | 74.8 |
| Qwen2-7B-instruct | Ours | 63.8 | 90.6 | 92.7 | 72.0 | 79.8 |
| Yi-1.5-6B-Chat | Zero-shot CoT | 30.4 | 76.4 | 64.4 | 46.2 | 54.3 |
| Yi-1.5-6B-Chat | Few-shot CoT | 40.5 | 78.9 | 81.3 | 61.1 | 65.4 |
| Yi-1.5-6B-Chat | CoT+SC@4 | 42.2 | 79.4 | 87.6 | 65.2 | 68.6 |
| Yi-1.5-6B-Chat | Ours | 54.0 | 81.4 | 90.0 | 70.3 | 74.0 |
| Llama-3-8B-Instruct | Zero-shot CoT | 5.8 | 68.3 | 70.9 | 57.2 | 50.5 |
| Llama-3-8B-Instruct | Few-shot CoT | 17.8 | 74.5 | 81.0 | 68.4 | 60.4 |
| Llama-3-8B-Instruct | CoT+SC@4 | 28.8 | 80.6 | 88.0 | 66.8 | 66.0 |
| Llama-3-8B-Instruct | Ours | 43.2 | 89.6 | 92.7 | 73.0 | 74.6 |
| Llama-3.1-8B-Instruct | Zero-shot CoT | 18.0 | 61.5 | 69.3 | 52.4 | 50.3 |
| Llama-3.1-8B-Instruct | Few-shot CoT | 47.2 | 76.6 | 82.0 | 63.6 | 67.3 |
| Llama-3.1-8B-Instruct | CoT+SC@4 | 44.2 | 80.5 | 85.6 | 69.8 | 70.0 |
| Llama-3.1-8B-Instruct | Ours | 55.0 | 90.7 | 93.0 | 73.2 | 78.0 |
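
The Average column can be sanity-checked with a few lines of Python. This is only a sketch under an assumption the page does not state explicitly: that Average is the unweighted mean of the four benchmark scores (MATH, GSM8K, SVAMP, StrategyQA). The numbers in the table agree with that reading to within rounding. The helper names `recomputed_average` and `average_gain` are illustrative, and only four rows are copied in to keep the example short.

```python
from statistics import mean

# Scores copied from the table above:
# (MATH, GSM8K, SVAMP, StrategyQA, reported Average)
rows = {
    ("Qwen2.5-14B-instruct", "Zero-shot CoT"): (69.8, 92.4, 91.6, 62.8, 79.1),
    ("Qwen2.5-14B-instruct", "Ours"): (80.2, 95.3, 93.7, 77.3, 86.6),
    ("Llama-3-8B-Instruct", "Zero-shot CoT"): (5.8, 68.3, 70.9, 57.2, 50.5),
    ("Llama-3-8B-Instruct", "Ours"): (43.2, 89.6, 92.7, 73.0, 74.6),
}


def recomputed_average(scores):
    """Unweighted mean of the four benchmark scores (assumed definition)."""
    return mean(scores[:4])


def average_gain(model, baseline="Zero-shot CoT", method="Ours"):
    """Difference between the reported Average of two settings for one model."""
    return rows[(model, method)][4] - rows[(model, baseline)][4]


for (model, setting), scores in rows.items():
    print(f"{model:22s} {setting:14s} "
          f"mean={recomputed_average(scores):.2f} reported={scores[4]}")
```

On the reported averages, `average_gain("Qwen2.5-14B-instruct")` comes to about 7.5 points and `average_gain("Llama-3-8B-Instruct")` to about 24.1 points over Zero-shot CoT.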