Published on October 21
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
Categories: cs.CL, cs.AI, cs.LG
Released Date: October 21, 2024
Authors: Yingqian Cui¹, Pengfei He¹, Xianfeng Tang¹, Qi He², Chen Luo², Jiliang Tang¹, Yue Xing²
Affiliations: ¹Michigan State University; ²Amazon

| Dataset | Method | GPT-3.5-Turbo | GPT-4o-mini |
| --- | --- | --- | --- |
| Disambiguation QA | w/o IR | 68.00% | 70.00% |
| Disambiguation QA | w/ IR | 72.00% | 71.60% |
| Tracking Shuffled Objects | w/o IR | 56.53% | 88.00% |
| Tracking Shuffled Objects | w/ IR | 61.20% | 88.80% |
| Date understanding | w/o IR | 82.27% | 90.80% |
| Date understanding | w/ IR | 85.07% | 91.47% |
| Penguins in a table | w/o IR | 81.34% | 91.10% |
| Penguins in a table | w/ IR | 82.19% | 92.92% |
| GSM8K | w/o IR | 81.03% | 92.72% |
| GSM8K | w/ IR | 83.38% | 93.03% |

| Dataset | Method | Gemini Pro | DeepSeek 67B |
| --- | --- | --- | --- |
| Disambiguation QA | w/o IR | 68.80% | 81.20% |
| Disambiguation QA | w/ IR | 76.80% | 81.20% |
| Tracking Shuffled Objects | w/o IR | 58.20% | 71.20% |
| Tracking Shuffled Objects | w/ IR | 64.80% | 72.40% |
| Date understanding | w/o IR | 88.80% | 80.80% |
| Date understanding | w/ IR | 88.80% | 83.20% |
| Penguins in a table | w/o IR | 82.19% | 73.97% |
| Penguins in a table | w/ IR | 83.56% | 80.14% |
| GSM8K | w/o IR | 80.82% | 83.85% |
| GSM8K | w/ IR | 81.27% | 83.40% |