notesum.ai
Reverse Thinking Makes LLMs Stronger Reasoners
Published on November 29
Categories: cs.CL, cs.AI, cs.LG
Release date: November 29, 2024
Authors: Justin Chih-Yao Chen¹, Zifeng Wang², Hamid Palangi², Rujun Han², Sayna Ebrahimi³, Long Le², Vincent Perot³, Swaroop Mishra³, Mohit Bansal¹, Chen-Yu Lee², Tomas Pfister²
Affiliations: ¹UNC Chapel Hill; ²Google Cloud AI Research; ³Google DeepMind

| Method | SQA | CSQA | ARC | MATH | GSM8K | TabMWP | ANLI | Date | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| Mistral-7B-Instruct | | | | | | | | | |
| Zero-shot Kojima et al. (2022) | 53.89 | 62.57 | 73.68 | 10.42 | 54.71 | 65.59 | 43.92 | 39.64 | 50.55 |
| SKD Li et al. (2023a); West et al. (2022) | 63.76 | 71.86 | 74.66 | 12.48 | 56.16 | 78.19 | 44.90 | 48.50 | 56.08 |
| Distill Step-by-Step Hsieh et al. (2023) | 64.19 | 71.92 | 75.32 | 11.54 | 56.01 | 76.78 | 44.42 | 49.63 | 56.26 |
| Rephrase Question Yu et al. (2024) | 65.07 | 70.19 | 74.51 | 12.98 | 55.10 | 76.31 | 43.58 | 45.51 | 55.41 |
| Question Aug Li et al. (2024) | 65.07 | 72.23 | 73.32 | 13.64 | 58.70 | 80.11 | 42.20 | 47.21 | 56.56 |
| Answer Aug Yu et al. (2024) | 66.38 | 69.12 | 76.77 | 14.78 | 59.08 | 79.67 | 45.01 | 49.12 | 57.49 |
| RevThink (Ours) | 70.97 | 75.76 | 78.50 | 15.28 | 60.88 | 85.44 | 48.58 | 70.40 | 63.23 |
| Gemma-7B-Instruct | | | | | | | | | |
| Zero-shot Kojima et al. (2022) | 56.33 | 66.26 | 68.34 | 8.58 | 41.09 | 55.67 | 37.92 | 40.24 | 46.80 |
| SKD Li et al. (2023a); West et al. (2022) | 56.77 | 72.48 | 73.29 | 16.86 | 52.24 | 60.52 | 45.42 | 59.62 | 54.65 |
| Distill Step-by-Step Hsieh et al. (2023) | 56.77 | 73.01 | 72.92 | 16.04 | 51.88 | 62.11 | 44.23 | 60.91 | 54.73 |
| Rephrase Question Yu et al. (2024) | 54.15 | 70.22 | 72.37 | 16.96 | 53.07 | 57.62 | 43.07 | 57.99 | 53.18 |
| Question Aug Li et al. (2024) | 55.10 | 68.11 | 72.74 | 17.76 | 56.38 | 63.16 | 41.22 | 59.83 | 54.29 |
| Answer Aug Yu et al. (2024) | 57.21 | 73.01 | 73.92 | 18.92 | 57.37 | 65.93 | 42.72 | 64.14 | 56.65 |
| RevThink (Ours) | 64.19 | 74.53 | 75.09 | 19.96 | 57.21 | 84.71 | 47.36 | 66.27 | 61.17 |
| Gemini-1.5-Pro-001 (Teacher Model) | | | | | | | | | |
| Zero-shot Kojima et al. (2022) | 77.39 | 76.72 | 91.51 | 55.90 | 93.73 | 94.27 | 70.12 | 80.00 | 79.76 |
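To see how the Avg. column relates to the per-task scores, the minimal sketch below assumes Avg. is the plain unweighted mean of the eight task accuracies. Under that assumption it reproduces the reported averages for the three Mistral-7B-Instruct rows it uses. It also computes RevThink's per-task gain over a baseline. The helper names `macro_average` and `per_task_gain` are hypothetical and serve only as illustration; they are not part of the paper's code.

```python
from statistics import fmean

# Column order of the results table (task abbreviations as in the header).
TASKS = ["SQA", "CSQA", "ARC", "MATH", "GSM8K", "TabMWP", "ANLI", "Date"]

# Mistral-7B-Instruct scores copied from three rows of the table.
MISTRAL_7B = {
    "Zero-shot": [53.89, 62.57, 73.68, 10.42, 54.71, 65.59, 43.92, 39.64],
    "Answer Aug": [66.38, 69.12, 76.77, 14.78, 59.08, 79.67, 45.01, 49.12],
    "RevThink": [70.97, 75.76, 78.50, 15.28, 60.88, 85.44, 48.58, 70.40],
}


def macro_average(scores: list[float]) -> float:
    """Unweighted mean over the eight tasks, rounded to two decimals (assumed definition of Avg.)."""
    if len(scores) != len(TASKS):
        raise ValueError(f"expected {len(TASKS)} scores, got {len(scores)}")
    return round(fmean(scores), 2)


def per_task_gain(table: dict[str, list[float]], method: str, baseline: str) -> dict[str, float]:
    """Absolute accuracy difference (method minus baseline) for each task."""
    return {
        task: round(m - b, 2)
        for task, m, b in zip(TASKS, table[method], table[baseline])
    }


if __name__ == "__main__":
    for name, scores in MISTRAL_7B.items():
        print(name, macro_average(scores))  # 50.55, 57.49, 63.23 in row order
    # Task where RevThink improves most over zero-shot prompting.
    gains = per_task_gain(MISTRAL_7B, "RevThink", "Zero-shot")
    best = max(gains, key=gains.get)
    print(best, gains[best])  # prints: Date 30.76
```

Rounding happens only at the end of each computation, so the gains are taken from the raw table values rather than from already rounded averages.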