| Models | Methods | Sample Budget | Coverage | Held-in Avg. | Held-in AQuA | Held-in GSM8K | Held-in MATH | Held-out Avg. | Held-out MathQA | Held-out SVAMP | Held-out Thm.QA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama2-7B | SFT | - | - | | | | | | | | |
| | Self-Improve () | | | | | | | | | | |
| | Self-Improve () | | | | | | | | | | |
| | Self-Improve () | | | | | | | | | | |
| | Guided Self-Improve () | | | | | | | | | | |
| | + Answer-driven | | | | | | | | | | |
| | + Rationale-driven | | | | | | | | | | |
| | + Interactive Sampling | | | | | | | | | | |
| | + State Reset | | | | | | | | | | |
| Llama3-8B | SFT | - | - | | | | | | | | |
| | Self-Improve () | | | | | | | | | | |
| | Self-Improve () | | | | | | | | | | |
| | Self-Improve () | | | | | | | | | | |
| | Guided Self-Improve () | | | | | | | | | | |
| | + Answer-driven | | | | | | | | | | |
| | + Rationale-driven | | | | | | | | | | |
| | + Interactive Sampling | | | | | | | | | | |
| | + State Reset | | | | | | | | | | |
| DeepSeek-Math-7B | SFT | - | - | | | | | | | | |
| | Self-Improve () | | | | | | | | | | |
| | Self-Improve () | | | | | | | | | | |
| | Self-Improve () | | | | | | | | | | |
| | Guided Self-Improve () | | | | | | | | | | |
| | + Answer-driven | | | | | | | | | | |
| | + Rationale-driven | | | | | | | | | | |
| | + Interactive Sampling | | | | | | | | | | |
| | + State Reset | | | | | | | | | | |
| Mistral-7B | SFT | - | - | | | | | | | | |
| | Self-Improve () | | | | | | | | | | |
| | Self-Improve () | | | | | | | | | | |
| | Self-Improve () | | | | | | | | | | |
| | Guided Self-Improve () | | | | | | | | | | |
| | + Answer-driven | | | | | | | | | | |
| | + Rationale-driven | | | | | | | | | | |
| | + Interactive Sampling | | | | | | | | | | |
| | + State Reset | | | | | | | | | | |