SMART: Self-learning Meta-strategy Agent for Reasoning Tasks
cs.LG
cs.AI
cs.CR
stat.ML
Released Date: October 21, 2024
Authors: Rongxing Liu¹, Kumar Shridhar¹, Manish Prajapat², Patrick Xia³, Mrinmaya Sachan¹
Affiliations: ¹ETH Zurich; ²ETH Zurich, ETH AI Center; ³Microsoft
| Model | Method | Test Accuracy (%) | Refinement Strategy | Refinement Accuracy (%) |
|---|---|---|---|---|
| Gemma 7B | Baseline (using 8-shot examples) | | | |
| | Chain of Thought (CoT) | 40.0 | same | 44.6 |
| | | | different | 49.4 |
| | Least to Most (L2M) | 34.9 | same | 43.4 |
| | | | different | 48.7 |
| | Program of Thought (PoT) | 40.4 | same | 46.1 |
| | | | different | 51.3 |
| | SMART (proposed approach) | | | |
| | Iteration 1 | 46.5 | SMART | 64.6 |
| | Iteration 2 | 50.6 | SMART | 64.0 |
| | Final Iteration (Iteration 5) | 55.6 (+15.2) | SMART | 67.5 (+16.2) |
| Mistral 7B | Baseline (using 8-shot examples) | | | |
| | Chain of Thought (CoT) | 50.6 | same | 59.0 |
| | | | different | 67.5 |
| | Least to Most (L2M) | 52.4 | same | 56.6 |
| | | | different | 67.2 |
| | Program of Thought (PoT) | 56.9 | same | 61.3 |
| | | | different | 70.1 |
| | SMART (proposed approach) | | | |
| | Iteration 1 | 63.8 | SMART | 74.1 |
| | Iteration 2 | 66.3 | SMART | 76.4 |
| | Final Iteration (Iteration 3) | 67.9 (+11.0) | SMART | 78.0 (+7.9) |
| Qwen2 7B | Baseline (using 8-shot examples) | | | |
| | Chain of Thought (CoT) | 81.9 | same | 90.4 |
| | | | different | 89.3 |
| | Least to Most (L2M) | 80.0 | same | 84.9 |
| | | | different | 88.7 |
| | Program of Thought (PoT) | 76.9 | same | 86.0 |
| | | | different | 88.6 |
| | SMART (proposed approach) | | | |
| | Iteration 1 | 84.5 | SMART | 91.4 |
| | Iteration 2 | 85.4 (+3.5) | SMART | 91.9 (+1.5) |
| | Final Iteration (Iteration 3) | 85.2 | SMART | 91.2 |
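
The parenthesized gains in the SMART rows are numerically consistent with the difference between the marked SMART result and the best 8-shot baseline in the same column for that model. This reading is inferred from the table values and is not stated in the table itself. A minimal Python sketch that recomputes the gains under that assumption (the `RESULTS` layout and the helper names `gain_over_best_baseline` and `all_gains` are illustrative, not from the paper):

```python
# Values copied from the table above. For each model and column we list the
# 8-shot baseline accuracies and the SMART accuracy that carries a
# parenthesized gain. The dictionary layout is a hypothetical convenience.
RESULTS = {
    "Gemma 7B": {
        "test": {"baselines": [40.0, 34.9, 40.4], "smart": 55.6},
        "refinement": {
            "baselines": [44.6, 49.4, 43.4, 48.7, 46.1, 51.3],
            "smart": 67.5,
        },
    },
    "Mistral 7B": {
        "test": {"baselines": [50.6, 52.4, 56.9], "smart": 67.9},
        "refinement": {
            "baselines": [59.0, 67.5, 56.6, 67.2, 61.3, 70.1],
            "smart": 78.0,
        },
    },
    "Qwen2 7B": {
        "test": {"baselines": [81.9, 80.0, 76.9], "smart": 85.4},
        "refinement": {
            "baselines": [90.4, 89.3, 84.9, 88.7, 86.0, 88.6],
            "smart": 91.9,
        },
    },
}


def gain_over_best_baseline(baselines, smart):
    """Return the SMART accuracy minus the best baseline, in points."""
    # Assumption: gains are measured against the strongest baseline.
    return round(smart - max(baselines), 1)


def all_gains(results=RESULTS):
    """Compute the gain for every model and column in `results`."""
    return {
        model: {
            column: gain_over_best_baseline(cell["baselines"], cell["smart"])
            for column, cell in columns.items()
        }
        for model, columns in results.items()
    }


if __name__ == "__main__":
    for model, gains in all_gains().items():
        print(model, gains)
```

Running the script prints one line per model, which can be compared against the parenthesized values in the table. For Qwen2 7B the marked row is Iteration 2 rather than the final iteration, which is why the sketch uses 85.4 and 91.9 for that model.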