Published on November 10
Self-Training Meets Consistency: Improving LLMs' Reasoning With Consistency-Driven Rationale Evaluation
Categories: cs.LG, cs.AI, cs.CL
Release Date: November 10, 2024
Authors: Jaehyeok Lee¹, Keisuke Sakaguchi², JinYeong Bak¹
Affiliations: ¹Sungkyunkwan University; ²Tohoku University

| Approach | Model | ReClor (Llama 3 8B) | ARC-C (Llama 3 8B) | CSQA (Llama 3 8B) | ReClor (Gemma 7B) | ARC-C (Gemma 7B) | CSQA (Gemma 7B) |
|---|---|---:|---:|---:|---:|---:|---:|
| Zero-shot | | 52.10 | 69.28 | 53.89 | 53.60 | 77.47 | 65.68 |
| Few-shot | | 55.30 | 77.21 | 70.76 | 58.70 | 83.11 | 75.02 |
| Self-training | STaR | 58.60 | 77.99 | 76.17 | 58.40 | 82.34 | 77.56 |
| Self-training | RFT | 64.40 | 80.72 | 78.54 | 66.90 | 83.36 | 80.02 |
| Self-training | Self-motivated Learning | 67.80 | 80.03 | 80.34 | 68.20 | 83.53 | 80.59 |
| Self-training | | 66.10 | 81.40 | 79.36 | 67.90 | 84.22 | 80.51 |
| Self-training | | 69.50 | 81.91 | 81.41 | 70.00 | 84.47 | 80.67 |
| Direct fine-tuning | Fine-tune (Label) | 77.40 | 80.80 | 80.18 | 81.90 | 85.58 | 84.44 |
| Direct fine-tuning | | 79.30 | 81.23 | 81.24 | 83.70 | 87.20 | 84.85 |