HARP: Hesitation-Aware Reframing in Transformer Inference Pass
Categories: cs.CL, cs.AI, cs.LG
Release date: December 10, 2024
Authors: Romain Storaï¹, Seung-won Hwang¹
Affiliation: ¹Seoul National University

| Model | Method | CsQA | GSM8K | LAMBADA | MMLU Pro | CNN/DM | Relative cost overhead vs. Vanilla |
|---|---|---|---|---|---|---|---|
| LLaMA-3.1 Instruct (8B) | Vanilla (Greedy) | 78.79 | 76.88 | 30.86 | 46.42 | 32.44 | |
| | Beam Search | 79.29 | 76.38 | 31.35 | 48.21 | 33.17 | x2.79 |
| | Ours (Greedy) | 80.30 (+1.52) | 78.39 (+1.51) | 36.02 (+5.16) | 48.21 (+1.79) | 34.03 (+1.59) | x1.16 |
| | Vanilla (Nucleus) | 79.80 | 73.00 | 27.44 | 42.55 | 30.81 | |
| | Ours (Nucleus) | 79.29 (-0.51) | 74.00 (+1.00) | 31.38 (+3.94) | 43.45 (+0.90) | 32.38 (+1.57) | x1.17 |
| Mistral v0.3 Instruct (7.25B) | Vanilla (Greedy) | 70.37 | 43.62 | 45.15 | 31.76 | 29.11 | |
| | Beam Search | 70.99 | 50.53 | 45.70 | 33.11 | 28.71 | x3.07 |
| | Ours (Greedy) | 70.99 (+0.62) | 48.40 (+4.79) | 49.76 (+4.64) | 31.76 (0.00) | 29.57 (+0.47) | x1.24 |
| | Vanilla (Nucleus) | 70.74 | 29.38 | 45.02 | 31.76 | 28.72 | |
| | Ours (Nucleus) | 70.21 (-0.53) | 31.88 (+2.50) | 48.26 (+3.24) | 33.79 (+2.03) | 28.72 (0.00) | x1.29 |
| Phi 3.5 Mini Instruct (3.82B) | Vanilla (Greedy) | 77.20 | 72.50 | 32.76 | 29.65 | 26.10 | |
| | Beam Search | 77.72 | 73.00 | 33.60 | 33.78 | 25.79 | x3.18 |
| | Ours (Greedy) | 78.24 (+1.04) | 73.00 (+0.50) | 33.44 (+0.68) | 32.75 (+3.10) | 26.97 (+0.87) | x1.26 |
| | Vanilla (Nucleus) | 77.04 | 71.50 | 31.85 | 28.82 | 25.30 | |
| | Ours (Nucleus) | 77.55 (+0.51) | 74.50 (+3.00) | 32.99 (+1.14) | 34.53 (+5.71) | 25.19 (-0.11) | x1.27 |
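The parenthesized values in the "Ours" rows appear to be the per-dataset difference from the Vanilla row that uses the same decoding strategy (for example, 31.38 − 27.44 = +3.94 on LAMBADA for LLaMA-3.1 with nucleus sampling). The short Python sketch below recomputes them for that one row. The function name `deltas` and the variable names are hypothetical, and reading every delta as "Ours minus same-decoding Vanilla" is an assumption drawn from the numbers, not something the table states.

```python
# Hedged sketch: recompute the parenthesized deltas for one table row,
# ASSUMING each delta = "Ours" score minus the Vanilla score obtained
# with the same decoding strategy (here: nucleus sampling).
DATASETS = ["CsQA", "GSM8K", "LAMBADA", "MMLU Pro", "CNN/DM"]

# LLaMA-3.1 Instruct (8B), nucleus-sampling scores copied from the table.
vanilla_nucleus = [79.80, 73.00, 27.44, 42.55, 30.81]
ours_nucleus = [79.29, 74.00, 31.38, 43.45, 32.38]


def deltas(ours, baseline):
    """Hypothetical helper: per-dataset difference, rounded to two decimals."""
    return [round(o - b, 2) for o, b in zip(ours, baseline)]


# Print each dataset with a signed delta, matching the table's notation.
for name, d in zip(DATASETS, deltas(ours_nucleus, vanilla_nucleus)):
    print(f"{name}: {d:+.2f}")
```

Swapping in the Vanilla (Greedy) and Ours (Greedy) scores gives the greedy-row deltas; a few of those do not match the recomputed difference exactly (Mistral LAMBADA lists +4.64, while 49.76 − 45.15 = 4.61).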