The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation
Categories: cs.CL, cs.AI
Release Date: December 5, 2024
Authors: Fredrik Carlsson¹, Fangyu Liu², Daniel Ward¹, Murathan Kurfali¹, Joakim Nivre³
Affiliations: ¹RISE Research Institutes of Sweden; ²Google DeepMind; ³Uppsala University

| Model | Context PPL | Pref. @128 | Pref. @256 | TTR @128 | TTR @256 |
|---|---|---|---|---|---|
| Original Texts | – | – | – | 73.5 | 73.8 |
| **Strong Baselines** | | | | | |
| TinyLlama (1.1 B), Top-P | 245 | 31.8 | 21.1 | 38.8 | 28.2 |
| DeepSeek (7 B), Top-P | 34 | 50.0 | 35.6 | 58.2 | 49.7 |
| Llama 3.1 (8 B), Top-P | 36 | 50.5 | 38.5 | 62.1 | 57.0 |
| **Original Models** | | | | | |
| TinyLlama (1.1 B) | 245 | 12.0 | 4.9 | 25.1 | 17.0 |
| DeepSeek (7 B) | 34 | 37.7 | 17.1 | 45.6 | 32.2 |
| Llama 3.1 (8 B) | 36 | 35.0 | 25.6 | 48.5 | 34.5 |
| Llama 3.1 (70 B) | 29 | 48.7 | 34.4 | 56.4 | 50.6 |
| **Hyperfitted Models** | | | | | |
| TinyLlama (1.1 B) | 467 | 44.6 | 34.3 | 64.5 | 60.0 |
| DeepSeek (7 B) | 545 | 49.4 | 45.2 | 62.3 | 60.5 |
| Llama 3.1 (8 B) | 389 | 50.1 | 42.9 | 64.5 | 62.6 |
| Llama 3.1 (70 B) | 255 | 55.9 | 52.4 | 62.0 | 61.6 |
| **Hyperfitted Models + Citation Blocking** | | | | | |
| TinyLlama (1.1 B) | 467 | 45.2 | 35.0 | 64.8 | 60.3 |
| DeepSeek (7 B) | 545 | 47.5 | 44.1 | 62.5 | 60.6 |
| Llama 3.1 (8 B) | 389 | 47.6 | 41.2 | 64.4 | 63.3 |
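As a reading aid for the TTR columns: type-token ratio measures lexical diversity as the fraction of distinct tokens among all generated tokens. A minimal sketch is below; note that whitespace tokenization and the 0–100 scale are assumptions for illustration, as the paper may tokenize and normalize differently.

```python
def type_token_ratio(tokens):
    """Type-token ratio: distinct tokens / total tokens, scaled to 0-100."""
    if not tokens:
        return 0.0
    return 100.0 * len(set(tokens)) / len(tokens)

# "the" appears twice, so 5 distinct tokens out of 6 total.
sample = "the cat sat on the mat".split()
print(round(type_token_ratio(sample), 1))  # → 83.3
```

Higher TTR indicates less repetition; the table shows hyperfitted models closing much of the gap to the original texts' TTR of roughly 73–74.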