LoGU: Long-form Generation with Uncertainty Expressions
Categories: cs.LG, cs.AI
Released Date: October 18, 2024
Authors: Ruihan Yang¹, Caiqi Zhang², Zhisong Zhang³, Xinting Huang³, Sen Yang⁴, Nigel Collier², Dong Yu³, Deqing Yang¹
Aff.: ¹Fudan University; ²University of Cambridge; ³Tencent AI Lab; ⁴The Chinese University of Hong Kong

| Model | Method | Bios | | | LongFact | | | WildHallu | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Mistral-7B-Instruct | Orig. | 38.8 | – | 27.9 | 86.2 | – | 4.55 | 71.5 | – | 8.31 |
| | Prompt-Based | | | | | | | | | |
| | Unc-Zero | 43.6 | 76.6 | 22.3 | 88.5 | 29.9 | 1.85 | 75.4 | 48.6 | 5.20 |
| | Unc-Few | 41.7 | 74.8 | 22.8 | 85.6 | 32.4 | 5.58 | 72.6 | 57.3 | 10.1 |
| | Pair-Few | 40.3 | 66.7 | 25.9 | 86.1 | 35.1 | 5.08 | 73.4 | 51.6 | 9.80 |
| | Self-Refine | 38.3 | 57.4 | 27.2 | 86.2 | 42.4 | 4.61 | 72.4 | 50.6 | 8.29 |
| | Training-Based | | | | | | | | | |
| | LoGU-SFT | 54.5 | 77.1 | 11.4 | 88.6 | 43.5 | 3.21 | 79.2 | 51.1 | 5.73 |
| | LoGU-DPO | 65.4 | 80.7 | 6.54 | 91.3 | 54.6 | 2.09 | 84.4 | 61.8 | 3.49 |
| Llama3-8B-Instruct | Orig. | 51.9 | – | 20.4 | 85.5 | – | 7.45 | 74.4 | – | 6.24 |
| | Prompt-Based | | | | | | | | | |
| | Unc-Zero | 53.8 | 65.4 | 14.9 | 89.3 | 30.4 | 2.67 | 78.7 | 45.8 | 3.29 |
| | Unc-Few | 58.7 | 69.1 | 11.7 | 88.3 | 40.6 | 4.01 | 80.2 | 48.6 | 3.44 |
| | Pair-Few | 60.6 | 46.3 | 9.86 | 86.7 | 42.2 | 4.78 | 84.0 | 48.1 | 2.33 |
| | Self-Refine | 51.8 | 40.5 | 18.8 | 85.0 | 27.2 | 6.38 | 77.0 | 28.7 | 5.13 |
| | Training-Based | | | | | | | | | |
| | LoGU-SFT | 58.5 | 69.7 | 11.5 | 87.5 | 44.8 | 4.53 | 78.9 | 44.6 | 5.37 |
| | LoGU-DPO | 71.4 | 70.9 | 4.84 | 91.5 | 47.5 | 2.36 | 86.0 | 52.8 | 2.68 |