DataTales: A Benchmark for Real-World Intelligent Data Narration
cs.CR
cs.AI
cs.DB
Released Date: October 23, 2024
Authors: Yajing Yang¹, Qian Liu², Min-Yen Kan³
Affiliations: ¹National University of Singapore, Rio Tinto; ²Sea AI Lab; ³National University of Singapore

| Model | Data | Setting | Avg. Length | Factuality: Acc. (%) | Style: BLEU (%) | Insightfulness: Impact | Insightfulness: Significance | Insightfulness: Avg. |
|---|---|---|---|---|---|---|---|---|
| GPT-3.5-Turbo | Same day | Zero-shot | 320 | 14.58 | 3.42 | 3.26 | 2.71 | 2.98 |
| GPT-4 | Same day | Zero-shot | 423 | 25.22 | 1.96 | 3.29 | 2.51 | 2.90 |
| Llama2-7B | Same day | Zero-shot | 693 | 18.76 | 2.26 | 2.79 | 2.05 | 2.42 |
| Llama2-7B | Same day | Fine-tuned | 180 | 22.10 | 11.19 | 3.42 | 2.38 | 2.90 |
| Llama2-13B | Same day | Zero-shot | 502 | 20.73 | 3.40 | 3.25 | 2.52 | 2.89 |
| Llama2-13B | Same day | Fine-tuned | 139 | 28.93 | 14.13 | 3.40 | 2.54 | 2.97 |
| GPT-3.5-Turbo | 1 Week | Zero-shot | 342 | 14.00 | 3.32 | 3.38 | 2.80 | 3.09 |
| GPT-4 | 1 Week | Zero-shot | 421 | 28.68 | 2.04 | 3.06 | 2.40 | 2.73 |
| Llama2-7B | 1 Week | Zero-shot | 405 | 11.15 | 3.34 | 2.99 | 2.48 | 2.74 |
| Llama2-7B | 1 Week | Fine-tuned | 136 | 11.64 | 10.47 | 3.28 | 2.39 | 2.84 |
| Llama2-13B | 1 Week | Zero-shot | 370 | 7.11 | 4.11 | 3.36 | 2.85 | 3.11 |
| Llama2-13B | 1 Week | Fine-tuned | 136 | 12.30 | 10.66 | 3.37 | 2.55 | 2.96 |
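
The table does not say how the insightfulness average is computed. The values are consistent with it being the plain mean of the Impact and Significance scores, and the sketch below checks that reading against every row. The tuple layout, the `mean_matches` helper, and the tolerance are assumptions made for illustration; they are not taken from the paper.

```python
# Insightfulness scores copied from the table, in row order:
# (impact, significance, reported_avg)
ROWS = [
    (3.26, 2.71, 2.98),
    (3.29, 2.51, 2.90),
    (2.79, 2.05, 2.42),
    (3.42, 2.38, 2.90),
    (3.25, 2.52, 2.89),
    (3.40, 2.54, 2.97),
    (3.38, 2.80, 3.09),
    (3.06, 2.40, 2.73),
    (2.99, 2.48, 2.74),
    (3.28, 2.39, 2.84),
    (3.36, 2.85, 3.11),
    (3.37, 2.55, 2.96),
]


def mean_matches(impact, significance, reported, tol=0.01):
    """Return True if `reported` equals the mean of the two scores.

    The tolerance is an assumption: the published scores are already
    rounded to two decimals, so the recomputed mean can be off by a
    small rounding margin.
    """
    return abs((impact + significance) / 2 - reported) <= tol


# Collect any rows where the reported average is not the simple mean.
mismatches = [row for row in ROWS if not mean_matches(*row)]
print(mismatches)  # prints []
```

An empty list means every reported average falls within rounding distance of the two-score mean. That supports the reading but does not confirm it is the authors' definition.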