notesum.ai

Published on October 23

DataTales: A Benchmark for Real-World Intelligent Data Narration

cs.CR

cs.AI

cs.DB

Release date: October 23, 2024

Authors: Yajing Yang¹, Qian Liu², Min-Yen Kan³

Affiliations: ¹National University of Singapore, Rio Tinto; ²Sea AI Lab; ³National University of Singapore

Arxiv: https://arxiv.org/abs/2410.17859v1


Model	Data	Setting	Avg. Length	Factuality Acc. (%)	Style BLEU (%)	Insightfulness: Impact	Insightfulness: Significance	Insightfulness: Avg.
GPT-3.5-Turbo	Same day	Zero-shot	320	14.58	3.42	3.26	2.71	2.98
GPT-4	Same day	Zero-shot	423	25.22	1.96	3.29	2.51	2.90
Llama2-7B	Same day	Zero-shot	693	18.76	2.26	2.79	2.05	2.42
Llama2-7B	Same day	Fine-tuned	180	22.10	11.19	3.42	2.38	2.90
Llama2-13B	Same day	Zero-shot	502	20.73	3.40	3.25	2.52	2.89
Llama2-13B	Same day	Fine-tuned	139	28.93	14.13	3.40	2.54	2.97
GPT-3.5-Turbo	1 week	Zero-shot	342	14.00	3.32	3.38	2.80	3.09
GPT-4	1 week	Zero-shot	421	28.68	2.04	3.06	2.40	2.73
Llama2-7B	1 week	Zero-shot	405	11.15	3.34	2.99	2.48	2.74
Llama2-7B	1 week	Fine-tuned	136	11.64	10.47	3.28	2.39	2.84
Llama2-13B	1 week	Zero-shot	370	7.11	4.11	3.36	2.85	3.11
Llama2-13B	1 week	Fine-tuned	136	12.30	10.66	3.37	2.55	2.96
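The Insightfulness "Avg." column appears to be the mean of the Impact and Significance scores, rounded to two decimals. The sketch below is a reading check, not the authors' evaluation code. It transcribes the results rows, verifies that assumption within rounding, and computes fine-tuned minus zero-shot differences for the Llama2 models. The tuple layout and the helper names `check_insight_average` and `finetune_deltas` are hypothetical and were chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical transcription of the results table; column order:
# model, data, setting, avg_length, factuality_acc, bleu, impact, significance, insight_avg
Row = Tuple[str, str, str, int, float, float, float, float, float]

ROWS: List[Row] = [
    ("GPT-3.5-Turbo", "Same day", "Zero-shot", 320, 14.58, 3.42, 3.26, 2.71, 2.98),
    ("GPT-4", "Same day", "Zero-shot", 423, 25.22, 1.96, 3.29, 2.51, 2.90),
    ("Llama2-7B", "Same day", "Zero-shot", 693, 18.76, 2.26, 2.79, 2.05, 2.42),
    ("Llama2-7B", "Same day", "Fine-tuned", 180, 22.10, 11.19, 3.42, 2.38, 2.90),
    ("Llama2-13B", "Same day", "Zero-shot", 502, 20.73, 3.40, 3.25, 2.52, 2.89),
    ("Llama2-13B", "Same day", "Fine-tuned", 139, 28.93, 14.13, 3.40, 2.54, 2.97),
    ("GPT-3.5-Turbo", "1 week", "Zero-shot", 342, 14.00, 3.32, 3.38, 2.80, 3.09),
    ("GPT-4", "1 week", "Zero-shot", 421, 28.68, 2.04, 3.06, 2.40, 2.73),
    ("Llama2-7B", "1 week", "Zero-shot", 405, 11.15, 3.34, 2.99, 2.48, 2.74),
    ("Llama2-7B", "1 week", "Fine-tuned", 136, 11.64, 10.47, 3.28, 2.39, 2.84),
    ("Llama2-13B", "1 week", "Zero-shot", 370, 7.11, 4.11, 3.36, 2.85, 3.11),
    ("Llama2-13B", "1 week", "Fine-tuned", 136, 12.30, 10.66, 3.37, 2.55, 2.96),
]


def check_insight_average(rows: List[Row], tol: float = 0.0051) -> List[Tuple[str, str, str]]:
    """Return rows whose reported Avg. differs from mean(Impact, Significance)
    by more than half a unit in the second decimal (plus float slack)."""
    mismatches = []
    for model, data, setting, _, _, _, impact, significance, reported in rows:
        if abs((impact + significance) / 2 - reported) > tol:
            mismatches.append((model, data, setting))
    return mismatches


def finetune_deltas(rows: List[Row]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Fine-tuned minus zero-shot for every (model, data) pair that has both settings."""
    by_key = {(r[0], r[1], r[2]): r for r in rows}
    deltas = {}
    for (model, data, setting), ft in by_key.items():
        zs = by_key.get((model, data, "Zero-shot"))
        if setting != "Fine-tuned" or zs is None:
            continue
        deltas[(model, data)] = {
            "avg_length": ft[3] - zs[3],
            "factuality_acc": round(ft[4] - zs[4], 2),
            "bleu": round(ft[5] - zs[5], 2),
        }
    return deltas


if __name__ == "__main__":
    print("Avg. mismatches:", check_insight_average(ROWS))
    for key, delta in finetune_deltas(ROWS).items():
        print(key, delta)
```

On these rows the check finds no mismatches. For all four Llama2 model/data pairs, the deltas show that fine-tuning shortens the output and raises both BLEU and factuality accuracy.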