notesum.ai

Published on October 31

cs.PF

cs.AI

Released Date: October 31, 2024

Authors: Youpeng Zhao¹, Jun Wang¹

Affiliation: ¹University of Central Florida, Orlando, FL, USA

Metric	OPT-2.7B (Proxy-based)	OPT-2.7B (Retrieval-based)	OPT-6.7B (Proxy-based)	OPT-6.7B (Retrieval-based)	OPT-13B (Proxy-based)	OPT-13B (Retrieval-based)
Accuracy ($\uparrow$)	0.781	0.821	0.712	0.856	0.634	0.744
Pred. Error ($\downarrow$)	0.122	0.057	0.145	0.096	0.178	0.123
Avg. Pred. Latency ($\downarrow$)	12.2 ms	3.92 ms	11.7 ms	4.74 ms	14.8 ms	4.49 ms
Throughput ($\uparrow$)	$1\times$	$1.47\times$	$1\times$	$1.63\times$	$1\times$	$1.82\times$
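
The gaps between the two predictors are easier to compare as ratios. The following is a minimal sketch that derives them from the table above; the numbers are copied from the table, while the names `RESULTS` and `summarize` and the choice of derived quantities (absolute accuracy gain, relative error reduction, latency speedup) are illustrative assumptions rather than anything defined by the authors.

```python
# Values copied from the table above, stored as (proxy-based, retrieval-based).
# RESULTS and summarize are hypothetical names used only for this sketch.
RESULTS = {
    "OPT-2.7B": {"accuracy": (0.781, 0.821), "pred_error": (0.122, 0.057), "latency_ms": (12.2, 3.92)},
    "OPT-6.7B": {"accuracy": (0.712, 0.856), "pred_error": (0.145, 0.096), "latency_ms": (11.7, 4.74)},
    "OPT-13B": {"accuracy": (0.634, 0.744), "pred_error": (0.178, 0.123), "latency_ms": (14.8, 4.49)},
}


def summarize(results):
    """Compare retrieval-based against proxy-based prediction for each model."""
    summary = {}
    for model, metrics in results.items():
        acc_proxy, acc_retr = metrics["accuracy"]
        err_proxy, err_retr = metrics["pred_error"]
        lat_proxy, lat_retr = metrics["latency_ms"]
        summary[model] = {
            # Absolute accuracy difference (retrieval minus proxy).
            "accuracy_gain": acc_retr - acc_proxy,
            # Fraction of the proxy-based prediction error that is removed.
            "error_reduction": (err_proxy - err_retr) / err_proxy,
            # How many times lower the average prediction latency is.
            "latency_speedup": lat_proxy / lat_retr,
        }
    return summary


if __name__ == "__main__":
    for model, s in summarize(RESULTS).items():
        print(
            f"{model}: accuracy +{s['accuracy_gain']:.3f}, "
            f"error -{s['error_reduction']:.1%}, "
            f"latency {s['latency_speedup']:.2f}x lower"
        )
```

Read this way, the retrieval-based predictor's average prediction latency is roughly 2.5x to 3.3x lower than the proxy-based one across the three model sizes, in addition to the throughput ratios already reported in the last row.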