
Published on December 9

I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token

Categories: cs.LG, cs.CL

Release Date: December 9, 2024

Authors: Roi Cohen¹, Konstantin Dobler¹, Eden Biran², Gerard de Melo¹

Affiliations: ¹HPI / University of Potsdam; ²Tel Aviv University

arXiv: http://arxiv.org/pdf/2412.06676v1


P = precision, R = recall, F1 = F1 score. Bold values are those highlighted in the source table.

| Model | LAMA Google-RE P | LAMA Google-RE R | LAMA Google-RE F1 | LAMA T-Rex P | LAMA T-Rex R | LAMA T-Rex F1 | LAMA SQuAD P | LAMA SQuAD R | LAMA SQuAD F1 | TriviaQA P | TriviaQA R | TriviaQA F1 | PopQA P | PopQA R | PopQA F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mistral-7B-v0.1 | 48.1 | 48.1 | 48.1 | 71.2 | **71.2** | 71.2 | 45.8 | 45.8 | 45.8 | 52.0 | 52.0 | 52.0 | 35.5 | **35.5** | **35.5** |
| Mistral-7B-v0.1 + The Pile | 48.8 | **48.8** | 48.8 | 69.9 | 69.9 | 69.9 | 48.0 | **48.0** | 48.0 | 52.2 | **52.2** | 52.2 | 35.2 | 35.2 | 35.2 |
| Mistral-7B-v0.1 + Confidence Threshold | 60.0 | 40.0 | 48.0 | 80.4 | 63.5 | 71.0 | 64.4 | 33.5 | 44.1 | 70.4 | 41.1 | 51.9 | 64.6 | 20.6 | 31.2 |
| Mistral-7B-v0.1 + P(True) | 54.4 | 44.5 | 48.9 | 73.8 | 65.1 | 69.2 | 54.9 | 41.0 | 46.9 | 58.8 | 47.5 | 52.5 | 40.3 | 29.0 | 33.7 |
| Mistral-7B-v0.1 + Semantic Entropy | 70.1 | 38.9 | 50.0 | 88.0 | 65.4 | 75.0 | 70.2 | 44.5 | 54.4 | 68.5 | 52.5 | 59.4 | 68.7 | 20.4 | 31.5 |
| Mistral-7B-v0.1 + IDK-tuning on The Pile | **71.1** | 40.6 | **51.7** | **88.5** | 65.5 | **75.3** | **72.0** | 44.3 | **54.9** | **72.5** | 52.0 | **60.6** | **78.1** | 20.5 | 32.5 |
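
The F1 columns in the table above are consistent with the standard definition of F1 as the harmonic mean of precision and recall. The sketch below is only an illustration of that arithmetic, not the authors' evaluation code: the helper name `f1_score` and the choice of rows are assumptions made here, and the numbers are copied from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
# The selection of rows is arbitrary and only for illustration.
rows = {
    "Confidence Threshold, LAMA Google-RE": (60.0, 40.0, 48.0),
    "Semantic Entropy, LAMA Google-RE": (70.1, 38.9, 50.0),
    "IDK-tuning on The Pile, LAMA Google-RE": (71.1, 40.6, 51.7),
    "IDK-tuning on The Pile, PopQA": (78.1, 20.5, 32.5),
}

for name, (p, r, reported) in rows.items():
    computed = round(f1_score(p, r), 1)
    print(f"{name}: computed F1 = {computed}, reported F1 = {reported}")
```

For these rows the recomputed value matches the reported one to one decimal place. It also illustrates the trade-off visible in the table: the methods that abstain raise precision while lowering recall, so F1 moves far less than precision does.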