notesum.ai

Published on October 21

Categories: cs.LG, cs.AI, cs.CR, stat.ML

Released Date: October 21, 2024

Authors: Rongxing Liu¹, Kumar Shridhar¹, Manish Prajapat², Patrick Xia³, Mrinmaya Sachan¹

Affiliations: ¹ETH Zurich; ²ETH Zurich, ETH AI Center; ³Microsoft

Model	Method	Test Accuracy (%)	Refinement Strategy	Refinement Accuracy (%)
Gemma 7B
	Baseline (8-shot examples)
	Chain of Thought (CoT)	40.0	same	44.6
	Chain of Thought (CoT)	40.0	different	49.4
	Least to Most (L2M)	34.9	same	43.4
	Least to Most (L2M)	34.9	different	48.7
	Program of Thought (PoT)	40.4	same	46.1
	Program of Thought (PoT)	40.4	different	51.3
	SMART (Proposed Approach)
	Iteration 1	46.5	SMART	64.6
	Iteration 2	50.6	SMART	64.0
	Iteration 5 (final)	55.6 (↑ +15.2)	SMART	67.5 (↑ +16.2)
Mistral 7B
	Baseline (8-shot examples)
	Chain of Thought (CoT)	50.6	same	59.0
	Chain of Thought (CoT)	50.6	different	67.5
	Least to Most (L2M)	52.4	same	56.6
	Least to Most (L2M)	52.4	different	67.2
	Program of Thought (PoT)	56.9	same	61.3
	Program of Thought (PoT)	56.9	different	70.1
	SMART (Proposed Approach)
	Iteration 1	63.8	SMART	74.1
	Iteration 2	66.3	SMART	76.4
	Iteration 3 (final)	67.9 (↑ +11.0)	SMART	78.0 (↑ +7.9)
Qwen2 7B
	Baseline (8-shot examples)
	Chain of Thought (CoT)	81.9	same	90.4
	Chain of Thought (CoT)	81.9	different	89.3
	Least to Most (L2M)	80.0	same	84.9
	Least to Most (L2M)	80.0	different	88.7
	Program of Thought (PoT)	76.9	same	86.0
	Program of Thought (PoT)	76.9	different	88.6
	SMART (Proposed Approach)
	Iteration 1	84.5	SMART	91.4
	Iteration 2	85.4 (↑ +3.5)	SMART	91.9 (↑ +1.5)
	Iteration 3 (final)	85.2	SMART	91.2

↑ marks the gain, in percentage points, of the best SMART result over the best 8-shot baseline for the same model.
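The ↑ annotations can be checked with a few lines of arithmetic. The sketch below is a reconstruction, not the authors' evaluation code. It assumes each gain is the highest listed SMART iteration minus the strongest 8-shot baseline for the same model; for the refinement column, the strongest baseline is taken over both the "same" and "different" runs. Under that assumption it reproduces all six annotated values, including Qwen2 7B, whose best result comes from Iteration 2 rather than the final iteration. The names `RESULTS`, `gains`, and `GAINS` are hypothetical.

```python
# Accuracy figures (%) copied from the results table. Only the SMART
# iterations listed there are included (Gemma 7B iterations 3 and 4 are not shown).
RESULTS = {
    "Gemma 7B": {
        "baseline_test": {"CoT": 40.0, "L2M": 34.9, "PoT": 40.4},
        # Refinement accuracy as [same, different] strategy for each baseline.
        "baseline_refine": {"CoT": [44.6, 49.4], "L2M": [43.4, 48.7], "PoT": [46.1, 51.3]},
        # SMART iteration -> (test accuracy, refinement accuracy).
        "smart": {1: (46.5, 64.6), 2: (50.6, 64.0), 5: (55.6, 67.5)},
    },
    "Mistral 7B": {
        "baseline_test": {"CoT": 50.6, "L2M": 52.4, "PoT": 56.9},
        "baseline_refine": {"CoT": [59.0, 67.5], "L2M": [56.6, 67.2], "PoT": [61.3, 70.1]},
        "smart": {1: (63.8, 74.1), 2: (66.3, 76.4), 3: (67.9, 78.0)},
    },
    "Qwen2 7B": {
        "baseline_test": {"CoT": 81.9, "L2M": 80.0, "PoT": 76.9},
        "baseline_refine": {"CoT": [90.4, 89.3], "L2M": [84.9, 88.7], "PoT": [86.0, 88.6]},
        "smart": {1: (84.5, 91.4), 2: (85.4, 91.9), 3: (85.2, 91.2)},
    },
}


def gains(entry):
    """Return (test gain, refinement gain) in percentage points: the best
    listed SMART iteration minus the best 8-shot baseline (assumed rule)."""
    best_test = max(entry["baseline_test"].values())
    best_refine = max(max(runs) for runs in entry["baseline_refine"].values())
    smart_test = max(test for test, _ in entry["smart"].values())
    smart_refine = max(refine for _, refine in entry["smart"].values())
    # Round to one decimal place to match the table and hide float noise.
    return round(smart_test - best_test, 1), round(smart_refine - best_refine, 1)


GAINS = {model: gains(entry) for model, entry in RESULTS.items()}

if __name__ == "__main__":
    for model, (test_gain, refine_gain) in GAINS.items():
        print(f"{model}: +{test_gain} test, +{refine_gain} after refinement")
```

Running the script prints one line per model, for example `Gemma 7B: +15.2 test, +16.2 after refinement`. Taking the maximum over iterations rather than the last one is what makes the Qwen2 7B row match, since its final iteration (85.2 / 91.2) is slightly below Iteration 2.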