notesum.ai

Published on November 11

Explore the Reasoning Capability of LLMs in the Chess Testbed

cs.CL

cs.AI

Release Date: November 11, 2024

Authors: Shu Wang¹, Lei Ji², Renxi Wang³, Wenxiao Zhao¹, Haokun Liu⁴, Yifan Hou⁵, Ying Nian Wu¹

Aff.: ¹UCLA; ²Microsoft Research; ³MBZUAI; ⁴University of Toronto; ⁵Peking University

arXiv: http://arxiv.org/abs/2411.06655v1


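Column key (inferred from the "Ours-" row names): N = no explanation, S = strategy, T = tactic, ST = strategy and tactic.

Model	Zero-Shot N	Zero-Shot S	Zero-Shot T	Zero-Shot ST	Few-Shot N	Few-Shot S	Few-Shot T	Few-Shot ST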
gpt-4	53.1	54.6	60.0	60.0	54.7	58.9	57.7	68.1
gpt-4o	46.4	52.8	54.8	60.1	48.5	54.3	52.7	63.1
o1-mini	51.5	58.8	64.1	69.2	50.4	58.3	62.0	65.9
o1-preview	56.4	65.4	77.2	76.6	59.0	65.4	76.2	78.6
claude-3.5-sonnet	49.6	54.9	56.9	54.9	51.9	63.7	59.9	66.1
claude-3-opus	48.3	54.5	53.7	57.3	51.0	55.8	53.2	60.2
gemini-1.5-pro	50.6	48.8	54.2	52.6	50.5	50.1	52.7	50.4
gemini-1.5-flash	46.1	50.8	54.2	52.9	49.7	48.2	53.8	55.6
Ours-no-explanation	63.5	–	–	–	64.7	–	–	–
Ours-strategy	–	89.7	–	–	–	89.8	–	–
Ours-tactic	–	–	94.6	–	–	–	94.5	–
Ours-strategy&tactic	–	–	–	95.2	–	–	–	95.3
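
To make the zero-shot comparison in the table easier to read, the following minimal sketch computes how much each general-purpose model's score changes between the N and ST columns. The numbers are transcribed from the table above. Two points are assumptions rather than statements from the source: that N means "no explanation" and ST means "strategy and tactic" (inferred from the "Ours-" row names), and that the values are scores where higher is better. The names `zero_shot` and `gain` are illustrative only.

```python
# Zero-shot (N, ST) scores transcribed from the table above.
# Assumption: N = no explanation, ST = strategy and tactic,
# inferred from the "Ours-" row names, not stated explicitly.
zero_shot = {
    "gpt-4": (53.1, 60.0),
    "gpt-4o": (46.4, 60.1),
    "o1-mini": (51.5, 69.2),
    "o1-preview": (56.4, 76.6),
    "claude-3.5-sonnet": (49.6, 54.9),
    "claude-3-opus": (48.3, 57.3),
    "gemini-1.5-pro": (50.6, 52.6),
    "gemini-1.5-flash": (46.1, 52.9),
}


def gain(n: float, st: float) -> float:
    """Difference in points between the ST and N settings."""
    return round(st - n, 1)


gains = {model: gain(n, st) for model, (n, st) in zero_shot.items()}

# Print models from largest to smallest change.
for model, g in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:20s} {g:+.1f}")
```

On these numbers, o1-preview shows the largest change between the two columns (about 20 points) and gemini-1.5-pro the smallest (about 2 points).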