notesum.ai

Published on November 20

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

cs.AI

Release Date: November 20, 2024

Authors: Davide Paglieri¹, Bartłomiej Cupiał², Samuel Coward³, Ulyana Piterbarg⁴, Maciej Wolczyk², Akbir Khan¹, Eduardo Pignatelli¹, Łukasz Kuciński, Lerrel Pinto⁴, Rob Fergus⁴, Jakob Nicolaus Foerster³, Jack Parker-Holder¹, Tim Rocktäschel

Affiliations: ¹AI Centre, University College London; ²IDEAS NCBR; ³University of Oxford; ⁴New York University

arXiv: http://arxiv.org/abs/2411.13543v1