notesum.ai

Published on October 30

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

cs.CV
cs.AI
cs.CL

Release Date: October 30, 2024

Authors: Ziyao Shangguan¹, Chuhan Li¹, Yuxuan Ding¹, Yanan Zheng¹, Yilun Zhao¹, Tesca Fitzgerald¹, Arman Cohan²

Affiliations: ¹Yale University; ²Allen Institute for AI

arXiv: http://arxiv.org/abs/2410.23266v1