notesum.ai

Published on November 22

ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos

cs.CV
cs.CL

Release Date: November 22, 2024

Authors: Tanveer Hannan¹, Md Mohaiminul Islam², Jindong Gu³, Thomas Seidl¹, Gedas Bertasius²

Affiliations: ¹LMU Munich; ²UNC Chapel Hill; ³University of Oxford

arXiv: http://arxiv.org/abs/2411.14901v1