notesum.ai

Published on November 29

Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing

cs.CV
cs.AI
cs.LG

Release Date: November 29, 2024

Authors: Hosu Lee$^1$, Junho Kim$^1$, Hyunjun Kim$^1$, Yong Man Ro$^1$

Affiliation: $^1$Integrated Vision and Language Lab, KAIST, South Korea

arXiv: http://arxiv.org/pdf/2411.19460v1