Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Categories: cs.CV, cs.AI
Released Date: November 15, 2024
Authors: Andong Deng¹, Tongjia Chen², Shoubin Yu³, Taojiannan Yang⁴, Lincoln Spencer¹, Yapeng Tian⁵, Ajmal Saeed Mian², Mohit Bansal³, Chen Chen¹
Affiliations: ¹Center for Research in Computer Vision, University of Central Florida; ²University of Western Australia; ³UNC, Chapel Hill; ⁴Amazon Web Services; ⁵University of Texas at Dallas

| Tasks | Datasets | | | | | |
|---|---|---|---|---|---|---|
| Action Recognition | Kinetics400 (Carreira & Zisserman, 2017), UCF101 (Soomro et al., 2012) | ✗ | ✗ | ✗ | ✗ | ✗ |
| Temporal Action Localization | ActivityNet (Caba Heilbron et al., 2015), THUMOS14 (Jiang et al., 2014) | ✗ | ✓ | ✗ | ✗ | ✗ |
| Spatiotemporal Action Localization | AVA (Gu et al., 2018), MultiSports (Li et al., 2021) | ✓ | ✓ | ✗ | ✗ | ✗ |
| Motion Expression Video Segmentation | MeViS (Ding et al., 2023) | ✓ | ✗ | ✗ | ✓ | ✗ |
| Video Reasoning Segmentation | ReVOS (Yan et al., 2024), VideoReasonSeg (Zheng et al., 2024) | ✓ | ✗ | ✗ | ✓ | ✓ |
| Motion-Grounded Video Reasoning | GroundMoRe (Ours) | ✓ | ✓ | ✓ | ✓ | ✓ |