
Published on November 11

StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification

cs.CV
cs.AI

Released Date: November 11, 2024

Authors: Yichen He¹, Yuan Lin¹, Jianchao Wu¹, Hanchong Zhang², Yuchen Zhang¹, Ruicheng Le³

Affiliations: ¹ByteDance Research; ²Shanghai Jiao Tong University; ³Peking University

arXiv: http://arxiv.org/abs/2411.07076v1