notesum.ai

Published on November 29

CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

cs.RO
cs.AI
cs.CL
cs.CV
cs.LG

Release Date: November 29, 2024

Authors: Qixiu Li¹, Yaobo Liang², Zeyu Wang¹, Lin Luo², Xi Chen², Mozheng Liao³, Fangyun Wei², Yu Deng², Sicheng Xu², Yizhong Zhang², Xiaofan Wang⁴, Bei Liu², Jianlong Fu², Jianmin Bao², Dong Chen², Yuanchun Shi¹, Jiaolong Yang², Baining Guo²

Affiliations: ¹Tsinghua University; ²Microsoft Research Asia; ³USTC; ⁴Institute of Microelectronics, CAS

arXiv: http://arxiv.org/pdf/2411.19650v1