Published on October 30
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Categories: cs.CV, cs.AI, cs.LG, cs.MM, cs.SD, eess.AS
Released Date: October 30, 2024
Authors: Shentong Mo¹, Yibing Song²
Aff.: ¹Carnegie Mellon University, MBZUAI; ²Alibaba Group, Hupan Lab

| True Pairs | False Pairs | T-Alignment (%) |
|---|---|---|
| 50k | 0 | 78.23 |
| 0 | 50k | 42.05 |
| 50k | 50k | 52.29 |
| 50k | 100k | 45.65 |
| 100k | 50k | 63.71 |