MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models
Categories: cs.SD, cs.MM, eess.AS
Released Date: December 9, 2024
Authors: Shansong Liu¹, Atin Sakkeer Hussain², Qilong Wu², Chenshuo Sun², Ying Shan¹
Affiliations: ¹ARC Lab, Tencent PCG; ²National University of Singapore

Music Understanding (B-U: BLEU, M-R: METEOR, R-L: ROUGE-L, BERT-S: BERTScore)

| Model | B-U | M-R | R-L | BERT-S |
|---|---|---|---|---|
| LTU | 0.242 | 0.274 | 0.326 | 0.887 |
| LLaMA Adapter | 0.273 | 0.334 | 0.413 | 0.895 |
| SALMONN | 0.286 | 0.332 | 0.371 | 0.898 |
| MU-LLaMA | 0.306 | 0.385 | 0.466 | 0.901 |
| MuMu-LLaMA | 0.341 | 0.442 | 0.491 | 0.908 |