
Published on December 9, 2024

MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models

Categories: cs.SD, cs.MM, eess.AS

Release Date: December 9, 2024

Authors: Shansong Liu¹, Atin Sakkeer Hussain², Qilong Wu², Chenshuo Sun², Ying Shan¹

Affiliations: ¹ARC Lab, Tencent PCG; ²National University of Singapore

arXiv: http://arxiv.org/pdf/2412.06660v1


Music Understanding
Model	BLEU (B-U) $\uparrow$	METEOR (M-R) $\uparrow$	ROUGE-L (R-L) $\uparrow$	BERTScore (BERT-S) $\uparrow$
LTU	0.242	0.274	0.326	0.887
LLaMA Adapter	0.273	0.334	0.413	0.895
SALMONN	0.286	0.332	0.371	0.898
MU-LLaMA	0.306	0.385	0.466	0.901
MuMu-LLaMA	0.341	0.442	0.491	0.908
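
To make the gaps in the Music Understanding table easier to read, the relative improvement of one model over another can be computed per metric. The following is a minimal sketch, not part of the paper: the scores are copied from the table above, while the names `SCORES`, `relative_gain`, and `best_model` are hypothetical helpers introduced only for illustration.

```python
# Scores copied from the Music Understanding table above.
# Column order follows the table: B-U, M-R, R-L, BERT-S (higher is better).
METRICS = ("B-U", "M-R", "R-L", "BERT-S")
SCORES = {
    "LTU": (0.242, 0.274, 0.326, 0.887),
    "LLaMA Adapter": (0.273, 0.334, 0.413, 0.895),
    "SALMONN": (0.286, 0.332, 0.371, 0.898),
    "MU-LLaMA": (0.306, 0.385, 0.466, 0.901),
    "MuMu-LLaMA": (0.341, 0.442, 0.491, 0.908),
}


def relative_gain(model, baseline):
    """Return the percent improvement of `model` over `baseline` per metric."""
    return {
        metric: 100.0 * (new - old) / old
        for metric, new, old in zip(METRICS, SCORES[model], SCORES[baseline])
    }


def best_model(metric):
    """Return the name of the model with the highest score on `metric`."""
    idx = METRICS.index(metric)
    return max(SCORES, key=lambda name: SCORES[name][idx])


if __name__ == "__main__":
    # Hypothetical usage: compare the last two rows of the table.
    for metric, gain in relative_gain("MuMu-LLaMA", "MU-LLaMA").items():
        print(f"{metric}: {gain:+.1f}%")
```

On these numbers, MuMu-LLaMA has the highest score in every column, and its gain over MU-LLaMA is largest on M-R and smallest on BERT-S.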