Song Form-aware Full-Song Text-to-Lyrics Generation with Multi-Level Granularity Syllable Count Control
Categories: cs.CL, cs.AI
Release Date: November 20, 2024
Authors: Yunkee Chae (1), Eunsik Shin (2), Hwang Suntae (2), Seungryeol Paik (2), Kyogu Lee (3)
Affiliations: (1) IPAI, Seoul National University; (2) Department of Intelligence and Information, Seoul National University; (3) IPAI, Department of Intelligence and Information, AIIS, Seoul National University

| Model | SCD (Full) | SCD (Para.) | SCD (Line) | SCD (Phrase) | SCD (Word) | SCErr % (Full) | SCErr % (Para.) | SCErr % (Line) | SCErr % (Phrase) | SCErr % (Word) | PPL (Full) | PPL (Para.) | PPL (Line) | PPL (Phrase) | PPL (Word) | BERT-S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Front-P | 0.026∗ | 0.068∗ | 0.013∗ | 0.021∗ | 0.024∗ | 10.313∗ | 84.856∗ | 11.711∗ | 6.140∗ | 3.618∗ | 17.069 | 31.259 | 28.442 | 16.793 | 11.271 | 0.735∗ |
| Front-S | 0.006∗ | 0.047∗ | 0.004∗ | 0.002 | 0.002 | 5.501∗ | 75.781∗ | 3.998∗ | 0.759 | 0.252 | 19.036 | 35.613 | 31.321 | 18.377 | 12.218 | 0.736∗ |
| Both-P | 0.016∗ | 0.043∗ | 0.009∗ | 0.012∗ | 0.015∗ | 8.247∗ | 79.516∗ | 8.596∗ | 3.944∗ | 2.178∗ | 17.048 | 31.652 | 28.421 | 16.831 | 11.259 | 0.737∗ |
| Both-S | 0.005∗ | 0.041∗ | 0.003∗ | 0.002 | 0.001 | 5.404∗ | 74.847∗ | 3.501∗ | 0.778 | 0.244 | 19.453 | 36.423 | 31.972 | 18.847 | 12.410 | 0.737∗ |
| Back-P | 0.014∗ | 0.037∗ | 0.009∗ | 0.011∗ | 0.014∗ | 7.516∗ | 76.813∗ | 7.606∗ | 3.194∗ | 2.041∗ | 17.776 | 34.667 | 30.344 | 17.633 | 11.847 | 0.739∗ |
| Back-S | 0.003 | 0.027 | 0.003 | 0.002 | 0.002 | 5.025 | 68.777 | 2.851 | 0.792 | 0.266 | 19.883 | 39.189 | 33.867 | 19.422 | 12.802 | 0.740 |