VoiceBench: Benchmarking LLM-Based Voice Assistants
Categories: cs.CL, cs.AI, cs.SD, eess.AS
Released Date: October 22, 2024
Authors: Yiming Chen, Xianghu Yue, Chen Zhang, Xiaoxue Gao, Robby T. Tan, Haizhou Li
Affiliation: National University of Singapore

| Model | Text | Speech | Text | Speech | Text | Text | Speech | Speech | Text | Text | Speech | Speech | Text | Speech | Overall (Text) | Overall (Speech) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Naïve | 4.81 | 4.68 | 4.38 | 4.04 | 77.76 | 75.41 | 72.33 | 68.54 | 73.91 | 79.52 | 65.37 | 73.70 | 96.54 | 98.08 | 80.27 | 74.50 |
| DiVA | 4.84 | 3.86 | 4.29 | 3.54 | 78.30 | 74.50 | 62.39 | 51.72 | 68.70 | 76.31 | 34.93 | 43.38 | 99.23 | 98.27 | 81.14 | 64.02 |
| LLaMA-Omni | 4.58 | 3.95 | 4.32 | 3.46 | 55.33 | 60.40 | 40.14 | 39.24 | 45.38 | 56.53 | 10.15 | 19.58 | 98.46 | 11.35 | 69.26 | 40.21 |
| Mini-Omni | 2.64 | 2.25 | 2.55 | 2.02 | 26.04 | 7.23 | 23.69 | 4.16 | 13.04 | 22.89 | 8.99 | 18.17 | 86.35 | 37.12 | 57.11 | 41.56 |
| Qwen2-Audio | 4.27 | 3.89 | 3.77 | 3.43 | 61.66 | 40.69 | 41.77 | 29.66 | 28.70 | 38.06 | 20.73 | 31.93 | 96.73 | 96.73 | 72.72 | 59.83 |
| VITA | 4.16 | 3.78 | 3.88 | 2.15 | 72.69 | 76.13 | 31.28 | 24.59 | 48.99 | 57.53 | 18.12 | 27.51 | 95.19 | 26.73 | 75.14 | 39.33 |
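
The table compares each system under Text and Speech conditions, so the gap between the two is a natural quantity to read off. Below is a minimal sketch of that arithmetic, assuming the last two values in each row are the Overall scores for Text and Speech respectively. The names `overall` and `speech_gap` are illustrative and do not come from the paper.

```python
# Overall scores copied from the last two numeric columns of the table,
# assumed to be (Text, Speech) for each system.
overall = {
    "Naïve": (80.27, 74.50),
    "DiVA": (81.14, 64.02),
    "LLaMA-Omni": (69.26, 40.21),
    "Mini-Omni": (57.11, 41.56),
    "Qwen2-Audio": (72.72, 59.83),
    "VITA": (75.14, 39.33),
}


def speech_gap(scores):
    """Return {model: text_score - speech_score}, rounded to 2 decimals."""
    return {
        name: round(text - speech, 2)
        for name, (text, speech) in scores.items()
    }


gaps = speech_gap(overall)

# Print systems from the smallest to the largest Text-to-Speech drop.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:12s} {gap:6.2f}")
```

Under that column assumption, the Naïve system shows the smallest drop (5.77 points) and VITA the largest (35.81 points).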