OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
cs.CL
cs.AI
Released Date: October 23, 2024
Authors: Qinglin Zhang, Luyao Cheng, Chong Deng, Qian Chen, Wen Wang, Siqi Zheng, Jiaqing Liu, Hai Yu, Chaohong Tan
Affiliation (all authors): Tongyi Lab

| Task | Model | Librispeech test_clean (CER) | Librispeech test_other (CER) | WenetSpeech test_meeting (CER) |
|---|---|---|---|---|
| ASR | OmniFlatten (Ours) | 9.46 | 22.48 | 31.76 |
| ASR | Whisper V3 | 3.71 | 5.74 | 19.91 |
| TTS | OmniFlatten (Ours) | 10.9 | 12.87 | 50.56 |
| TTS | GT Speech Tokens | 5.82 | 12.74 | 40.18 |