Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
cs.SD
eess.AS
Released Date: November 11, 2024
Authors: Yoshiki Masuyama¹, Koichi Miyazaki², Masato Murata²
Aff.: ¹Tokyo Metropolitan University; ²CyberAgent, Inc.

| Dataset | Metric | Eval sets | MADEON | MADEON-SP | MADEON-2SP | Transformer |
|---|---|---|---|---|---|---|
| LibriSpeech 100h [librispeech-corpus] | WER | {dev,test}_{clean,other} | 4.9 / 7.4 / 5.0 / 8.3 | 4.4 / 6.8 / 4.4 / 7.4 | 4.3 / 6.8 / 4.2 / 7.3 | 4.0 / 6.6 / 3.9 / 7.1 |
| LibriSpeech 960h [librispeech-corpus] | WER | {dev,test}_{clean,other} | 2.7 / 4.8 / 2.7 / 5.2 | 2.3 / 4.7 / 2.5 / 4.8 | 2.2 / 4.6 / 2.4 / 4.7 | 2.3 / 4.6 / 2.4 / 4.8 |
| TEDLIUM3 [tedlium3] | WER | dev / test | 10.7 / 9.6 | 9.7 / 9.7 | 8.9 / 8.9 | 8.7 / 8.7 |
| GigaSpeech [gigaspeech] | WER | dev / test | 11.2 / 11.3 | 11.0 / 11.2 | 11.0 / 11.1 | 11.1 / 11.1 |
| AISHELL [aishell-corpus] | CER | dev / test | 5.4 / 5.6 | 4.8 / 5.0 | 5.0 / 5.2 | 5.5 / 5.7 |
| CSJ [csj] | CER | eval1 / eval2 / eval3 | 5.7 / 4.3 / 4.6 | 5.1 / 3.7 / 4.2 | 5.2 / 3.7 / 4.1 | 5.9 / 4.6 / 4.9 |