A Measure of the System Dependence of Automated Metrics
cs.CL
cs.AI
Released Date: December 4, 2024
Authors: Pius von Däniken¹, Jan Deriu, Mark Cieliebak
Affiliation: ¹Centre for Artificial Intelligence, ZHAW School of Engineering
| System | Human | Human R | Metric | Metric R | Remapped | Remapped R | Exp. Deviation (ED) |
|---|---|---|---|---|---|---|---|
| Lan-BridgeMT | -2.100 | 1 | 0.889 | 2 | -2.920 | 2 | -0.820 |
| GPT4-5shot | -2.305 | 2 | 0.893 | 1 | -2.800 | 1 | -0.494 |
| Yishu | -3.231 | 3 | 0.880 | 4 | -3.179 | 4 | 0.052 |
| ONLINE-B | -3.385 | 4 | 0.879 | 5 | -3.188 | 5 | 0.197 |
| HW-TSC | -3.398 | 5 | 0.883 | 3 | -3.080 | 3 | 0.318 |
| ONLINE-A | -3.785 | 6 | 0.856 | 8 | -3.812 | 8 | -0.027 |
| ONLINE-Y | -3.792 | 7 | 0.868 | 6 | -3.479 | 6 | 0.313 |
| ONLINE-G | -3.857 | 8 | 0.864 | 7 | -3.607 | 7 | 0.250 |
| ONLINE-W | -4.062 | 9 | 0.848 | 9 | -4.165 | 10 | -0.103 |
| ZengHuiMT | -4.232 | 10 | 0.846 | 10 | -4.140 | 9 | 0.092 |
| IOL-Research | -4.586 | 11 | 0.843 | 11 | -4.251 | 11 | 0.335 |
| ONLINE-M | -5.433 | 12 | 0.820 | 15 | -4.907 | 15 | 0.526 |
| ANVITA | -6.078 | 13 | 0.830 | 13 | -4.602 | 13 | 1.475 |
| NLLB-MBR-BLEU | -6.360 | 14 | 0.825 | 14 | -4.726 | 14 | 1.634 |
| NLLB-Greedy | -6.574 | 15 | 0.831 | 12 | -4.578 | 12 | 1.996 |
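
The relationship between the columns can be checked numerically. In every row the Exp. Deviation value matches the remapped score minus the human score, up to rounding in the third decimal. The sketch below assumes that reading of the table; the function names `expected_deviation` and `max_abs_error` are illustrative and not taken from the paper, and the rows are a subset copied from the table above.

```python
# Rows copied from the table: (system, human score, remapped score, reported ED).
rows = [
    ("Lan-BridgeMT", -2.100, -2.920, -0.820),
    ("GPT4-5shot", -2.305, -2.800, -0.494),
    ("Yishu", -3.231, -3.179, 0.052),
    ("ANVITA", -6.078, -4.602, 1.475),
    ("NLLB-Greedy", -6.574, -4.578, 1.996),
]


def expected_deviation(human: float, remapped: float) -> float:
    """Assumed definition: remapped metric score minus human score."""
    return remapped - human


def max_abs_error(table_rows) -> float:
    """Largest gap between the recomputed and the reported ED values."""
    return max(
        abs(expected_deviation(human, remapped) - reported)
        for _, human, remapped, reported in table_rows
    )


if __name__ == "__main__":
    for system, human, remapped, reported in rows:
        recomputed = expected_deviation(human, remapped)
        print(f"{system:14s} recomputed={recomputed:+.3f} reported={reported:+.3f}")
```

Under this reading, a positive ED means the remapped metric score places a system above its human score, and a negative ED means it places the system below. The largest positive values in the table belong to the systems ranked lowest by humans (ANVITA, NLLB-MBR-BLEU, NLLB-Greedy).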