Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels
cs.CL
cs.AI
Release Date: November 21, 2024
Authors: Jianhao Yan, Pingchuan Yan, Yulong Chen, Jing Li, Xianchao Zhu, Yue Zhang

| | Translator | Severity: Minor | Severity: Major | Error Type: Accuracy | Error Type: Fluency |
|---|---|---|---|---|---|
| Machine | Seamless | 29.52 | 17.35 | 28.25 | 17.37 |
| Machine | GPT4 | 20.43 | 3.71 | 11.12 | 12.95 |
| Human | Junior | 18.19 | 3.27 | 8.55 | 12.74 |
| Human | Medium | 20.19 | 3.30 | 12.58 | 10.66 |
| Human | Senior | 12.04 | 1.83 | 7.93 | 5.93 |