A Deep Dive Into Large Language Model Code Generation Mistakes: What and Why?
cs.SE
cs.AI
Released Date: November 3, 2024
Authors: QiHong Chen¹, Jiawei Li¹, Jiecheng Deng¹, Jiachen Yu¹, Justin Tian Jin Chen¹, Iftekhar Ahmed¹
Aff.: ¹University

| Mistake Category | HumanEval-X / GPT-4 / Python | HumanEval-X / GPT-4 / Java | HumanEval-X / Gemini / Python | HumanEval-X / Gemini / Java | CoderEval / GPT-4 / Python | CoderEval / GPT-4 / Java | CoderEval / Gemini / Python | CoderEval / Gemini / Java |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CE | 10 | 14 | 25 | 15 | 5 | 8 | 9 | 19 |
| GC | 16 | 11 | 36 | 26 | 10 | 5 | 6 | 5 |
| MFLE | 3 | 6 | 4 | 0 | 1 | 0 | 1 | 1 |
| MOFE | 1 | 1 | 3 | 2 | 13 | 3 | 14 | 2 |
| MOOV | 1 | 0 | 2 | 2 | 1 | 0 | 1 | 0 |
| MLA | 1 | 1 | 2 | 0 | 2 | 2 | 5 | 3 |
| IOM | 0 | 2 | 3 | 2 | 0 | 0 | 0 | 0 |
| Total | 32 | 35 | 75 | 47 | 32 | 18 | 36 | 30 |
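
As a quick consistency check on the table above, the following minimal sketch transcribes the counts and recomputes the per-column totals and the per-category totals across all eight settings. The names `COUNTS`, `REPORTED_TOTALS`, `column_totals`, and `category_totals` are illustrative and do not come from the paper; the column order is assumed to be HumanEval-X (GPT-4 Python, GPT-4 Java, Gemini Python, Gemini Java) followed by CoderEval in the same order.

```python
# Mistake counts transcribed from the table; names here are illustrative only.
# Assumed column order: HumanEval-X (GPT-4 Python, GPT-4 Java, Gemini Python,
# Gemini Java), then CoderEval with the same model/language order.
COUNTS = {
    "CE":   [10, 14, 25, 15, 5, 8, 9, 19],
    "GC":   [16, 11, 36, 26, 10, 5, 6, 5],
    "MFLE": [3, 6, 4, 0, 1, 0, 1, 1],
    "MOFE": [1, 1, 3, 2, 13, 3, 14, 2],
    "MOOV": [1, 0, 2, 2, 1, 0, 1, 0],
    "MLA":  [1, 1, 2, 0, 2, 2, 5, 3],
    "IOM":  [0, 2, 3, 2, 0, 0, 0, 0],
}

# The "Total" row as reported in the table.
REPORTED_TOTALS = [32, 35, 75, 47, 32, 18, 36, 30]


def column_totals(counts):
    """Sum each column (one benchmark/model/language setting) over all categories."""
    return [sum(column) for column in zip(*counts.values())]


def category_totals(counts):
    """Sum each mistake category over all eight settings."""
    return {name: sum(row) for name, row in counts.items()}


if __name__ == "__main__":
    print("Column totals match the Total row:",
          column_totals(COUNTS) == REPORTED_TOTALS)
    for name, total in category_totals(COUNTS).items():
        print(f"{name}: {total}")
```

Under these assumptions the recomputed column sums agree with the reported Total row, and the per-category sums give a rough view of which mistake types dominate across both benchmarks.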