Self-Explained Keywords Empower Large Language Models for Code Generation
cs.CV
cs.AI
cs.LG
Release Date: October 21, 2024
Authors: Lishui Fan, Mouxiang Chen, Zhongxin Liu
Affiliation: Zhejiang University

| Model | Method | HumanEval | HumanEval+ | MBPP | MBPP+ |  |  |  | Average |
|---|---|---|---|---|---|---|---|---|---|
| Llama-3.1-70B-Instruct | Default | 78.0 | 73.8 | 87.6 | 70.9 | 50.0 | 15.0 | 5.0 | 54.3 |
|  | Beam Search | 79.3 | 74.4 | 87.8 | 70.9 | 55.0 | 16.1 | 5.0 | 55.5 |
|  | CoT | 79.9 | 74.4 | 87.0 | 71.7 | 43.3 | 16.6 | 6.7 | 54.2 |
|  | SelfEvolve | 81.7 | 75.6 | 85.4 | 70.4 | 50.0 | 15.5 | 8.3 | 55.3 |
|  | SEK | 84.8 | 79.3 | 88.4 | 71.2 | 61.7 | 20.0 | 8.3 | 59.1 |
| Mixtral-8×22B-Instruct-v0.1 | Default | 76.2 | 72.0 | 73.8 | 64.3 | 28.3 | 7.7 | 1.6 | 46.3 |
|  | Beam Search | 78.7 | 73.2 | 81.2 | 70.6 | 33.3 | 8.8 | 6.6 | 50.3 |
|  | CoT | 72.0 | 65.9 | 78.0 | 68.0 | 31.6 | 3.8 | 5.0 | 46.3 |
|  | SelfEvolve | 56.7 | 50.0 | 68.5 | 60.1 | 33.3 | 7.2 | 5.0 | 40.1 |
|  | SEK | 81.1 | 75.6 | 79.1 | 66.9 | 33.3 | 10.0 | 6.6 | 50.4 |
| GPT-3.5-turbo (API) | Default | 72.6 | 67.7 | 84.1 | 71.2 | 46.6 | 18.3 | 0.0 | 51.5 |
|  | CoT | 58.5 | 54.9 | 84.1 | 68.8 | 41.6 | 17.2 | 1.6 | 46.7 |
|  | SelfEvolve | 73.2 | 67.7 | 82.3 | 66.7 | 45.0 | 19.4 | 1.6 | 51.8 |
|  | SEK | 75.6 | 69.5 | 84.1 | 72.5 | 53.3 | 20.6 | 5.0 | 54.4 |
| GPT-4o-mini (API) | Default | 87.8 | 84.1 | 85.7 | 72.8 | 53.3 | 31.6 | 11.6 | 61.0 |
|  | CoT | 87.2 | 84.1 | 88.1 | 73.3 | 50.0 | 33.8 | 11.6 | 61.2 |
|  | SEK | 87.2 | 84.1 | 87.8 | 74.1 | 58.3 | 35.0 | 13.3 | 62.8 |
| DeepSeekCoder-V2-Instruct (API) | Default | 85.4 | 82.3 | 89.4 | 75.1 | 70.0 | 36.1 | 10.0 | 64.0 |
|  | CoT | 88.4 | 82.3 | 90.5 | 75.4 | 60.0 | 40.5 | 10.0 | 63.9 |
|  | SEK | 93.3 | 85.4 | 90.2 | 76.2 | 75.0 | 41.1 | 13.3 | 67.8 |
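The Average column appears to be the unweighted mean of the seven benchmark scores in each row; recomputing it for several rows (for example, Llama-3.1-70B-Instruct Default and DeepSeekCoder-V2-Instruct SEK) reproduces the reported values. The minimal sketch below assumes that definition and uses it to compute each model's average gain from SEK over the Default method. The names `SCORES`, `row_average`, and `sek_gain` are hypothetical helpers for illustration, not part of the paper.

```python
from statistics import mean

# Default and SEK rows copied from the table above, one list per row.
# Column order: HumanEval, HumanEval+, MBPP, MBPP+, then the three
# unlabeled columns, kept in their original order. "Average" is excluded.
SCORES = {
    "Llama-3.1-70B-Instruct": {
        "Default": [78.0, 73.8, 87.6, 70.9, 50.0, 15.0, 5.0],
        "SEK": [84.8, 79.3, 88.4, 71.2, 61.7, 20.0, 8.3],
    },
    "Mixtral-8×22B-Instruct-v0.1": {
        "Default": [76.2, 72.0, 73.8, 64.3, 28.3, 7.7, 1.6],
        "SEK": [81.1, 75.6, 79.1, 66.9, 33.3, 10.0, 6.6],
    },
    "GPT-3.5-turbo (API)": {
        "Default": [72.6, 67.7, 84.1, 71.2, 46.6, 18.3, 0.0],
        "SEK": [75.6, 69.5, 84.1, 72.5, 53.3, 20.6, 5.0],
    },
    "GPT-4o-mini (API)": {
        "Default": [87.8, 84.1, 85.7, 72.8, 53.3, 31.6, 11.6],
        "SEK": [87.2, 84.1, 87.8, 74.1, 58.3, 35.0, 13.3],
    },
    "DeepSeekCoder-V2-Instruct (API)": {
        "Default": [85.4, 82.3, 89.4, 75.1, 70.0, 36.1, 10.0],
        "SEK": [93.3, 85.4, 90.2, 76.2, 75.0, 41.1, 13.3],
    },
}


def row_average(scores):
    """Unweighted mean of the seven scores (assumed definition of 'Average')."""
    return mean(scores)


def sek_gain(model):
    """Unrounded SEK average minus unrounded Default average for one model."""
    rows = SCORES[model]
    return row_average(rows["SEK"]) - row_average(rows["Default"])


if __name__ == "__main__":
    for model, rows in SCORES.items():
        default_avg = row_average(rows["Default"])
        sek_avg = row_average(rows["SEK"])
        print(f"{model}: Default {default_avg:.1f}, SEK {sek_avg:.1f}, "
              f"gain {sek_gain(model):+.1f}")
```

The gain is computed from unrounded means, so in principle it can differ by 0.1 from the difference of the rounded Average values in the table.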