Process Supervision-Guided Policy Optimization for Code Generation
cs.AI
cs.CL
cs.IR
Released Date: October 23, 2024
Authors: Ning Dai¹, Zheng Wu², Renjie Zheng², Ziyun Wei², Wenlei Shi², Xing Jin², Guanlin Liu², Chen Dun², Liang Huang¹, Lin Yan²
Affiliations: ¹Oregon State University; ²ByteDance

| Model | Dense Reward | Value Init. | LiveCodeBench Easy | LiveCodeBench Medium | LiveCodeBench Hard | LiveCodeBench Overall | InHouseBench Contest | InHouseBench NL2Alg | InHouseBench Overall |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4o-mini | - | - | 81.9 | 27.2 | 3.6 | 40.7 | 43.8 | 68.4 | 51.4 |
| Qwen2-72B | - | - | 65.0 | 21.3 | 2.8 | 32.2 | 14.8 | 51.3 | 26.1 |
| Gemini-Flash-1.5 | - | - | 67.7 | 13.1 | 1.9 | 29.6 | - | - | - |
| DeepseekCoder-33B | - | - | 60.8 | 14.8 | 1.2 | 27.7 | 10.3 | 50.3 | 22.7 |
| Ours-SFT | - | - | 55.3 | 9.3 | 0.3 | 23.5 | 10.4 | 41.4 | 20.0 |
| Ours-RL | × | × | 70.0 | 7.2 | 1.7 | 28.2 | 24.4 | 48.7 | 31.8 |
| Ours-RL | × | ✓ | 67.9 | 8.9 | 1.9 | 28.2 | 25.0 | 45.4 | 31.4 |
| Ours-RL | ✓ | × | 68.5 | 9.9 | 2.5 | 28.9 | 25.2 | 48.1 | 32.3 |
| Ours-RL | ✓ | ✓ | 69.3 | 12.0 | 1.6 | 29.8 | 27.9 | 53.5 | 35.8 |