notesum.ai
Published on November 20
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs
cs.SE
cs.AI
Released Date: November 20, 2024
Authors: Zhihan Liu, Shenao Zhang, Yongfei Liu, Boyi Liu, Yingxiang Yang, Zhaoran Wang
Affiliation: Not specified
| Model | HumanEval Basic (%) | HumanEval Plus (%) | MBPP Basic (%) | MBPP Plus (%) | BCB (Complete) Hard (%) | BCB (Complete) Full (%) | BCB (Instruct) Hard (%) | BCB (Instruct) Full (%) |
|---|---|---|---|---|---|---|---|---|
| Ref. | 72.0 | 62.2 | 75.2 | 60.9 | 14.9 | 45.1 | 12.2 | 37.6 |
| DSTC + DPO | 74.4 | 66.5 | 76.4 | 61.9 | 16.2 | 47.5 | 12.8 | 37.9 |
| DPO | 70.1 | 62.2 | 76.9 | 62.2 | 14.2 | 46.6 | 12.2 | 38.1 |
| DSTC + KTO | 73.8 | 66.5 | 75.9 | 61.4 | 16.9 | 47.1 | 14.2 | 38.9 |
| KTO | 62.8 | 54.3 | 74.9 | 61.2 | 15.5 | 46.6 | 8.1 | 38.1 |