Published on November 16
CODECLEANER: Elevating Standards with A Robust Data Contamination Mitigation Toolkit
cs.SE
cs.AI
Released Date: November 16, 2024
Authors: Jialun Cao (1), Songqiang Chen (2), Wuqi Zhang (1), Hau Ching Lo (1), Shing-Chi Cheung (2)
Affiliations: (1) The Hong Kong University of Science and Technology, Hong Kong, China; (2) The Hong Kong University of Science and Technology

Operator categories (column groups, left to right): Syntactic, Semantic, Code Style

| Models | IFF | Loop | Iter | Comm | Deco | Param | Renm | Norm | Styl |
|---|---|---|---|---|---|---|---|---|---|
| Starcoder | 0.0740 | 0.0752 | -0.0329 | -0.0357 | 0.3617 | 0.1242 | 0.3764 | -0.0724 | 0.1120 |
| Starchat | 0.0511 | 0.0557 | -0.0273 | -0.0244 | 0.6444 | 0.0465 | 0.3201 | -0.0549 | 0.1067 |
| CodeLlama | 0.0595 | 0.0811 | 0.0184 | 0.0189 | 0.6186 | 0.1174 | 0.3096 | -0.0052 | 0.0989 |
| WizardCoder | 0.0552 | 0.0778 | -0.0235 | -0.0248 | 0.3836 | 0.1349 | 0.3858 | -0.0637 | 0.1157 |
| Average | 0.0600 | 0.0725 | -0.0163 | -0.0165 | 0.5021 | 0.1057 | 0.3480 | -0.0491 | 0.1083 |
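
The Average row appears to be the plain arithmetic mean of the four model rows in each column. The source does not state this, so treat it as an assumption. The minimal sketch below recomputes the column means from the transcribed values and compares them with the reported row. The names `ROWS`, `REPORTED_AVERAGE`, `column_means`, and `max_abs_deviation` are illustrative and not part of the CODECLEANER toolkit.

```python
# Values transcribed from the table above (one list per model, one entry per operator).
COLUMNS = ["IFF", "Loop", "Iter", "Comm", "Deco", "Param", "Renm", "Norm", "Styl"]

ROWS = {
    "Starcoder":   [0.0740, 0.0752, -0.0329, -0.0357, 0.3617, 0.1242, 0.3764, -0.0724, 0.1120],
    "Starchat":    [0.0511, 0.0557, -0.0273, -0.0244, 0.6444, 0.0465, 0.3201, -0.0549, 0.1067],
    "CodeLlama":   [0.0595, 0.0811, 0.0184, 0.0189, 0.6186, 0.1174, 0.3096, -0.0052, 0.0989],
    "WizardCoder": [0.0552, 0.0778, -0.0235, -0.0248, 0.3836, 0.1349, 0.3858, -0.0637, 0.1157],
}

# The "Average" row as reported in the table.
REPORTED_AVERAGE = [0.0600, 0.0725, -0.0163, -0.0165, 0.5021, 0.1057, 0.3480, -0.0491, 0.1083]


def column_means(rows):
    """Return the arithmetic mean of each column across all model rows."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows.values())]


def max_abs_deviation(computed, reported):
    """Largest absolute gap between the computed and the reported values."""
    return max(abs(c - r) for c, r in zip(computed, reported))


means = column_means(ROWS)

if __name__ == "__main__":
    for name, mean, reported in zip(COLUMNS, means, REPORTED_AVERAGE):
        print(f"{name:>5}: computed {mean:+.5f}  reported {reported:+.4f}")
    print("max deviation:", max_abs_deviation(means, REPORTED_AVERAGE))
```

Under this assumption, every recomputed mean should agree with the reported value to within rounding at four decimal places.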