VALTEST: Automated Validation of Language Model Generated Test Cases
cs.SE
cs.AI
Released Date: November 13, 2024
Authors: Hamed Taherkhani¹, Hadi Hemmati¹
Aff.: ¹York University, Canada
| Dataset | LLM | Base #Tests | Base VR | Base LC | Base MS | VALTEST #Tests | VALTEST %Tests | VALTEST VR | VALTEST LC | VALTEST MS |
|---|---|---|---|---|---|---|---|---|---|---|
| HE | GPT-4o | | 0.83 | | | | | 0.925(+%) | | 0.84(-%) |
| HE | GPT-3.5 Turbo | | 0.74 | | | | | 0.892(+%) | | 0.79(-%) |
| HE | LLaMA 3.1 8B | | 0.63 | | | | | 0.756(+%) | | 0.63(-%) |
| HE | Average | 2778 | 0.733 | 0.961 | 0.83 | 1501 | 0.54 | 0.858(+%) | 0.954 | 0.75(-%) |
| LeetCode | GPT-4o | | 0.75 | | | | | 0.946(+%) | | 0.845(-%) |
| LeetCode | GPT-3.5 Turbo | | 0.63 | | | | | 0.870(+%) | | 0.86(-%) |
| LeetCode | LLaMA 3.1 8B | | 0.46 | | | | | 0.690(+%) | | 0.744(-%) |
| LeetCode | Average | 7425 | 0.613 | 0.977 | 0.833 | 3339 | 0.46 | 0.835(+%) | 0.973 | 0.816(-%) |
| MBPP | GPT-4o | | 0.71 | | | | | 0.796(+%) | | 0.79(-%) |
| MBPP | GPT-3.5 Turbo | | 0.60 | | | | | 0.667(+%) | | 0.72(-%) |
| MBPP | LLaMA 3.1 8B | | 0.53 | | | | | 0.592(+%) | | 0.65(-%) |
| MBPP | Average | 6038 | 0.613 | 0.965 | 0.773 | 2173 | 0.37 | 0.685(+%) | 0.954 | 0.72(-%) |

HE = HumanEval; VR = validity rate; LC = line coverage; MS = mutation score; %Tests = share of the base tests kept after VALTEST filtering.
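The three values in each per-model row appear to be the base VR, the VALTEST VR, and the VALTEST MS, since their means reproduce the corresponding cells of each Average row. The sketch below is only a consistency check on the numbers in the table, not part of the authors' method; the names `per_model`, `reported`, and `column_means` are hypothetical, and the column assignment is an inference from the arithmetic.

```python
from statistics import mean

# (base VR, VALTEST VR, VALTEST MS) for GPT-4o, GPT-3.5 Turbo, LLaMA 3.1 8B,
# copied from the per-model rows of the table. Column assignment is an assumption.
per_model = {
    "HE": [(0.83, 0.925, 0.84), (0.74, 0.892, 0.79), (0.63, 0.756, 0.63)],
    "LeetCode": [(0.75, 0.946, 0.845), (0.63, 0.870, 0.86), (0.46, 0.690, 0.744)],
    "MBPP": [(0.71, 0.796, 0.79), (0.60, 0.667, 0.72), (0.53, 0.592, 0.65)],
}

# Matching cells of the reported Average rows: base VR, VALTEST VR, VALTEST MS.
reported = {
    "HE": (0.733, 0.858, 0.75),
    "LeetCode": (0.613, 0.835, 0.816),
    "MBPP": (0.613, 0.685, 0.72),
}


def column_means(rows):
    """Return the mean of each column across the per-model rows."""
    return tuple(mean(column) for column in zip(*rows))


for dataset, rows in per_model.items():
    means = column_means(rows)
    # Allow for rounding in the reported averages (two or three decimals).
    consistent = all(abs(m - r) < 0.005 for m, r in zip(means, reported[dataset]))
    status = "consistent" if consistent else "inconsistent"
    print(dataset, [round(m, 3) for m in means], status)
```

The 0.005 tolerance reflects that some averages are reported to two decimals only; a tighter bound would flag rounding rather than real disagreement.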