LLM4DS: Evaluating Large Language Models for Data Science Code Generation
Categories: cs.SE, cs.AI, cs.ET
Released Date: November 16, 2024
Authors: Nathalia Nascimento¹, Everton Guimaraes¹, Sai Sanjna Chintakunta, Santhosh Anitha Boominathan¹
Affiliation: ¹Pennsylvania State University Great Valley, USA

| Baseline (%) | LLM | Success Rate (%) | p-value | Conclusion |
| --- | --- | --- | --- | --- |
| 50 | Copilot | 60 | 0.0284 | Significant |
| 50 | ChatGPT | 72 | 0.0000 | Significant |
| 50 | Perplexity | 66 | 0.0009 | Significant |
| 50 | Claude | 70 | 0.0000 | Significant |
| 60 | Copilot | 60 | 0.5433 | Not Significant |
| 60 | ChatGPT | 72 | 0.0084 | Significant |
| 60 | Perplexity | 66 | 0.1303 | Not Significant |
| 60 | Claude | 70 | 0.0248 | Significant |
| 70 | Copilot | 60 | 0.9875 | Not Significant |
| 70 | ChatGPT | 72 | 0.3768 | Not Significant |
| 70 | Perplexity | 66 | 0.8371 | Not Significant |
| 70 | Claude | 70 | 0.5491 | Not Significant |
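
The table compares each model's success rate against a fixed baseline and reports a p-value. One way such a comparison could be computed is a one-sided exact binomial test, sketched below. This is an illustration under stated assumptions, not necessarily the authors' procedure: the sample size of 100 trials per model (`N_TRIALS`), the 0.05 significance threshold (`ALPHA`), and the helper names `binomial_upper_tail` and `evaluate` are all assumptions introduced here and do not appear in the table. Under those assumptions, 60 successes out of 100 against a 50% baseline gives a p-value of roughly 0.028, in line with the Copilot row.

```python
from math import comb

# Assumptions for illustration only (not stated in the table):
N_TRIALS = 100   # assumed number of problems attempted per model
ALPHA = 0.05     # assumed significance threshold

# Success counts implied by the table's success rates under N_TRIALS = 100.
OBSERVED_SUCCESSES = {"Copilot": 60, "ChatGPT": 72, "Perplexity": 66, "Claude": 70}


def binomial_upper_tail(successes: int, trials: int, baseline: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, baseline).

    This is the p-value of a one-sided exact binomial test whose
    alternative hypothesis is "true success rate > baseline".
    """
    return sum(
        comb(trials, k) * baseline**k * (1 - baseline) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def evaluate(baseline: float) -> dict:
    """Map each model to (p_value, conclusion) for the given baseline rate."""
    results = {}
    for llm, successes in OBSERVED_SUCCESSES.items():
        p_value = binomial_upper_tail(successes, N_TRIALS, baseline)
        conclusion = "Significant" if p_value < ALPHA else "Not Significant"
        results[llm] = (p_value, conclusion)
    return results


if __name__ == "__main__":
    for baseline in (0.5, 0.6, 0.7):
        for llm, (p_value, conclusion) in evaluate(baseline).items():
            print(f"{baseline:.0%} | {llm} | {p_value:.4f} | {conclusion}")
```

The exact tail sum is used instead of a normal approximation because it needs only the standard library and remains accurate when the observed rate sits close to the baseline.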