Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks
cs.IR
cs.AI
Release Date: October 24, 2024
Authors: Graziano A. Manduzio, Federico A. Galatolo, Mario G. C. A. Cimino, Enzo Pasquale Scilingo, Lorenzo Cominelli

| Model | Dataset | FOL | GSM8K | Overall |
|---|---|---|---|---|
| original | Training set | 76.47% | 12.82% | 44.65% |
| original | Test set | 90.19% | 2.56% | 46.38% |
| original | Whole set | 81.04% | 8.26% | 44.65% |
| fine-tuned | Training set | 88.75% | 16.81% | 52.78% |
| fine-tuned | Test set | 99.75% | 2.77% | 51.26% |
| fine-tuned | Whole set | 92.42% | 10.57% | 51.50% |

| Model | Dataset | FOL | GSM8K | Overall |
|---|---|---|---|---|
| original | Training set | 83.18% | 20.22% | 51.70% |
| original | Test set | 94.14% | 7.22% | 50.68% |
| original | Whole set | 86.84% | 14.45% | 50.65% |
| fine-tuned | Training set | 90.50% | 26.36% | 58.43% |
| fine-tuned | Test set | 99.75% | 8.18% | 53.97% |
| fine-tuned | Whole set | 93.58% | 18.28% | 55.93% |

| Model | Dataset | FOL | GSM8K | Overall |
|---|---|---|---|---|
| original | Training set | 79.83% | 16.52% | 48.18% |
| original | Test set | 92.16% | 4.89% | 48.53% |
| original | Whole set | 83.94% | 11.35% | 47.65% |
| fine-tuned | Training set | 89.62% | 21.58% | 55.60% |
| fine-tuned | Test set | 99.75% | 5.47% | 52.61% |
| fine-tuned | Whole set | 93.00% | 14.42% | 53.71% |
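
The page gives no captions for these tables, so how the columns relate has to be read off the numbers. The sketch below is one way to check two patterns that the reported values appear to follow: the Overall column looks like the plain mean of the FOL and GSM8K columns, and the last table looks like the element-wise mean of the two before it. Both readings are inferences from the data, not statements from the source. The names `TABLES`, `TOL`, `overall_matches_mean`, `third_matches_average`, and `gains` are hypothetical helpers introduced only for this illustration.

```python
# Accuracy values (percent) transcribed from the tables, in order of appearance.
# Each row is (model, dataset, FOL, GSM8K, Overall).
TABLES = [
    [
        ("original", "Training set", 76.47, 12.82, 44.65),
        ("original", "Test set", 90.19, 2.56, 46.38),
        ("original", "Whole set", 81.04, 8.26, 44.65),
        ("fine-tuned", "Training set", 88.75, 16.81, 52.78),
        ("fine-tuned", "Test set", 99.75, 2.77, 51.26),
        ("fine-tuned", "Whole set", 92.42, 10.57, 51.50),
    ],
    [
        ("original", "Training set", 83.18, 20.22, 51.70),
        ("original", "Test set", 94.14, 7.22, 50.68),
        ("original", "Whole set", 86.84, 14.45, 50.65),
        ("fine-tuned", "Training set", 90.50, 26.36, 58.43),
        ("fine-tuned", "Test set", 99.75, 8.18, 53.97),
        ("fine-tuned", "Whole set", 93.58, 18.28, 55.93),
    ],
    [
        ("original", "Training set", 79.83, 16.52, 48.18),
        ("original", "Test set", 92.16, 4.89, 48.53),
        ("original", "Whole set", 83.94, 11.35, 47.65),
        ("fine-tuned", "Training set", 89.62, 21.58, 55.60),
        ("fine-tuned", "Test set", 99.75, 5.47, 52.61),
        ("fine-tuned", "Whole set", 93.00, 14.42, 53.71),
    ],
]

TOL = 0.01  # assumed slack for two-decimal rounding in the reported values


def overall_matches_mean(row, tol=TOL):
    """True if Overall is within tol of the unweighted mean of FOL and GSM8K."""
    _, _, fol, gsm8k, overall = row
    return abs((fol + gsm8k) / 2 - overall) <= tol


def third_matches_average(tables, tol=TOL):
    """True if every cell of the third table is within tol of the mean
    of the matching cells in the first and second tables."""
    first, second, third = tables
    for a, b, c in zip(first, second, third):
        for x, y, z in zip(a[2:], b[2:], c[2:]):
            if abs((x + y) / 2 - z) > tol:
                return False
    return True


def gains(table):
    """Fine-tuned minus original, in percentage points, keyed by dataset.
    Each value is a (FOL, GSM8K, Overall) tuple."""
    original = {r[1]: r[2:] for r in table if r[0] == "original"}
    tuned = {r[1]: r[2:] for r in table if r[0] == "fine-tuned"}
    return {
        ds: tuple(round(t - o, 2) for t, o in zip(tuned[ds], original[ds]))
        for ds in original
    }


if __name__ == "__main__":
    print(all(overall_matches_mean(r) for t in TABLES for r in t))
    print(third_matches_average(TABLES))
    for dataset, delta in gains(TABLES[2]).items():
        print(dataset, delta)
```

Both checks hold for the transcribed values within rounding. The `gains` helper expresses the fine-tuning effect in percentage points rather than relative change, which seems the safer reading when the baseline GSM8K accuracies are this small.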