When Babies Teach Babies: Can student knowledge sharing outperform Teacher-Guided Distillation on small datasets?
cs.CL
cs.AI
Release Date: November 25, 2024
Author: Srikrishna Iyer
Affiliation: Artificial Intelligence - Data Analytics Strategic Technology Center, ST Engineering IHQ Ltd., Singapore

Text-only 10M Dataset

| Model | BLiMP | Supp. | EWoK | GLUE |
|---|---|---|---|---|
| BabyLlama | 69.8 | 59.5 | 50.7 | 63.3 |
| LTG-BERT | 60.6 | 60.8 | 48.9 | 60.3 |
| RoBERTa-base | 49.6 | 48.9 | 51.6 | 42.5 |
| RoBERTa-DWML | 51.6 | 52.3 | 50.3 | 43.1 |

Text-only 100M Dataset

| Model | BLiMP | Supp. | EWoK | GLUE |
|---|---|---|---|---|
| BabyLlama | 73.1 | 60.6 | 52.1 | 69.0 |
| LTG-BERT | 69.2 | 66.5 | 51.9 | 68.4 |
| RoBERTa-base | 49.8 | 46.8 | 50.25 | 43.4 |
| RoBERTa-DWML | 52.1 | 48.4 | 51.6 | 44.0 |