notesum.ai
Published on October 21
Compute-Constrained Data Selection
cs.CR
cs.AI
Released Date: October 21, 2024
Authors: Junjie Oscar Yin¹, Alexander M. Rush²
Aff.: ¹Johns Hopkins University; ²Cornell University

| Training Data | Data Selection Method | Model Size | Target Task |
|---|---|---|---|
| 2.5% | Random | Llama2 7B | MMLU |
| 5% | BM25 | Llama3 8B | BBH |
| 10% | Embed | Llama2 13B | IFEval |
| 25% | PPL | Llama2 70B | |
| 50% | LESS | | |
| 100% | | | |