Stronger Models are NOT Stronger Teachers for Instruction Tuning
Categories: cs.AI, cs.CL
Released Date: November 11, 2024
Authors: Zhangchen Xu¹, Fengqing Jiang¹, Luyao Niu¹, Bill Yuchen Lin², Radha Poovendran¹
Affiliations: ¹University of Washington; ²Allen Institute for AI

| Response Generator Model | AlpacaEval 2 LC (%) | AlpacaEval 2 WR (%) | Arena-Hard WR (%) | AP (%) |
| --- | --- | --- | --- | --- |
| Gemma-2-9b-it | 16.09 | 13.70 | 13.7 | 14.90 |
| Gemma-2-27b-it | 13.93 | 13.31 | 12.4 | 13.17 |
| Llama-3-70b-Instruct | 10.55 | 10.68 | 6.7 | 8.62 |
| Llama-3.1-70b-Instruct | 9.52 | 10.10 | 8.3 | 8.91 |
| Qwen2.5-7B-Instruct | 13.50 | 14.33 | 10.6 | 12.05 |
| Qwen2.5-72B-Instruct | 19.20 | 21.01 | 13.1 | 16.15 |
| GPT-4 | 6.63 | 5.70 | 4.8 | 5.72 |
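
The AP column appears to be the simple mean of the AlpacaEval 2 LC score and the Arena-Hard WR score. That reading is inferred from the numbers in the table, not stated in this excerpt. A minimal Python sketch, with the table values copied into a hypothetical `scores` dictionary, checks the assumption and ranks the response generators by the recomputed average:

```python
# Values copied from the table: (AlpacaEval 2 LC, Arena-Hard WR, reported AP).
# The dictionary name and layout are illustrative, not taken from the paper.
scores = {
    "Gemma-2-9b-it": (16.09, 13.7, 14.90),
    "Gemma-2-27b-it": (13.93, 12.4, 13.17),
    "Llama-3-70b-Instruct": (10.55, 6.7, 8.62),
    "Llama-3.1-70b-Instruct": (9.52, 8.3, 8.91),
    "Qwen2.5-7B-Instruct": (13.50, 10.6, 12.05),
    "Qwen2.5-72B-Instruct": (19.20, 13.1, 16.15),
    "GPT-4": (6.63, 4.8, 5.72),
}


def average_performance(lc, arena_hard_wr):
    """Assumed definition: mean of AlpacaEval 2 LC and Arena-Hard WR."""
    return (lc + arena_hard_wr) / 2


recomputed = {
    model: average_performance(lc, ah) for model, (lc, ah, _) in scores.items()
}

# Largest gap between the recomputed and the reported AP values.
max_gap = max(abs(recomputed[model] - scores[model][2]) for model in scores)

# Generators sorted from highest to lowest recomputed AP.
ranking = sorted(recomputed, key=recomputed.get, reverse=True)

print(f"largest gap vs. reported AP: {max_gap:.3f}")
for model in ranking:
    print(f"{model:24s} {recomputed[model]:.2f}")
```

Under that assumption the recomputed averages agree with the reported AP values to within rounding. Qwen2.5-72B-Instruct ranks first and GPT-4 ranks last, even though the table also lists larger models from the same families below their smaller counterparts (for example, Gemma-2-27b-it below Gemma-2-9b-it).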