notesum.ai
Published on October 18
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
Categories: cs.CV, cs.AI, cs.LG, cs.SD, eess.AS
Release date: October 18, 2024
Authors: Shuo Tang¹, Xianghe Pang¹, Zexi Liu¹, Bohan Tang², Rui Ye¹, Xiaowen Dong², Yanfeng Wang³, Siheng Chen¹
Affiliations: ¹Shanghai Jiao Tong University; ²University of Oxford; ³Shanghai AI Laboratory

| Dataset (Base LLM = MATRIX-SFT-Model) | AlpacaEval 2 LC (%) | AlpacaEval 2 WR (%) | AlpacaEval 2 SD | Arena-Hard WR (%) |
|---|---|---|---|---|
| UltraFeedback (Cui et al., 2024) | 17.17 | 18.48 | 1.18 | 14.0 |
| Magpie-PRO-DPO (Xu et al., 2024b) | 18.99 | 20.30 | 1.21 | 15.9 |
| Orca (Mukherjee et al., 2023) | 17.26 | 20.10 | 1.19 | 15.2 |
| ArgillaMix (argilla, 2024) | 9.75 | 11.15 | 0.94 | 11.3 |
| MATRIX-Gen-DPO | 24.20 | 31.30 | 1.39 | 22.7 |
| Llama-3-8B-Instruct (Dubey et al., 2024) | 22.92 | 22.57 | 1.26 | 20.6 |

LC = length-controlled win rate; WR = win rate.
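To make the comparison in the table concrete, the Python sketch below recomputes, for each score column, how far MATRIX-Gen-DPO sits above the strongest of the other post-training datasets, and how it compares with the Llama-3-8B-Instruct row. It uses only the numbers reported in the table; the helper name `margin_over_best` and the metric keys are hypothetical labels chosen for illustration. The SD column is left out because it describes spread, not quality.

```python
# Scores copied from the table above, in percentage points.
# Keys are hypothetical labels: "lc" = AlpacaEval 2 LC, "wr" = AlpacaEval 2 WR,
# "arena_hard" = Arena-Hard WR. The SD column is deliberately omitted.
BASELINES = {
    "UltraFeedback": {"lc": 17.17, "wr": 18.48, "arena_hard": 14.0},
    "Magpie-PRO-DPO": {"lc": 18.99, "wr": 20.30, "arena_hard": 15.9},
    "Orca": {"lc": 17.26, "wr": 20.10, "arena_hard": 15.2},
    "ArgillaMix": {"lc": 9.75, "wr": 11.15, "arena_hard": 11.3},
}
MATRIX_GEN_DPO = {"lc": 24.20, "wr": 31.30, "arena_hard": 22.7}
LLAMA3_8B_INSTRUCT = {"lc": 22.92, "wr": 22.57, "arena_hard": 20.6}


def margin_over_best(target, baselines):
    """For each metric, return (name of best baseline, target score minus that baseline's score)."""
    result = {}
    for metric, value in target.items():
        # Pick the baseline with the highest score on this metric.
        best_name = max(baselines, key=lambda name: baselines[name][metric])
        # Round to 2 decimals to hide floating-point noise from the subtraction.
        result[metric] = (best_name, round(value - baselines[best_name][metric], 2))
    return result


if __name__ == "__main__":
    for metric, (name, delta) in margin_over_best(MATRIX_GEN_DPO, BASELINES).items():
        print(f"{metric}: {delta:+.2f} points over {name}")
    for metric, value in MATRIX_GEN_DPO.items():
        delta = round(value - LLAMA3_8B_INSTRUCT[metric], 2)
        print(f"{metric}: {delta:+.2f} points vs Llama-3-8B-Instruct")
```

On these numbers, Magpie-PRO-DPO is the strongest other dataset on all three metrics, and MATRIX-Gen-DPO leads it by roughly 5.21 LC points, 11.00 WR points and 6.80 Arena-Hard points. These are differences between reported point estimates only; they do not by themselves indicate statistical significance.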