SpecFuse: Ensembling Large Language Models via Next-Segment Prediction
cs.CL
cs.AI
Released Date: December 10, 2024
Authors: Bo Lv¹, Chen Tang², Yanan Zhang³, Xin Liu⁴, Yue Yu⁴, Ping Luo¹
Affiliations: ¹Chinese Academy of Sciences; ²The University of Manchester; ³University of Chinese Academy of Sciences; ⁴Peng Cheng Laboratory

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU | BARTScore | BERTScore | GPT4-R |
|---|---|---|---|---|---|---|---|
| Base LLMs | | | | | | | |
| Gemma-2-9B | 29.1486 | 7.6523 | 18.3456 | 3.3647 | -4.2845 | 68.7312 | 8.5471 |
| Qwen2-7B | 29.9296 | 8.0901 | 20.0345 | 3.6181 | -4.3271 | 69.9989 | 6.5122 |
| Mistral-7B | 30.9890 | 8.6498 | 20.6568 | 4.4205 | -4.4800 | 70.1000 | 6.6230 |
| Glm-4-9B | 30.8761 | 8.7092 | 20.4193 | 4.4674 | -4.3018 | 70.2452 | 5.2068 |
| Larger LLMs | | | | | | | |
| Llama-3-70B | 27.7816 | 7.0486 | 20.2227 | 4.1399 | -4.5517 | 68.5211 | 7.4415 |
| Qwen2-72B | 31.4356 | 8.9688 | 22.4781 | 4.8838 | -4.3368 | 70.6480 | 3.4485 |
| Ensemble Base LLMs | | | | | | | |
| GF (Qwen2) | 28.6936 | 7.8675 | 18.9339 | 3.3169 | -4.4105 | 69.8126 | 8.3508 |
| GF (Mistral) | 30.2933 | 8.1220 | 20.3324 | 3.8817 | -4.5356 | 70.0437 | 7.1736 |
| GF (Glm-4) | 30.2643 | 8.6996 | 20.5103 | 4.2722 | -4.3316 | 70.2348 | 5.5864 |
| MBR | 30.9335 | 8.7132 | 20.6322 | 4.3149 | -4.3060 | 70.2266 | 4.9921 |
| SpecFuse | 31.8931 | 9.3475 | 23.5114 | 4.6383 | -4.2596 | 70.5199 | 3.7077 |
| SpecFuse (w/o ET) | 32.3152 | 9.4461 | 23.7639 | 4.7074 | -4.2759 | 70.5662 | 3.4023 |