Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
Subjects: cs.AI, cs.CL
Release date: November 11, 2024
Authors: Chaeyun Jang¹, Hyungi Lee¹, Jungtaek Kim², Juho Lee¹
Affiliations: ¹KAIST; ²University of Pittsburgh

RTE through MNLI are fine-tuned from RoBERTa-base; SQuAD2.0 is fine-tuned from t5-base.

| Method | RTE (Acc) | MRPC (F1) | SST-2 (Acc) | QNLI (Acc) | QQP (F1) | MNLI (Acc) | SQuAD2.0 (F1/EM) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Grid Fine-Tune | 77.78 | 92.39 | 94.87 | 92.62 | 88.16 | 87.41 | 78.18/72.83 |
| HPBO (Full) | 78.57 | 92.78 | 95.11 | 93.01 | 88.58 | 87.46 | 78.28/73.29 |
| SWA | 78.62 | 92.24 | 95.42 | 92.81 | 88.49 | 87.41 | 80.31/74.85 |
| OTfusion | 77.08 | 92.82 | 94.27 | 92.22 | 88.34 | 87.43 | 80.75/74.99 |
| Greedy SWA | 80.70 | 92.83 | 95.54 | 93.16 | 88.64 | 87.45 | 80.63/75.44 |
| Learned SWA | 81.40 | 92.81 | 95.31 | 92.94 | 88.38 | 87.41 | 80.65/74.23 |
| TWA | 81.23 | 91.58 | 95.54 | 93.00 | 87.85 | 87.42 | 80.29/74.79 |
| BOMF† (ours) | 81.75 | 93.37 | 95.65 | 94.83 | 88.66 | 87.51 | 80.82/75.79 |
| BOMF (ours) | 81.40 | 93.90 | 95.54 | 93.50 | 88.68 | 87.86 | 81.82/76.21 |
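The SWA-family baselines and the fusion step in BOMF all combine checkpoints by averaging their parameters, optionally with per-checkpoint coefficients (uniform for plain SWA, tuned coefficients for Learned SWA or Bayesian-optimized fusion). A minimal sketch of this weighted parameter averaging, using plain Python dicts of lists as stand-ins for PyTorch state dicts (function and variable names are illustrative, not from the paper's code):

```python
def fuse_checkpoints(state_dicts, weights=None):
    """Weighted average of model checkpoints (SWA-style parameter fusion).

    state_dicts: list of {param_name: list of floats}, stand-ins for
        PyTorch state dicts that all share the same keys and shapes.
    weights: optional per-checkpoint coefficients; normalized to sum
        to 1. Defaults to a uniform average (plain SWA).
    """
    n = len(state_dicts)
    if weights is None:
        weights = [1.0 / n] * n          # uniform average
    else:
        total = sum(weights)
        weights = [w / total for w in weights]  # normalize coefficients

    fused = {}
    for name in state_dicts[0]:
        # Element-wise weighted sum across checkpoints.
        fused[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return fused

# Two toy "checkpoints" with a single parameter tensor each.
ckpt_a = {"linear.weight": [1.0, 2.0]}
ckpt_b = {"linear.weight": [3.0, 4.0]}

print(fuse_checkpoints([ckpt_a, ckpt_b]))                   # uniform average
print(fuse_checkpoints([ckpt_a, ckpt_b], weights=[3, 1]))   # weighted average
```

Methods like Learned SWA or BOMF differ mainly in how the `weights` vector is chosen (gradient-based learning vs. Bayesian optimization over validation metrics); the fusion arithmetic itself is this simple average in parameter space.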