HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization
cs.AI
Released Date: November 16, 2024
Authors: Huaqin Zhao¹, Jiaxi Li¹, Yi Pan¹, Shizhe Liang¹, Xiaofeng Yang², Wei Liu³, Xiang Li⁴, Fei Dou¹, Tianming Liu¹, Jin Lu¹
Affiliations: ¹University of Georgia; ²Emory University; ³Mayo Clinic; ⁴Massachusetts General Hospital and Harvard Medical School

| Task | SST-2 | SST-5 | SNLI | MNLI | RTE | TREC |
|---|---|---|---|---|---|---|
| Type | sentiment | sentiment | natural language inference | natural language inference | natural language inference | topic |
| Zero-shot | 79.0 | 35.5 | 50.2 | 48.8 | 51.4 | 32.0 |
| LP | 76.0 (2.8) | 40.3 (1.9) | 66.0 (2.7) | 56.5 (2.5) | 59.4 (5.3) | 51.3 (5.5) |
| FT | 91.9 (1.8) | 46.7 (1.9) | 77.5 (2.6) | 70.0 (2.3) | 66.4 (7.2) | 85.0 (2.5) |
| FT (LoRA) | 91.4 (1.7) | 46.7 (1.1) | 74.9 (4.3) | 67.7 (1.4) | 66.1 (3.5) | 82.7 (4.1) |
| FT (Prefix) | 91.9 (1.0) | 47.7 (1.1) | 77.2 (1.3) | 66.5 (2.5) | 66.6 (2.0) | 85.7 (1.3) |
| MeZO | 90.5 (1.2) | 45.5 (2.0) | 68.5 (3.9) | 58.7 (2.5) | 64.0 (3.3) | 76.9 (2.7) |
| MeZO (LoRA) | 91.4 (0.9) | 43.0 (1.6) | 69.7 (6.0) | 64.0 (2.5) | 64.9 (3.6) | 73.1 (6.5) |
| MeZO (Prefix) | 90.8 (1.7) | 45.8 (2.0) | 71.6 (2.5) | 63.4 (1.8) | 65.4 (3.9) | 80.3 (3.6) |
| HELENE | 91.6 (2.3) | 44.7 (0.8) | 70.0 (2.6) | 58.9 (1.1) | 65.7 (1.2) | 78.1 (1.5) |
| HELENE (LoRA) | 90.6 (0.3) | 41.8 (1.0) | 68.5 (2.0) | 59.0 (1.1) | 66.8 (3.2) | 67.4 (2.1) |
| HELENE (Prefix) | 91.7 (0.6) | 46.0 (0.7) | 69.5 (1.9) | 64.6 (2.1) | 66.1 (1.8) | 77.4 (2.1) |