notesum.ai
Published on November 25
Fast training of large kernel models with delayed projections
stat.ML
Release date: November 25, 2024
Authors: Amirhesam Abedsoltan¹, Siyuan Ma², Parthe Pandit³, Mikhail Belkin¹
Affiliations: ¹UC San Diego; ²Google; ³IIT Bombay

Training time by model size p and dataset. Each cell reads: time (speedup relative to EigenPro 3, accuracy). m = minutes; OOM = out of memory.

| Model size | Method | CIFAR5M* (M) | CIFAR5M (M) | Librispeech (M) | Webvision (M) |
|---|---|---|---|---|---|
| p = 64K | EigenPro 4 | 5m (4.6x, 88%) | 3m (15x, 69%) | 16m (9.1x, 86.8%) | 2m (45.5x, 24.3%) |
| | EigenPro 3 | 23m (1x, 88.3%) | 45m (1x, 68.8%) | 145m (1x, 85.4%) | 91m (1x, 24%) |
| | Falkon | 3m (7.67x, 86.1%) | 5m (9x, 57.7%) | 9m (16.11x, 81.0%) | 4m (22.75x, 21.7%) |
| p = 128K | EigenPro 4 | 5m (10x, 88.25%) | 4m (26.25x, 70.9%) | 19m (17.95x, 87.8%) | 4m (49.75x, 24.9%) |
| | EigenPro 3 | 50m (1x, 88.42%) | 105m (1x, 70.3%) | 341m (1x, 84.75%) | 199m (1x, 24.5%) |
| | Falkon | 9m (5.56x, 86.55%) | 11m (9.55x, 59.4%) | 21m (16.24x, 82.30%) | 13m (15.31x, 22.4%) |
| p = 256K | EigenPro 4 | 7m (18.3x, 88.61%) | 6m (130.8x, 71.8%) | 24m (120x, 88.33%) | 5m (106.2x, 26%) |
| | EigenPro 3 | 128m (1x, 88.61%) | 785m (1x, 70.53%) | 2 days (1x) | 531m (1x, 25.52%) |
| | Falkon | 38m (3.37x, 86.73%) | OOM | OOM | OOM |
| p = 512K | EigenPro 4 | 12m (44.25x, 88.58%) | 10m (288x, 72.9%) | 36m (200x, 88.89%) | 11m (240x, 27.3%) |
| | EigenPro 3 | 531m (1x, 88.56%) | 2 days (1x) | 5 days (1x) | 2 days (1x) |
| | Falkon | 240m (2.21x, 86.71%) | OOM | OOM | OOM |
| p = 1M | EigenPro 4 | 21m (274x, 88.7%) | 17m (508x, 73.8%) | 70m (411x, 89.5%) | 21m (686x, 29.3%) |
| | EigenPro 3 | 4 days (1x) | 6 days (1x) | 20 days (1x) | 10 days (1x) |
| | Falkon | OOM | OOM | OOM | OOM |
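The speedup figures are consistent with dividing the EigenPro 3 time by each method's time in the same column (for example, 23m / 5m = 4.6x). The following minimal Python sketch assumes that interpretation and 1,440 minutes per day, and recomputes a few entries. The helper names `to_minutes` and `speedup` are hypothetical and do not come from the paper.

```python
# Hypothetical helpers: recompute the speedup column from the reported times.
# Assumption (inferred from the table): speedup = EigenPro 3 time / method time.

MINUTES_PER_DAY = 24 * 60


def to_minutes(t: str) -> float:
    """Convert a table time such as '23m' or '2 days' to minutes."""
    t = t.strip()
    if t.endswith("days"):
        return float(t.split()[0]) * MINUTES_PER_DAY
    if t.endswith("m"):
        return float(t[:-1])
    raise ValueError(f"unrecognized time: {t!r}")


def speedup(baseline: str, method: str) -> float:
    """Speedup of `method` relative to the EigenPro 3 baseline time."""
    return to_minutes(baseline) / to_minutes(method)


# (EigenPro 3 time, EigenPro 4 time, reported speedup) triples taken from the table.
examples = [
    ("23m", "5m", 4.6),         # p = 64K, CIFAR5M*
    ("785m", "6m", 130.8),      # p = 256K, CIFAR5M
    ("5 days", "36m", 200.0),   # p = 512K, Librispeech
    ("20 days", "70m", 411.0),  # p = 1M, Librispeech
]

for base, ep4, reported in examples:
    print(f"{base:>8} / {ep4:>4} -> {speedup(base, ep4):7.1f}x (reported {reported}x)")
```

Because the reported times are rounded, recomputed speedups can differ slightly from the table. For example, 20 days / 70m gives about 411.4x, while the table reports 411x.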