Specialized Foundation Models Struggle to Beat Supervised Baselines
cs.LG
cs.AI
cs.CV
q-bio.GN
Released Date: November 5, 2024
Authors: Zongzhe Xu¹, Ritvik Gupta¹, Wenduo Cheng¹, Alexander Shen¹, Junhong Shen¹, Ameet Talwalkar¹, Mikhail Khodak²
Affiliations: ¹Carnegie Mellon University; ²Princeton University

| Model | Model Size | Pretraining Base-Pairs | Avg. Score | Avg. Rank | Mean %Imp. | Median %Imp. |
|---|---|---|---|---|---|---|
| Foundation Models | | | | | | |
| Enformer | 252M | 4B | 0.569 | 11.86 | 27.73 | 27.91 |
| NT-1000G (500M) | 500M | 20.5T | 0.625 | 10.52 | 33.48 | 36.74 |
| NT-1000G (2.5B) | 2.5B | 20.5T | 0.656 | 7.0 | 36.58 | 40.86 |
| NT-Multispecies (500M) | 500M | 174B | 0.700 | 3.81 | 40.76 | 45.07 |
| NT-Multispecies (2.5B) | 2.5B | 174B | 0.697 | 4.08 | 40.51 | 45.52 |
| DNABERT-2 | 117M | 32.5B | 0.680 | 6.88 | 38.65 | 43.59 |
| HyenaDNA-1K | 1.6M | 3.2B | 0.708 | 6.92 | 41.2 | 43.36 |
| HyenaDNA-32K | 1.6M | 3.2B | 0.630 | 10.22 | 33.96 | 36.93 |
| Caduceus-PS | 1.9M | 35B | 0.689 | 6.69 | 39.08 | 41.38 |
| Caduceus-PH | 1.9M | 35B | 0.725 | 4.69 | 42.63 | 45.01 |
| Supervised Methods | | | | | | |
| Wide ResNet | 2.0M | 0 | 0.694 | 6.83 | 37.16 | 43.08 |
| UNet | 4.5M | 0 | 0.68 | 7.78 | 38.67 | 42.69 |
| DASHA (our workflow) | 10.5M | 0 | 0.761 | 3.69 | 46.33 | 49.08 |