A Flexible Template for Edge Generative AI with High-Accuracy Accelerated Softmax & GELU
cs.AR
Release Date: December 9, 2024
Authors: Andrea Belano¹, Yvan Tortorella, Angelo Garofalo, Luca Benini, Davide Rossi, Francesco Conti
Aff.: ¹Department of Electrical, Electronic, and Information Engineering (DEI), University of Bologna, Bologna, Italy

| | Tambe et al. [36] | ITA [20] | Keller et al. [21] | ViTA [39] | Dumoulin et al. [40] | This Work |
|---|---|---|---|---|---|---|
| Data Format | FP8 | INT8 | INT8 | INT8 | INT8 | BF16 |
| Technology (nm) | 12 | 22 | 5 | 28 | 28 | 12 |
| Area (mm²) | 4.60 | 0.991 | 0.153 | 2.00 | 1.48 | 1.21 |
| Voltage (V) | 0.62-1.0 | 0.65 | 0.46-1.05 | 1.05 | - | 0.55-0.8 |
| Power (mW) | 10-122 | 132 | - | 217 | 18.4 | 110-581 |
| Frequency (MHz) | 77-717 | 425 | 152-1760 | 200 | 100 | 460-1120 |
| MAC Units | 256 | 1024 | 512 | 512 | 256 | 192 |
| On-Chip SRAM (KiB) | 647 | 128 | 141 | 48 | 512 | 256 |
| Supported Nonlinearities | Softmax | Softmax | Softmax | Softmax, GELU | Softmax | Softmax, GELU |
| Peak Throughput (GOPS) | 367 | 870 | 1800 | 204 | 51.2 | 430 |
| Peak Energy Efficiency (TOPS/W) | 3.0 | 5.49 | 39.1∗ | 0.943 | 2.78 | 1.61 |
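
The peak-throughput row can be sanity-checked against the MAC-unit and frequency rows. The sketch below assumes the common convention of counting each MAC as two operations (one multiply, one add) per cycle at the highest listed frequency. The source does not state this convention, so `OPS_PER_MAC` and the helper `peak_gops` are illustrative assumptions, not the authors' method. Under that assumption the estimates land within 1% of every reported value.

```python
# Rows copied from the comparison table:
# name -> (MAC units, highest listed frequency in MHz, reported peak GOPS)
designs = {
    "Tambe et al. [36]": (256, 717, 367),
    "ITA [20]": (1024, 425, 870),
    "Keller et al. [21]": (512, 1760, 1800),
    "ViTA [39]": (512, 200, 204),
    "Dumoulin et al. [40]": (256, 100, 51.2),
    "This Work": (192, 1120, 430),
}

# Assumption (not stated in the source): one MAC = 2 operations per cycle.
OPS_PER_MAC = 2


def peak_gops(mac_units, freq_mhz):
    """Estimate peak throughput in GOPS from MAC count and clock frequency."""
    # MHz * ops/cycle gives Mops/s; divide by 1000 to express it in GOPS.
    return OPS_PER_MAC * mac_units * freq_mhz / 1000.0


for name, (macs, freq_mhz, reported) in designs.items():
    estimate = peak_gops(macs, freq_mhz)
    print(f"{name:22s} estimated {estimate:7.1f} GOPS, reported {reported}")
```

Because the estimate depends only on MAC count and clock frequency, it says nothing about utilization on real workloads or about the cost of the nonlinearities.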