A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs
Subjects: cs.AR, cs.LG, cs.SY, eess.SY
Release date: November 27, 2024
Authors: Ehsan Kabir¹, Austin R. J. Downey², Jason D. Bakos², David Andrews¹, Miaoqing Huang¹
Affiliations: ¹Department of Electrical Engineering and Computer Science, University of Arkansas, USA; ²Department of Computer Science and Engineering, University of South Carolina, USA

| Accelerator | DSP | LUT | GOPS | Power (W) | GOPS per 1000 DSPs | GOPS per 1000 LUTs | GOPS/W | Method | Sparsity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Network #1: Shallow Transformer | | | | | | | | | |
| Fang et al. (fang_algorithm) | 4160 (34%) | 464 k (27%) | 1467 | 27 | 353 | 3.16 | 13 | HDL | 75% |
| Qi et al. (qi_accommodating_2021) | 3572 (52%) | 485 k (41%) | 14 | – | 3.92 | 0.03 | – | HLS | 80% |
| Qi et al. (qi_accelerating_2021) | 5040 (74%) | 908 k (76%) | 12 | – | 2.38 | 0.013 | – | | 86% |
| ADAPTOR | 3612 (40%) | 391 k (30%) | 27 | 11.8 | 7.47 | 0.069 | 2.28 | | 0% |
| Network #2: Custom Transformer Encoder | | | | | | | | | |
| Qi et al. (qi_accelerating_2021) | 4145 (60%) | 937 k (79%) | 75.94 | – | 18 | 0.08 | – | HLS | 0% |
| ADAPTOR | 3612 (40%) | 391 k (30%) | 132 | 11.8 | 37 | 0.34 | 11 | | |
| Network #3: BERT | | | | | | | | | |
| Ftrans (li_ftrans_2020) | 6531 (95%) | 451 k (38%) | 1053 | 25.06 | 161 | 2.33 | 42 | HLS | 93% |
| FQ-BERT (liu_hardware_2021) | 1751 (69%) | 123 k (45%) | 254 | 9.8 | 145 | 2.06 | 26 | | 87% |
| Tzanos et al. (tzanos_hardware_2022) | 5861 (85%) | 910 k (77%) | 65.7 | – | 11.2 | 0.07 | – | | 0% |
| TRAC (plagwitz_trac_2022) | 1379 (80%) | 126 k (55%) | 128 | – | 93 | 1.01 | – | | – |
| ADAPTOR | 3612 (40%) | 391 k (30%) | 40 | 11.8 | 11 | 0.10 | 3.39 | | 0% |
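The normalized-efficiency columns are simple ratios of the raw figures in each row. As a minimal sanity check (the formulas are inferred from the column headers, not code from the paper), ADAPTOR's Network #1 row can be recomputed:

```python
# Recompute the derived-efficiency columns for ADAPTOR (Network #1 row).
# Raw figures are taken from the table above; the ratio formulas are an
# assumption based on the column headers.
gops = 27          # throughput, GOPS
dsps = 3612        # DSP slices used
luts = 391_000     # LUTs used
power_w = 11.8     # power, W

gops_per_kdsp = gops / dsps * 1000   # ~7.47, matches the table
gops_per_klut = gops / luts * 1000   # ~0.069, matches the table
gops_per_watt = gops / power_w       # ~2.29 (table truncates to 2.28)

print(round(gops_per_kdsp, 2), round(gops_per_klut, 3), round(gops_per_watt, 2))
```

The same ratios reproduce the other rows as well (e.g. Ftrans: 1053/6531 × 1000 ≈ 161 GOPS per 1000 DSPs), which confirms the header interpretation.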