Distributed Inference with Minimal Off-Chip Traffic for Transformers on Low-Power MCUs
cs.AR
Release Date: December 5, 2024
Authors: Severin Bochem¹, Victor J. B. Jung, Arpan Prasad², Francesco Conti³, Luca Benini⁴
Affiliations: ¹D-ITET, ETH Zurich, Switzerland; ²Integrated Systems Laboratory, ETH Zurich, Switzerland; ³DEI, University of Bologna, Italy; ⁴Integrated Systems Laboratory, ETH Zurich, Switzerland, and DEI, University of Bologna, Italy

| Work | Model | Scale | Platform | Pipelining | Weight Duplication |
|---|---|---|---|---|---|
| DeepThings [20] | CNN | Low-Power | Raspberry Pi | No | Yes |
| Efficiently Scaling Transformer Inference [13] | Transformer | Datacenter | TPU | No | No |
| DeepSpeed Inference [12] | Transformer | Datacenter | GPU | Yes | No |
| When the Edge Meets Transformers [21] | Transformer | Low-Power | CPU | No | Yes |
| Hermes [22] | Transformer | Low-Power | CPU | Yes | No |
| Ours | Transformer | Extreme Edge | Siracusa (MCU) | No | No |