MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers
cs.CV
cs.AI
Released Date: October 23, 2024
Authors: Zebin Yang¹, Renze Chen², Taiqiang Wu³, Ngai Wong³, Yun Liang⁴, Runsheng Wang⁵, Ru Huang⁵, Meng Li⁴
Affiliations: ¹Institute for Artificial Intelligence, Peking University, Beijing, China; ²School of Computer Science, Peking University, Beijing, China; ³The University of Hong Kong, Hong Kong, China; ⁴School of Integrated Circuits, Peking University, Beijing, China; ⁵Beijing Advanced Innovation Center for Integrated Circuits, Beijing, China

| Notations | Meanings |
|---|---|
| | Vocabulary size |
| | Sequence length, # heads, and embedding dimension |
| | # clusters |
| | Loop variables for clusters, tokens, and singular values |
| | Sequence length of each tile |
| | Unitary matrices generated by SVD |
| | Embedding table and linear projection for the cluster |
| | Embedding vector for the token in cluster |
| | Vector of singular values |
| | NAS parameters for embedding compression |
| | NAS parameters for token in cluster in first-stage NAS |
| | NAS parameters for singular value in cluster in second-stage NAS |
| | Threshold for NAS parameters in cluster in second-stage NAS |
| | Loop ranges of a matrix multiplication |
| | Loop variables of a matrix multiplication |