FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness
Category: cs.LG
Release date: December 4, 2024
Authors: Vincent Abbott¹, Gioele Zardini²
Affiliations: ¹University College London; ²Massachusetts Institute of Technology

| Variable | Description | Location |
|---|---|---|
| A | Queries | Registers |
| B | Keys/Values Stream | SMEM |
| C | Transposed Keys | SMEM |
| D | Auxiliary | Registers |
| E | Output | Registers |
| F | Tensor Core Primary | Registers |
| G | Tensor Core Secondary | Registers |
| H | Tensor Core Accumulator | Registers |
| I | Transfer Cache | SMEM |
| J | Processed Values | Registers |
| K | Subloop Cache | SMEM |

SMEM denotes GPU shared memory; Registers denotes per-thread register storage.
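The table places queries and the output in registers while keys and values are streamed through shared memory. As a rough illustration, assuming these rows describe a FlashAttention-style forward pass, the pure-Python sketch below shows the tiling and online-softmax pattern that makes such an allocation possible: a query tile and its output accumulator stay resident while key/value blocks pass through one at a time. The mapping of variables to table rows and the names `tiled_attention`, `naive_attention`, and `block_kv` are hypothetical and are not the authors' code. Rows C and F through K (transposed keys, tensor-core operands, caches) concern hardware details that plain Python cannot model, so they are omitted.

```python
import math


def naive_attention(Q, K, V):
    """Reference: softmax(Q K^T / sqrt(d)) V, materializing every score row."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, block_kv=2):
    """FlashAttention-style forward pass over one query tile (a sketch).

    Hypothetical correspondence with the table rows (an assumption):
      Q            -> A, queries held resident for the whole loop
      K_blk, V_blk -> B, key/value blocks streamed in one at a time
      m, l         -> D, auxiliary running row max and softmax denominator
      acc          -> E, unnormalized output accumulator
    """
    scale = 1.0 / math.sqrt(len(Q[0]))
    n_q, d_v = len(Q), len(V[0])
    m = [-math.inf] * n_q                     # running max of scores per query row
    l = [0.0] * n_q                           # running sum of exp(score - max)
    acc = [[0.0] * d_v for _ in range(n_q)]   # running weighted sum of values
    for start in range(0, len(K), block_kv):
        K_blk = K[start:start + block_kv]     # only this block of K is touched now
        V_blk = V[start:start + block_kv]
        for i, q in enumerate(Q):
            s = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K_blk]
            m_new = max(m[i], max(s))
            # Online softmax: rescale earlier partial sums to the new max.
            # On the first block m[i] is -inf, so the correction is exp(-inf) = 0.
            correction = math.exp(m[i] - m_new)
            p = [math.exp(x - m_new) for x in s]
            l[i] = l[i] * correction + sum(p)
            acc[i] = [a * correction + sum(pj * v[j] for pj, v in zip(p, V_blk))
                      for j, a in enumerate(acc[i])]
            m[i] = m_new
    # Normalize once, after every key/value block has streamed past.
    return [[a / l[i] for a in acc[i]] for i in range(n_q)]


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.5, -0.5]]
    K = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.5]]
    V = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 1.0, 0.0]]
    print(naive_attention(Q, K, V))
    print(tiled_attention(Q, K, V, block_kv=2))
```

Changing `block_kv` only changes how many key/value rows are consumed per step. The result matches the naive reference up to floating-point rounding, which is the property that lets an IO-aware kernel avoid materializing the full score matrix in slow memory.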