notesum.ai

Published on December 4

FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness

cs.LG

Released Date: December 4, 2024

Authors: Vincent Abbott¹, Gioele Zardini²

Aff.: ¹University College London; ²Massachusetts Institute of Technology

arXiv: http://arxiv.org/pdf/2412.03317v1


Label	Variable	Size	Location
A	Queries	$w_{\overline{q}}^{(32)} \times d^{(64)}$	Registers
B	Keys/Values Stream	$s_{\overline{x}}^{(16)} \times d^{(64)}$	SMEM
C	Transposed Keys	$d^{(64)} \times s_{\overline{x}}^{(16)}$	SMEM
D	Auxiliary	$t_{\overline{q}} \times 3$	Registers
E	Output	$t_{\overline{q}} \times d$	Registers
F	Tensor Core Primary	$32 \times 8$	Registers
G	Tensor Core Secondary	$8 \times 16$	Registers
H	Tensor Core Accumulator	$32 \times 16$	Registers
I	Transfer Cache	$g_{\overline{q}}^{(32)} \times s_{\overline{x}}^{(16)}$	SMEM
J	Processed Values	$t_{\overline{q}} \times u_{\overline{x}}^{(8)}$	Registers
K	Subloop Cache	$g_{\overline{q}}^{(32)} \times d^{\prime(16)}$	SMEM
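
As a rough aid to reading the table, the sketch below tallies the element counts of the four rows located in SMEM (B, C, I, K). It rests on assumptions the table does not state: the parenthesized superscripts are read as axis sizes, each element is taken to occupy 2 bytes (FP16), and the four buffers are simply summed without knowing whether they are all resident at the same time. The names `smem_tiles` and `BYTES_PER_ELEMENT` are illustrative only.

```python
# Shapes copied from the SMEM rows of the table.
# Assumption: a superscript such as (16) or (64) is the size of that axis.
smem_tiles = {
    "B: Keys/Values Stream": (16, 64),
    "C: Transposed Keys": (64, 16),
    "I: Transfer Cache": (32, 16),
    "K: Subloop Cache": (32, 16),
}

# Assumption: FP16 storage. The table itself does not state a data type.
BYTES_PER_ELEMENT = 2


def tile_elements(shape):
    """Return the number of elements in a 2D tile."""
    rows, cols = shape
    return rows * cols


elements = {name: tile_elements(shape) for name, shape in smem_tiles.items()}
total_elements = sum(elements.values())
total_bytes = total_elements * BYTES_PER_ELEMENT

for name, count in elements.items():
    print(f"{name}: {count} elements")
print(f"total: {total_elements} elements, {total_bytes} bytes "
      f"at {BYTES_PER_ELEMENT} B/element")
```

Under these assumptions the SMEM rows come to 3072 elements in total. The register-resident rows are left out because several of their axes ($t_{\overline{q}}$, and $d$ in the Output row) carry no explicit size in the table.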