The Asymptotic Behavior of Attention in Transformers
cs.AI
cs.LG
cs.SY
eess.SY
math.DS
math.OC
Release date: December 3, 2024
Authors: Álvaro Rodríguez Abella, João Pedro Silvestre, Paulo Tabuada

| | Full attention | | Causal attention (auto-regressive) | |
| --- | --- | --- | --- | --- |
| Section | §3 | §4 | §5 | §6 |
| # of heads | | | | |
| | Time invariant, symmetric, positive definite | Time varying, uniformly continuous, bounded | Time varying, bounded | Time varying, bounded |
| | Identity | Identity | Identity | Time invariant, symmetric |
| Result | Theorem 3.2 | Theorem 4.1 | Theorem 5.1 | Theorem 6.1 |
| Statement | Gradient flow, convergence to equilibrium | Convergence to consensus | Asymptotic stability of consensus (determined by the first token) | Asymptotic stability of consensus (determined by the eigenspace of the largest eigenvalue of […]) |
| Domain of attraction | Whole sphere | Some hemisphere | Conull (complement of a measure-zero set) | Fixed hemisphere |
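
To make "convergence to consensus" concrete, here is a minimal numerical sketch. It is not the authors' code, and this summary does not state the model equations, so everything below is an assumption: tokens live on the unit sphere, each token moves toward the softmax-weighted average of the tokens it attends to (projected onto the sphere's tangent space), the query-key product and the value matrix are both the identity, and time is discretized with an explicit Euler step followed by renormalization. The names `attention_step`, `simulate`, and `min_pairwise_cosine` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_step(tokens, dt, causal):
    """One explicit Euler step of the ASSUMED dynamics, then renormalize."""
    new_tokens = []
    for i, x_i in enumerate(tokens):
        # Causal attention: token i attends only to tokens 0..i.
        visible = tokens[: i + 1] if causal else tokens
        # Unnormalized softmax scores (assumed identity query-key product).
        scores = [math.exp(dot(x_i, x_j)) for x_j in visible]
        total = sum(scores)
        # Softmax-weighted average of visible tokens (assumed identity values).
        avg = [
            sum(s * x_j[k] for s, x_j in zip(scores, visible)) / total
            for k in range(len(x_i))
        ]
        # Remove the radial component so the velocity is tangent to the sphere.
        radial = dot(avg, x_i)
        velocity = [a - radial * c for a, c in zip(avg, x_i)]
        moved = [c + dt * v for c, v in zip(x_i, velocity)]
        new_tokens.append(normalize(moved))
    return new_tokens


def min_pairwise_cosine(tokens):
    """Smallest cosine between any two tokens; near 1 means consensus."""
    return min(dot(u, v) for i, u in enumerate(tokens) for v in tokens[i + 1:])


def simulate(causal, n_tokens=5, dim=3, steps=2000, dt=0.1, seed=0):
    """Start all tokens in one open hemisphere and integrate the dynamics."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(n_tokens):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v[0] = abs(v[0]) + 0.5  # force a positive first coordinate
        tokens.append(normalize(v))
    initial = [list(t) for t in tokens]
    for _ in range(steps):
        tokens = attention_step(tokens, dt, causal)
    return initial, tokens


if __name__ == "__main__":
    for causal in (False, True):
        initial, final = simulate(causal)
        print("causal" if causal else "full",
              min_pairwise_cosine(initial), min_pairwise_cosine(final))
```

Under these assumptions the tokens collapse to a single point in both modes. With `causal=True` the first token attends only to itself, so its velocity is zero and the others gather at its initial position, which is one way to read "determined by the first token" in the table. The toy run only illustrates the qualitative behavior; it does not check the hypotheses of any of the theorems.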