Star Attention: Efficient LLM Inference over Long Sequences
Categories: cs.CL, cs.AI, cs.LG
Released Date: November 26, 2024
Authors: Shantanu Acharya, Fei Jia, Boris Ginsburg
Affiliation: NVIDIA

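| Model | Seq. Len. (K) | Block Size (K) | Ring-Attn Acc. (%) | Star-Attn Δ Acc. | Star-Attn Speedup |
| --- | --- | --- | --- | --- | --- |
| Llama-3-8B-Instruct, 1048K (Gradient.ai, 2024) | 16 | 4 | 86.12 | +2.47% | 1.1x |
| Llama-3-8B-Instruct, 1048K (Gradient.ai, 2024) | 32 | 8 | 82.52 | +1.54% | 1.2x |
| Llama-3-8B-Instruct, 1048K (Gradient.ai, 2024) | 64 | 16 | 79.05 | +1.28% | 1.8x |
| Llama-3-8B-Instruct, 1048K (Gradient.ai, 2024) | 128 | 32 | 77.39 | +1.23% | 2.7x |
| Llama-3.1-70B-Instruct, 128K (Meta-AI, 2024) | 16 | 4 | 95.09 | -2.85% | 1.7x |
| Llama-3.1-70B-Instruct, 128K (Meta-AI, 2024) | 32 | 8 | 94.61 | -2.70% | 2.0x |
| Llama-3.1-70B-Instruct, 128K (Meta-AI, 2024) | 64 | 16 | 88.54 | -1.63% | 4.7x |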