Context Parallelism for Scalable Million-Token Inference
cs.DC, cs.AI, cs.LG
Release date: November 4, 2024
Authors: Amy (Jingyi) Yang¹, Aya Ibrahim¹, Xinfeng Xie¹, Bangsheng Tang¹, Grigory Sizov¹, Jongsoo Park¹, Jianyu Huang¹
Affiliation: ¹Meta Platforms, Inc., Menlo Park, California, USA

| Cached tokens (P) | New tokens (T) | Miss rate T/(P+T) | pass-KV latency (ms) | pass-Q latency (ms) |
|---|---|---|---|---|
| 126720 | 1280 | 1.00% | 1023.39 | 898.71 |
| 124800 | 3200 | 2.50% | 1110.18 | 1046.43 |
| 123840 | 4160 | 3.25% | 1298.92 | 1280.1 |
| 121600 | 6400 | 5.00% | 1305.56 | 1302.01 |
| 115200 | 12800 | 10.00% | 2080.67 | 2205.27 |
| 102400 | 25600 | 20.00% | 3353.02 | 3617.02 |
| 89600 | 38400 | 30.00% | 4629.23 | 4922.52 |
| 76800 | 51200 | 40.00% | 5745.08 | 6217.83 |
| 64000 | 64000 | 50.00% | 6845.21 | 7367.99 |
| 51200 | 76800 | 60.00% | 7890.35 | 8468.66 |
| 38400 | 89600 | 70.00% | 8697.27 | 9666.62 |
| 25600 | 102400 | 80.00% | 10105.78 | 10652.39 |
| 12800 | 115200 | 90.00% | 11136.4 | 11571.62 |
| 0 | 128000 | 100.00% | 11462.15 | 12360.57 |
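As a minimal sketch that uses only the numbers in the table above, the snippet below recomputes the miss-rate column as T/(P+T) from the two token-count columns and reports which variant has the lower latency in each row. The names `ROWS`, `miss_rate`, `faster_variant`, and `summarize` are hypothetical. This is an empirical lookup over these measurements, not the paper's own selection heuristic.

```python
# Rows copied from the table: (cached tokens P, new tokens T, pass-KV ms, pass-Q ms).
ROWS = [
    (126720, 1280, 1023.39, 898.71),
    (124800, 3200, 1110.18, 1046.43),
    (123840, 4160, 1298.92, 1280.1),
    (121600, 6400, 1305.56, 1302.01),
    (115200, 12800, 2080.67, 2205.27),
    (102400, 25600, 3353.02, 3617.02),
    (89600, 38400, 4629.23, 4922.52),
    (76800, 51200, 5745.08, 6217.83),
    (64000, 64000, 6845.21, 7367.99),
    (51200, 76800, 7890.35, 8468.66),
    (38400, 89600, 8697.27, 9666.62),
    (25600, 102400, 10105.78, 10652.39),
    (12800, 115200, 11136.4, 11571.62),
    (0, 128000, 11462.15, 12360.57),
]


def miss_rate(p: int, t: int) -> float:
    """Fraction of the total context (P + T) that is new, i.e. not already cached."""
    return t / (p + t)


def faster_variant(pass_kv_ms: float, pass_q_ms: float) -> str:
    """Name of the variant with the lower measured latency for one row."""
    return "pass-KV" if pass_kv_ms < pass_q_ms else "pass-Q"


def summarize(rows):
    """Return (miss rate, faster variant) for each table row."""
    return [(miss_rate(p, t), faster_variant(kv, q)) for p, t, kv, q in rows]


if __name__ == "__main__":
    for rate, winner in summarize(ROWS):
        print(f"{rate:7.2%}  {winner}")
```

On these rows, pass-Q is faster at miss rates up to 5% and pass-KV is faster from 10% upward. For this configuration the crossover therefore lies somewhere between 5% and 10%.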