SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs
cs.CL
Release date: December 9, 2024

| Context length | default | flashattention2 | sdpa | eager | ours |
|---|---|---|---|---|---|
| 10 | 0.4381 | 0.3672 | 0.3994 | 0.3988 | 2.9376 |
| 50 | 0.4176 | 0.4035 | 0.3659 | 0.3873 | 3.6412 |
| 100 | 0.6134 | 0.3764 | 0.3794 | 0.4583 | 3.8863 |
| 500 | 0.4257 | 0.4464 | 0.4990 | 0.4506 | 4.2050 |
| 1000 | 0.5319 | 0.5265 | 0.5443 | 0.5036 | 3.9370 |
| 2000 | 0.8250 | 0.8339 | 0.7496 | 1.0656 | 4.2897 |
| 4000 | 1.2269 | 1.3181 | 1.1606 | 2.4236 | 4.5298 |
| 8000 | 2.2902 | 2.2521 | 2.3460 | - | 5.8242 |
| 16000 | 4.9286 | 4.7635 | 4.8810 | - | 6.4833 |
| 32000 | 11.6073 | 11.4526 | 11.5421 | - | 11.1209 |
| 64000 | - | - | - | - | 19.5932 |
| 128000 | - | - | - | - | 36.3091 |