ALISE: Accelerating Large Language Model Serving with Speculative Scheduling
Categories: cs.PF, cs.AI
Released Date: October 31, 2024
Authors: Youpeng Zhao¹, Jun Wang¹
Aff.: ¹University of Central Florida, Orlando, FL, USA

| Metrics | OPT-2.7B, Proxy-based | OPT-2.7B, Retrieval-based | OPT-6.7B, Proxy-based | OPT-6.7B, Retrieval-based | OPT-13B, Proxy-based | OPT-13B, Retrieval-based |
|---|---|---|---|---|---|---|
| Accuracy (↑) | 0.781 | 0.821 | 0.712 | 0.856 | 0.634 | 0.744 |
| Pred. Error (↓) | 0.122 | 0.057 | 0.145 | 0.096 | 0.178 | 0.123 |
| Avg. Pred. Latency (↓) | 12.2 ms | 3.92 ms | 11.7 ms | 4.74 ms | 14.8 ms | 4.49 ms |
| Throughput (↑) | | | | | | |