GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments
cs.AI
Released Date: December 6, 2024
Authors: Yanyu Chen¹, Ganhong Huang¹
Aff.: ¹Sun Yat-sen University

| Error Metric | vLLM (%) | FastGen (%) |
|---|---|---|
| Batch Latency | 33.04 | 32.74 |
| TTFT | 33.31 | 41.43 |
| Decode Throughput | 51.43 | 54.94 |