Published at November 26
Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism
cs.DC
Released Date: November 26, 2024
Authors: Yi-Chien Lin¹, Woosuk Kwon, Ronald Pineda², Fanny Nina Paravecino
Aff.: ¹University of Southern California, Los Angeles, California, USA; ²University of California, Los Angeles, Los Angeles, California, USA

| Traces | Context Lengths | Generation Lengths | # of requests |
|---|---|---|---|
| Summarization | 2742.11 ± 944.33 | 172.22 ± 73.17 | 1188 |
| Creation | 306.82 ± 81.03 | 1128.34 ± 419.64 | 512 |
| Chat | 73.32 ± 148.65 | 189.47 ± 174.18 | 1024 |