notesum.ai

Published on November 26

Toward High-Performance LLM Serving: A Simulation-Based Approach for Identifying Optimal Parallelism

cs.DC

Release Date: November 26, 2024

Authors: Yi-Chien Lin¹, Woosuk Kwon, Ronald Pineda², Fanny Nina Paravecino

Affiliations: ¹University of Southern California, Los Angeles, California, USA; ²University of California, Los Angeles, Los Angeles, California, USA

arXiv: http://arxiv.org/abs/2411.17651v1