notesum.ai

Published on November 25

BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching

Release Date: November 25, 2024

Authors: Yilong Zhao¹, Shuo Yang¹, Kan Zhu², Lianmin Zheng³, Baris Kasikci², Yang Zhou¹, Jiarong Xing¹, Ion Stoica¹

Affiliations: ¹University of California, Berkeley; ²University of Washington; ³xAI

arXiv: http://arxiv.org/abs/2411.16102v1