notesum.ai
BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching
Released Date: November 25, 2024
Authors: Yilong Zhao¹, Shuo Yang¹, Kan Zhu², Lianmin Zheng³, Baris Kasikci², Yang Zhou¹, Jiarong Xing¹, Ion Stoica¹
Affiliations: ¹University of California, Berkeley; ²University of Washington; ³xAI
