notesum.ai

Published on November 18

MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs

cs.DC
cs.AI

Release date: November 18, 2024

Authors: Shiyi Cao, Shu Liu, Tyler Griggs, Peter Schafhalter, Xiaoxuan Liu, Ying Sheng, Joseph E. Gonzalez, Matei Zaharia, Ion Stoica

arXiv: http://arxiv.org/abs/2411.11217v1