Published on October 23

ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference

cs.AI
cs.MA

Release Date: October 23, 2024

Authors: Xin He (1), Shunkang Zhang (2), Yuxin Wang (3), Haiyan Yin (1), Zihao Zeng (4), Shaohuai Shi (5), Zhenheng Tang (2), Xiaowen Chu (6), Ivor Tsang (1), Ong Yew Soon (1)

Affiliations: (1) CFAR, Agency for Science, Technology and Research (A*STAR); (2) Hong Kong University of Science and Technology; (3) Hong Kong Baptist University; (4) Nanyang Technological University; (5) Harbin Institute of Technology, Shenzhen; (6) Hong Kong University of Science and Technology (Guangzhou)

arXiv: https://arxiv.org/abs/2410.17954v1