
Flash Communication: Reducing Tensor Parallelization Bottleneck for Fast Large Language Model Inference

cs.AI

Release Date: December 6, 2024

Authors: Qingyuan Li¹, Bo Zhang, Liang Ye, Yifan Zhang, Wei Wu, Yerui Sun, Lin Ma, Yuchen Xie

Affiliation: ¹Meituan

arXiv: http://arxiv.org/pdf/2412.04964v1