Published on November 14, 2024

Communication Compression for Tensor Parallel LLM Inference

cs.LG
cs.AI
cs.CL

Release Date: November 14, 2024

Authors: Jan Hansen-Palmus¹, Michael Truong-Le, Oliver Hausdörfer², Alok Verma¹

Affiliations: ¹Recogni; ²Technical University of Munich

arXiv: http://arxiv.org/abs/2411.09510v1