notesum.ai

Published on November 12

Towards Low-bit Communication for Tensor Parallel LLM Inference

cs.AI
cs.LG

Release Date: November 12, 2024

Authors: Harry Dong¹, Tyler Johnson², Minsik Cho², Emad Soroush²

Affiliations: ¹Carnegie Mellon University; ²Apple

arXiv: http://arxiv.org/abs/2411.07942v1