Towards Low-bit Communication for Tensor Parallel LLM Inference
cs.AI
cs.LG
Release Date: November 12, 2024
Authors: Harry Dong (1), Tyler Johnson (2), Minsik Cho (2), Emad Soroush (2)
Affiliations: (1) Carnegie Mellon University; (2) Apple
