MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
Categories: cs.AI, cs.AR
Release Date: November 25, 2024
Authors: Yu Zhang¹, Mingzi Wang², Lancheng Zou¹, Wulong Liu³, Hui-Ling Zhen³, Mingxuan Yuan³, Bei Yu¹
Affiliations: ¹The Chinese University of Hong Kong; ²Tsinghua University; ³Huawei Noah's Ark Lab
