notesum.ai

Published on November 25

MixPE: Quantization and Hardware Co-design for Efficient LLM Inference

cs.AI
cs.AR

Release Date: November 25, 2024

Authors: Yu Zhang (1), Mingzi Wang (2), Lancheng Zou (1), Wulong Liu (3), Hui-Ling Zhen (3), Mingxuan Yuan (3), Bei Yu (1)

Affiliations: (1) The Chinese University of Hong Kong; (2) Tsinghua University; (3) Huawei Noah's Ark Lab

arXiv: http://arxiv.org/abs/2411.16158v1