Published on November 17

SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration

Categories: cs.AI, cs.CV, cs.NE, cs.PF

Release date: November 17, 2024

Authors: Jintao Zhang, Haofeng Huang, Pengle Zhang, Jia Wei, Jun Zhu, Jianfei Chen

Affiliation: Tsinghua University

arXiv: http://arxiv.org/abs/2411.10958v1