notesum.ai

Published on October 21

Beyond 2:4: exploring V:N:M sparsity for efficient transformer inference on GPUs

cs.CV
cs.AI
cs.LG

Release Date: October 21, 2024

Authors: Kang Zhao¹, Tao Yuan², Han Bao², Zhenfeng Su², Chang Gao³, Zhaofeng Sun¹, Zichen Liang¹, Liping Jing³, Jianfei Chen¹

Affiliations: ¹Tsinghua University; ²Huawei Noah's Ark Lab; ³Beijing Jiaotong University

arXiv: https://arxiv.org/abs/2410.16135v1