
Published on November 20

Topkima-Former: Low-energy, Low-Latency Inference for Transformers using top-k In-memory ADC

cs.AR

Released Date: November 20, 2024

Authors: Shuai Dong¹, Junyi Yang¹, Xiaoqi Peng¹, Hongyang Shang¹, Ye Ke¹, Xiaofeng Yang², Hongjie Liu², Arindam Basu¹

Affiliations: ¹Department of Electrical Engineering, City University of Hong Kong, Hong Kong; ²Reexen Technology, China

arXiv: http://arxiv.org/abs/2411.13050v1