notesum.ai

Published on November 19

When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations

cs.CR
cs.AI

Release Date: November 19, 2024

Authors: Huaizhi Ge¹, Yiming Li², Qifan Wang³, Yongfeng Zhang⁴, Ruixiang Tang⁴

Affiliations: ¹Columbia University; ²Nanyang Technological University; ³Meta AI; ⁴Rutgers University

arXiv: http://arxiv.org/abs/2411.12701v1