notesum.ai
Published on November 19
When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations
Categories: cs.CR, cs.AI
Released Date: November 19, 2024
Authors: Huaizhi Ge (1), Yiming Li (2), Qifan Wang (3), Yongfeng Zhang (4), Ruixiang Tang (4)
Aff.: (1) Columbia University; (2) Nanyang Technological University; (3) Meta AI; (4) Rutgers University

| Experiment | Jaccard Similarity | STS Similarity |
|---|---|---|
| 1 | 1.54e-08 | 8.92e-14 |
| 2 | 0.0270 | 3.07e-04 |
| 3 | 0.0210 | 0.0476 |
| 4 | 5.87e-15 | 1.95e-13 |
| 5 | 1.11e-10 | 5.35e-12 |
| 6 | 0.0347 | 0.951 |
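
The table names Jaccard similarity without defining it. For reference, here is a minimal sketch of the standard set-based Jaccard index, |A ∩ B| / |A ∪ B|, applied to two texts. The lowercasing, the whitespace tokenization, and the function name `jaccard_similarity` are assumptions made for illustration; the source does not state how the metric was applied or how the values in the table were derived.

```python
def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Jaccard index between the token sets of two texts.

    Assumption: tokens are lowercase, whitespace-separated words.
    The source does not specify a tokenization scheme.
    """
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        # Convention assumed here: two empty texts count as identical.
        return 1.0
    return len(tokens_a & tokens_b) / len(union)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the paper.
    print(jaccard_similarity("the movie was great", "the movie was bad"))
```

The score ranges from 0 (no shared tokens) to 1 (identical token sets). STS similarity is typically computed from sentence embeddings and needs a pretrained model, so it is not sketched here.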