Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining
cs.CL
cs.AI
cs.CR
Release date: December 3, 2024
Authors: Zongru Wu¹, Pengzhou Cheng, Lingyong Fang, Zhuosheng Zhang, Gongshen Liu
Affiliation: ¹Shanghai Jiao Tong University

CACC = clean accuracy (%), ASR = attack success rate (%). Higher CACC and lower ASR are better.

| Dataset | Attack | Vanilla CACC | Vanilla ASR | CUBE CACC | CUBE ASR | MuScleLoRA CACC | MuScleLoRA ASR | DeCE CACC | DeCE ASR | CleanGen CACC | CleanGen ASR | GraCeFul CACC | GraCeFul ASR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WebQA | Badnets | 47.19 | 99.21 | 38.09 | 0 | 22.83 | 1.52 | 45.11 | 91.88 | 31.05 | 0 | 45.72 | 0 |
| WebQA | Addsent | 47.64 | 94.78 | 38.93 | 0 | 23.23 | 0.09 | 45.41 | 79.04 | 32.78 | 0 | 46.11 | 0 |
| WebQA | CBA | 46.36 | 81.79 | 38.58 | 0 | 22.34 | 0.83 | 45.11 | 47.69 | 32.53 | 0 | 44.54 | 0 |
| FreebaseQA | Badnets | 63.45 | 99.30 | 60.45 | 0 | 39.85 | 0 | 60.25 | 95.15 | 29.10 | 0 | 63.55 | 0 |
| FreebaseQA | Addsent | 63.10 | 98.45 | 59.55 | 0 | 38.95 | 0 | 60.50 | 64.27 | 33.65 | 0 | 63.20 | 0 |
| FreebaseQA | CBA | 62.35 | 95.35 | 58.95 | 0 | 39.05 | 0.05 | 61.20 | 95.60 | 33.25 | 0 | 64.25 | 0 |
| NQ | Badnets | 74.25 | 97.80 | 72.70 | 0 | 66.30 | 82.70 | 74.65 | 99.40 | 32.65 | 0.05 | 73.25 | 0 |
| NQ | Addsent | 72.29 | 98.19 | 65.10 | 0 | 65.75 | 79.10 | 73.25 | 99.35 | 33.20 | 0.05 | 74.95 | 0 |
| NQ | CBA | 72.09 | 95.78 | 73.25 | 0 | 64.70 | 11.85 | 72.50 | 14.35 | 32.05 | 0 | 73.75 | 0 |
| CoQA | Badnets | 70.68 | 95.98 | 66.87 | 0 | 63.05 | 85.54 | 72.09 | 99.00 | 53.41 | 0 | 71.89 | 0 |
| CoQA | Addsent | 72.29 | 98.19 | 69.08 | 0 | 61.85 | 71.91 | 70.89 | 81.17 | 54.42 | 0 | 71.49 | 0 |
| CoQA | CBA | 72.09 | 95.84 | 67.27 | 0 | 62.45 | 77.11 | 70.29 | 91.16 | 54.22 | 0.20 | 69.68 | 0 |
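To compare the defenses at a glance, the table can be condensed into per-defense averages over the twelve dataset/attack settings. The sketch below hard-codes the numbers from the table; the `RESULTS` layout and the `summarize` helper are illustrative names, not code from the paper.

```python
# Defenses in the column order used by the table.
DEFENSES = ["Vanilla", "CUBE", "MuScleLoRA", "DeCE", "CleanGen", "GraCeFul"]

# (dataset, attack) -> [CACC, ASR] pairs for each defense, flattened in DEFENSES order.
RESULTS = {
    ("WebQA", "Badnets"): [47.19, 99.21, 38.09, 0, 22.83, 1.52, 45.11, 91.88, 31.05, 0, 45.72, 0],
    ("WebQA", "Addsent"): [47.64, 94.78, 38.93, 0, 23.23, 0.09, 45.41, 79.04, 32.78, 0, 46.11, 0],
    ("WebQA", "CBA"): [46.36, 81.79, 38.58, 0, 22.34, 0.83, 45.11, 47.69, 32.53, 0, 44.54, 0],
    ("FreebaseQA", "Badnets"): [63.45, 99.30, 60.45, 0, 39.85, 0, 60.25, 95.15, 29.10, 0, 63.55, 0],
    ("FreebaseQA", "Addsent"): [63.10, 98.45, 59.55, 0, 38.95, 0, 60.50, 64.27, 33.65, 0, 63.20, 0],
    ("FreebaseQA", "CBA"): [62.35, 95.35, 58.95, 0, 39.05, 0.05, 61.20, 95.60, 33.25, 0, 64.25, 0],
    ("NQ", "Badnets"): [74.25, 97.80, 72.70, 0, 66.30, 82.70, 74.65, 99.40, 32.65, 0.05, 73.25, 0],
    ("NQ", "Addsent"): [72.29, 98.19, 65.10, 0, 65.75, 79.10, 73.25, 99.35, 33.20, 0.05, 74.95, 0],
    ("NQ", "CBA"): [72.09, 95.78, 73.25, 0, 64.70, 11.85, 72.50, 14.35, 32.05, 0, 73.75, 0],
    ("CoQA", "Badnets"): [70.68, 95.98, 66.87, 0, 63.05, 85.54, 72.09, 99.00, 53.41, 0, 71.89, 0],
    ("CoQA", "Addsent"): [72.29, 98.19, 69.08, 0, 61.85, 71.91, 70.89, 81.17, 54.42, 0, 71.49, 0],
    ("CoQA", "CBA"): [72.09, 95.84, 67.27, 0, 62.45, 77.11, 70.29, 91.16, 54.22, 0.20, 69.68, 0],
}


def summarize(results):
    """Return {defense: (mean CACC, mean ASR, worst-case ASR)} over all settings."""
    summary = {}
    for i, name in enumerate(DEFENSES):
        caccs = [row[2 * i] for row in results.values()]     # even slots hold CACC
        asrs = [row[2 * i + 1] for row in results.values()]  # odd slots hold ASR
        summary[name] = (sum(caccs) / len(caccs), sum(asrs) / len(asrs), max(asrs))
    return summary


if __name__ == "__main__":
    for name, (cacc, asr, worst) in summarize(RESULTS).items():
        print(f"{name:<11} mean CACC {cacc:6.2f}  mean ASR {asr:6.2f}  max ASR {worst:6.2f}")
```

On these numbers, CUBE and GraCeFul both reach a worst-case ASR of 0 in every setting, and GraCeFul's mean CACC is about 0.12 points below the Vanilla baseline.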