Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation
Categories: cs.CV, cs.AI
Release Date: October 18, 2024
Authors: Shuai Zhao, Xiaobao Wu, Cong-Duy Nguyen, Meihuizi Jia, Yichao Feng, Luu Anh Tuan
Affiliation: Nanyang Technological University, Singapore

| Attack | Defense | LLaMA3 CA | LLaMA3 ASR | Vicuna CA | Vicuna ASR | Qwen2.5 CA | Qwen2.5 ASR |
|---|---|---:|---:|---:|---:|---:|---:|
| BadNet | LoRA | 94.06 | 100 | 93.03 | 100 | 94.32 | 86.07 |
| BadNet | Back Tr. | 93.16 | 41.37 | 91.35 | 42.20 | 92.00 | 36.17 |
| BadNet | SCPD | 81.61 | 35.21 | 81.35 | 40.00 | 83.42 | 34.58 |
| BadNet | ONION | 90.45 | 30.56 | 88.90 | 32.64 | 90.45 | 26.40 |
| BadNet | Prune | 93.03 | 39.29 | 91.23 | 35.14 | 92.39 | 7.90 |
| BadNet | W2SDefense | 93.81 | 6.24 | 93.55 | 8.32 | 92.13 | 2.91 |
| InSent | LoRA | 94.32 | 99.79 | 92.39 | 82.33 | 92.65 | 100 |
| InSent | Back Tr. | 93.16 | 52.39 | 90.32 | 81.70 | 92.77 | 83.37 |
| InSent | SCPD | 82.51 | 32.29 | 82.25 | 18.54 | 83.42 | 21.46 |
| InSent | ONION | 92.64 | 98.33 | 89.93 | 88.77 | 90.19 | 98.75 |
| InSent | Prune | 93.55 | 42.62 | 90.71 | 50.73 | 76.00 | 24.53 |
| InSent | W2SDefense | 91.48 | 17.88 | 91.61 | 10.60 | 91.61 | 4.99 |
| SynAttack | LoRA | 86.45 | 21.25 | 91.74 | 17.29 | 92.90 | 22.29 |
| SynAttack | Back Tr. | 86.58 | 18.96 | 66.45 | 81.46 | 91.48 | 22.50 |
| SynAttack | SCPD | 79.02 | 20.00 | 81.48 | 12.71 | 82.51 | 17.08 |
| SynAttack | ONION | 83.61 | 26.66 | 89.80 | 18.33 | 91.87 | 23.54 |
| SynAttack | Prune | 85.68 | 21.88 | 91.48 | 22.71 | 80.39 | 33.13 |
| SynAttack | W2SDefense | 90.97 | 15.83 | 91.87 | 8.96 | 90.06 | 15.83 |