Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation
cs.CV
cs.AI
cs.LG
Release Date: December 10, 2024
Authors: Xin Zhao¹, Xiaojun Chen¹, Yuexin Xuan¹, Zhendong Zhao¹
Affiliation: ¹Institute of Information Engineering, Chinese Academy of Sciences, China

| Mitigation | Method | CLIP Score | CLIP Score | CLIP Score | FID | NudeNet | NudeNet | Q16 | Q16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | | 4chan | | COCO | COCO | 4chan | | 4chan | |
| N/A | SD-V1.4 | 19.75 | 22.50 | 24.65 | 17.04 | 0.036 | 0.052 | 0.022 | 0.037 |
| External Censorship | SD-V2.1 | 18.19 | 21.49 | 23.68 | 16.05 | 0.010 | 0.034 | 0.036 | 0.075 |
| Post-hoc Moderation | Safety Filter | 19.03 | 20.85 | 24.50 | 17.78 | 0.020 | 0.023 | 0.018 | 0.054 |
| Text-agnostic | SafeGen | 18.79 | 20.70 | 24.65 | 17.52 | 0.022 | 0.032 | 0.030 | 0.104 |
| Model Fine-tuning | ESD | 16.66 | 21.41 | 23.41 | 16.19 | 0.006 | 0.022 | 0.030 | 0.036 |
| | SLD (Max) | 17.50 | 20.27 | 22.83 | 29.74 | 0.022 | 0.031 | 0.012 | 0.054 |
| | SLD (Strong) | 18.58 | 20.65 | 23.61 | 23.35 | 0.040 | 0.043 | 0.024 | 0.041 |
| | SLD (Medium) | 18.99 | 22.21 | 24.26 | 26.57 | 0.032 | 0.043 | 0.006 | 0.038 |
| | SLD (Weak) | 20.22 | 22.89 | 24.17 | 21.01 | 0.034 | 0.032 | 0.026 | 0.051 |
| | Buster (Ours) | 13.71 | 12.32 | 24.35 | 18.63 | 0.006 | 0.007 | 0.002 | 0.001 |