Published on November 5
Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment
cs.LG
cs.AI
Released Date: November 5, 2024
Authors: Jason Vega¹, Junsheng Huang², Gaokai Zhang², Hangoo Kang¹, Minjia Zhang¹, Gagandeep Singh³
Affiliations: ¹ University of Illinois Urbana-Champaign; ² University of Illinois Urbana-Champaign, Zhejiang University; ³ University of Illinois Urbana-Champaign, VMware Research

| Augmentation | | | FPR | FPR | FNR | FNR | Avg | Avg |
|---|---|---|---|---|---|---|---|---|
| None | | 0.000 | 0.024 | 0.024 | 0.078 | 0.078 | 0.051 | 0.051 |
| String Insertion | Suffix | 0.000 | 0.125 | 0.125 | 0.027 | 0.027 | 0.076 | 0.076 |
| | Prefix | 0.000 | 0.055 | 0.055 | 0.044 | 0.044 | 0.050 | 0.050 |
| | Any | 0.080 | 0.129 | 0.065 | 0.051 | 0.102 | 0.090 | 0.083 |
| Character-Level | Edit | 0.080 | 0.197 | 0.049 | 0.000 | 0.102 | 0.098 | 0.076 |
| | Insert | 0.040 | 0.156 | 0.073 | 0.025 | 0.100 | 0.091 | 0.086 |
| | Delete | 0.040 | 0.173 | 0.107 | 0.067 | 0.078 | 0.120 | 0.092 |
| Overall | | 0.000 | 0.112 | 0.112 | 0.038 | 0.038 | 0.075 | 0.075 |
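
The extracted table lost its sub-column labels, so the following is a minimal consistency check rather than anything taken from the paper. It assumes that the last six numeric values in each row form two FPR values, two FNR values, and two Avg values, in that order, and that each Avg is the mean of the FPR and FNR in the same position. The first numeric column is left out because the source does not say what it measures. The names `ROWS` and `max_avg_deviation` are illustrative only.

```python
# Rows copied from the table as (label, FPR pair, FNR pair, Avg pair).
# Assumption: the column pairing below is inferred from the numbers,
# not stated in the source. The unlabeled first numeric column is omitted.
ROWS = [
    ("None",    (0.024, 0.024), (0.078, 0.078), (0.051, 0.051)),
    ("Suffix",  (0.125, 0.125), (0.027, 0.027), (0.076, 0.076)),
    ("Prefix",  (0.055, 0.055), (0.044, 0.044), (0.050, 0.050)),
    ("Any",     (0.129, 0.065), (0.051, 0.102), (0.090, 0.083)),
    ("Edit",    (0.197, 0.049), (0.000, 0.102), (0.098, 0.076)),
    ("Insert",  (0.156, 0.073), (0.025, 0.100), (0.091, 0.086)),
    ("Delete",  (0.173, 0.107), (0.067, 0.078), (0.120, 0.092)),
    ("Overall", (0.112, 0.112), (0.038, 0.038), (0.075, 0.075)),
]


def max_avg_deviation(rows):
    """Return the largest |Avg - (FPR + FNR) / 2| over all rows and both positions."""
    worst = 0.0
    for _label, fpr, fnr, avg in rows:
        for p, n, a in zip(fpr, fnr, avg):
            worst = max(worst, abs(a - (p + n) / 2))
    return worst


if __name__ == "__main__":
    # Values are reported to three decimals, so a deviation under 0.001
    # is compatible with rounding.
    print(f"max deviation from (FPR + FNR) / 2: {max_avg_deviation(ROWS):.4f}")
```

Under this reading, every Avg entry stays within 0.001 of the mean of its FPR and FNR, which is what three-decimal rounding would allow. That supports the column pairing but does not recover the missing sub-column labels.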