Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts?
Categories: cs.CL, cs.AI
Released Date: December 4, 2024
Authors: Sravanti Addepalli¹, Yerram Varun¹, Arun Suggala¹, Karthikeyan Shanmugam¹, Prateek Jain¹
Affiliation: ¹Google DeepMind

| Category | gpt-3.5 (turbo-1106), Para-QA | gpt-3.5 (turbo-1106), ReG-QA | gpt-4 (0125-preview), Para-QA | gpt-4 (0125-preview), ReG-QA | Gemma-2 (27B), Para-QA | Gemma-2 (27B), ReG-QA |
|---|---|---|---|---|---|---|
| Disinformation | 50 | 70 | 10 | 30 | 20 | 50 |
| Economic Harm | 70 | 90 | 30 | 90 | 20 | 80 |
| Expert Advice | 40 | 80 | 30 | 60 | 10 | 60 |
| Fraud/Deception | 80 | 100 | 50 | 80 | 70 | 100 |
| Government decision-making | 80 | 100 | 80 | 100 | 70 | 100 |
| Harassment/Discrimination | 40 | 100 | 20 | 80 | 10 | 70 |
| Malware/Hacking | 90 | 100 | 80 | 100 | 70 | 100 |
| Physical Harm | 50 | 100 | 10 | 100 | 10 | 80 |
| Privacy | 100 | 100 | 70 | 90 | 70 | 90 |
| Sexual/Adult Context | 60 | 90 | 30 | 90 | 10 | 90 |
| Overall | 66 | 93 | 41 | 82 | 36 | 82 |
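
The Overall row appears to be the unweighted mean of the ten category rows for each model and method. The following is a minimal sketch that checks this arithmetic using the values copied from the table. The names `rows`, `reported_overall`, and `column_means` are illustrative and not from the source, and the equal-weight averaging is an assumption that the numbers happen to satisfy.

```python
from statistics import mean

# Column order (from the table): gpt-3.5 Para-QA, gpt-3.5 ReG-QA,
# gpt-4 Para-QA, gpt-4 ReG-QA, Gemma-2 Para-QA, Gemma-2 ReG-QA.
# All values are copied from the table; nothing here is new data.
rows = {
    "Disinformation":             [50, 70, 10, 30, 20, 50],
    "Economic Harm":              [70, 90, 30, 90, 20, 80],
    "Expert Advice":              [40, 80, 30, 60, 10, 60],
    "Fraud/Deception":            [80, 100, 50, 80, 70, 100],
    "Government decision-making": [80, 100, 80, 100, 70, 100],
    "Harassment/Discrimination":  [40, 100, 20, 80, 10, 70],
    "Malware/Hacking":            [90, 100, 80, 100, 70, 100],
    "Physical Harm":              [50, 100, 10, 100, 10, 80],
    "Privacy":                    [100, 100, 70, 90, 70, 90],
    "Sexual/Adult Context":       [60, 90, 30, 90, 10, 90],
}
reported_overall = [66, 93, 41, 82, 36, 82]


def column_means(table):
    """Return the rounded unweighted mean of each column (assumed aggregation)."""
    columns = zip(*table.values())
    return [round(mean(col)) for col in columns]


computed_overall = column_means(rows)

# Difference between ReG-QA and Para-QA in the Overall row, per model
# (gpt-3.5, gpt-4, Gemma-2), in the same units as the table.
gaps = [computed_overall[i + 1] - computed_overall[i] for i in (0, 2, 4)]

print(computed_overall)
print(gaps)
```

Under this assumption the computed means match the reported Overall row for all six columns, and the ReG-QA column exceeds the Para-QA column for every model.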