Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings
cs.CL
Released Date: November 21, 2024
Authors: Aaron Zheng¹, Mansi Rana², Andreas Stolcke²
Affiliations: ¹Uniphore & UC Berkeley; ²Uniphore

| Model | AEGIS (on-policy) AUPRC | AEGIS (on-policy) F1 |
| --- | --- | --- |
| LlamaGuardBase (Meta) | 0.930 | 0.62 |
| NeMo43B (Nvidia) | - | 0.83 |
| OpenAI Mod API | 0.895 | 0.34 |
| Perspective API | 0.860 | 0.24 |
| LlamaGuardDefensive (AEGIS) | 0.941 | 0.85 |
| LlamaGuardPermissive (AEGIS) | 0.941 | 0.76 |
| NeMo43B-Defensive (AEGIS) | - | 0.89 |
| WildGuard (most recent) | - | 0.89 |
| Our Sentence-BERT model | 0.946 | 0.89 |