Published on October 30
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models
Categories: cs.CL, cs.AI, cs.CR
Released Date: October 30, 2024
Authors: Hao Li¹, Xiaogeng Liu¹, Chaowei Xiao¹
Aff.: ¹University of Wisconsin-Madison

| | | Performance | | | | Overhead | | |
|---|---|---|---|---|---|---|---|---|
| Category | Model | Over-defense (%) | Benign (%) | Malicious (%) | Average (%) | GFLOPs | Inference (ms) | Efficiency |
| Prompt Guard Model | Fmops (fmops, 2024) | 5.60 | 34.63 | 93.50 | 44.58 | 24.19 | 4.43 | 10.06 |
| Prompt Guard Model | Deepset (Deepset, 2024b) | 5.31 | 34.06 | 91.50 | 43.62 | 60.45 | 15.22 | 2.87 |
| Prompt Guard Model | PromptGuard (Meta, 2024) | 0.88 | 26.82 | 97.10 | 41.60 | 60.45 | 15.28 | 2.72 |
| Prompt Guard Model | ProtectAIv2 (ProtectAI.com, 2024) | 56.64 | 86.20 | 48.60 | 63.81 | 60.45 | 15.77 | 4.05 |
| Prompt Guard Model | LakeraGuard (LakeraAI, 2024a) | 87.61 | 90.89 | 53.19 | 77.23 | - | 710.41 | 0.11 |
| Large Language Model | GPT-4o (OpenAI, 2024a) | 86.73 | 90.78 | 79.10 | 85.53 | - | 7907.18 | 0.01 |
| Large Language Model | Llama-2-chat (Touvron et al., 2023) | 76.40 | 61.03 | 31.09 | 56.17 | 1387.49 | 3111.36 | 0.02 |
| Large Language Model | LlamaGuard3 (Dubey et al., 2024) | 99.71 | 95.18 | 28.28 | 74.39 | 1418.38 | 787.48 | 0.09 |
| Prompt Guard Model | InjecGuard (Ours) | 87.32 | 85.74 | 77.39 | 83.48 | 60.45 | 15.34 | 5.44 |
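
This excerpt does not define the Average and Efficiency columns. The reported figures are consistent with Average being the arithmetic mean of the Over-defense, Benign, and Malicious accuracies, and with Efficiency being Average divided by inference time in milliseconds. Both formulas are assumptions inferred from the numbers, not definitions quoted from the authors. A minimal Python sketch that recomputes three rows under those assumptions (the function names are hypothetical):

```python
# Assumption: Average = mean of the three accuracy columns.
# Assumption: Efficiency = Average / inference time (ms).
# Both are inferred from the table values, not stated in the excerpt.

# (model, over_defense %, benign %, malicious %, inference ms), copied from the table
ROWS = [
    ("Fmops", 5.60, 34.63, 93.50, 4.43),
    ("PromptGuard", 0.88, 26.82, 97.10, 15.28),
    ("InjecGuard", 87.32, 85.74, 77.39, 15.34),
]


def average_accuracy(over_defense: float, benign: float, malicious: float) -> float:
    """Arithmetic mean of the three accuracy columns."""
    return (over_defense + benign + malicious) / 3


def efficiency(average: float, inference_ms: float) -> float:
    """Average accuracy per millisecond of inference time."""
    return average / inference_ms


for name, over_defense, benign, malicious, inference_ms in ROWS:
    avg = average_accuracy(over_defense, benign, malicious)
    eff = efficiency(avg, inference_ms)
    print(f"{name}: average={avg:.2f}, efficiency={eff:.2f}")
```

Recomputed values can differ from the table in the last digit when the published inputs were themselves rounded, so compare them with a small tolerance rather than exact equality.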