notesum.ai

Published on October 30

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models

Categories: cs.CL, cs.AI, cs.CR

Released Date: October 30, 2024

Authors: Hao Li¹, Xiaogeng Liu¹, Chaowei Xiao¹

Affiliation: ¹University of Wisconsin-Madison

arXiv: http://arxiv.org/abs/2410.22770v1


Category	Model	Performance: Over-defense (%)	Performance: Benign (%)	Performance: Malicious (%)	Performance: Average (%)	Overhead: GFLOPs	Overhead: Inference (ms)	Efficiency
Prompt Guard Model	Fmops (fmops, 2024)	5.60	34.63	93.50	44.58	24.19	4.43	10.06
Prompt Guard Model	Deepset (Deepset, 2024b)	5.31	34.06	91.50	43.62	60.45	15.22	2.87
Prompt Guard Model	PromptGuard (Meta, 2024)	0.88	26.82	97.10	41.60	60.45	15.28	2.72
Prompt Guard Model	ProtectAIv2 (ProtectAI.com, 2024)	56.64	86.20	48.60	63.81	60.45	15.77	4.05
Prompt Guard Model	LakeraGuard (LakeraAI, 2024a)	87.61	90.89	53.19	77.23	-	710.41	0.11
Large Language Model	GPT-4o (OpenAI, 2024a)	86.73	90.78	79.10	85.53	-	7907.18	0.01
Large Language Model	Llama-2-chat (Touvron et al., 2023)	76.40	61.03	31.09	56.17	1387.49	3111.36	0.02
Large Language Model	LlamaGuard3 (Dubey et al., 2024)	99.71	95.18	28.28	74.39	1418.38	787.48	0.09
Prompt Guard Model	InjecGuard (Ours)	87.32	85.74	77.39	83.48	60.45	15.34	5.44
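
The Average and Efficiency columns are not defined in this excerpt. The reported numbers are, however, consistent with Average being the mean of the three performance columns and Efficiency being Average divided by inference time in milliseconds. The Python sketch below checks that reading against the rows of the table. The formulas are an assumption inferred from the values, not a definition taken from the paper, and the function names are hypothetical.

```python
# Rows copied from the table above:
# (model, over_defense, benign, malicious, reported_average,
#  inference_ms, reported_efficiency)
ROWS = [
    ("Fmops",        5.60, 34.63, 93.50, 44.58,    4.43, 10.06),
    ("Deepset",      5.31, 34.06, 91.50, 43.62,   15.22,  2.87),
    ("PromptGuard",  0.88, 26.82, 97.10, 41.60,   15.28,  2.72),
    ("ProtectAIv2", 56.64, 86.20, 48.60, 63.81,   15.77,  4.05),
    ("LakeraGuard", 87.61, 90.89, 53.19, 77.23,  710.41,  0.11),
    ("GPT-4o",      86.73, 90.78, 79.10, 85.53, 7907.18,  0.01),
    ("Llama-2-chat", 76.40, 61.03, 31.09, 56.17, 3111.36, 0.02),
    ("LlamaGuard3", 99.71, 95.18, 28.28, 74.39,  787.48,  0.09),
    ("InjecGuard",  87.32, 85.74, 77.39, 83.48,   15.34,  5.44),
]


def average_score(over_defense, benign, malicious):
    """Assumed definition: unweighted mean of the three performance columns."""
    return (over_defense + benign + malicious) / 3


def efficiency(average, inference_ms):
    """Assumed definition: average score per millisecond of inference."""
    return average / inference_ms


def recompute(rows=ROWS):
    """Return (model, recomputed_average, recomputed_efficiency) per row."""
    results = []
    for model, over, benign, malicious, rep_avg, ms, _rep_eff in rows:
        avg = average_score(over, benign, malicious)
        # Efficiency is recomputed from the *reported* average so that a
        # rounding difference in the average does not leak into this check.
        eff = efficiency(rep_avg, ms)
        results.append((model, avg, eff))
    return results


if __name__ == "__main__":
    for (model, avg, eff), row in zip(recompute(), ROWS):
        print(f"{model:13s} avg {avg:6.2f} (reported {row[4]:6.2f})  "
              f"eff {eff:6.2f} (reported {row[6]:6.2f})")
```

Under these assumptions every row agrees with the table to within rounding. The GPT-4o average recomputes to 85.54 against a reported 85.53, which is most likely because the published columns were themselves rounded before averaging. Read this way, Efficiency rewards models that keep a high average score at low latency, which is why the small guard models score far above the LLM-based detectors on that column.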