
Published on November 9

cs.CL

cs.AI

Release Date: November 9, 2024

Authors: Chong Zhang¹, Mingyu Jin², Dong Shu³, Taowen Wang⁴, Dongfang Liu¹, Xiaobo Jin¹

Affiliations: ¹Xi'an Jiaotong-Liverpool University; ²Rutgers University; ³Northwestern University; ⁴Rochester Institute of Technology

Model	SQuAD2.0 $\mathcal{A}_{\textrm{clean}}$	SQuAD2.0 $\mathcal{A}_{\textrm{attack}}$	SQuAD2.0 ASR $\uparrow$	Math $\mathcal{A}_{\textrm{clean}}$	Math $\mathcal{A}_{\textrm{attack}}$	Math ASR $\uparrow$	GSM8K $\mathcal{A}_{\textrm{clean}}$	GSM8K $\mathcal{A}_{\textrm{attack}}$	GSM8K ASR $\uparrow$	Avg. Time
BertAttack	71.16	24.67	65.33	72.30	44.82	38.01	77.82	34.26	55.98	1.04s
DeepWordBug	71.16	65.68	7.70	72.30	48.36	33.11	77.82	25.67	67.01	1.18s
TextFooler	71.16	15.60	78.08	72.30	46.80	35.27	77.82	24.33	68.74	2.80s
TextBugger	71.16	60.14	16.08	72.30	47.75	33.96	77.82	52.61	32.40	1.57s
Stress Test	71.16	70.66	0.70	72.30	39.59	45.24	77.82	35.19	54.78	2.84s
Checklist	71.16	68.81	3.30	72.30	36.90	48.96	77.82	44.33	43.04	1.32s
Ours (Token Manipulation)	71.16	14.91	79.05	72.30	13.39	81.48	77.82	22.17	71.51	1.75s
Ours (Misleading Adversarial Attack)	71.16	12.08	83.02	72.30	33.39	53.82	77.82	32.04	58.83	1.73s
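
The table does not define ASR. The reported values are, however, consistent with reading it as the relative accuracy drop, ASR = 100 × (A_clean − A_attack) / A_clean. The sketch below checks that reading against a few rows. The formula is an inference from the numbers, not a definition taken from the source, and the function name `attack_success_rate` is hypothetical.

```python
def attack_success_rate(acc_clean: float, acc_attack: float) -> float:
    """Relative accuracy drop in percent (assumed definition of ASR)."""
    if acc_clean <= 0:
        raise ValueError("clean accuracy must be positive")
    return 100.0 * (acc_clean - acc_attack) / acc_clean


# (A_clean, A_attack, reported ASR) triples copied from the table above.
rows = {
    "BertAttack / SQuAD2.0": (71.16, 24.67, 65.33),
    "TextFooler / SQuAD2.0": (71.16, 15.60, 78.08),
    "Ours (Token Manipulation) / Math": (72.30, 13.39, 81.48),
    "Ours (Token Manipulation) / GSM8K": (77.82, 22.17, 71.51),
    "Ours (Misleading Adversarial Attack) / SQuAD2.0": (71.16, 12.08, 83.02),
}

for name, (clean, attack, reported) in rows.items():
    computed = attack_success_rate(clean, attack)
    # Print the recomputed value next to the reported one for comparison.
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

Under this reading, a higher ASR means the attack removed a larger share of the model's clean accuracy, which matches the upward arrow in the column header.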