notesum.ai
Published at November 14Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents
cs.AI
Released Date: November 14, 2024
Authors: Yuyou Gan1, Yong Yang1, Zhe Ma1, Ping He1, Rui Zeng1, Yiming Wang1, Qingming Li1, Chunyi Zhou1, Songze Li2, Ting Wang3, Yunjun Gao1, Yingcai Wu1, Shouling Ji1
Aff.: 1Zhejiang University, China; 2Southeast University, China; 3Stony Brook University, USA

| Year | Paper | Threat Source | Threats | Key Features | Specific Effects |
| 2023 | Liu et al. (Liu et al., 2023d) | Inputs | Adversarial Example | MM | Adversarial T2I generation. |
| 2023 | Li et al. (Li et al., 2023d) | Inputs | Adversarial Example | MRI | Adversarial dialogue. |
| 2024 | Wang et al. (Wang et al., 2024a) | Inputs | Adversarial Example | MM & MSI | Transferable adversarial example. |
| 2024 | Yin et al. (Yin et al., 2024) | Inputs | Adversarial Example | MM & MMIO | Multimodal and multiple tasks attack. |
| 2024 | Shen et al. (Shen et al., 2023) | Inputs | Adversarial Example | LC | Dynamic attention to enhance robustness. |
| 2023 | Qiang et al. (Qiang et al., 2023) | Inputs | Goal Hijacking | LC | Induce unwanted outputs. |
| 2024 | Pasquini et al. (Pasquini et al., 2024) | Inputs | Goal Hijacking | LC | Optimization-based prompt injection. |
| 2024 | Kimura et al. (Kimura et al., 2024) | Inputs | Goal Hijacking | MMIO | Redirect task execution. |
| 2024 | Wei et al. (Wei et al., 2024) | Inputs | Goal Hijacking | MRI | Manipulates context to influence outputs. |
| 2024 | Zhan et al. (Zhan et al., 2024) | Inputs | Goal Hijacking | MM & TI | Benchmark of indirect prompt injections. |
| 2023 | Greshake et al. (Greshake et al., 2023) | Inputs | Goal Hijacking | MSI | Indirect injection to manipulates outputs. |
| 2024 | Hui et al. (Hui et al., 2024) | Inputs | Prompt Leakage | LC | Extract system prompt. |
| 2024 | Yang et al. (Yang et al., 2024c) | Inputs | Prompt Leakage | LC | Steal target prompt. |
| 2024 | Shen et al. (Shen et al., 2024b) | Inputs | Prompt Leakage | MIO | Steal target prompt. |
| 2024 | Carlini et al. (Carlini et al., 2024b) | Inputs | Model Extraction | LC | Extract the parameter of the last layer. |
| 2023 | Li et al. (Li et al., 2024d) | Inputs | Model Extraction | LC | Extract the specialized code abilities. |
| 2023 | Zou et al. (Zou et al., 2023) | Inputs | Jailbreaking | LC | Generate adversarial jailbreak prompts. |
| 2023 | Yu et al. (Yu et al., 2023b) | Inputs | Jailbreaking | LC | Auto-generated jailbreak prompts. |
| 2023 | Shayegani et al. (Shayegani et al., 2023) | Inputs | Jailbreaking | MMIO | Induce harmful content generation. |
| 2024 | Anil et al. (Anil et al., 2024) | Inputs | Jailbreaking | MRI | Induce harmful content generation. |
| 2024 | Gu et al. (Gu et al., 2024) | Inputs | Jailbreaking | MIO & MSI | Malicious input trigger agent harm. |
| 2024 | Zhao et al. (Zhao et al., 2023b) | Model | Hallucination | MMIO | reducing hallucination via data augmentation. |
| 2024 | Favero et al. (Favero et al., 2024) | Model | Hallucination | MMIO | reducing hallucination via novel decoding. |
| 2023 | Liu et al. (Liu et al., 2023b) | Model | Hallucination | MMIO | reducing hallucination via Instruction tuning. |
| 2023 | Peng et al. (Peng et al., 2023a) | Model | Hallucination | MM | reducing hallucination via external databases. |
| 2023 | Chen et al. (Chen et al., 2023) | Model | Hallucination | MSI | reducing hallucination via standardization. |
| 2024 | Luo et al. (Luo et al., 2024b) | Model | Hallucination | MSI & MRI & MM | Benchmark for hallucination evaluation. |
| 2023 | Carlini et al. (Carlini et al., 2023b) | Model | Memorization | LC | Study the influence factors of memorization. |
| 2024 | Tang et al. (Tang et al., 2024c) | Model | Bias | LC | Gender bias measurement and mitigation. |
| 2023 | Xie et al. (Xie and Lukasiewicz, 2023) | Model | Bias | LC | Bias mitigation in pre-trained LMs. |
| 2023 | Limisiewicz et al. (Limisiewicz et al., 2023) | Model | Bias | LC | Debiasing algorithm through model adaptation. |
| 2024 | Howard et al. (Howard et al., 2024) | Model | Bias | MMIO | Bias measurement and mitigation in VLMs. |
| 2024 | D’Inca et al. (D’Incà et al., 2024) | Model | Bias | MMIO | Bias measurement in text-to-image models. |
| 2022 | Bagdasaryan et al. (Bagdasaryan and Shmatikov, 2022) | Combination | Backdoor | LC | Backdoors for propaganda-as-a-service. |
| 2024 | Hubinger et al. (Hubinger et al., 2024) | Combination | Backdoor | LC | Backdoors that persist through safety training. |
| 2023 | Dong et al. (Dong et al., 2023) | Combination | Backdoor | TI | Triggering unintended tool invocation. |
| 2024 | Liu et al. (Liu et al., 2024e) | Combination | Backdoor | MMIO & TI | Triggering unintended tool invocation. |
| 2024 | Xiang et al. (Chen et al., 2024d) | Combination | Backdoor | MM & TI | Corrupted memories causing errors in retrieval. |
| 2021 | Carlini et al. (Carlini et al., 2021) | Combination | Privacy Leakage | LC | Extract training data. |
| 2024 | Bagdasaryan et al. (Bagdasaryan et al., 2024) | Combination | Privacy Leakage | MI | User private information leakage. |
| 2024 | Zeng et al. (Zeng et al., 2024c) | Combination | Privacy Leakage | MM | Database private information leakage. |