SequentialBreak: Large Language Models Can be Fooled by Embedding Jailbreak Prompts into Sequential Prompt Chains
Subjects: cs.CR, cs.AI, cs.CL, cs.LG
Release Date: November 10, 2024
Authors: Bijoy Ahmed Saiem¹, MD Sadik Hossain Shanto¹, Rakib Ahsan¹, Md Rafi ur Rashid²
Affiliations: ¹Bangladesh University of Engineering and Technology, Dhaka, Bangladesh; ²Pennsylvania State University, PA, USA

Attack success rate (%) by target model. Rows [15]–[17] are baseline jailbreak attacks; the remaining rows are SequentialBreak's three prompt-chain scenarios, each with two template variants (T1 and T2):

| Method | Llama-3 | Gemma-2 | Vicuna | GPT-4o |
|---|---|---|---|---|
| PAIR [15] | 10% | 21% | 52% | 35% |
| DeepInception [16] | 8% | 24% | 92% | 36% |
| ReNeLLM [17] | 48% | 88% | 92% | 81% |
| Question Bank T1 | 88% | 80% | 93% | 90% |
| Question Bank T2 | 98% | 85% | 100% | 98% |
| Dialog Completion T1 | 99% | 100% | 100% | 99% |
| Dialog Completion T2 | 35% | 92% | 97% | 84% |
| Game Scenario T1 | 91% | 99% | 34% | 90% |
| Game Scenario T2 | 80% | 91% | 100% | 96% |
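
As the title and the "Question Bank" rows indicate, the attack embeds a single target request inside an otherwise benign sequence of prompts, so that the model processes it as one step in an innocuous chain rather than as a standalone harmful query. The sketch below is a minimal illustration of that structure only, not the authors' released templates: the benign questions, the `build_question_bank_prompt` helper, and the `<TARGET_QUESTION>` placeholder are all hypothetical.

```python
# Hypothetical sketch of the "Question Bank" structure described in the paper:
# a target request is hidden inside a numbered chain of benign questions.
# The helper name, the benign questions, and the placeholder are illustrative
# assumptions, not the paper's actual templates.

BENIGN_QUESTIONS = [
    "What are the main causes of urban air pollution?",
    "How does photosynthesis convert sunlight into energy?",
    "Why do interest rates affect housing prices?",
]

def build_question_bank_prompt(target_question: str, insert_at: int = 2) -> str:
    """Embed target_question at position insert_at within a benign question chain."""
    questions = BENIGN_QUESTIONS.copy()
    questions.insert(insert_at, target_question)
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(questions))
    return (
        "Please answer each of the following questions in order, "
        "one short paragraph per question:\n" + numbered
    )

if __name__ == "__main__":
    # A placeholder stands in for the harmful request studied in the paper.
    print(build_question_bank_prompt("<TARGET_QUESTION>"))
```

The Dialog Completion and Game Scenario rows in the table correspond to the paper's other two embedding scenarios, which wrap the target request in an unfinished dialogue or a game narrative instead of a question list.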