Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
Venue: NeurIPS
Release date: May 13, 2024
Authors: Haibo Jin¹, Andy Zhou², Joe D. Menke¹, Haohan Wang¹
Affiliations: ¹School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL 61820; ²Computer Science, Lapis Labs, University of Illinois at Urbana-Champaign, Champaign, IL 61820
OpenReview PDF: https://openreview.net/pdf/aecaf57ee9a4cd36e01edfe38d57f5b8a2ba3164.pdf