notesum.ai

Published on December 6

LIAR: Leveraging Alignment (Best-of-N) to Jailbreak LLMs in Seconds

cs.CL

Release Date: December 6, 2024

Authors: James Beetham¹, Souradip Chakraborty², Mengdi Wang³, Furong Huang², Amrit Singh Bedi¹, Mubarak Shah¹

Affiliations: ¹University of Central Florida; ²University of Maryland, College Park; ³Princeton University

arXiv: http://arxiv.org/pdf/2412.05232v1