notesum.ai

Published on December 9

Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models

cs.LG
cs.CL

Release Date: December 9, 2024

Authors: Neel Jain¹, Aditya Shrivastava², Chenyang Zhu², Daben Liu², Alfy Samuel², Ashwinee Panda¹, Anoop Kumar², Micah Goldblum³, Tom Goldstein¹

Affiliations: ¹University of Maryland; ²Capital One; ³New York University

arXiv: http://arxiv.org/pdf/2412.06748v1