notesum.ai
Published on December 9
Refusal Tokens: A Simple Way to Calibrate Refusals in Large Language Models
cs.LG
cs.CL
Released Date: December 9, 2024
Authors: Neel Jain¹, Aditya Shrivastava², Chenyang Zhu², Daben Liu², Alfy Samuel², Ashwinee Panda¹, Anoop Kumar², Micah Goldblum³, Tom Goldstein¹
Affiliations: ¹University of Maryland; ²Capital One; ³New York University

| Potential Approach |