Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback
cs.LG
cs.AI
Release date: November 4, 2024
Authors: Marcus Williams1, Micah Carroll1, Adhyyan Narang2, Constantin Weisser3, Brendan Murphy4, Anca Dragan5
Affiliations: 1MATS; 2University of Washington; 3MATS & Haize Labs; 4Independent; 5UC Berkeley