
Published on October 22

Towards Reliable Evaluation of Behavior Steering Interventions in LLMs

cs.LG
cs.AI

Release Date: October 22, 2024

Authors: Itamar Pres¹, Laura Ruis², Ekdeep Singh Lubana¹, David Krueger³

Affiliations: ¹University of Michigan; ²University College London; ³University of Cambridge

arXiv: https://arxiv.org/abs/2410.17245v1