notesum.ai

Published on November 26

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats

cs.CL

Release Date: November 26, 2024

Authors: Jiaxin Wen¹, Vivek Hebbar², Caleb Larson³, Aryan Bhatt², Ansh Radhakrishnan⁴, Mrinank Sharma⁴, Henry Sleight³, Shi Feng⁵, He He⁵, Ethan Perez⁴, Buck Shlegeris², Akbir Khan⁶

Affiliations: ¹Tsinghua University; ²Redwood Research; ³MATS; ⁴Anthropic; ⁵George Washington University; ⁶Anthropic, UCL

Arxiv: http://arxiv.org/abs/2411.17693v1

Safety Coefficient	Query Untrusted	Query Trusted	Other Micro-Protocols	Safety	Usefulness
$\alpha=1$	75%	14%	12%	99.24%	46.7%
$\alpha=20$	36%	56%	9%	99.88%	37.8%
$\alpha=100$	7%	92%	2%	99.99%	27.1%
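
The trade-off in the table can be checked with a short script. The following is a minimal sketch, not the authors' code: the values are transcribed from the three rows above, and the names ROWS and summarize are hypothetical. It confirms that safety rises and usefulness falls as the safety coefficient grows, and it adds up the three usage shares in each row. Each row totals 101%, which is assumed here to be a rounding effect in the reported percentages.

```python
# Values transcribed from the table above, one tuple per row:
# (alpha, query untrusted %, query trusted %, other micro-protocols %,
#  safety %, usefulness %)
ROWS = [
    (1, 75, 14, 12, 99.24, 46.7),
    (20, 36, 56, 9, 99.88, 37.8),
    (100, 7, 92, 2, 99.99, 27.1),
]


def summarize(rows):
    """Return the usage-share total per row and two monotonicity flags.

    Rows are assumed to be sorted by increasing alpha.
    """
    share_totals = [untrusted + trusted + other
                    for _, untrusted, trusted, other, _, _ in rows]
    safety = [row[4] for row in rows]
    usefulness = [row[5] for row in rows]
    # Strictly increasing safety and strictly decreasing usefulness.
    safety_rises = all(a < b for a, b in zip(safety, safety[1:]))
    usefulness_falls = all(a > b for a, b in zip(usefulness, usefulness[1:]))
    return share_totals, safety_rises, usefulness_falls


if __name__ == "__main__":
    totals, rises, falls = summarize(ROWS)
    print(totals, rises, falls)  # [101, 101, 101] True True
```

Between $\alpha=1$ and $\alpha=100$, the share of queries sent to the untrusted model drops from 75% to 7%, while the share sent to the trusted model rises from 14% to 92%.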