Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
cs.CL
Released Date: November 26, 2024
Authors: Jiaxin Wen¹, Vivek Hebbar², Caleb Larson³, Aryan Bhatt², Ansh Radhakrishnan⁴, Mrinank Sharma⁴, Henry Sleight³, Shi Feng⁵, He He⁵, Ethan Perez⁴, Buck Shlegeris², Akbir Khan⁶
Affiliations: ¹Tsinghua University; ²Redwood Research; ³MATS; ⁴Anthropic; ⁵George Washington University; ⁶Anthropic, UCL

| Safety Coefficient | Query Untrusted | Query Trusted | Other Micro-Protocols | Safety | Usefulness |
|---|---|---|---|---|---|
| | 75% | 14% | 12% | 99.24% | 46.7% |
| | 36% | 56% | 9% | 99.88% | 37.8% |
| | 7% | 92% | 2% | 99.99% | 27.1% |