Published on November 4
Improving Steering Vectors by Targeting Sparse Autoencoder Features
cs.LG
cs.AI
cs.CL
Released Date: November 4, 2024
Authors: Sviatoslav Chalnev, Matthew Siu, Arthur Conmy
Affiliation: Not specified

| Steering Goal | ActSteer | SAE | SAE-TS (ours) |
| --- | --- | --- | --- |
| Anger | 0.0976 | 0.0732 | 0.2302 |
| Christian | 0.0346 | 0.0901 | 0.3445 |
| Conspiracy | 0.1150 | 0.2097 | 0.3858 |
| French | 0.3324 | 0.0586 | 0.3040 |
| London | 0.0093 | 0.0073 | 0.5753 |
| Love | 0.1427 | 0.1082 | 0.4431 |
| Praise | 0.1291 | 0.2788 | 0.2754 |
| Want to die | 0.0323 | 0.0872 | 0.1992 |
| Wedding | 0.2119 | 0.2389 | 0.5500 |