Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability
Categories: cs.AI, cs.CL
Release Date: November 25, 2024
Authors: Jatin Nainani, Sankaran Vaidyanathan, AJ Yeung, Kartik Gupta, David Jensen
Affiliation: University of Massachusetts Amherst

| Task | Model Logit Difference | Circuit Logit Difference | Faithfulness |
|---|---|---|---|
| Base IOI | 3.484 | 3.119 | 0.895 |
| DoubleIO | 2.118 | 2.722 | 1.285 |
| TripleIO | 1.227 | 3.174 | 2.586 |
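
The faithfulness column can be checked against the other two columns. The sketch below assumes faithfulness is the circuit logit difference divided by the model logit difference. The table does not state this definition, but all three rows are consistent with it up to rounding. The function name `faithfulness` and the `rows` dictionary are illustrative, not taken from the paper.

```python
# Values copied from the table: (model logit diff, circuit logit diff, reported faithfulness).
rows = {
    "Base IOI": (3.484, 3.119, 0.895),
    "DoubleIO": (2.118, 2.722, 1.285),
    "TripleIO": (1.227, 3.174, 2.586),
}


def faithfulness(model_ld: float, circuit_ld: float) -> float:
    """Assumed definition: circuit logit difference / model logit difference."""
    return circuit_ld / model_ld


for task, (model_ld, circuit_ld, reported) in rows.items():
    ratio = faithfulness(model_ld, circuit_ld)
    # Small gaps are expected because the table inputs are already rounded.
    print(f"{task}: computed {ratio:.3f}, reported {reported:.3f}")
```

Under this assumed definition, a value below 1 (Base IOI) means the circuit recovers most but not all of the full model's logit difference, while values above 1 (DoubleIO, TripleIO) mean the circuit's logit difference exceeds the full model's.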