What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
cs.CL
cs.AI
cs.LG
Released Date: October 31, 2024
Authors: Ming Li¹, Yanhong Li², Tianyi Zhou¹
Affiliations: ¹University of Maryland; ²University of Chicago

| Dataset | Curve | CoT | MAD (Early) | MAD (Middle) | MAD (Last) | MAD (All) |
|---|---|---|---|---|---|---|
| AQuA | | None | 5.76 | 4.13 | 3.49 | 4.42 |
| | | Simplified | 0.89 | 0.52 | 0.77 | 0.69 |
| | | Detailed | 0.23 | 0.28 | 0.29 | 0.28 |
| | | None | 7.20 | 6.29 | 8.40 | 7.06 |
| | | Simplified | 1.01 | 0.56 | 1.11 | 0.81 |
| | | Detailed | 0.22 | 0.21 | 0.42 | 0.27 |
| | | None | 37.29 | 16.12 | 3.94 | 17.32 |
| | | Simplified | 5.08 | 2.14 | 0.86 | 2.36 |
| | | Detailed | 1.15 | 0.62 | 0.33 | 0.64 |
| | | None | 23.79 | 14.35 | 3.04 | 12.91 |
| | | Simplified | 3.31 | 2.18 | 0.63 | 1.97 |
| | | Detailed | 0.82 | 0.75 | 0.29 | 0.64 |
| ECQA | | None | 8.00 | 7.01 | 5.01 | 6.53 |
| | | Simplified | 1.11 | 0.70 | 0.86 | 0.85 |
| | | Detailed | 0.30 | 0.37 | 0.26 | 0.35 |
| | | None | 11.51 | 11.07 | 13.32 | 11.11 |
| | | Simplified | 1.34 | 1.24 | 1.01 | 1.13 |
| | | Detailed | 0.26 | 0.29 | 0.54 | 0.34 |
| | | None | 59.33 | 24.83 | 7.46 | 27.40 |
| | | Simplified | 8.53 | 3.55 | 1.66 | 4.01 |
| | | Detailed | 1.56 | 0.74 | 0.48 | 0.82 |
| | | None | 39.20 | 19.50 | 5.12 | 19.38 |
| | | Simplified | 5.56 | 3.33 | 1.41 | 3.22 |
| | | Detailed | 1.00 | 0.97 | 0.52 | 0.85 |
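
The table reports Mean Absolute Difference (MAD) values without stating how they are computed. The sketch below shows one plausible reading, under two assumptions that this page does not confirm: that MAD is the average absolute change between consecutive points of a per-layer curve, and that the Early, Middle, and Last columns cover equal thirds of the layer stack. The function names and the toy curve are hypothetical and exist only for illustration.

```python
from typing import Dict, Sequence


def mean_absolute_difference(values: Sequence[float]) -> float:
    """Average of |v[i+1] - v[i]| over consecutive entries of a curve."""
    if len(values) < 2:
        raise ValueError("need at least two values to take a difference")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


def mad_by_depth(values: Sequence[float]) -> Dict[str, float]:
    """MAD over early, middle, and last thirds, plus the whole curve.

    ASSUMPTION: the three depth groups are equal thirds of the layers.
    The source table does not say how the groups are delimited.
    """
    third = len(values) // 3
    if third < 2:
        raise ValueError("need at least six values for a three-way split")
    return {
        "early": mean_absolute_difference(values[:third]),
        "middle": mean_absolute_difference(values[third:2 * third]),
        "last": mean_absolute_difference(values[2 * third:]),
        "all": mean_absolute_difference(values),
    }


# Toy per-layer curve (hypothetical numbers, not taken from the table).
toy_curve = [10.0, 4.0, 2.0, 1.5, 1.0, 1.0, 0.5, 0.5, 0.25]
print(mad_by_depth(toy_curve))
```

Under this reading, a curve that swings sharply from layer to layer yields a large MAD, while a flat curve yields a value near zero, which matches the direction of the table: the None rows carry the largest values and the Detailed rows the smallest.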