The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
Categories: cs.AI, cs.CL, cs.LG
Released Date: November 11, 2024
Authors: Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, Jacob Andreas

| Program Synthesizer | Fine-tuned LM | TTT Method | Score (pass@2) |
| --- | --- | --- | --- |
| X | Ours | X | % |
| X | Ours | Ours | % |
| X | BARC | Ours | % |
| BARC | Ours | Ours | % |
| BARC | BARC | Ours | % |

| System | Score (pass@2) |
| --- | --- |
| Avg. Human | % |
| Best Human | % |
| BARC (ensemble) | % |
| BARC (no synthesizer) | % |
| Claude - Few-shot prompting | % |
| GPT-4.0 - Few-shot prompting | % |