notesum.ai

Published on November 6, 2024

Towards Interpreting Language Models: A Case Study in Multi-Hop Reasoning

cs.CL

cs.AI

cs.LG

Release Date: November 6, 2024

Authors: Mansi Sakarvadia¹

Aff.: ¹The University of Chicago - Department of Computer Science

arXiv: http://arxiv.org/abs/2411.05037v1


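| Model | Data | $\ell$ | $\tau$ | Curated: Subject | Random: Adj. | Random: Adv. | Random: Conj. | Random: Noun | Random: Verb | Random: Top-$5050$ |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT2 Small | Hand | 7 | 3 | 45% | -7.6% | -6.0% | -6.3% | -6.5% | -7.5% | -6.0% |
| GPT2 Small | 2wmh | 6 | 5 | 424% | -17.1% | -15.1% | -10.3% | -1.1% | -1.2% | 1.6% |
| GPT2 Large | Hand | 14 | 10 | 68% | -8.1% | -4.4% | -4.9% | -9.8% | -6.0% | -4.7% |
| GPT2 Large | 2wmh | 8 | 9 | 204% | 13.0% | 11.6% | 3.5% | 11.8% | 4.3% | 17.6% |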