Published on December 10

Causal World Representation in the GPT Model

Categories: cs.AI, cs.CL, cs.LG, stat.ML

Release Date: December 10, 2024

Authors: Raanan Y. Rohekar¹, Yaniv Gurwicz¹, Sungduk Yu¹, Vasudev Lal¹

Affiliation: ¹Intel Labs

arXiv: http://arxiv.org/pdf/2412.07446v1

Symbol	Description
$\boldsymbol{Z}_{i}$	output embedding of input symbol $i$, $\boldsymbol{Z}_{i}\equiv\mathbf{Z}(i,\cdot)$, in the attention layer
$\boldsymbol{V}_{i}$	value vector corresponding to input $i$, $\boldsymbol{V}_{i}\equiv\mathbf{V}(i,\cdot)$, in the attention layer
$\mathbf{A}$	attention matrix
$\mathcal{T}$	Transformer neural network
$\mathbf{W}_{V},\mathbf{W}_{QK}$	learnable weight matrices in GPT
$X_{i}$	a random variable representing node $i$ in an SCM
$U_{i}$	latent exogenous random variable $i$ in an SCM
$\mathbf{G}$	weighted adjacency matrix of an SCM
$\mathcal{G}$	causal graph (unweighted, directed-graph structure)
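
To make the notation concrete, the sketch below shows how the symbols in the table could fit together. It is an illustration under stated assumptions, not the paper's implementation. It assumes a standard single-head causal attention layer in which $\mathbf{A}$ is a row-wise softmax of $\mathbf{Y}\mathbf{W}_{QK}\mathbf{Y}^{\top}$, $\mathbf{V}=\mathbf{Y}\mathbf{W}_{V}$, and $\mathbf{Z}=\mathbf{A}\mathbf{V}$, with any score scaling absorbed into $\mathbf{W}_{QK}$. The input-embedding matrix `Y` is a hypothetical name that does not appear in the table. For the SCM symbols it assumes a linear model $X = \mathbf{G}X + U$, with the convention that $\mathbf{G}(i,j)$ is the weight of the edge from node $j$ to node $i$.

```python
import numpy as np


def causal_attention(Y, W_QK, W_V):
    """Single-head causal attention (assumed form): returns (A, V, Z), Z = A @ V."""
    n = Y.shape[0]
    scores = Y @ W_QK @ Y.T                        # (n, n) pairwise scores
    mask = np.tril(np.ones((n, n), dtype=bool))    # position i attends to j <= i
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)                       # masked entries become 0
    A = weights / weights.sum(axis=1, keepdims=True)
    V = Y @ W_V                                    # row i is the value vector V_i
    Z = A @ V                                      # row i is the output embedding Z_i
    return A, V, Z


def solve_linear_scm(G, U):
    """Solve X = G @ X + U for X (assumes a linear SCM over an acyclic graph)."""
    n = G.shape[0]
    return np.linalg.solve(np.eye(n) - G, U)


rng = np.random.default_rng(0)
n, d = 4, 3                                        # illustrative sizes only
Y = rng.normal(size=(n, d))                        # hypothetical input embeddings
W_QK = rng.normal(size=(d, d))
W_V = rng.normal(size=(d, d))
A, V, Z = causal_attention(Y, W_QK, W_V)

# Weighted adjacency matrix of a small SCM; G[i, j] is the weight of edge j -> i.
G = np.zeros((n, n))
G[1, 0] = 0.5
G[2, 1] = -1.0
G[3, 0] = 2.0
U = rng.normal(size=n)                             # exogenous noise terms U_i
X = solve_linear_scm(G, U)                         # values of the nodes X_i
```

With the causal mask, $\mathbf{A}$ is lower triangular with rows summing to one, so $\boldsymbol{Z}_{0}=\boldsymbol{V}_{0}$. In the SCM part, the unweighted graph $\mathcal{G}$ corresponds to the nonzero pattern of `G`.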