
Attention Mechanisms as Computational Circuits

Alice V. Researcher, Interpretability Unit
· ESSAY

How do transformers perform computation? While the architecture is known, the mechanistic internal logic remains opaque. This essay explores the hypothesis that individual attention heads act as discrete gates in a larger computational circuit.

The Circuit Hypothesis

We treat the transformer’s residual stream as a communication bus. Each attention head reads from this bus, performs a transformation, and writes the result back [1].

“Individual attention heads are not monolithic feature detectors, but rather modular gates that selectively route information across the residual stream.”
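As a concrete sketch of this read/transform/write cycle, the toy NumPy implementation below shows a single head’s additive update to the residual stream. All dimensions and weight matrices are arbitrary stand-ins, not taken from any real model, and the causal mask is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, n_tokens = 16, 4, 5  # toy sizes (assumed)

# Projection matrices for one attention head (random stand-ins).
W_Q = rng.normal(size=(d_model, d_head))
W_K = rng.normal(size=(d_model, d_head))
W_V = rng.normal(size=(d_model, d_head))
W_O = rng.normal(size=(d_head, d_model))

def head_update(resid):
    """Read from the residual stream, attend, and write the result back."""
    Q, K, V = resid @ W_Q, resid @ W_K, resid @ W_V
    scores = Q @ K.T / np.sqrt(d_head)
    # Numerically stable softmax over key positions (no causal mask here).
    pattern = np.exp(scores - scores.max(axis=-1, keepdims=True))
    pattern /= pattern.sum(axis=-1, keepdims=True)
    # The head's output is *added* to the bus, not overwritten onto it.
    return resid + (pattern @ V) @ W_O

resid = rng.normal(size=(n_tokens, d_model))
out = head_update(resid)
print(out.shape)  # (5, 16)
```

The additive write is what makes the "bus" framing work: downstream heads can read either the original signal or this head's contribution.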

An induction head implements a specific pattern: when the current token has occurred earlier in the context, the head attends to the token that followed that earlier occurrence, supporting the prediction that the same continuation will repeat [2].
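A minimal sketch of this pattern follows; the helper function and example tokens are illustrative, not drawn from any real model. For each position, an idealized induction head attends to the position just after the most recent earlier occurrence of the current token.

```python
def induction_targets(tokens):
    """For each position, the index an idealized induction head attends to:
    the position right after the most recent earlier occurrence of the
    current token, or None if the token has not appeared before."""
    last_seen = {}
    targets = []
    for i, tok in enumerate(tokens):
        prev = last_seen.get(tok)
        targets.append(prev + 1 if prev is not None else None)
        last_seen[tok] = i
    return targets

# In "A B C A B", at the second "A" (index 3) the head attends to
# index 1 ("B"), predicting "B" as the next token.
print(induction_targets(["A", "B", "C", "A", "B"]))
# [None, None, None, 1, 2]
```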

Mathematical Formalism

The attention operation can be written as a bilinear form over the embedding space:

A = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V

By decomposing the W_Q W_K^T matrix via its singular value decomposition (SVD), we can identify “feature directions” that the model is specifically looking for.
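A toy sketch of this decomposition, with random stand-in weights: because W_Q and W_K each have only d_head columns, the product W_Q W_K^T has rank at most d_head, so the SVD yields at most d_head meaningful direction pairs.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_head = 16, 4  # toy sizes (assumed)
W_Q = rng.normal(size=(d_model, d_head))
W_K = rng.normal(size=(d_model, d_head))

# The effective QK circuit is a single d_model x d_model bilinear form.
QK = W_Q @ W_K.T

# Left singular vectors: query-side directions; right singular vectors:
# key-side feature directions the head is "looking for".
U, S, Vt = np.linalg.svd(QK)
print(np.sum(S > 1e-10))  # rank is bounded by d_head → 4
```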

The Emergence of Logic

By analyzing the eigenvalues of these products across different layers, we can observe the “emergence” of specific grammatical logic during the training process. Induction heads typically emerge abruptly, roughly halfway through training.
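As a rough illustration of this kind of spectral analysis, the sketch below computes a per-layer eigenvalue statistic. It is applied here to the OV product W_V W_O, the circuit whose eigenvalue signs are analyzed for “copying” behavior in [1]; all weights are random stand-ins, so the scores hover near zero rather than near the +1 a trained copying head would show.

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, d_head, n_layers = 16, 4, 3  # toy sizes (assumed)

# Summary statistic: real eigenvalue mass on the positive axis, scaled to
# [-1, 1]. Values near +1 suggest a copying-like circuit; random weights
# give values near 0.
for layer in range(n_layers):
    W_V = rng.normal(size=(d_model, d_head))
    W_O = rng.normal(size=(d_head, d_model))
    eig = np.linalg.eigvals(W_V @ W_O)  # complex in general; use real parts
    pos_frac = eig.real.sum() / np.abs(eig.real).sum()
    print(f"layer {layer}: positive-eigenvalue score {pos_frac:+.2f}")
```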

References

[1]
N. Elhage, N. Nanda, C. Olsson, et al. "A Mathematical Framework for Transformer Circuits". 2021.
[2]
N. Nanda, et al. "Progress measures for grokking via mechanistic interpretability". 2022.

How to Cite

@article{researcher2024attention,
  title  = {Attention Mechanisms as Computational Circuits},
  author = {Alice V. Researcher},
  year   = {2024},
  url    = {https://sidechannels.pub/posts/attention-as-circuit/}
}