Vector Rotation Visualization with Claude: https://claude.ai/public/artifacts/31c01f6d-0b4f-43c6-8113-4c3187e6e94c
Notations:
The self-attention mechanism first incorporates position information into the word embeddings and transforms them into query, key, and value representations:
$q_m = f_q(x_m, m) = W_q x_m$
$k_n = f_k(x_n, n) = W_k x_n$
$v_n = f_v(x_n, n) = W_v x_n$
where $q_m$, $k_n$ and $v_n$ incorporate the $m^{th}$ and $n^{th}$ positions through $f_q$, $f_k$ and $f_v$, respectively.
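A minimal sketch of these projections, assuming a single head of dimension $d = 8$ and random weights (all names here are illustrative); position is injected later by the rotary form of $f_q$ and $f_k$:

```python
import numpy as np

d = 8                      # assumed embedding / head dimension
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

x_m = rng.standard_normal(d)   # embedding of the m-th token
x_n = rng.standard_normal(d)   # embedding of the n-th token

# plain linear projections: q_m = W_q x_m, k_n = W_k x_n, v_n = W_v x_n
q_m = W_q @ x_m
k_n = W_k @ x_n
v_n = W_v @ x_n
```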
Dimensions
The inner product $q^\intercal_m k_n$ is then used to compute attention weights, enabling knowledge conveyance between tokens at different positions.
In order to incorporate relative position information, we require the inner product of query $q_m$ and key $k_n$ to be formulated by a function $g$, which takes only the word embeddings $x_m, x_n$, and their relative position $m − n$ as input variables.
In other words, we hope that the inner product encodes position information only in the relative form:
$⟨f_q(x_m, m), f_k(x_n, n)⟩ = g(x_m, x_n, m − n).$
A 2D case: we write $f_{\{q,k\}}$ in matrix multiplication form:
$f_{\{q,k\}}(x_m, m) = \begin{pmatrix} \cos m\theta & -\sin m\theta \\ \sin m\theta & \cos m\theta \end{pmatrix} \begin{pmatrix} W_{\{q,k\}}^{(11)} & W_{\{q,k\}}^{(12)} \\ W_{\{q,k\}}^{(21)} & W_{\{q,k\}}^{(22)} \end{pmatrix} \begin{pmatrix} x_m^{(1)} \\ x_m^{(2)} \end{pmatrix}$
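The 2D case can be checked numerically. The sketch below (random $W$, arbitrary $\theta = 0.7$, both assumed) rotates the projected vector by $m\theta$ and confirms that the inner product depends only on $m - n$, i.e. $\langle f_q(x_m, m), f_k(x_n, n)\rangle = g(x_m, x_n, m - n)$:

```python
import numpy as np

def f(x, pos, W, theta=0.7):
    """2D rotary form: rotate the projected vector W @ x by pos * theta."""
    c, s = np.cos(pos * theta), np.sin(pos * theta)
    R = np.array([[c, -s],
                  [s,  c]])
    return R @ (W @ x)

rng = np.random.default_rng(1)
W_q, W_k = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
x_m, x_n = rng.standard_normal(2), rng.standard_normal(2)

# same relative offset m - n = 2 at two different absolute positions
a = f(x_m, 5, W_q) @ f(x_n, 3, W_k)
b = f(x_m, 9, W_q) @ f(x_n, 7, W_k)
assert np.isclose(a, b)   # inner product depends only on m - n
```

This works because $R_{m\theta}^\intercal R_{n\theta} = R_{(n-m)\theta}$: composing the two rotations inside the inner product leaves only the relative angle.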
General form
$f_{\{q,k\}}(x_m, m) = R_{\Theta,m}^d W_{\{q,k\}} x_m$
where $R_{\Theta,m}^d$ denotes the rotation matrix for the $m$-th token with dimension $d$:
$R_{\Theta,m}^d = \begin{pmatrix} \cos m\theta_1 & -\sin m\theta_1 & 0 & 0 & \cdots & 0 & 0 \\ \sin m\theta_1 & \cos m\theta_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \cos m\theta_2 & -\sin m\theta_2 & \cdots & 0 & 0 \\ 0 & 0 & \sin m\theta_2 & \cos m\theta_2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & \cos m\theta_{d/2} & -\sin m\theta_{d/2} \\ 0 & 0 & 0 & 0 & \cdots & \sin m\theta_{d/2} & \cos m\theta_{d/2} \end{pmatrix}$
$\Theta = \{\theta_i = 10000^{-2(i-1)/d}, i \in [1, 2, ..., d/2]\}.$
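Because $R_{\Theta,m}^d$ is block-diagonal, it is never materialized in practice: each consecutive pair of coordinates $(v_{2i-1}, v_{2i})$ is rotated by angle $m\theta_i$ element-wise. A sketch of this, with a check that the query–key inner product again depends only on the relative position (shapes and positions are arbitrary choices for the demo):

```python
import numpy as np

def rope(v, m, d):
    """Apply R^d_{Theta,m} to v using the block-diagonal structure:
    pair (v[2i], v[2i+1]) is rotated by angle m * theta_i."""
    # theta_i = 10000^{-2(i-1)/d}, i = 1..d/2  (zero-indexed below)
    theta = 10000.0 ** (-2.0 * np.arange(d // 2) / d)
    ang = m * theta
    c, s = np.cos(ang), np.sin(ang)
    out = np.empty_like(v)
    out[0::2] = c * v[0::2] - s * v[1::2]
    out[1::2] = s * v[0::2] + c * v[1::2]
    return out

rng = np.random.default_rng(2)
d = 8
q, k = rng.standard_normal(d), rng.standard_normal(d)

# same relative offset m - n = 4 at two different absolute positions
a = rope(q, 10, d) @ rope(k, 6, d)
b = rope(q, 14, d) @ rope(k, 10, d)
assert np.isclose(a, b)
```

At $m = 0$ every rotation angle is zero, so `rope(v, 0, d)` returns `v` unchanged, matching $R_{\Theta,0}^d = I$.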
Notes
Visualization