# Helical Cell Architecture
A compact recurrent update that combines a triangle-of-means kernel with a phase rotation applied to channel pairs. This spec matches the training code in `helical_poc.py`.
## 1. Interfaces
Inputs

- \(H_{t-1}\) — previous hidden state, shape \([B, D]\)
- \(E_t\) — embedded token at step \(t\), shape \([B, D_{\text{in}}]\)

Outputs

- \(H_t\) — next hidden state, shape \([B, D]\)
- Classifier head: \(\text{logits}_t = W_o H_t\)
Constraint: `hidden_dim` (\(D\)) must be even (channels are rotated in pairs by 2×2 rotations).
## 2. Notation
- Projection: \(X_t = W_x H_{t-1}\)
- Reception: \(Y_t = W_y E_t\)
Triangle channels (half-difference, log-domain geometric mean of magnitudes, half-sum):

\[
b_t = \tfrac{1}{2}(Y_t - X_t), \qquad
a_t = \exp\!\Big(\tfrac{1}{2}\big(\log(|X_t| + \epsilon) + \log(|Y_t| + \epsilon)\big)\Big), \qquad
c_t = \tfrac{1}{2}(Y_t + X_t)
\]

Concatenate and fuse:

\[
Z_t = \text{GELU}\big(W_{\text{mix}}\,[\,b_t \,\|\, a_t \,\|\, c_t\,]\big)
\]
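The triangle channels can be sketched in NumPy (a minimal sketch; `helical_poc.py` uses torch, and the function name below is illustrative):

```python
import numpy as np

def triangle_channels(X, Y, eps=1e-6):
    """Half-difference, log-domain geometric mean of magnitudes, half-sum."""
    b = 0.5 * (Y - X)
    a = np.exp(0.5 * (np.log(np.abs(X) + eps) + np.log(np.abs(Y) + eps)))
    c = 0.5 * (Y + X)
    return b, a, c

X = np.array([[1.0, -2.0]])
Y = np.array([[3.0, 4.0]])
b, a, c = triangle_channels(X, Y)
# Note b + c = Y and c - b = X, so (b, c) is an invertible re-coding of (X, Y);
# `a` adds a multiplicative (geometric-mean) view of the two streams.
```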
## 3. Phase rotation on pairs
For each adjacent channel pair \((h_{2k}, h_{2k+1})\), apply a 2×2 rotation:

\[
\begin{pmatrix} h'_{2k} \\ h'_{2k+1} \end{pmatrix}
=
\begin{pmatrix}
\cos\Delta\phi_t & -\sin\Delta\phi_t \\
\sin\Delta\phi_t & \cos\Delta\phi_t
\end{pmatrix}
\begin{pmatrix} h_{2k} \\ h_{2k+1} \end{pmatrix},
\]

where \(\Delta\phi_t \in \{5, 7, 11, 13\}\cdot \tfrac{2\pi}{24}\), cycling through the wheel with \(t\) (quasi-prime wheel).
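The pair rotation can be written as a reshape to `[B, D/2, 2]` followed by the 2×2 rotation above. A NumPy sketch (the torch version in `helical_poc.py` may differ in detail):

```python
import numpy as np

def rotate_pairs(h, dphi):
    """Rotate each adjacent channel pair (h[2k], h[2k+1]) by angle dphi.
    Assumes h has shape [B, D] with D even."""
    B, D = h.shape
    pairs = h.reshape(B, D // 2, 2)
    cos, sin = np.cos(dphi), np.sin(dphi)
    x, y = pairs[..., 0], pairs[..., 1]
    out = np.stack([cos * x - sin * y, sin * x + cos * y], axis=-1)
    return out.reshape(B, D)

wheel = [5, 7, 11, 13]
t = 0
dphi = wheel[t % len(wheel)] * (2 * np.pi / 24)
H = np.array([[1.0, 0.0, 0.5, -0.5]])
H_rot = rotate_pairs(H, dphi)
# Rotation is norm-preserving, both per pair and overall.
```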
## 4. State update
Final update:

\[
H_t = \text{GELU}\big(\text{LayerNorm}(Z_t + \alpha\,R(\Delta\phi_t)\,H_{t-1})\big), \qquad \alpha = 0.1
\]

Classifier:

\[
\text{logits}_t = W_o H_t
\]
## 5. Optional Coherence Regularizer
Encourages directional (cosine) similarity between successive hidden states:

\[
\mathcal{L}_{\text{coh}} = \lambda\,\big(1 - \cos(H_t, H_{t-1})\big), \qquad \lambda = 0.05
\]

Enabled by default; disable with `--no_coh`.
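A NumPy sketch of one plausible form of this regularizer, batch-averaged (the exact reduction used in `helical_poc.py` is an assumption here):

```python
import numpy as np

def coherence_loss(h_t, h_prev, lam=0.05, eps=1e-6):
    """lam * (1 - cosine_similarity(h_t, h_prev)), averaged over the batch.
    Penalizes directional change between successive hidden states."""
    num = np.sum(h_t * h_prev, axis=-1)
    den = np.linalg.norm(h_t, axis=-1) * np.linalg.norm(h_prev, axis=-1) + eps
    return lam * np.mean(1.0 - num / den)

h = np.array([[1.0, 2.0, 3.0, 4.0]])
loss_same = coherence_loss(h, h)   # aligned states -> ~0
loss_opp = coherence_loss(h, -h)   # opposite states -> ~2 * lam
```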
## 6. Pseudocode
```python
# Shapes: H_prev [B, D], E_t [B, D_in]; D even
X = W_x(H_prev)
Y = W_y(E_t)

# Triangle channels: half-difference, log-domain geometric mean, half-sum
b = 0.5 * (Y - X)
a = torch.exp(0.5 * (torch.log(torch.abs(X) + eps) + torch.log(torch.abs(Y) + eps)))
c = 0.5 * (Y + X)

# Concatenate and fuse
z = GELU(W_mix(torch.cat([b, a, c], dim=-1)))

# Phase rotation of the previous state on channel pairs
dphi = wheel[t % len(wheel)] * (2 * math.pi / 24)
Hrot = rotate_pairs(H_prev, dphi)

# State update with scaled rotated-state residual
H_t = GELU(LayerNorm(z + 0.1 * Hrot))
```
```mermaid
graph TD
  A[Input token] -->|Embed| B[Embedding E_t]
  B --> C[Linear W_y]
  Hprev[H_prev] --> D[Linear W_x]
  D --> E[Triangle channels b,a,c]
  C --> E
  E --> F[W_mix + GELU]
  Hprev --> G[Rotate pairs R delta_phi_t]
  F --> H[LayerNorm + GELU + alpha·H_rot]
  G --> H
  H --> I[Linear head W_o]
  I --> J[logits]
  Hprev -- optional --> K((coherence loss))
  H -- compare --> K
```
```yaml
helical_cell:
  dim: 128
  phase_wheel: [5, 7, 11, 13]  # × (2π/24)
  residual_alpha: 0.1
  norm: layernorm
  activation: gelu
  coherence_lambda: 0.05       # set 0 or use --no_coh
  eps: 1e-6
```
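Putting the pieces together, one full step of the cell can be sketched end-to-end in NumPy under this spec's defaults. The weight shapes and random initialization below are illustrative only, not those of `helical_poc.py` (which learns these weights with torch):

```python
import numpy as np

rng = np.random.default_rng(0)
B, D, D_in = 2, 8, 4            # D must be even
eps, alpha = 1e-6, 0.1
wheel = [5, 7, 11, 13]

# Illustrative random weights standing in for learned parameters
W_x = rng.normal(0, 0.1, (D, D))
W_y = rng.normal(0, 0.1, (D, D_in))
W_mix = rng.normal(0, 0.1, (D, 3 * D))

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layernorm(x):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rotate_pairs(h, dphi):
    p = h.reshape(h.shape[0], -1, 2)
    c, s = np.cos(dphi), np.sin(dphi)
    return np.stack([c * p[..., 0] - s * p[..., 1],
                     s * p[..., 0] + c * p[..., 1]], axis=-1).reshape(h.shape)

def helical_step(H_prev, E_t, t):
    X, Y = H_prev @ W_x.T, E_t @ W_y.T
    b = 0.5 * (Y - X)
    a = np.exp(0.5 * (np.log(np.abs(X) + eps) + np.log(np.abs(Y) + eps)))
    c = 0.5 * (Y + X)
    z = gelu(np.concatenate([b, a, c], axis=-1) @ W_mix.T)
    dphi = wheel[t % len(wheel)] * (2 * np.pi / 24)
    return gelu(layernorm(z + alpha * rotate_pairs(H_prev, dphi)))

H = np.zeros((B, D))
for t in range(3):
    H = helical_step(H, rng.normal(size=(B, D_in)), t)
```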