# Helical Cell Architecture
A compact recurrent update that combines a triangle-of-means kernel with a phase rotation applied to channel pairs. This spec matches the training code in `helical_poc.py`.
## 1. Interfaces
Inputs

- \(H_{t-1}\) — previous hidden state, shape \([B, D]\)
- \(E_t\) — embedded token at step \(t\), shape \([B, D_{\text{in}}]\)

Outputs

- \(H_t\) — next hidden state, shape \([B, D]\)
- Classifier head: \(\text{logits}_t = W_o H_t\)
Constraint: `hidden_dim` (\(D\)) must be even (channels are rotated in pairs by 2×2 rotations).
## 2. Notation
- Projection: \(X_t = W_x H_{t-1}\)
- Reception: \(Y_t = W_y E_t\)
Triangle channels (half-difference, log-domain geometric mean of magnitudes, half-sum):

\[
b_t = \tfrac{1}{2}(Y_t - X_t), \qquad
a_t = \exp\!\Big(\tfrac{1}{2}\big(\log(|X_t| + \epsilon) + \log(|Y_t| + \epsilon)\big)\Big), \qquad
c_t = \tfrac{1}{2}(Y_t + X_t)
\]

Concatenate and fuse:

\[
Z_t = \text{GELU}\big(W_{\text{mix}}\,[\,b_t \,\|\, a_t \,\|\, c_t\,]\big)
\]
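The triangle channels can be sketched in NumPy (a minimal sketch; `helical_poc.py` uses torch, and the function name below is illustrative):

```python
import numpy as np

def triangle_channels(X, Y, eps=1e-6):
    """Half-difference, log-domain geometric mean of magnitudes, half-sum."""
    b = 0.5 * (Y - X)
    a = np.exp(0.5 * (np.log(np.abs(X) + eps) + np.log(np.abs(Y) + eps)))
    c = 0.5 * (Y + X)
    return b, a, c

X = np.array([[1.0, -2.0]])
Y = np.array([[3.0, 4.0]])
b, a, c = triangle_channels(X, Y)
# Note b + c = Y and c - b = X, so (b, c) is an invertible re-coding of (X, Y);
# `a` adds a multiplicative (geometric-mean) view of the two streams.
```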
## 3. Phase rotation on pairs
For each adjacent channel pair \((h_{2k}, h_{2k+1})\), apply a 2×2 rotation:

\[
\begin{pmatrix} h'_{2k} \\ h'_{2k+1} \end{pmatrix}
=
\begin{pmatrix}
\cos\Delta\phi_t & -\sin\Delta\phi_t \\
\sin\Delta\phi_t & \cos\Delta\phi_t
\end{pmatrix}
\begin{pmatrix} h_{2k} \\ h_{2k+1} \end{pmatrix},
\]

where \(\Delta\phi_t \in \{5, 7, 11, 13\}\cdot \tfrac{2\pi}{24}\), cycling through the wheel with \(t\) (quasi-prime wheel).
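The pair rotation can be written as a reshape to `[B, D/2, 2]` followed by the 2×2 rotation above. A NumPy sketch (the torch version in `helical_poc.py` may differ in detail):

```python
import numpy as np

def rotate_pairs(h, dphi):
    """Rotate each adjacent channel pair (h[2k], h[2k+1]) by angle dphi.
    Assumes h has shape [B, D] with D even."""
    B, D = h.shape
    pairs = h.reshape(B, D // 2, 2)
    cos, sin = np.cos(dphi), np.sin(dphi)
    x, y = pairs[..., 0], pairs[..., 1]
    out = np.stack([cos * x - sin * y, sin * x + cos * y], axis=-1)
    return out.reshape(B, D)

wheel = [5, 7, 11, 13]
t = 0
dphi = wheel[t % len(wheel)] * (2 * np.pi / 24)
H = np.array([[1.0, 0.0, 0.5, -0.5]])
H_rot = rotate_pairs(H, dphi)
# Rotation is norm-preserving, both per pair and overall.
```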
## 4. State update
Final update:

\[
H_t = \text{GELU}\big(\text{LayerNorm}(Z_t + \alpha\,R(\Delta\phi_t)\,H_{t-1})\big), \qquad \alpha = 0.1
\]

Classifier:

\[
\text{logits}_t = W_o H_t
\]
## 5. Optional Coherence Regularizer
Encourages directional (cosine) similarity between successive hidden states:

\[
\mathcal{L}_{\text{coh}} = \lambda\,\big(1 - \cos(H_t, H_{t-1})\big), \qquad \lambda = 0.05
\]

Enabled by default; disable with `--no_coh`.
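A NumPy sketch of one plausible form of this regularizer, batch-averaged (the exact reduction used in `helical_poc.py` is an assumption here):

```python
import numpy as np

def coherence_loss(h_t, h_prev, lam=0.05, eps=1e-6):
    """lam * (1 - cosine_similarity(h_t, h_prev)), averaged over the batch.
    Penalizes directional change between successive hidden states."""
    num = np.sum(h_t * h_prev, axis=-1)
    den = np.linalg.norm(h_t, axis=-1) * np.linalg.norm(h_prev, axis=-1) + eps
    return lam * np.mean(1.0 - num / den)

h = np.array([[1.0, 2.0, 3.0, 4.0]])
loss_same = coherence_loss(h, h)   # aligned states -> ~0
loss_opp = coherence_loss(h, -h)   # opposite states -> ~2 * lam
```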
## 6. Pseudocode
```python
# Shapes: H_prev [B, D], E_t [B, D_in]; D even
X = W_x(H_prev)
Y = W_y(E_t)

# Triangle channels: half-difference, log-domain geometric mean, half-sum
b = 0.5 * (Y - X)
a = torch.exp(0.5 * (torch.log(torch.abs(X) + eps) + torch.log(torch.abs(Y) + eps)))
c = 0.5 * (Y + X)

# Concatenate and fuse
z = GELU(W_mix(torch.cat([b, a, c], dim=-1)))

# Phase rotation of the previous state on channel pairs
dphi = wheel[t % len(wheel)] * (2 * math.pi / 24)
Hrot = rotate_pairs(H_prev, dphi)

# State update with scaled rotated-state residual
H_t = GELU(LayerNorm(z + 0.1 * Hrot))
```
```mermaid
graph TD
  A[Input token] -->|Embed| B[Embedding E_t]
  B --> C[Linear W_y]
  Hprev[H_prev] --> D[Linear W_x]
  D --> E[Triangle channels b,a,c]
  C --> E
  E --> F[W_mix + GELU]
  Hprev --> G[Rotate pairs R delta_phi_t]
  F --> H[LayerNorm + GELU + alpha·H_rot]
  G --> H
  H --> I[Linear head W_o]
  I --> J[logits]
  Hprev -- optional --> K((coherence loss))
  H -- compare --> K
```
```yaml
helical_cell:
  dim: 128
  phase_wheel: [5, 7, 11, 13]  # × (2π/24)
  residual_alpha: 0.1
  norm: layernorm
  activation: gelu
  coherence_lambda: 0.05       # set 0 or use --no_coh
  eps: 1e-6
```
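Putting the pieces together, one full step of the cell can be sketched end-to-end in NumPy under this spec's defaults. The weight shapes and random initialization below are illustrative only, not those of `helical_poc.py` (which learns these weights with torch):

```python
import numpy as np

rng = np.random.default_rng(0)
B, D, D_in = 2, 8, 4            # D must be even
eps, alpha = 1e-6, 0.1
wheel = [5, 7, 11, 13]

# Illustrative random weights standing in for learned parameters
W_x = rng.normal(0, 0.1, (D, D))
W_y = rng.normal(0, 0.1, (D, D_in))
W_mix = rng.normal(0, 0.1, (D, 3 * D))

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layernorm(x):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rotate_pairs(h, dphi):
    p = h.reshape(h.shape[0], -1, 2)
    c, s = np.cos(dphi), np.sin(dphi)
    return np.stack([c * p[..., 0] - s * p[..., 1],
                     s * p[..., 0] + c * p[..., 1]], axis=-1).reshape(h.shape)

def helical_step(H_prev, E_t, t):
    X, Y = H_prev @ W_x.T, E_t @ W_y.T
    b = 0.5 * (Y - X)
    a = np.exp(0.5 * (np.log(np.abs(X) + eps) + np.log(np.abs(Y) + eps)))
    c = 0.5 * (Y + X)
    z = gelu(np.concatenate([b, a, c], axis=-1) @ W_mix.T)
    dphi = wheel[t % len(wheel)] * (2 * np.pi / 24)
    return gelu(layernorm(z + alpha * rotate_pairs(H_prev, dphi)))

H = np.zeros((B, D))
for t in range(3):
    H = helical_step(H, rng.normal(size=(B, D_in)), t)
```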