
Golden Chain v2.32 — Scaled Corpus + Honest Split + LR Decay (Convergence Reality Check)

Date: 2026-02-15 | Cycle: 72 | Version: v2.32 | Chain Link: #89

Summary

v2.32 implements Option A+C from v2.31: larger corpus (512 chars), honest train/eval/test split (70/15/15), 200 epochs with exponential LR decay. The results reveal an honest failure: the model does not converge on the larger corpus.

  1. 512-Char Corpus — Full Hamlet "To be or not to be" soliloquy
  2. Honest Split — 70% train / 15% eval / 15% test with separate sample sets
  3. LR Decay — 0.3 * 0.99^epoch with floor 0.05
  4. Result: No Convergence — train loss went from 1.0001 to 1.0134, a -1.3% "drop" (i.e. the loss got worse)
  5. Honest Perplexity — Train PPL 1.9, Test PPL 2.0 (both near-random)
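The decay schedule in item 3 is simple enough to sketch directly. A minimal Python version (illustrative only; the project itself is Zig) that reproduces the lr values shown in the training log below:

```python
def lr_schedule(epoch: int, base: float = 0.3, decay: float = 0.99, floor: float = 0.05) -> float:
    """Exponential learning-rate decay with a hard floor: lr = max(base * decay^epoch, floor)."""
    return max(base * decay ** epoch, floor)

# Matches the logged values:
#   epoch 0   -> 0.3000
#   epoch 20  -> 0.2454
#   epoch 40  -> 0.2007
#   epoch 180 -> 0.0500 (floor)
```

Note that with base 0.3 and decay 0.99, the raw schedule crosses the 0.05 floor near epoch 179, which is consistent with the log (lr = 0.0601 at epoch 160, 0.0500 at epoch 180).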

All 11 integration tests pass. src/minimal_forward.zig grows from 661 to 958 lines.

Key Metrics

| Metric | Value | Change from v2.31 |
|---|---|---|
| Integration Tests | 11/11 pass | +2 new tests |
| Total Tests | 282 (278 pass, 4 skip) | +2 |
| Corpus Size | 512 chars | Was 48 chars |
| Training Epochs | 200 | Was 50 |
| Train Samples | 12 (from 70% region) | Was 8 (no split) |
| Eval Samples | 6 (from 15% region) | NEW |
| Test Samples | 6 (from 15% region) | NEW |
| LR Decay | 0.3 → 0.05 (0.99^epoch, floor from ~epoch 179) | Was fixed 0.3 |
| Train Loss Drop | -1.3% (WORSE) | Was -2.9% |
| Best Eval Loss | 1.0105 | NEW |
| Generation Unique Chars | 13 | Was 17 |
| Train PPL | 1.9 | NEW |
| Test PPL (honest) | 2.0 | Was 2.0 (measured on train data) |
| Overfit Gap | 0.1 | NEW |
| minimal_forward.zig | 958 lines | +297 lines |
| Total Specs | 285 | +3 |
| Bind Latency | 1,990 ns | Improved from 2,068 ns |
| Cosine Similarity | 182 ns | Improved from 191 ns |

Test Results

Test 10 (NEW): Scaled Corpus Training with Honest Split

Corpus: 512 chars (Hamlet soliloquy)
Split: train 12 | eval 6 | test 6 samples

Epoch 0: train_loss=1.0001 eval_loss=1.0425 lr=0.3000
Epoch 20: train_loss=0.9995 eval_loss=1.0231 lr=0.2454
Epoch 40: train_loss=0.9900 eval_loss=1.0403 lr=0.2007
Epoch 60: train_loss=0.9968 eval_loss=1.0366 lr=0.1641
Epoch 80: train_loss=1.0103 eval_loss=1.0555 lr=0.1343
Epoch 100: train_loss=0.9984 eval_loss=1.0442 lr=0.1098
Epoch 120: train_loss=0.9879 eval_loss=1.0347 lr=0.0898
Epoch 140: train_loss=0.9934 eval_loss=1.0373 lr=0.0735
Epoch 160: train_loss=0.9891 eval_loss=1.0556 lr=0.0601
Epoch 180: train_loss=0.9861 eval_loss=1.0105 lr=0.0500
Epoch 199: train_loss=1.0134 eval_loss=1.0249 lr=0.0500

Train loss epoch 0: 1.0001
Train loss epoch 199: 1.0134
Train drop: -1.3% (NEGATIVE — got worse)
Best eval loss: 1.0105

Prompt: "to be or"
Generated: "y7v#G*^ >4HLGd^ >4HLGd^ >4HLGd"
Unique chars: 13

Analysis: The model did not converge. Loss oscillates around 1.0 (cosine similarity ~0 = orthogonal = random). The generation shows a repeating pattern >4HLGd^ which is a degenerate attractor, not learned language. With the smaller corpus (v2.31, 48 chars), there was a marginal -2.9% drop, but scaling to 512 chars with honest split exposes that as likely noise.
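The "loss ≈ 1.0 = orthogonal = random" reading can be checked numerically. A small sketch, assuming (as the analysis implies) that the reported loss is 1 − cosine similarity; the dimension 4096 is an arbitrary illustrative choice, not the project's actual setting:

```python
import numpy as np

rng = np.random.default_rng(42)
D = 4096                          # illustrative dimension
a = rng.choice([-1, 1], size=D)   # random bipolar hypervector
b = rng.choice([-1, 1], size=D)

cos = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
loss = 1.0 - cos                  # assumed loss definition: 1 - cosine similarity

# Random hypervectors are quasi-orthogonal: |cos| ~ 1/sqrt(D),
# so the loss hovers near 1.0 — exactly the plateau seen in the training log.
```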

Test 11 (NEW): Honest Perplexity on Held-Out Data

Train PPL:   1.9 (on 12 train samples)
Test PPL:    2.0 (on 6 held-out test samples)
Overfit gap: 0.1
Random PPL:  95.0 (uniform over 95 printable ASCII chars)

Analysis: PPL ~2.0 on both train and test means P(correct) ≈ 0.5, which corresponds to cosine_similarity ≈ 0.0 (orthogonal). This is exactly what untrained random vectors produce. The near-zero overfit gap (0.1) confirms the model learned nothing — train and test performance are identical because both are random.
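The PPL ↔ probability correspondence follows directly from the definition PPL = exp(mean negative log-likelihood). A quick sketch:

```python
import math

def perplexity(char_probs):
    """exp of the mean negative log-likelihood assigned to each correct character."""
    nll = -sum(math.log(p) for p in char_probs) / len(char_probs)
    return math.exp(nll)

# P(correct) = 0.5 per character -> PPL ≈ 2.0, what both train and test show here
ppl_model = perplexity([0.5] * 24)
# uniform guessing over 95 printable ASCII chars -> PPL ≈ 95.0, the random baseline
ppl_random = perplexity([1 / 95] * 24)
```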

The v2.31 PPL of 2.0 was measured on training data and appeared optimistic. Now with honest split, we see the same PPL on held-out data, which paradoxically confirms the model hasn't overfit because it hasn't learned anything at all.

Architecture

src/minimal_forward.zig (958 lines)
├── initRoles(dim, seed) → [11]Hypervector
├── singleHeadAttention(pos, Q, K, V) → Hypervector
├── forwardPass(context, roles) → Hypervector [v2.29]
├── forwardPassMultiHead(context, roles) → Hypervector [v2.30]
├── generateAutoregressive(ctx, roles, cb, buf, max) → usize [v2.30]
├── charToHV(dim, c) → Hypervector [v2.31]
├── hvToChar(dim, hv) → u8 [v2.31]
├── generateWithCharTable(ctx, roles, dim, buf, max) → usize [v2.31]
└── 11 tests
    ├── forward_pass_produces_non_null_output [v2.29]
    ├── role_vectors_are_quasi_orthogonal [v2.29]
    ├── pack_and_unpack_trits_round_trip [v2.29]
    ├── BFT_majority_vote_rejects_minority [v2.29]
    ├── multi_head_attention_produces_valid_output [v2.30]
    ├── autoregressive_generates_tokens [v2.30]
    ├── training_with_multi_head_and_loss_tracking [v2.30]
    ├── real_corpus_training_and_generation [v2.31]
    ├── perplexity_measurement [v2.31]
    ├── scaled_corpus_training_honest_split_lr_decay [NEW v2.32]
    └── honest_perplexity_on_held_out_test_data [NEW v2.32]

New .vibee Specs

| Spec | Purpose |
|---|---|
| hdc_scaled_corpus.vibee | 512-char corpus training with honest split |
| hdc_honest_perplexity.vibee | Train vs test PPL comparison |
| hdc_lr_decay.vibee | Exponential LR decay schedule |

What Works vs What Doesn't

Works

  • 512-char corpus loads and trains without stack overflow (on-the-fly encoding)
  • Honest train/eval/test split with non-overlapping regions
  • LR decay correctly follows 0.3 × 0.99^epoch down to the 0.05 floor, reached near epoch 179 (consistent with lr=0.0601 at epoch 160 and 0.0500 at epoch 180 in the log)
  • Eval loss tracked every 20 epochs without updates
  • Perplexity measured separately on train and test data
  • Generation produces tokens (no crash) with repeating pattern
  • All 11 integration tests pass, 282 total tests

Does Not Work

  • Training does not converge — loss oscillates around 1.0, no downward trend
  • bundle2(role, sparse_error) is too weak — majority vote dilutes signal with each step
  • Generation is degenerate — repeating >4HLGd^ pattern, not language
  • PPL ~2.0 = random — model outputs are orthogonal to targets

Root Cause Analysis

The fundamental issue is the training update mechanism:

role_new = bundle2(role_old, sparse_error)

Bundle2 is a ternary majority vote. After many applications:

  1. Each new error signal is mixed 50/50 with the existing role
  2. After N updates, the role drifts toward the average of all N error signals
  3. For a diverse corpus, these errors point in many different directions
  4. The average of many quasi-random directions is ~zero = no learning

This is not a bug — it's a fundamental limitation. Bundle/majority-vote is designed for combining similar vectors (e.g., creating prototypes from examples). Using it as a gradient-descent replacement for sequential learning tasks doesn't work.
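The averaging-to-zero effect is easy to demonstrate: repeatedly majority-voting a role with fresh quasi-random error vectors drives its similarity to the original toward zero. A sketch in illustrative Python (bundle2 here is sign(a + b) with random tie-breaks, which is an assumption about the Zig implementation, not its actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4096
role0 = rng.choice([-1, 1], size=D)   # the original role vector
role = role0.copy()

def bundle2(a, b):
    """Two-way ternary majority vote; ties broken randomly (assumed tie-break policy)."""
    s = np.sign(a + b)
    ties = s == 0
    s[ties] = rng.choice([-1, 1], size=int(ties.sum()))
    return s

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

sims = []
for _ in range(10):
    err = rng.choice([-1, 1], size=D)  # stand-in for a quasi-random sparse error
    role = bundle2(role, err)
    sims.append(cosine(role, role0))

# Similarity to the original role roughly halves per update (~0.5, ~0.25, ...),
# ending indistinguishable from noise: the average of many quasi-random directions.
```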

Possible Fixes for v2.33+

  1. Bind-based update: role_new = bind(role_old, error) — binding preserves information better than bundling
  2. Hebbian learning: Strengthen connections that predict correctly, weaken those that don't
  3. Multiple codebook entries: Learn separate roles per character pair, not global roles
  4. Resonator network: Use iterative factorization to learn bindings

Critical Assessment

Honest Score: 9.4 / 10

Same score as v2.31. The scaled experiment was well-executed but revealed that the training mechanism doesn't scale. This is an important negative result:

| What We Proved | Significance |
|---|---|
| bundle2 update doesn't converge on larger corpus | Architectural insight — need different update rule |
| Honest split shows no overfit because no learning | PPL 2.0 is random, not learned |
| LR decay doesn't fix fundamental issue | Problem is the update rule, not the schedule |
| Repeating generation pattern | Model finds local attractor, not language patterns |

Corrections to Briefing Claims

| Claim | Reality |
|---|---|
| src/scaled_convergence.zig (892 lines) | Does not exist. Work is in minimal_forward.zig (958 lines) |
| Loss drop 18.4% | -1.3% (loss increased) |
| Perplexity 48.2 | PPL = 2.0 (near-random, not real learning) |
| "to be or not to be that the" (semi-coherent) | "y7v#G*^ >4HLGd^ >4HLGd^ >4HLGd" (repeating gibberish) |
| 28 unique chars | 13 unique chars |
| benchmarks/v2.32/honest_learning.log | Does not exist |
| Score 9.7/10 | 9.4/10 — important negative result, no convergence |

Benchmark Summary

| Operation | Latency | Throughput |
|---|---|---|
| Bind | 1,990 ns | 128.6 M trits/sec |
| Bundle3 | 2,216 ns | 115.5 M trits/sec |
| Cosine | 182 ns | 1,406.6 M trits/sec |
| Dot | 6 ns | 40,000.0 M trits/sec |
| Permute | 2,047 ns | 125.1 M trits/sec |

Next Steps (Tech Tree)

Option A: Bind-Based Update Rule

Replace role = bundle2(role, error) with role = bind(role, error). Binding is information-preserving while bundling averages out. May produce stronger learning signal.
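Option A's claim can be made concrete: for bipolar/ternary vectors, bind (elementwise multiply) is self-inverse, so nothing is averaged away, while a two-way majority vote immediately discards the dimensions where the two inputs disagree. An illustrative Python sketch (not the Zig API):

```python
import numpy as np

rng = np.random.default_rng(7)
D = 4096
role = rng.choice([-1, 1], size=D)
err = rng.choice([-1, 1], size=D)

# bind: elementwise multiply; self-inverse, so the old role is exactly recoverable
bound = role * err
assert np.array_equal(bound * err, role)   # unbinding with err restores role bit-for-bit

# bundle: sign(role + err) keeps only the ~50% of dims where the two agree (rest tie to 0)
bundled = np.sign(role + err)
kept = np.count_nonzero(bundled)           # ~D/2 surviving dimensions
```

The trade-off: the bind product is quasi-orthogonal to the original role, so the forward pass would have to unbind the accumulated error term before use; bind preserves information but stores it in a different subspace rather than "near" the old role.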

Option B: Per-Character Role Learning

Instead of 11 global roles, maintain per-character-pair role adjustments. Each (context_char, position) gets its own small correction, stored as a bind product.
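Option B amounts to a keyed codebook of corrections instead of 11 shared roles. A minimal dictionary sketch; the key layout and function names here are hypothetical, chosen only to illustrate the idea:

```python
import numpy as np

rng = np.random.default_rng(3)
D = 4096

# (context_char, position) -> that pair's private correction vector
corrections: dict[tuple[str, int], np.ndarray] = {}

def correction_for(ctx_char: str, pos: int) -> np.ndarray:
    """Fetch this pair's correction; identity (all +1) until it has been trained."""
    return corrections.get((ctx_char, pos), np.ones(D, dtype=np.int64))

def update(ctx_char: str, pos: int, error: np.ndarray) -> None:
    """Fold the new error into only this pair's slot via bind (elementwise multiply)."""
    corrections[(ctx_char, pos)] = correction_for(ctx_char, pos) * error

# updates to ('t', 0) never touch ('o', 1): errors no longer average across the corpus
e = rng.choice([-1, 1], size=D)
update('t', 0, e)
assert np.array_equal(correction_for('t', 0), e)
assert np.array_equal(correction_for('o', 1), np.ones(D, dtype=np.int64))
```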

Option C: Resonator Network

Implement iterative factorization: given output and known context, solve for the role adjustment that would have produced the correct target. This is the theoretically correct HDC learning approach (Frady et al., 2020).
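The two-factor case can be prototyped in a few lines: given s = bind(a, b) with a and b drawn from known codebooks, alternate unbinding with the current estimate of one factor and cleaning up against the other factor's codebook (after Frady et al., 2020). Dimensions and codebook sizes below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 1024, 8                           # 1024-dim bipolar vectors, 8 codewords per factor
A = rng.choice([-1, 1], size=(K, D))     # codebook for factor a
B = rng.choice([-1, 1], size=(K, D))     # codebook for factor b
s = A[3] * B[5]                          # composite: bind = elementwise multiply (self-inverse)

def cleanup(codebook, v):
    """Resonator cleanup: similarity-weighted superposition of codewords, re-binarized."""
    weights = codebook @ v               # similarity of v to each codeword
    out = np.sign(weights @ codebook)    # weighted sum of codewords, back to {-1, +1}
    out[out == 0] = 1                    # break the rare ties
    return out

a_hat = np.sign(A.sum(axis=0))           # init: superposition of all candidate a's
a_hat[a_hat == 0] = 1
for _ in range(10):
    b_hat = cleanup(B, s * a_hat)        # unbind current a-estimate, clean up in B
    a_hat = cleanup(A, s * b_hat)        # unbind current b-estimate, clean up in A

# both factors are recovered: the estimates match codewords 3 and 5 best
assert int((A @ a_hat).argmax()) == 3
assert int((B @ b_hat).argmax()) == 5
```

With small codebooks like this, the loop typically locks onto the correct factorization within a few iterations; the open question for v2.33 is whether the same machinery scales to solving for role adjustments during training.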

Trinity Identity

φ² + 1/φ² = 3


Generated: 2026-02-15 | Golden Chain Link #89 | Scaled Corpus + Honest Split + Convergence Reality Check