Memory Efficiency

Trinity achieves up to 20x memory savings compared to float32 representations through a combination of packed ternary encoding, lazy conversion strategies, and sparse vector formats. This page explains each memory optimization technique and when to use it.

Ternary Information Density

Each ternary value (trit) takes one of three states: {-1, 0, +1}, so it carries log2(3) ≈ 1.585 bits of information (commonly rounded to 1.58). In contrast, a float32 value uses 32 bits, and even a single byte (int8) uses 8 bits. The theoretical minimum storage for a trit is therefore about 1.58 bits, and Trinity's packed format approaches this limit.

HybridBigInt: Dual Representation

The HybridBigInt type (defined in the core library) provides a hybrid storage strategy with two internal representations:

  • Packed format: Trits are stored at approximately 1.58 bits per trit using a custom encoding scheme. This is the memory-efficient representation used for storage and transmission.
  • Unpacked format: Each trit occupies a full integer slot in a fixed-size array ([MAX_TRITS]Trit). This is the compute-friendly representation used during arithmetic operations.

Conversion between formats is lazy -- the system only unpacks when an operation requires element-level access, and only packs when storage efficiency is needed. This avoids redundant conversions in operation chains. The ensureUnpacked() method is called before JIT-compiled operations to guarantee direct memory access to the trit array.
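
The lazy strategy described above can be sketched in a few lines. This is a minimal illustration, not Trinity's implementation: the class `LazyTritVector`, its method names, and the `conversions` counter are all hypothetical, and the packed form is a placeholder (one byte per trit) so that the caching logic stays in focus.

```python
from typing import List, Optional


class LazyTritVector:
    """Hypothetical dual-representation container (not Trinity's HybridBigInt)."""

    def __init__(self, trits: List[int]) -> None:
        if any(t not in (-1, 0, 1) for t in trits):
            raise ValueError("trits must be -1, 0, or +1")
        self._unpacked: Optional[List[int]] = list(trits)
        self._packed: Optional[bytes] = None
        self.conversions = 0  # counts pack/unpack passes, for illustration only

    def ensure_packed(self) -> bytes:
        """Build the packed form only if it is not already cached."""
        if self._packed is None:
            # Placeholder encoding: one byte per trit, shifted to 0..2.
            self._packed = bytes(t + 1 for t in self._unpacked)
            self.conversions += 1
        return self._packed

    def ensure_unpacked(self) -> List[int]:
        """Build the element-wise form only if it is not already cached."""
        if self._unpacked is None:
            self._unpacked = [b - 1 for b in self._packed]
            self.conversions += 1
        return self._unpacked

    def release_unpacked(self) -> None:
        """Keep only the packed bytes, e.g. before long-term storage."""
        self.ensure_packed()
        self._unpacked = None

    def negate(self) -> "LazyTritVector":
        """Element-wise operation, so it needs the unpacked form."""
        return LazyTritVector([-t for t in self.ensure_unpacked()])
```

Chaining operations such as `v.negate().negate()` never touches the packed form, so no conversion work is done until something explicitly asks for packed bytes.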

Packed Trit Encoding

At the lowest level, Trinity encodes trits using 2 bits per trit in packed byte arrays. The encoding maps:

| Trit Value | 2-bit Encoding |
|------------|----------------|
| -1         | 0b10           |
| 0          | 0b00           |
| +1         | 0b01           |
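
The mapping above can be sketched as a pair of pack/unpack helpers. The function names are hypothetical, and the bit order within each byte (first trit in the lowest two bits) is an assumption; the source does not specify how Trinity orders trits inside a byte.

```python
from typing import List

# 2-bit codes from the table above; 0b11 is unused.
ENCODE = {-1: 0b10, 0: 0b00, 1: 0b01}
DECODE = {code: trit for trit, code in ENCODE.items()}


def pack_trits_2bit(trits: List[int]) -> bytes:
    """Pack four trits per byte, first trit in the lowest two bits (assumed order)."""
    out = bytearray((len(trits) + 3) // 4)
    for i, trit in enumerate(trits):
        out[i // 4] |= ENCODE[trit] << (2 * (i % 4))
    return bytes(out)


def unpack_trits_2bit(data: bytes, count: int) -> List[int]:
    """Recover `count` trits; raises KeyError on the unused 0b11 code."""
    return [DECODE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(count)]
```

The trit count has to be carried alongside the bytes, because padding in the final byte is indistinguishable from trailing zero trits.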

Four trits fit in a single byte. For a 10,000-dimensional vector:

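| Format                  | Size                  | Calculation              |
|-------------------------|-----------------------|--------------------------|
| float32                 | 40,000 bytes (40 KB)  | 10,000 x 4 bytes         |
| int8                    | 10,000 bytes (10 KB)  | 10,000 x 1 byte          |
| Packed 2-bit            | 2,500 bytes (2.5 KB)  | 10,000 x 2 bits / 8      |
| Theoretical (1.58-bit)  | 1,981 bytes (~2 KB)   | 10,000 x 1.585 bits / 8  |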

The packed 2-bit format achieves a 16x reduction compared to float32. With the higher-density 1.58 bits/trit packing used by HybridBigInt, the compression approaches 20x.
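
One well-known way to get close to 1.58 bits per trit is to store five trits per byte, since 3^5 = 243 fits in the 256 values of a byte (1.6 bits per trit). The sketch below uses that scheme purely as an illustration; the source only says HybridBigInt uses "a custom encoding scheme", so this should not be read as Trinity's actual format. All function names are hypothetical.

```python
import math
from typing import Dict, List


def pack_trits_base243(trits: List[int]) -> bytes:
    """Pack five trits per byte as a base-3 number (least significant digit first)."""
    out = bytearray()
    for start in range(0, len(trits), 5):
        group = trits[start:start + 5]
        # Map -1/0/+1 to base-3 digits 0/1/2; the largest value is 242.
        out.append(sum((t + 1) * 3 ** j for j, t in enumerate(group)))
    return bytes(out)


def unpack_trits_base243(data: bytes, count: int) -> List[int]:
    """Recover `count` trits from base-243 bytes."""
    trits: List[int] = []
    for value in data:
        for _ in range(min(5, count - len(trits))):
            value, digit = divmod(value, 3)
            trits.append(digit - 1)
    return trits


def dense_sizes(dim: int) -> Dict[str, float]:
    """Storage in bytes for a dense vector of `dim` trits under each format."""
    return {
        "float32": dim * 4,
        "int8": dim,
        "packed_2bit": math.ceil(dim * 2 / 8),
        "base243": math.ceil(dim / 5),
        "entropy_bound": dim * math.log2(3) / 8,  # not necessarily a whole number
    }
```

For a 10,000-dimensional vector this scheme needs 2,000 bytes, exactly 20x smaller than float32 and within about 1% of the information-theoretic bound.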

Sparse Vector Representation

For vectors where the large majority of trits are zero (in practice above roughly 95% zeros, as the table below shows), Trinity provides a SparseVector type that uses the Coordinate List (COO) format. Instead of storing every element, it stores only the indices and values of the non-zero elements:

SparseVector {
    indices:   [u32]   -- sorted positions of non-zero trits
    values:    [Trit]  -- trit values at those positions (-1 or +1)
    dimension: u32     -- total vector length
}

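Memory usage scales with the number of non-zero elements (nnz) rather than the total dimension. The figures below work out to about 5 bytes per non-zero element, which is consistent with a 4-byte u32 index plus one byte for the value: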

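| Sparsity    | 10,000-dim Dense (packed) | 10,000-dim Sparse (COO) | Savings                |
|-------------|---------------------------|-------------------------|------------------------|
| 50% zeros   | 2,500 bytes               | ~25,000 bytes           | None (sparse is worse) |
| 90% zeros   | 2,500 bytes               | ~5,000 bytes            | None (sparse is worse) |
| 99% zeros   | 2,500 bytes               | ~500 bytes              | 5x                     |
| 99.9% zeros | 2,500 bytes               | ~50 bytes               | 50x                    |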

The sparse format becomes advantageous at very high sparsity levels (above ~95% zeros), which occurs in certain VSA encoding patterns and after thresholding operations. The SparseVector provides a sparsity() method to measure the zero ratio and a memorySavings() method to compare against the equivalent dense representation.
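
The COO layout and the two measurements mentioned above can be sketched as follows. This is an illustrative stand-in, not Trinity's SparseVector: the class name `SparseTritVector`, the Python method names, and the cost of 5 bytes per non-zero element (inferred from the table figures) are assumptions.

```python
from dataclasses import dataclass
from typing import List

INDEX_BYTES = 4  # u32 index
VALUE_BYTES = 1  # assumption: one byte per stored trit value


@dataclass
class SparseTritVector:
    indices: List[int]  # sorted positions of non-zero trits
    values: List[int]   # -1 or +1 at those positions
    dimension: int      # total vector length

    @classmethod
    def from_dense(cls, trits: List[int]) -> "SparseTritVector":
        idx = [i for i, t in enumerate(trits) if t != 0]
        return cls(idx, [trits[i] for i in idx], len(trits))

    def sparsity(self) -> float:
        """Fraction of elements that are zero."""
        if self.dimension == 0:
            return 0.0
        return 1.0 - len(self.indices) / self.dimension

    def sparse_bytes(self) -> int:
        return len(self.indices) * (INDEX_BYTES + VALUE_BYTES)

    def dense_packed_bytes(self) -> int:
        """Size of the equivalent dense vector at 2 bits per trit."""
        return (self.dimension * 2 + 7) // 8

    def memory_savings(self) -> float:
        """Ratio of dense packed size to sparse size (> 1 means sparse wins)."""
        if not self.indices:
            return float("inf")
        return self.dense_packed_bytes() / self.sparse_bytes()
```

Under these assumptions the break-even point is 5 x nnz = dimension / 4, that is nnz = 5% of the dimension, which matches the ~95% threshold stated above.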

Choosing the Right Format

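| Use Case                      | Recommended Format      | Reason                                                     |
|-------------------------------|-------------------------|------------------------------------------------------------|
| General VSA operations        | HybridBigInt (packed)   | Good balance of memory and speed                           |
| JIT-compiled hot paths        | HybridBigInt (unpacked) | Direct memory access for native code                       |
| Storage and serialization     | Packed trit arrays      | Minimum size for dense vectors                             |
| Very sparse data (>95% zeros) | SparseVector (COO)      | Memory proportional to non-zero count                      |
| BitNet model weights          | Packed ternary          | 16x (2-bit) to ~20x (1.58-bit) compression vs float32      |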

Impact on Inference

For BitNet b1.58 language models, the memory savings from ternary weights are substantial. A 7B parameter model in float32 requires approximately 28 GB of memory for weights alone. With ternary packing at 1.58 bits per weight, the same model fits in roughly 1.4 GB -- small enough to run on a single consumer GPU or even in system RAM on a laptop.
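
The arithmetic behind these figures is easy to reproduce. The helper below is a hypothetical convenience function, not part of Trinity; it counts weights only (no activations, KV cache, or runtime overhead) and uses 1 GB = 10^9 bytes.

```python
import math


def weight_memory_gb(num_params: int, bits_per_weight: float) -> float:
    """Memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


params_7b = 7_000_000_000
float32_gb = weight_memory_gb(params_7b, 32)            # 28 GB
two_bit_gb = weight_memory_gb(params_7b, 2)             # 1.75 GB
ternary_gb = weight_memory_gb(params_7b, math.log2(3))  # just under 1.4 GB
```

The simple 2-bit packing lands at 1.75 GB, so the denser encoding accounts for roughly a further 20% reduction on top of it.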