Memory Efficiency
Trinity achieves up to 20x memory savings compared to float32 representations through a combination of packed ternary encoding, lazy conversion strategies, and sparse vector formats. This page explains each memory optimization technique and when to use it.
Ternary Information Density​
Each ternary value (trit) can be one of three states: {-1, 0, +1}. This carries log2(3) = 1.58 bits of information. In contrast, a float32 value uses 32 bits, and even a single byte (int8) uses 8 bits. The theoretical minimum storage for a trit is 1.58 bits, and Trinity's packed format approaches this limit.
HybridBigInt: Dual Representation​
The HybridBigInt type (defined in the core library) provides a hybrid storage strategy with two internal representations:
- Packed format: Trits are stored at approximately 1.58 bits per trit using a custom encoding scheme. This is the memory-efficient representation used for storage and transmission.
- Unpacked format: Each trit occupies a full integer slot in a fixed-size array (
[MAX_TRITS]Trit). This is the compute-friendly representation used during arithmetic operations.
Conversion between formats is lazy -- the system only unpacks when an operation requires element-level access, and only packs when storage efficiency is needed. This avoids redundant conversions in operation chains. The ensureUnpacked() method is called before JIT-compiled operations to guarantee direct memory access to the trit array.
Packed Trit Encoding​
At the lowest level, Trinity encodes trits using 2 bits per trit in packed byte arrays. The encoding maps:
| Trit Value | 2-bit Encoding |
|---|---|
| -1 | 0b10 |
| 0 | 0b00 |
| +1 | 0b01 |
Four trits fit in a single byte. For a 10,000-dimensional vector:
| Format | Size | Calculation |
|---|---|---|
| float32 | 40,000 bytes (40 KB) | 10,000 x 4 bytes |
| int8 | 10,000 bytes (10 KB) | 10,000 x 1 byte |
| Packed 2-bit | 2,500 bytes (2.5 KB) | 10,000 x 2 bits / 8 |
| Theoretical (1.58-bit) | 1,981 bytes (~2 KB) | 10,000 x 1.58 bits / 8 |
The packed 2-bit format achieves a 16x reduction compared to float32. With the higher-density 1.58 bits/trit packing used by HybridBigInt, the compression approaches 20x.
Sparse Vector Representation​
For vectors where a large proportion of trits are zero (sparsity > 50%), Trinity provides a SparseVector type that uses the Coordinate List (COO) format. Instead of storing every element, it stores only the indices and values of non-zero elements:
SparseVector {
indices: [u32] -- sorted positions of non-zero trits
values: [Trit] -- trit values at those positions (-1 or +1)
dimension: u32 -- total vector length
}
Memory usage scales with the number of non-zero elements (nnz) rather than the total dimension:
| Sparsity | 10,000-dim Dense (packed) | 10,000-dim Sparse (COO) | Savings |
|---|---|---|---|
| 50% zeros | 2,500 bytes | ~25,000 bytes | None (sparse is worse) |
| 90% zeros | 2,500 bytes | ~5,000 bytes | None (sparse is worse) |
| 99% zeros | 2,500 bytes | ~500 bytes | 5x |
| 99.9% zeros | 2,500 bytes | ~50 bytes | 50x |
The sparse format becomes advantageous at very high sparsity levels (above ~95% zeros), which occurs in certain VSA encoding patterns and after thresholding operations. The SparseVector provides a sparsity() method to measure the zero ratio and a memorySavings() method to compare against the equivalent dense representation.
Choosing the Right Format​
| Use Case | Recommended Format | Reason |
|---|---|---|
| General VSA operations | HybridBigInt (packed) | Good balance of memory and speed |
| JIT-compiled hot paths | HybridBigInt (unpacked) | Direct memory access for native code |
| Storage and serialization | Packed trit arrays | Minimum size for dense vectors |
| Very sparse data (>95% zeros) | SparseVector (COO) | Memory proportional to non-zero count |
| BitNet model weights | Packed ternary | 20x compression vs float32 |
Impact on Inference​
For BitNet b1.58 language models, the memory savings from ternary weights are substantial. A 7B parameter model in float32 requires approximately 28 GB of memory for weights alone. With ternary packing at 1.58 bits per weight, the same model fits in roughly 1.4 GB -- small enough to run on a single consumer GPU or even in system RAM on a laptop.