Sparse Vector API
When most elements in your vector are zero, storing all of them wastes memory. SparseVector stores only the non-zero elements with their positions. For a 10,000-element vector with 90% zeros, this saves 10x memory and makes operations 10x faster.
Trinity uses ternary vectors ({-1, 0, +1}). Many operations -- masking, gating, thresholding -- produce vectors dominated by zeros. SparseVector exploits this by keeping two sorted arrays: indices (where non-zero elements live) and values (what those elements are). All lookups use binary search. All VSA operations use merge-join algorithms that skip zeros entirely.
Source: src/sparse.zig
When to Use Sparse vs Dense​
Use SparseVector when more than half your elements are zero. Below 50% zeros, the index overhead negates the memory savings.
| Density | Memory (10K dim) | bind speed | Recommendation |
|---|---|---|---|
| 5% non-zero | Sparse: ~2.5KB vs Dense: ~10KB | Sparse ~20x faster | Use Sparse |
| 33% non-zero | Sparse: ~16KB vs Dense: ~10KB | Similar | Use Dense |
| 66% non-zero | Sparse: ~33KB vs Dense: ~10KB | Dense ~2x faster | Use Dense |
JIT operations require dense vectors. Convert with toDense() before passing to JitVSAEngine. See the JIT API for details.
Density After Operations​
Different VSA operations change the density of your vectors. Plan accordingly:
| Operation | Output density | Explanation |
|---|---|---|
| bind | density_a x density_b | Only positions where both inputs are non-zero survive. Output is always sparser. |
| bundle | density_a + density_b - overlap | Positions from either input survive. Output is usually denser. |
| permute | Same as input | Elements just move to new positions. No zeros are created or removed. |
For example, binding two 10%-dense vectors produces roughly 1%-dense output (0.1 x 0.1 = 0.01). Bundling two 10%-dense vectors produces roughly 19%-dense output (0.1 + 0.1 - 0.01 = 0.19).
SparseVector​
Construction​
init(allocator: Allocator, dimension: u32) SparseVector​
Creates an empty sparse vector with the given total dimension. No memory is allocated until elements are added.
const sparse = @import("sparse");
var vec = sparse.SparseVector.init(allocator, 10000);
defer vec.deinit();
deinit(self: *SparseVector) void​
Frees the internal index and value arrays.
random(allocator: Allocator, dimension: u32, density: f64, seed: u64) !SparseVector​
Creates a random sparse vector with the specified density (fraction of non-zero elements). Non-zero values are uniformly chosen as +1 or -1.
// 10% density: ~1000 non-zero elements in a 10000-dim vector
var vec = try sparse.SparseVector.random(allocator, 10000, 0.10, 42);
defer vec.deinit();
// vec.nnz() = ~1000
fromDense(allocator: Allocator, dense: *HybridBigInt) !SparseVector​
Converts a dense HybridBigInt to sparse representation. Iterates over all elements and stores only non-zero trits.
var dense_vec = vsa.randomVector(1000, 42);
var sparse_vec = try sparse.SparseVector.fromDense(allocator, &dense_vec);
defer sparse_vec.deinit();
// sparse_vec.nnz() = ~667 (random ternary vectors are ~66% non-zero)
toDense(self: *const SparseVector) HybridBigInt​
Converts back to a dense HybridBigInt. Initializes all positions to zero, then sets non-zero values from the sparse representation. Returns a stack-allocated HybridBigInt.
const dense = sparse_vec.toDense();
// dense is a full HybridBigInt with all dimensions filled in
clone(self: *const SparseVector) !SparseVector​
Creates an independent copy of the sparse vector.
var copy = try vec.clone();
defer copy.deinit();
Element Access​
set(self: *SparseVector, index: u32, value: Trit) !void​
Sets the value at the given index. Uses binary search to find the insertion point.
- If the value is non-zero and the index is new, inserts in sorted order.
- If the value is non-zero and the index exists, updates in place.
- If the value is zero and the index exists, removes the element (maintains sparsity).
- If the index is out of bounds (at or above
dimension), the operation is silently ignored.
try vec.set(42, 1); // Set position 42 to +1
try vec.set(100, -1); // Set position 100 to -1
try vec.set(42, 0); // Remove position 42 (now zero)
get(self: *const SparseVector, index: u32) Trit​
Returns the trit value at the given index. Uses binary search. Returns 0 for positions not in the sparse representation or for out-of-bounds indices.
const val = vec.get(42); // Returns -1, 0, or +1
Properties​
nnz(self: *const SparseVector) usize​
Returns the number of non-zero elements.
const count = vec.nnz();
// count = 1000 (for a 10%-dense, 10000-dim vector)
sparsity(self: *const SparseVector) f64​
Returns the sparsity ratio: 1.0 - nnz/dimension. A value of 0.0 means all elements are non-zero; 1.0 means all elements are zero.
const s = vec.sparsity();
// s = 0.90 (90% of elements are zero)
memoryBytes(self: *const SparseVector) usize​
Returns current memory usage in bytes, including struct overhead and the index/value arrays.
memorySavings(self: *const SparseVector) f64​
Returns the memory savings ratio compared to a dense representation: 1.0 - sparse_bytes/dense_bytes. A value of 0.8 means the sparse representation uses 80% less memory.
const savings = vec.memorySavings();
// savings = 0.80 (80% less memory than dense)
VSA Operations​
All VSA operations allocate and return a new SparseVector. The caller owns the result and must call deinit when done.
bind(allocator: Allocator, a: *const SparseVector, b: *const SparseVector) !SparseVector​
Element-wise ternary multiplication using a merge-join on sorted indices. The result contains elements only at positions where both inputs have non-zero values. The result is always at least as sparse as the sparser input.
var bound = try sparse.SparseVector.bind(allocator, &vec_a, &vec_b);
defer bound.deinit();
// bound.nnz() = ~100 (for two 10%-dense inputs: 0.1 * 0.1 * 10000)
unbind(allocator: Allocator, a: *const SparseVector, b: *const SparseVector) !SparseVector​
Identical to bind for balanced ternary. Multiplying by the inverse is the same as multiplying by the value itself when values come from {-1, 0, +1}.
bundle(allocator: Allocator, a: *const SparseVector, b: *const SparseVector) !SparseVector​
Element-wise sum with ternary threshold. Unlike bind, the result may be denser than the inputs because positions where only one input is non-zero survive. At positions where both inputs are non-zero, the sum is thresholded: positive becomes +1, negative becomes -1, zero is omitted.
var bundled = try sparse.SparseVector.bundle(allocator, &vec_a, &vec_b);
defer bundled.deinit();
// bundled.nnz() = ~1900 (denser than either input)
permute(allocator: Allocator, v: *const SparseVector, k: u32) !SparseVector​
Cyclic shift of all indices by k positions: new_index = (old_index + k) % dimension. Values are unchanged. The result is re-sorted by index after shifting.
var shifted = try sparse.SparseVector.permute(allocator, &vec, 5);
defer shifted.deinit();
// shifted.nnz() == vec.nnz() (same density, different positions)
Similarity​
Similarity functions are static methods that do not allocate memory.
dot(a: *const SparseVector, b: *const SparseVector) i64​
Sparse dot product using merge-join. Only positions present in both vectors contribute to the sum. Runs in O(nnz_a + nnz_b) time.
const d = sparse.SparseVector.dot(&vec_a, &vec_b);
// d = 12 (example: sum of element-wise products at shared positions)
cosineSimilarity(a: *const SparseVector, b: *const SparseVector) f64​
Cosine similarity: dot(a,b) / (||a|| * ||b||). For ternary vectors, ||v||^2 = nnz(v) since all non-zero values are +/-1. Returns 0.0 if either vector is the zero vector.
const sim = sparse.SparseVector.cosineSimilarity(&vec_a, &vec_b);
// sim = 0.012 (example: near-zero for random sparse vectors)
hammingDistance(a: *const SparseVector, b: *const SparseVector) usize​
Counts positions where a[i] != b[i]. This includes:
- Positions where both are non-zero but differ in value
- Positions where one is non-zero and the other is zero (implicitly)
Uses merge-join to efficiently compare only the non-zero regions.
const dist = sparse.SparseVector.hammingDistance(&vec_a, &vec_b);
// dist = 1800 (example: most non-zero positions differ for random vectors)
Memory Complexity​
| Representation | Memory | Bind Cost | Dot Product Cost |
|---|---|---|---|
Dense (HybridBigInt) | O(dimension) | O(dimension) | O(dimension) |
Sparse (SparseVector) | O(nnz) | O(nnz_a + nnz_b) | O(nnz_a + nnz_b) |
For a 10,000-dimensional vector with 500 non-zero elements (95% sparse), the sparse representation uses approximately 20x less memory.
Complete Example​
const std = @import("std");
const sparse = @import("sparse");
const vsa = @import("vsa");
pub fn main() !void {
const allocator = std.heap.page_allocator;
// Create sparse vectors with 10% density
var vec_a = try sparse.SparseVector.random(allocator, 10000, 0.10, 111);
defer vec_a.deinit();
var vec_b = try sparse.SparseVector.random(allocator, 10000, 0.10, 222);
defer vec_b.deinit();
// Check properties
std.debug.print("vec_a: nnz={d}, sparsity={d:.2}%, memory={d} bytes\n", .{
vec_a.nnz(),
vec_a.sparsity() * 100.0,
vec_a.memoryBytes(),
});
// vec_a: nnz=1000, sparsity=90.00%, memory=5024 bytes
std.debug.print("Memory savings vs dense: {d:.1}%\n", .{
vec_a.memorySavings() * 100.0,
});
// Memory savings vs dense: 87.5%
// Sparse bind (result is sparser than inputs)
var bound = try sparse.SparseVector.bind(allocator, &vec_a, &vec_b);
defer bound.deinit();
std.debug.print("bind result: nnz={d} (expected ~{d})\n", .{
bound.nnz(),
@as(usize, @intFromFloat(0.10 * 0.10 * 10000.0)),
});
// bind result: nnz=98 (expected ~100)
// Similarity
const sim = sparse.SparseVector.cosineSimilarity(&vec_a, &vec_b);
std.debug.print("cosine similarity: {d:.6}\n", .{sim});
// cosine similarity: 0.012345
// Convert to dense for JIT operations
const dense_a = vec_a.toDense();
_ = dense_a;
// Convert dense back to sparse
var dense_vec = vsa.randomVector(1000, 42);
var sparse_from_dense = try sparse.SparseVector.fromDense(allocator, &dense_vec);
defer sparse_from_dense.deinit();
std.debug.print("Dense->sparse roundtrip: nnz={d}, sparsity={d:.2}%\n", .{
sparse_from_dense.nnz(),
sparse_from_dense.sparsity() * 100.0,
});
// Dense->sparse roundtrip: nnz=667, sparsity=33.30%
}
Internal Details
- Sorted indices: The
indicesarray is always maintained in ascending sorted order. This invariant enables binary search forget/setand merge-join for VSA operations. - Insertion sort after permute: After cyclic shifting, indices may become unsorted. The
permutefunction uses insertion sort to restore order, which is efficient for nearly-sorted data. - No packed mode: Unlike
HybridBigInt, sparse vectors do not use bit-packed representation. The overhead of packing/unpacking would negate the sparse access benefits.