
Benchmarks

Trinity is a high-performance ternary computing framework built in Zig, designed for both Vector Symbolic Architecture (VSA) operations and large language model inference using BitNet b1.58 ternary weights. This section provides an overview of Trinity's performance characteristics across several key dimensions.

Key Performance Metrics

| Metric | Value | Notes |
| --- | --- | --- |
| GPU inference throughput | Up to 298K tokens/sec | RTX 3090, BitNet b1.58 via bitnet.cpp |
| JIT speedup | 15-260x | Over interpreted Zig execution for VSA ops |
| Memory compression | 20x | Ternary packed vs float32 representation |
| Compute model | Add-only | No multiply operations required for ternary weights |

Additional Results

| Metric | Value | Notes |
| --- | --- | --- |
| SIMD ternary matmul | 7.65 GFLOPS | BatchTiled, 2.28x over SIMD-16 baseline |
| Model load time | 4.8s (NVMe) | 43x improvement over 208s (ephemeral disk) |
| HDC continual learning | 3% avg forgetting | 20 classes, 10 phases (vs 50-90% for neural nets) |
| BitNet coherent text | Confirmed | bitnet.cpp on RunPod RTX 4090 |
| Unit tests passing | 143 | Across all subsystems |

Why Ternary is Fast

Ternary {-1, 0, +1} weights eliminate the need for multiplication in matrix-vector products. Instead of weight * activation, the operation reduces to addition, subtraction, or skip. This has two major consequences: dramatically lower memory bandwidth requirements (1.58 bits per weight vs 32 bits for float32) and simpler arithmetic that maps efficiently to both CPU SIMD instructions and custom hardware.
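The add/subtract/skip reduction can be sketched in a few lines. This is an illustrative Python model of the idea, not Trinity's actual (Zig) implementation; the function name is hypothetical.

```python
# Sketch: multiply-free ternary matrix-vector product.
# Because each weight is in {-1, 0, +1}, every product term reduces to
# add (+1), subtract (-1), or skip (0) -- no multiplication needed.

def ternary_matvec(weights, activations):
    """weights: rows of trits in {-1, 0, +1}; activations: floats."""
    out = []
    for row in weights:
        acc = 0.0
        for w, a in zip(row, activations):
            if w == 1:        # +1: accumulate the activation
                acc += a
            elif w == -1:     # -1: subtract the activation
                acc -= a
            # w == 0: skip entirely (sparsity is free)
        out.append(acc)
    return out

W = [[1, 0, -1], [0, 1, 1]]
x = [2.0, 3.0, 4.0]
print(ternary_matvec(W, x))  # [-2.0, 7.0]
```

In practice the skip case also saves memory traffic: zero weights never touch the accumulator, which is part of why packed ternary kernels vectorize well.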

Performance Areas

GPU Inference

BitNet b1.58 models running on consumer and datacenter GPUs achieve throughput measured in hundreds of thousands of tokens per second for small models. Performance varies by GPU type, model size, and batch configuration. See GPU Inference Benchmarks for detailed numbers.

JIT Compilation

Trinity includes a custom JIT compiler with backends for ARM64 (Apple Silicon, Raspberry Pi, etc.) and x86-64 (Intel/AMD). VSA operations such as bind, bundle, dot product, and permute are compiled to native machine code at runtime, with compiled functions cached for reuse. See JIT Compilation Performance for architecture-specific results.
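For readers unfamiliar with these operations, here is a minimal reference sketch of standard VSA semantics in Python. It assumes bipolar/ternary hypervectors; the function names and exact semantics are illustrative, not Trinity's API.

```python
# Reference semantics for the VSA operations the JIT compiles to native code.
# Assumes hypervector elements in {-1, 0, +1}.

def bind(a, b):
    """Elementwise product: associates two hypervectors (self-inverse for +/-1)."""
    return [x * y for x, y in zip(a, b)]

def bundle(*vectors):
    """Sign of the elementwise sum: superposes several hypervectors."""
    sums = [sum(col) for col in zip(*vectors)]
    return [1 if s > 0 else -1 if s < 0 else 0 for s in sums]

def dot(a, b):
    """Similarity between two hypervectors."""
    return sum(x * y for x, y in zip(a, b))

def permute(a, shift=1):
    """Cyclic rotation: encodes sequence/position information."""
    return a[-shift:] + a[:-shift]

a = [1, -1, 1, -1]
b = [1, 1, -1, -1]
print(bind(a, b))   # [1, -1, -1, 1]
print(dot(a, a))    # 4 (identical vectors -> maximal similarity)
print(permute(a))   # [-1, 1, -1, 1]
```

All four are elementwise or reduction loops over fixed-width integer data, which is why they compile to short, cache-friendly native kernels.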

Memory Efficiency

The framework provides multiple memory representations optimized for different use cases: HybridBigInt with lazy packed/unpacked conversion, bit-packed trit arrays, and sparse COO-format vectors for data with many zeros. A 10,000-dimensional vector that would consume 40KB in float32 fits in roughly 2.5KB using packed ternary encoding. See Memory Efficiency for a detailed breakdown.
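The ~2.5KB figure corresponds to 2 bits per trit (10,000 trits × 2 bits = 2,500 bytes). A minimal sketch of such a packing, with an encoding chosen for illustration (not necessarily Trinity's exact bit layout):

```python
# Sketch: bit-packing trits at 2 bits each.
# Encoding (illustrative): 0b00 -> 0, 0b01 -> +1, 0b10 -> -1.

ENC = {0: 0b00, 1: 0b01, -1: 0b10}
DEC = {v: k for k, v in ENC.items()}

def pack(trits):
    """Pack a list of trits into bytes, 4 trits per byte."""
    out = bytearray((len(trits) + 3) // 4)
    for i, t in enumerate(trits):
        out[i // 4] |= ENC[t] << (2 * (i % 4))
    return bytes(out)

def unpack(data, n):
    """Recover n trits from packed bytes."""
    return [DEC[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]

trits = [1, -1, 0, 1, 0, 0, -1, 1, 1, 0]
assert unpack(pack(trits), len(trits)) == trits
print(len(pack([0] * 10_000)))  # 2500 bytes vs 40,000 bytes as float32
```

Denser schemes exist (e.g. 5 trits per byte, since 3^5 = 243 ≤ 256, approaching the 1.58-bit entropy limit), trading decode cost for footprint.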

Competitor Comparison

How does Trinity stack up against Groq, GPT-4, and other LLM providers? Trinity offers 35-52 tok/s on CPU with self-hosted costs of $0.01-0.35/hr, compared to cloud providers charging per-token fees. See Competitor Comparison for detailed benchmarks and cost analysis.

Ternary Arithmetic Advantage

The mathematical basis for ternary efficiency comes from information theory. The radix that minimizes representation cost (radix economy) is Euler's number (e ≈ 2.718), and 3 is the closest integer. Each trit carries log2(3) ≈ 1.58 bits of information, compared to 1 bit per binary digit. Ternary representations therefore achieve higher information density per storage unit, which translates directly into reduced memory footprint and bandwidth consumption in real workloads.
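The claim can be checked numerically. Radix economy is commonly modeled as r / ln(r), which is minimized at r = e; among integers, base 3 comes closest. A quick sketch:

```python
# Radix economy r / ln(r): cost (digit count x states per digit) per unit
# of information. Minimized at r = e, so 3 is the best integer radix.
import math

def radix_economy(r):
    return r / math.log(r)

for r in (2, 3, 4, math.e):
    print(f"r = {r:.3f}  economy = {radix_economy(r):.4f}")
# Base 3 (~2.731) beats base 2 (~2.885) and base 4 (~2.885),
# and sits closest to the optimum at e (~2.718).

print(math.log2(3))  # ~1.585 bits of information per trit
```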