Cortex Performance
All benchmarks were measured with vitest bench on the included benchmark suite. To run them yourself:

    npm run bench

Distance Calculations
384-dimensional vectors (all-MiniLM-L6-v2 embedding size).
| Operation | Throughput | Mean Latency | P99 |
|---|---|---|---|
| cosineDistance (single pair) | 45,965 ops/s | 21.8 μs | 34.1 μs |
| euclideanDistance (single pair) | 46,320 ops/s | 21.6 μs | n/a |
| cosineDistanceSq8 (quantized pair) | 3,715,855 ops/s | 0.27 μs | 0.3 μs |
| cosineDistanceSq8Batch (1K vectors) | 2,142 ops/s | 467 μs | 1.1 ms |
| cosineDistanceBatch (1K vectors) | 77 ops/s | 13.0 ms | n/a |
Key insight: SQ8 quantized distance is 81x faster than full-precision for single-pair comparisons. For batch operations, SQ8 batch is 28x faster than full-precision batch.
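What the two single-pair rows compute can be sketched in TypeScript. This is an illustration only, not Cortex's native implementation: the helper names `cosineDistanceF32` and `cosineDistanceI8` are hypothetical, and the quantized variant assumes symmetric signed 8-bit codes (zero maps to zero), so the cosine of the codes approximates the cosine of the original vectors.

```typescript
// Hypothetical helpers for illustration; not Cortex's native implementation.

/** Cosine distance (1 - cosine similarity) over full-precision vectors. */
function cosineDistanceF32(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA * normB);
  // Treat a zero vector as maximally distant instead of dividing by zero.
  return denom === 0 ? 1 : 1 - dot / denom;
}

/**
 * Same formula over signed 8-bit codes. Assumes symmetric quantization,
 * where cosine is unaffected by each vector's positive scale factor.
 * Each element occupies 1 byte instead of 4.
 */
function cosineDistanceI8(a: Int8Array, b: Int8Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA * normB);
  return denom === 0 ? 1 : 1 - dot / denom;
}
```

The two loops are identical in shape; the difference is the element width, which is what a native SIMD implementation can exploit by packing more elements per instruction.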
Quantization
| Operation | Throughput | Mean Latency |
|---|---|---|
| quantizeSq8 (384-dim) | 74,020 ops/s | 13.5 μs |
| dequantizeSq8 (384-dim) | 53,440 ops/s | 18.7 μs |
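For readers unfamiliar with 8-bit scalar quantization, the conversion timed above can be sketched as follows. The layout is an assumption made for illustration (symmetric signed codes with one scale per vector); Cortex's actual SQ8 format may differ, and the names `Sq8Sketch`, `quantizeSq8Sketch`, and `dequantizeSq8Sketch` are hypothetical.

```typescript
// Hypothetical SQ8 layout for illustration: one signed byte per dimension
// plus a single per-vector scale. Cortex's real format may differ.
interface Sq8Sketch {
  codes: Int8Array;
  scale: number; // value represented by one quantization step
}

function quantizeSq8Sketch(v: Float32Array): Sq8Sketch {
  // Find the largest magnitude so the range [-maxAbs, maxAbs] maps to [-127, 127].
  let maxAbs = 0;
  for (let i = 0; i < v.length; i++) {
    const m = Math.abs(v[i]);
    if (m > maxAbs) maxAbs = m;
  }
  const scale = maxAbs > 0 ? maxAbs / 127 : 1;
  const codes = new Int8Array(v.length);
  for (let i = 0; i < v.length; i++) {
    // Clamp to guard against floating-point overshoot at the range edges.
    codes[i] = Math.max(-127, Math.min(127, Math.round(v[i] / scale)));
  }
  return { codes, scale };
}

function dequantizeSq8Sketch(q: Sq8Sketch): Float32Array {
  const out = new Float32Array(q.codes.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = q.codes[i] * q.scale;
  }
  return out;
}
```

Under this layout a 384-dimensional vector shrinks from 1,536 bytes of float32 data to 384 bytes of codes plus one scale, and each reconstructed value is off by at most half a quantization step.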
Aggregation Engine
| Operation | Throughput | Mean Latency |
|---|---|---|
| incrementalUpdate (1K entities, 3 metrics) | 809 ops/s | 1.2 ms |
| rebuildAggregate (10K entities, 5 groups) | 475 ops/s | 2.1 ms |
| rebuildAggregate (100K entities, Rayon) | 66 ops/s | 15.2 ms |
| queryAggregate (1K groups, sort + paginate) | 986 ops/s | 1.0 ms |
| computeGroupKey (10K entities, time bucketing) | 146 ops/s | 6.8 ms |
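Rebuild time scales roughly linearly with entity count: in the table above, going from 10K to 100K entities raises mean latency from 2.1 ms to 15.2 ms, about 7x for 10x the entities. Rayon parallelism activates above 1,000 entities.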
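The difference between a full rebuild and an incremental update can be sketched in a few lines. This is a single-threaded illustration under assumed data shapes, not the Cortex engine: the `Entity` and `GroupStats` types, the hourly bucket, and all function names are hypothetical.

```typescript
// Hypothetical entity and aggregate shapes, for illustration only.
interface Entity {
  timestampMs: number;
  value: number;
}
interface GroupStats {
  count: number;
  sum: number;
}

const HOUR_MS = 60 * 60 * 1000;

/** Time-bucketed group key: start of the bucket containing the timestamp. */
function groupKeySketch(e: Entity, bucketMs: number = HOUR_MS): number {
  return Math.floor(e.timestampMs / bucketMs) * bucketMs;
}

/** Incremental update: touches only the group of the added or removed entity. */
function applyDeltaSketch(
  groups: Map<number, GroupStats>,
  e: Entity,
  sign: 1 | -1,
): void {
  const key = groupKeySketch(e);
  const g = groups.get(key) ?? { count: 0, sum: 0 };
  g.count += sign;
  g.sum += sign * e.value;
  if (g.count === 0) {
    groups.delete(key); // drop groups that became empty
  } else {
    groups.set(key, g);
  }
}

/** Full rebuild: one pass over every entity, so cost grows with entity count. */
function rebuildSketch(entities: Entity[]): Map<number, GroupStats> {
  const groups = new Map<number, GroupStats>();
  for (const e of entities) {
    applyDeltaSketch(groups, e, 1);
  }
  return groups;
}
```

A rebuild visits each entity once, which is consistent with near-linear scaling; an incremental update does a constant amount of work per changed entity regardless of how many entities exist.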
Serialization
| Operation | Throughput | Mean Latency |
|---|---|---|
| msgpackEncode (1K entities) | 235 ops/s | 4.3 ms |
| msgpackDecode (1K entities) | 355 ops/s | 2.8 ms |
| msgpackEncodeBatch (1K entities) | 290 ops/s | 3.4 ms |
| msgpackDecodeBatch (1K entities) | 356 ops/s | 2.8 ms |
| RoaringBitmap32.create (10K elements) | 3,067 ops/s | 326 μs |
| RoaringBitmap32.serialize (10K elements) | 361,258 ops/s | 2.8 μs |
| RoaringBitmap32.deserialize (10K elements) | 548,714 ops/s | 1.8 μs |
| RoaringBitmap32.and (two 10K bitmaps) | 1,379 ops/s | 725 μs |
| RoaringBitmap32.or (two 10K bitmaps) | 1,219 ops/s | 820 μs |
| encodeConnections (1K lists) | 913 ops/s | 1.1 ms |
| decodeConnections (1K lists) | 1,078 ops/s | 928 μs |
Hardware Recommendations
- CPU: Multi-core for Rayon parallel rebuild. x86_64 for SIMD distance.
- Memory: Cortex native allocations are tracked and reported via the cache subsystem.
- Storage: SSD recommended for the mmap vector store. NVMe gets the most out of the disk locality produced by graph-aware compaction.
Running Benchmarks

    # Run all benchmarks
    npm run bench
    # Run specific benchmark
    npx vitest bench src/benchmarks/distance.bench.ts
    # Run with verbose output
    npx vitest bench --reporter=verbose

Benchmarks use vitest bench mode and run multiple iterations to produce stable statistics. Results include Hz (ops/sec), min, max, mean, P75, P99, P99.5, P99.9, and relative margin of error (RME).
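To make the reported statistics concrete, here is a minimal sketch of how such a summary can be derived from raw per-iteration timings. It is not vitest's code: the `summarizeSketch` helper is hypothetical, the percentile uses the nearest-rank method, and the margin of error uses a normal approximation (critical value 1.96), so vitest's exact figures may differ slightly.

```typescript
// Hypothetical helper for illustration; vitest computes its statistics internally.
interface BenchSummarySketch {
  hz: number;   // operations per second
  mean: number; // mean latency in milliseconds
  min: number;
  max: number;
  p99: number;
  rme: number;  // relative margin of error, in percent
}

function summarizeSketch(samplesMs: number[]): BenchSummarySketch {
  if (samplesMs.length < 2) throw new Error("need at least two samples");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((s, x) => s + x, 0) / n;
  // Sample variance (n - 1 in the denominator).
  const variance = sorted.reduce((s, x) => s + (x - mean) * (x - mean), 0) / (n - 1);
  const standardError = Math.sqrt(variance / n);
  // Assumption: normal approximation for a 95% confidence interval.
  const marginOfError = 1.96 * standardError;
  // Nearest-rank percentile: smallest sample with at least 99% at or below it.
  const p99Index = Math.ceil(0.99 * n) - 1;
  return {
    hz: 1000 / mean,
    mean,
    min: sorted[0],
    max: sorted[n - 1],
    p99: sorted[p99Index],
    rme: (marginOfError / mean) * 100,
  };
}
```

Hz is simply the reciprocal of the mean latency, which is why a mean of 21.8 μs corresponds to roughly 46,000 ops/s. A low RME indicates that the mean is stable across iterations.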