Cortex Performance

All benchmarks were measured with vitest bench on the included benchmark suite. To run them yourself:

npm run bench

Distance Calculations

All measurements use 384-dimensional vectors (the embedding size of all-MiniLM-L6-v2).

Operation                            Throughput       Mean Latency  P99
cosineDistance (single pair)         45,965 ops/s     21.8 μs       34.1 μs
euclideanDistance (single pair)      46,320 ops/s     21.6 μs       -
cosineDistanceSq8 (quantized pair)   3,715,855 ops/s  0.27 μs       0.3 μs
cosineDistanceSq8Batch (1K vectors)  2,142 ops/s      467 μs        1.1 ms
cosineDistanceBatch (1K vectors)     77 ops/s         13.0 ms       -

Key insight: SQ8 quantized distance is about 81x faster than full-precision distance for single-pair comparisons (3,715,855 vs. 45,965 ops/s). For batch operations over 1K vectors, the SQ8 batch is about 28x faster than the full-precision batch (2,142 vs. 77 ops/s).
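
The speedup figures follow directly from the throughput column. A minimal sketch of the arithmetic, with the numbers copied from the table (the object and function names are illustrative, not part of Cortex):

```typescript
// Throughput figures (ops/s) copied from the distance table.
const throughput = {
  cosineDistance: 45_965,
  cosineDistanceSq8: 3_715_855,
  cosineDistanceBatch: 77,
  cosineDistanceSq8Batch: 2_142,
};

// Speedup = faster throughput divided by slower throughput.
function speedup(fastOpsPerSec: number, slowOpsPerSec: number): number {
  return fastOpsPerSec / slowOpsPerSec;
}

const singlePair = speedup(throughput.cosineDistanceSq8, throughput.cosineDistance);
const batch = speedup(throughput.cosineDistanceSq8Batch, throughput.cosineDistanceBatch);

console.log(`single pair: ${singlePair.toFixed(1)}x`); // single pair: 80.8x
console.log(`batch: ${batch.toFixed(1)}x`);            // batch: 27.8x
```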

Quantization

Operation                Throughput    Mean Latency
quantizeSq8 (384-dim)    74,020 ops/s  13.5 μs
dequantizeSq8 (384-dim)  53,440 ops/s  18.7 μs
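
For readers unfamiliar with SQ8, the sketch below shows one common form of 8-bit scalar quantization: each component is mapped onto 0..255 using the vector's own min and max. This is an assumption for illustration only. Cortex's actual quantizeSq8 and dequantizeSq8 may use a different scheme, and the `Sq8Vector` type and `...Sketch` function names are hypothetical.

```typescript
// Hypothetical container for a scalar-quantized vector; not Cortex's actual type.
interface Sq8Vector {
  codes: Uint8Array; // one byte per dimension
  min: number;       // smallest component of the original vector
  scale: number;     // width of one quantization step
}

// Map each float component onto an integer code in 0..255.
function quantizeSq8Sketch(v: Float32Array): Sq8Vector {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < v.length; i++) {
    if (v[i] < min) min = v[i];
    if (v[i] > max) max = v[i];
  }
  // Guard against constant (or empty) vectors, where max is not above min.
  const scale = max > min ? (max - min) / 255 : 1;
  const codes = new Uint8Array(v.length);
  for (let i = 0; i < v.length; i++) {
    codes[i] = Math.round((v[i] - min) / scale);
  }
  return { codes, min, scale };
}

// Reconstruct approximate floats; the error per component is at most about scale / 2.
function dequantizeSq8Sketch(q: Sq8Vector): Float32Array {
  const out = new Float32Array(q.codes.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = q.min + q.codes[i] * q.scale;
  }
  return out;
}

const original = new Float32Array([-1, 0, 0.5, 1]);
const quantized = quantizeSq8Sketch(original);
const restored = dequantizeSq8Sketch(quantized);
console.log(quantized.codes, restored);
```

Under this scheme a 384-dimensional vector shrinks from 1,536 bytes of float32 data to 384 bytes of codes plus two parameters, at the cost of a small reconstruction error.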

Aggregation Engine

Operation                                       Throughput  Mean Latency
incrementalUpdate (1K entities, 3 metrics)      809 ops/s   1.2 ms
rebuildAggregate (10K entities, 5 groups)       475 ops/s   2.1 ms
rebuildAggregate (100K entities, Rayon)         66 ops/s    15.2 ms
queryAggregate (1K groups, sort + paginate)     986 ops/s   1.0 ms
computeGroupKey (10K entities, time bucketing)  146 ops/s   6.8 ms

Rebuild time scales roughly linearly with entity count. Rayon parallelism activates above 1,000 entities.
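
As a rough check on the scaling, the two rebuildAggregate rows can be reduced to a per-entity cost. The rows come from different benchmark configurations, so this is only an order-of-magnitude comparison, not a controlled measurement. The names below are illustrative.

```typescript
// Mean latencies copied from the rebuildAggregate rows of the table.
const rebuildRows = [
  { entities: 10_000, meanMs: 2.1 },
  { entities: 100_000, meanMs: 15.2 },
];

// Convert a mean latency in milliseconds to nanoseconds per entity.
function nsPerEntity(meanMs: number, entities: number): number {
  return (meanMs * 1e6) / entities;
}

const perEntity = rebuildRows.map((r) => nsPerEntity(r.meanMs, r.entities));
rebuildRows.forEach((r, i) => {
  console.log(`${r.entities} entities: ${perEntity[i].toFixed(0)} ns/entity`);
});
// 10000 entities: 210 ns/entity
// 100000 entities: 152 ns/entity
```

The per-entity cost stays within the same order of magnitude across a 10x increase in entity count, which is consistent with roughly linear scaling.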

Serialization

Operation                                   Throughput     Mean Latency
msgpackEncode (1K entities)                 235 ops/s      4.3 ms
msgpackDecode (1K entities)                 355 ops/s      2.8 ms
msgpackEncodeBatch (1K entities)            290 ops/s      3.4 ms
msgpackDecodeBatch (1K entities)            356 ops/s      2.8 ms
RoaringBitmap32.create (10K elements)       3,067 ops/s    326 μs
RoaringBitmap32.serialize (10K elements)    361,258 ops/s  2.8 μs
RoaringBitmap32.deserialize (10K elements)  548,714 ops/s  1.8 μs
RoaringBitmap32.and (two 10K bitmaps)       1,379 ops/s    725 μs
RoaringBitmap32.or (two 10K bitmaps)        1,219 ops/s    820 μs
encodeConnections (1K lists)                913 ops/s      1.1 ms
decodeConnections (1K lists)                1,078 ops/s    928 μs

Hardware Recommendations

  • CPU: Multi-core, so that Rayon can parallelize rebuilds; x86_64 for SIMD distance calculations.
  • Memory: Cortex native allocations are tracked and reported via the cache subsystem.
  • Storage: SSD recommended for the mmap vector store. NVMe gets the most benefit from the disk locality that graph-aware compaction provides.

Running Benchmarks

# Run all benchmarks
npm run bench

# Run specific benchmark
npx vitest bench src/benchmarks/distance.bench.ts

# Run with verbose output
npx vitest bench --reporter=verbose

Benchmarks use vitest bench mode and run multiple iterations to produce stable statistics. Results include Hz (ops/sec), min, max, mean, P75, P99, P99.5, P99.9, and relative margin of error (RME).
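
To make the reported columns concrete, here is a minimal sketch of how such statistics can be derived from raw per-iteration timings. It assumes nearest-rank percentiles and a normal-approximation critical value of 1.96 for the margin of error. The benchmarking library may use a different percentile method or critical value, so small differences from its output are expected. All names here are hypothetical.

```typescript
interface BenchStatsSketch {
  hz: number;   // operations per second
  min: number;  // fastest sample, ms
  max: number;  // slowest sample, ms
  mean: number; // average sample, ms
  p75: number;
  p99: number;
  rme: number;  // relative margin of error, percent
}

// Nearest-rank percentile over an ascending-sorted array (assumed method).
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil(p * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

function summarize(samplesMs: number[]): BenchStatsSketch {
  if (samplesMs.length < 2) throw new Error("need at least two samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  // Sample variance (n - 1 denominator), then standard error of the mean.
  const variance = sorted.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1);
  const sem = Math.sqrt(variance / n);
  const marginOfError = 1.96 * sem; // assumed 95% normal approximation
  return {
    hz: 1000 / mean,
    min: sorted[0],
    max: sorted[n - 1],
    mean,
    p75: percentile(sorted, 0.75),
    p99: percentile(sorted, 0.99),
    rme: (marginOfError / mean) * 100,
  };
}

const stats = summarize([1, 2, 3, 4]);
console.log(stats);
```

A high RME means the samples were noisy relative to the mean, so throughput numbers from that run should be treated with caution.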