
Cortex Performance

All benchmarks were measured with vitest bench on the included benchmark suite. Run them yourself with:

npm run bench

Distance Calculations

384-dimensional vectors (all-MiniLM-L6-v2 embedding size).

Operation                            Throughput       Mean Latency  P99
cosineDistance (single pair)         45,965 ops/s     21.8 μs       34.1 μs
euclideanDistance (single pair)      46,320 ops/s     21.6 μs       -
cosineDistanceSq8 (quantized pair)   3,715,855 ops/s  0.27 μs       0.3 μs
cosineDistanceSq8Batch (1K vectors)  2,142 ops/s      467 μs        1.1 ms
cosineDistanceBatch (1K vectors)     77 ops/s         13.0 ms       -

Key insight: SQ8 quantized distance is 81x faster than full-precision for single-pair comparisons. For batch operations, SQ8 batch is 28x faster than full-precision batch.
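The two ratios above follow directly from the throughput column. A minimal sketch of that arithmetic, where speedup is a hypothetical helper written for this page and not a Cortex API:

```typescript
// Hypothetical helper (not part of Cortex): ratio of two throughputs in ops/s.
function speedup(fastOpsPerSec: number, slowOpsPerSec: number): number {
  return fastOpsPerSec / slowOpsPerSec;
}

// Throughputs copied from the table above.
const singlePair = speedup(3715855, 45965); // cosineDistanceSq8 vs cosineDistance
const batch = speedup(2142, 77);            // cosineDistanceSq8Batch vs cosineDistanceBatch

console.log(`single pair: ${singlePair.toFixed(1)}x, batch: ${batch.toFixed(1)}x`);
```

Rounded to whole numbers, these are the 81x and 28x figures quoted above.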

Quantization

Operation                Throughput    Mean Latency
quantizeSq8 (384-dim)    74,020 ops/s  13.5 μs
dequantizeSq8 (384-dim)  53,440 ops/s  18.7 μs
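For readers unfamiliar with SQ8, the sketch below shows one common form of 8-bit scalar quantization: each vector keeps its minimum, a scale, and one byte per dimension. This is an illustration under stated assumptions only. The names quantizeSq8Sketch and dequantizeSq8Sketch are hypothetical, and Cortex's native quantizeSq8 may use a different layout or rounding rule.

```typescript
// Assumed layout for illustration: per-vector min and scale plus one byte per dimension.
interface Sq8Vector {
  codes: Uint8Array;
  min: number;
  scale: number;
}

// Map each float onto 0..255 relative to the vector's own [min, max] range.
function quantizeSq8Sketch(v: Float32Array): Sq8Vector {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < v.length; i++) {
    if (v[i] < min) min = v[i];
    if (v[i] > max) max = v[i];
  }
  // A constant (or empty) vector has no range; use scale 1 to avoid dividing by zero.
  const scale = max > min ? (max - min) / 255 : 1;
  const codes = new Uint8Array(v.length);
  for (let i = 0; i < v.length; i++) {
    const code = Math.round((v[i] - min) / scale);
    codes[i] = Math.min(255, Math.max(0, code)); // clamp against rounding drift
  }
  return { codes, min, scale };
}

// Reverse the mapping; the result differs from the input by at most about scale / 2.
function dequantizeSq8Sketch(q: Sq8Vector): Float32Array {
  const out = new Float32Array(q.codes.length);
  for (let i = 0; i < q.codes.length; i++) {
    out[i] = q.min + q.codes[i] * q.scale;
  }
  return out;
}
```

A 384-dimensional vector stored this way takes 384 bytes plus two numbers instead of 1,536 bytes of float32 data, which is the usual motivation for quantized distance kernels.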

Aggregation Engine

Operation                                       Throughput  Mean Latency
incrementalUpdate (1K entities, 3 metrics)      809 ops/s   1.2 ms
rebuildAggregate (10K entities, 5 groups)       475 ops/s   2.1 ms
rebuildAggregate (100K entities, Rayon)         66 ops/s    15.2 ms
queryAggregate (1K groups, sort + paginate)     986 ops/s   1.0 ms
computeGroupKey (10K entities, time bucketing)  146 ops/s   6.8 ms

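Rebuild scales roughly linearly with entity count: going from 10K to 100K entities (10x) raises mean latency from 2.1 ms to 15.2 ms (about 7x). Rayon parallelism activates above 1,000 entities.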

Column Store — orderBy Sort

The native column store backs Brainy's find({ orderBy, limit }). Its sortTopK (O(N log K) k-way merge over sorted segments) scales near-linearly with entity count.
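To make the O(N log K) bound concrete, here is a minimal sketch of top-K selection with a bounded max-heap, the textbook technique that yields that bound. It is a swapped-in illustration, not Cortex's native code, which merges sorted segments as described above; topKSmallest is a hypothetical name.

```typescript
// Hypothetical helper: returns the k smallest values in ascending order.
// Each of the N inputs costs at most O(log K) heap work, hence O(N log K) overall.
function topKSmallest(values: readonly number[], k: number): number[] {
  if (k <= 0) return [];
  const heap: number[] = []; // max-heap: heap[0] is the largest value kept so far

  const swap = (i: number, j: number): void => {
    const tmp = heap[i];
    heap[i] = heap[j];
    heap[j] = tmp;
  };

  const siftUp = (start: number): void => {
    let i = start;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (heap[parent] >= heap[i]) break;
      swap(parent, i);
      i = parent;
    }
  };

  const siftDown = (start: number): void => {
    let i = start;
    for (;;) {
      const left = 2 * i + 1;
      const right = left + 1;
      let largest = i;
      if (left < heap.length && heap[left] > heap[largest]) largest = left;
      if (right < heap.length && heap[right] > heap[largest]) largest = right;
      if (largest === i) break;
      swap(largest, i);
      i = largest;
    }
  };

  for (const v of values) {
    if (heap.length < k) {
      heap.push(v);
      siftUp(heap.length - 1);
    } else if (v < heap[0]) {
      heap[0] = v; // evict the current largest of the kept values
      siftDown(0);
    }
  }
  return heap.sort((a, b) => a - b);
}
```

With K fixed at 100, the log K factor is a constant, so latency is expected to grow in proportion to N.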

Operation         Entities  Mean Latency
sortTopK (k=100)  1,000     ~0.028 ms
sortTopK (k=100)  10,000    ~0.21 ms
sortTopK (k=100)  100,000   ~3.3 ms

Measured on an AMD Ryzen 9 7950X3D, Node 22, release native build (src/benchmarks/scaling.bench.test.ts). Growth from 1k→100k is ~125x against a 100x linear ideal, nowhere near the ~10,000x an O(N²) sort would show. scaling.bench.test.ts runs in CI and fails the build if the growth ratio ever turns super-linear.
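The growth-ratio comparison can be sketched as below. The latencies are the rounded values from the table, so the ratio lands near 118x; the ~125x quoted above presumably comes from unrounded measurements. The function names and the slack factor are hypothetical, and the actual threshold used in scaling.bench.test.ts is not stated here.

```typescript
// Ratio of mean latencies between the largest and smallest dataset.
function growthRatio(smallMs: number, largeMs: number): number {
  return largeMs / smallMs;
}

// Hypothetical guard: accept anything within `slack` times the linear ideal.
// The slack value of 3 is an assumption, not the threshold the CI test uses.
function isNearLinear(ratio: number, sizeRatio: number, slack = 3): boolean {
  return ratio <= sizeRatio * slack;
}

const sizeRatio = 100000 / 1000;        // 100x more entities
const ratio = growthRatio(0.028, 3.3);  // rounded table values, in ms
const quadraticRatio = sizeRatio ** 2;  // what an O(N²) sort would show: 10,000x

console.log(`growth: ${ratio.toFixed(0)}x, linear ideal: ${sizeRatio}x, quadratic: ${quadraticRatio}x`);
console.log(`near-linear: ${isNearLinear(ratio, sizeRatio)}`);
```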

Cross-Language Consistency

Cortex is a pure accelerator: every native result is byte-for-byte equal to the Brainy JavaScript baseline it replaces, so query results are identical with or without the plugin installed. This is enforced by a 104-test cross-language parity suite (src/native/crossLanguageParity.test.ts) covering tokenization, value normalization (the UTF-16/ASCII boundary), code-point string collation, SQ8 quantized distance, top-K ranking, and roaring/msgpack round-trips against Brainy's golden outputs.
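Code-point collation deserves a parity test because JavaScript's built-in string comparison works on UTF-16 code units, which orders characters above U+FFFF differently from code-point order. A minimal sketch of a code-point comparator follows; compareCodePoints is a hypothetical name, and whether Brainy's baseline is written this way is an assumption.

```typescript
// Hypothetical comparator: orders strings by Unicode code point, not UTF-16 code unit.
function compareCodePoints(a: string, b: string): number {
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    const ca = a.codePointAt(i)!;
    const cb = b.codePointAt(j)!;
    if (ca !== cb) return ca < cb ? -1 : 1;
    // Code points above U+FFFF occupy two UTF-16 code units.
    i += ca > 0xffff ? 2 : 1;
    j += cb > 0xffff ? 2 : 1;
  }
  if (i < a.length) return 1;  // a has characters left, so it sorts after b
  if (j < b.length) return -1; // b has characters left, so it sorts after a
  return 0;
}

const fullwidthTilde = "\uFF5E";   // U+FF5E, a single code unit
const emoji = "\uD83D\uDE00";      // U+1F600, stored as a surrogate pair

// Code-unit order puts the emoji first because 0xD83D < 0xFF5E ...
const codeUnitSaysTildeFirst = fullwidthTilde < emoji;
// ... while code-point order puts the tilde first because 0xFF5E < 0x1F600.
const codePointSaysTildeFirst = compareCodePoints(fullwidthTilde, emoji) < 0;

console.log({ codeUnitSaysTildeFirst, codePointSaysTildeFirst });
```

Any native sort that compares UTF-8 bytes or code points has to be checked against this boundary to stay identical to a JavaScript baseline.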

Serialization

Operation                                   Throughput     Mean Latency
msgpackEncode (1K entities)                 235 ops/s      4.3 ms
msgpackDecode (1K entities)                 355 ops/s      2.8 ms
msgpackEncodeBatch (1K entities)            290 ops/s      3.4 ms
msgpackDecodeBatch (1K entities)            356 ops/s      2.8 ms
RoaringBitmap32.create (10K elements)       3,067 ops/s    326 μs
RoaringBitmap32.serialize (10K elements)    361,258 ops/s  2.8 μs
RoaringBitmap32.deserialize (10K elements)  548,714 ops/s  1.8 μs
RoaringBitmap32.and (two 10K bitmaps)       1,379 ops/s    725 μs
RoaringBitmap32.or (two 10K bitmaps)        1,219 ops/s    820 μs
encodeConnections (1K lists)                913 ops/s      1.1 ms
decodeConnections (1K lists)                1,078 ops/s    928 μs

Hardware Recommendations

  • CPU: Multi-core for Rayon parallel rebuild. x86_64 for SIMD distance.
  • Memory: Cortex native allocations are tracked and reported via the cache subsystem.
  • Storage: SSD (NVMe ideal) for the memory-mapped HNSW index and graph-adjacency SSTables — the kernel keeps the hot working set in RAM and pages the rest from disk. See Scaling & Resource Management.

Running Benchmarks

# Run all benchmarks
npm run bench

# Run specific benchmark
npx vitest bench src/benchmarks/distance.bench.ts

# Run with verbose output
npx vitest bench --reporter=verbose

Benchmarks use vitest bench mode and run multiple iterations to produce stable statistics. Results include Hz (ops/sec), min, max, mean, P75, P99, P99.5, P99.9, and relative margin of error (RME).
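As a rough guide to reading those columns, the sketch below derives the same kinds of statistics from raw per-iteration timings. It is an approximation under stated assumptions, not vitest's own code: the nearest-rank percentile and the 1.96 critical value (a normal approximation) are choices made here for illustration, and summarize is a hypothetical name.

```typescript
interface BenchStats {
  hz: number;   // operations per second
  min: number;  // fastest sample, ms
  max: number;  // slowest sample, ms
  mean: number; // average sample, ms
  p75: number;
  p99: number;
  rme: number;  // relative margin of error, percent
}

// Nearest-rank percentile on an ascending-sorted array (assumed method).
function percentile(sorted: readonly number[], p: number): number {
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function summarize(samplesMs: readonly number[]): BenchStats {
  if (samplesMs.length < 2) throw new Error("need at least two samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  // Sample variance, then standard error of the mean.
  const variance = sorted.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1);
  const sem = Math.sqrt(variance / n);
  const marginOfError = 1.96 * sem; // assumption: ~95% confidence, normal approximation
  return {
    hz: 1000 / mean, // mean is in milliseconds
    min: sorted[0],
    max: sorted[n - 1],
    mean,
    p75: percentile(sorted, 75),
    p99: percentile(sorted, 99),
    rme: (marginOfError / mean) * 100,
  };
}

const stats = summarize([1, 2, 3, 4]);
console.log(stats);
```

The hz and mean columns are reciprocals of each other, which is why a 2.1 ms mean latency corresponds to roughly 475 ops/s in the tables above.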