Cortex Performance

All benchmarks were measured with vitest bench on the included benchmark suite. Run them yourself with:

npm run bench

Distance Calculations

All distance benchmarks use 384-dimensional vectors (the all-MiniLM-L6-v2 embedding size).

Operation	Throughput	Mean Latency	P99
`cosineDistance` (single pair)	45,965 ops/s	21.8 μs	34.1 μs
`euclideanDistance` (single pair)	46,320 ops/s	21.6 μs	—
`cosineDistanceSq8` (quantized pair)	3,715,855 ops/s	0.27 μs	0.3 μs
`cosineDistanceSq8Batch` (1K vectors)	2,142 ops/s	467 μs	1.1 ms
`cosineDistanceBatch` (1K vectors)	77 ops/s	13.0 ms	—

Key insight: for single-pair comparisons, SQ8 quantized distance is about 81x faster than full-precision cosine distance (3,715,855 vs. 45,965 ops/s). For 1K-vector batches, the SQ8 batch path is about 28x faster than the full-precision batch (2,142 vs. 77 ops/s).

Quantization

Operation	Throughput	Mean Latency
`quantizeSq8` (384-dim)	74,020 ops/s	13.5 μs
`dequantizeSq8` (384-dim)	53,440 ops/s	18.7 μs
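
As a rough illustration of what an SQ8 round trip involves, the sketch below maps each component linearly from the vector's [min, max] range onto one byte and back. The per-vector min/max scheme and the names `Sq8Vector`, `quantize`, and `dequantize` are assumptions made for this example; Cortex's `quantizeSq8` and `dequantizeSq8` may use a different layout or scaling.

```typescript
// Hypothetical SQ8 sketch: per-vector min/max linear scaling, one byte per dimension.
// This is an illustration only, not Cortex's actual quantizeSq8 format.
interface Sq8Vector {
  codes: Uint8Array; // quantized components, 0..255
  min: number;       // value that code 0 maps back to
  scale: number;     // width of one quantization step
}

function quantize(v: Float32Array): Sq8Vector {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < v.length; i++) {
    if (v[i] < min) min = v[i];
    if (v[i] > max) max = v[i];
  }
  // Guard against a constant (or empty) vector, where max is not above min.
  const scale = max > min ? (max - min) / 255 : 1;
  const codes = new Uint8Array(v.length);
  for (let i = 0; i < v.length; i++) {
    codes[i] = Math.round((v[i] - min) / scale);
  }
  return { codes, min, scale };
}

function dequantize(q: Sq8Vector): Float32Array {
  const out = new Float32Array(q.codes.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = q.min + q.codes[i] * q.scale;
  }
  return out;
}
```

Under this scheme the round-trip error per component is bounded by half a quantization step (`scale / 2`), which is the precision traded for the 4x smaller representation.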

Aggregation Engine

Operation	Throughput	Mean Latency
`incrementalUpdate` (1K entities, 3 metrics)	809 ops/s	1.2 ms
`rebuildAggregate` (10K entities, 5 groups)	475 ops/s	2.1 ms
`rebuildAggregate` (100K entities, Rayon)	66 ops/s	15.2 ms
`queryAggregate` (1K groups, sort + paginate)	986 ops/s	1.0 ms
`computeGroupKey` (10K entities, time bucketing)	146 ops/s	6.8 ms

Rebuild cost grows at most linearly with entity count: in the table above, 10x more entities (10K to 100K) costs about 7x the mean latency. Rayon parallelism activates above 1,000 entities.
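
To make the difference between the incremental and rebuild paths concrete, here is a minimal single-threaded sketch of a count/sum aggregate. A rebuild walks every entity, so its cost grows with the entity count, while an incremental update touches only the groups named by the changed entity. All names here (`Entity`, `GroupStats`, `rebuild`, `applyChange`) are hypothetical and are not Cortex's API; the real engine also parallelizes rebuilds with Rayon, which this sketch omits.

```typescript
// Hypothetical aggregate sketch; names and shapes are assumptions, not Cortex's API.
interface Entity { group: string; value: number; }
interface GroupStats { count: number; sum: number; }
type Aggregate = Record<string, GroupStats>;

function addEntity(agg: Aggregate, e: Entity): void {
  const g = agg[e.group] || (agg[e.group] = { count: 0, sum: 0 });
  g.count += 1;
  g.sum += e.value;
}

function removeEntity(agg: Aggregate, e: Entity): void {
  const g = agg[e.group];
  if (!g) return;
  g.count -= 1;
  g.sum -= e.value;
  // Drop empty groups so the result matches a fresh rebuild.
  if (g.count === 0) delete agg[e.group];
}

// Full rebuild: one pass over every entity.
function rebuild(entities: Entity[]): Aggregate {
  const agg: Aggregate = {};
  for (const e of entities) addEntity(agg, e);
  return agg;
}

// Incremental update: pass null for `before` on insert, null for `after` on delete.
function applyChange(agg: Aggregate, before: Entity | null, after: Entity | null): void {
  if (before) removeEntity(agg, before);
  if (after) addEntity(agg, after);
}
```

A useful property to test in any such design is that a sequence of incremental changes leaves the aggregate identical to a rebuild over the final entity set.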

Column Store — orderBy Sort

The native column store backs Brainy's find({ orderBy, limit }). Its sortTopK (O(N log K) k-way merge over sorted segments) scales near-linearly with entity count.

Operation	Entities	Mean Latency
`sortTopK` (k=100)	1,000	~0.028 ms
`sortTopK` (k=100)	10,000	~0.21 ms
`sortTopK` (k=100)	100,000	~3.3 ms

Measured on an AMD Ryzen 9 7950X3D with Node 22 and a release build of the native module (src/benchmarks/scaling.bench.test.ts). Growth from 1k to 100k entities is ~125x against a 100x linear ideal, nowhere near the ~10,000x an O(N²) sort would show. scaling.bench.test.ts runs in CI and fails the build if the growth ratio ever turns super-linear.
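
The linear-versus-quadratic comparison can be reproduced from the table with a short calculation. The sketch below uses the rounded table values; `growthRatio` and `scalingExponent` are hypothetical helpers written for this example, not code from scaling.bench.test.ts, and the threshold the CI guard actually uses is not shown here.

```typescript
// Hypothetical helpers for reasoning about scaling; not taken from scaling.bench.test.ts.
interface Sample { entities: number; meanMs: number; }

// How many times slower the larger run is than the smaller one.
function growthRatio(small: Sample, large: Sample): number {
  return large.meanMs / small.meanMs;
}

// Empirical exponent p in "latency grows like N^p": 1 is linear, 2 is quadratic.
function scalingExponent(small: Sample, large: Sample): number {
  return Math.log(growthRatio(small, large)) / Math.log(large.entities / small.entities);
}

// Rounded sortTopK (k=100) figures from the table above.
const small: Sample = { entities: 1000, meanMs: 0.028 };
const large: Sample = { entities: 100000, meanMs: 3.3 };

const ratio = growthRatio(small, large);
const exponent = scalingExponent(small, large);
console.log(ratio.toFixed(0) + "x growth, exponent " + exponent.toFixed(2));
```

With the rounded table values this gives a ratio of roughly 118x and an exponent of about 1.04 (the unrounded measurements presumably account for the ~125x quoted above). Either way the exponent sits close to 1 and far from the 2.0 of a quadratic sort.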

Cross-Language Consistency

Cortex is a pure accelerator: every native result is byte-for-byte equal to the Brainy JavaScript baseline it replaces, so query results are identical with or without the plugin installed. This is enforced by a 104-test cross-language parity suite (src/native/crossLanguageParity.test.ts) covering tokenization, value normalization (the UTF-16/ASCII boundary), code-point string collation, SQ8 quantized distance, top-K ranking, and roaring/msgpack round-trips against Brainy's golden outputs.
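
Code-point collation is a good example of why such a parity suite is needed. JavaScript's built-in string comparison orders by UTF-16 code unit, whereas an implementation that compares UTF-8 bytes orders by Unicode code point, and the two disagree once characters above U+FFFF are involved. The sketch below shows a code-point comparator; `compareByCodePoint` and `codePointAtIndex` are hypothetical names for illustration, and this is not necessarily how Brainy or Cortex implements collation.

```typescript
// Read the code point starting at UTF-16 index i, combining a surrogate pair if present.
// (String.prototype.codePointAt does the same on ES2015+ runtimes.)
function codePointAtIndex(s: string, i: number): number {
  const hi = s.charCodeAt(i);
  if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < s.length) {
    const lo = s.charCodeAt(i + 1);
    if (lo >= 0xdc00 && lo <= 0xdfff) {
      return (hi - 0xd800) * 0x400 + (lo - 0xdc00) + 0x10000;
    }
  }
  return hi;
}

// Hypothetical comparator: orders strings by code point instead of UTF-16 code unit.
function compareByCodePoint(a: string, b: string): number {
  const n = Math.min(a.length, b.length);
  let i = 0;
  while (i < n) {
    const ca = codePointAtIndex(a, i);
    const cb = codePointAtIndex(b, i);
    if (ca !== cb) return ca < cb ? -1 : 1;
    // Code points above U+FFFF occupy two UTF-16 code units.
    i += ca > 0xffff ? 2 : 1;
  }
  // One string is a prefix of the other; the shorter one sorts first.
  return a.length - b.length;
}

const bmpHigh = "\uFF5E";      // U+FF5E, a single UTF-16 code unit
const astral = "\uD83D\uDE00"; // U+1F600, encoded as a surrogate pair

const defaultOrder = [bmpHigh, astral].sort();                     // UTF-16 order
const codePointOrder = [bmpHigh, astral].sort(compareByCodePoint); // code-point order
```

Here the default sort places U+1F600 first, because its leading surrogate 0xD83D is smaller than 0xFF5E, while the code-point comparator places it last. A parity test that pins down inputs like these catches the mismatch before it can change query ordering.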

Serialization

Operation	Throughput	Mean Latency
`msgpackEncode` (1K entities)	235 ops/s	4.3 ms
`msgpackDecode` (1K entities)	355 ops/s	2.8 ms
`msgpackEncodeBatch` (1K entities)	290 ops/s	3.4 ms
`msgpackDecodeBatch` (1K entities)	356 ops/s	2.8 ms
`RoaringBitmap32.create` (10K elements)	3,067 ops/s	326 μs
`RoaringBitmap32.serialize` (10K elements)	361,258 ops/s	2.8 μs
`RoaringBitmap32.deserialize` (10K elements)	548,714 ops/s	1.8 μs
`RoaringBitmap32.and` (two 10K bitmaps)	1,379 ops/s	725 μs
`RoaringBitmap32.or` (two 10K bitmaps)	1,219 ops/s	820 μs
`encodeConnections` (1K lists)	913 ops/s	1.1 ms
`decodeConnections` (1K lists)	1,078 ops/s	928 μs

Hardware Recommendations

CPU: Multi-core for Rayon parallel rebuild. x86_64 for SIMD distance.
Memory: Cortex native allocations are tracked and reported via the cache subsystem.
Storage: SSD (NVMe ideal) for the memory-mapped HNSW index and graph-adjacency SSTables — the kernel keeps the hot working set in RAM and pages the rest from disk. See Scaling & Resource Management.

Running Benchmarks

# Run all benchmarks
npm run bench

# Run specific benchmark
npx vitest bench src/benchmarks/distance.bench.ts

# Run with verbose output
npx vitest bench --reporter=verbose

Benchmarks use vitest bench mode and run multiple iterations to produce stable statistics. Results include Hz (ops/sec), min, max, mean, P75, P99, P99.5, P99.9, and relative margin of error (RME).