DiskANN — Billion-Scale Vector Search
Cortex ships a 100% pure-Rust DiskANN implementation that aims to take a single Brainy instance from the ~10 M-vector HNSW comfort zone to 1 B+ vectors at ~5–10 ms search latency on commodity hardware (PROJECTED: design target; awaiting docs/verification-report.md from Piece 9 of the cortex 3.0 release). It's an algorithmic alternative to HNSW, not a competitor service: it lives in the same plugin, behind the same Brainy APIs.
The problem DiskANN solves
HNSW is excellent up to roughly 10 M vectors per machine. Beyond that, two costs compound:
- Memory pressure. At 1 B vectors of 384-dim float32, the vectors alone are ~1.5 TB and the HNSW graph metadata adds another ~2 TB. Even with the cortex 2.4.0 mmap vector backend, the graph traversal pattern faults pages with no spatial locality.
- No locality. HNSW's insertion order has no correlation with traversal order on disk, so every search hop on a cold cache is a fresh ~10 μs page fault.
DiskANN (Subramanya et al., NeurIPS 2019) was designed for exactly this regime:
- Vamana α-pruned graph picks neighbours so that nodes visited together during search end up adjacent on disk — disk locality emerges from construction.
- Product Quantization compresses each vector to M ≤ 16 bytes resident in RAM. At 1 B vectors that's ~16 GB of PQ codes instead of 1.5 TB of full vectors.
- Full vectors on disk are only touched to re-rank the top candidate set, not during the graph walk.
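The RAM figures above follow from simple arithmetic. A minimal sketch, assuming decimal units (1 GB = 1e9 bytes) and one byte per PQ subspace; the function names are illustrative and not part of cortex:

```typescript
// Back-of-envelope sizing for full vectors vs. PQ codes.
const BYTES_PER_F32 = 4

// Bytes needed to hold every full-precision vector.
function fullVectorBytes(nodeCount: number, dim: number): number {
  return nodeCount * dim * BYTES_PER_F32
}

// Bytes needed for PQ codes: one byte per subspace when each
// subspace has at most 256 centroids (8-bit codes).
function pqCodeBytes(nodeCount: number, pqM: number): number {
  return nodeCount * pqM
}

const sizingNodes = 1e9
const fullTB = fullVectorBytes(sizingNodes, 384) / 1e12
const pqGB = pqCodeBytes(sizingNodes, 16) / 1e9
console.log(`full vectors: ${fullTB.toFixed(3)} TB, PQ codes: ${pqGB} GB`)
// prints "full vectors: 1.536 TB, PQ codes: 16 GB"
```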
What cortex delivers
- One contiguous file (header + codebook + PQ codes + Vamana graph + full vectors), mmap-mappable so an SSD-resident billion-vector dataset never has to be copied into RAM.
- Zero-copy section accessors (vectors_f32() returns &[f32] directly into the mapped region).
- Parallel build with rayon. The graph and PQ encoding both parallelize.
- File-backed build adjacency for billion-scale construction (the in-RAM concurrent adjacency would consume ~64 GB of bookkeeping at 1 B nodes; the mmap variant uses atomic-u32 slots with sharded write locks).
- Connectivity-repair pass that guarantees every node is reachable from the entry point. Sequential Vamana provides this via insertion order; parallel Vamana doesn't, so cortex closes the gap explicitly.
- PQ-walk + full-vector re-rank search. Greedy walk uses ADC distance over RAM-resident PQ codes (M-byte table lookups, ~50 ns per hop); re-rank scores ceil(k × paddingFactor) candidates with the exact full-precision distance.
- HNSW-shaped TS wrapper (NativeDiskAnnWrapper) so Brainy's higher layers don't know which engine is underneath.
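The PQ-walk and re-rank step in the list above can be illustrated with a small sketch. This is not cortex's Rust code: the codebook layout, the function names, and the use of squared L2 distance are all assumptions made for the example. It shows how an ADC lookup table turns each approximate distance into M table lookups, and how the exact re-rank only touches ceil(k × paddingFactor) full vectors.

```typescript
// codebook[s][c] is centroid c of subspace s, holding dsub floats (illustrative layout).
type PqCodebook = number[][][]

// Per-query table: squared L2 distance from each query sub-vector
// to every centroid of the matching subspace.
function buildAdcTable(query: number[], codebook: PqCodebook): number[][] {
  const dsub = query.length / codebook.length
  return codebook.map((centroids, s) =>
    centroids.map((centroid) => {
      let d = 0
      for (let j = 0; j < dsub; j++) {
        const diff = query[s * dsub + j] - centroid[j]
        d += diff * diff
      }
      return d
    })
  )
}

// Approximate distance to one encoded vector: one table lookup per subspace.
function adcDistance(table: number[][], code: Uint8Array): number {
  let d = 0
  for (let s = 0; s < code.length; s++) d += table[s][code[s]]
  return d
}

// Exact squared L2 distance on full-precision vectors.
function exactDistanceSq(a: number[], b: number[]): number {
  let d = 0
  for (let i = 0; i < a.length; i++) d += (a[i] - b[i]) * (a[i] - b[i])
  return d
}

// Re-rank: take the first ceil(k * paddingFactor) candidates (assumed already
// sorted by ADC distance), score them exactly, and return the best k ids.
function rerank(
  query: number[],
  candidateIds: number[],
  fullVector: (id: number) => number[],
  k: number,
  paddingFactor: number
): number[] {
  const pool = candidateIds.slice(0, Math.ceil(k * paddingFactor))
  return pool
    .map((id) => ({ id, d: exactDistanceSq(query, fullVector(id)) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, k)
    .map((hit) => hit.id)
}
```

The table is built once per query, so its cost is amortized over every hop of the walk; only the final re-rank reads full vectors.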
Scale envelope, honestly
PROJECTED: design targets, not measured. The table below cites numbers matching the published DiskANN paper (Subramanya et al., NeurIPS 2019, Table 4 on SIFT1B). Cortex's own DiskANN test corpus in CI today is 10k synthetic dim=64 random vectors (#[ignore]-gated, recall threshold ≥ 0.95); measured RAM + latency on cortex at 100M and 1B on real embedding corpora ship in docs/verification-report.md as part of Piece 9.
| Vectors | RAM with DiskANN | RAM with HNSW | DiskANN search latency |
|---|---|---|---|
| 1 M | 0.5–2 GB | 0.5–2 GB | <1 ms |
| 10 M | 1–5 GB | 8–20 GB | 1–3 ms |
| 100 M | 5–20 GB | 80–200 GB (impractical on single machine) | 2–5 ms |
| 1 B | 20–70 GB | 1.5+ TB (single-machine impossible) | 5–10 ms |
End-to-end query latency at 1 B includes filesystem hydration of the returned entities. Design target is ~100–500 ms total (dominated by the FileSystemStorage random reads + metadata lookup) (PROJECTED — awaiting verification-report.md). The roadmap to bring end-to-end query latency down to match search latency is in docs/scaling.md — all fixes ship inside cortex, no external storage.
How it engages
Cortex registers a 'diskann' provider; Brainy's createIndex() consults it at init:
- Explicit opt-in via config.index.type: 'diskann'. This requires the engagement conditions below to be satisfied; otherwise it throws.
- Auto-engagement when all of:
  - The cortex DiskANN provider is registered (you've loaded the plugin).
  - The storage adapter exposes a local filesystem path (getBinaryBlobPath('_diskann/main')). Cloud-storage adapters return null here and stay on HNSW.
  - The metadata index has a stable idMapper (cortex 2.4.0's stable EntityIdMapper).
- Explicit opt-out via config.index.type: 'hnsw' keeps the historical in-memory index for the rare workload where you want it.
import { BrainyData } from '@soulcraft/brainy'
import { register as registerCortex } from '@soulcraft/cortex'
const brain = new BrainyData({
storage: { type: 'filesystem', rootDirectory: '/data/idx' }
})
await registerCortex(brain)
await brain.init()
// → [brainy] DiskANN engaged (path=/data/idx/_diskann/main.bin, dim=384)
const hits = await brain.search(queryVector, 10)
// Design target: ~5–10 ms at 1 B vectors, depending on cache state and SSD
// (PROJECTED: awaiting verification-report.md)
All Brainy APIs (add, search, relate, searchSimilarVerbs, find) work unchanged.
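The engagement rules listed earlier in this section amount to a small decision function. The sketch below is hypothetical: chooseIndexEngine and EngagementInputs are not Brainy or cortex APIs, and it assumes that an explicit 'diskann' request throws when the conditions are not met.

```typescript
type IndexType = 'diskann' | 'hnsw'

// Hypothetical inputs; the real check lives inside Brainy's createIndex().
interface EngagementInputs {
  requestedType?: IndexType   // config.index.type, if set
  providerRegistered: boolean // the cortex 'diskann' provider is loaded
  blobPath: string | null     // what getBinaryBlobPath('_diskann/main') returned
  hasStableIdMapper: boolean  // the metadata index exposes a stable idMapper
}

function chooseIndexEngine(inp: EngagementInputs): IndexType {
  const canEngage =
    inp.providerRegistered && inp.blobPath !== null && inp.hasStableIdMapper

  // Explicit opt-out always wins.
  if (inp.requestedType === 'hnsw') return 'hnsw'

  // Explicit opt-in fails loudly instead of silently falling back.
  if (inp.requestedType === 'diskann') {
    if (!canEngage) {
      throw new Error('diskann requested but engagement conditions are not met')
    }
    return 'diskann'
  }

  // No preference: auto-engage only when every condition holds.
  return canEngage ? 'diskann' : 'hnsw'
}
```

Under these assumptions a cloud-storage adapter (blobPath of null) with no explicit type stays on HNSW, while the same adapter with an explicit 'diskann' request fails at init.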
Migrating an existing index
Existing HNSW-backed Brainy installs do not auto-migrate on upgrade. They keep working as-is. To convert:
const result = await brain.migrateToDiskAnn({
recallTarget: 0.95, // require ≥95% recall vs old index before swapping
paddingFactor: 1.2, // search-time over-fetch for re-rank
verifySampleSize: 100 // sample queries for the recall check
})
// 1. Builds new index in parallel (old HNSW keeps serving)
// 2. Samples queries — compares top-k results between old and new
// 3. Aborts if recall < target; old index stays in place
// 4. Atomically swaps if recall passes
// Reversible:
await brain.migrateToHnsw()
Reversibility is a contract: production rollbacks are always available.
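The recall gate in steps 2 and 3 can be pictured as a recall@k comparison between the two indexes' results for the sampled queries. This is a sketch of the idea only; recallAtK and passesRecallGate are illustrative names, and how cortex actually samples and scores queries is not specified here.

```typescript
// Average fraction of the old index's top-k ids that the new index also returns.
// baseline[q] and candidate[q] are the result id lists for sampled query q.
function recallAtK(baseline: number[][], candidate: number[][], k: number): number {
  if (baseline.length === 0) return 1
  let total = 0
  for (let q = 0; q < baseline.length; q++) {
    const truth = baseline[q].slice(0, k)
    const found = new Set(candidate[q].slice(0, k))
    const overlap = truth.filter((id) => found.has(id)).length
    // A query with no baseline results cannot lose recall.
    total += truth.length === 0 ? 1 : overlap / truth.length
  }
  return total / baseline.length
}

// The swap goes ahead only when measured recall reaches the target.
function passesRecallGate(recall: number, recallTarget: number): boolean {
  return recall >= recallTarget
}
```

Order within the top-k does not matter for this measure, so a new index that returns the same ids in a different order still scores 1.0.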
Tuning
Defaults match the published DiskANN paper and work well for sentence/image embedding workloads (dim 128–1024). Tune via config.index.diskann:
config.index.diskann = {
pqM: 16, // PQ subspaces; dim must be divisible by pqM
pqKsub: 256, // centroids per subspace (8-bit codes — standard)
maxDegree: 64, // Vamana R (out-degree per node)
searchListSize: 100, // Vamana L (build-time candidate set)
alpha: 1.2, // α-pruning density factor
useMmapAdjacency: true, // file-backed build adjacency — REQUIRED at >100M nodes
mmapAdjacencyPath: '/data/scratch/diskann-build.adj'
}
At 1 B nodes you must set useMmapAdjacency: true. The in-RAM concurrent adjacency would consume ~64 GB of bookkeeping at that scale; the mmap variant uses atomic-u32 slots with sharded write locks and bounded RAM (a few hundred mutexes for the shard table).
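To make alpha and maxDegree concrete, here is a sketch of the α-pruning rule (RobustPrune) as described in the DiskANN paper. It is written in TypeScript for illustration and is not cortex's Rust code; the PruneCandidate type, the function names, and the choice of Euclidean distance are assumptions.

```typescript
type PruneCandidate = { id: number; vec: number[] }

function euclidean(a: number[], b: number[]): number {
  let d = 0
  for (let i = 0; i < a.length; i++) d += (a[i] - b[i]) * (a[i] - b[i])
  return Math.sqrt(d)
}

// Pick up to maxDegree out-neighbours for point p from the candidate pool.
// Repeatedly keep the closest remaining candidate, then drop every other
// candidate c that the kept one already covers: alpha * d(kept, c) <= d(p, c).
function robustPrune(
  p: number[],
  candidates: PruneCandidate[],
  alpha: number,
  maxDegree: number
): number[] {
  let pool = [...candidates].sort(
    (a, b) => euclidean(p, a.vec) - euclidean(p, b.vec)
  )
  const kept: number[] = []
  while (pool.length > 0 && kept.length < maxDegree) {
    const best = pool[0] // the pool stays sorted, so this is the closest survivor
    kept.push(best.id)
    pool = pool
      .slice(1)
      .filter((c) => alpha * euclidean(best.vec, c.vec) > euclidean(p, c.vec))
  }
  return kept
}
```

A larger alpha prunes less aggressively, so each node keeps more (and longer) edges, which means a denser graph and fewer hops per search.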
Dynamic writes
DiskANN graphs are build-once by design. The cortex wrapper handles dynamic writes via a delta buffer pattern that mirrors FreshDiskANN (Singh et al., 2021):
- addItem → appends to an in-memory delta map.
- search → queries the main index AND brute-forces the delta, merges, returns top-k.
- removeItem → tombstone bitmap, filtered out at search time.
- rebuild() → folds the delta into a new main index, swaps atomically.
Operationally: insertions are O(1), reads stay sub-ms while delta fits in cache, and you schedule rebuild() during off-peak windows. The delta brute-force scales linearly in delta size; you keep it small.
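A minimal sketch of the delta buffer pattern described above, under stated assumptions: the class name DeltaBufferSketch is hypothetical, the main index is represented by a search callback, a Set stands in for the tombstone bitmap, and ids are assumed not to exist in both the main index and the delta at the same time.

```typescript
type DeltaHit = { id: number; distance: number }
type MainSearchFn = (query: number[], k: number) => DeltaHit[]

function deltaDistanceSq(a: number[], b: number[]): number {
  let d = 0
  for (let i = 0; i < a.length; i++) d += (a[i] - b[i]) * (a[i] - b[i])
  return d
}

class DeltaBufferSketch {
  private mainSearch: MainSearchFn
  private delta = new Map<number, number[]>() // vectors added since the last rebuild
  private tombstones = new Set<number>()      // stand-in for the tombstone bitmap

  constructor(mainSearch: MainSearchFn) {
    this.mainSearch = mainSearch
  }

  // O(1): the main index is not touched.
  addItem(id: number, vec: number[]): void {
    this.delta.set(id, vec)
  }

  removeItem(id: number): void {
    this.delta.delete(id)
    this.tombstones.add(id)
  }

  search(query: number[], k: number): DeltaHit[] {
    // Over-fetch by the tombstone count so filtering cannot leave fewer than k hits.
    const fromMain = this.mainSearch(query, k + this.tombstones.size).filter(
      (hit) => !this.tombstones.has(hit.id)
    )
    // Brute-force the delta: linear in delta size.
    const fromDelta: DeltaHit[] = []
    this.delta.forEach((vec, id) => {
      fromDelta.push({ id, distance: deltaDistanceSq(query, vec) })
    })
    return [...fromMain, ...fromDelta]
      .sort((a, b) => a.distance - b.distance)
      .slice(0, k)
  }
}
```

Both indexes must report distances on the same scale for the merge to be meaningful; here both are assumed to be squared L2.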
On-disk format
Single contiguous main.bin, all little-endian:
+--------------------------------------------------------------+
| Header (4 KB, page-aligned) |
| magic="DKAN" · version · dim · node_count |
| pq_m · pq_ksub · pq_dsub · max_degree · entry_point |
+--------------------------------------------------------------+
| PQ codebook (m × ksub × dsub × f32) |
+--------------------------------------------------------------+
| PQ codes (node_count × m bytes) |
+--------------------------------------------------------------+
| Vamana graph (node_count × max_degree × u32) |
| Fixed-degree CSR. Sentinel `u32::MAX` marks unused slots. |
+--------------------------------------------------------------+
| Full vectors (node_count × dim × f32) |
+--------------------------------------------------------------+
Fixed-degree adjacency means neighbour-offset math is O(1): at search time graph[node] is a single seek to graph_offset + node * max_degree * 4. The mmap base is page-aligned by the OS and section offsets are 4-byte aligned by construction, so bytemuck::cast_slice reinterprets section bytes as &[f32] / &[u32] without copying.
The same layout is what the build writes and the searcher mmaps — no separate serialization step.
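The offset math implied by the layout can be sketched as follows. This assumes the sections are packed back to back after the 4 KB header with no padding between them (which keeps 4-byte alignment whenever node_count × m is a multiple of 4); the real file may pad differently, and the type and function names are illustrative.

```typescript
const HEADER_BYTES = 4096
const U32_BYTES = 4
const F32_BYTES = 4

// Subset of the header fields needed to locate each section.
interface DiskAnnHeaderSketch {
  dim: number
  nodeCount: number
  pqM: number
  pqKsub: number
  pqDsub: number
  maxDegree: number
}

// Byte offset of each section, in file order.
function sectionOffsets(h: DiskAnnHeaderSketch) {
  const codebook = HEADER_BYTES
  const codes = codebook + h.pqM * h.pqKsub * h.pqDsub * F32_BYTES
  const graph = codes + h.nodeCount * h.pqM
  const vectors = graph + h.nodeCount * h.maxDegree * U32_BYTES
  const end = vectors + h.nodeCount * h.dim * F32_BYTES
  return { codebook, codes, graph, vectors, end }
}

// Byte offset of a node's first neighbour slot: constant-time arithmetic,
// no per-node index to consult.
function neighbourListOffset(graphOffset: number, node: number, maxDegree: number): number {
  return graphOffset + node * maxDegree * U32_BYTES
}
```

With dim=384, pqM=16 and maxDegree=64, each node costs 16 + 256 + 1536 bytes across the three per-node sections, so the full vectors dominate the file size.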
Why 100% Rust, no C++ FFI
Cortex re-implements Vamana from the published paper rather than wrapping Microsoft's C++ reference. Reasons:
- Cross-platform builds for Node native modules become operationally expensive with C++ (Linux/macOS/Windows × x64/arm64 binaries, headers, link-time gotchas). napi-rs gives mature cross-platform binary distribution.
- License posture stays clean: pure Rust port from a published algorithm + permissive Rust deps (memmap2, bytemuck, rayon, rand, thiserror). No patent grant ambiguity.
- Full control over the on-disk format + napi bindings + future cortex-specific optimizations.
The pure-Rust DiskANN crate (native/diskann/) compiles and tests independently of napi, so it's separately benchmarkable and fuzz-target-ready.
See also
- ADR-002 — the architectural decision record with full design rationale
- Scaling & Resource Management — multi-tenant density + the 1 B operational ceiling
- Performance benchmarks — measured numbers from the included benchmark suite
- Brainy vs Brainy + Cortex comparison — side-by-side feature and speed comparison