DiskANN — Billion-Scale Vector Search

Cortex ships a pure-Rust DiskANN implementation that aims to scale a single Brainy instance from the ~10 M-vector HNSW comfort zone up to 1 B+ vectors at ~5 ms search latency on commodity hardware (PROJECTED: design target, awaiting docs/verification-report.md from Piece 9 of the cortex 3.0 release). It is an algorithmic alternative to HNSW, not a competing service: it lives in the same plugin, behind the same Brainy APIs.

The problem DiskANN solves

HNSW is excellent up to roughly 10 M vectors per machine. Beyond that, two costs compound:

  1. Memory pressure. At 1 B vectors of 384-dim float32, the vectors alone are ~1.5 TB and the HNSW graph metadata adds another ~2 TB. Even with the cortex 2.4.0 mmap vector backend, the graph traversal pattern faults pages with no spatial locality.
  2. No locality. HNSW's insertion order has no correlation with traversal order on disk, so every search hop on a cold cache is a fresh ~10 μs page fault.

DiskANN (Subramanya et al., NeurIPS 2019) was designed for exactly this regime:

  • Vamana α-pruned graph picks neighbours so that nodes visited together during search end up adjacent on disk — disk locality emerges from construction.
  • Product Quantization compresses each vector to M ≤ 16 bytes resident in RAM. At 1 B vectors that's ~16 GB of PQ codes instead of 1.5 TB of full vectors.
  • Full vectors on disk are only touched to re-rank the top candidate set, not during the graph walk.
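
The RAM figures above follow from simple arithmetic. A minimal sketch, assuming float32 vectors, 8-bit PQ codes (one byte per subspace) and decimal units (1 GB = 10^9 bytes); the helper names are illustrative only:

```typescript
// Full-precision float32 vectors: n × dim × 4 bytes.
function fullVectorBytes(n: number, dim: number): number {
  return n * dim * 4;
}

// PQ codes with 8-bit codes (ksub = 256): one byte per subspace, so n × M bytes.
function pqCodeBytes(n: number, pqM: number): number {
  return n * pqM;
}

const ONE_BILLION = 1e9;
console.log(`full vectors: ${(fullVectorBytes(ONE_BILLION, 384) / 1e12).toFixed(3)} TB`); // → full vectors: 1.536 TB
console.log(`PQ codes: ${(pqCodeBytes(ONE_BILLION, 16) / 1e9).toFixed(0)} GB`); // → PQ codes: 16 GB
```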

What cortex delivers

  • One contiguous file (header + codebook + PQ codes + Vamana graph + full vectors), mmap-mappable so an SSD-resident billion-vector dataset never has to be copied into RAM.
  • Zero-copy section accessors (vectors_f32() returns &[f32] directly into the mapped region).
  • Parallel build with rayon. The graph and PQ encoding both parallelize.
  • File-backed build adjacency for billion-scale construction (the in-RAM concurrent adjacency would consume ~64 GB of bookkeeping at 1 B nodes; the mmap variant uses atomic-u32 slots with sharded write locks).
  • Connectivity-repair pass that guarantees every node is reachable from the entry point. Sequential Vamana provides this via insertion order; parallel Vamana doesn't, so cortex closes the gap explicitly.
  • PQ-walk + full-vector re-rank search. The greedy walk uses ADC distance over RAM-resident PQ codes (M table lookups per distance, ~50 ns each); re-rank then scores ceil(k × paddingFactor) candidates with the exact full-precision distance.
  • HNSW-shaped TS wrapper (NativeDiskAnnWrapper) so Brainy's higher layers don't know which engine is underneath.
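
The PQ-walk + re-rank split can be illustrated in miniature. This is not cortex's Rust code but a TypeScript sketch assuming squared-L2 distance, a codebook laid out as m × ksub × dsub floats (matching the on-disk format below), and a candidate list the graph walk has already produced; the walk itself is omitted, and PqIndex, buildAdcTable, adcDistance and rerank are hypothetical names:

```typescript
// Hypothetical in-memory view of a PQ-compressed index.
interface PqIndex {
  dim: number;
  m: number; // number of subspaces (pqM)
  ksub: number; // centroids per subspace (256 for 8-bit codes)
  codebook: Float32Array; // m × ksub × dsub; centroid c of subspace s starts at (s * ksub + c) * dsub
  codes: Uint8Array; // nodeCount × m, one centroid id per subspace
  vectors: Float32Array; // nodeCount × dim full-precision vectors (on disk in cortex)
}

// Once per query: squared distance from each query sub-vector to every centroid (m × ksub table).
function buildAdcTable(ix: PqIndex, query: Float32Array): Float32Array {
  const dsub = ix.dim / ix.m;
  const table = new Float32Array(ix.m * ix.ksub);
  for (let s = 0; s < ix.m; s++) {
    for (let c = 0; c < ix.ksub; c++) {
      const base = (s * ix.ksub + c) * dsub;
      let d = 0;
      for (let j = 0; j < dsub; j++) {
        const diff = query[s * dsub + j] - ix.codebook[base + j];
        d += diff * diff;
      }
      table[s * ix.ksub + c] = d;
    }
  }
  return table;
}

// ADC distance: m table lookups, the full vector is never touched.
function adcDistance(ix: PqIndex, table: Float32Array, node: number): number {
  let d = 0;
  for (let s = 0; s < ix.m; s++) {
    d += table[s * ix.ksub + ix.codes[node * ix.m + s]];
  }
  return d;
}

// Exact squared-L2 distance against the full-precision vector.
function exactDistance(ix: PqIndex, query: Float32Array, node: number): number {
  let d = 0;
  for (let j = 0; j < ix.dim; j++) {
    const diff = query[j] - ix.vectors[node * ix.dim + j];
    d += diff * diff;
  }
  return d;
}

// Keep the ceil(k × paddingFactor) best candidates by ADC, re-score them exactly, return top-k ids.
function rerank(ix: PqIndex, query: Float32Array, candidates: number[], k: number, paddingFactor: number): number[] {
  const table = buildAdcTable(ix, query);
  const shortlist = candidates
    .map((id) => ({ id, d: adcDistance(ix, table, id) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, Math.ceil(k * paddingFactor));
  return shortlist
    .map(({ id }) => ({ id, d: exactDistance(ix, query, id) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, k)
    .map(({ id }) => id);
}
```

The padding matters because PQ distances are approximate: two nodes that tie under ADC can differ under the exact metric, and only the re-rank separates them.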

Scale envelope, honestly

PROJECTED: design targets, not measurements. The table below quotes figures in line with the published DiskANN paper (Subramanya et al., NeurIPS 2019, Table 4 on SIFT1B). Cortex's own DiskANN test corpus in CI today is 10k synthetic dim=64 random vectors (#[ignore]-gated, recall threshold ≥ 0.95); measured RAM and latency for cortex at 100 M and 1 B vectors on real embedding corpora will ship in docs/verification-report.md as part of Piece 9.

Vectors   RAM with DiskANN   RAM with HNSW                                 DiskANN search latency
1 M       0.5–2 GB           0.5–2 GB                                      <1 ms
10 M      1–5 GB             8–20 GB                                       1–3 ms
100 M     5–20 GB            80–200 GB (impractical on a single machine)   2–5 ms
1 B       20–70 GB           1.5+ TB (impossible on a single machine)      5–10 ms

End-to-end query latency at 1 B also includes filesystem hydration of the returned entities. The design target is ~100–500 ms total, dominated by FileSystemStorage random reads plus metadata lookup (PROJECTED: awaiting verification-report.md). The roadmap for bringing end-to-end query latency down to match search latency is in docs/scaling.md; all of those fixes ship inside cortex, with no external storage.

How it engages

Cortex registers a 'diskann' provider; Brainy's createIndex() consults it at init:

  1. Explicit opt-in via config.index.type: 'diskann'. This forces DiskANN; if the engagement conditions below aren't satisfied, initialization throws.
  2. Auto-engagement when all of:
    • The cortex DiskANN provider is registered (you've loaded the plugin).
    • The storage adapter exposes a local filesystem path (getBinaryBlobPath('_diskann/main')). Cloud-storage adapters return null here and stay on HNSW.
    • The metadata index has a stable idMapper (cortex 2.4.0's stable EntityIdMapper).
  3. Explicit opt-out via config.index.type: 'hnsw' keeps the historical in-memory index for the rare workload where you want it.

import { BrainyData } from '@soulcraft/brainy'
import { register as registerCortex } from '@soulcraft/cortex'

const brain = new BrainyData({
  storage: { type: 'filesystem', rootDirectory: '/data/idx' }
})
await registerCortex(brain)
await brain.init()
// → [brainy] DiskANN engaged (path=/data/idx/_diskann/main.bin, dim=384)

// queryVector: a number[] embedding whose dimension matches the index (384 here)
const hits = await brain.search(queryVector, 10)
// Design target: ~5–10 ms at 1 B vectors, depending on cache state and SSD
// (PROJECTED — awaiting verification-report.md)

All Brainy APIs — add, search, relate, searchSimilarVerbs, find — work unchanged.
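
The engagement rules above amount to a small decision procedure. The following is a hypothetical TypeScript model of them, not Brainy's actual createIndex() code: the EngagementInputs shape and chooseIndexType name are invented for illustration, and it assumes that an explicit type wins over auto-engagement:

```typescript
type IndexType = "diskann" | "hnsw";

// Hypothetical snapshot of what createIndex() can observe at init.
interface EngagementInputs {
  configuredType?: IndexType; // config.index.type, if set
  diskAnnProviderRegistered: boolean; // cortex plugin loaded
  blobPath: string | null; // getBinaryBlobPath('_diskann/main'); null on cloud adapters
  hasStableIdMapper: boolean; // stable EntityIdMapper available
}

function chooseIndexType(inp: EngagementInputs): IndexType {
  const conditionsMet =
    inp.diskAnnProviderRegistered && inp.blobPath !== null && inp.hasStableIdMapper;

  if (inp.configuredType === "hnsw") return "hnsw"; // explicit opt-out
  if (inp.configuredType === "diskann") {
    // Explicit opt-in: refuse to start rather than silently run a different engine.
    if (!conditionsMet) throw new Error("config.index.type is 'diskann' but engagement conditions are not met");
    return "diskann";
  }
  return conditionsMet ? "diskann" : "hnsw"; // auto-engagement
}
```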

Migrating an existing index

Existing HNSW-backed Brainy installs do not auto-migrate on upgrade. They keep working as-is. To convert:

const result = await brain.migrateToDiskAnn({
  recallTarget: 0.95,    // require ≥95% recall vs old index before swapping
  paddingFactor: 1.2,    // search-time over-fetch for re-rank
  verifySampleSize: 100  // sample queries for the recall check
})
// 1. Builds new index in parallel (old HNSW keeps serving)
// 2. Samples queries — compares top-k results between old and new
// 3. Aborts if recall < target; old index stays in place
// 4. Atomically swaps if recall passes

// Reversible:
await brain.migrateToHnsw()

Reversibility is a contract — production rollbacks are always available.
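
Steps 2 and 3 of the migration (compare top-k between old and new, abort below recallTarget) come down to an average recall@k over the sampled queries. A minimal sketch, assuming recall is measured as the overlap of the new index's top-k with the old index's top-k for each query; recallAtK and shouldSwap are hypothetical helpers and cortex may compute the check differently:

```typescript
// Fraction of the reference (old index) top-k ids that the candidate (new index) also returned.
function recallAtK(reference: string[], candidate: string[]): number {
  if (reference.length === 0) return 1;
  const got = new Set(candidate);
  let hits = 0;
  for (const id of reference) if (got.has(id)) hits++;
  return hits / reference.length;
}

// Mean recall over all sampled queries; swap only if it meets the target.
function shouldSwap(oldResults: string[][], newResults: string[][], recallTarget: number): boolean {
  if (oldResults.length !== newResults.length) throw new Error("results must be paired per query");
  if (oldResults.length === 0) return false; // no evidence: keep the old index
  let sum = 0;
  for (let i = 0; i < oldResults.length; i++) sum += recallAtK(oldResults[i], newResults[i]);
  return sum / oldResults.length >= recallTarget;
}
```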

Tuning

Defaults match the published DiskANN paper and work well for sentence/image embedding workloads (dim 128–1024). Tune via config.index.diskann:

config.index.diskann = {
  pqM: 16,                  // PQ subspaces; dim must be divisible by pqM
  pqKsub: 256,              // centroids per subspace (8-bit codes — standard)
  maxDegree: 64,            // Vamana R (out-degree per node)
  searchListSize: 100,      // Vamana L (build-time candidate set)
  alpha: 1.2,               // α-pruning density factor
  useMmapAdjacency: true,   // file-backed build adjacency — REQUIRED at >100M nodes
  mmapAdjacencyPath: '/data/scratch/diskann-build.adj'
}
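
Several of these settings constrain each other. Below is a hypothetical validation helper (not part of Brainy or cortex) that checks the divisibility rule, keeps codes within 8 bits, applies the >100M useMmapAdjacency rule from the comment above, and derives pq_dsub plus the section sizes listed in the on-disk format:

```typescript
interface DiskAnnTuning {
  pqM: number;
  pqKsub: number;
  maxDegree: number;
  searchListSize: number;
  alpha: number;
  useMmapAdjacency: boolean;
  mmapAdjacencyPath?: string;
}

// Throws on an inconsistent configuration, otherwise returns derived sizes in bytes.
function checkTuning(dim: number, nodeCount: number, t: DiskAnnTuning) {
  if (dim % t.pqM !== 0) throw new Error(`dim ${dim} is not divisible by pqM ${t.pqM}`);
  if (t.pqKsub < 1 || t.pqKsub > 256) throw new Error("pqKsub must fit an 8-bit code (1..256)");
  if (nodeCount > 100_000_000 && !t.useMmapAdjacency) {
    throw new Error("useMmapAdjacency must be true above 100M nodes");
  }
  const pqDsub = dim / t.pqM;
  return {
    pqDsub,
    codebookBytes: t.pqM * t.pqKsub * pqDsub * 4, // m × ksub × dsub × f32
    pqCodeBytes: nodeCount * t.pqM, // node_count × m bytes
    graphBytes: nodeCount * t.maxDegree * 4, // node_count × max_degree × u32
  };
}
```

With the defaults above at dim 384 and 1 B nodes this gives pq_dsub = 24, a 384 KiB codebook, 16 GB of PQ codes and a 256 GB on-disk graph section.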

At 1 B nodes you must set useMmapAdjacency: true. The in-RAM concurrent adjacency would consume ~64 GB of bookkeeping at that scale — the mmap variant uses atomic-u32 slots with sharded write locks and bounded RAM (a few hundred mutexes for the shard table).
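
maxDegree and alpha are the two knobs of Vamana's α-pruning step (RobustPrune in the DiskANN paper). The following is a simplified single-threaded TypeScript sketch of that rule using Euclidean distance over in-memory vectors; robustPrune and l2 are illustrative names, the caller is assumed to pass p's existing neighbours inside candidates, and cortex's parallel Rust version is not shown:

```typescript
type Vec = number[];

function l2(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    s += d * d;
  }
  return Math.sqrt(s);
}

// Choose up to maxDegree out-neighbours of p. After picking the closest remaining candidate p*,
// drop every candidate p' with alpha * d(p*, p') <= d(p, p'): p* already leads towards p'.
// alpha > 1 prunes less aggressively, keeping some longer edges.
function robustPrune(p: number, candidates: number[], vectors: Vec[], alpha: number, maxDegree: number): number[] {
  const dp = (c: number) => l2(vectors[p], vectors[c]);
  let pool = candidates.filter((c, i) => c !== p && candidates.indexOf(c) === i); // dedupe, drop self
  const chosen: number[] = [];
  while (pool.length > 0 && chosen.length < maxDegree) {
    const best = pool.reduce((a, b) => (dp(b) < dp(a) ? b : a)); // closest remaining to p
    chosen.push(best);
    pool = pool.filter((c) => alpha * l2(vectors[best], vectors[c]) > dp(c)); // also removes best itself
  }
  return chosen;
}
```

On points [0], [1], [2], [-1] with p = 0, alpha = 1.2 keeps edges to nodes 1 and 3 only (node 2 is reachable through node 1), while alpha = 3 keeps all three.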

Dynamic writes

DiskANN graphs are build-once by design. The cortex wrapper handles dynamic writes via a delta buffer pattern that mirrors FreshDiskANN (Singh et al., 2021):

  • addItem → appends to an in-memory delta map.
  • search → queries the main index AND brute-forces the delta, merges, returns top-k.
  • removeItem → tombstone bitmap, filtered out at search time.
  • rebuild() → folds the delta into a new main index, swaps atomically.

Operationally, insertions are O(1), the delta scan stays sub-millisecond while the delta fits in cache, and you schedule rebuild() during off-peak windows. The brute-force cost grows linearly with delta size, so keep the delta small.
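
The delta-buffer read path can be sketched as follows. This is a hypothetical TypeScript model, not the NativeDiskAnnWrapper source: the main index is abstracted as a searchMain callback returning hits sorted by squared-L2 distance, and rebuild() is omitted:

```typescript
interface Hit {
  id: string;
  distance: number; // squared L2
}

type MainSearch = (query: Float32Array, k: number) => Hit[];

class DeltaBufferedIndex {
  private readonly searchMain: MainSearch;
  private readonly delta = new Map<string, Float32Array>(); // inserts not yet in the main graph
  private readonly tombstones = new Set<string>(); // removals, filtered at search time

  constructor(searchMain: MainSearch) {
    this.searchMain = searchMain;
  }

  addItem(id: string, vector: Float32Array): void {
    this.delta.set(id, vector); // O(1)
    this.tombstones.delete(id); // re-adding an id revives it
  }

  removeItem(id: string): void {
    this.delta.delete(id);
    this.tombstones.add(id);
  }

  search(query: Float32Array, k: number): Hit[] {
    // Over-fetch so hits dropped below (tombstoned, or shadowed by a newer delta copy) don't starve k.
    const fetchCount = k + this.tombstones.size + this.delta.size;
    const fromMain = this.searchMain(query, fetchCount).filter(
      (h) => !this.tombstones.has(h.id) && !this.delta.has(h.id),
    );
    // Brute-force the delta: linear in delta size.
    const fromDelta: Hit[] = [];
    this.delta.forEach((v, id) => {
      let d = 0;
      for (let i = 0; i < query.length; i++) {
        const diff = query[i] - v[i];
        d += diff * diff;
      }
      fromDelta.push({ id, distance: d });
    });
    return fromMain.concat(fromDelta).sort((a, b) => a.distance - b.distance).slice(0, k);
  }
}
```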

On-disk format

Single contiguous main.bin, all little-endian:

+--------------------------------------------------------------+
| Header (4 KB, page-aligned)                                  |
|   magic="DKAN" · version · dim · node_count                  |
|   pq_m · pq_ksub · pq_dsub · max_degree · entry_point        |
+--------------------------------------------------------------+
| PQ codebook    (m × ksub × dsub × f32)                       |
+--------------------------------------------------------------+
| PQ codes       (node_count × m bytes)                        |
+--------------------------------------------------------------+
| Vamana graph   (node_count × max_degree × u32)               |
|   Fixed-degree CSR. Sentinel `u32::MAX` marks unused slots.  |
+--------------------------------------------------------------+
| Full vectors   (node_count × dim × f32)                      |
+--------------------------------------------------------------+

Fixed-degree adjacency means neighbour-offset math is O(1) — at search time graph[node] is a single seek to graph_offset + node * max_degree * 4. The mmap base is page-aligned by the OS and section offsets are 4-byte aligned by construction, so bytemuck::cast_slice reinterprets section bytes as &[f32] / &[u32] without copying.
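
The section offsets follow directly from the header fields. Below is a TypeScript sketch assuming the header has already been parsed and the sections are packed back to back after the 4 KB header with no extra padding (the layout above shows none); DkanHeader, sectionOffsets and neighbours are hypothetical names. Note that with pqM = 16 every section starts on a 4-byte boundary; other pqM values could need padding after the PQ codes, which this sketch does not model:

```typescript
interface DkanHeader {
  dim: number;
  nodeCount: number;
  pqM: number;
  pqKsub: number;
  pqDsub: number;
  maxDegree: number;
}

const HEADER_BYTES = 4096;
const SENTINEL = 0xffffffff; // u32::MAX marks an unused adjacency slot

// Byte offset of each section inside main.bin.
function sectionOffsets(h: DkanHeader) {
  const codebook = HEADER_BYTES;
  const codes = codebook + h.pqM * h.pqKsub * h.pqDsub * 4; // f32 codebook
  const graph = codes + h.nodeCount * h.pqM; // one byte per subspace per node
  const vectors = graph + h.nodeCount * h.maxDegree * 4; // fixed-degree u32 rows
  const end = vectors + h.nodeCount * h.dim * 4; // f32 full vectors
  return { codebook, codes, graph, vectors, end };
}

// One fixed-size row per node: graph_offset + node * max_degree * 4.
function neighbours(file: DataView, h: DkanHeader, node: number): number[] {
  const row = sectionOffsets(h).graph + node * h.maxDegree * 4;
  const out: number[] = [];
  for (let i = 0; i < h.maxDegree; i++) {
    const id = file.getUint32(row + i * 4, true); // little-endian
    if (id !== SENTINEL) out.push(id); // skip unused slots
  }
  return out;
}
```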

The same layout is what the build writes and the searcher mmaps — no separate serialization step.

Why 100% Rust, no C++ FFI

Cortex re-implements Vamana from the published paper rather than wrapping Microsoft's C++ reference. Reasons:

  1. Cross-platform builds for Node native modules become operationally expensive with C++ (Linux/macOS/Windows × x64/arm64 binaries, headers, link-time gotchas). napi-rs gives mature cross-platform binary distribution.
  2. License posture stays clean — pure Rust port from a published algorithm + permissive Rust deps (memmap2, bytemuck, rayon, rand, thiserror). No patent grant ambiguity.
  3. Full control over the on-disk format + napi bindings + future cortex-specific optimizations.

The pure-Rust DiskANN crate (native/diskann/) compiles and tests independently of napi, so it's separately benchmarkable and fuzz-target-ready.

See also