Adaptive DiskANN — From Laptop to Billion-Scale
Cor ships a 100% pure-Rust Adaptive DiskANN implementation that takes a single Brainy instance from the ~10 M-vector HNSW comfort zone to 1 B+ vectors on commodity hardware. MEASURED at 1 M: p50 0.85 ms / p99 5.23 ms on bxl9000 (mode=auto, in-memory). 1 B target is PROJECTED at ~5–10 ms via Mode 3 (on-disk) — the DiskANN paper's published range. It's an algorithmic alternative to HNSW, not a competitor service — it lives in the same plugin, behind the same Brainy APIs.
The "Adaptive" part: the same on-disk index file silently changes residency mode based on dataset size + available RAM:
| Mode | When auto-selects | What changes |
|---|---|---|
| Mode 1: in-memory | vectors × dim × 4 bytes fits in RAM (typically up through 10 M at 128-dim, 1 M at 1536-dim) | Zero PQ compression. Full Vamana graph + vectors mmap'd into RAM. Fastest: no disk I/O on hot path. |
| Mode 2: hybrid | RAM-constrained but PQ centroids fit (100 M tier on commodity boxes) | PQ-compressed centroids in RAM (16–32× smaller); full vectors mmap'd from disk with page-cache promotion. ~2–3× slower than Mode 1. |
| Mode 3: on-disk | Billion-scale, where uncompressed vectors don't fit on commodity hardware | Maximally PQ-compressed; search walks disk via OS page cache + DiskANN's neighbour-prefetch pattern. PROJECTED ~5–10 ms, the DiskANN paper's published range. |
You build the index once with cor 3.0 defaults; as your dataset grows over time, the same file changes its residency mode under the hood. No re-index, no API change, no perf cliff at boundaries.
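The selection rule in the table can be sketched roughly as follows. This is a minimal illustration, not cor's actual logic: the names `chooseResidencyMode`, `DatasetShape`, and `ramBudgetBytes` are hypothetical, and the exact thresholds cor applies are not stated here. The sketch assumes Mode 1 applies when the raw float32 vectors fit in the RAM budget, Mode 2 when only the PQ codes fit, and Mode 3 otherwise.

```typescript
// Hypothetical sketch of residency-mode selection; not cor's real API.
type ResidencyMode = 'in-memory' | 'hybrid' | 'on-disk'

interface DatasetShape {
  vectorCount: number
  dim: number
  pqM: number // bytes of PQ code per vector (one byte per subspace)
}

function chooseResidencyMode(shape: DatasetShape, ramBudgetBytes: number): ResidencyMode {
  // Raw footprint: one float32 (4 bytes) per dimension per vector.
  const rawBytes = shape.vectorCount * shape.dim * 4
  // Compressed footprint: pqM bytes per vector (assumed threshold for Mode 2).
  const pqBytes = shape.vectorCount * shape.pqM

  if (rawBytes <= ramBudgetBytes) return 'in-memory'
  if (pqBytes <= ramBudgetBytes) return 'hybrid'
  return 'on-disk'
}

const GB = 1_000_000_000
const small = chooseResidencyMode({ vectorCount: 1_000_000, dim: 128, pqM: 16 }, 8 * GB)
const medium = chooseResidencyMode({ vectorCount: 100_000_000, dim: 384, pqM: 16 }, 32 * GB)
const huge = chooseResidencyMode({ vectorCount: 1_000_000_000, dim: 384, pqM: 16 }, 8 * GB)
```

Under these assumptions a dataset would move from one mode to the next as `vectorCount` grows against a fixed RAM budget, with no change to the calling code.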
The problem DiskANN solves
HNSW is excellent up to roughly 10 M vectors per machine. Beyond that, two costs compound:
- Memory pressure. At 1 B vectors of 384-dim float32, the vectors alone are ~1.5 TB and the HNSW graph metadata adds another ~2 TB. Even with the cortex 2.4.0 mmap vector backend, the graph traversal pattern faults pages with no spatial locality.
- No locality. HNSW's insertion order has no correlation with traversal order on disk, so every search hop on a cold cache is a fresh ~10 μs page fault.
DiskANN (Subramanya et al., NeurIPS 2019) was designed for exactly this regime:
- Vamana α-pruned graph picks neighbours so that nodes visited together during search end up adjacent on disk — disk locality emerges from construction.
- Product Quantization compresses each vector to M ≤ 16 bytes resident in RAM. At 1 B vectors that's ~16 GB of PQ codes instead of 1.5 TB of full vectors.
- Full vectors on disk are only touched to re-rank the top candidate set, not during the graph walk.
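The footprint figures quoted above follow from simple arithmetic. The helper names below are illustrative only; the numbers use the 384-dim float32 case and 16-byte PQ codes from the text.

```typescript
// Bytes needed for full-precision float32 vectors.
function rawVectorBytes(count: number, dim: number): number {
  return count * dim * 4
}

// Bytes needed for PQ codes: one byte per subspace when ksub = 256.
function pqCodeBytes(count: number, m: number): number {
  return count * m
}

const vectorCount = 1_000_000_000
const raw = rawVectorBytes(vectorCount, 384) // 1_536_000_000_000 bytes, about 1.5 TB
const pq = pqCodeBytes(vectorCount, 16)      // 16_000_000_000 bytes, about 16 GB
const ratio = raw / pq                       // 96x smaller resident set
```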
What cor delivers
- One contiguous file (header + codebook + PQ codes + Vamana graph + full vectors), mmap-mappable so an SSD-resident billion-vector dataset never has to be copied into RAM.
- Zero-copy section accessors (vectors_f32() returns &[f32] directly into the mapped region).
- Parallel build with rayon. The graph and PQ encoding both parallelize.
- File-backed build adjacency for billion-scale construction (the in-RAM concurrent adjacency would consume ~64 GB of bookkeeping at 1 B nodes; the mmap variant uses atomic-u32 slots with sharded write locks).
- Connectivity-repair pass that guarantees every node is reachable from the entry point. Sequential Vamana provides this via insertion order; parallel Vamana doesn't, so cor closes the gap explicitly.
- PQ-walk + full-vector re-rank search. Greedy walk uses ADC distance over RAM-resident PQ codes (M-byte table lookups, ~50 ns per hop); re-rank scores ceil(k × paddingFactor) candidates with the exact full-precision distance.
- HNSW-shaped TS wrapper (NativeDiskAnnWrapper) so Brainy's higher layers don't know which engine is underneath.
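To make the PQ-walk bullet concrete, here is a minimal sketch of asymmetric distance computation (ADC) and the re-rank candidate count. It is written in TypeScript for readability and is not cor's Rust implementation. The function names are hypothetical, and it assumes squared-L2 distance and a codebook laid out as m × ksub × dsub floats.

```typescript
// Build a per-query lookup table: table[s * ksub + c] is the squared L2
// distance between the query's s-th sub-vector and centroid c of subspace s.
function buildAdcTable(query: Float32Array, codebook: Float32Array, m: number, ksub: number): Float32Array {
  const dsub = query.length / m
  const table = new Float32Array(m * ksub)
  for (let s = 0; s < m; s++) {
    for (let c = 0; c < ksub; c++) {
      const base = (s * ksub + c) * dsub
      let sum = 0
      for (let d = 0; d < dsub; d++) {
        const diff = query[s * dsub + d] - codebook[base + d]
        sum += diff * diff
      }
      table[s * ksub + c] = sum
    }
  }
  return table
}

// Approximate distance to one encoded vector: m table lookups, no float math
// on the vector itself.
function adcDistance(table: Float32Array, code: Uint8Array, ksub: number): number {
  let sum = 0
  for (let s = 0; s < code.length; s++) {
    sum += table[s * ksub + code[s]]
  }
  return sum
}

// Number of candidates re-scored with exact full-precision distance.
function rerankCount(k: number, paddingFactor: number): number {
  return Math.ceil(k * paddingFactor)
}
```

The table is built once per query, so each hop of the greedy walk costs only `m` lookups per neighbour; the full vectors are read only for the `rerankCount(k, paddingFactor)` finalists.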
Scale envelope
MEASURED at 1 M on bxl9000 (Ryzen 9 7950X3D / 184 GB / NVMe), reproducible via scripts/verify-diskann.mjs. SIFT1M canonical: p50 0.86 ms / p99 1.22 ms, recall 0.9942 on the BIGANN reference dataset. Of the 10 M / 100 M numbers below, only 10 M is measured (SIFT10M canonical at p50 0.78 ms / p99 1.13 ms, hybrid mode); 100 M is PROJECTED. 1 B numbers are PROJECTED from the SIFT1B reference run + Subramanya et al. algorithm model.
| Vectors | RAM with DiskANN | RAM with HNSW | Mode | DiskANN search latency |
|---|---|---|---|---|
| 1 M | 0.5–2 GB | 0.5–2 GB | Mode 1 | p50 0.85 ms / p99 5.23 ms (MEASURED, mode=auto) |
| 10 M | 1–5 GB | 8–20 GB | Mode 1 | p50 0.78 ms / p99 1.13 ms (MEASURED on SIFT10M, hybrid) |
| 100 M | 5–20 GB | 80–200 GB (impractical on single machine) | Mode 2 | 2–5 ms (PROJECTED from SIFT trends) |
| 1 B | 20–70 GB | 1.5+ TB (single-machine impossible) | Mode 3 | 5–10 ms (PROJECTED from SIFT1B trends) |
End-to-end query latency at 1 B includes filesystem hydration of the returned entities. Design target is ~100–500 ms total (dominated by the FileSystemStorage random reads + metadata lookup, PROJECTED). The roadmap to bring end-to-end query latency down to match search latency is in docs/scaling.md — all fixes ship inside cor, no external storage.
How it engages
Cor registers a 'diskann' provider; Brainy's createIndex() consults it at init:
- Explicit opt-in via config.index.type: 'diskann'; throws if the engagement conditions aren't satisfied.
- Auto-engagement when all of:
  - The cor DiskANN provider is registered (you've loaded the plugin).
  - The storage adapter exposes a local filesystem path (getBinaryBlobPath('_diskann/main')). Cloud-storage adapters return null here and stay on HNSW.
  - The metadata index has a stable idMapper (cortex 2.4.0's stable EntityIdMapper).
- Explicit opt-out via config.index.type: 'hnsw' keeps the historical in-memory index for the rare workload where you want it.
import { BrainyData } from '@soulcraft/brainy'
import { register as registerCor } from '@soulcraft/cor'

const brain = new BrainyData({
  storage: { type: 'filesystem', rootDirectory: '/data/idx' }
})
await registerCor(brain)
await brain.init()
// → [brainy] DiskANN engaged (path=/data/idx/_diskann/main.bin, dim=384)

const hits = await brain.search(queryVector, 10)
// MEASURED at 1 M: p50 0.85 ms / p99 5.23 ms (mode=auto on bxl9000)
// PROJECTED at 1 B: ~5–10 ms depending on cache state and SSD

All Brainy APIs (add, search, relate, searchSimilarVerbs, find) work unchanged.
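The engagement rules listed earlier in this section can be summarised as a small decision function. This is a hypothetical sketch: `resolveIndexType` and `EngagementInputs` are illustrative names, not part of Brainy or cor, and the throw-on-unmet-conditions behaviour reflects one reading of the opt-in rule.

```typescript
type IndexType = 'hnsw' | 'diskann'

// Hypothetical view of what createIndex() would need to know at init.
interface EngagementInputs {
  requested?: IndexType      // config.index.type, if the caller set it
  providerRegistered: boolean
  blobPath: string | null    // result of getBinaryBlobPath('_diskann/main')
  hasStableIdMapper: boolean
}

function resolveIndexType(inputs: EngagementInputs): IndexType {
  const eligible =
    inputs.providerRegistered && inputs.blobPath !== null && inputs.hasStableIdMapper

  // Explicit opt-out always wins.
  if (inputs.requested === 'hnsw') return 'hnsw'

  // Explicit opt-in fails loudly instead of silently falling back.
  if (inputs.requested === 'diskann') {
    if (!eligible) throw new Error('diskann requested but engagement conditions are not met')
    return 'diskann'
  }

  // No explicit choice: auto-engage only when every condition holds.
  return eligible ? 'diskann' : 'hnsw'
}
```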
Migrating an existing index
Existing HNSW-backed Brainy installs do not auto-migrate on upgrade. They keep working as-is. To convert:
const result = await brain.migrateToDiskAnn({
  recallTarget: 0.95,     // require ≥95% recall vs old index before swapping
  paddingFactor: 1.2,     // search-time over-fetch for re-rank
  verifySampleSize: 100   // sample queries for the recall check
})
// 1. Builds new index in parallel (old HNSW keeps serving)
// 2. Samples queries and compares top-k results between old and new
// 3. Aborts if recall < target; old index stays in place
// 4. Atomically swaps if recall passes

// Reversible:
await brain.migrateToHnsw()

Reversibility is a contract: production rollbacks are always available.
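The recall gate in steps 2 and 3 amounts to measuring top-k overlap between the two indexes. The sketch below shows one plausible way to compute it; the function names are hypothetical and the actual check inside migrateToDiskAnn may differ.

```typescript
// Fraction of the old index's top-k ids that the new index also returned.
function recallAtK(oldTopK: string[], newTopK: string[]): number {
  if (oldTopK.length === 0) return 1
  const found = new Set(newTopK)
  let hits = 0
  for (const id of oldTopK) {
    if (found.has(id)) hits++
  }
  return hits / oldTopK.length
}

// Average recall over the sampled queries (verifySampleSize of them).
function meanRecall(samples: Array<{ oldTopK: string[]; newTopK: string[] }>): number {
  if (samples.length === 0) return 0 // no evidence, so never pass the gate
  let total = 0
  for (const sample of samples) {
    total += recallAtK(sample.oldTopK, sample.newTopK)
  }
  return total / samples.length
}

// Swap only when the measured recall meets the configured target.
function shouldSwap(measuredRecall: number, recallTarget: number): boolean {
  return measuredRecall >= recallTarget
}
```

Note that the old HNSW index is itself approximate, so this measures agreement with the index being replaced, not recall against exact ground truth.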
Tuning
Defaults match the published DiskANN paper and work well for sentence/image embedding workloads (dim 128–1024). Tune via config.index.diskann:
config.index.diskann = {
  pqM: 16,                 // PQ subspaces; dim must be divisible by pqM
  pqKsub: 256,             // centroids per subspace (8-bit codes, the standard choice)
  maxDegree: 64,           // Vamana R (out-degree per node)
  searchListSize: 100,     // Vamana L (build-time candidate set)
  alpha: 1.2,              // α-pruning density factor
  useMmapAdjacency: true,  // file-backed build adjacency; REQUIRED at >100M nodes
  mmapAdjacencyPath: '/data/scratch/diskann-build.adj'
}

At 1 B nodes you must set useMmapAdjacency: true. The in-RAM concurrent adjacency would consume ~64 GB of bookkeeping at that scale; the mmap variant uses atomic-u32 slots with sharded write locks and bounded RAM (a few hundred mutexes for the shard table).
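To see what alpha and maxDegree control, here is a minimal sketch of the α-pruning step (RobustPrune) from the Vamana algorithm as described in the DiskANN paper. It is a simplified TypeScript illustration using plain arrays and Euclidean distance, not cor's Rust code, and `robustPrune` is an illustrative name.

```typescript
type Vec = number[]

function l2(a: Vec, b: Vec): number {
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i]
    sum += diff * diff
  }
  return Math.sqrt(sum)
}

// Choose up to maxDegree out-neighbours for node p from a candidate pool.
function robustPrune(p: number, candidates: number[], vectors: Vec[], alpha: number, maxDegree: number): number[] {
  // Deduplicate, drop p itself, and order by distance to p (closest first).
  let pool = Array.from(new Set(candidates))
    .filter((c) => c !== p)
    .sort((a, b) => l2(vectors[p], vectors[a]) - l2(vectors[p], vectors[b]))

  const selected: number[] = []
  while (pool.length > 0 && selected.length < maxDegree) {
    const best = pool[0]
    selected.push(best)
    // Drop every candidate that `best` already covers: those where
    // alpha * d(best, c) <= d(p, c). A larger alpha keeps more long edges.
    pool = pool.filter(
      (c) => c !== best && alpha * l2(vectors[best], vectors[c]) > l2(vectors[p], vectors[c])
    )
  }
  return selected
}
```

With alpha = 1 the graph keeps only the tightest edges; raising it toward 1.2 retains some longer-range edges, which tends to shorten search paths at the cost of a denser graph.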
Dynamic writes
DiskANN graphs are build-once by design. The cor wrapper handles dynamic writes via a delta buffer pattern that mirrors FreshDiskANN (Singh et al., 2021):
- addItem → appends to an in-memory delta map.
- search → queries the main index AND brute-forces the delta, merges, returns top-k.
- removeItem → tombstone bitmap, filtered out at search time.
- rebuild() → folds the delta into a new main index, swaps atomically.
Operationally: insertions are O(1), reads stay sub-ms while delta fits in cache, and you schedule rebuild() during off-peak windows. The delta brute-force scales linearly in delta size; you keep it small.
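The delta-buffer pattern described above can be sketched as a thin layer over any main-index search function. This is an illustration only: `DeltaBufferedIndex`, `MainSearch`, and `Hit` are hypothetical names, a `Set` stands in for the tombstone bitmap, and the sketch assumes the main index reports Euclidean distances on the same scale as the brute-force pass.

```typescript
interface Hit { id: number; distance: number }
type MainSearch = (query: number[], k: number) => Hit[]

function euclidean(a: number[], b: number[]): number {
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i]
    sum += diff * diff
  }
  return Math.sqrt(sum)
}

class DeltaBufferedIndex {
  private mainSearch: MainSearch
  private delta = new Map<number, number[]>()
  private tombstones = new Set<number>()

  constructor(mainSearch: MainSearch) {
    this.mainSearch = mainSearch
  }

  // O(1) insert: the vector waits in the delta until the next rebuild.
  addItem(id: number, vector: number[]): void {
    this.tombstones.delete(id)
    this.delta.set(id, vector)
  }

  removeItem(id: number): void {
    this.delta.delete(id)
    this.tombstones.add(id)
  }

  search(query: number[], k: number): Hit[] {
    // Main-index hits, minus deleted ids and ids shadowed by a newer delta entry.
    // Filtering can leave fewer than k main hits; a real system might over-fetch.
    const fromMain = this.mainSearch(query, k).filter(
      (hit) => !this.tombstones.has(hit.id) && !this.delta.has(hit.id)
    )
    // Brute-force pass over the delta: linear in delta size.
    const fromDelta: Hit[] = []
    this.delta.forEach((vector, id) => {
      fromDelta.push({ id, distance: euclidean(query, vector) })
    })
    return fromMain.concat(fromDelta).sort((a, b) => a.distance - b.distance).slice(0, k)
  }

  // After an offline rebuild has folded the delta in, swap and reset.
  rebuild(newMainSearch: MainSearch): void {
    this.mainSearch = newMainSearch
    this.delta.clear()
    this.tombstones.clear()
  }
}
```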
On-disk format
Single contiguous main.bin, all little-endian:
+--------------------------------------------------------------+
| Header (4 KB, page-aligned) |
| magic="DKAN" · version · dim · node_count |
| pq_m · pq_ksub · pq_dsub · max_degree · entry_point |
+--------------------------------------------------------------+
| PQ codebook (m × ksub × dsub × f32) |
+--------------------------------------------------------------+
| PQ codes (node_count × m bytes) |
+--------------------------------------------------------------+
| Vamana graph (node_count × max_degree × u32) |
| Fixed-degree CSR. Sentinel `u32::MAX` marks unused slots. |
+--------------------------------------------------------------+
| Full vectors (node_count × dim × f32) |
+--------------------------------------------------------------+

Fixed-degree adjacency means neighbour-offset math is O(1): at search time graph[node] is a single seek to graph_offset + node * max_degree * 4. The mmap base is page-aligned by the OS and section offsets are 4-byte aligned by construction, so bytemuck::cast_slice reinterprets section bytes as &[f32] / &[u32] without copying.
The same layout is what the build writes and the searcher mmaps — no separate serialization step.
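The offset arithmetic implied by this layout can be worked through in a few lines. The sketch below is a TypeScript illustration rather than cor's reader: the names are hypothetical, and it assumes the sections are packed back-to-back after the 4 KB header with no padding between them, which is not stated above.

```typescript
interface DiskAnnHeader {
  dim: number
  nodeCount: number
  pqM: number
  pqKsub: number
  pqDsub: number
  maxDegree: number
}

const HEADER_BYTES = 4096
const UNUSED_SLOT = 0xffffffff // u32::MAX sentinel

// Byte offset of each section, assuming no inter-section padding.
function sectionOffsets(h: DiskAnnHeader) {
  const codebook = HEADER_BYTES
  const pqCodes = codebook + h.pqM * h.pqKsub * h.pqDsub * 4
  const graph = pqCodes + h.nodeCount * h.pqM
  const vectors = graph + h.nodeCount * h.maxDegree * 4
  const end = vectors + h.nodeCount * h.dim * 4
  return { codebook, pqCodes, graph, vectors, end }
}

// O(1) seek to a node's adjacency row.
function neighbourOffset(graphOffset: number, node: number, maxDegree: number): number {
  return graphOffset + node * maxDegree * 4
}

// Read one node's neighbours from a little-endian buffer, skipping unused slots.
function readNeighbours(view: DataView, graphOffset: number, node: number, maxDegree: number): number[] {
  const start = neighbourOffset(graphOffset, node, maxDegree)
  const neighbours: number[] = []
  for (let slot = 0; slot < maxDegree; slot++) {
    const id = view.getUint32(start + slot * 4, true)
    if (id !== UNUSED_SLOT) neighbours.push(id)
  }
  return neighbours
}
```

For a 1,000-node, 384-dim index with the default PQ settings (m = 16, ksub = 256, dsub = 24) and maxDegree = 64, the graph section would start at byte 413,312 and node 10's row at byte 415,872 under these assumptions.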
Why 100% Rust, no C++ FFI
Cor re-implements Vamana from the published paper rather than wrapping Microsoft's C++ reference. Reasons:
- Cross-platform builds for Node native modules become operationally expensive with C++ (Linux/macOS/Windows × x64/arm64 binaries, headers, link-time gotchas). napi-rs gives mature cross-platform binary distribution.
- License posture stays clean: pure Rust port from a published algorithm + permissive Rust deps (memmap2, bytemuck, rayon, rand, thiserror). No patent grant ambiguity.
- Full control over the on-disk format + napi bindings + future cor-specific optimizations.
The pure-Rust DiskANN crate (native/diskann/) compiles and tests independently of napi, so it's separately benchmarkable and fuzz-target-ready.
See also
- ADR-002 — the architectural decision record with full design rationale
- Scaling & Resource Management — multi-tenant density + the 1 B operational ceiling
- Performance benchmarks — measured numbers from the included benchmark suite
- Brainy vs Brainy + Cor comparison — side-by-side feature and speed comparison