Scaling

One engine. Three tiers. Same files, same API. Run it on a laptop for development, a single commodity server for production, or across many servers for global scale — Brainy + Cor is the same code at every step.

There is no managed-service tier and no required coordinator. The engine is local Rust + local files, and "going bigger" means adding hardware, not adopting a different product. This page covers what fits at each tier and how to compose multiple instances when one machine isn't enough.

At a glance

Tier	Hardware	Capacity
Single laptop	16–64 GB RAM, 1–4 TB NVMe	1 billion+ entities
Single commodity server	Ryzen 9 / 64 GB DDR5 / 4 TB NVMe Gen 4	10 billion+ entities
Multiple machines	Independent servers, hash-sharded	Effectively unbounded

Moving between the single-machine tiers is an rsync plus a restart; moving to multiple machines means adding a routing layer. The Brainy API doesn't change — your code looks the same whether you're running one instance or 256.

Single machine

The default unit of deployment is one Brainy instance (the data layer — JSON files on disk) plus one Cor accelerator (the native Rust engine that runs SIMD distance, mmap-backed indexes, and the DiskANN search graph). Together they handle workloads that would normally require a dedicated database cluster.

Everything stays local: data, vectors, metadata, query engine. No network hops at query time. No third party that can take your database down. Your only operational concern is the machine itself.

Multi-tenant density on one machine

A typical deployment runs 10–200 Brainy instances inside one Node.js process on a single VM. Each instance has its own data directory on SSD plus its own indexes, and Cor manages what's shared and what's isolated:

Component	Scope	Memory
Embedding model (all-MiniLM-L6-v2)	Shared singleton	88 MB, loaded once
SIMD distance calculations	Shared	Zero per-instance cost
UnifiedCache (eviction engine)	Shared singleton	Dynamic — adapts to instance count
HNSW vector index	Per-instance	~700 bytes per entity
Metadata inverted index	Per-instance	~300 bytes per entity
Entity ID mapper	Per-instance	~56 bytes per entity
Graph adjacency (LSM-trees)	Per-instance	Mmapped — kernel manages pages
Storage adapter	Per-instance	Minimal (file handles)

Adaptive resource manager

Cor observes system resources and adjusts memory budgets in real time. No configuration required — it detects VM size, container limits (cgroups v1/v2), and active instance count automatically.

What it observes: total and free system memory, container memory limits, process RSS and heap usage, number of active Brainy instances, per-instance entity counts.

What it decides: cache budget (shrinks dynamically as more instances are loaded), memory pressure level (normal / elevated / critical), eviction candidates (weighted by idle time × memory usage, so large idle instances are evicted before small active ones).
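The eviction weighting described above can be sketched in a few lines. This is an illustration rather than Cor's actual implementation: the `InstanceStats` shape, the function names, and the use of a plain product of idle time and memory are all assumptions made for the example.

```typescript
// Hypothetical per-instance stats; field names are assumptions for illustration.
interface InstanceStats {
  id: string;
  idleMs: number;      // time since the instance was last accessed
  memoryBytes: number; // estimated resident memory of the instance
}

// Score = idle time × memory usage. Higher scores are evicted first.
function evictionScore(s: InstanceStats): number {
  return s.idleMs * s.memoryBytes;
}

// Return instance ids ordered from first to last eviction candidate.
function rankEvictionCandidates(instances: InstanceStats[]): string[] {
  return instances
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => evictionScore(b) - evictionScore(a))
    .map((s) => s.id);
}

const MB = 1000000;
const ranked = rankEvictionCandidates([
  { id: "small-active", idleMs: 2000, memoryBytes: 4.5 * MB },
  { id: "large-idle", idleMs: 600000, memoryBytes: 54 * MB },
  { id: "small-idle", idleMs: 600000, memoryBytes: 4.5 * MB },
]);
// ranked: ["large-idle", "small-idle", "small-active"]
```

With this scoring, a large instance that has been idle for ten minutes ranks ahead of a small instance touched two seconds ago, which matches the behavior the text describes.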

Memory-mapped storage

Cor's MmapFileSystemStorage extends Brainy's filesystem storage with zero-copy binary I/O. The Rust native layer memory-maps data files via memmap2, letting the Linux kernel manage which pages stay in RAM.

What mmap provides:

Vector index: the entire HNSW graph (vectors + connections) is stored as a single binary .hnsw file, memory-mapped for zero-copy reads. Search traverses the graph directly on mmap'd pages — no JSON parsing, no gzip decompression, no heap allocation. The kernel manages which pages stay in RAM.
Graph adjacency SSTables: four LSM-trees for relationship data are memory-mapped. Read-only pages are automatically paged out under memory pressure.
Dual-mode search: after flush(), search reads vectors from mmap pages (zero-copy). During active mutations, the existing in-memory engine handles search. Maximum density between mutations, mutation speed preserved.

On a 16 GB VM with a 200 GB SSD, you can effectively manage 200 GB of Brainy data with only the hot working set in RAM. The SSD acts as an extension of memory, managed by the Linux kernel with no application-level complexity. An idle tenant's data is automatically paged out by the kernel — no explicit eviction needed.

Instant suspend & resume

Because each Brainy instance has its own isolated data directory, Cor can evict instances from memory and reload them later without data loss:

Eviction: brain.close() flushes any pending writes to SSD, then frees all in-memory structures.
Data persists: the data directory on SSD remains intact.
Reload: next request for the same workspace/tenant re-initializes from SSD — binary mmap load is a single syscall, typically under 100 ms for a 500-entity tenant (PROJECTED — design target; reload-time benchmark ships in docs/verification-report.md).

This is fundamentally different from traditional databases that need replication or WAL replay. Each Brainy instance is a single-writer local store — suspend/resume is an munmap/mmap away.

Capacity planning

Per-instance memory

PROJECTED — design targets, not measured. Extrapolated from per-component memory analysis (HNSW graph + metadata inverted index + entity ID mapper sizings under their target capacity). Measured per-instance memory at each tier ships in docs/verification-report.md as part of Piece 9 of the Cor 3.0 release.

Entity count	Estimated memory	Typical use case
100	~4 MB	Light workspace or new tenant
500	~4.5 MB	Tenant after onboarding (time slots, customers, bookings)
2,000	~6 MB	Medium workspace with documents and notes
5,000	~9 MB	Large workspace with extensive content
10,000	~14 MB	Power user with months of accumulated data
50,000	~54 MB	Heavy workspace (years of data, many entity types)
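The projected figures in the table are close to a straight line: roughly 4 MB of per-instance baseline plus about 1 KB per entity, which is in line with the per-entity index costs listed earlier (about 700 + 300 + 56 bytes). The sketch below encodes that fit. The two constants are inferred from the table rather than taken from Cor, so treat them as assumptions for rough planning only.

```typescript
// Assumed linear fit to the projected table: ~4 MB base + ~1 KB per entity.
// Both constants are inferred from the table, not read from Cor.
const BASE_MB = 4;
const KB_PER_ENTITY = 1;

// Estimated per-instance memory in (decimal) megabytes.
function estimateInstanceMemoryMb(entityCount: number): number {
  if (entityCount < 0) throw new RangeError("entityCount must be non-negative");
  return BASE_MB + (entityCount * KB_PER_ENTITY) / 1000;
}

const estimates = [100, 500, 2000, 5000, 10000, 50000].map(estimateInstanceMemoryMb);
// estimates are approximately 4.1, 4.5, 6, 9, 14 and 54 MB
```

Because the inputs are projections, the output inherits the same caveat: it is a planning aid, not a measurement.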

VM sizing

Fixed overhead per process: ~240 MB (88 MB embedding model + Node.js baseline).

PROJECTED — design targets, not measured. Extrapolated from the per-instance memory model above plus a fixed per-process overhead estimate. Measured tenant capacity at each VM tier ships in docs/verification-report.md.

Per-tenant workloads (~500 entities average):

VM	RAM	Active tenants	Cache	Notes
e2-medium	4 GB	30–40	500 MB	Minimum viable — tight under load
e2-standard-2	8 GB	80–100	1.5 GB	Recommended — comfortable headroom
e2-standard-4	16 GB	200+	4 GB	High-traffic — handles spikes easily

Per-user workloads (mixed 100–50K entities):

VM	RAM	Active (small)	Active (large)	Cache	Notes
e2-standard-2	8 GB	200+	20–30	1.5 GB	Good for early-stage
e2-standard-4	16 GB	500+	60–80	4 GB	Balanced density and speed
n2-standard-8	32 GB	1,000+	150–200	8 GB	High density for growth

Zero-config adaptation

Cor adapts automatically when the VM is resized:

Upgrade from 8 GB to 16 GB: cache budget doubles, more instances stay resident, fewer cold starts.
Downgrade or container limit: cache shrinks, instances are evicted more aggressively, data stays on SSD.
No config changes required — Cor reads system memory and cgroup limits at startup.

A VM can simultaneously run heterogeneous workloads: 100 small tenants (500 entities each, ~4.5 MB) totaling 450 MB plus two large workspaces (50K entities each, ~54 MB) totaling 108 MB, with dynamic cache filling the remainder. The resource manager balances between them — when memory pressure rises, the large idle workspace is evicted first (weighted by idle time × memory usage), not the small active tenant accessed two seconds ago.

Billion-scale single-machine search (DiskANN)

For single-instance workloads beyond ~10 million vectors, HNSW's memory cost compounds quickly: at 1 B vectors × 384 dimensions, the float32 vectors alone are ~1.5 TB of RAM, and the HNSW graph metadata adds another ~2 TB. No commodity machine has that.
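The raw-vector figure above is simple arithmetic: count × dimensions × 4 bytes per float32. A minimal sketch follows; the function name is hypothetical, and the result covers only the vectors themselves, not any graph or index overhead.

```typescript
// Bytes needed to hold raw float32 vectors in memory (no index overhead).
function rawVectorBytes(vectorCount: number, dimensions: number): number {
  const BYTES_PER_FLOAT32 = 4;
  return vectorCount * dimensions * BYTES_PER_FLOAT32;
}

const bytesAtOneBillion = rawVectorBytes(1e9, 384); // 1,536,000,000,000 bytes
const terabytes = bytesAtOneBillion / 1e12;         // about 1.536 decimal TB
```

The same function gives about 15 GB at 10 million vectors and about 154 GB at 100 million, which is why an in-memory index stops being practical well before a billion vectors.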

Cor ships a 100% pure-Rust DiskANN engine (ADR-002) that targets ~5 ms search latency at billion scale with ~20 GB RAM (PROJECTED — design target; awaiting verification-report.md). The architecture is the Vamana α-pruned graph (Subramanya et al., NeurIPS 2019) plus Product Quantization, and the on-disk file is a single mmap-mappable contiguous layout. None of this requires a separate service or external dependency — it's the same @soulcraft/cor plugin.

PROJECTED — design targets, not measured. Extrapolated from algorithm math (Vamana traversal cost + PQ ADC table-lookup cost + per-vector storage). The largest DiskANN dataset exercised in CI today is 10k synthetic dim=64 random vectors. Measured RAM + latency at 100M and 1B on real embedding corpora on cgroup-limited 32 GB hardware ship in docs/verification-report.md as part of Piece 9 of the Cor 3.0 release.

Vectors	RAM with DiskANN	RAM with HNSW	Search latency (warm cache)
1 M	0.5–2 GB	0.5–2 GB	<1 ms
10 M	1–5 GB	8–20 GB	1–3 ms
100 M	5–20 GB	80–200 GB (impractical)	2–5 ms
1 B	20–70 GB	1.5+ TB (single-machine impossible)	5–10 ms
10 B	200–700 GB total (multi-shard on one box)	—	5–15 ms

A single DiskANN shard tops out near u32::MAX slots (~4 B vectors) by construction — for 10 B+ on one machine the data is split across multiple DiskANN shards on the same box. Same engine, same files, same API; the shard boundary is internal to Cor.
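As a rough illustration of the shard ceiling, the minimum shard count for a given vector count is a ceiling division by u32::MAX. The function below is a hypothetical helper, not part of Cor, and it assumes every slot holds exactly one vector with no padding or reserved slots; Cor's own internal split may differ.

```typescript
// u32::MAX in Rust: the per-shard slot ceiling quoted above.
const U32_MAX = 4294967295;

// Minimum number of shards, assuming one vector per slot and no padding.
function minDiskAnnShards(vectorCount: number): number {
  if (vectorCount < 0) throw new RangeError("vectorCount must be non-negative");
  return Math.max(1, Math.ceil(vectorCount / U32_MAX));
}

const shardsFor1B = minDiskAnnShards(1e9);   // 1
const shardsFor10B = minDiskAnnShards(10e9); // 3
```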

These numbers are search latency for the index itself. End-to-end query latency includes filesystem hydration of the returned entities — see "Path to 10 billion end-to-end" below for the remaining full-stack work.

DiskANN engages automatically when its conditions are met (registered provider, local filesystem path available, stable EntityIdMapper) or explicitly via config.index.type: 'diskann'. All Brainy APIs — add, search, relate, searchSimilarVerbs, find — work unchanged. DiskANN is an HNSW-shaped drop-in; with Cor registered, it engages during init():

import { BrainyData } from '@soulcraft/brainy'
import { register as registerCor } from '@soulcraft/cor'

const brain = new BrainyData({
  storage: { type: 'filesystem', rootDirectory: '/data/idx' }
})
await registerCor(brain)
await brain.init()
// → [brainy] DiskANN engaged (path=/data/idx/_diskann/main.bin, dim=384)

const hits = await brain.search(queryVector, 10)

Migration is reversible — production rollbacks are always available:

await brain.migrateToDiskAnn({
  recallTarget: 0.95,    // require ≥95% recall vs old index before swapping
  paddingFactor: 1.2,
  verifySampleSize: 100
})

// And back if needed:
await brain.migrateToHnsw()

Multiple machines (horizontal scaling)

Once a workload exceeds what fits on one box — either entity count past ~10 billion or query traffic past one machine's CPU — Brainy + Cor scale horizontally by running multiple independent instances. There is no built-in coordinator: each instance is self-contained, and the patterns below describe how to compose them.

All five patterns work today without external dependencies. The Datomic-style multi-region merge (#4) becomes cleanest with the immutable Db API shipping in Brainy 8.0 / Cor 3.0.

Roadmap — managed distributed cluster (Cor 3.1, fast-follow). The 3.0 release is deliberately a single-node engine that scales from a laptop to the most powerful single server, plus the application-level horizontal patterns below (you wire the shard routing). A built-in distributed cluster — a coordinator that handles shard placement, cross-node consistency, rebalancing, and a single logical query surface across nodes — is the headline of the 3.1 fast-follow, immediately after the 3.0 GA. Until then, "the most advanced cluster of servers" is served by (a) running the same binary on the largest single instance your cloud offers and (b) the self-managed sharding patterns in this section. 3.0's single-node numbers (verification-report.md) are the per-node performance the 3.1 cluster composes.

1. Shard by entity

Hash the actor or tenant id and route to one of N brain instances:

shard = hash(actorId) mod N

Every node computes the same shard map — no coordinator state. Each user's entities, edges, and history live on one shard. Cross-shard reads fan out in parallel; each individual shard responds in under a millisecond.

Trade-off: Hot accounts (high-follower users in a social workload) create a hot shard. Combine with the edge-cache pattern (#5) when that shows up.

When to use: Multi-tenant SaaS, social workloads, anything with a natural partition key.
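The routing rule `shard = hash(actorId) mod N` only works if every router uses the same deterministic hash. A minimal sketch, assuming FNV-1a as the hash (the text does not name one) and hypothetical function names:

```typescript
// FNV-1a 32-bit hash: deterministic across processes and machines, so every
// router computes the same shard for the same id. This version hashes UTF-16
// code units, which matches byte-wise FNV-1a only for ASCII ids.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map an actor or tenant id to one of `shardCount` brain instances.
function shardFor(actorId: string, shardCount: number): number {
  if (!Number.isInteger(shardCount) || shardCount <= 0) {
    throw new RangeError("shardCount must be a positive integer");
  }
  return fnv1a32(actorId) % shardCount;
}

// Example id is hypothetical; any stable string key works.
const shard = shardFor("did:plc:alice", 64);
```

Plain modulo sharding remaps most keys when N changes, so choosing N generously up front (or moving to consistent hashing later) is a design decision worth making early.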

2. Read replicas via filesystem sync

Brainy persists everything as JSON files in a directory. Replicate the directory with rsync, btrfs send, or ZFS snapshots — standard filesystem tooling, no application-level replication code. Writes hit the primary; reads scale linearly across replicas.

Trade-off: Eventually consistent (sync lag = seconds, sometimes minutes depending on tooling). Fine for read-heavy workloads where slightly-stale is acceptable.

When to use: Read-heavy serving, regional read caches, analytics followers that don't accept writes.
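On the application side, the read-replica pattern needs only a small router: writes always go to the primary, reads rotate across replicas. The class below is a hypothetical sketch; the host names are placeholders, and it does not account for replica lag or health checks.

```typescript
// Hypothetical routing table for one shard: one writable primary plus
// read-only replicas kept in sync by filesystem replication.
class ReplicaRouter {
  private nextReplica = 0;

  constructor(
    private readonly primary: string,
    private readonly replicas: readonly string[],
  ) {}

  // All writes go to the primary; replicas never accept writes.
  targetForWrite(): string {
    return this.primary;
  }

  // Reads rotate across replicas; fall back to the primary if none exist.
  targetForRead(): string {
    if (this.replicas.length === 0) return this.primary;
    const target = this.replicas[this.nextReplica] ?? this.primary;
    this.nextReplica = (this.nextReplica + 1) % this.replicas.length;
    return target;
  }
}

const router = new ReplicaRouter("primary-0", ["replica-0a", "replica-0b"]);
const reads = [router.targetForRead(), router.targetForRead(), router.targetForRead()];
// reads: ["replica-0a", "replica-0b", "replica-0a"]
```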

3. Functional partitioning

Give each domain its own brain instance, tuned to its access pattern:

Brain	Workload	Cor mode
Users / profiles	Small, hot	In-memory — sub-millisecond reads
Posts / content	Large, content-heavy	Hybrid — PQ + mmap
Social graph	Billion edges, append-heavy	On-disk + LSM verb storage
Media metadata	Small, mostly-immutable	In-memory

Cor's adaptive mode selection picks the right mode per brain automatically based on observed memory. Cross-domain queries do N short hops at the application layer instead of complex joins inside the database.

Trade-off: Application-level composition replaces SQL joins. Subtype indexes on the joining fields keep cross-domain lookups O(1).

When to use: Workloads with clean domain boundaries (the AT Protocol's actor records / repo records / blob storage split is a natural example).

4. Multi-region with eventual merge

Brainy 8.0 / Cor 3.0 ship a Datomic-style immutable Db API: every write produces a new Db value via brain.transact(tx), and one Db can be folded into another via Db.with(tx). Each region writes to its local brain in real time. Regions exchange tx blobs over plain HTTP — no central coordinator, no global lock.

Trade-off: Eventually consistent across regions. Conflicts resolve with last-writer-wins per entity, or naturally via union-merge on graph edges (follow lists are CRDTs — adding two different follows from two regions produces the same set either way).

When to use: Geographic distribution where intra-region latency matters more than global consistency. Social, content delivery, multi-region SaaS.
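The two conflict rules mentioned in the trade-off can be sketched independently of the Db API. The code below is illustrative only: the `Versioned` shape, its `updatedAt` and `region` fields, and both function names are assumptions, not Brainy types. The union merge covers additions only; removals such as unfollows would need extra bookkeeping.

```typescript
// Grow-only set of follows: merging is set union, which is commutative,
// associative, and idempotent, so regions converge regardless of order.
function mergeFollows(a: ReadonlySet<string>, b: ReadonlySet<string>): Set<string> {
  const merged = new Set<string>();
  a.forEach((id) => merged.add(id));
  b.forEach((id) => merged.add(id));
  return merged;
}

// Hypothetical versioned entity; `updatedAt` and `region` are assumptions.
interface Versioned<T> {
  value: T;
  updatedAt: number; // timestamp of the write
  region: string;    // tie-breaker so both sides pick the same winner
}

// Last-writer-wins: the newest timestamp wins; ties break on region name.
function mergeLww<T>(a: Versioned<T>, b: Versioned<T>): Versioned<T> {
  if (a.updatedAt !== b.updatedAt) return a.updatedAt > b.updatedAt ? a : b;
  return a.region >= b.region ? a : b;
}

const euFollows = new Set(["alice", "bob"]);
const usFollows = new Set(["alice", "carol"]);
const mergedFollows = mergeFollows(euFollows, usFollows); // alice, bob, carol
```

The deterministic tie-breaker matters: without it, two regions receiving the same pair of writes in different orders could each keep a different winner.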

5. Edge caches over a hot subset

Each edge node runs a small Brainy + Cor with a curated hot subset — recent posts, popular accounts, viral content, whatever your access pattern makes hot. Cache misses fall back to the primary brains over the same Brainy API used everywhere else.

Trade-off: Eviction policy matters. Cor's UnifiedCache handles in-memory hot data; a thin persisted brain at the edge handles "recently popular" with controlled size.

When to use: Skewed access (10% of accounts get 90% of reads), geographic latency requirements, public read APIs at scale.
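The read path at an edge node is a read-through cache: serve from the local hot subset, and on a miss fetch from the primary and keep the result. The sketch below uses a plain least-recently-used policy to show the shape of the pattern. It is not Cor's UnifiedCache, `loadFromPrimary` is a hypothetical callback standing in for a call to the primary brain, and it is synchronous only to keep the example short.

```typescript
// Minimal read-through LRU cache for an edge node.
class EdgeCache<V> {
  private readonly entries = new Map<string, V>();

  constructor(
    private readonly capacity: number,
    private readonly loadFromPrimary: (key: string) => V | undefined,
  ) {}

  get(key: string): V | undefined {
    if (this.entries.has(key)) {
      const hit = this.entries.get(key) as V;
      // Re-insert to mark as most recently used (Map keeps insertion order).
      this.entries.delete(key);
      this.entries.set(key, hit);
      return hit;
    }
    // Cache miss: fall back to the primary and remember the result.
    const loaded = this.loadFromPrimary(key);
    if (loaded === undefined) return undefined;
    this.entries.set(key, loaded);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry: the first key in the Map.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
    return loaded;
  }
}

// A Map stands in for the primary brain in this example.
const primaryStore = new Map<string, string>([["post:1", "hello"], ["post:2", "world"]]);
const edge = new EdgeCache<string>(1, (key) => primaryStore.get(key));
const firstRead = edge.get("post:1");  // miss: loaded from the primary
const secondRead = edge.get("post:1"); // hit: served from the edge
```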

Massive scale — composing patterns

For workloads at the limits — billion-entity multi-tenant SaaS, global social networks, multi-region compliance — these patterns compose:

Tier	Recommended composition
10 B+ entities in one region	#1 shard + #2 replicas + #3 functional partitioning
10 B+ entities, hot-account pain	Add #5 edge caches
Multi-region	Add #4 Datomic merge between regional clusters
100 B+ entities global	All five composed: regional clusters of sharded primaries with edge caches, merged across regions

A concrete example for a social network at 30 million users and a few billion posts: 64 sharded primaries (each a $1000 commodity box at ~470 K users per shard) × 4 read replicas per shard = 256 replica boxes, plus the 64 primaries, for 320 boxes serving the regional workload. Hot accounts get an edge cache layer in front; geographic redundancy adds a second region merging via #4. Total bill of materials is hardware plus operations — no database vendor invoice.

What stays the same across all tiers

	Laptop	Single server	Multiple machines
API	Same Brainy methods	Same	Same — call the right instance
Data files	JSON on local disk	JSON on local disk	JSON on each instance
Cor install	`npm install @soulcraft/cor`	Same	Same on every instance
Upgrade story	`npm update` and restart	Same	Same, one instance at a time
Operational dependencies	None	None	None — no coordinator, no broker, no managed service

Moving up the scaling ladder doesn't change your code. You add hardware, configure routing, and the same engine runs at every step.

Path to 10 billion end-to-end

DiskANN solves the vector-search bottleneck. The supporting subsystems that used to cap end-to-end operation at ~1 B have been widened to u64 throughout in the Cor 3.0 release — what remains is the verification report and a handful of bookkeeping items.

Subsystem	Cortex 2.x ceiling	Cor 3.0 status
Entity ID space	~4.3 B (u32)	✅ u64 throughout (Piece 10)
EntityIdMapper persistence	~500 M entries (JSON I/O)	✅ Native binary mmap'd `uuid↔int` map (Piece 1)
Metadata sparse-field index	~100 M entities	✅ Native Rust LSM column store + Roaring64 widening (Piece E)
Verb-graph LSM trees	~500 M edges	✅ u64-keyed LSM + pair-value `verbs_source` for native BFS (Piece D)
Verb-id namespace	~4.3 B verbs (u32)	✅ u64 widened with IdSpace tagging (Piece B)
Verb endpoints store	In-memory HashMap	✅ Packed-array mmap with adaptive sizing (Piece C)
FileSystemStorage sharding	~2.5 M entities per directory	Configurable shard depth (in progress)
Search-result hydration	~10 K results/query	Batch shard-grouped reads (planned)
End-to-end verification	—	`docs/verification-report.md` ships with the Cor 3.0 GA tag

All fixes ship inside Cor — no external databases, no competing engines. The 5–15 ms search latency target at 10 B vectors (PROJECTED — awaiting verification-report.md) holds; the remaining work brings filesystem hydration and shard layout up to match.

What we don't ship (deliberately)

Brainy + Cor deliberately does not ship:

A built-in coordinator for transparent multi-machine clustering.
A managed-service tier or SaaS deployment.
A query language that hides machine boundaries.

The architecture is single-box-per-instance, by design. At billion-per-tenant scale, a single commodity machine delivers the workload, and the cost of operating a distributed coordinator outweighs the benefit. Horizontal scaling means running more instances — same engine, no shared state.

If a future workload genuinely requires N-node coherence inside one logical database, we'll revisit. We will not ship that complexity speculatively.