
Consistency Model

Brainy 8.0's consistency story rests on one mechanism: generational MVCC — multi-version concurrency control over immutable, generation-stamped records. It is exposed through a single value type, the Db: an immutable, point-in-time view of the whole store that you query like the live brain.

const db = brain.now()                 // pin the current state — O(1), no I/O

await brain.transact([
  { op: 'update', id: invoiceId, metadata: { status: 'paid' } }
])

await db.get(invoiceId)                // still 'pending' — pinned, forever
await brain.get(invoiceId)             // 'paid' — live
await db.release()                     // unpin when done

This page states the guarantees precisely — what is promised, what it costs, and where the honest limits are. The design record is ADR-001; every guarantee below is proven by a dedicated test in tests/integration/db-mvcc.test.ts.

The generation clock

A monotonic generation counter is the store's logical clock:

  • It advances once per committed transact() batch and once per single-operation write (add/update/remove/relate/…).
  • brain.generation() reads it; it is persisted in the data directory and never reissued — not across restarts, and not across restore() (the counter is floored at its pre-restore value).

Every Db is pinned at one generation. db.generation and db.timestamp identify the view; newerDb.since(olderDb) returns exactly the entity and relationship ids that committed transactions touched between two views.
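The clock and since() can be pictured with a toy in-memory model. This is a sketch under stated assumptions, not Brainy's implementation: ToyClock, commit, and touchedSince are hypothetical names, and the model only covers transact()-style batches.

```typescript
// Toy model of a generation clock with a per-generation change log.
// ToyClock, commit, and touchedSince are illustrative names, not Brainy APIs.
class ToyClock {
  private generation = 0
  // touched[g - 1] holds the ids changed by the batch that produced generation g.
  private touched: string[][] = []

  // One committed batch advances the clock exactly once.
  commit(ids: string[]): number {
    this.generation += 1
    this.touched.push(ids.slice())
    return this.generation
  }

  // Unique ids touched by batches in the half-open range (older, newer].
  touchedSince(older: number, newer: number): string[] {
    const out: string[] = []
    for (let g = older + 1; g <= newer; g++) {
      for (const id of this.touched[g - 1] || []) {
        if (out.indexOf(id) === -1) out.push(id)
      }
    }
    return out.sort()
  }
}

const clock = new ToyClock()
const g1 = clock.commit(['a', 'b'])
const g2 = clock.commit(['b', 'c'])
const g3 = clock.commit(['d'])
// Ids touched after g1, up to and including g3.
const changed = clock.touchedSince(g1, g3)
```

The range is half-open on purpose: the older view already reflects its own generation, so only later batches count as changes.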

Snapshot isolation for reads

Guarantee: a Db reads exactly the state at its pinned generation, no matter what commits afterwards — including deletes. There are no torn reads, no partially applied batches, and no drift over time.

  • brain.now() pins the current generation in O(1).
  • brain.transact() returns a Db pinned at the freshly committed generation.
  • brain.asOf(generation | Date | snapshotPath) pins past state.

While nothing has committed past the pin, reads delegate to the live fast paths — pinning is free until history actually moves. Once later transactions commit, the view keeps serving the full query surface at its generation (see "Reading the past" below).

Writers are never blocked by readers and readers never block writers: a pinned view stays valid because nothing overwrites the immutable records it resolves from (the LMDB reader-pin model).
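Why a pin survives later commits and deletes can be shown with a minimal version-chain model. The sketch below is illustrative only: ToyStore, readAt, and pin are hypothetical names, and Brainy's actual record layout is not described on this page.

```typescript
// Toy MVCC store: each id maps to an append-only list of versions stamped
// with the generation that wrote them. Not Brainy's code.
type Version = { generation: number; value: string | undefined } // undefined marks a delete

class ToyStore {
  private generation = 0
  private chains: Record<string, Version[] | undefined> = {}

  // Apply a batch as one generation; existing versions are never rewritten.
  commit(writes: Array<{ id: string; value: string | undefined }>): number {
    this.generation += 1
    for (const w of writes) {
      const chain = this.chains[w.id] || []
      chain.push({ generation: this.generation, value: w.value })
      this.chains[w.id] = chain
    }
    return this.generation
  }

  // Newest version written at or before the requested generation.
  readAt(id: string, generation: number): string | undefined {
    const chain = this.chains[id] || []
    for (let i = chain.length - 1; i >= 0; i--) {
      const v = chain[i]
      if (v && v.generation <= generation) return v.value
    }
    return undefined
  }

  // Pinning only captures a number, so it is O(1) and does no I/O.
  pin() {
    const at = this.generation
    return { generation: at, get: (id: string) => this.readAt(id, at) }
  }
}

const store = new ToyStore()
store.commit([{ id: 'invoice', value: 'pending' }])
const pinned = store.pin()
store.commit([{ id: 'invoice', value: 'paid' }])
store.commit([{ id: 'invoice', value: undefined }]) // delete
const live = store.pin()
```

Because commit() only appends, the writer never waits on the pinned reader, and the reader never sees the later versions.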

Transaction atomicity

brain.transact(ops) executes a declarative batch (add, update, remove, relate, unrelate) atomically as exactly one generation:

const db = await brain.transact([
  { op: 'add', id: orderId, type: NounType.Document, subtype: 'order', data: 'Order #1042' },
  { op: 'add', id: itemId, type: NounType.Thing, subtype: 'line-item', data: 'Widget x3' },
  { op: 'relate', from: orderId, to: itemId, type: VerbType.Contains, subtype: 'order-line' }
], { meta: { author: 'order-service', requestId: 'req-9f2' } })

db.receipt.ids   // resolved id per operation, in input order

Either every operation applies, or none do and the store is byte-identical to its pre-transaction state. Operation semantics mirror the corresponding single-operation methods — validation, subtype enforcement, relationship deduplication, delete cascades — and later operations may reference ids created earlier in the same batch.

The commit point is one atomic rename. The durability protocol:

  1. Before-images of every touched id are staged into an immutable generation directory and fsynced.
  2. The batch executes through the transaction manager (which has its own operation-level rollback for non-crash failures).
  3. The store manifest is replaced via atomic temp-file rename and fsynced. The rename is the commit — a generation is committed if and only if the manifest says so.

Crash recovery: on the next open, any staged generation above the manifest watermark is an uncommitted transaction; its before-images are restored (idempotently — recovery can itself crash and rerun) and derived indexes never observe the rolled-back state. A crash anywhere before the rename rolls back to the exact pre-transaction bytes; a crash after it keeps the transaction.
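The stage, execute, commit sequence and its recovery rule can be simulated in memory. This is a hedged sketch: the Disk shape, transact, and recover below are hypothetical stand-ins, the "disk" is a plain object, and a real store would fsync files and rename a manifest instead of assigning a number.

```typescript
// Toy simulation of "stage before-images, execute, commit by manifest swap".
type Files = Record<string, string | undefined>
type Disk = {
  data: Files
  staged: Record<number, Files | undefined> // generation -> before-images
  committed: number                         // the manifest watermark
}

function transact(disk: Disk, writes: Record<string, string>, crashBeforeCommit = false): void {
  const generation = disk.committed + 1
  // 1. Stage before-images of every touched id (undefined = id did not exist).
  const before: Files = {}
  for (const id of Object.keys(writes)) before[id] = disk.data[id]
  disk.staged[generation] = before
  // 2. Execute the batch.
  for (const id of Object.keys(writes)) disk.data[id] = writes[id]
  if (crashBeforeCommit) throw new Error('simulated crash before the commit point')
  // 3. Commit point: a single assignment stands in for the atomic rename.
  disk.committed = generation
}

// On open: any staged generation above the watermark is uncommitted, so undo it.
function recover(disk: Disk): void {
  const pending = Object.keys(disk.staged)
    .map(Number)
    .filter((g) => g > disk.committed)
    .sort((a, b) => b - a) // newest first
  for (const generation of pending) {
    const before = disk.staged[generation] || {}
    for (const id of Object.keys(before)) {
      if (before[id] === undefined) delete disk.data[id]
      else disk.data[id] = before[id]
    }
    delete disk.staged[generation]
  }
}

const disk: Disk = { data: { a: '1' }, staged: {}, committed: 0 }
transact(disk, { a: '2' }) // commits generation 1
try {
  transact(disk, { a: '3', b: 'x' }, true) // dies after executing, before committing
} catch (err) {
  // The process "crashed" here; the data files hold uncommitted bytes.
}
recover(disk)
recover(disk) // running recovery again changes nothing
```

Restoring a before-image writes the same bytes no matter how often it runs, which is what makes a crash during recovery harmless in this model.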

Transaction metadata (meta) is reified Datomic-style: recorded in an append-only transaction log readable via brain.transactionLog() — audit fields live in the database, not in commit messages.

Two levels of compare-and-swap

Concurrent transact() calls commit serially (snapshot-isolated batches). For lost-update protection across a read–modify–write cycle, Brainy offers CAS at two granularities:

| Granularity | Mechanism | Conflict error | Use when |
| --- | --- | --- | --- |
| Per entity | _rev + { op: 'update', ifRev } (also on brain.update()) | RevisionConflictError | "This entity must not have changed since I read it." |
| Whole store | transact(ops, { ifAtGeneration }) | GenerationConflictError | "Nothing may have committed since I read." |

const view = brain.now()
const order = await view.get(orderId)

try {
  await brain.transact(
    [{ op: 'update', id: orderId, metadata: { total: recompute(order) }, ifRev: order._rev }],
    { ifAtGeneration: view.generation }
  )
} catch (err) {
  if (err instanceof GenerationConflictError) {
    // Something committed since the pin — re-read and retry.
  }
} finally {
  await view.release()
}

An ifRev conflict on any operation rejects the whole batch; an ifAtGeneration conflict is detected before anything is staged. Both leave the store untouched and the generation counter unchanged. See Optimistic concurrency with _rev for the per-entity pattern in depth.
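The difference between the two checks is easiest to see when an unrelated entity changes. The toy store below is a sketch, not Brainy's code: ToyCasStore and its update method are hypothetical, and conflicts are returned as values for brevity, whereas the text above describes typed errors.

```typescript
// Toy store with a per-entity revision and a whole-store generation.
type Entity = { rev: number; total: number }
type Outcome = { committed: boolean; conflict?: 'revision' | 'generation' }

class ToyCasStore {
  generation = 0
  entities: Record<string, Entity | undefined> = {}

  update(id: string, total: number, opts: { ifRev?: number; ifAtGeneration?: number } = {}): Outcome {
    // Whole-store check first: nothing may have committed since the caller's pin.
    if (opts.ifAtGeneration !== undefined && opts.ifAtGeneration !== this.generation) {
      return { committed: false, conflict: 'generation' }
    }
    const current = this.entities[id]
    const currentRev = current ? current.rev : 0
    // Per-entity check: this entity must still be at the revision the caller read.
    if (opts.ifRev !== undefined && opts.ifRev !== currentRev) {
      return { committed: false, conflict: 'revision' }
    }
    // Only a successful write bumps the revision and the generation.
    this.entities[id] = { rev: currentRev + 1, total }
    this.generation += 1
    return { committed: true }
  }
}

const cas = new ToyCasStore()
cas.update('order', 100) // 'order' at rev 1, store at generation 1
cas.update('other', 5)   // store at generation 2, 'order' untouched

const stale = cas.update('order', 120, { ifAtGeneration: 1 }) // the store moved on
const fine = cas.update('order', 120, { ifRev: 1 })           // the entity did not
const lost = cas.update('order', 130, { ifRev: 1 })           // rev is now 2
```

The whole-store check rejects the write even though 'order' itself never changed, while the per-entity check lets it through. Which one fits depends on whether the computation read only that entity or other state as well.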

Reading the past

brain.asOf() accepts a generation number, a Date (resolved through the transaction log to the newest generation committed at or before it), or a snapshot directory path. Historical views serve the full query surface (get(), find() in every mode, semantic search, graph traversal, cursors, aggregation) through two complementary paths:

  • Record path (free): get(), metadata-level find(), and filter-based related() resolve directly through the immutable record layer. Ids untouched since the pin still ride the live fast paths.
  • Index path (paid once): index-accelerated queries — semantic/vector search, graph traversal, cursors, aggregation — are served by an at-generation index materialization built lazily on first use: Brainy reconstructs in-memory indexes over the exact record set at that generation. This costs O(n at the pinned generation) time and memory, once per Db, cached until release(). That is the open-core price of historical index queries, stated plainly.

A native index provider implementing the optional VersionedIndexProvider plugin capability serves the same historical reads from its retained index segments without any rebuild — the materializer is the correctness baseline, the provider is the accelerator. Semantics are identical on both paths.
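The "paid once" cost model amounts to a lazily built, per-view cache. The following sketch assumes a trivial tag index in place of vector or graph indexes; ToyView, byTag, and the builds counter are hypothetical and exist only to make the build count observable.

```typescript
// Toy pinned view: record reads are direct; an index is built on first use
// from the records at the pinned generation, then cached until release().
type Rec = { id: string; tag: string }

class ToyView {
  builds = 0
  private records: Rec[]
  private index: Record<string, string[]> | null = null

  constructor(records: Rec[]) {
    this.records = records
  }

  // Record path: no index involved.
  get(id: string): Rec | undefined {
    for (const r of this.records) {
      if (r.id === id) return r
    }
    return undefined
  }

  // Index path: O(n) build on the first call, cached afterwards.
  byTag(tag: string): string[] {
    let index = this.index
    if (index === null) {
      this.builds += 1
      const built: Record<string, string[]> = {}
      for (const r of this.records) {
        const ids = built[r.tag] || []
        ids.push(r.id)
        built[r.tag] = ids
      }
      index = built
      this.index = built
    }
    return index[tag] || []
  }

  // Dropping the cache frees the memory the materialization held.
  release(): void {
    this.index = null
  }
}

const view = new ToyView([
  { id: 'a', tag: 'x' },
  { id: 'b', tag: 'y' },
  { id: 'c', tag: 'x' }
])
const viaRecordPath = view.get('a') // does not trigger a build
const first = view.byTag('x')       // builds the index
const second = view.byTag('y')      // served from the cache
```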

History granularity — the honest limit

Generation records are written per transact() batch only. Single-operation writes (add/update/remove/relate/… outside transact()) advance the generation counter — so watermarks and CAS stay sound — but do not stage before-images: they remain visible through earlier pins and are not reported by db.since(). Code that needs pinned isolation across its own writes uses transact(). This is the documented 8.0 contract, not an accident.

Speculative writes: `db.with()`

db.with(ops) returns a new Db whose reads see the operations applied in memory, on top of the view — Datomic's with. Nothing touches disk, the generation counter, or index providers:

const current = brain.now()
const whatIf = await current.with([
  { op: 'update', id: employeeId, metadata: { team: 'platform' } }
])

await whatIf.find({ where: { team: 'platform' } })   // sees the change
await brain.get(employeeId)                          // unchanged — nothing committed

The one boundary: overlay entities carry no embeddings (with() never invokes the embedder), so index-accelerated queries and persist() on a speculative view throw SpeculativeOverlayError rather than returning silently incomplete results. get(), metadata-filter find(), and filter-based related() work fully on overlays. To get the full surface, commit the same operations with brain.transact().
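An overlay of this kind can be modeled as a reader that merges in-memory patches over a base reader. This is a minimal sketch assuming metadata-only updates; Reader and withOverlay are hypothetical names, not part of Brainy.

```typescript
// Toy speculative overlay: reads see patches layered over the base view,
// and the base is never modified.
type Meta = Record<string, string>
type Reader = { get(id: string): Meta | undefined }

function withOverlay(base: Reader, patches: Record<string, Meta>): Reader {
  return {
    get(id: string): Meta | undefined {
      const original = base.get(id)
      const patch = patches[id]
      if (!patch) return original
      // Merge into a fresh object so the base record stays untouched.
      return { ...(original || {}), ...patch }
    }
  }
}

const baseData: Record<string, Meta> = { emp1: { team: 'search', level: 'senior' } }
const baseView: Reader = { get: (id) => baseData[id] }
const whatIfView = withOverlay(baseView, { emp1: { team: 'platform' } })

const speculative = whatIfView.get('emp1') // team: 'platform', level: 'senior'
const committed = baseView.get('emp1')     // team: 'search', unchanged
```

The sketch also shows why the boundary exists: the overlay holds only the fields it was given, so anything derived from the entity, such as an embedding, has nothing to be computed from.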

Retention and compaction

Historical records cost disk space, so retention is explicit:

  • Every live Db holds a refcounted pin; a record-set is never reclaimed while any pin could need it — pinned reads stay correct across compaction, always.
  • brain.compactHistory({ retainGenerations?, retainMs? }) reclaims everything no retention rule and no pin protects, and records the horizon: asOf() below it throws GenerationCompactedError explicitly and never returns partial data.
  • To keep a state readable forever, persist() it first: snapshots are self-contained and unaffected by compaction of the source store.

Release Db values you do not keep (including the ones transact() returns). A FinalizationRegistry backstop releases leaked pins at garbage collection, but explicit release() is what makes compaction deterministic.
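How pins and retention rules interact can be sketched as a horizon calculation. The model below is an assumption-laden toy: ToyHistory, its pin and compact methods, and the single retainGenerations rule are hypothetical, and the horizon is taken to be the oldest generation that is still readable.

```typescript
// Toy retention model: compaction may advance the horizon only as far as
// the retention window and the oldest live pin both allow.
class ToyHistory {
  latest = 0
  horizon = 0 // generations below this are gone
  private pins: Record<number, number> = {} // generation -> live pin count

  commit(): number {
    this.latest += 1
    return this.latest
  }

  // Returns a release function; calling it more than once is harmless.
  pin(generation: number): () => void {
    if (generation < this.horizon) throw new Error('generation compacted')
    this.pins[generation] = (this.pins[generation] || 0) + 1
    let released = false
    return () => {
      if (released) return
      released = true
      this.pins[generation] -= 1
    }
  }

  compact(retainGenerations: number): number {
    let floor = Math.max(this.horizon, this.latest - retainGenerations)
    for (const key of Object.keys(this.pins)) {
      const g = Number(key)
      if (this.pins[g] > 0 && g < floor) floor = g // a live pin holds history back
    }
    this.horizon = floor
    return floor
  }
}

const history = new ToyHistory()
for (let i = 0; i < 10; i++) history.commit() // latest is now 10
const release = history.pin(3)
const whilePinned = history.compact(2)  // retention alone would allow 8
release()
const afterRelease = history.compact(2) // nothing holds generation 3 any more
```

A forgotten pin keeps the horizon stuck in this model, which mirrors why explicit release() matters for predictable compaction.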

Durability: snapshots and restore

db.persist(path) cuts a self-contained snapshot under the store's commit mutex, so no commit or compaction can interleave. On filesystem storage it is built from hard links: because every data file is immutable-by-rename, linking is safe — the snapshot is created without copying entity data, shares disk space with the source, and later writes to the source can never alter it (rewrites swap inodes; the snapshot keeps the old bytes). Cross-device targets fall back to byte copies; in-memory stores serialize to the same directory layout, producing a real, durable store.

Two rules keep snapshots honest:

  • persist() requires the view to still be the store's latest generation (a snapshot captures current bytes); a view that history has moved past throws GenerationConflictError instead of persisting the wrong state.
  • brain.restore(path, { confirm: true }) replaces the store's entire state from a snapshot via byte copy (never links — the snapshot stays independent), rebuilds all indexes, and floors the generation counter so observed generation numbers are never reissued. Live pins do not survive a restore — release them first (a warning is logged when any exist).

Brainy.load(path) (or brain.asOf(path)) opens a snapshot as a self-contained read-only store with the full query surface, including vector search.

Reserved fields

Some field names belong to Brainy, not to your metadata. They live at top level on every entity and relationship, have dedicated write paths, and may never appear inside a metadata bag:

| Entities (nouns) | Relationships (verbs) | Canonical write path |
| --- | --- | --- |
| noun | verb | the type param of add() / relate() |
| subtype | subtype | the subtype param |
| visibility | visibility | the visibility param ('public' or 'internal') |
| confidence | confidence | the confidence param |
| weight | weight | the weight param |
| service | service | the service param (fixed at create time) |
| data | data | the data param |
| createdBy | createdBy | the createdBy param of add() (system-managed on verbs) |
| createdAt, updatedAt, _rev | createdAt, updatedAt, _rev | system-managed (ifRev for CAS) |

The canonical machine-readable lists are exported as RESERVED_ENTITY_FIELDS and RESERVED_RELATION_FIELDS (defined in src/types/reservedFields.ts, the single source of truth). Three layers enforce the contract:

  1. Compile time — every metadata param (add, update, relate, updateRelation, and the matching transact() operations) rejects a literal reserved key as a TypeScript error.
  2. Write time — untyped (JavaScript) callers that pass one anyway are normalized: user-settable fields (confidence, weight, subtype, and service/createdBy at create time) are remapped to their dedicated param — top-level wins when both are supplied — and system-managed fields are dropped with a one-shot warning naming the correct write path. update({ metadata: { confidence: 0.9 } }) therefore behaves exactly like update({ confidence: 0.9 }).
  3. Read time — every read path (get, find, search, related, batch reads, and historical asOf() materialization) surfaces reserved fields only at top level: entity.metadata and relation.metadata contain only your custom fields, always.

const id = await brain.add({
  type: 'document', subtype: 'invoice',
  data: 'Invoice #42', confidence: 0.95,        // reserved → top-level params
  metadata: { customer: 'acme', total: 129.5 }  // custom fields only
})

const entity = await brain.get(id)
entity.confidence                                // 0.95 — top level
entity.metadata                                  // { customer: 'acme', total: 129.5 }
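The write-time rule (remap user-settable keys, let top-level win, drop system-managed keys) can be sketched as a small normalizer. This is illustrative only: normalize is a hypothetical function, and the two arrays are a hand-picked subset, not the exported RESERVED_ENTITY_FIELDS list.

```typescript
// Toy write-time normalizer for an update-style call.
// The lists below are an illustrative subset, not Brainy's exported constants.
const USER_SETTABLE = ['confidence', 'weight', 'subtype']
const SYSTEM_MANAGED = ['createdAt', 'updatedAt', '_rev']

type Input = { metadata?: Record<string, unknown>; [param: string]: unknown }
type Normalized = {
  params: Record<string, unknown>   // dedicated top-level params
  metadata: Record<string, unknown> // custom fields only
  dropped: string[]                 // system-managed keys that were ignored
}

function normalize(input: Input): Normalized {
  const params: Record<string, unknown> = {}
  const metadata: Record<string, unknown> = {}
  const dropped: string[] = []
  for (const key of Object.keys(input)) {
    if (key !== 'metadata') params[key] = input[key]
  }
  const bag = input.metadata || {}
  for (const key of Object.keys(bag)) {
    if (USER_SETTABLE.indexOf(key) !== -1) {
      // Remap to the dedicated param unless the caller already set it there.
      if (!(key in params)) params[key] = bag[key]
    } else if (SYSTEM_MANAGED.indexOf(key) !== -1) {
      dropped.push(key) // a real implementation would also warn once
    } else {
      metadata[key] = bag[key]
    }
  }
  return { params, metadata, dropped }
}

const viaBag = normalize({ metadata: { confidence: 0.9, customer: 'acme', _rev: 7 } })
const both = normalize({ confidence: 0.5, metadata: { confidence: 0.9 } })
```

In this model viaBag ends up with confidence as a top-level param and only customer in its metadata, while both keeps the top-level 0.5.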

Visibility — `public` / `internal` / `system`

visibility is a reserved tier that controls whether an entity or relationship surfaces on Brainy's default user-facing reads. The absence of the field is exactly equivalent to 'public'.

| Tier | Counted in getNounCount() / stats()? | Returned by default find() / related()? | Opt-in |
| --- | --- | --- | --- |
| 'public' (default, or field absent) | yes | yes | |
| 'internal' | no | no | find({ includeInternal: true }) / related({ includeInternal: true }) |
| 'system' | no | no | find({ includeSystem: true }) / related({ includeSystem: true }) |

  • 'public' — normal data. Counted and returned everywhere. Stored lean: the field is omitted on disk for public records, so existing data needs no migration.
  • 'internal' — your app's own bookkeeping (audit trails, derived caches, scratch entities) that should not pollute default queries, counts, or stats(), yet must stay retrievable on demand. Set it via the visibility param; read it back with the includeInternal opt-in.
  • 'system' — Brainy's own plumbing (for example the Virtual File System root entity). Hidden everywhere by default — even when includeInternal is set — and surfaced only with the explicit includeSystem opt-in. The 'system' tier is not part of the public add() / relate() param type ('public' | 'internal'); only internal Brainy code assigns it.

The opt-ins are applied as a hard candidate filter — hidden entities are removed before limit / offset are applied, so a default find({ limit: 10 }) always returns ten visible results when that many exist, never a short page.

// App-internal scratch entity: present, retrievable, but out of the way.
await brain.add({ type: 'task', data: 'reindex job', visibility: 'internal' })

await brain.getNounCount()                       // unchanged — internal not counted
await brain.find({ type: 'task' })               // [] — hidden by default
await brain.find({ type: 'task', includeInternal: true })   // includes it

// A brand-new brain reports zero user entities even though the VFS root exists:
const fresh = new Brainy()
await fresh.init()
await fresh.getNounCount()                        // 0 — the root is visibility:'system'
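Filtering before pagination is what prevents short pages. The sketch below is a toy model of that ordering, assuming an in-memory array; toyFind and its option names mirror the opt-ins described above but are not Brainy's implementation.

```typescript
// Toy find(): visibility is applied as a candidate filter, then limit/offset.
type Visibility = 'public' | 'internal' | 'system'
type Item = { id: string; visibility?: Visibility } // absent means 'public'
type FindOptions = {
  limit: number
  offset?: number
  includeInternal?: boolean
  includeSystem?: boolean
}

function toyFind(items: Item[], opts: FindOptions): string[] {
  const visible = items.filter((item) => {
    const tier = item.visibility || 'public'
    if (tier === 'internal') return opts.includeInternal === true
    if (tier === 'system') return opts.includeSystem === true
    return true
  })
  // Pagination happens only after hidden items are gone.
  const start = opts.offset || 0
  return visible.slice(start, start + opts.limit).map((item) => item.id)
}

const items: Item[] = [
  { id: 'a', visibility: 'internal' },
  { id: 'b' },
  { id: 'c', visibility: 'system' },
  { id: 'd' },
  { id: 'e' }
]

const defaultPage = toyFind(items, { limit: 2 })                     // b, d
const withInternal = toyFind(items, { limit: 5, includeInternal: true }) // a, b, d, e
const withSystem = toyFind(items, { limit: 5, includeSystem: true })     // b, c, d, e
```

Had the limit been applied first, the default page would have held only 'b', since 'a' would have used up a slot before being hidden.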

Note (8.0): the structural Contains edges the VFS creates between directories and files are left at the default (public) visibility for now — only the VFS root entity is 'system'. Marking those edges system requires companion changes to VFS traversal and is out of scope for this change.

What is not guaranteed

Stated plainly, so nothing surprises you in production:

  • Single-writer. Brainy is a single-writer, many-reader database (multi-process model). Transactions are atomic within one writer process — there is no distributed or cross-process transaction coordination.
  • History granularity. Only transact() batches produce historical records; single-operation writes between commits stay visible through earlier pins (see above).
  • Compacted history is gone. asOf() below the compaction horizon fails explicitly; persist what you must keep.
  • Counter persistence is coalesced for single-operation writes. Durable artifacts (records, manifests, snapshots) always persist the counter at their own commit points, so a crash inside the coalescing window can lose only counter values nothing durable ever referenced.
  • Speculative overlays are metadata-only readers. Index-accelerated queries on with() views throw rather than guess.

Where to go next