U64 Entity IDs

Cortex maps every entity UUID to a compact integer id internally. Two parts of the stack key off that integer — the roaring bitmaps inside the metadata index, and the mmap-backed BinaryEntityIdMapper that persists the UUID ↔ int mapping at billion scale. Both of those layers historically used 32-bit ints (Roaring32, u32), which caps a single brain at 4 294 967 295 entities — about 4.29 B.

That ceiling is fine for almost every workload. It is not fine for the corpora cortex 3.0 targets: the design point of the release is "1 B vectors on a single $1000 box", and the long tail goes well past 4 B. So cortex 3.0 adds a second IdSpace — 'u64' — that lifts the ceiling to 2⁶⁴ - 1 (≈ 1.84 × 10¹⁹) while keeping every public API the same.
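The ceilings above are easy to check with BigInt arithmetic. This is a small sketch; `fitsU32` is a hypothetical helper invented for illustration and is not part of cortex or brainy.

```typescript
// Ceilings of the two id spaces, computed with BigInt so nothing is rounded.
const U32_MAX: bigint = (1n << 32n) - 1n; // 4294967295
const U64_MAX: bigint = (1n << 64n) - 1n; // 18446744073709551615
const SAFE_MAX: bigint = BigInt(Number.MAX_SAFE_INTEGER); // 2^53 - 1

// Hypothetical helper: does a projected entity count fit the 32-bit id space?
function fitsU32(entityCount: bigint): boolean {
  return entityCount >= 0n && entityCount <= U32_MAX;
}

console.log(U32_MAX.toString()); // "4294967295"
console.log(U64_MAX.toString()); // "18446744073709551615"
console.log(fitsU32(1000000000n)); // true  (1 B entities)
console.log(fitsU32(5000000000n)); // false (5 B entities)
```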

Picking an IdSpace

The decision is one config flag, set at brain create time:

```
import { BrainyData } from '@soulcraft/brainy'
import { register as registerCortex } from '@soulcraft/cortex'

const brain = new BrainyData({
  storage: { type: 'filesystem', rootDirectory: '/data/idx' },
  // Pick once. Default is 'u32'.
  entityIdMapper: { idSpace: 'u64' },
})
await registerCortex(brain)
await brain.init()
```

Use U32 (default) when:

You expect ≤ 4.29 B entities for the lifetime of the brain.
You want byte-identical wire format with cortex 2.x — existing .cidx segments and chunked metadata JSON envelopes are unchanged.
You're migrating an existing brain. U32 is what cortex 2.x wrote.

Use U64 when:

The brain may exceed 4.29 B entities.
You're starting fresh and want headroom: the U64 wire format has negligible overhead at small scale and is forward-compatible with every cortex 3.x release.
You want to use the BigInt napi sibling methods directly (e.g. when a downstream system already speaks bigint and you want to skip the safe-integer guard).

The two modes are not interchangeable on the same files. You cannot escalate a U32 brain to U64 in place: the on-disk header is authoritative, and any mismatched config is rejected as a hard error at open time. Migrating an existing U32 brain to U64 is a re-export and re-ingest into a new brain.
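The "header is authoritative" rule can be modelled in a few lines. This is a sketch of the behaviour described above, not cortex's actual open path: `resolveIdSpace` and `IdSpaceMismatchError` are hypothetical names, and the real error type and message may differ.

```typescript
type IdSpace = 'u32' | 'u64';

// Hypothetical error type for this sketch only.
class IdSpaceMismatchError extends Error {
  constructor(onDisk: IdSpace, configured: IdSpace) {
    super(
      `brain was created as ${onDisk} but config requests ${configured}; ` +
        `re-export and re-ingest into a new brain to change id space`,
    );
    this.name = 'IdSpaceMismatchError';
  }
}

// The on-disk header wins. A fresh brain (no header yet) takes the configured value.
function resolveIdSpace(onDisk: IdSpace | undefined, configured: IdSpace): IdSpace {
  if (onDisk === undefined) return configured; // brand-new brain
  if (onDisk !== configured) throw new IdSpaceMismatchError(onDisk, configured);
  return onDisk;
}

console.log(resolveIdSpace(undefined, 'u64')); // "u64"
console.log(resolveIdSpace('u32', 'u32')); // "u32"
// resolveIdSpace('u32', 'u64') throws IdSpaceMismatchError
```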

What changes on the wire

int_to_uuid.bin (binary mapper): v1 header for U32, v2 header for U64. The header carries the IdSpace; the file is self-describing.
.cidx column-store segments: flags byte's FLAG_U64_IDS bit (0x01) is set; the entity-id column is u64 little-endian. The cross-language fixture in src/native/cidxU64WireFormatFixture.test.ts locks the byte layout via SHA-256.
Metadata chunk JSON envelope: U64 chunks get a v2 envelope with "version": 2 and "idSpace": "u64" fields. U32 chunks emit the v1-compatible envelope (no extra fields) for backwards compatibility.
PostingList (in-memory representation): the U32 brain uses croaring::Bitmap (Roaring32); the U64 brain uses croaring::Treemap (Roaring64). Both expose the same algebra (union_assign, intersect_assign, difference_assign, iter_u64, cardinality), so every query path is variant-agnostic.
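To make "u64 little-endian" concrete, here is a minimal sketch that writes a flags byte followed by an id column using `DataView`. Only the `FLAG_U64_IDS` name and its `0x01` value come from the text above; the one-byte-header layout is invented for illustration and is not the real .cidx format.

```typescript
const FLAG_U64_IDS = 0x01; // bit named in the text; the rest of this layout is illustrative

// Encode ids as [flags byte][u64 LE, u64 LE, ...]. NOT the real .cidx layout.
function encodeIdColumn(ids: bigint[]): Uint8Array {
  const buf = new ArrayBuffer(1 + ids.length * 8);
  const view = new DataView(buf);
  view.setUint8(0, FLAG_U64_IDS);
  ids.forEach((id, i) => view.setBigUint64(1 + i * 8, id, true)); // true = little-endian
  return new Uint8Array(buf);
}

// Decode the same illustrative layout, refusing segments without the u64 flag.
function decodeIdColumn(bytes: Uint8Array): bigint[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if ((view.getUint8(0) & FLAG_U64_IDS) === 0) {
    throw new Error('segment is not flagged as u64');
  }
  const out: bigint[] = [];
  for (let off = 1; off + 8 <= bytes.byteLength; off += 8) {
    out.push(view.getBigUint64(off, true));
  }
  return out;
}

// 4294967296 = 2^32 is the first value a u32 column cannot hold.
const encoded = encodeIdColumn([1n, 4294967296n]);
console.log(encoded.length); // 17 (1 flags byte + 2 ids * 8 bytes)
console.log(decodeIdColumn(encoded)); // [ 1n, 4294967296n ]
```

In little-endian order the low byte comes first, so the id 2^32 is stored as four zero bytes, then `0x01`, then three zero bytes.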

What changes in the API

Nothing, for the common case. Every wrapper method that returned a number continues to return a number. Two refinements:

getAllIntIds() throws in U64 mode. A materialised number[] at billion scale would risk OOM, and U64 brains exist precisely because the entity count is large. Use the streaming BigInt iterator on the underlying native binding.
EntityIdSpaceExceeded is thrown when a U64 brain's number-typed method (e.g. getOrAssign(uuid): number) would return an int above Number.MAX_SAFE_INTEGER (2⁵³ - 1 ≈ 9.007 × 10¹⁵ entities). At that point you must switch to the BigInt sibling methods for the full u64 range:
```
const big: bigint = mapper.getOrAssignBig(uuid)
const uuidBack: string | undefined = mapper.getUuidBig(big)
```
The BigInt siblings work in both modes. Use them in any code path that crosses the U32 → U64 escalation point so it doesn't need to branch on getIdSpace().
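The safe-integer guard exists because a JS number cannot represent every integer above 2⁵³ - 1. The sketch below shows the kind of check a number-typed wrapper needs. `toSafeNumber` and `EntityIdSpaceExceededSketch` are hypothetical stand-ins written for this example; the real error class ships with cortex and may carry different fields.

```typescript
// Hypothetical stand-in for the error described above.
class EntityIdSpaceExceededSketch extends Error {
  constructor(id: bigint) {
    super(`entity id ${id} exceeds Number.MAX_SAFE_INTEGER; use the BigInt sibling methods`);
    this.name = 'EntityIdSpaceExceededSketch';
  }
}

const MAX_SAFE = BigInt(Number.MAX_SAFE_INTEGER); // 9007199254740991n

// Narrow a u64 id to a JS number, refusing values that would lose precision.
function toSafeNumber(id: bigint): number {
  if (id < 0n || id > MAX_SAFE) throw new EntityIdSpaceExceededSketch(id);
  return Number(id);
}

console.log(toSafeNumber(42n)); // 42

// Why the guard matters: above 2^53 distinct ids collapse onto the same double.
console.log(Number(9007199254740993n) === 9007199254740992); // true
```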

BigInt sibling surface (cortex 3.0)

| Number-typed | BigInt sibling | Returns |
| --- | --- | --- |
| `getOrAssign(uuid: string): number` | `getOrAssignBig(uuid: string): bigint` | The allocated int |
| `getInt(uuid: string): number \| undefined` | `getIntBig(uuid: string): bigint \| undefined` | The int for `uuid`, or undefined |
| `getUuid(int: number): string \| undefined` | `getUuidBig(int: bigint): string \| undefined` | The UUID for `int` |
| `get size(): number` | `sizeBig(): bigint` | Live (non-tombstone) entry count |
| (none) | `nextIntBig(): bigint` | Largest int ever assigned + 1 |

The BigInt methods are mode-independent: they work in U32 brains too, and return the same values widened to bigint. Mixing getOrAssign(uuid) and getOrAssignBig(uuid) for the same UUID returns the same underlying int.
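The equivalence between the two method families can be seen with an in-memory stand-in. `ToyMapper` below borrows the method names from the table but is not cortex's implementation: starting ids at 0, skipping tombstones, and omitting the safe-integer guard are all assumptions made to keep the sketch short.

```typescript
// In-memory stand-in using the method names from the table. Illustrative only.
class ToyMapper {
  private byUuid = new Map<string, bigint>();
  private byInt = new Map<bigint, string>();
  private next = 0n; // assumption: ids start at 0

  getOrAssignBig(uuid: string): bigint {
    let id = this.byUuid.get(uuid);
    if (id === undefined) {
      id = this.next++;
      this.byUuid.set(uuid, id);
      this.byInt.set(id, uuid);
    }
    return id;
  }
  // Toy ids stay tiny, so the safe-integer guard is omitted here.
  getOrAssign(uuid: string): number { return Number(this.getOrAssignBig(uuid)); }
  getIntBig(uuid: string): bigint | undefined { return this.byUuid.get(uuid); }
  getUuidBig(int: bigint): string | undefined { return this.byInt.get(int); }
  sizeBig(): bigint { return BigInt(this.byUuid.size); }
  nextIntBig(): bigint { return this.next; }
}

const toy = new ToyMapper();
const asNumber = toy.getOrAssign('3f2b8c1e-0000-4000-8000-000000000001');
const asBigInt = toy.getOrAssignBig('3f2b8c1e-0000-4000-8000-000000000001');
toy.getOrAssignBig('3f2b8c1e-0000-4000-8000-000000000002');

console.log(BigInt(asNumber) === asBigInt); // true: same underlying int
console.log(toy.sizeBig(), toy.nextIntBig()); // 2n 2n
```

Code that only ever calls the Big variants needs no branch on the id space, which is the point of the mode-independence guarantee.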

Brainy 8.0 lockstep — `EntityIdSpaceExceeded` (JS fallback)

Brainy 8.0 ships the JS-fallback half of the contract. Without cortex installed, brainy uses an in-RAM EntityIdMapper that also caps at u32::MAX — but the JS fallback can't widen to u64 (the bitmap layer is Roaring32 end-to-end on the JS side). So brainy 8.0's mapper throws EntityIdSpaceExceeded (in @soulcraft/brainy/internals) at the ceiling and points the caller at cortex's idSpace: 'u64' mode as the migration path.

The two errors are siblings at different layers:

brainy's EntityIdSpaceExceeded (U32_ENTITY_ID_MAX = 2³² − 1): the JS-only path hit the bitmap-width ceiling — install cortex.
cortex's EntityIdSpaceExceeded (Number.MAX_SAFE_INTEGER = 2⁵³ − 1): a U64 brain's number-typed method overflowed JS's safe-integer range — switch to BigInt siblings.

Wire-format parity gates

Two hash-locked fixtures lock the format across reader implementations:

.cidx U64 segments: src/native/cidxU64WireFormatFixture.test.ts. Three SHA-256 hashes (numeric / float / string segments) lock the byte layout. Brainy 8.0's reader targets the same fixtures.
PostingList cross-language: round-tripped via the napi parityWriteU64NumericSegment / parityReadSegmentEntityIdsBig pair. The same write path produces the same hash across 100 stress iterations (see src/native/cortex30Stress.test.ts).

If a hash drifts, do not silently update the constant; root-cause the byte diff first. A non-deterministic writer or a half-applied format change is exactly the kind of bug those fixtures exist to catch.
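A hash-locked fixture boils down to "hash the writer's output and compare against a pinned value". The sketch below shows that pattern with Node's built-in `crypto` module. `sha256Hex` and `assertStableHash` are hypothetical helpers, and the writer passed in is a placeholder rather than the real napi parity functions.

```typescript
import { createHash } from 'node:crypto';

function sha256Hex(bytes: Uint8Array): string {
  return createHash('sha256').update(bytes).digest('hex');
}

// Run the writer repeatedly and fail on the first iteration whose hash differs
// from the first run. In a real fixture the reference would be a hard-coded constant.
function assertStableHash(write: () => Uint8Array, iterations: number): string {
  const locked = sha256Hex(write());
  for (let i = 1; i < iterations; i++) {
    const got = sha256Hex(write());
    if (got !== locked) {
      throw new Error(`hash drifted on iteration ${i}: ${got} != ${locked}`);
    }
  }
  return locked;
}

// Placeholder deterministic writer: a flags byte plus one u64 LE id (value 1).
const stableWriter = () => Uint8Array.from([0x01, 1, 0, 0, 0, 0, 0, 0, 0]);
const lockedHash = assertStableHash(stableWriter, 100);
console.log(lockedHash.length); // 64 hex characters
```

A writer that embeds a timestamp, a counter, or unordered map iteration fails this loop, which is the class of bug the fixtures are meant to surface.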