Entity Extraction

AI-powered extraction of entities from text with automatic NounType classification and confidence scoring.

Overview

Brainy uses a four-signal ensemble architecture to extract entities.

extract()

Extract entities from text with NounType classification.

extract.js
import { Brainy, NounType } from '@soulcraft/brainy'

const brain = new Brainy()
await brain.init()

const text = 'John Smith founded Acme Corp in New York in 2020.'
const entities = await brain.extract(text)

entities.forEach(e => {
  console.log(`${e.text}: ${e.type} (${(e.confidence * 100).toFixed(0)}%)`)
})

// Output:
// John Smith: Person (95%)
// Acme Corp: Organization (92%)
// New York: Location (88%)
// 2020: TimeInterval (85%)

Signature

brain.extract(text: string, options?: ExtractOptions): Promise<ExtractedEntity[]>

Options

Option          Type        Default  Description
types           NounType[]  all      Only extract these types
confidence      number      0.5      Minimum confidence threshold (0-1)
includeVectors  boolean     false    Include embedding vectors
neuralMatching  boolean     true     Use neural type matching
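
With includeVectors enabled, each entity carries its embedding in the vector field, which makes it possible to compare entities numerically. The following is a minimal sketch of that idea, not Brainy's own method: cosineSimilarity is a hypothetical helper, and the short literal vectors stand in for real embeddings so the snippet runs without a Brainy instance.

```javascript
// Hypothetical helper (not part of Brainy): cosine similarity of two embeddings.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // Guard against zero vectors to avoid dividing by zero
  if (normA === 0 || normB === 0) return 0
  return dot / Math.sqrt(normA * normB)
}

// Stand-in data shaped like results of
// brain.extract(text, { includeVectors: true }); the vectors are made up.
const sampleEntities = [
  { text: 'Acme Corp', vector: new Float32Array([1, 0, 0]) },
  { text: 'Acme Corporation', vector: new Float32Array([1, 0, 0]) },
  { text: 'New York', vector: new Float32Array([0, 1, 0]) }
]

const similarity = cosineSimilarity(sampleEntities[0].vector, sampleEntities[1].vector)
console.log(similarity)
```

A score near 1 suggests two mentions refer to similar things, while a score near 0 suggests they are unrelated; any cutoff for treating two entities as the same is a choice left to the application.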

Filter by Type

filter-types.js
// Only extract people and organizations
const entities = await brain.extract(text, {
  types: [NounType.Person, NounType.Organization]
})

// Only high-confidence extractions
const confident = await brain.extract(text, {
  confidence: 0.8  // 80% minimum
})

extractEntities()

Alias for extract() with identical functionality.

// Same as brain.extract()
const entities = await brain.extractEntities('John founded Acme Corp', {
  confidence: 0.7,
  types: [NounType.Person, NounType.Organization]
})

extractConcepts()

Simplified interface for concept/topic extraction. Returns concept names as plain strings rather than entity objects.

extract-concepts.js
const text = 'Using OAuth for authentication with JWT tokens'
const concepts = await brain.extractConcepts(text)

console.log(concepts)
// ['oauth', 'authentication', 'jwt', 'tokens']

// With options
const topConcepts = await brain.extractConcepts(text, {
  confidence: 0.8,
  limit: 3
})
// ['oauth', 'authentication', 'jwt']

Return Value

Each extracted entity contains:

{
  text: 'John Smith',        // Extracted text
  type: NounType.Person,      // Classified NounType
  confidence: 0.95,           // Classification confidence (0-1)
  start: 0,                   // Start position in text
  end: 10,                    // End position in text
  vector?: Float32Array       // Embedding (if includeVectors: true)
}
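
The start and end offsets make it possible to map each entity back onto the source text. The sketch below marks up a string using those offsets. It assumes start is inclusive and end is exclusive, which is consistent with 'John Smith' spanning 0 to 10 above but is not stated by the source. annotate is a hypothetical helper, and the type values are plain strings standing in for NounType members.

```javascript
// Hypothetical helper: wrap each entity span in [text|Type] markers.
// Assumes `start` is inclusive and `end` is exclusive.
// Spans that overlap an earlier span are skipped.
function annotate(text, entities) {
  // Sort a copy by start offset so the input array is left untouched
  const sorted = [...entities].sort((a, b) => a.start - b.start)
  let result = ''
  let cursor = 0
  for (const e of sorted) {
    if (e.start < cursor) continue
    result += text.slice(cursor, e.start)
    result += `[${text.slice(e.start, e.end)}|${e.type}]`
    cursor = e.end
  }
  // Append whatever follows the last entity
  return result + text.slice(cursor)
}

// Stand-in data shaped like the entity objects described above
const sampleText = 'John Smith founded Acme Corp.'
const sampleSpans = [
  { text: 'Acme Corp', type: 'Organization', start: 19, end: 28 },
  { text: 'John Smith', type: 'Person', start: 0, end: 10 }
]

console.log(annotate(sampleText, sampleSpans))
// [John Smith|Person] founded [Acme Corp|Organization].
```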

Use Cases

Auto-Tag Content

auto-tag.js
async function addWithAutoTags(content) {
  // Extract concepts for tagging
  const concepts = await brain.extractConcepts(content)

  // Extract entities for relationships
  const entities = await brain.extract(content)

  // Add content with extracted metadata
  return brain.add({
    data: content,
    type: NounType.Document,
    metadata: {
      tags: concepts,
      mentions: entities.map(e => e.text),
      entityTypes: [...new Set(entities.map(e => e.type))]
    }
  })
}

Build Knowledge Graph from Text

build-graph.js
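// VerbType is used for the relationship type below, so it must be imported too
import { VerbType } from '@soulcraft/brainy'

async function buildGraphFromText(text) {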
  const entities = await brain.extract(text, { confidence: 0.7 })

  // Add each entity to the graph
  const entityIds = await Promise.all(
    entities.map(e =>
      brain.add({
        data: e.text,
        type: e.type,
        metadata: { extractedFrom: text, confidence: e.confidence }
      })
    )
  )

  // Create relationships based on co-occurrence
  for (let i = 0; i < entityIds.length; i++) {
    for (let j = i + 1; j < entityIds.length; j++) {
      await brain.relate({
        from: entityIds[i],
        to: entityIds[j],
        type: VerbType.RelatedTo,
        metadata: { coOccurrence: true }
      })
    }
  }

  return entityIds
}
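
If extract() returns one result per mention (an assumption; the source does not say whether repeated mentions are merged), a text that names the same entity twice would produce two graph nodes in the function above. A minimal sketch of collapsing repeated mentions before adding them follows. dedupeEntities is a hypothetical helper, and matching on lowercased text is only one possible notion of "same entity".

```javascript
// Hypothetical helper: keep one entity per (type, normalized text) pair,
// preferring the highest-confidence mention.
function dedupeEntities(entities) {
  const best = new Map()
  for (const e of entities) {
    const key = `${e.type}:${e.text.trim().toLowerCase()}`
    const current = best.get(key)
    if (!current || e.confidence > current.confidence) {
      best.set(key, e)
    }
  }
  return [...best.values()]
}

// Stand-in data shaped like brain.extract() results
const mentions = [
  { text: 'Acme Corp', type: 'Organization', confidence: 0.8 },
  { text: 'acme corp', type: 'Organization', confidence: 0.9 },
  { text: 'John Smith', type: 'Person', confidence: 0.95 }
]

const unique = dedupeEntities(mentions)
console.log(unique.map(e => e.text))
// [ 'acme corp', 'John Smith' ]
```

Passing the deduplicated list to brain.add() keeps the co-occurrence loop from linking an entity to a copy of itself.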

Named Entity Recognition Pipeline

ner-pipeline.js
async function processDocument(doc) {
  // Extract all entities
  const entities = await brain.extract(doc.content)

  // Group by type
  const grouped = {
    people: entities.filter(e => e.type === NounType.Person),
    orgs: entities.filter(e => e.type === NounType.Organization),
    locations: entities.filter(e => e.type === NounType.Location),
    concepts: entities.filter(e => e.type === NounType.Concept)
  }

  return {
    documentId: doc.id,
    entities: grouped,
    summary: {
      totalEntities: entities.length,
      // Guard against division by zero (NaN) when nothing was extracted
      avgConfidence: entities.length
        ? entities.reduce((a, e) => a + e.confidence, 0) / entities.length
        : 0
    }
  }
}
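
The grouped object above names four types by hand, so entities of any other type are left out of the result. A generic grouping that covers every type present can be sketched as follows. groupByType is a hypothetical helper, and the string type values stand in for NounType members.

```javascript
// Hypothetical helper: bucket entities by their `type` field,
// whatever types happen to be present.
function groupByType(entities) {
  const groups = {}
  for (const e of entities) {
    if (!groups[e.type]) groups[e.type] = []
    groups[e.type].push(e)
  }
  return groups
}

// Stand-in data shaped like brain.extract() results
const extracted = [
  { text: 'John Smith', type: 'Person', confidence: 0.95 },
  { text: 'Acme Corp', type: 'Organization', confidence: 0.92 },
  { text: 'Jane Doe', type: 'Person', confidence: 0.9 }
]

const byType = groupByType(extracted)
console.log(Object.keys(byType))
// [ 'Person', 'Organization' ]
```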
