Entity Extraction

AI-powered extraction of entities from text with automatic NounType classification and confidence scoring.

Overview

Brainy combines four signals in a weighted ensemble to extract and classify entities (each signal's weight is in parentheses):

Exact Match (40%) - Dictionary lookups for known entities
Embedding (35%) - Semantic similarity with type embeddings
Pattern (20%) - Regex patterns for dates, emails, etc.
Context (5%) - Contextual hints from surrounding text
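If the percentages are read as weights, the combination step can be sketched as a weighted sum. This is a minimal illustration that assumes each signal reports a score between 0 and 1 and that a signal which does not fire contributes 0. Brainy's actual scoring may normalize or combine signals differently, and `SIGNAL_WEIGHTS` and `ensembleConfidence` are hypothetical names.

```javascript
// Hypothetical weights mirroring the list above. They sum to 1, so a
// weighted sum of scores in [0, 1] also stays in [0, 1].
const SIGNAL_WEIGHTS = {
  exactMatch: 0.40,
  embedding: 0.35,
  pattern: 0.20,
  context: 0.05
}

// Assumption: each signal reports a score in [0, 1]; a missing signal counts as 0.
function ensembleConfidence(scores) {
  let total = 0
  for (const [signal, weight] of Object.entries(SIGNAL_WEIGHTS)) {
    total += weight * (scores[signal] ?? 0)
  }
  return total
}

// Strong dictionary hit and embedding match, no pattern match, weak context
const score = ensembleConfidence({ exactMatch: 1, embedding: 0.9, context: 0.6 })
console.log(score.toFixed(3)) // 0.745
```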

extract()

Extract entities from text with NounType classification.

extract.js

import { Brainy, NounType } from '@soulcraft/brainy'

const brain = new Brainy()
await brain.init()

const text = 'John Smith founded Acme Corp in New York in 2020.'
const entities = await brain.extract(text)

entities.forEach(e => {
  console.log(`${e.text}: ${e.type} (${(e.confidence * 100).toFixed(0)}%)`)
})

// Example output:
// John Smith: Person (95%)
// Acme Corp: Organization (92%)
// New York: Location (88%)
// 2020: TimeInterval (85%)
        

Signature

brain.extract(text: string, options?: ExtractOptions): Promise<ExtractedEntity[]>
        

Options

Option	Type	Default	Description
`types`	`NounType[]`	all	Only extract these types
`confidence`	`number`	0.5	Minimum confidence threshold (0-1)
`includeVectors`	`boolean`	false	Include embedding vectors
`neuralMatching`	`boolean`	true	Use neural type matching
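To make the `types` and `confidence` defaults concrete, here is a sketch that applies them as post-filters over already-scored candidates. It assumes the threshold is inclusive (a score equal to `confidence` is kept) and that omitting `types` allows every type. `applyExtractOptions` is a hypothetical helper, not Brainy's internals, and plain strings stand in for `NounType` values.

```javascript
// Hypothetical post-filter mirroring the Options table defaults.
// Assumptions: inclusive threshold; `types` left undefined means "all types".
function applyExtractOptions(candidates, { types, confidence = 0.5 } = {}) {
  const allowed = types ? new Set(types) : null
  return candidates.filter(c =>
    c.confidence >= confidence && (allowed === null || allowed.has(c.type))
  )
}

// Illustrative candidates; plain strings stand in for NounType values
const candidates = [
  { text: 'John Smith', type: 'Person', confidence: 0.95 },
  { text: 'Acme Corp', type: 'Organization', confidence: 0.92 },
  { text: 'yesterday', type: 'TimeInterval', confidence: 0.4 }
]

console.log(applyExtractOptions(candidates).map(c => c.text))
// [ 'John Smith', 'Acme Corp' ]
console.log(applyExtractOptions(candidates, { types: ['Person'] }).map(c => c.text))
// [ 'John Smith' ]
```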

Filter by Type

filter-types.js

// Only extract people and organizations
const entities = await brain.extract(text, {
  types: [NounType.Person, NounType.Organization]
})

// Only high-confidence extractions
const confident = await brain.extract(text, {
  confidence: 0.8  // 80% minimum
})
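The two calls above run extraction twice on the same text. If `confidence` acts as a simple score filter (an assumption; this page does not say how it is applied), you could extract once at the default threshold and split the results yourself. `partitionByConfidence` is a hypothetical helper.

```javascript
// Hypothetical helper: split one extraction result into confidence tiers,
// so extraction only has to run once.
// Assumption: each entity carries a `confidence` score in [0, 1].
function partitionByConfidence(entities, highThreshold = 0.8) {
  const high = []
  const rest = []
  for (const e of entities) {
    if (e.confidence >= highThreshold) high.push(e)
    else rest.push(e)
  }
  return { high, rest }
}

// Illustrative data shaped like extract() results
const sample = [
  { text: 'John Smith', type: 'Person', confidence: 0.95 },
  { text: 'New York', type: 'Location', confidence: 0.72 }
]

const { high, rest } = partitionByConfidence(sample)
console.log(high.map(e => e.text)) // [ 'John Smith' ]
console.log(rest.map(e => e.text)) // [ 'New York' ]
```

With real data, `sample` would be the array returned by `await brain.extract(text)`.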
        

extractEntities()

Alias for extract() with identical functionality.

// Same as brain.extract()
const entities = await brain.extractEntities('John founded Acme Corp', {
  confidence: 0.7,
  types: [NounType.Person, NounType.Organization]
})
        

extractConcepts()

A simplified interface for concept and topic extraction. Returns an array of concept names (strings) rather than entity objects.

extract-concepts.js

const text = 'Using OAuth for authentication with JWT tokens'
const concepts = await brain.extractConcepts(text)

console.log(concepts)
// ['oauth', 'authentication', 'jwt', 'tokens']

// With options
const topConcepts = await brain.extractConcepts(text, {
  confidence: 0.8,
  limit: 3
})
// ['oauth', 'authentication', 'jwt']
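One plausible way to turn scored candidates into a name list like the one above is sketched below. It assumes `limit` keeps the highest-confidence concepts, that names are lowercased and de-duplicated as the example output suggests, and that the default threshold of 0.5 is borrowed from `extract()`. `toConceptNames` is a hypothetical helper, not Brainy's implementation.

```javascript
// Hypothetical helper: reduce scored candidates to a ranked, de-duplicated
// list of lowercase concept names.
// Assumptions: inclusive threshold (default 0.5 borrowed from extract()),
// and `limit` keeps the highest-confidence names.
function toConceptNames(candidates, { confidence = 0.5, limit = Infinity } = {}) {
  const seen = new Set()
  return candidates
    .filter(c => c.confidence >= confidence)
    .sort((a, b) => b.confidence - a.confidence)
    .map(c => c.text.toLowerCase())
    .filter(name => {
      // Keep only the first (highest-scoring) occurrence of each name
      if (seen.has(name)) return false
      seen.add(name)
      return true
    })
    .slice(0, limit)
}

// Illustrative scores, not real Brainy output
const conceptCandidates = [
  { text: 'OAuth', confidence: 0.93 },
  { text: 'authentication', confidence: 0.88 },
  { text: 'JWT', confidence: 0.85 },
  { text: 'oauth', confidence: 0.7 },
  { text: 'tokens', confidence: 0.62 }
]

console.log(toConceptNames(conceptCandidates))
// [ 'oauth', 'authentication', 'jwt', 'tokens' ]
console.log(toConceptNames(conceptCandidates, { confidence: 0.8, limit: 3 }))
// [ 'oauth', 'authentication', 'jwt' ]
```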
        

Return Value

Each extracted entity contains:

{
  text: 'John Smith',        // Extracted text
  type: NounType.Person,      // Classified NounType
  confidence: 0.95,           // Classification confidence (0-1)
  start: 0,                   // Start position in text
  end: 10,                    // End position in text
  vector?: Float32Array       // Embedding (if includeVectors: true)
}
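In the sample above, `'John Smith'` is 10 characters long and spans `start: 0` to `end: 10`, which suggests `end` is exclusive, so `text.slice(start, end)` returns the entity text. Assuming that holds and that spans never overlap, a hypothetical `highlightEntities` helper can mark entities inline. It edits the string from the last entity to the first, so inserted markup never shifts offsets that have not been used yet.

```javascript
// Hypothetical helper: wrap each entity span as "[text|Type]".
// Assumptions: `end` is exclusive and entity spans do not overlap.
function highlightEntities(text, entities) {
  // Right-to-left order keeps the offsets of unprocessed entities valid
  const ordered = [...entities].sort((a, b) => b.start - a.start)
  let result = text
  for (const e of ordered) {
    const span = result.slice(e.start, e.end)
    result = result.slice(0, e.start) + `[${span}|${e.type}]` + result.slice(e.end)
  }
  return result
}

const sentence = 'John Smith founded Acme Corp in New York in 2020.'
const spans = [
  { text: 'John Smith', type: 'Person', confidence: 0.95, start: 0, end: 10 },
  { text: 'Acme Corp', type: 'Organization', confidence: 0.92, start: 19, end: 28 }
]

console.log(highlightEntities(sentence, spans))
// [John Smith|Person] founded [Acme Corp|Organization] in New York in 2020.
```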
        

Use Cases

Auto-Tag Content

auto-tag.js

async function addWithAutoTags(content) {
  // Extract concepts for tagging
  const concepts = await brain.extractConcepts(content)

  // Extract entities for relationships
  const entities = await brain.extract(content)

  // Add content with extracted metadata
  return brain.add({
    data: content,
    type: NounType.Document,
    metadata: {
      tags: concepts,
      mentions: entities.map(e => e.text),
      entityTypes: [...new Set(entities.map(e => e.type))]
    }
  })
}
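`entities.map(e => e.text)` keeps every occurrence, so an entity mentioned several times is listed several times in `mentions`. If each name should appear once, a hypothetical `uniqueMentions` helper can de-duplicate case-insensitively while keeping the first spelling seen.

```javascript
// Hypothetical helper: de-duplicate mention strings case-insensitively,
// keeping the first spelling encountered (Map preserves insertion order).
function uniqueMentions(entities) {
  const byKey = new Map()
  for (const e of entities) {
    const key = e.text.trim().toLowerCase()
    if (!byKey.has(key)) byKey.set(key, e.text)
  }
  return [...byKey.values()]
}

// Illustrative extraction results
const extracted = [
  { text: 'Acme Corp', type: 'Organization' },
  { text: 'John Smith', type: 'Person' },
  { text: 'ACME Corp', type: 'Organization' }
]

console.log(uniqueMentions(extracted)) // [ 'Acme Corp', 'John Smith' ]
```

Inside `addWithAutoTags`, `mentions: uniqueMentions(entities)` would take the place of the plain `map`.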
        

Build Knowledge Graph from Text

build-graph.js

import { VerbType } from '@soulcraft/brainy'

// Assumes `brain` is an initialized Brainy instance
async function buildGraphFromText(text) {
  const entities = await brain.extract(text, { confidence: 0.7 })

  // Add each entity to the graph
  const entityIds = await Promise.all(
    entities.map(e =>
      brain.add({
        data: e.text,
        type: e.type,
        metadata: { extractedFrom: text, confidence: e.confidence }
      })
    )
  )

  // Create relationships based on co-occurrence
  for (let i = 0; i < entityIds.length; i++) {
    for (let j = i + 1; j < entityIds.length; j++) {
      await brain.relate({
        from: entityIds[i],
        to: entityIds[j],
        type: VerbType.RelatedTo,
        metadata: { coOccurrence: true }
      })
    }
  }

  return entityIds
}
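The nested loops create one relationship per unordered pair of entities, so n entities produce n(n - 1) / 2 `relate()` calls; 10 entities already mean 45 relationships. Pulling the pair generation into a pure function (the hypothetical `coOccurrencePairs`) makes that growth easy to check before anything is written to the graph.

```javascript
// Hypothetical helper: enumerate each unordered pair once,
// exactly as the nested loops in buildGraphFromText do.
function coOccurrencePairs(ids) {
  const pairs = []
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      pairs.push([ids[i], ids[j]])
    }
  }
  return pairs
}

const pairs = coOccurrencePairs(['a', 'b', 'c', 'd'])
console.log(pairs.length) // 6, i.e. 4 * 3 / 2
console.log(pairs[0])     // [ 'a', 'b' ]
```

If the count grows too large, one option is to pair only entities that occur in the same sentence rather than across the whole text.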
        

Named Entity Recognition Pipeline

ner-pipeline.js

async function processDocument(doc) {
  // Extract all entities
  const entities = await brain.extract(doc.content)

  // Group by type
  const grouped = {
    people: entities.filter(e => e.type === NounType.Person),
    orgs: entities.filter(e => e.type === NounType.Organization),
    locations: entities.filter(e => e.type === NounType.Location),
    concepts: entities.filter(e => e.type === NounType.Concept)
  }

  return {
    documentId: doc.id,
    entities: grouped,
    summary: {
      totalEntities: entities.length,
      // Guard against dividing by zero when no entities were found
      avgConfidence: entities.length
        ? entities.reduce((a, e) => a + e.confidence, 0) / entities.length
        : 0
    }
  }
}
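The fixed buckets above drop any type that is not listed: in the extract() example, `2020` is classified as `TimeInterval`, so it would be counted in `totalEntities` but missing from `grouped`. A sketch that groups by whatever types actually occur is shown below; `summarizeByType` is a hypothetical helper, and plain strings stand in for `NounType` values.

```javascript
// Hypothetical helper: group entities by every type that actually occurs,
// so types outside a fixed list (e.g. TimeInterval) are not dropped.
function summarizeByType(entities) {
  const groups = {}
  for (const e of entities) {
    if (!groups[e.type]) groups[e.type] = []
    groups[e.type].push(e)
  }
  // Number of entities per type
  const counts = Object.fromEntries(
    Object.entries(groups).map(([type, list]) => [type, list.length])
  )
  return { groups, counts }
}

// Illustrative data shaped like extract() results
const found = [
  { text: 'John Smith', type: 'Person', confidence: 0.95 },
  { text: 'Acme Corp', type: 'Organization', confidence: 0.92 },
  { text: '2020', type: 'TimeInterval', confidence: 0.85 }
]

console.log(summarizeByType(found).counts)
// { Person: 1, Organization: 1, TimeInterval: 1 }
```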