Entity Extraction
AI-powered extraction of entities from text with automatic NounType classification and confidence scoring.
Overview
Brainy uses a 4-signal ensemble architecture to extract entities:
- Exact Match (40%) - Dictionary lookups for known entities
- Embedding (35%) - Semantic similarity with type embeddings
- Pattern (20%) - Regex patterns for dates, emails, etc.
- Context (5%) - Contextual hints from surrounding text
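The list gives each signal's share but does not say how the scores are combined. The sketch below shows one plausible reading. It assumes each signal reports a score between 0 and 1, and that the final confidence is a weighted average over only the signals that fired. Under a plain weighted sum, a pattern-only match such as a year could never exceed 20%. `SIGNAL_WEIGHTS` and `combineSignals` are hypothetical names, not part of Brainy's API.

```javascript
// Hypothetical weights taken from the percentages listed above (they sum to 1.0).
const SIGNAL_WEIGHTS = {
  exactMatch: 0.40,
  embedding: 0.35,
  pattern: 0.20,
  context: 0.05
}

// Assumed combination rule: weighted average over the signals that produced a score.
// Signals missing from `scores` are treated as "did not fire" and skipped.
function combineSignals(scores) {
  let weighted = 0
  let weightSum = 0
  for (const [signal, weight] of Object.entries(SIGNAL_WEIGHTS)) {
    const score = scores[signal]
    if (score === undefined) continue
    weighted += weight * Math.min(Math.max(score, 0), 1) // clamp each score to [0, 1]
    weightSum += weight
  }
  return weightSum === 0 ? 0 : weighted / weightSum
}

// Example: a year like "2020" matched by a regex pattern, with weak context support.
const yearScore = combineSignals({ pattern: 0.9, context: 0.5 })
console.log(yearScore.toFixed(2)) // 0.82
```

Normalizing by the weights that actually fired keeps the listed percentages as relative importance. An entity recognized by a single strong signal can still clear a threshold.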
extract()
Extract entities from text with NounType classification.
extract.js
```javascript
import { Brainy, NounType } from '@soulcraft/brainy'

const brain = new Brainy()
await brain.init()

const text = 'John Smith founded Acme Corp in New York in 2020.'
const entities = await brain.extract(text)

entities.forEach(e => {
  console.log(`${e.text}: ${e.type} (${(e.confidence * 100).toFixed(0)}%)`)
})
// Output:
// John Smith: Person (95%)
// Acme Corp: Organization (92%)
// New York: Location (88%)
// 2020: TimeInterval (85%)
```
Signature
```typescript
brain.extract(text: string, options?: ExtractOptions): Promise<ExtractedEntity[]>
```
Options
| Option | Type | Default | Description |
|---|---|---|---|
| types | NounType[] | all | Only extract these types |
| confidence | number | 0.5 | Minimum confidence threshold (0-1) |
| includeVectors | boolean | false | Include embedding vectors |
| neuralMatching | boolean | true | Use neural type matching |
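The `types` and `confidence` semantics can be made concrete with a sketch that applies them as a post-filter over entities that are already classified. The helper `applyExtractOptions` is hypothetical, and the plain type strings stand in for NounType values. The defaults mirror the table (all types, 0.5 threshold). Treating the threshold as inclusive is an assumption.

```javascript
// Hypothetical post-filter mirroring the `types` and `confidence` options.
// `types` left undefined means "all types"; the threshold defaults to 0.5.
function applyExtractOptions(entities, { types, confidence = 0.5 } = {}) {
  const allowed = types ? new Set(types) : null
  return entities.filter(e =>
    e.confidence >= confidence && // assumed inclusive threshold
    (allowed === null || allowed.has(e.type))
  )
}

// Illustrative data shaped like ExtractedEntity (strings stand in for NounType values).
const sample = [
  { text: 'John Smith', type: 'Person', confidence: 0.95 },
  { text: 'Acme Corp', type: 'Organization', confidence: 0.92 },
  { text: 'New York', type: 'Location', confidence: 0.88 },
  { text: 'Tuesday', type: 'TimeInterval', confidence: 0.4 }
]

// Defaults: every type, confidence >= 0.5
console.log(applyExtractOptions(sample).map(e => e.text))
// ['John Smith', 'Acme Corp', 'New York']

// Both options combined: people and organizations at 93% or above
console.log(applyExtractOptions(sample, {
  types: ['Person', 'Organization'],
  confidence: 0.93
}).map(e => e.text))
// ['John Smith']
```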
Filter by Type
filter-types.js
```javascript
// Only extract people and organizations
const entities = await brain.extract(text, {
  types: [NounType.Person, NounType.Organization]
})

// Only high-confidence extractions
const confident = await brain.extract(text, {
  confidence: 0.8 // 80% minimum
})
```
extractEntities()
Alias for extract() with identical functionality.
```javascript
// Same as brain.extract()
const entities = await brain.extractEntities('John founded Acme Corp', {
  confidence: 0.7,
  types: [NounType.Person, NounType.Organization]
})
```
extractConcepts()
Simplified interface for concept/topic extraction. Returns concept names as plain strings rather than entity objects.
extract-concepts.js
```javascript
const text = 'Using OAuth for authentication with JWT tokens'
const concepts = await brain.extractConcepts(text)
console.log(concepts)
// ['oauth', 'authentication', 'jwt', 'tokens']

// With options
const topConcepts = await brain.extractConcepts(text, {
  confidence: 0.8,
  limit: 3
})
// ['oauth', 'authentication', 'jwt']
```
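One way to read the `confidence` and `limit` options is as three steps: drop low-confidence concepts, keep the highest-scoring ones, and return only their names. The sketch below applies that reading to hypothetical scored candidates. `topConceptNames`, its defaults (no threshold, no limit), the scores, and the lowercasing step are assumptions, not Brainy internals.

```javascript
// Hypothetical: turn scored concept candidates into the string-only result
// that extractConcepts() returns, honoring `confidence` and `limit`.
function topConceptNames(candidates, { confidence = 0, limit = Infinity } = {}) {
  return candidates
    .filter(c => c.confidence >= confidence)     // apply the threshold
    .sort((a, b) => b.confidence - a.confidence) // highest confidence first (sorts the filtered copy)
    .slice(0, limit)                             // keep at most `limit` concepts
    .map(c => c.name.toLowerCase())              // names only, lowercased
}

// Illustrative scores for the sentence used above
const candidates = [
  { name: 'OAuth', confidence: 0.93 },
  { name: 'authentication', confidence: 0.9 },
  { name: 'JWT', confidence: 0.88 },
  { name: 'tokens', confidence: 0.81 }
]

console.log(topConceptNames(candidates, { confidence: 0.8, limit: 3 }))
// ['oauth', 'authentication', 'jwt']
```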
Return Value
Each extracted entity contains:
```
{
  text: 'John Smith',      // Extracted text
  type: NounType.Person,   // Classified NounType
  confidence: 0.95,        // Classification confidence (0-1)
  start: 0,                // Start position in text
  end: 10,                 // End position in text (exclusive)
  vector?: Float32Array    // Embedding (only present if includeVectors: true)
}
```
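Each entity carries `start` and `end` offsets, so results can be mapped back onto the source string. The sketch assumes `end` is exclusive, as the example implies ('John Smith' spans 0 to 10). It also assumes that spans do not overlap. The `highlight` helper is hypothetical. The offsets were computed by hand for the sentence from extract.js, and the type strings stand in for NounType values.

```javascript
// Wrap each entity span as "[text|Type]".
// Assumes end offsets are exclusive and spans do not overlap.
function highlight(text, entities) {
  // Work right-to-left so earlier offsets stay valid after each insertion.
  const sorted = [...entities].sort((a, b) => b.start - a.start)
  let out = text
  for (const e of sorted) {
    out = out.slice(0, e.start) + `[${out.slice(e.start, e.end)}|${e.type}]` + out.slice(e.end)
  }
  return out
}

const text = 'John Smith founded Acme Corp in New York in 2020.'
const entities = [
  { text: 'John Smith', type: 'Person', confidence: 0.95, start: 0, end: 10 },
  { text: 'Acme Corp', type: 'Organization', confidence: 0.92, start: 19, end: 28 },
  { text: 'New York', type: 'Location', confidence: 0.88, start: 32, end: 40 },
  { text: '2020', type: 'TimeInterval', confidence: 0.85, start: 44, end: 48 }
]

// Sanity check: each entity's text matches its span in the source
console.log(entities.every(e => text.slice(e.start, e.end) === e.text)) // true

console.log(highlight(text, entities))
// [John Smith|Person] founded [Acme Corp|Organization] in [New York|Location] in [2020|TimeInterval].
```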
Use Cases
Auto-Tag Content
auto-tag.js
```javascript
async function addWithAutoTags(content) {
  // Extract concepts for tagging
  const concepts = await brain.extractConcepts(content)

  // Extract entities for relationships
  const entities = await brain.extract(content)

  // Add content with extracted metadata
  return brain.add({
    data: content,
    type: NounType.Document,
    metadata: {
      tags: concepts,
      mentions: entities.map(e => e.text),
      entityTypes: [...new Set(entities.map(e => e.type))]
    }
  })
}
```
Build Knowledge Graph from Text
build-graph.js
```javascript
import { VerbType } from '@soulcraft/brainy'

// Assumes `brain` is the initialized instance from extract.js
async function buildGraphFromText(text) {
  const entities = await brain.extract(text, { confidence: 0.7 })

  // Add each entity to the graph; each call resolves to an ID used below
  const entityIds = await Promise.all(
    entities.map(e =>
      brain.add({
        data: e.text,
        type: e.type,
        metadata: { extractedFrom: text, confidence: e.confidence }
      })
    )
  )

  // Relate every pair of co-occurring entities: n * (n - 1) / 2 relationships
  for (let i = 0; i < entityIds.length; i++) {
    for (let j = i + 1; j < entityIds.length; j++) {
      await brain.relate({
        from: entityIds[i],
        to: entityIds[j],
        type: VerbType.RelatedTo,
        metadata: { coOccurrence: true }
      })
    }
  }

  return entityIds
}
```
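Two details of `buildGraphFromText` are easier to see in isolation. First, an entity mentioned twice is added twice. Second, the nested loop creates n(n-1)/2 relationships for n entities. The sketch below shows both steps as plain functions, deduplicating by text and type before pairing. `dedupeEntities` and `coOccurrencePairs` are hypothetical helpers. Case-insensitive matching and keeping the highest-confidence duplicate are assumptions.

```javascript
// Keep one entry per (lowercased text, type), preferring the highest confidence.
function dedupeEntities(entities) {
  const best = new Map()
  for (const e of entities) {
    const key = `${e.text.toLowerCase()}|${e.type}`
    const current = best.get(key)
    if (!current || e.confidence > current.confidence) best.set(key, e)
  }
  return [...best.values()]
}

// Every unordered pair (i < j): n * (n - 1) / 2 pairs for n items.
function coOccurrencePairs(items) {
  const pairs = []
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      pairs.push([items[i], items[j]])
    }
  }
  return pairs
}

// Illustrative mentions with one case-variant duplicate
const mentions = [
  { text: 'Acme Corp', type: 'Organization', confidence: 0.92 },
  { text: 'John Smith', type: 'Person', confidence: 0.95 },
  { text: 'acme corp', type: 'Organization', confidence: 0.71 },
  { text: 'New York', type: 'Location', confidence: 0.88 }
]

const unique = dedupeEntities(mentions)
console.log(unique.length)                    // 3
console.log(coOccurrencePairs(unique).length) // 3
```

Mapping over `dedupeEntities(entities)` instead of `entities` before calling `brain.add` would avoid duplicate nodes. The quadratic pair count is worth keeping in mind for long texts.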
Named Entity Recognition Pipeline
ner-pipeline.js
```javascript
async function processDocument(doc) {
  // Extract all entities
  const entities = await brain.extract(doc.content)

  // Group by type
  const grouped = {
    people: entities.filter(e => e.type === NounType.Person),
    orgs: entities.filter(e => e.type === NounType.Organization),
    locations: entities.filter(e => e.type === NounType.Location),
    concepts: entities.filter(e => e.type === NounType.Concept)
  }

  const count = entities.length
  return {
    documentId: doc.id,
    entities: grouped,
    summary: {
      totalEntities: count,
      // Guard against division by zero when nothing was extracted
      avgConfidence: count > 0
        ? entities.reduce((sum, e) => sum + e.confidence, 0) / count
        : 0
    }
  }
}
```
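The pipeline hard-codes four groups. Entities of any other NounType are therefore counted in `totalEntities` but do not appear in any group. The sketch below is a generic alternative that groups by whatever `type` values occur and averages confidence per group. `groupByType` is a hypothetical helper, and the type strings stand in for NounType values.

```javascript
// Group entities by `type` and summarize each group.
function groupByType(entities) {
  const groups = {}
  for (const e of entities) {
    (groups[e.type] ??= []).push(e)
  }

  const summary = {}
  for (const [type, list] of Object.entries(groups)) {
    // Every group has at least one entity, so the division is safe
    const total = list.reduce((sum, e) => sum + e.confidence, 0)
    summary[type] = { count: list.length, avgConfidence: total / list.length }
  }
  return { groups, summary }
}

// Illustrative input
const { groups, summary } = groupByType([
  { text: 'John Smith', type: 'Person', confidence: 0.9 },
  { text: 'Jane Doe', type: 'Person', confidence: 0.8 },
  { text: '2020', type: 'TimeInterval', confidence: 0.85 }
])

console.log(summary.Person.count, summary.TimeInterval.count) // 2 1
```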
See Also
- Import API - Automatic extraction during import
- Noun Types - All 42 entity types
- Neural API - Advanced AI analysis