
Core Concepts

Understanding the three pillars of Hyper-Extract: Templates, Auto-Types, and Methods.


Architecture Overview

graph TD
    A[Your Text] --> B{Choose Approach}
    B -->|Quick Start| C[Template]
    B -->|More Control| D[Method]
    C --> E[Auto-Type]
    D --> E
    E --> F[Structured Knowledge]

Templates

Templates are pre-configured extraction setups for specific domains and tasks.

What is a Template?

A template combines:

- Auto-Type: output data structure
- Prompts: LLM instructions
- Schema: field definitions
- Guidelines: extraction rules

Using Templates

from hyperextract import Template

# Create from preset
ka = Template.create("general/biography_graph", language="en")

# Create from file
ka = Template.create("/path/to/custom.yaml", language="en")

# List available
templates = Template.list()
for path, cfg in templates.items():
    print(f"{path}: {cfg.description}")

When to Use Templates

Use templates when:

- You want quick results
- You are working with common document types
- You need domain-specific extraction
- You don't want to configure prompts

Don't use templates when:

- You need full control over extraction
- You are implementing custom algorithms
- You are doing research or experimentation


Auto-Types

Auto-Types define the output data structure for extracted knowledge.

The 8 Auto-Types

| Type | Description | Best For |
|------|-------------|----------|
| AutoModel | Single structured object | Summaries, reports |
| AutoList | Ordered collection | Sequences, ranked items |
| AutoSet | Deduplicated collection | Unique items, tags |
| AutoGraph | Entity-Relation graph | Knowledge graphs |
| AutoHypergraph | Multi-entity edges | Complex relationships |
| AutoTemporalGraph | Graph + time | Timelines, histories |
| AutoSpatialGraph | Graph + space | Maps, locations |
| AutoSpatioTemporalGraph | Graph + time + space | Full context |
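
To make the difference between the graph-shaped Auto-Types more concrete, here is a minimal sketch of how a binary edge, a hyperedge, and a time-stamped edge might be modeled. The class names `Edge`, `HyperEdge`, and `TemporalEdge` are hypothetical and built on plain dataclasses; they are not Hyper-Extract's own classes and only illustrate the shapes described in the table.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """Binary relation: exactly one source and one target (AutoGraph-style)."""
    source: str
    target: str
    type: str


@dataclass(frozen=True)
class HyperEdge:
    """One relation linking any number of entities (AutoHypergraph-style)."""
    members: frozenset
    type: str


@dataclass(frozen=True)
class TemporalEdge(Edge):
    """Binary relation plus an optional validity period (AutoTemporalGraph-style)."""
    start: Optional[str] = None
    end: Optional[str] = None


# A three-author paper fits in a single hyperedge...
paper = HyperEdge(frozenset({"Alice", "Bob", "Carol"}), "co_authored")

# ...but needs one binary edge per pair, which loses the "same paper" grouping.
pairwise = [
    Edge(a, b, "co_authored")
    for a, b in combinations(sorted(paper.members), 2)
]
print(len(pairwise))  # 3

# A time-stamped edge carries the period in which the relation held.
job = TemporalEdge("Alice", "Acme", "works_at", start="2019", end="2023")
```

If a relation naturally involves more than two entities at once, a hypergraph avoids splitting it into pairs; if it only holds for a period, a temporal edge keeps that period attached to the relation.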

Working with Auto-Types

from hyperextract import Template

# Graph extraction
text = "Your document content here..."
ka = Template.create("general/graph", "en")
result = ka.parse(text)

# Access graph data
for node in result.nodes:
    print(f"Node: {node.name}")

for edge in result.edges:
    print(f"{edge.source} --{edge.type}--> {edge.target}")

Choosing an Auto-Type

graph TD
    A[What do you want to extract?] --> B{Structured data?}
    B -->|Yes| C[AutoModel]
    B -->|No| D{Relationships?}
    D -->|No| E{Ordered?}
    E -->|Yes| F[AutoList]
    E -->|No| G[AutoSet]
    D -->|Yes| H{Multi-entity edges?}
    H -->|Yes| I[AutoHypergraph]
    H -->|No| J{Time data?}
    J -->|Yes| K{Space data?}
    K -->|Yes| L[AutoSpatioTemporalGraph]
    K -->|No| M[AutoTemporalGraph]
    J -->|No| N{Space data?}
    N -->|Yes| O[AutoSpatialGraph]
    N -->|No| P[AutoGraph]
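
The decision flow above can also be written as a small function. This is only a sketch: `choose_auto_type` and its parameter names are hypothetical, and it returns the Auto-Type names as plain strings without calling into the library.

```python
def choose_auto_type(
    structured_object: bool = False,
    relationships: bool = False,
    ordered: bool = False,
    multi_entity_edges: bool = False,
    has_time: bool = False,
    has_space: bool = False,
) -> str:
    """Return the Auto-Type name suggested by the decision flow."""
    # A single structured object short-circuits everything else.
    if structured_object:
        return "AutoModel"
    # Without relationships, the choice is between a list and a set.
    if not relationships:
        return "AutoList" if ordered else "AutoSet"
    # Edges that join more than two entities call for a hypergraph.
    if multi_entity_edges:
        return "AutoHypergraph"
    # Otherwise pick the graph variant by the context it must carry.
    if has_time:
        return "AutoSpatioTemporalGraph" if has_space else "AutoTemporalGraph"
    return "AutoSpatialGraph" if has_space else "AutoGraph"


print(choose_auto_type(relationships=True, has_time=True))  # AutoTemporalGraph
```

Note that, as in the diagram, the hypergraph branch is checked before time and space, so multi-entity edges take priority over temporal or spatial context.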

Methods

Methods are the underlying extraction algorithms.

Types of Methods

RAG-Based Methods

Use retrieval-augmented generation for large documents:

| Method | Description |
|--------|-------------|
| graph_rag | Community-based retrieval |
| light_rag | Lightweight graph RAG |
| hyper_rag | Hypergraph RAG |
| hypergraph_rag | Advanced hypergraph |
| cog_rag | Cognitive RAG |

Typical Methods

Direct extraction approaches:

| Method | Description |
|--------|-------------|
| itext2kg | Triple-based extraction |
| itext2kg_star | Enhanced iText2KG |
| kg_gen | Knowledge graph generator |
| atom | Temporal with evidence |

Using Methods

from hyperextract import Template

# Create from a method by passing its "method/<name>" path
ka = Template.create("method/light_rag")

# Any other method is selected the same way
ka = Template.create("method/graph_rag")

Template vs Method

| Aspect | Template | Method |
|--------|----------|--------|
| Ease of use | ⭐⭐⭐ | ⭐⭐ |
| Flexibility | ⭐⭐ | ⭐⭐⭐ |
| Domain fit | ⭐⭐⭐ | ⭐⭐ |
| Configuration | Minimal | More |
| Language | Multi | English |

The Extraction Pipeline

1. Text Input

text = "Your document content here..."

2. Chunking (if needed)

Long documents are automatically split:

- Default chunk size: 2048 characters
- Overlap: 256 characters
- Parallel processing with multiple workers
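
As an illustration of what a 2048-character chunk size with a 256-character overlap means in practice, here is a minimal sliding-window splitter. The function `chunk_text` is a hypothetical stand-in written for this page; the library's actual splitter may differ, for example by respecting sentence boundaries.

```python
def chunk_text(text: str, chunk_size: int = 2048, overlap: int = 256) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    # Short inputs need no splitting at all.
    if len(text) <= chunk_size:
        return [text]

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(5000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [2048, 2048, 1416]
```

Each chunk repeats the last 256 characters of the previous one, so an entity or relation that straddles a boundary still appears whole in at least one chunk.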

3. LLM Extraction

Each chunk is processed by the LLM:

- Structured output is produced using Pydantic schemas
- Chunks are processed concurrently for speed
- Results from all chunks are merged
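
The process-concurrently-then-merge pattern can be sketched with the standard library alone. Everything below is an assumption made for illustration: `fake_extract` stands in for the LLM call by treating capitalized words as entities, `ChunkResult` is a plain dataclass used in place of a Pydantic schema, and the merge is a simple first-seen deduplication rather than the library's own merge logic.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class ChunkResult:
    """Structured output for one chunk (stand-in for a Pydantic schema)."""
    entities: list = field(default_factory=list)


def fake_extract(chunk: str) -> ChunkResult:
    """Stand-in for an LLM call: treat capitalized words as entities."""
    words = (w.strip(".,;:!?") for w in chunk.split())
    return ChunkResult(entities=[w for w in words if w[:1].isupper()])


def extract_all(chunks: list, max_workers: int = 4) -> ChunkResult:
    """Run extraction on every chunk concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, even if calls finish out of order.
        partials = list(pool.map(fake_extract, chunks))

    merged = ChunkResult()
    seen = set()
    for partial in partials:
        for entity in partial.entities:
            if entity not in seen:  # keep only the first occurrence
                seen.add(entity)
                merged.entities.append(entity)
    return merged


chunks = [
    "Ada Lovelace worked with Charles Babbage.",
    "Babbage designed the Analytical Engine.",
]
print(extract_all(chunks).entities)
# ['Ada', 'Lovelace', 'Charles', 'Babbage', 'Analytical', 'Engine']
```

Threads suit this step because LLM calls are network-bound: the workers spend most of their time waiting on responses rather than computing.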

4. Result

result = ka.parse(text)

# result contains:
# - result.data: Extracted knowledge
# - result.metadata: Extraction info
# - Methods for search, chat, save

Putting It All Together

from hyperextract import Template

# 1. Choose template
ka = Template.create("general/biography_graph", "en")

# 2. Extract
result = ka.parse(text)

# 3. Build index for search/chat capabilities
result.build_index()

# 4. Use
result.show()

Interactive Visualization

Method Approach (Advanced)

from hyperextract import Template

# 1. Choose method
ka = Template.create("method/light_rag")

# 2. Extract
result = ka.parse(text)

# 3. Use
result.build_index()
results = result.search("query")

Direct Auto-Type Usage (Advanced)

For full control over extraction, you can instantiate Auto-Types directly. This requires defining your own schemas and extraction logic.

from hyperextract import AutoGraph
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from pydantic import BaseModel

# Define schemas
class Entity(BaseModel):
    name: str
    type: str

class Relation(BaseModel):
    source: str
    target: str
    type: str

# Create with full control
llm = ChatOpenAI()
emb = OpenAIEmbeddings()

ka = AutoGraph[Entity, Relation](
    node_schema=Entity,
    edge_schema=Relation,
    llm_client=llm,
    embedder=emb,
    # Plus additional configuration...
)

result = ka.parse(text)

Advanced Topic

Direct Auto-Type usage requires an understanding of:

- Pydantic schema definitions
- LLM client configuration
- Key extractors for deduplication
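
The idea behind a key extractor is that two extracted items count as the same when a function maps them to the same key. The sketch below shows that idea in isolation; `dedupe` and the `Entity` dataclass are hypothetical stand-ins (a dataclass is used instead of a Pydantic model to keep the example dependency-free), and the way Hyper-Extract accepts key extractors is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Entity:
    name: str
    type: str


def dedupe(items: Iterable[T], key: Callable[[T], Hashable]) -> list:
    """Keep the first item seen for each key, preserving input order."""
    unique = {}
    for item in items:
        unique.setdefault(key(item), item)
    return list(unique.values())


entities = [
    Entity("Marie Curie", "person"),
    Entity("marie curie ", "person"),  # same person, different surface form
    Entity("Sorbonne", "organization"),
]

# The key normalizes case and whitespace so near-duplicates collapse.
unique = dedupe(entities, key=lambda e: (e.name.strip().lower(), e.type))
print([e.name for e in unique])  # ['Marie Curie', 'Sorbonne']
```

The choice of key determines how aggressive the deduplication is: including `type` in the key keeps a person and an organization with the same name apart, while dropping it would merge them.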

See Working with Auto-Types for complete examples.


Next Steps