Python SDK

Build knowledge extraction pipelines with the Hyper-Extract Python API.

Installation

pip install hyperextract

For development:

pip install "hyperextract[dev]"

Quick Example

from hyperextract import Template

# Create template
ka = Template.create("general/biography_graph", language="en")

# Extract knowledge
with open("document.md") as f:
    result = ka.parse(f.read())

# Access data
print(f"Nodes: {len(result.nodes)}")
print(f"Edges: {len(result.edges)}")

# Build index for search/chat capabilities
result.build_index()

# Visualize
result.show()

Interactive Visualization

Core Classes

Template

The main entry point for template-based extraction.

from hyperextract import Template

# Create from preset
ka = Template.create("general/biography_graph", language="en")

# Create from custom YAML
ka = Template.create("/path/to/custom_template.yaml", language="en")

# Create from method
ka = Template.create("method/light_rag")

Auto-Types

Eight data structure types for different extraction needs:

Class	Use Case
`AutoModel`	Structured data models
`AutoList`	Ordered collections
`AutoSet`	Deduplicated collections
`AutoGraph`	Entity-relationship graphs
`AutoHypergraph`	Multi-entity relationships
`AutoTemporalGraph`	Time-based relationships
`AutoSpatialGraph`	Location-based relationships
`AutoSpatioTemporalGraph`	Combined time + space
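
To make the difference between the graph-shaped types concrete, here is a minimal sketch in plain Python. It does not use the Hyper-Extract classes: `Edge` and `HyperEdge` are hypothetical stand-ins, and the names and relations are made up for illustration. It only contrasts a pairwise relationship (as in an entity-relationship graph) with a single relationship that spans several entities (as in a hypergraph).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical pairwise relationship: exactly two entities."""
    source: str
    target: str
    relation: str


@dataclass(frozen=True)
class HyperEdge:
    """Hypothetical multi-entity relationship: any number of participants."""
    participants: frozenset
    relation: str


# A pairwise graph needs one edge per pair of co-authors.
edges = [
    Edge("Alice", "Bob", "co_authored_with"),
    Edge("Alice", "Carol", "co_authored_with"),
    Edge("Bob", "Carol", "co_authored_with"),
]

# A hypergraph can record the same fact as one relationship over three entities.
hyperedge = HyperEdge(
    participants=frozenset({"Alice", "Bob", "Carol"}),
    relation="co_authored_paper",
)

print(f"Pairwise edges: {len(edges)}")
print(f"Hyperedge participants: {len(hyperedge.participants)}")
```

The pairwise form loses the information that all three people took part in the same event, which is the kind of case a multi-entity structure is meant to capture.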

API Overview

Template API

# Create
ka = Template.create(template_path, language="en")

# Extract
result = ka.parse(text)           # New extraction
result.feed_text(text)            # Add to existing

# Query
result.build_index()              # Build search index
results = result.search(query)    # Semantic search
response = result.chat(query)     # Chat with knowledge

# Persist
result.dump("./output/")          # Save to disk
result.load("./output/")          # Load from disk

# Visualize
result.show()                     # Interactive visualization

Documentation Structure

Quickstart: Your first Python extraction
Core Concepts: Template, AutoType, Method explained
Guides:
    Using Templates
    Using Methods
    Working with Auto-Types
    Search and Chat
    Incremental Updates
    Saving and Loading
API Reference:
    Template
    Auto-Types
    Methods

Examples by Use Case

Research Paper Analysis

from hyperextract import Template

ka = Template.create("general/concept_graph", language="en")

with open("paper.md") as f:
    paper = ka.parse(f.read())

# Build searchable knowledge abstract
paper.build_index()

# Ask questions
response = paper.chat("What are the main contributions?")
print(response.content)
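
When several questions are asked of the same document, the index only needs to be built once. The helper below is a small sketch of that pattern. `ask_all` is a hypothetical function, not part of Hyper-Extract; it assumes only what the example above shows, namely that the result exposes `build_index()` and `chat(query)` and that the answer text is available on `.content`.

```python
def ask_all(result, questions):
    """Build the index once, then ask each question in turn.

    Hypothetical helper: `result` is assumed to expose `build_index()` and
    `chat(query)`, with the answer text on `.content`.
    Returns a dict mapping each question to its answer text.
    """
    result.build_index()
    return {question: result.chat(question).content for question in questions}
```

With the `paper` object from the example above, this could be called as `ask_all(paper, ["What are the main contributions?", "What datasets are used?"])`.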

Financial Report Extraction

from hyperextract import Template

ka = Template.create("finance/earnings_summary", language="en")

# earnings_text holds the report contents as a string
report = ka.parse(earnings_text)
print(report.data.revenue)
print(report.data.eps)

Building a Knowledge Abstract

from hyperextract import Template

ka = Template.create("general/graph", language="en")

# Initial extraction (ka now refers to the extraction result, not the template)
ka = ka.parse(doc1_text)

# Add more documents
ka.feed_text(doc2_text)
ka.feed_text(doc3_text)

# Save for later
ka.dump("./my_ka/")
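
The parse-then-feed pattern above generalizes to any number of documents. The following is a minimal sketch under that assumption. `build_abstract` is a hypothetical helper, not part of Hyper-Extract; it relies only on `parse(text)` and `feed_text(text)` as used in the example above.

```python
def build_abstract(template, texts):
    """Parse the first text, then feed the remaining texts into the result.

    Hypothetical helper: `template` is assumed to expose `parse(text)`, and
    the object it returns `feed_text(text)`.
    """
    texts = list(texts)
    if not texts:
        raise ValueError("at least one document is required")

    # The first document creates the result; later ones are added to it.
    result = template.parse(texts[0])
    for text in texts[1:]:
        result.feed_text(text)
    return result
```

For the three documents above, the call would be `build_abstract(Template.create("general/graph", language="en"), [doc1_text, doc2_text, doc3_text])`, followed by `dump` on the returned object.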

Configuration

Environment Variables

import os

os.environ["OPENAI_API_KEY"] = "your-api-key"
os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1"

Using .env File

# Requires the python-dotenv package
from dotenv import load_dotenv
load_dotenv()  # Loads from .env file
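
Whichever way the variables are set, it can help to fail fast when one is missing rather than discover it during an extraction call. The sketch below uses only the standard library. `missing_env_vars` is a hypothetical helper, and treating `OPENAI_API_KEY` as the required variable is an assumption based on the example above, not a statement of what the library checks.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Assumption: OPENAI_API_KEY is the variable your setup requires.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

The optional `environ` argument exists so the check can be exercised against a plain dictionary in tests.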

Custom Clients

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from hyperextract import Template

llm = ChatOpenAI(model="gpt-4o")
embedder = OpenAIEmbeddings(model="text-embedding-3-large")

ka = Template.create(
    "general/biography_graph",
    language="en",
    llm_client=llm,
    embedder=embedder
)

Type Hints

Hyper-Extract is fully typed for IDE support:

from hyperextract import Template, AutoGraph

ka: AutoGraph = Template.create("general/graph", language="en")
result = ka.parse(text)

# IDE autocomplete works
nodes = result.nodes
edges = result.edges

Next Steps

Follow the Quickstart
Learn Core Concepts
Browse Guides
Read API Reference