Python SDK

Build knowledge extraction pipelines with the Hyper-Extract Python API.


Installation

pip install hyperextract

For development, install the dev extras (quoted so that shells such as zsh do not expand the brackets):

pip install "hyperextract[dev]"

Quick Example

from hyperextract import Template

# Create template
ka = Template.create("general/biography_graph", language="en")

# Extract knowledge
with open("document.md") as f:
    result = ka.parse(f.read())

# Access data
print(f"Nodes: {len(result.nodes)}")
print(f"Edges: {len(result.edges)}")

# Build index for search/chat capabilities
result.build_index()

# Visualize
result.show()

Interactive Visualization


Core Classes

Template

The main entry point for template-based extraction.

from hyperextract import Template

# Create from preset
ka = Template.create("general/biography_graph", language="en")

# Create from custom YAML
ka = Template.create("/path/to/custom_template.yaml", language="en")

# Create from method
ka = Template.create("method/light_rag")

Auto-Types

Eight data structure types for different extraction needs:

Class                     Use Case
AutoModel                 Structured data models
AutoList                  Ordered collections
AutoSet                   Deduplicated collections
AutoGraph                 Entity-relationship graphs
AutoHypergraph            Multi-entity relationships
AutoTemporalGraph         Time-based relationships
AutoSpatialGraph          Location-based relationships
AutoSpatioTemporalGraph   Combined time + space
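
To make the difference between pairwise and multi-entity relationships concrete, here is a minimal sketch in plain Python dataclasses. The names Edge and HyperEdge are hypothetical and exist only for illustration; they are not the library's classes, and the actual schemas behind AutoGraph and AutoHypergraph may differ.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Edge:
    """A pairwise relationship with exactly two endpoints (illustrative only)."""
    source: str
    target: str
    relation: str


@dataclass(frozen=True)
class HyperEdge:
    """A relationship joining any number of entities (illustrative only)."""
    members: FrozenSet[str]
    relation: str


# A two-party fact fits a pairwise edge.
pair = Edge(source="Alice", target="Bob", relation="mentors")

# A three-party fact stays a single relationship as a hyperedge,
# instead of being split into three separate pairs.
paper = HyperEdge(
    members=frozenset({"Alice", "Bob", "Carol"}),
    relation="co_authored",
)
```

A graph-style type is a natural fit when every fact has two endpoints; a hypergraph-style type is worth considering when a single fact involves three or more entities at once.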

API Overview

Template API

# Create
ka = Template.create(template_path, language="en")

# Extract
result = ka.parse(text)           # New extraction
result.feed_text(text)            # Add to existing

# Query
result.build_index()              # Build search index
results = result.search(query)    # Semantic search
response = result.chat(query)     # Chat with knowledge

# Persist
result.dump("./output/")          # Save to disk
result.load("./output/")          # Load from disk

# Visualize
result.show()                     # Interactive visualization



Examples by Use Case

Research Paper Analysis

from hyperextract import Template

ka = Template.create("general/concept_graph", language="en")

with open("paper.md") as f:
    paper = ka.parse(f.read())

# Build searchable knowledge abstract
paper.build_index()

# Ask questions
response = paper.chat("What are the main contributions?")
print(response.content)
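
When several questions need answering against the same document, the chat call can be wrapped in a small helper. This is a sketch under assumptions: ask_all is a hypothetical name, and it relies only on chat() returning an object with a content attribute, as in the example above. The index is assumed to have been built with build_index() beforehand.

```python
from typing import Any, Dict, Iterable


def ask_all(result: Any, questions: Iterable[str]) -> Dict[str, str]:
    """Ask each question via result.chat() and collect the answer text.

    `result` is expected to be an indexed extraction result, i.e.
    build_index() has already been called on it.
    """
    return {question: result.chat(question).content for question in questions}
```

For example, `answers = ask_all(paper, ["What are the main contributions?", "What are the limitations?"])` would return a dictionary keyed by question.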

Financial Report Extraction

from hyperextract import Template

ka = Template.create("finance/earnings_summary", language="en")

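# earnings_text: the report contents as a string, loaded beforehand
report = ka.parse(earnings_text)

# Fields available on report.data are defined by the template
print(report.data.revenue)
print(report.data.eps)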

Building a Knowledge Abstract

from hyperextract import Template

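template = Template.create("general/graph", language="en")

# Initial extraction (doc1_text, doc2_text, doc3_text are strings loaded beforehand)
ka = template.parse(doc1_text)

# Add more documents to the same knowledge abstract
ka.feed_text(doc2_text)
ka.feed_text(doc3_text)

# Save for later
ka.dump("./my_ka/")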
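
The parse-then-feed pattern above generalizes to any number of documents. The following is a minimal sketch, assuming only the parse() and feed_text() calls shown on this page; build_from_texts is a hypothetical helper, not part of the library.

```python
from typing import Any, Iterable


def build_from_texts(template: Any, texts: Iterable[str]) -> Any:
    """Parse the first text, then feed the remaining texts into the same result.

    `template` is expected to come from Template.create(...).
    """
    iterator = iter(texts)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("at least one text is required") from None

    result = template.parse(first)  # initial extraction
    for text in iterator:
        result.feed_text(text)      # add to the existing result
    return result
```

The returned object can then be saved with dump() as shown above. Raising on an empty input is a design choice of this sketch: there is nothing to parse, so failing early is clearer than returning None.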

Configuration

Environment Variables

import os

os.environ["OPENAI_API_KEY"] = "your-api-key"
os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1"
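
Because a missing key usually only surfaces at the first model call, it can help to check the environment up front. The sketch below uses only the standard library; missing_settings is a hypothetical helper, and treating OPENAI_API_KEY as the only required variable is an assumption.

```python
import os
from typing import List, Mapping, Sequence


def missing_settings(
    names: Sequence[str] = ("OPENAI_API_KEY",),
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names that are unset or blank in `env`."""
    return [name for name in names if not env.get(name, "").strip()]
```

Calling `missing_settings()` before creating a template and raising an error when the list is non-empty gives a clearer failure than an authentication error later on.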

Using .env File

# Requires the python-dotenv package
from dotenv import load_dotenv

load_dotenv()  # Loads variables from a .env file into the environment

Custom Clients

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from hyperextract import Template

llm = ChatOpenAI(model="gpt-4o")
embedder = OpenAIEmbeddings(model="text-embedding-3-large")

ka = Template.create(
    "general/biography_graph",
    language="en",
    llm_client=llm,
    embedder=embedder
)

Type Hints

Hyper-Extract is fully typed for IDE support:

from hyperextract import Template, AutoGraph

ka: AutoGraph = Template.create("general/graph", language="en")
result = ka.parse(text)  # text: the document contents as a string

# IDE autocomplete works
nodes = result.nodes
edges = result.edges

Next Steps