Hyper-Extract

Transform documents into structured knowledge with one command.

"Say goodbye to document anxiety and see information at a glance."

Hyper-Extract is an intelligent, LLM-powered knowledge extraction framework. It transforms unstructured text into persistent, predictable, and strongly typed knowledge structures, ranging from simple lists to complex knowledge graphs, hypergraphs, and spatio-temporal graphs.


⚡ 5-Minute Quick Start

CLI

# 1. Install the CLI tool
uv tool install hyperextract

# 2. Configure the API key
he config init -k YOUR_OPENAI_API_KEY

# 3. Extract knowledge from a document
he parse tesla.md -t general/biography_graph -o ./output/ -l en

# 4. Visualize the knowledge graph
he show ./output/

Python

from hyperextract import Template

# 1. Create a template
ka = Template.create("general/biography_graph", "en")

# 2. Extract knowledge
with open("tesla.md") as f:
    result = ka.parse(f.read())

# 3. Visualize
ka.show(result)

→ Ready to dive deeper? Check out the Getting Started Guide or jump to CLI / Python SDK documentation.
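
For more than one document, the `he parse` call from the quick start can be repeated per file. The sketch below only assembles the argument lists, using the flags shown above (`-t`, `-o`, `-l`). The helper names `build_parse_commands` and `run_batch`, and the choice of one output folder per document, are assumptions made for illustration and are not part of Hyper-Extract.

```python
import subprocess
from pathlib import Path


def build_parse_commands(doc_dir, template, out_root, lang="en"):
    """Return one `he parse` argument list per Markdown file in doc_dir."""
    commands = []
    for doc in sorted(Path(doc_dir).glob("*.md")):
        # Assumption: each document gets its own output folder.
        out_dir = Path(out_root) / doc.stem
        commands.append(
            ["he", "parse", str(doc), "-t", template, "-o", str(out_dir), "-l", lang]
        )
    return commands


def run_batch(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Requires the `he` CLI to be installed and configured.
            subprocess.run(cmd, check=True)
```

Calling `run_batch(build_parse_commands("./docs", "general/biography_graph", "./output"))` prints the commands without running them, which makes it easy to check them before passing `dry_run=False`.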


✨ What Makes Hyper-Extract Different?

  • 8 Auto-Types


    From simple AutoList/AutoModel to advanced AutoGraph, AutoHypergraph, and AutoSpatioTemporalGraph. Pick the right structure for your data.

  • 10+ Extraction Engines


    Built-in support for GraphRAG, LightRAG, Hyper-RAG, KG-Gen, iText2KG, and more. Choose the best method for your use case.

  • 80+ Domain Templates


    Ready-to-use templates for Finance, Legal, Medical, TCM, and Industry. Zero configuration needed.

  • Incremental Evolution


    Feed new documents to continuously expand your knowledge abstract. No need to reprocess everything.
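
The idea behind incremental evolution can be illustrated with a toy merge step: knowledge extracted from a new document is added to what already exists, and anything already known is skipped. This is a minimal sketch using plain Python sets and exact matching. The `KnowledgeGraph` class and its `merge` method are hypothetical and do not reflect how Hyper-Extract implements merging internally.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Hypothetical container: node names plus (source, relation, target) triples."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def merge(self, new_nodes, new_edges):
        """Add only unseen nodes and edges; return how many of each were new."""
        fresh_nodes = set(new_nodes) - self.nodes
        fresh_edges = set(new_edges) - self.edges
        self.nodes |= fresh_nodes
        self.edges |= fresh_edges
        return len(fresh_nodes), len(fresh_edges)


graph = KnowledgeGraph()

# First document.
graph.merge({"Nikola Tesla", "Smiljan"}, {("Nikola Tesla", "born_in", "Smiljan")})

# Second document repeats one fact and contributes one new node and one new edge.
added = graph.merge(
    {"Nikola Tesla", "Westinghouse"},
    {
        ("Nikola Tesla", "born_in", "Smiljan"),
        ("Nikola Tesla", "worked_with", "Westinghouse"),
    },
)
print(added)  # (1, 1)
```

Exact set membership is the simplest possible deduplication rule; a real system would likely also need to reconcile differently worded mentions of the same entity.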


🎯 Choose Your Path

  • CLI User


    Process documents directly from your terminal. Perfect for:

    • Quick knowledge extraction
    • Batch document processing
    • Building knowledge abstracts without coding

    CLI Guide

  • Python Developer


    Integrate into your Python applications. Perfect for:

    • Custom extraction pipelines
    • Integration with existing workflows
    • Building AI-powered applications

    Python SDK

  • Want to Learn More?


    Understand the concepts and architecture:

    • How Auto-Types work
    • Choosing extraction methods
    • Creating custom templates

    Concepts


🧩 The 8 Auto-Types at a Glance

| Type | Use Case | Example Output |
| --- | --- | --- |
| AutoModel | Structured summaries | A pydantic model with specific fields |
| AutoList | Collections of items | A list of entities or facts |
| AutoSet | Deduplicated collections | A set of unique items |
| AutoGraph | Entity-relationship networks | Knowledge graph with nodes and edges |
| AutoHypergraph | Multi-entity relationships | Hyperedges connecting multiple nodes |
| AutoTemporalGraph | Time-based relationships | Graph with temporal information |
| AutoSpatialGraph | Location-based relationships | Graph with geographic data |
| AutoSpatioTemporalGraph | Time + Space combined | Full context with when and where |

→ Learn which Auto-Type to choose
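
To make the structural differences between the graph-shaped types concrete, here is a small sketch in plain Python. It shows a binary edge, the same edge annotated with time and place, and a hyperedge that links more than two nodes at once. The `Edge` and `HyperEdge` classes are hypothetical stand-ins for illustration and are not the library's actual Auto-Type classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """Binary relation between exactly two nodes."""
    source: str
    relation: str
    target: str
    time: Optional[str] = None   # set when temporal context is captured
    place: Optional[str] = None  # set when spatial context is captured


@dataclass(frozen=True)
class HyperEdge:
    """One relation shared by any number of nodes."""
    relation: str
    members: frozenset


# Plain graph edge: who relates to whom.
plain = Edge("Nikola Tesla", "worked_for", "Edison Machine Works")

# Spatio-temporal edge: the same fact with when and where attached.
situated = Edge(
    "Nikola Tesla", "worked_for", "Edison Machine Works",
    time="1884", place="New York",
)

# Hyperedge: a single relation connecting three participants.
group = HyperEdge(
    "war_of_the_currents",
    frozenset({"Nikola Tesla", "George Westinghouse", "Thomas Edison"}),
)

print(len(group.members))  # 3
```

A rough rule of thumb that follows from this: if a fact cannot be expressed without naming three or more participants together, a hypergraph fits better than a set of pairwise edges.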


๐Ÿ—๏ธ Architecture Overview

Hyper-Extract follows a three-layer architecture:

graph TD
    A[Your Document] --> B[CLI / Python API]
    B --> C[Template]
    B --> D[Method]
    C --> E[Auto-Type]
    D --> E
    E --> F[Structured Knowledge]

    subgraph "Layer 3: Templates & Methods"
        C
        D
    end

    subgraph "Layer 2: Core Engine"
        E
    end

    subgraph "Layer 1: Output"
        F
    end

  1. Auto-Types: Define the output data structure (8 types)
  2. Methods: Provide extraction algorithms (RAG-based and Typical)
  3. Templates: Offer domain-specific, ready-to-use configurations

You can use Hyper-Extract at any level: pick a template for quick results, choose a method for more control, or work directly with Auto-Types for full customization.
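
The relationship between the three layers can be pictured with a toy example: a schema describes the output shape, a method turns text into raw fields, and a template bundles the two. Everything below (`Person`, `first_line_method`, `ToyTemplate`) is a hypothetical stand-in written in plain Python. It does not call an LLM and is not the Hyper-Extract API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Person:
    """Stands in for an Auto-Type: the shape of the output."""
    name: str
    birth_year: int


def first_line_method(text: str) -> dict:
    """Stands in for a Method: a toy rule-based parser, not an LLM call."""
    name, year = text.splitlines()[0].split(",")
    return {"name": name.strip(), "birth_year": int(year)}


@dataclass
class ToyTemplate:
    """Stands in for a Template: a schema and a method bundled together."""
    schema: type
    method: Callable[[str], dict]

    def parse(self, text: str):
        # Run the method, then force its result into the schema.
        return self.schema(**self.method(text))


template = ToyTemplate(schema=Person, method=first_line_method)
print(template.parse("Nikola Tesla, 1856\nInventor and engineer."))
# Person(name='Nikola Tesla', birth_year=1856)
```

Swapping `first_line_method` for another callable changes how fields are extracted without touching the schema, which mirrors the choice between picking a template, picking a method, or working with the output type directly.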


📊 Comparison with Other Tools

| Feature | GraphRAG | LightRAG | KG-Gen | Hyper-Extract |
| --- | --- | --- | --- | --- |
| Knowledge Graph | ✅ | ✅ | ✅ | ✅ |
| Temporal Graph | ✅ | ❌ | ❌ | ✅ |
| Spatial Graph | ❌ | ❌ | ❌ | ✅ |
| Hypergraph | ❌ | ❌ | ❌ | ✅ |
| Domain Templates | ❌ | ❌ | ❌ | ✅ |
| CLI Tool | ✅ | ❌ | ❌ | ✅ |
| Multi-language | ✅ | ❌ | ❌ | ✅ |

📚 Documentation Structure

  • Getting Started: Installation and your first extraction
  • CLI Guide: Complete terminal workflow documentation
  • Python SDK: API reference and developer guides
  • Concepts: Understanding the architecture
  • Templates: Domain-specific extraction templates
  • Resources: FAQ, troubleshooting, and contributing

๐Ÿค Contributing

Contributions are welcome! Whether it's bug reports, feature requests, or documentation improvements, please feel free to submit an issue or pull request.

GitHub Repository


📄 License

Hyper-Extract is licensed under the Apache-2.0 License.