Concepts
Understand the fundamental concepts behind Hyper-Extract.
Three-Layer Architecture
Hyper-Extract is built on a three-layer architecture:
graph TD
    subgraph "Layer 3: Interface"
        A[CLI] --> B[Python SDK]
        B --> C[Template API]
    end
    subgraph "Layer 2: Methods"
        C --> D[RAG-Based]
        C --> E[Typical]
    end
    subgraph "Layer 1: Data"
        D --> F[Auto-Types]
        E --> F
    end
    F --> G[Structured Knowledge]
| Layer | Purpose | Components |
|---|---|---|
| Auto-Types | Define data structures | 8 type classes |
| Methods | Extraction algorithms | RAG + Typical methods |
| Templates | Domain-specific configs | 80+ preset templates |
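To make the layering concrete, here is a minimal sketch of how a template might select a method that fills a data structure. Every name in it (`RecordList`, `keyword_method`, `Template`) is a hypothetical stand-in invented for illustration. None of it is Hyper-Extract's actual API, and the toy "method" simply keeps capitalized words rather than calling a language model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RecordList:
    """Layer 1 stand-in (hypothetical): a structure that holds extracted items."""
    items: list[str] = field(default_factory=list)


def keyword_method(text: str) -> RecordList:
    """Layer 2 stand-in (hypothetical): a toy algorithm that keeps capitalized words."""
    words = [w.strip(".,;:") for w in text.split()]
    return RecordList(items=[w for w in words if w[:1].isupper()])


@dataclass
class Template:
    """Layer 3 stand-in (hypothetical): a named configuration that selects a method."""
    name: str
    method: Callable[[str], RecordList]

    def extract(self, text: str) -> RecordList:
        # The template delegates to its method, which returns the data structure.
        return self.method(text)


result = Template("toy", keyword_method).extract(
    "Ada Lovelace met Charles Babbage in London."
)
print(result.items)
```

The point of the sketch is the direction of dependency: the template knows about the method, the method knows about the data structure, and the data structure knows about neither.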
Core Concepts
Auto-Types
The 8 data structure types that define extraction output:
- Record Types: AutoModel, AutoList, AutoSet (store data, no entity relations)
- Graph Types: AutoGraph, AutoHypergraph, AutoTemporalGraph, AutoSpatialGraph, AutoSpatioTemporalGraph (represent entity relationships)
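The two groups above can be written down as a small lookup table. This is only an illustrative sketch: the type names come from the list, but the dictionary and the helper `models_relationships` are hypothetical and not part of Hyper-Extract.

```python
# Hypothetical lookup table built from the list above; not a Hyper-Extract API.
AUTO_TYPE_CATEGORIES = {
    "AutoModel": "record",
    "AutoList": "record",
    "AutoSet": "record",
    "AutoGraph": "graph",
    "AutoHypergraph": "graph",
    "AutoTemporalGraph": "graph",
    "AutoSpatialGraph": "graph",
    "AutoSpatioTemporalGraph": "graph",
}


def models_relationships(type_name: str) -> bool:
    """Return True if the named Auto-Type is a graph type (represents entity relationships)."""
    try:
        return AUTO_TYPE_CATEGORIES[type_name] == "graph"
    except KeyError:
        raise ValueError(f"unknown Auto-Type: {type_name!r}") from None
```

A helper like this captures the main decision when picking a type: use a record type when the output is standalone data, and a graph type when the relationships between entities matter.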
Methods
The extraction algorithms:
- RAG-Based: GraphRAG, LightRAG, Hyper-RAG
- Typical: iText2KG, KG-Gen, Atom
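The same grouping can be expressed as a small registry keyed by method family. The method names are taken from the list above, while the registry layout, the family keys `"rag"` and `"typical"`, and the `family_of` helper are assumptions made for this sketch.

```python
# Hypothetical registry mirroring the list above; not a Hyper-Extract API.
METHOD_FAMILIES = {
    "rag": ("GraphRAG", "LightRAG", "Hyper-RAG"),
    "typical": ("iText2KG", "KG-Gen", "Atom"),
}


def family_of(method_name: str) -> str:
    """Return the family a method belongs to, matching names case-insensitively."""
    wanted = method_name.casefold()
    for family, names in METHOD_FAMILIES.items():
        if any(name.casefold() == wanted for name in names):
            return family
    raise ValueError(f"unknown extraction method: {method_name!r}")
```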
Template Format
The YAML format for defining extraction templates:
- Schema definition
- Prompt engineering
- Guidelines and rules
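As a rough illustration of those three parts, the sketch below models a loaded template as a plain Python dictionary and checks that each part is present. The key names `schema`, `prompt`, and `guidelines`, the example field names, and the `missing_sections` helper are all assumptions; the actual YAML keys used by Hyper-Extract templates may differ.

```python
# Hypothetical section names; the real template keys may differ.
REQUIRED_SECTIONS = ("schema", "prompt", "guidelines")


def missing_sections(template: dict) -> list[str]:
    """List the required sections that are absent or empty in a template mapping."""
    return [name for name in REQUIRED_SECTIONS if not template.get(name)]


# An invented template, shaped the way a parsed YAML file might look.
example_template = {
    "schema": {"fields": {"name": "string", "founded": "integer"}},
    "prompt": "Extract the company profile from the text.",
    "guidelines": ["Use only facts stated in the text."],
}

print(missing_sections(example_template))
```

Checking for missing sections before running an extraction is a cheap way to catch an incomplete template early.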
Architecture
Deep dive into the system design:
- Data flow
- Processing pipeline
- Extension points
Quick Links
- CLI Documentation — Terminal usage
- Python SDK — Programmatic usage
- Template Library — Available templates