Frequently Asked Questions¶
Common questions about Hyper-Extract.
General¶
What is Hyper-Extract?¶
Hyper-Extract is an LLM-powered knowledge extraction framework that transforms unstructured text into structured knowledge graphs, lists, models, and more.
What can I use it for?¶
- Research paper analysis
- Knowledge base construction
- Document processing
- Information extraction
- Question-answering systems
Is it free?¶
The software is open-source (Apache-2.0). You need to provide your own OpenAI API key for LLM calls.
Installation¶
What are the requirements?¶
- Python 3.11+
- OpenAI API key
How do I install it?¶
pip install hyperextract
Installation fails with "No module named 'hyperextract'"¶
Try:
Or use a virtual environment:
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install hyperextract
Configuration¶
Where do I set my API key?¶
Option 1: CLI
Option 2: Environment variable
Option 3: .env file
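For the environment-variable and .env options, the sketch below shows one way to resolve the key from Python. It assumes the variable is named `OPENAI_API_KEY` (the name the OpenAI SDK conventionally reads) and that the .env file holds simple `KEY=value` lines. `load_env_file` and `resolve_api_key` are hypothetical helpers, not part of Hyper-Extract.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines; skips blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(env=None, env_file=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    env = os.environ if env is None else env
    # Assumption: the key is stored under OPENAI_API_KEY.
    return env.get("OPENAI_API_KEY") or load_env_file(env_file).get("OPENAI_API_KEY")
```

Checking the environment first means a key exported in the shell overrides a stale value left in a project's .env file.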
Can I use a different LLM provider?¶
Yes, set the base URL:
Which models are supported?¶
- OpenAI models (gpt-4o, gpt-4o-mini, etc.)
- Any OpenAI-compatible API
Usage¶
Which template should I use?¶
See the How to Choose guide or use:
How do I process a PDF?¶
Convert to text first:
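One way to do the conversion from Python is the third-party `pypdf` package. This is an assumption: this FAQ does not name a converter, and any tool that produces plain text should work. `text_path_for` and `pdf_to_text` are hypothetical helper names.

```python
from pathlib import Path


def text_path_for(pdf_path):
    """Return the .txt path that sits next to the given PDF."""
    return Path(pdf_path).with_suffix(".txt")


def pdf_to_text(pdf_path):
    """Extract plain text from a PDF and save it alongside the original."""
    # Imported lazily so this module loads even if pypdf is not installed.
    from pypdf import PdfReader  # pip install pypdf

    reader = PdfReader(str(pdf_path))
    # extract_text() may yield nothing for image-only pages.
    pages = [page.extract_text() or "" for page in reader.pages]
    out_path = text_path_for(pdf_path)
    out_path.write_text("\n\n".join(pages), encoding="utf-8")
    return out_path
```

The resulting .txt file can then be used like any other text input. Scanned PDFs contain no text layer and would need OCR first.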
Can I process multiple documents?¶
Option 1: Feed incrementally
Option 2: Process directory
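Processing a directory can also be driven from Python. In the sketch below the parser is passed in as a plain callable, so the loop itself does not depend on any particular API. The commented usage assumes the `Template.create(...)` and `.parse(text)` calls shown in the integration example in this FAQ; `parse_directory` and the `./docs` path are hypothetical.

```python
from pathlib import Path


def parse_directory(directory, parse, pattern="*.txt"):
    """Apply `parse` to every matching file; return {filename: result}."""
    results = {}
    # sorted() keeps the processing order stable across runs.
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        results[path.name] = parse(text)
    return results


# Usage, assuming the Template API from the integration example:
# from hyperextract import Template
# ka = Template.create("general/graph", "en")
# results = parse_directory("./docs", ka.parse)
```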
How do I extract in Chinese?¶
Performance¶
Why is extraction slow?¶
- Long documents are chunked and processed in parallel
- Each chunk requires an LLM call
- Consider using --no-index during batch processing
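To see why long documents take time, it helps to estimate how many chunks, and therefore LLM calls, a document produces. The sketch below uses a fixed-size character window with overlap. The chunking strategy and sizes Hyper-Extract actually uses are not stated here, so `chunk_text`, `chunk_size`, and `overlap` are illustrative assumptions.

```python
def chunk_text(text, chunk_size=2000, overlap=200):
    """Split text into overlapping character windows."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


document = "x" * 10_000
chunks = chunk_text(document)
print(f"{len(chunks)} chunks, so at least {len(chunks)} LLM calls")
```

Because every chunk costs one round trip to the model, total latency grows roughly with document length divided by chunk size, bounded by how many calls run concurrently.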
How can I speed it up?¶
- Use smaller chunk sizes
- Reduce max_workers if hitting rate limits
- Process documents in parallel (manually)
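Processing documents in parallel manually can be done with the standard library. In this sketch `process_in_parallel` is a hypothetical helper, and its `max_workers` argument is the thread-pool size, which may or may not correspond to Hyper-Extract's own `max_workers` setting.

```python
from concurrent.futures import ThreadPoolExecutor


def process_in_parallel(texts, parse, max_workers=4):
    """Run `parse` over texts concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of `texts`, not completion order.
        return list(pool.map(parse, texts))
```

Threads suit this workload because LLM calls spend most of their time waiting on the network. If the provider starts returning rate-limit errors, lowering `max_workers` is the first thing to try.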
Memory issues with large documents?¶
Process in smaller batches:
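A minimal way to batch in Python is a small generator that hands out a few items at a time, so only one batch is held in memory. `batched` is a hypothetical helper written for this example, and the batch size is something to tune for your documents.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls the next batch lazily; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Usage: process ten documents at a time and save results between batches.
# for batch in batched(document_paths, 10):
#     ...
```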
Results¶
Where is my data stored?¶
./output/
├── data.json # Extracted knowledge
├── metadata.json # Extraction info
└── index/ # Search index
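Since the extracted knowledge and metadata are stored as JSON files, they can be inspected without Hyper-Extract. The sketch below assumes the layout shown above. The structure inside each JSON file depends on the template and is not assumed here. `load_output` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_output(output_dir="./output"):
    """Load data.json and metadata.json; report whether index/ exists."""
    root = Path(output_dir)
    data = json.loads((root / "data.json").read_text(encoding="utf-8"))
    metadata = json.loads((root / "metadata.json").read_text(encoding="utf-8"))
    has_index = (root / "index").is_dir()
    return data, metadata, has_index
```

The third return value is a quick way to tell whether a search index has been built for that output directory.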
How do I visualize results?¶
Or in Python:

Can I export to other formats?¶
import json
# To JSON
json_data = result.data.model_dump_json()
# To dict
data_dict = result.data.model_dump()
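To write the exported data to a file, the dict returned by `model_dump()` can be passed to the standard `json` module. This is a minimal sketch: `save_as_json` is a hypothetical helper and the output path is illustrative.

```python
import json
from pathlib import Path


def save_as_json(data_dict, path):
    """Write a plain dict to disk as pretty-printed UTF-8 JSON."""
    out_path = Path(path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as handle:
        # default=str is a fallback for values json cannot encode (e.g. dates).
        json.dump(data_dict, handle, ensure_ascii=False, indent=2, default=str)
    return out_path


# Usage, assuming `result` comes from a Hyper-Extract parse call:
# save_as_json(result.data.model_dump(), "export/knowledge.json")
```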
Troubleshooting¶
"API key not found"¶
Run:
"Template not found"¶
List available templates:
"Index not found" error¶
Build the index:
Search returns no results¶
Try:
- Different search terms
- Increase top_k: he search ./ka/ "query" -n 10
- Check if index is built: he info ./ka/
Advanced¶
Can I create custom templates?¶
Yes! See Custom Templates.
Can I use my own extraction method?¶
Yes, implement and register:
from hyperextract.methods import register_method

class MyMethod:
    def extract(self, text):
        # Your logic
        pass

register_method("my_method", MyMethod, "graph", "Description")
How do I integrate with my application?¶
from hyperextract import Template

class MyApp:
    def __init__(self):
        self.ka = Template.create("general/graph", "en")

    def process_document(self, text):
        return self.ka.parse(text)