Creating Custom Templates¶
Level 2+ - Intermediate Extension
This guide shows how to create reusable templates. Before reading, complete Level 1: Using Templates or Level 2: Working with Auto-Types.
Learn how to create your own templates for specialized extraction tasks.
Overview¶
Custom templates allow you to define:
- Output schema — What entities and relationships to extract
- Extraction rules — Guidelines for the LLM
- Display configuration — How to visualize results
Quick Start¶
1. Create Template File¶
Create a YAML file in ~/.hyperextract/templates/:
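As a hedged sketch, the directory and an empty template file could be created from Python as follows. The helper `create_template_file` and the file name `my_template.yaml` are hypothetical and chosen only for illustration; the source does not state the expected file name or extension.

```python
from pathlib import Path


def create_template_file(templates_dir: Path, file_name: str) -> Path:
    """Create the templates directory if needed and an empty file inside it."""
    templates_dir.mkdir(parents=True, exist_ok=True)
    path = templates_dir / file_name
    path.touch(exist_ok=True)  # an existing file is left untouched
    return path


# Example call (hypothetical file name, assumed user directory):
# create_template_file(Path.home() / ".hyperextract" / "templates", "my_template.yaml")
```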
2. Define Template Structure¶
language: [zh, en]
name: my_template
type: graph
tags: [custom, domain]
description:
  zh: "自定义模板描述"
  en: "Custom template description"
output:
  description:
    zh: "输出描述"
    en: "Output description"
  entities:
    description:
      zh: "实体描述"
      en: "Entity description"
    fields:
      - name: name
        type: str
        description:
          zh: "名称"
          en: "Name"
        required: true
      - name: category
        type: str
        description:
          zh: "类别"
          en: "Category"
        required: true
  relations:
    description:
      zh: "关系描述"
      en: "Relation description"
    fields:
      - name: type
        type: str
        description:
          zh: "关系类型"
          en: "Relation type"
        required: true
guideline:
  target:
    zh: "提取目标描述"
    en: "Extraction target description"
  rules_for_entities:
    zh:
      - "规则1"
      - "规则2"
    en:
      - "Rule 1"
      - "Rule 2"
  rules_for_relations:
    zh:
      - "关系规则1"
    en:
      - "Relation rule 1"
identifiers:
  entity_id: name
  relation_id: "{source}|{type}|{target}"
  relation_members:
    source: source
    target: target
display:
  entity_label: "{name} ({category})"
  relation_label: "{type}"
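To make the `identifiers` and `display` patterns more concrete, here is a minimal sketch of how such placeholders could expand. It assumes the patterns behave like Python format strings, which the source does not confirm; the helper names are hypothetical and this is not hyperextract's implementation.

```python
# Assumption: identifier and label patterns expand like Python format strings.
RELATION_ID = "{source}|{type}|{target}"
ENTITY_LABEL = "{name} ({category})"


def relation_key(relation: dict) -> str:
    """Build a deduplication key from a relation's source, type, and target."""
    return RELATION_ID.format(**relation)


def entity_label(entity: dict) -> str:
    """Render the display label for one entity."""
    return ENTITY_LABEL.format(**entity)


def unique_relations(relations: list) -> list:
    """Keep the first relation seen for each key."""
    seen = {}
    for rel in relations:
        seen.setdefault(relation_key(rel), rel)
    return list(seen.values())


relations = [
    {"source": "Alice", "type": "works_at", "target": "Acme"},
    {"source": "Alice", "type": "works_at", "target": "Acme"},  # duplicate
    {"source": "Bob", "type": "works_at", "target": "Acme"},
]
print([relation_key(r) for r in unique_relations(relations)])
print(entity_label({"name": "Acme", "category": "company"}))
```

Under that assumption, two relations with the same source, type, and target collapse into one, which is why `relation_id` combines all three values.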
3. Validate and Use¶
from hyperextract import Template
# Load your custom template
ka = Template.create("my_domain/my_template", "en")
# Use it
result = ka.parse("Your document text here")
Field Types¶
| Type | Description | Example |
|---|---|---|
| `str` | String value | "Hello" |
| `int` | Integer | 42 |
| `float` | Floating point | 3.14 |
| `bool` | Boolean | true |
| `list[str]` | List of strings | ["a", "b"] |
| `datetime` | Date/time | "2024-01-01" |
| `enum` | Enumerated values | See below |
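The sketch below shows one way to check extracted values against the type names in the table. The mapping to Python types is an assumption made for illustration, not hyperextract's own type table, and `enum` is deliberately left out because its values depend on the field.

```python
from datetime import datetime

# Assumption: illustrative mapping, not taken from hyperextract.
SIMPLE_TYPES = {"str": str, "bool": bool}


def matches_type(value, type_name: str) -> bool:
    """Return True if value is acceptable for the given template field type."""
    if type_name == "list[str]":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    if type_name == "datetime":
        # Dates are assumed to arrive as ISO 8601 strings such as "2024-01-01".
        if not isinstance(value, str):
            return False
        try:
            datetime.fromisoformat(value)
            return True
        except ValueError:
            return False
    if type_name == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "float":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    expected = SIMPLE_TYPES.get(type_name)
    return expected is not None and isinstance(value, expected)
```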
Enum Type¶
For fields with a fixed set of values, the example below declares the field as str and lists the allowed values in its description:
- name: priority
  type: str
  description:
    zh: "优先级: high/medium/low"
    en: "Priority: high/medium/low"
  required: true
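Because the allowed values live in a description, the LLM may still return something unexpected. A minimal sketch of a downstream check using Python's standard `enum` module follows; the `Priority` class and `parse_priority` helper are hypothetical and not part of hyperextract.

```python
from enum import Enum


class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def parse_priority(raw: str) -> Priority:
    """Normalize case and whitespace, then map to a Priority member.

    Raises ValueError for anything outside high/medium/low.
    """
    return Priority(raw.strip().lower())
```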
Auto-Types¶
Choose the appropriate Auto-Type for your data:
| Type | Use Case | Features |
|---|---|---|
| `graph` | General knowledge graphs | Entities + Relations |
| `temporal_graph` | Time-series data | + Temporal events |
| `spatial_graph` | Geospatial data | + Location coordinates |
| `hypergraph` | Complex relationships | + Hyperedges |
| `tree` | Hierarchical data | Parent-child structure |
| `table` | Structured records | Row-based data |
Best Practices¶
1. Clear Descriptions¶
Write detailed descriptions to guide the LLM:
# Good
description:
  en: "Extract product information including name, price, and category"
# Bad
description:
  en: "Product info"
2. Specific Rules¶
Provide concrete extraction rules:
guideline:
  rules_for_entities:
    en:
      - "Only extract products mentioned in the pricing section"
      - "Category must be one of: electronics, clothing, food, other"
      - "Round prices to 2 decimal places"
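Rules like these are instructions to the LLM, so it can help to verify afterwards that the output follows them. The following is a rough sketch of such a check; it assumes extracted products are available as plain dicts with `category` and `price` keys, which is an assumption about the result shape rather than a documented format.

```python
# Allowed values copied from the category rule above.
ALLOWED_CATEGORIES = {"electronics", "clothing", "food", "other"}


def rule_violations(product: dict) -> list:
    """Return human-readable problems for one extracted product."""
    problems = []
    category = product.get("category")
    if category not in ALLOWED_CATEGORIES:
        problems.append(f"unexpected category: {category!r}")
    price = product.get("price")
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        problems.append("price is not numeric")
    elif round(price, 2) != price:
        problems.append(f"price not rounded to 2 decimals: {price}")
    return problems
```

The first rule (only the pricing section) cannot be checked from the output alone, which is one reason to keep rules as concrete as possible.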
3. Required Fields¶
Mark critical fields as required:
fields:
  - name: id
    type: str
    required: true  # Extraction will fail if missing
  - name: notes
    type: str
    required: false  # Optional field
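As an illustration of what `required` means for a record, the sketch below lists the required fields that a record lacks. It mirrors the field list above as Python dicts; the `missing_required` helper is hypothetical and does not reproduce how hyperextract itself enforces the flag.

```python
FIELDS = [
    {"name": "id", "type": "str", "required": True},
    {"name": "notes", "type": "str", "required": False},
]


def missing_required(record: dict, fields: list) -> list:
    """Names of required fields that are absent or None in the record."""
    return [
        f["name"]
        for f in fields
        if f.get("required", False) and record.get(f["name"]) is None
    ]


print(missing_required({"notes": "no identifier here"}, FIELDS))
```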
4. Validation¶
Test your template with sample documents:
# Test extraction
test_doc = """
Sample document text for testing...
"""
result = ka.parse(test_doc)
print(result.data.entities)
print(result.data.relations)
Example: Product Catalog Template¶
language: [zh, en]
name: product_catalog
type: table
tags: [ecommerce, products]
description:
  zh: "从商品目录中提取产品信息"
  en: "Extract product information from catalogs"
output:
  description:
    zh: "产品信息表"
    en: "Product information table"
  entities:
    description:
      zh: "产品实体"
      en: "Product entities"
    fields:
      - name: sku
        type: str
        description:
          zh: "产品SKU(唯一标识)"
          en: "Product SKU (unique identifier)"
        required: true
      - name: name
        type: str
        description:
          zh: "产品名称"
          en: "Product name"
        required: true
      - name: price
        type: float
        description:
          zh: "价格(数字)"
          en: "Price (numeric value)"
        required: true
      - name: category
        type: str
        description:
          zh: "类别:electronics/clothing/food/home/other"
          en: "Category: electronics/clothing/food/home/other"
        required: true
      - name: in_stock
        type: bool
        description:
          zh: "是否有库存"
          en: "Whether in stock"
        required: false
guideline:
  target:
    zh: "提取所有产品信息,包括缺货商品"
    en: "Extract all product info, including out-of-stock items"
  rules_for_entities:
    en:
      - "Extract products from tables and bullet lists"
      - "Convert price strings to numeric values"
      - "Map category names to standard values"
      - "Set in_stock=false if explicitly mentioned as out of stock"
identifiers:
  entity_id: sku
display:
  entity_label: "{name} (${price})"
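The price and category rules in this template describe normalizations the LLM is asked to perform. A hedged sketch of the same normalizations in Python, useful as a post-extraction sanity check, might look like this. The alias table is hypothetical, and `parse_price` assumes prices that use a dot as the decimal separator.

```python
import re

STANDARD_CATEGORIES = {"electronics", "clothing", "food", "home", "other"}
# Hypothetical aliases, for illustration only.
CATEGORY_ALIASES = {"apparel": "clothing", "gadgets": "electronics", "groceries": "food"}


def parse_price(text: str) -> float:
    """Strip currency symbols and thousands separators, e.g. "$1,299.00"."""
    cleaned = re.sub(r"[^\d.]", "", text)
    return float(cleaned)  # raises ValueError if no digits remain


def normalize_category(raw: str) -> str:
    """Map a raw category name to one of the standard values."""
    key = raw.strip().lower()
    key = CATEGORY_ALIASES.get(key, key)
    return key if key in STANDARD_CATEGORIES else "other"
```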
Using Method Classes Directly¶
If you need full control over the extraction process, you can instantiate method classes directly instead of going through Template.create:
from hyperextract.methods import Light_RAG
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Create method instance directly
llm = ChatOpenAI()
emb = OpenAIEmbeddings()
method = Light_RAG(
    llm_client=llm,
    embedder=emb,
    # Method-specific parameters
    chunk_size=1024,
    max_workers=5,
)

# Extract knowledge
text = "Your document text here"
result = method.parse(text)
This approach is useful for advanced scenarios where you want to bypass the Template layer and tune algorithm parameters directly.
Registering Custom Methods¶
You can also register custom extraction methods and use them through the Template API:
from hyperextract.methods import register_method

class MyCustomMethod:
    def __init__(self, llm_client, embedder, **kwargs):
        self.llm_client = llm_client
        self.embedder = embedder

    def parse(self, text):
        # Your extraction logic
        pass

# Register the method
register_method(
    name="my_method",
    method_class=MyCustomMethod,
    autotype="graph",
    description="My custom extraction method"
)

# Use via Template API
from hyperextract import Template
ka = Template.create("method/my_method")
Sharing Templates¶
Register Globally¶
Place templates in one of the template directories, depending on who should see them:
# System-wide (all users)
/usr/share/hyperextract/templates/
# User-specific
~/.hyperextract/templates/
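The sketch below shows one plausible way to look a template up across both directories. The lookup order (user directory first) and the `.yaml`/`.yml` extensions are assumptions, and `find_template` is a hypothetical helper, not a hyperextract function.

```python
from pathlib import Path
from typing import List, Optional

# Assumed precedence: user-specific directory before the system-wide one.
SEARCH_DIRS = [
    Path("~/.hyperextract/templates").expanduser(),
    Path("/usr/share/hyperextract/templates"),
]


def find_template(name: str, search_dirs: List[Path]) -> Optional[Path]:
    """Return the first matching template file, or None if none exists."""
    for directory in search_dirs:
        for suffix in (".yaml", ".yml"):
            candidate = directory / f"{name}{suffix}"
            if candidate.is_file():
                return candidate
    return None
```

Calling `find_template("my_domain/my_template", SEARCH_DIRS)` would then return the user copy if both directories contain the file.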
Template Registry¶
To share with the community:
- Fork the Hyper-Extract repository
- Add your template to hyperextract/templates/
- Submit a pull request
Troubleshooting¶
Template Not Found¶
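import os
from hyperextract import Template

# Check template path
print(Template.list_available())  # List all templates

# Verify directory (os.listdir does not expand "~", so expand it first)
print(os.listdir(os.path.expanduser("~/.hyperextract/templates/")))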
Poor Extraction Quality¶
- Add more examples in guidelines
- Refine field descriptions with specific constraints
- Split complex templates into smaller ones
- Use stricter rules to constrain output
Validation Errors¶
See Also¶
Prerequisites:
- Template Format Reference — Complete YAML specification
- Using Templates — Level 1: Basic template usage

Advanced Usage:
- Incremental Updates — Add more documents
- Search and Chat — Use extracted knowledge
- Template Library — Browse existing templates