
Graph Types

Auto-Types for representing relationships between entities.


AutoGraph

hyperextract.types.AutoGraph

Bases: BaseAutoType[AutoGraphSchema[NodeSchema, EdgeSchema]], Generic[NodeSchema, EdgeSchema]

AutoGraph - extracts knowledge graphs with nodes and edges from text.

This pattern extracts structured knowledge graphs consisting of entities (nodes) and their relationships (edges). Suitable for entity relationship extraction, knowledge graph construction, and semantic network building.

Key characteristics
  • Extraction target: Graph structure with nodes and edges
  • Deduplication: Automatic deduplication for both nodes and edges using OMem
  • Node merge strategy: Configurable (LLM-powered intelligent merging by default)
  • Edge merge strategy: Configurable (simple merge by default)
  • Extraction modes:
    • one_stage: Extract nodes and edges simultaneously (faster, simpler prompt)
    • two_stage: Extract nodes first, then edges with node context (more accurate)
  • Consistency validation: Ensures edges only connect existing nodes
Example

```python
class Entity(BaseModel):
    name: str
    type: str
    properties: dict = {}

class Relation(BaseModel):
    source: str
    target: str
    relation_type: str

graph = AutoGraph(
    node_schema=Entity,
    edge_schema=Relation,
    node_key_extractor=lambda x: x.name,
    edge_key_extractor=lambda x: f"{x.source}-{x.relation_type}-{x.target}",
    nodes_in_edge_extractor=lambda x: (x.source, x.target),
    llm_client=llm,
    embedder=embedder,
    extraction_mode="two_stage",
)
```

Attributes

graph_schema: Type[AutoGraphSchema[NodeSchema, EdgeSchema]] = create_model(graph_schema_name, nodes=(List[node_schema], Field(default_factory=list)), edges=(List[edge_schema], Field(default_factory=list))) instance-attribute

node_schema = node_schema instance-attribute

edge_schema = edge_schema instance-attribute

node_key_extractor = node_key_extractor instance-attribute

edge_key_extractor = edge_key_extractor instance-attribute

nodes_in_edge_extractor = nodes_in_edge_extractor instance-attribute

extraction_mode = extraction_mode instance-attribute

node_fields_for_index = node_fields_for_index instance-attribute

edge_fields_for_index = edge_fields_for_index instance-attribute

_constructor_kwargs = kwargs instance-attribute

node_list_schema = create_model('NodeList', items=(List[node_schema], Field(default_factory=list, description='Extracted nodes'))) instance-attribute

edge_list_schema = create_model('EdgeList', items=(List[edge_schema], Field(default_factory=list, description='Extracted edges'))) instance-attribute

node_merger = node_strategy_or_merger instance-attribute

edge_merger = edge_strategy_or_merger instance-attribute

_node_memory = OMem(memory_schema=node_schema, key_extractor=node_key_extractor, llm_client=llm_client, embedder=embedder, strategy_or_merger=(self.node_merger), verbose=verbose, fields_for_index=node_fields_for_index) instance-attribute

_edge_memory = OMem(memory_schema=edge_schema, key_extractor=edge_key_extractor, llm_client=llm_client, embedder=embedder, strategy_or_merger=(self.edge_merger), verbose=verbose, fields_for_index=edge_fields_for_index) instance-attribute

_node_label_extractor = node_label_extractor instance-attribute

_edge_label_extractor = edge_label_extractor instance-attribute

node_prompt = prompt_for_node_extraction or DEFAULT_NODE_PROMPT instance-attribute

edge_prompt = prompt_for_edge_extraction or DEFAULT_EDGE_PROMPT instance-attribute

prompt_template = ChatPromptTemplate.from_template(self.node_prompt) instance-attribute

node_extractor = self.prompt_template | self.llm_client.with_structured_output(self.node_list_schema) instance-attribute

edge_prompt_template = ChatPromptTemplate.from_template(self.edge_prompt) instance-attribute

edge_extractor = self.edge_prompt_template | self.llm_client.with_structured_output(self.edge_list_schema) instance-attribute

data: AutoGraphSchema[NodeSchema, EdgeSchema] property

Returns the current graph state (nodes and edges).

Returns:

Type Description
AutoGraphSchema[NodeSchema, EdgeSchema]

AutoGraphSchema containing all nodes and edges.

nodes: List[NodeSchema] property

Returns the current node collection.

Returns:

Type Description
List[NodeSchema]

List of nodes.

edges: List[EdgeSchema] property

Returns the current edge collection.

Returns:

Type Description
List[EdgeSchema]

List of edges.

Functions

__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, str]], llm_client: BaseChatModel, embedder: Embeddings, *, extraction_mode: str = 'one_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt: str = '', prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', node_label_extractor: Callable[[NodeSchema], str] = None, edge_label_extractor: Callable[[EdgeSchema], str] = None, chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)

Initialize AutoGraph with node/edge schemas and configuration.

Parameters:

Name Type Description Default
node_schema Type[NodeSchema]

Pydantic BaseModel for nodes/entities.

required
edge_schema Type[EdgeSchema]

Pydantic BaseModel for edges/relationships.

required
node_key_extractor Callable[[NodeSchema], str]

Function to extract unique key from node (e.g., lambda x: x.id).

required
edge_key_extractor Callable[[EdgeSchema], str]

Function to extract unique key from edge (e.g., lambda x: f"{x.src}-{x.rel}-{x.dst}").

required
nodes_in_edge_extractor Callable[[EdgeSchema], Tuple[str, str]]

Function to extract (source_key, target_key) node keys from an edge for validation.

required
llm_client BaseChatModel

Language model client for extraction.

required
embedder Embeddings

Embedding model for vector indexing.

required
extraction_mode str

"one_stage" (extract nodes+edges together) or "two_stage" (nodes first, then edges).

'one_stage'
node_strategy_or_merger MergeStrategy | BaseMerger

Merge strategy for duplicate nodes (default: LLM.BALANCED).

BALANCED
edge_strategy_or_merger MergeStrategy | BaseMerger

Merge strategy for duplicate edges (default: LLM.BALANCED).

BALANCED
prompt str

Custom extraction prompt for one-stage mode.

''
prompt_for_node_extraction str

Custom extraction prompt for two-stage node extraction.

''
prompt_for_edge_extraction str

Custom extraction prompt for two-stage edge extraction.

''
node_label_extractor Callable[[NodeSchema], str]

Optional function to extract label from node for visualization.

None
edge_label_extractor Callable[[EdgeSchema], str]

Optional function to extract label from edge for visualization.

None
chunk_size int

Maximum characters per chunk.

2048
chunk_overlap int

Overlapping characters between chunks.

256
max_workers int

Maximum concurrent extraction tasks.

10
verbose bool

Whether to log progress.

False
node_fields_for_index List[str] | None

Optional list of field names in node_schema to include in vector index. If None, all text fields are indexed by default. Example: ['name', 'description'] (only index these node fields)

None
edge_fields_for_index List[str] | None

Optional list of field names in edge_schema to include in vector index. If None, all text fields are indexed by default. Example: ['relation_type', 'description'] (only index these edge fields)

None
**kwargs Any

Additional arguments passed to create_merger() when strategy_or_merger is a MergeStrategy enum. Ignored if strategy_or_merger is a BaseMerger instance.

{}
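The chunk_size and chunk_overlap parameters control how input text is split before extraction. As a rough illustration (this is a minimal stand-in, not the library's actual splitter), overlap chunking works like this:

```python
def split_with_overlap(text: str, chunk_size: int = 2048, chunk_overlap: int = 256) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    where consecutive chunks share chunk_overlap characters."""
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must exceed chunk_overlap")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = split_with_overlap("a" * 5000, chunk_size=2048, chunk_overlap=256)
# Each chunk is at most 2048 chars; adjacent chunks share 256 chars.
```

A larger overlap reduces the chance that an entity mention is cut in half at a chunk boundary, at the cost of some duplicated extraction work (handled later by deduplication).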

_default_prompt() -> str

Returns the default prompt for one-stage graph extraction.

_create_empty_instance() -> AutoGraph[NodeSchema, EdgeSchema]

Creates a new empty AutoGraph instance with the same configuration as this one.

Overrides parent method to handle AutoGraph-specific parameters.

Returns:

Type Description
AutoGraph[NodeSchema, EdgeSchema]

A new empty AutoGraph instance with identical configuration.

empty() -> bool

Checks if the graph is empty (no nodes).

Returns:

Type Description
bool

True if the node collection is empty, False otherwise.

_init_data_state() -> None

Initialize or reset graph data structures.

_init_index_state() -> None

Initialize vector index to empty state.

_set_data_state(data: AutoGraphSchema[NodeSchema, EdgeSchema]) -> None

Replace graph data with new data (full reset).

Parameters:

Name Type Description Default
data AutoGraphSchema[NodeSchema, EdgeSchema]

New graph data to set.

required

_update_data_state(incoming_data: AutoGraphSchema[NodeSchema, EdgeSchema]) -> None

Merge incoming graph data into current state.

Parameters:

Name Type Description Default
incoming_data AutoGraphSchema[NodeSchema, EdgeSchema]

Incremental graph data to merge.

required

_extract_data(text: str) -> AutoGraphSchema[NodeSchema, EdgeSchema]

Main extraction logic dispatcher.

Parameters:

Name Type Description Default
text str

Input text to extract graph from.

required

Returns:

Type Description
AutoGraphSchema[NodeSchema, EdgeSchema]

Extracted and validated graph.

_extract_data_by_one_stage(text: str) -> AutoGraphSchema[NodeSchema, EdgeSchema]

Extract nodes and edges simultaneously using single LLM call.

Parameters:

Name Type Description Default
text str

Input text.

required

Returns:

Type Description
AutoGraphSchema[NodeSchema, EdgeSchema]

Raw extracted graph data.

_extract_data_by_two_stage(text: str) -> AutoGraphSchema[NodeSchema, EdgeSchema]

Extract nodes first, then edges with node context (batch processing).

Process:
  1. Split text into chunks.
  2. Batch extract nodes for all chunks.
  3. Batch extract edges for all chunks (using chunk-specific nodes as context).
  4. Construct partial graphs (tuples of nodes/edges).
  5. Merge all partial graphs into one global graph.

Parameters:

Name Type Description Default
text str

Input text.

required

Returns:

Type Description
AutoGraphSchema[NodeSchema, EdgeSchema]

Extracted and validated graph.

_extract_nodes_batch(chunks: List[str]) -> List[NodeListSchema[NodeSchema]]

Batch extract nodes from multiple text chunks.

Parameters:

Name Type Description Default
chunks List[str]

List of text chunks.

required

Returns:

Type Description
List[NodeListSchema[NodeSchema]]

List of NodeListSchema objects with extracted nodes.

_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]

Batch extract edges using corresponding node lists as context.

Parameters:

Name Type Description Default
chunks List[str]

List of text chunks.

required
node_lists List[NodeListSchema[NodeSchema]]

List of NodeListSchema objects (one per chunk).

required

Returns:

Type Description
List[EdgeListSchema[EdgeSchema]]

List of EdgeListSchema objects with extracted edges.

_prune_dangling_edges(graph: AutoGraphSchema[NodeSchema, EdgeSchema]) -> AutoGraphSchema[NodeSchema, EdgeSchema]

Prune edges that connect to non-existent nodes (Consistency Check).

Ensures graph consistency by removing any edges where either endpoint (source or target) does not exist in the node list.

Parameters:

Name Type Description Default
graph AutoGraphSchema[NodeSchema, EdgeSchema]

Raw graph that may contain dangling edges.

required

Returns:

Type Description
AutoGraphSchema[NodeSchema, EdgeSchema]

Graph with only valid edges (endpoints must strictly exist in nodes).
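The consistency check amounts to a set-membership filter over node keys. A simplified stand-in (plain dicts instead of the actual Pydantic schemas) looks like this:

```python
def prune_dangling_edges(nodes, edges, node_key, edge_endpoints):
    """Keep only edges whose source and target both resolve to known node keys."""
    valid_keys = {node_key(n) for n in nodes}
    return [e for e in edges
            if all(k in valid_keys for k in edge_endpoints(e))]

nodes = [{"name": "Alice"}, {"name": "Bob"}]
edges = [
    {"source": "Alice", "target": "Bob", "rel": "knows"},
    {"source": "Alice", "target": "Carol", "rel": "knows"},  # dangling: Carol was never extracted as a node
]
kept = prune_dangling_edges(nodes, edges,
                            node_key=lambda n: n["name"],
                            edge_endpoints=lambda e: (e["source"], e["target"]))
# kept contains only the Alice/Bob edge
```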

merge_batch_data(data_list_or_tuple: List[AutoGraphSchema[NodeSchema, EdgeSchema]] | Tuple[List[List[NodeSchema]], List[List[EdgeSchema]]]) -> AutoGraphSchema[NodeSchema, EdgeSchema]

Merge multiple graphs or node/edge tuples into one.

Supports two input formats:
  • List of AutoGraphSchema objects (standard format)
  • Tuple of (List[List[NodeSchema]], List[List[EdgeSchema]]) (optimization for batch processing)

Parameters:

Name Type Description Default
data_list_or_tuple List[AutoGraphSchema[NodeSchema, EdgeSchema]] | Tuple[List[List[NodeSchema]], List[List[EdgeSchema]]]

Either a list of AutoGraphSchema objects or a tuple of (nodes_lists, edges_lists) where each list contains items from multiple chunks.

required

Returns:

Type Description
AutoGraphSchema[NodeSchema, EdgeSchema]

Merged graph.
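A schematic of how the two accepted input formats might be normalized and then merged by key. This sketch collapses duplicates with last-write-wins; the real class delegates merging to OMem and the configured merge strategy:

```python
def merge_batch(data_list_or_tuple, node_key, edge_key):
    """Normalize both accepted formats to flat node/edge lists, then merge by key."""
    if isinstance(data_list_or_tuple, tuple):
        nodes_lists, edges_lists = data_list_or_tuple
    else:
        # List of graph objects with .nodes / .edges (plain dicts here)
        nodes_lists = [g["nodes"] for g in data_list_or_tuple]
        edges_lists = [g["edges"] for g in data_list_or_tuple]
    nodes, edges = {}, {}
    for lst in nodes_lists:
        for n in lst:
            nodes[node_key(n)] = n   # duplicates collapse onto one key
    for lst in edges_lists:
        for e in lst:
            edges[edge_key(e)] = e
    return {"nodes": list(nodes.values()), "edges": list(edges.values())}

g = merge_batch(
    ([[{"name": "A"}], [{"name": "A"}, {"name": "B"}]], [[], []]),  # tuple format
    node_key=lambda n: n["name"],
    edge_key=lambda e: e["id"],
)
# "A" appears in two chunks but is merged into a single node
```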

build_index(index_nodes: bool = True, index_edges: bool = True)

Build vector index for graph search.

By default, builds indices for both nodes and edges to support comprehensive search.

Parameters:

Name Type Description Default
index_nodes bool

Whether to index nodes (default: True).

True
index_edges bool

Whether to index edges (default: True).

True

build_node_index() -> None

Build vector index specifically for nodes.

build_edge_index() -> None

Build vector index specifically for edges.

search(query: str, top_k_nodes: int = 3, top_k_edges: int = 3, top_k: int | None = None) -> Tuple[List[NodeSchema], List[EdgeSchema]]

Unified graph search interface.

Retrieves nodes and edges semantically related to the query. Always returns a tuple of (nodes, edges). If a count is 0, the corresponding list will be empty.

Parameters:

Name Type Description Default
query str

Search query string.

required
top_k_nodes int

Number of node results to return (default: 3). Set to 0 to disable node search.

3
top_k_edges int

Number of edge results to return (default: 3). Set to 0 to disable edge search.

3
top_k int | None

If provided, sets both top_k_nodes and top_k_edges to this value.

None

Returns:

Type Description
Tuple[List[NodeSchema], List[EdgeSchema]]

Tuple[List[NodeSchema], List[EdgeSchema]]: A tuple containing:
  • List of matching nodes (empty if top_k_nodes <= 0)
  • List of matching edges (empty if top_k_edges <= 0)

Raises:

Type Description
ValueError

If both top_k_nodes and top_k_edges are <= 0.

ValueError

If search is requested (top_k > 0) but the corresponding index is not built.

search_nodes(query: str, top_k: int = 3) -> List[NodeSchema]

Semantic search for nodes/entities only.

Parameters:

Name Type Description Default
query str

Search query string.

required
top_k int

Number of results to return (default: 3).

3

Returns:

Type Description
List[NodeSchema]

List of matching nodes using semantic similarity.

search_edges(query: str, top_k: int = 3) -> List[EdgeSchema]

Semantic search for edges/relationships only.

Parameters:

Name Type Description Default
query str

Search query string.

required
top_k int

Number of results to return (default: 3).

3

Returns:

Type Description
List[EdgeSchema]

List of matching edges using semantic similarity.

chat(query: str, top_k: int | None = None, top_k_nodes: int = 3, top_k_edges: int = 3) -> AIMessage

Performs a chat-like interaction using graph knowledge.

Retrieves relevant nodes and/or edges based on the query and retrieval counts, formats them into a structured context with clear headers, and generates an answer.

Parameters:

Name Type Description Default
query str

User query string.

required
top_k int | None

If provided, overrides both top_k_nodes and top_k_edges with this value; otherwise the per-type defaults (3 each) apply.

None
top_k_nodes int

Number of relevant nodes to retrieve (default: 3). Set to 0 to disable node context.

3
top_k_edges int

Number of relevant edges to retrieve (default: 3). Set to 0 to disable edge context.

3

Returns:

Type Description
AIMessage

An AIMessage object containing the LLM-generated response.

AIMessage

Access the text content via response.content.

Example
Chat using 5 nodes and 2 edges as context

```python
response = ka.chat("What is X?", top_k_nodes=5, top_k_edges=2)
print(response.content)  # Print the generated answer
```

dump_index(folder_path: str | Path) -> None

Save indices to disk.

load_index(folder_path: str | Path) -> None

Load indices from disk.

show(node_label_extractor: Callable[[NodeSchema], str] = None, edge_label_extractor: Callable[[EdgeSchema], str] = None, *, top_k_nodes_for_search: int = 3, top_k_edges_for_search: int = 3, top_k_nodes_for_chat: int = 3, top_k_edges_for_chat: int = 3) -> None

Visualize the graph using OntoSight.

Parameters:

Name Type Description Default
node_label_extractor Callable[[NodeSchema], str]

Optional function to extract label from node for visualization. If not provided, uses the one from init.

None
edge_label_extractor Callable[[EdgeSchema], str]

Optional function to extract label from edge for visualization. If not provided, uses the one from init.

None
top_k_nodes_for_search int

Number of nodes to retrieve for search callback (default: 3).

3
top_k_edges_for_search int

Number of edges to retrieve for search callback (default: 3).

3
top_k_nodes_for_chat int

Number of nodes to retrieve for chat callback (default: 3).

3
top_k_edges_for_chat int

Number of edges to retrieve for chat callback (default: 3).

3

AutoHypergraph

hyperextract.types.AutoHypergraph

Bases: BaseAutoType[AutoHypergraphSchema[NodeSchema, EdgeSchema]], Generic[NodeSchema, EdgeSchema]

AutoHypergraph - Node-First extraction: Extract entities first, then relationships.

Extraction Strategy: Two-Stage (Node-First)
  1. Stage 1: Extract all Nodes (entities) from text.
  2. Stage 2: Extract Edges (relationships) using identified Nodes as context.

Suitable for:
  • Events with multiple participants (e.g., meetings, transactions)
  • Group memberships
  • Complex dependencies

The nodes_in_edge_extractor must return a tuple of ALL node keys involved in the edge. Validation uses strict mode: a hyperedge is kept only if ALL nodes in that tuple exist.

Example

```python
class Person(BaseModel):
    name: str

class Event(BaseModel):
    label: str  # e.g., "meeting", "collaboration"
    participants: List[str]  # IDs of people involved

hypergraph = AutoHypergraph(
    node_schema=Person,
    edge_schema=Event,
    node_key_extractor=lambda x: x.name,
    # CRITICAL: Sort participants to ensure {A, B} and {B, A} map to the same key
    edge_key_extractor=lambda x: f"{x.label}_{sorted(x.participants)}",
    nodes_in_edge_extractor=lambda x: tuple(x.participants),
    llm_client=llm,
    embedder=embedder,
    extraction_mode="two_stage",
)
```

Attributes

graph_schema: Type[AutoHypergraphSchema[NodeSchema, EdgeSchema]] = create_model(graph_schema_name, nodes=(List[node_schema], Field(default_factory=list)), edges=(List[edge_schema], Field(default_factory=list))) instance-attribute

node_schema = node_schema instance-attribute

edge_schema = edge_schema instance-attribute

node_key_extractor = node_key_extractor instance-attribute

edge_key_extractor = edge_key_extractor instance-attribute

nodes_in_edge_extractor = nodes_in_edge_extractor instance-attribute

extraction_mode = extraction_mode instance-attribute

chunk_size = chunk_size instance-attribute

chunk_overlap = chunk_overlap instance-attribute

max_workers = max_workers instance-attribute

verbose = verbose instance-attribute

node_fields_for_index = node_fields_for_index instance-attribute

edge_fields_for_index = edge_fields_for_index instance-attribute

node_list_schema = create_model(f'{node_schema.__name__}List', items=(List[node_schema], Field(default_factory=list))) instance-attribute

edge_list_schema = create_model(f'{edge_schema.__name__}List', items=(List[edge_schema], Field(default_factory=list))) instance-attribute

node_merger = node_strategy_or_merger instance-attribute

edge_merger = edge_strategy_or_merger instance-attribute

_node_memory = OMem(memory_schema=node_schema, key_extractor=node_key_extractor, llm_client=llm_client, embedder=embedder, strategy_or_merger=(self.node_merger), verbose=verbose, fields_for_index=node_fields_for_index) instance-attribute

_edge_memory = OMem(memory_schema=edge_schema, key_extractor=edge_key_extractor, llm_client=llm_client, embedder=embedder, strategy_or_merger=(self.edge_merger), verbose=verbose, fields_for_index=edge_fields_for_index) instance-attribute

_node_label_extractor = node_label_extractor instance-attribute

_edge_label_extractor = edge_label_extractor instance-attribute

node_prompt = prompt_for_node_extraction or DEFAULT_NODE_PROMPT instance-attribute

edge_prompt = prompt_for_edge_extraction or DEFAULT_EDGE_PROMPT instance-attribute

prompt_template = ChatPromptTemplate.from_template(self.node_prompt) instance-attribute

node_extractor = self.prompt_template | self.llm_client.with_structured_output(self.node_list_schema) instance-attribute

edge_prompt_template = ChatPromptTemplate.from_template(self.edge_prompt) instance-attribute

edge_extractor = self.edge_prompt_template | self.llm_client.with_structured_output(self.edge_list_schema) instance-attribute

data: AutoHypergraphSchema property

Returns the current hypergraph state.

nodes: List[NodeSchema] property

Returns the current node collection.

edges: List[EdgeSchema] property

Returns the current hyperedge collection.

Functions

__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, ...]], llm_client: BaseChatModel, embedder: Embeddings, *, extraction_mode: str = 'two_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt: str = '', prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', node_label_extractor: Callable[[NodeSchema], str] = None, edge_label_extractor: Callable[[EdgeSchema], str] = None, chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)

Initialize AutoHypergraph with node/edge schemas and configuration.

Parameters:

Name Type Description Default
node_schema Type[NodeSchema]

Pydantic BaseModel for nodes/entities.

required
edge_schema Type[EdgeSchema]

Pydantic BaseModel for hyperedges/relationships.

required
node_key_extractor Callable[[NodeSchema], str]

Function to extract unique key from node.

required
edge_key_extractor Callable[[EdgeSchema], str]

Function to extract unique key from hyperedge. CRITICAL: Since hyperedges are unordered sets of nodes (i.e., {A, B} == {B, A}), you MUST sort the participating node keys in this function to ensure consistent deduplication. For example: lambda x: f"{x.relation_type}_{sorted(x.participants)}" Without sorting, two hyperedges with the same nodes in different order will be treated as different edges, breaking the deduplication mechanism.

required
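The sorting requirement is easy to verify directly. With sorted participant keys, participant order no longer affects the edge key; without sorting, the same hyperedge produces two distinct keys and escapes deduplication:

```python
def unsorted_key(label, participants):
    return f"{label}_{participants}"

def sorted_key(label, participants):
    return f"{label}_{sorted(participants)}"

a = sorted_key("meeting", ["Alice", "Bob"])
b = sorted_key("meeting", ["Bob", "Alice"])
assert a == b  # same hyperedge, deduplicated correctly

c = unsorted_key("meeting", ["Alice", "Bob"])
d = unsorted_key("meeting", ["Bob", "Alice"])
assert c != d  # without sorting, treated as two different edges
```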
nodes_in_edge_extractor Callable[[EdgeSchema], Tuple[str, ...]]

Function to extract ALL node keys from a hyperedge as a tuple (e.g., lambda x: tuple(x.participants)).

required
llm_client BaseChatModel

Language model client for extraction.

required
embedder Embeddings

Embedding model for vector indexing.

required
extraction_mode str

"two_stage" (recommended) or "one_stage".

'two_stage'
node_strategy_or_merger MergeStrategy | BaseMerger

Merge strategy for duplicate nodes.

BALANCED
edge_strategy_or_merger MergeStrategy | BaseMerger

Merge strategy for duplicate hyperedges.

BALANCED
prompt str

Custom extraction prompt for one-stage mode.

''
prompt_for_node_extraction str

Custom extraction prompt for node extraction.

''
prompt_for_edge_extraction str

Custom extraction prompt for hyperedge extraction.

''
node_label_extractor Callable[[NodeSchema], str]

Optional function to extract label from node for visualization.

None
edge_label_extractor Callable[[EdgeSchema], str]

Optional function to extract label from edge for visualization.

None
chunk_size int

Maximum characters per chunk.

2048
chunk_overlap int

Overlapping characters between chunks.

256
max_workers int

Maximum concurrent extraction tasks.

10
verbose bool

Whether to display detailed execution logs and progress information.

False
node_fields_for_index List[str] | None

Optional list of field names in node_schema to include in vector index. If None, all text fields are indexed by default. Example: ['name', 'properties'] (only index these node fields)

None
edge_fields_for_index List[str] | None

Optional list of field names in edge_schema to include in vector index. If None, all text fields are indexed by default. Example: ['summary', 'type'] (only index these edge fields)

None
**kwargs Any

Additional arguments for merger creation.

{}

_default_prompt() -> str

Returns the default prompt for one-stage hypergraph extraction.

This is the primary prompt used when extraction_mode is 'one_stage'.

_create_empty_instance() -> AutoHypergraph[NodeSchema, EdgeSchema]

Creates a new empty AutoHypergraph instance with the same configuration.

Overrides parent method to handle AutoHypergraph-specific parameters.

Returns:

Type Description
AutoHypergraph[NodeSchema, EdgeSchema]

A new empty AutoHypergraph instance with identical configuration.

_init_data_state() -> None

Initialize or reset hypergraph data structures.

_init_index_state() -> None

Initialize vector index to empty state.

_set_data_state(data: AutoHypergraphSchema) -> None

Replace hypergraph data with new data (full reset).

_update_data_state(incoming_data: AutoHypergraphSchema) -> None

Merge incoming hypergraph data into current state.

empty() -> bool

Checks if the hypergraph is empty.

_extract_data(text: str) -> AutoHypergraphSchema

Main extraction logic dispatcher.

_extract_data_by_one_stage(text: str) -> AutoHypergraphSchema

Extract nodes and edges simultaneously using single LLM call.

Parameters:

Name Type Description Default
text str

Input text.

required

Returns:

Type Description
AutoHypergraphSchema

Raw extracted hypergraph data.

_extract_data_by_two_stage(text: str) -> AutoHypergraphSchema

Extract nodes first, then hyperedges with node context (batch processing).

Process:
  1. Split text into chunks.
  2. Batch extract nodes for all chunks.
  3. Batch extract hyperedges for all chunks (using chunk-specific nodes as context).
  4. Merge all partial results into one global hypergraph.

_extract_nodes_batch(chunks: List[str]) -> List[NodeListSchema[NodeSchema]]

Batch extract nodes from multiple text chunks.

_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]

Batch extract hyperedges using corresponding node lists as context.

_prune_dangling_edges(graph: AutoHypergraphSchema) -> AutoHypergraphSchema

Prune hyperedges where ANY participating node does not exist (Strict Consistency).

Parameters:

Name Type Description Default
graph AutoHypergraphSchema

Raw hypergraph that may contain dangling hyperedges.

required

Returns:

Type Description
AutoHypergraphSchema

Hypergraph with only valid hyperedges (ALL participants must exist in nodes).

merge_batch_data(data_list_or_tuple: List[AutoHypergraphSchema] | Tuple[List[List[NodeSchema]], List[List[EdgeSchema]]]) -> AutoHypergraphSchema

Merge multiple hypergraphs or node/edge tuples into one.

Supports two input formats:
  • List of AutoHypergraphSchema objects (standard format)
  • Tuple of (List[List[Node]], List[List[Edge]]) (optimization for batch processing)

Parameters:

Name Type Description Default
data_list_or_tuple List[AutoHypergraphSchema] | Tuple[List[List[NodeSchema]], List[List[EdgeSchema]]]

Either a list of AutoHypergraphSchema objects or a tuple of (nodes_lists, edges_lists) where each list contains items from multiple chunks.

required

Returns:

Type Description
AutoHypergraphSchema

Merged hypergraph.

build_index(index_nodes: bool = True, index_edges: bool = True)

Build vector index for hypergraph search.

By default, builds indices for both nodes and hyperedges to support comprehensive search.

Parameters:

Name Type Description Default
index_nodes bool

Whether to index nodes (default: True).

True
index_edges bool

Whether to index hyperedges (default: True).

True

build_node_index() -> None

Build vector index specifically for nodes.

build_edge_index() -> None

Build vector index specifically for hyperedges.

search(query: str, top_k_nodes: int = 3, top_k_edges: int = 3, top_k: int | None = None) -> Tuple[List[NodeSchema], List[EdgeSchema]]

Unified hypergraph search interface.

Retrieves nodes and hyperedges semantically related to the query. Always returns a tuple of (nodes, edges). If a count is 0, the corresponding list will be empty.

Parameters:

Name Type Description Default
query str

Search query string.

required
top_k_nodes int

Number of node results to return (default: 3). Set to 0 to disable node search.

3
top_k_edges int

Number of hyperedge results to return (default: 3). Set to 0 to disable hyperedge search.

3
top_k int | None

If provided, sets both top_k_nodes and top_k_edges to this value.

None

Returns:

Type Description
Tuple[List[NodeSchema], List[EdgeSchema]]

Tuple[List[Node], List[Edge]]: A tuple containing:
  • List of matching nodes (empty if top_k_nodes <= 0)
  • List of matching hyperedges (empty if top_k_edges <= 0)

Raises:

Type Description
ValueError

If both top_k_nodes and top_k_edges are <= 0.

ValueError

If search is requested (top_k > 0) but the corresponding index is not built.

search_nodes(query: str, top_k: int = 3) -> List[NodeSchema]

Semantic search for nodes/entities only.

search_edges(query: str, top_k: int = 3) -> List[EdgeSchema]

Semantic search for hyperedges/relationships only.

chat(query: str, top_k_nodes: int = 3, top_k_edges: int = 3, top_k: int | None = None) -> AIMessage

Performs a chat-like interaction using hypergraph knowledge.

Retrieves relevant nodes and/or hyperedges based on the query and retrieval counts, formats them into a structured context with clear headers, and generates an answer.

Parameters:

Name Type Description Default
query str

User query string.

required
top_k_nodes int

Number of relevant nodes to retrieve (default: 3). Set to 0 to disable node context.

3
top_k_edges int

Number of relevant hyperedges to retrieve (default: 3). Set to 0 to disable edge context.

3
top_k int | None

If provided, sets both top_k_nodes and top_k_edges to this value.

None

Returns:

Type Description
AIMessage

An AIMessage object containing the LLM-generated response.

AIMessage

Access the text content via response.content.

Example
Chat using 5 nodes and 2 hyperedges as context

```python
response = hg.chat("What participates in X?", top_k_nodes=5, top_k_edges=2)
print(response.content)  # Print the generated answer
```

dump_index(folder_path: str | Path) -> None

Save indices to disk.

load_index(folder_path: str | Path) -> None

Load indices from disk.

show(node_label_extractor: Callable[[NodeSchema], str] = None, edge_label_extractor: Callable[[EdgeSchema], str] = None, *, top_k_nodes_for_search: int = 3, top_k_edges_for_search: int = 3, top_k_nodes_for_chat: int = 3, top_k_edges_for_chat: int = 3) -> None

Visualize the hypergraph using OntoSight.

Parameters:

Name Type Description Default
node_label_extractor Callable[[NodeSchema], str]

Optional function to extract label from node for visualization. If not provided, uses the one from init.

None
edge_label_extractor Callable[[EdgeSchema], str]

Optional function to extract label from edge for visualization. If not provided, uses the one from init.

None
top_k_nodes_for_search int

Number of nodes to retrieve for search callback (default: 3).

3
top_k_edges_for_search int

Number of edges to retrieve for search callback (default: 3).

3
top_k_nodes_for_chat int

Number of nodes to retrieve for chat callback (default: 3).

3
top_k_edges_for_chat int

Number of edges to retrieve for chat callback (default: 3).

3

AutoTemporalGraph

hyperextract.types.AutoTemporalGraph

Bases: AutoGraph[NodeSchema, EdgeSchema]

Generic Temporal Graph Extractor (AutoTemporalGraph).

A flexible implementation supporting user-defined Node and Edge schemas with temporal awareness:

- Schema Agnosticism: Supports any user-defined Node and Edge Pydantic models.
- Temporal-Aware Deduplication: Time information is integrated into the edge deduplication logic.
- Dynamic Time Injection: The observation date is injected during extraction so relative time expressions can be resolved.

Key Design:

- temporal_edge_key_extractor: A unified function to extract the unique key for an edge, including temporal components (e.g., lambda x: f"{x.src}|{x.relation}|{x.dst}|{x.year}"). This ensures (A, rel, B) @ 2020 and (A, rel, B) @ 2021 are treated as different edges.
- Unified Prompt Management: Temporal system prompts are prepended to any user-provided prompts.
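The effect of the composite temporal key can be sketched without the extractor machinery itself. The `Edge` dataclass, the `"|"` separator, and the `"permanent"` fallback below are illustrative assumptions mirroring the documented lambdas, not the library's internals:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Edge:
    src: str
    dst: str
    relation: str
    year: Optional[str] = None

# Mirrors the documented extractor pair: a base key plus a time component.
edge_key = lambda e: f"{e.src}|{e.relation}|{e.dst}"
time_key = lambda e: e.year or "permanent"

def composite_key(e: Edge) -> str:
    # Appending the time component to the base key keeps the same
    # (source, relation, target) triple distinct across years.
    return f"{edge_key(e)}|{time_key(e)}"

a = Edge("A", "B", "works_at", "2020")
b = Edge("A", "B", "works_at", "2021")
assert edge_key(a) == edge_key(b)            # same base identity
assert composite_key(a) != composite_key(b)  # distinct temporal identity
```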

Example

from typing import Optional
from pydantic import BaseModel, Field

class MyEntity(BaseModel):
    name: str
    category: str = "Unknown"

class MyTemporalEdge(BaseModel):
    src: str
    dst: str
    relation: str
    year: Optional[str] = None

kg = AutoTemporalGraph(
    node_schema=MyEntity,
    edge_schema=MyTemporalEdge,
    node_key_extractor=lambda x: x.name,
    edge_key_extractor=lambda x: f"{x.src}|{x.relation}|{x.dst}",
    time_in_edge_extractor=lambda x: x.year or "",
    nodes_in_edge_extractor=lambda x: (x.src, x.dst),
    llm_client=llm,
    embedder=embedder,
    observation_time="2024-01-15"
)

Attributes

observation_time = observation_time or datetime.now().strftime('%Y-%m-%d') instance-attribute

raw_edge_key_extractor = edge_key_extractor instance-attribute

time_in_edge_extractor = time_in_edge_extractor instance-attribute

_constructor_kwargs = kwargs instance-attribute

Functions

__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], time_in_edge_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, str]], llm_client: BaseChatModel, embedder: Embeddings, observation_time: str | None = None, extraction_mode: str = 'two_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', prompt: str = '', chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)

Initialize AutoTemporalGraph.

Parameters:

Name Type Description Default
node_schema Type[NodeSchema]

User-defined Node Pydantic model.

required
edge_schema Type[EdgeSchema]

User-defined Edge Pydantic model with time fields.

required
node_key_extractor Callable[[NodeSchema], str]

Function to extract unique key from node (e.g., lambda x: x.name).

required
edge_key_extractor Callable[[EdgeSchema], str]

Function to extract the base unique identifier for an edge purely based on entities and relation (e.g., lambda x: f"{x.src}|{x.relation}|{x.dst}").

required
time_in_edge_extractor Callable[[EdgeSchema], str]

Function to extract the time component from an edge (e.g., lambda x: x.year or "permanent"). This ensures (A, rel, B) @ 2020 and (A, rel, B) @ 2021 are treated as different edges.

required
nodes_in_edge_extractor Callable[[EdgeSchema], Tuple[str, str]]

Function to extract (source_key, target_key) from edge.

required
llm_client BaseChatModel

LangChain BaseChatModel for extraction.

required
embedder Embeddings

LangChain Embeddings for semantic operations.

required
observation_time str | None

Date context for relative time resolution (default: today in YYYY-MM-DD format).

None
extraction_mode str

"one_stage" or "two_stage" (default: "two_stage").

'two_stage'
node_strategy_or_merger MergeStrategy | BaseMerger

Merge strategy for duplicate nodes (default: LLM.BALANCED).

BALANCED
edge_strategy_or_merger MergeStrategy | BaseMerger

Merge strategy for duplicate edges (default: LLM.BALANCED).

BALANCED
prompt_for_node_extraction str

Additional user prompt to append to node extraction system prompt.

''
prompt_for_edge_extraction str

Additional user prompt to append to edge extraction system prompt.

''
prompt str

Additional user prompt to append to one-stage graph extraction system prompt.

''
chunk_size int

Size of text chunks for processing (default: 2048).

2048
chunk_overlap int

Overlap between text chunks (default: 256).

256
max_workers int

Maximum number of concurrent LLM calls (default: 10).

10
verbose bool

Whether to print verbose output (default: False).

False
node_fields_for_index List[str] | None

List of node fields to include in vector index (optional).

None
edge_fields_for_index List[str] | None

List of edge fields to include in vector index (optional).

None
**kwargs Any

Additional arguments passed to create_merger() when strategy_or_merger is a MergeStrategy enum. Ignored if strategy_or_merger is a BaseMerger instance.

{}

_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]

Override: Inject observation_time into edge extraction during two-stage extraction.
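The injection these overrides describe can be sketched as prompt prepending. The function name and prompt wording below are hypothetical; the actual mechanism inside the library may differ:

```python
OBSERVATION_TIME = "2024-01-15"

def inject_time(system_prompt: str, observation_time: str = OBSERVATION_TIME) -> str:
    # Prepend the observation date so the model can resolve relative
    # expressions ("last year", "next month") to absolute dates.
    return f"Current Observation Date: {observation_time}\n\n{system_prompt}"

prompt = inject_time("Extract temporal edges from the text.")
```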

_extract_data_by_one_stage(text: str) -> Any

Override: Inject observation_time into one-stage extraction.

_create_empty_instance() -> AutoTemporalGraph[NodeSchema, EdgeSchema]

Override: Recreate instance with all temporal-specific attributes.


AutoSpatialGraph

hyperextract.types.AutoSpatialGraph

Bases: AutoGraph[NodeSchema, EdgeSchema]

Generic Spatial Graph Extractor (AutoSpatialGraph).

A flexible implementation supporting user-defined Node and Edge schemas with spatial awareness:

- Schema Agnosticism: Supports any user-defined Node and Edge Pydantic models.
- Location-Aware Deduplication: Spatial information is integrated into edge deduplication.
- Dynamic Localization: The observation location is injected during extraction so relative location references can be resolved.

Key Design:

- Decoupled Extractors: location_in_edge_extractor is a separate function.
- Spatial Identity: Edges are uniquely identified by (source, relation, target) + location.
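Location-aware deduplication can be illustrated with a plain dict keyed by the spatial identity. The `SpatialEdge` dataclass and the last-mention-wins resolution are a minimal sketch, not the library's configurable merge behavior:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpatialEdge:
    src: str
    dst: str
    relation: str
    place: Optional[str] = None

def dedup_key(e: SpatialEdge) -> str:
    # Base identity (source, relation, target) plus the location component.
    return f"{e.src}|{e.relation}|{e.dst}|{e.place or ''}"

edges = [
    SpatialEdge("robot", "box", "carries", "Main Hall"),
    SpatialEdge("robot", "box", "carries", "Main Hall"),    # duplicate
    SpatialEdge("robot", "box", "carries", "Storage Room"), # distinct location
]
# Later mentions overwrite earlier ones; the real merge strategy is pluggable.
unique = {dedup_key(e): e for e in edges}
assert len(unique) == 2
```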

Example

from typing import Optional
from pydantic import BaseModel, Field

class MyEntity(BaseModel):
    name: str

class MySpatialEdge(BaseModel):
    src: str
    dst: str
    relation: str
    place: Optional[str] = None

kg = AutoSpatialGraph(
    node_schema=MyEntity,
    edge_schema=MySpatialEdge,
    node_key_extractor=lambda x: x.name,
    edge_key_extractor=lambda x: f"{x.src}|{x.relation}|{x.dst}",
    location_in_edge_extractor=lambda x: x.place or "",
    nodes_in_edge_extractor=lambda x: (x.src, x.dst),
    llm_client=llm,
    embedder=embedder,
    observation_location="Main Hall"
)

Attributes

observation_location = observation_location or 'Unknown Location' instance-attribute

_constructor_kwargs = kwargs instance-attribute

raw_edge_key_extractor = edge_key_extractor instance-attribute

location_in_edge_extractor = location_in_edge_extractor instance-attribute

Functions

__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], location_in_edge_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, str]], llm_client: BaseChatModel, embedder: Embeddings, *, observation_location: str | None = None, extraction_mode: str = 'two_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', prompt: str = '', chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)

Initialize AutoSpatialGraph.

Parameters:

Name Type Description Default
node_schema Type[NodeSchema]

User-defined Node Pydantic model.

required
edge_schema Type[EdgeSchema]

User-defined Edge Pydantic model with location fields.

required
node_key_extractor Callable[[NodeSchema], str]

Function to extract unique key from node.

required
edge_key_extractor Callable[[EdgeSchema], str]

Function to extract the base unique identifier for an edge.

required
location_in_edge_extractor Callable[[EdgeSchema], str]

Function to extract the spatial component from an edge.

required
nodes_in_edge_extractor Callable[[EdgeSchema], Tuple[str, str]]

Function to extract (source_key, target_key) from edge.

required
llm_client BaseChatModel

LangChain BaseChatModel for extraction.

required
embedder Embeddings

LangChain Embeddings for semantic operations.

required
observation_location str | None

Location context for spatial resolution (default: "Unknown Location").

None
extraction_mode str

"one_stage" or "two_stage" (default: "two_stage").

'two_stage'
node_strategy_or_merger MergeStrategy | BaseMerger

Merge strategy for duplicate nodes (default: LLM.BALANCED).

BALANCED
edge_strategy_or_merger MergeStrategy | BaseMerger

Merge strategy for duplicate edges (default: LLM.BALANCED).

BALANCED
prompt_for_node_extraction str

Additional user prompt to append to node extraction system prompt.

''
prompt_for_edge_extraction str

Additional user prompt to append to edge extraction system prompt.

''
prompt str

Additional user prompt to append to one-stage graph extraction system prompt.

''
chunk_size int

Size of text chunks for processing (default: 2048).

2048
chunk_overlap int

Overlap between text chunks (default: 256).

256
max_workers int

Maximum number of concurrent LLM calls (default: 10).

10
verbose bool

Whether to print verbose output (default: False).

False
node_fields_for_index List[str] | None

List of node fields to include in vector index (optional).

None
edge_fields_for_index List[str] | None

List of edge fields to include in vector index (optional).

None
**kwargs Any

Additional arguments passed to create_merger() when strategy_or_merger is a MergeStrategy enum. Ignored if strategy_or_merger is a BaseMerger instance.

{}

_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]

Override: Inject observation_location into edge extraction during two-stage extraction.

_extract_data_by_one_stage(text: str) -> Any

Override: Inject observation_location into one-stage extraction.

_create_empty_instance() -> AutoSpatialGraph[NodeSchema, EdgeSchema]

Override: Recreate instance with spatial attributes preserved.


AutoSpatioTemporalGraph

hyperextract.types.AutoSpatioTemporalGraph

Bases: AutoGraph[NodeSchema, EdgeSchema]

Generic Spatio-Temporal Graph Extractor (AutoSpatioTemporalGraph).

A flexible implementation supporting user-defined Node and Edge schemas with spatio-temporal awareness:

- Schema Agnosticism: Supports any user-defined Node and Edge Pydantic models.
- Spatio-Temporal-Aware Deduplication: Time and space information are integrated into edge deduplication.
- Dynamic Context Injection: The observation date and location are injected during extraction.

Key Design:

- Decoupled Extractors: time_in_edge_extractor and location_in_edge_extractor are separate.
- Composite Key: Automatically fuses base key, time, and location for unique identification.
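The fused composite key can be sketched as simple string concatenation of the three components. The `STEdge` dataclass and `"|"` separator are assumptions for illustration; the library's actual fusion format may differ:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class STEdge:
    src: str
    dst: str
    relation: str
    time: Optional[str] = None
    place: Optional[str] = None

def fused_key(e: STEdge) -> str:
    # Fuse the base key with the time and location components so the same
    # triple at a different time or place gets a distinct identity.
    base = f"{e.src}|{e.relation}|{e.dst}"
    return f"{base}|{e.time or ''}|{e.place or ''}"

a = STEdge("Alice", "Bob", "met", "2024-01-15", "Beijing")
b = STEdge("Alice", "Bob", "met", "2024-01-15", "Shanghai")
c = STEdge("Alice", "Bob", "met", "2023-05-01", "Beijing")
assert len({fused_key(a), fused_key(b), fused_key(c)}) == 3
```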

Example

from typing import Optional
from pydantic import BaseModel, Field

class MyEntity(BaseModel):
    name: str

class MySTEdge(BaseModel):
    src: str
    dst: str
    relation: str
    time: Optional[str] = None
    place: Optional[str] = None

kg = AutoSpatioTemporalGraph(
    node_schema=MyEntity,
    edge_schema=MySTEdge,
    node_key_extractor=lambda x: x.name,
    edge_key_extractor=lambda x: f"{x.src}|{x.relation}|{x.dst}",
    time_in_edge_extractor=lambda x: x.time or "",
    location_in_edge_extractor=lambda x: x.place or "",
    nodes_in_edge_extractor=lambda x: (x.src, x.dst),
    llm_client=llm,
    embedder=embedder,
    observation_time="2024-01-15",
    observation_location="Beijing"
)

Attributes

observation_time = observation_time or datetime.now().strftime('%Y-%m-%d') instance-attribute

observation_location = observation_location or 'Unknown' instance-attribute

raw_edge_key_extractor = edge_key_extractor instance-attribute

time_in_edge_extractor = time_in_edge_extractor instance-attribute

location_in_edge_extractor = location_in_edge_extractor instance-attribute

_constructor_kwargs = kwargs instance-attribute

Functions

__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], time_in_edge_extractor: Callable[[EdgeSchema], str], location_in_edge_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, str]], llm_client: BaseChatModel, embedder: Embeddings, observation_time: str | None = None, observation_location: str | None = None, extraction_mode: str = 'two_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', prompt: str = '', chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)

Initialize AutoSpatioTemporalGraph.

Parameters:

Name Type Description Default
node_schema Type[NodeSchema]

User-defined Node Pydantic model.

required
edge_schema Type[EdgeSchema]

User-defined Edge Pydantic model with time/space fields.

required
node_key_extractor Callable[[NodeSchema], str]

Function to extract unique key from node.

required
edge_key_extractor Callable[[EdgeSchema], str]

Function to extract the base unique identifier for an edge.

required
time_in_edge_extractor Callable[[EdgeSchema], str]

Function to extract the time component.

required
location_in_edge_extractor Callable[[EdgeSchema], str]

Function to extract the spatial component.

required
nodes_in_edge_extractor Callable[[EdgeSchema], Tuple[str, str]]

Function to extract (source_key, target_key) from edge.

required
llm_client BaseChatModel

LangChain BaseChatModel for extraction.

required
embedder Embeddings

LangChain Embeddings for semantic operations.

required
observation_time str | None

Date context for relative time resolution (default: today in YYYY-MM-DD format).

None
observation_location str | None

Location context for relative location resolution (default: "Unknown").

None
extraction_mode str

"one_stage" or "two_stage" (default: "two_stage").

'two_stage'
node_strategy_or_merger MergeStrategy | BaseMerger

Merge strategy for duplicate nodes (default: LLM.BALANCED).

BALANCED
edge_strategy_or_merger MergeStrategy | BaseMerger

Merge strategy for duplicate edges (default: LLM.BALANCED).

BALANCED
prompt_for_node_extraction str

Additional user prompt to append to node extraction system prompt.

''
prompt_for_edge_extraction str

Additional user prompt to append to edge extraction system prompt.

''
prompt str

Additional user prompt to append to one-stage graph extraction system prompt.

''
chunk_size int

Size of text chunks for processing (default: 2048).

2048
chunk_overlap int

Overlap between text chunks (default: 256).

256
max_workers int

Maximum number of concurrent LLM calls (default: 10).

10
verbose bool

Whether to print verbose output (default: False).

False
node_fields_for_index List[str] | None

List of node fields to include in vector index (optional).

None
edge_fields_for_index List[str] | None

List of edge fields to include in vector index (optional).

None
**kwargs Any

Additional arguments passed to create_merger() when strategy_or_merger is a MergeStrategy enum. Ignored if strategy_or_merger is a BaseMerger instance.

{}

_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]

Inject observation_time and observation_location into edge extraction.

_extract_data_by_one_stage(text: str) -> Any

Inject observation_time and observation_location into one-stage extraction.

_create_empty_instance() -> AutoSpatioTemporalGraph[NodeSchema, EdgeSchema]

Recreate instance with all spatio-temporal attributes.