Graph Types¶
Auto types for representing relationships between entities.
AutoGraph¶
hyperextract.types.AutoGraph
¶
Bases: BaseAutoType[AutoGraphSchema[NodeSchema, EdgeSchema]], Generic[NodeSchema, EdgeSchema]
AutoGraph - extracts knowledge graphs with nodes and edges from text.
This pattern extracts structured knowledge graphs consisting of entities (nodes) and their relationships (edges). Suitable for entity relationship extraction, knowledge graph construction, and semantic network building.
Key characteristics
- Extraction target: Graph structure with nodes and edges
- Deduplication: Automatic deduplication for both nodes and edges using OMem
- Node merge strategy: Configurable (LLM-powered intelligent merging by default)
- Edge merge strategy: Configurable (simple merge by default)
- Extraction modes:
- one_stage: Extract nodes and edges simultaneously (faster, simpler prompt)
- two_stage: Extract nodes first, then edges with node context (more accurate)
- Consistency validation: Ensures edges only connect existing nodes
Example
```python
class Entity(BaseModel):
    name: str
    type: str
    properties: dict = {}

class Relation(BaseModel):
    source: str
    target: str
    relation_type: str

graph = AutoGraph(
    node_schema=Entity,
    edge_schema=Relation,
    node_key_extractor=lambda x: x.name,
    edge_key_extractor=lambda x: f"{x.source}-{x.relation_type}-{x.target}",
    nodes_in_edge_extractor=lambda x: (x.source, x.target),
    llm_client=llm,
    embedder=embedder,
    extraction_mode="two_stage",
)
```
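The extractor callables above are plain functions over schema instances. A minimal, dependency-free sketch of the contract they must satisfy, using dataclasses as stand-ins for the Pydantic models (names and values are illustrative):

```python
from dataclasses import dataclass, field

# Stand-ins for the Pydantic models above (dataclasses keep the sketch dependency-free).
@dataclass
class Entity:
    name: str
    type: str
    properties: dict = field(default_factory=dict)

@dataclass
class Relation:
    source: str
    target: str
    relation_type: str

# The callables passed to AutoGraph: keys drive deduplication in OMem,
# and nodes_in_edge drives the edge consistency check.
node_key = lambda x: x.name
edge_key = lambda x: f"{x.source}-{x.relation_type}-{x.target}"
nodes_in_edge = lambda x: (x.source, x.target)

ada = Entity(name="Ada Lovelace", type="person")
wrote = Relation(source="Ada Lovelace", target="Note G", relation_type="wrote")

print(node_key(ada))         # Ada Lovelace
print(edge_key(wrote))       # Ada Lovelace-wrote-Note G
print(nodes_in_edge(wrote))  # ('Ada Lovelace', 'Note G')
```

Keys must be stable and unique per entity: two mentions of the same entity should produce the same key so that deduplication can merge them.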
Attributes¶
graph_schema: Type[AutoGraphSchema[NodeSchema, EdgeSchema]] = create_model(graph_schema_name, nodes=(List[node_schema], Field(default_factory=list)), edges=(List[edge_schema], Field(default_factory=list)))
instance-attribute
¶
node_schema = node_schema
instance-attribute
¶
edge_schema = edge_schema
instance-attribute
¶
node_key_extractor = node_key_extractor
instance-attribute
¶
edge_key_extractor = edge_key_extractor
instance-attribute
¶
nodes_in_edge_extractor = nodes_in_edge_extractor
instance-attribute
¶
extraction_mode = extraction_mode
instance-attribute
¶
node_fields_for_index = node_fields_for_index
instance-attribute
¶
edge_fields_for_index = edge_fields_for_index
instance-attribute
¶
_constructor_kwargs = kwargs
instance-attribute
¶
node_list_schema = create_model('NodeList', items=(List[node_schema], Field(default_factory=list, description='Extracted nodes')))
instance-attribute
¶
edge_list_schema = create_model('EdgeList', items=(List[edge_schema], Field(default_factory=list, description='Extracted edges')))
instance-attribute
¶
node_merger = node_strategy_or_merger
instance-attribute
¶
edge_merger = edge_strategy_or_merger
instance-attribute
¶
_node_memory = OMem(memory_schema=node_schema, key_extractor=node_key_extractor, llm_client=llm_client, embedder=embedder, strategy_or_merger=(self.node_merger), verbose=verbose, fields_for_index=node_fields_for_index)
instance-attribute
¶
_edge_memory = OMem(memory_schema=edge_schema, key_extractor=edge_key_extractor, llm_client=llm_client, embedder=embedder, strategy_or_merger=(self.edge_merger), verbose=verbose, fields_for_index=edge_fields_for_index)
instance-attribute
¶
_node_label_extractor = node_label_extractor
instance-attribute
¶
_edge_label_extractor = edge_label_extractor
instance-attribute
¶
node_prompt = prompt_for_node_extraction or DEFAULT_NODE_PROMPT
instance-attribute
¶
edge_prompt = prompt_for_edge_extraction or DEFAULT_EDGE_PROMPT
instance-attribute
¶
prompt_template = ChatPromptTemplate.from_template(self.node_prompt)
instance-attribute
¶
node_extractor = self.prompt_template | self.llm_client.with_structured_output(self.node_list_schema)
instance-attribute
¶
edge_prompt_template = ChatPromptTemplate.from_template(self.edge_prompt)
instance-attribute
¶
edge_extractor = self.edge_prompt_template | self.llm_client.with_structured_output(self.edge_list_schema)
instance-attribute
¶
data: AutoGraphSchema[NodeSchema, EdgeSchema]
property
¶
Returns the current graph state (nodes and edges).
Returns:
| Type | Description |
|---|---|
| AutoGraphSchema[NodeSchema, EdgeSchema] | AutoGraphSchema containing all nodes and edges. |
nodes: List[NodeSchema]
property
¶
Returns the current node collection.
Returns:
| Type | Description |
|---|---|
| List[NodeSchema] | List of nodes. |
edges: List[EdgeSchema]
property
¶
Returns the current edge collection.
Returns:
| Type | Description |
|---|---|
| List[EdgeSchema] | List of edges. |
Functions¶
__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, str]], llm_client: BaseChatModel, embedder: Embeddings, *, extraction_mode: str = 'one_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt: str = '', prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', node_label_extractor: Callable[[NodeSchema], str] = None, edge_label_extractor: Callable[[EdgeSchema], str] = None, chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)
¶
Initialize AutoGraph with node/edge schemas and configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_schema | Type[NodeSchema] | Pydantic BaseModel for nodes/entities. | required |
| edge_schema | Type[EdgeSchema] | Pydantic BaseModel for edges/relationships. | required |
| node_key_extractor | Callable[[NodeSchema], str] | Function to extract a unique key from a node (e.g., lambda x: x.id). | required |
| edge_key_extractor | Callable[[EdgeSchema], str] | Function to extract a unique key from an edge (e.g., lambda x: f"{x.src}-{x.rel}-{x.dst}"). | required |
| nodes_in_edge_extractor | Callable[[EdgeSchema], Tuple[str, str]] | Function to extract (source_key, target_key) node keys from an edge for validation. | required |
| llm_client | BaseChatModel | Language model client for extraction. | required |
| embedder | Embeddings | Embedding model for vector indexing. | required |
| extraction_mode | str | "one_stage" (extract nodes and edges together) or "two_stage" (nodes first, then edges). | 'one_stage' |
| node_strategy_or_merger | MergeStrategy \| BaseMerger | Merge strategy for duplicate nodes. | LLM.BALANCED |
| edge_strategy_or_merger | MergeStrategy \| BaseMerger | Merge strategy for duplicate edges. | LLM.BALANCED |
| prompt | str | Custom extraction prompt for one-stage mode. | '' |
| prompt_for_node_extraction | str | Custom extraction prompt for two-stage node extraction. | '' |
| prompt_for_edge_extraction | str | Custom extraction prompt for two-stage edge extraction. | '' |
| node_label_extractor | Callable[[NodeSchema], str] | Optional function to extract a label from a node for visualization. | None |
| edge_label_extractor | Callable[[EdgeSchema], str] | Optional function to extract a label from an edge for visualization. | None |
| chunk_size | int | Maximum characters per chunk. | 2048 |
| chunk_overlap | int | Overlapping characters between chunks. | 256 |
| max_workers | int | Maximum concurrent extraction tasks. | 10 |
| verbose | bool | Whether to log progress. | False |
| node_fields_for_index | List[str] \| None | Optional list of field names in node_schema to include in the vector index. If None, all text fields are indexed. Example: ['name', 'description'] indexes only those node fields. | None |
| edge_fields_for_index | List[str] \| None | Optional list of field names in edge_schema to include in the vector index. If None, all text fields are indexed. Example: ['relation_type', 'description'] indexes only those edge fields. | None |
| **kwargs | Any | Additional arguments passed to create_merger() when strategy_or_merger is a MergeStrategy enum; ignored if it is a BaseMerger instance. | {} |
_default_prompt() -> str
¶
Returns the default prompt for one-stage graph extraction.
_create_empty_instance() -> AutoGraph[NodeSchema, EdgeSchema]
¶
Creates a new empty AutoGraph instance with the same configuration as this one.
Overrides parent method to handle AutoGraph-specific parameters.
Returns:
| Type | Description |
|---|---|
| AutoGraph[NodeSchema, EdgeSchema] | A new empty AutoGraph instance with identical configuration. |
empty() -> bool
¶
Checks if the graph is empty (no nodes).
Returns:
| Type | Description |
|---|---|
| bool | True if the node collection is empty, False otherwise. |
_init_data_state() -> None
¶
Initialize or reset graph data structures.
_init_index_state() -> None
¶
Initialize vector index to empty state.
_set_data_state(data: AutoGraphSchema[NodeSchema, EdgeSchema]) -> None
¶
Replace graph data with new data (full reset).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | AutoGraphSchema[NodeSchema, EdgeSchema] | New graph data to set. | required |
_update_data_state(incoming_data: AutoGraphSchema[NodeSchema, EdgeSchema]) -> None
¶
Merge incoming graph data into current state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| incoming_data | AutoGraphSchema[NodeSchema, EdgeSchema] | Incremental graph data to merge. | required |
_extract_data(text: str) -> AutoGraphSchema[NodeSchema, EdgeSchema]
¶
Main extraction logic dispatcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Input text to extract the graph from. | required |

Returns:

| Type | Description |
|---|---|
| AutoGraphSchema[NodeSchema, EdgeSchema] | Extracted and validated graph. |
_extract_data_by_one_stage(text: str) -> AutoGraphSchema[NodeSchema, EdgeSchema]
¶
Extract nodes and edges simultaneously using single LLM call.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Input text. | required |

Returns:

| Type | Description |
|---|---|
| AutoGraphSchema[NodeSchema, EdgeSchema] | Raw extracted graph data. |
_extract_data_by_two_stage(text: str) -> AutoGraphSchema[NodeSchema, EdgeSchema]
¶
Extract nodes first, then edges with node context (batch processing).
Process:
1. Split text into chunks.
2. Batch extract nodes for all chunks.
3. Batch extract edges for all chunks, using chunk-specific nodes as context.
4. Construct partial graphs (tuples of nodes/edges).
5. Merge all partial graphs into one global graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Input text. | required |

Returns:

| Type | Description |
|---|---|
| AutoGraphSchema[NodeSchema, EdgeSchema] | Extracted and validated graph. |
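The two-stage flow can be sketched end to end with hypothetical stub extractors standing in for the LLM calls. Everything below is illustrative (the chunking and the stubs are not the library's implementation); only the shape of the pipeline mirrors the documented process:

```python
from typing import List, Tuple

# Stage 0: naive character chunking (sizes mirror the documented defaults).
def chunk(text: str, size: int = 2048, overlap: int = 256) -> List[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Hypothetical stub: pretend every title-cased word is an entity.
def extract_nodes(chunk_text: str) -> List[str]:
    words = [w.strip(".,") for w in chunk_text.split()]
    return sorted({w for w in words if w.istitle()})

# Hypothetical stub: pretend adjacent entities in a chunk are related.
def extract_edges(chunk_text: str, nodes: List[str]) -> List[Tuple[str, str]]:
    return [(a, b) for a, b in zip(nodes, nodes[1:])]

def two_stage(text: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    chunks = chunk(text)
    node_lists = [extract_nodes(c) for c in chunks]           # stage 1, batched
    edge_lists = [extract_edges(c, n)                         # stage 2, with node context
                  for c, n in zip(chunks, node_lists)]
    nodes = sorted({n for ns in node_lists for n in ns})      # merge partial graphs
    edges = sorted({e for es in edge_lists for e in es})
    return nodes, edges

nodes, edges = two_stage("Alice met Bob in Paris.")
print(nodes)  # ['Alice', 'Bob', 'Paris']
print(edges)  # [('Alice', 'Bob'), ('Bob', 'Paris')]
```

In the real class the stub stages are LLM chains with structured output, and the merge step deduplicates through OMem rather than a plain set.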
_extract_nodes_batch(chunks: List[str]) -> List[NodeListSchema[NodeSchema]]
¶
Batch extract nodes from multiple text chunks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunks | List[str] | List of text chunks. | required |

Returns:

| Type | Description |
|---|---|
| List[NodeListSchema[NodeSchema]] | List of NodeListSchema objects with extracted nodes. |
_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]
¶
Batch extract edges using corresponding node lists as context.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunks | List[str] | List of text chunks. | required |
| node_lists | List[NodeListSchema[NodeSchema]] | List of NodeListSchema objects (one per chunk). | required |

Returns:

| Type | Description |
|---|---|
| List[EdgeListSchema[EdgeSchema]] | List of EdgeListSchema objects with extracted edges. |
_prune_dangling_edges(graph: AutoGraphSchema[NodeSchema, EdgeSchema]) -> AutoGraphSchema[NodeSchema, EdgeSchema]
¶
Prune edges that connect to non-existent nodes (Consistency Check).
Ensures graph consistency by removing any edges where either endpoint (source or target) does not exist in the node list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | AutoGraphSchema[NodeSchema, EdgeSchema] | Raw graph that may contain dangling edges. | required |

Returns:

| Type | Description |
|---|---|
| AutoGraphSchema[NodeSchema, EdgeSchema] | Graph containing only valid edges (both endpoints must exist in the node list). |
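The invariant this check enforces can be shown with plain dicts — a sketch of the rule, not the actual implementation:

```python
# An edge survives only if both of its endpoints resolve to an existing node key.
nodes = [{"name": "Alice"}, {"name": "Bob"}]
edges = [
    {"source": "Alice", "target": "Bob", "relation_type": "knows"},
    {"source": "Alice", "target": "Carol", "relation_type": "knows"},  # Carol was never extracted
]

node_keys = {n["name"] for n in nodes}  # same keys node_key_extractor would produce
valid_edges = [e for e in edges
               if e["source"] in node_keys and e["target"] in node_keys]

print(len(valid_edges))  # 1: the Alice-Carol edge is pruned
```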
merge_batch_data(data_list_or_tuple: List[AutoGraphSchema[NodeSchema, EdgeSchema]] | Tuple[List[List[NodeSchema]], List[List[EdgeSchema]]]) -> AutoGraphSchema[NodeSchema, EdgeSchema]
¶
Merge multiple graphs or node/edge tuples into one.
Supports two input formats:
- List of AutoGraphSchema objects (standard format)
- Tuple of (List[List[NodeSchema]], List[List[EdgeSchema]]) (optimization for batch processing)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_list_or_tuple | List[AutoGraphSchema[NodeSchema, EdgeSchema]] \| Tuple[List[List[NodeSchema]], List[List[EdgeSchema]]] | Either a list of AutoGraphSchema objects or a tuple of (nodes_lists, edges_lists), where each inner list contains items from one chunk. | required |

Returns:

| Type | Description |
|---|---|
| AutoGraphSchema[NodeSchema, EdgeSchema] | Merged graph. |
build_index(index_nodes: bool = True, index_edges: bool = True)
¶
Build vector index for graph search.
By default, builds indices for both nodes and edges to support comprehensive search.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| index_nodes | bool | Whether to index nodes. | True |
| index_edges | bool | Whether to index edges. | True |
build_node_index() -> None
¶
Build vector index specifically for nodes.
build_edge_index() -> None
¶
Build vector index specifically for edges.
search(query: str, top_k_nodes: int = 3, top_k_edges: int = 3, top_k: int | None = None) -> Tuple[List[NodeSchema], List[EdgeSchema]]
¶
Unified graph search interface.
Retrieves nodes and edges semantically related to the query. Always returns a tuple of (nodes, edges). If a count is 0, the corresponding list will be empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string. | required |
| top_k_nodes | int | Number of node results to return. Set to 0 to disable node search. | 3 |
| top_k_edges | int | Number of edge results to return. Set to 0 to disable edge search. | 3 |
| top_k | int \| None | If provided, sets both top_k_nodes and top_k_edges to this value. | None |

Returns:

| Type | Description |
|---|---|
| Tuple[List[NodeSchema], List[EdgeSchema]] | A tuple of matching nodes (empty if top_k_nodes <= 0) and matching edges (empty if top_k_edges <= 0). |

Raises:

| Type | Description |
|---|---|
| ValueError | If both top_k_nodes and top_k_edges are <= 0. |
| ValueError | If a search is requested (top_k > 0) but the corresponding index has not been built. |
search_nodes(query: str, top_k: int = 3) -> List[NodeSchema]
¶
Semantic search for nodes/entities only.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string. | required |
| top_k | int | Number of results to return. | 3 |

Returns:

| Type | Description |
|---|---|
| List[NodeSchema] | List of nodes matched by semantic similarity. |
search_edges(query: str, top_k: int = 3) -> List[EdgeSchema]
¶
Semantic search for edges/relationships only.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string. | required |
| top_k | int | Number of results to return. | 3 |

Returns:

| Type | Description |
|---|---|
| List[EdgeSchema] | List of edges matched by semantic similarity. |
chat(query: str, top_k: int | None = None, top_k_nodes: int = 3, top_k_edges: int = 3) -> AIMessage
¶
Performs a chat-like interaction using graph knowledge.
Retrieves relevant nodes and/or edges based on the query and retrieval counts, formats them into a structured context with clear headers, and generates an answer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | User query string. | required |
| top_k | int \| None | If provided, sets both top_k_nodes and top_k_edges to this value. | None |
| top_k_nodes | int | Number of relevant nodes to retrieve. Set to 0 to disable node context. | 3 |
| top_k_edges | int | Number of relevant edges to retrieve. Set to 0 to disable edge context. | 3 |

Returns:

| Type | Description |
|---|---|
| AIMessage | The LLM-generated response; access the text content via response.content. |
Example
Chat using 5 nodes and 2 edges as context¶
```python
response = ka.chat("What is X?", top_k_nodes=5, top_k_edges=2)
print(response.content)  # Print the generated answer
```
dump_index(folder_path: str | Path) -> None
¶
Save indices to disk.
load_index(folder_path: str | Path) -> None
¶
Load indices from disk.
show(node_label_extractor: Callable[[NodeSchema], str] = None, edge_label_extractor: Callable[[EdgeSchema], str] = None, *, top_k_nodes_for_search: int = 3, top_k_edges_for_search: int = 3, top_k_nodes_for_chat: int = 3, top_k_edges_for_chat: int = 3) -> None
¶
Visualize the graph using OntoSight.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_label_extractor | Callable[[NodeSchema], str] | Optional function to extract a label from a node for visualization. If not provided, uses the one from init. | None |
| edge_label_extractor | Callable[[EdgeSchema], str] | Optional function to extract a label from an edge for visualization. If not provided, uses the one from init. | None |
| top_k_nodes_for_search | int | Number of nodes to retrieve for the search callback. | 3 |
| top_k_edges_for_search | int | Number of edges to retrieve for the search callback. | 3 |
| top_k_nodes_for_chat | int | Number of nodes to retrieve for the chat callback. | 3 |
| top_k_edges_for_chat | int | Number of edges to retrieve for the chat callback. | 3 |
AutoHypergraph¶
hyperextract.types.AutoHypergraph
¶
Bases: BaseAutoType[AutoHypergraphSchema[NodeSchema, EdgeSchema]], Generic[NodeSchema, EdgeSchema]
AutoHypergraph - Node-First extraction: Extract entities first, then relationships.
Extraction strategy: two-stage (node-first)
1. Stage 1: Extract all nodes (entities) from the text.
2. Stage 2: Extract edges (relationships) using the identified nodes as context.

Suitable for:
- Events with multiple participants (e.g., meetings, transactions)
- Group memberships
- Complex dependencies
The nodes_in_edge_extractor must return a Tuple of ALL node keys involved in the edge.
Validation uses 'Strict Mode': checks if ALL nodes in that tuple exist.
Example
```python
class Person(BaseModel):
    name: str

class Event(BaseModel):
    label: str  # e.g., "meeting", "collaboration"
    participants: List[str]  # IDs of people involved

hypergraph = AutoHypergraph(
    node_schema=Person,
    edge_schema=Event,
    node_key_extractor=lambda x: x.name,
    # CRITICAL: Sort participants so {A, B} and {B, A} map to the same key
    edge_key_extractor=lambda x: f"{x.label}_{sorted(x.participants)}",
    nodes_in_edge_extractor=lambda x: tuple(x.participants),
    llm_client=llm,
    embedder=embedder,
    extraction_mode="two_stage",
)
```
Attributes¶
graph_schema: Type[AutoHypergraphSchema[NodeSchema, EdgeSchema]] = create_model(graph_schema_name, nodes=(List[node_schema], Field(default_factory=list)), edges=(List[edge_schema], Field(default_factory=list)))
instance-attribute
¶
node_schema = node_schema
instance-attribute
¶
edge_schema = edge_schema
instance-attribute
¶
node_key_extractor = node_key_extractor
instance-attribute
¶
edge_key_extractor = edge_key_extractor
instance-attribute
¶
nodes_in_edge_extractor = nodes_in_edge_extractor
instance-attribute
¶
extraction_mode = extraction_mode
instance-attribute
¶
chunk_size = chunk_size
instance-attribute
¶
chunk_overlap = chunk_overlap
instance-attribute
¶
max_workers = max_workers
instance-attribute
¶
verbose = verbose
instance-attribute
¶
node_fields_for_index = node_fields_for_index
instance-attribute
¶
edge_fields_for_index = edge_fields_for_index
instance-attribute
¶
node_list_schema = create_model(f'{node_schema.__name__}List', items=(List[node_schema], Field(default_factory=list)))
instance-attribute
¶
edge_list_schema = create_model(f'{edge_schema.__name__}List', items=(List[edge_schema], Field(default_factory=list)))
instance-attribute
¶
node_merger = node_strategy_or_merger
instance-attribute
¶
edge_merger = edge_strategy_or_merger
instance-attribute
¶
_node_memory = OMem(memory_schema=node_schema, key_extractor=node_key_extractor, llm_client=llm_client, embedder=embedder, strategy_or_merger=(self.node_merger), verbose=verbose, fields_for_index=node_fields_for_index)
instance-attribute
¶
_edge_memory = OMem(memory_schema=edge_schema, key_extractor=edge_key_extractor, llm_client=llm_client, embedder=embedder, strategy_or_merger=(self.edge_merger), verbose=verbose, fields_for_index=edge_fields_for_index)
instance-attribute
¶
_node_label_extractor = node_label_extractor
instance-attribute
¶
_edge_label_extractor = edge_label_extractor
instance-attribute
¶
node_prompt = prompt_for_node_extraction or DEFAULT_NODE_PROMPT
instance-attribute
¶
edge_prompt = prompt_for_edge_extraction or DEFAULT_EDGE_PROMPT
instance-attribute
¶
prompt_template = ChatPromptTemplate.from_template(self.node_prompt)
instance-attribute
¶
node_extractor = self.prompt_template | self.llm_client.with_structured_output(self.node_list_schema)
instance-attribute
¶
edge_prompt_template = ChatPromptTemplate.from_template(self.edge_prompt)
instance-attribute
¶
edge_extractor = self.edge_prompt_template | self.llm_client.with_structured_output(self.edge_list_schema)
instance-attribute
¶
data: AutoHypergraphSchema
property
¶
Returns the current hypergraph state.
nodes: List[NodeSchema]
property
¶
Returns the current node collection.
edges: List[EdgeSchema]
property
¶
Returns the current hyperedge collection.
Functions¶
__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, ...]], llm_client: BaseChatModel, embedder: Embeddings, *, extraction_mode: str = 'two_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt: str = '', prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', node_label_extractor: Callable[[NodeSchema], str] = None, edge_label_extractor: Callable[[EdgeSchema], str] = None, chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)
¶
Initialize AutoHypergraph with node/edge schemas and configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_schema | Type[NodeSchema] | Pydantic BaseModel for nodes/entities. | required |
| edge_schema | Type[EdgeSchema] | Pydantic BaseModel for hyperedges/relationships. | required |
| node_key_extractor | Callable[[NodeSchema], str] | Function to extract a unique key from a node. | required |
| edge_key_extractor | Callable[[EdgeSchema], str] | Function to extract a unique key from a hyperedge. CRITICAL: since hyperedges are unordered sets of nodes ({A, B} == {B, A}), you MUST sort the participating node keys in this function to ensure consistent deduplication, e.g. lambda x: f"{x.relation_type}_{sorted(x.participants)}". Without sorting, two hyperedges with the same nodes in a different order are treated as different edges, breaking deduplication. | required |
| nodes_in_edge_extractor | Callable[[EdgeSchema], Tuple[str, ...]] | Function to extract ALL node keys from a hyperedge as a tuple (e.g., lambda x: tuple(x.participants)). | required |
| llm_client | BaseChatModel | Language model client for extraction. | required |
| embedder | Embeddings | Embedding model for vector indexing. | required |
| extraction_mode | str | "two_stage" (recommended) or "one_stage". | 'two_stage' |
| node_strategy_or_merger | MergeStrategy \| BaseMerger | Merge strategy for duplicate nodes. | LLM.BALANCED |
| edge_strategy_or_merger | MergeStrategy \| BaseMerger | Merge strategy for duplicate hyperedges. | LLM.BALANCED |
| prompt | str | Custom extraction prompt for one-stage mode. | '' |
| prompt_for_node_extraction | str | Custom extraction prompt for node extraction. | '' |
| prompt_for_edge_extraction | str | Custom extraction prompt for hyperedge extraction. | '' |
| node_label_extractor | Callable[[NodeSchema], str] | Optional function to extract a label from a node for visualization. | None |
| edge_label_extractor | Callable[[EdgeSchema], str] | Optional function to extract a label from an edge for visualization. | None |
| chunk_size | int | Maximum characters per chunk. | 2048 |
| chunk_overlap | int | Overlapping characters between chunks. | 256 |
| max_workers | int | Maximum concurrent extraction tasks. | 10 |
| verbose | bool | Whether to display detailed execution logs and progress information. | False |
| node_fields_for_index | List[str] \| None | Optional list of field names in node_schema to include in the vector index. If None, all text fields are indexed. Example: ['name', 'properties'] indexes only those node fields. | None |
| edge_fields_for_index | List[str] \| None | Optional list of field names in edge_schema to include in the vector index. If None, all text fields are indexed. Example: ['summary', 'type'] indexes only those edge fields. | None |
| **kwargs | Any | Additional arguments for merger creation. | {} |
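The sorting requirement on edge_key_extractor can be demonstrated in isolation. The label and participant values here are hypothetical; only the key construction matters:

```python
# Hyperedges are unordered sets, so the same event reported with its
# participants in a different order must map to the same deduplication key.
unsorted_key = lambda label, participants: f"{label}_{participants}"
sorted_key   = lambda label, participants: f"{label}_{sorted(participants)}"

a = ["Alice", "Bob", "Carol"]
b = ["Carol", "Alice", "Bob"]  # same meeting, different order

print(unsorted_key("meeting", a) == unsorted_key("meeting", b))  # False: treated as two edges
print(sorted_key("meeting", a) == sorted_key("meeting", b))      # True: deduplicated correctly
```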
_default_prompt() -> str
¶
Returns the default prompt for one-stage hypergraph extraction.
This is the primary prompt used when extraction_mode is 'one_stage'.
_create_empty_instance() -> AutoHypergraph[NodeSchema, EdgeSchema]
¶
Creates a new empty AutoHypergraph instance with the same configuration.
Overrides parent method to handle AutoHypergraph-specific parameters.
Returns:
| Type | Description |
|---|---|
| AutoHypergraph[NodeSchema, EdgeSchema] | A new empty AutoHypergraph instance with identical configuration. |
_init_data_state() -> None
¶
Initialize or reset hypergraph data structures.
_init_index_state() -> None
¶
Initialize vector index to empty state.
_set_data_state(data: AutoHypergraphSchema) -> None
¶
Replace hypergraph data with new data (full reset).
_update_data_state(incoming_data: AutoHypergraphSchema) -> None
¶
Merge incoming hypergraph data into current state.
empty() -> bool
¶
Checks if the hypergraph is empty.
_extract_data(text: str) -> AutoHypergraphSchema
¶
Main extraction logic dispatcher.
_extract_data_by_one_stage(text: str) -> AutoHypergraphSchema
¶
Extract nodes and edges simultaneously using single LLM call.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Input text. | required |

Returns:

| Type | Description |
|---|---|
| AutoHypergraphSchema | Raw extracted hypergraph data. |
_extract_data_by_two_stage(text: str) -> AutoHypergraphSchema
¶
Extract nodes first, then hyperedges with node context (batch processing).
Process:
1. Split text into chunks.
2. Batch extract nodes for all chunks.
3. Batch extract hyperedges for all chunks, using chunk-specific nodes as context.
4. Merge all partial results into one global hypergraph.
_extract_nodes_batch(chunks: List[str]) -> List[NodeListSchema[NodeSchema]]
¶
Batch extract nodes from multiple text chunks.
_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]
¶
Batch extract hyperedges using corresponding node lists as context.
_prune_dangling_edges(graph: AutoHypergraphSchema) -> AutoHypergraphSchema
¶
Prune hyperedges where ANY participating node does not exist (Strict Consistency).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | AutoHypergraphSchema | Raw hypergraph that may contain dangling hyperedges. | required |

Returns:

| Type | Description |
|---|---|
| AutoHypergraphSchema | Hypergraph containing only valid hyperedges (ALL participants must exist in the node list). |
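The strict all-participants rule can be illustrated with plain dicts — a sketch of the invariant, not the library code:

```python
# Strict-mode pruning: a hyperedge survives only if EVERY participant
# exists in the node set; one missing participant invalidates the whole edge.
node_keys = {"Alice", "Bob"}
hyperedges = [
    {"label": "meeting", "participants": ["Alice", "Bob"]},
    {"label": "meeting", "participants": ["Alice", "Bob", "Carol"]},  # Carol missing
]

valid = [e for e in hyperedges
         if all(p in node_keys for p in e["participants"])]

print(len(valid))  # 1: the edge with the missing participant is pruned
```

This is stricter than the pairwise check in AutoGraph, where an edge has exactly two endpoints to validate.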
merge_batch_data(data_list_or_tuple: List[AutoHypergraphSchema] | Tuple[List[List[NodeSchema]], List[List[EdgeSchema]]]) -> AutoHypergraphSchema
¶
Merge multiple hypergraphs or node/edge tuples into one.
Supports two input formats:
- List of AutoHypergraphSchema objects (standard format)
- Tuple of (List[List[NodeSchema]], List[List[EdgeSchema]]) (optimization for batch processing)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_list_or_tuple | List[AutoHypergraphSchema] \| Tuple[List[List[NodeSchema]], List[List[EdgeSchema]]] | Either a list of AutoHypergraphSchema objects or a tuple of (nodes_lists, edges_lists), where each inner list contains items from one chunk. | required |

Returns:

| Type | Description |
|---|---|
| AutoHypergraphSchema | Merged hypergraph. |
build_index(index_nodes: bool = True, index_edges: bool = True)
¶
Build vector index for hypergraph search.
By default, builds indices for both nodes and hyperedges to support comprehensive search.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| index_nodes | bool | Whether to index nodes. | True |
| index_edges | bool | Whether to index hyperedges. | True |
build_node_index() -> None
¶
Build vector index specifically for nodes.
build_edge_index() -> None
¶
Build vector index specifically for hyperedges.
search(query: str, top_k_nodes: int = 3, top_k_edges: int = 3, top_k: int | None = None) -> Tuple[List[NodeSchema], List[EdgeSchema]]
¶
Unified hypergraph search interface.
Retrieves nodes and hyperedges semantically related to the query. Always returns a tuple of (nodes, edges). If a count is 0, the corresponding list will be empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string. | required |
| top_k_nodes | int | Number of node results to return. Set to 0 to disable node search. | 3 |
| top_k_edges | int | Number of hyperedge results to return. Set to 0 to disable hyperedge search. | 3 |
| top_k | int \| None | If provided, sets both top_k_nodes and top_k_edges to this value. | None |

Returns:

| Type | Description |
|---|---|
| Tuple[List[NodeSchema], List[EdgeSchema]] | A tuple of matching nodes (empty if top_k_nodes <= 0) and matching hyperedges (empty if top_k_edges <= 0). |

Raises:

| Type | Description |
|---|---|
| ValueError | If both top_k_nodes and top_k_edges are <= 0. |
| ValueError | If a search is requested (top_k > 0) but the corresponding index has not been built. |
search_nodes(query: str, top_k: int = 3) -> List[NodeSchema]
¶
Semantic search for nodes/entities only.
search_edges(query: str, top_k: int = 3) -> List[EdgeSchema]
¶
Semantic search for hyperedges/relationships only.
chat(query: str, top_k_nodes: int = 3, top_k_edges: int = 3, top_k: int | None = None) -> AIMessage
¶
Performs a chat-like interaction using hypergraph knowledge.
Retrieves relevant nodes and/or hyperedges based on the query and retrieval counts, formats them into a structured context with clear headers, and generates an answer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | User query string. | required |
| top_k_nodes | int | Number of relevant nodes to retrieve. Set to 0 to disable node context. | 3 |
| top_k_edges | int | Number of relevant hyperedges to retrieve. Set to 0 to disable edge context. | 3 |
| top_k | int \| None | If provided, sets both top_k_nodes and top_k_edges to this value. | None |

Returns:

| Type | Description |
|---|---|
| AIMessage | The LLM-generated response; access the text content via response.content. |
Example

Chat using 5 nodes and 2 hyperedges as context¶

```python
response = hg.chat("What participates in X?", top_k_nodes=5, top_k_edges=2)
print(response.content)  # Print the generated answer
```
dump_index(folder_path: str | Path) -> None
¶
Save indices to disk.
load_index(folder_path: str | Path) -> None
¶
Load indices from disk.
show(node_label_extractor: Callable[[NodeSchema], str] = None, edge_label_extractor: Callable[[EdgeSchema], str] = None, *, top_k_nodes_for_search: int = 3, top_k_edges_for_search: int = 3, top_k_nodes_for_chat: int = 3, top_k_edges_for_chat: int = 3) -> None
¶
Visualize the hypergraph using OntoSight.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `node_label_extractor` | `Callable[[NodeSchema], str]` | Optional function to extract a label from a node for visualization. If not provided, uses the one from `__init__`. | `None` |
| `edge_label_extractor` | `Callable[[EdgeSchema], str]` | Optional function to extract a label from an edge for visualization. If not provided, uses the one from `__init__`. | `None` |
| `top_k_nodes_for_search` | `int` | Number of nodes to retrieve for the search callback. | `3` |
| `top_k_edges_for_search` | `int` | Number of edges to retrieve for the search callback. | `3` |
| `top_k_nodes_for_chat` | `int` | Number of nodes to retrieve for the chat callback. | `3` |
| `top_k_edges_for_chat` | `int` | Number of edges to retrieve for the chat callback. | `3` |
AutoTemporalGraph¶
hyperextract.types.AutoTemporalGraph
¶
Bases: AutoGraph[NodeSchema, EdgeSchema]
Generic Temporal Graph Extractor (AutoTemporalGraph).
A flexible implementation supporting user-defined Node and Edge schemas with temporal awareness:

- Schema Agnosticism: Supports any user-defined Node and Edge Pydantic models.
- Temporal-Aware Deduplication: Time information is integrated into the edge deduplication logic.
- Dynamic Time Injection: The observation date is injected during extraction so that relative time expressions can be resolved.
Key Design:
- temporal_edge_key_extractor: A unified function to extract the unique key for an edge,
including temporal components (e.g., lambda x: f"{x.src}|{x.relation}|{x.dst}|{x.year}").
This ensures (A, rel, B) @ 2020 and (A, rel, B) @ 2021 are treated as different edges.
- Unified Prompt Management: Temporal system prompts are prepended to any user-provided prompts.
Example
```python
from typing import Optional
from pydantic import BaseModel

class MyEntity(BaseModel):
    name: str
    category: str = "Unknown"

class MyTemporalEdge(BaseModel):
    src: str
    dst: str
    relation: str
    year: Optional[str] = None

kg = AutoTemporalGraph(
    node_schema=MyEntity,
    edge_schema=MyTemporalEdge,
    node_key_extractor=lambda x: x.name,
    edge_key_extractor=lambda x: f"{x.src}|{x.relation}|{x.dst}",
    time_in_edge_extractor=lambda x: x.year or "",
    nodes_in_edge_extractor=lambda x: (x.src, x.dst),
    llm_client=llm,
    embedder=embedder,
    observation_time="2024-01-15",
)
```
Attributes¶
observation_time = observation_time or datetime.now().strftime('%Y-%m-%d')
instance-attribute
¶
raw_edge_key_extractor = edge_key_extractor
instance-attribute
¶
time_in_edge_extractor = time_in_edge_extractor
instance-attribute
¶
_constructor_kwargs = kwargs
instance-attribute
¶
Functions¶
__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], time_in_edge_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, str]], llm_client: BaseChatModel, embedder: Embeddings, observation_time: str | None = None, extraction_mode: str = 'two_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', prompt: str = '', chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)
¶
Initialize AutoTemporalGraph.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `node_schema` | `Type[NodeSchema]` | User-defined Node Pydantic model. | *required* |
| `edge_schema` | `Type[EdgeSchema]` | User-defined Edge Pydantic model with time fields. | *required* |
| `node_key_extractor` | `Callable[[NodeSchema], str]` | Function to extract a unique key from a node (e.g., `lambda x: x.name`). | *required* |
| `edge_key_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the base unique identifier for an edge, based purely on entities and relation (e.g., `lambda x: f"{x.src}\|{x.relation}\|{x.dst}"`). | *required* |
| `time_in_edge_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the time component from an edge (e.g., `lambda x: x.year or "permanent"`). Ensures (A, rel, B) @ 2020 and (A, rel, B) @ 2021 are treated as different edges. | *required* |
| `nodes_in_edge_extractor` | `Callable[[EdgeSchema], Tuple[str, str]]` | Function to extract `(source_key, target_key)` from an edge. | *required* |
| `llm_client` | `BaseChatModel` | LangChain BaseChatModel for extraction. | *required* |
| `embedder` | `Embeddings` | LangChain Embeddings for semantic operations. | *required* |
| `observation_time` | `str \| None` | Date context for relative time resolution; defaults to today in YYYY-MM-DD format. | `None` |
| `extraction_mode` | `str` | `"one_stage"` or `"two_stage"`. | `'two_stage'` |
| `node_strategy_or_merger` | `MergeStrategy \| BaseMerger` | Merge strategy for duplicate nodes. | `BALANCED` |
| `edge_strategy_or_merger` | `MergeStrategy \| BaseMerger` | Merge strategy for duplicate edges. | `BALANCED` |
| `prompt_for_node_extraction` | `str` | Additional user prompt appended to the node extraction system prompt. | `''` |
| `prompt_for_edge_extraction` | `str` | Additional user prompt appended to the edge extraction system prompt. | `''` |
| `prompt` | `str` | Additional user prompt appended to the one-stage graph extraction system prompt. | `''` |
| `chunk_size` | `int` | Size of text chunks for processing. | `2048` |
| `chunk_overlap` | `int` | Overlap between text chunks. | `256` |
| `max_workers` | `int` | Maximum number of concurrent LLM calls. | `10` |
| `verbose` | `bool` | Whether to print verbose output. | `False` |
| `node_fields_for_index` | `List[str] \| None` | Optional list of node fields to include in the vector index. | `None` |
| `edge_fields_for_index` | `List[str] \| None` | Optional list of edge fields to include in the vector index. | `None` |
| `**kwargs` | `Any` | Additional arguments passed to `create_merger()` when a strategy is given as a `MergeStrategy` enum; ignored when a `BaseMerger` instance is passed. | `{}` |
_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]
¶
Override: Inject observation_time into edge extraction during two-stage extraction.
_extract_data_by_one_stage(text: str) -> Any
¶
Override: Inject observation_time into one-stage extraction.
_create_empty_instance() -> AutoTemporalGraph[NodeSchema, EdgeSchema]
¶
Override: Recreate instance with all temporal-specific attributes.
AutoSpatialGraph¶
hyperextract.types.AutoSpatialGraph
¶
Bases: AutoGraph[NodeSchema, EdgeSchema]
Generic Spatial Graph Extractor (AutoSpatialGraph).
A flexible implementation supporting user-defined Node and Edge schemas with spatial awareness:

- Schema Agnosticism: Supports any user-defined Node and Edge Pydantic models.
- Location-Aware Deduplication: Spatial information is integrated into edge deduplication.
- Dynamic Localization: The observation location is injected during extraction so that relative location references can be resolved.
Key Design:
- Decoupled Extractors: location_in_edge_extractor is a separate function.
- Spatial Identity: Edges are uniquely identified by (source, relation, target) + location.
Example
```python
from typing import Optional
from pydantic import BaseModel

class MyEntity(BaseModel):
    name: str

class MySpatialEdge(BaseModel):
    src: str
    dst: str
    relation: str
    place: Optional[str] = None

kg = AutoSpatialGraph(
    node_schema=MyEntity,
    edge_schema=MySpatialEdge,
    node_key_extractor=lambda x: x.name,
    edge_key_extractor=lambda x: f"{x.src}|{x.relation}|{x.dst}",
    location_in_edge_extractor=lambda x: x.place or "",
    nodes_in_edge_extractor=lambda x: (x.src, x.dst),
    llm_client=llm,
    embedder=embedder,
    observation_location="Main Hall",
)
```
Attributes¶
observation_location = observation_location or 'Unknown Location'
instance-attribute
¶
_constructor_kwargs = kwargs
instance-attribute
¶
raw_edge_key_extractor = edge_key_extractor
instance-attribute
¶
location_in_edge_extractor = location_in_edge_extractor
instance-attribute
¶
Functions¶
__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], location_in_edge_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, str]], llm_client: BaseChatModel, embedder: Embeddings, *, observation_location: str | None = None, extraction_mode: str = 'two_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', prompt: str = '', chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)
¶
Initialize AutoSpatialGraph.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `node_schema` | `Type[NodeSchema]` | User-defined Node Pydantic model. | *required* |
| `edge_schema` | `Type[EdgeSchema]` | User-defined Edge Pydantic model with location fields. | *required* |
| `node_key_extractor` | `Callable[[NodeSchema], str]` | Function to extract a unique key from a node. | *required* |
| `edge_key_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the base unique identifier for an edge. | *required* |
| `location_in_edge_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the spatial component from an edge. | *required* |
| `nodes_in_edge_extractor` | `Callable[[EdgeSchema], Tuple[str, str]]` | Function to extract `(source_key, target_key)` from an edge. | *required* |
| `llm_client` | `BaseChatModel` | LangChain BaseChatModel for extraction. | *required* |
| `embedder` | `Embeddings` | LangChain Embeddings for semantic operations. | *required* |
| `observation_location` | `str \| None` | Location context for spatial resolution; defaults to "Unknown Location". | `None` |
| `extraction_mode` | `str` | `"one_stage"` or `"two_stage"`. | `'two_stage'` |
| `node_strategy_or_merger` | `MergeStrategy \| BaseMerger` | Merge strategy for duplicate nodes. | `BALANCED` |
| `edge_strategy_or_merger` | `MergeStrategy \| BaseMerger` | Merge strategy for duplicate edges. | `BALANCED` |
| `prompt_for_node_extraction` | `str` | Additional user prompt appended to the node extraction system prompt. | `''` |
| `prompt_for_edge_extraction` | `str` | Additional user prompt appended to the edge extraction system prompt. | `''` |
| `prompt` | `str` | Additional user prompt appended to the one-stage graph extraction system prompt. | `''` |
| `chunk_size` | `int` | Size of text chunks for processing. | `2048` |
| `chunk_overlap` | `int` | Overlap between text chunks. | `256` |
| `max_workers` | `int` | Maximum number of concurrent LLM calls. | `10` |
| `verbose` | `bool` | Whether to print verbose output. | `False` |
| `node_fields_for_index` | `List[str] \| None` | Optional list of node fields to include in the vector index. | `None` |
| `edge_fields_for_index` | `List[str] \| None` | Optional list of edge fields to include in the vector index. | `None` |
| `**kwargs` | `Any` | Additional arguments passed to `create_merger()` when a strategy is given as a `MergeStrategy` enum; ignored when a `BaseMerger` instance is passed. | `{}` |
_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]
¶
Override: Inject observation_location into edge extraction during two-stage extraction.
_extract_data_by_one_stage(text: str) -> Any
¶
Override: Inject observation_location into one-stage extraction.
_create_empty_instance() -> AutoSpatialGraph[NodeSchema, EdgeSchema]
¶
Override: Recreate instance with spatial attributes preserved.
AutoSpatioTemporalGraph¶
hyperextract.types.AutoSpatioTemporalGraph
¶
Bases: AutoGraph[NodeSchema, EdgeSchema]
Generic Spatio-Temporal Graph Extractor (AutoSpatioTemporalGraph).
A flexible implementation supporting user-defined Node and Edge schemas with spatio-temporal awareness:

- Schema Agnosticism: Supports any user-defined Node and Edge Pydantic models.
- Spatio-Temporal-Aware Deduplication: Time and space information are integrated into edge deduplication.
- Dynamic Context Injection: The observation date and location are injected during extraction.
Key Design:
- Decoupled Extractors: time_in_edge_extractor and location_in_edge_extractor are separate.
- Composite Key: Automatically fuses base key, time, and location for unique identification.
Example
```python
from typing import Optional
from pydantic import BaseModel

class MyEntity(BaseModel):
    name: str

class MySTEdge(BaseModel):
    src: str
    dst: str
    relation: str
    time: Optional[str] = None
    place: Optional[str] = None

kg = AutoSpatioTemporalGraph(
    node_schema=MyEntity,
    edge_schema=MySTEdge,
    node_key_extractor=lambda x: x.name,
    edge_key_extractor=lambda x: f"{x.src}|{x.relation}|{x.dst}",
    time_in_edge_extractor=lambda x: x.time or "",
    location_in_edge_extractor=lambda x: x.place or "",
    nodes_in_edge_extractor=lambda x: (x.src, x.dst),
    llm_client=llm,
    embedder=embedder,
    observation_time="2024-01-15",
    observation_location="Beijing",
)
```
Attributes¶
observation_time = observation_time or datetime.now().strftime('%Y-%m-%d')
instance-attribute
¶
observation_location = observation_location or 'Unknown'
instance-attribute
¶
raw_edge_key_extractor = edge_key_extractor
instance-attribute
¶
time_in_edge_extractor = time_in_edge_extractor
instance-attribute
¶
location_in_edge_extractor = location_in_edge_extractor
instance-attribute
¶
_constructor_kwargs = kwargs
instance-attribute
¶
Functions¶
__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], time_in_edge_extractor: Callable[[EdgeSchema], str], location_in_edge_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, str]], llm_client: BaseChatModel, embedder: Embeddings, observation_time: str | None = None, observation_location: str | None = None, extraction_mode: str = 'two_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', prompt: str = '', chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)
¶
Initialize AutoSpatioTemporalGraph.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `node_schema` | `Type[NodeSchema]` | User-defined Node Pydantic model. | *required* |
| `edge_schema` | `Type[EdgeSchema]` | User-defined Edge Pydantic model with time/space fields. | *required* |
| `node_key_extractor` | `Callable[[NodeSchema], str]` | Function to extract a unique key from a node. | *required* |
| `edge_key_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the base unique identifier for an edge. | *required* |
| `time_in_edge_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the time component. | *required* |
| `location_in_edge_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the spatial component. | *required* |
| `nodes_in_edge_extractor` | `Callable[[EdgeSchema], Tuple[str, str]]` | Function to extract `(source_key, target_key)` from an edge. | *required* |
| `llm_client` | `BaseChatModel` | LangChain BaseChatModel for extraction. | *required* |
| `embedder` | `Embeddings` | LangChain Embeddings for semantic operations. | *required* |
| `observation_time` | `str \| None` | Date context for relative time resolution; defaults to today in YYYY-MM-DD format. | `None` |
| `observation_location` | `str \| None` | Location context for spatial resolution; defaults to "Unknown". | `None` |
| `extraction_mode` | `str` | `"one_stage"` or `"two_stage"`. | `'two_stage'` |
| `node_strategy_or_merger` | `MergeStrategy \| BaseMerger` | Merge strategy for duplicate nodes. | `BALANCED` |
| `edge_strategy_or_merger` | `MergeStrategy \| BaseMerger` | Merge strategy for duplicate edges. | `BALANCED` |
| `prompt_for_node_extraction` | `str` | Additional user prompt appended to the node extraction system prompt. | `''` |
| `prompt_for_edge_extraction` | `str` | Additional user prompt appended to the edge extraction system prompt. | `''` |
| `prompt` | `str` | Additional user prompt appended to the one-stage graph extraction system prompt. | `''` |
| `chunk_size` | `int` | Size of text chunks for processing. | `2048` |
| `chunk_overlap` | `int` | Overlap between text chunks. | `256` |
| `max_workers` | `int` | Maximum number of concurrent LLM calls. | `10` |
| `verbose` | `bool` | Whether to print verbose output. | `False` |
| `node_fields_for_index` | `List[str] \| None` | Optional list of node fields to include in the vector index. | `None` |
| `edge_fields_for_index` | `List[str] \| None` | Optional list of edge fields to include in the vector index. | `None` |
| `**kwargs` | `Any` | Additional arguments passed to `create_merger()` when a strategy is given as a `MergeStrategy` enum; ignored when a `BaseMerger` instance is passed. | `{}` |
_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]
¶
Inject observation_time and observation_location into edge extraction.
_extract_data_by_one_stage(text: str) -> Any
¶
Inject observation_time and observation_location into one-stage extraction.
_create_empty_instance() -> AutoSpatioTemporalGraph[NodeSchema, EdgeSchema]
¶
Recreate instance with all spatio-temporal attributes.