Graph Types¶
Auto types for representing relationships between entities.
AutoGraph¶
hyperextract.types.AutoGraph
¶
Bases: BaseAutoType[AutoGraphSchema[NodeSchema, EdgeSchema]], Generic[NodeSchema, EdgeSchema]
AutoGraph - extracts knowledge graphs with nodes and edges from text.
This pattern extracts structured knowledge graphs consisting of entities (nodes) and their relationships (edges). Suitable for entity relationship extraction, knowledge graph construction, and semantic network building.
Key characteristics
- Extraction target: Graph structure with nodes and edges
- Deduplication: Automatic deduplication for both nodes and edges using OMem
- Node merge strategy: Configurable (LLM-powered intelligent merging by default)
- Edge merge strategy: Configurable (simple merge by default)
- Extraction modes:
- one_stage: Extract nodes and edges simultaneously (faster, simpler prompt)
- two_stage: Extract nodes first, then edges with node context (more accurate)
- Consistency validation: Ensures edges only connect existing nodes
Example
```python
class Entity(BaseModel):
    name: str
    type: str
    properties: dict = {}

class Relation(BaseModel):
    source: str
    target: str
    relation_type: str

graph = AutoGraph(
    node_schema=Entity,
    edge_schema=Relation,
    node_key_extractor=lambda x: x.name,
    edge_key_extractor=lambda x: f"{x.source}-{x.relation_type}-{x.target}",
    nodes_in_edge_extractor=lambda x: (x.source, x.target),
    llm_client=llm,
    embedder=embedder,
    extraction_mode="two_stage",
)
```
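The extractor callables above are plain functions over schema instances. A minimal, dependency-free sketch of the contract they must satisfy, using dataclasses as stand-ins for the Pydantic models (names and values are illustrative):

```python
from dataclasses import dataclass, field

# Stand-ins for the Pydantic models above (dataclasses keep the sketch dependency-free).
@dataclass
class Entity:
    name: str
    type: str
    properties: dict = field(default_factory=dict)

@dataclass
class Relation:
    source: str
    target: str
    relation_type: str

# The callables passed to AutoGraph: keys drive deduplication in OMem,
# and nodes_in_edge drives the edge consistency check.
node_key = lambda x: x.name
edge_key = lambda x: f"{x.source}-{x.relation_type}-{x.target}"
nodes_in_edge = lambda x: (x.source, x.target)

ada = Entity(name="Ada Lovelace", type="person")
wrote = Relation(source="Ada Lovelace", target="Note G", relation_type="wrote")

print(node_key(ada))         # Ada Lovelace
print(edge_key(wrote))       # Ada Lovelace-wrote-Note G
print(nodes_in_edge(wrote))  # ('Ada Lovelace', 'Note G')
```

Keys must be stable and unique per entity: two mentions of the same entity should produce the same key so that deduplication can merge them.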
Attributes¶
graph_schema: Type[AutoGraphSchema[NodeSchema, EdgeSchema]] = create_model(graph_schema_name, nodes=(List[node_schema], Field(default_factory=list)), edges=(List[edge_schema], Field(default_factory=list)))
instance-attribute
¶
node_schema = node_schema
instance-attribute
¶
edge_schema = edge_schema
instance-attribute
¶
node_key_extractor = node_key_extractor
instance-attribute
¶
edge_key_extractor = edge_key_extractor
instance-attribute
¶
nodes_in_edge_extractor = nodes_in_edge_extractor
instance-attribute
¶
extraction_mode = extraction_mode
instance-attribute
¶
node_fields_for_index = node_fields_for_index
instance-attribute
¶
edge_fields_for_index = edge_fields_for_index
instance-attribute
¶
_constructor_kwargs = kwargs
instance-attribute
¶
node_list_schema = create_model('NodeList', items=(List[node_schema], Field(default_factory=list, description='Extracted nodes')))
instance-attribute
¶
edge_list_schema = create_model('EdgeList', items=(List[edge_schema], Field(default_factory=list, description='Extracted edges')))
instance-attribute
¶
node_merger = node_strategy_or_merger
instance-attribute
¶
edge_merger = edge_strategy_or_merger
instance-attribute
¶
_node_memory = OMem(memory_schema=node_schema, key_extractor=node_key_extractor, llm_client=llm_client, embedder=embedder, strategy_or_merger=(self.node_merger), verbose=verbose, fields_for_index=node_fields_for_index)
instance-attribute
¶
_edge_memory = OMem(memory_schema=edge_schema, key_extractor=edge_key_extractor, llm_client=llm_client, embedder=embedder, strategy_or_merger=(self.edge_merger), verbose=verbose, fields_for_index=edge_fields_for_index)
instance-attribute
¶
_node_label_extractor = node_label_extractor
instance-attribute
¶
_edge_label_extractor = edge_label_extractor
instance-attribute
¶
node_prompt = prompt_for_node_extraction or DEFAULT_NODE_PROMPT
instance-attribute
¶
edge_prompt = prompt_for_edge_extraction or DEFAULT_EDGE_PROMPT
instance-attribute
¶
prompt_template = ChatPromptTemplate.from_template(self.node_prompt)
instance-attribute
¶
node_extractor = self.prompt_template | self.llm_client.with_structured_output(self.node_list_schema)
instance-attribute
¶
edge_prompt_template = ChatPromptTemplate.from_template(self.edge_prompt)
instance-attribute
¶
edge_extractor = self.edge_prompt_template | self.llm_client.with_structured_output(self.edge_list_schema)
instance-attribute
¶
data: AutoGraphSchema[NodeSchema, EdgeSchema]
property
¶
Returns the current graph state (nodes and edges).
Returns:
| Type | Description |
|---|---|
| AutoGraphSchema[NodeSchema, EdgeSchema] | AutoGraphSchema containing all nodes and edges. |
nodes: List[NodeSchema]
property
¶
Returns the current node collection.
Returns:
| Type | Description |
|---|---|
| List[NodeSchema] | List of nodes. |
edges: List[EdgeSchema]
property
¶
Returns the current edge collection.
Returns:
| Type | Description |
|---|---|
| List[EdgeSchema] | List of edges. |
Functions¶
__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, str]], llm_client: BaseChatModel, embedder: Embeddings, *, extraction_mode: str = 'one_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt: str = '', prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', node_label_extractor: Callable[[NodeSchema], str] = None, edge_label_extractor: Callable[[EdgeSchema], str] = None, chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)
¶
Initialize AutoGraph with node/edge schemas and configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_schema | Type[NodeSchema] | Pydantic BaseModel for nodes/entities. | required |
| edge_schema | Type[EdgeSchema] | Pydantic BaseModel for edges/relationships. | required |
| node_key_extractor | Callable[[NodeSchema], str] | Function to extract a unique key from a node (e.g., lambda x: x.id). | required |
| edge_key_extractor | Callable[[EdgeSchema], str] | Function to extract a unique key from an edge (e.g., lambda x: f"{x.src}-{x.rel}-{x.dst}"). | required |
| nodes_in_edge_extractor | Callable[[EdgeSchema], Tuple[str, str]] | Function to extract (source_key, target_key) node keys from an edge for validation. | required |
| llm_client | BaseChatModel | Language model client for extraction. | required |
| embedder | Embeddings | Embedding model for vector indexing. | required |
| extraction_mode | str | "one_stage" (extract nodes and edges together) or "two_stage" (nodes first, then edges). | 'one_stage' |
| node_strategy_or_merger | MergeStrategy \| BaseMerger | Merge strategy for duplicate nodes. | LLM.BALANCED |
| edge_strategy_or_merger | MergeStrategy \| BaseMerger | Merge strategy for duplicate edges. | LLM.BALANCED |
| prompt | str | Custom extraction prompt for one-stage mode. | '' |
| prompt_for_node_extraction | str | Custom extraction prompt for two-stage node extraction. | '' |
| prompt_for_edge_extraction | str | Custom extraction prompt for two-stage edge extraction. | '' |
| node_label_extractor | Callable[[NodeSchema], str] | Optional function to extract a label from a node for visualization. | None |
| edge_label_extractor | Callable[[EdgeSchema], str] | Optional function to extract a label from an edge for visualization. | None |
| chunk_size | int | Maximum characters per chunk. | 2048 |
| chunk_overlap | int | Overlapping characters between chunks. | 256 |
| max_workers | int | Maximum concurrent extraction tasks. | 10 |
| verbose | bool | Whether to log progress. | False |
| node_fields_for_index | List[str] \| None | Optional list of field names in node_schema to include in the vector index. If None, all text fields are indexed. Example: ['name', 'description'] indexes only those node fields. | None |
| edge_fields_for_index | List[str] \| None | Optional list of field names in edge_schema to include in the vector index. If None, all text fields are indexed. Example: ['relation_type', 'description'] indexes only those edge fields. | None |
| **kwargs | Any | Additional arguments passed to create_merger() when strategy_or_merger is a MergeStrategy enum; ignored if it is a BaseMerger instance. | {} |
_default_prompt() -> str
¶
Returns the default prompt for one-stage graph extraction.
_create_empty_instance() -> AutoGraph[NodeSchema, EdgeSchema]
¶
Creates a new empty AutoGraph instance with the same configuration as this one.
Overrides parent method to handle AutoGraph-specific parameters.
Returns:
| Type | Description |
|---|---|
| AutoGraph[NodeSchema, EdgeSchema] | A new empty AutoGraph instance with identical configuration. |
empty() -> bool
¶
Checks if the graph is empty (no nodes).
Returns:
| Type | Description |
|---|---|
| bool | True if the node collection is empty, False otherwise. |
_init_data_state() -> None
¶
Initialize or reset graph data structures.
_init_index_state() -> None
¶
Initialize vector index to empty state.
_set_data_state(data: AutoGraphSchema[NodeSchema, EdgeSchema]) -> None
¶
Replace graph data with new data (full reset).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | AutoGraphSchema[NodeSchema, EdgeSchema] | New graph data to set. | required |
_update_data_state(incoming_data: AutoGraphSchema[NodeSchema, EdgeSchema]) -> None
¶
Merge incoming graph data into current state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| incoming_data | AutoGraphSchema[NodeSchema, EdgeSchema] | Incremental graph data to merge. | required |
_extract_data(text: str) -> AutoGraphSchema[NodeSchema, EdgeSchema]
¶
Main extraction logic dispatcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Input text to extract the graph from. | required |

Returns:

| Type | Description |
|---|---|
| AutoGraphSchema[NodeSchema, EdgeSchema] | Extracted and validated graph. |
_extract_data_by_one_stage(text: str) -> AutoGraphSchema[NodeSchema, EdgeSchema]
¶
Extract nodes and edges simultaneously using single LLM call.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Input text. | required |

Returns:

| Type | Description |
|---|---|
| AutoGraphSchema[NodeSchema, EdgeSchema] | Raw extracted graph data. |
_extract_data_by_two_stage(text: str) -> AutoGraphSchema[NodeSchema, EdgeSchema]
¶
Extract nodes first, then edges with node context (batch processing).
Process:
1. Split text into chunks.
2. Batch extract nodes for all chunks.
3. Batch extract edges for all chunks, using chunk-specific nodes as context.
4. Construct partial graphs (tuples of nodes/edges).
5. Merge all partial graphs into one global graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Input text. | required |

Returns:

| Type | Description |
|---|---|
| AutoGraphSchema[NodeSchema, EdgeSchema] | Extracted and validated graph. |
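The two-stage flow can be sketched end to end with hypothetical stub extractors standing in for the LLM calls. Everything below is illustrative (the chunking and the stubs are not the library's implementation); only the shape of the pipeline mirrors the documented process:

```python
from typing import List, Tuple

# Stage 0: naive character chunking (sizes mirror the documented defaults).
def chunk(text: str, size: int = 2048, overlap: int = 256) -> List[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Hypothetical stub: pretend every title-cased word is an entity.
def extract_nodes(chunk_text: str) -> List[str]:
    words = [w.strip(".,") for w in chunk_text.split()]
    return sorted({w for w in words if w.istitle()})

# Hypothetical stub: pretend adjacent entities in a chunk are related.
def extract_edges(chunk_text: str, nodes: List[str]) -> List[Tuple[str, str]]:
    return [(a, b) for a, b in zip(nodes, nodes[1:])]

def two_stage(text: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    chunks = chunk(text)
    node_lists = [extract_nodes(c) for c in chunks]           # stage 1, batched
    edge_lists = [extract_edges(c, n)                         # stage 2, with node context
                  for c, n in zip(chunks, node_lists)]
    nodes = sorted({n for ns in node_lists for n in ns})      # merge partial graphs
    edges = sorted({e for es in edge_lists for e in es})
    return nodes, edges

nodes, edges = two_stage("Alice met Bob in Paris.")
print(nodes)  # ['Alice', 'Bob', 'Paris']
print(edges)  # [('Alice', 'Bob'), ('Bob', 'Paris')]
```

In the real class the stub stages are LLM chains with structured output, and the merge step deduplicates through OMem rather than a plain set.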
_extract_nodes_batch(chunks: List[str]) -> List[NodeListSchema[NodeSchema]]
¶
Batch extract nodes from multiple text chunks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunks | List[str] | List of text chunks. | required |

Returns:

| Type | Description |
|---|---|
| List[NodeListSchema[NodeSchema]] | List of NodeListSchema objects with extracted nodes. |
_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]
¶
Batch extract edges using corresponding node lists as context.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunks | List[str] | List of text chunks. | required |
| node_lists | List[NodeListSchema[NodeSchema]] | List of NodeListSchema objects (one per chunk). | required |

Returns:

| Type | Description |
|---|---|
| List[EdgeListSchema[EdgeSchema]] | List of EdgeListSchema objects with extracted edges. |
_prune_dangling_edges(graph: AutoGraphSchema[NodeSchema, EdgeSchema]) -> AutoGraphSchema[NodeSchema, EdgeSchema]
¶
Prune edges that connect to non-existent nodes (Consistency Check).
Ensures graph consistency by removing any edges where either endpoint (source or target) does not exist in the node list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | AutoGraphSchema[NodeSchema, EdgeSchema] | Raw graph that may contain dangling edges. | required |

Returns:

| Type | Description |
|---|---|
| AutoGraphSchema[NodeSchema, EdgeSchema] | Graph containing only valid edges (both endpoints must exist in the node list). |
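The invariant this check enforces can be shown with plain dicts — a sketch of the rule, not the actual implementation:

```python
# An edge survives only if both of its endpoints resolve to an existing node key.
nodes = [{"name": "Alice"}, {"name": "Bob"}]
edges = [
    {"source": "Alice", "target": "Bob", "relation_type": "knows"},
    {"source": "Alice", "target": "Carol", "relation_type": "knows"},  # Carol was never extracted
]

node_keys = {n["name"] for n in nodes}  # same keys node_key_extractor would produce
valid_edges = [e for e in edges
               if e["source"] in node_keys and e["target"] in node_keys]

print(len(valid_edges))  # 1: the Alice-Carol edge is pruned
```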
merge_batch_data(data_list_or_tuple: List[AutoGraphSchema[NodeSchema, EdgeSchema]] | Tuple[List[List[NodeSchema]], List[List[EdgeSchema]]]) -> AutoGraphSchema[NodeSchema, EdgeSchema]
¶
Merge multiple graphs or node/edge tuples into one.
Supports two input formats:
- List of AutoGraphSchema objects (standard format)
- Tuple of (List[List[NodeSchema]], List[List[EdgeSchema]]) (optimization for batch processing)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_list_or_tuple | List[AutoGraphSchema[NodeSchema, EdgeSchema]] \| Tuple[List[List[NodeSchema]], List[List[EdgeSchema]]] | Either a list of AutoGraphSchema objects or a tuple of (nodes_lists, edges_lists), where each inner list contains items from one chunk. | required |

Returns:

| Type | Description |
|---|---|
| AutoGraphSchema[NodeSchema, EdgeSchema] | Merged graph. |
build_index(index_nodes: bool = True, index_edges: bool = True)
¶
Build vector index for graph search.
By default, builds indices for both nodes and edges to support comprehensive search.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| index_nodes | bool | Whether to index nodes. | True |
| index_edges | bool | Whether to index edges. | True |
build_node_index() -> None
¶
Build vector index specifically for nodes.
build_edge_index() -> None
¶
Build vector index specifically for edges.
search(query: str, top_k_nodes: int = 3, top_k_edges: int = 3, top_k: int | None = None) -> Tuple[List[NodeSchema], List[EdgeSchema]]
¶
Unified graph search interface.
Retrieves nodes and edges semantically related to the query. Always returns a tuple of (nodes, edges). If a count is 0, the corresponding list will be empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string. | required |
| top_k_nodes | int | Number of node results to return. Set to 0 to disable node search. | 3 |
| top_k_edges | int | Number of edge results to return. Set to 0 to disable edge search. | 3 |
| top_k | int \| None | If provided, sets both top_k_nodes and top_k_edges to this value. | None |

Returns:

| Type | Description |
|---|---|
| Tuple[List[NodeSchema], List[EdgeSchema]] | A tuple of matching nodes (empty if top_k_nodes <= 0) and matching edges (empty if top_k_edges <= 0). |

Raises:

| Type | Description |
|---|---|
| ValueError | If both top_k_nodes and top_k_edges are <= 0. |
| ValueError | If a search is requested (top_k > 0) but the corresponding index has not been built. |
search_nodes(query: str, top_k: int = 3) -> List[NodeSchema]
¶
Semantic search for nodes/entities only.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string. | required |
| top_k | int | Number of results to return. | 3 |

Returns:

| Type | Description |
|---|---|
| List[NodeSchema] | List of nodes matched by semantic similarity. |
search_edges(query: str, top_k: int = 3) -> List[EdgeSchema]
¶
Semantic search for edges/relationships only.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string. | required |
| top_k | int | Number of results to return. | 3 |

Returns:

| Type | Description |
|---|---|
| List[EdgeSchema] | List of edges matched by semantic similarity. |
chat(query: str, top_k: int | None = None, top_k_nodes: int = 3, top_k_edges: int = 3) -> AIMessage
¶
Performs a chat-like interaction using graph knowledge.
Retrieves relevant nodes and/or edges based on the query and retrieval counts, formats them into a structured context with clear headers, and generates an answer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | User query string. | required |
| top_k | int \| None | If provided, sets both top_k_nodes and top_k_edges to this value. | None |
| top_k_nodes | int | Number of relevant nodes to retrieve. Set to 0 to disable node context. | 3 |
| top_k_edges | int | Number of relevant edges to retrieve. Set to 0 to disable edge context. | 3 |

Returns:

| Type | Description |
|---|---|
| AIMessage | The LLM-generated response; access the text content via response.content. |
Example
Chat using 5 nodes and 2 edges as context¶
```python
response = ka.chat("What is X?", top_k_nodes=5, top_k_edges=2)
print(response.content)  # Print the generated answer
```
dump_index(folder_path: str | Path) -> None
¶
Save indices to disk.
load_index(folder_path: str | Path) -> None
¶
Load indices from disk.
show(node_label_extractor: Callable[[NodeSchema], str] = None, edge_label_extractor: Callable[[EdgeSchema], str] = None, *, top_k_nodes_for_search: int = 3, top_k_edges_for_search: int = 3, top_k_nodes_for_chat: int = 3, top_k_edges_for_chat: int = 3) -> None
¶
Visualize the graph using OntoSight.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_label_extractor | Callable[[NodeSchema], str] | Optional function to extract a label from a node for visualization. If not provided, uses the one from init. | None |
| edge_label_extractor | Callable[[EdgeSchema], str] | Optional function to extract a label from an edge for visualization. If not provided, uses the one from init. | None |
| top_k_nodes_for_search | int | Number of nodes to retrieve for the search callback. | 3 |
| top_k_edges_for_search | int | Number of edges to retrieve for the search callback. | 3 |
| top_k_nodes_for_chat | int | Number of nodes to retrieve for the chat callback. | 3 |
| top_k_edges_for_chat | int | Number of edges to retrieve for the chat callback. | 3 |
AutoHypergraph¶
hyperextract.types.AutoHypergraph
¶
Bases: BaseAutoType[AutoHypergraphSchema[NodeSchema, EdgeSchema]], Generic[NodeSchema, EdgeSchema]
AutoHypergraph - Node-First extraction: Extract entities first, then relationships.
Extraction strategy: two-stage (node-first)
1. Stage 1: Extract all nodes (entities) from the text.
2. Stage 2: Extract edges (relationships) using the identified nodes as context.

Suitable for:
- Events with multiple participants (e.g., meetings, transactions)
- Group memberships
- Complex dependencies
The nodes_in_edge_extractor must return a Tuple of ALL node keys involved in the edge.
Validation uses 'Strict Mode': checks if ALL nodes in that tuple exist.
Example
```python
class Person(BaseModel):
    name: str

class Event(BaseModel):
    label: str  # e.g., "meeting", "collaboration"
    participants: List[str]  # IDs of people involved

hypergraph = AutoHypergraph(
    node_schema=Person,
    edge_schema=Event,
    node_key_extractor=lambda x: x.name,
    # CRITICAL: Sort participants so {A, B} and {B, A} map to the same key
    edge_key_extractor=lambda x: f"{x.label}_{sorted(x.participants)}",
    nodes_in_edge_extractor=lambda x: tuple(x.participants),
    llm_client=llm,
    embedder=embedder,
    extraction_mode="two_stage",
)
```
Attributes¶
graph_schema: Type[AutoHypergraphSchema[NodeSchema, EdgeSchema]] = create_model(graph_schema_name, nodes=(List[node_schema], Field(default_factory=list)), edges=(List[edge_schema], Field(default_factory=list)))
instance-attribute
¶
node_schema = node_schema
instance-attribute
¶
edge_schema = edge_schema
instance-attribute
¶
node_key_extractor = node_key_extractor
instance-attribute
¶
edge_key_extractor = edge_key_extractor
instance-attribute
¶
nodes_in_edge_extractor = nodes_in_edge_extractor
instance-attribute
¶
extraction_mode = extraction_mode
instance-attribute
¶
chunk_size = chunk_size
instance-attribute
¶
chunk_overlap = chunk_overlap
instance-attribute
¶
max_workers = max_workers
instance-attribute
¶
verbose = verbose
instance-attribute
¶
node_fields_for_index = node_fields_for_index
instance-attribute
¶
edge_fields_for_index = edge_fields_for_index
instance-attribute
¶
node_list_schema = create_model(f'{node_schema.__name__}List', items=(List[node_schema], Field(default_factory=list)))
instance-attribute
¶
edge_list_schema = create_model(f'{edge_schema.__name__}List', items=(List[edge_schema], Field(default_factory=list)))
instance-attribute
¶
node_merger = node_strategy_or_merger
instance-attribute
¶
edge_merger = edge_strategy_or_merger
instance-attribute
¶
_node_memory = OMem(memory_schema=node_schema, key_extractor=node_key_extractor, llm_client=llm_client, embedder=embedder, strategy_or_merger=(self.node_merger), verbose=verbose, fields_for_index=node_fields_for_index)
instance-attribute
¶
_edge_memory = OMem(memory_schema=edge_schema, key_extractor=edge_key_extractor, llm_client=llm_client, embedder=embedder, strategy_or_merger=(self.edge_merger), verbose=verbose, fields_for_index=edge_fields_for_index)
instance-attribute
¶
_node_label_extractor = node_label_extractor
instance-attribute
¶
_edge_label_extractor = edge_label_extractor
instance-attribute
¶
node_prompt = prompt_for_node_extraction or DEFAULT_NODE_PROMPT
instance-attribute
¶
edge_prompt = prompt_for_edge_extraction or DEFAULT_EDGE_PROMPT
instance-attribute
¶
prompt_template = ChatPromptTemplate.from_template(self.node_prompt)
instance-attribute
¶
node_extractor = self.prompt_template | self.llm_client.with_structured_output(self.node_list_schema)
instance-attribute
¶
edge_prompt_template = ChatPromptTemplate.from_template(self.edge_prompt)
instance-attribute
¶
edge_extractor = self.edge_prompt_template | self.llm_client.with_structured_output(self.edge_list_schema)
instance-attribute
¶
data: AutoHypergraphSchema
property
¶
Returns the current hypergraph state.
nodes: List[NodeSchema]
property
¶
Returns the current node collection.
edges: List[EdgeSchema]
property
¶
Returns the current hyperedge collection.
Functions¶
__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, ...]], llm_client: BaseChatModel, embedder: Embeddings, *, extraction_mode: str = 'two_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt: str = '', prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', node_label_extractor: Callable[[NodeSchema], str] = None, edge_label_extractor: Callable[[EdgeSchema], str] = None, chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)
¶
Initialize AutoHypergraph with node/edge schemas and configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_schema | Type[NodeSchema] | Pydantic BaseModel for nodes/entities. | required |
| edge_schema | Type[EdgeSchema] | Pydantic BaseModel for hyperedges/relationships. | required |
| node_key_extractor | Callable[[NodeSchema], str] | Function to extract a unique key from a node. | required |
| edge_key_extractor | Callable[[EdgeSchema], str] | Function to extract a unique key from a hyperedge. CRITICAL: since hyperedges are unordered sets of nodes ({A, B} == {B, A}), you MUST sort the participating node keys in this function to ensure consistent deduplication, e.g. lambda x: f"{x.relation_type}_{sorted(x.participants)}". Without sorting, two hyperedges with the same nodes in a different order are treated as different edges, breaking deduplication. | required |
| nodes_in_edge_extractor | Callable[[EdgeSchema], Tuple[str, ...]] | Function to extract ALL node keys from a hyperedge as a tuple (e.g., lambda x: tuple(x.participants)). | required |
| llm_client | BaseChatModel | Language model client for extraction. | required |
| embedder | Embeddings | Embedding model for vector indexing. | required |
| extraction_mode | str | "two_stage" (recommended) or "one_stage". | 'two_stage' |
| node_strategy_or_merger | MergeStrategy \| BaseMerger | Merge strategy for duplicate nodes. | LLM.BALANCED |
| edge_strategy_or_merger | MergeStrategy \| BaseMerger | Merge strategy for duplicate hyperedges. | LLM.BALANCED |
| prompt | str | Custom extraction prompt for one-stage mode. | '' |
| prompt_for_node_extraction | str | Custom extraction prompt for node extraction. | '' |
| prompt_for_edge_extraction | str | Custom extraction prompt for hyperedge extraction. | '' |
| node_label_extractor | Callable[[NodeSchema], str] | Optional function to extract a label from a node for visualization. | None |
| edge_label_extractor | Callable[[EdgeSchema], str] | Optional function to extract a label from an edge for visualization. | None |
| chunk_size | int | Maximum characters per chunk. | 2048 |
| chunk_overlap | int | Overlapping characters between chunks. | 256 |
| max_workers | int | Maximum concurrent extraction tasks. | 10 |
| verbose | bool | Whether to display detailed execution logs and progress information. | False |
| node_fields_for_index | List[str] \| None | Optional list of field names in node_schema to include in the vector index. If None, all text fields are indexed. Example: ['name', 'properties'] indexes only those node fields. | None |
| edge_fields_for_index | List[str] \| None | Optional list of field names in edge_schema to include in the vector index. If None, all text fields are indexed. Example: ['summary', 'type'] indexes only those edge fields. | None |
| **kwargs | Any | Additional arguments for merger creation. | {} |
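The sorting requirement on edge_key_extractor can be demonstrated in isolation. The label and participant values here are hypothetical; only the key construction matters:

```python
# Hyperedges are unordered sets, so the same event reported with its
# participants in a different order must map to the same deduplication key.
unsorted_key = lambda label, participants: f"{label}_{participants}"
sorted_key   = lambda label, participants: f"{label}_{sorted(participants)}"

a = ["Alice", "Bob", "Carol"]
b = ["Carol", "Alice", "Bob"]  # same meeting, different order

print(unsorted_key("meeting", a) == unsorted_key("meeting", b))  # False: treated as two edges
print(sorted_key("meeting", a) == sorted_key("meeting", b))      # True: deduplicated correctly
```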
_default_prompt() -> str
¶
Returns the default prompt for one-stage hypergraph extraction.
This is the primary prompt used when extraction_mode is 'one_stage'.
_create_empty_instance() -> AutoHypergraph[NodeSchema, EdgeSchema]
¶
Creates a new empty AutoHypergraph instance with the same configuration.
Overrides parent method to handle AutoHypergraph-specific parameters.
Returns:
| Type | Description |
|---|---|
| AutoHypergraph[NodeSchema, EdgeSchema] | A new empty AutoHypergraph instance with identical configuration. |
_init_data_state() -> None
¶
Initialize or reset hypergraph data structures.
_init_index_state() -> None
¶
Initialize vector index to empty state.
_set_data_state(data: AutoHypergraphSchema) -> None
¶
Replace hypergraph data with new data (full reset).
_update_data_state(incoming_data: AutoHypergraphSchema) -> None
¶
Merge incoming hypergraph data into current state.
empty() -> bool
¶
Checks if the hypergraph is empty.
_extract_data(text: str) -> AutoHypergraphSchema
¶
Main extraction logic dispatcher.
_extract_data_by_one_stage(text: str) -> AutoHypergraphSchema
¶
Extract nodes and edges simultaneously using single LLM call.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Input text. | required |

Returns:

| Type | Description |
|---|---|
| AutoHypergraphSchema | Raw extracted hypergraph data. |
_extract_data_by_two_stage(text: str) -> AutoHypergraphSchema
¶
Extract nodes first, then hyperedges with node context (batch processing).
Process:
1. Split text into chunks.
2. Batch extract nodes for all chunks.
3. Batch extract hyperedges for all chunks, using chunk-specific nodes as context.
4. Merge all partial results into one global hypergraph.
_extract_nodes_batch(chunks: List[str]) -> List[NodeListSchema[NodeSchema]]
¶
Batch extract nodes from multiple text chunks.
_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]
¶
Batch extract hyperedges using corresponding node lists as context.
_prune_dangling_edges(graph: AutoHypergraphSchema) -> AutoHypergraphSchema
¶
Prune hyperedges where ANY participating node does not exist (Strict Consistency).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | AutoHypergraphSchema | Raw hypergraph that may contain dangling hyperedges. | required |

Returns:

| Type | Description |
|---|---|
| AutoHypergraphSchema | Hypergraph containing only valid hyperedges (ALL participants must exist in the node list). |
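The strict all-participants rule can be illustrated with plain dicts — a sketch of the invariant, not the library code:

```python
# Strict-mode pruning: a hyperedge survives only if EVERY participant
# exists in the node set; one missing participant invalidates the whole edge.
node_keys = {"Alice", "Bob"}
hyperedges = [
    {"label": "meeting", "participants": ["Alice", "Bob"]},
    {"label": "meeting", "participants": ["Alice", "Bob", "Carol"]},  # Carol missing
]

valid = [e for e in hyperedges
         if all(p in node_keys for p in e["participants"])]

print(len(valid))  # 1: the edge with the missing participant is pruned
```

This is stricter than the pairwise check in AutoGraph, where an edge has exactly two endpoints to validate.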
merge_batch_data(data_list_or_tuple: List[AutoHypergraphSchema] | Tuple[List[List[NodeSchema]], List[List[EdgeSchema]]]) -> AutoHypergraphSchema
¶
Merge multiple hypergraphs or node/edge tuples into one.
Supports two input formats:
- List of AutoHypergraphSchema objects (standard format)
- Tuple of (List[List[NodeSchema]], List[List[EdgeSchema]]) (optimization for batch processing)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_list_or_tuple | List[AutoHypergraphSchema] \| Tuple[List[List[NodeSchema]], List[List[EdgeSchema]]] | Either a list of AutoHypergraphSchema objects or a tuple of (nodes_lists, edges_lists), where each inner list contains items from one chunk. | required |

Returns:

| Type | Description |
|---|---|
| AutoHypergraphSchema | Merged hypergraph. |
build_index(index_nodes: bool = True, index_edges: bool = True)
¶
Build vector index for hypergraph search.
By default, builds indices for both nodes and hyperedges to support comprehensive search.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| index_nodes | bool | Whether to index nodes. | True |
| index_edges | bool | Whether to index hyperedges. | True |
build_node_index() -> None
¶
Build vector index specifically for nodes.
build_edge_index() -> None
¶
Build vector index specifically for hyperedges.
search(query: str, top_k_nodes: int = 3, top_k_edges: int = 3, top_k: int | None = None) -> Tuple[List[NodeSchema], List[EdgeSchema]]
¶
Unified hypergraph search interface.
Retrieves nodes and hyperedges semantically related to the query. Always returns a tuple of (nodes, edges). If a count is 0, the corresponding list will be empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string. | required |
| top_k_nodes | int | Number of node results to return. Set to 0 to disable node search. | 3 |
| top_k_edges | int | Number of hyperedge results to return. Set to 0 to disable hyperedge search. | 3 |
| top_k | int \| None | If provided, sets both top_k_nodes and top_k_edges to this value. | None |

Returns:

| Type | Description |
|---|---|
| Tuple[List[NodeSchema], List[EdgeSchema]] | A tuple of matching nodes (empty if top_k_nodes <= 0) and matching hyperedges (empty if top_k_edges <= 0). |

Raises:

| Type | Description |
|---|---|
| ValueError | If both top_k_nodes and top_k_edges are <= 0. |
| ValueError | If a search is requested (top_k > 0) but the corresponding index has not been built. |
search_nodes(query: str, top_k: int = 3) -> List[NodeSchema]
¶
Semantic search for nodes/entities only.
search_edges(query: str, top_k: int = 3) -> List[EdgeSchema]
¶
Semantic search for hyperedges/relationships only.
chat(query: str, top_k_nodes: int = 3, top_k_edges: int = 3, top_k: int | None = None) -> AIMessage
¶
Performs a chat-like interaction using hypergraph knowledge.
Retrieves relevant nodes and/or hyperedges based on the query and retrieval counts, formats them into a structured context with clear headers, and generates an answer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | User query string. | required |
| top_k_nodes | int | Number of relevant nodes to retrieve. Set to 0 to disable node context. | 3 |
| top_k_edges | int | Number of relevant hyperedges to retrieve. Set to 0 to disable edge context. | 3 |
| top_k | int \| None | If provided, sets both top_k_nodes and top_k_edges to this value. | None |

Returns:

| Type | Description |
|---|---|
| AIMessage | The LLM-generated response; access the text content via response.content. |
Example

Chat using 5 nodes and 2 hyperedges as context¶

```python
response = hg.chat("What participates in X?", top_k_nodes=5, top_k_edges=2)
print(response.content)  # Print the generated answer
```
dump_index(folder_path: str | Path) -> None
¶
Save indices to disk.
load_index(folder_path: str | Path) -> None
¶
Load indices from disk.
show(node_label_extractor: Callable[[NodeSchema], str] = None, edge_label_extractor: Callable[[EdgeSchema], str] = None, *, top_k_nodes_for_search: int = 3, top_k_edges_for_search: int = 3, top_k_nodes_for_chat: int = 3, top_k_edges_for_chat: int = 3) -> None
¶
Visualize the hypergraph using OntoSight.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `node_label_extractor` | `Callable[[NodeSchema], str]` | Optional function to extract a label from a node for visualization. If not provided, uses the one from `__init__`. | `None` |
| `edge_label_extractor` | `Callable[[EdgeSchema], str]` | Optional function to extract a label from an edge for visualization. If not provided, uses the one from `__init__`. | `None` |
| `top_k_nodes_for_search` | `int` | Number of nodes to retrieve for the search callback. | `3` |
| `top_k_edges_for_search` | `int` | Number of edges to retrieve for the search callback. | `3` |
| `top_k_nodes_for_chat` | `int` | Number of nodes to retrieve for the chat callback. | `3` |
| `top_k_edges_for_chat` | `int` | Number of edges to retrieve for the chat callback. | `3` |
AutoTemporalGraph¶
hyperextract.types.AutoTemporalGraph
¶
Bases: AutoGraph[NodeSchema, EdgeSchema]
Generic Temporal Graph Extractor (AutoTemporalGraph).
A flexible implementation supporting user-defined Node and Edge schemas with temporal awareness:

- Schema Agnosticism: Supports any user-defined Node and Edge Pydantic models.
- Temporal-Aware Deduplication: Time information is integrated into the edge deduplication logic.
- Dynamic Time Injection: The observation date is injected during extraction so that relative time expressions can be resolved.
Key Design:
- temporal_edge_key_extractor: A unified function to extract the unique key for an edge,
including temporal components (e.g., lambda x: f"{x.src}|{x.relation}|{x.dst}|{x.year}").
This ensures (A, rel, B) @ 2020 and (A, rel, B) @ 2021 are treated as different edges.
- Unified Prompt Management: Temporal system prompts are prepended to any user-provided prompts.
Example
```python
from typing import Optional
from pydantic import BaseModel

class MyEntity(BaseModel):
    name: str
    category: str = "Unknown"

class MyTemporalEdge(BaseModel):
    src: str
    dst: str
    relation: str
    year: Optional[str] = None

kg = AutoTemporalGraph(
    node_schema=MyEntity,
    edge_schema=MyTemporalEdge,
    node_key_extractor=lambda x: x.name,
    edge_key_extractor=lambda x: f"{x.src}|{x.relation}|{x.dst}",
    time_in_edge_extractor=lambda x: x.year or "",
    nodes_in_edge_extractor=lambda x: (x.src, x.dst),
    llm_client=llm,
    embedder=embedder,
    observation_time="2024-01-15",
)
```
Attributes¶
observation_time = observation_time or datetime.now().strftime('%Y-%m-%d')
instance-attribute
¶
raw_edge_key_extractor = edge_key_extractor
instance-attribute
¶
time_in_edge_extractor = time_in_edge_extractor
instance-attribute
¶
_constructor_kwargs = kwargs
instance-attribute
¶
Functions¶
__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], time_in_edge_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, str]], llm_client: BaseChatModel, embedder: Embeddings, observation_time: str | None = None, extraction_mode: str = 'two_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', prompt: str = '', chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)
¶
Initialize AutoTemporalGraph.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `node_schema` | `Type[NodeSchema]` | User-defined Node Pydantic model. | *required* |
| `edge_schema` | `Type[EdgeSchema]` | User-defined Edge Pydantic model with time fields. | *required* |
| `node_key_extractor` | `Callable[[NodeSchema], str]` | Function to extract a unique key from a node (e.g., `lambda x: x.name`). | *required* |
| `edge_key_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the base unique identifier for an edge, based purely on entities and relation (e.g., `lambda x: f"{x.src}\|{x.relation}\|{x.dst}"`). | *required* |
| `time_in_edge_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the time component from an edge (e.g., `lambda x: x.year or "permanent"`). Ensures (A, rel, B) @ 2020 and (A, rel, B) @ 2021 are treated as different edges. | *required* |
| `nodes_in_edge_extractor` | `Callable[[EdgeSchema], Tuple[str, str]]` | Function to extract `(source_key, target_key)` from an edge. | *required* |
| `llm_client` | `BaseChatModel` | LangChain BaseChatModel for extraction. | *required* |
| `embedder` | `Embeddings` | LangChain Embeddings for semantic operations. | *required* |
| `observation_time` | `str \| None` | Date context for relative time resolution; defaults to today in YYYY-MM-DD format. | `None` |
| `extraction_mode` | `str` | `"one_stage"` or `"two_stage"`. | `'two_stage'` |
| `node_strategy_or_merger` | `MergeStrategy \| BaseMerger` | Merge strategy for duplicate nodes. | `BALANCED` |
| `edge_strategy_or_merger` | `MergeStrategy \| BaseMerger` | Merge strategy for duplicate edges. | `BALANCED` |
| `prompt_for_node_extraction` | `str` | Additional user prompt appended to the node extraction system prompt. | `''` |
| `prompt_for_edge_extraction` | `str` | Additional user prompt appended to the edge extraction system prompt. | `''` |
| `prompt` | `str` | Additional user prompt appended to the one-stage graph extraction system prompt. | `''` |
| `chunk_size` | `int` | Size of text chunks for processing. | `2048` |
| `chunk_overlap` | `int` | Overlap between text chunks. | `256` |
| `max_workers` | `int` | Maximum number of concurrent LLM calls. | `10` |
| `verbose` | `bool` | Whether to print verbose output. | `False` |
| `node_fields_for_index` | `List[str] \| None` | Optional list of node fields to include in the vector index. | `None` |
| `edge_fields_for_index` | `List[str] \| None` | Optional list of edge fields to include in the vector index. | `None` |
| `**kwargs` | `Any` | Additional arguments passed to `create_merger()` when a strategy is given as a `MergeStrategy` enum; ignored when a `BaseMerger` instance is passed. | `{}` |
_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]
¶
Override: Inject observation_time into edge extraction during two-stage extraction.
_extract_data_by_one_stage(text: str) -> Any
¶
Override: Inject observation_time into one-stage extraction.
_create_empty_instance() -> AutoTemporalGraph[NodeSchema, EdgeSchema]
¶
Override: Recreate instance with all temporal-specific attributes.
AutoSpatialGraph¶
hyperextract.types.AutoSpatialGraph
¶
Bases: AutoGraph[NodeSchema, EdgeSchema]
Generic Spatial Graph Extractor (AutoSpatialGraph).
A flexible implementation supporting user-defined Node and Edge schemas with spatial awareness:

- Schema Agnosticism: Supports any user-defined Node and Edge Pydantic models.
- Location-Aware Deduplication: Spatial information is integrated into edge deduplication.
- Dynamic Localization: The observation location is injected during extraction so that relative location references can be resolved.
Key Design:
- Decoupled Extractors: location_in_edge_extractor is a separate function.
- Spatial Identity: Edges are uniquely identified by (source, relation, target) + location.
Example
```python
from typing import Optional
from pydantic import BaseModel

class MyEntity(BaseModel):
    name: str

class MySpatialEdge(BaseModel):
    src: str
    dst: str
    relation: str
    place: Optional[str] = None

kg = AutoSpatialGraph(
    node_schema=MyEntity,
    edge_schema=MySpatialEdge,
    node_key_extractor=lambda x: x.name,
    edge_key_extractor=lambda x: f"{x.src}|{x.relation}|{x.dst}",
    location_in_edge_extractor=lambda x: x.place or "",
    nodes_in_edge_extractor=lambda x: (x.src, x.dst),
    llm_client=llm,
    embedder=embedder,
    observation_location="Main Hall",
)
```
Attributes¶
observation_location = observation_location or 'Unknown Location'
instance-attribute
¶
_constructor_kwargs = kwargs
instance-attribute
¶
raw_edge_key_extractor = edge_key_extractor
instance-attribute
¶
location_in_edge_extractor = location_in_edge_extractor
instance-attribute
¶
Functions¶
__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], location_in_edge_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, str]], llm_client: BaseChatModel, embedder: Embeddings, *, observation_location: str | None = None, extraction_mode: str = 'two_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', prompt: str = '', chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)
¶
Initialize AutoSpatialGraph.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `node_schema` | `Type[NodeSchema]` | User-defined Node Pydantic model. | *required* |
| `edge_schema` | `Type[EdgeSchema]` | User-defined Edge Pydantic model with location fields. | *required* |
| `node_key_extractor` | `Callable[[NodeSchema], str]` | Function to extract a unique key from a node. | *required* |
| `edge_key_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the base unique identifier for an edge. | *required* |
| `location_in_edge_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the spatial component from an edge. | *required* |
| `nodes_in_edge_extractor` | `Callable[[EdgeSchema], Tuple[str, str]]` | Function to extract `(source_key, target_key)` from an edge. | *required* |
| `llm_client` | `BaseChatModel` | LangChain BaseChatModel for extraction. | *required* |
| `embedder` | `Embeddings` | LangChain Embeddings for semantic operations. | *required* |
| `observation_location` | `str \| None` | Location context for spatial resolution; defaults to "Unknown Location". | `None` |
| `extraction_mode` | `str` | `"one_stage"` or `"two_stage"`. | `'two_stage'` |
| `node_strategy_or_merger` | `MergeStrategy \| BaseMerger` | Merge strategy for duplicate nodes. | `BALANCED` |
| `edge_strategy_or_merger` | `MergeStrategy \| BaseMerger` | Merge strategy for duplicate edges. | `BALANCED` |
| `prompt_for_node_extraction` | `str` | Additional user prompt appended to the node extraction system prompt. | `''` |
| `prompt_for_edge_extraction` | `str` | Additional user prompt appended to the edge extraction system prompt. | `''` |
| `prompt` | `str` | Additional user prompt appended to the one-stage graph extraction system prompt. | `''` |
| `chunk_size` | `int` | Size of text chunks for processing. | `2048` |
| `chunk_overlap` | `int` | Overlap between text chunks. | `256` |
| `max_workers` | `int` | Maximum number of concurrent LLM calls. | `10` |
| `verbose` | `bool` | Whether to print verbose output. | `False` |
| `node_fields_for_index` | `List[str] \| None` | Optional list of node fields to include in the vector index. | `None` |
| `edge_fields_for_index` | `List[str] \| None` | Optional list of edge fields to include in the vector index. | `None` |
| `**kwargs` | `Any` | Additional arguments passed to `create_merger()` when a strategy is given as a `MergeStrategy` enum; ignored when a `BaseMerger` instance is passed. | `{}` |
_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]
¶
Override: Inject observation_location into edge extraction during two-stage extraction.
_extract_data_by_one_stage(text: str) -> Any
¶
Override: Inject observation_location into one-stage extraction.
_create_empty_instance() -> AutoSpatialGraph[NodeSchema, EdgeSchema]
¶
Override: Recreate instance with spatial attributes preserved.
AutoSpatioTemporalGraph¶
hyperextract.types.AutoSpatioTemporalGraph
¶
Bases: AutoGraph[NodeSchema, EdgeSchema]
Generic Spatio-Temporal Graph Extractor (AutoSpatioTemporalGraph).
A flexible implementation supporting user-defined Node and Edge schemas with spatio-temporal awareness:

- Schema Agnosticism: Supports any user-defined Node and Edge Pydantic models.
- Spatio-Temporal-Aware Deduplication: Time and space information are integrated into edge deduplication.
- Dynamic Context Injection: The observation date and location are injected during extraction.
Key Design:
- Decoupled Extractors: time_in_edge_extractor and location_in_edge_extractor are separate.
- Composite Key: Automatically fuses base key, time, and location for unique identification.
Example
```python
from typing import Optional
from pydantic import BaseModel

class MyEntity(BaseModel):
    name: str

class MySTEdge(BaseModel):
    src: str
    dst: str
    relation: str
    time: Optional[str] = None
    place: Optional[str] = None

kg = AutoSpatioTemporalGraph(
    node_schema=MyEntity,
    edge_schema=MySTEdge,
    node_key_extractor=lambda x: x.name,
    edge_key_extractor=lambda x: f"{x.src}|{x.relation}|{x.dst}",
    time_in_edge_extractor=lambda x: x.time or "",
    location_in_edge_extractor=lambda x: x.place or "",
    nodes_in_edge_extractor=lambda x: (x.src, x.dst),
    llm_client=llm,
    embedder=embedder,
    observation_time="2024-01-15",
    observation_location="Beijing",
)
```
Attributes¶
observation_time = observation_time or datetime.now().strftime('%Y-%m-%d')
instance-attribute
¶
observation_location = observation_location or 'Unknown'
instance-attribute
¶
raw_edge_key_extractor = edge_key_extractor
instance-attribute
¶
time_in_edge_extractor = time_in_edge_extractor
instance-attribute
¶
location_in_edge_extractor = location_in_edge_extractor
instance-attribute
¶
_constructor_kwargs = kwargs
instance-attribute
¶
Functions¶
__init__(node_schema: Type[NodeSchema], edge_schema: Type[EdgeSchema], node_key_extractor: Callable[[NodeSchema], str], edge_key_extractor: Callable[[EdgeSchema], str], time_in_edge_extractor: Callable[[EdgeSchema], str], location_in_edge_extractor: Callable[[EdgeSchema], str], nodes_in_edge_extractor: Callable[[EdgeSchema], Tuple[str, str]], llm_client: BaseChatModel, embedder: Embeddings, observation_time: str | None = None, observation_location: str | None = None, extraction_mode: str = 'two_stage', node_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, edge_strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt_for_node_extraction: str = '', prompt_for_edge_extraction: str = '', prompt: str = '', chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, node_fields_for_index: List[str] | None = None, edge_fields_for_index: List[str] | None = None, **kwargs: Any)
¶
Initialize AutoSpatioTemporalGraph.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `node_schema` | `Type[NodeSchema]` | User-defined Node Pydantic model. | *required* |
| `edge_schema` | `Type[EdgeSchema]` | User-defined Edge Pydantic model with time/space fields. | *required* |
| `node_key_extractor` | `Callable[[NodeSchema], str]` | Function to extract a unique key from a node. | *required* |
| `edge_key_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the base unique identifier for an edge. | *required* |
| `time_in_edge_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the time component. | *required* |
| `location_in_edge_extractor` | `Callable[[EdgeSchema], str]` | Function to extract the spatial component. | *required* |
| `nodes_in_edge_extractor` | `Callable[[EdgeSchema], Tuple[str, str]]` | Function to extract `(source_key, target_key)` from an edge. | *required* |
| `llm_client` | `BaseChatModel` | LangChain BaseChatModel for extraction. | *required* |
| `embedder` | `Embeddings` | LangChain Embeddings for semantic operations. | *required* |
| `observation_time` | `str \| None` | Date context for relative time resolution; defaults to today in YYYY-MM-DD format. | `None` |
| `observation_location` | `str \| None` | Location context for spatial resolution; defaults to "Unknown". | `None` |
| `extraction_mode` | `str` | `"one_stage"` or `"two_stage"`. | `'two_stage'` |
| `node_strategy_or_merger` | `MergeStrategy \| BaseMerger` | Merge strategy for duplicate nodes. | `BALANCED` |
| `edge_strategy_or_merger` | `MergeStrategy \| BaseMerger` | Merge strategy for duplicate edges. | `BALANCED` |
| `prompt_for_node_extraction` | `str` | Additional user prompt appended to the node extraction system prompt. | `''` |
| `prompt_for_edge_extraction` | `str` | Additional user prompt appended to the edge extraction system prompt. | `''` |
| `prompt` | `str` | Additional user prompt appended to the one-stage graph extraction system prompt. | `''` |
| `chunk_size` | `int` | Size of text chunks for processing. | `2048` |
| `chunk_overlap` | `int` | Overlap between text chunks. | `256` |
| `max_workers` | `int` | Maximum number of concurrent LLM calls. | `10` |
| `verbose` | `bool` | Whether to print verbose output. | `False` |
| `node_fields_for_index` | `List[str] \| None` | Optional list of node fields to include in the vector index. | `None` |
| `edge_fields_for_index` | `List[str] \| None` | Optional list of edge fields to include in the vector index. | `None` |
| `**kwargs` | `Any` | Additional arguments passed to `create_merger()` when a strategy is given as a `MergeStrategy` enum; ignored when a `BaseMerger` instance is passed. | `{}` |
_extract_edges_batch(chunks: List[str], node_lists: List[NodeListSchema[NodeSchema]]) -> List[EdgeListSchema[EdgeSchema]]
¶
Inject observation_time and observation_location into edge extraction.
_extract_data_by_one_stage(text: str) -> Any
¶
Inject observation_time and observation_location into one-stage extraction.
_create_empty_instance() -> AutoSpatioTemporalGraph[NodeSchema, EdgeSchema]
¶
Recreate instance with all spatio-temporal attributes.