Base Auto Type
The common interface shared by all extraction result types.
hyperextract.types.BaseAutoType
Bases: ABC, Generic[T]
Unified knowledge abstract class integrating extraction, storage, and aggregation.
This abstract base class provides a complete framework for managing structured knowledge extracted from text. It handles the full lifecycle from extraction to serialization.
Responsibilities
- Extract structured knowledge from text using an LLM
- Automatically handle long text chunking and parallel processing
- Store and aggregate extracted knowledge with configurable merge strategies
- Build and maintain vector indices for semantic search
- Provide semantic search and retrieval capabilities
- Support chat/QA interactions by retrieving relevant context and feeding it to the LLM
- Provide serialization and deserialization capabilities
Attributes
data: T
abstractmethod
property
Returns all stored knowledge (read-only access).
This is an abstract property that subclasses must implement. If the internal _data structure differs from the external schema T, a subclass may convert it on access (e.g., AutoSet converts OMem → List).
Returns:
| Type | Description |
|---|---|
| T | The internal knowledge data as a Pydantic model instance (or converted form). |
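The conversion described above can be sketched as follows. This is a minimal illustration, not the library's code: `KnowledgeBase`, `ToySet`, and `add` are hypothetical names, and a plain dict stands in for whatever internal structure a real subclass uses.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, List, TypeVar

T = TypeVar("T")


class KnowledgeBase(ABC, Generic[T]):
    """Hypothetical stand-in for BaseAutoType; not the library class."""

    @property
    @abstractmethod
    def data(self) -> T:
        """Read-only view of the stored knowledge."""


class ToySet(KnowledgeBase[List[str]]):
    """Keeps items in a dict keyed by a normalized form, exposes a list."""

    def __init__(self) -> None:
        # Internal structure differs from the external schema (a list).
        self._data: Dict[str, str] = {}

    def add(self, item: str) -> None:
        # The first spelling seen for a normalized key wins.
        self._data.setdefault(item.lower(), item)

    @property
    def data(self) -> List[str]:
        # Convert the internal mapping to the external list form on access.
        return list(self._data.values())


s = ToySet()
s.add("Paris")
s.add("paris")  # same normalized key, so it is ignored
s.add("Rome")
print(s.data)
```

Returning a fresh list on each access keeps callers from mutating the internal mapping, which is one way to honor the read-only contract.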
data_schema: Type[T]
property
Returns the Pydantic schema class used by this knowledge instance.
Returns:
| Type | Description |
|---|---|
| Type[T] | The Pydantic BaseModel subclass defining the knowledge structure. |
Functions
parse(text: str) -> BaseAutoType[T]
Extracts knowledge from the text into a NEW instance without modifying the current one.
Use this for previewing data or branching knowledge abstracts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Input text. | required |
Returns:
| Type | Description |
|---|---|
| BaseAutoType[T] | A new knowledge instance containing only the parsed data. |
feed_text(text: str) -> BaseAutoType[T]
Ingests text into the CURRENT knowledge abstract instance.
This modifies the internal state by merging new data with existing data. Supports method chaining (e.g., ka.feed_text(text1).feed_text(text2)).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Input text. | required |
Returns:
| Type | Description |
|---|---|
| BaseAutoType[T] | Self (the current instance). |
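The difference between parse (new instance, current one untouched) and feed_text (in-place merge, returns self) can be illustrated with a toy class. This is only a sketch under stated assumptions: `ToyKnowledge` is hypothetical, extraction is a whitespace split rather than an LLM call, and the merge strategy is a simple order-preserving de-duplication.

```python
from typing import List


class ToyKnowledge:
    """Hypothetical stand-in; extraction is a word split, not an LLM call."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def _extract(self, text: str) -> List[str]:
        return text.split()

    def parse(self, text: str) -> "ToyKnowledge":
        # Build a NEW instance; self is left unchanged.
        fresh = ToyKnowledge()
        fresh._items = self._extract(text)
        return fresh

    def feed_text(self, text: str) -> "ToyKnowledge":
        # Merge into the CURRENT instance, skipping items already stored.
        for item in self._extract(text):
            if item not in self._items:
                self._items.append(item)
        return self  # returning self enables method chaining

    @property
    def data(self) -> List[str]:
        return list(self._items)


ka = ToyKnowledge()
ka.feed_text("alpha beta").feed_text("beta gamma")  # chained, mutates ka
preview = ka.parse("delta")                          # separate instance
print(ka.data, preview.data)
```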
search(query: str, top_k: int = 3) -> List[Any]
abstractmethod
Performs semantic search over the knowledge abstract.
Standard search workflow
- Ensure the index is built (call build_index if needed)
- Use vector_store.similarity_search for retrieval
- Restore original data structures from Document.metadata
Subclasses can override this method to implement custom search logic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string. | required |
| top_k | int | Number of results to return. | 3 |
Returns:
| Type | Description |
|---|---|
| List[Any] | List of relevant knowledge items. |
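The three workflow steps (build the index lazily, retrieve by similarity, restore the original item from document metadata) might look like the sketch below. Everything here is an assumption made for illustration: `Doc` and `ToyStore` are hypothetical, and a bag-of-words cosine score replaces a real embedding model and vector store such as FAISS.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a vector-store document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def _embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real setup would use an embedding model.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyStore:
    def __init__(self, items: List[Dict[str, Any]]) -> None:
        self._items = items
        self._index: Optional[List[Doc]] = None

    def build_index(self) -> None:
        # Keep the original item in metadata so search can restore it.
        self._index = [
            Doc(page_content=item["text"], metadata={"raw": item})
            for item in self._items
        ]

    def search(self, query: str, top_k: int = 3) -> List[Any]:
        if self._index is None:  # step 1: ensure the index exists
            self.build_index()
        q = _embed(query)
        ranked = sorted(         # step 2: rank documents by similarity
            self._index,
            key=lambda d: _cosine(q, _embed(d.page_content)),
            reverse=True,
        )
        # Step 3: return the original structures, not the documents.
        return [d.metadata["raw"] for d in ranked[:top_k]]


store = ToyStore([
    {"id": 1, "text": "cats chase mice"},
    {"id": 2, "text": "dogs chase cats"},
    {"id": 3, "text": "stock markets fell"},
])
hits = store.search("stock markets", top_k=1)
print(hits)
```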
chat(query: str, top_k: int = 3) -> AIMessage
Performs a chat-like interaction with the knowledge abstract.
This generic method retrieves relevant items and generates a response. Subclasses with complex data structures (like graphs) should override this to provide better context formatting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | User query string. | required |
| top_k | int | Number of relevant items to retrieve (default: 3). | 3 |
Returns:
| Type | Description |
|---|---|
| AIMessage | An AIMessage object containing the LLM-generated response. |
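The retrieve-then-generate flow can be sketched as below. This is an illustration under assumptions, not the library's implementation: the prompt wording is invented, `fake_retrieve` and `echo_llm` are hypothetical stubs, and the function returns a plain string instead of an AIMessage to stay dependency-free.

```python
from typing import Callable, List


def chat(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    llm: Callable[[str], str],
    top_k: int = 3,
) -> str:
    # Retrieve the most relevant items, then pass them to the model as context.
    items = retrieve(query, top_k)
    context = "\n".join(f"- {item}" for item in items)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)


def fake_retrieve(query: str, top_k: int) -> List[str]:
    # Stub retriever: ignores the query and returns the first top_k facts.
    facts = ["Paris is the capital of France.", "Rome is the capital of Italy."]
    return facts[:top_k]


def echo_llm(prompt: str) -> str:
    # Stub model: returns the prompt so the assembled context is visible.
    return prompt


reply = chat("What is the capital of France?", fake_retrieve, echo_llm, top_k=1)
print(reply)
```

A subclass holding graph-shaped data would likely replace the one-line-per-item context formatting with something that preserves relationships between items.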
build_index()
abstractmethod
Builds or rebuilds the vector index for semantic search.
Subclasses must implement this method to define how the vector index is constructed from the knowledge data. Uses FAISS as the vector store backend.
dump(folder_path: str | Path) -> None
Saves the entire knowledge abstract (data, metadata, index) to a directory.
This is the main entry point for serialization. It creates the following directory structure:
/folder_path
|-- data.json (The structured knowledge data)
|-- metadata.json (Metadata, localized config, timestamps)
|-- /index (Vector store files, e.g., FAISS)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| folder_path | str \| Path | Target directory path. | required |
load(folder_path: str | Path) -> None
Loads the entire knowledge abstract from a directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| folder_path | str \| Path | Source directory path. | required |
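A dump/load round trip over the documented directory layout could be sketched like this. The functions are hypothetical free-standing helpers, not the library's methods: the `saved_at` metadata field is an assumption, the index directory is left empty as a placeholder, and this `load` returns the data whereas the documented method returns None and restores state on the instance.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Union


def dump(data: Dict[str, Any], folder_path: Union[str, Path]) -> None:
    folder = Path(folder_path)
    # Placeholder for vector store files; also creates the parent folder.
    (folder / "index").mkdir(parents=True, exist_ok=True)
    (folder / "data.json").write_text(json.dumps(data), encoding="utf-8")
    # Assumed metadata content, for illustration only.
    metadata = {"saved_at": datetime.now(timezone.utc).isoformat()}
    (folder / "metadata.json").write_text(json.dumps(metadata), encoding="utf-8")


def load(folder_path: Union[str, Path]) -> Dict[str, Any]:
    folder = Path(folder_path)
    return json.loads((folder / "data.json").read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "ka"
    dump({"items": ["alpha", "beta"]}, target)
    created = sorted(p.name for p in target.iterdir())
    restored = load(target)

print(created, restored)
```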
empty() -> bool
abstractmethod
Checks if the knowledge abstract is currently empty.
Returns:
| Type | Description |
|---|---|
| bool | True if no data is stored, False otherwise. |
clear()
Clears all knowledge including data and vector index.