
Base Auto-Type

Common interface shared by all extraction result types.


hyperextract.types.BaseAutoType

Bases: ABC, Generic[T]

Unified knowledge abstract class integrating extraction, storage, and aggregation.

This abstract base class provides a complete framework for managing structured knowledge extracted from text. It handles the full lifecycle from extraction to serialization.

Responsibilities
  • Extract structured knowledge from text using an LLM
  • Automatically chunk long texts and process the chunks in parallel
  • Store and aggregate extracted knowledge with configurable merge strategies
  • Build and maintain vector indices for semantic search
  • Provide semantic search and retrieval
  • Support chat/QA interactions by retrieving relevant context and feeding it to the LLM
  • Provide serialization and deserialization capabilities
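The extraction, chunking, parallel-processing and merging responsibilities above can be sketched with the standard library alone. This is an illustration under stated assumptions, not hyperextract's implementation: chunk_words, fake_extract, merge_unique and extract are hypothetical names, the LLM call is replaced by a stub that treats capitalized words as entities, and the merge strategy (union, first occurrence wins) is just one possible choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def chunk_words(text: str, chunk_size: int = 5, overlap: int = 1) -> List[str]:
    """Split text into overlapping windows of whole words (hypothetical helper)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    words = text.split()
    step = chunk_size - overlap
    # Only start a new window while it still contains words the previous one missed.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + chunk_size]) for i in starts]


def fake_extract(chunk: str) -> List[str]:
    """Stand-in for the LLM call: treat capitalized words as entities."""
    return [w.strip(".,;:") for w in chunk.split() if w[:1].isupper()]


def merge_unique(results: List[List[str]]) -> List[str]:
    """Merge strategy (assumed): union of all chunk results, first occurrence wins."""
    seen: Dict[str, None] = {}
    for items in results:
        for item in items:
            seen.setdefault(item, None)
    return list(seen)


def extract(text: str, max_workers: int = 4) -> List[str]:
    chunks = chunk_words(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, regardless of completion order.
        results = list(pool.map(fake_extract, chunks))
    return merge_unique(results)
```

Because ThreadPoolExecutor.map returns results in input order, the merged output does not depend on which chunk finishes first; the overlap between windows is what lets an entity split across a boundary still be seen whole in one chunk.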

Attributes

data: T abstractmethod property

Returns all stored knowledge (read-only access).

This is an abstract property that subclasses must implement. Subclasses may transform the internal _data structure into the external schema T when the two differ (e.g., AutoSet converts OMem → List).

Returns:

  T: The internal knowledge data as a Pydantic model instance (or its converted form).
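A minimal sketch of the data contract, assuming a toy subclass whose internal _data is a dict keyed by item id while the public data property exposes a list, loosely mirroring the AutoSet conversion mentioned above. ToyBase and ToySet are hypothetical illustration names, and plain dicts stand in for Pydantic models.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, List, TypeVar

T = TypeVar("T")


class ToyBase(ABC, Generic[T]):
    """Hypothetical stand-in for BaseAutoType that models only the data contract."""

    @property
    @abstractmethod
    def data(self) -> T:
        """Read-only view of the stored knowledge."""

    @abstractmethod
    def empty(self) -> bool: ...


class ToySet(ToyBase[List[dict]]):
    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}  # internal storage keyed by item id

    def add(self, item: dict) -> None:
        # A later item with the same id replaces the earlier one in place.
        self._data[item["id"]] = item

    @property
    def data(self) -> List[dict]:
        # Convert the internal mapping into the external list form.
        return list(self._data.values())

    def empty(self) -> bool:
        return not self._data
```

Because data is defined without a setter, assigning to it raises AttributeError, which is one straightforward way to get read-only access.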

data_schema: Type[T] property

Returns the Pydantic schema class used by this knowledge instance.

Returns:

  Type[T]: The Pydantic BaseModel subclass defining the knowledge structure.

Functions

parse(text: str) -> BaseAutoType[T]

Parses text into a NEW knowledge instance without modifying the current one.

Use this for previewing data or branching knowledge abstracts.

Parameters:

  text (str): Input text. Required.

Returns:

  BaseAutoType[T]: A new knowledge instance containing only the parsed data.
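A minimal sketch of the copy semantics of parse, assuming a hypothetical ToyKnowledge class whose _extract stub collects capitalized words instead of calling an LLM:

```python
from typing import List, Optional


class ToyKnowledge:
    """Hypothetical knowledge container; not the hyperextract class."""

    def __init__(self, items: Optional[List[str]] = None) -> None:
        self._items: List[str] = list(items or [])

    @staticmethod
    def _extract(text: str) -> List[str]:
        # Stub for the LLM extraction step: capitalized words become items.
        return [w.strip(".,") for w in text.split() if w[:1].isupper()]

    def parse(self, text: str) -> "ToyKnowledge":
        # Build a NEW instance from the text alone; self is left untouched.
        return type(self)(self._extract(text))

    @property
    def data(self) -> List[str]:
        return list(self._items)
```

With kb = ToyKnowledge(["Alice"]), calling kb.parse("Bob visited Paris.") returns a separate instance holding only ["Bob", "Paris"], while kb.data is still ["Alice"]. Using type(self) keeps the preview the same subclass as the original.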

feed_text(text: str) -> BaseAutoType[T]

Ingests text into the CURRENT knowledge abstract instance.

This modifies the internal state by merging new data with existing data. Supports method chaining (e.g., ka.feed_text(text1).feed_text(text2)).

Parameters:

  text (str): Input text. Required.

Returns:

  BaseAutoType[T]: Self (the current instance).
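In-place ingestion with chaining, sketched as a toy class (hypothetical ToyKnowledge with a stub extractor). The merge rule used here, appending items not already present, is an assumption and not one of the library's documented merge strategies.

```python
from typing import List


class ToyKnowledge:
    """Hypothetical container illustrating in-place ingestion."""

    def __init__(self) -> None:
        self._items: List[str] = []

    @staticmethod
    def _extract(text: str) -> List[str]:
        # Stub for the LLM extraction step: capitalized words become items.
        return [w.strip(".,") for w in text.split() if w[:1].isupper()]

    def feed_text(self, text: str) -> "ToyKnowledge":
        # Assumed merge rule: append new items, skip ones already stored.
        for item in self._extract(text):
            if item not in self._items:
                self._items.append(item)
        return self  # returning self is what enables method chaining

    @property
    def data(self) -> List[str]:
        return list(self._items)
```

For example, kb.feed_text("Alice met Bob.").feed_text("Bob met Carol.") leaves kb.data as ["Alice", "Bob", "Carol"], and the chained expression evaluates to kb itself.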

search(query: str, top_k: int = 3) -> List[Any] abstractmethod

Performs semantic search over the knowledge abstract.

Standard search workflow
  1. Ensure the index is built (call build_index if needed)
  2. Retrieve candidates with vector_store.similarity_search
  3. Restore the original data structures from Document.metadata

Subclasses must implement this method (it is declared abstract) and may either follow this workflow or replace it with custom search logic.

Parameters:

  query (str): Search query string. Required.
  top_k (int): Number of results to return. Defaults to 3.

Returns:

  List[Any]: List of relevant knowledge items.
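The three-step search workflow can be sketched without FAISS using a tiny in-memory store. Document, ToyVectorStore and ToySearchable are hypothetical stand-ins whose names echo the workflow above, and bag-of-words cosine similarity replaces learned embeddings; a real subclass would query its FAISS-backed vector store instead.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def _vec(text: str) -> Counter:
    return Counter(text.lower().split())  # bag-of-words "embedding"


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, docs: List[Document]) -> None:
        self._docs = [(d, _vec(d.page_content)) for d in docs]

    def similarity_search(self, query: str, k: int = 4) -> List[Document]:
        q = _vec(query)
        ranked = sorted(self._docs, key=lambda pair: _cosine(q, pair[1]), reverse=True)
        return [d for d, _ in ranked[:k]]


class ToySearchable:
    def __init__(self, items: List[Dict[str, str]]) -> None:
        self.items = items
        self.vector_store: Optional[ToyVectorStore] = None

    def build_index(self) -> None:
        # Keep the original item in metadata so search can restore it.
        docs = [Document(f"{it['name']} {it['desc']}", {"item": it}) for it in self.items]
        self.vector_store = ToyVectorStore(docs)

    def search(self, query: str, top_k: int = 3) -> List[Any]:
        if self.vector_store is None:  # 1. ensure the index is built
            self.build_index()
        docs = self.vector_store.similarity_search(query, k=top_k)  # 2. retrieve
        return [d.metadata["item"] for d in docs]  # 3. restore original items
```

Keeping the original item in Document.metadata means search never has to parse the rendered text back into a structured object.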

chat(query: str, top_k: int = 3) -> AIMessage

Performs a chat-like interaction with the knowledge abstract.

This generic method retrieves the relevant items and passes them to the LLM as context to generate a response. Subclasses with complex data structures (like graphs) should override this to provide better context formatting.

Parameters:

  query (str): User query string. Required.
  top_k (int): Number of relevant items to retrieve. Defaults to 3.

Returns:

  AIMessage: An AIMessage object containing the LLM-generated response.
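The generic chat flow (retrieve, format context, generate) can be sketched as two small functions. build_prompt and chat are hypothetical helpers, the prompt wording is an assumption, the search function is injected, and the LLM is modeled as any callable from a prompt string to a reply string; the documented method returns an AIMessage rather than a plain string.

```python
from typing import Any, Callable, List


def build_prompt(query: str, items: List[Any]) -> str:
    # Number each retrieved item so the model can refer to it.
    context = "\n".join(f"[{i}] {item}" for i, item in enumerate(items, start=1))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


def chat(
    query: str,
    search: Callable[[str, int], List[Any]],
    llm: Callable[[str], str],
    top_k: int = 3,
) -> str:
    items = search(query, top_k)  # retrieve relevant knowledge
    return llm(build_prompt(query, items))  # generate an answer grounded in it
```

A graph-like subclass would mainly change build_prompt, for example rendering relations as source, relation, target lines instead of raw item reprs.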

build_index() abstractmethod

Builds or rebuilds the vector index for semantic search.

Subclasses must implement this method to define how the vector index is constructed from the knowledge data. Uses FAISS as the vector store backend.
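One way a subclass with a graph-like structure might implement the document-building half of build_index: render every node and edge as text and keep the original object in metadata so search can restore it. ToyGraph, build_documents and the metadata keys kind and payload are hypothetical, and the embedding/FAISS step is omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyGraph:
    nodes: List[Dict[str, str]]
    edges: List[Dict[str, str]]


def build_documents(graph: ToyGraph) -> List[Document]:
    """Render nodes and edges as searchable text, keeping originals in metadata."""
    docs = [
        Document(f"{n['name']}: {n['desc']}", {"kind": "node", "payload": n})
        for n in graph.nodes
    ]
    docs += [
        Document(f"{e['source']} {e['relation']} {e['target']}", {"kind": "edge", "payload": e})
        for e in graph.edges
    ]
    return docs
```

With LangChain's FAISS integration, such a list could then be indexed with FAISS.from_documents(docs, embeddings); rebuilding simply repeats both steps over the current data.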

dump(folder_path: str | Path) -> None

Saves the entire knowledge abstract (data, metadata, index) to a directory.

This is the main entry point for serialization. It creates the following directory structure:

  /folder_path
  |-- data.json      (the structured knowledge data)
  |-- metadata.json  (metadata, localized config, timestamps)
  |-- /index         (vector store files, e.g., FAISS)

Parameters:

  folder_path (str | Path): Target directory path. Required.
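A minimal sketch of the documented layout using only json and pathlib. dump_knowledge is a hypothetical free function (the documented dump is a method), and the metadata keys config and saved_at are assumptions; writing the FAISS files is left as a comment.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Union


def dump_knowledge(folder_path: Union[str, Path], data: Dict[str, Any], config: Dict[str, Any]) -> None:
    """Write data.json, metadata.json and an index/ directory (hypothetical helper)."""
    folder = Path(folder_path)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "data.json").write_text(
        json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    # Assumed metadata shape: the config plus a UTC save timestamp.
    metadata = {"config": config, "saved_at": datetime.now(timezone.utc).isoformat()}
    (folder / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    index_dir = folder / "index"
    index_dir.mkdir(exist_ok=True)
    # A FAISS-backed subclass would write its vector store files here, e.g. via
    # LangChain's vector_store.save_local(str(index_dir)); this sketch leaves it empty.
```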

load(folder_path: str | Path) -> None

Loads the entire knowledge abstract from a directory.

Parameters:

  folder_path (str | Path): Source directory path. Required.
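The matching read path, again as a hypothetical free function. It assumes the data.json/metadata.json layout described under dump; load_knowledge returns what it read so the sketch is easy to check, whereas the documented load populates the instance and returns None.

```python
import json
from pathlib import Path
from typing import Any, Dict, Tuple, Union


def load_knowledge(folder_path: Union[str, Path]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Read back data.json and metadata.json from a dump directory (hypothetical helper)."""
    folder = Path(folder_path)
    if not folder.is_dir():
        raise FileNotFoundError(f"no knowledge directory at {folder}")
    data = json.loads((folder / "data.json").read_text(encoding="utf-8"))
    metadata = json.loads((folder / "metadata.json").read_text(encoding="utf-8"))
    # A FAISS-backed subclass would also restore folder / "index" here, or
    # rebuild the index from `data` if the index files are missing.
    return data, metadata
```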

empty() -> bool abstractmethod

Checks if the knowledge abstract is currently empty.

Returns:

  bool: True if no data is stored, False otherwise.

clear()

Clears all knowledge including data and vector index.
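A toy container showing how empty() and clear() relate: clear() resets the index as well as the data, otherwise a later search could return items that no longer exist. ToyStore is hypothetical, and invalidating the index on every add is a design choice of this sketch, not documented behavior.

```python
from typing import Any, List, Optional


class ToyStore:
    """Hypothetical container showing how empty() and clear() relate."""

    def __init__(self) -> None:
        self._items: List[Any] = []
        self._index: Optional[dict] = None

    def add(self, item: Any) -> None:
        self._items.append(item)
        self._index = None  # sketch's choice: any existing index is now stale

    def build_index(self) -> None:
        self._index = {str(item): item for item in self._items}

    def empty(self) -> bool:
        return len(self._items) == 0

    def clear(self) -> None:
        # Drop both the data and the index so no stale search results remain.
        self._items.clear()
        self._index = None
```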