
Record Types

Auto-Types for recording and storing data without entity relationships.


AutoModel

hyperextract.types.AutoModel

Bases: BaseAutoType[T]

AutoModel - extracts a single structured object from text.

This pattern is designed for extracting exactly one structured object per document, regardless of document length. Suitable for document-level information like summaries, metadata, or aggregate statistics.

Key characteristics
  • Extraction target: One unique structured object per document
  • Merge strategy: Configurable via MergeStrategy enum (supports LLM-powered intelligent merging)
    • MERGE_FIELD: Non-null fields overwrite, lists append (simple field merge)
    • LLM.BALANCED: LLM intelligently synthesizes both versions (default)
    • LLM.PREFER_EXISTING: LLM synthesis but prioritizes original data
    • LLM.PREFER_INCOMING: LLM synthesis but prioritizes new data
  • Indexing strategy: Each non-null field of the object is indexed independently
  • Processing: Uses LangChain native batch processing for efficient multi-chunk handling
  • Advanced merging: All chunk extractions are treated as the same object, triggering merge logic
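The MERGE_FIELD behavior listed above (non-null fields overwrite, lists append) can be sketched in plain Python. This helper is illustrative only, working on dicts rather than Pydantic models; it is not the library's implementation:

```python
def merge_field(existing: dict, incoming: dict) -> dict:
    """Illustrative MERGE_FIELD semantics: non-null incoming fields
    overwrite existing values, while list fields are appended."""
    merged = dict(existing)
    for key, value in incoming.items():
        if value is None:
            continue  # null incoming fields never erase existing data
        if isinstance(merged.get(key), list) and isinstance(value, list):
            merged[key] = merged[key] + value  # lists append
        else:
            merged[key] = value  # non-null values overwrite
    return merged

a = {"title": "Q3 Report", "tags": ["finance"], "author": None}
b = {"title": None, "tags": ["quarterly"], "author": "Ada"}
print(merge_field(a, b))
```

The LLM-backed strategies (LLM.BALANCED and friends) replace this mechanical rule with model-driven synthesis, but the data flow is the same: two versions in, one merged object out.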

Attributes

_strategy_or_merger = strategy_or_merger instance-attribute

_constructor_kwargs = kwargs instance-attribute

_label_extractor = label_extractor instance-attribute

_constant_key = 'singleton' instance-attribute

_key_extractor = lambda x: self._constant_key instance-attribute

_merger = strategy_or_merger instance-attribute

data: T property

Returns all stored knowledge (read-only access).

Returns:

Type Description
T

The internal knowledge data as a Pydantic model instance.

Functions

__init__(data_schema: Type[T], llm_client: BaseChatModel, embedder: Embeddings, *, strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt: str = '', label_extractor: Callable[[T], str] = None, chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, **kwargs)

Initialize AutoModel with schema and configuration.

Parameters:

Name Type Description Default
data_schema Type[T]

Pydantic BaseModel subclass defining the object structure.

required
llm_client BaseChatModel

Language model client for extraction.

required
embedder Embeddings

Embedding model for vector indexing.

required
strategy_or_merger MergeStrategy | BaseMerger

Merge strategy for multi-chunk results. Can be:
  • A MergeStrategy enum value (e.g., MergeStrategy.MERGE_FIELD, MergeStrategy.LLM.BALANCED)
  • A custom BaseMerger instance
Default: MergeStrategy.LLM.BALANCED (the LLM intelligently synthesizes both versions).

BALANCED
prompt str

Custom extraction prompt (defaults to generic prompt).

''
label_extractor Callable[[T], str]

Optional function to extract label from model instance for visualization.

None
chunk_size int

Maximum characters per chunk for long texts.

2048
chunk_overlap int

Overlapping characters between adjacent chunks.

256
max_workers int

Maximum concurrent extraction tasks.

10
verbose bool

Whether to log progress information.

False

_create_empty_instance() -> AutoModel[T]

Creates a new empty instance with the same configuration.

Overrides parent method to include AutoModel-specific parameters.

Returns:

Type Description
AutoModel[T]

New AutoModel instance.

_default_prompt() -> str

Returns the default extraction prompt for single-object extraction.

empty() -> bool

Checks if the model is empty (no data stored).

Returns:

Type Description
bool

True if no data is stored, False otherwise.

_init_data_state() -> None

INIT/RESET: Initialize or reset to empty state (None). Called during init and when clear() is called.

_init_index_state() -> None

Initialize vector index to empty state.

_set_data_state(data: T) -> None

SET: Full reset. Replace with new data (e.g., load from disk). Called by parse() or load() where data IS the new state.

_update_data_state(incoming_data: T) -> None

UPDATE: Incremental merge. Merge fields with field-level update strategy (called by feed()).

For AutoModel, incremental update means filling missing fields, first extraction wins.

merge_batch_data(data_list: List[T]) -> T

Merge multiple extracted objects using configured strategy.

Leverages ontomem's merge strategies to intelligently combine results from multiple chunks. All extractions are treated as the same object (singleton) to trigger the merge logic.

Supported merge strategies:
  • MERGE_FIELD: Non-null fields overwrite, lists append (simple field merge)
  • LLM.BALANCED: LLM synthesizes both versions, balancing insights
  • LLM.PREFER_EXISTING: LLM synthesis prioritizing original data
  • LLM.PREFER_INCOMING: LLM synthesis prioritizing new data

Parameters:

Name Type Description Default
data_list List[T]

List of extracted data objects from batch processing to merge.

required

Returns:

Type Description
T

A new merged knowledge object with intelligently combined fields.
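Conceptually, merging a batch folds the per-chunk extractions pairwise into one object. A minimal sketch with `functools.reduce`, assuming a simple field-merge rule in place of the library's LLM-backed merger:

```python
from functools import reduce

def field_merge(acc: dict, nxt: dict) -> dict:
    # Non-null fields from the next chunk overwrite; everything else survives.
    return {**acc, **{k: v for k, v in nxt.items() if v is not None}}

chunk_results = [
    {"summary": "Intro covers goals.", "word_count": None},
    {"summary": "Adds methodology details.", "word_count": 1200},
]
merged = reduce(field_merge, chunk_results)
print(merged)  # one singleton object combining all chunk extractions
```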

build_index()

Builds vector index from all non-null fields in the data object.

search(query: str, top_k: int = 3) -> List[Any]

Searches all indexed fields using semantic similarity.

Parameters:

Name Type Description Default
query str

Search query string.

required
top_k int

Number of results to return.

3

Returns:

Type Description
List[Any]

List of relevant knowledge items (field-value dictionaries).

dump_index(folder_path: str | Path) -> None

Saves FAISS vector index to disk.

load_index(folder_path: str | Path) -> None

Loads FAISS vector index from disk.

show(label_extractor: Callable[[T], str] = None, *, top_k: int = 3) -> None

Visualize the model using OntoSight.

Parameters:

Name Type Description Default
label_extractor Callable[[T], str]

Optional function to extract label from model instance for visualization. If not provided, uses the one from init.

None
top_k int

Number of items to retrieve for chat callback (default: 3).

3

__add__(other: Union[AutoModel, AutoList]) -> AutoList

Operator overload for '+' to combine AutoModel instances into AutoList.

Supports multiple combination patterns:
  • AutoModel + AutoModel → AutoList (create list from both items)
  • AutoModel + AutoList → AutoList (prepend model to list)

This enables intuitive chain operations like: unit1 + unit2 + unit3

Usage

>>> unit1 = AutoModel(PersonSchema, ...)
>>> unit2 = AutoModel(PersonSchema, ...)
>>> person_list = unit1 + unit2  # → AutoList[PersonSchema]

Chain operations

>>> unit3 = AutoModel(PersonSchema, ...)
>>> person_list = unit1 + unit2 + unit3  # → AutoList with 3 items

Parameters:

Name Type Description Default
other Union[AutoModel, AutoList]

Another AutoModel with the same data schema, or AutoList.

required

Returns:

Type Description
AutoList

AutoList containing both objects as items.

Raises:

Type Description
TypeError

If schemas don't match or invalid operand type.
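The `+` combination rules above can be sketched with plain containers. The helper below is a hypothetical stand-in (plain dicts and lists, not the real AutoModel/AutoList types):

```python
def combine(left, right):
    """Illustrative '+' semantics: singletons and lists flatten
    into one combined list, mirroring AutoModel/AutoList chaining."""
    as_items = lambda x: x if isinstance(x, list) else [x]
    return as_items(left) + as_items(right)

unit1 = {"name": "Alice"}
unit2 = {"name": "Bob"}
unit3 = {"name": "Carol"}
people = combine(combine(unit1, unit2), unit3)  # unit1 + unit2 + unit3
print(len(people))
```

Note that the real operator also validates that both operands share the same data schema, raising TypeError otherwise.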


AutoList

hyperextract.types.AutoList

Bases: BaseAutoType[AutoListSchema[ItemSchema]], Generic[ItemSchema]

AutoList - extracts a collection of objects from text.

This pattern extracts multiple independent objects from a document, suitable for extracting entities, events, references, or any collection of structured items.

Key characteristics
  • Extraction target: A collection of structured objects
  • Merge strategy: Append with basic deduplication (extensible by subclasses)
  • Indexing strategy: Each item in the list is indexed independently
Comparison with AutoModel
  • AutoModel: Extracts a single structured object (e.g., summary, metadata)
  • AutoList: Extracts multiple independent objects (e.g., entity list, event list)

Attributes

item_list_schema: Type[AutoListSchema[ItemSchema]] = create_model(container_name, items=(List[item_schema], Field(default_factory=list, description='Item list'))) instance-attribute

item_schema = item_schema instance-attribute

fields_for_index = fields_for_index instance-attribute

_item_label_extractor = item_label_extractor instance-attribute

items: List[ItemSchema] property

Returns the internal list of extracted items.

data: AutoListSchema property

Returns all stored knowledge (read-only access).

Returns:

Type Description
AutoListSchema

The internal knowledge data as AutoListSchema.

Functions

__init__(item_schema: Type[ItemSchema], llm_client: BaseChatModel, embedder: Embeddings, *, prompt: str = '', item_label_extractor: Callable[[ItemSchema], str] = None, chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, fields_for_index: List[str] | None = None)

Initialize AutoList with item schema and configuration.

Parameters:

Name Type Description Default
item_schema Type[ItemSchema]

Pydantic BaseModel subclass for individual list items.

required
llm_client BaseChatModel

Language model client for extraction.

required
embedder Embeddings

Embedding model for vector indexing.

required
prompt str

Custom extraction prompt (defaults to list-oriented prompt).

''
item_label_extractor Callable[[ItemSchema], str]

Optional function to extract label from item for visualization.

None
chunk_size int

Maximum characters per chunk for long texts.

2048
chunk_overlap int

Overlapping characters between adjacent chunks.

256
max_workers int

Maximum concurrent extraction tasks.

10
verbose bool

Whether to log progress information.

False
fields_for_index List[str] | None

Optional list of field names to include in vector index. If None, all fields are indexed.

None

_default_prompt() -> str

Returns the default extraction prompt for list-based extraction.

_create_empty_instance() -> AutoList[ItemSchema]

Creates a new empty instance with the same configuration.

Overrides base class method to handle AutoList's item_schema parameter.

Returns:

Type Description
AutoList[ItemSchema]

A new AutoList instance with the same configuration.

empty() -> bool

Checks if the list is empty.

Returns:

Type Description
bool

True if no items are stored, False otherwise.

_init_data_state() -> None

INIT/RESET: Initialize or reset with empty schema. Called during init and when clear() is called.

_set_data_state(data: AutoListSchema) -> None

SET: Full reset. Replace with new data (e.g., load from disk). Called by parse() or load() where data IS the new state.

_update_data_state(incoming_data: AutoListSchema) -> None

UPDATE: Incremental merge. Append incoming items to current list (called by feed()).

For AutoList, incremental update means appending new items to existing list.

_init_index_state() -> None

Initialize vector index to empty state.

merge_batch_data(data_list: List[AutoListSchema]) -> AutoListSchema

Pure data merge method implementing list append strategy.

Merge strategy: Collects all items from all container objects and merges them into a single list. Used for aggregating extraction results from batch processing across multiple chunks. Subclasses can override this method to implement more sophisticated deduplication logic (e.g., AutoSet with custom key_extractor).

Parameters:

Name Type Description Default
data_list List[AutoListSchema]

List of container objects from batch processing to merge.

required

Returns:

Type Description
AutoListSchema

A new merged AutoListSchema object with combined items from all containers.
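The append-style merge amounts to concatenating the items of every container in order. A plain-Python sketch (hypothetical container dicts, not the real AutoListSchema):

```python
def merge_batch(containers: list[dict]) -> dict:
    """Collect items from every container into one merged list,
    preserving chunk order (simple append merge)."""
    merged_items = []
    for container in containers:
        merged_items.extend(container.get("items", []))
    return {"items": merged_items}

batch = [{"items": ["a", "b"]}, {"items": ["b", "c"]}]
print(merge_batch(batch))
```

Subclasses such as AutoSet override this step to deduplicate by key instead of appending blindly.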

build_index() -> None

Builds independent vector index for each item in the list.

If fields_for_index is specified, only those fields are indexed. Otherwise, all fields are indexed.
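Restricting the index to fields_for_index is essentially a per-item field projection before embedding. A sketch of that selection step (illustrative; the real indexer embeds the resulting text with FAISS):

```python
def text_for_index(item: dict, fields_for_index=None) -> str:
    """Join the selected fields into the text that would be embedded.
    With fields_for_index=None, every non-null field participates."""
    fields = fields_for_index or list(item)
    return " | ".join(f"{f}: {item[f]}" for f in fields if item.get(f) is not None)

person = {"name": "Alice", "bio": "Engineer", "ssn": "redacted"}
print(text_for_index(person, fields_for_index=["name", "bio"]))
```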

search(query: str, top_k: int = 3) -> List[ItemSchema]

Searches items in the list using semantic similarity.

Parameters:

Name Type Description Default
query str

Search query string.

required
top_k int

Number of results to return.

3

Returns:

Type Description
List[ItemSchema]

List of relevant items.

dump_index(folder_path: str | Path) -> None

Saves FAISS vector index to disk.

load_index(folder_path: str | Path) -> None

Loads FAISS vector index from disk.

show(item_label_extractor: Callable[[ItemSchema], str] = None, *, top_k_for_search: int = 3, top_k_for_chat: int = 3) -> None

Visualize the list using OntoSight.

Parameters:

Name Type Description Default
item_label_extractor Callable[[ItemSchema], str]

Optional function to extract label from item for visualization. If not provided, uses the one from init.

None
top_k_for_search int

Number of items to retrieve for search callback (default: 3).

3
top_k_for_chat int

Number of items to retrieve for chat callback (default: 3).

3

__len__() -> int

Returns the number of elements in the list.

__getitem__(key: int | slice) -> ItemSchema | AutoList[ItemSchema]

Support index access and slicing.

Parameters:

Name Type Description Default
key int | slice

Integer index or slice object.

required

Returns:

Type Description
ItemSchema | AutoList[ItemSchema]
  • For integer index: Returns the Item at that position
ItemSchema | AutoList[ItemSchema]
  • For slice: Returns a new AutoList instance with sliced items

Raises:

Type Description
IndexError

If index is out of range.

TypeError

If key is neither int nor slice.

Examples:

>>> knowledge[0]           # First item
>>> knowledge[-1]          # Last item
>>> knowledge[1:3]         # New AutoList with items [1:3]
>>> knowledge[:5]          # First 5 items as new instance

__setitem__(index: int, item: ItemSchema) -> None

Support index assignment.

Parameters:

Name Type Description Default
index int

Position to set (supports negative indexing).

required
item ItemSchema

The item to set at that position.

required

Raises:

Type Description
TypeError

If item schema doesn't match.

IndexError

If index is out of range.

Examples:

>>> knowledge[0] = new_item
>>> knowledge[-1] = updated_item
Side Effects
  • Clears the vector index (needs rebuild)
  • Updates metadata timestamp

__add__(other: BaseAutoType[ItemSchema]) -> AutoList[ItemSchema]

Operator overload for '+' to combine knowledge instances.

Supports multiple combination patterns:
  • AutoList + AutoList → AutoList (merge lists)
  • AutoList + AutoModel → AutoList (append model to list)

This enables chain operations like: model1 + model2 + model3

Parameters:

Name Type Description Default
other BaseAutoType[ItemSchema]

Another AutoList or AutoModel with compatible schema.

required

Returns:

Type Description
AutoList[ItemSchema]

New AutoList with combined items.

Raises:

Type Description
TypeError

If schemas don't match or invalid operand type.

__delitem__(index: int) -> None

Support del operation for removing items by index.

Parameters:

Name Type Description Default
index int

Position to delete (supports negative indexing).

required

Raises:

Type Description
IndexError

If index is out of range.

Examples:

>>> del knowledge[0]      # Delete first item
>>> del knowledge[-1]     # Delete last item
Side Effects
  • Clears the vector index (needs rebuild)
  • Updates metadata timestamp

__iter__() -> Iterator[ItemSchema]

Support iteration over items.

Returns:

Type Description
Iterator[ItemSchema]

Iterator over items in the list.

Examples:

>>> for item in knowledge:
...     print(item.name)
>>> items_list = list(knowledge)
>>> names = [item.name for item in knowledge]

__contains__(item: ItemSchema) -> bool

Support 'in' operator for membership testing.

Parameters:

Name Type Description Default
item ItemSchema

The item to check for membership.

required

Returns:

Type Description
bool

True if item exists in the list, False otherwise.

Comparison Logic
  1. Check if item's model_fields match any item's schema
  2. If schemas match, compare model_dump() equality

Examples:

>>> if person in knowledge:
...     print("Person already exists")

__repr__() -> str

Return detailed string representation.

Returns:

Type Description
str

String in format: ClassName[ItemSchema](N items)

Examples:

>>> repr(knowledge)
'AutoList[PersonSchema](5 items)'

__str__() -> str

Return human-readable string representation.

Returns:

Type Description
str

Brief description of the knowledge instance.

Examples:

>>> str(knowledge)
'AutoList with 5 PersonSchema items'

append(item: ItemSchema) -> None

Append a single item to the end of the list.

Parameters:

Name Type Description Default
item ItemSchema

The item to append.

required

Raises:

Type Description
TypeError

If item schema doesn't match item_schema.

Examples:

>>> knowledge.append(PersonSchema(name="Alice", age=30))
Side Effects
  • Clears the vector index (needs rebuild)
  • Updates metadata timestamp

extend(items: Iterable[ItemSchema] | AutoList[ItemSchema]) -> None

Extend the list by appending multiple items.

Parameters:

Name Type Description Default
items Iterable[ItemSchema] | AutoList[ItemSchema]

Iterable of items to append. Can be:
  • A list of items
  • Another AutoList instance
  • Any iterable yielding items

required

Raises:

Type Description
TypeError

If any item's schema doesn't match item_schema.

Examples:

>>> knowledge.extend([person1, person2, person3])
>>> knowledge.extend(other_knowledge)
Side Effects
  • Clears the vector index (needs rebuild)
  • Updates metadata timestamp

insert(index: int, item: ItemSchema) -> None

Insert an item at a specific position.

Parameters:

Name Type Description Default
index int

Position to insert at (supports negative indexing).

required
item ItemSchema

The item to insert.

required

Raises:

Type Description
TypeError

If item schema doesn't match item_schema.

Examples:

>>> knowledge.insert(0, new_person)    # Insert at beginning
>>> knowledge.insert(-1, new_person)   # Insert before last
Side Effects
  • Clears the vector index (needs rebuild)
  • Updates metadata timestamp

remove(item: ItemSchema) -> None

Remove the first occurrence of an item from the list.

Parameters:

Name Type Description Default
item ItemSchema

The item to remove.

required

Raises:

Type Description
ValueError

If item is not found in the list.

Comparison Logic

Uses _items_equal() to find matching item.

Examples:

>>> knowledge.remove(person)
Side Effects
  • Clears the vector index (needs rebuild)
  • Updates metadata timestamp

pop(index: int = -1) -> ItemSchema

Remove and return an item at the given position.

Parameters:

Name Type Description Default
index int

Position to pop (default: -1, last item).

-1

Returns:

Type Description
ItemSchema

The removed item.

Raises:

Type Description
IndexError

If list is empty or index is out of range.

Examples:

>>> last_item = knowledge.pop()
>>> first_item = knowledge.pop(0)
Side Effects
  • Clears the vector index (needs rebuild)
  • Updates metadata timestamp

index(item: ItemSchema, start: int = 0, stop: int | None = None) -> int

Return the index of the first occurrence of item.

Parameters:

Name Type Description Default
item ItemSchema

The item to find.

required
start int

Start searching from this position (default: 0).

0
stop int | None

Stop searching at this position (default: end of list).

None

Returns:

Type Description
int

The index of the first matching item.

Raises:

Type Description
ValueError

If item is not found in the specified range.

Comparison Logic

Uses _items_equal() to match items.

Examples:

>>> idx = knowledge.index(person)
>>> idx = knowledge.index(person, 5, 10)  # Search in range [5:10]

count(item: ItemSchema) -> int

Return the number of times item appears in the list.

Parameters:

Name Type Description Default
item ItemSchema

The item to count.

required

Returns:

Type Description
int

Number of occurrences.

Comparison Logic

Uses _items_equal() to match items.

Examples:

>>> count = knowledge.count(person)

copy() -> AutoList[ItemSchema]

Create a deep copy of this AutoList instance.

Returns:

Type Description
AutoList[ItemSchema]

A new AutoList instance with copied items and metadata.

Note

The vector index is not copied; it needs to be rebuilt if needed.

Examples:

>>> backup = knowledge.copy()
>>> backup.append(new_item)  # Original unchanged

reverse() -> None

Reverse the items in place.

Examples:

>>> knowledge.reverse()
Side Effects
  • Rebuilds the vector index if it exists (to maintain consistency)
  • Updates metadata timestamp
Note

Does not call clear_index() since elements aren't modified, but rebuilds index to maintain metadata order consistency.

sort(key: Callable[[ItemSchema], Any] | None = None, reverse: bool = False) -> None

Sort the items in place.

Parameters:

Name Type Description Default
key Callable[[ItemSchema], Any] | None

Function to extract a comparison key from each item. Must be provided if items are not directly comparable.

None
reverse bool

If True, sort in descending order (default: False).

False

Raises:

Type Description
TypeError

If key is not provided and items aren't comparable.

Examples:

>>> knowledge.sort(key=lambda x: x.name)
>>> knowledge.sort(key=lambda x: x.age, reverse=True)
Side Effects
  • Rebuilds the vector index if it exists (to maintain consistency)
  • Updates metadata timestamp
Note

Does not call clear_index() since elements aren't modified, but rebuilds index to maintain metadata order consistency.

_validate_item_schema(item: Any) -> None

Validate that item's schema matches item_schema.

Parameters:

Name Type Description Default
item Any

The item to validate.

required

Raises:

Type Description
TypeError

If schemas don't match, with detailed field difference.

_items_equal(item1: BaseModel, item2: BaseModel) -> bool

Check if two items are equal.

Parameters:

Name Type Description Default
item1 BaseModel

First item to compare.

required
item2 BaseModel

Second item to compare.

required

Returns:

Type Description
bool

True if items are equal, False otherwise.

Comparison Logic
  1. Check if both have the same model_fields (schema)
  2. If schemas match, compare model_dump() equality
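The two-step comparison (same schema, then equal dumps) can be mimicked with dataclasses, where `asdict` plays the role of Pydantic's `model_dump`. A sketch under that substitution:

```python
from dataclasses import dataclass, asdict, fields

@dataclass
class Person:
    name: str
    age: int

def items_equal(a, b) -> bool:
    # Step 1: same schema (same field set on both models).
    if {f.name for f in fields(a)} != {f.name for f in fields(b)}:
        return False
    # Step 2: field-by-field equality via the dumped dicts.
    return asdict(a) == asdict(b)

print(items_equal(Person("Alice", 30), Person("Alice", 30)))
```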

AutoSet

hyperextract.types.AutoSet

Bases: BaseAutoType[AutoSetSchema[ItemSchema]], Generic[ItemSchema]

AutoSet - extracts a unique collection of objects.

This pattern automatically deduplicates items based on a user-specified key extractor function. Provides flexible merge strategies including LLM-powered intelligent merging for handling duplicates.

Key characteristics
  • Extraction target: A unique collection of structured objects
  • Deduplication: Based on key_extractor function (user-specified)
  • Merge strategy: Configurable via MergeStrategy enum:
    • KEEP_EXISTING: Preserve first (original) data, ignore updates
    • KEEP_INCOMING: Always use latest data, overwrite existing
    • MERGE_FIELD: Non-null fields overwrite, lists append (default)
    • LLM.BALANCED: LLM intelligently synthesizes both versions
    • LLM.PREFER_EXISTING: LLM synthesis but prioritizes original data
    • LLM.PREFER_INCOMING: LLM synthesis but prioritizes new data
    • LLM.CUSTOM_RULE: User-defined rules with dynamic context
  • Internal storage: Dict for O(1) lookup and deduplication
  • External interface: List (via items property)
  • Set operations: union (|), intersection (&), difference (-)
Comparison with AutoList
  • AutoList: Allows duplicates, simple append merge
  • AutoSet: Automatic deduplication, intelligent merge strategies
Example

>>> class KeywordSchema(BaseModel):
...     term: str
...     category: str | None = None
...     frequency: int | None = None
>>> keywords = AutoSet(
...     item_schema=KeywordSchema,
...     llm_client=llm,
...     embedder=embedder,
...     key_extractor=lambda x: x.term,
...     strategy_or_merger=MergeStrategy.MERGE_FIELD,
... )
>>> keywords.parse("Python is great. Python is powerful.")
>>> len(keywords)  # Only 1 item (deduplicated)
1
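The dict-backed deduplication can be sketched without the library: items are stored under key_extractor(item), and a duplicate key triggers the merge rule (a simple field merge is assumed here in place of the configurable strategies):

```python
def add_with_dedup(store: dict, item: dict, key_extractor):
    """Store items keyed by key_extractor(item); on a duplicate key,
    merge field-wise (non-null incoming values win)."""
    key = key_extractor(item)
    if key in store:
        existing = store[key]
        store[key] = {**existing, **{k: v for k, v in item.items() if v is not None}}
    else:
        store[key] = item

store = {}
add_with_dedup(store, {"term": "Python", "category": None}, lambda x: x["term"])
add_with_dedup(store, {"term": "Python", "category": "language"}, lambda x: x["term"])
print(len(store))  # deduplicated to a single entry
```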

Attributes

item_set_schema: Type[AutoSetSchema[ItemSchema]] = create_model(container_name, items=(List[item_schema], Field(default_factory=list, description='Set of unique items'))) instance-attribute

item_schema = item_schema instance-attribute

fields_for_index = fields_for_index instance-attribute

_constructor_kwargs = kwargs instance-attribute

key_extractor = key_extractor instance-attribute

strategy_or_merger = strategy_or_merger instance-attribute

_merger = strategy_or_merger instance-attribute

_data_memory: OMem[ItemSchema] = OMem(memory_schema=item_schema, key_extractor=key_extractor, llm_client=llm_client, embedder=embedder, strategy_or_merger=(self._merger), verbose=verbose, fields_for_index=fields_for_index) instance-attribute

_item_label_extractor = item_label_extractor instance-attribute

data: AutoSetSchema[ItemSchema] property

Returns all stored knowledge (read-only access).

Returns:

Type Description
AutoSetSchema[ItemSchema]

The internal knowledge data as a Pydantic model instance.

items: List[ItemSchema] property

Returns the internal items as a list (for external interface compatibility).

Returns:

Type Description
List[ItemSchema]

List of unique items.

keys: List[Any] property

Returns all unique key values.

Returns:

Type Description
List[Any]

List of unique key values.

Functions

__init__(item_schema: Type[ItemSchema], llm_client: BaseChatModel, embedder: Embeddings, key_extractor: Callable[[ItemSchema], Any], *, strategy_or_merger: MergeStrategy | BaseMerger = MergeStrategy.LLM.BALANCED, prompt: str = '', item_label_extractor: Callable[[ItemSchema], str] = None, chunk_size: int = 2048, chunk_overlap: int = 256, max_workers: int = 10, verbose: bool = False, fields_for_index: List[str] | None = None, **kwargs: Any)

Initialize AutoSet with key extractor and merge strategy.

Parameters:

Name Type Description Default
item_schema Type[ItemSchema]

Pydantic BaseModel subclass for individual items.

required
llm_client BaseChatModel

Language model client for extraction and merging.

required
embedder Embeddings

Embedding model for vector indexing.

required
key_extractor Callable[[ItemSchema], Any]

Function to extract unique key from an item (required).

required
strategy_or_merger MergeStrategy | BaseMerger

Merge strategy or pre-configured merger instance. Can be:
  1. A MergeStrategy enum value (e.g., MergeStrategy.LLM.BALANCED)
  2. A pre-configured BaseMerger instance (for full control)

BALANCED
prompt str

Custom extraction prompt.

''
item_label_extractor Callable[[ItemSchema], str]

Optional function to extract label from item for visualization.

None
chunk_size int

Maximum characters per chunk for long texts.

2048
chunk_overlap int

Overlapping characters between adjacent chunks.

256
max_workers int

Maximum concurrent extraction tasks.

10
verbose bool

Whether to display detailed execution logs and progress information.

False
fields_for_index List[str] | None

Optional list of field names in item_schema to include in vector index. If None, all text fields are indexed by default. Useful for optimizing search on complex schemas. Example: ['name', 'summary'] (only index these fields)

None
**kwargs Any

Additional arguments passed to create_merger() when strategy_or_merger is a MergeStrategy enum. Ignored if strategy_or_merger is a BaseMerger instance.

{}

_create_empty_instance() -> AutoSet[ItemSchema]

Creates a new empty instance with the same configuration.

Overrides parent method to include AutoSet-specific parameters.

Returns:

Type Description
AutoSet[ItemSchema]

New AutoSet instance with identical configuration.

_default_prompt() -> str

Returns the default extraction prompt for set-based extraction.

empty() -> bool

Checks if the set is empty.

Returns:

Type Description
bool

True if no items are stored, False otherwise.

_init_data_state() -> None

INIT/RESET: Initialize or reset OMem as empty. Called during init and when clear() is called.

_init_index_state() -> None

Initialize vector index to empty state.

_set_data_state(data: AutoSetSchema[ItemSchema]) -> None

SET: Full Reset. Wipe OMem and refill from data (e.g., load from disk). Called by parse() or load() where data IS the new state.

_update_data_state(incoming_data: AutoSetSchema[ItemSchema]) -> None

UPDATE: Incremental merge. Add to OMem efficiently (called by feed()).

Unlike the default behavior which uses merge_batch for full re-merge, AutoSet optimizes this by directly adding items to OMem, which handles deduplication and merging internally.

merge_batch_data(data_list: List[AutoSetSchema[ItemSchema]]) -> AutoSetSchema[ItemSchema]

Merges multiple data containers with automatic deduplication.

Pure function: Does not modify internal state. Delegates to OMem's merge strategy for efficient deduplication and merging. All merge strategies are handled by the Merger implementation in OMem.

Parameters:

Name Type Description Default
data_list List[AutoSetSchema[ItemSchema]]

List of container objects from batch processing to merge.

required

Returns:

Type Description
AutoSetSchema[ItemSchema]

New merged AutoSetSchema with deduplicated items and resolved conflicts.

build_index(force: bool = False) -> None

Build/rebuild independent vector index for each item in the set.

Parameters:

Name Type Description Default
force bool

If True, forces rebuilding the index even if it already exists.

False

search(query: str, top_k: int = 3) -> List[ItemSchema]

Searches items in the set using semantic similarity.

Parameters:

Name Type Description Default
query str

Search query string.

required
top_k int

Number of results to return.

3

Returns:

Type Description
List[ItemSchema]

List of relevant items.

dump_index(folder_path: str | Path) -> None

Saves FAISS vector index to disk.

load_index(folder_path: str | Path) -> None

Loads FAISS vector index from disk.

show(item_label_extractor: Callable[[ItemSchema], str] = None, *, top_k_for_search: int = 3, top_k_for_chat: int = 3) -> None

Visualize the set using OntoSight.

Parameters:

Name Type Description Default
item_label_extractor Callable[[ItemSchema], str]

Optional function to extract label from item for visualization. If not provided, uses the one from init.

None
top_k_for_search int

Number of items to retrieve for search callback (default: 3).

3
top_k_for_chat int

Number of items to retrieve for chat callback (default: 3).

3

__len__() -> int

Returns the number of unique items in the set.

__contains__(key: Any) -> bool

Checks if a unique key exists in the set.

Parameters:

Name Type Description Default
key Any

The unique key value to check.

required

Returns:

Type Description
bool

True if key exists, False otherwise.

__repr__() -> str

Returns a developer-friendly representation.

__str__() -> str

Returns a user-friendly string representation.

__iter__() -> Iterator[ItemSchema]

Enables iteration over all items in the set.

Yields:

Type Description
ItemSchema

Iterator over all unique items.

Examples:

>>> for skill in skills:
...     print(skill.name)
>>> names = [s.name for s in skills]

add(item: ItemSchema) -> None

Adds a single item to the set with automatic deduplication.

Parameters:

Name Type Description Default
item ItemSchema

The item to add.

required

remove(key: Any) -> Optional[ItemSchema]

Removes an item by its unique key value.

Parameters:

Name Type Description Default
key Any

The unique key value to remove.

required

Returns:

Type Description
Optional[ItemSchema]

The removed item, or None if not found.

contains(key: Any) -> bool

Checks if an item with the given key exists in the set.

Parameters:

Name Type Description Default
key Any

The unique key value to check.

required

Returns:

Type Description
bool

True if key exists, False otherwise.

get(key: Any, default: Optional[ItemSchema] = None) -> Optional[ItemSchema]

Gets an item by its unique key value.

Parameters:

Name Type Description Default
key Any

The unique key value to retrieve.

required
default Optional[ItemSchema]

Default value if key not found.

None

Returns:

Type Description
Optional[ItemSchema]

The item if found, otherwise default.

update(items: List[ItemSchema]) -> None

Batch adds multiple items.

Parameters:

Name Type Description Default
items List[ItemSchema]

List of items to add.

required

discard(key: Any) -> None

Removes an item by its unique key value, silently ignoring if not found.

Unlike remove(), this method does not raise an error if the key does not exist.

Parameters:

Name Type Description Default
key Any

The unique key value to remove.

required

Examples:

>>> skills.discard("Python")  # No error if not found
Side Effects
  • Clears the vector index (needs rebuild)
  • Updates metadata timestamp

pop() -> ItemSchema

Removes and returns an arbitrary item from the set.

Returns:

Type Description
ItemSchema

The removed item.

Raises:

Type Description
KeyError

If the set is empty.

Examples:

>>> skill = skills.pop()
>>> print(f"Removed: {skill.name}")
Side Effects
  • Clears the vector index (needs rebuild)
  • Updates metadata timestamp

copy() -> AutoSet[ItemSchema]

Creates a deep copy of the set.

Returns:

Type Description
AutoSet[ItemSchema]

A new AutoSet instance with copies of all items.

Examples:

>>> backup = skills.copy()
>>> backup.add(new_skill)
>>> # Original skills unchanged

__or__(other: AutoSet[ItemSchema]) -> AutoSet[ItemSchema]

Union operation: set1 | set2.

Returns a new set containing all items from both sets.

Parameters:

Name Type Description Default
other AutoSet[ItemSchema]

Another AutoSet instance.

required

Returns:

Type Description
AutoSet[ItemSchema]

New AutoSet with union of items.

__and__(other: AutoSet[ItemSchema]) -> AutoSet[ItemSchema]

Intersection operation: set1 & set2.

Returns a new set containing only items present in both sets.

Parameters:

Name Type Description Default
other AutoSet[ItemSchema]

Another AutoSet instance.

required

Returns:

Type Description
AutoSet[ItemSchema]

New AutoSet with intersection of items.

__sub__(other: AutoSet[ItemSchema]) -> AutoSet[ItemSchema]

Difference operation: set1 - set2.

Returns a new set containing items in self but not in other.

Parameters:

Name Type Description Default
other AutoSet[ItemSchema]

Another AutoSet instance.

required

Returns:

Type Description
AutoSet[ItemSchema]

New AutoSet with difference of items.

__xor__(other: AutoSet[ItemSchema]) -> AutoSet[ItemSchema]

Symmetric difference operation: set1 ^ set2.

Returns a new set containing items in either set but not in both.

Parameters:

Name Type Description Default
other AutoSet[ItemSchema]

Another AutoSet instance.

required

Returns:

Type Description
AutoSet[ItemSchema]

New AutoSet with symmetric difference of items.

union(other: AutoSet[ItemSchema]) -> AutoSet[ItemSchema]

Union operation (named method).

intersection(other: AutoSet[ItemSchema]) -> AutoSet[ItemSchema]

Intersection operation (named method).

difference(other: AutoSet[ItemSchema]) -> AutoSet[ItemSchema]

Difference operation (named method).

symmetric_difference(other: AutoSet[ItemSchema]) -> AutoSet[ItemSchema]

Symmetric difference operation (named method).
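Because all of these operations compare items by their unique key, their results mirror built-in set algebra over the key values. A minimal sketch using plain Python sets of keys (the real operators return new AutoSet instances carrying the full items):

```python
# Key-level view of AutoSet's set algebra, sketched with built-in sets.
a = {"Python", "Rust", "Go"}
b = {"Rust", "Go", "Zig"}

union        = a | b   # all keys from both sets
intersection = a & b   # keys present in both
difference   = a - b   # keys in a but not in b
sym_diff     = a ^ b   # keys in exactly one of the two sets
```

The named methods (union, intersection, difference, symmetric_difference) are equivalent to the corresponding operators.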

__eq__(other: Any) -> bool

Equality comparison: set1 == set2.

Two sets are equal if they have the same schema and the same key set. Note: equality does not compare item contents, only keys.

Parameters:

Name Type Description Default
other Any

Another object to compare with.

required

Returns:

Type Description
bool

True if both sets have the same keys, False otherwise.

Examples:

>>> skills1 == skills2  # True if same keys
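The key-only equality semantics can be sketched with plain dicts mapping keys to items (an analogy, not the actual implementation): two sets whose items differ in content still compare equal as long as their keys match.

```python
# Key-only equality, sketched with dicts of key -> item.
s1 = {"Python": {"level": 3}}
s2 = {"Python": {"level": 9}}   # same key, different item contents

keys_equal = s1.keys() == s2.keys()   # AutoSet-style equality: compares keys only
values_equal = (s1 == s2)             # full comparison would differ
```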

__ne__(other: Any) -> bool

Inequality comparison: set1 != set2.

Returns:

Type Description
bool

True if sets are not equal, False otherwise.

__le__(other: AutoSet[ItemSchema]) -> bool

Subset comparison: set1 <= set2.

Parameters:

Name Type Description Default
other AutoSet[ItemSchema]

Another AutoSet instance.

required

Returns:

Type Description
bool

True if self is a subset of other (all keys in self are in other).

Raises:

Type Description
TypeError

If other is not an AutoSet or has a different schema.

Examples:

>>> skills1 <= skills2  # True if skills1 is subset of skills2

__lt__(other: AutoSet[ItemSchema]) -> bool

Proper subset comparison: set1 < set2.

Parameters:

Name Type Description Default
other AutoSet[ItemSchema]

Another AutoSet instance.

required

Returns:

Type Description
bool

True if self is a proper subset of other (subset and not equal).

Examples:

>>> skills1 < skills2  # True if skills1 is proper subset

__ge__(other: AutoSet[ItemSchema]) -> bool

Superset comparison: set1 >= set2.

Parameters:

Name Type Description Default
other AutoSet[ItemSchema]

Another AutoSet instance.

required

Returns:

Type Description
bool

True if self is a superset of other (all keys in other are in self).

Examples:

>>> skills1 >= skills2  # True if skills1 is superset of skills2

__gt__(other: AutoSet[ItemSchema]) -> bool

Proper superset comparison: set1 > set2.

Parameters:

Name Type Description Default
other AutoSet[ItemSchema]

Another AutoSet instance.

required

Returns:

Type Description
bool

True if self is a proper superset of other (superset and not equal).

Examples:

>>> skills1 > skills2  # True if skills1 is proper superset

issubset(other: AutoSet[ItemSchema]) -> bool

Test whether every key in the set is in other.

Parameters:

Name Type Description Default
other AutoSet[ItemSchema]

Another AutoSet instance.

required

Returns:

Type Description
bool

True if self is a subset of other.

Examples:

>>> skills1.issubset(skills2)

issuperset(other: AutoSet[ItemSchema]) -> bool

Test whether every key in other is in the set.

Parameters:

Name Type Description Default
other AutoSet[ItemSchema]

Another AutoSet instance.

required

Returns:

Type Description
bool

True if self is a superset of other.

Examples:

>>> skills1.issuperset(skills2)

isdisjoint(other: AutoSet[ItemSchema]) -> bool

Test whether the set has no keys in common with other.

Parameters:

Name Type Description Default
other AutoSet[ItemSchema]

Another AutoSet instance.

required

Returns:

Type Description
bool

True if the two sets have no keys in common.

Raises:

Type Description
TypeError

If other is not an AutoSet or has a different schema.

Examples:

>>> skills1.isdisjoint(skills2)  # True if no common skills