IngestionStrategy

gitinsp.domain.interfaces.infrastructure.IngestionStrategy

Strategy for preprocessing and preparing documents for ingestion into a vector database. Implementations provide different approaches to transforming and splitting content based on content type.
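The sketch below shows one way the three abstract methods might fit together. The call order (transform the document, split it, then transform each segment) is inferred from the method descriptions on this page. Everything else is an assumption: `Language`, `Document`, `TextSegment` and `DocumentSplitter` are simplified stand-ins for the real types, the trait is a local mirror of the documented signatures, and `prepare` and `LineStrategy` are hypothetical names used only for illustration.

```scala
// Stand-in types (hypothetical): the real ones come from the project
// and its dependencies and carry more than a single text field.
enum Language:
  case Scala, Markdown

final case class Document(text: String)
final case class TextSegment(text: String)

trait DocumentSplitter:
  def split(document: Document): List[TextSegment]

// Local mirror of the three abstract methods documented on this page.
trait IngestionStrategy:
  def documentSplitter(lang: Language, chunkSize: Int, overlap: Int): DocumentSplitter
  def documentTransformer(document: Document): Document
  def textSegmentTransformer(textSegment: TextSegment): TextSegment

// Hypothetical driver: transform the document, split it, then
// transform every resulting segment.
def prepare(
    strategy: IngestionStrategy,
    doc: Document,
    lang: Language,
    chunkSize: Int,
    overlap: Int
): List[TextSegment] =
  val transformed = strategy.documentTransformer(doc)
  val splitter = strategy.documentSplitter(lang, chunkSize, overlap)
  splitter.split(transformed).map(strategy.textSegmentTransformer)

// Trivial strategy used only to exercise the driver: one segment per
// non-empty line, ignoring lang, chunkSize and overlap.
object LineStrategy extends IngestionStrategy:
  def documentSplitter(lang: Language, chunkSize: Int, overlap: Int): DocumentSplitter =
    new DocumentSplitter:
      def split(document: Document): List[TextSegment] =
        document.text.linesIterator.filter(_.nonEmpty).map(TextSegment(_)).toList

  def documentTransformer(document: Document): Document =
    document.copy(text = document.text.trim)

  def textSegmentTransformer(textSegment: TextSegment): TextSegment =
    textSegment.copy(text = textSegment.text.toUpperCase)
```

Calling `prepare(LineStrategy, Document("  a\nb  "), Language.Scala, 10, 0)` trims the text, splits it into two lines, and upper-cases each one.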

Attributes

Supertypes
class Object
trait Matchable
class Any
Known subtypes

Members list

Value members

Abstract methods

def documentSplitter(lang: Language, chunkSize: Int, overlap: Int): DocumentSplitter

Creates a document splitter appropriate for the given language

Value parameters

chunkSize

The target size of each document chunk

lang

The programming language of the document content

overlap

The number of tokens to overlap between chunks

Attributes

Returns

A DocumentSplitter configured for the language
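As an illustration of how `chunkSize` and `overlap` could interact, here is a minimal sketch of a sliding-window splitter selected by language. It is not the project's implementation. `Language` and `DocumentSplitter` are simplified stand-ins, `WindowSplitter` is a hypothetical helper, and the sketch assumes both `chunkSize` and `overlap` are counted in the same unit (lines for code, words for prose), which this page does not confirm.

```scala
// Stand-in types (hypothetical): the real Language and DocumentSplitter
// come from the project and its dependencies and will differ.
enum Language:
  case Scala, Markdown

trait DocumentSplitter:
  def split(text: String): List[String]

// Hypothetical helper: sliding window over units separated by `separator`.
// Consecutive chunks share `overlap` units; the last chunk may be shorter.
final class WindowSplitter(separator: String, chunkSize: Int, overlap: Int)
    extends DocumentSplitter:
  require(chunkSize > 0, "chunkSize must be positive")
  require(overlap >= 0 && overlap < chunkSize, "overlap must be in [0, chunkSize)")

  def split(text: String): List[String] =
    val units = text.split(separator).filter(_.nonEmpty).toList
    units
      .sliding(chunkSize, chunkSize - overlap)
      .map(_.mkString(separator))
      .toList

// Assumption: code is windowed over lines, prose over words.
def documentSplitter(lang: Language, chunkSize: Int, overlap: Int): DocumentSplitter =
  lang match
    case Language.Scala    => WindowSplitter("\n", chunkSize, overlap)
    case Language.Markdown => WindowSplitter(" ", chunkSize, overlap)
```

With `chunkSize = 3` and `overlap = 1`, the window advances two units at a time, so each chunk begins with the last unit of the previous one.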

def documentTransformer(document: Document): Document

Transforms a document before splitting it into segments

Value parameters

document

The document to transform

Attributes

Returns

The transformed document
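A document transformer might, for example, normalize whitespace so that later splitting is not affected by line-ending differences. The sketch below assumes a simplified stand-in `Document` with a text field and a metadata map. The real type and the cleanup the project actually applies are not described on this page.

```scala
// Stand-in type (hypothetical): the real Document will differ.
final case class Document(text: String, metadata: Map[String, String] = Map.empty)

// Illustrative transformation: unify line endings and drop trailing
// whitespace on every line, leaving metadata untouched.
def documentTransformer(document: Document): Document =
  val cleaned = document.text
    .replace("\r\n", "\n")
    .linesIterator
    .map(_.replaceAll("\\s+$", ""))
    .mkString("\n")
  document.copy(text = cleaned)
```

Returning a copy rather than mutating the input keeps the transformation safe to apply to documents that are shared elsewhere.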

def textSegmentTransformer(textSegment: TextSegment): TextSegment

Transforms a text segment before storing it in the vector database

Value parameters

textSegment

The text segment to transform

Attributes

Returns

The transformed text segment
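One plausible use of a segment transformer is to prepend context, such as the source file name, so that each stored chunk remains identifiable on its own. The sketch below is an assumption-laden illustration: `TextSegment` is a simplified stand-in, and the `file_name` metadata key is hypothetical.

```scala
// Stand-in type (hypothetical): the real TextSegment will differ.
final case class TextSegment(text: String, metadata: Map[String, String] = Map.empty)

// Illustrative transformation: prefix the segment text with the file
// name when the (hypothetical) "file_name" metadata key is present;
// otherwise return the segment unchanged.
def textSegmentTransformer(textSegment: TextSegment): TextSegment =
  textSegment.metadata.get("file_name") match
    case Some(name) => textSegment.copy(text = s"$name\n${textSegment.text}")
    case None       => textSegment
```

For a segment with text `val x = 1` and metadata `file_name -> Main.scala`, the transformed text starts with the line `Main.scala`.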