TextSplitter

gitinsp.infrastructure.parser.TextSplitter
abstract class TextSplitter(val chunkSize: Int, val chunkOverlap: Int, val lengthFunction: String => Int, val keepSeparator: Either[Boolean, String], val addStartIndex: Boolean, val stripWhitespace: Boolean) extends DocumentSplitter, LazyLogging

Attributes

Supertypes
trait LazyLogging
trait DocumentSplitter
class Object
trait Matchable
class Any

Members list

Value members

Abstract methods

def splitText(text: String): List[String]

Abstract method to be implemented by subclasses. Defines the core splitting logic for a given text.
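
To illustrate the contract, here is a minimal sketch of a subclass that supplies the splitting logic. It does not use the real gitinsp class or LangChain4j: `SketchSplitter` and `FixedWidthSplitter` are hypothetical stand-ins that keep only `chunkSize` and `chunkOverlap`, and the fixed-width windowing strategy is an assumption chosen for brevity, not the project's actual algorithm.

```scala
// Hypothetical stand-in for the abstract contract; not the gitinsp class itself.
abstract class SketchSplitter(val chunkSize: Int, val chunkOverlap: Int):
  require(chunkSize > 0, "chunkSize must be positive")
  require(
    chunkOverlap >= 0 && chunkOverlap < chunkSize,
    "chunkOverlap must be in [0, chunkSize)"
  )

  // Subclasses supply the core splitting logic.
  def splitText(text: String): List[String]

// Illustrative subclass: fixed-width character windows that advance by
// (chunkSize - chunkOverlap) characters, so neighbouring chunks share
// chunkOverlap characters.
final class FixedWidthSplitter(size: Int, overlap: Int)
    extends SketchSplitter(size, overlap):
  def splitText(text: String): List[String] =
    if text.isEmpty then Nil
    else text.sliding(chunkSize, chunkSize - chunkOverlap).toList
```

With a chunk size of 4 and an overlap of 2, `FixedWidthSplitter(4, 2).splitText("abcdefghij")` yields `List("abcd", "cdef", "efgh", "ghij")`.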


Concrete methods

def createDocuments(texts: List[String], metadatas: Option[List[Metadata]]): List[TextSegment]

Creates TextSegment objects from a list of texts and optional metadata. It calls the subclass's splitText implementation to get initial chunks and then formats them into TextSegments, potentially adding start index metadata.
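
One way the start index could be derived is by locating each chunk in the source text, as sketched below. This is an assumption about the approach, not the documented implementation: `SketchSegment` is a hypothetical stand-in for TextSegment, and the `"start_index"` metadata key is an illustrative name.

```scala
// Hypothetical, simplified stand-in for a TextSegment with metadata.
final case class SketchSegment(text: String, metadata: Map[String, String])

// Attaches a "start_index" entry (assumed key name) to each chunk by locating
// the chunk in the source text. The search for each chunk resumes just after
// the previous chunk's start, so overlapping chunks are still found in order.
def withStartIndex(source: String, chunks: List[String]): List[SketchSegment] =
  val (_, segments) = chunks.foldLeft((0, List.empty[SketchSegment])) {
    case ((searchFrom, acc), chunk) =>
      val found = source.indexOf(chunk, searchFrom)
      // A chunk that cannot be located (for example, one altered by
      // whitespace stripping) simply gets no start index.
      val meta =
        if found >= 0 then Map("start_index" -> found.toString)
        else Map.empty[String, String]
      val next = if found >= 0 then found + 1 else searchFrom
      (next, SketchSegment(chunk, meta) :: acc)
  }
  segments.reverse
```

Resuming the search from the previous match, rather than from position 0, keeps repeated substrings from all resolving to their first occurrence.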


def createSegment(text: String, document: Document, index: Int): TextSegment
def split(doc: Document): List[TextSegment]

Entry point for splitting a single LangChain4j Document.


def splitDocuments(documents: Iterable[Document]): List[TextSegment]

Splits multiple documents into a list of TextSegments.
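
Conceptually, splitting several documents can be seen as flattening the per-document results while preserving input order. The sketch below shows that relationship with hypothetical stand-in types (`SketchDoc`, `SketchSeg`) and a line-based `splitOne`; whether the real method delegates to `split` in this way is an assumption.

```scala
// Hypothetical stand-ins; the real Document and TextSegment come from LangChain4j.
final case class SketchDoc(text: String)
final case class SketchSeg(text: String, sourceIndex: Int)

// Per-document split: one segment per non-empty line (illustrative only).
def splitOne(doc: SketchDoc, sourceIndex: Int): List[SketchSeg] =
  doc.text.linesIterator.filter(_.nonEmpty).map(SketchSeg(_, sourceIndex)).toList

// Multi-document split: flatten the per-document results, preserving input order.
def splitMany(docs: Iterable[SketchDoc]): List[SketchSeg] =
  docs.zipWithIndex.flatMap { case (doc, i) => splitOne(doc, i) }.toList
```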


Inherited methods

def splitAll(x$0: List[Document]): List[TextSegment]

Attributes

Inherited from:
DocumentSplitter

Concrete fields

val addStartIndex: Boolean
val chunkOverlap: Int
val chunkSize: Int
val keepSeparator: Either[Boolean, String]
val lengthFunction: String => Int
val stripWhitespace: Boolean
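
The `Either[Boolean, String]` type of `keepSeparator` suggests a flag that is either a plain on/off switch or a named placement. The sketch below shows one way such a value could be interpreted. The `"start"` and `"end"` strings, the treatment of `Left(true)` as "start", and the `SeparatorPlacement` enum are all assumptions modelled on similar splitter APIs; this page does not document the accepted values.

```scala
// Hypothetical interpretation of keepSeparator; the "start"/"end" values are assumed.
enum SeparatorPlacement:
  case Drop, Start, End

def placement(keepSeparator: Either[Boolean, String]): SeparatorPlacement =
  keepSeparator match
    case Left(false)    => SeparatorPlacement.Drop  // discard separators
    case Left(true)     => SeparatorPlacement.Start // assumed default placement
    case Right("start") => SeparatorPlacement.Start // separator leads the next chunk
    case Right("end")   => SeparatorPlacement.End   // separator trails the previous chunk
    case Right(other) =>
      throw IllegalArgumentException(s"Unsupported keepSeparator value: $other")
```

Normalising the value into an enum once, at construction time, would let the splitting code match on three well-defined cases instead of re-inspecting the `Either` at every call site.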

Inherited fields

lazy protected val logger: Logger

Attributes

Inherited from:
LazyLogging