RecursiveCharacterTextSplitter

gitinsp.infrastructure.parser.RecursiveCharacterTextSplitter
See the RecursiveCharacterTextSplitter companion object
class RecursiveCharacterTextSplitter(separators: Option[List[String]], val keepSeparator: Either[Boolean, String], isSeparatorRegex: Boolean, val chunkSize: Int, val chunkOverlap: Int, val lengthFunction: String => Int, val addStartIndex: Boolean, val stripWhitespace: Boolean) extends TextSplitter

Attributes

Companion
object
Supertypes
class TextSplitter
trait LazyLogging
trait DocumentSplitter
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

override def splitText(text: String): List[String]

Overrides the abstract splitText method to initiate the recursive splitting process.
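
The recursive splitting idea can be illustrated with a minimal, self-contained sketch. It is not the implementation behind splitText: the object RecursiveSplitSketch and the helper recursiveSplit are hypothetical names, separators are treated as literal strings, the length is measured in characters rather than through lengthFunction, and small pieces are not merged back together or overlapped.

```scala
object RecursiveSplitSketch {

  // Hypothetical helper, not the library's code.
  // Splits `text` on the first separator that occurs in it. Any piece that is
  // still longer than `chunkSize` is split again with the remaining separators.
  def recursiveSplit(text: String, separators: List[String], chunkSize: Int): List[String] = {
    require(chunkSize > 0, "chunkSize must be positive")
    if (text.length <= chunkSize) List(text).filter(_.nonEmpty)
    else
      separators match {
        case Nil =>
          // No separators left: fall back to fixed-width slices.
          text.grouped(chunkSize).toList
        case sep :: rest if sep.nonEmpty && text.contains(sep) =>
          text
            .split(java.util.regex.Pattern.quote(sep)) // literal match, not a regex
            .toList
            .filter(_.nonEmpty)
            .flatMap(piece => recursiveSplit(piece, rest, chunkSize))
        case _ :: rest =>
          // Separator is empty or absent from the text: try the next, finer one.
          recursiveSplit(text, rest, chunkSize)
      }
  }
}
```

Under these assumptions, coarse separators such as a blank line are tried first, and finer ones are used only for pieces that remain too long.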

Attributes

Definition Classes

Inherited methods

def createDocuments(texts: List[String], metadatas: Option[List[Metadata]]): List[TextSegment]

Creates TextSegment objects from a list of texts and optional metadata. It calls the subclass's splitText implementation to get initial chunks and then formats them into TextSegments, potentially adding start index metadata.
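
One plausible way to attach a start index to each chunk is sketched below. This is an assumption about the behaviour, not the code behind createDocuments: Segment is a stand-in for TextSegment, toSegments is a hypothetical helper, and the metadata key "start_index" is chosen for illustration only.

```scala
object StartIndexSketch {

  // Stand-in for a text segment; the real TextSegment type is not modelled here.
  final case class Segment(text: String, metadata: Map[String, String])

  // Hypothetical helper: pairs each chunk with a copy of the metadata and,
  // when requested, the offset at which the chunk occurs in `source`.
  def toSegments(
      source: String,
      chunks: List[String],
      metadata: Map[String, String],
      addStartIndex: Boolean
  ): List[Segment] = {
    // The accumulator carries the offset from which the next chunk is searched,
    // so repeated chunks resolve to successive occurrences.
    val (_, segments) = chunks.foldLeft((0, Vector.empty[Segment])) {
      case ((from, acc), chunk) =>
        val found = if (addStartIndex) source.indexOf(chunk, from) else -1
        if (found >= 0)
          (found + 1, acc :+ Segment(chunk, metadata + ("start_index" -> found.toString)))
        else
          (from, acc :+ Segment(chunk, metadata)) // chunk not located: leave metadata as is
    }
    segments.toList
  }
}
```

Searching from just past the previous match, rather than from the end of the previous chunk, keeps the lookup correct when consecutive chunks overlap.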

Attributes

Inherited from:
TextSplitter
def createSegment(text: String, document: Document, index: Int): TextSegment

Attributes

Inherited from:
TextSplitter
def split(doc: Document): List[TextSegment]

Entry point for splitting a single Langchain4j Document.

Attributes

Inherited from:
TextSplitter
def splitAll(x$0: List[Document]): List[TextSegment]

Attributes

Inherited from:
DocumentSplitter
def splitDocuments(documents: Iterable[Document]): List[TextSegment]

Splits multiple documents into a list of TextSegments.

Attributes

Inherited from:
TextSplitter

Concrete fields

override val addStartIndex: Boolean
override val chunkOverlap: Int
override val chunkSize: Int
override val keepSeparator: Either[Boolean, String]
override val lengthFunction: String => Int
override val stripWhitespace: Boolean

Inherited fields

lazy protected val logger: Logger

Attributes

Inherited from:
LazyLogging