Test and unify text splitter functionality #1547
Open
+368
−130
Description
The changes involve refactoring the text splitting methods, adding new helper functions, and updating the corresponding unit tests.
Proposed Changes
- Replaced the `_split_text_on_tokens` function with `split_multiple_texts_on_tokens` to improve control over the chunking process.
- Updated the `split_text` method to handle both single and multiple texts, and added the `split_single_text_on_tokens` and `split_multiple_texts_on_tokens` functions for better text splitting.

Checklist