Configuration
You can configure intelligent chunking by setting the target_chunk_length parameter. This is the approximate number of words a chunk can contain.
Intelligent Chunking
The chunking algorithm works as follows:
- Remove headers and footers.
- Add segments to a chunk until we hit a breaking condition or the chunk length >= target_chunk_length.
Breaking Conditions
Within a chunk, we go down the segment hierarchy (from Title -> Section header -> Other). Once we hit a segment_type that is higher in the hierarchy than the current segment type, we break the chunk and start a new one.
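The steps and the breaking condition described above can be sketched in a few lines of Python. This is an illustrative sketch, not the actual implementation: the Segment class, the chunk_segments function, and the segment type strings ("Page header", "Page footer", "Title", "Section header", "Text") are hypothetical names chosen for the example. It also assumes that the length check happens before each new segment is added and that length is measured by whitespace-separated words.

```python
from dataclasses import dataclass

# Hypothetical segment type names; a lower rank means higher in the hierarchy.
RANK = {"Title": 0, "Section header": 1}
OTHER_RANK = 2  # every other segment type falls into "Other"
DROPPED = {"Page header", "Page footer"}  # assumed names for headers and footers


@dataclass
class Segment:
    segment_type: str
    text: str


def rank(segment):
    """Position of a segment in the Title -> Section header -> Other hierarchy."""
    return RANK.get(segment.segment_type, OTHER_RANK)


def word_count(chunk):
    """Approximate chunk length as the number of whitespace-separated words."""
    return sum(len(s.text.split()) for s in chunk)


def chunk_segments(segments, target_chunk_length):
    chunks, current = [], []
    for seg in segments:
        # Step 1: remove headers and footers.
        if seg.segment_type in DROPPED:
            continue
        if current:
            # Breaking condition: the next segment is higher in the
            # hierarchy than the segment we just added.
            climbs = rank(seg) < rank(current[-1])
            # Length condition: the chunk has reached the target length.
            full = word_count(current) >= target_chunk_length
            if climbs or full:
                chunks.append(current)
                current = []
        # Step 2: add the segment to the current chunk.
        current.append(seg)
    if current:
        chunks.append(current)
    return chunks


segments = [
    Segment("Page header", "ACME Report"),
    Segment("Title", "Annual Report"),
    Segment("Section header", "Overview"),
    Segment("Text", "Revenue grew this year."),
    Segment("Section header", "Outlook"),
    Segment("Text", "We expect further growth."),
    Segment("Page footer", "Page 1"),
]

chunks = chunk_segments(segments, 512)
print([[s.segment_type for s in c] for c in chunks])
# [['Title', 'Section header', 'Text'], ['Section header', 'Text']]
```

In this example the second "Section header" follows a "Text" segment, so it is higher in the hierarchy than the current segment type and starts a new chunk, even though the first chunk is far below the target length.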
Turning it off
Setting target_chunk_length to 0 will turn off intelligent chunking, and each chunk will contain a single segment. Click here to learn more about the chunk model.