Authorizations
Path Parameters
Body
JSON request to update a task.
Controls the setting for the chunking and post-processing of each chunk.
The number of seconds until the task is deleted. Expired tasks cannot be updated, polled, or accessed via the web interface.
Whether to use high-resolution images for cropping and post-processing. (Latency penalty: ~7 seconds per page)
Controls the Optical Character Recognition (OCR) strategy.
All: Processes all pages with OCR. (Latency penalty: ~0.5 seconds per page)
Auto: Selectively applies OCR only to pages with missing or low-quality text. When a text layer is present, the bounding boxes from the text layer are used.
Available options: All, Auto
The pipeline to use for processing. If the pipeline is set to Azure, Azure layout analysis is used for segmentation and OCR. The output will be unified to the Chunkr output format.
Available options: Azure
Controls the post-processing of each segment type.
Allows you to generate HTML and Markdown from Chunkr models for each segment type.
By default, the HTML and Markdown are generated manually using the segmentation information, except for Table and Formula.
You can optionally configure custom LLM prompts and models to generate an additional llm field with LLM-processed content for each segment type.
Controls the segmentation strategy:
LayoutAnalysis: Analyzes pages for layout elements (e.g., Table, Picture, Formula) using bounding boxes. Provides fine-grained segmentation and better chunking. (Latency penalty: ~TBD seconds per page)
Page: Treats each page as a single segment. Faster processing, but without layout element detection and only simple chunking.
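The OCR and segmentation options above, together with the quoted latency penalties, can be sketched in Python as follows. This is only an illustrative sketch: the class `UpdateOptions`, its attribute names, and its default values are hypothetical, because this reference lists the allowed values but not the JSON keys or defaults. The segmentation penalty is left out because the reference gives it as TBD.

```python
from dataclasses import dataclass

# Allowed values as listed in this reference.
OCR_STRATEGIES = {"All", "Auto"}
SEGMENTATION_STRATEGIES = {"LayoutAnalysis", "Page"}

# Approximate per-page latency penalties quoted in this reference (seconds).
HIGH_RES_PENALTY_S = 7.0
OCR_ALL_PENALTY_S = 0.5


@dataclass
class UpdateOptions:
    # Hypothetical attribute names and defaults, chosen for illustration only.
    ocr_strategy: str = "Auto"
    segmentation_strategy: str = "LayoutAnalysis"
    high_resolution: bool = False

    def validate(self) -> None:
        """Reject values that are not among the documented options."""
        if self.ocr_strategy not in OCR_STRATEGIES:
            raise ValueError(f"unknown OCR strategy: {self.ocr_strategy!r}")
        if self.segmentation_strategy not in SEGMENTATION_STRATEGIES:
            raise ValueError(
                f"unknown segmentation strategy: {self.segmentation_strategy!r}"
            )


def estimated_extra_latency(options: UpdateOptions, page_count: int) -> float:
    """Sum only the per-page penalties that this reference quantifies."""
    options.validate()
    per_page = 0.0
    if options.high_resolution:
        per_page += HIGH_RES_PENALTY_S
    if options.ocr_strategy == "All":
        per_page += OCR_ALL_PENALTY_S
    return per_page * page_count
```

For a 10-page document with high-resolution images and OCR on all pages, the estimate is roughly 10 × (7 + 0.5) = 75 extra seconds.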
Available options: LayoutAnalysis, Page
Response
Detailed information describing the task, its status, and its processed outputs.
The configuration used for the task.
The date and time when the task was created and queued.
A message describing the task's status or any errors that occurred.
The status of the task.
Available options: Starting, Processing, Succeeded, Failed, Cancelled
The unique identifier for the task.
The date and time when the task will expire.
The date and time when the task was finished.
The processed results of a document analysis task.
The date and time when the task was started.
The presigned URL of the task.
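A client reading this response typically needs to decide whether to keep polling and whether the task is still accessible. The sketch below shows one way to do that in Python. It assumes that Succeeded, Failed, and Cancelled are the final states and that the expiration time may be absent; neither point is stated in this reference. The function names and parameters are hypothetical and do not correspond to any client library.

```python
from datetime import datetime, timezone
from typing import Optional

# Status values as listed in this reference.
# Assumption: the last three are terminal and the task will not change afterwards.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}
ALL_STATUSES = {"Starting", "Processing"} | TERMINAL_STATUSES


def is_finished(status: str) -> bool:
    """Return True when polling can stop for this status."""
    if status not in ALL_STATUSES:
        raise ValueError(f"unknown task status: {status!r}")
    return status in TERMINAL_STATUSES


def is_expired(expires_at: Optional[datetime], now: datetime) -> bool:
    """Return True when the expiration time has been reached.

    Assumption: a missing expiration time means the task does not expire.
    Both datetimes are expected to be timezone-aware.
    """
    return expires_at is not None and now >= expires_at


# Example usage with made-up values.
now = datetime(2025, 1, 2, tzinfo=timezone.utc)
expiry = datetime(2025, 1, 1, tzinfo=timezone.utc)
should_keep_polling = not is_finished("Processing") and not is_expired(expiry, now)
```

In this example `should_keep_polling` is false because the made-up expiration time has already passed, which matches the rule that expired tasks cannot be polled.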