Run Processor
Run processors (extraction, classification, splitting, etc.) on a given document.
In general, the strongly recommended way to integrate with Extend in production is through workflows, using the Run Workflow endpoint, for several reasons:
- File parsing and pre-processing are automatically reused across multiple processors. Many use cases require running several processors on the same document, so this reuse keeps the integration simpler and reduces cost.
- Workflows provide dedicated human-in-the-loop document review when needed.
- Workflows let you model and manage your pipeline through a single endpoint, with a corresponding UI for modeling and monitoring.
However, in some legitimate use cases and systems it is easier to model the pipeline in code and run processors directly. This endpoint exists for that purpose.
Like workflow runs, processor runs are asynchronous and return a status of PROCESSING until the run is complete.
You can configure webhooks to be notified when a processor run completes or fails.
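Because runs are asynchronous, a client that does not use webhooks has to poll until the run leaves the PROCESSING state. The following is a minimal, transport-agnostic polling sketch in Python. It assumes you supply a hypothetical fetch_run callable that retrieves the run's current state as a dict with a status field, and it treats any status other than PROCESSING as final, because the exact set of terminal statuses is not listed on this page.

```python
import time
from typing import Any, Callable, Mapping

# From this page: a run reports PROCESSING until it is complete.
# Assumption: every other status value is treated as final.
IN_PROGRESS = "PROCESSING"


def wait_for_run(
    fetch_run: Callable[[], Mapping[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Poll fetch_run() until the run leaves PROCESSING or the timeout expires.

    fetch_run is a hypothetical callable you provide, e.g. one that looks up
    the processor run by ID and returns the parsed JSON object.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        run = fetch_run()
        if run.get("status") != IN_PROGRESS:
            return run  # completed, failed, or any other non-PROCESSING state
        if time.monotonic() >= deadline:
            raise TimeoutError("processor run is still PROCESSING after the timeout")
        sleep(interval_s)
```

Webhooks avoid this loop entirely and are usually the better fit in production; the injectable sleep parameter exists mainly so the helper can be exercised without real delays.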
Common errors
Invalid request. Returned when:
- Required fields are missing (e.g., processorId)
- Neither file nor rawText is provided
- The file is invalid:
  - The provided fileUrl is invalid
  - The provided fileBase64 is invalid
  - The file type is unsupported
  - The file is corrupted or otherwise cannot be downloaded
- The priority value is outside the allowed range (it must be between 1 and 100)
Not found. Returned when:
- The specified processor ID doesn’t exist
- The specified processor version doesn’t exist
Authentication error. Returned when:
- The API token is missing
- The API token is invalid
Permission error. Returned when:
- The authenticated workspace doesn’t have permission to access the specified processor
- The API token doesn’t have sufficient permissions
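Several of the invalid-request cases above can be caught before a request is sent. This Python sketch checks a request body against the conditions listed on this page: a missing processorId, neither file nor rawText, a file object without fileUrl or fileBase64, and an out-of-range priority. It is a convenience pre-check under those assumptions, not a replacement for the API's own validation; unsupported, corrupted, or undownloadable files can only be detected server-side.

```python
from collections.abc import Mapping
from typing import Any


def validate_run_request(body: Mapping[str, Any]) -> list[str]:
    """Return a list of problems that would likely be rejected as invalid requests.

    Field names (processorId, file, fileUrl, fileBase64, rawText, priority)
    are taken from this page. An empty list means no problems were found.
    """
    problems: list[str] = []
    if not body.get("processorId"):
        problems.append("processorId is required")

    file = body.get("file")
    raw_text = body.get("rawText")
    if file is None and raw_text is None:
        problems.append("either file or rawText must be provided")
    if file is not None and not (
        isinstance(file, Mapping) and (file.get("fileUrl") or file.get("fileBase64"))
    ):
        problems.append("file must contain fileUrl or fileBase64")

    if "priority" in body:
        priority = body["priority"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(priority, bool) or not isinstance(priority, int) or not 1 <= priority <= 100:
            problems.append("priority must be an integer between 1 and 100")
    return problems
```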
Body
The ID of the processor that will process the input. You can find this ID by viewing the processor on the Extend platform.
A file object containing either a URL or base64-encoded content; it must contain either fileUrl or fileBase64. Presigned URLs are recommended for most production use cases. See the supported file types documentation for the list of accepted formats.
A raw string to be processed. Use it in place of file when passing raw text data streams. Either file or rawText must be provided.
An optional version of the processor to use. When not supplied, the most recent published version of the processor is used. Special values include:
- "latest": the most recent published version (the default). If there are no published versions, the draft version is used.
- "draft": the draft version.
- Specific version numbers corresponding to versions your team has published, e.g. "1.0" or "2.2".
An optional value used to determine the relative order of processor runs when rate limiting is in effect. The priority must be an integer between 1 and 100 inclusive. Lower values are prioritized before higher values. The default priority is 50.
An optional object you can pass to identify the processor run in your own systems. It is returned in the response and in webhook payloads.
An optional configuration object that overrides the processor’s default configuration. Its structure depends on the processor type; see the processor configuration documentation for details.
The override applies only to this run and is not saved to the processor version.
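The body parameters above can be assembled into a request as in the following Python sketch. processorId, file, fileUrl, fileBase64, rawText and priority are named on this page; the keys version, metadata and config are assumed names for the remaining optional parameters. RUN_PROCESSOR_URL and the Bearer Authorization header are hypothetical placeholders to replace with the values from the Extend API reference. The helper requires exactly one input source, which is stricter than what the API states, and it omits unset optional fields so that the documented defaults apply.

```python
import base64
import json
import urllib.request
from typing import Any, Optional

# Hypothetical placeholder: substitute the real Run Processor endpoint URL.
RUN_PROCESSOR_URL = "https://api.example.com/processor_runs"


def build_run_body(
    processor_id: str,
    *,
    file_url: Optional[str] = None,
    file_bytes: Optional[bytes] = None,
    raw_text: Optional[str] = None,
    version: Optional[str] = None,   # assumed key name "version"
    priority: Optional[int] = None,
    metadata: Optional[dict] = None,  # assumed key name "metadata"
    config: Optional[dict] = None,    # assumed key name "config"
) -> dict[str, Any]:
    sources = [s for s in (file_url, file_bytes, raw_text) if s is not None]
    if len(sources) != 1:
        raise ValueError("pass exactly one of file_url, file_bytes, raw_text")

    body: dict[str, Any] = {"processorId": processor_id}
    if file_url is not None:
        body["file"] = {"fileUrl": file_url}
    elif file_bytes is not None:
        body["file"] = {"fileBase64": base64.b64encode(file_bytes).decode("ascii")}
    else:
        body["rawText"] = raw_text

    # Send optional fields only when set, so server-side defaults apply
    # (latest published version, priority 50, the processor's saved config).
    for key, value in (("version", version), ("priority", priority),
                       ("metadata", metadata), ("config", config)):
        if value is not None:
            body[key] = value
    return body


def build_request(body: dict[str, Any], api_token: str) -> urllib.request.Request:
    # Assumption: token sent as a Bearer Authorization header.
    return urllib.request.Request(
        RUN_PROCESSOR_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

The resulting Request object can be sent with urllib.request.urlopen once the placeholder URL and authentication scheme have been confirmed.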
Response
A true or false value indicating whether the processor run was created successfully.
Details about the created processor run.
See the ProcessorRun object for more details.
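A response can be unpacked as in this Python sketch. It assumes the success flag and the run details are returned under the keys success and processorRun; those key names are not given on this page, so confirm them against the ProcessorRun object reference before relying on them.

```python
import json
from typing import Any


def parse_run_response(raw: str) -> dict[str, Any]:
    """Return the processor-run object from a Run Processor response body.

    Assumed keys: "success" (the true/false flag) and "processorRun"
    (the created run's details).
    """
    payload = json.loads(raw)
    if not payload.get("success"):
        raise RuntimeError(f"processor run was not created: {payload!r}")
    # Per this page, a new run reports PROCESSING until it is complete,
    # so the caller still needs a webhook or a later lookup for results.
    return payload["processorRun"]
```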