In general, the strongly recommended way to integrate with Extend in production is via workflows, using the Run Workflow endpoint. This is due to several factors:

  • file parsing and pre-processing are automatically reused across processors, which keeps the integration simpler and reduces cost, since many use cases run several processors on the same document.
  • workflows provide dedicated human-in-the-loop document review when needed.
  • workflows allow you to model and manage your pipeline with a single endpoint and corresponding UI for modeling and monitoring.

However, there are a number of legitimate use cases and systems where it might be easier to model the pipeline via code and run processors directly. This endpoint is provided for this purpose.

Similar to workflow runs, processor runs are asynchronous and will return a status of PROCESSING until the run is complete. You can configure webhooks to receive notifications when a processor run completes or fails.
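Webhooks are the notification mechanism described above. Where polling is used instead, the asynchronous behaviour can be sketched as a loop that waits while the status is PROCESSING. This is a minimal sketch: `wait_for_run` and the caller-supplied `fetch_status` callable are hypothetical names, and how the status is actually fetched is left to the caller because it is not covered in this section.

```python
import time
from typing import Callable


def wait_for_run(
    fetch_status: Callable[[], str],
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run leaves the PROCESSING status.

    `fetch_status` is a hypothetical callable supplied by the caller that
    returns the current status string of one processor run.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if status != "PROCESSING":
            # Any status other than PROCESSING means the run has finished.
            return status
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("processor run is still PROCESSING")
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.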

Common errors

400 Bad Request

Returned when:

  • Required fields are missing (e.g., processorId)
  • Neither file nor rawText is provided
  • The file is invalid:
      • The provided fileUrl is invalid
      • The provided fileBase64 is invalid
      • It’s an unsupported file type
      • The file is corrupted or otherwise cannot be downloaded
  • The priority value is outside the allowed range (must be between 1 and 100)

404 Not Found

Returned when:

  • The specified processor ID doesn’t exist
  • The specified processor version doesn’t exist

401 Unauthorized

Returned when:

  • The API token is missing
  • The API token is invalid

403 Forbidden

Returned when:

  • The authenticated workspace doesn’t have permission to access the specified processor
  • The API token doesn’t have sufficient permissions
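For client-side logging, the status codes above can be kept in a small lookup table. The following is a sketch only: the causes are transcribed from the list above, and the helper name `explain_status` is hypothetical rather than part of any Extend SDK.

```python
# Causes transcribed from the "Common errors" list; `explain_status` is a
# hypothetical helper name, not an Extend API.
COMMON_ERRORS: dict[int, list[str]] = {
    400: [
        "Required fields are missing (e.g., processorId)",
        "Neither file nor rawText is provided",
        "The file is invalid, unsupported, corrupted, or cannot be downloaded",
        "The priority value is outside the allowed range (1 to 100)",
    ],
    401: [
        "The API token is missing",
        "The API token is invalid",
    ],
    403: [
        "The workspace doesn't have permission to access the processor",
        "The API token doesn't have sufficient permissions",
    ],
    404: [
        "The specified processor ID doesn't exist",
        "The specified processor version doesn't exist",
    ],
}


def explain_status(status_code: int) -> list[str]:
    """Return the documented causes for a status code, or an empty list."""
    return COMMON_ERRORS.get(status_code, [])
```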

Body

processorId
string
required

The ID of the processor that will process the input. You can find this ID by viewing the processor on the Extend platform.

file
object

A file object containing either a URL or base64 encoded content. Must contain either fileUrl or fileBase64. Presigned URLs are recommended for most production use cases. Supported file types can be found here.

fileName
string
required

The name of the file.

fileUrl
string

A presigned URL for the file. The file is downloaded immediately, but we recommend an expiration time of 5 to 15 minutes.

fileBase64
string

Base64 encoded content of the file. Can be used instead of fileUrl.

rawText
string

A raw string to be processed. Can be used in place of file when passing raw text data streams. Either file or rawText must be provided.
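The input rules above (processorId is required, either file or rawText must be provided, and a file needs fileName plus fileUrl or fileBase64) can be checked before a request is sent. This is a minimal sketch under those rules: the function names `file_from_bytes` and `build_run_processor_body` are hypothetical, and only the request body is built because the endpoint URL and authentication are not covered here.

```python
import base64
from typing import Any, Optional


def file_from_bytes(file_name: str, content: bytes) -> dict[str, str]:
    """Build a `file` object that carries base64 encoded content."""
    encoded = base64.b64encode(content).decode("ascii")
    return {"fileName": file_name, "fileBase64": encoded}


def build_run_processor_body(
    processor_id: str,
    file: Optional[dict[str, str]] = None,
    raw_text: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a request body, mirroring the documented 400 conditions."""
    if not processor_id:
        raise ValueError("processorId is required")
    if file is None and raw_text is None:
        raise ValueError("either file or rawText must be provided")

    body: dict[str, Any] = {"processorId": processor_id}
    if file is not None:
        if not file.get("fileName"):
            raise ValueError("file.fileName is required")
        if not (file.get("fileUrl") or file.get("fileBase64")):
            raise ValueError("file must contain fileUrl or fileBase64")
        body["file"] = dict(file)  # copy so the caller's dict is not shared
    if raw_text is not None:
        body["rawText"] = raw_text
    return body
```

For example, `build_run_processor_body("my-processor-id", raw_text="some text")` yields a body with only processorId and rawText, where the ID shown is a placeholder.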

version
string
default:
"latest"

An optional version of the processor to use. When not supplied, the most recent published version of the processor will be used. Special values include:

  • “latest” for the most recent published version (the default). If there are no published versions, the draft version will be used.
  • “draft” for the draft version.
  • Specific version numbers corresponding to versions your team has published, e.g. “1.0”, “2.2”, etc.
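The selection rules for version can be modelled in a few lines. This is an illustration of the described behaviour, not Extend's implementation: `resolve_version` is a hypothetical name, and the `published` list is assumed to be ordered from oldest to newest.

```python
from typing import Optional


def resolve_version(requested: Optional[str], published: list[str]) -> str:
    """Model which processor version a run would use.

    `published` is assumed to be ordered oldest to newest.
    """
    if requested is None or requested == "latest":
        # Most recent published version, falling back to the draft.
        return published[-1] if published else "draft"
    if requested == "draft":
        return "draft"
    if requested in published:
        return requested
    # The API documents this case as a 404 Not Found.
    raise LookupError(f"processor version {requested!r} does not exist")
```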

priority
number
default:
"50"

An optional value used to determine the relative order of processor runs when rate limiting is in effect. The priority value must be an integer between 1 and 100 inclusive. Lower values are prioritized ahead of higher values. The default priority value is 50.
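The priority rules can be sketched as a validator plus an ordering function. Both names (`validate_priority`, `order_runs`) are hypothetical. The ordering only models "lower values first"; keeping submission order for equal priorities is an assumption of this sketch, not documented behaviour.

```python
DEFAULT_PRIORITY = 50


def validate_priority(priority: int = DEFAULT_PRIORITY) -> int:
    """Return the priority if it is an integer from 1 to 100 inclusive."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise ValueError("priority must be an integer")
    if not 1 <= priority <= 100:
        raise ValueError("priority must be between 1 and 100 inclusive")
    return priority


def order_runs(runs: list[dict]) -> list[dict]:
    """Sort runs so that lower priority values come first.

    sorted() is stable, so runs with equal priority keep their input
    order (an assumption of this sketch).
    """
    return sorted(runs, key=lambda run: run.get("priority", DEFAULT_PRIORITY))
```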

metadata
object

An optional object that can be passed in to identify the processor run in your systems. It will be returned in the response and webhooks.

config
object

An optional configuration object that can override the processor’s default configuration. The structure depends on the processor type. See the processor configuration documentation for more details.

This config applies only to this run; it is not saved to the processor version.

Response

success
boolean

A true or false value indicating whether the processor run was created successfully.

processorRun
object

Details about the created processor run.

See the ProcessorRun object for more details.
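A response with the two fields above might be handled as follows. This is a sketch: `parse_run_response` is a hypothetical helper, and the fields inside processorRun are defined by the ProcessorRun object rather than by this section.

```python
import json


def parse_run_response(raw_body: str) -> dict:
    """Return the processorRun object from a response body.

    Raises if `success` is not true or `processorRun` is missing.
    """
    payload = json.loads(raw_body)
    if payload.get("success") is not True:
        raise RuntimeError("processor run was not created successfully")
    run = payload.get("processorRun")
    if not isinstance(run, dict):
        raise ValueError("response has no processorRun object")
    return run
```

Because runs are asynchronous, a freshly created run is expected to report a status of PROCESSING; the `status` key used for that in a client is an assumption to verify against the ProcessorRun object.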