Run Processor

curl --location --request POST 'https://api-prod.extend.app/processor_runs' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <API_TOKEN>' \
--data '{
    "processorId": "dp_1234",
    "file": {
        "fileName": "example.pdf",
        "fileUrl": "https://test.s3.amazonaws.com/example.pdf"
    },
    "version": "1.0",
    "priority": 50,
    "metadata": {
        "internal_id": "id_1234"
    }
}'

{
  "success": true,
  "processorRun": {
    "object": "document_processor_run",
    "id": "dpr_1234",
    "output": null, // Will be null until the run is processed
    "processorId": "dp_1234",
    "processorVersionId": "dpv_91011",
    "processorName": "Invoice Extractor",
    "status": "PROCESSING",
    "metadata": {
      "internal_id": "id_1234"
    },
    "reviewed": false,
    "edited": false,
    "edits": null,
    "type": "EXTRACT",
    "config": {
      "schema": {
        "type": "object",
        "properties": {
          "invoice_number": {
            "type": ["string", "null"],
            "description": "The unique identifier for this invoice"
          },
          "amount": {
            "type": "object",
            "properties": {
              "value": {
                "type": ["number", "null"]
              },
              "iso_4217_currency_code": {
                "type": ["string", "null"]
              }
            },
            "required": ["value", "iso_4217_currency_code"],
            "additionalProperties": false
          }
        },
        "required": ["invoice_number", "amount"],
        "additionalProperties": false
      }
    },
    "files": [
      {
        "name": "example.pdf"
      }
    ],
    "url": "https://dashboard.extend.app/runs/dpr_1234"
  }
}

In general, the recommended way to integrate with Extend in production is via workflows, using the Run Workflow endpoint, for several reasons:

- File parsing and pre-processing are automatically reused across multiple processors, which simplifies integration and saves cost, since many use cases run multiple processors on the same document.
- Workflows provide dedicated human-in-the-loop document review when needed.
- Workflows let you model and manage your pipeline with a single endpoint and a corresponding UI for modeling and monitoring.

However, there are legitimate use cases and systems where it is easier to model the pipeline in code and run processors directly; this endpoint is provided for that purpose. Like workflow runs, processor runs are asynchronous and return a status of PROCESSING until the run is complete. You can configure webhooks to receive notifications when a processor run completes or fails.
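Because a run stays in PROCESSING until it finishes, a client that does not use webhooks has to check back on the run. The following is a minimal polling sketch, not an official client: `fetch_run` is a hypothetical callable that you supply (for example, a wrapper around whatever endpoint returns a processor run), and the interval and timeout defaults are arbitrary assumptions.

```python
import time
from typing import Any, Callable, Dict


def wait_for_run(
    fetch_run: Callable[[str], Dict[str, Any]],  # hypothetical: returns the processorRun object
    run_id: str,
    interval_s: float = 2.0,    # assumed default, tune for your workload
    timeout_s: float = 300.0,   # assumed default, tune for your workload
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the run's status is no longer PROCESSING, or raise on timeout."""
    waited = 0.0
    while True:
        run = fetch_run(run_id)
        # Any status other than PROCESSING means the run has left the queue.
        if run.get("status") != "PROCESSING":
            return run
        if waited >= timeout_s:
            raise TimeoutError(f"run {run_id} still PROCESSING after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays. Webhooks remain the better fit when many runs are in flight, since they avoid repeated requests.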

Body

processorId

string

required

The ID of the processor that will process the input. You can find this ID by viewing the processor on the Extend platform.

file

object

A file object containing either a URL or base64 encoded content. Must contain either fileUrl or fileBase64. Presigned URLs are recommended for most production use cases. Supported file types can be found here.

fileName

string

required

The name of the file.

fileUrl

string

A presigned URL for the file. The file is downloaded immediately, but an expiration time of 5 to 15 minutes is recommended.

fileBase64

string

deprecated

Base64 encoded content of the file. Can be used instead of fileUrl in development environments. DEPRECATED: This field is deprecated and will be removed in a future release. Use the /upload endpoint instead.

fileId

string

If you already have an Extend file ID (for instance, from running a parser or from a previous file creation), you can run a processor using that file ID, and any parsed data will be reused.

rawText

string

A raw string to be processed. Can be used in place of file when passing raw text data streams. Either file or rawText must be provided.
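The input rules above (processorId is required, and either file or rawText must be provided) can be checked client-side before sending a request. This is a sketch under stated assumptions: `build_run_body` is a hypothetical helper, and accepting any one of fileUrl, fileBase64, or fileId as the file source is one reading of the field descriptions. The source does not say what happens when both file and rawText are sent, so the sketch does not forbid it.

```python
from typing import Any, Dict, Optional

# Keys that can identify the file content, per the field descriptions above.
FILE_SOURCES = ("fileUrl", "fileBase64", "fileId")


def build_run_body(
    processor_id: str,
    file: Optional[Dict[str, str]] = None,
    raw_text: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a request body, raising ValueError for inputs the API would reject with 400."""
    if not processor_id:
        raise ValueError("processorId is required")
    if file is None and raw_text is None:
        raise ValueError("either file or rawText must be provided")

    body: Dict[str, Any] = {"processorId": processor_id}
    if file is not None:
        if not file.get("fileName"):
            raise ValueError("file.fileName is required")
        if not any(file.get(key) for key in FILE_SOURCES):
            raise ValueError("file needs one of: " + ", ".join(FILE_SOURCES))
        body["file"] = dict(file)  # copy so the caller's dict is not shared
    if raw_text is not None:
        body["rawText"] = raw_text
    return body
```

For example, `build_run_body("dp_1234", raw_text="Invoice 42")` yields a body with processorId and rawText only, while omitting both inputs raises before any network call is made.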

version

string

default:"latest"

An optional version of the processor to use. When not supplied, the most recent published version of the processor will be used. Special values include:

- "latest" for the most recent published version (the default). If there are no published versions, the draft version will be used.
- "draft" for the draft version.
- Specific version numbers corresponding to versions your team has published, e.g. "1.0", "2.2", etc.
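The selection rules for version can be written out as a small function. This is only an illustration of the documented behavior, not Extend's implementation: `resolve_version` is a hypothetical name, and the `published` list is assumed to be ordered from oldest to newest.

```python
from typing import Optional, Sequence


def resolve_version(requested: Optional[str], published: Sequence[str]) -> str:
    """Return the version a run would use, given the published versions (oldest first)."""
    if requested is None or requested == "latest":
        # Most recent published version, falling back to the draft if none exist.
        return published[-1] if published else "draft"
    if requested == "draft":
        return "draft"
    if requested in published:
        return requested
    # Mirrors the 404 case: the specified processor version doesn't exist.
    raise LookupError(f"processor version {requested!r} does not exist")
```

Pinning a specific version such as "1.0" in production keeps results stable when a newer version is published, whereas "latest" picks up new versions automatically.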

priority

number

default:"50"

An optional value used to determine the relative order of processor runs when rate limiting is in effect. The priority must be an integer between 1 and 100 inclusive. Lower values are prioritized before higher values. The default priority value is 50.
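To make the ordering concrete, the sketch below validates a priority and sorts queued runs so that lower values come first. Both function names are hypothetical, and breaking ties by submission order is an assumption; the description above only specifies that lower values are prioritized.

```python
from typing import Iterable, List, Tuple


def validate_priority(priority: object = 50) -> int:
    """Return the priority if it is an integer in [1, 100], else raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise ValueError("priority must be an integer")
    if not 1 <= priority <= 100:
        raise ValueError("priority must be between 1 and 100 inclusive")
    return priority


def processing_order(runs: Iterable[Tuple[str, int]]) -> List[str]:
    """Order (run_id, priority) pairs, given in submission order, lowest priority value first."""
    # The index is the tie-breaker: equal priorities keep submission order (assumption).
    keyed = [(validate_priority(p), i, run_id) for i, (run_id, p) in enumerate(runs)]
    return [run_id for _, _, run_id in sorted(keyed)]
```

With this ordering, a run submitted with priority 1 is handled before earlier runs that used the default of 50.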

metadata

object

An optional object that can be passed in to identify the processor run in your systems. It will be returned in the response and webhooks.

config

object

An optional configuration object that can override the processor’s default configuration. The structure depends on the processor type. See the processor configuration documentation for more details. This config is used for this run only and is not saved to the processor version.
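Putting the body fields together, the same request as the curl example can be constructed with Python's standard library. This is a sketch rather than an official SDK: `build_run_request` is a hypothetical helper, and the request is only built here, not sent, so it can be inspected or tested offline.

```python
import json
import urllib.request

API_URL = "https://api-prod.extend.app/processor_runs"


def build_run_request(api_token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a processor run."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


request = build_run_request(
    "<API_TOKEN>",
    {
        "processorId": "dp_1234",
        "file": {
            "fileName": "example.pdf",
            "fileUrl": "https://test.s3.amazonaws.com/example.pdf",
        },
        "version": "1.0",
        "priority": 50,
        "metadata": {"internal_id": "id_1234"},
    },
)
# To send it: urllib.request.urlopen(request) returns the HTTP response.
```

The config override is left out of the example body because its structure depends on the processor type.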

Response

success

boolean

A true or false value indicating whether the processor run was created successfully.

processorRun

object

Details about the created processor run. See the ProcessorRun object for more details.
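A client will typically check success and then pull the run out of processorRun. The sketch below shows one way to do that; `parse_run_response` is a hypothetical helper, and it assumes the response body is plain JSON (the `//` comment in the example response is taken to be an annotation, not part of the actual payload).

```python
import json


def parse_run_response(raw: str) -> dict:
    """Return the processorRun object from a response body, or raise if creation failed."""
    payload = json.loads(raw)
    if payload.get("success") is not True:
        raise RuntimeError("processor run was not created")
    run = payload.get("processorRun")
    if not isinstance(run, dict):
        raise RuntimeError("response has no processorRun object")
    return run
```

Right after creation, the returned run is expected to have a status of PROCESSING and an output of null (None in Python); storing `run["id"]` lets you look the run up again later.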

Common errors

400 Bad Request

Returned when:

- Required fields are missing (e.g., processorId)
- Neither file nor rawText is provided
- Invalid files:
  - The provided fileUrl is invalid
  - The provided fileBase64 is invalid
  - It’s an unsupported file type
  - The file is corrupted or otherwise cannot be downloaded
- The priority value is outside the allowed range (must be between 1 and 100)

404 Not Found

Returned when:

- The specified processor ID doesn’t exist
- The specified processor version doesn’t exist

401 Unauthorized

Returned when:

- The API token is missing
- The API token is invalid

403 Forbidden

Returned when:

- The authenticated workspace doesn’t have permission to access the specified processor
- The API token doesn’t have sufficient permissions
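The status codes above can be turned into actionable messages in client code. This is a minimal sketch: the `COMMON_ERRORS` table and `explain_error` are hypothetical names, and the hints simply paraphrase the cases listed for each code.

```python
from typing import Dict, Tuple

# status code -> (reason phrase, what to check), paraphrasing the cases listed above
COMMON_ERRORS: Dict[int, Tuple[str, str]] = {
    400: ("Bad Request", "check required fields, file or rawText, the file itself, and priority (1 to 100)"),
    401: ("Unauthorized", "check that the API token is present and valid"),
    403: ("Forbidden", "check workspace access to the processor and the token's permissions"),
    404: ("Not Found", "check the processor ID and the requested version"),
}


def explain_error(status: int) -> str:
    """Return a short, human-readable hint for a failed request."""
    if status in COMMON_ERRORS:
        reason, hint = COMMON_ERRORS[status]
        return f"{status} {reason}: {hint}"
    return f"{status}: not one of the common errors listed for this endpoint"
```

All four codes point at a problem with the request or credentials, so resending the same request unchanged is unlikely to help.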
