Create Processor

curl --request POST \
  --url https://api-prod.extend.app/v1/processors \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "name": "<string>",
  "type": "<string>",
  "cloneProcessorId": "<string>",
  "config": {
    "type": "<string>",
    "baseProcessor": "<string>",
    "baseVersion": "<string>",
    "schema": {},
    "fields": [
      {
        "id": "<string>",
        "name": "<string>",
        "type": "<string>",
        "description": "<string>",
        "schema": [
          {}
        ],
        "enum": [
          {
            "value": "<string>",
            "description": "<string>"
          }
        ]
      }
    ],
    "extractionRules": "<string>",
    "advancedOptions": {
      "fixedPageLimit": 123,
      "splitMethod": "<string>",
      "splitIdentifierRules": "<string>",
      "splitExcelDocumentsBySheetEnabled": true
    },
    "classifications": [
      {
        "id": "<string>",
        "type": "<string>",
        "description": "<string>"
      }
    ],
    "classificationRules": "<string>",
    "splitClassifications": [
      {
        "id": "<string>",
        "type": "<string>",
        "description": "<string>"
      }
    ],
    "splitRules": "<string>"
  }
}'

{
  "success": true,
  "processor": {
    "object": "document_processor",
    "id": "processor_1234",
    "name": "New Invoice Processor",
    "type": "EXTRACT",
    "createdAt": "2024-03-01T12:00:00Z",
    "updatedAt": "2024-03-01T12:00:00Z"
  }
}

This endpoint creates a new processor or clones an existing one. Processors are typically created and configured in the Extend Studio, but this endpoint lets you create them programmatically, for example to keep processor IDs in sync across systems.
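
As a rough sketch of what the curl example does, the following Python builds the same POST request with the standard library's `urllib.request`, without sending it. The function name and the token placeholder are illustrative assumptions, not part of any Extend SDK.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
CREATE_PROCESSOR_URL = "https://api-prod.extend.app/v1/processors"


def build_create_processor_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        CREATE_PROCESSOR_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical token and a minimal payload (name and type are the required fields).
request = build_create_processor_request(
    "YOUR_API_TOKEN",
    {"name": "New Invoice Processor", "type": "EXTRACT"},
)
```

Passing the returned request to `urllib.request.urlopen()` sends it; by default a 4xx or 5xx status raises `urllib.error.HTTPError`.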

Body

name

string

required

The name of the new processor.

type

string

required

The type of processor to create (for example, "EXTRACT").

cloneProcessorId

string

The ID of an existing processor to clone. If provided, the new processor is created with a copy of that processor's config.
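
A minimal clone request body, sketched in Python: `name` and `type` are still required, and `cloneProcessorId` names the source processor. The helper name is hypothetical, and `processor_1234` is simply the example ID from the sample response. If the ID does not exist, the API returns 404 Not Found (see Possible Common Errors).

```python
def build_clone_payload(name: str, processor_type: str, clone_processor_id: str) -> dict:
    """Hypothetical helper: request body for cloning an existing processor's config.

    name and type are required even when cloning; cloneProcessorId points at
    the processor whose config is copied.
    """
    if not clone_processor_id:
        raise ValueError("cloneProcessorId must be a non-empty processor ID")
    return {
        "name": name,
        "type": processor_type,
        "cloneProcessorId": clone_processor_id,
    }


payload = build_clone_payload("Invoice Processor (copy)", "EXTRACT", "processor_1234")
```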

config

object

An optional config to apply when creating the processor. Any fields not supplied fall back to their default values.

type

string

required

Must be "EXTRACT" for extraction processors.

baseProcessor

string

The base processor to use. For extraction processors, this is either "extraction_performance" or "extraction_light". See the base processor documentation for more details.

baseVersion

string

The version of the base processor to use (e.g. "4.0.0"). If this is provided, baseProcessor must be provided as well. See the processor changelog for available versions.
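
The baseProcessor and baseVersion rules above can be checked client-side before a request is sent. This sketch only knows the two extraction base processors named here and does not verify the version against the processor changelog; the function name is hypothetical.

```python
from typing import Optional

# The two extraction base processors named in this reference.
EXTRACTION_BASE_PROCESSORS = {"extraction_performance", "extraction_light"}


def build_base_processor_config(
    base_processor: Optional[str] = None,
    base_version: Optional[str] = None,
) -> dict:
    """Return the base-processor portion of an EXTRACT config.

    Enforces the documented rule that baseVersion requires baseProcessor.
    The version string itself is not checked against the changelog.
    """
    if base_version is not None and base_processor is None:
        raise ValueError("baseVersion requires baseProcessor to be set as well")
    config: dict = {"type": "EXTRACT"}
    if base_processor is not None:
        if base_processor not in EXTRACTION_BASE_PROCESSORS:
            raise ValueError(f"unknown extraction base processor: {base_processor!r}")
        config["baseProcessor"] = base_processor
    if base_version is not None:
        config["baseVersion"] = base_version
    return config
```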

schema

object

The schema that defines the structure of the data to extract from documents. One of schema or fields must be provided; we recommend schema, because fields is deprecated. See the extraction processor schema documentation for more details.
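
A sketch of assembling the extraction part of config under the rule that one of schema or fields must be provided. It treats schema as an opaque object (its structure is defined in the schema documentation) and emits a DeprecationWarning when fields is used; the helper name and the warning are illustrative choices, not API behavior.

```python
import warnings
from typing import Optional


def build_extraction_config(
    schema: Optional[dict] = None,
    fields: Optional[list] = None,
    extraction_rules: Optional[str] = None,
) -> dict:
    """Hypothetical helper: assemble an EXTRACT config with schema or fields."""
    if schema is None and fields is None:
        raise ValueError("provide either schema or fields")
    config: dict = {"type": "EXTRACT"}
    if schema is not None:
        # Preferred option; passed through unchanged.
        config["schema"] = schema
    if fields is not None:
        warnings.warn("fields is deprecated; prefer schema", DeprecationWarning, stacklevel=2)
        config["fields"] = fields
    if extraction_rules:
        config["extractionRules"] = extraction_rules
    return config
```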

fields

array

deprecated

A list of field definitions describing the data to extract. Deprecated; use schema instead.

id

string

required

Unique identifier for the field.

name

string

required

Human-readable name for the field.

type

string

required

Type of the field. Supported values:

string: Text values
number: Numeric values
currency: Monetary values
boolean: True/false values
date: Date values
array: Lists of values (requires schema)
enum: Values from a predefined list (requires enum)
object: Nested structure (requires schema)
signature: Signature information

description

string

required

Detailed description of the field, including expected content and format.

schema

array

Required when type is “array” or “object”. Contains nested field definitions.

enum

array

Required when type is “enum”. List of allowed values.

value

string

required

The enum value.

description

string

required

Description of the enum value.
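
For callers still sending the deprecated fields array, the requirements listed above can be checked locally. This is a sketch under assumptions: it treats a missing or empty schema or enum list as an error and validates nested definitions recursively, which the reference implies but does not spell out. The function name is hypothetical.

```python
# Field types listed for the deprecated fields array.
FIELD_TYPES = {"string", "number", "currency", "boolean", "date",
               "array", "enum", "object", "signature"}
REQUIRED_KEYS = ("id", "name", "type", "description")


def validate_field(field: dict) -> list:
    """Return a list of problems with one legacy field definition (empty if valid)."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in field]
    ftype = field.get("type")
    if ftype is not None and ftype not in FIELD_TYPES:
        problems.append(f"unsupported type {ftype!r}")
    if ftype in ("array", "object"):
        # Assumption: a missing or empty schema counts as an error.
        if not field.get("schema"):
            problems.append(f"type {ftype!r} requires a non-empty schema")
        else:
            # Nested entries are themselves field definitions.
            for child in field["schema"]:
                problems.extend(f"{field.get('id')}: {p}" for p in validate_field(child))
    if ftype == "enum":
        options = field.get("enum") or []
        if not options:
            problems.append("type 'enum' requires a non-empty enum list")
        for option in options:
            if "value" not in option or "description" not in option:
                problems.append("each enum option needs value and description")
    return problems
```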

extractionRules

string

Custom natural-language rules that guide the extraction process.

advancedOptions

object

Advanced configuration options.

fixedPageLimit

number

Limit processing to a specific number of pages from the beginning of the document.

documentKind

string

Provide a hint about the document type (e.g. “invoice”, “receipt”, etc.).

keyDefinitions

string

Define specific key terms or concepts relevant to the document type.

modelReasoningInsightsEnabled

boolean

Enable model reasoning insights in the extraction results.

advancedMultimodalEnabled

boolean

Enable advanced multimodal processing for better handling of visual elements.

citationsEnabled

boolean

Enable citation information for extracted fields.

advancedFigureParsingEnabled

boolean

Enable advanced parsing of figures and diagrams in the document.
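
A hedged sketch of building advancedOptions that includes only the options you set. The positive-page-count check on fixedPageLimit and the rejection of unknown flag names are client-side assumptions, not documented API validation; the function name is hypothetical.

```python
from typing import Optional

# Boolean switches documented under advancedOptions.
BOOLEAN_OPTIONS = (
    "modelReasoningInsightsEnabled",
    "advancedMultimodalEnabled",
    "citationsEnabled",
    "advancedFigureParsingEnabled",
)


def build_advanced_options(
    fixed_page_limit: Optional[int] = None,
    document_kind: Optional[str] = None,
    key_definitions: Optional[str] = None,
    **flags: bool,
) -> dict:
    """Build an advancedOptions object, omitting anything not supplied."""
    options: dict = {}
    if fixed_page_limit is not None:
        # Assumption: a page limit below 1 is meaningless.
        if fixed_page_limit < 1:
            raise ValueError("fixedPageLimit must be a positive page count")
        options["fixedPageLimit"] = fixed_page_limit
    if document_kind is not None:
        options["documentKind"] = document_kind
    if key_definitions is not None:
        options["keyDefinitions"] = key_definitions
    for name, value in flags.items():
        if name not in BOOLEAN_OPTIONS:
            raise ValueError(f"unknown advanced option: {name}")
        options[name] = bool(value)
    return options
```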

chunkingOptions

object

Options for controlling document chunking.

chunkingStrategy

string

Strategy for chunking the document. Supported values:

standard: Default chunking strategy
semantic: Content-aware chunking based on document structure

customSemanticChunkingRules

string

Custom natural-language rules for semantic chunking.

pageChunkSize

number

Number of pages per chunk.

chunkSelectionStrategy

string

Strategy for selecting chunks. Supported values:

intelligent: AI-based selection
confidence: Select based on confidence score
take_first: Always use first chunk
take_last: Always use last chunk
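
The allowed values for chunkingStrategy and chunkSelectionStrategy can be enforced before the request is sent. This sketch assumes chunkingOptions is nested inside advancedOptions, as the property listing suggests, and treats customSemanticChunkingRules as valid only with the semantic strategy; both are assumptions, and the function name is hypothetical.

```python
from typing import Optional

# Allowed values listed in the reference.
CHUNKING_STRATEGIES = {"standard", "semantic"}
CHUNK_SELECTION_STRATEGIES = {"intelligent", "confidence", "take_first", "take_last"}


def build_chunking_options(
    chunking_strategy: str = "standard",
    page_chunk_size: Optional[int] = None,
    chunk_selection_strategy: Optional[str] = None,
    custom_semantic_chunking_rules: Optional[str] = None,
) -> dict:
    """Build a chunkingOptions object, rejecting values not listed in the reference."""
    if chunking_strategy not in CHUNKING_STRATEGIES:
        raise ValueError(f"unsupported chunkingStrategy: {chunking_strategy!r}")
    options: dict = {"chunkingStrategy": chunking_strategy}
    if custom_semantic_chunking_rules is not None:
        # Assumption: custom rules only apply to semantic chunking.
        if chunking_strategy != "semantic":
            raise ValueError("customSemanticChunkingRules requires chunkingStrategy 'semantic'")
        options["customSemanticChunkingRules"] = custom_semantic_chunking_rules
    if page_chunk_size is not None:
        if page_chunk_size < 1:
            raise ValueError("pageChunkSize must be at least 1")
        options["pageChunkSize"] = page_chunk_size
    if chunk_selection_strategy is not None:
        if chunk_selection_strategy not in CHUNK_SELECTION_STRATEGIES:
            raise ValueError(f"unsupported chunkSelectionStrategy: {chunk_selection_strategy!r}")
        options["chunkSelectionStrategy"] = chunk_selection_strategy
    return options
```

Under the nesting assumption, the result would be placed at `config["advancedOptions"]["chunkingOptions"]`.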

Response

success

boolean

Indicates whether the processor was created successfully.

processor

DocumentProcessor

A DocumentProcessor object representing the newly created processor. See the DocumentProcessor object reference for more details.
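
A sketch of reading the success response in Python. The DocumentProcessor dataclass here is a hypothetical local mirror of only the fields shown in the example response, not the full object; timestamps are parsed as timezone-aware UTC datetimes.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DocumentProcessor:
    """Hypothetical local mirror of the fields in the example response."""

    id: str
    name: str
    processor_type: str
    created_at: datetime
    updated_at: datetime


def _parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_create_response(raw: str) -> DocumentProcessor:
    """Parse a create-processor response body into a DocumentProcessor."""
    body = json.loads(raw)
    if not body.get("success"):
        raise RuntimeError(body.get("error", "processor creation failed"))
    p = body["processor"]
    return DocumentProcessor(
        id=p["id"],
        name=p["name"],
        processor_type=p["type"],
        created_at=_parse_timestamp(p["createdAt"]),
        updated_at=_parse_timestamp(p["updatedAt"]),
    )
```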

Error Responses

success

boolean

Will be false if the request failed.

error

string

A description of the error that occurred.

Possible Common Errors

400 Bad Request: If the request body fails schema validation.
404 Not Found: If the processor to clone (cloneProcessorId) is not found.
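
One way to surface these errors, assuming failed responses carry the success and error fields described above. The exception class and the fallback messages are hypothetical; the server's own error string is preferred when present.

```python
import json


class CreateProcessorError(Exception):
    """Hypothetical exception raised when the create-processor call fails."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


# Fallback hints for the two common errors documented for this endpoint.
ERROR_HINTS = {
    400: "request body failed schema validation",
    404: "cloneProcessorId does not match an existing processor",
}


def raise_for_create_error(status: int, raw_body: str) -> None:
    """Raise CreateProcessorError unless the response is a 2xx with success=true."""
    try:
        body = json.loads(raw_body)
    except ValueError:  # also covers json.JSONDecodeError
        body = {}
    if not isinstance(body, dict):
        body = {}
    if 200 <= status < 300 and body.get("success", False):
        return
    message = body.get("error") or ERROR_HINTS.get(status, "unknown error")
    raise CreateProcessorError(status, message)
```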
