Run processors (extraction, classification, splitting, etc.) on a given document.
curl --location --request POST 'https://api-prod.extend.app/v1/processor_runs' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <API_TOKEN>' \
  --data '{
    "processorId": "dp_1234",
    "file": {
      "fileName": "example.pdf",
      "fileUrl": "https://test.s3.amazonaws.com/example.pdf"
    },
    "version": "1.0",
    "priority": 50,
    "metadata": {
      "internal_id": "id_1234"
    }
  }'
{
  "success": true,
  "processorRun": {
    "object": "document_processor_run",
    "id": "dpr_1234",
    "output": null, // Will be null until the run is processed
    "processorId": "dp_5678",
    "processorVersionId": "dpv_91011",
    "processorName": "Invoice Extractor",
    "status": "PROCESSING",
    "metadata": {
      "internal_id": "id_1234"
    },
    "reviewed": false,
    "edited": false,
    "edits": null,
    "type": "EXTRACT",
    "config": {
      "fields": [
        {
          "id": "total_amount",
          "name": "Total Amount",
          "type": "currency"
        }
      ]
    },
    "files": [
      {
        "name": "example.pdf"
      }
    ],
    "url": "https://dashboard.extend.app/runs/dpr_1234"
  }
}
In general, the recommended way to integrate with Extend in production is via workflows, using the Run Workflow endpoint.
There are several reasons for this:
- File parsing and pre-processing are automatically reused across processors. Many use cases run several processors on the same document, so this keeps the integration simpler and reduces cost.
- Workflows provide dedicated human-in-the-loop document review when needed.
- Workflows let you model and manage your pipeline through a single endpoint, with a corresponding UI for modeling and monitoring.
However, there are a number of legitimate use cases and systems where it might be easier to model the pipeline via code and run processors directly. This endpoint is provided for this purpose.
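For pipelines modeled in code, the curl request shown earlier can be sketched in Python with only the standard library. This is a minimal sketch rather than an official client: the function name `build_processor_run_request` is hypothetical, and the block only builds the request so it can be inspected without a network call.

```python
import json
import urllib.request

API_URL = "https://api-prod.extend.app/v1/processor_runs"


def build_processor_run_request(api_token, payload):
    """Build (but do not send) the POST request for a processor run."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )


# Same payload as the curl example.
payload = {
    "processorId": "dp_1234",
    "file": {
        "fileName": "example.pdf",
        "fileUrl": "https://test.s3.amazonaws.com/example.pdf",
    },
    "version": "1.0",
    "priority": 50,
    "metadata": {"internal_id": "id_1234"},
}
request = build_processor_run_request("<API_TOKEN>", payload)
```

To send it, pass `request` to `urllib.request.urlopen` and decode the JSON body of the response.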
Similar to workflow runs, processor runs are asynchronous and will return a status of PROCESSING until the run is complete.
You can configure webhooks to receive a notification when a processor run completes or fails.
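Where webhooks are not an option, the asynchronous behaviour can be handled by polling. The sketch below assumes a caller-supplied function `fetch_run` (hypothetical, not part of the API described here) that returns the current `processorRun` object for a run ID. The only status value taken from this page is `PROCESSING`; any other value is treated as terminal.

```python
import time


def wait_for_run(fetch_run, run_id, interval_seconds=2.0, max_attempts=30,
                 sleep=time.sleep):
    """Poll until the run's status is no longer PROCESSING.

    fetch_run is a hypothetical callable: run ID in, processorRun dict out.
    """
    for attempt in range(max_attempts):
        run = fetch_run(run_id)
        if run["status"] != "PROCESSING":
            return run
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(
        f"run {run_id} still PROCESSING after {max_attempts} attempts"
    )
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.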
A file object containing either a URL or base64-encoded content. It must contain
either fileUrl or fileBase64. Presigned URLs are recommended for most
production use cases. See the supported file types documentation for the
accepted formats.
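As an illustration of the two forms of the file object, the helper below builds it from either a URL or raw bytes. It is a sketch under assumptions: `build_file_object` is a hypothetical name, rejecting a call that supplies both sources is a client-side choice rather than documented behaviour, and pairing fileName with fileBase64 is assumed by analogy with the fileUrl example.

```python
import base64


def build_file_object(file_name=None, file_url=None, content=None):
    """Return a file object with either fileUrl or fileBase64 set."""
    # Assumption: exactly one source is given (URL or raw bytes).
    if (file_url is None) == (content is None):
        raise ValueError("provide exactly one of file_url or content")
    file_object = {}
    if file_name is not None:
        file_object["fileName"] = file_name
    if file_url is not None:
        file_object["fileUrl"] = file_url
    else:
        # JSON cannot carry raw bytes, so encode them as base64 text.
        file_object["fileBase64"] = base64.b64encode(content).decode("ascii")
    return file_object
```

For example, `build_file_object("example.pdf", file_url="https://test.s3.amazonaws.com/example.pdf")` reproduces the file object used in the curl request.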
If you already have an Extend file ID (for instance, from running a parser or from a previous file creation), you can
run a processor using that file ID, and any existing parsed data will be reused.
An optional version of the processor to use. When not supplied, the most
recent published version of the processor will be used. Special values
include:
- “latest” for the most recent published version (the default). If there are no published versions, the draft version will be used.
- “draft” for the draft version.
- Specific version numbers corresponding to versions your team has published, e.g. “1.0”, “2.2”, etc.
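The version rules above can be checked on the client before a request is sent. This is a minimal sketch: `normalize_version` is a hypothetical helper, and the `major.minor` pattern is an assumption inferred from the “1.0” and “2.2” examples, not a documented format.

```python
import re

# Assumption: published versions look like "1.0" or "2.2".
_VERSION_NUMBER = re.compile(r"\d+\.\d+")


def normalize_version(version=None):
    """Return the version string to send, defaulting to "latest"."""
    if version is None:
        return "latest"
    if version in ("latest", "draft") or _VERSION_NUMBER.fullmatch(version):
        return version
    raise ValueError(f"unrecognized processor version: {version!r}")
```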
An optional value used to determine the relative order of processor runs when
rate limiting is in effect. The priority must be an integer between 1 and
100, inclusive. Runs with lower values are processed before runs with higher
values. The default priority value is 50.
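To make the ordering rule concrete, the sketch below sorts a batch of pending runs the way the description implies: lowest priority value first, with 50 substituted when no priority was given. The function `processing_order` is hypothetical, and keeping submission order for ties is an assumption, since this page does not say how ties are broken.

```python
DEFAULT_PRIORITY = 50


def processing_order(runs):
    """Return run IDs ordered by priority value, lowest first."""
    # sorted() is stable, so runs with equal priority keep their input order.
    ordered = sorted(runs, key=lambda run: run.get("priority", DEFAULT_PRIORITY))
    return [run["id"] for run in ordered]


pending = [
    {"id": "run_a", "priority": 80},
    {"id": "run_b"},  # no priority given, so the default of 50 applies
    {"id": "run_c", "priority": 10},
]
order = processing_order(pending)  # ["run_c", "run_b", "run_a"]
```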
An option to override the config of the processor. Any fields that are not provided will use the processor’s existing configuration, based on the version you specify.
Your config will take one of three shapes depending on the processor type:
Returned when:
- Required fields are missing (e.g., processorId)
- Neither file nor rawText is provided
- The file is invalid:
  - The provided fileUrl is invalid
  - The provided fileBase64 is invalid
  - It’s an unsupported file type
  - The file is corrupted or otherwise cannot be downloaded
- The priority value is outside the allowed range (must be between 1 and 100)
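Some of the conditions above can be caught before the request leaves the client. The sketch below covers only the checks that need no network access (missing processorId, neither file nor rawText, priority out of range); the function name `find_request_problems` and its messages are hypothetical, and the server remains the authority on validity.

```python
def find_request_problems(payload):
    """Return a list of human-readable problems found in a request payload."""
    problems = []
    if not payload.get("processorId"):
        problems.append("processorId is required")
    if "file" not in payload and "rawText" not in payload:
        problems.append("either file or rawText must be provided")
    priority = payload.get("priority")
    if priority is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_integer = isinstance(priority, int) and not isinstance(priority, bool)
        if not is_integer or not 1 <= priority <= 100:
            problems.append("priority must be an integer between 1 and 100")
    return problems
```

An empty list means none of these local checks failed; file-related errors such as an unreachable URL or a corrupted file can only be detected by the API itself.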
Returned when:
- The authenticated workspace doesn’t have permission to access the specified processor
- The API token doesn’t have sufficient permissions