Processor output types
Understanding different output types for document processors.
Document processor outputs follow standardized formats based on the processor type. Understanding these formats is essential when working with evaluation sets, webhooks, and API responses.
Extraction output type (JSON Schema)
This section is relevant for processors using the JSON Schema config type. If you are using the Fields Array config type, please see the Extraction output type (Fields Array) documentation. If you aren’t sure which config type you are using, please see the Migrating to JSON Schema documentation.
The output structure for JSON Schema processors is composed of two properties: value and metadata.
The value object holds the data extracted from the document, structured to conform to the JSON Schema defined in the processor config.
The metadata object holds details such as confidence scores and citations for the extracted data. Its keys mirror the structure of the value object using a path-like notation (e.g., line_items[0].description), so you can pinpoint the metadata for any specific field, including fields nested within objects or arrays. For instance, if your data contains value.line_items[0].name, the metadata for that name field is found under the key 'line_items[0].name' in the metadata object.
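To make the path notation concrete, here is a minimal Python sketch that walks a value object and yields the path-like key for each leaf field: object properties become ".name" segments and array elements become "[index]" segments. The function name metadata_keys and the sample invoice data are hypothetical, and the sketch assumes metadata is keyed by leaf fields only; the actual metadata object may also contain entries for intermediate objects or arrays.

```python
from typing import Any, Iterator


def metadata_keys(value: Any, prefix: str = "") -> Iterator[str]:
    """Yield a path-like key (e.g. 'line_items[0].description') for each leaf field.

    Hypothetical helper: only leaf values produce keys in this sketch.
    """
    if isinstance(value, dict):
        for name, child in value.items():
            # Top-level properties have no leading dot.
            child_prefix = f"{prefix}.{name}" if prefix else name
            yield from metadata_keys(child, child_prefix)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from metadata_keys(child, f"{prefix}[{index}]")
    else:
        yield prefix


# Hypothetical extracted value for an invoice processor.
value = {
    "invoice_number": "INV-001",
    "line_items": [
        {"description": "Widget", "amount": 10.0},
        {"description": "Gadget", "amount": 15.5},
    ],
}

print(list(metadata_keys(value)))
# ['invoice_number', 'line_items[0].description', 'line_items[0].amount',
#  'line_items[1].description', 'line_items[1].amount']
```

Each key produced this way has the same form as the keys used in the metadata object, which is why a field's value and its metadata can be paired by path alone.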
Type definition
Accessing metadata
To access the metadata for a specific field, especially a nested one such as an item in an array, use a path-like key string. For example, the key for the metadata of the description of the first item in a line_items array is line_items[0].description.
Here are examples in Python and TypeScript:
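A minimal Python sketch, assuming the output has already been parsed into a dict with value and metadata keys: it splits a path-like key into its segments, resolves that key against the value object, and looks up the matching entry in the metadata object. The helper names parse_path and resolve, the sample data, and the shape of the metadata entry (a placeholder confidence field) are hypothetical.

```python
import re
from typing import Any, List, Union

# Matches either a plain name segment or a bracketed array index.
_SEGMENT = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(key: str) -> List[Union[str, int]]:
    """Split 'line_items[0].description' into ['line_items', 0, 'description']."""
    return [int(index) if index else name for name, index in _SEGMENT.findall(key)]


def resolve(value: Any, key: str) -> Any:
    """Follow a path-like key through nested dicts and lists in the value object."""
    current = value
    for segment in parse_path(key):
        current = current[segment]
    return current


# Hypothetical parsed output; the metadata entry's fields are placeholders.
output = {
    "value": {"line_items": [{"description": "Widget", "amount": 10.0}]},
    "metadata": {"line_items[0].description": {"confidence": 0.98}},
}

key = "line_items[0].description"
print(resolve(output["value"], key))  # Widget
print(output["metadata"].get(key))    # {'confidence': 0.98}
```

Using dict.get for the metadata lookup returns None instead of raising when a field has no metadata entry, which is a reasonable default when not every field is guaranteed to carry one.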
Examples
Extraction output type (Fields Array)
This section is relevant for the Fields Array config type. If you are using the JSON Schema config type, please see the Extraction output type (JSON Schema) documentation. If you aren’t sure which config type you are using, please see the Migrating to JSON Schema documentation.
For processors using the legacy Fields Array configuration, the extraction output is a flat dictionary in which each key is the fieldName you defined in the configuration (or sometimes the id, if field names aren’t unique), and each value is an ExtractionFieldResult object containing the extracted data and associated details.
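Because the output is a flat dictionary, collapsing it to just the extracted values takes a single pass over its entries. This is a minimal Python sketch; the helper name to_plain_values, the field names, and the type strings in the sample are hypothetical.

```python
from typing import Any, Dict


def to_plain_values(output: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Map each key (fieldName or id) to its extracted value, dropping other details."""
    return {key: result.get("value") for key, result in output.items()}


# Hypothetical Fields Array output; real field names and types come from your config.
output = {
    "invoice_number": {"id": "invoice_number", "type": "string", "value": "INV-001"},
    "total": {"id": "total", "type": "number", "value": 125.5},
}

print(to_plain_values(output))  # {'invoice_number': 'INV-001', 'total': 125.5}
```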
Type definition
Each ExtractionFieldResult object contains the core id, type, and extracted value. It can also include the following optional details:
- schema: The schema definition for nested fields (such as objects or array items).
- insights: Reasoning or explanations from the model (if enabled).
- references: Location information, including the page number and specific bounding boxes relevant to the legacy Fields Array configuration (see Bounding Boxes Guide).
- enum: The available options if the field type is enum.
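One way to model this shape in Python is with TypedDict, keeping id, type, and value required and marking the optional details as non-required. This is a sketch under assumptions: the class names and the present_details helper are hypothetical, and the optional details are typed as Any because their inner structure is not specified here.

```python
from typing import Any, List, TypedDict


class _FieldResultCore(TypedDict):
    # Core details present on every result.
    id: str
    type: str
    value: Any


class ExtractionFieldResult(_FieldResultCore, total=False):
    # Optional details; inner structure is left as Any in this sketch.
    schema: Any
    insights: Any
    references: Any
    enum: Any


OPTIONAL_DETAILS = ("schema", "insights", "references", "enum")


def present_details(result: ExtractionFieldResult) -> List[str]:
    """Return the names of the optional details included in a result."""
    return [name for name in OPTIONAL_DETAILS if name in result]


# Hypothetical result for a field named "invoice_number".
result: ExtractionFieldResult = {
    "id": "invoice_number",
    "type": "string",
    "value": "INV-001",
    "references": [],
}

print(present_details(result))  # ['references']
```

Splitting the TypedDict into a required base and a total=False subclass works on older Python versions; on Python 3.11+ the same intent can be expressed with typing.NotRequired on a single class.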
References
Examples
Classification output type
Type definition
Example
Splitter output type
Type definition
Example
Shared Types
Certain types are shared across different processor outputs. These provide additional context and information about the processor’s decisions.
Type definition
Example
Insights can appear in both Extraction and Classification outputs to provide transparency into the model’s decision-making process. They are particularly useful when debugging or validating processor results.