Understanding different output types for document processors.
For processors configured with a JSON Schema, the output contains two top-level objects: value and metadata.
The value object is the actual data extracted from the document. It conforms to the JSON Schema defined in the processor config.
The metadata object holds details such as confidence scores and citations for the extracted data. Its keys mirror the structure of the value object using a path-like notation (e.g., line_items[0].description), so you can pinpoint the metadata for any specific field, including fields nested within objects or arrays. For instance, if your data has value.line_items[0].name, the metadata for that name field is stored under the key ‘line_items[0].name’ in the metadata object.
Likewise, the metadata for the description of the first item in a line_items array is stored under the key line_items[0].description.
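The following Python sketch walks a value object and builds the same path keys, so each field can be matched to its metadata entry. The helper name iter_paths and the shape of the metadata entries (a single confidence number) are hypothetical; the real entries may carry different fields, such as citations.

```python
from typing import Any, Iterator, Tuple


def iter_paths(value: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, node) for every field nested under `value`, including
    intermediate objects and arrays, e.g. 'line_items', 'line_items[0]',
    'line_items[0].description'. (Hypothetical helper.)"""
    if isinstance(value, dict):
        # Object properties are joined with a dot.
        children = ((f"{prefix}.{k}" if prefix else k, v) for k, v in value.items())
    elif isinstance(value, list):
        # Array items are addressed with a bracketed index.
        children = ((f"{prefix}[{i}]", v) for i, v in enumerate(value))
    else:
        return  # scalar: nothing further to walk
    for path, child in children:
        yield path, child
        yield from iter_paths(child, path)


output = {
    "value": {
        "invoice_number": "INV-001",
        "line_items": [{"description": "Widget", "amount": 10.0}],
    },
    # Hypothetical metadata entries; real entries may contain other fields.
    "metadata": {
        "invoice_number": {"confidence": 0.98},
        "line_items[0].description": {"confidence": 0.91},
    },
}

# Print every field that has a matching metadata entry.
for path, node in iter_paths(output["value"]):
    entry = output["metadata"].get(path)
    if entry is not None:
        print(path, node, entry)
```

Because intermediate objects and arrays are yielded too, the same loop also finds metadata stored for a whole object or array, if any exists.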
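For processors that use the legacy Fields Array configuration, the output is an object in which each key is the fieldName (or sometimes the id, if names aren't unique) you defined in the configuration, and each value is an ExtractionFieldResult object containing the extracted data and associated details.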
Each ExtractionFieldResult object contains the core id, type, and extracted value. It can also include the following optional details:
- schema: The schema definition for nested fields (such as objects or array items).
- insights: Reasoning or explanations from the model (if enabled).
- references: Location information, including the page number and specific Bounding Boxes relevant to the legacy Fields Array configuration (see Bounding Boxes Guide).
- enum: The available options if the field type is enum.
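The following Python sketch models the legacy output as a dictionary of ExtractionFieldResult records keyed by field name. The class mirrors the fields listed above, but the exact types of schema, insights, references, and enum, as well as the page key inside each reference, are assumptions for illustration. parse_fields and pages_for are hypothetical helpers, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ExtractionFieldResult:
    id: str
    type: str
    value: Any
    # Optional details; their inner structure is assumed, not documented here.
    schema: Any = None      # nested field definitions (objects, array items)
    insights: Any = None    # model reasoning, if enabled
    references: Any = None  # page numbers and bounding boxes
    enum: Any = None        # allowed options when type is "enum"


def parse_fields(raw: Dict[str, Dict[str, Any]]) -> Dict[str, ExtractionFieldResult]:
    """Convert raw legacy output (fieldName -> dict) into typed records."""
    return {
        name: ExtractionFieldResult(
            id=field["id"],
            type=field["type"],
            value=field.get("value"),
            schema=field.get("schema"),
            insights=field.get("insights"),
            references=field.get("references"),
            enum=field.get("enum"),
        )
        for name, field in raw.items()
    }


def pages_for(field: ExtractionFieldResult) -> List[int]:
    """Sorted unique page numbers cited by a field.
    Assumes each reference has a hypothetical "page" key."""
    return sorted({ref["page"] for ref in field.references or [] if "page" in ref})


raw_output = {
    "invoice_number": {
        "id": "field_1",
        "type": "string",
        "value": "INV-001",
        "references": [{"page": 1}],
    },
    "currency": {
        "id": "field_2",
        "type": "enum",
        "value": "USD",
        "enum": ["USD", "EUR"],
    },
}
fields = parse_fields(raw_output)
print(fields["invoice_number"].value, pages_for(fields["invoice_number"]))
```

Using .get for the optional details keeps the parser tolerant of fields that omit schema, insights, references, or enum.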