Migrating to JSON Schema
How to migrate processors from the legacy Fields Array config type to the JSON Schema config type
Current Config Structure
This section gives some background on the current config structure and the new JSON Schema config structure. To skip straight to the migration steps, go to the Migrating to JSON Schema section.
If your organization started using Extend before April 2025, you have likely been using the legacy Fields Array config type.
This means that the `config` object in the processor has a `fields` array that contains the processor’s fields. Here is an example config object of this type:
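A minimal TypeScript sketch of the legacy layout follows. It assumes each entry in `fields` has a `name`, `type`, and `description`. These entry properties and the invoice field names are hypothetical and only illustrate the array-based structure, so check an existing processor’s config for the exact shape.

```typescript
// HYPOTHETICAL shape of one legacy field entry; the property names are assumptions.
interface LegacyField {
  name: string;
  type: string;
  description?: string;
}

// In the legacy config, the processor's fields live in the `fields` array.
interface LegacyConfig {
  fields: LegacyField[];
}

const legacyConfig: LegacyConfig = {
  fields: [
    { name: "invoice_number", type: "string", description: "The invoice number" },
    { name: "total_amount", type: "number", description: "The total amount due" },
  ],
};

// Looking up a field means searching the array by name.
const totalField = legacyConfig.fields.find((field) => field.name === "total_amount");
const legacyFieldNames = legacyConfig.fields.map((field) => field.name);
```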
This schema has worked well. However, since we released it, the industry has standardized on JSON Schema for defining response schemas. To make our processors easier for developers to use, we are moving to JSON Schema for defining processor schemas.
New JSON Schema Config Structure
The equivalent config object in the JSON Schema format is:
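As an illustrative sketch, here is a config whose `schema` follows the rules described below, written as a TypeScript object. The field names are hypothetical, and any other config properties are omitted.

```typescript
// Illustrative JSON Schema config; field names are hypothetical and any other
// config properties are omitted.
const jsonSchemaConfig = {
  schema: {
    type: "object", // the root must be an object
    properties: {
      invoice_number: {
        type: ["string", "null"], // primitive fields must be nullable
        description: "The invoice number",
      },
      total_amount: {
        type: ["number", "null"],
        description: "The total amount due",
      },
    },
  },
};

// Fields are now keyed by property name instead of listed in an array.
const schemaFieldNames = Object.keys(jsonSchemaConfig.schema.properties);
```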
You’ll notice that instead of the `fields` array, we have a `schema` object. This object is a JSON Schema object that describes the shape of the output you will receive from the processor.
The JSON Schema standard is extremely flexible. We’ve implemented a subset of the standard to support the needs of document extraction. Your schema must follow these rules:
- The root must be an `object` type
- Allowed types are `string`, `number`, `integer`, `boolean`, `object`, and `array`
- All primitive fields (`string`, `number`, `boolean`, `integer`) must be nullable (use an array type with `"null"` as an option, e.g. `"type": ["string", "null"]`)
- Maximum nesting level is 3 (each non-root object counts as 1 level)
- Property keys and names must only contain lowercase letters, numbers, and underscores
- Array items must be objects
- Enums must only contain strings and must contain a `null` option
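To make these rules concrete, here is a minimal TypeScript checker. `validateSchema` is a hypothetical helper, not part of any Extend SDK, and it may differ from the validation Studio performs. It assumes `"null"` is allowed inside a `type` array (as the nullability rule requires) and counts nesting by non-root objects, including objects used as array items.

```typescript
// Hypothetical helper, not part of any Extend SDK: checks a schema against the
// rules listed above. Studio's own validation may differ.
interface JsonSchema {
  type?: string | string[];
  properties?: Record<string, JsonSchema>;
  items?: JsonSchema;
  enum?: unknown[];
  [keyword: string]: unknown;
}

// "null" is included because nullable primitives use it inside a type array.
const ALLOWED_TYPES = new Set(["string", "number", "integer", "boolean", "object", "array", "null"]);
const PRIMITIVE_TYPES = new Set(["string", "number", "integer", "boolean"]);
const KEY_PATTERN = /^[a-z0-9_]+$/;
const MAX_NESTING = 3;

function typesOf(node: JsonSchema): string[] {
  if (node.type === undefined) return [];
  return Array.isArray(node.type) ? node.type : [node.type];
}

function validateSchema(root: JsonSchema): string[] {
  const errors: string[] = [];
  if (root.type !== "object") errors.push("root: must be an object type");

  // `level` is the nesting level this node has if it is an object
  // (root = 0, each non-root object adds 1).
  function visit(node: JsonSchema, path: string, level: number): void {
    const types = typesOf(node);
    for (const t of types) {
      if (!ALLOWED_TYPES.has(t)) errors.push(`${path}: type "${t}" is not allowed`);
    }
    if (types.some((t) => PRIMITIVE_TYPES.has(t)) && !types.includes("null")) {
      errors.push(`${path}: primitive fields must be nullable`);
    }
    if (node.enum !== undefined) {
      if (!node.enum.every((v) => typeof v === "string" || v === null)) {
        errors.push(`${path}: enum values must be strings`);
      }
      if (!node.enum.includes(null)) errors.push(`${path}: enum must include null`);
    }
    if (types.includes("object")) {
      if (level > MAX_NESTING) errors.push(`${path}: exceeds maximum nesting level of ${MAX_NESTING}`);
      const props: Record<string, JsonSchema> = node.properties ?? {};
      for (const [key, child] of Object.entries(props)) {
        if (!KEY_PATTERN.test(key)) {
          errors.push(`${path}.${key}: keys may only use lowercase letters, numbers, and underscores`);
        }
        visit(child, `${path}.${key}`, level + 1);
      }
    }
    if (types.includes("array")) {
      if (node.items === undefined || !typesOf(node.items).includes("object")) {
        errors.push(`${path}: array items must be objects`);
      }
      // Items share the array's level: the array itself is not an object.
      if (node.items !== undefined) visit(node.items, `${path}[]`, level);
    }
  }

  visit(root, "root", 0);
  return errors;
}
```

An empty result means the schema passed every check in this sketch. Each message includes a dotted path such as `root.line_items[].amount`, which makes it easier to locate the offending property.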
While we support the JSON Schema structure, we do not support many of its additional features, including:
- Schema composition (`anyOf`, `oneOf`, `allOf`), schema definitions, and recursive schemas
- Regular expressions and other type-specific validation keywords
- Conditional schema validation
- Constant values
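The unsupported features roughly correspond to the JSON Schema keywords in the sketch below. The keyword list is an assumption based on the JSON Schema specification, not an exhaustive list from Extend, and `findUnsupportedKeywords` is a hypothetical helper. It walks only `properties` and `items`, so a property that happens to be named `pattern` or `const` is not flagged.

```typescript
// Keywords for the unsupported features above. This mapping is an assumption
// drawn from the JSON Schema specification, not an exhaustive list from Extend.
const UNSUPPORTED_KEYWORDS = new Map<string, string>([
  ["anyOf", "schema composition"],
  ["oneOf", "schema composition"],
  ["allOf", "schema composition"],
  ["$defs", "schema definitions"],
  ["definitions", "schema definitions"],
  ["$ref", "references / recursive schemas"],
  ["pattern", "regular expressions"],
  ["patternProperties", "regular expressions"],
  ["minLength", "type-specific validation"],
  ["maxLength", "type-specific validation"],
  ["minimum", "type-specific validation"],
  ["maximum", "type-specific validation"],
  ["minItems", "type-specific validation"],
  ["maxItems", "type-specific validation"],
  ["if", "conditional validation"],
  ["then", "conditional validation"],
  ["else", "conditional validation"],
  ["const", "constant values"],
]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper: reports unsupported keywords on each schema node,
// walking only `properties` and `items` so property names are never treated as keywords.
function findUnsupportedKeywords(node: Record<string, unknown>, path = "root"): string[] {
  const found: string[] = [];
  for (const key of Object.keys(node)) {
    const feature = UNSUPPORTED_KEYWORDS.get(key);
    if (feature !== undefined) found.push(`${path}: "${key}" (${feature})`);
  }
  const properties = node.properties;
  if (isRecord(properties)) {
    for (const [name, child] of Object.entries(properties)) {
      if (isRecord(child)) found.push(...findUnsupportedKeywords(child, `${path}.${name}`));
    }
  }
  const items = node.items;
  if (isRecord(items)) found.push(...findUnsupportedKeywords(items, `${path}[]`));
  return found;
}
```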
Current Output Structure
The current output structure for Extraction processors is an object keyed by field name, where each value is an object with the following properties:
- `id`: The unique identifier for the field
- `type`: The type of the field
- `value`: The value of the field
- `confidence`: The confidence score of the field
- `insights`: The insights for the field
- `references`: The references for the field
Here is an example of the output:
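A TypeScript sketch of this shape, based on the property list above, follows. The field names, ids, and values are made up. `confidence` is assumed to be a number, and `insights` and `references` are typed as `unknown` because their structure is not described here.

```typescript
// Shape of one field in the legacy output, following the property list above.
// ASSUMPTIONS: `confidence` is a number; `insights` and `references` are
// typed as unknown because their structure is not described here.
interface LegacyOutputField {
  id: string;
  type: string;
  value: unknown;
  confidence: number;
  insights: unknown;
  references: unknown;
}

// The output is keyed by field name.
type LegacyOutput = Record<string, LegacyOutputField>;

// Hypothetical output; ids and values are made up for illustration.
const legacyOutput: LegacyOutput = {
  invoice_number: {
    id: "field_1", type: "string", value: "INV-001",
    confidence: 0.98, insights: [], references: [],
  },
  total_amount: {
    id: "field_2", type: "number", value: 1250.5,
    confidence: 0.91, insights: [], references: [],
  },
};

// Metadata sits next to each value, so one pass over the object finds
// fields below a (hypothetical) confidence threshold.
const lowConfidenceFields = Object.entries(legacyOutput)
  .filter(([, field]) => field.confidence < 0.95)
  .map(([name]) => name);
```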
In this output, metadata such as `confidence`, `insights`, and `references` is nested inside each field’s object, right next to the `value`. The benefit is that the metadata for a specific field is easy to access. The downside is that this layout doesn’t work well for recursive fields like arrays and objects.
New JSON Schema Output Structure
The output structure for JSON Schema processors is composed of two properties: `value` and `metadata`.
The `value` property holds the actual data extracted from the document, which conforms to the JSON Schema defined in the processor config.
The `metadata` property contains additional information about the extracted data, such as confidence scores, reasoning, and citations.
Below is an example of the output you will receive from a JSON Schema processor:
The benefit of this output structure is that the data for a specific field is easy to access, and because `value` conforms to the JSON Schema defined in the processor config, it should be easy to ingest as a typed object.
The TypeScript types for the output are the following:
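Here is a minimal sketch of such types, assuming a generic wrapper around the schema-conforming `value`. The structure of `metadata` is not spelled out in this section, so it is typed loosely as `Record<string, unknown>`. The `Invoice` type and its values are hypothetical and mirror a schema with nullable primitives and an array of objects. The official type definitions may differ.

```typescript
// Generic output of a JSON Schema processor: `value` conforms to the schema.
// ASSUMPTION: metadata is typed loosely because its structure is not described here.
interface JsonSchemaProcessorOutput<TValue> {
  value: TValue;
  metadata: Record<string, unknown>;
}

// Hypothetical type mirroring a schema with nullable primitives and an array of objects.
interface Invoice {
  invoice_number: string | null;
  total_amount: number | null;
  line_items: Array<{ description: string | null; amount: number | null }>;
}

// Hypothetical output (made-up values; metadata left empty).
const output: JsonSchemaProcessorOutput<Invoice> = {
  value: {
    invoice_number: "INV-001",
    total_amount: 1250.5,
    line_items: [
      { description: "Consulting", amount: 1000 },
      { description: "Travel", amount: 250.5 },
    ],
  },
  metadata: {},
};

// Because `value` is typed, nested data is reached directly; nullable fields must be handled.
const lineItemTotal = output.value.line_items.reduce((sum, item) => sum + (item.amount ?? 0), 0);
```

Since every primitive in the schema is nullable, the matching TypeScript properties are `T | null`, and the compiler forces callers to handle missing values (here with `?? 0`).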
Migrating to JSON Schema
To migrate a processor from the legacy Fields Array config type to the JSON Schema config type, you will need to:
- Go to the processor in Studio that you’d like to migrate.
- Click the button with the three vertical dots in the top right corner to open the settings menu.
- Click “Migrate to JSON Schema”. This opens a modal where you can select the version and choose a name for the new processor.
- In the modal, click “Migrate to JSON Schema” again. This creates a new processor in which the fields array is replaced with a JSON Schema config object.
Please share any feedback you have on the new JSON Schema config type and output structure with us on Slack!