Document processor outputs follow standardized formats based on the processor type. Understanding these formats is essential when working with evaluation sets, webhooks, and API responses.

Extraction Output Type

Type Definition

type ExtractionOutput = {
  [fieldName: string]: ExtractionFieldResult;
}

type ExtractionFieldResult = {
  id: string;
  type: "string" | "number" | "currency" | "boolean" | "date" | 
        "enum" | "array" | "object" | "signature";
  value: string | number | Currency | boolean | Date | 
         ExtractionValueArray | ExtractionValueObject | Signature | null;

  /* The following fields are included in outputs, but not required for creating an evaluation set item */
  
  /* The field schemas of nested fields (e.g. array, object, and signature fields) */
  schema: ExtractionFieldSchemaValue[];

  /* Reasoning and other insight outputs from the model (present when reasoning is enabled) */
  insights: Insight[];

  /* References for the extracted field. Every field includes the page number; bounding boxes and citations are included when available. */
  references: ExtractionFieldResultReference[];

  /* The enum options for enum fields, only set when type=enum */
  enum: EnumOption[];
}

type Currency = {
  amount: number;
  iso_4217_currency_code: string;
}

type Signature = {
  printed_name: string;
  signature_date: string;
  is_signed: boolean;
  title_or_role: string;
}

type EnumOption = {
  value: string; // The enum value (e.g. "ANNUAL", "MONTHLY", etc.)
  description: string; // The description of the enum value
}

type ExtractionValueArray = Array<ExtractionValueObject>;
type ExtractionValueObject = Record<string, any>;
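
Because `value` is a union, consumers usually need to narrow it using the `type` discriminator before reading it. The following is a minimal sketch of that narrowing, not part of the API: `FieldResult` is a trimmed local copy of `ExtractionFieldResult`, and `isCurrency` and `describeField` are hypothetical helper names introduced for illustration.

```typescript
// Trimmed local copies of the types above, limited to what this sketch needs.
type Currency = { amount: number; iso_4217_currency_code: string };
type FieldResult = { id: string; type: string; value: unknown };

// Hypothetical helper: true when a value has the Currency shape.
function isCurrency(value: unknown): value is Currency {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.amount === "number" &&
    typeof v.iso_4217_currency_code === "string"
  );
}

// Hypothetical helper: renders a field value as display text based on its type.
function describeField(field: FieldResult): string {
  const value = field.value;
  if (value === null) return "(no value)";
  if (field.type === "currency" && isCurrency(value)) {
    return `${value.amount.toFixed(2)} ${value.iso_4217_currency_code}`;
  }
  if (field.type === "array" && Array.isArray(value)) {
    return `${value.length} row(s)`;
  }
  return String(value);
}

const amountDue: FieldResult = {
  id: "field_456",
  type: "currency",
  value: { amount: 1250.5, iso_4217_currency_code: "USD" },
};

console.log(describeField(amountDue)); // "1250.50 USD"
```

Checking the shape at runtime as well as the `type` string is a defensive choice here; code that trusts the output could rely on `type` alone.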

References

type ExtractionFieldResultReference = {
  /* The field id. For fields nested in an array, this is the row index */
  id: string;
  /* The field name */
  fieldName: string;
  /* The page number (starting at 1) that this bounding box is from */
  page: number;
  /**
   * Array of bounding box references for this field.
   * There can be several if the extraction result was drawn from multiple distinct sources on the page.
   */
  boundingBoxes: BoundingBox[];
};

/* See the Bounding boxes guide for information on how to use/interpret this data */
type BoundingBox = {
  /* The leftmost position of the bounding box */
  left: number;
  /* The topmost position of the bounding box */
  top: number;
  /* The rightmost position of the bounding box */
  right: number;
  /* The bottommost position of the bounding box */
  bottom: number;
};
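
When highlighting extracted values on a rendered document, it can help to group a field's bounding boxes by page first. The sketch below shows one way to do that under stated assumptions: `boxesByPage` is a hypothetical helper, and the coordinate values in the sample data are invented placeholders (see the Bounding boxes guide for how to interpret real coordinates).

```typescript
// Local copies of the reference types defined above.
type BoundingBox = { left: number; top: number; right: number; bottom: number };
type ExtractionFieldResultReference = {
  id: string;
  fieldName: string;
  page: number;
  boundingBoxes: BoundingBox[];
};

// Hypothetical helper: collects every bounding box under its page number.
function boxesByPage(
  refs: ExtractionFieldResultReference[]
): Map<number, BoundingBox[]> {
  const byPage = new Map<number, BoundingBox[]>();
  for (const ref of refs) {
    const existing = byPage.get(ref.page) ?? [];
    // Bounding boxes are only present when available, so tolerate none.
    byPage.set(ref.page, existing.concat(ref.boundingBoxes ?? []));
  }
  return byPage;
}

// Sample references with placeholder coordinates (not real output).
const sampleRefs: ExtractionFieldResultReference[] = [
  {
    id: "field_123",
    fieldName: "invoice_number",
    page: 1,
    boundingBoxes: [{ left: 0.1, top: 0.1, right: 0.3, bottom: 0.15 }],
  },
  {
    id: "field_456",
    fieldName: "amount_due",
    page: 1,
    boundingBoxes: [{ left: 0.6, top: 0.8, right: 0.9, bottom: 0.85 }],
  },
  { id: "field_101", fieldName: "signature_block", page: 2, boundingBoxes: [] },
];

const grouped = boxesByPage(sampleRefs);
console.log(grouped.get(1)?.length); // 2
console.log(grouped.get(2)?.length); // 0
```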

Examples

Basic Field Types

{
  "invoice_number": {
    "id": "field_123",
    "type": "string",
    "value": "INV-2024-001"
  },
  "amount_due": {
    "id": "field_456",
    "type": "currency",
    "value": {
      "amount": 1250.50,
      "iso_4217_currency_code": "USD"
    }
  }
}

Nested Structures

{
  "line_items": {
    "id": "field_789",
    "type": "array",
    "value": [
      {
        "item": "Widget A",
        "quantity": 5,
        "price": {
          "amount": 10.00,
          "iso_4217_currency_code": "USD"
        }
      },
      {
        "item": "Widget B",
        "quantity": 2,
        "price": {
          "amount": 15.00,
          "iso_4217_currency_code": "USD"
        }
      }
    ]
  },
  "signature_block": {
    "id": "field_101",
    "type": "signature",
    "value": {
      "printed_name": "John Smith",
      "signature_date": "2024-03-15",
      "is_signed": true,
      "title_or_role": "Purchasing Manager"
    }
  }
}
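
Array values are lists of objects whose keys depend on the configured schema, so any code that reads them has to assume a row shape. As a rough sketch, the following totals the `line_items` rows from the example above per currency code. The `LineItem` type and `totalByCurrency` helper are hypothetical, and the row shape (`item`, `quantity`, `price`) is taken from this example only.

```typescript
type Currency = { amount: number; iso_4217_currency_code: string };

// Assumed row shape, copied from the line_items example (schema-dependent).
type LineItem = { item: string; quantity: number; price: Currency };

const lineItems: LineItem[] = [
  {
    item: "Widget A",
    quantity: 5,
    price: { amount: 10.0, iso_4217_currency_code: "USD" },
  },
  {
    item: "Widget B",
    quantity: 2,
    price: { amount: 15.0, iso_4217_currency_code: "USD" },
  },
];

// Hypothetical helper: sums quantity * unit price, keyed by currency code,
// so rows in different currencies are never added together.
function totalByCurrency(rows: LineItem[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    const code = row.price.iso_4217_currency_code;
    totals[code] = (totals[code] ?? 0) + row.quantity * row.price.amount;
  }
  return totals;
}

console.log(totalByCurrency(lineItems)); // { USD: 80 }
```

Amounts are plain floating-point numbers here; for production accounting, converting to integer minor units before summing may be safer.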

Classification Output Type

Type Definition

type ClassificationOutput = {
  id: string;
  type: string;
}

Example

{
  "id": "classification_123",
  "type": "INVOICE"
}

Splitter Output Type

Type Definition

type SplitterOutput = {
  classificationId: string; // The id of the classification type (set in the processor config)
  type: string; // The type of the split document (set in the processor config); corresponds to the classificationId
  startPage: number; // The start page of the split document
  endPage: number; // The end page of the split document

  // Fields included in outputs, but not required for creating an evaluation set item
  identifier?: string; // Identifier for the split document (e.g. invoice number)
  observation?: string; // Explanation of the results
}

Example

{
  "classificationId": "invoice",
  "type": "invoice",
  "startPage": 1,
  "endPage": 3
}
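
When building evaluation set items by hand, a quick sanity check on page ranges can catch mistakes early. The sketch below is one possible check, not an official validator. It assumes that several split results have been collected into an array, that pages are numbered from 1, and that `endPage` is inclusive (as the example above suggests); `findRangeProblems` is a hypothetical helper.

```typescript
// Local copy of the SplitterOutput type defined above.
type SplitterOutput = {
  classificationId: string;
  type: string;
  startPage: number;
  endPage: number;
  identifier?: string;
  observation?: string;
};

// Hypothetical helper: reports invalid or overlapping page ranges.
function findRangeProblems(splits: SplitterOutput[]): string[] {
  const problems: string[] = [];
  // Sort a copy by start page so each split is compared with its predecessor.
  const sorted = [...splits].sort((a, b) => a.startPage - b.startPage);
  sorted.forEach((split, i) => {
    if (split.startPage < 1 || split.endPage < split.startPage) {
      problems.push(
        `invalid range ${split.startPage}-${split.endPage} (${split.type})`
      );
    }
    const prev = i > 0 ? sorted[i - 1] : undefined;
    if (prev !== undefined && split.startPage <= prev.endPage) {
      problems.push(
        `${split.type} starting at page ${split.startPage} overlaps ` +
          `${prev.type} ending at page ${prev.endPage}`
      );
    }
  });
  return problems;
}

const cleanSplits: SplitterOutput[] = [
  { classificationId: "invoice", type: "invoice", startPage: 1, endPage: 3 },
  { classificationId: "receipt", type: "receipt", startPage: 4, endPage: 4 },
];

console.log(findRangeProblems(cleanSplits)); // []
```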

Shared Types

Certain types are shared across different processor outputs. These provide additional context and information about the processor’s decisions.

Type Definition

type Insight = {
  type: "reasoning";  // Currently only reasoning is supported
  content: string;    // The explanation or reasoning provided by the model
}

Example

{
  "insights": [
    {
      "type": "reasoning",
      "content": "This was classified as an invoice because it contains standard invoice elements including an invoice number, billing details, and itemized charges."
    }
  ]
}

Insights can appear in both Extraction and Classification outputs to provide transparency into the model’s decision-making process. They are particularly useful when debugging or validating processor results.
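
For debugging, it can be convenient to pull the reasoning text out of an extraction output in one pass. The following is a minimal sketch under stated assumptions: `reasoningByField` is a hypothetical helper, `insights` is treated as optional because it is only populated when reasoning is enabled, and the reasoning text in the sample is invented for illustration.

```typescript
// Local copy of the Insight type defined above.
type Insight = { type: "reasoning"; content: string };

// Only the part of a field result this sketch reads.
type WithInsights = { insights?: Insight[] };

// Hypothetical helper: maps each field name to its reasoning strings,
// skipping fields that carry no insights.
function reasoningByField(
  output: Record<string, WithInsights>
): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const [fieldName, field] of Object.entries(output)) {
    const notes = (field.insights ?? [])
      .filter((insight) => insight.type === "reasoning")
      .map((insight) => insight.content);
    if (notes.length > 0) result[fieldName] = notes;
  }
  return result;
}

// Sample output with invented reasoning text (not real model output).
const sampleOutput: Record<string, WithInsights> = {
  invoice_number: {
    insights: [
      { type: "reasoning", content: "Taken from the header labeled 'Invoice #'." },
    ],
  },
  amount_due: {},
};

console.log(Object.keys(reasoningByField(sampleOutput))); // ["invoice_number"]
```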