
Overview

The Parse endpoint converts a document into structured JSON. It extracts text, tables, figures, and metadata, and returns a bounding box for every element.

Credits: 1 credit per page.

Basic Usage

curl -X POST "https://platform.aifano.com/parse" \
  -H "Authorization: Bearer $AIFANO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "https://example.com/report.pdf"}'
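The same request can be built in Python with only the standard library. This is a minimal sketch, not an official client: build_parse_request is a hypothetical helper, and the API key is read from the AIFANO_API_KEY environment variable as in the curl example. The request is constructed but not sent; the commented-out urlopen call shows where sending would happen.

```python
import json
import os
import urllib.request

BASE_URL = "https://platform.aifano.com"


def build_parse_request(payload: dict, api_key: str,
                        endpoint: str = "/parse") -> urllib.request.Request:
    """Build (but do not send) a POST request to a Parse endpoint.

    payload is the JSON body, e.g. {"input": "https://example.com/report.pdf"}.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_parse_request(
        {"input": "https://example.com/report.pdf"},
        api_key=os.environ.get("AIFANO_API_KEY", ""),
    )
    # Uncomment to actually send the request and decode the JSON response:
    # with urllib.request.urlopen(req) as resp:
    #     result = json.load(resp)
    print(req.get_method(), req.full_url)
```

Passing endpoint="/parse_async" builds the same request against the async variant described under Async Processing.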

Input Options

The input field accepts:
| Format | Example | Description |
|---|---|---|
| Public URL | https://example.com/doc.pdf | Any publicly accessible document URL |
| Presigned URL | https://s3.amazonaws.com/... | AWS S3 presigned URL |
| Aifano reference | aifano://abc123.pdf | File uploaded via /upload |
| Job reference | jobid://job_abc123 | Reuse the parsed result from a previous job |
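A client may want to check which of these forms it is about to send. The sketch below uses a hypothetical classify_input helper that is not part of the API; it only inspects the prefixes shown in the table. Public and presigned URLs are both plain https:// URLs, so it reports them together, and it does not check that a URL is reachable or that a presigned URL is still valid.

```python
def classify_input(value: str) -> str:
    """Return which documented input form value uses.

    Public and presigned URLs cannot be told apart by prefix,
    so both are reported as "url".
    """
    if value.startswith("aifano://"):
        return "aifano"  # file uploaded via /upload
    if value.startswith("jobid://"):
        return "jobid"   # reuse a previous job's parsed result
    if value.startswith("https://"):
        return "url"     # public or presigned URL
    raise ValueError(f"unrecognized input: {value!r}")


for example in [
    "https://example.com/doc.pdf",
    "aifano://abc123.pdf",
    "jobid://job_abc123",
]:
    print(example, "->", classify_input(example))
```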

Configuration Options

Enhance

Use vision language models to improve accuracy for specific block types:
{
  "input": "aifano://document.pdf",
  "enhance": {
    "agentic": [
      { "scope": "table" },
      { "scope": "figure", "prompt": "Describe the chart data points" }
    ],
    "summarize_figures": true
  }
}
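When the scopes and prompts come from configuration, a small helper can assemble the enhance object. This is a sketch with a hypothetical build_enhance helper: a scope mapped to None gets an entry without a prompt, like the table entry above. This page only shows the table and figure scopes, so the helper does not validate scope names, and its summarize_figures default of False is an assumption rather than a documented server default.

```python
import json
from typing import Dict, Optional


def build_enhance(prompts: Dict[str, Optional[str]],
                  summarize_figures: bool = False) -> dict:
    """Build an enhance object from a {scope: prompt} mapping.

    A scope mapped to None gets an agentic entry without a custom prompt.
    """
    agentic = []
    for scope, prompt in prompts.items():
        entry = {"scope": scope}
        if prompt is not None:
            entry["prompt"] = prompt
        agentic.append(entry)
    return {"agentic": agentic, "summarize_figures": summarize_figures}


body = {
    "input": "aifano://document.pdf",
    "enhance": build_enhance(
        {"table": None, "figure": "Describe the chart data points"},
        summarize_figures=True,
    ),
}
print(json.dumps(body, indent=2))
```

This reproduces the JSON example above exactly; dict insertion order keeps the agentic entries in the order the scopes were given.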

Chunking & Retrieval

Configure how content is chunked for RAG pipelines:
{
  "input": "aifano://document.pdf",
  "retrieval": {
    "chunking": {
      "chunk_mode": "variable",
      "chunk_size": 1000
    },
    "embedding_optimized": true
  }
}
| Chunk mode | Description |
|---|---|
| disabled | No chunking (default) |
| variable | Variable-size chunks based on content |
| section | One chunk per document section |
| page | One chunk per page |
| block | One chunk per block |
| page_sections | Sections within pages |
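A sketch of building the retrieval object with a check against the chunk modes in the table, so a typo fails before the request is sent. build_retrieval is a hypothetical helper. This page does not state the unit of chunk_size or which modes use it, so the helper passes chunk_size through unchanged and omits it when not given.

```python
import json
from typing import Optional

# The chunk modes listed in the table above.
CHUNK_MODES = {"disabled", "variable", "section", "page", "block", "page_sections"}


def build_retrieval(chunk_mode: str = "disabled",
                    chunk_size: Optional[int] = None,
                    embedding_optimized: bool = False) -> dict:
    """Build a retrieval object, rejecting chunk modes not in the table."""
    if chunk_mode not in CHUNK_MODES:
        raise ValueError(
            f"unknown chunk_mode {chunk_mode!r}; expected one of {sorted(CHUNK_MODES)}"
        )
    chunking = {"chunk_mode": chunk_mode}
    if chunk_size is not None:
        chunking["chunk_size"] = chunk_size  # unit not specified on this page
    return {"chunking": chunking, "embedding_optimized": embedding_optimized}


print(json.dumps(build_retrieval("variable", chunk_size=1000, embedding_optimized=True)))
```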

Formatting

Control the output format for tables and other special content:
{
  "input": "aifano://document.pdf",
  "formatting": {
    "table_output_format": "html",
    "add_page_markers": true,
    "merge_tables": true,
    "include": ["hyperlinks", "signatures"]
  }
}
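Each example on this page sends one option group next to input. Assuming the groups can be combined in a single request (this page does not show that explicitly), a hypothetical build_parse_body helper can assemble the body and leave out any group you do not set:

```python
import json
from typing import Optional


def build_parse_body(
    input_ref: str,
    enhance: Optional[dict] = None,
    retrieval: Optional[dict] = None,
    formatting: Optional[dict] = None,
    settings: Optional[dict] = None,
) -> dict:
    """Assemble a Parse request body, omitting option groups left as None."""
    body = {"input": input_ref}
    groups = {
        "enhance": enhance,
        "retrieval": retrieval,
        "formatting": formatting,
        "settings": settings,
    }
    # Only include the groups the caller actually provided.
    body.update({name: value for name, value in groups.items() if value is not None})
    return body


body = build_parse_body(
    "aifano://document.pdf",
    formatting={
        "table_output_format": "html",
        "add_page_markers": True,
        "merge_tables": True,
        "include": ["hyperlinks", "signatures"],
    },
)
print(json.dumps(body, indent=2))
```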

Settings

Fine-tune OCR and processing behavior:
{
  "input": "aifano://document.pdf",
  "settings": {
    "ocr_system": "standard",
    "extraction_mode": "hybrid",
    "page_range": { "start": 1, "end": 10 },
    "document_password": "secret123"
  }
}
| Setting | Options | Description |
|---|---|---|
| ocr_system | standard, legacy | Standard supports all languages; legacy supports Germanic languages only |
| extraction_mode | hybrid, ocr | Hybrid combines OCR with embedded text for best accuracy |
| page_range | {start, end} | Process only the specified pages |
| document_password | string | Password for encrypted PDFs |
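Because Parse costs 1 credit per page, page_range also bounds the cost of a request. The sketch below estimates credits before a job is sent. It assumes 1-based page numbers and an inclusive end (so {"start": 1, "end": 10} covers 10 pages), and it assumes pages past the end of the document are not billed; none of this is stated on this page, and the usage.credits field in the response remains the authoritative count. estimate_credits is a hypothetical helper.

```python
from typing import Optional

CREDITS_PER_PAGE = 1  # from "Credits: 1 credit per page"


def estimate_credits(total_pages: int, page_range: Optional[dict] = None) -> int:
    """Estimate credits for parsing a document of total_pages pages.

    Assumes 1-based page numbers and an inclusive end.
    """
    if page_range is None:
        pages = total_pages
    else:
        start = page_range["start"]
        # Assumption: pages beyond the document's length are not billed.
        end = min(page_range["end"], total_pages)
        if start < 1 or start > end:
            raise ValueError(f"invalid page_range {page_range!r} for {total_pages} pages")
        pages = end - start + 1
    return pages * CREDITS_PER_PAGE


print(estimate_credits(250, {"start": 1, "end": 10}))  # 10
print(estimate_credits(8))  # 8
```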

Response Structure

{
  "job_id": "job_abc123",
  "duration": 3.21,
  "usage": { "num_pages": 10, "credits": 10 },
  "result": {
    "type": "full",
    "chunks": [
      {
        "content": "# Section Title\n\nParagraph text...",
        "embed": "Section Title. Paragraph text...",
        "blocks": [
          {
            "type": "Title",
            "content": "Section Title",
            "bbox": { "left": 0.1, "top": 0.05, "width": 0.8, "height": 0.04, "page": 1 },
            "confidence": "high"
          }
        ]
      }
    ]
  }
}
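A minimal sketch of walking this structure in Python: it loads the example response above, prints the usage summary, and lists every block with its page, confidence, and bounding box. The bbox values in the example all lie between 0 and 1, which suggests they are fractions of the page size; that is an inference, not something this page states, so the to_pixels conversion (and the 1000 by 1000 page size used with it) is a hypothetical illustration.

```python
import json

RESPONSE = """
{
  "job_id": "job_abc123",
  "duration": 3.21,
  "usage": {"num_pages": 10, "credits": 10},
  "result": {
    "type": "full",
    "chunks": [
      {
        "content": "# Section Title\\n\\nParagraph text...",
        "embed": "Section Title. Paragraph text...",
        "blocks": [
          {
            "type": "Title",
            "content": "Section Title",
            "bbox": {"left": 0.1, "top": 0.05, "width": 0.8, "height": 0.04, "page": 1},
            "confidence": "high"
          }
        ]
      }
    ]
  }
}
"""


def iter_blocks(response: dict):
    """Yield every block from every chunk, in document order."""
    for chunk in response["result"]["chunks"]:
        yield from chunk["blocks"]


def to_pixels(bbox: dict, page_width: float, page_height: float) -> tuple:
    """Convert a bbox to (x0, y0, x1, y1) pixels, ASSUMING normalized coordinates."""
    x0 = bbox["left"] * page_width
    y0 = bbox["top"] * page_height
    return (x0, y0, x0 + bbox["width"] * page_width, y0 + bbox["height"] * page_height)


response = json.loads(RESPONSE)
usage = response["usage"]
print(f"{response['job_id']}: {usage['num_pages']} pages, {usage['credits']} credits")
for block in iter_blocks(response):
    box = to_pixels(block["bbox"], page_width=1000, page_height=1000)
    print(block["type"], block["bbox"]["page"], block["confidence"], box)
```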

Block Types

| Type | Description |
|---|---|
| Title | Document or section title |
| Section Header | Sub-section heading |
| Text | Body text paragraph |
| Table | Tabular data |
| Figure | Image or chart |
| List Item | Bulleted or numbered list item |
| Header | Page header |
| Footer | Page footer |
| Page Number | Page number |
| Key Value | Key-value pair |
| Comment | Annotation or comment |
| Signature | Signature block |
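For full-text or RAG pipelines it is often useful to drop page furniture and keep only the body. This sketch assumes blocks are dicts with type and content fields, as in the response example. Treating Header, Footer, and Page Number as furniture is a choice made here, not an API rule, and the sample blocks are made-up illustration data.

```python
from collections import Counter

# Block types treated as page furniture in this sketch (a choice, not an API rule).
FURNITURE = {"Header", "Footer", "Page Number"}


def body_text(blocks: list) -> str:
    """Join the content of non-furniture blocks, separated by blank lines."""
    return "\n\n".join(b["content"] for b in blocks if b["type"] not in FURNITURE)


def count_types(blocks: list) -> Counter:
    """Count how many blocks of each type a document contains."""
    return Counter(b["type"] for b in blocks)


# Hypothetical sample blocks for illustration.
blocks = [
    {"type": "Header", "content": "ACME Corp - Annual Report"},
    {"type": "Title", "content": "Section Title"},
    {"type": "Text", "content": "Paragraph text..."},
    {"type": "Page Number", "content": "1"},
]
print(body_text(blocks))
print(count_types(blocks))
```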

Async Processing

For large documents, use the async variant:
curl -X POST "https://platform.aifano.com/parse_async" \
  -H "Authorization: Bearer $AIFANO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "aifano://large-document.pdf"}'
See Async Processing for details on polling and webhooks.
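The status endpoint and response fields for async jobs are documented in Async Processing, not on this page, so the polling sketch below takes them as parameters: fetch_status is any callable you supply that returns the job's current state, and is_done decides when to stop. poll_until_done, the "state" field in the usage example, and the backoff values are all hypothetical placeholders.

```python
import time
from typing import Any, Callable


def poll_until_done(
    fetch_status: Callable[[], Any],
    is_done: Callable[[Any], bool],
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch_status with exponential backoff until is_done(status) is true."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # double the wait, capped at max_delay


# Example with a fake status source that reports completion on the third call.
responses = iter([{"state": "running"}, {"state": "running"}, {"state": "done"}])
result = poll_until_done(lambda: next(responses),
                         lambda s: s["state"] == "done",
                         sleep=lambda _: None)
print(result)
```

Injecting sleep keeps the helper testable without waiting; in real use the default time.sleep applies. Webhooks, also covered in Async Processing, avoid polling altogether.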