Amazon Textract — Document data extraction API that goes beyond OCR to understand forms, tables, and key-value pairs.


What is Amazon Textract?

Amazon Textract is a machine learning service that automatically extracts text, handwriting, and data from scanned documents. Unlike traditional OCR that only gives you text, Textract understands document structure — tables, forms, key-value pairs, and layout elements.

Key Insight: Textract goes beyond basic OCR by identifying document structure such as form fields, table cells, and line items.


Key Features

Feature | Description
Text Extraction (OCR) | Extract printed text and handwriting from any document
Form Data Extraction | Identify key-value pairs (e.g., “Invoice Number: 12345”)
Table Extraction | Extract tables with cell structure, merged cells, headers
Document Layout | Understand titles, sections, footers, page numbers
Query-Based Extraction | Ask specific questions to get exact answers (e.g., “What is the total amount?”)
Signature Detection | Detect where signatures appear on documents
Identity Document Analysis | Extract structured data from IDs, passports, driver’s licenses
Invoice & Receipt Analysis | Pre-trained models for invoices and receipts
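As an illustration of the query-based extraction row, here is a minimal Python sketch. It assumes the boto3 `analyze_document` call with `FeatureTypes=["QUERIES"]` and a `QueriesConfig` argument. The helper names `build_query_request` and `extract_query_answers` are hypothetical, and the sample response is hand-written and heavily trimmed rather than real service output.

```python
def build_query_request(bucket, key, questions):
    """Build keyword arguments for analyze_document with the QUERIES feature.

    `questions` maps a short alias (a label of our own choosing) to the question text.
    """
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [
                {"Text": text, "Alias": alias} for alias, text in questions.items()
            ]
        },
    }


def extract_query_answers(response):
    """Map each query alias to the text of its first answer, or None if unanswered."""
    blocks = response.get("Blocks", [])
    blocks_by_id = {b["Id"]: b for b in blocks}
    answers = {}
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        query = block.get("Query", {})
        label = query.get("Alias") or query.get("Text", "")
        # A QUERY block points at its QUERY_RESULT blocks through ANSWER relationships.
        answer_ids = [
            answer_id
            for rel in block.get("Relationships", [])
            if rel.get("Type") == "ANSWER"
            for answer_id in rel.get("Ids", [])
        ]
        texts = [blocks_by_id[i].get("Text") for i in answer_ids if i in blocks_by_id]
        answers[label] = texts[0] if texts else None
    return answers


# Hand-written, trimmed stand-in for a response (assumption, not real output).
SAMPLE_QUERY_RESPONSE = {
    "Blocks": [
        {
            "Id": "q1",
            "BlockType": "QUERY",
            "Query": {"Text": "What is the total amount?", "Alias": "TOTAL"},
            "Relationships": [{"Type": "ANSWER", "Ids": ["a1"]}],
        },
        {"Id": "a1", "BlockType": "QUERY_RESULT", "Text": "$1,234.56"},
        {
            "Id": "q2",
            "BlockType": "QUERY",
            "Query": {"Text": "Who is the vendor?", "Alias": "VENDOR"},
        },
    ]
}

query_answers = extract_query_answers(SAMPLE_QUERY_RESPONSE)
```

With a real client, the request would be sent as `client.analyze_document(**build_query_request(...))`; a query with no answer simply has no ANSWER relationship, which the sketch reports as `None`.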

Use Cases

Loan Processing

Extract key data from mortgage applications, tax forms, bank statements — reduce manual review from hours to minutes.

Invoice Automation

Extract line items, totals, vendor info from invoices for accounts payable automation.

Healthcare

Process insurance claims, patient intake forms, and pre-authorizations, extracting diagnosis codes and patient data for entry into EHR systems.

Legal

Extract clauses, signatures, and dates from contracts and legal briefs for document management.

Government

Process tax forms, applications, benefits paperwork — automate data entry.


How It Works

1. Upload Document: Send a PDF or image (JPEG, PNG, TIFF) inline as bytes, or reference a file stored in S3

2. Choose API:

  • DetectDocumentText — Basic text extraction
  • AnalyzeDocument — Forms, tables, layout
  • AnalyzeExpense — Invoices and receipts
  • AnalyzeID — Identity documents
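The choice between the four APIs can be sketched in Python as a small dispatcher. This is a minimal sketch assuming boto3's Textract client methods `detect_document_text`, `analyze_document`, `analyze_expense`, and `analyze_id`. The names `OPERATIONS`, `build_call`, and `run_textract`, and the "kind" labels, are hypothetical and exist only for this example.

```python
# Maps our own document "kind" label (hypothetical) to a Textract client method.
OPERATIONS = {
    "text": "detect_document_text",   # basic text extraction
    "structured": "analyze_document", # forms, tables, layout
    "expense": "analyze_expense",     # invoices and receipts
    "identity": "analyze_id",         # identity documents
}


def build_call(kind, bucket, key, feature_types=("FORMS", "TABLES")):
    """Return (method_name, kwargs) for a document stored in S3."""
    if kind not in OPERATIONS:
        raise ValueError(f"unknown document kind: {kind!r}")
    s3_document = {"S3Object": {"Bucket": bucket, "Name": key}}
    if kind == "identity":
        # AnalyzeID takes a list of pages rather than a single Document.
        kwargs = {"DocumentPages": [s3_document]}
    else:
        kwargs = {"Document": s3_document}
        if kind == "structured":
            # AnalyzeDocument requires the list of features to run.
            kwargs["FeatureTypes"] = list(feature_types)
    return OPERATIONS[kind], kwargs


def run_textract(kind, bucket, key, client=None):
    """Call the matching operation; creates a boto3 client if none is given."""
    if client is None:
        import boto3  # imported lazily so build_call works without the SDK

        client = boto3.client("textract")
    method_name, kwargs = build_call(kind, bucket, key)
    return getattr(client, method_name)(**kwargs)
```

Keeping the request builder separate from the network call makes the routing logic easy to unit-test with a stub client, without AWS credentials.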

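3. Receive Structured Data (simplified illustration; actual responses are a flat list of blocks, each with BlockType, Id, and Relationships fields that link keys to values and tables to cells):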

{
  "Blocks": [
    {"Type": "KEY_VALUE_SET", "Key": "Invoice Date", "Value": "2025-01-15"},
    {"Type": "TABLE", "Cells": [...]},
    {"Type": "LINE", "Text": "Total: $1,234.56"}
  ]
}
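The listing above is condensed. In an actual AnalyzeDocument response, a form field is spread across several blocks: a KEY_VALUE_SET block tagged KEY, a KEY_VALUE_SET block tagged VALUE, and WORD blocks holding the text. The following Python sketch shows one way to join them back into pairs. The function names are hypothetical, and the sample response is hand-written and trimmed to the fields the code reads.

```python
def _child_text(block, blocks_by_id):
    """Join the text of all WORD blocks that are CHILD-related to `block`."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            # Checkbox values (SELECTION_ELEMENT blocks) are skipped in this sketch.
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response):
    """Return {key text: value text} for every form field in the response."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(_child_text(blocks_by_id[value_id], blocks_by_id))
        pairs[_child_text(block, blocks_by_id)] = " ".join(value_parts)
    return pairs


# Hand-written, trimmed stand-in for a response (assumption, not real output).
SAMPLE_FORM_RESPONSE = {
    "Blocks": [
        {
            "Id": "k1",
            "BlockType": "KEY_VALUE_SET",
            "EntityTypes": ["KEY"],
            "Relationships": [
                {"Type": "VALUE", "Ids": ["v1"]},
                {"Type": "CHILD", "Ids": ["w1", "w2"]},
            ],
        },
        {
            "Id": "v1",
            "BlockType": "KEY_VALUE_SET",
            "EntityTypes": ["VALUE"],
            "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}],
        },
        {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Date"},
        {"Id": "w3", "BlockType": "WORD", "Text": "2025-01-15"},
    ]
}

form_fields = extract_key_values(SAMPLE_FORM_RESPONSE)
```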

4. Integrate: Use extracted data directly in your systems
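As one possible integration step, table results can be flattened into rows and written out as CSV for a downstream system. This sketch assumes the TABLE, CELL, and WORD block types with 1-based `RowIndex` and `ColumnIndex` fields on cells. The helper names are hypothetical, merged cells are ignored, and the sample data is hand-written.

```python
import csv
import io


def table_to_rows(table_block, blocks_by_id):
    """Convert one TABLE block into a list of rows (lists of cell strings)."""
    cells = {}
    for rel in table_block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for cell_id in rel["Ids"]:
            cell = blocks_by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [
                blocks_by_id[word_id]["Text"]
                for cell_rel in cell.get("Relationships", [])
                if cell_rel["Type"] == "CHILD"
                for word_id in cell_rel["Ids"]
                if blocks_by_id[word_id]["BlockType"] == "WORD"
            ]
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
    if not cells:
        return []
    n_rows = max(row for row, _ in cells)
    n_cols = max(col for _, col in cells)
    # Fill any missing cell with an empty string so every row has the same width.
    return [
        [cells.get((row, col), "") for col in range(1, n_cols + 1)]
        for row in range(1, n_rows + 1)
    ]


def rows_to_csv(rows):
    """Serialize rows to a CSV string."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hand-written, trimmed stand-in for a 2x2 table (assumption, not real output).
SAMPLE_TABLE_BLOCKS = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w4"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Price"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Widget"},
    {"Id": "w4", "BlockType": "WORD", "Text": "$10.00"},
]

_by_id = {b["Id"]: b for b in SAMPLE_TABLE_BLOCKS}
table_rows = table_to_rows(_by_id["t1"], _by_id)
table_csv = rows_to_csv(table_rows)
```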


Pricing & Free Tier

Aspect | Details
Free Tier (first 3 months) | 1,000 pages/month
DetectDocumentText | $0.0015 per page
AnalyzeDocument | $0.015 per page (forms, tables)
AnalyzeExpense | $0.01 per document
AnalyzeID | $0.01 per document
Queries | $0.001 per query (add-on to AnalyzeDocument)

Cost Tip: Use DetectDocumentText for plain OCR; at the rates listed above it costs one-tenth as much per page as AnalyzeDocument. Use AnalyzeDocument only when you need forms or tables.
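To make the cost difference concrete, here is a small Python estimator. The rates are copied from the table in this article, not fetched from AWS, so treat them as assumptions that may be out of date. The free tier and any volume discounts are ignored, and the function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Per-unit rates in USD, copied from the pricing table in this article (assumption).
RATES_USD = {
    "DetectDocumentText": Decimal("0.0015"),
    "AnalyzeDocument": Decimal("0.015"),
    "AnalyzeExpense": Decimal("0.01"),
    "AnalyzeID": Decimal("0.01"),
}
QUERY_RATE_USD = Decimal("0.001")  # add-on, AnalyzeDocument only


def estimate_cost(api, units, queries=0):
    """Estimate cost in USD for `units` pages or documents, ignoring the free tier."""
    if api not in RATES_USD:
        raise ValueError(f"unknown API: {api!r}")
    if queries and api != "AnalyzeDocument":
        raise ValueError("queries are an add-on to AnalyzeDocument only")
    return units * RATES_USD[api] + queries * QUERY_RATE_USD


ocr_only = estimate_cost("DetectDocumentText", 10_000)
with_structure = estimate_cost("AnalyzeDocument", 10_000)
```

At these listed rates, 10,000 pages come to $15.00 with DetectDocumentText and $150.00 with AnalyzeDocument, which is why routing plain-text documents to the cheaper API matters at volume.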

⚠️ Pricing Disclaimer: AWS pricing is subject to change. Always verify current pricing at the official Amazon Textract pricing page.


When to Use Textract

Use | Don’t Use
Documents with forms/tables | Simple photos (use Rekognition)
Invoices, receipts, IDs | Real-time video text extraction
Structured data extraction | Handwriting-only documents
Document processing pipelines | Very low latency requirements (<100ms)

Textract vs Rekognition (for Text)

Aspect | Textract | Rekognition
Input | Documents (PDF, images) | Images/video
Output | Structured data (forms, tables) | Text only
Tables | Full structure (cells, headers) | None
Forms | Key-value pairs | None
Best For | Documents, invoices, forms | Photos, scenes, video

Important Notes

  • Accuracy Updates (June 2025): Improved accuracy for DetectDocumentText and AnalyzeDocument APIs
  • Async Operations: Available for large multi-page documents
  • AnalyzeExpense: Specialized API for invoices and receipts
  • AnalyzeID: Optimized for government-issued IDs
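The asynchronous flow for multi-page documents can be sketched as start, poll, then page through results. This minimal Python sketch assumes boto3's `start_document_text_detection` and `get_document_text_detection` methods and a document already stored in S3. The function name `collect_async_text` and its polling defaults are hypothetical, and the client is passed in so the logic can be exercised with a stub.

```python
import time


def collect_async_text(client, bucket, key, poll_seconds=5.0, max_polls=120,
                       sleep=time.sleep):
    """Start an async text-detection job and return all LINE texts in order."""
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state.
    for _ in range(max_polls):
        result = client.get_document_text_detection(JobId=job_id)
        status = result["JobStatus"]
        if status in ("SUCCEEDED", "PARTIAL_SUCCESS"):
            break
        if status == "FAILED":
            raise RuntimeError(result.get("StatusMessage", "Textract job failed"))
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")

    # Results are paginated; follow NextToken until it is absent.
    lines = []
    while True:
        lines.extend(
            block["Text"]
            for block in result.get("Blocks", [])
            if block["BlockType"] == "LINE"
        )
        token = result.get("NextToken")
        if not token:
            break
        result = client.get_document_text_detection(JobId=job_id, NextToken=token)
    return lines
```

A call might look like `collect_async_text(boto3.client("textract"), "my-bucket", "scans/report.pdf")`, where the bucket and key are placeholders. Polling keeps the sketch short; for production pipelines, a completion notification through the job's notification channel is usually a better fit than a polling loop.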

TL;DR

  • Textract = Document data extraction (OCR++)
  • Features: Text, forms, tables, key-value pairs, layouts, signatures, IDs, invoices
  • Free Tier: 1,000 pages/month for first 3 months
  • Pricing: $0.0015/page (text) to $0.015/page (forms/tables)
  • Best for: Documents, invoices, receipts, forms, IDs, contracts
  • Not for: Simple photos (use Rekognition), real-time video

Resources

Amazon Textract: Official product page and overview.

Textract Documentation: Complete API reference and guides.

Textract Pricing: Detailed pricing breakdown.