Amazon Textract — Document data extraction API that goes beyond OCR to understand forms, tables, and key-value pairs.
What is Amazon Textract?
Amazon Textract is a machine learning service that automatically extracts text, handwriting, and data from scanned documents. Unlike traditional OCR that only gives you text, Textract understands document structure — tables, forms, key-value pairs, and layout elements.
Key Insight: Textract goes beyond basic OCR by identifying document structure such as form fields, table cells, and line items.
Key Features
| Feature | Description |
|---|---|
| Text Extraction (OCR) | Extract printed text and handwriting from scanned documents and images |
| Form Data Extraction | Identify key-value pairs (e.g., “Invoice Number: 12345”) |
| Table Extraction | Extract tables with cell structure, merged cells, headers |
| Document Layout | Understand titles, sections, footers, page numbers |
| Query-Based Extraction | Ask specific questions to get exact answers (e.g., “What is the total amount?”) |
| Signature Detection | Detect where signatures appear on documents |
| Identity Document Analysis | Extract structured data from IDs, passports, driver’s licenses |
| Invoice & Receipt Analysis | Pre-trained models for invoices and receipts |
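As a rough illustration of query-based extraction, the sketch below builds the arguments for an AnalyzeDocument call with the QUERIES feature and then reads the answers back out of a response. The helper names (`build_query_request`, `extract_answers`), the bucket and file names, and the `TOTAL` alias are hypothetical; the request and response fields are written as I understand the Textract API, so check them against the API reference before relying on them.

```python
def build_query_request(bucket, key, questions):
    """Build keyword arguments for an AnalyzeDocument call using QUERIES.

    `questions` maps an alias (a short label you choose) to the
    natural-language question to ask about the document.
    """
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [
                {"Text": text, "Alias": alias}
                for alias, text in questions.items()
            ]
        },
    }


def extract_answers(response):
    """Map each query alias (or its text, if no alias) to the answer text.

    QUERY blocks point at QUERY_RESULT blocks through an ANSWER
    relationship; a query without an answer maps to None.
    """
    blocks = response.get("Blocks", [])
    by_id = {block["Id"]: block for block in blocks}
    answers = {}
    for block in blocks:
        if block.get("BlockType") != "QUERY":
            continue
        label = block["Query"].get("Alias") or block["Query"]["Text"]
        answers[label] = None
        for rel in block.get("Relationships", []):
            if rel["Type"] == "ANSWER":
                for result_id in rel["Ids"]:
                    answers[label] = by_id[result_id].get("Text")
    return answers
```

With boto3 installed and credentials configured, the request would be sent as `boto3.client("textract").analyze_document(**build_query_request("my-bucket", "invoice.png", {"TOTAL": "What is the total amount?"}))`, and the returned dictionary passed to `extract_answers`.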
Use Cases
Loan Processing
Extract key data from mortgage applications, tax forms, bank statements — reduce manual review from hours to minutes.
Invoice Automation
Extract line items, totals, vendor info from invoices for accounts payable automation.
Healthcare
Process insurance claims, patient intake forms, pre-authorizations — extract diagnosis codes, patient data into EHR systems.
Legal
Extract clauses, signatures, dates from contracts and legal briefs for document management.
Government
Process tax forms, applications, benefits paperwork — automate data entry.
How It Works
1. Upload Document: Pass a PDF or image as raw bytes, or reference an object stored in S3
2. Choose API:
   - DetectDocumentText: Basic text extraction
   - AnalyzeDocument: Forms, tables, layout
   - AnalyzeExpense: Invoices and receipts
   - AnalyzeID: Identity documents
3. Receive Structured Data (simplified for illustration; real responses name the field BlockType and link related blocks by ID through Relationships):

    {
      "Blocks": [
        {"Type": "KEY_VALUE_SET", "Key": "Invoice Date", "Value": "2025-01-15"},
        {"Type": "TABLE", "Cells": [...]},
        {"Type": "LINE", "Text": "Total: $1,234.56"}
      ]
    }

4. Integrate: Use extracted data directly in your systems
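To make the "receive structured data" and "integrate" steps more concrete, here is a minimal sketch of turning an AnalyzeDocument (FORMS) response into a plain dictionary of key-value pairs. It assumes the response shape as I understand it: KEY_VALUE_SET blocks tagged KEY or VALUE through EntityTypes, with the text stored in child WORD blocks. The helper names and the sample data are made up for illustration.

```python
def _text_of(block, by_id):
    """Join the text of a block's CHILD words (or selection states)."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                # Checkboxes report SELECTED or NOT_SELECTED instead of text.
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def key_value_pairs(response):
    """Return {key text: value text} for every KEY block in the response."""
    blocks = response.get("Blocks", [])
    by_id = {block["Id"]: block for block in blocks}
    pairs = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                # A KEY block points at its VALUE block(s) by ID.
                value_text = " ".join(
                    _text_of(by_id[value_id], by_id) for value_id in rel["Ids"]
                )
        pairs[_text_of(block, by_id)] = value_text
    return pairs


# Hand-written sample in the assumed response shape (not real service output).
SAMPLE_RESPONSE = {"Blocks": [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Date:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "2025-01-15"},
]}

print(key_value_pairs(SAMPLE_RESPONSE))
```

The resulting dictionary can be written to a database row or passed to downstream validation without any further knowledge of the block graph.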
Pricing & Free Tier
| Aspect | Details |
|---|---|
| Free Tier (first 3 months) | 1,000 pages/month |
| DetectDocumentText | $0.0015 per page |
| AnalyzeDocument | $0.015 per page (forms, tables) |
| AnalyzeExpense | $0.01 per document |
| AnalyzeID | $0.01 per document |
| Queries | $0.001 per query (add-on to AnalyzeDocument) |
Cost Tip: Use DetectDocumentText for simple OCR (cheaper). Use AnalyzeDocument only when you need forms/tables.
⚠️ Pricing Disclaimer: AWS pricing is subject to change. Always verify current pricing at the official Amazon Textract pricing page.
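The arithmetic behind the cost tip can be sketched as follows. The rates are copied from the table above and may be out of date. Two things here are assumptions of this sketch: "per document" is treated as the same billing unit as "per page", and the free-tier allowance is subtracted from whichever API you pass in. The function name is hypothetical.

```python
from decimal import Decimal

# Rates taken from the pricing table above; verify before budgeting.
PRICE_PER_PAGE = {
    "DetectDocumentText": Decimal("0.0015"),
    "AnalyzeDocument": Decimal("0.015"),
    "AnalyzeExpense": Decimal("0.01"),
    "AnalyzeID": Decimal("0.01"),
}

# Assumption: the 1,000 pages/month allowance applies to the API in question.
FREE_PAGES_PER_MONTH = 1000


def estimate_monthly_cost(api, pages, free_tier=False):
    """Estimate one month's cost in USD for `pages` pages through `api`."""
    billable = max(0, pages - FREE_PAGES_PER_MONTH) if free_tier else pages
    return PRICE_PER_PAGE[api] * billable


for api in ("DetectDocumentText", "AnalyzeDocument"):
    print(api, estimate_monthly_cost(api, 10_000))
```

At 10,000 pages a month, the listed rates work out to $15 for DetectDocumentText versus $150 for AnalyzeDocument, a tenfold difference.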
When to Use Textract
| Use | Don’t Use |
|---|---|
| Documents with forms/tables | Simple photos (use Rekognition) |
| Invoices, receipts, IDs | Real-time video text extraction |
| Structured data extraction | Handwriting-only documents |
| Document processing pipelines | Very low latency requirements (<100ms) |
Textract vs Rekognition (for Text)
| Aspect | Textract | Rekognition |
|---|---|---|
| Input | Documents (PDF, images) | Images/video |
| Output | Structured data (forms, tables) | Text only |
| Tables | Full structure (cells, headers) | None |
| Forms | Key-value pairs | None |
| Best For | Documents, invoices, forms | Photos, scenes, video |
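To show what "full structure" for tables can look like in practice, the sketch below rebuilds each table as a list of rows from an AnalyzeDocument (TABLES) response. It assumes the response shape as I understand it: TABLE blocks list CELL children, and each CELL carries 1-based RowIndex and ColumnIndex values plus child WORD blocks. Merged cells and header detection are deliberately ignored, and the function name and sample data are illustrative only.

```python
def tables_as_rows(response):
    """Return every table in the response as a list of rows of cell text."""
    blocks = response.get("Blocks", [])
    by_id = {block["Id"]: block for block in blocks}

    def child_ids(block):
        # Collect the IDs listed under a block's CHILD relationships.
        return [cid for rel in block.get("Relationships", [])
                if rel["Type"] == "CHILD" for cid in rel["Ids"]]

    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        for cell_id in child_ids(block):
            cell = by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [by_id[wid]["Text"] for wid in child_ids(cell)
                     if by_id[wid]["BlockType"] == "WORD"]
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        if not cells:
            tables.append([])
            continue
        n_rows = max(row for row, _ in cells)
        n_cols = max(col for _, col in cells)
        # Positions without a CELL block are filled with empty strings.
        tables.append([[cells.get((r, c), "") for c in range(1, n_cols + 1)]
                       for r in range(1, n_rows + 1)])
    return tables


# Hand-written sample in the assumed response shape (not real service output).
SAMPLE_TABLE_RESPONSE = {"Blocks": [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c11", "c12", "c21", "c22"]}]},
    {"Id": "c11", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c12", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c21", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "c22", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Price"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Widget"},
]}

print(tables_as_rows(SAMPLE_TABLE_RESPONSE))
```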
Important Notes
- Accuracy Updates (June 2025): Improved accuracy for DetectDocumentText and AnalyzeDocument APIs
- Async Operations: Available for large multi-page documents
- AnalyzeExpense: Specialized API for invoices and receipts
- AnalyzeID: Optimized for government-issued IDs
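For the async operations mentioned above, results are fetched in a separate call after the job is started, and large results arrive in pages. The sketch below polls a text-detection job and follows NextToken until all blocks are collected. It assumes a boto3 Textract client and a job already started with `start_document_text_detection`; the function name, polling interval, and retry limit are arbitrary choices for illustration, and a production pipeline might prefer completion notifications over polling.

```python
import time


def wait_for_blocks(client, job_id, poll_seconds=5.0, max_polls=60):
    """Poll an async text-detection job and return all of its blocks."""
    for _ in range(max_polls):
        page = client.get_document_text_detection(JobId=job_id)
        status = page["JobStatus"]
        if status == "IN_PROGRESS":
            time.sleep(poll_seconds)
            continue
        if status not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
            raise RuntimeError(f"Textract job {job_id} ended with status {status}")
        blocks = list(page.get("Blocks", []))
        # Results are paginated: keep requesting while a NextToken is returned.
        while page.get("NextToken"):
            page = client.get_document_text_detection(
                JobId=job_id, NextToken=page["NextToken"]
            )
            blocks.extend(page.get("Blocks", []))
        return blocks
    raise TimeoutError(f"Textract job {job_id} still running after {max_polls} polls")
```

A typical call would first start the job with `client.start_document_text_detection(DocumentLocation={"S3Object": {"Bucket": ..., "Name": ...}})` and pass the returned `JobId` to `wait_for_blocks`.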
TL;DR
- Textract = Document data extraction (OCR++)
- Features: Text, forms, tables, key-value pairs, layouts, signatures, IDs, invoices
- Free Tier: 1,000 pages/month for first 3 months
- Pricing: $0.0015/page (text) to $0.015/page (forms/tables)
- Best for: Documents, invoices, receipts, forms, IDs, contracts
- Not for: Simple photos (use Rekognition), real-time video
Resources
- Amazon Textract: Official product page and overview.
- Textract Documentation: Complete API reference and guides.
- Textract Pricing: Detailed pricing breakdown.