Amazon Polly — Text-to-speech API with 100+ voices across 40+ languages, powered by neural networks.
What is Amazon Polly?
Amazon Polly is a text-to-speech (TTS) service that converts text into lifelike speech. Using deep learning neural TTS technology, Polly provides natural-sounding voices in dozens of languages for applications that need voice output.
Key Insight: Polly uses neural TTS to generate natural-sounding speech with more realistic emotion, intonation, and style.
Key Features
| Feature | Description |
|---|---|
| 100+ Voices | Male and female voices across 40+ languages |
| Neural TTS Engine | High-quality, human-like speech generated by a sequence-to-sequence neural network and a neural vocoder |
| Standard TTS Engine | Lower latency, cost-effective option |
| SSML Support | Speech Synthesis Markup Language for fine control (emphasis, pauses, pronunciation) |
| Lexicons | Custom pronunciation for company names, acronyms |
| Speech Marks | Timestamps for lip-sync, word highlighting |
| Streaming | Real-time audio streaming |
| Multiple Formats | MP3, OGG Vorbis, PCM (raw audio) |
| Sample Rates | 8 kHz, 16 kHz, 22.05 kHz, 24 kHz (PCM output: 8 kHz or 16 kHz only) |
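To make the Speech Marks row concrete: speech marks are requested with `OutputFormat="json"` plus a `SpeechMarkTypes` list, and the result is line-delimited JSON rather than audio. The sketch below only builds the request arguments and parses a response; the helper names (`speech_mark_request`, `parse_speech_marks`, `word_timings`) are hypothetical, and the sample bytes are hand-written in the documented shape, not real Polly output.

```python
import json

def speech_mark_request(text, voice_id="Joanna", engine="neural"):
    """Build keyword arguments for a word-level speech-mark request."""
    return {
        "Text": text,
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": "json",       # speech marks come back as JSON, not audio
        "SpeechMarkTypes": ["word"],  # other types: sentence, viseme, ssml
    }

def parse_speech_marks(raw):
    """Parse line-delimited JSON speech marks into a list of dicts."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return [json.loads(line) for line in raw.splitlines() if line.strip()]

def word_timings(marks):
    """Return (time_in_ms, word) pairs for word-type marks only."""
    return [(m["time"], m["value"]) for m in marks if m.get("type") == "word"]

# Hand-written sample in the documented shape (an assumption, not real output).
sample = (
    b'{"time":0,"type":"word","start":0,"end":5,"value":"Hello"}\n'
    b'{"time":320,"type":"word","start":7,"end":12,"value":"world"}\n'
)
print(word_timings(parse_speech_marks(sample)))
```

With a real client, the arguments would be passed as `polly.synthesize_speech(**speech_mark_request("Hello world"))` and the bytes read from `response["AudioStream"]` handed to `parse_speech_marks`. The timings can then drive word highlighting or lip-sync.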
Use Cases
Accessibility
Provide voice output for visually impaired users and create audio versions of articles and other content.
IVR & Contact Centers
Generate dynamic voice prompts for phone systems instead of recording voice talent in a studio.
Media & Entertainment
Create voiceovers for animations, games, and videos directly from scripts.
Education
Convert textbooks, courses, and learning materials to audio format.
IoT & Devices
Add speech capabilities to smart devices, toys, and appliances.
How It Works
1. Send Text: Plain text or SSML
2. Choose Voice: Select from 100+ voices (e.g., Joanna, Matthew, Lupe)
3. Select Engine: Neural (high quality) or Standard (faster, cheaper)
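4. Receive Audio Stream:

```python
import boto3

# Assumes AWS credentials and a default region are already configured.
polly = boto3.client("polly")
response = polly.synthesize_speech(
    Text="Hello, this is Amazon Polly.",
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="neural",
)
```

5. Play or Store: Stream to speakers, save as MP3, cache for reuse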
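The "store" half of step 5 can be sketched as follows. The response from `synthesize_speech` carries the audio under the `AudioStream` key as a readable stream, so saving it is a read followed by a binary write. `synthesize_to_file` and `FakePolly` are hypothetical names introduced here; the fake client exists only so the sketch runs without AWS access.

```python
import io
import os
import tempfile

def synthesize_to_file(client, text, path, voice_id="Joanna", engine="neural"):
    """Synthesize `text` as MP3, write it to `path`, and return the byte count."""
    response = client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine=engine,
    )
    audio = response["AudioStream"].read()  # stream of MP3 bytes
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)

class FakePolly:
    """Stand-in for a Polly client (assumption: used for offline demonstration only)."""
    def synthesize_speech(self, **kwargs):
        self.last_request = kwargs
        return {"AudioStream": io.BytesIO(b"fake-mp3-bytes")}

# Offline demonstration: write the fake audio into a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "hello.mp3")
    written = synthesize_to_file(FakePolly(), "Hello, this is Amazon Polly.", out_path)
    print(written, "bytes written")
```

With boto3 installed and credentials configured, passing `boto3.client("polly")` in place of `FakePolly()` should produce a playable MP3 file.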
Pricing & Free Tier
| Engine | Price | Free Tier (first 12 months) |
|---|---|---|
| Standard | $4.00 per 1M chars | 5M chars/month |
| Neural | $16.00 per 1M chars | 1M chars/month |
| Long-Form | $100.00 per 1M chars | 500K chars/month |
| Generative | $30.00 per 1M chars | 100K chars/month |
Cost Tip: 1 million characters ≈ 10-15 hours of speech. Neural voices cost 4x more than Standard. Long-Form is intended for audiobooks and other long narration. Generative offers the newest, most conversational voices.
⚠️ Pricing Disclaimer: AWS pricing is subject to change. Prices shown are based on information available as of January 2026. Always verify current pricing at the official Amazon Polly pricing page.
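The arithmetic behind the pricing table can be sketched as a small estimator. The rates and free-tier allowances are copied from the table above and will drift as AWS pricing changes; `estimate_monthly_cost` is a hypothetical helper, and it assumes simple per-character billing with no other discounts.

```python
# Rates (USD per 1M characters) and free-tier allowances copied from the table above.
PRICE_PER_MILLION = {
    "standard": 4.00,
    "neural": 16.00,
    "long-form": 100.00,
    "generative": 30.00,
}
FREE_TIER_CHARS = {
    "standard": 5_000_000,
    "neural": 1_000_000,
    "long-form": 500_000,
    "generative": 100_000,
}

def estimate_monthly_cost(chars, engine, free_tier=False):
    """Estimate the monthly charge in USD for `chars` characters on one engine."""
    billable = chars
    if free_tier:
        # Only characters beyond the monthly allowance are billed.
        billable = max(0, chars - FREE_TIER_CHARS[engine])
    return round(billable * PRICE_PER_MILLION[engine] / 1_000_000, 2)

print(estimate_monthly_cost(2_500_000, "neural"))                  # 40.0
print(estimate_monthly_cost(2_500_000, "neural", free_tier=True))  # 24.0
```

For example, 2.5M Neural characters cost 2.5 × $16 = $40, or $24 during the free-tier period once the first 1M characters are deducted.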
When to Use Polly
| Use Polly for | Don't use Polly for |
|---|---|
| Text-to-speech output | Speech-to-text (use Transcribe) |
| Dynamic voice generation | Recording studio voices |
| Accessibility, IVR, gaming | Real-time two-way conversation (use Lex) |
| Pre-generated audio | Ultra-low latency (<50ms) |
Neural vs Standard Engine
| Aspect | Neural | Standard |
|---|---|---|
| Quality | Human-like, expressive | Robotic but clear |
| Latency | Higher | Lower |
| Cost | $16.00 per 1M chars (4x more) | $4.00 per 1M chars |
| Best For | Final content, customer-facing | Internal, testing |
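Because not every voice supports every engine, the choice in this table is often made per voice. The sketch below picks the best available engine from a voice record shaped like the entries returned by the DescribeVoices API, which include a `SupportedEngines` list. `pick_engine` is a hypothetical helper, and the voice records are invented for illustration rather than taken from the real catalog.

```python
def pick_engine(voice, preference=("neural", "standard")):
    """Return the first engine in `preference` that the voice supports."""
    supported = voice.get("SupportedEngines", [])
    for engine in preference:
        if engine in supported:
            return engine
    raise ValueError(
        f"Voice {voice.get('Id', '?')} supports none of {list(preference)}"
    )

# Invented voice records (assumption: same shape as DescribeVoices entries).
voices = [
    {"Id": "ExampleVoiceA", "SupportedEngines": ["neural", "standard"]},
    {"Id": "ExampleVoiceB", "SupportedEngines": ["standard"]},
]
for v in voices:
    print(v["Id"], "->", pick_engine(v))
```

Reversing the preference order, as in `pick_engine(v, preference=("standard", "neural"))`, favors cost and latency over quality, which matches the internal and testing use in the table.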
SSML Example

```xml
<speak>
  Welcome to <emphasis level="strong">Amazon Polly</emphasis>.
  <break time="1s"/>
  This is an example of <prosody rate="slow">slow speech</prosody>
  and <prosody rate="fast">fast speech</prosody>.
</speak>
```

Important Notes
- Neural Engine: Uses deep neural networks for more natural speech than Standard; the billion-parameter transformer models belong to the Generative engine
- Alexa Uses Polly: Alexa voice technology is based on Polly, but Alexa's own voices are exclusive to Alexa
- No Content Retention: AWS does not store your text submissions
- Cache Allowed: Store and reuse audio files at no extra cost
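Since cached audio can be reused at no extra cost, a simple cache keyed on everything that affects the output avoids paying twice for the same phrase. The following is a minimal in-memory sketch; `cache_key` and `AudioCache` are hypothetical names, and the `synthesize` callable is assumed to wrap whatever Polly call the application uses and to return audio bytes.

```python
import hashlib
import json

def cache_key(text, voice_id, engine, output_format="mp3"):
    """Derive a stable key from every parameter that changes the audio."""
    payload = json.dumps(
        {"text": text, "voice": voice_id, "engine": engine, "format": output_format},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

class AudioCache:
    """In-memory cache; `synthesize(text, voice_id, engine)` must return bytes."""

    def __init__(self, synthesize):
        self._synthesize = synthesize
        self._store = {}
        self.misses = 0

    def get(self, text, voice_id="Joanna", engine="neural"):
        key = cache_key(text, voice_id, engine)
        if key not in self._store:
            # Only cache misses reach the (billable) synthesis call.
            self.misses += 1
            self._store[key] = self._synthesize(text, voice_id, engine)
        return self._store[key]

# Offline demonstration with a stand-in synthesizer instead of a real Polly call.
cache = AudioCache(lambda text, voice_id, engine: b"audio:" + text.encode("utf-8"))
cache.get("Welcome back")
cache.get("Welcome back")
print("synthesis calls:", cache.misses)
```

A production version would more likely persist the bytes to object storage or disk under the same key, but the keying idea is the same.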
TL;DR
- Polly = Text-to-speech API (TTS)
- Features: 100+ voices, 40+ languages, neural TTS, SSML support, custom lexicons
- Free Tier: Standard (5M chars), Neural (1M chars), Long-Form (500K), Generative (100K)
- Pricing: Standard $4/M · Neural $16/M · Long-Form $100/M · Generative $30/M
- Best for: Accessibility, IVR, voiceovers, education, IoT devices
- Formats: MP3, OGG, PCM — streamable and cacheable
Resources
- Amazon Polly: Official product page and overview.
- Polly Documentation: Complete API reference and guides.
- Polly Pricing: Detailed pricing breakdown.