Data and AI governance in Amazon SageMaker — SageMaker Catalog for discovery, lineage, quality, and access control.
Overview
Amazon SageMaker Catalog (built on Amazon DataZone) provides:
| Capability | Purpose |
|---|---|
| Discovery | Find data and AI assets at scale |
| Governance | Control access with fine-grained permissions |
| Collaboration | Share assets across teams and projects |
| Quality | Monitor data quality and lineage |
| Compliance | Audit trail and responsible AI |
SageMaker Catalog
Unified catalog for all data and AI assets:
What’s in the Catalog?
| Asset Type | Examples |
|---|---|
| Structured data | Tables, databases, data products |
| Unstructured data | Documents, images, files |
| AI models | Trained models, prompts, agents |
| BI dashboards | Amazon QuickSight (Quick Sight) dashboards and reports |
| Applications | GenAI apps, notebooks |
Governance Features
Data and AI Catalog
| Feature | Details |
|---|---|
| Central catalog | Discover all assets in one place |
| Metadata management | Technical and business metadata |
| Asset registration | Automatic and manual registration |
| Search | Find assets by name, tag, or owner |
Business Glossary
| Feature | Details |
|---|---|
| Shared definitions | Standardize business terminology |
| Customizable metadata | Create metadata forms |
| Classification terms | Tag sensitive data consistently |
| Governance workflows | Enforce tagging policies |
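
A tagging policy like the one in the table can be read as a check that every asset carries the classification terms a governance team requires. The sketch below is illustrative only: the `Asset` class, the `REQUIRED_TERM_GROUPS` mapping, and the term names are hypothetical and are not part of any SageMaker Catalog API.

```python
from dataclasses import dataclass, field

# Hypothetical policy: an asset needs at least one term from every group.
REQUIRED_TERM_GROUPS = {
    "sensitivity": {"Public", "Internal", "Confidential", "PII"},
    "domain": {"Sales", "Finance", "Marketing"},
}


@dataclass
class Asset:
    name: str
    glossary_terms: set = field(default_factory=set)


def missing_term_groups(asset, required=REQUIRED_TERM_GROUPS):
    """Return the sorted names of term groups the asset has no term from."""
    return sorted(
        group
        for group, allowed in required.items()
        if not (asset.glossary_terms & allowed)
    )


orders = Asset("orders", {"PII", "Sales"})
clicks = Asset("clickstream", {"Marketing"})

print(missing_term_groups(orders))  # []
print(missing_term_groups(clicks))  # ['sensitivity']
```

A workflow could block publishing whenever the returned list is non-empty, which keeps sensitive data from entering the catalog unlabeled.
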
Data Lineage
Track data flow across systems:
| Feature | Details |
|---|---|
| OpenLineage compatible | Industry-standard lineage format |
| Origin tracking | Where data comes from |
| Transformation history | How data changes |
| Consumption patterns | Who uses the data |
| Impact analysis | Understand downstream effects |
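
To make origin tracking and impact analysis concrete, the sketch below builds a few minimal OpenLineage-style run events as plain dictionaries and then walks the input-to-output edges to find everything downstream of a dataset. It is a simplified illustration: the events omit facets and `schemaURL`, the namespace, producer URL, job names, and dataset names are made up, and the field names should be checked against the OpenLineage specification before sending events to a real endpoint.

```python
import uuid
from collections import defaultdict, deque
from datetime import datetime, timezone


def run_event(job_name, inputs, outputs, namespace="example-namespace"):
    """Build a minimal OpenLineage-style COMPLETE run event as a dict."""

    def dataset(name):
        return {"namespace": namespace, "name": name}

    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": "https://example.com/hypothetical-producer",
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [dataset(n) for n in inputs],
        "outputs": [dataset(n) for n in outputs],
    }


def downstream(events, start):
    """Return every dataset reachable from `start` via input -> output edges."""
    edges = defaultdict(set)
    for event in events:
        for src in event["inputs"]:
            for dst in event["outputs"]:
                edges[src["name"]].add(dst["name"])

    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen


events = [
    run_event("ingest_orders", ["raw.orders"], ["staging.orders"]),
    run_event("build_revenue", ["staging.orders", "staging.customers"], ["marts.revenue"]),
    run_event("train_forecast", ["marts.revenue"], ["models.revenue_forecast"]),
]

print(sorted(downstream(events, "raw.orders")))
# ['marts.revenue', 'models.revenue_forecast', 'staging.orders']
```

Here a schema change to `raw.orders` would affect a staging table, a mart, and a model, which is the kind of answer impact analysis is meant to give before the change ships.
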
Data Quality Monitoring
| Feature | Details |
|---|---|
| Quality metrics | View metrics from AWS and third-party tools |
| Consumer trust | Show quality scores in catalog |
| API integration | Integrate external quality signals |
| Unified portal | Single view of data health |
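
One way to picture a single quality score built from several tools is to merge their rule results and report the pass rate. This is a hypothetical sketch: the result format, the rule names, and the scoring formula are assumptions for illustration, not how SageMaker Catalog computes or stores quality metrics.

```python
def merge_signals(*sources):
    """Concatenate rule results from several tools into one list."""
    return [result for source in sources for result in source]


def quality_score(rule_results):
    """Return the percentage of passed rules, or None if there are no results."""
    if not rule_results:
        return None
    passed = sum(1 for result in rule_results if result["passed"])
    return round(100.0 * passed / len(rule_results), 1)


# Hypothetical results from an AWS-native tool and a third-party tool.
aws_tool_results = [
    {"rule": "orders.id is unique", "passed": True},
    {"rule": "orders.total >= 0", "passed": True},
]
third_party_results = [
    {"rule": "orders.customer_id is not null", "passed": False},
    {"rule": "row count within 10% of yesterday", "passed": True},
]

print(quality_score(merge_signals(aws_tool_results, third_party_results)))  # 75.0
```
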
Discovery & Search
Data Discovery
| Feature | Details |
|---|---|
| Business context | Enrich technical metadata with descriptions |
| Auto-enrichment | AI-generated metadata |
| Quick understanding | Help users find and trust data |
AI-Generated Metadata
| Feature | Details |
|---|---|
| LLM-powered | AI generates business-friendly names |
| Descriptions | Auto-generate asset descriptions |
| Consistency | Improve clarity across assets |
Semantic Search
| Feature | Details |
|---|---|
| Natural language | Search using plain English |
| Intent understanding | Goes beyond keywords |
| Context-aware | Understands relationships |
| Relevant results | Returns what you mean, not just what you type |
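
Programmatic search might look like the sketch below, which assumes the boto3 `datazone` client's `search_listings` operation with the `domainIdentifier`, `searchText`, and `maxResults` parameters, and a response whose `items` contain `assetListing` entries. The domain ID and the `FakeDataZoneClient` stand-in are made up so the example runs without AWS credentials. Whether a query is matched semantically or by keyword is decided by the service, not by this code.

```python
def find_listings(client, domain_id, text, max_results=10):
    """Return (name, listingId) pairs for catalog listings matching `text`."""
    response = client.search_listings(
        domainIdentifier=domain_id,
        searchText=text,
        maxResults=max_results,
    )
    results = []
    for item in response.get("items", []):
        # Items that are not asset listings yield (None, None) in this sketch.
        listing = item.get("assetListing", {})
        results.append((listing.get("name"), listing.get("listingId")))
    return results


class FakeDataZoneClient:
    """Stand-in for boto3.client('datazone') so the sketch runs offline."""

    def search_listings(self, **kwargs):
        self.last_call = kwargs  # remember the arguments for inspection
        return {
            "items": [
                {"assetListing": {"name": "orders", "listingId": "lst-123"}}
            ]
        }


client = FakeDataZoneClient()
print(find_listings(client, "dzd_example", "monthly revenue by region"))
# [('orders', 'lst-123')]
```

With credentials configured, replacing `FakeDataZoneClient()` with `boto3.client("datazone")` and a real domain ID should exercise the same function against a live catalog.
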
Data Products
Package related assets into reusable products:
| Feature | Details |
|---|---|
| Bundled assets | Group related tables, models, dashboards |
| Shared metadata | Common business descriptions |
| Unified access | Single subscription request |
| Consumption tracking | Monitor product usage |
| Reduced overhead | Fewer individual permissions |
Access Control
Permission Model
| Level | Control |
|---|---|
| Domain | Top-level organizational boundary |
| Project | Team-scoped access and collaboration |
| Asset | Individual table/model permissions |
| Column | Fine-grained column-level access |
- Fine-grained permissions at no extra cost
- Row-level and column-level security
- Tag-based access control
- Cross-account sharing
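
Column-level access is ultimately expressed as a Lake Formation grant. The sketch below builds the arguments for a `SELECT` grant restricted to named columns, assuming the boto3 `lakeformation` client's `grant_permissions` operation and its `TableWithColumns` resource shape. The ARN, database, table, and column names are placeholders, and the helper `column_grant_request` is hypothetical.

```python
def column_grant_request(principal_arn, database, table, columns):
    """Build keyword arguments for a SELECT grant limited to specific columns."""
    if not columns:
        raise ValueError("at least one column is required")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


request = column_grant_request(
    "arn:aws:iam::111122223333:role/analyst",  # placeholder ARN
    "sales",
    "orders",
    ["order_id", "order_date", "total"],
)
print(request["Resource"]["TableWithColumns"]["ColumnNames"])
# ['order_id', 'order_date', 'total']

# With credentials configured, the request could be passed on as:
#   boto3.client("lakeformation").grant_permissions(**request)
```

Building the request separately from sending it makes the grant easy to review or log before it takes effect.
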
Responsible AI
AI Governance Features
| Feature | Purpose |
|---|---|
| Data classification | Label sensitive data |
| Toxicity detection | Identify harmful content |
| Guardrails | Apply responsible AI policies |
| Model cards | Document model behavior and limitations |
| Bias detection | Identify and mitigate bias |
ML Lineage
Track the full ML lifecycle:
- Training data sources
- Model versions
- Experiment parameters
- Deployment history
SageMaker Model Dashboard (Brief)
SageMaker Model Dashboard is a centralized repository and single interface for tracking model governance and performance.
| Coverage | Details |
|---|---|
| Model inventory | Consolidates models in your account, including outputs from SageMaker training jobs |
| Imported models | Supports models trained outside SageMaker and then hosted on SageMaker |
| Single stakeholder view | Gives IT admins, model risk managers, and business leaders one place to review model status |
| Cross-service signals | Aggregates data from multiple AWS services to indicate model health and performance |
| Deployment insights | Shows endpoint details and deployed model visibility |
| Batch insights | Includes batch transform job details for offline inference workloads |
| Monitoring insights | Surfaces monitoring job information for drift, quality, and ongoing model behavior checks |
Think of it as a control panel for model inventory, deployment visibility, and risk/performance oversight.
Collaboration
Projects
| Feature | Details |
|---|---|
| Team spaces | Isolated collaboration environments |
| Asset sharing | Publish and subscribe workflows |
| Centralized or decentralized | Flexible governance models |
| Self-service | Teams can request access independently |
Publishing & Subscribing
| Workflow | Description |
|---|---|
| Publish | Make assets available to others |
| Subscribe | Request access to shared assets |
| Approval | Data owners approve/deny requests |
| Audit | Track all sharing activities |
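
The four steps in the table behave like a small state machine with an audit log. The sketch below models that flow in plain Python. The class, function, and status names are illustrative assumptions and do not mirror the service's actual API objects.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SubscriptionRequest:
    asset: str
    requester: str
    status: str = "PENDING"
    audit: list = field(default_factory=list)

    def _log(self, actor, action):
        # Each audit entry is (timestamp, actor, action).
        self.audit.append((datetime.now(timezone.utc).isoformat(), actor, action))

    def decide(self, owner, approve):
        """Approve or deny a pending request; decided requests are final."""
        if self.status != "PENDING":
            raise ValueError(f"request already {self.status}")
        self.status = "ACCEPTED" if approve else "REJECTED"
        self._log(owner, self.status)


def subscribe(published, asset, requester):
    """Create a request for a published asset and record who asked."""
    if asset not in published:
        raise KeyError(f"{asset} is not published")
    request = SubscriptionRequest(asset, requester)
    request._log(requester, "REQUESTED")
    return request


published = {"marts.revenue"}  # assets a producing project has published
req = subscribe(published, "marts.revenue", "analytics-project")
req.decide("data-owner", approve=True)

print(req.status)                          # ACCEPTED
print([entry[2] for entry in req.audit])   # ['REQUESTED', 'ACCEPTED']
```
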
Pricing
| Component | Pricing |
|---|---|
| Catalog | Free usage tier available |
| Metadata storage | Per GB stored |
| API requests | Per request (with free tier) |
| Lineage | Included in standard pricing |
| Lake Formation | No extra cost for permissions |
TL;DR
- SageMaker Catalog = Central catalog for data + AI assets (built on DataZone)
- Discovery = Semantic search, LLM-powered metadata, business glossary
- Lineage = OpenLineage-compatible tracking of data flow
- Quality = Unified view of data health metrics
- Access control = Fine-grained to column level via Lake Formation
- Collaboration = Projects for teams, publish/subscribe for sharing
- Responsible AI = Data classification, guardrails, bias detection, model cards
- Model Dashboard = Centralized model repository with endpoint, batch transform, and monitoring visibility
Resources
- SageMaker Catalog: data and AI governance capabilities
- Amazon DataZone: underlying data governance platform
- AWS Lake Formation: fine-grained access control