AWS SageMaker: Machine Learning Deployment Guide
AWS SageMaker is a managed platform for the entire machine learning lifecycle, from data preparation and model training to deployment, monitoring, and retraining, without requiring you to provision or patch the underlying servers. This guide walks through the major SageMaker capabilities you'll encounter in production: training jobs with spot instances, real-time and serverless endpoints, MLOps pipelines, Feature Store, and model monitoring. Code examples use boto3 and the SageMaker Python SDK throughout.
Table of Contents
- SageMaker Platform Overview
- Training Jobs: Built-in Algorithms vs Custom Containers
- Spot Training: 70% Cost Savings
- Hyperparameter Tuning Jobs
- Model Deployment: Endpoints and Inference Modes
- SageMaker Pipelines: MLOps CI/CD
- Feature Store: Online and Offline
- Model Monitoring: Drift, Quality, Bias
- SageMaker Canvas: No-Code ML
- Cost Optimization Strategies
- Frequently Asked Questions
SageMaker Platform Overview
SageMaker is not a single service — it is a family of integrated tools that cover every phase of ML development. Understanding what each component does prevents confusion and helps you pick the right tool for each task.
| Component | Purpose | When to Use |
|---|---|---|
| SageMaker Studio | Browser-based IDE for ML | Exploratory work, notebook-first development |
| SageMaker Notebooks | Managed Jupyter notebooks | Quick experiments, no Studio needed |
| Training Jobs | Managed distributed training on EC2 | Training any model at scale |
| Processing Jobs | Managed data preprocessing / evaluation | ETL, feature engineering, batch scoring |
| Real-time Endpoints | Always-on HTTPS inference endpoint | Low-latency online predictions |
| Serverless Inference | Pay-per-invocation inference | Sporadic or unpredictable traffic |
| Batch Transform | Offline bulk predictions on S3 data | Scoring large datasets overnight |
| Async Inference | Queued inference for large payloads | Large inputs (video, documents) |
| Pipelines | ML CI/CD orchestration | Repeatable model build + deploy workflows |
| Feature Store | Centralised feature repository | Sharing features across teams/models |
| Model Monitor | Production data/model quality checks | Detecting drift and degradation |
| Canvas | No-code AutoML UI | Business analysts, rapid prototyping |
Training Jobs: Built-in Algorithms vs Custom Containers
SageMaker training jobs run on fully managed compute. You supply your training script and data location; SageMaker provisions the instance, copies the data, runs your code, saves the model artifact to S3, and terminates the instance. You are only billed for the seconds the instance is running.
Built-in Algorithms
SageMaker ships ~20 built-in algorithms as Docker images maintained by AWS. Common ones include XGBoost, Linear Learner, K-Means, Random Cut Forest (anomaly detection), BlazingText (NLP), and Object Detection. Using built-ins means zero container maintenance — just point to the image URI and supply hyperparameters.
Custom Training Scripts with Framework Containers
For PyTorch, TensorFlow, Scikit-learn, or HuggingFace, SageMaker provides framework-specific managed containers. You write your training script as a standard Python file and pass it to the estimator — SageMaker handles the rest.
import boto3
import sagemaker
from sagemaker.pytorch import PyTorch

# Initialise session
sess = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sess.default_bucket()

# Define PyTorch estimator
estimator = PyTorch(
    entry_point="train.py",           # Your training script
    source_dir="./src",               # Directory containing train.py + requirements.txt
    role=role,
    instance_type="ml.p3.2xlarge",    # GPU instance
    instance_count=1,
    framework_version="2.1",
    py_version="py310",
    hyperparameters={
        "epochs": 30,
        "batch-size": 64,
        "learning-rate": 0.001,
    },
    output_path=f"s3://{bucket}/models/",
    environment={"WANDB_DISABLED": "true"},
)

# Launch training job (blocks until complete)
estimator.fit({
    "train": f"s3://{bucket}/data/train/",
    "val": f"s3://{bucket}/data/val/",
})

# Model artifact location
print(estimator.model_data)
# s3://my-bucket/models/pytorch-training-2026-06-06-12-00-00-000/output/model.tar.gz
Inside train.py, SageMaker injects the hyperparameters as CLI arguments and sets environment variables like SM_CHANNEL_TRAIN and SM_MODEL_DIR so your script knows where to read data and write the model artifact.
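As a rough illustration of that contract, the argument handling at the top of train.py might look like the sketch below. The flag names mirror the hyperparameter keys used in the estimator above; the default values and the parse_args helper name are assumptions made for this example, not something SageMaker prescribes.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths for a training script."""
    parser = argparse.ArgumentParser()

    # Hyperparameters arrive as CLI flags named after the estimator's dict keys.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--learning-rate", type=float, default=0.001)

    # Data and model paths come from environment variables. The fallbacks are
    # the conventional container paths, so the script still parses locally.
    parser.add_argument(
        "--train-dir",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    parser.add_argument(
        "--val-dir",
        default=os.environ.get("SM_CHANNEL_VAL", "/opt/ml/input/data/val"),
    )
    parser.add_argument(
        "--model-dir",
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    return parser.parse_args(argv)
```

Anything the script writes under args.model_dir is what ends up in model.tar.gz, so the final model should be saved there rather than to an arbitrary working directory.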
To bring a fully custom container, pass its image to the image_uri parameter of the generic Estimator class. The SageMaker Training Toolkit is open source, so you can add it to any base image to get the environment variable injection for free.
Spot Training: 70% Cost Savings
SageMaker Managed Spot Training runs your training job on Spot instances. AWS can interrupt the job, but SageMaker automatically restarts it when capacity becomes available and restores your checkpoints so the script can resume from the last one. For a 10-hour training job, a 70% discount cuts the compute cost from $80 to $24; the actual discount varies by instance type and Region.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",
    source_dir="./src",
    role=role,
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    framework_version="2.1",
    py_version="py310",
    # --- Spot training settings ---
    use_spot_instances=True,
    max_run=7200,       # Maximum total training time: 2 hours
    max_wait=10800,     # Max time including spot waits: 3 hours (must be >= max_run)
    checkpoint_s3_uri=f"s3://{bucket}/checkpoints/my-job/",
    checkpoint_local_path="/opt/ml/checkpoints",
    hyperparameters={"epochs": 50},
)
estimator.fit({"train": f"s3://{bucket}/data/train/"})

# Check savings in the training job metadata
job_name = estimator.latest_training_job.name
sm = boto3.client("sagemaker")
desc = sm.describe_training_job(TrainingJobName=job_name)
billed = desc["BillableTimeInSeconds"]
total = desc["TrainingTimeInSeconds"]
savings = round((1 - billed / total) * 100, 1)
print(f"Spot savings: {savings}% (billed {billed}s of {total}s)")
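Spot savings only hold up if an interrupted job can pick up where it left off. The sketch below shows one way a training script could save and restore progress in the checkpoint directory. It is framework-agnostic and stores a small JSON file per epoch. The file naming scheme, the helper names, and the stand-in "training" step are all assumptions for illustration. A real PyTorch script would save model and optimizer state instead.

```python
import json
import os


def save_checkpoint(checkpoint_dir, epoch, state):
    """Write one checkpoint file per epoch, atomically."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, f"epoch-{epoch:04d}.json")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)
    os.replace(tmp_path, path)  # rename so a partial file is never picked up


def load_latest_checkpoint(checkpoint_dir):
    """Return the newest checkpoint dict, or None on a fresh start."""
    if not os.path.isdir(checkpoint_dir):
        return None
    names = sorted(
        n for n in os.listdir(checkpoint_dir)
        if n.startswith("epoch-") and n.endswith(".json")
    )
    if not names:
        return None
    with open(os.path.join(checkpoint_dir, names[-1])) as f:
        return json.load(f)


def train(checkpoint_dir, total_epochs):
    """Resume from the last finished epoch instead of epoch 0."""
    checkpoint = load_latest_checkpoint(checkpoint_dir)
    start_epoch = checkpoint["epoch"] + 1 if checkpoint else 0
    state = checkpoint["state"] if checkpoint else {"steps": 0}
    for epoch in range(start_epoch, total_epochs):
        state["steps"] += 1  # stand-in for one epoch of real training
        save_checkpoint(checkpoint_dir, epoch, state)
    return start_epoch, state
```

In a SageMaker job, checkpoint_dir would be the value passed as checkpoint_local_path (here /opt/ml/checkpoints).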
Your training script must save checkpoints to checkpoint_local_path and resume from the latest one on startup if a checkpoint exists. SageMaker syncs this path with checkpoint_s3_uri automatically. Without checkpointing, an interruption restarts training from epoch 0.
Hyperparameter Tuning Jobs
SageMaker Automatic Model Tuning (AMT) runs multiple training jobs in parallel, using Bayesian optimisation to find the best hyperparameter combination. You define the metric to optimise and the search ranges; SageMaker handles the rest.
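from sagemaker.tuner import (
    HyperparameterTuner, ContinuousParameter, IntegerParameter, CategoricalParameter
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",
    objective_type="Maximize",
    # Custom scripts must tell SageMaker how to find the metric in the logs.
    # Assumption: train.py prints lines such as "validation:accuracy=0.9132".
    metric_definitions=[
        {"Name": "validation:accuracy", "Regex": "validation:accuracy=([0-9\\.]+)"},
    ],
    hyperparameter_ranges={
        "learning-rate": ContinuousParameter(1e-4, 1e-1, scaling_type="Logarithmic"),
        "batch-size": CategoricalParameter([32, 64, 128]),
        "dropout": ContinuousParameter(0.1, 0.5),
        "hidden-units": IntegerParameter(64, 512),
    },
    max_jobs=20,
    max_parallel_jobs=4,
    strategy="Bayesian",  # or "Random", "Grid", "Hyperband"
)

# The objective is a validation metric, so pass the validation channel too
tuner.fit({
    "train": f"s3://{bucket}/data/train/",
    "val": f"s3://{bucket}/data/val/",
})
tuner.wait()

# Get the name of the best training job
best = tuner.best_training_job()
print(f"Best job: {best}")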
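For a custom training script, AMT reads the objective metric from the job's log output using a regular expression supplied through metric_definitions. It can save a failed tuning run to check that regex against a sample log line locally first. The log format below is an assumption about what train.py prints, and extract_metric is a hypothetical helper.

```python
import re

# Assumed log format; adjust the pattern to whatever train.py actually prints.
METRIC_REGEX = r"validation:accuracy=([0-9\.]+)"


def extract_metric(log_line, pattern=METRIC_REGEX):
    """Return the first captured group as a float, or None if the line does not match."""
    match = re.search(pattern, log_line)
    return float(match.group(1)) if match else None


sample = "epoch=3 loss=0.2107 validation:accuracy=0.9132"
print(extract_metric(sample))
```

If the function returns None for a real log line, the tuning job would have no objective value to optimise.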
Model Deployment: Endpoints and Inference Modes
SageMaker supports four inference patterns. Choosing the right one has a major impact on cost and latency.
| Mode | Latency | Cost Model | Best For |
|---|---|---|---|
| Real-time endpoint | <100ms | Per instance-hour (always on) | Online serving, <6MB payload |
| Serverless inference | Cold start ~1–3s | Per invocation + GB-seconds | Sporadic traffic, dev/test |
| Batch transform | Minutes to hours | Per instance-hour (job duration) | Offline bulk scoring, large datasets |
| Async inference | Seconds to minutes | Per instance-hour, can scale to zero when idle | Large payloads (>6MB), long inference |
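The table can be read as a small decision procedure. The sketch below encodes it as a function. The 6 MB threshold comes from the table, while the function name, parameters, and return labels are made up for this example, and real decisions will also weigh latency targets and budget.

```python
def choose_inference_mode(payload_mb, bulk_offline=False, sporadic_traffic=False):
    """Pick an inference mode using the rules of thumb from the table above."""
    if bulk_offline:
        # Scoring a whole dataset in S3 with no caller waiting on the result
        return "batch-transform"
    if payload_mb > 6:
        # Too large for a synchronous request, so queue it
        return "async-inference"
    if sporadic_traffic:
        # Avoid paying for idle instances
        return "serverless-inference"
    return "real-time-endpoint"


print(choose_inference_mode(payload_mb=0.5))
print(choose_inference_mode(payload_mb=40))
```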
Real-time Endpoint Deployment
import boto3
import json

# Deploy from a completed training job
predictor = estimator.deploy(
    initial_instance_count=2,
    instance_type="ml.m5.xlarge",
    endpoint_name="my-pytorch-endpoint",
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
)

# Invoke the endpoint
response = predictor.predict({"inputs": [[1.2, 3.4, 5.6, 7.8]]})
print(response)  # {"predictions": [0.97]}

# --- Or invoke via boto3 directly ---
sm_rt = boto3.client("sagemaker-runtime")
response = sm_rt.invoke_endpoint(
    EndpointName="my-pytorch-endpoint",
    ContentType="application/json",
    Body=json.dumps({"inputs": [[1.2, 3.4, 5.6, 7.8]]}),
)
result = json.loads(response["Body"].read())
print(result)

# Auto scaling policy for the endpoint
aas = boto3.client("application-autoscaling")
aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/my-pytorch-endpoint/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=10,
)
aas.put_scaling_policy(
    PolicyName="sagemaker-invocations-scaling",
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/my-pytorch-endpoint/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,  # Target: 1000 invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,   # seconds to wait before removing instances
        "ScaleOutCooldown": 60,   # seconds to wait before adding more instances
    },
)
Serverless Inference
Serverless inference is ideal for models with sporadic or unpredictable traffic. There are no idle instances — you pay only when inference requests arrive. Cold starts add roughly 1–3 seconds on first invocation.
from sagemaker.serverless import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,  # 1024, 2048, 3072, 4096, 5120, or 6144
    max_concurrency=10,      # Max concurrent invocations
)
predictor = estimator.deploy(
    serverless_inference_config=serverless_config,
    endpoint_name="my-serverless-endpoint",
)
SageMaker Pipelines: MLOps CI/CD
SageMaker Pipelines is a DAG-based orchestration engine for ML workflows. A pipeline defines steps (such as Processing, Training, Condition, RegisterModel, CreateModel, and Transform) and the dependencies between them. Evaluation typically runs as a Processing step, and deployment is usually handled outside the DAG or through a Lambda step. Pipelines are versioned, repeatable, and can be triggered manually, on a schedule, or via EventBridge when new data arrives in S3.
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep, ProcessingStep
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.parameters import ParameterString, ParameterFloat
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.properties import PropertyFile
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput
from sagemaker.inputs import TrainingInput

# Pipeline parameters (can be overridden at runtime)
input_data = ParameterString(name="InputData", default_value=f"s3://{bucket}/data/")
accuracy_gate = ParameterFloat(name="AccuracyGate", default_value=0.85)

# Step 1: Preprocessing
processor = ScriptProcessor(
    image_uri="683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1",
    command=["python3"],
    instance_type="ml.m5.xlarge",
    instance_count=1,
    role=role,
)
preprocess_step = ProcessingStep(
    name="Preprocess",
    processor=processor,
    inputs=[ProcessingInput(source=input_data, destination="/opt/ml/processing/input")],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="val", source="/opt/ml/processing/val"),
    ],
    code="preprocess.py",
)

# Step 2: Training (consumes the preprocessing outputs)
preprocess_outputs = preprocess_step.properties.ProcessingOutputConfig.Outputs
train_step = TrainingStep(
    name="TrainModel",
    estimator=estimator,
    inputs={
        "train": TrainingInput(preprocess_outputs["train"].S3Output.S3Uri),
        "val": TrainingInput(preprocess_outputs["val"].S3Output.S3Uri),
    },
)

# Step 3: Evaluate (evaluate.py writes evaluation.json to the output directory)
evaluation_report = PropertyFile(
    name="EvaluationReport", output_name="evaluation", path="evaluation.json"
)
eval_step = ProcessingStep(
    name="EvaluateModel",
    processor=processor,
    inputs=[
        ProcessingInput(
            source=train_step.properties.ModelArtifacts.S3ModelArtifacts,
            destination="/opt/ml/processing/model",
        ),
    ],
    outputs=[
        ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation"),
    ],
    code="evaluate.py",
    property_files=[evaluation_report],
)

# Step 4: Conditional register (only if accuracy >= gate)
accuracy_condition = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step_name=eval_step.name,
        property_file=evaluation_report,
        json_path="accuracy",
    ),
    right=accuracy_gate,
)
register_step = RegisterModel(
    name="RegisterModel",
    estimator=estimator,
    model_data=train_step.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["application/json"],
    response_types=["application/json"],
    model_package_group_name="my-model-group",
    approval_status="PendingManualApproval",
)
condition_step = ConditionStep(
    name="CheckAccuracy",
    conditions=[accuracy_condition],
    if_steps=[register_step],
    else_steps=[],
)

# Assemble and upsert the pipeline
pipeline = Pipeline(
    name="my-ml-pipeline",
    parameters=[input_data, accuracy_gate],
    steps=[preprocess_step, train_step, eval_step, condition_step],
    sagemaker_session=sess,
)
pipeline.upsert(role_arn=role)

# Execute with custom parameters
execution = pipeline.start(
    parameters={"InputData": f"s3://{bucket}/data/2026-06-06/", "AccuracyGate": 0.88}
)
execution.wait()
print(execution.list_steps())
The RegisterModel step adds the model to a Model Package Group with PendingManualApproval status. A data scientist reviews the evaluation report in SageMaker Studio and approves or rejects it. An EventBridge rule on the Approved status change can then trigger a Lambda function to automatically deploy the model to the production endpoint.
Feature Store: Online and Offline
SageMaker Feature Store is a centralised repository for ML features. It solves two major problems: (1) training-serving skew — features computed differently at training time vs inference time, and (2) feature duplication — multiple teams recomputing the same features independently. Feature Store has two backends that can be used independently or together:
- Online Store: a low-latency key-value store with single-digit millisecond reads that keeps only the latest value per entity. Used during real-time inference to retrieve the current feature values.
- Offline Store: S3-backed Parquet that keeps all historical values with timestamps. Used during training to generate point-in-time correct feature datasets.
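"Point-in-time correct" means that each training example only sees the feature values that were already known at the moment of the labelled event, which avoids leaking future information into training. The plain-Python sketch below illustrates the lookup rule on a hypothetical feature history. It is not Feature Store's own implementation, and the data values are invented.

```python
from bisect import bisect_right


def point_in_time_value(history, as_of):
    """Return the latest value recorded at or before `as_of`.

    `history` is a list of (event_time, value) pairs sorted by event_time.
    Timestamps are ISO-8601 UTC strings in one format, so they sort correctly
    as plain strings.
    """
    times = [event_time for event_time, _ in history]
    idx = bisect_right(times, as_of)
    return history[idx - 1][1] if idx else None


# Hypothetical history of total_spend_30d for one customer
history = [
    ("2026-05-01T00:00:00Z", 310.00),
    ("2026-06-06T10:00:00Z", 420.50),
]

# Offline-store style lookup: the value as it stood when the label was observed
print(point_in_time_value(history, "2026-05-20T00:00:00Z"))

# Online-store style lookup: only the most recent value is kept
print(history[-1][1])
```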
import time

import boto3
import pandas as pd
from sagemaker.feature_store.feature_group import FeatureGroup

fs_runtime = boto3.client("sagemaker-featurestore-runtime")

# --- Create a Feature Group ---
feature_group = FeatureGroup(
    name="customer-purchase-features",
    sagemaker_session=sess,
)
# Infer feature definitions from an empty, typed DataFrame.
# The pandas "string" dtype maps to the Feature Store String type.
feature_group.load_feature_definitions(data_frame=pd.DataFrame({
    "customer_id": pd.Series(dtype="string"),
    "total_spend_30d": pd.Series(dtype="float64"),
    "num_orders_30d": pd.Series(dtype="int64"),
    "avg_order_value": pd.Series(dtype="float64"),
    "days_since_last_order": pd.Series(dtype="int64"),
    "event_time": pd.Series(dtype="string"),  # ISO-8601 timestamp (required)
}))
feature_group.create(
    s3_uri=f"s3://{bucket}/feature-store/",
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
    enable_offline_store=True,
)

# Wait for the Feature Group to become active, and stop if creation fails
status = feature_group.describe()["FeatureGroupStatus"]
while status == "Creating":
    time.sleep(5)
    status = feature_group.describe()["FeatureGroupStatus"]
if status != "Created":
    raise RuntimeError(f"Feature Group creation ended with status {status}")
print("Feature Group is active")

# --- Ingest features ---
records = pd.DataFrame([
    {"customer_id": "C001", "total_spend_30d": 420.50, "num_orders_30d": 5,
     "avg_order_value": 84.10, "days_since_last_order": 3,
     "event_time": "2026-06-06T10:00:00Z"},
    {"customer_id": "C002", "total_spend_30d": 112.00, "num_orders_30d": 2,
     "avg_order_value": 56.00, "days_since_last_order": 14,
     "event_time": "2026-06-06T10:00:00Z"},
])
feature_group.ingest(data_frame=records, max_workers=4, wait=True)

# --- Online retrieval at inference time ---
response = fs_runtime.get_record(
    FeatureGroupName="customer-purchase-features",
    RecordIdentifierValueAsString="C001",
    FeatureNames=["total_spend_30d", "num_orders_30d", "avg_order_value"],
)
features = {f["FeatureName"]: f["ValueAsString"] for f in response["Record"]}
print(features)
# {'total_spend_30d': '420.5', 'num_orders_30d': '5', 'avg_order_value': '84.1'}

# --- Offline: generate training dataset with Athena ---
# The Glue/Athena table name is generated by Feature Store and differs from
# the Feature Group name, so read it from the query object.
query = feature_group.athena_query()
query.run(
    query_string=f"""
        SELECT customer_id, total_spend_30d, num_orders_30d, avg_order_value,
               days_since_last_order
        FROM "{query.table_name}"
        WHERE event_time BETWEEN '2026-01-01' AND '2026-06-01'
    """,
    output_location=f"s3://{bucket}/athena-results/",
)
query.wait()
training_df = query.as_dataframe()
Model Monitoring: Data Drift, Model Quality, Bias Detection
Model performance degrades over time as the real-world data distribution shifts away from the training distribution. SageMaker Model Monitor continuously checks four dimensions: data quality (feature distribution drift), model quality (prediction accuracy vs ground truth), bias drift (demographic parity, equal opportunity), and explainability drift (SHAP value shifts).
The workflow is: (1) capture endpoint traffic to S3 using Data Capture, (2) create a baseline from your training dataset, (3) schedule a monitoring job that runs hourly or daily and compares live traffic against the baseline, (4) publish violation reports to CloudWatch Metrics and trigger alerts.
from sagemaker.model_monitor import (
    CronExpressionGenerator, DataCaptureConfig, DefaultModelMonitor
)
from sagemaker.model_monitor.dataset_format import DatasetFormat

# Step 1: Enable data capture on the endpoint
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=20,  # Capture 20% of traffic
    destination_s3_uri=f"s3://{bucket}/capture/",
    capture_options=["Input", "Output"],
    csv_content_types=["text/csv"],
    json_content_types=["application/json"],
)
# Pass data_capture_config=data_capture_config when calling estimator.deploy()

# Step 2: Create a baseline from the training dataset
monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)
monitor.suggest_baseline(
    baseline_dataset=f"s3://{bucket}/data/train/baseline.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=f"s3://{bucket}/monitor/baseline/",
    wait=True,
)

# Step 3: Schedule monitoring
monitor.create_monitoring_schedule(
    monitor_schedule_name="my-model-monitor",
    endpoint_input="my-pytorch-endpoint",
    output_s3_uri=f"s3://{bucket}/monitor/reports/",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)
print("Monitoring schedule created; violations will appear in CloudWatch")
When a violation is detected, SageMaker writes a constraint_violations.json report to S3 and emits a CloudWatch metric. Wire a CloudWatch Alarm to this metric and route it to SNS → Lambda to trigger an automatic retraining pipeline execution.
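Before wiring up automatic retraining, it can help to look at what a violations report actually contains. The sketch below summarises a report that has already been downloaded from S3. The field names follow the documented violations schema (feature_name, constraint_check_type, description), while the sample content and the summarise_violations helper are invented for illustration.

```python
import json
from collections import Counter


def summarise_violations(report_text):
    """Count violations by check type and list features flagged for drift."""
    report = json.loads(report_text)
    violations = report.get("violations", [])
    by_type = Counter(v["constraint_check_type"] for v in violations)
    drifted = sorted({
        v["feature_name"]
        for v in violations
        if v["constraint_check_type"] == "baseline_drift_check"
    })
    return {
        "total": len(violations),
        "by_type": dict(by_type),
        "drifted_features": drifted,
    }


# Invented sample report for illustration
sample_report = json.dumps({
    "violations": [
        {"feature_name": "total_spend_30d",
         "constraint_check_type": "baseline_drift_check",
         "description": "Distance from baseline exceeds threshold"},
        {"feature_name": "num_orders_30d",
         "constraint_check_type": "data_type_check",
         "description": "Value could not be parsed as Integral"},
    ]
})
print(summarise_violations(sample_report))
```

A retraining trigger could, for instance, act only when drifted_features is non-empty and ignore one-off data type errors.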
SageMaker Canvas: No-Code ML
SageMaker Canvas allows business analysts and domain experts to build, train, and deploy ML models through a point-and-click interface — no Python required. Users import a CSV from S3 or upload it directly, select the target column, and Canvas automatically chooses the best algorithm (classification, regression, or time series forecasting), trains multiple models, and presents an accuracy leaderboard. The winning model can be shared with a data scientist for review or deployed to a real-time endpoint with a single click.
Canvas is priced per session hour (the time the UI is open) plus model training time. For citizen data scientists doing periodic analysis, the total monthly cost is usually well below what a data engineer would charge to build the equivalent pipeline. Canvas models are fully compatible with the SageMaker Model Registry — a Canvas model can be registered, approved, and deployed via the same Pipeline workflow as any programmatically trained model.
Cost Optimization Strategies
SageMaker costs accumulate in three places: training compute, inference compute, and Studio notebook instances. Each requires a different optimisation strategy.
| Strategy | Saving | Applies To |
|---|---|---|
| Managed Spot Training | Up to 70% | Training jobs |
| Serverless Inference | 100% during idle time | Sporadic inference traffic |
| Multi-Model Endpoints (MME) | 50–90% | Many models with low traffic each |
| Scaling to 0 instances when idle (supported on Async Inference endpoints) | 100% during off-hours | Dev/staging endpoints |
| Graviton3 instances (ml.m7g) | ~20% | CPU inference endpoints |
| Inf2 instances (AWS Inferentia) | Up to 50% | Deep learning inference at scale |
| Right-size instances with CloudWatch | 10–40% | Any endpoint or training job |
| Lifecycle configs to auto-stop notebooks | Eliminates waste | Studio / classic notebooks |
Multi-Model Endpoints
If you have 500 customer-specific models that each receive only a few requests per day, creating 500 individual endpoints would cost tens of thousands of dollars per month. A Multi-Model Endpoint (MME) hosts all models on a single fleet. SageMaker lazy-loads models into memory on first request and evicts least-recently-used models when memory is full. You pay for one endpoint regardless of how many models it hosts.
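The lazy-load and evict behaviour described above is essentially a least-recently-used cache. The toy class below illustrates the idea with a capacity counted in models rather than bytes. It is a conceptual sketch, not how SageMaker implements MME internally, and ModelCache and fake_loader are hypothetical names.

```python
from collections import OrderedDict


class ModelCache:
    """Keep at most `capacity` models in memory, evicting the least recently used."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self._loader = loader          # called on a cache miss, e.g. download from S3
        self._models = OrderedDict()   # ordered from least to most recently used

    def get(self, name):
        if name in self._models:
            self._models.move_to_end(name)  # mark as most recently used
            return self._models[name]
        model = self._loader(name)          # lazy load on first request
        self._models[name] = model
        if len(self._models) > self.capacity:
            self._models.popitem(last=False)  # evict the least recently used model
        return model

    def resident(self):
        return list(self._models)


loads = []


def fake_loader(name):
    """Stand-in for downloading and deserialising a model artifact."""
    loads.append(name)
    return f"model:{name}"


cache = ModelCache(capacity=2, loader=fake_loader)
for requested in ["customer-A", "customer-B", "customer-A", "customer-C"]:
    cache.get(requested)
print(cache.resident())
```

The first request for an evicted model pays the load cost again, which is why MME suits many low-traffic models better than a few hot ones.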
from sagemaker.multidatamodel import MultiDataModel

# All model artifacts live in the same S3 prefix
model_data_prefix = f"s3://{bucket}/multi-model-artifacts/"
mme = MultiDataModel(
    name="customer-churn-mme",
    model_data_prefix=model_data_prefix,
    model=estimator.create_model(),  # Base container
    sagemaker_session=sess,
)
predictor = mme.deploy(
    initial_instance_count=2,
    instance_type="ml.m5.2xlarge",
    endpoint_name="customer-churn-mme-endpoint",
)

# Copy a model into the MME (can be done without restarting the endpoint)
mme.add_model(
    model_data_source=f"s3://{bucket}/models/customer-A/model.tar.gz",
    model_data_path="customer-A.tar.gz",
)

# Invoke a specific model by name
sm_rt = boto3.client("sagemaker-runtime")
response = sm_rt.invoke_endpoint(
    EndpointName="customer-churn-mme-endpoint",
    ContentType="application/json",
    TargetModel="customer-A.tar.gz",  # SageMaker routes to this specific model
    Body=json.dumps({"inputs": [[1.2, 3.4, 5.6]]}),
)
print(json.loads(response["Body"].read()))
To avoid paying for forgotten notebooks, attach a lifecycle configuration script that runs aws sagemaker stop-notebook-instance if the instance has been idle for more than 60 minutes. AWS provides a reference implementation in the sagemaker-studio-lifecycle-config-examples GitHub repo.
Frequently Asked Questions
What is the difference between SageMaker Training Jobs and SageMaker Processing Jobs?
Training Jobs are designed for fitting ML models — they integrate with the model artifact pipeline, Model Registry, and Estimator classes. Processing Jobs are for arbitrary Python scripts that don't produce a model: data preprocessing, feature engineering, post-training evaluation, or batch inference scoring. Processing Jobs use ScriptProcessor, SKLearnProcessor, PySparkProcessor, etc. Both run on managed compute that is billed per-second and terminated when the job completes.
When should I use Serverless Inference vs a real-time endpoint?
Use Serverless Inference when traffic is sporadic, for example an internal tool used only during business hours, or a model that gets a few hundred requests per day. At that scale, the always-on cost of a real-time endpoint (minimum ~$50/month for an ml.t3.medium) exceeds the per-invocation cost of serverless. For models that need consistent sub-100ms latency or receive sustained traffic (thousands of requests per minute), real-time endpoints with auto scaling are more cost-effective and predictable.
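The trade-off can be made concrete with a rough break-even calculation. The sketch below compares an always-on instance with duration-based serverless billing. Every rate in it is a placeholder chosen for illustration, not a quoted AWS price, and it ignores data-processing charges, so substitute current figures from the pricing page.

```python
HOURS_PER_MONTH = 730


def realtime_monthly_cost(hourly_rate, instances=1):
    """Always-on endpoint: billed for every hour, busy or idle."""
    return hourly_rate * HOURS_PER_MONTH * instances


def serverless_monthly_cost(invocations, seconds_per_invocation,
                            memory_gb, price_per_gb_second):
    """Serverless endpoint: billed for compute time actually used."""
    return invocations * seconds_per_invocation * memory_gb * price_per_gb_second


def break_even_invocations(hourly_rate, seconds_per_invocation,
                           memory_gb, price_per_gb_second):
    """Monthly invocation count at which both options cost the same."""
    per_invocation = seconds_per_invocation * memory_gb * price_per_gb_second
    return realtime_monthly_cost(hourly_rate) / per_invocation


# Placeholder rates (assumptions, not AWS list prices)
HOURLY_RATE = 0.07             # USD per instance-hour
PRICE_PER_GB_SECOND = 0.00002  # USD per GB-second of serverless compute

always_on = realtime_monthly_cost(HOURLY_RATE)
sporadic = serverless_monthly_cost(10_000, 0.1, 2, PRICE_PER_GB_SECOND)
print(f"real-time: ${always_on:.2f}/month, serverless: ${sporadic:.2f}/month")
```

Below the break-even volume serverless is cheaper; above it, the always-on endpoint wins and also avoids cold starts.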
How do I do A/B testing with SageMaker endpoints?
SageMaker supports production variants — multiple model versions deployed behind a single endpoint with configurable traffic weights. Set up two variants with InitialVariantWeight of 90/10 to send 90% of traffic to the champion model and 10% to the challenger. CloudWatch metrics are emitted per-variant, so you can compare latency and invocation counts. Once the challenger wins, update the weights to 0/100 and delete the old variant — all without restarting the endpoint or changing client code.
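A 90/10 split is expressed in the ProductionVariants list of an endpoint configuration. The helper below builds that list. The dictionary keys are the ones boto3's create_endpoint_config expects, while the helper name, variant names, model names, and instance type are placeholders for this sketch.

```python
def ab_variants(champion_model, challenger_model,
                challenger_share=0.10, instance_type="ml.m5.xlarge"):
    """Build a two-variant ProductionVariants list for an A/B test."""

    def variant(name, model_name, weight):
        return {
            "VariantName": name,
            "ModelName": model_name,          # an existing SageMaker Model
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": weight,   # relative share of traffic
        }

    return [
        variant("champion", champion_model, 1.0 - challenger_share),
        variant("challenger", challenger_model, challenger_share),
    ]


variants = ab_variants("churn-model-v1", "churn-model-v2")
for v in variants:
    print(v["VariantName"], v["InitialVariantWeight"])

# With AWS credentials, the list would be passed along these lines:
#   boto3.client("sagemaker").create_endpoint_config(
#       EndpointConfigName="churn-ab-config", ProductionVariants=variants)
# Weights can later be shifted with update_endpoint_weights_and_capacities.
```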
Can SageMaker Pipelines trigger automatically when new data arrives?
Yes. Enable EventBridge notifications on the bucket, create an EventBridge rule that matches the S3 Object Created event for your data prefix, then route it to a Lambda function that starts the pipeline (pipeline.start() in the SDK, or the StartPipelineExecution API in boto3). Alternatively, set the pipeline itself as the rule's target, which starts an execution without any Lambda code. For time-based execution (e.g., nightly retraining), use a scheduled EventBridge rule with a cron expression.
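For the Lambda route, the function mostly translates the S3 event into pipeline parameters. The sketch below assumes the event arrives from EventBridge (bucket name and object key under detail), and it reuses the my-ml-pipeline name and InputData parameter from the pipeline example. The helper name pipeline_start_request is hypothetical.

```python
def pipeline_start_request(event, pipeline_name="my-ml-pipeline"):
    """Turn an EventBridge S3 'Object Created' event into start_pipeline_execution kwargs."""
    detail = event["detail"]
    bucket = detail["bucket"]["name"]
    key = detail["object"]["key"]
    # Use the folder that received the new object as the pipeline's input prefix
    prefix = key.rsplit("/", 1)[0] + "/" if "/" in key else ""
    return {
        "PipelineName": pipeline_name,
        "PipelineParameters": [
            {"Name": "InputData", "Value": f"s3://{bucket}/{prefix}"},
        ],
    }


def handler(event, context):
    """Lambda entry point: start one pipeline execution per matching event."""
    import boto3  # imported here so the helper above can be tested without AWS

    sm = boto3.client("sagemaker")
    response = sm.start_pipeline_execution(**pipeline_start_request(event))
    return response["PipelineExecutionArn"]
```

If a single upload produces many objects, the rule pattern or the handler would need some de-duplication so that one batch does not start dozens of executions.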
What is the SageMaker Model Registry used for?
The Model Registry is a versioned catalogue of trained models. Each model version stores the model artifact URI, container image, inference specification, metrics from evaluation, and approval status. In a CI/CD ML pipeline, the registry acts as the handoff point between the data science team (who train and register models) and the platform team (who deploy approved models). When a model version is approved, a downstream system can automatically deploy it — decoupling training from deployment and providing a full audit trail of which model is running in production at any point in time.
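The "downstream system" is often a small Lambda function behind an EventBridge rule. The sketch below shows only the filtering part: deciding whether an incoming event represents an approval and extracting the model package to deploy. The event field names follow the shape SageMaker is understood to emit for model package state changes, but treat them as an assumption and check them against a real event. The helper name and the sample ARN are made up.

```python
def approved_model_package(event):
    """Return the ModelPackageArn if the event is an approval, else None."""
    if event.get("detail-type") != "SageMaker Model Package State Change":
        return None
    detail = event.get("detail", {})
    if detail.get("ModelApprovalStatus") != "Approved":
        return None
    return detail.get("ModelPackageArn")


# Made-up sample event for illustration
sample_event = {
    "source": "aws.sagemaker",
    "detail-type": "SageMaker Model Package State Change",
    "detail": {
        "ModelPackageGroupName": "my-model-group",
        "ModelPackageArn": "arn:aws:sagemaker:us-east-1:111122223333:model-package/my-model-group/3",
        "ModelApprovalStatus": "Approved",
    },
}
print(approved_model_package(sample_event))
```

The returned ARN is what a deployment step would use to create a model and update the production endpoint, keeping the registry as the single record of what was approved.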