AWS Rekognition: Image and Video Analysis with Deep Learning
AWS Rekognition is Amazon's fully managed computer vision service that makes deep learning-powered image and video analysis accessible without any ML expertise. Behind the scenes, Rekognition runs deep neural networks that AWS trains, hosts, and updates; you interact with them through simple API calls that return JSON. The service handles model hosting, scaling, GPU infrastructure, and updates automatically. Whether you need to detect objects in product photos, build a face recognition system for physical access control, moderate user-generated content at millions of images per hour, or analyse hours of surveillance video for safety equipment compliance, Rekognition provides purpose-built APIs for each scenario. This guide covers every major capability with Python boto3 code, sample responses, CLI commands, and production architecture patterns.
Table of Contents
- Rekognition Capabilities Overview
- Image Analysis: Labels, Faces, and Text Detection
- Face Collections: Building a Face Recognition System
- Content Moderation with Human Review (A2I)
- Custom Labels: Training on Your Own Dataset
- Async Video Analysis: Label and Face Detection
- Real-Time Video Streams with Kinesis Video Streams
- PPE Detection for Workplace Safety
- Lambda + S3 Trigger: Auto-Moderate Uploads
- Pricing, Limits, and Cost Optimisation
- Frequently Asked Questions
Rekognition Capabilities Overview
Rekognition exposes eight distinct detection capabilities. Each is a separate API call returning a structured JSON response. They can be called independently or composed into pipelines: for example, run detect_labels first to check whether an image contains a person, then run detect_faces only if it does. Images of inanimate objects then cost one API call instead of two.
| API / Feature | What It Returns | Common Use Case |
|---|---|---|
| detect_labels | Objects, scenes, activities with confidence scores and bounding boxes | Auto-tagging product images, content classification |
| detect_faces | Face bounding boxes, landmarks, emotions, age range, attributes | Demographics analytics, photo organisation |
| search_faces_by_image | Matching faces from a pre-built collection | Access control, duplicate account detection |
| recognize_celebrities | Named public figures with confidence and IMDB URL | Media tagging, social listening |
| detect_text | Text lines and words, bounding boxes, confidence | OCR on signs, forms, licence plates |
| detect_moderation_labels | Explicit/suggestive content categories with hierarchy | User-generated content platforms |
| detect_protective_equipment | Hard hat, mask, glove coverage per body part per person | Warehouse/construction site safety audits |
| Custom Labels | Domain-specific labels trained on your images | Manufacturing defect detection, medical imaging |
Every image API accepts its input either as an S3 object reference ({"S3Object": {"Bucket": "...", "Name": "..."}}) or as a raw bytes payload ({"Bytes": b"..."}) up to 5 MB. For production pipelines, prefer S3 references: they avoid base64 encoding overhead and support images up to 15 MB. The Bytes mode is convenient for real-time webcam frames or mobile uploads where the image has not yet been persisted.
All image APIs are synchronous: the result comes back in the response to the same request. Video analysis uses an asynchronous pattern: you start a job, receive a JobId, and poll (or get notified via SNS) when results are ready. Real-time video analysis uses a streaming processor connected to a Kinesis Video Stream. The rest of this guide walks through each pattern in detail.
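The label-then-face composition mentioned in this overview can be sketched as below. This is a minimal illustration rather than a prescribed pattern: `analyse_image` is a hypothetical helper, the client is passed in so that any object with boto3-style `detect_labels` and `detect_faces` methods works, and the "Person" label name and 80% threshold are assumptions you would tune.

```python
def analyse_image(client, image, min_confidence=80.0):
    """Run detect_labels first; call detect_faces only when a person is found.

    `client` is expected to behave like boto3.client("rekognition").
    `image` is the usual Image dict ({"S3Object": ...} or {"Bytes": ...}).
    """
    labels = client.detect_labels(
        Image=image, MaxLabels=20, MinConfidence=min_confidence
    )["Labels"]
    names = {label["Name"] for label in labels}

    faces = []
    if "Person" in names:
        # The second (billable) call is made only for images containing people.
        faces = client.detect_faces(Image=image, Attributes=["ALL"])["FaceDetails"]

    return {"labels": sorted(names), "faces": faces}
```

With a real client this would be called as `analyse_image(boto3.client("rekognition"), {"S3Object": {"Bucket": "...", "Name": "..."}})`. Whether the gate saves money depends on how many of your images actually contain people.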
Image Analysis: Labels, Faces, and Text Detection
The three most widely used image APIs are detect_labels, detect_faces, and detect_text. Together they cover the majority of practical computer vision use cases.
detect_labels — Object and Scene Detection
detect_labels identifies thousands of objects, scenes, activities, and concepts. Each label comes with a confidence score (0–100), a parent hierarchy (so "Car" is a child of "Vehicle"), and optional bounding boxes for localised objects. The MaxLabels and MinConfidence parameters give you control over cost and precision.
import boto3
import json

rekognition = boto3.client("rekognition", region_name="us-east-1")

# --- Detect labels from an S3 image ---
response = rekognition.detect_labels(
    Image={
        "S3Object": {
            "Bucket": "my-content-bucket",
            "Name": "uploads/user-photo.jpg",
        }
    },
    MaxLabels=20,      # Return at most 20 labels
    MinConfidence=75,  # Discard labels below 75% confidence
    Features=["GENERAL_LABELS", "IMAGE_PROPERTIES"],  # Also get dominant colours
)

# Parse labels
for label in response["Labels"]:
    name = label["Name"]
    confidence = label["Confidence"]
    parents = [p["Name"] for p in label.get("Parents", [])]
    boxes = label.get("Instances", [])
    print(f"{name} ({confidence:.1f}%), parents: {parents}")
    for box in boxes:
        b = box["BoundingBox"]
        print(f"  BoundingBox: left={b['Left']:.3f} top={b['Top']:.3f} "
              f"width={b['Width']:.3f} height={b['Height']:.3f} "
              f"confidence={box['Confidence']:.1f}%")

# Sample output:
# Person (99.2%), parents: []
# Car (97.8%), parents: ['Vehicle', 'Transportation']
#   BoundingBox: left=0.123 top=0.341 width=0.289 height=0.201 confidence=97.8%
# Road (95.4%), parents: ['Infrastructure']

# Image properties (dominant colours)
if "ImageProperties" in response:
    for colour in response["ImageProperties"]["DominantColors"][:3]:
        print(f"Dominant colour: R={colour['Red']} G={colour['Green']} B={colour['Blue']} "
              f"({colour['PixelPercent']:.1f}%)")
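The bounding boxes above are expressed as ratios of the image width and height, so drawing or cropping needs a conversion to pixels. The helper below is a small sketch of that conversion; `to_pixel_box` is a hypothetical name, and the image dimensions are assumed to come from your own image library.

```python
def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a Rekognition ratio-based BoundingBox to (left, top, right, bottom) pixels."""
    left = round(bounding_box["Left"] * image_width)
    top = round(bounding_box["Top"] * image_height)
    right = left + round(bounding_box["Width"] * image_width)
    bottom = top + round(bounding_box["Height"] * image_height)
    # Clamp to the frame: a box can extend past the edge when the object is cut off.
    return (
        max(0, left),
        max(0, top),
        min(image_width, right),
        min(image_height, bottom),
    )
```

The returned tuple follows the (left, top, right, bottom) order that most imaging libraries use for crop rectangles.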
detect_faces — Face Attribute Analysis
detect_faces locates every face in an image and returns detailed attributes for each: bounding box, up to 30 facial landmarks (eye corners, nose tip, mouth corners, jawline points, etc.), estimated age range, detected emotions (with confidence scores per emotion), and boolean attributes for features like Smile, EyesOpen, MouthOpen, Sunglasses, and Beard. The Attributes parameter controls the level of detail returned: DEFAULT returns the bounding box, confidence, pose, quality, and landmarks, while ALL adds the remaining attributes.
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

response = rekognition.detect_faces(
    Image={
        "S3Object": {"Bucket": "my-content-bucket", "Name": "team-photo.jpg"}
    },
    Attributes=["ALL"],  # ALL adds emotions, age range, gender, etc.
                         # DEFAULT returns bounding box, confidence, pose, quality, landmarks
)

for i, face in enumerate(response["FaceDetails"]):
    box = face["BoundingBox"]
    age = face["AgeRange"]
    emotions = sorted(face["Emotions"], key=lambda e: e["Confidence"], reverse=True)
    print(f"\nFace {i+1}:")
    print(f"  Bounding box: L={box['Left']:.3f} T={box['Top']:.3f} "
          f"W={box['Width']:.3f} H={box['Height']:.3f}")
    print(f"  Age range: {age['Low']}–{age['High']} years")
    print(f"  Top emotion: {emotions[0]['Type']} ({emotions[0]['Confidence']:.1f}%)")
    print(f"  Smile: {face['Smile']['Value']} ({face['Smile']['Confidence']:.1f}%)")
    print(f"  Eyes open: {face['EyesOpen']['Value']}")
    print(f"  Image quality: Brightness={face['Quality']['Brightness']:.1f} "
          f"Sharpness={face['Quality']['Sharpness']:.1f}")

# Sample output:
# Face 1:
#   Bounding box: L=0.312 T=0.120 W=0.145 H=0.218
#   Age range: 28–38 years
#   Top emotion: HAPPY (98.3%)
#   Smile: True (97.1%)
#   Eyes open: True
#   Image quality: Brightness=72.4 Sharpness=89.1
detect_text — OCR on Images
detect_text runs OCR on images and returns both individual words (WORD type) and assembled lines (LINE type). It handles skewed, rotated, and stylised text — useful for reading licence plates, street signs, whiteboard photos, and screenshots. Each text detection includes a polygon (not just a bounding box) for accurate localisation of rotated text.
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

response = rekognition.detect_text(
    Image={"S3Object": {"Bucket": "my-bucket", "Name": "road-sign.jpg"}},
    Filters={
        "WordFilter": {"MinConfidence": 80},  # Drop low-confidence words
        "RegionsOfInterest": [                # Only scan the top half of the image
            {
                "BoundingBox": {
                    "Width": 1.0, "Height": 0.5,
                    "Left": 0.0, "Top": 0.0,
                }
            }
        ],
    },
)

lines = [t for t in response["TextDetections"] if t["Type"] == "LINE"]
words = [t for t in response["TextDetections"] if t["Type"] == "WORD"]
print(f"Found {len(lines)} lines, {len(words)} words")

for line in lines:
    print(f"  LINE: '{line['DetectedText']}' (confidence={line['Confidence']:.1f}%)")
    poly = line["Geometry"]["Polygon"]
    print(f"    Polygon: {[(round(p['X'],3), round(p['Y'],3)) for p in poly]}")

# CLI equivalent, useful for quick testing:
# aws rekognition detect-text \
#   --image '{"S3Object":{"Bucket":"my-bucket","Name":"road-sign.jpg"}}' \
#   --region us-east-1
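Each WORD detection carries a ParentId that points at the Id of the LINE it belongs to, which makes it possible to rebuild the line-to-words structure from the flat TextDetections list. The sketch below shows one way to do that; `words_by_line` is a hypothetical helper that operates on the response only and makes no API calls.

```python
from collections import defaultdict


def words_by_line(text_detections):
    """Return [(line_text, [word, ...]), ...] in line Id order."""
    lines = {
        t["Id"]: t["DetectedText"] for t in text_detections if t["Type"] == "LINE"
    }
    grouped = defaultdict(list)
    for t in text_detections:
        if t["Type"] == "WORD":
            # ParentId links a word back to the line it was assembled into.
            grouped[t["ParentId"]].append(t["DetectedText"])
    return [(lines[line_id], grouped.get(line_id, [])) for line_id in sorted(lines)]
```

Calling `words_by_line(response["TextDetections"])` on a detect_text response gives one tuple per detected line.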
Face Collections: Building a Face Recognition System
Face search works differently from the other APIs. Instead of detecting faces in a vacuum, you build a face collection — a searchable index of known faces stored server-side by Rekognition. When a new image arrives, you call search_faces_by_image to find the closest matching face in the collection within milliseconds. This pattern powers access control systems, duplicate account detection, and employee attendance tracking.
The workflow has three phases: (1) create and populate the collection with create_collection and index_faces, (2) search the collection with search_faces_by_image or search_faces, and (3) manage the collection with list_faces and delete_faces. Face vectors are stored durably by Rekognition, so you do not manage the embeddings yourself.
import boto3
import json

rekognition = boto3.client("rekognition", region_name="us-east-1")
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

COLLECTION_ID = "office-employees"
METADATA_TABLE = "face-metadata"  # DynamoDB table: FaceId (PK) → employee info

# ---- Step 1: Create the collection (one-time setup) ----
try:
    rekognition.create_collection(CollectionId=COLLECTION_ID)
    print(f"Collection '{COLLECTION_ID}' created")
except rekognition.exceptions.ResourceAlreadyExistsException:
    print(f"Collection '{COLLECTION_ID}' already exists")

# ---- Step 2: Index known employee photos ----
def index_employee(s3_bucket, s3_key, employee_id, employee_name):
    """Add a person's face to the collection and store metadata in DynamoDB."""
    response = rekognition.index_faces(
        CollectionId=COLLECTION_ID,
        Image={"S3Object": {"Bucket": s3_bucket, "Name": s3_key}},
        ExternalImageId=employee_id,   # Your own identifier, returned in search results
        DetectionAttributes=["DEFAULT"],
        MaxFaces=1,                    # Only index the largest face in the photo
        QualityFilter="AUTO",          # Skip blurry or occluded faces
    )
    for face_record in response["FaceRecords"]:
        face_id = face_record["Face"]["FaceId"]
        confidence = face_record["Face"]["Confidence"]
        print(f"Indexed: FaceId={face_id} Employee={employee_name} Confidence={confidence:.1f}%")
        # Store the FaceId → employee mapping in DynamoDB
        table = dynamodb.Table(METADATA_TABLE)
        table.put_item(Item={
            "FaceId": face_id,
            "EmployeeId": employee_id,
            "EmployeeName": employee_name,
            "S3Key": s3_key,
        })
    # Report unindexed faces (bad quality, multiple faces, etc.)
    for unindexed in response.get("UnindexedFaces", []):
        print(f"  Skipped face, reasons: {unindexed['Reasons']}")

# Index a batch of employees
employees = [
    ("hr-photos", "alice-smith.jpg", "EMP001", "Alice Smith"),
    ("hr-photos", "bob-jones.jpg", "EMP002", "Bob Jones"),
    ("hr-photos", "carol-white.jpg", "EMP003", "Carol White"),
]
for bucket, key, emp_id, name in employees:
    index_employee(bucket, key, emp_id, name)
# ---- Step 3: Search for a face (e.g., at a door camera) ----
def identify_person(image_bytes):
    """Return employee info for the person in image_bytes, or None if unknown."""
    try:
        response = rekognition.search_faces_by_image(
            CollectionId=COLLECTION_ID,
            Image={"Bytes": image_bytes},
            MaxFaces=1,
            FaceMatchThreshold=90.0,  # Only accept matches above 90% similarity
        )
    except rekognition.exceptions.InvalidParameterException:
        # Raised when the image contains no detectable face
        return None
    matches = response.get("FaceMatches", [])
    if not matches:
        return None
    best_match = matches[0]
    face_id = best_match["Face"]["FaceId"]
    similarity = best_match["Similarity"]
    # Retrieve employee metadata from DynamoDB
    table = dynamodb.Table(METADATA_TABLE)
    item = table.get_item(Key={"FaceId": face_id}).get("Item")
    if item:
        return {
            "employee_id": item["EmployeeId"],
            "employee_name": item["EmployeeName"],
            "similarity": round(similarity, 2),
        }
    return None

# Usage:
with open("door-camera-frame.jpg", "rb") as f:
    result = identify_person(f.read())
if result:
    print(f"Access granted: {result['employee_name']} ({result['similarity']}% match)")
else:
    print("Unknown person, access denied")
# ---- List all indexed faces ----
paginator = rekognition.get_paginator("list_faces")
total = 0
for page in paginator.paginate(CollectionId=COLLECTION_ID, MaxResults=100):
    total += len(page["Faces"])
print(f"Total indexed faces: {total}")

# CLI: describe a collection
# aws rekognition describe-collection --collection-id office-employees
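The management phase of the workflow (list_faces and delete_faces) can be combined to remove every face indexed for one person. The sketch below assumes faces were indexed with ExternalImageId set to an employee identifier, as in the code above; `delete_faces_for` and the batch size are illustrative choices, and the client is passed in so the function can be exercised without AWS access.

```python
def delete_faces_for(client, collection_id, external_image_id, batch_size=1000):
    """Delete all faces whose ExternalImageId matches; return the deleted FaceIds."""
    face_ids = []
    paginator = client.get_paginator("list_faces")
    for page in paginator.paginate(CollectionId=collection_id):
        face_ids += [
            face["FaceId"]
            for face in page["Faces"]
            if face.get("ExternalImageId") == external_image_id
        ]

    deleted = []
    # Delete in batches so a single request never carries an oversized FaceIds list.
    for start in range(0, len(face_ids), batch_size):
        response = client.delete_faces(
            CollectionId=collection_id,
            FaceIds=face_ids[start:start + batch_size],
        )
        deleted += response.get("DeletedFaces", [])
    return deleted
```

A call such as `delete_faces_for(rekognition, "office-employees", "EMP002")` removes the face vectors only; the matching DynamoDB metadata records would need to be deleted separately.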
Face data is biometric data: give people a way to have their faces removed (delete_faces), and store metadata with minimal retention periods. Tag your DynamoDB records with consent timestamps and review dates.
Content Moderation with Human Review (A2I)
Content moderation is one of the highest-value use cases for Rekognition. detect_moderation_labels returns a hierarchical taxonomy of unsafe content categories (for example, "Explicit Nudity" → "Graphic Male Nudity"), each with a confidence score. Category names and hierarchy depth vary by moderation model version, so check the ModerationModelVersion field in the response. In a production moderation pipeline you typically auto-block content above a high threshold, auto-allow content below a low threshold, and route the middle band to human reviewers using Amazon Augmented AI (A2I).
import boto3
import json
import re

rekognition = boto3.client("rekognition", region_name="us-east-1")
a2i_runtime = boto3.client("sagemaker-a2i-runtime", region_name="us-east-1")
s3 = boto3.client("s3", region_name="us-east-1")

def human_loop_name(key):
    """Build a valid A2I loop name: lowercase letters, digits, hyphens, max 63 chars."""
    slug = re.sub(r"[^a-z0-9]+", "-", key.lower()).strip("-")
    return f"review-{slug}"[:63].rstrip("-")

# ---- Step 1: Detect moderation labels ----
def moderate_image(bucket, key):
    response = rekognition.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=50,  # Catch anything above 50% for the full picture
        HumanLoopConfig={
            "HumanLoopName": human_loop_name(key),
            "FlowDefinitionArn": "arn:aws:sagemaker:us-east-1:123456789012:flow-definition/content-moderation-flow",
            "DataAttributes": {"ContentClassifiers": ["FreeOfPersonallyIdentifiableInformation"]},
        },
    )
    labels = response.get("ModerationLabels", [])
    if not labels:
        return {"action": "allow", "labels": []}

    # Build a summary
    summary = []
    for label in labels:
        summary.append({
            "name": label["Name"],
            "parent": label.get("ParentName", ""),
            "confidence": round(label["Confidence"], 2),
        })

    # Decision thresholds
    max_confidence = max(l["Confidence"] for l in labels)
    if max_confidence >= 95:
        action = "block"
    elif max_confidence >= 60:
        action = "human_review"
    else:
        action = "allow"
    return {"action": action, "max_confidence": max_confidence, "labels": summary}

result = moderate_image("user-uploads", "avatars/suspect-image.jpg")
print(json.dumps(result, indent=2))
# Sample output:
# {
# "action": "human_review",
# "max_confidence": 78.34,
# "labels": [
# {"name": "Suggestive", "parent": "", "confidence": 78.34},
# {"name": "Female Swimwear Or Underwear", "parent": "Suggestive", "confidence": 78.34}
# ]
# }
# ---- Step 2: Process A2I human review results via EventBridge ----
# When a reviewer completes the task, A2I sends a completion event.
# The Lambda below receives it and applies the reviewer decision.
LAMBDA_HANDLER = '''
import boto3, json

s3 = boto3.client("s3")

def handler(event, context):
    detail = event["detail"]
    status = detail["humanLoopStatus"]  # e.g. "Completed" or "Stopped"
    if status != "Completed":
        print(f"Human loop ended with status {status}; nothing to apply")
        return {"statusCode": 200}
    output = detail["humanLoopOutput"]["outputS3Uri"]

    # Fetch the reviewer decision from S3
    bucket, key = output.replace("s3://", "").split("/", 1)
    obj = s3.get_object(Bucket=bucket, Key=key)
    body = json.loads(obj["Body"].read())

    # The field names below depend on your worker task template.
    # body["humanAnswers"][0]["answerContent"]["category"]["label"] == "safe" or "unsafe"
    label = body["humanAnswers"][0]["answerContent"]["category"]["label"]
    image_key = body["inputContent"]["taskObject"]
    if label == "unsafe":
        print(f"Reviewer marked {image_key} as UNSAFE, deleting")
        # s3.delete_object(Bucket="user-uploads", Key=image_key)
    else:
        print(f"Reviewer marked {image_key} as SAFE, approved")
    return {"statusCode": 200}
'''
print(LAMBDA_HANDLER)  # Deploy this handler to process A2I review events
# CLI: check moderation labels
# aws rekognition detect-moderation-labels \
# --image '{"S3Object":{"Bucket":"user-uploads","Name":"avatars/test.jpg"}}' \
# --min-confidence 60
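Because each moderation label names its parent through ParentName, the flat ModerationLabels list can be folded into a parent-to-children map for reporting or policy rules. The helper below is a sketch of that idea; `group_by_parent` is a hypothetical function that works on the response only.

```python
def group_by_parent(moderation_labels):
    """Map each parent category to a list of (child_name, confidence) tuples."""
    tree = {}
    for label in moderation_labels:
        parent = label.get("ParentName") or None
        if parent is None:
            # Top-level category: make sure it appears even without children.
            tree.setdefault(label["Name"], [])
        else:
            tree.setdefault(parent, []).append((label["Name"], label["Confidence"]))
    return tree
```

For the sample output shown earlier, this yields a single "Suggestive" entry holding the swimwear child label and its confidence.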
The ParentName field in the response tells you which parent category each child label belongs to.
Custom Labels: Training on Your Own Dataset
Rekognition Custom Labels lets you train a model on your own image dataset and get the same simple API experience for domain-specific labels that the built-in models cannot detect — manufacturing defects, retail product types, crop disease stages, medical imaging anomalies, or any other specialised visual category. You need as few as 10 images per label to start, though 50–100 per label produces much better accuracy.
Preparing Your Dataset
Custom Labels uses Amazon Rekognition datasets backed by manifest files in S3. Each line of the manifest is a JSON object referencing an S3 image and its ground-truth labels. You can prepare the manifest manually, generate it from a CSV, or use the Rekognition console's built-in labelling tool.
import json
import boto3

# --- Build a manifest file for image classification ---
# One JSON object per line, one line per image (image-level labels).
# "defect" is a label attribute name of our own choosing; its metadata must
# live under "<attribute>-metadata", and "class-name" carries the label text.
CLASSES = ["crack", "scratch", "no_defect"]
image_labels = [
    ("s3://my-dataset/defects/crack_001.jpg", "crack"),
    ("s3://my-dataset/defects/crack_002.jpg", "crack"),
    ("s3://my-dataset/defects/scratch_001.jpg", "scratch"),
    ("s3://my-dataset/defects/scratch_002.jpg", "scratch"),
    ("s3://my-dataset/ok/good_001.jpg", "no_defect"),
    ("s3://my-dataset/ok/good_002.jpg", "no_defect"),
]

manifest_lines = []
for s3_uri, label in image_labels:
    manifest_lines.append(json.dumps({
        "source-ref": s3_uri,
        "defect": CLASSES.index(label),
        "defect-metadata": {
            "confidence": 1,
            "job-name": "labeling-job/defect-classification",
            "class-name": label,
            "human-annotated": "yes",
            "creation-date": "2026-06-08T00:00:00",
            "type": "groundtruth/image-classification",
        },
    }))

manifest_content = "\n".join(manifest_lines)
s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-dataset",
    Key="manifests/train.manifest",
    Body=manifest_content.encode("utf-8"),
)
print("Manifest uploaded")
# --- Create datasets and start training via SDK ---
import time

rekognition = boto3.client("rekognition", region_name="us-east-1")

# Create project
project = rekognition.create_project(ProjectName="defect-detection")
project_arn = project["ProjectArn"]
print(f"Project ARN: {project_arn}")

def create_dataset_from_manifest(dataset_type, manifest_key):
    """Create a TRAIN or TEST dataset from a manifest and wait until it is ready."""
    dataset_arn = rekognition.create_dataset(
        DatasetType=dataset_type,
        ProjectArn=project_arn,
        DatasetSource={
            "GroundTruthManifest": {
                "S3Object": {"Bucket": "my-dataset", "Name": manifest_key}
            }
        },
    )["DatasetArn"]
    while True:  # Dataset creation is asynchronous
        description = rekognition.describe_dataset(DatasetArn=dataset_arn)
        status = description["DatasetDescription"]["Status"]
        if status == "CREATE_COMPLETE":
            return dataset_arn
        if status == "CREATE_FAILED":
            raise RuntimeError(f"{dataset_type} dataset creation failed")
        time.sleep(5)

train_dataset_arn = create_dataset_from_manifest("TRAIN", "manifests/train.manifest")
# test.manifest is assumed to be built the same way as train.manifest
test_dataset_arn = create_dataset_from_manifest("TEST", "manifests/test.manifest")

# Train the model. The project owns its datasets, so TrainingData and
# TestingData are not passed (they apply only to projects without datasets).
version = rekognition.create_project_version(
    ProjectArn=project_arn,
    VersionName="v1-2026-06-08",
    OutputConfig={
        "S3Bucket": "my-dataset",
        "S3KeyPrefix": "model-output/",
    },
)
model_arn = version["ProjectVersionArn"]
print(f"Training started, model ARN: {model_arn}")

# Wait for training to complete (can take 30–90 minutes)
waiter = rekognition.get_waiter("project_version_training_completed")
waiter.wait(
    ProjectArn=project_arn,
    VersionNames=["v1-2026-06-08"],
    WaiterConfig={"Delay": 60, "MaxAttempts": 120},
)

# Check metrics
desc = rekognition.describe_project_versions(
    ProjectArn=project_arn,
    VersionNames=["v1-2026-06-08"],
)
metrics = desc["ProjectVersionDescriptions"][0]["EvaluationResult"]
print(f"F1 score: {metrics['F1Score']:.3f}")
# Precision and recall are not in this response; they are in the evaluation
# summary file that Summary points to in S3.
summary_obj = metrics["Summary"]["S3Object"]
print(f"Evaluation summary: s3://{summary_obj['Bucket']}/{summary_obj['Name']}")
# ---- Start the model (hosted on dedicated inference units) ----
rekognition.start_project_version(
    ProjectVersionArn=model_arn,
    MinInferenceUnits=1,  # Throughput per unit depends on the model; scale up as needed
)
# Starting takes a few minutes; inference fails until the model is RUNNING
rekognition.get_waiter("project_version_running").wait(
    ProjectArn=project_arn,
    VersionNames=["v1-2026-06-08"],
)

# ---- Run inference ----
result = rekognition.detect_custom_labels(
    ProjectVersionArn=model_arn,
    Image={"S3Object": {"Bucket": "production-images", "Name": "conveyor/part-0042.jpg"}},
    MinConfidence=70,
)
for label in result["CustomLabels"]:
    print(f"{label['Name']}: {label['Confidence']:.1f}%")
    if "Geometry" in label:
        box = label["Geometry"]["BoundingBox"]
        print(f"  Box: L={box['Left']:.3f} T={box['Top']:.3f} W={box['Width']:.3f} H={box['Height']:.3f}")

# ---- Stop the model when not needed (avoid idle charges) ----
rekognition.stop_project_version(ProjectVersionArn=model_arn)
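For batch jobs, the start, infer, and stop sequence can be wrapped so the model is stopped even when inference raises. The context manager below is one possible sketch: `running_model` is a hypothetical helper, the client is injected, and it relies on the boto3 `project_version_running` waiter.

```python
from contextlib import contextmanager


@contextmanager
def running_model(client, project_arn, version_name, model_arn, inference_units=1):
    """Start a Custom Labels model, wait until it runs, and always stop it afterwards."""
    client.start_project_version(
        ProjectVersionArn=model_arn, MinInferenceUnits=inference_units
    )
    try:
        # Block until the model reports RUNNING before handing control to the caller.
        client.get_waiter("project_version_running").wait(
            ProjectArn=project_arn, VersionNames=[version_name]
        )
        yield model_arn
    finally:
        # Runs on success and on error, so the model does not keep accruing hosting time.
        client.stop_project_version(ProjectVersionArn=model_arn)
```

Inside `with running_model(rekognition, project_arn, "v1-2026-06-08", model_arn) as arn:` you would loop over the batch and call detect_custom_labels with `arn`.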
Remember to call stop_project_version when the model is not in use. For batch workloads, schedule start/stop with Lambda and EventBridge to minimise idle cost.
Async Video Analysis: Label and Face Detection
For video files stored in S3, Rekognition uses an asynchronous job pattern. You start a job with start_label_detection (or start_face_detection, start_content_moderation, etc.), receive a JobId immediately, and poll get_label_detection until the job status is SUCCEEDED. For production pipelines, use SNS notification instead of polling — Rekognition publishes to your SNS topic when the job completes, which triggers a Lambda to retrieve results.
import boto3
import time
import json

rekognition = boto3.client("rekognition", region_name="us-east-1")
sns_topic_arn = "arn:aws:sns:us-east-1:123456789012:rekognition-job-complete"
iam_role_arn = "arn:aws:iam::123456789012:role/RekognitionSNSPublishRole"
VIDEO_BUCKET = "my-video-bucket"
VIDEO_KEY = "footage/factory-floor-2026-06-08.mp4"

# ---- Start a label detection job ----
response = rekognition.start_label_detection(
    Video={
        "S3Object": {
            "Bucket": VIDEO_BUCKET,
            "Name": VIDEO_KEY,
        }
    },
    MinConfidence=70,
    NotificationChannel={
        "SNSTopicArn": sns_topic_arn,
        "RoleArn": iam_role_arn,
    },
    JobTag="factory-safety-check",
    Features=["GENERAL_LABELS"],
    Settings={
        "GeneralLabels": {
            "LabelInclusionFilters": ["Person", "Helmet", "Vehicle", "Forklift"],
        }
    },
)
job_id = response["JobId"]
print(f"Started label detection job: {job_id}")

# ---- Polling approach (for development / small jobs) ----
def wait_for_job(job_id, max_polls=60, poll_interval=10):
    for i in range(max_polls):
        result = rekognition.get_label_detection(
            JobId=job_id,
            SortBy="TIMESTAMP",
            AggregateBy="TIMESTAMPS",
        )
        status = result["JobStatus"]
        print(f"  Poll {i+1}: status={status}")
        if status == "SUCCEEDED":
            return result
        elif status == "FAILED":
            raise RuntimeError(f"Job failed: {result.get('StatusMessage')}")
        time.sleep(poll_interval)
    raise TimeoutError("Job did not complete in time")

# result = wait_for_job(job_id)  # Uncomment to poll (dev only)
# ---- SNS-triggered Lambda handler (production) ----
LAMBDA_HANDLER = '''
import boto3, json

rekognition = boto3.client("rekognition", region_name="us-east-1")

def handler(event, context):
    # SNS wraps the message
    for record in event["Records"]:
        message = json.loads(record["Sns"]["Message"])
        job_id = message["JobId"]
        status = message["Status"]
        if status != "SUCCEEDED":
            print(f"Job {job_id} ended with status {status}")
            continue  # Keep processing any other records in this event

        # Paginate through all results
        all_labels = []
        kwargs = {"JobId": job_id, "SortBy": "TIMESTAMP", "MaxResults": 1000}
        while True:
            page = rekognition.get_label_detection(**kwargs)
            all_labels.extend(page["Labels"])
            next_token = page.get("NextToken")
            if not next_token:
                break
            kwargs["NextToken"] = next_token
        print(f"Job {job_id}: {len(all_labels)} label detections")

        # Find timestamps where a Person was detected without a Helmet
        person_times = set()
        helmet_times = set()
        for entry in all_labels:
            ts = entry["Timestamp"]  # milliseconds from start of video
            name = entry["Label"]["Name"]
            if name == "Person":
                person_times.add(ts)
            elif name == "Helmet":
                helmet_times.add(ts)
        unsafe_times = person_times - helmet_times
        if unsafe_times:
            print(f"Safety violation detected at {len(unsafe_times)} timestamps!")
            # Trigger alert, save report to S3, etc.
'''
print("Production Lambda handler above: wire it to the SNS topic for job completion events")
# ---- Face detection in video ----
face_job = rekognition.start_face_detection(
    Video={"S3Object": {"Bucket": VIDEO_BUCKET, "Name": VIDEO_KEY}},
    NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": iam_role_arn},
    FaceAttributes="ALL",
)
print(f"Started face detection job: {face_job['JobId']}")

# ---- Content moderation in video ----
mod_job = rekognition.start_content_moderation(
    Video={"S3Object": {"Bucket": VIDEO_BUCKET, "Name": VIDEO_KEY}},
    MinConfidence=60,
    NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": iam_role_arn},
)
print(f"Started content moderation job: {mod_job['JobId']}")
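Video results arrive as individual millisecond timestamps, so a list of unsafe moments is usually easier to act on once neighbouring timestamps are merged into time ranges. The function below is a sketch of that post-processing step; `merge_intervals` and the one-second gap are assumptions, not part of the Rekognition API.

```python
def merge_intervals(timestamps_ms, max_gap_ms=1000):
    """Merge timestamps into (start_ms, end_ms) ranges when gaps are <= max_gap_ms."""
    intervals = []
    for ts in sorted(timestamps_ms):
        if intervals and ts - intervals[-1][1] <= max_gap_ms:
            # Close enough to the previous detection: extend the current range.
            intervals[-1][1] = ts
        else:
            # Gap too large (or first timestamp): open a new range.
            intervals.append([ts, ts])
    return [tuple(interval) for interval in intervals]
```

Feeding it a set of unsafe timestamps, such as the person-without-helmet set computed in the handler, gives a short list of ranges that can be written to a report or used to cut review clips.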
Real-Time Video Streams with Kinesis Video Streams
For live camera feeds — security cameras, entry-point cameras, live broadcast streams — you connect a Kinesis Video Stream to a Rekognition Streaming Processor. Rekognition reads frames from the stream continuously and publishes detection events (faces matched against a collection, or connected home labels) to a Kinesis Data Stream in near real time. Latency from camera to detection event is typically under 2 seconds.
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")
kinesis = boto3.client("kinesis", region_name="us-east-1")

COLLECTION_ID = "office-employees"
KVS_STREAM_ARN = "arn:aws:kinesisvideo:us-east-1:123456789012:stream/lobby-camera/0123456789"
KINESIS_DATA_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/rekognition-events"
IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/RekognitionKinesisRole"
PROCESSOR_NAME = "lobby-face-search"

# ---- Create a streaming processor ----
# Note: create_stream_processor has no frame-rate or shard-rate parameter.
response = rekognition.create_stream_processor(
    Name=PROCESSOR_NAME,
    Input={
        "KinesisVideoStream": {
            "Arn": KVS_STREAM_ARN,
        }
    },
    Output={
        "KinesisDataStream": {
            "Arn": KINESIS_DATA_ARN,
        }
    },
    RoleArn=IAM_ROLE_ARN,
    Settings={
        "FaceSearch": {
            "CollectionId": COLLECTION_ID,
            "FaceMatchThreshold": 85.0,  # Min similarity to report a match
        }
    },
    NotificationChannel={
        "SNSTopicArn": "arn:aws:sns:us-east-1:123456789012:processor-alerts"
    },
)
processor_arn = response["StreamProcessorArn"]
print(f"Stream processor ARN: {processor_arn}")

# ---- Start the processor ----
rekognition.start_stream_processor(Name=PROCESSOR_NAME)
print(f"Processor '{PROCESSOR_NAME}' started: streaming face search is live")
# ---- Consume face match events from Kinesis Data Stream ----
import json, time

shard_iterator = kinesis.get_shard_iterator(
    StreamName="rekognition-events",
    ShardId="shardId-000000000000",
    ShardIteratorType="LATEST",
)["ShardIterator"]

print("Listening for face match events...")
for _ in range(30):  # Read for 30 iterations (demo)
    records_response = kinesis.get_records(ShardIterator=shard_iterator, Limit=100)
    shard_iterator = records_response["NextShardIterator"]
    for record in records_response["Records"]:
        # boto3 returns Data as raw bytes (already base64-decoded), so parse it directly
        payload = json.loads(record["Data"])
        for match_event in payload.get("FaceSearchResponse", []):
            detected = match_event["DetectedFace"]
            matches = match_event.get("MatchedFaces", [])
            print(f"Face detected, confidence={detected['Confidence']:.1f}%")
            for m in matches:
                print(f"  Matched: FaceId={m['Face']['FaceId']} "
                      f"ExternalId={m['Face'].get('ExternalImageId')} "
                      f"Similarity={m['Similarity']:.1f}%")
    time.sleep(1)

# ---- Stop and delete when no longer needed ----
rekognition.stop_stream_processor(Name=PROCESSOR_NAME)
# rekognition.delete_stream_processor(Name=PROCESSOR_NAME)
print("Processor stopped")
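A streaming processor reports a match for every analysed frame in which a known face appears, so one person standing at a door produces a burst of near-identical events. The sketch below suppresses repeats within a cooldown window; `new_sightings`, the 30-second cooldown, and the (timestamp, ExternalImageId) tuple shape are all assumptions chosen for illustration.

```python
def new_sightings(events, last_seen, cooldown_s=30.0):
    """Return only events that start a new sighting of a person.

    events: iterable of (timestamp_seconds, external_image_id), in time order.
    last_seen: dict that the caller keeps between calls; it is updated in place.
    """
    fresh = []
    for ts, person in events:
        previous = last_seen.get(person)
        if previous is None or ts - previous >= cooldown_s:
            fresh.append((ts, person))
        # Update on every event so a person who stays in view is reported once.
        last_seen[person] = ts
    return fresh
```

In the consumer loop above, the tuples could be built from each matched face's ExternalImageId and the record arrival time, with `last_seen` held for the lifetime of the consumer.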
To get camera video into Kinesis Video Streams, you can use the kvssink GStreamer plugin. For an H.264 RTSP camera: gst-launch-1.0 rtspsrc location=rtsp://camera-ip/stream ! rtph264depay ! h264parse ! kvssink stream-name=lobby-camera storage-size=512. The producer SDK handles MKV fragment packaging, TLS, and retry logic automatically.
PPE Detection for Workplace Safety
Rekognition's detect_protective_equipment API analyses an image and, for each detected person, determines whether they are wearing protective equipment on specific body parts: head (hard hat), face (face cover/mask), left hand (glove), and right hand (glove). The response tells you both the type of PPE detected and whether it is covering the relevant body part — a hard hat resting on a table is detected but not covering. This distinction is critical for compliance checking.
import boto3
import json

rekognition = boto3.client("rekognition", region_name="us-east-1")

def check_ppe_compliance(bucket, key, required_ppe=None):
    """
    Analyse an image for PPE compliance.
    required_ppe: list of required equipment types, e.g. ["FACE_COVER", "HEAD_COVER"]
    Returns a list of violation dicts, one per non-compliant person.
    """
    if required_ppe is None:
        required_ppe = ["HEAD_COVER", "FACE_COVER"]

    response = rekognition.detect_protective_equipment(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        SummarizationAttributes={
            "MinConfidence": 80,  # Min confidence for PPE detection
            "RequiredEquipmentTypes": required_ppe,
        },
    )

    violations = []
    for person in response["Persons"]:
        person_id = person["Id"]
        person_conf = person["Confidence"]
        body_parts = person.get("BodyParts", [])

        # Map body part → PPE found
        ppe_by_part = {}
        for bp in body_parts:
            part_name = bp["Name"]  # HEAD, FACE, LEFT_HAND, RIGHT_HAND
            for eq in bp.get("EquipmentDetections", []):
                ppe_by_part[part_name] = {
                    "type": eq["Type"],
                    "confidence": eq["Confidence"],
                    "covers": eq["CoversBodyPart"]["Value"],
                }

        # Check each required type
        person_violations = []
        if "HEAD_COVER" in required_ppe:
            head_ppe = ppe_by_part.get("HEAD")
            if not head_ppe or not head_ppe["covers"]:
                person_violations.append("Missing or improperly worn HEAD_COVER (hard hat)")
        if "FACE_COVER" in required_ppe:
            face_ppe = ppe_by_part.get("FACE")
            if not face_ppe or not face_ppe["covers"]:
                person_violations.append("Missing or improperly worn FACE_COVER (mask)")
        if "HAND_COVER" in required_ppe:
            lh = ppe_by_part.get("LEFT_HAND")
            rh = ppe_by_part.get("RIGHT_HAND")
            if not (lh and lh["covers"]) or not (rh and rh["covers"]):
                person_violations.append("Missing gloves on one or both hands")

        if person_violations:
            box = person["BoundingBox"]
            violations.append({
                "person_id": person_id,
                "confidence": round(person_conf, 2),
                "bounding_box": box,
                "violations": person_violations,
            })

    # Rekognition also provides a pre-computed summary
    summary = response.get("Summary", {})
    persons_with_req = summary.get("PersonsWithRequiredEquipment", [])
    persons_without = summary.get("PersonsWithoutRequiredEquipment", [])
    persons_indeterminate = summary.get("PersonsIndeterminate", [])
    print(f"Compliant persons: {len(persons_with_req)}")
    print(f"Non-compliant persons: {len(persons_without)}")
    print(f"Indeterminate: {len(persons_indeterminate)}")
    return violations
# Run compliance check on a batch of images
import os

images = [
    ("safety-images", "site/morning-shift-001.jpg"),
    ("safety-images", "site/morning-shift-002.jpg"),
    ("safety-images", "site/morning-shift-003.jpg"),
]
for bucket, key in images:
    violations = check_ppe_compliance(bucket, key, required_ppe=["HEAD_COVER", "FACE_COVER"])
    if violations:
        print(f"\n[VIOLATION] {key}: {len(violations)} person(s) non-compliant")
        for v in violations:
            print(f"  Person {v['person_id']}: {'; '.join(v['violations'])}")
    else:
        print(f"\n[OK] {key}: All persons compliant")
# CLI quick check:
# aws rekognition detect-protective-equipment \
# --image '{"S3Object":{"Bucket":"safety-images","Name":"site/morning-shift-001.jpg"}}' \
# --summarization-attributes '{"MinConfidence":80,"RequiredEquipmentTypes":["HEAD_COVER","FACE_COVER"]}'
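The pre-computed Summary in each response can also be rolled up across a batch of images into a single compliance figure. The sketch below does that on already-fetched responses; `compliance_rate` is a hypothetical helper, and leaving indeterminate persons out of the rate is a design assumption you may want to change.

```python
def compliance_rate(responses):
    """Aggregate detect_protective_equipment Summary blocks across many images."""
    compliant = non_compliant = indeterminate = 0
    for response in responses:
        summary = response.get("Summary", {})
        compliant += len(summary.get("PersonsWithRequiredEquipment", []))
        non_compliant += len(summary.get("PersonsWithoutRequiredEquipment", []))
        indeterminate += len(summary.get("PersonsIndeterminate", []))

    decided = compliant + non_compliant
    return {
        "compliant": compliant,
        "non_compliant": non_compliant,
        "indeterminate": indeterminate,
        # Indeterminate persons are excluded from the rate; None means nobody was judged.
        "rate": compliant / decided if decided else None,
    }
```

A large indeterminate count usually deserves its own alert, since it means the rate covers only part of the people on site.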
Lambda + S3 Trigger: Auto-Moderate Uploads
The most common production pattern for Rekognition is a serverless pipeline triggered by S3 uploads. When a user uploads an image, S3 notifies Lambda, which runs moderation and label detection, tags the S3 object with results, and optionally writes a record to DynamoDB for downstream use. This entire pipeline costs fractions of a cent per image and scales to millions of images per day with no infrastructure management.
import os
from datetime import datetime, timezone
from decimal import Decimal
from urllib.parse import unquote_plus

import boto3

# Environment variables set on the Lambda function:
#   METADATA_TABLE    - DynamoDB table for image records
#   QUARANTINE_BUCKET - S3 bucket for blocked images
#   MOD_THRESHOLD     - float confidence threshold for auto-block
rekognition = boto3.client("rekognition")
s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")

METADATA_TABLE = os.environ.get("METADATA_TABLE", "image-metadata")
QUARANTINE_BUCKET = os.environ.get("QUARANTINE_BUCKET", "quarantine-uploads")
MOD_THRESHOLD = float(os.environ.get("MOD_THRESHOLD", "90"))


def to_decimal(value, places=2):
    """The boto3 DynamoDB resource rejects Python floats, so store numbers as Decimal."""
    return Decimal(str(round(value, places)))


def handler(event, context):
    """
    Triggered by an S3 ObjectCreated event.
    1. Detects moderation labels
    2. Detects object labels
    3. Tags the S3 object
    4. Moves to quarantine if unsafe
    5. Stores metadata in DynamoDB
    """
    table = dynamodb.Table(METADATA_TABLE)

    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 event notifications are URL-encoded (a space arrives as "+")
        key = unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)
        print(f"Processing: s3://{bucket}/{key} ({size} bytes)")

        # ----- 1. Content moderation -----
        mod_response = rekognition.detect_moderation_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MinConfidence=50,
        )
        mod_labels = mod_response.get("ModerationLabels", [])
        max_mod_confidence = max((l["Confidence"] for l in mod_labels), default=0)
        is_unsafe = max_mod_confidence >= MOD_THRESHOLD

        # ----- 2. Object labels -----
        label_response = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MaxLabels=15,
            MinConfidence=70,
        )
        detected_labels = [
            {"name": l["Name"], "confidence": to_decimal(l["Confidence"])}
            for l in label_response["Labels"]
        ]
        label_names = [l["name"] for l in detected_labels]

        # ----- 3. Tag the S3 object -----
        # S3 tag values may not contain commas, so the labels are joined with "/"
        tags = {
            "moderation-status": "unsafe" if is_unsafe else "safe",
            "mod-max-confidence": str(round(max_mod_confidence, 1)),
            "top-labels": "/".join(label_names[:5]),
            "processed-by": "rekognition-lambda",
        }
        s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]},
        )

        # ----- 4. Quarantine unsafe images -----
        if is_unsafe:
            quarantine_key = f"quarantine/{key}"
            s3.copy_object(
                CopySource={"Bucket": bucket, "Key": key},
                Bucket=QUARANTINE_BUCKET,
                Key=quarantine_key,
            )
            s3.delete_object(Bucket=bucket, Key=key)
            print(f"QUARANTINED: {key} (max mod confidence={max_mod_confidence:.1f}%)")

        # ----- 5. Write metadata to DynamoDB -----
        table.put_item(Item={
            "ImageKey": key,
            "Bucket": bucket,
            "Status": "quarantined" if is_unsafe else "approved",
            "ModerationLabels": [
                {"name": l["Name"], "parent": l.get("ParentName", ""),
                 "confidence": to_decimal(l["Confidence"])}
                for l in mod_labels
            ],
            "DetectedLabels": detected_labels,
            "MaxModConfidence": to_decimal(max_mod_confidence),
            "FileSize": size,
            "ProcessedAt": datetime.now(timezone.utc).isoformat(),
            "RequestId": context.aws_request_id,
        })
        print(f"Done: {key} → {'UNSAFE' if is_unsafe else 'SAFE'} "
              f"| Labels: {label_names[:3]}")

    return {"statusCode": 200, "body": "OK"}
# ---- Deployment (AWS CLI) ----
# Create the Lambda function, allow S3 to invoke it, then wire the S3 trigger:
#
# aws lambda create-function \
#   --function-name image-moderation-pipeline \
#   --runtime python3.12 \
#   --role arn:aws:iam::123456789012:role/LambdaRekognitionRole \
#   --handler lambda_function.handler \
#   --zip-file fileb://function.zip \
#   --timeout 30 \
#   --environment "Variables={METADATA_TABLE=image-metadata,QUARANTINE_BUCKET=quarantine-uploads,MOD_THRESHOLD=90}"
#
# S3 rejects the notification configuration unless it is allowed to invoke the function:
#
# aws lambda add-permission \
#   --function-name image-moderation-pipeline \
#   --statement-id s3-invoke \
#   --action lambda:InvokeFunction \
#   --principal s3.amazonaws.com \
#   --source-arn arn:aws:s3:::user-uploads \
#   --source-account 123456789012
#
# aws s3api put-bucket-notification-configuration \
#   --bucket user-uploads \
#   --notification-configuration '{
#     "LambdaFunctionConfigurations": [{
#       "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:image-moderation-pipeline",
#       "Events": ["s3:ObjectCreated:*"],
#       "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}}
#     }]
#   }'
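One detail of the S3 trigger that is easy to miss: object keys in event notifications arrive URL-encoded, so a key containing spaces or parentheses will not match the real object unless it is decoded first. The sketch below shows the decoding step in isolation. The helper name extract_s3_objects and the sample event are hypothetical, and the event is trimmed to the fields the function reads.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, decoded_key) pairs from an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        # Keys arrive URL-encoded: a space becomes "+", "(" becomes "%28", and so on.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((s3_info["bucket"]["name"], key))
    return objects


# Hypothetical event for an upload named "uploads/my photo(1).jpg"
sample_event = {
    "Records": [{
        "s3": {
            "bucket": {"name": "user-uploads"},
            "object": {"key": "uploads/my+photo%281%29.jpg", "size": 48213},
        }
    }]
}

print(extract_s3_objects(sample_event))
```

Passing the undecoded key to Rekognition or S3 would typically fail with an error saying the object cannot be found, which is why the decoding belongs at the very top of the handler.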
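IAM permissions: the Lambda execution role needs rekognition:DetectLabels, rekognition:DetectModerationLabels, s3:GetObject, s3:PutObjectTagging and s3:DeleteObject on the source bucket, s3:PutObject on the quarantine bucket, and dynamodb:PutItem on the metadata table. S3 has no CopyObject action; a copy is authorised as GetObject on the source plus PutObject on the destination. Because the handler tags the object before copying it, the copy also needs s3:GetObjectTagging on the source and s3:PutObjectTagging on the quarantine bucket. Never grant broad rekognition:*; use least privilege.
Pricing, Limits, and Cost Optimisation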
Rekognition's standard image APIs are priced per image analysed, with tiered pricing that rewards scale. Video analysis is priced per minute of video processed. Custom Labels has an additional per-hour charge for running the model. Understanding the pricing structure and the service limits helps you architect a cost-efficient system from the start.
| API | Price (us-east-1, mid-2026) | Free Tier |
|---|---|---|
| detect_labels | $0.001 per image (first 1M/month); $0.0008 per image (next 9M) | 5,000 images/month for 12 months |
| detect_faces | $0.001 per image | 5,000 images/month for 12 months |
| search_faces_by_image | $0.001 per image | 5,000 images/month for 12 months |
| detect_moderation_labels | $0.001 per image | 5,000 images/month for 12 months |
| detect_text | $0.001 per image | 5,000 images/month for 12 months |
| detect_protective_equipment | $0.004 per image | Not included in free tier |
| Video label/face detection | $0.10 per minute of video | Not included in free tier |
| Custom Labels — inference | $4.00 per inference unit per hour | Not included in free tier |
| Kinesis Video streaming processor | $0.10 per minute of stream processed | Not included in free tier |
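To see how tiered pricing plays out at a given volume, the following sketch applies a list of tiers in order. The function name tiered_cost is hypothetical, and the tier values are simply copied from the detect_labels row of the table above, so treat them as assumptions and check the current AWS price list before budgeting.

```python
def tiered_cost(images, tiers):
    """Cost in USD for `images` calls, given [(tier_size, unit_price), ...] applied in order."""
    remaining = images
    total = 0.0
    for tier_size, unit_price in tiers:
        in_tier = min(remaining, tier_size)
        total += in_tier * unit_price
        remaining -= in_tier
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("volume exceeds the tiers supplied")
    return round(total, 2)


# Assumed tiers, taken from the detect_labels row of the pricing table
DETECT_LABELS_TIERS = [(1_000_000, 0.001), (9_000_000, 0.0008)]

# 3M images: 1M at $0.001 plus 2M at $0.0008
print(tiered_cost(3_000_000, DETECT_LABELS_TIERS))
```

For three million images in a month this comes to $1,000 for the first tier plus $1,600 for the second, $2,600 in total.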
Service Limits (Default, us-east-1)
| Operation / Resource | Default Limit | Max Input Size |
|---|---|---|
| detect_labels, detect_faces, detect_text | 50 TPS per account | 15 MB (S3), 5 MB (Bytes) |
| detect_moderation_labels | 50 TPS | 15 MB (S3), 5 MB (Bytes) |
| search_faces_by_image | 50 TPS | 15 MB (S3), 5 MB (Bytes) |
| index_faces | 10 TPS | 15 MB (S3), 5 MB (Bytes) |
| start_label_detection (video) | 20 concurrent jobs | 10 GB video file (S3) |
| Face collection size | 20 million faces per collection | — |
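When a burst of traffic exceeds the TPS limits above, Rekognition rejects calls with a throttling error. The sketch below is one way to retry such calls with exponential backoff and jitter. The helper call_with_backoff is hypothetical, and it assumes the exception carries the error code in the shape botocore's ClientError uses (exc.response["Error"]["Code"]). boto3 also has its own configurable retry behaviour, so an explicit wrapper like this is only useful when you want direct control over delays.

```python
import random
import time

# Error codes assumed to indicate throttling
THROTTLE_CODES = {"ThrottlingException", "ProvisionedThroughputExceededException"}


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on a throttling error, wait with exponential backoff plus jitter and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # botocore's ClientError exposes the code at exc.response["Error"]["Code"]
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            if code not in THROTTLE_CODES or attempt == max_attempts:
                raise  # not a throttle, or out of attempts
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))


# Usage with a real client (not executed here):
# labels = call_with_backoff(lambda: rekognition.detect_labels(
#     Image={"S3Object": {"Bucket": "my-bucket", "Name": "photo.jpg"}}))
```

The sleep parameter is injectable so the retry logic can be exercised in tests without actually waiting.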
Cost Optimisation Strategies
import io
import json
import time
from decimal import Decimal

import boto3
from PIL import Image

# ---- 1. Cache results in DynamoDB to avoid re-analysing identical images ----
dynamodb = boto3.resource("dynamodb")
cache_table = dynamodb.Table("rekognition-cache")
rekognition = boto3.client("rekognition")
s3 = boto3.client("s3")

CACHE_TTL_SECONDS = 30 * 24 * 3600  # 30 days


def detect_labels_cached(bucket, key):
    """Run detect_labels with DynamoDB caching keyed on the S3 ETag."""
    # The ETag changes whenever the object content changes. It is an MD5 digest only
    # for single-part uploads without KMS encryption, so treat it as an opaque marker.
    # A HEAD request is billed like a GET, far cheaper than a Rekognition call.
    head = s3.head_object(Bucket=bucket, Key=key)
    etag = head["ETag"].strip('"')

    # Check cache
    cached = cache_table.get_item(Key={"ETag": etag}).get("Item")
    if cached:
        print(f"Cache HIT for {key}")
        return cached["Labels"]

    # Cache miss: call Rekognition
    print(f"Cache MISS for {key}, calling Rekognition")
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=15,
        MinConfidence=70,
    )
    # The boto3 DynamoDB resource rejects floats, so convert them to Decimal.
    # Returning the converted labels keeps hits and misses the same type.
    labels = json.loads(json.dumps(response["Labels"]), parse_float=Decimal)

    # Store in cache (expires after 30 days if TTL is enabled on the "TTL" attribute)
    cache_table.put_item(Item={
        "ETag": etag,
        "Labels": labels,
        "TTL": int(time.time()) + CACHE_TTL_SECONDS,
    })
    return labels


# ---- 2. Resize images before sending as Bytes (smaller payloads, lower latency) ----
# Rekognition bills per image regardless of resolution, so the saving here is in
# transfer time and in staying under the 5 MB limit for inline Bytes.
def resize_for_rekognition(image_bytes, max_side=1024):
    """Downscale so the longer side is at most max_side pixels, then re-encode as JPEG."""
    img = Image.open(io.BytesIO(image_bytes))
    if img.mode != "RGB":
        img = img.convert("RGB")  # JPEG cannot store alpha or palette modes
    img.thumbnail((max_side, max_side), Image.LANCZOS)
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=85)
    return buf.getvalue()


# ---- 3. Filter by confidence early: don't store or act on low-confidence detections ----
def filter_high_confidence(labels, threshold=80):
    return [l for l in labels if l["Confidence"] >= threshold]


# ---- 4. Stop Custom Labels model outside business hours ----
# In EventBridge, schedule two rules (cron expressions are evaluated in UTC):
#   cron(0 8 * * ? *)  → Lambda: rekognition.start_project_version(...)
#   cron(0 20 * * ? *) → Lambda: rekognition.stop_project_version(...)
# The model is stopped for 12 hours a day, saving 12 × $4.00 = ~$48/day per inference unit.

# ---- 5. Use SQS + Lambda to stay within TPS limits ----
# Instead of calling Rekognition directly from your web servers, push image keys to an SQS queue.
# A Lambda consumer processes the queue at a controlled rate. If each invocation makes roughly
# one Rekognition call per second, reserved concurrency = 40 gives about 40 × 1 TPS = 40 TPS,
# safely under the 50 TPS limit. Faster invocations need a lower concurrency setting.
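The concurrency figure in strategy 5 depends on how long each invocation takes, so it is worth deriving it rather than guessing. The sketch below estimates a safe reserved concurrency from the TPS limit, the number of calls each invocation makes to a given API, and the average invocation duration. The function name and the 80% headroom default are assumptions for illustration.

```python
import math


def max_safe_concurrency(tps_limit, calls_per_invocation, avg_duration_s, headroom=0.8):
    """Largest concurrency whose steady-state call rate stays under headroom * tps_limit."""
    # Each busy worker issues calls_per_invocation / avg_duration_s calls per second.
    per_worker_tps = calls_per_invocation / avg_duration_s
    return max(1, math.floor(tps_limit * headroom / per_worker_tps))


# One call per invocation, 1 s per invocation: matches the 40-worker figure above
print(max_safe_concurrency(50, 1, 1.0))

# Faster invocations (0.4 s) hit the API more often, so fewer workers are safe
print(max_safe_concurrency(50, 1, 0.4))
```

With a 50 TPS limit and 80% headroom, one-second invocations allow 40 concurrent workers, while 0.4-second invocations allow only 16. Measure the real duration from the function's CloudWatch metrics before settling on a value.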
Frequently Asked Questions
What is the difference between Rekognition and Rekognition Custom Labels?
Rekognition's standard APIs use general-purpose models trained on diverse datasets and can detect thousands of common object categories, faces, text, and moderation content. Custom Labels lets you train a model specifically on your own images and labels — for example, "cracked circuit board" vs "intact circuit board" — which the general-purpose model cannot distinguish. Custom Labels requires data preparation, a training step (30–90 minutes), and a hosted inference unit, whereas standard APIs are available immediately per-call with no training.
Is Rekognition GDPR-compliant for face recognition?
AWS provides GDPR-compliant infrastructure (data processing agreements, data residency controls), but compliance for your application depends on your data practices. Under GDPR and similar regulations, face recognition data is biometric personal data requiring explicit legal basis and often explicit consent. You are responsible for obtaining consent, implementing the right to erasure (via delete_faces), minimising data retention, and ensuring data is processed only in permitted regions. Consult your legal team before deploying face recognition in the EU.
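As a rough illustration of the erasure step, the sketch below removes a person's faces from a collection with delete_faces. It assumes you already keep your own mapping from a user to the face IDs returned by index_faces. The helper name erase_faces and the batch size are hypothetical. The client is passed in so the function can be exercised without AWS credentials.

```python
def erase_faces(client, collection_id, face_ids, batch_size=1000):
    """Delete the given face IDs from a collection in batches; return the IDs confirmed deleted."""
    deleted = []
    for start in range(0, len(face_ids), batch_size):
        batch = face_ids[start:start + batch_size]
        response = client.delete_faces(CollectionId=collection_id, FaceIds=batch)
        # DeletedFaces lists the IDs Rekognition actually removed
        deleted.extend(response.get("DeletedFaces", []))
    return deleted


# Usage with a real client (not executed here; the IDs are placeholders):
# import boto3
# removed = erase_faces(boto3.client("rekognition"), "staff-faces", stored_face_ids)
# missing = set(stored_face_ids) - set(removed)  # investigate anything not confirmed
```

Deleting the face vectors does not remove the source images or any metadata you stored alongside them, so an erasure request usually also touches S3 and your own database.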
How accurate is Rekognition compared to open-source models?
For well-lit, frontal face images, Rekognition achieves >99% face verification accuracy. For general object detection it is competitive with state-of-the-art models. The real advantage of Rekognition over running your own models (e.g., YOLO, FaceNet, ResNet) is the operational simplicity: no GPU management, no model versioning headaches, no scaling infrastructure, and built-in SLA. For domain-specific tasks where accuracy is critical, Custom Labels may still underperform specialised open-source models — benchmark on your own dataset.
Can Rekognition process images that contain people without explicit consent?
detect_labels, detect_faces, and detect_moderation_labels analyse the visual content of images you own. They do not identify named individuals (that requires recognize_celebrities or a face collection). Processing images for moderation or tagging purposes on your own platform is generally permissible under most privacy frameworks, but check local regulations — some jurisdictions restrict automated processing of images containing people even without identification.
What video formats does Rekognition support?
For S3-based video analysis, Rekognition supports H.264-encoded MP4 and MOV files up to 10 GB in size and up to 6 hours in length. For Kinesis Video Streams, the producer SDK packages H.264 video into MKV fragments before sending it to the stream. Audio tracks are ignored. Maximum video resolution is 1920×1080. If your stored source video is in another format (H.265, AV1, MKV), transcode it with MediaConvert first.