AWS Rekognition: Image and Video Analysis with Deep Learning

AWS Rekognition is Amazon's fully managed computer vision service that makes deep learning-powered image and video analysis accessible without ML expertise. Behind the scenes, Rekognition runs large convolutional neural networks that Amazon trains and maintains, but you interact with it through simple API calls that return JSON. The service handles model hosting, scaling, GPU infrastructure, and updates automatically. Whether you need to detect objects in product photos, build a face recognition system for physical access control, moderate user-generated content at high volume, or analyse hours of surveillance video for safety equipment compliance, Rekognition provides purpose-built APIs for each scenario. This guide covers the major capabilities with Python boto3 code, sample output, CLI commands, and production architecture patterns.

Rekognition Capabilities Overview
Image Analysis: Labels, Faces, and Text Detection
Face Collections: Building a Face Recognition System
Content Moderation with Human Review (A2I)
Custom Labels: Training on Your Own Dataset
Async Video Analysis: Label and Face Detection
Real-Time Video Streams with Kinesis Video Streams
PPE Detection for Workplace Safety
Lambda + S3 Trigger: Auto-Moderate Uploads
Pricing, Limits, and Cost Optimisation
Frequently Asked Questions

Rekognition Capabilities Overview

Rekognition exposes eight distinct detection capabilities. Each is a separate API call returning a structured JSON response. They can be called independently or composed into pipelines — for example, run detect_labels first to check whether an image contains a person, then run detect_faces only if it does, halving your API costs for images of inanimate objects.
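The composition described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `analyse_people` and the injected `client` argument (any object exposing boto3's `detect_labels` and `detect_faces`) are hypothetical and chosen for illustration.

```python
def analyse_people(client, bucket, key, min_confidence=80):
    """Call detect_faces only when detect_labels reports a Person label.

    `client` is assumed to be a boto3 Rekognition client, for example
    boto3.client("rekognition"). The name analyse_people is hypothetical.
    """
    image = {"S3Object": {"Bucket": bucket, "Name": key}}

    labels = client.detect_labels(Image=image, MinConfidence=min_confidence)["Labels"]
    has_person = any(label["Name"] == "Person" for label in labels)
    if not has_person:
        # No second call is made, so there is no second charge for this image
        return {"labels": labels, "faces": []}

    faces = client.detect_faces(Image=image, Attributes=["DEFAULT"])["FaceDetails"]
    return {"labels": labels, "faces": faces}
```

Passing the client in as an argument keeps the function easy to exercise with a stub during testing, and lets one client be reused across calls.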

API / Feature	What It Returns	Common Use Case
`detect_labels`	Objects, scenes, activities with confidence scores and bounding boxes	Auto-tagging product images, content classification
`detect_faces`	Face bounding boxes, landmarks, emotions, age range, attributes	Demographics analytics, photo organisation
`search_faces_by_image`	Matching faces from a pre-built collection	Access control, duplicate account detection
`recognize_celebrities`	Named public figures with confidence and IMDB URL	Media tagging, social listening
`detect_text`	Text lines and words, bounding boxes, confidence	OCR on signs, forms, licence plates
`detect_moderation_labels`	Explicit/suggestive content categories with hierarchy	User-generated content platforms
`detect_protective_equipment`	Hard hat, mask, glove coverage per body part per person	Warehouse/construction site safety audits
Custom Labels	Domain-specific labels trained on your images	Manufacturing defect detection, medical imaging

Two Input Modes: Every image API accepts either an S3 object reference ({"S3Object": {"Bucket": "...", "Name": "..."}}) or a raw bytes payload ({"Bytes": b"..."}) up to 5 MB. For production pipelines, always use S3 references — they avoid base64 encoding overhead and support images up to 15 MB. The Bytes mode is convenient for real-time webcam frames or mobile uploads where the image has not yet been persisted.
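As a rough illustration of the two input modes, the helper below builds the `Image` argument and rejects byte payloads over the 5 MB limit. The helper name `build_image_param` and the constant `MAX_BYTES_PAYLOAD` are hypothetical, introduced only for this sketch.

```python
MAX_BYTES_PAYLOAD = 5 * 1024 * 1024  # 5 MB limit for the Bytes mode


def build_image_param(bucket=None, key=None, data=None):
    """Return the Image argument for either input mode (hypothetical helper)."""
    if bucket and key:
        # Preferred for production pipelines: reference the object in S3
        return {"S3Object": {"Bucket": bucket, "Name": key}}
    if data is not None:
        if len(data) > MAX_BYTES_PAYLOAD:
            raise ValueError("Image exceeds 5 MB; upload it to S3 and pass a reference")
        # boto3 accepts raw bytes and handles the wire encoding itself
        return {"Bytes": data}
    raise ValueError("Provide either bucket and key, or data")
```

The result can be passed directly as `Image=build_image_param(...)` to any of the image APIs.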

All image APIs are synchronous: the result comes back in the same HTTP call, typically in well under a few seconds. Video analysis uses an asynchronous pattern: you start a job, receive a JobId, and poll (or get notified via SNS) when results are ready. Real-time video analysis uses a streaming processor connected to a Kinesis Video Stream. The rest of this guide walks through each pattern in detail.

Image Analysis: Labels, Faces, and Text Detection

The three most widely used image APIs are detect_labels, detect_faces, and detect_text. Together they cover the majority of practical computer vision use cases.

detect_labels — Object and Scene Detection

detect_labels identifies thousands of objects, scenes, activities, and concepts. Each label comes with a confidence score (0–100), a parent hierarchy (so "Car" is a child of "Vehicle"), and optional bounding boxes for localised objects. The MaxLabels and MinConfidence parameters control response size and precision; billing is per image analysed, not per label returned.

import boto3
import json

rekognition = boto3.client("rekognition", region_name="us-east-1")

# --- Detect labels from an S3 image ---
response = rekognition.detect_labels(
    Image={
        "S3Object": {
            "Bucket": "my-content-bucket",
            "Name":   "uploads/user-photo.jpg",
        }
    },
    MaxLabels=20,        # Return at most 20 labels
    MinConfidence=75,    # Discard labels below 75% confidence
    Features=["GENERAL_LABELS", "IMAGE_PROPERTIES"],  # Also get dominant colours
)

# Parse labels
for label in response["Labels"]:
    name       = label["Name"]
    confidence = label["Confidence"]
    parents    = [p["Name"] for p in label.get("Parents", [])]
    boxes      = label.get("Instances", [])

    print(f"{name} ({confidence:.1f}%) — parents: {parents}")
    for box in boxes:
        b = box["BoundingBox"]
        print(f"  BoundingBox: left={b['Left']:.3f} top={b['Top']:.3f} "
              f"width={b['Width']:.3f} height={b['Height']:.3f} "
              f"confidence={box['Confidence']:.1f}%")

# Sample output:
# Person (99.2%) — parents: []
# Car (97.8%) — parents: ['Vehicle', 'Transportation']
#   BoundingBox: left=0.123 top=0.341 width=0.289 height=0.201 confidence=97.8%
# Road (95.4%) — parents: ['Infrastructure']

# Image properties (dominant colours)
if "ImageProperties" in response:
    for colour in response["ImageProperties"]["DominantColors"][:3]:
        print(f"Dominant colour: R={colour['Red']} G={colour['Green']} B={colour['Blue']} "
              f"({colour['PixelPercent']:.1f}%)")
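The BoundingBox values printed above are ratios of the image dimensions rather than pixels. To draw or crop a box, they need to be scaled by the image size. The helper below is a small sketch of that conversion; the name `to_pixel_box` is hypothetical, and the clamping is a defensive assumption rather than documented behaviour.

```python
def to_pixel_box(box, image_width, image_height):
    """Convert a ratio-based BoundingBox dict to (left, top, right, bottom) pixels."""
    def clamp(value, upper):
        # Keep coordinates inside the frame in case a box slightly overshoots it
        return max(0, min(upper, value))

    left = clamp(round(box["Left"] * image_width), image_width)
    top = clamp(round(box["Top"] * image_height), image_height)
    right = clamp(round((box["Left"] + box["Width"]) * image_width), image_width)
    bottom = clamp(round((box["Top"] + box["Height"]) * image_height), image_height)
    return (left, top, right, bottom)


# Example: a box covering the middle half of an 800x600 image
print(to_pixel_box({"Left": 0.25, "Top": 0.5, "Width": 0.5, "Height": 0.25}, 800, 600))
```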

detect_faces — Face Attribute Analysis

detect_faces locates every face in an image and returns detailed attributes for each: bounding box, 27 facial landmarks (eye corners, nose tip, mouth corners, etc.), estimated age range, detected emotions (with confidence scores per emotion), and boolean attributes for features like Smile, EyesOpen, MouthOpen, Sunglasses, and Beard. The Attributes parameter controls the level of detail returned.

import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

response = rekognition.detect_faces(
    Image={
        "S3Object": {"Bucket": "my-content-bucket", "Name": "team-photo.jpg"}
    },
    Attributes=["ALL"],   # ALL adds emotions, age range, gender, and other attributes
                          # DEFAULT returns bounding box, confidence, pose, quality, and landmarks
)

for i, face in enumerate(response["FaceDetails"]):
    box  = face["BoundingBox"]
    age  = face["AgeRange"]
    emotions = sorted(face["Emotions"], key=lambda e: e["Confidence"], reverse=True)

    print(f"\nFace {i+1}:")
    print(f"  Bounding box: L={box['Left']:.3f} T={box['Top']:.3f} "
          f"W={box['Width']:.3f} H={box['Height']:.3f}")
    print(f"  Age range: {age['Low']}–{age['High']} years")
    print(f"  Top emotion: {emotions[0]['Type']} ({emotions[0]['Confidence']:.1f}%)")
    print(f"  Smile: {face['Smile']['Value']} ({face['Smile']['Confidence']:.1f}%)")
    print(f"  Eyes open: {face['EyesOpen']['Value']}")
    print(f"  Image quality — Brightness={face['Quality']['Brightness']:.1f} "
          f"Sharpness={face['Quality']['Sharpness']:.1f}")

# Sample output:
# Face 1:
#   Bounding box: L=0.312 T=0.120 W=0.145 H=0.218
#   Age range: 28–38 years
#   Top emotion: HAPPY (98.3%)
#   Smile: True (97.1%)
#   Eyes open: True
#   Image quality — Brightness=72.4 Sharpness=89.1

detect_text — OCR on Images

detect_text runs OCR on images and returns both individual words (WORD type) and assembled lines (LINE type). It handles skewed, rotated, and stylised text — useful for reading licence plates, street signs, whiteboard photos, and screenshots. Each text detection includes a polygon (not just a bounding box) for accurate localisation of rotated text.

import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

response = rekognition.detect_text(
    Image={"S3Object": {"Bucket": "my-bucket", "Name": "road-sign.jpg"}},
    Filters={
        "WordFilter": {"MinConfidence": 80},    # Drop low-confidence words
        "RegionsOfInterest": [                  # Only scan the top half of the image
            {
                "BoundingBox": {
                    "Width": 1.0, "Height": 0.5,
                    "Left": 0.0, "Top": 0.0,
                }
            }
        ],
    },
)

lines = [t for t in response["TextDetections"] if t["Type"] == "LINE"]
words = [t for t in response["TextDetections"] if t["Type"] == "WORD"]

print(f"Found {len(lines)} lines, {len(words)} words")
for line in lines:
    print(f"  LINE: '{line['DetectedText']}' — confidence={line['Confidence']:.1f}%")
    poly = line["Geometry"]["Polygon"]
    print(f"  Polygon: {[(round(p['X'],3), round(p['Y'],3)) for p in poly]}")

# CLI equivalent — useful for quick testing:
# aws rekognition detect-text \
#   --image '{"S3Object":{"Bucket":"my-bucket","Name":"road-sign.jpg"}}' \
#   --region us-east-1
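Each WORD detection carries a ParentId that points at the Id of the LINE it belongs to, which makes it possible to regroup words under their lines. The sketch below shows one way to do that on the `TextDetections` list; the function name `words_by_line` is hypothetical.

```python
def words_by_line(text_detections):
    """Return [(line_text, [word, ...]), ...] using the WORD -> LINE ParentId link."""
    lines = {t["Id"]: t["DetectedText"] for t in text_detections if t["Type"] == "LINE"}
    grouped = {line_id: [] for line_id in lines}

    for t in text_detections:
        if t["Type"] == "WORD" and t.get("ParentId") in grouped:
            grouped[t["ParentId"]].append(t["DetectedText"])

    # Dicts preserve insertion order, so lines come back in response order
    return [(lines[line_id], words) for line_id, words in grouped.items()]
```

In use, `words_by_line(response["TextDetections"])` yields one tuple per detected line, which is convenient when a later step needs word-level confidence or geometry per line.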

Face Collections: Building a Face Recognition System

Face search works differently from the other APIs. Instead of detecting faces in a vacuum, you build a face collection — a searchable index of known faces stored server-side by Rekognition. When a new image arrives, you call search_faces_by_image to find the closest matching face in the collection within milliseconds. This pattern powers access control systems, duplicate account detection, and employee attendance tracking.

The workflow has three phases: (1) create and populate the collection with create_collection and index_faces, (2) search the collection with search_faces_by_image or search_faces, and (3) manage the collection with list_faces and delete_faces. Face vectors are stored durably by Rekognition, so you do not manage the embeddings yourself.

import boto3
import json

rekognition = boto3.client("rekognition", region_name="us-east-1")
dynamodb     = boto3.resource("dynamodb",  region_name="us-east-1")

COLLECTION_ID = "office-employees"
METADATA_TABLE = "face-metadata"   # DynamoDB table: FaceId (PK) → employee info

# ---- Step 1: Create the collection (one-time setup) ----
try:
    rekognition.create_collection(CollectionId=COLLECTION_ID)
    print(f"Collection '{COLLECTION_ID}' created")
except rekognition.exceptions.ResourceAlreadyExistsException:
    print(f"Collection '{COLLECTION_ID}' already exists")

# ---- Step 2: Index known employee photos ----
def index_employee(s3_bucket, s3_key, employee_id, employee_name):
    """Add a person's face to the collection and store metadata in DynamoDB."""
    response = rekognition.index_faces(
        CollectionId=COLLECTION_ID,
        Image={"S3Object": {"Bucket": s3_bucket, "Name": s3_key}},
        ExternalImageId=employee_id,     # Your own identifier — returned in search results
        DetectionAttributes=["DEFAULT"],
        MaxFaces=1,                       # Only index the largest face in the photo
        QualityFilter="AUTO",            # Skip blurry or occluded faces
    )

    for face_record in response["FaceRecords"]:
        face_id = face_record["Face"]["FaceId"]
        confidence = face_record["Face"]["Confidence"]
        print(f"Indexed: FaceId={face_id} Employee={employee_name} Confidence={confidence:.1f}%")

        # Store the FaceId → employee mapping in DynamoDB
        table = dynamodb.Table(METADATA_TABLE)
        table.put_item(Item={
            "FaceId":       face_id,
            "EmployeeId":   employee_id,
            "EmployeeName": employee_name,
            "S3Key":        s3_key,
        })

    # Report unindexed faces (bad quality, multiple faces, etc.)
    for unindexed in response.get("UnindexedFaces", []):
        print(f"  Skipped face — reasons: {unindexed['Reasons']}")

# Index a batch of employees
employees = [
    ("hr-photos", "alice-smith.jpg",    "EMP001", "Alice Smith"),
    ("hr-photos", "bob-jones.jpg",      "EMP002", "Bob Jones"),
    ("hr-photos", "carol-white.jpg",    "EMP003", "Carol White"),
]
for bucket, key, emp_id, name in employees:
    index_employee(bucket, key, emp_id, name)

# ---- Step 3: Search for a face (e.g., at a door camera) ----
def identify_person(image_bytes):
    """Return employee info for the person in image_bytes, or None if unknown."""
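    try:
        response = rekognition.search_faces_by_image(
            CollectionId=COLLECTION_ID,
            Image={"Bytes": image_bytes},
            MaxFaces=1,
            FaceMatchThreshold=90.0,   # Only accept matches at or above 90% similarity
        )
    except rekognition.exceptions.InvalidParameterException:
        # Rekognition raises this when it finds no face in the input image
        return None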

    matches = response.get("FaceMatches", [])
    if not matches:
        return None

    best_match = matches[0]
    face_id    = best_match["Face"]["FaceId"]
    similarity = best_match["Similarity"]

    # Retrieve employee metadata from DynamoDB
    table = dynamodb.Table(METADATA_TABLE)
    item  = table.get_item(Key={"FaceId": face_id}).get("Item")
    if item:
        return {
            "employee_id":   item["EmployeeId"],
            "employee_name": item["EmployeeName"],
            "similarity":    round(similarity, 2),
        }
    return None

# Usage:
with open("door-camera-frame.jpg", "rb") as f:
    result = identify_person(f.read())

if result:
    print(f"Access granted: {result['employee_name']} ({result['similarity']}% match)")
else:
    print("Unknown person — access denied")

# ---- List all indexed faces ----
paginator = rekognition.get_paginator("list_faces")
total = 0
for page in paginator.paginate(CollectionId=COLLECTION_ID, MaxResults=100):
    total += len(page["Faces"])
print(f"Total indexed faces: {total}")

# CLI: describe a collection
# aws rekognition describe-collection --collection-id office-employees

Privacy and Legal Compliance: Face recognition systems are subject to biometric data laws in many jurisdictions (Illinois BIPA, GDPR, Brazil LGPD). Always obtain explicit informed consent before indexing an individual's face, provide a mechanism to delete their data (delete_faces), and store metadata with minimal retention periods. Tag your DynamoDB records with consent timestamps and review dates.
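A deletion mechanism of the kind mentioned above could look roughly like this. It is a sketch under assumptions: the function name `forget_employee` is hypothetical, `client` is assumed to be a boto3 Rekognition client, and `faces` is assumed to be the list of Face dicts collected from list_faces pages.

```python
def forget_employee(client, collection_id, faces, employee_id):
    """Delete every indexed face whose ExternalImageId matches employee_id.

    Returns the FaceIds that Rekognition reports as deleted.
    """
    face_ids = [f["FaceId"] for f in faces if f.get("ExternalImageId") == employee_id]
    if not face_ids:
        return []  # Nothing indexed for this person

    response = client.delete_faces(CollectionId=collection_id, FaceIds=face_ids)
    return response.get("DeletedFaces", [])
```

The matching DynamoDB metadata rows would need to be removed in the same workflow so that no FaceId-to-employee mapping outlives the face vectors.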

Content Moderation with Human Review (A2I)

Content moderation is one of the highest-value use cases for Rekognition. detect_moderation_labels returns a two-level hierarchy of unsafe content categories — for example, "Explicit Nudity" → "Graphic Male Nudity" — each with a confidence score. In a production moderation pipeline you typically auto-block content above a high threshold, auto-allow content below a low threshold, and route the middle band to human reviewers using Amazon Augmented AI (A2I).

import boto3
import json

rekognition  = boto3.client("rekognition",   region_name="us-east-1")
a2i_runtime  = boto3.client("sagemaker-a2i-runtime", region_name="us-east-1")
s3           = boto3.client("s3",            region_name="us-east-1")

# ---- Step 1: Detect moderation labels ----
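def moderate_image(bucket, key):
    # A2I human loop names allow only lowercase letters, digits, and hyphens
    # (max 63 characters), so sanitise the S3 key before using it.
    safe_key  = "".join(c if c.isascii() and c.isalnum() else "-" for c in key.lower())
    loop_name = f"review-{safe_key}"[:63].strip("-")

    response = rekognition.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=50,   # Catch anything above 50% for the full picture
        HumanLoopConfig={
            "HumanLoopName":       loop_name,
            "FlowDefinitionArn":   "arn:aws:sagemaker:us-east-1:123456789012:flow-definition/content-moderation-flow",
            "DataAttributes":      {"ContentClassifiers": ["FreeOfPersonallyIdentifiableInformation"]},
        },
    )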

    labels = response.get("ModerationLabels", [])
    if not labels:
        return {"action": "allow", "labels": []}

    # Build a summary
    summary = []
    for label in labels:
        summary.append({
            "name":        label["Name"],
            "parent":      label.get("ParentName", ""),
            "confidence":  round(label["Confidence"], 2),
        })

    # Decision thresholds
    max_confidence = max(l["Confidence"] for l in labels)
    if max_confidence >= 95:
        action = "block"
    elif max_confidence >= 60:
        action = "human_review"
    else:
        action = "allow"

    return {"action": action, "max_confidence": max_confidence, "labels": summary}

result = moderate_image("user-uploads", "avatars/suspect-image.jpg")
print(json.dumps(result, indent=2))

# Sample output:
# {
#   "action": "human_review",
#   "max_confidence": 78.34,
#   "labels": [
#     {"name": "Suggestive", "parent": "", "confidence": 78.34},
#     {"name": "Female Swimwear Or Underwear", "parent": "Suggestive", "confidence": 78.34}
#   ]
# }

# ---- Step 2: Process A2I human review results via EventBridge ----
# When a reviewer completes the task, A2I sends a completion event.
# The Lambda below receives it and applies the reviewer decision.

LAMBDA_HANDLER = '''
import boto3, json

s3 = boto3.client("s3")

def handler(event, context):
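    detail = event["detail"]
    status = detail["humanLoopStatus"]  # "Completed", "Failed", or "Stopped"
    if status != "Completed":
        # No reviewer output exists unless the loop completed
        return {"statusCode": 200}
    output = detail["humanLoopOutput"]["outputS3Uri"]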

    # Fetch the reviewer decision from S3
    bucket, key = output.replace("s3://", "").split("/", 1)
    obj  = s3.get_object(Bucket=bucket, Key=key)
    body = json.loads(obj["Body"].read())

    # body["humanAnswers"][0]["answerContent"]["category"]["label"] == "safe" or "unsafe"
    label     = body["humanAnswers"][0]["answerContent"]["category"]["label"]
    image_key = body["inputContent"]["taskObject"]

    if label == "unsafe":
        print(f"Reviewer marked {image_key} as UNSAFE — deleting")
        # s3.delete_object(Bucket="user-uploads", Key=image_key)
    else:
        print(f"Reviewer marked {image_key} as SAFE — approved")
    return {"statusCode": 200}
'''
print("Lambda handler source is stored in LAMBDA_HANDLER; deploy it to process A2I review events")

# CLI: check moderation labels
# aws rekognition detect-moderation-labels \
#   --image '{"S3Object":{"Bucket":"user-uploads","Name":"avatars/test.jpg"}}' \
#   --min-confidence 60

Moderation Label Taxonomy: The moderation taxonomy described here has two levels. Top-level categories include Explicit Nudity, Suggestive, Violence, Visually Disturbing, Rude Gestures, Drugs, Tobacco, Alcohol, Gambling, and Hate Symbols. Each has several child labels. You can filter on top-level categories alone for a broad policy or target specific child labels for fine-grained rules. The ParentName field in the response tells you which category each child label belongs to. Category names and hierarchy depth can change between moderation model versions, so check the ModerationModelVersion field in the response and the current label list before hard-coding names.
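Filtering on top-level categories can be sketched as below. The function name `blocked_labels` and the default threshold are hypothetical; the sketch treats a label as a hit when either its own Name or its ParentName is in the blocked set.

```python
def blocked_labels(moderation_labels, blocked_categories, min_confidence=60.0):
    """Return the names of labels that fall under a blocked top-level category."""
    hits = []
    for label in moderation_labels:
        if label["Confidence"] < min_confidence:
            continue
        # Top-level labels have an empty ParentName, so check both fields
        if label["Name"] in blocked_categories or label.get("ParentName") in blocked_categories:
            hits.append(label["Name"])
    return hits
```

A broad policy would pass something like `{"Explicit Nudity", "Violence"}` as `blocked_categories`, while a fine-grained policy could list individual child label names instead.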

Custom Labels: Training on Your Own Dataset

Rekognition Custom Labels lets you train a model on your own image dataset and get the same simple API experience for domain-specific labels that the built-in models cannot detect — manufacturing defects, retail product types, crop disease stages, medical imaging anomalies, or any other specialised visual category. You need as few as 10 images per label to start, though 50–100 per label produces much better accuracy.

Preparing Your Dataset

Custom Labels uses Amazon Rekognition datasets backed by manifest files in S3. Each line of the manifest is a JSON object referencing an S3 image and its ground-truth labels. You can prepare the manifest manually, generate it from a CSV, or use the Rekognition console's built-in labelling tool.

import json
import boto3

# --- Build a manifest file for image classification ---
# Each line: one JSON object per image

manifest_lines = []

image_labels = [
    ("s3://my-dataset/defects/crack_001.jpg",   "crack"),
    ("s3://my-dataset/defects/crack_002.jpg",   "crack"),
    ("s3://my-dataset/defects/scratch_001.jpg", "scratch"),
    ("s3://my-dataset/defects/scratch_002.jpg", "scratch"),
    ("s3://my-dataset/ok/good_001.jpg",         "no_defect"),
    ("s3://my-dataset/ok/good_002.jpg",         "no_defect"),
]

for s3_uri, label in image_labels:
    # Image-level labels follow the SageMaker Ground Truth classification format:
    # a label attribute plus a matching "<attribute>-metadata" object.
    # The attribute name "defect" is an arbitrary choice for this example.
    manifest_lines.append(json.dumps({
        "source-ref": s3_uri,
        "defect": 1,
        "defect-metadata": {
            "confidence":      1,
            "job-name":        "labeling-job/defect-classification",
            "class-name":      label,
            "human-annotated": "yes",
            "creation-date":   "2026-06-08T00:00:00",
            "type":            "groundtruth/image-classification",
        },
    }))

manifest_content = "\n".join(manifest_lines)
s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-dataset",
    Key="manifests/train.manifest",
    Body=manifest_content.encode("utf-8"),
)
print("Manifest uploaded")

# --- Create dataset and start training via SDK ---
rekognition = boto3.client("rekognition", region_name="us-east-1")

# Create project
project = rekognition.create_project(ProjectName="defect-detection")
project_arn = project["ProjectArn"]
print(f"Project ARN: {project_arn}")

# Create dataset from manifest
dataset = rekognition.create_dataset(
    DatasetType="TRAIN",
    ProjectArn=project_arn,
    DatasetSource={
        "GroundTruthManifest": {
            "S3Object": {
                "Bucket": "my-dataset",
                "Name":   "manifests/train.manifest",
            }
        }
    },
)
train_dataset_arn = dataset["DatasetArn"]

# Train the model. TrainingData and TestingData point directly at manifest files;
# manifests/test.manifest is assumed to exist (building it is omitted for brevity).
version = rekognition.create_project_version(
    ProjectArn=project_arn,
    VersionName="v1-2026-06-08",
    OutputConfig={
        "S3Bucket":    "my-dataset",
        "S3KeyPrefix": "model-output/",
    },
    TrainingData={"Assets": [{"GroundTruthManifest": {
        "S3Object": {"Bucket": "my-dataset", "Name": "manifests/train.manifest"}
    }}]},
    TestingData={"Assets": [{"GroundTruthManifest": {
        "S3Object": {"Bucket": "my-dataset", "Name": "manifests/test.manifest"}
    }}]},
)
model_arn = version["ProjectVersionArn"]
print(f"Training started — model ARN: {model_arn}")

# Wait for training to complete (can take 30–90 minutes)
waiter = rekognition.get_waiter("project_version_training_completed")
waiter.wait(
    ProjectArn=project_arn,
    VersionNames=["v1-2026-06-08"],
    WaiterConfig={"Delay": 60, "MaxAttempts": 120},
)

# Check metrics
desc = rekognition.describe_project_versions(
    ProjectArn=project_arn,
    VersionNames=["v1-2026-06-08"],
)
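metrics = desc["ProjectVersionDescriptions"][0]["EvaluationResult"]
print(f"F1 score: {metrics['F1Score']:.3f}")
# Precision and recall are in the evaluation summary file that Rekognition writes to S3
summary_obj = metrics["Summary"]["S3Object"]
print(f"Evaluation summary: s3://{summary_obj['Bucket']}/{summary_obj['Name']}")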

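# ---- Start the model (hosted on dedicated inference units) ----
rekognition.start_project_version(
    ProjectVersionArn=model_arn,
    MinInferenceUnits=1,   # 1 unit = ~5 TPS; scale up as needed
)
# Starting takes several minutes; wait until the model is RUNNING before calling it
rekognition.get_waiter("project_version_running").wait(
    ProjectArn=project_arn,
    VersionNames=["v1-2026-06-08"],
)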

# ---- Run inference ----
result = rekognition.detect_custom_labels(
    ProjectVersionArn=model_arn,
    Image={"S3Object": {"Bucket": "production-images", "Name": "conveyor/part-0042.jpg"}},
    MinConfidence=70,
)

for label in result["CustomLabels"]:
    print(f"{label['Name']}: {label['Confidence']:.1f}%")
    if "Geometry" in label:
        box = label["Geometry"]["BoundingBox"]
        print(f"  Box: L={box['Left']:.3f} T={box['Top']:.3f} W={box['Width']:.3f} H={box['Height']:.3f}")

# ---- Stop the model when not needed (avoid idle charges) ----
rekognition.stop_project_version(ProjectVersionArn=model_arn)

Custom Labels Inference Units: Unlike the standard Rekognition APIs (which are purely pay-per-call), Custom Labels models run on dedicated inference units billed per hour ($4/hour per unit in us-east-1 as of mid-2026). Always call stop_project_version when the model is not in use. For batch workloads, schedule start/stop with Lambda and EventBridge to minimise idle cost.
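The effect of stopping the model can be estimated with simple arithmetic. The sketch below uses the $4 per unit-hour rate quoted above; the 30-day month and the 2-hour daily batch window are illustrative assumptions, and `inference_cost` is a hypothetical helper name.

```python
def inference_cost(hours_running, units=1, rate_per_unit_hour=4.0):
    """Hosting cost in USD for a Custom Labels model (rate taken from the text above)."""
    return hours_running * units * rate_per_unit_hour


always_on = inference_cost(24 * 30)   # model left running all month
scheduled = inference_cost(2 * 30)    # model started for a 2-hour batch each day

print(f"Always on: ${always_on:,.2f} per month")
print(f"Scheduled: ${scheduled:,.2f} per month")
```

Under these assumptions the always-on model costs $2,880.00 per month against $240.00 for the scheduled one, which is why the start/stop schedule matters for batch workloads.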

Async Video Analysis: Label and Face Detection

For video files stored in S3, Rekognition uses an asynchronous job pattern. You start a job with start_label_detection (or start_face_detection, start_content_moderation, etc.), receive a JobId immediately, and poll get_label_detection until the job status is SUCCEEDED. For production pipelines, use SNS notification instead of polling — Rekognition publishes to your SNS topic when the job completes, which triggers a Lambda to retrieve results.

import boto3
import time
import json

rekognition = boto3.client("rekognition", region_name="us-east-1")
sns_topic_arn = "arn:aws:sns:us-east-1:123456789012:rekognition-job-complete"
iam_role_arn  = "arn:aws:iam::123456789012:role/RekognitionSNSPublishRole"

VIDEO_BUCKET = "my-video-bucket"
VIDEO_KEY    = "footage/factory-floor-2026-06-08.mp4"

# ---- Start a label detection job ----
response = rekognition.start_label_detection(
    Video={
        "S3Object": {
            "Bucket": VIDEO_BUCKET,
            "Name":   VIDEO_KEY,
        }
    },
    MinConfidence=70,
    NotificationChannel={
        "SNSTopicArn": sns_topic_arn,
        "RoleArn":     iam_role_arn,
    },
    JobTag="factory-safety-check",
    Features=["GENERAL_LABELS"],
    Settings={
        "GeneralLabels": {
            "LabelInclusionFilters": ["Person", "Helmet", "Vehicle", "Forklift"],
        }
    },
)
job_id = response["JobId"]
print(f"Started label detection job: {job_id}")

# ---- Polling approach (for development / small jobs) ----
def wait_for_job(job_id, max_polls=60, poll_interval=10):
    for i in range(max_polls):
        result = rekognition.get_label_detection(
            JobId=job_id,
            SortBy="TIMESTAMP",
            AggregateBy="TIMESTAMPS",
        )
        status = result["JobStatus"]
        print(f"  Poll {i+1}: status={status}")

        if status == "SUCCEEDED":
            return result
        elif status == "FAILED":
            raise RuntimeError(f"Job failed: {result.get('StatusMessage')}")

        time.sleep(poll_interval)
    raise TimeoutError("Job did not complete in time")

# result = wait_for_job(job_id)  # Uncomment to poll (dev only)

# ---- SNS-triggered Lambda handler (production) ----
LAMBDA_HANDLER = '''
import boto3, json

rekognition = boto3.client("rekognition", region_name="us-east-1")

def handler(event, context):
    # SNS wraps the message
    for record in event["Records"]:
        message = json.loads(record["Sns"]["Message"])
        job_id  = message["JobId"]
        status  = message["Status"]

        if status != "SUCCEEDED":
            print(f"Job {job_id} ended with status {status}")
            continue  # Skip this record but keep processing the rest

        # Paginate through all results
        all_labels = []
        kwargs = {"JobId": job_id, "SortBy": "TIMESTAMP", "MaxResults": 1000}
        while True:
            page = rekognition.get_label_detection(**kwargs)
            all_labels.extend(page["Labels"])
            next_token = page.get("NextToken")
            if not next_token:
                break
            kwargs["NextToken"] = next_token

        print(f"Job {job_id}: {len(all_labels)} label detections")

        # Find timestamps where a Person was detected without a Helmet
        person_times  = set()
        helmet_times  = set()
        for entry in all_labels:
            ts    = entry["Timestamp"]   # milliseconds from start of video
            name  = entry["Label"]["Name"]
            if name == "Person":
                person_times.add(ts)
            elif name == "Helmet":
                helmet_times.add(ts)

        unsafe_times = person_times - helmet_times
        if unsafe_times:
            print(f"Safety violation detected at {len(unsafe_times)} timestamps!")
            # Trigger alert, save report to S3, etc.
'''
print("Production Lambda handler source is stored in LAMBDA_HANDLER; wire it to the SNS topic for job completion events")

# ---- Face detection in video ----
face_job = rekognition.start_face_detection(
    Video={"S3Object": {"Bucket": VIDEO_BUCKET, "Name": VIDEO_KEY}},
    NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": iam_role_arn},
    FaceAttributes="ALL",
)
print(f"Started face detection job: {face_job['JobId']}")

# ---- Content moderation in video ----
mod_job = rekognition.start_content_moderation(
    Video={"S3Object": {"Bucket": VIDEO_BUCKET, "Name": VIDEO_KEY}},
    MinConfidence=60,
    NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": iam_role_arn},
)
print(f"Started content moderation job: {mod_job['JobId']}")
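The handler above produces a set of individual millisecond timestamps where a Person appears without a Helmet. For a report, it is often more readable to merge nearby timestamps into intervals. The sketch below is one way to do that; the names `to_intervals` and `fmt`, and the 1-second gap threshold, are hypothetical choices rather than anything Rekognition prescribes.

```python
def to_intervals(timestamps_ms, max_gap_ms=1000):
    """Merge millisecond timestamps into (start, end) intervals.

    Two timestamps fall in the same interval when they are at most
    max_gap_ms apart.
    """
    intervals = []
    for ts in sorted(timestamps_ms):
        if intervals and ts - intervals[-1][1] <= max_gap_ms:
            intervals[-1][1] = ts          # extend the current interval
        else:
            intervals.append([ts, ts])     # start a new interval
    return [tuple(i) for i in intervals]


def fmt(ms):
    """Format milliseconds from the start of the video as MM:SS.mmm."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


for start, end in to_intervals({0, 500, 1000, 65000, 65500}):
    print(f"Unsafe from {fmt(start)} to {fmt(end)}")
```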

Real-Time Video Streams with Kinesis Video Streams

For live camera feeds — security cameras, entry-point cameras, live broadcast streams — you connect a Kinesis Video Stream to a Rekognition Streaming Processor. Rekognition reads frames from the stream continuously and publishes detection events (faces matched against a collection, or connected home labels) to a Kinesis Data Stream in near real time. Latency from camera to detection event is typically under 2 seconds.

import boto3

rekognition  = boto3.client("rekognition",  region_name="us-east-1")
kinesis      = boto3.client("kinesis",      region_name="us-east-1")

COLLECTION_ID      = "office-employees"
KVS_STREAM_ARN     = "arn:aws:kinesisvideo:us-east-1:123456789012:stream/lobby-camera/0123456789"
KINESIS_DATA_ARN   = "arn:aws:kinesis:us-east-1:123456789012:stream/rekognition-events"
IAM_ROLE_ARN       = "arn:aws:iam::123456789012:role/RekognitionKinesisRole"
PROCESSOR_NAME     = "lobby-face-search"

# ---- Create a streaming processor ----
response = rekognition.create_stream_processor(
    Name=PROCESSOR_NAME,
    Input={
        "KinesisVideoStream": {
            "Arn": KVS_STREAM_ARN,
        }
    },
    Output={
        "KinesisDataStream": {
            "Arn": KINESIS_DATA_ARN,
        }
    },
    RoleArn=IAM_ROLE_ARN,
    Settings={
        "FaceSearch": {
            "CollectionId":         COLLECTION_ID,
            "FaceMatchThreshold":   85.0,    # Min similarity to report a match
        }
    },
    NotificationChannel={
        "SNSTopicArn": "arn:aws:sns:us-east-1:123456789012:processor-alerts"
    },
)
processor_arn = response["StreamProcessorArn"]
print(f"Stream processor ARN: {processor_arn}")

# ---- Start the processor ----
rekognition.start_stream_processor(Name=PROCESSOR_NAME)
print(f"Processor '{PROCESSOR_NAME}' started — streaming face search is live")

# ---- Consume face match events from Kinesis Data Stream ----
import json, time

shard_iterator = kinesis.get_shard_iterator(
    StreamName="rekognition-events",
    ShardId="shardId-000000000000",
    ShardIteratorType="LATEST",
)["ShardIterator"]

print("Listening for face match events...")
for _ in range(30):   # Read for 30 iterations (demo)
    records_response = kinesis.get_records(ShardIterator=shard_iterator, Limit=100)
    shard_iterator   = records_response["NextShardIterator"]

    for record in records_response["Records"]:
        # boto3 returns Data as already-decoded bytes, so no base64 step is needed
        payload = json.loads(record["Data"])
        for match_event in payload.get("FaceSearchResponse", []):
            detected = match_event["DetectedFace"]
            matches  = match_event.get("MatchedFaces", [])

            print(f"Face detected — confidence={detected['Confidence']:.1f}%")
            for m in matches:
                print(f"  Matched: FaceId={m['Face']['FaceId']} "
                      f"ExternalId={m['Face']['ExternalImageId']} "
                      f"Similarity={m['Similarity']:.1f}%")

    time.sleep(1)

# ---- Stop and delete when no longer needed ----
rekognition.stop_stream_processor(Name=PROCESSOR_NAME)
# rekognition.delete_stream_processor(Name=PROCESSOR_NAME)
print("Processor stopped")

Kinesis Video Streams Producer SDK: To push a live camera feed into KVS, use the open-source Kinesis Video Streams C/C++ producer SDK or its GStreamer plugin. For IP cameras that support RTSP with H.264 video, use the kvssink GStreamer plugin: gst-launch-1.0 rtspsrc location=rtsp://camera-ip/stream short-header=TRUE ! rtph264depay ! h264parse ! kvssink stream-name=lobby-camera storage-size=512. The producer SDK handles MKV fragment packaging, TLS, and retry logic automatically.

PPE Detection for Workplace Safety

Rekognition's detect_protective_equipment API analyses an image and, for each detected person, determines whether they are wearing protective equipment on specific body parts: head (hard hat), face (face cover/mask), left hand (glove), and right hand (glove). The response tells you both the type of PPE detected and whether it is covering the relevant body part — a hard hat resting on a table is detected but not covering. This distinction is critical for compliance checking.

import boto3
import json

rekognition = boto3.client("rekognition", region_name="us-east-1")

def check_ppe_compliance(bucket, key, required_ppe=None):
    """
    Analyse an image for PPE compliance.
    required_ppe: list of required equipment types, e.g. ["FACE_COVER", "HEAD_COVER"]
    Returns a list of violation dicts, one per non-compliant person.
    """
    if required_ppe is None:
        required_ppe = ["HEAD_COVER", "FACE_COVER"]

    response = rekognition.detect_protective_equipment(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        SummarizationAttributes={
            "MinConfidence":       80,             # Threshold for the Summary lists only; Persons[] is not filtered
            "RequiredEquipmentTypes": required_ppe,
        },
    )

    violations = []

    for person in response["Persons"]:
        person_id   = person["Id"]
        person_conf = person["Confidence"]
        body_parts  = person.get("BodyParts", [])

        # Map body part → PPE found
        ppe_by_part = {}
        for bp in body_parts:
            part_name = bp["Name"]   # HEAD, FACE, LEFT_HAND, RIGHT_HAND
            for eq in bp.get("EquipmentDetections", []):
                ppe_by_part[part_name] = {
                    "type":       eq["Type"],
                    "confidence": eq["Confidence"],
                    "covers":     eq["CoversBodyPart"]["Value"],
                }

        # Check each required type
        person_violations = []
        if "HEAD_COVER" in required_ppe:
            head_ppe = ppe_by_part.get("HEAD")
            if not head_ppe or not head_ppe["covers"]:
                person_violations.append("Missing or improperly worn HEAD_COVER (hard hat)")

        if "FACE_COVER" in required_ppe:
            face_ppe = ppe_by_part.get("FACE")
            if not face_ppe or not face_ppe["covers"]:
                person_violations.append("Missing or improperly worn FACE_COVER (mask)")

        if "HAND_COVER" in required_ppe:
            lh = ppe_by_part.get("LEFT_HAND")
            rh = ppe_by_part.get("RIGHT_HAND")
            if not (lh and lh["covers"]) or not (rh and rh["covers"]):
                person_violations.append("Missing gloves on one or both hands")

        if person_violations:
            box = person["BoundingBox"]
            violations.append({
                "person_id":   person_id,
                "confidence":  round(person_conf, 2),
                "bounding_box": box,
                "violations":  person_violations,
            })

    # Rekognition also provides a pre-computed summary
    summary = response.get("Summary", {})
    persons_with_req  = summary.get("PersonsWithRequiredEquipment",    [])
    persons_without   = summary.get("PersonsWithoutRequiredEquipment", [])
    persons_indeterminate = summary.get("PersonsIndeterminate",        [])

    print(f"Compliant persons: {len(persons_with_req)}")
    print(f"Non-compliant persons: {len(persons_without)}")
    print(f"Indeterminate: {len(persons_indeterminate)}")

    return violations

# Run compliance check on a batch of images

images = [
    ("safety-images", "site/morning-shift-001.jpg"),
    ("safety-images", "site/morning-shift-002.jpg"),
    ("safety-images", "site/morning-shift-003.jpg"),
]

for bucket, key in images:
    violations = check_ppe_compliance(bucket, key, required_ppe=["HEAD_COVER", "FACE_COVER"])
    if violations:
        print(f"\n[VIOLATION] {key}: {len(violations)} person(s) non-compliant")
        for v in violations:
            print(f"  Person {v['person_id']}: {'; '.join(v['violations'])}")
    else:
        print(f"\n[OK] {key}: All persons compliant")

# CLI quick check:
# aws rekognition detect-protective-equipment \
#   --image '{"S3Object":{"Bucket":"safety-images","Name":"site/morning-shift-001.jpg"}}' \
#   --summarization-attributes '{"MinConfidence":80,"RequiredEquipmentTypes":["HEAD_COVER","FACE_COVER"]}'
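
The "detected but not covering" distinction described above can be checked offline against a saved response. The sketch below is illustrative only: SAMPLE_RESPONSE is a hand-written, trimmed-down dictionary that follows the documented response shape (it is not real API output), and covered_equipment and missing_equipment are hypothetical helper names. Unlike the function above, it also applies a confidence threshold to each detection.

```python
def covered_equipment(person, min_confidence=80.0):
    """Return the PPE types that are detected AND judged to cover their body part."""
    covered = set()
    for body_part in person.get("BodyParts", []):
        for eq in body_part.get("EquipmentDetections", []):
            covers = eq["CoversBodyPart"]
            if (eq["Confidence"] >= min_confidence
                    and covers["Value"]
                    and covers["Confidence"] >= min_confidence):
                covered.add(eq["Type"])
    return covered


def missing_equipment(response, required, min_confidence=80.0):
    """Map person Id -> sorted list of required PPE types not properly worn."""
    result = {}
    for person in response.get("Persons", []):
        missing = set(required) - covered_equipment(person, min_confidence)
        if missing:
            result[person["Id"]] = sorted(missing)
    return result


# Hand-written sample (assumption): person 0 wears a hard hat but has the mask
# pulled down; person 1 wears both correctly.
SAMPLE_RESPONSE = {
    "Persons": [
        {"Id": 0, "BodyParts": [
            {"Name": "HEAD", "EquipmentDetections": [
                {"Type": "HEAD_COVER", "Confidence": 97.0,
                 "CoversBodyPart": {"Value": True, "Confidence": 95.0}}]},
            {"Name": "FACE", "EquipmentDetections": [
                {"Type": "FACE_COVER", "Confidence": 93.0,
                 "CoversBodyPart": {"Value": False, "Confidence": 91.0}}]},
        ]},
        {"Id": 1, "BodyParts": [
            {"Name": "HEAD", "EquipmentDetections": [
                {"Type": "HEAD_COVER", "Confidence": 99.0,
                 "CoversBodyPart": {"Value": True, "Confidence": 98.0}}]},
            {"Name": "FACE", "EquipmentDetections": [
                {"Type": "FACE_COVER", "Confidence": 96.0,
                 "CoversBodyPart": {"Value": True, "Confidence": 94.0}}]},
        ]},
    ]
}

print(missing_equipment(SAMPLE_RESPONSE, ["HEAD_COVER", "FACE_COVER"]))
```

Because this sketch treats equipment types as a set, a single glove would count as HAND_COVER; keep the per-hand check from the function above if both hands must be covered.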

Lambda + S3 Trigger: Auto-Moderate Uploads

The most common production pattern for Rekognition is a serverless pipeline triggered by S3 uploads. When a user uploads an image, S3 notifies Lambda, which runs moderation and label detection, tags the S3 object with results, and optionally writes a record to DynamoDB for downstream use. This entire pipeline costs fractions of a cent per image and scales to millions of images per day with no infrastructure management.

import boto3
import os
import re
from datetime import datetime, timezone
from decimal import Decimal
from urllib.parse import unquote_plus

# Environment variables set on the Lambda function:
# METADATA_TABLE    - DynamoDB table for image records
# QUARANTINE_BUCKET - S3 bucket for blocked images
# MOD_THRESHOLD     - float confidence threshold for auto-block

rekognition = boto3.client("rekognition")
s3          = boto3.client("s3")
dynamodb    = boto3.resource("dynamodb")

METADATA_TABLE    = os.environ.get("METADATA_TABLE",    "image-metadata")
QUARANTINE_BUCKET = os.environ.get("QUARANTINE_BUCKET", "quarantine-uploads")
MOD_THRESHOLD     = float(os.environ.get("MOD_THRESHOLD", "90"))


def to_decimal(value, places=2):
    """The boto3 DynamoDB resource rejects Python floats, so store numbers as Decimal."""
    return Decimal(str(round(value, places)))


def handler(event, context):
    """
    Triggered by S3 PutObject event.
    1. Detects moderation labels
    2. Detects object labels
    3. Tags the S3 object
    4. Moves to quarantine if unsafe
    5. Stores metadata in DynamoDB
    """
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 event notifications are URL-encoded (a space arrives as "+")
        key    = unquote_plus(record["s3"]["object"]["key"])
        size   = record["s3"]["object"].get("size", 0)

        print(f"Processing: s3://{bucket}/{key} ({size} bytes)")

        # ----- 1. Content moderation -----
        mod_response = rekognition.detect_moderation_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MinConfidence=50,
        )
        mod_labels = mod_response.get("ModerationLabels", [])
        max_mod_confidence = max((l["Confidence"] for l in mod_labels), default=0)
        is_unsafe = max_mod_confidence >= MOD_THRESHOLD

        # ----- 2. Object labels -----
        label_response = rekognition.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MaxLabels=15,
            MinConfidence=70,
        )
        detected_labels = [
            {"name": l["Name"], "confidence": to_decimal(l["Confidence"])}
            for l in label_response["Labels"]
        ]
        label_names = [l["name"] for l in detected_labels]

        # ----- 3. Tag the S3 object -----
        # S3 tag values allow only letters, digits, spaces and + - = . _ : / @
        # (commas are rejected), so join with ":" and strip anything else.
        top_labels = ":".join(re.sub(r"[^\w .+=:/@-]", "", n) for n in label_names[:5])
        tags = {
            "moderation-status":    "unsafe" if is_unsafe else "safe",
            "mod-max-confidence":   str(round(max_mod_confidence, 1)),
            "top-labels":           top_labels[:256],
            "processed-by":         "rekognition-lambda",
        }
        s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]},
        )

        # ----- 4. Quarantine unsafe images -----
        if is_unsafe:
            quarantine_key = f"quarantine/{key}"
            s3.copy_object(
                CopySource={"Bucket": bucket, "Key": key},
                Bucket=QUARANTINE_BUCKET,
                Key=quarantine_key,
            )
            s3.delete_object(Bucket=bucket, Key=key)
            print(f"QUARANTINED: {key} - max mod confidence={max_mod_confidence:.1f}%")

        # ----- 5. Write metadata to DynamoDB -----
        table = dynamodb.Table(METADATA_TABLE)
        table.put_item(Item={
            "ImageKey":         key,
            "Bucket":           bucket,
            "Status":           "quarantined" if is_unsafe else "approved",
            "ModerationLabels": [
                {"name": l["Name"], "parent": l.get("ParentName", ""),
                 "confidence": to_decimal(l["Confidence"])}
                for l in mod_labels
            ],
            "DetectedLabels":   detected_labels,
            "MaxModConfidence": to_decimal(max_mod_confidence),
            "FileSize":         size,
            "ProcessedAt":      datetime.now(timezone.utc).isoformat(),
            "RequestId":        context.aws_request_id,
        })

        print(f"Done: {key} → {'UNSAFE' if is_unsafe else 'SAFE'} "
              f"| Labels: {label_names[:3]}")

    return {"statusCode": 200, "body": "OK"}


# ---- Deployment (AWS CLI) ----
# Create the Lambda function, allow S3 to invoke it, and wire the S3 trigger:
#
# aws lambda create-function \
#   --function-name image-moderation-pipeline \
#   --runtime python3.12 \
#   --role arn:aws:iam::123456789012:role/LambdaRekognitionRole \
#   --handler lambda_function.handler \
#   --zip-file fileb://function.zip \
#   --timeout 30 \
#   --environment "Variables={METADATA_TABLE=image-metadata,QUARANTINE_BUCKET=quarantine-uploads,MOD_THRESHOLD=90}"
#
# S3 must be granted permission to invoke the function before the
# notification configuration is accepted:
#
# aws lambda add-permission \
#   --function-name image-moderation-pipeline \
#   --statement-id s3-invoke \
#   --action lambda:InvokeFunction \
#   --principal s3.amazonaws.com \
#   --source-arn arn:aws:s3:::user-uploads \
#   --source-account 123456789012
#
# aws s3api put-bucket-notification-configuration \
#   --bucket user-uploads \
#   --notification-configuration '{
#     "LambdaFunctionConfigurations": [{
#       "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:image-moderation-pipeline",
#       "Events": ["s3:ObjectCreated:*"],
#       "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}}
#     }]
#   }'

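IAM Role for Lambda: The Lambda execution role needs rekognition:DetectLabels and rekognition:DetectModerationLabels; s3:GetObject, s3:GetObjectTagging, s3:PutObjectTagging, and s3:DeleteObject on the source bucket; s3:PutObject and s3:PutObjectTagging on the quarantine bucket; dynamodb:PutItem on the metadata table; and the usual CloudWatch Logs permissions. There is no s3:CopyObject action in IAM: a copy is authorised by s3:GetObject on the source and s3:PutObject on the destination. Never grant broad rekognition:*; use least privilege.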
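
As a rough illustration, the permissions for this pipeline could be written as an IAM policy document like the one generated below. This is a sketch under assumptions: build_lambda_policy is a hypothetical helper, the bucket names and table ARN are the placeholders used elsewhere in this guide, and the copy to quarantine is expressed as s3:GetObject on the source plus s3:PutObject on the destination, since IAM defines no separate copy action. CloudWatch Logs permissions are left out for brevity.

```python
import json


def build_lambda_policy(source_bucket, quarantine_bucket, table_arn):
    """Return a least-privilege IAM policy document (as a dict) for the pipeline."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # These Rekognition image actions are granted on all resources
                "Sid": "RekognitionAnalysis",
                "Effect": "Allow",
                "Action": ["rekognition:DetectLabels",
                           "rekognition:DetectModerationLabels"],
                "Resource": "*",
            },
            {   # Read, tag, and (after quarantine) delete uploaded objects
                "Sid": "SourceObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectTagging",
                           "s3:PutObjectTagging", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{source_bucket}/*",
            },
            {   # Destination side of the copy into quarantine
                "Sid": "QuarantineWrites",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:PutObjectTagging"],
                "Resource": f"arn:aws:s3:::{quarantine_bucket}/*",
            },
            {
                "Sid": "MetadataWrites",
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem"],
                "Resource": table_arn,
            },
        ],
    }


policy = build_lambda_policy(
    "user-uploads",
    "quarantine-uploads",
    "arn:aws:dynamodb:us-east-1:123456789012:table/image-metadata",
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and attached to the role as an inline policy.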

Pricing, Limits, and Cost Optimisation

Rekognition's standard image APIs are priced per image analysed, with tiered pricing that rewards scale. Video analysis is priced per minute of video processed. Custom Labels has an additional per-hour charge for running the model. Understanding the pricing structure and the service limits helps you architect a cost-efficient system from the start.

API	Price (us-east-1, mid-2026)	Free Tier
detect_labels	$0.001 per image (first 1M/month); $0.0008 per image (next 9M)	5,000 images/month for 12 months
detect_faces	$0.001 per image	5,000 images/month for 12 months
search_faces_by_image	$0.001 per image	5,000 images/month for 12 months
detect_moderation_labels	$0.001 per image	5,000 images/month for 12 months
detect_text	$0.001 per image	5,000 images/month for 12 months
detect_protective_equipment	$0.004 per image	Not included in free tier
Video label/face detection	$0.10 per minute of video	Not included in free tier
Custom Labels — inference	$4.00 per inference unit per hour	Not included in free tier
Kinesis Video streaming processor	$0.10 per minute of stream processed	Not included in free tier
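
To see how the tiers combine, here is a small estimator that uses only the detect_labels figures from the table above. It is a sketch: estimate_monthly_cost is a hypothetical helper, the way the free tier is subtracted is an assumption, and volumes beyond the 10M images covered by the table raise an error rather than guess at a price.

```python
from decimal import Decimal

# (images in tier, USD per image), copied from the detect_labels row above
TIERS = [
    (1_000_000, Decimal("0.001")),
    (9_000_000, Decimal("0.0008")),
]
FREE_IMAGES_PER_MONTH = 5_000  # free-tier allowance from the table


def estimate_monthly_cost(images, free_tier=False):
    """Estimate the monthly detect_labels bill in USD for a given image count."""
    if images < 0:
        raise ValueError("images must be non-negative")
    # Assumption: free-tier images are simply deducted before tiering
    remaining = max(0, images - FREE_IMAGES_PER_MONTH) if free_tier else images
    cost = Decimal("0")
    for tier_size, price in TIERS:
        in_tier = min(remaining, tier_size)
        cost += in_tier * price
        remaining -= in_tier
    if remaining > 0:
        raise ValueError("volume exceeds the tiers listed in the table")
    return cost


for volume in (50_000, 1_000_000, 3_000_000):
    print(f"{volume:>9,} images/month: ${estimate_monthly_cost(volume):,.2f}")
```

Decimal is used instead of float so that per-image prices such as 0.0008 add up exactly.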

Service Limits (Default, us-east-1)

Operation	Default TPS Limit	Max Image Size
detect_labels, detect_faces, detect_text	50 TPS per account	15 MB (S3), 5 MB (Bytes)
detect_moderation_labels	50 TPS	15 MB (S3), 5 MB (Bytes)
search_faces_by_image	50 TPS	15 MB (S3), 5 MB (Bytes)
index_faces	10 TPS	15 MB (S3), 5 MB (Bytes)
start_label_detection (video)	20 concurrent jobs	10 GB video file (S3)
Face collection size	20 million faces per collection	—
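
When a burst of requests exceeds the TPS limit, Rekognition rejects calls with a throttling error. One common way to cope is exponential backoff with jitter, sketched below. The helper names (error_code, call_with_backoff) are hypothetical, and the code reads the error code from a botocore-style `response` attribute on the exception. botocore also has built-in retry modes that may be sufficient on their own.

```python
import random
import time

# Error codes Rekognition uses for rate limiting
THROTTLE_CODES = {"ThrottlingException", "ProvisionedThroughputExceededException"}


def error_code(exc):
    """Extract the AWS error code from a botocore-style exception, if present."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def call_with_backoff(fn, max_attempts=5, base_delay=0.2, max_delay=5.0,
                      sleep=time.sleep):
    """Call fn(), retrying throttling errors with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if error_code(exc) not in THROTTLE_CODES or last_attempt:
                raise  # Not a throttle, or out of retries: surface the error
            # Full jitter: wait a random time up to the capped exponential delay
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
```

A call would then be wrapped as call_with_backoff(lambda: rekognition.detect_labels(Image=...)), where rekognition is the boto3 client created earlier.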

Cost Optimisation Strategies

import json
import time
from decimal import Decimal

import boto3

# ---- 1. Cache results in DynamoDB to avoid re-analysing identical images ----
dynamodb = boto3.resource("dynamodb")
cache_table = dynamodb.Table("rekognition-cache")
rekognition = boto3.client("rekognition")
s3 = boto3.client("s3")

def detect_labels_cached(bucket, key):
    """Run detect_labels with DynamoDB caching keyed on the S3 ETag."""
    # HeadObject is billed as an ordinary S3 request, far cheaper than a Rekognition call.
    # The ETag is the MD5 of the content only for single-part uploads without SSE-KMS;
    # otherwise identical files may get different ETags and simply miss the cache.
    head = s3.head_object(Bucket=bucket, Key=key)
    etag = head["ETag"].strip('"')

    # Check cache
    cached = cache_table.get_item(Key={"ETag": etag}).get("Item")
    if cached:
        print(f"Cache HIT for {key}")
        return cached["Labels"]

    # Cache miss: call Rekognition
    print(f"Cache MISS for {key}: calling Rekognition")
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=15,
        MinConfidence=70,
    )
    # The boto3 DynamoDB resource rejects Python floats, so convert them to Decimal
    labels = json.loads(json.dumps(response["Labels"]), parse_float=Decimal)

    # Store in cache (expires after 30 days if TTL is enabled on the "TTL" attribute)
    cache_table.put_item(Item={
        "ETag":   etag,
        "Labels": labels,
        "TTL":    int(time.time()) + 30 * 24 * 3600,
    })
    return labels

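# ---- 2. Resize images before sending them as Bytes (stay under 5 MB, cut upload latency) ----
# Rekognition bills per image regardless of resolution, so this saves transfer time, not API cost.
from PIL import Image
import io

def resize_for_rekognition(image_bytes, max_side=1024):
    """Resize image so the longer side is at most max_side pixels (usually sufficient for detection)."""
    img = Image.open(io.BytesIO(image_bytes))
    if img.mode not in ("RGB", "L"):
        img = img.convert("RGB")   # JPEG cannot store alpha or palette modes such as RGBA or P
    img.thumbnail((max_side, max_side), Image.LANCZOS)
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=85)
    return buf.getvalue()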

# ---- 3. Filter by confidence early — don't store or act on low-confidence detections ----
def filter_high_confidence(labels, threshold=80):
    return [l for l in labels if l["Confidence"] >= threshold]

# ---- 4. Stop Custom Labels model outside business hours ----
# In EventBridge, schedule two rules:
# cron(0 8 * * ? *)  → Lambda: rekognition.start_project_version(...)
# cron(0 20 * * ? *) → Lambda: rekognition.stop_project_version(...)
# This saves up to 12 hours of idle charges per day = ~$48/day per inference unit.

# ---- 5. Use SQS + Lambda to stay within TPS limits ----
# Instead of calling Rekognition directly from your web servers, push image keys to an SQS queue.
# A Lambda consumer drains the queue with capped concurrency. Throughput is roughly
# concurrency / seconds-per-call, so reserved concurrency = 40 stays under the 50 TPS limit
# only if each invocation spends about 1 second or more per image. If calls return faster,
# lower the concurrency or throttle inside the consumer.
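
The relationship between Lambda concurrency and request rate in strategy 5 is simple arithmetic: each worker issues about one call per call-duration, so the request rate is concurrency divided by seconds per call. A minimal sketch follows, with max_safe_concurrency as a hypothetical helper and the 80% headroom figure as an assumption; the call duration should come from your own measurements.

```python
import math


def max_safe_concurrency(tps_limit, seconds_per_call, headroom_pct=80):
    """Largest worker count whose combined call rate stays within a share of the TPS limit.

    Each worker issues roughly 1 / seconds_per_call requests per second, so
    workers <= tps_limit * (headroom_pct / 100) * seconds_per_call.
    """
    if tps_limit <= 0 or seconds_per_call <= 0 or not 0 < headroom_pct <= 100:
        raise ValueError("arguments must be positive and headroom_pct within (0, 100]")
    return max(1, math.floor(tps_limit * headroom_pct / 100 * seconds_per_call))


# With a 50 TPS limit: 1-second invocations allow 40 workers,
# while 250 ms invocations allow only 10.
print(max_safe_concurrency(50, 1.0))
print(max_safe_concurrency(50, 0.25))
```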

Confidence Threshold Calibration: The right confidence threshold depends on your use case. For content moderation on a public platform, err towards lower thresholds (60–70%) and rely on human review to handle uncertain cases — the cost of showing harmful content exceeds the cost of over-blocking. For label-based product tagging, use higher thresholds (85–95%) to ensure tag quality. Always test thresholds on a labelled holdout set before setting them in production.
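
Testing thresholds on a holdout set can be as simple as sweeping candidate values and measuring precision and recall at each one. The sketch below assumes you have already collected (confidence, is_truly_unsafe) pairs from manually reviewed images. HOLDOUT is made-up illustrative data, and pick_threshold is a hypothetical helper that returns the strictest threshold still meeting a recall target, which suits moderation where missed harmful content is the costlier error.

```python
def precision_recall(scored, threshold):
    """scored: list of (confidence, is_truly_unsafe). Flag items at or above threshold."""
    tp = sum(1 for conf, unsafe in scored if conf >= threshold and unsafe)
    fp = sum(1 for conf, unsafe in scored if conf >= threshold and not unsafe)
    fn = sum(1 for conf, unsafe in scored if conf < threshold and unsafe)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def pick_threshold(scored, candidates, min_recall):
    """Return the highest candidate threshold whose recall meets min_recall, else None."""
    for threshold in sorted(candidates, reverse=True):
        _, recall = precision_recall(scored, threshold)
        if recall >= min_recall:
            return threshold
    return None


# Made-up holdout data: (max moderation confidence, reviewer says unsafe?)
HOLDOUT = [(95, True), (88, True), (72, True), (65, False),
           (61, True), (55, False), (40, False)]

for t in (60, 70, 80, 90):
    p, r = precision_recall(HOLDOUT, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")

print("Chosen:", pick_threshold(HOLDOUT, [60, 70, 80, 90], min_recall=0.9))
```

For product tagging the same sweep applies with the roles reversed: pick the lowest threshold that still meets a precision target.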

Frequently Asked Questions

What is the difference between Rekognition and Rekognition Custom Labels?

Rekognition's standard APIs use general-purpose models trained on diverse datasets and can detect thousands of common object categories, faces, text, and moderation content. Custom Labels lets you train a model on your own images and labels, for example "cracked circuit board" vs "intact circuit board", a distinction the general-purpose model cannot make. Custom Labels requires data preparation, a training step (typically 30 minutes to 24 hours, depending on dataset size), and a hosted inference unit, whereas standard APIs are available immediately per call with no training.

Is Rekognition GDPR-compliant for face recognition?

AWS provides GDPR-compliant infrastructure (data processing agreements, data residency controls), but compliance for your application depends on your data practices. Under GDPR and similar regulations, face recognition data is biometric personal data requiring explicit legal basis and often explicit consent. You are responsible for obtaining consent, implementing the right to erasure (via delete_faces), minimising data retention, and ensuring data is processed only in permitted regions. Consult your legal team before deploying face recognition in the EU.
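
A right-to-erasure request could be handled along the lines below. This is a sketch with assumptions: it presumes ExternalImageId was set to a per-person identifier when faces were indexed, erase_person is a hypothetical helper, and the client is passed in so the function can be exercised without AWS access. It relies only on the list_faces and delete_faces operations.

```python
def erase_person(client, collection_id, external_image_id, batch_size=1000):
    """Delete every face in a collection whose ExternalImageId matches one person.

    Returns the list of FaceIds that Rekognition reports as deleted.
    """
    # Page through the collection and collect matching FaceIds
    face_ids = []
    kwargs = {"CollectionId": collection_id, "MaxResults": 1000}
    while True:
        page = client.list_faces(**kwargs)
        face_ids += [
            face["FaceId"]
            for face in page.get("Faces", [])
            if face.get("ExternalImageId") == external_image_id
        ]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    # Delete in batches to stay within the per-call FaceIds limit
    deleted = []
    for start in range(0, len(face_ids), batch_size):
        response = client.delete_faces(
            CollectionId=collection_id,
            FaceIds=face_ids[start:start + batch_size],
        )
        deleted += response.get("DeletedFaces", [])
    return deleted
```

In production this would be called with boto3.client("rekognition") and the collection used in the face recognition section. Remember that erasure also has to cover any source images and metadata stored outside Rekognition.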

How accurate is Rekognition compared to open-source models?

For well-lit, frontal face images, Rekognition achieves >99% face verification accuracy. For general object detection it is competitive with state-of-the-art models. The real advantage of Rekognition over running your own models (e.g., YOLO, FaceNet, ResNet) is the operational simplicity: no GPU management, no model versioning headaches, no scaling infrastructure, and built-in SLA. For domain-specific tasks where accuracy is critical, Custom Labels may still underperform specialised open-source models — benchmark on your own dataset.

Can Rekognition process images that contain people without explicit consent?

detect_labels, detect_faces, and detect_moderation_labels analyse the visual content of images you own. They do not identify named individuals (that requires recognize_celebrities or a face collection). Processing images for moderation or tagging purposes on your own platform is generally permissible under most privacy frameworks, but check local regulations — some jurisdictions restrict automated processing of images containing people even without identification.

What video formats does Rekognition support?

For S3-based video analysis, Rekognition supports H.264-encoded MP4 and MOV files up to 10 GB in size and up to 6 hours in length. For Kinesis Video Streams, the producer SDK packages H.264 video into MKV fragments. Audio tracks are ignored. Maximum video resolution is 1920×1080. If your source video is in another format (H.265, AV1, MKV), transcode it with MediaConvert first.