AWS Lambda Power Tuning: Optimize Memory, Cost and Performance

June 9, 2026

Table of Contents

  1. Lambda Pricing Model and the Memory Paradox
  2. AWS Lambda Power Tuning Tool
  3. Running Power Tuning: Step-by-Step
  4. Memory Sweet Spots and Benchmark Results
  5. Manual Benchmarking with Python and boto3
  6. Cold Start Impact by Memory Setting
  7. Architecture Patterns: CPU-Bound vs I/O-Bound
  8. Graviton2 (arm64) + Power Tuning: Double Savings
  9. Continuous Optimization Pipeline
  10. Real Case Studies

Most Lambda teams pick a memory setting once during development and never revisit it. That single decision determines both performance and cost — yet it is almost always wrong. Set memory too low and your function runs slowly, paying for more wall-clock time. Set it too high and you pay for CPU you do not use. The sweet spot is rarely at the default 128 MB or at an intuitive round number.

AWS Lambda power tuning is the discipline of systematically finding that sweet spot using real invocations, real payloads, and statistical analysis. This guide covers the full toolkit: the open-source aws-lambda-power-tuning Step Functions state machine, manual boto3 benchmarking scripts, cold start tradeoffs, Graviton2 arm64 considerations, and how to wire it all into a continuous optimization pipeline so your functions stay tuned as code changes.

1. Lambda Pricing Model and the Memory Paradox

Understanding Lambda billing is essential before tuning. Lambda charges on two axes: the number of requests (always $0.20 per million) and duration, measured in GB-seconds:

Cost = requests × $0.0000002
     + (memory_GB × duration_seconds) × $0.0000166667

The critical insight is that GB-seconds is what you pay per unit of compute — not raw milliseconds. If doubling memory from 512 MB to 1024 MB halves the execution duration (because your function is CPU-bound and now has more vCPU), you pay the same GB-seconds but get twice the throughput. If the duration drops by more than half, you actually pay less.
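
As a minimal sketch of the formula above, the following Python computes the per-invocation cost and shows the break-even case. The prices are the x86 figures quoted in this section; the function name `invocation_cost` is illustrative and not part of any AWS library.

```python
REQUEST_PRICE_USD = 0.0000002        # $0.20 per million requests
GB_SECOND_PRICE_USD = 0.0000166667   # x86 duration price quoted above


def invocation_cost(memory_mb: int, billed_ms: float) -> float:
    """Cost of one invocation: request fee plus GB-seconds consumed."""
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return REQUEST_PRICE_USD + gb_seconds * GB_SECOND_PRICE_USD


# Doubling memory while halving duration leaves the cost unchanged...
same = (invocation_cost(512, 800), invocation_cost(1024, 400))
# ...and anything better than halving makes the larger setting cheaper.
cheaper = invocation_cost(1024, 350)
```

Comparing `same[0]`, `same[1]`, and `cheaper` reproduces the argument in the paragraph: equal GB-seconds give equal cost, and a more-than-proportional speedup lowers it.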

vCPU Allocation Curve

Lambda does not let you choose CPU directly. It allocates CPU power in proportion to configured memory, so the memory setting is the only CPU control you have:

Memory (MB)   vCPU Equivalent   Notes
128           ~0.07             Barely a fraction of a core
256           ~0.13             Still heavily throttled
512           ~0.27             Good for lightweight I/O
1024          ~0.53             Half a vCPU; JS/Python sweet spot often here
1769          1.00              Full vCPU; magic threshold for CPU-bound
3008          ~1.70             Good for parallel CPU tasks
10240         ~6.00             Maximum: 6 vCPUs for ML inference

The 1769 MB mark is the point at which a function receives the equivalent of one full vCPU; below it, the function gets only a proportional fraction of a vCPU's time. CPU-bound functions (image processing, JSON serialization of large payloads, regex, compression) almost always benefit from being at or above 1769 MB.
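
For a quick estimate at memory sizes not listed in the table, a first-order sketch is to scale linearly from the 1769 MB point. This assumes strictly proportional allocation capped at 6 vCPUs; AWS does not publish exact per-setting figures, so the result will not match the rounded table values exactly. The function name is illustrative.

```python
FULL_VCPU_MB = 1769   # memory at which a function gets one full vCPU
MAX_VCPUS = 6         # ceiling reached at the 10,240 MB maximum


def estimated_vcpus(memory_mb: int) -> float:
    """Rough vCPU share for a memory setting, assuming linear scaling."""
    return min(memory_mb / FULL_VCPU_MB, MAX_VCPUS)
```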

The Memory Paradox: A CPU-bound Lambda at 512 MB that takes 800 ms costs 0.5 GB × 0.8 s × $0.0000166667 = $0.00000667. The same function at 1769 MB (1769 / 1024 ≈ 1.728 GB) taking 220 ms costs 1.728 GB × 0.22 s × $0.0000166667 ≈ $0.00000634. More memory, less cost, far better latency. This counter-intuitive result is why power tuning exists.

Free Tier and Rounding

Lambda bills duration in 1-ms increments (since December 2020, when the old 100-ms rounding was dropped). The first 400,000 GB-seconds per month are free. Power tuning experiments typically consume less than 100 GB-seconds total, comfortably within the free tier even in a production account.

2. AWS Lambda Power Tuning Tool

The aws-lambda-power-tuning project by Alex Casalboni is an open-source AWS Step Functions state machine that automatically invokes your Lambda function at multiple memory configurations, collects timing data, and outputs a visualization URL showing cost and performance at each setting. It is the standard tool for this job across the AWS community.

How It Works Internally

The state machine runs four main phases:

  1. Initializer: reads the input configuration, validates the power values list, and prepares a temporary version and alias of your function for each memory size.
  2. Executor (parallel): for each memory size in your list, invokes the function N times (configurable) and extracts the billed duration and init duration from the invocation logs.
  3. Cleaner: deletes the temporary versions and aliases, leaving your original configuration untouched.
  4. Analyzer: computes average cost and duration for each power level, identifies the optimum, and generates a visualization URL for the visualizer at lambda-power-tuning.show.

When autoOptimize is true, a final Optimizer step applies the chosen memory size to your function.

Deploying via SAR (Serverless Application Repository)

The fastest deployment path is through the AWS Serverless Application Repository. No code to write:

# Option 1: AWS console
# Visit: https://serverlessrepo.aws.amazon.com/applications/arn:aws:serverlessrepo:us-east-1:451282441545:applications~aws-lambda-power-tuning
# Click Deploy, accept defaults

# Option 2: AWS SAM CLI, from a clone of the project repository
# (sam deploy takes a local template, not a template URL)
git clone https://github.com/alexcasalboni/aws-lambda-power-tuning.git
cd aws-lambda-power-tuning
sam build
sam deploy --guided --stack-name lambda-power-tuning --region us-east-1

# Option 3: CloudFormation directly
# template.yaml must already be packaged; the SAM transform
# requires CAPABILITY_AUTO_EXPAND in addition to CAPABILITY_IAM
aws cloudformation create-stack \
  --stack-name lambda-power-tuning \
  --template-body file://template.yaml \
  --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND \
  --region us-east-1

The SAR application creates a Step Functions state machine named powerTuningStateMachine and the necessary IAM role. The IAM role needs lambda:InvokeFunction and lambda:UpdateFunctionConfiguration on any function you want to tune — add a resource-based policy or expand the managed role's resource list if you get access denied errors.

Required IAM permissions for the power tuning role: lambda:InvokeFunction, lambda:UpdateFunctionConfiguration, lambda:GetFunctionConfiguration, lambda:PublishVersion, logs:FilterLogEvents. The SAR deployment creates this role automatically but you may need to extend it for cross-account or restricted environments.

3. Running Power Tuning: Step-by-Step

Once the state machine is deployed, you invoke it with a JSON payload describing which function to tune and how. Here is a complete input JSON:

{
  "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-image-resizer",
  "powerValues": [128, 256, 512, 1024, 1536, 1769, 2048, 3008],
  "num": 10,
  "payload": {
    "bucket": "my-test-bucket",
    "key": "sample-image-5mb.jpg",
    "width": 800,
    "height": 600
  },
  "parallelInvocation": true,
  "strategy": "balanced",
  "balancedWeight": 0.5,
  "autoOptimize": false,
  "autoOptimizeAlias": "live"
}

Key fields explained:

  • powerValues: the list of memory sizes (MB) to test. Always include 1769, the full-vCPU threshold.
  • num: number of invocations per memory size. Use 10–20 for stable averages; 5 is the minimum.
  • payload: a representative real-world event. Use a payload that exercises your actual code paths, not a toy "hello world" event.
  • parallelInvocation: true invokes the N runs concurrently (faster but triggers scaling); false runs them sequentially (slower but avoids noise from concurrent init).
  • strategy: "speed" picks the lowest duration; "cost" picks the lowest cost; "balanced" blends both, weighted by balancedWeight (in the tool, 0.0 is equivalent to "speed" and 1.0 is equivalent to "cost").
  • autoOptimize: if true, the tool automatically updates your function's memory to the optimum. Use with care in production; leave it false for review-first workflows.

Starting the Execution via CLI

aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine \
  --name my-run-001 \
  --input file://power-tuning-input.json \
  --region us-east-1

# Poll for completion (the execution ARN ends with the --name given above)
aws stepfunctions describe-execution \
  --execution-arn arn:aws:states:us-east-1:123456789012:execution:powerTuningStateMachine:my-run-001 \
  --region us-east-1 \
  --query "status"
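
The same start-and-poll flow can be scripted. The sketch below is one possible wrapper, not part of the power tuning project: `run_power_tuning` is a hypothetical helper name, and it takes the Step Functions client as a parameter so it can be exercised without AWS access. It relies only on the boto3 `start_execution` and `describe_execution` calls.

```python
import json
import time


def run_power_tuning(sfn_client, state_machine_arn: str, tuning_input: dict,
                     poll_seconds: float = 5.0) -> dict:
    """Start one tuning execution and block until it reaches a final state."""
    started = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(tuning_input),
    )
    while True:
        execution = sfn_client.describe_execution(
            executionArn=started["executionArn"]
        )
        if execution["status"] != "RUNNING":
            break
        time.sleep(poll_seconds)
    if execution["status"] != "SUCCEEDED":
        raise RuntimeError(f"Tuning run ended with status {execution['status']}")
    # The execution output is returned as a JSON string
    return json.loads(execution["output"])
```

In a real account you would pass `boto3.client("stepfunctions")` as `sfn_client` and the dictionary form of the input JSON shown earlier as `tuning_input`.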

Interpreting the Visualization Output

The Analyzer step outputs a URL like https://lambda-power-tuning.show/#... that renders two line charts side-by-side: one for average execution duration (ms) vs memory, one for average cost (USD per invocation) vs memory. The optimal points are marked with a star on each chart. Look for:

  • The "knee" of the duration curve — where adding more memory produces diminishing returns. This is usually your balanced optimum.
  • The cost minimum — often shifts left (lower memory) compared to the speed minimum, because cost = GB-seconds not just time.
  • Flat regions — for I/O-bound functions the duration curve often flattens after 512 MB or 1024 MB, meaning all higher settings deliver identical performance at higher cost.
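
If you prefer to locate the knee numerically rather than by eye, one simple heuristic is to walk up the tested sizes and stop where the next step improves duration by less than a chosen threshold. This is a sketch under that assumption; the 10% default and the name `find_knee` are arbitrary choices, and because the tested sizes are unevenly spaced the result is only a starting point for review.

```python
def find_knee(avg_duration_ms: dict, min_gain: float = 0.10) -> int:
    """Return the smallest memory size after which the next step up
    improves average duration by less than min_gain."""
    sizes = sorted(avg_duration_ms)
    for current, larger in zip(sizes, sizes[1:]):
        gain = 1 - avg_duration_ms[larger] / avg_duration_ms[current]
        if gain < min_gain:
            return current
    return sizes[-1]  # still improving at the largest size tested
```

Fed with the average durations from the REST API case study later in this guide (320, 285, 280, 278 ms at 128 to 1024 MB), it returns 256. Fed with the image resizer figures (1840, 920, 610, 490 ms at 512 to 1769 MB), it returns 1769, the largest size tested, because every step still gains more than 10%.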

4. Memory Sweet Spots and Benchmark Results

After running thousands of power tuning experiments across production workloads, clear patterns emerge based on function type. The table below summarises typical findings:

Function Type                    Duration at 128 MB   Optimal Memory   Optimal Duration   Cost Change
Image resize (Sharp/Pillow)      2100 ms              1769 MB          290 ms             38% cheaper
JSON transform (large payload)   850 ms               1024 MB          140 ms             22% cheaper
DynamoDB read + response         95 ms                256 MB           88 ms              15% cheaper
REST API proxy (HTTP call)       310 ms               512 MB           295 ms             8% cheaper
ML inference (scikit-learn)      4200 ms              3008 MB          510 ms             19% cheaper
File decompression (gzip)        1800 ms              1769 MB          210 ms             44% cheaper
SQS message processor            180 ms               512 MB           170 ms             11% cheaper
Java Spring Boot API             N/A (OOM)            2048 MB          320 ms             Baseline

The pattern is clear: CPU-bound functions (image processing, compression, serialization, ML) benefit dramatically from high memory — both in speed and cost. I/O-bound functions (API proxies, database readers) plateau early and often find their cost minimum between 256–512 MB, with only marginal performance gains above that.

The 1769 MB rule: If your function does any meaningful computation (parsing, transformation, hashing, encryption), always test at 1769 MB. The jump from 1536 MB to 1769 MB reaches the equivalent of one full vCPU and frequently delivers a 20–35% duration drop that more than offsets the extra memory cost.

Statistically Valid Results

Lambda execution times have variance — the same code at the same memory can vary 10–30% between invocations due to host conditions, JIT warmth, and garbage collection. Always run at least 10 invocations per power level. Discard the first invocation (cold start) when comparing if cold start behavior is not your primary concern, and separately analyze the initDuration if it is.
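
A small helper makes the "discard cold starts, then look at the spread" advice concrete. This is a sketch using only the standard library; `summarize_warm` and its return keys are illustrative names. A coefficient of variation (`cv`) well above the 10–30% range mentioned above suggests you need more invocations before trusting the average.

```python
import statistics


def summarize_warm(durations_ms, cold_flags):
    """Mean, sample standard deviation and relative spread of warm runs only.

    durations_ms: one duration per invocation
    cold_flags:   True where that invocation reported an Init Duration
    """
    warm = [d for d, cold in zip(durations_ms, cold_flags) if not cold]
    if len(warm) < 2:
        raise ValueError("need at least two warm invocations")
    mean = statistics.mean(warm)
    stdev = statistics.stdev(warm)
    return {"n": len(warm), "mean_ms": mean, "stdev_ms": stdev,
            "cv": stdev / mean}
```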

5. Manual Benchmarking with Python and boto3

Sometimes you need more control than the power tuning tool provides: custom metric collection, percentile analysis, or integration into a CI pipeline. Here is a complete Python script that benchmarks a Lambda function across multiple memory sizes using boto3:

import boto3
import json
import time
import statistics
import base64
from typing import List, Dict

def benchmark_lambda(
    function_name: str,
    payload: dict,
    memory_sizes: List[int] = [128, 256, 512, 1024, 1769, 2048, 3008],
    invocations_per_size: int = 10,
    region: str = "us-east-1"
) -> Dict:
    """
    Benchmark a Lambda function across multiple memory configurations.
    Returns a dict mapping memory size → statistics.
    """
    lambda_client = boto3.client("lambda", region_name=region)
    logs_client = boto3.client("logs", region_name=region)
    results = {}

    # Save original memory setting
    original_config = lambda_client.get_function_configuration(
        FunctionName=function_name
    )
    original_memory = original_config["MemorySize"]
    original_timeout = original_config["Timeout"]

    price_per_gb_second = 0.0000166667  # USD

    try:
        for memory_mb in memory_sizes:
            print(f"\n=== Testing {memory_mb} MB ===")

            # Update memory
            lambda_client.update_function_configuration(
                FunctionName=function_name,
                MemorySize=memory_mb
            )
            # Wait for update to propagate
            waiter = lambda_client.get_waiter("function_updated")
            waiter.wait(FunctionName=function_name)
            time.sleep(2)  # Extra buffer for config propagation

            durations = []
            billed_durations = []
            init_durations = []
            errors = 0

            for i in range(invocations_per_size):
                try:
                    response = lambda_client.invoke(
                        FunctionName=function_name,
                        InvocationType="RequestResponse",
                        LogType="Tail",  # Returns last 4KB of logs
                        Payload=json.dumps(payload).encode()
                    )

                    # Parse REPORT line from base64-encoded log tail
                    log_result = base64.b64decode(
                        response["LogResult"]
                    ).decode("utf-8")

                    if response.get("FunctionError"):
                        # Do not let failed runs skew the timing statistics
                        errors += 1
                        print(f"  Invocation {i+1}: ERROR")
                        continue

                    for line in log_result.splitlines():
                        if not line.startswith("REPORT"):
                            continue
                        # REPORT fields are tab-separated "Key: value" pairs
                        parts = {
                            key.strip(): value.strip()
                            for key, sep, value in (
                                kv.partition(":") for kv in line.split("\t")
                            )
                            if sep
                        }
                        duration = float(parts["Duration"].split(" ")[0])
                        billed = float(parts["Billed Duration"].split(" ")[0])
                        durations.append(duration)
                        billed_durations.append(billed)
                        # Init Duration is only present on cold starts
                        if "Init Duration" in parts:
                            init_durations.append(
                                float(parts["Init Duration"].split(" ")[0])
                            )
                        print(f"  Invocation {i+1}: {duration:.1f} ms")

                except Exception as e:
                    errors += 1
                    print(f"  Invocation {i+1}: Exception — {e}")

            if durations:
                avg_duration_ms = statistics.mean(durations)
                p50 = statistics.median(durations)
                p95 = sorted(durations)[int(len(durations) * 0.95)]
                avg_billed_ms = statistics.mean(billed_durations)

                # Cost per invocation in USD
                cost_per_invocation = (
                    (memory_mb / 1024) * (avg_billed_ms / 1000) * price_per_gb_second
                )

                results[memory_mb] = {
                    "avg_duration_ms": round(avg_duration_ms, 2),
                    "p50_ms": round(p50, 2),
                    "p95_ms": round(p95, 2),
                    "avg_billed_ms": round(avg_billed_ms, 2),
                    "cost_per_invocation_usd": round(cost_per_invocation, 10),
                    "cost_per_million_usd": round(cost_per_invocation * 1_000_000, 4),
                    "avg_init_duration_ms": round(statistics.mean(init_durations), 2) if init_durations else None,
                    "cold_start_count": len(init_durations),
                    "error_count": errors,
                    "invocations": len(durations)
                }
                print(f"  Avg: {avg_duration_ms:.1f}ms | P95: {p95:.1f}ms | "
                      f"Cost/M: ${cost_per_invocation * 1_000_000:.4f}")

    finally:
        # Always restore original memory
        print(f"\nRestoring original memory: {original_memory} MB")
        lambda_client.update_function_configuration(
            FunctionName=function_name,
            MemorySize=original_memory
        )

    return results


def find_optimum(results: Dict, strategy: str = "balanced", weight: float = 0.5):
    """
    Find the optimal memory size based on strategy.
    strategy: 'speed' | 'cost' | 'balanced'
    weight: 0.0 = pure cost, 1.0 = pure speed (for balanced)
    """
    min_duration = min(v["avg_duration_ms"] for v in results.values())
    min_cost = min(v["cost_per_invocation_usd"] for v in results.values())

    scores = {}
    for mem, stats in results.items():
        normalized_speed = min_duration / stats["avg_duration_ms"]
        normalized_cost = min_cost / stats["cost_per_invocation_usd"]
        if strategy == "speed":
            scores[mem] = normalized_speed
        elif strategy == "cost":
            scores[mem] = normalized_cost
        else:  # balanced
            scores[mem] = weight * normalized_speed + (1 - weight) * normalized_cost

    return max(scores, key=scores.get)


if __name__ == "__main__":
    FUNCTION_NAME = "my-image-resizer"
    TEST_PAYLOAD = {
        "bucket": "my-test-bucket",
        "key": "sample-5mb.jpg",
        "width": 800,
        "height": 600
    }

    results = benchmark_lambda(
        function_name=FUNCTION_NAME,
        payload=TEST_PAYLOAD,
        memory_sizes=[128, 256, 512, 1024, 1536, 1769, 2048, 3008],
        invocations_per_size=10
    )

    print("\n=== RESULTS SUMMARY ===")
    print(f"{'Memory':>8} | {'Avg ms':>8} | {'P95 ms':>8} | {'$/M invocations':>18}")
    print("-" * 52)
    for mem in sorted(results.keys()):
        s = results[mem]
        print(f"{mem:>6} MB | {s['avg_duration_ms']:>8.1f} | "
              f"{s['p95_ms']:>8.1f} | ${s['cost_per_million_usd']:>17.4f}")

    optimum = find_optimum(results, strategy="balanced", weight=0.5)
    print(f"\nBalanced optimum: {optimum} MB")
    print(f"Speed optimum:    {find_optimum(results, 'speed')} MB")
    print(f"Cost optimum:     {find_optimum(results, 'cost')} MB")

Usage note: Run this script from a machine with appropriate IAM permissions: lambda:InvokeFunction, lambda:UpdateFunctionConfiguration, and lambda:GetFunctionConfiguration (the boto3 waiter polls GetFunctionConfiguration, so it needs no permission of its own). The script restores the original memory in the finally block even if a benchmark run fails or is interrupted.

6. Cold Start Impact by Memory Setting

Memory does not just affect execution time — it also affects cold start duration. Higher memory generally means a faster cold start because the language runtime and your initialization code benefit from the same increased CPU allocation during the Init phase.

Memory    Node.js 20 Cold Start   Python 3.12 Cold Start   Java 21 Cold Start   Java 21 + SnapStart
128 MB    ~350 ms                 ~420 ms                  ~3100 ms             ~280 ms
512 MB    ~180 ms                 ~210 ms                  ~1800 ms             ~190 ms
1024 MB   ~120 ms                 ~140 ms                  ~1100 ms             ~160 ms
1769 MB   ~80 ms                  ~95 ms                   ~700 ms              ~130 ms
3008 MB   ~65 ms                  ~75 ms                   ~480 ms              ~110 ms

For interpreted runtimes (Node.js, Python), cold start duration scales roughly with the inverse of CPU: doubling memory from 512 MB to 1024 MB typically cuts cold start by 30–40%. The gains diminish above 1769 MB because you are no longer CPU-limited at that point — network I/O and Lambda infrastructure overhead dominate.
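
To judge how much a faster cold start matters for your traffic, it helps to weight it by how often cold starts actually occur. The sketch below assumes a simple model in which a fixed fraction of invocations pays the init time on top of the warm duration; the cold-start fraction is something you would measure from your own logs, and the function name is illustrative.

```python
def expected_latency_ms(warm_ms: float, init_ms: float, cold_rate: float) -> float:
    """Mean latency when a fraction cold_rate of invocations pays
    init_ms on top of warm_ms."""
    if not 0.0 <= cold_rate <= 1.0:
        raise ValueError("cold_rate must be between 0 and 1")
    return warm_ms + cold_rate * init_ms
```

With a 1% cold-start rate, the difference between a 210 ms and a 95 ms init adds only about 1 ms to the mean, so for most functions the cold start columns matter for tail latency rather than for the average.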

Lambda SnapStart for Java

SnapStart takes a snapshot of the memory and disk state of the initialized execution environment after the Init phase completes, caches it, and resumes new execution environments from that snapshot instead of re-running initialization. The result is a Java cold start that skips most of the Init cost. Enable it on any Java 21 function:

aws lambda update-function-configuration \
  --function-name my-java-api \
  --snap-start ApplyOn=PublishedVersions \
  --region us-east-1

# Publish a version to activate SnapStart
aws lambda publish-version \
  --function-name my-java-api \
  --region us-east-1

# Create an alias pointing to the published version
# (replace 3 with the "Version" value returned by publish-version)
aws lambda create-alias \
  --function-name my-java-api \
  --name live \
  --function-version 3 \
  --region us-east-1

Provisioned Concurrency for the Remaining Cases

When SnapStart is not available for your runtime and cold starts must stay under 100 ms, Provisioned Concurrency (PC) is the answer. PC pre-initializes environments that are always warm. Power tune before enabling PC: running PC at the wrong memory wastes money on two fronts, too much PC cost and too much per-invocation cost.

# Enable PC on a published version alias
aws lambda put-provisioned-concurrency-config \
  --function-name my-api \
  --qualifier live \
  --provisioned-concurrent-executions 10 \
  --region us-east-1

7. Architecture Patterns: CPU-Bound vs I/O-Bound

The correct memory target depends fundamentally on whether your Lambda is CPU-bound or I/O-bound. Getting this classification right before tuning saves time and avoids misleading results.

CPU-Bound Functions

CPU-bound functions spend most of their time doing computation. Their execution time scales inversely with CPU allocation. Examples:

  • Image and video transcoding (Sharp, FFmpeg, Pillow)
  • Machine learning inference (scikit-learn, TensorFlow Lite, ONNX)
  • Large JSON/XML serialization and parsing
  • Cryptographic operations (RSA, AES on large data)
  • Compression and decompression (gzip, brotli)
  • Data aggregation and statistical computation

Tuning strategy: Start testing from 1769 MB upward. CPU-bound functions almost always show a steep duration drop between 1024 MB and 1769 MB (the full-vCPU threshold), with the cost optimum also near or above 1769 MB. For parallelizable tasks (multi-threaded image processing), 3008 MB or higher may unlock additional gains.

I/O-Bound Functions

I/O-bound functions spend most of their time waiting: for a database response, an external HTTP call, or an SQS message. Additional CPU does not make the network faster. Examples:

  • API proxy / aggregator (calls to third-party APIs)
  • DynamoDB or RDS queries with simple result processing
  • S3 GetObject followed by minimal transformation
  • SQS/SNS message forwarders
  • Simple CRUD handlers

Tuning strategy: Test from 128 MB to 1024 MB. Expect a plateau in duration after 512–1024 MB. The cost optimum is usually 256–512 MB. Going higher wastes money with no performance benefit.

Hybrid functions do both: download a file from S3 (I/O), then compress it (CPU). For these, run the full power tuning suite and let the data decide. The CPU portion often dominates, pushing the optimum toward 1769 MB.
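
Once you have tuning results, the classification can be checked against the data instead of guessed. One rough approach, sketched here, is to measure how steeply duration falls as memory grows on a log-log scale. The cutoff values and function names are assumptions chosen for illustration, not thresholds from the power tuning tool.

```python
import math


def scaling_exponent(avg_duration_ms: dict) -> float:
    """Slope of log(duration) against log(memory) between the smallest and
    largest sizes tested. About -1 means duration halves when memory doubles
    (CPU-bound); about 0 means memory makes no difference (I/O-bound)."""
    low, high = min(avg_duration_ms), max(avg_duration_ms)
    if low == high:
        raise ValueError("need at least two different memory sizes")
    return (math.log(avg_duration_ms[high] / avg_duration_ms[low])
            / math.log(high / low))


def classify(avg_duration_ms: dict, cpu_cutoff: float = -0.5,
             io_cutoff: float = -0.15) -> str:
    """Label a workload from its tuning results using the assumed cutoffs."""
    slope = scaling_exponent(avg_duration_ms)
    if slope <= cpu_cutoff:
        return "cpu-bound"
    if slope >= io_cutoff:
        return "io-bound"
    return "hybrid"
```

The input is a mapping of memory size in MB to average duration in ms, the same shape as the results produced by the benchmarking script in section 5.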

Memory vs Parallelism Tradeoff

For event-driven architectures processing SQS batches, consider whether it is cheaper to run one large Lambda at 3008 MB with a batch size of 100, or multiple small Lambdas at 512 MB. Power tuning only tests single-invocation throughput — for batch processing, benchmark both approaches end-to-end and factor in the SQS request cost.

8. Graviton2 (arm64) + Power Tuning: Double Savings

AWS Lambda supports two processor architectures: x86_64 (Intel/AMD) and arm64 (AWS Graviton2). Graviton2 is priced 20% lower per GB-second, and AWS cites up to 19% better performance for compute-intensive workloads. Combined with proper memory tuning, the savings compound.

Switching to arm64

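# Switch architecture by uploading an arm64-compatible deployment package.
# The architecture is set with update-function-code, not
# update-function-configuration. function-arm64.zip is a placeholder name.
aws lambda update-function-code \
  --function-name my-image-resizer \
  --zip-file fileb://function-arm64.zip \
  --architectures arm64 \
  --region us-east-1

# For Python/Node.js: most pure-Python/pure-JS code works unmodified
# For compiled extensions (numpy, cryptography, Pillow C extensions):
# you MUST rebuild for arm64 on an arm64 host or with Docker

# Build Python arm64 packages with Docker. The Lambda base image's default
# entrypoint starts the runtime, so override it to run pip instead.
docker run --rm --platform linux/arm64 \
  --entrypoint pip \
  -v "$(pwd)":/var/task \
  public.ecr.aws/lambda/python:3.12-arm64 \
  install -r requirements.txt -t /var/task/package/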

Power Tuning on arm64

Run power tuning separately for each architecture — the optimal memory level often differs because Graviton2 has a different compute-per-MB ratio and different memory latency characteristics:

{
  "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-image-resizer",
  "powerValues": [512, 1024, 1769, 2048, 3008],
  "num": 10,
  "payload": { "bucket": "test", "key": "image.jpg" },
  "strategy": "balanced",
  "balancedWeight": 0.5
}

Typical combined savings when switching from x86_64 at default memory to arm64 at optimal memory:

Function Type    x86_64 Default      arm64 Optimal       Total Savings
Image resize     512 MB / 1800 ms    1769 MB / 210 ms    ~58% cost reduction
ML inference     1024 MB / 900 ms    2048 MB / 340 ms    ~45% cost reduction
JSON transform   256 MB / 500 ms     1024 MB / 95 ms     ~37% cost reduction
API proxy        512 MB / 300 ms     512 MB / 285 ms     ~22% (architecture only)

Graviton2 limitations: Not all Lambda runtimes support arm64. As of 2026, Node.js, Python, Java, Go, Ruby, and .NET all support arm64. Custom runtimes require arm64-compiled binaries. Lambda layers that contain native code must also be rebuilt for arm64, because they are architecture-specific.
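
To compare architectures on your own measurements, the duration charge can be computed per architecture. This sketch assumes the us-east-1 list prices per GB-second (arm64 at 20% below x86_64) and ignores the request fee, which is the same on both; the names are illustrative.

```python
PRICE_PER_GB_SECOND = {
    "x86_64": 0.0000166667,
    "arm64": 0.0000133334,   # 20% below the x86_64 price
}


def duration_cost(memory_mb: int, billed_ms: float, arch: str) -> float:
    """Duration charge for one invocation on the given architecture."""
    return (memory_mb / 1024) * (billed_ms / 1000) * PRICE_PER_GB_SECOND[arch]


def savings_pct(before: float, after: float) -> float:
    """Percentage saved when moving from cost `before` to cost `after`."""
    return (1 - after / before) * 100
```

Pass the measured memory and billed duration for each architecture to `duration_cost`, then feed the two results to `savings_pct`. At identical memory and duration the saving is the 20% price difference; any speedup or memory change from tuning stacks on top of that.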

9. Continuous Optimization Pipeline

Lambda functions change over time — new dependencies, code refactors, runtime upgrades. The optimal memory setting from six months ago may no longer be optimal. A continuous optimization pipeline automatically re-tunes functions on a schedule and alerts you when the optimum shifts.

Pipeline Architecture

  1. EventBridge Scheduler — triggers a tuning orchestrator Lambda weekly (or after each deployment)
  2. Orchestrator Lambda — reads a list of functions to tune from DynamoDB, starts a power tuning execution for each
  3. Power Tuning State Machine — runs the benchmark, writes results back
  4. Results Processor Lambda — parses the output, compares to baseline stored in DynamoDB, raises an SNS alert if optimal memory changed by more than 15%

import boto3
import json
import os
from datetime import datetime, timezone

dynamodb = boto3.resource("dynamodb")
sfn = boto3.client("stepfunctions")
sns = boto3.client("sns")

TABLE_NAME = os.environ["TUNING_RESULTS_TABLE"]
SM_ARN = os.environ["POWER_TUNING_SM_ARN"]
SNS_TOPIC_ARN = os.environ["ALERT_TOPIC_ARN"]


def start_tuning(event, context):
    """
    Orchestrator: reads function list from DynamoDB and starts power tuning.
    Triggered by EventBridge on a weekly schedule.
    """
    table = dynamodb.Table(TABLE_NAME)
    response = table.scan(
        FilterExpression="attribute_exists(function_name)"
    )

    started = []
    for item in response["Items"]:
        function_name = item["function_name"]
        payload = json.loads(item.get("test_payload", "{}"))

        execution_input = {
            "lambdaARN": item["function_arn"],
            "powerValues": [256, 512, 1024, 1536, 1769, 2048, 3008],
            "num": 10,
            "payload": payload,
            "strategy": "balanced",
            "balancedWeight": 0.5,
            "autoOptimize": False
        }

        exec_response = sfn.start_execution(
            stateMachineArn=SM_ARN,
            name=f"autotune-{function_name}-{int(datetime.now().timestamp())}",
            input=json.dumps(execution_input)
        )
        started.append(exec_response["executionArn"])
        print(f"Started tuning for {function_name}: {exec_response['executionArn']}")

    return {"started": len(started), "executions": started}


def process_results(event, context):
    """
    Results processor: called by EventBridge when a tuning execution completes.
    Compares new optimum to stored baseline and alerts on regression.
    """
    table = dynamodb.Table(TABLE_NAME)

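    # The EventBridge "Step Functions Execution Status Change" event carries
    # the execution input and output as JSON strings in event["detail"].
    detail = event.get("detail", {})
    execution_input = json.loads(detail.get("input") or "{}")
    output = json.loads(detail.get("output") or "{}")
    function_arn = execution_input.get("lambdaARN", "")
    # Assumption: the tool reports the chosen memory size as "power" and the
    # per-invocation cost as "cost", optionally nested under "results".
    results = output.get("results", output)
    optimal_memory = results.get("power")
    optimal_cost = results.get("cost")

    if not optimal_memory:
        print("Could not parse tuning results")
        return

    optimal_memory = int(optimal_memory)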

    # Fetch previous baseline
    response = table.get_item(Key={"function_arn": function_arn})
    previous = response.get("Item", {})
    previous_memory = previous.get("optimal_memory_mb")

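    # Store the new result with update_item so that function_name and
    # test_payload on the same item survive (put_item would overwrite them)
    history = previous.get("history", [])
    if previous_memory is not None:
        history = history + [
            {"memory": previous_memory, "date": previous.get("last_tuned")}
        ]
    table.update_item(
        Key={"function_arn": function_arn},
        UpdateExpression=(
            "SET optimal_memory_mb = :m, optimal_cost_usd = :c, "
            "last_tuned = :t, history = :h"
        ),
        ExpressionAttributeValues={
            ":m": optimal_memory,
            ":c": str(optimal_cost),
            ":t": datetime.now(timezone.utc).isoformat(),
            ":h": history,
        },
    )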

    # Alert if optimal memory shifted by more than 15%
    if previous_memory and abs(optimal_memory - previous_memory) / previous_memory > 0.15:
        message = (
            f"Lambda power tuning regression detected!\n"
            f"Function: {function_arn}\n"
            f"Previous optimal: {previous_memory} MB\n"
            f"New optimal: {optimal_memory} MB\n"
            f"Change: {((optimal_memory - previous_memory) / previous_memory * 100):+.1f}%\n"
            f"Action required: Review code changes and update function configuration."
        )
        sns.publish(TopicArn=SNS_TOPIC_ARN, Subject="Lambda Tuning Alert", Message=message)
        print(f"ALERT sent: memory shifted from {previous_memory} to {optimal_memory} MB")

    return {"function_arn": function_arn, "optimal_memory": optimal_memory}

EventBridge Scheduler Rule

# Create a weekly schedule (every Monday at 02:00 UTC)
aws scheduler create-schedule \
  --name lambda-power-tuning-weekly \
  --schedule-expression "cron(0 2 ? * MON *)" \
  --flexible-time-window Mode=OFF \
  --target '{"Arn":"arn:aws:lambda:us-east-1:123456789012:function:start-tuning","RoleArn":"arn:aws:iam::123456789012:role/scheduler-role","Input":"{}"}' \
  --region us-east-1

10. Real Case Studies

These case studies are representative of production tuning exercises across common Lambda workload types.

Case Study 1: Image Resizer — 512 MB → 1769 MB, 41% Cost Reduction

A media platform ran an image resizing Lambda (Node.js 20, Sharp library) at 512 MB because "it worked in testing." Power tuning revealed:

  • At 512 MB: avg 1840 ms, cost $0.0000153 per invocation
  • At 1024 MB: avg 920 ms, cost $0.0000153 per invocation (same cost, twice as fast)
  • At 1536 MB: avg 610 ms, cost $0.0000152 per invocation (slightly cheaper, 3× faster)
  • At 1769 MB: avg 490 ms, cost $0.0000139 per invocation (cheapest, 3.75× faster)

Migration to 1769 MB cut monthly Lambda costs by 41% and reduced image processing latency from a p99 of 3.2 seconds to 820 ms. The improvement also eliminated a class of API Gateway timeout errors that had been occurring on large images.

Case Study 2: REST API Handler — 256 MB Is Optimal

An e-commerce API Lambda (Python 3.12) that reads from DynamoDB and returns product data was running at 128 MB (the default). Testing showed:

  • At 128 MB: avg 320 ms (but high variance, p95 = 890 ms)
  • At 256 MB: avg 285 ms, p95 = 410 ms — significant p95 improvement
  • At 512 MB: avg 280 ms, p95 = 405 ms — negligible improvement
  • At 1024 MB: avg 278 ms, p95 = 400 ms — negligible improvement

The function was clearly I/O-bound — DynamoDB latency dominated. The jump from 128 MB to 256 MB was worth making (p95 improvement), but anything higher provided no benefit. Setting to 256 MB reduced costs by 12% compared to the default while improving tail latency.

Case Study 3: Java Spring Boot API — 2048 MB with SnapStart

A financial services team had a Java 17 Spring Boot Lambda that was notorious for cold starts (2.8 seconds) and sat at 1024 MB. The tuning exercise combined memory optimization with SnapStart migration:

  1. Upgrade runtime to Java 21, enable SnapStart → cold start drops from 2800 ms to 380 ms
  2. Run power tuning on warm invocations → warm execution time minimized at 2048 MB (faster JSON deserialization, Spring bean lookups)
  3. At 2048 MB warm: avg 95 ms vs 180 ms at 1024 MB
  4. Cost at 2048 MB / 95 ms vs 1024 MB / 180 ms: 2048 × 95 = 194,560 MB-ms vs 1024 × 180 = 184,320 MB-ms — slightly higher cost but within 6%, with 47% latency improvement

The team chose 2048 MB for the latency SLA, enabled Provisioned Concurrency of 5 instances during business hours, and moved to arm64 Graviton2 for the 20% pricing discount. End result: p99 cold start under 400 ms, warm p99 under 150 ms, total Lambda bill 28% lower than the original configuration.

Key takeaway from all three case studies: The default Lambda memory of 128 MB is almost never optimal. Always run power tuning before going to production, and re-run after major code changes. The effort is minimal (under 30 minutes per function) and the savings are consistently 15–45%.

Quick Reference
  • Full vCPU threshold: 1769 MB
  • Max memory: 10,240 MB (10 GB)
  • Max vCPUs: 6 (at 10,240 MB)
  • Billing unit: 1 ms increments
  • arm64 discount: 20% cheaper/GB-s
  • Free tier: 400K GB-s/month