AWS Lambda Power Tuning: Optimize Memory, Cost and Performance
June 9, 2026 | 14 min read
Table of Contents
- Lambda Pricing Model and the Memory Paradox
- AWS Lambda Power Tuning Tool
- Running Power Tuning: Step-by-Step
- Memory Sweet Spots and Benchmark Results
- Manual Benchmarking with Python and boto3
- Cold Start Impact by Memory Setting
- Architecture Patterns: CPU vs I/O Bound
- Graviton2 (arm64) + Power Tuning: Double Savings
- Continuous Optimization Pipeline
- Real Case Studies
Most Lambda teams pick a memory setting once during development and never revisit it. That single decision determines both performance and cost — yet it is almost always wrong. Set memory too low and your function runs slowly, paying for more wall-clock time. Set it too high and you pay for CPU you do not use. The sweet spot is rarely at the default 128 MB or at an intuitive round number.
AWS Lambda power tuning is the discipline of systematically finding that sweet spot using real invocations, real payloads, and statistical analysis. This guide covers the full toolkit: the open-source aws-lambda-power-tuning Step Functions state machine, manual boto3 benchmarking scripts, cold start tradeoffs, Graviton2 arm64 considerations, and how to wire it all into a continuous optimization pipeline so your functions stay tuned as code changes.
1. Lambda Pricing Model and the Memory Paradox
Understanding Lambda billing is essential before tuning. Lambda charges on two axes: the number of requests ($0.20 per million in most regions) and duration, measured in GB-seconds (the rate below is the x86_64 price):
Cost = requests × $0.0000002
+ (memory_GB × duration_seconds) × $0.0000166667
The critical insight is that GB-seconds is what you pay per unit of compute — not raw milliseconds. If doubling memory from 512 MB to 1024 MB halves the execution duration (because your function is CPU-bound and now has more vCPU), you pay the same GB-seconds but get twice the throughput. If the duration drops by more than half, you actually pay less.
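The billing formula can be sketched in a few lines of Python. This is an illustrative helper rather than an AWS API: the name `invocation_cost` is hypothetical, the prices are the figures quoted above, and 1 GB is assumed to be 1024 MB.

```python
# Illustrative sketch of the Lambda billing formula quoted above.
# Prices are the figures from this article; check current pricing for your region.
REQUEST_PRICE_USD = 0.0000002        # $0.20 per million requests
GB_SECOND_PRICE_USD = 0.0000166667   # duration price per GB-second (x86_64)


def invocation_cost(memory_mb: int, billed_ms: float) -> float:
    """Return the cost in USD of one invocation (request fee + duration)."""
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return REQUEST_PRICE_USD + gb_seconds * GB_SECOND_PRICE_USD


if __name__ == "__main__":
    # Doubling memory while halving duration leaves the bill unchanged.
    slow = invocation_cost(512, 800)
    fast = invocation_cost(1024, 400)
    print(f"512 MB / 800 ms:  ${slow:.10f}")
    print(f"1024 MB / 400 ms: ${fast:.10f}")
```

Any configuration whose duration falls by more than the factor by which memory grew comes out cheaper under this formula.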
vCPU Allocation Curve
Lambda does not expose a CPU setting. It allocates CPU power in proportion to the configured memory, roughly as follows:
| Memory (MB) | vCPU Equivalent | Notes |
|---|---|---|
| 128 | ~0.07 | Barely a fraction of a core |
| 256 | ~0.13 | Still heavily throttled |
| 512 | ~0.27 | Good for lightweight I/O |
| 1024 | ~0.53 | Half a vCPU — JS/Python sweet spot often here |
| 1769 | 1.00 | Full vCPU — magic threshold for CPU-bound |
| 3008 | ~1.70 | Good for parallel CPU tasks |
| 10240 | ~6.00 | Maximum: 6 vCPUs for ML inference |
The 1769 MB mark is the key threshold: at that setting a function has the equivalent of one full vCPU, while below it the function receives only a fraction of a vCPU's time. CPU-bound functions (image processing, JSON serialisation of large payloads, regex, compression) almost always benefit from being at or above 1769 MB.
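As a rough mental model, the allocation can be approximated as memory divided by 1769, capped at 6 vCPUs. The sketch below assumes that linear model; `estimated_vcpus` is a hypothetical helper, and its output lands close to, but not exactly on, the approximate figures in the table.

```python
FULL_VCPU_MB = 1769   # memory at which a function has one full vCPU
MAX_VCPUS = 6         # ceiling reached at the 10,240 MB maximum


def estimated_vcpus(memory_mb: int) -> float:
    """Approximate vCPU share for a memory setting (linear model, capped)."""
    return min(memory_mb / FULL_VCPU_MB, MAX_VCPUS)


for mb in (128, 512, 1024, 1769, 3008, 10240):
    print(f"{mb:>6} MB -> ~{estimated_vcpus(mb):.2f} vCPU")
```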
For example, a CPU-bound function at 512 MB taking 800 ms costs 0.5 GB × 0.8 s × $0.0000166667 = $0.00000667. The same function at 1769 MB (1769 / 1024 ≈ 1.7275 GB) taking 220 ms costs 1.7275 GB × 0.22 s × $0.0000166667 ≈ $0.00000633. More memory, less cost, far better latency. This counter-intuitive result is why power tuning exists.
Free Tier and Rounding
Lambda bills duration in 1-ms increments (since December 2020; before that, duration was rounded up to the nearest 100 ms). The first 400,000 GB-seconds per month are free. Power tuning experiments typically consume less than 100 GB-seconds total, comfortably within the free tier even in a production account.
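To check how much of the free tier a tuning run would use, the GB-seconds can be estimated up front. The helper below is a sketch: `tuning_run_gb_seconds` is a hypothetical name and the durations are invented for illustration, not measurements.

```python
FREE_TIER_GB_SECONDS = 400_000


def tuning_run_gb_seconds(power_values, num, avg_duration_s):
    """Total GB-seconds consumed by one tuning run.

    avg_duration_s maps each memory size (MB) to its expected duration in seconds.
    """
    return sum((mb / 1024) * avg_duration_s[mb] * num for mb in power_values)


if __name__ == "__main__":
    # Hypothetical durations for a CPU-bound function.
    durations = {128: 2.0, 256: 1.0, 512: 0.5, 1024: 0.25}
    used = tuning_run_gb_seconds(list(durations), num=10, avg_duration_s=durations)
    print(f"{used:.1f} GB-seconds, {used / FREE_TIER_GB_SECONDS:.4%} of the free tier")
```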
2. AWS Lambda Power Tuning Tool
The aws-lambda-power-tuning project by Alex Casalboni is an open-source AWS Step Functions state machine that automatically invokes your Lambda function at multiple memory configurations, collects timing data, and outputs a visualization URL showing cost and performance at each setting. It is the standard tool for this job across the AWS community.
How It Works Internally
The state machine has four phases:
- Initializer: reads the input configuration, validates the power values list, sets up parallelism.
- Executor (parallel): for each memory size in your list, spawns a parallel branch that invokes your Lambda function N times (configurable) and records billedDuration and initDuration from CloudWatch Logs.
- Cleaner: removes the temporary memory configurations (restores your original setting).
- Analyzer: computes average cost and duration for each power level, identifies the optima, and generates a visualization URL for the AWS Lambda Power Tuning visualizer at lambda-power-tuning.show.
Deploying via SAR (Serverless Application Repository)
The fastest deployment path is through the AWS Serverless Application Repository. No code to write:
# Option 1: AWS console
# Visit: https://serverlessrepo.aws.amazon.com/applications/arn:aws:serverlessrepo:us-east-1:451282441545:applications~aws-lambda-power-tuning
# Click Deploy, accept defaults
# Option 2: AWS CLI with SAM
sam deploy \
--template-url https://s3.amazonaws.com/awsserverlessrepo-changesets-plntc6bfnfj/... \
--stack-name lambda-power-tuning \
--capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND \
--region us-east-1
# Option 3: CloudFormation directly
aws cloudformation create-stack \
--stack-name lambda-power-tuning \
--template-body file://template.yaml \
--capabilities CAPABILITY_IAM \
--region us-east-1
The SAR application creates a Step Functions state machine named powerTuningStateMachine and the necessary IAM role. The IAM role needs lambda:InvokeFunction and lambda:UpdateFunctionConfiguration on any function you want to tune — add a resource-based policy or expand the managed role's resource list if you get access denied errors.
Required IAM permissions: lambda:InvokeFunction, lambda:UpdateFunctionConfiguration, lambda:GetFunctionConfiguration, lambda:PublishVersion, logs:FilterLogEvents. The SAR deployment creates this role automatically, but you may need to extend it for cross-account or restricted environments.
3. Running Power Tuning: Step-by-Step
Once the state machine is deployed, you invoke it with a JSON payload describing which function to tune and how. Here is a complete input JSON:
{
"lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-image-resizer",
"powerValues": [128, 256, 512, 1024, 1536, 1769, 2048, 3008],
"num": 10,
"payload": {
"bucket": "my-test-bucket",
"key": "sample-image-5mb.jpg",
"width": 800,
"height": 600
},
"parallelInvocation": true,
"strategy": "balanced",
"balancedWeight": 0.5,
"autoOptimize": false,
"autoOptimizeAlias": "live"
}
Key fields explained:
- powerValues: the list of memory sizes (MB) to test. Always include 1769, the full-vCPU threshold.
- num: number of invocations per memory size. Use 10–20 for more stable averages; 5 is the minimum.
- payload: a representative real-world event. Use a payload that exercises your actual code paths, not a toy "hello world" event.
- parallelInvocation: true invokes the N runs concurrently (faster but triggers scaling); false runs them sequentially (slower but avoids noise from concurrent init).
- strategy: "speed" picks the lowest duration; "cost" picks the lowest cost; "balanced" blends both, weighted by balancedWeight (in the tool, 0.0 is equivalent to "speed" and 1.0 to "cost").
- autoOptimize: if true, the tool automatically updates your function's memory to the optimum. Use with care in production; leave it false for review-first workflows.
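If you generate the input file from a script, a few sanity checks catch most typos before an execution is started. The sketch below is one way to do that; `build_tuning_input` is a hypothetical helper, and the checks mirror only the constraints described in this article, not the tool's full validation.

```python
import json


def build_tuning_input(lambda_arn, payload, power_values, num=10,
                       strategy="balanced", balanced_weight=0.5):
    """Assemble and sanity-check the state machine input described above."""
    if strategy not in ("speed", "cost", "balanced"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    if num < 5:
        raise ValueError("num should be at least 5 invocations per power value")
    if not 0.0 <= balanced_weight <= 1.0:
        raise ValueError("balancedWeight must be between 0.0 and 1.0")
    values = sorted(set(power_values))
    if not values or values[0] < 128 or values[-1] > 10240:
        raise ValueError("power values must lie between 128 and 10240 MB")
    return {
        "lambdaARN": lambda_arn,
        "powerValues": values,
        "num": num,
        "payload": payload,
        "parallelInvocation": True,
        "strategy": strategy,
        "balancedWeight": balanced_weight,
        "autoOptimize": False,   # review-first: never change the function automatically
    }


if __name__ == "__main__":
    tuning_input = build_tuning_input(
        "arn:aws:lambda:us-east-1:123456789012:function:my-image-resizer",
        {"bucket": "my-test-bucket", "key": "sample-image-5mb.jpg"},
        [128, 256, 512, 1024, 1769, 3008],
    )
    print(json.dumps(tuning_input, indent=2))
```

Redirecting the printed JSON to a file gives you an input document you can pass to the state machine.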
Starting the Execution via CLI
aws stepfunctions start-execution \
--state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine \
--name my-run-001 \
--input file://power-tuning-input.json \
--region us-east-1
# Poll for completion (the execution ARN ends with the --name given above)
aws stepfunctions describe-execution \
--execution-arn arn:aws:states:us-east-1:123456789012:execution:powerTuningStateMachine:my-run-001 \
--region us-east-1 \
--query "status"
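The same start-and-poll loop can be scripted in Python. The sketch below assumes a client that behaves like `boto3.client("stepfunctions")` (it calls only `start_execution` and `describe_execution`); `run_power_tuning` is a hypothetical helper name, and the polling interval and timeout are arbitrary defaults.

```python
import json
import time


def run_power_tuning(sfn_client, state_machine_arn, tuning_input,
                     poll_seconds=5.0, timeout_seconds=900.0):
    """Start a tuning execution, wait for it to finish, return the parsed output.

    sfn_client is expected to behave like boto3.client("stepfunctions").
    """
    execution_arn = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(tuning_input),
    )["executionArn"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        description = sfn_client.describe_execution(executionArn=execution_arn)
        status = description["status"]
        if status == "SUCCEEDED":
            return json.loads(description["output"])
        if status != "RUNNING":
            raise RuntimeError(f"execution ended with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"execution still running after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   output = run_power_tuning(boto3.client("stepfunctions"), SM_ARN, tuning_input)
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.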
Interpreting the Visualization Output
The Analyzer step outputs a URL like https://lambda-power-tuning.show/#... that renders two line charts side-by-side: one for average execution duration (ms) vs memory, one for average cost (USD per invocation) vs memory. The optimal points are marked with a star on each chart. Look for:
- The "knee" of the duration curve — where adding more memory produces diminishing returns. This is usually your balanced optimum.
- The cost minimum — often shifts left (lower memory) compared to the speed minimum, because cost = GB-seconds not just time.
- Flat regions — for I/O-bound functions the duration curve often flattens after 512 MB or 1024 MB, meaning all higher settings deliver identical performance at higher cost.
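If you prefer to locate the knee numerically rather than by eye, one simple heuristic is to walk up the memory sizes and stop at the first step that improves duration by less than a chosen fraction. The sketch below assumes that heuristic; `find_knee` and the 5% default are illustrative choices, and the sample figures are the average durations from the case studies later in this article.

```python
def find_knee(avg_duration_ms, min_gain=0.05):
    """Return the smallest memory size after which the next step up improves
    average duration by less than min_gain (a fraction, 0.05 = 5%)."""
    sizes = sorted(avg_duration_ms)
    for current, larger in zip(sizes, sizes[1:]):
        gain = (avg_duration_ms[current] - avg_duration_ms[larger]) / avg_duration_ms[current]
        if gain < min_gain:
            return current
    return sizes[-1]   # duration was still improving at the largest size tested


if __name__ == "__main__":
    io_bound = {128: 320, 256: 285, 512: 280, 1024: 278}
    cpu_bound = {512: 1840, 1024: 920, 1536: 610, 1769: 490}
    print("I/O-bound knee:", find_knee(io_bound), "MB")
    print("CPU-bound knee:", find_knee(cpu_bound), "MB")
```

Because the steps between memory sizes are uneven, treat the result as a starting point for reading the chart, not as a replacement for it.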
4. Memory Sweet Spots and Benchmark Results
After running thousands of power tuning experiments across production workloads, clear patterns emerge based on function type. The table below summarises typical findings:
| Function Type | Duration at 128 MB | Optimal Memory | Duration at Optimum | Cost Change |
|---|---|---|---|---|
| Image resize (Sharp/Pillow) | 2100 ms | 1769 MB | 290 ms | 38% cheaper |
| JSON transform (large payload) | 850 ms | 1024 MB | 140 ms | 22% cheaper |
| DynamoDB read + response | 95 ms | 256 MB | 88 ms | 15% cheaper |
| REST API proxy (HTTP call) | 310 ms | 512 MB | 295 ms | 8% cheaper |
| ML inference (scikit-learn) | 4200 ms | 3008 MB | 510 ms | 19% cheaper |
| File decompression (gzip) | 1800 ms | 1769 MB | 210 ms | 44% cheaper |
| SQS message processor | 180 ms | 512 MB | 170 ms | 11% cheaper |
| Java Spring Boot API | N/A (OOM) | 2048 MB | 320 ms | Baseline |
The pattern is clear: CPU-bound functions (image processing, compression, serialization, ML) benefit dramatically from high memory — both in speed and cost. I/O-bound functions (API proxies, database readers) plateau early and often find their cost minimum between 256–512 MB, with only marginal performance gains above that.
Statistically Valid Results
Lambda execution times have variance — the same code at the same memory can vary 10–30% between invocations due to host conditions, JIT warmth, and garbage collection. Always run at least 10 invocations per power level. Discard the first invocation (cold start) when comparing if cold start behavior is not your primary concern, and separately analyze the initDuration if it is.
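A small helper makes the "discard the first invocation" rule concrete and shows how much spread remains in the warm runs. This is a sketch using only the standard library; `summarise_warm_runs` is a hypothetical name and the sample durations are invented.

```python
import statistics


def summarise_warm_runs(durations_ms, discard_first=True):
    """Summarise invocation durations, optionally dropping the first (cold) run."""
    samples = list(durations_ms[1:]) if discard_first else list(durations_ms)
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate variance")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return {
        "n": len(samples),
        "mean_ms": mean,
        "stdev_ms": stdev,
        "cv": stdev / mean,   # coefficient of variation: relative spread
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # The first value includes a cold start and would skew the average.
    print(summarise_warm_runs([900, 100, 110, 90, 100]))
```

A coefficient of variation well above the 10–30% range mentioned above is a hint that more invocations per power level are needed.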
5. Manual Benchmarking with Python and boto3
Sometimes you need more control than the power tuning tool provides: custom metric collection, percentile analysis, or integration into a CI pipeline. Here is a complete Python script that benchmarks a Lambda function across multiple memory sizes using boto3:
import base64
import json
import statistics
import time
from typing import Dict, List

import boto3


def benchmark_lambda(
    function_name: str,
    payload: dict,
    memory_sizes: List[int] = [128, 256, 512, 1024, 1769, 2048, 3008],
    invocations_per_size: int = 10,
    region: str = "us-east-1"
) -> Dict:
    """
    Benchmark a Lambda function across multiple memory configurations.
    Returns a dict mapping memory size → statistics.
    """
    lambda_client = boto3.client("lambda", region_name=region)
    results = {}

    # Save original memory setting
    original_config = lambda_client.get_function_configuration(
        FunctionName=function_name
    )
    original_memory = original_config["MemorySize"]
    price_per_gb_second = 0.0000166667  # USD

    try:
        for memory_mb in memory_sizes:
            print(f"\n=== Testing {memory_mb} MB ===")
            # Update memory
            lambda_client.update_function_configuration(
                FunctionName=function_name,
                MemorySize=memory_mb
            )
            # Wait for update to propagate
            waiter = lambda_client.get_waiter("function_updated")
            waiter.wait(FunctionName=function_name)
            time.sleep(2)  # Extra buffer for config propagation

            durations = []
            billed_durations = []
            init_durations = []
            errors = 0

            for i in range(invocations_per_size):
                try:
                    response = lambda_client.invoke(
                        FunctionName=function_name,
                        InvocationType="RequestResponse",
                        LogType="Tail",  # Returns last 4KB of logs
                        Payload=json.dumps(payload).encode()
                    )
                    # Parse REPORT line from base64-encoded log tail
                    log_result = base64.b64decode(
                        response["LogResult"]
                    ).decode("utf-8")
                    duration = None  # stays None if no REPORT line is found
                    for line in log_result.splitlines():
                        if line.startswith("REPORT"):
                            parts = {
                                kv.split(":")[0].strip(): kv.split(":")[1].strip()
                                for kv in line.split("\t")
                                if ":" in kv
                            }
                            duration = float(
                                parts.get("Duration", "0").split(" ")[0]
                            )
                            billed = float(
                                parts.get("Billed Duration", "0").split(" ")[0]
                            )
                            init = parts.get("Init Duration", None)
                            durations.append(duration)
                            billed_durations.append(billed)
                            if init:
                                # Init Duration only appears on cold starts
                                init_durations.append(
                                    float(init.split(" ")[0])
                                )
                    if response.get("FunctionError"):
                        errors += 1
                        print(f"  Invocation {i+1}: ERROR")
                    elif duration is None:
                        print(f"  Invocation {i+1}: no REPORT line in log tail")
                    else:
                        print(f"  Invocation {i+1}: {duration:.1f} ms")
                except Exception as e:
                    errors += 1
                    print(f"  Invocation {i+1}: Exception: {e}")

            if durations:
                avg_duration_ms = statistics.mean(durations)
                p50 = statistics.median(durations)
                p95 = sorted(durations)[int(len(durations) * 0.95)]
                avg_billed_ms = statistics.mean(billed_durations)
                # Cost per invocation in USD
                cost_per_invocation = (
                    (memory_mb / 1024) * (avg_billed_ms / 1000) * price_per_gb_second
                )
                results[memory_mb] = {
                    "avg_duration_ms": round(avg_duration_ms, 2),
                    "p50_ms": round(p50, 2),
                    "p95_ms": round(p95, 2),
                    "avg_billed_ms": round(avg_billed_ms, 2),
                    "cost_per_invocation_usd": round(cost_per_invocation, 10),
                    "cost_per_million_usd": round(cost_per_invocation * 1_000_000, 4),
                    "avg_init_duration_ms": round(statistics.mean(init_durations), 2) if init_durations else None,
                    "cold_start_count": len(init_durations),
                    "error_count": errors,
                    "invocations": len(durations)
                }
                print(f"  Avg: {avg_duration_ms:.1f}ms | P95: {p95:.1f}ms | "
                      f"Cost/M: ${cost_per_invocation * 1_000_000:.4f}")
    finally:
        # Always restore original memory
        print(f"\nRestoring original memory: {original_memory} MB")
        lambda_client.update_function_configuration(
            FunctionName=function_name,
            MemorySize=original_memory
        )
    return results
def find_optimum(results: Dict, strategy: str = "balanced", weight: float = 0.5):
    """
    Find the optimal memory size based on strategy.
    strategy: 'speed' | 'cost' | 'balanced'
    weight: 0.0 = pure cost, 1.0 = pure speed (for balanced)
    """
    min_duration = min(v["avg_duration_ms"] for v in results.values())
    min_cost = min(v["cost_per_invocation_usd"] for v in results.values())
    scores = {}
    for mem, stats in results.items():
        normalized_speed = min_duration / stats["avg_duration_ms"]
        normalized_cost = min_cost / stats["cost_per_invocation_usd"]
        if strategy == "speed":
            scores[mem] = normalized_speed
        elif strategy == "cost":
            scores[mem] = normalized_cost
        else:  # balanced
            scores[mem] = weight * normalized_speed + (1 - weight) * normalized_cost
    return max(scores, key=scores.get)
if __name__ == "__main__":
    FUNCTION_NAME = "my-image-resizer"
    TEST_PAYLOAD = {
        "bucket": "my-test-bucket",
        "key": "sample-5mb.jpg",
        "width": 800,
        "height": 600
    }
    results = benchmark_lambda(
        function_name=FUNCTION_NAME,
        payload=TEST_PAYLOAD,
        memory_sizes=[128, 256, 512, 1024, 1536, 1769, 2048, 3008],
        invocations_per_size=10
    )
    print("\n=== RESULTS SUMMARY ===")
    print(f"{'Memory':>8} | {'Avg ms':>8} | {'P95 ms':>8} | {'$/M invocations':>18}")
    print("-" * 52)
    for mem in sorted(results.keys()):
        s = results[mem]
        print(f"{mem:>6} MB | {s['avg_duration_ms']:>8.1f} | "
              f"{s['p95_ms']:>8.1f} | ${s['cost_per_million_usd']:>17.4f}")
    optimum = find_optimum(results, strategy="balanced", weight=0.5)
    print(f"\nBalanced optimum: {optimum} MB")
    print(f"Speed optimum: {find_optimum(results, 'speed')} MB")
    print(f"Cost optimum: {find_optimum(results, 'cost')} MB")
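Required IAM permissions for this script: lambda:InvokeFunction, lambda:UpdateFunctionConfiguration, and lambda:GetFunctionConfiguration. The function_updated waiter polls GetFunctionConfiguration, so it needs no separate permission. The script restores the original memory in the finally block, even if a benchmark run raises an exception or is interrupted with Ctrl+C.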
6. Cold Start Impact by Memory Setting
Memory does not just affect execution time — it also affects cold start duration. Higher memory generally means a faster cold start because the language runtime and your initialization code benefit from the same increased CPU allocation during the Init phase.
| Memory | Node.js 20 Cold Start | Python 3.12 Cold Start | Java 21 Cold Start | Java 21 + SnapStart |
|---|---|---|---|---|
| 128 MB | ~350 ms | ~420 ms | ~3100 ms | ~280 ms |
| 512 MB | ~180 ms | ~210 ms | ~1800 ms | ~190 ms |
| 1024 MB | ~120 ms | ~140 ms | ~1100 ms | ~160 ms |
| 1769 MB | ~80 ms | ~95 ms | ~700 ms | ~130 ms |
| 3008 MB | ~65 ms | ~75 ms | ~480 ms | ~110 ms |
For interpreted runtimes (Node.js, Python), cold start duration scales roughly with the inverse of CPU: doubling memory from 512 MB to 1024 MB typically cuts cold start by 30–40%. The gains diminish above 1769 MB because you are no longer CPU-limited at that point — network I/O and Lambda infrastructure overhead dominate.
Lambda SnapStart for Java
SnapStart takes a snapshot of the initialized execution environment (memory and disk state) when you publish a function version, stores it in a cache, and resumes new execution environments from that snapshot instead of re-running initialization. The result is a Java cold start that skips most of the Init cost. Enable it on any Java 21 function:
aws lambda update-function-configuration \
--function-name my-java-api \
--snap-start ApplyOn=PublishedVersions \
--region us-east-1
# Publish a version to activate SnapStart
aws lambda publish-version \
--function-name my-java-api \
--region us-east-1
# Create or update an alias pointing to the published version
aws lambda create-alias \
--function-name my-java-api \
--name live \
--function-version 3 \
--region us-east-1
Provisioned Concurrency for the Remaining Cases
When SnapStart is not applicable (runtimes or configurations it does not support) and cold starts must stay under 100 ms, Provisioned Concurrency (PC) is the answer. PC keeps a fixed number of execution environments initialized and ready to respond. Power tune before enabling PC: running PC at the wrong memory wastes money on two fronts, because both the PC charge and the per-invocation charge scale with configured memory.
# Enable PC on a published version alias
aws lambda put-provisioned-concurrency-config \
--function-name my-api \
--qualifier live \
--provisioned-concurrent-executions 10 \
--region us-east-1
7. Architecture Patterns: CPU-Bound vs I/O-Bound
The correct memory target depends fundamentally on whether your Lambda is CPU-bound or I/O-bound. Getting this classification right before tuning saves time and avoids misleading results.
CPU-Bound Functions
CPU-bound functions spend most of their time doing computation. Their execution time scales inversely with CPU allocation. Examples:
- Image and video transcoding (Sharp, FFmpeg, Pillow)
- Machine learning inference (scikit-learn, TensorFlow Lite, ONNX)
- Large JSON/XML serialization and parsing
- Cryptographic operations (RSA, AES on large data)
- Compression and decompression (gzip, brotli)
- Data aggregation and statistical computation
Tuning strategy: Start testing from 1769 MB upward. CPU-bound functions almost always show a steep duration drop between 1024 MB and 1769 MB (the full-vCPU threshold), with the cost optimum also near or above 1769 MB. For parallelizable tasks (multi-threaded image processing), 3008 MB or higher may unlock additional gains.
I/O-Bound Functions
I/O-bound functions spend most of their time waiting: for a database response, an external HTTP call, or an SQS message. Additional CPU does not make the network faster. Examples:
- API proxy / aggregator (calls to third-party APIs)
- DynamoDB or RDS queries with simple result processing
- S3 GetObject followed by minimal transformation
- SQS/SNS message forwarders
- Simple CRUD handlers
Tuning strategy: Test from 128 MB to 1024 MB. Expect a plateau in duration after 512–1024 MB. The cost optimum is usually 256–512 MB. Going higher wastes money with no performance benefit.
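Two measurements at different memory sizes are enough for a first classification. The sketch below compares the observed drop in duration with the drop a perfectly CPU-bound function would show; `classify_workload` and its 0.5 threshold are illustrative assumptions, and the sample figures are the averages from the case studies later in this article.

```python
def classify_workload(mem_low_mb, dur_low_ms, mem_high_mb, dur_high_ms, threshold=0.5):
    """Classify a function from two measurements at different memory sizes.

    Returns (label, scaling). A scaling of 1.0 means duration fell exactly in
    proportion to the memory increase; 0.0 means it did not change at all.
    """
    if mem_high_mb <= mem_low_mb:
        raise ValueError("mem_high_mb must be larger than mem_low_mb")
    ideal_drop = 1 - mem_low_mb / mem_high_mb    # drop if perfectly CPU-bound
    actual_drop = 1 - dur_high_ms / dur_low_ms   # drop actually observed
    scaling = actual_drop / ideal_drop
    label = "cpu-bound" if scaling >= threshold else "io-bound"
    return label, scaling


if __name__ == "__main__":
    print(classify_workload(512, 1840, 1024, 920))   # image resizer figures
    print(classify_workload(256, 285, 512, 280))     # DynamoDB API figures
```

Use warm-invocation averages for both measurements, since a cold start in either sample would distort the ratio.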
Memory vs Parallelism Tradeoff
For event-driven architectures processing SQS batches, consider whether it is cheaper to run one large Lambda at 3008 MB with a batch size of 100, or multiple small Lambdas at 512 MB. Power tuning only tests single-invocation throughput — for batch processing, benchmark both approaches end-to-end and factor in the SQS request cost.
8. Graviton2 (arm64) + Power Tuning: Double Savings
AWS Lambda supports two processor architectures: x86_64 (Intel/AMD) and arm64 (AWS Graviton2). Graviton2 is priced 20% lower per GB-second, and AWS cites up to 19% better performance for compute-intensive workloads, so the price-performance gain can exceed the headline discount. Combined with proper memory tuning, the savings compound.
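The way the two effects compound can be sketched with the duration part of the bill. In the example below the arm64 rate is assumed to be the x86_64 rate less the 20% discount, `duration_cost` and `combined_saving` are hypothetical helpers, and the second scenario uses invented durations.

```python
PRICE_PER_GB_SECOND = {
    "x86_64": 0.0000166667,
    "arm64": 0.0000133334,   # assumed: the x86_64 price less the 20% discount
}


def duration_cost(memory_mb, billed_ms, architecture="x86_64"):
    """Duration charge in USD for one invocation (request fee excluded)."""
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND[architecture]


def combined_saving(before, after):
    """Fractional saving between two (memory_mb, billed_ms, architecture) configs."""
    return 1 - duration_cost(*after) / duration_cost(*before)


if __name__ == "__main__":
    # Architecture switch only: same memory, same duration.
    print(f"{combined_saving((512, 300, 'x86_64'), (512, 300, 'arm64')):.1%}")
    # Architecture switch plus memory tuning (hypothetical durations).
    print(f"{combined_saving((512, 800, 'x86_64'), (1024, 350, 'arm64')):.1%}")
```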
Switching to arm64
# Update architecture (requires compatible deployment package)
aws lambda update-function-configuration \
--function-name my-image-resizer \
--architectures arm64 \
--region us-east-1
# For Python/Node.js: most pure-Python/pure-JS code works unmodified
# For compiled extensions (numpy, cryptography, Pillow C extensions):
# you MUST rebuild for arm64 on an arm64 host or with Docker
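# Build Python arm64 packages with Docker
# (--entrypoint overrides the image's Lambda runtime entrypoint so pip runs directly)
docker run --rm --platform linux/arm64 \
-v "$(pwd)":/var/task \
--entrypoint pip \
public.ecr.aws/lambda/python:3.12-arm64 \
install -r requirements.txt -t /var/task/package/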
Power Tuning on arm64
Run power tuning separately for each architecture — the optimal memory level often differs because Graviton2 has a different compute-per-MB ratio and different memory latency characteristics:
{
"lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-image-resizer",
"powerValues": [512, 1024, 1769, 2048, 3008],
"num": 10,
"payload": { "bucket": "test", "key": "image.jpg" },
"strategy": "balanced",
"balancedWeight": 0.5
}
Typical combined savings when switching from x86_64 at default memory to arm64 at optimal memory:
| Function Type | x86_64 Default | arm64 Optimal | Total Savings |
|---|---|---|---|
| Image resize | 512 MB / 1800 ms | 1769 MB / 210 ms | ~58% cost reduction |
| ML inference | 1024 MB / 900 ms | 2048 MB / 340 ms | ~45% cost reduction |
| JSON transform | 256 MB / 500 ms | 1024 MB / 95 ms | ~37% cost reduction |
| API proxy | 512 MB / 300 ms | 512 MB / 285 ms | ~22% (architecture only) |
9. Continuous Optimization Pipeline
Lambda functions change over time — new dependencies, code refactors, runtime upgrades. The optimal memory setting from six months ago may no longer be optimal. A continuous optimization pipeline automatically re-tunes functions on a schedule and alerts you when the optimum shifts.
Pipeline Architecture
- EventBridge Scheduler — triggers a tuning orchestrator Lambda weekly (or after each deployment)
- Orchestrator Lambda — reads a list of functions to tune from DynamoDB, starts a power tuning execution for each
- Power Tuning State Machine — runs the benchmark, writes results back
- Results Processor Lambda — parses the output, compares to baseline stored in DynamoDB, raises SNS alert if optimal memory changed by more than 15%
import boto3
import json
import os
from datetime import datetime, timezone

dynamodb = boto3.resource("dynamodb")
sfn = boto3.client("stepfunctions")
sns = boto3.client("sns")

TABLE_NAME = os.environ["TUNING_RESULTS_TABLE"]
SM_ARN = os.environ["POWER_TUNING_SM_ARN"]
SNS_TOPIC_ARN = os.environ["ALERT_TOPIC_ARN"]


def start_tuning(event, context):
    """
    Orchestrator: reads function list from DynamoDB and starts power tuning.
    Triggered by EventBridge on a weekly schedule.
    """
    table = dynamodb.Table(TABLE_NAME)
    response = table.scan(
        FilterExpression="attribute_exists(function_name)"
    )
    started = []
    for item in response["Items"]:
        function_name = item["function_name"]
        payload = json.loads(item.get("test_payload", "{}"))
        execution_input = {
            "lambdaARN": item["function_arn"],
            "powerValues": [256, 512, 1024, 1536, 1769, 2048, 3008],
            "num": 10,
            "payload": payload,
            "strategy": "balanced",
            "balancedWeight": 0.5,
            "autoOptimize": False
        }
        exec_response = sfn.start_execution(
            stateMachineArn=SM_ARN,
            name=f"autotune-{function_name}-{int(datetime.now().timestamp())}",
            input=json.dumps(execution_input)
        )
        started.append(exec_response["executionArn"])
        print(f"Started tuning for {function_name}: {exec_response['executionArn']}")
    return {"started": len(started), "executions": started}
def process_results(event, context):
    """
    Results processor: called by EventBridge when a tuning execution completes.
    Compares new optimum to stored baseline and alerts when the optimum shifts.
    """
    table = dynamodb.Table(TABLE_NAME)
    # EventBridge delivers the execution output as a JSON string in detail.output.
    # The key names read below must match the output of your deployed tool version.
    output = json.loads(event.get("detail", {}).get("output") or "{}")
    function_arn = output.get("lambdaARN", "")
    best = output.get("results", {}).get("bestConfiguration", {})
    optimal_memory = best.get("memorySize")
    optimal_cost = best.get("cost")
    if not optimal_memory:
        print("Could not parse tuning results")
        return
    optimal_memory = int(optimal_memory)

    # Fetch previous baseline (DynamoDB returns numbers as Decimal)
    response = table.get_item(Key={"function_arn": function_arn})
    previous = response.get("Item", {})
    previous_memory = previous.get("optimal_memory_mb")
    if previous_memory is not None:
        previous_memory = int(previous_memory)

    # Append the previous result to the history only if one exists
    history = previous.get("history", [])
    if previous_memory:
        history = history + [
            {"memory": previous_memory, "date": previous.get("last_tuned")}
        ]

    # Store new result; update_item leaves function_name and test_payload intact,
    # whereas put_item would replace the whole item and drop them
    table.update_item(
        Key={"function_arn": function_arn},
        UpdateExpression=(
            "SET optimal_memory_mb = :m, optimal_cost_usd = :c, "
            "last_tuned = :t, #hist = :h"
        ),
        ExpressionAttributeNames={"#hist": "history"},
        ExpressionAttributeValues={
            ":m": optimal_memory,
            ":c": str(optimal_cost),
            ":t": datetime.now(timezone.utc).isoformat(),
            ":h": history,
        },
    )

    # Alert if optimal memory shifted by more than 15%
    if previous_memory and abs(optimal_memory - previous_memory) / previous_memory > 0.15:
        message = (
            f"Lambda power tuning regression detected!\n"
            f"Function: {function_arn}\n"
            f"Previous optimal: {previous_memory} MB\n"
            f"New optimal: {optimal_memory} MB\n"
            f"Change: {((optimal_memory - previous_memory) / previous_memory * 100):+.1f}%\n"
            f"Action required: Review code changes and update function configuration."
        )
        sns.publish(TopicArn=SNS_TOPIC_ARN, Subject="Lambda Tuning Alert", Message=message)
        print(f"ALERT sent: memory shifted from {previous_memory} to {optimal_memory} MB")
    return {"function_arn": function_arn, "optimal_memory": optimal_memory}
EventBridge Scheduler Rule
# Create a weekly schedule (every Monday at 02:00 UTC)
aws scheduler create-schedule \
--name lambda-power-tuning-weekly \
--schedule-expression "cron(0 2 ? * MON *)" \
--flexible-time-window Mode=OFF \
--target '{"Arn":"arn:aws:lambda:us-east-1:123456789012:function:start-tuning","RoleArn":"arn:aws:iam::123456789012:role/scheduler-role","Input":"{}"}' \
--region us-east-1
10. Real Case Studies
These case studies are representative of production tuning exercises across common Lambda workload types.
Case Study 1: Image Resizer (512 MB → 1769 MB, 40% Cost Reduction)
A media platform ran an image resizing Lambda (Node.js 20, Sharp library) at 512 MB because "it worked in testing." Power tuning revealed:
- At 512 MB: avg 1840 ms, cost $0.0000153 per invocation
- At 1024 MB: avg 920 ms, cost $0.0000153 per invocation (same cost, twice as fast)
- At 1536 MB: avg 610 ms, cost $0.0000152 per invocation (slightly cheaper, 3× faster)
- At 1769 MB: avg 490 ms, cost $0.0000141 per invocation (cheapest, 3.75× faster)
Migration to 1769 MB cut monthly Lambda costs by 41% and reduced image processing latency from a p99 of 3.2 seconds to 820 ms. The improvement also eliminated a class of API Gateway timeout errors that had been occurring on large images.
Case Study 2: REST API Handler — 256 MB Is Optimal
An e-commerce API Lambda (Python 3.12) that reads from DynamoDB and returns product data was running at 128 MB (the default). Testing showed:
- At 128 MB: avg 320 ms (but high variance, p95 = 890 ms)
- At 256 MB: avg 285 ms, p95 = 410 ms — significant p95 improvement
- At 512 MB: avg 280 ms, p95 = 405 ms — negligible improvement
- At 1024 MB: avg 278 ms, p95 = 400 ms — negligible improvement
The function was clearly I/O-bound — DynamoDB latency dominated. The jump from 128 MB to 256 MB was worth making (p95 improvement), but anything higher provided no benefit. Setting to 256 MB reduced costs by 12% compared to the default while improving tail latency.
Case Study 3: Java Spring Boot API — 2048 MB with SnapStart
A financial services team had a Java 17 Spring Boot Lambda that was notorious for cold starts (2.8 seconds) and sat at 1024 MB. The tuning exercise combined memory optimization with SnapStart migration:
- Upgrade runtime to Java 21, enable SnapStart → cold start drops from 2800 ms to 380 ms
- Run power tuning on warm invocations → warm execution time minimized at 2048 MB (faster JSON deserialization, Spring bean lookups)
- At 2048 MB warm: avg 95 ms vs 180 ms at 1024 MB
- Cost at 2048 MB / 95 ms vs 1024 MB / 180 ms: 2048 × 95 = 194,560 MB-ms vs 1024 × 180 = 184,320 MB-ms — slightly higher cost but within 6%, with 47% latency improvement
The team chose 2048 MB for the latency SLA, enabled Provisioned Concurrency of 5 instances during business hours, and moved to arm64 Graviton2 for the 20% pricing discount. End result: p99 cold start under 400 ms, warm p99 under 150 ms, total Lambda bill 28% lower than the original configuration.
Quick Reference
- Full vCPU threshold: 1769 MB
- Max memory: 10,240 MB (10 GB)
- Max vCPUs: 6 (at 10,240 MB)
- Billing unit: 1 ms increments
- arm64 discount: 20% cheaper/GB-s
- Free tier: 400K GB-s/month