AWS Lambda Cold Starts: SnapStart and Performance Tuning

June 6, 2026 | 12 min read

Cold starts are the most talked-about performance problem in serverless computing. When an AWS Lambda function has not been invoked recently — or when AWS needs to scale out additional instances — it must bootstrap a new execution environment from scratch. That bootstrapping time adds latency the end user feels directly.

This guide covers everything you need to eliminate or dramatically reduce cold starts in 2026: how they work internally, runtime-by-runtime benchmarks, Lambda SnapStart for Java, provisioned concurrency, and code-level optimizations. By the end you will have a repeatable strategy for keeping p99 latency under 200 ms even for Java workloads that historically suffered 3-second cold starts.

What Is a Cold Start and Why Does It Matter?

Every Lambda invocation runs inside a Firecracker microVM — a lightweight virtual machine managed entirely by AWS. The lifecycle of an execution environment has three phases:

Init — AWS provisions the microVM, downloads your deployment package, starts the language runtime, and runs your initialization code (everything outside the handler function).
Invoke — AWS calls your handler function with the event payload.
Shutdown — after a period of inactivity (typically 5–15 minutes) AWS freezes or terminates the environment.

A cold start happens during the first Invoke that requires a new environment — specifically, the entire Init phase is on the critical path. A warm start reuses an already-running environment and skips Init entirely.
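
The cold/warm distinction can be observed from inside a function. The following is a minimal sketch, not an AWS-provided mechanism: it relies only on the fact that module-level code runs once per execution environment, and the names `_is_cold` and `_INIT_FINISHED_AT` are illustrative.

```python
import time

# Module-level code runs once per execution environment, during Init.
_INIT_FINISHED_AT = time.monotonic()
_is_cold = True  # flipped to False by the first invocation


def handler(event, context):
    """Report whether this invocation landed on a fresh environment."""
    global _is_cold
    cold, _is_cold = _is_cold, False
    return {
        "coldStart": cold,
        # Seconds elapsed since this environment finished its module-level init.
        "environmentAgeSeconds": round(time.monotonic() - _INIT_FINISHED_AT, 3),
    }
```

Logging the `coldStart` value alongside your own request metrics gives a per-request view that complements the REPORT log line.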

Latency Breakdown of a Cold Start

A cold start has three measurable components:

Network / provisioning latency (~5–50 ms) — Firecracker boots, network interface attaches, EFS mounts resolve (if used).
Runtime init latency (~10 ms for Go up to ~800 ms for JVM) — the language runtime itself starts, JIT compiler warms, class loader runs.
Handler init latency — your static initializers, SDK client constructors, Spring context load, or any framework DI wiring.

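Key insight: the Init phase costs your users latency regardless of how it is billed. Historically AWS did not bill Init time for on-demand functions on managed runtimes deployed as zip packages; since August 2025, Init time is included in the billed duration for all function configurations. Either way, the user waiting for the HTTP response experiences the full Init latency. For synchronous API Gateway integrations this is the difference between a snappy API and one that feels broken.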

Cold Start Duration by Runtime (2026 Benchmarks)

Runtime choice is the single biggest lever on cold start time. The table below shows typical Init phase durations for a minimal function with one AWS SDK client initialized at startup, on a 512 MB Lambda:

Runtime	Typical Cold Start	Worst Case (heavy deps)	Notes
Go (provided.al2023 custom runtime)	~10–30 ms	~80 ms	Compiled binary, no interpreter or VM to start
Node.js 20.x	~50–150 ms	~400 ms	V8 startup is fast; large node_modules hurts
Python 3.12	~80–200 ms	~600 ms	NumPy/Pandas imports add 300–500 ms each
Ruby 3.3	~200–400 ms	~800 ms	Gem loading is the bottleneck
Java 21 (standard)	~800 ms–1.5 s	~3 s	JVM startup + class loading; Spring worst case
Java 21 + SnapStart	~100–300 ms	~500 ms	Snapshot restore replaces JVM init
.NET 8	~300–600 ms	~1.2 s	CLR startup; NativeAOT cuts this significantly

Note: These numbers assume a zip deployment and a function that is not attached to a VPC. Attaching a function to a VPC used to add a substantial ENI-attachment delay (often a second or more) to every cold start. Since 2019, Hyperplane ENIs are created when the function is configured rather than at invocation time, so VPC cold starts in 2026 are typically only ~100–200 ms slower.

Lambda SnapStart for Java: How It Works

SnapStart, introduced in 2022 and now available on the Java 11 and later managed runtimes, is AWS's most impactful solution for Java cold starts. Instead of initializing the JVM from scratch on every cold start, SnapStart:

Runs your full Init phase once, when you publish a function version.
Takes a memory and disk snapshot (a Firecracker microVM snapshot) of the fully-initialized environment.
Encrypts the snapshot and stores it in AWS-managed storage, caching it for fast retrieval.
On cold start, restores from the snapshot — much faster than re-running JVM startup + class loading.

The result: what was a 1.5-second cold start becomes a 150–300 ms snapshot restore. For Spring Boot functions that previously took 3 seconds, SnapStart typically brings this under 500 ms.

Enabling SnapStart

SnapStart is enabled per function version (not on $LATEST). You must publish a version to use it:

# Enable SnapStart when creating or updating a function
aws lambda create-function \
  --function-name my-java-api \
  --runtime java21 \
  --role arn:aws:iam::123456789012:role/lambda-exec-role \
  --handler com.example.Handler::handleRequest \
  --zip-file fileb://function.zip \
  --snap-start ApplyOn=PublishedVersions \
  --memory-size 512

# Publish a version (required — SnapStart doesn't apply to $LATEST)
aws lambda publish-version \
  --function-name my-java-api

Java SnapStart Handler with CRaC Hooks

SnapStart implements the CRaC (Coordinated Restore at Checkpoint) API. This lets you register callbacks that run before the snapshot is taken (to close connections, release file handles) and after restore (to re-establish connections). Without these hooks, restored environments may hold stale database connections or expired tokens.

package com.example;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import org.crac.Core;
import org.crac.Resource;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.s3.S3Client;

import java.util.Map;

public class Handler implements RequestHandler<Map<String, String>, String>, Resource {

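    // Initialize SDK clients at class-load time (outside handler).
    // Building them during Init loads the SDK classes before the snapshot is taken;
    // the client instances themselves are closed in beforeCheckpoint and rebuilt in afterRestore.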
    private static DynamoDbClient dynamoDb;
    private static S3Client s3;

    static {
        dynamoDb = DynamoDbClient.builder()
            .region(Region.US_EAST_1)
            .build();
        s3 = S3Client.builder()
            .region(Region.US_EAST_1)
            .build();
    }

    public Handler() {
        // Register this instance with the CRaC Core so our hooks are called
        Core.getGlobalContext().register(this);
    }

    /**
     * Called BEFORE the snapshot is taken.
     * Close connections, flush buffers, release OS resources.
     */
    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) throws Exception {
        System.out.println("SnapStart: beforeCheckpoint — closing connections");
        // Close connection pools that cannot survive a snapshot/restore cycle
        if (dynamoDb != null) {
            dynamoDb.close();
            dynamoDb = null;
        }
        if (s3 != null) {
            s3.close();
            s3 = null;
        }
    }

    /**
     * Called AFTER restore from snapshot, before the handler runs.
     * Re-establish connections, refresh credentials, re-seed random.
     */
    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) throws Exception {
        System.out.println("SnapStart: afterRestore — re-initializing clients");
        // Rebuild clients — credentials are automatically refreshed by the SDK
        dynamoDb = DynamoDbClient.builder()
            .region(Region.US_EAST_1)
            .build();
        s3 = S3Client.builder()
            .region(Region.US_EAST_1)
            .build();
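        // java.security.SecureRandom is documented as SnapStart-safe on Lambda's
        // managed runtimes, so it needs no manual re-seeding here.
        // Do regenerate anything unique that was computed during Init (cached UUIDs,
        // nonces, seeds for a custom PRNG): every restore starts from the same snapshot.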
    }

    @Override
    public String handleRequest(Map<String, String> event, Context context) {
        String key = event.getOrDefault("key", "hello");
        // Use the already-initialized (or restored) client
        var response = dynamoDb.getItem(r -> r
            .tableName("my-table")
            .key(Map.of("pk", software.amazon.awssdk.services.dynamodb.model.AttributeValue.fromS(key)))
        );
        return response.hasItem() ? response.item().toString() : "not found";
    }
}

CRaC security note: SnapStart encrypts snapshots with a KMS key managed by AWS. You can optionally supply your own CMK. Sensitive values (plaintext secrets) that are in memory at checkpoint time will be in the snapshot — always fetch secrets via SSM Parameter Store or Secrets Manager at runtime, not during static init.

SAM Template: Enabling SnapStart

If you deploy with AWS SAM, enabling SnapStart and auto-publishing a version looks like this:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Globals:
  Function:
    Timeout: 30
    MemorySize: 512
    Runtime: java21
    Architectures:
      - x86_64

Resources:
  MyJavaApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: my-java-api
      Handler: com.example.Handler::handleRequest
      CodeUri: target/function.jar
      # SnapStart: snapshot the Init phase at publish time
      SnapStart:
        ApplyOn: PublishedVersions
      AutoPublishAlias: live      # SAM auto-publishes a version and creates an alias
      Environment:
        Variables:
          TABLE_NAME: my-table
          # AWS_REGION is a reserved variable that Lambda sets itself; do not define it here
      Policies:
        - DynamoDBCrudPolicy:
            TableName: my-table
      Events:
        ApiEvent:
          Type: Api
          Properties:
            Path: /items/{key}
            Method: get
            # With AutoPublishAlias set, SAM points this event at the live alias, not $LATEST
            RestApiId: !Ref MyApi

  MyApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod

Provisioned Concurrency: When to Use It

Provisioned Concurrency (PC) is the other major tool against cold starts. Unlike SnapStart (which speeds up cold starts), PC eliminates them entirely by keeping a fixed number of execution environments pre-initialized and ready to handle requests instantly.

How It Works

When you configure PC on a function version or alias, Lambda:

Immediately initializes N execution environments (runs your Init phase).
Keeps those environments warm permanently — even with no traffic.
Routes invocations to warm environments first; spills to on-demand (cold) only when PC is exhausted.
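
To choose N, it helps to estimate how many environments a steady load keeps busy. The sketch below assumes the usual approximation that concurrency is roughly request rate times average duration; the function names are hypothetical and the model ignores bursts, so treat the result as a starting point rather than a guarantee.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_ms: float) -> int:
    """Environments kept busy by a steady request rate (rate x duration)."""
    if requests_per_second < 0 or avg_duration_ms < 0:
        raise ValueError("inputs must be non-negative")
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


def spillover(requests_per_second: float, avg_duration_ms: float, provisioned: int) -> int:
    """Concurrent requests that would fall through to on-demand environments."""
    needed = required_concurrency(requests_per_second, avg_duration_ms)
    return max(0, needed - provisioned)
```

For example, 250 requests per second at an average of 120 ms keeps about 30 environments busy, so a PC setting of 10 would leave roughly 20 concurrent requests exposed to on-demand cold starts.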

Cost of Provisioned Concurrency

PC has a separate, always-on cost in addition to standard invocation costs:

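Provisioned Concurrency GB-seconds: ~$0.0000046/GB-second (the exact price varies by Region and architecture). That is roughly a quarter of the on-demand compute rate of ~$0.0000167/GB-second, but it accrues every second the environments are provisioned, whether or not they serve traffic.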
Example: 10 PC instances at 512 MB = 5 GB kept warm. At 100% idle for a month: ~$60/month just for warmth.
This makes PC cost-effective only for latency-sensitive workloads with predictable traffic, not background jobs.
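
The arithmetic behind the example can be reproduced with a few lines. This is a sketch that uses the per-GB-second figure quoted above as its default; `monthly_pc_cost` is a hypothetical helper, a 30-day month is assumed, and invocation and request charges are not included.

```python
def monthly_pc_cost(instances: int, memory_mb: int,
                    price_per_gb_second: float = 0.0000046,
                    days: int = 30) -> float:
    """Idle cost of keeping `instances` environments provisioned for `days` days."""
    provisioned_gb = instances * memory_mb / 1024
    seconds = days * 24 * 60 * 60
    return provisioned_gb * seconds * price_per_gb_second


# 10 instances at 512 MB, provisioned around the clock for 30 days
cost = monthly_pc_cost(instances=10, memory_mb=512)
print(f"${cost:.2f} per month")
```

Substitute the current price for your Region before relying on the result.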

# Enable Provisioned Concurrency on a published version
aws lambda put-provisioned-concurrency-config \
  --function-name my-java-api \
  --qualifier 3 \
  --provisioned-concurrent-executions 10

# Or on an alias (preferred — aliases can be updated without changing PC config)
aws lambda put-provisioned-concurrency-config \
  --function-name my-java-api \
  --qualifier live \
  --provisioned-concurrent-executions 10

# Check status — wait for "READY" before traffic
aws lambda get-provisioned-concurrency-config \
  --function-name my-java-api \
  --qualifier live

Scheduled Scaling with Application Auto Scaling

For workloads with predictable daily patterns (e.g., business-hours API traffic), use Application Auto Scaling to scale PC up before the peak and back down overnight:

# Register the Lambda alias as a scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id function:my-java-api:live \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --min-capacity 2 \
  --max-capacity 50

# Scale UP at 08:00 IST (02:30 UTC) on weekdays
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:my-java-api:live \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --scheduled-action-name scale-up-morning \
  --schedule "cron(30 2 ? * MON-FRI *)" \
  --scalable-target-action MinCapacity=10,MaxCapacity=50

# Scale DOWN at 20:00 IST (14:30 UTC) on weekdays
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:my-java-api:live \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --scheduled-action-name scale-down-evening \
  --schedule "cron(30 14 ? * MON-FRI *)" \
  --scalable-target-action MinCapacity=2,MaxCapacity=10
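
The schedule expressions above are in UTC, which is easy to get wrong by hand. Here is a small sketch for deriving them from local times; `utc_cron` is a hypothetical helper, and it assumes a fixed UTC offset (true for IST, not for zones with daylight saving time).

```python
from datetime import datetime, timedelta, timezone

# India Standard Time has a fixed offset of +05:30 and no daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30))


def utc_cron(local_hour: int, local_minute: int, tz: timezone = IST,
             days: str = "MON-FRI") -> str:
    """Build a UTC cron() expression for a local wall-clock time."""
    # The calendar date is arbitrary; only the time of day matters.
    local = datetime(2026, 1, 5, local_hour, local_minute, tzinfo=tz)
    utc = local.astimezone(timezone.utc)
    if utc.date() != local.date():
        # The day-of-week field would need shifting; leave that to the caller.
        raise ValueError("UTC time falls on a different calendar day")
    return f"cron({utc.minute} {utc.hour} ? * {days} *)"


print(utc_cron(8, 0))   # scale-up time
print(utc_cron(20, 0))  # scale-down time
```

The helper refuses to guess when the conversion crosses midnight, because the day-of-week field would then have to shift as well.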

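SnapStart or Provisioned Concurrency: Treat these as alternatives rather than a combination. The Lambda documentation lists provisioned concurrency among the features SnapStart does not support, so a given function version uses one or the other. A reasonable default is SnapStart for Java functions that can tolerate a restore of a few hundred milliseconds, and Provisioned Concurrency where even that is too slow.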

Code Optimization Techniques

Even with SnapStart, reducing your Init phase duration is worthwhile — less initialization means faster snapshots, faster restores, and lower memory pressure.

1. Move SDK Clients Outside the Handler

This is the single most impactful code change for all runtimes. SDK clients are expensive to construct (HTTP connection pools, credential chain resolution, endpoint discovery). Constructing them once at module/class load time means warm invocations reuse the same client:

import boto3
import os

# BAD: client constructed on every invocation
def handler_bad(event, context):
    s3 = boto3.client('s3')           # ~20-40ms overhead every call
    dynamodb = boto3.client('dynamodb')
    # ... use clients

# GOOD: each client is constructed once per execution environment, on first use,
# and then reused by every warm invocation that follows
_s3 = None
_dynamodb = None

def _get_s3():
    """Lazy initialization — safe for SnapStart."""
    global _s3
    if _s3 is None:
        _s3 = boto3.client('s3', region_name=os.environ['AWS_REGION'])
    return _s3

def _get_dynamodb():
    global _dynamodb
    if _dynamodb is None:
        _dynamodb = boto3.resource(
            'dynamodb',
            region_name=os.environ['AWS_REGION']
        )
    return _dynamodb

def handler(event, context):
    table = _get_dynamodb().Table(os.environ['TABLE_NAME'])
    key = event.get('key', 'default')
    response = table.get_item(Key={'pk': key})
    item = response.get('Item')
    if not item:
        return {'statusCode': 404, 'body': 'Not found'}
    return {
        'statusCode': 200,
        'body': str(item)
    }

2. Reduce Deployment Package Size

Lambda downloads your package on every cold start. Smaller packages = faster download = faster Init.

Python: use --no-deps to exclude transitive deps you don't need; strip .pyc files and tests from third-party packages; use Lambda layers for heavy libraries (Pandas, NumPy).
Node.js: use esbuild or webpack to tree-shake and bundle; avoid shipping node_modules directly.
Java: prefer an uber-jar with only the dependencies you need; consider GraalVM Native Image, deployed on a custom runtime (provided.al2023), for drastically smaller, faster-starting binaries.
Target: under 5 MB for Node.js/Python, under 20 MB for Java. Package size above 50 MB noticeably degrades cold start in our benchmarks.
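
Before trimming a package it helps to see what is actually taking the space. The sketch below uses only the standard library `zipfile` module; the helper names are illustrative, and `function.zip` in the usage note is simply the archive name used earlier in this article.

```python
import zipfile


def largest_entries(zip_source, top=5):
    """Return (filename, uncompressed_bytes) for the biggest files in an archive."""
    with zipfile.ZipFile(zip_source) as archive:
        entries = sorted(archive.infolist(), key=lambda info: info.file_size, reverse=True)
        return [(info.filename, info.file_size) for info in entries[:top]]


def unzipped_size(zip_source):
    """Total uncompressed size of the archive in bytes."""
    with zipfile.ZipFile(zip_source) as archive:
        return sum(info.file_size for info in archive.infolist())
```

Calling `largest_entries("function.zip")` on a real deployment package usually points straight at the one or two dependencies worth removing or moving to a layer.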

3. Use Lambda Layers for Shared Dependencies

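Lambda layers keep heavy, rarely-changing dependencies out of your own deployment package, so code-only deploys stay small. Layers can also be cached on the execution host, in which case later cold starts on the same host skip re-downloading them, although AWS does not guarantee this. Layers still count toward the 250 MB unzipped size limit, so this works well for libraries such as NumPy, Pandas, or scikit-learn; very large frameworks such as PyTorch usually need a container image instead: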

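# Create a layer with numpy and pandas
# (--platform and --only-binary fetch Linux wheels even when building on macOS or Windows)
pip install numpy pandas -t python/ \
  --platform manylinux2014_x86_64 --only-binary=:all: --python-version 3.12
zip -r layer.zip python/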
aws lambda publish-layer-version \
  --layer-name data-science-libs \
  --zip-file fileb://layer.zip \
  --compatible-runtimes python3.12

# Attach to function
aws lambda update-function-configuration \
  --function-name my-ml-function \
  --layers arn:aws:lambda:us-east-1:123456789012:layer:data-science-libs:1

4. Lazy Initialization for Optional Resources

Not every execution path needs every resource. Use lazy initialization for clients that are only needed in some code paths. The Python example above shows the pattern with module-level globals and a getter function. This avoids paying Init cost for resources you might not use.

5. Container Image Cold Starts: Tradeoffs

Lambda supports container images up to 10 GB. Container images have a slightly longer first-time cold start than zip deployments because AWS must pull the image layers. However, after the image is cached on the execution host, subsequent cold starts are comparable to zip. Key points:

Use multi-stage Docker builds to minimize the final image size — start from public.ecr.aws/lambda/python:3.12 not from a full Ubuntu image.
Put frequently-changing layers (your app code) last in the Dockerfile so Docker layer caching maximizes reuse.
Lambda requires the image to live in Amazon ECR in the same region as the function, and it caches image data after the first pull, so push to that region's repository.
Images larger than 1 GB can add 1–3 seconds to cold start if the host cache is cold. Zip deployments are generally faster for cold-start-sensitive functions.

Measuring Cold Starts with CloudWatch Logs Insights

Before optimizing, measure. Lambda emits an Init Duration field in the REPORT log line for every cold start. CloudWatch Logs Insights can aggregate this across thousands of invocations:

# CloudWatch Logs Insights query
# Run against /aws/lambda/my-function log group
# Time range: last 24 hours

fields @timestamp, @duration, @initDuration, @memorySize, @maxMemoryUsed
| filter @type = "REPORT"
# Only cold starts have @initDuration
| filter ispresent(@initDuration)
| stats
    count(*) as coldStarts,
    avg(@initDuration) as avgInitMs,
    pct(@initDuration, 50) as p50InitMs,
    pct(@initDuration, 95) as p95InitMs,
    pct(@initDuration, 99) as p99InitMs,
    max(@initDuration) as maxInitMs,
    avg(@duration) as avgHandlerMs
  by bin(1h) as hour
| sort hour desc
| limit 48

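Run this query and note the p99 initDuration before and after each optimization. One caveat for SnapStart: restored environments report a Restore Duration in the REPORT line instead of an Init Duration, so compare the old p99 initDuration (often 1500+ ms for Java) with the new restore duration, which typically lands under 400 ms after a successful SnapStart rollout.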

To also see the ratio of cold starts to total invocations (cold start rate):

fields @timestamp, @type, @initDuration
| filter @type = "REPORT"
| stats
    count(*) as totalInvocations,
    sum(ispresent(@initDuration) == 1) as coldStarts,
    (sum(ispresent(@initDuration) == 1) / count(*)) * 100 as coldStartPct
  by bin(1h) as hour
| sort hour desc
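
The same numbers can be computed offline from exported log lines, which is handy in tests or CI. This is a minimal sketch that assumes the standard REPORT line layout, where an `Init Duration: <n> ms` field appears only on cold starts; `cold_start_stats` is a hypothetical helper and the sample lines are made up.

```python
import re

_INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def cold_start_stats(log_lines):
    """Summarize cold starts from an iterable of Lambda log lines."""
    reports = [line for line in log_lines if line.startswith("REPORT")]
    init_ms = sorted(
        float(match.group(1))
        for line in reports
        if (match := _INIT_RE.search(line))
    )
    if not reports:
        return {"invocations": 0, "coldStarts": 0, "coldStartPct": 0.0, "maxInitMs": None}
    return {
        "invocations": len(reports),
        "coldStarts": len(init_ms),
        "coldStartPct": 100 * len(init_ms) / len(reports),
        "maxInitMs": init_ms[-1] if init_ms else None,
    }


sample = [
    "REPORT RequestId: a1\tDuration: 35.10 ms\tBilled Duration: 36 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 98 MB\tInit Duration: 812.45 ms",
    "REPORT RequestId: a2\tDuration: 4.20 ms\tBilled Duration: 5 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 98 MB",
    "REPORT RequestId: a3\tDuration: 3.90 ms\tBilled Duration: 4 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 99 MB",
    "REPORT RequestId: a4\tDuration: 4.05 ms\tBilled Duration: 5 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 99 MB",
]
print(cold_start_stats(sample))
```

For SnapStart functions the pattern would need to match `Restore Duration` instead, since restored environments do not log an Init Duration.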

Tip: High cold start rate at specific times (e.g., 9 AM, right after deployments) confirms you need Provisioned Concurrency or scheduled scaling. A uniformly high rate throughout the day suggests the function runs infrequently and warming with EventBridge ping is worth considering.

Architecture Patterns: Avoiding Synchronous Cold Starts

The most effective technique is sometimes architectural: avoid the patterns that expose cold starts to end users.

Synchronous vs. Asynchronous Invocation

Cold starts hurt most in synchronous invocations (API Gateway → Lambda → user waiting). They matter much less in asynchronous invocations where the caller doesn't wait:

SQS → Lambda: SQS buffers messages. A cold start of 2 seconds delays processing by 2 seconds — unnoticeable for most queue consumers.
EventBridge → Lambda: Event-driven automation. Cold starts add latency to the pipeline but rarely impact user experience directly.
Step Functions → Lambda: Orchestration steps tolerate seconds of latency easily — Step Functions itself adds overhead.
S3 Event → Lambda: Object processing pipelines. Cold starts are a non-issue.

Only synchronous, user-facing integrations expose cold starts to waiting users: API Gateway (REST, HTTP, WebSocket), Application Load Balancer, and Lambda function URLs. Focus your optimization effort on those.

Keep-Warm Pinging (Use Sparingly)

A common workaround is an EventBridge rule that pings the Lambda every 5 minutes to prevent the execution environment from being reclaimed. This works but has several drawbacks:

Only keeps one instance warm — the first real spike still causes cold starts on new instances.
Not free: 288 EventBridge rule invocations/day × your Lambda memory cost.
Does not help after a deployment, which forces all environments to reinitialize.

For anything beyond keeping a single instance warm, Provisioned Concurrency is the correct solution.
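
If you do use a keep-warm ping, the handler should recognize it and return before doing real work. The sketch below assumes the ping comes from a classic EventBridge scheduled rule, whose events carry `source` set to `aws.events` and `detail-type` set to `Scheduled Event`; a custom ping payload would need its own check, and the response shapes here are illustrative.

```python
def is_warmup_ping(event) -> bool:
    """True for events produced by an EventBridge scheduled rule."""
    return (
        isinstance(event, dict)
        and event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def handler(event, context):
    if is_warmup_ping(event):
        # Skip business logic so the ping stays cheap and has no side effects.
        return {"warmed": True}
    return {"statusCode": 200, "body": "real work done"}
```

Returning early keeps the ping's billed duration to a few milliseconds and prevents it from writing to downstream systems.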

Function Per Latency Tier

If your application has mixed latency requirements, split into multiple functions:

Latency-sensitive path (user-facing API) → Java with SnapStart, or with Provisioned Concurrency where restore latency is still too high
Background processing (async jobs, reports) → Python or Node.js on-demand, no PC needed

This avoids paying PC costs for functions that don't need it.

Summary: Cold Start Reduction Decision Tree

Measure first — Use the CloudWatch Logs Insights queries above to establish your baseline p50/p99 initDuration and cold start rate.
Quick wins first — Move SDK clients outside the handler, reduce package size, switch to a faster runtime if possible. These cost nothing.
Java? → Enable SnapStart. It is free (no extra cost) and reduces cold starts by 60–80% for most Java functions.
Need sub-100ms cold starts? → Use Provisioned Concurrency on the latency-sensitive alias (in place of SnapStart, since the two cannot be enabled on the same version). Use scheduled scaling to control costs.
Async workload? → Relax. Use SQS, EventBridge, or Step Functions and stop worrying about cold starts entirely.
Heavy Python/JS deps? → Move shared libraries to Lambda layers. Use esbuild/webpack for Node.js.

Applied together, these techniques bring even the heaviest Java Spring Boot Lambda from a 3-second cold start to under 500 ms, well within the threshold most users tolerate. Combined with recent runtime improvements (Java 21 virtual threads, Python 3.12 interpreter speed), they make serverless Java a genuinely competitive choice for latency-sensitive APIs, provided you instrument and optimize systematically.