AWS Interview Questions 2026 (Top 60 Q&As)

1

What is the difference between EC2 pricing options: On-Demand, Reserved, Spot, and Savings Plans? (Easy)

On-Demand — pay by the second, no commitment. Most expensive. Use for: unpredictable workloads, short-term experiments, dev/test.
Reserved Instances — 1 or 3 year commitment. Up to 72% cheaper than On-Demand. Use for: steady-state production workloads. Standard RI (least flexible) vs Convertible RI (can change instance family).
Spot Instances — use spare AWS capacity at the current Spot price (bidding was retired in 2017; you can optionally set a maximum price). Up to 90% cheaper but can be interrupted with 2-minute notice. Use for: fault-tolerant batch jobs, ML training, CI/CD workers, stateless web servers.
Savings Plans — commit to a $ spend/hour for 1 or 3 years. More flexible than RIs — applies across instance families, regions (Compute Savings Plans). Recommended over RIs for most new workloads.

Cost optimisation strategy: use Savings Plans for baseline, Spot for burst/batch, On-Demand as safety net.

2

What is the difference between vertical scaling and horizontal scaling on EC2? (Easy)

Vertical scaling — resize the EC2 instance to a larger type (e.g. t3.medium → m5.xlarge). Requires a stop/start. Limited by the largest instance type. Single point of failure remains.
Horizontal scaling — add more instances. Requires the application to be stateless. Done automatically with Auto Scaling Groups (ASG). Paired with an Elastic Load Balancer for traffic distribution.

# Auto Scaling Group scales horizontally based on CPU:
aws autoscaling put-scaling-policy \
  --policy-name scale-out \
  --auto-scaling-group-name my-asg \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration \
    '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":70}'

3

What are EC2 placement groups and when do you use each type? (Medium)

Cluster placement group — packs instances close together inside a single AZ. Low network latency and high throughput (up to 10 Gbps per flow between instances). Use for: HPC, distributed computing, tightly coupled tasks that need high throughput. Risk: instances can share the same rack, so a single rack failure can take out all of them.
Spread placement group — each instance on distinct underlying hardware (different racks). Maximum isolation. Use for: small groups of critical instances that must survive a hardware failure. Limit: 7 instances per AZ per group.
Partition placement group — divides instances into partitions, each on separate racks. Unlike spread, many instances per partition. Use for: large distributed systems (Hadoop, Cassandra, Kafka) where you want rack-level failure isolation at scale.

4

What is the difference between EBS, EFS, and S3? When do you use each? (Easy)

EBS (Elastic Block Store) — block storage attached to an EC2 instance, normally one instance at a time (like a hard drive). Low latency, high IOPS. Use for: OS volumes, self-managed databases (Postgres/MySQL on EC2), applications needing block-level access.
EFS (Elastic File System) — managed NFS. Shared across multiple EC2 instances simultaneously. Scales automatically. More expensive than EBS per GB. Use for: shared content, CMS uploads, shared config, containers needing persistent shared storage.
S3 — object storage. Not a POSIX filesystem (Mountpoint for Amazon S3 or s3fs can mount a bucket for limited cases). Virtually unlimited, cheapest at scale. Use for: backups, static assets, ML datasets, logs, data lake, static website hosting.

Rule: app data → EBS. Shared files → EFS. Unstructured/bulk → S3.

5

What are S3 storage classes and how do you optimise costs with lifecycle policies? (Medium)

S3 Standard — frequently accessed, millisecond retrieval. Highest storage cost.
S3 Intelligent-Tiering — auto-moves objects between frequent/infrequent access tiers based on usage. No retrieval fee. Best for unpredictable access patterns.
S3 Standard-IA — infrequent access, cheaper storage, retrieval fee. Min 30-day storage charge.
S3 One Zone-IA — same as Standard-IA but single AZ (20% cheaper). Use for reproducible data only.
S3 Glacier Instant Retrieval — archive, millisecond retrieval. Up to 68% cheaper than Standard-IA for data accessed about once a quarter.
S3 Glacier Deep Archive — cheapest (about $0.00099/GB-month). Standard retrieval within 12 hours. Use for compliance archives.

# Lifecycle policy: transition objects automatically, then delete them after 7 years (2555 days)
{
  "Rules": [{
    "ID": "archive-then-expire",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Transitions": [
      {"Days": 30,  "StorageClass": "STANDARD_IA"},
      {"Days": 90,  "StorageClass": "GLACIER"},
      {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
    ],
    "Expiration": {"Days": 2555}
  }]
}
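
As a rough mental model of how the lifecycle policy above plays out over an object's lifetime, the sketch below maps an object's age in days to the storage class it would be expected to sit in. The function name `storage_class_at` and the in-memory `TRANSITIONS` list are hypothetical, and the upload class is assumed to be STANDARD; S3 applies lifecycle rules itself, so this only illustrates the thresholds.

```python
from typing import Optional

# Mirrors the transitions in the lifecycle policy above (days since creation).
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")]
EXPIRATION_DAYS = 2555  # roughly 7 years


def storage_class_at(age_days: int) -> Optional[str]:
    """Return the expected storage class at a given age, or None once expired."""
    if age_days >= EXPIRATION_DAYS:
        return None  # the Expiration action has deleted the object
    current = "STANDARD"  # assumed class at upload time
    for threshold, storage_class in TRANSITIONS:
        if age_days >= threshold:
            current = storage_class
    return current


for age in (10, 45, 200, 1000, 3000):
    print(age, storage_class_at(age))
```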

6

What is S3 versioning and how does it protect against accidental deletion? (Easy)

S3 versioning keeps multiple versions of an object. Every upload creates a new version; deletes create a delete marker instead of removing data.

# Enable versioning:
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

# List versions:
aws s3api list-object-versions --bucket my-bucket --prefix report.pdf

# Restore deleted object: delete the delete marker
aws s3api delete-object \
  --bucket my-bucket --key report.pdf \
  --version-id <delete-marker-version-id>

Combine with MFA Delete — requires MFA to delete object versions, protecting against both accidental and malicious deletion. Enable Object Lock (WORM) for compliance requirements where data must be unmodifiable for a defined retention period.
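
To make the delete-marker behaviour concrete, here is a minimal in-memory sketch of a versioned key. `VersionedBucket` and its methods are hypothetical stand-ins, not an AWS API; the point is only that a plain delete stacks a marker on top, while deleting the marker by version ID brings the previous version back.

```python
import itertools
from typing import Dict, List, Optional, Tuple

_ids = itertools.count(1)
DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class VersionedBucket:
    def __init__(self) -> None:
        # key -> list of (version_id, body or DELETE_MARKER), newest last
        self._versions: Dict[str, List[Tuple[str, object]]] = {}

    def put(self, key: str, body: bytes) -> str:
        version_id = f"v{next(_ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key: str, version_id: Optional[str] = None) -> Optional[str]:
        stack = self._versions.setdefault(key, [])
        if version_id is None:
            # Simple delete: nothing is removed, a delete marker goes on top.
            marker_id = f"v{next(_ids)}"
            stack.append((marker_id, DELETE_MARKER))
            return marker_id
        # Versioned delete: permanently removes exactly that version or marker.
        self._versions[key] = [v for v in stack if v[0] != version_id]
        return None

    def get(self, key: str) -> Optional[object]:
        stack = self._versions.get(key, [])
        if not stack or stack[-1][1] is DELETE_MARKER:
            return None  # S3 would answer with a 404 here
        return stack[-1][1]


bucket = VersionedBucket()
bucket.put("report.pdf", b"draft")
bucket.put("report.pdf", b"final")
marker = bucket.delete("report.pdf")
print(bucket.get("report.pdf"))      # the key looks deleted
bucket.delete("report.pdf", marker)  # remove the delete marker
print(bucket.get("report.pdf"))      # the latest real version is visible again
```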

7

What is a VPC and explain its key components. (Easy)

A VPC (Virtual Private Cloud) is a logically isolated network within AWS where you launch resources.

CIDR block — IP address range for the VPC (e.g. 10.0.0.0/16)
Subnets — subdivisions of the VPC CIDR, each in a single AZ. Public subnets (internet accessible) and private subnets (no direct internet).
Internet Gateway (IGW) — allows public subnets to communicate with the internet.
NAT Gateway — allows instances in private subnets to make outbound internet calls (software updates, API calls) without being directly reachable from the internet.
Route Tables — rules that determine where network traffic is directed.
Security Groups — stateful firewall rules at the instance/ENI level.
Network ACLs — stateless firewall rules at the subnet level.

8

What is the difference between Security Groups and Network ACLs? (Medium)

Feature	Security Group	Network ACL
Level	Instance / ENI	Subnet
Stateful	Yes — return traffic allowed automatically	No — must explicitly allow return traffic
Rules	Allow only	Allow and Deny
Evaluation	All rules evaluated	Rules evaluated in number order; first match wins
Default	Deny all inbound, allow all outbound	Default NACL allows all inbound and outbound; a custom NACL denies all until you add rules

Use Security Groups as your primary firewall (most practical). Use NACLs for subnet-level blocking — e.g. block a malicious IP range at the subnet level.
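
The evaluation difference in the table can be sketched in a few lines. This is a simplified model that ignores ports, protocols and statefulness; the rule tuples and function names are hypothetical. It shows why an explicit deny for a bad range only works in a NACL, and why its rule number matters.

```python
import ipaddress
from typing import List, Tuple

# (rule_number, cidr, action): a hypothetical, simplified NACL entry.
NaclRule = Tuple[int, str, str]


def nacl_decision(rules: List[NaclRule], source_ip: str) -> str:
    """Evaluate rules in ascending number order; the first match wins."""
    ip = ipaddress.ip_address(source_ip)
    for _, cidr, action in sorted(rules):
        if ip in ipaddress.ip_network(cidr):
            return action
    return "deny"  # the implicit final catch-all rule


def security_group_decision(allowed_cidrs: List[str], source_ip: str) -> str:
    """Security groups only hold allow rules; any match permits the traffic."""
    ip = ipaddress.ip_address(source_ip)
    if any(ip in ipaddress.ip_network(cidr) for cidr in allowed_cidrs):
        return "allow"
    return "deny"


nacl = [(100, "203.0.113.0/24", "deny"), (200, "0.0.0.0/0", "allow")]
print(nacl_decision(nacl, "203.0.113.7"))    # blocked by rule 100
print(nacl_decision(nacl, "198.51.100.9"))   # falls through to rule 200
print(security_group_decision(["0.0.0.0/0"], "203.0.113.7"))
```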

9

What is VPC Peering and what are its limitations? (Medium)

VPC Peering creates a direct network connection between two VPCs (same or different accounts/regions). Traffic stays on AWS backbone — not the public internet.

# Peer VPC-A with VPC-B, then add routes in each:
# VPC-A route table: 10.1.0.0/16 → pcx-xxxxx (peer connection)
# VPC-B route table: 10.0.0.0/16 → pcx-xxxxx (peer connection)

Limitations:

No transitive routing — if A peers B and B peers C, A cannot reach C through B. You need a direct peering between A and C, or use AWS Transit Gateway.
No overlapping CIDRs — both VPCs must have non-overlapping IP ranges.
Scale — peering is 1:1. With 10 VPCs you need 45 peering connections. Transit Gateway solves this (hub-and-spoke model).
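
Two of the limitations above are easy to check before you design the network. The helpers below are a small sketch with hypothetical names: one computes the n(n-1)/2 full-mesh count behind the "10 VPCs need 45 peerings" figure, the other uses the standard `ipaddress` module to test whether two CIDR blocks overlap.

```python
import ipaddress


def full_mesh_peerings(vpc_count: int) -> int:
    """Peering connections needed so every VPC can reach every other VPC."""
    return vpc_count * (vpc_count - 1) // 2


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """True if the two ranges share any address, which rules out peering."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


print(full_mesh_peerings(10))
print(cidrs_overlap("10.0.0.0/16", "10.1.0.0/16"))
print(cidrs_overlap("10.0.0.0/16", "10.0.128.0/17"))
```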

10

What is AWS Transit Gateway and when do you need it? (Medium)

Transit Gateway is a managed hub that connects VPCs and on-premises networks. Each network connects once to the TGW — it routes traffic between all connected networks.

Supports thousands of VPCs (vs 125 peering connections per VPC)
Cross-account and cross-region via Transit Gateway Peering
Centralized routing tables — control which networks can reach which
Integrates with Direct Connect and VPN for hybrid connectivity

Use TGW when: you have more than ~3 VPCs that need to communicate, you need shared services VPCs (centralised logging, security, DNS), or you need transitive routing.

11

What is the difference between an Application Load Balancer and a Network Load Balancer? (Medium)

Application Load Balancer (ALB) — Layer 7 (HTTP/HTTPS). Content-based routing: route by URL path (/api/* → API service, /app/* → web service), host header, HTTP method, query string. Supports WebSockets, HTTP/2, gRPC. Ideal for microservices and container-based apps.
Network Load Balancer (NLB) — Layer 4 (TCP/UDP/TLS). Ultra-high performance: millions of requests/second with microsecond latency. Preserves client IP. Static IP per AZ. Ideal for: game servers, financial trading, IoT, VoIP — any protocol that isn't HTTP.
Gateway Load Balancer (GWLB) — deploys inline network appliances (firewall, intrusion detection) transparently in the traffic path.

If you're running microservices and need path-based routing → ALB. If you need TCP or ultra-low latency → NLB.

12

What is Auto Scaling and what are the different scaling policies? (Medium)

Target Tracking — maintain a metric at a target value. E.g. "keep average CPU at 70%." ASG adds/removes instances to hit the target. Simplest to configure.
Step Scaling — scale in/out by different amounts based on alarm severity. E.g. CPU 70–80% → add 1, CPU 80–90% → add 2, CPU >90% → add 4.
Scheduled Scaling — pre-configure capacity for known traffic patterns. E.g. scale up at 8 AM weekdays, scale down at 8 PM.
Predictive Scaling — uses ML to forecast traffic and pre-scales ahead of time. Good for repeating daily/weekly patterns.

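Warm-up period: new instances take time to be ready. Configure Default Instance Warmup so that metrics from instances that are still initialising are not counted towards the group's aggregate metrics; otherwise the ASG can scale on misleading data and add or remove instances prematurely.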
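
The step-scaling thresholds from the example above, and the intuition behind target tracking, can be sketched as two small functions. Both names are hypothetical, and `target_tracking_estimate` is a proportional approximation for intuition only, not the algorithm the service actually runs.

```python
import math


def step_adjustment(cpu_percent: float) -> int:
    """Instances to add for the example thresholds in the Step Scaling bullet."""
    if cpu_percent > 90:
        return 4
    if cpu_percent > 80:
        return 2
    if cpu_percent >= 70:
        return 1
    return 0


def target_tracking_estimate(current_instances: int, cpu_percent: float,
                             target_percent: float = 70.0) -> int:
    """Proportional estimate of the capacity needed to bring CPU back to target."""
    return max(1, math.ceil(current_instances * cpu_percent / target_percent))


print(step_adjustment(85))
print(target_tracking_estimate(4, 90))
```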

13

What are IAM users, groups, roles, and policies? How do they relate? (Easy)

Users — represent a person or application with long-term credentials (access key + secret). Best practice: create users for humans and use roles for applications.
Groups — collection of users. Attach policies to groups, not individual users. E.g. "Developers" group has read-only S3 and EC2 policies.
Roles — temporary credentials assumed by AWS services, cross-account access, or federated users. EC2 instance role lets the app call AWS APIs without hardcoding keys. Lambda executes with a role.
Policies — JSON documents defining permissions (Allow/Deny, Resources, Actions, Conditions). Attached to users, groups, or roles.

{
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:PutObject"],
  "Resource": "arn:aws:s3:::my-app-bucket/uploads/*",
  "Condition": {"Bool": {"aws:SecureTransport": "true"}}
}

14

What is the principle of least privilege in AWS IAM? (Easy)

Grant only the minimum permissions needed to perform a task. Never use * on Actions or Resources in production policies.

// Bad practice:
{"Effect": "Allow", "Action": "*", "Resource": "*"}

// Good practice (specific service, specific resource):
{
  "Effect": "Allow",
  "Action": ["s3:GetObject"],
  "Resource": "arn:aws:s3:::my-bucket/reports/*"
}

Implementation steps:

Start with AWS managed policies for initial development
Use IAM Access Analyzer to identify unused permissions
Use aws iam generate-service-last-accessed-details to see what permissions are actually used
Tighten over time — remove permissions that have not been used in the last 90 days
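
A lightweight way to enforce the "no bare `*`" rule in code review is a small lint over the policy document. The sketch below is an illustrative helper with hypothetical names, not a replacement for IAM Access Analyzer; it only flags Allow statements whose Action or Resource is exactly `*`.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[str]:
    return [value] if isinstance(value, str) else list(value)


def find_wildcards(policy: Dict[str, Any]) -> List[str]:
    """Return findings for Allow statements that use a bare '*'."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a plain object
        statements = [statements]
    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        if "*" in _as_list(stmt.get("Action", [])):
            findings.append(f"statement {index}: Action is '*'")
        if "*" in _as_list(stmt.get("Resource", [])):
            findings.append(f"statement {index}: Resource is '*'")
    return findings


bad = {"Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}
good = {"Statement": [{"Effect": "Allow", "Action": ["s3:GetObject"],
                       "Resource": "arn:aws:s3:::my-bucket/reports/*"}]}
print(find_wildcards(bad))
print(find_wildcards(good))
```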

15

What is IAM Role assumption and how does cross-account access work? (Hard)

# Account A (trusted account) assumes a role in Account B (trusting account):

# 1. In Account B: create a role with trust policy allowing Account A:
{
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::ACCOUNT_A_ID:root"},
    "Action": "sts:AssumeRole"
  }]
}

# 2. In Account A: allow users/services to assume the role:
{"Effect": "Allow", "Action": "sts:AssumeRole",
 "Resource": "arn:aws:iam::ACCOUNT_B_ID:role/ReadOnlyRole"}

# 3. Application in Account A assumes the role:
aws sts assume-role \
  --role-arn arn:aws:iam::ACCOUNT_B_ID:role/ReadOnlyRole \
  --role-session-name deploy-session

Returns temporary credentials (AccessKeyId, SecretAccessKey, SessionToken) valid for 1 hour by default and up to 12 hours if the role's maximum session duration allows it. Used for: multi-account deployments, CI/CD pipelines, least-privilege service access.

16

What is AWS KMS and how does envelope encryption work? (Medium)

KMS (Key Management Service) manages cryptographic keys. You never see the plaintext of a KMS key (the CMK, originally short for customer master key) — all encryption/decryption with it happens inside KMS hardware.

Envelope encryption (used by S3, RDS, EBS under the hood):

KMS generates a Data Encryption Key (DEK)
The DEK encrypts your data locally (fast, no size limit)
KMS encrypts the DEK with your CMK (stores the encrypted DEK alongside the data)
To decrypt: send the encrypted DEK to KMS → KMS returns plaintext DEK → use it to decrypt data locally

CMKs never leave KMS hardware. You control key policies and rotation (automatic rotation is yearly by default), and all key usage is audited in CloudTrail.

17

What is AWS Secrets Manager vs SSM Parameter Store? (Medium)

Feature	Secrets Manager	SSM Parameter Store
Cost	~$0.40/secret/month	Free (Standard), $0.05/adv param
Auto-rotation	Built-in (RDS, Redshift, DocumentDB)	Manual (via Lambda)
Versioning	Yes, with labels	Yes (SecureString)
Cross-region	Multi-region replication	Per-region
Best for	DB passwords, API keys with rotation	Config values, non-sensitive params

Use Secrets Manager for anything that needs automatic rotation. Use Parameter Store for app config and non-rotating secrets (cheaper).

18

What is AWS WAF and how does it protect web applications? (Medium)

WAF (Web Application Firewall) filters HTTP/HTTPS traffic at Layer 7 based on rules. Deployed in front of CloudFront, ALB, API Gateway, or AppSync.

What WAF blocks:

SQL injection and XSS — AWS Managed Rules Core Rule Set
Bad bots — Bot Control managed rule group
IP reputation — AWS IP Reputation managed rules
Geo-blocking — allow/deny by country
Rate limiting — block IPs that exceed N requests per 5 minutes (basic DDoS protection)
Custom rules — block specific User-Agent patterns, query string values

Logs go to CloudWatch, S3, or Kinesis Data Firehose. For advanced DDoS protection, add AWS Shield Advanced (covers volumetric L3/L4 attacks, SLAs, DDoS response team access).

19

What is AWS CloudTrail and how does it differ from CloudWatch? (Medium)

CloudTrail — records AWS API activity (who, what, when, from where). Management events are logged by default; data events such as S3 object-level calls must be enabled. Used for: audit trails, compliance, security investigation ("who deleted that S3 bucket?"). Stored in S3. Searchable in CloudTrail Lake or Athena.
CloudWatch — metrics, logs, and alarms for your applications and AWS resources. Used for: monitoring CPU usage, application logs, alerting on error rates, dashboards.

Think: CloudTrail = "who did what in the AWS console/API." CloudWatch = "how is my app/infrastructure performing."

Enable CloudTrail in ALL regions (not just your primary region). Enable log file validation and S3 bucket MFA Delete to prevent evidence tampering. Enable CloudTrail Insights to automatically detect unusual API activity.

20

What is Amazon GuardDuty and what does it detect? (Easy)

GuardDuty is a managed threat detection service that analyses CloudTrail, VPC Flow Logs, DNS logs, and S3 data events using ML to detect malicious behaviour. No agents required.

What it detects:

Compromised EC2 instances communicating with known malware C2 servers
Unusual API calls from unusual locations or at unusual times
Brute force SSH/RDP attacks on EC2
Cryptocurrency mining (high CPU, calls to mining pools)
Privilege escalation (IAM role enumeration, unexpected AssumeRole calls)
S3 bucket exfiltration (large downloads, public access enabled)

Findings are sent to EventBridge — automate response (isolate instance, revoke credentials) via Lambda.

21

How does AWS Lambda work and what are its limitations? (Easy)

Lambda runs code in response to events without managing servers. AWS provisions containers on demand, runs your function, and scales to thousands of concurrent executions automatically.

Execution model:

Cold start: container initialised, code loaded, handler runs (roughly 100 ms to more than 1 s, usually longest for JVM runtimes)
Warm execution: container reused (milliseconds)
SnapStart (Java): resumes from a snapshot of the already-initialised execution environment, removing most of the Java cold-start time

Limits (as of 2026):

Max execution time: 15 minutes
Memory: 128 MB to 10 GB
Ephemeral storage: 512 MB to 10 GB (/tmp)
Deployment package: 250 MB unzipped (50 MB zipped)
Concurrent executions: 1,000 per region (soft limit, increasable)

Lambda is billed per invocation + duration (GB-seconds). Idle costs nothing. Ideal for: event processing, webhooks, ETL, APIs with sporadic traffic.

22

What is the difference between Lambda provisioned concurrency and reserved concurrency? (Medium)

Reserved Concurrency — sets a maximum concurrent invocations for a function. Prevents one function from consuming all regional concurrency. If limit is hit, excess requests are throttled (429). Free to configure.
Provisioned Concurrency — pre-warms a set number of execution environments. Eliminates cold starts — all invocations within the provisioned count respond with no initialisation delay. You pay for provisioned environments even when idle.

Use provisioned concurrency for latency-sensitive APIs (payments, authentication) where cold starts are unacceptable. Use reserved concurrency to cap a non-critical function and protect other functions from being starved.

23

What triggers can invoke a Lambda function? (Easy)

HTTP — API Gateway, Lambda Function URL (direct HTTPS)
Storage — S3 event notifications (on upload, delete)
Database / streams — DynamoDB Streams, Kinesis Data Streams
Messaging — SQS (polls the queue), SNS (push notification), EventBridge events, Kafka (MSK)
Schedule — EventBridge Scheduler (cron/rate expressions)
AWS services — CloudWatch Logs, CodeCommit, CodePipeline, CloudFormation (custom resources), Cognito

Lambda also supports synchronous invocation (waits for response) and asynchronous invocation (returns immediately, processes in background with configurable retry and DLQ).

24

What is ECS vs EKS? When do you choose each? (Medium)

ECS (Elastic Container Service) — AWS-native container orchestration. Simpler to operate, tighter AWS integration. EC2 launch type (manage instances yourself) or Fargate (serverless — AWS manages the nodes).
EKS (Elastic Kubernetes Service) — managed Kubernetes. More powerful and flexible but more complex. Industry-standard — teams with existing Kubernetes expertise prefer it. Easier to run open-source tools (Helm, Argo, Kustomize).

Choose ECS: small-medium teams, AWS-first stack, want simplicity, need Fargate (truly serverless containers), don't have Kubernetes expertise.

Choose EKS: existing Kubernetes workloads, need to run service meshes (Istio), multi-cloud portability, complex scheduling needs, large platform engineering team.

For either: use Fargate to avoid managing EC2 node groups unless you have GPU or specific hardware requirements.

25

What is AWS Fargate and how does it differ from ECS on EC2? (Easy)

ECS on EC2 — you provision and manage EC2 instances as the cluster. You control instance types, OS patching, capacity planning. More control, often cheaper at scale.
Fargate — serverless containers. AWS manages the underlying compute. You specify only CPU and memory per task. No nodes to manage, patch, or scale. Pay per task-second.

Fargate benefits: no over-provisioning, tasks start independently (no node capacity to wait for), simplified security (no SSH into nodes, smaller attack surface), works with both ECS and EKS.

Fargate trade-offs: ~20-30% more expensive per vCPU/memory than equivalent EC2 at steady state. Slower cold start than running tasks on warm nodes. No GPU support.

26

What is Amazon API Gateway and what are its integration types? (Medium)

API Gateway is a fully managed service to create, publish, and secure REST, HTTP, and WebSocket APIs at scale. It handles auth, throttling, SSL, versioning, and caching.

Integration types:

Lambda proxy — passes the full request to Lambda, returns Lambda's response directly. Simplest for Lambda backends.
Lambda non-proxy — you define request/response mapping templates (VTL). More control but more complex.
HTTP — proxy to any HTTP endpoint (your on-prem server, other services).
AWS service — call AWS services directly without Lambda (e.g. PutItem in DynamoDB, SendMessage to SQS). Lower latency, no Lambda cold start.

REST API vs HTTP API: HTTP API is 70% cheaper and lower latency but fewer features (no usage plans, no AWS WAF native integration, no response caching). Use HTTP API for simple Lambda proxies.

27

What is Amazon SQS and how does it differ from SNS? (Medium)

SQS (Simple Queue Service) — pull-based message queue. One consumer receives each message (or one consumer group with Lambda). Messages stay in queue until consumed and deleted. Max 14-day retention. Use for: work queues, decoupling producers from consumers, rate-limiting downstream processing.
SNS (Simple Notification Service) — push-based pub/sub. One publisher, many subscribers (SQS queues, Lambda, HTTP endpoints, email). Each subscriber gets every message. Use for: fan-out, notifications, triggering multiple consumers from one event.

SQS + SNS fan-out pattern (most common):

Publisher → SNS Topic → SQS Queue A → Lambda/Worker A
                     → SQS Queue B → Lambda/Worker B
                     → Email subscriber

SQS Standard: at-least-once, unordered. SQS FIFO: exactly-once, ordered, 3,000 msg/sec with batching.
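
The fan-out pattern above can be modelled in memory to show the key property: every subscribed queue receives its own copy, and each queue is drained independently. The `Topic` class and the deques are hypothetical stand-ins for an SNS topic and SQS queues; no AWS calls are made.

```python
from collections import deque
from typing import Deque, List


class Topic:
    """Stand-in for an SNS topic: every subscribed queue gets its own copy."""

    def __init__(self) -> None:
        self.subscribers: List[Deque[str]] = []

    def subscribe(self, queue: Deque[str]) -> None:
        self.subscribers.append(queue)

    def publish(self, message: str) -> None:
        for queue in self.subscribers:
            queue.append(message)


queue_a: Deque[str] = deque()  # stand-in for SQS Queue A
queue_b: Deque[str] = deque()  # stand-in for SQS Queue B
topic = Topic()
topic.subscribe(queue_a)
topic.subscribe(queue_b)
topic.publish("order-created:1001")

# Each queue is consumed on its own, so a slow Worker B never blocks Worker A.
print(queue_a.popleft(), queue_b.popleft())
```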

28

What is a Dead Letter Queue (DLQ) and why is it essential? (Medium)

A DLQ is a secondary queue where messages are sent after failing all processing retries. Without a DLQ, poison pill messages (malformed data, logic errors) loop forever or are silently dropped.

# SQS DLQ: after 3 failed processing attempts, move to DLQ
aws sqs set-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/xxx/main-queue \
  --attributes '{
    "RedrivePolicy": "{\"deadLetterTargetArn\":\"arn:aws:sqs:...:dlq\",\"maxReceiveCount\":\"3\"}"
  }'

After fixing the bug, use SQS DLQ Redrive (console or API) to move messages from DLQ back to the source queue for reprocessing. Set a CloudWatch alarm on DLQ message count — any message there is a bug that needs investigation.
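
The retry-then-park behaviour can be simulated without AWS. In this sketch, `drain` is a hypothetical helper that retries each message until it succeeds or has been received `MAX_RECEIVE_COUNT` times, then sets it aside the way a redrive policy would. Messages are keyed by their body for simplicity, so it assumes bodies are unique.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

MAX_RECEIVE_COUNT = 3  # same value as maxReceiveCount in the redrive policy above


def drain(messages: List[str], handler: Callable[[str], None]) -> List[str]:
    """Process messages with retries; return the ones that ended up in the DLQ."""
    queue: Deque[str] = deque(messages)
    receive_count: Dict[str, int] = {}
    dead_letters: List[str] = []
    while queue:
        message = queue.popleft()
        receive_count[message] = receive_count.get(message, 0) + 1
        try:
            handler(message)  # success: the consumer deletes the message
        except Exception:
            if receive_count[message] >= MAX_RECEIVE_COUNT:
                dead_letters.append(message)  # retries exhausted
            else:
                queue.append(message)  # visible again for another attempt
    return dead_letters


def handler(message: str) -> None:
    if message == "malformed":
        raise ValueError("cannot parse message")


print(drain(["ok-1", "malformed", "ok-2"], handler))
```

Without the `MAX_RECEIVE_COUNT` check, the malformed message would be re-queued on every failure and the loop would never finish, which is the poison-pill problem described above.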

29

What is RDS Multi-AZ vs Read Replicas? (Medium)

Multi-AZ — synchronous replication to a standby in another AZ. Automatic failover (1–2 min) on primary failure. In a Multi-AZ DB instance deployment the standby cannot be used for reads (Multi-AZ DB cluster deployments add readable standbys). Purpose: high availability / disaster recovery.
Read Replicas — asynchronous replication to one or more replicas. Replicas accept read queries. Can be in same AZ, same region, or cross-region. Purpose: read scaling and reporting. NOT a failover mechanism (asynchronous = possible data lag; manual promotion required).

Combine both: Multi-AZ for HA, Read Replicas for read scaling. RDS Proxy can be added in front to pool connections and reduce DB load.

30

What is Amazon Aurora and how does it differ from standard RDS? (Medium)

Aurora is AWS's cloud-native relational database engine (MySQL/PostgreSQL compatible). It separates storage from compute:

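Storage — 6 copies of data across 3 AZs automatically. Tolerates the loss of 2 copies (for example a whole AZ) without affecting writes, and 3 copies without affecting reads. Auto-grows to 128 TB.
Performance — up to 5x MySQL throughput and up to 3x PostgreSQL throughput vs standard RDS, according to AWS.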
Failover — sub-30-second failover (vs 1–2 minutes for RDS Multi-AZ).
Read Replicas — up to 15 read replicas sharing the same storage (no replication lag for storage).
Aurora Serverless v2 — auto-scales compute in fine-grained increments (0.5 ACU). Great for variable workloads.

Use Aurora for: production workloads that need high availability without the team having to manage the HA complexity. Standard RDS for: dev/test, specific engine versions, cost sensitivity.

31

What is DynamoDB and what are its data model concepts? (Medium)

DynamoDB is a fully managed serverless key-value/document NoSQL database with single-digit millisecond latency at any scale.

Data model:

Table — collection of items (like a table but schemaless except for primary key)
Partition Key (required) — determines which physical partition stores the item. Choose a high-cardinality key for even distribution.
Sort Key (optional) — creates a composite primary key. Items with the same partition key are sorted by sort key. Enables range queries within a partition.
GSI (Global Secondary Index) — alternative primary key for query flexibility. Different partition + sort key. Eventually consistent reads from the index.
LSI (Local Secondary Index) — same partition key, different sort key. Must be defined at table creation. Strongly consistent reads.

Key DynamoDB design principle: single table design — model all access patterns in one table using compound sort keys and GSIs, avoiding the JOIN problem.
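
To illustrate the single-table idea, the sketch below keeps customers and their orders in one keyed collection and answers two access patterns with a partition key plus a sort-key prefix, similar in spirit to a `begins_with` key condition. The dictionary, the `PK`/`SK` naming and the `CUSTOMER#`/`ORDER#` prefixes are illustrative conventions, not DynamoDB API calls.

```python
from typing import Dict, List, Tuple

Item = Dict[str, str]
# (partition key, sort key) -> item; a stand-in for one DynamoDB table.
table: Dict[Tuple[str, str], Item] = {}


def put_item(pk: str, sk: str, **attributes: str) -> None:
    table[(pk, sk)] = {"PK": pk, "SK": sk, **attributes}


def query(pk: str, sk_prefix: str = "") -> List[Item]:
    """Items in one partition whose sort key begins with sk_prefix, sorted by SK."""
    matches = [item for (p, s), item in table.items()
               if p == pk and s.startswith(sk_prefix)]
    return sorted(matches, key=lambda item: item["SK"])


put_item("CUSTOMER#42", "PROFILE", name="Asha")
put_item("CUSTOMER#42", "ORDER#2026-01-15", total="120")
put_item("CUSTOMER#42", "ORDER#2026-03-02", total="75")
put_item("CUSTOMER#7", "PROFILE", name="Ben")

print(query("CUSTOMER#42", "ORDER#"))  # that customer's orders, oldest first
print(query("CUSTOMER#42"))            # profile and orders from one partition
```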

32

What is DynamoDB Streams and how does it enable event-driven patterns? (Medium)

DynamoDB Streams captures every write (INSERT, MODIFY, REMOVE) to a table as an ordered, time-limited (24-hour) stream of change records. Lambda can be triggered on each change.

# Common patterns enabled by DynamoDB Streams:

1. Event notification:
   Order table write → Stream → Lambda → Send confirmation email

2. Cross-region replication:
   Primary table → Stream → Lambda → Write to replica table in another region

3. Audit trail:
   Any table change → Stream → Lambda → Append to S3 audit log

4. Search sync:
   Product table update → Stream → Lambda → Update Elasticsearch index

5. Aggregation:
   Order line inserts → Stream → Lambda → Update order total in Orders table
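
The aggregation pattern (5) might look roughly like the handler below. It assumes the stream is configured to include new images, and the attribute names `order_id` and `amount` are hypothetical. The in-memory `order_totals` dictionary stands in for the Orders table; a real function would write the total back to DynamoDB instead.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Any, Dict

# Stand-in for the Orders table.
order_totals: Dict[str, Decimal] = defaultdict(Decimal)


def handler(event: Dict[str, Any], context: Any = None) -> None:
    """Add the amount of every newly inserted order line to its order's total."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue  # ignore MODIFY and REMOVE in this sketch
        image = record["dynamodb"]["NewImage"]  # attribute values in DynamoDB JSON
        order_id = image["order_id"]["S"]       # hypothetical attribute names
        amount = Decimal(image["amount"]["N"])  # numbers arrive as strings
        order_totals[order_id] += amount


sample_event = {"Records": [
    {"eventName": "INSERT",
     "dynamodb": {"NewImage": {"order_id": {"S": "o-1"}, "amount": {"N": "19.99"}}}},
    {"eventName": "INSERT",
     "dynamodb": {"NewImage": {"order_id": {"S": "o-1"}, "amount": {"N": "5.01"}}}},
    {"eventName": "MODIFY",
     "dynamodb": {"NewImage": {"order_id": {"S": "o-1"}, "amount": {"N": "100"}}}},
]}
handler(sample_event)
print(dict(order_totals))
```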

33

When would you choose ElastiCache (Redis) vs DynamoDB DAX? (Medium)

ElastiCache for Redis — general-purpose in-memory cache. Works with any backend (RDS, DynamoDB, your own service). Supports complex data structures (sorted sets for leaderboards, pub/sub, TTL-based sessions), atomic operations, Lua scripts. Must manage cache invalidation yourself.
DynamoDB DAX (DynamoDB Accelerator) — in-memory cache specifically for DynamoDB. Transparent — DynamoDB-compatible API through the DAX client, pointed at the DAX endpoint. Reduces read latency from milliseconds to microseconds. Keeps the cache in sync automatically for writes that go through DAX (write-through). Only works with DynamoDB.

Use DAX when: your app uses DynamoDB and you want drop-in read acceleration with minimal code changes (swap in the DAX client). Use ElastiCache when: you cache across multiple data sources, need Redis-specific features, or need more control over caching logic.

34

What is Amazon Kinesis and what are its components? (Medium)

Kinesis Data Streams — real-time streaming data ingestion. Records retained for 24 hours by default, extendable up to 365 days. Multiple consumers can read the same data. Use for: click stream, IoT telemetry, real-time analytics, log aggregation.
Kinesis Data Firehose — fully managed delivery pipeline to S3, Redshift, Elasticsearch, Splunk. No consumers to manage. Near-real-time (60s buffering minimum). Use for: log shipping, batch-loading data into analytics stores.
Kinesis Data Analytics — run SQL or Apache Flink on streaming data in real time. Use for: real-time dashboards, anomaly detection, sessionization.

Kinesis vs SQS: Kinesis retains messages for replay; SQS deletes on consume. Kinesis supports multiple concurrent consumers of the same stream; SQS is a work queue (one consumer per message). Kinesis has ordering per shard; SQS Standard is unordered.

35

What is Amazon S3 Select and Athena? How do they enable serverless data analytics? (Medium)

S3 Select — run SQL queries directly on S3 objects (CSV, JSON, Parquet) without downloading the full file. Returns only the queried subset. Good for: filtering large log files without loading everything.
Athena — serverless SQL query engine over S3. Uses Presto under the hood. Query Parquet/ORC/JSON/CSV data in S3 directly. Pay per TB scanned. Zero infrastructure. Integrate with Glue Data Catalog for schema-on-read.

-- Query last 24h errors from logs in S3 (Athena):
SELECT request_time, status_code, error_message
FROM cloudfront_logs
WHERE date = '2026-06-23'
  AND status_code >= 500
LIMIT 100;

Cost tip: use Parquet + Snappy compression — columnar format means Athena only reads the columns you query, which can reduce cost by up to around 90% vs raw JSON.
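
Because Athena bills by data scanned, the saving is simple arithmetic. The sketch below assumes a list price of $5 per TB scanned (check the current price for your region) and a hypothetical dataset where the Parquet scan is ten times smaller than the raw JSON scan; both figures are illustrative, not measurements.

```python
PRICE_PER_TB_USD = 5.00  # assumed list price per TB scanned


def query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimated cost of one query from the bytes it scanned."""
    return bytes_scanned / 1024**4 * price_per_tb


raw_json = 2 * 1024**4    # hypothetical: the query scans 2 TB of raw JSON
parquet = raw_json // 10  # hypothetical: the same query scans 10x less Parquet
print(f"JSON:    ${query_cost(raw_json):.2f}")
print(f"Parquet: ${query_cost(parquet):.2f}")
```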

36

What is RDS Proxy and why do you need it? (Medium)

RDS Proxy sits between your application and RDS/Aurora and pools and manages connections to the database. Solves critical problems at scale:

Connection pooling — Lambda opens a new DB connection on every cold start. 1,000 concurrent Lambdas = 1,000 DB connections → DB is overwhelmed. RDS Proxy pools connections: 1,000 Lambda → Proxy (50 connections) → DB.
Failover speed — RDS Proxy keeps application connections open during an RDS failover and routes them to the new primary, so apps see a shorter interruption (AWS cites failover times reduced by up to 66%).
IAM authentication — apps authenticate to proxy with IAM, proxy authenticates to DB with username/password (secrets in Secrets Manager with auto-rotation).

RDS Proxy is almost mandatory when using Lambda with RDS. The connection management problem is the #1 reason Lambda+RDS architectures fail at scale.

37

What is Amazon CloudFront and how does it work? (Easy)

CloudFront is AWS's CDN (Content Delivery Network) with 550+ edge locations worldwide. It caches content at edge locations close to users — reducing latency and origin load.

User in Mumbai → nearest CloudFront edge
  Cache HIT?  → return cached content (sub-10ms)
  Cache MISS? → fetch from origin (S3, ALB, API Gateway, EC2)
               → cache for TTL duration
               → return to user
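
The hit/miss flow above can be modelled as a small TTL cache in front of an origin function. `EdgeCache` is a hypothetical toy with one TTL for every object and a caller-supplied clock; real CloudFront behaviour also depends on cache policies and origin headers.

```python
from typing import Callable, Dict, List, Tuple


class EdgeCache:
    """Toy model of one edge location with a single TTL for every object."""

    def __init__(self, fetch_from_origin: Callable[[str], str], ttl_seconds: float) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[str, float]] = {}  # path -> (body, expiry time)

    def get(self, path: str, now: float) -> Tuple[str, str]:
        cached = self._store.get(path)
        if cached and cached[1] > now:
            return cached[0], "HIT"
        body = self._fetch(path)  # cache miss or expired entry: go to the origin
        self._store[path] = (body, now + self._ttl)
        return body, "MISS"


origin_calls: List[str] = []


def origin(path: str) -> str:
    origin_calls.append(path)
    return f"content of {path}"


edge = EdgeCache(origin, ttl_seconds=60)
print(edge.get("/index.html", now=0))   # first request goes to the origin
print(edge.get("/index.html", now=30))  # served from the edge
print(edge.get("/index.html", now=61))  # TTL elapsed, fetched again
```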

Key features:

SSL/TLS termination at edge — free certificates via ACM
Lambda@Edge / CloudFront Functions — run code at edge (A/B testing, auth, URL rewrites)
Origin failover — primary + failover origin group
Field-level encryption — encrypt sensitive fields at edge, decrypt only in specific services
Signed URLs / Signed Cookies — time-limited access to private content

38

What is Route 53 and what routing policies does it support? (Medium)

Route 53 is AWS's managed DNS service (backed by a 100% availability SLA). Key routing policies:

Simple — one record, one resource. No health checks.
Weighted — split traffic by percentage. Use for: canary deployments, A/B testing (10% to v2, 90% to v1).
Latency-based — route to the region with lowest latency for the user. Multi-region active-active.
Geolocation — route based on user's geographic location. Use for: compliance (EU users → EU servers), localisation.
Geoproximity — route based on geographic proximity with configurable bias. Requires Traffic Flow.
Failover — active/passive. Route to secondary when primary health check fails.
Multivalue Answer — return multiple healthy IPs, client-side load balancing.
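
Weighted routing is the easiest of these policies to reason about numerically: each record is returned with probability weight / sum of weights. The sketch below simulates a 90/10 canary split with the standard `random` module; the record names and the `pick_weighted` helper are hypothetical, and the real selection happens inside Route 53's resolvers.

```python
import random
from collections import Counter
from typing import Dict, Optional


def pick_weighted(records: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Pick one record with probability weight / sum(weights)."""
    chooser = rng or random
    names = list(records)
    return chooser.choices(names, weights=[records[n] for n in names], k=1)[0]


records = {"v1.example.com": 90, "v2.example.com": 10}  # hypothetical record names
rng = random.Random(0)  # fixed seed so the run is repeatable
counts = Counter(pick_weighted(records, rng) for _ in range(10_000))
print(counts)
```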

39

What is AWS Direct Connect and when do you need it over VPN? (Medium)

Site-to-Site VPN — encrypted tunnel over the public internet. Quick to set up (~hours), ~1.25 Gbps max. Latency varies with internet quality. Use for: dev/test hybrid, backup connectivity, smaller data transfers.
Direct Connect — dedicated physical fibre from your data centre to an AWS Direct Connect location. 1–100 Gbps, consistent low latency, not over the public internet. Takes weeks to provision. Use for: large data migrations, compliance requirements (no public internet), consistent high-throughput workloads (large databases, financial systems).

Best practice: Direct Connect as primary, Site-to-Site VPN as failover. Create a DX Gateway to connect one Direct Connect to multiple VPCs and regions.

40

What are VPC Endpoints and why are they important for security? (Medium)

VPC Endpoints allow private communication between your VPC and AWS services without traffic leaving the AWS network or going through the public internet.

Gateway Endpoint — for S3 and DynamoDB only. Free. Added to route tables as a target.
Interface Endpoint (PrivateLink) — creates an ENI in your VPC for the service. Charges per hour + data processed. Supports most AWS services (SQS, SNS, KMS, Secrets Manager, CloudWatch, etc.).

# Without endpoint: EC2 in private subnet → NAT Gateway → internet → S3
# With S3 Gateway Endpoint: EC2 in private subnet → VPC → S3 (private)
# Result: no NAT charges, no public internet exposure of S3 traffic

Security benefit: write S3 bucket policies that deny any request not arriving through your endpoint (condition key aws:SourceVpce). Access from outside the VPC is then blocked even if an IAM policy is misconfigured.
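
An endpoint-only bucket policy could look roughly like the one built below. This is a sketch under assumptions: the bucket name, the endpoint ID, and the helper `endpoint_only_policy` are placeholders. The `aws:SourceVpce` condition key is the documented way to match the VPC endpoint a request came through.

```python
import json

def endpoint_only_policy(bucket, vpce_id):
    """Build a bucket policy that denies S3 access unless it arrives via vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyAccessOutsideVpce",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",      # the bucket itself
                f"arn:aws:s3:::{bucket}/*",    # every object in it
            ],
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
        }],
    }

# Placeholder names, not real resources.
policy = endpoint_only_policy("example-bucket", "vpce-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```

A blanket deny like this also blocks console and CLI access from outside the VPC, so test it on a non-critical bucket first.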

41

What is the difference between Multi-AZ and Multi-Region high availability in AWS? (Hard)

Multi-AZ (Availability Zone) — replicate within a region across physically separate data centres (kilometres apart, all within roughly 100 km of each other). Protects against a single data centre failure. Services: RDS Multi-AZ, ALB (spans AZs), ASG cross-AZ, ElastiCache Multi-AZ.
Multi-Region — replicate across AWS regions (thousands of km apart). Protects against a full region outage (extremely rare) or provides latency benefits for global users. Services: Aurora Global Database, DynamoDB Global Tables, S3 Cross-Region Replication, CloudFront.

Multi-AZ is the baseline for any production workload. Multi-region adds significant complexity (data replication lag, consistency challenges) and is only justified for global applications or the most strict RTO/RPO requirements.

42

What is AWS Global Accelerator and how does it differ from CloudFront? (Medium)

CloudFront — caches content at edge locations. Best for static assets, cacheable API responses. HTTP/HTTPS only. Layer 7.
Global Accelerator — routes TCP/UDP traffic from the nearest edge to your AWS endpoint via the AWS private backbone (not the public internet). No caching. Provides two static Anycast IPs that don't change (good for whitelisting). Best for: non-cacheable APIs, gaming (UDP), VoIP, apps needing consistent global latency.

Key difference: CloudFront caches at edge and may serve from cache. Global Accelerator always routes to your backend, just faster (via AWS backbone instead of public internet).

43

What is AWS PrivateLink? (Medium)

PrivateLink lets you expose your service privately to other VPCs and AWS accounts without VPC peering or making it public. Traffic never leaves the AWS network.

Your VPC (service provider)
  ↑ Network Load Balancer
  ↑ VPC Endpoint Service (PrivateLink)
  ↑ Interface Endpoint (ENI in consumer VPC)
Consumer VPC (your customer / another team's account)

Use cases:

SaaS providers exposing private APIs to enterprise customers without VPC peering (no IP overlap issues)
Internal microservices exposing APIs across accounts in an AWS Organization
AWS services themselves use PrivateLink under the hood for Interface Endpoints

44

What is AWS CloudFormation and how does it differ from CDK and Terraform? (Medium)

CloudFormation — AWS-native IaC. JSON/YAML templates. Deep AWS integration (StackSets, drift detection, nested stacks). AWS manages the state. AWS-only.
AWS CDK (Cloud Development Kit) — write infrastructure as TypeScript/Python/Java/Go code that synthesizes to CloudFormation. Better developer experience — loops, conditionals, reusable constructs (higher-level abstractions). Still uses CloudFormation under the hood.
Terraform — cloud-agnostic HCL. Works with AWS, GCP, Azure, and 1,000+ providers. Manages its own state (S3 + DynamoDB lock for teams). Stronger community ecosystem. Preferred for multi-cloud or organisations standardising across clouds.

Recommendation: CDK for AWS-only teams that want developer-friendly IaC. Terraform for multi-cloud or existing Terraform expertise.

45

What are the six pillars of the AWS Well-Architected Framework? (Easy)

Operational Excellence — run and monitor systems to deliver business value. Key practices: IaC, small reversible changes, anticipate failure, refine procedures.
Security — protect data and systems. Key practices: identity foundation (IAM), least privilege, encryption in transit and at rest, detective controls, incident response automation.
Reliability — recover from failures, scale to meet demand. Key practices: automatic recovery, test recovery, horizontal scaling, stop guessing capacity.
Performance Efficiency — use resources efficiently. Key practices: use managed services, go global in minutes, experiment with new technologies, serverless first.
Cost Optimisation — avoid unnecessary costs. Key practices: right-size, use Savings Plans/Spot, match supply to demand, measure cost of each service.

The sixth pillar, Sustainability, was added in 2021: minimise environmental impact through efficient resource use.

46

What is RTO and RPO? How do AWS services help achieve different targets? (Hard)

RTO (Recovery Time Objective) — maximum acceptable downtime. How fast must the system recover after a failure?
RPO (Recovery Point Objective) — maximum acceptable data loss. How much data can you afford to lose?

Strategy	RTO	RPO	Cost
Backup & Restore	Hours	Hours	Lowest
Pilot Light (minimal services running)	~10 min	Minutes	Low
Warm Standby (scaled-down replica)	Minutes	Seconds	Medium
Multi-Region Active/Active	Near-zero	Near-zero	Highest
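
One way to read the table is "pick the cheapest strategy that still meets both targets". The sketch below does that. The numeric bounds are assumptions chosen to approximate the table's labels (for example "Hours" is taken as 24 h); they are not AWS guarantees.

```python
# (name, worst-case RTO in seconds, worst-case RPO in seconds), cheapest first.
# All numbers are illustrative assumptions derived from the table above.
STRATEGIES = [
    ("Backup & Restore", 24 * 3600, 24 * 3600),
    ("Pilot Light", 10 * 60, 10 * 60),
    ("Warm Standby", 5 * 60, 60),
    ("Multi-Region Active/Active", 5, 5),
]

def cheapest_strategy(rto_target_s, rpo_target_s):
    """Return the first (cheapest) strategy whose RTO and RPO both fit the targets."""
    for name, rto, rpo in STRATEGIES:
        if rto <= rto_target_s and rpo <= rpo_target_s:
            return name
    return None  # no listed strategy is fast enough

print(cheapest_strategy(3600, 3600))  # Pilot Light
print(cheapest_strategy(600, 120))    # Warm Standby
```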

47

How would you design a serverless REST API on AWS? (Hard)

Client
  ↓ HTTPS
CloudFront (CDN, WAF, SSL termination)
  ↓
API Gateway (HTTP API — routing, throttling, auth)
  ↓ trigger
Lambda functions (business logic, per route)
  ↓           ↓             ↓
DynamoDB    Secrets Mgr   SQS/SNS
(data)      (credentials) (async work)

Supporting services:
- Route 53: DNS
- ACM: SSL certificate (auto-renewed)
- Cognito: user pools + JWT auth for API Gateway
- CloudWatch: logs + metrics + alarms
- X-Ray: distributed tracing
- CodePipeline + SAM/CDK: CI/CD deployment

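Cost: all serverless, so you pay mainly per request and idle cost is close to zero (small fixed charges remain, e.g. per-secret Secrets Manager fees and log or table storage). Scales to millions of requests within the account's concurrency quotas. In on-demand mode the DynamoDB table scales read/write capacity automatically.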
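
For the Lambda layer, a per-route handler might look like the sketch below. It assumes the API Gateway HTTP API payload format 2.0 and a hypothetical `GET /orders/{orderId}` route. The in-memory `FAKE_ORDERS` dict stands in for DynamoDB so the example runs without AWS access.

```python
import json

# In-memory stand-in for the DynamoDB table; a real handler would query DynamoDB.
FAKE_ORDERS = {"123": {"orderId": "123", "amount": 99.99}}

def _response(status, body):
    """Shape a result the way API Gateway expects from a Lambda proxy integration."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }

def handler(event, context):
    """Handle GET /orders/{orderId} (hypothetical route) for an HTTP API event."""
    method = event["requestContext"]["http"]["method"]
    order_id = (event.get("pathParameters") or {}).get("orderId")
    if method != "GET":
        return _response(405, {"message": "method not allowed"})
    order = FAKE_ORDERS.get(order_id)
    if order is None:
        return _response(404, {"message": "order not found"})
    return _response(200, order)

# Local smoke test with a hand-built event.
event = {"requestContext": {"http": {"method": "GET"}}, "pathParameters": {"orderId": "123"}}
print(handler(event, None)["statusCode"])  # 200
```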

48

How do you design a data lake on AWS? (Hard)

Ingestion layer:
  Kinesis Data Streams     ← real-time streams (IoT, clicks, logs)
  AWS Database Migration Service ← bulk migration from RDBMS
  S3 Direct Upload         ← batch file drops

Storage (S3 Data Lake):
  s3://data-lake/raw/      ← unprocessed data (never delete)
  s3://data-lake/curated/  ← cleaned, partitioned Parquet
  s3://data-lake/analytics/ ← aggregated, business-ready

Processing:
  AWS Glue (ETL jobs, data catalog, crawlers)
  EMR (Spark for large-scale transformations)
  Lambda (lightweight transforms on small events)

Cataloging:
  AWS Glue Data Catalog    ← schema registry, Athena uses this

Query / Analytics:
  Amazon Athena            ← ad-hoc SQL on S3
  Amazon Redshift Spectrum ← join Redshift tables with S3
  QuickSight               ← BI dashboards

Governance:
  AWS Lake Formation       ← fine-grained access control on tables/columns
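
The "partitioned Parquet" in the curated zone usually means Hive-style `key=value` prefixes, which Athena and Glue can use to prune partitions. A minimal sketch of building such an object key follows; the dataset name, the file name, and the helper `curated_key` are assumptions for illustration.

```python
from datetime import datetime, timezone

def curated_key(dataset, event_time, filename):
    """Build a Hive-style partitioned object key under the curated/ prefix."""
    t = event_time.astimezone(timezone.utc)  # partition by UTC date
    return f"curated/{dataset}/year={t:%Y}/month={t:%m}/day={t:%d}/{filename}"

key = curated_key(
    "clicks",
    datetime(2026, 1, 5, 13, 30, tzinfo=timezone.utc),
    "part-0000.parquet",
)
print(key)  # curated/clicks/year=2026/month=01/day=05/part-0000.parquet
```

A query that filters on year, month, and day then reads only the matching prefixes instead of scanning the whole dataset.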

49

What is AWS Step Functions and when do you use it over Lambda chaining? (Medium)

Step Functions is a serverless workflow orchestration service. You define a state machine in Amazon States Language (JSON; tooling such as SAM and CloudFormation also accepts YAML) and Step Functions handles execution, retries, error handling, and state.

{
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {"Type": "Task", "Resource": "arn:...validate", "Next": "ReserveStock"},
    "ReserveStock":  {"Type": "Task", "Resource": "arn:...reserve",  "Next": "ChargePayment",
                      "Retry": [{"ErrorEquals":["ServiceException"],"MaxAttempts":3}],
                      "Catch": [{"ErrorEquals":["OutOfStock"],"Next": "NotifyOOS"}]},
    "ChargePayment": {"Type": "Task", "Resource": "arn:...charge",   "End": true},
    "NotifyOOS":     {"Type": "Fail", "Error": "OutOfStock", "Cause": "Item is out of stock"}
  }
}

Use Step Functions over Lambda chaining when:

Workflow exceeds 15 minutes (Lambda max timeout)
You need retry logic, error handling, branching between multiple steps
You want visual audit trail of every execution (console shows each step's input/output)
Long-running workflows that need to pause and wait (Wait state, callback with task token)
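
The retry behaviour is easier to reason about with numbers. A Retry rule waits `IntervalSeconds` before the first retry and multiplies the wait by `BackoffRate` each time, up to `MaxAttempts` retries. The sketch below computes that schedule using the documented defaults (1 second, 3 attempts, rate 2.0). The helper `retry_delays` is illustrative only and ignores optional settings such as a maximum delay or jitter.

```python
def retry_delays(interval_seconds=1, max_attempts=3, backoff_rate=2.0):
    """Seconds waited before each retry: interval * backoff_rate**n, n = 0..max_attempts-1."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]

print(retry_delays())           # [1.0, 2.0, 4.0]
print(retry_delays(2, 4, 1.5))  # [2.0, 3.0, 4.5, 6.75]
```

With hand-chained Lambdas you would have to implement and persist this schedule yourself, which is one reason to let the state machine own it.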

50

What is AWS Organizations and how does multi-account strategy work? (Hard)

AWS Organizations lets you manage multiple AWS accounts centrally. Typical account structure:

Management/Root Account (billing, SCPs only)
├── Security OU
│   ├── Log Archive Account (centralised CloudTrail/Config logs)
│   └── Security Tooling Account (GuardDuty, Security Hub)
├── Infrastructure OU
│   ├── Shared Services Account (DNS, monitoring, CI/CD)
│   └── Network Account (Transit Gateway, Direct Connect)
├── Workloads OU
│   ├── Dev Account
│   ├── Staging Account
│   └── Production Account
└── Sandbox OU
    └── Developer Sandbox Accounts

Benefits: blast radius isolation (prod compromise doesn't affect dev), separate billing per team/product, SCP guardrails enforce security policies across all accounts (e.g. "never disable CloudTrail", "only us-east-1 and eu-west-1 allowed").

51

What are Service Control Policies (SCPs) and how do they enforce guardrails? (Hard)

SCPs are IAM-like policies attached to the organization root, an OU, or an individual account. They set the maximum permissions available to any principal in the affected member accounts, including each account's root user. SCPs do not apply to the management account, and they do NOT grant permissions: they only cap what IAM policies can grant.

// Guardrail SCP: prevent disabling CloudTrail in any member account
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyCloudTrailDisable",
    "Effect": "Deny",
    "Action": [
      "cloudtrail:DeleteTrail",
      "cloudtrail:StopLogging",
      "cloudtrail:UpdateTrail"
    ],
    "Resource": "*"
  }]
}

// Even if an account admin has AdministratorAccess, this Deny overrides it

Common SCPs: deny region usage outside approved regions, deny creating public S3 buckets, deny root account usage, require MFA for sensitive actions.
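
The "cap, not grant" behaviour can be modelled in a few lines. This is a deliberately simplified sketch: it looks only at action names and ignores resources, conditions, permission boundaries, and the OU hierarchy. The helper names are invented.

```python
from fnmatch import fnmatchcase

def matches(action, patterns):
    """True if the action matches any pattern; '*' wildcards allowed, case-insensitive."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)

def is_allowed(action, iam_allow, scp_allow, scp_deny):
    """Allowed only if IAM grants it, the SCP permits it, and no SCP deny matches."""
    if matches(action, scp_deny):
        return False  # an explicit deny always wins
    return matches(action, iam_allow) and matches(action, scp_allow)

admin = ["*"]            # AdministratorAccess-style IAM grant
full_access_scp = ["*"]  # FullAWSAccess-style SCP
guardrail = ["cloudtrail:DeleteTrail", "cloudtrail:StopLogging", "cloudtrail:UpdateTrail"]

print(is_allowed("cloudtrail:StopLogging", admin, full_access_scp, guardrail))  # False
print(is_allowed("s3:GetObject", admin, full_access_scp, guardrail))            # True
print(is_allowed("s3:GetObject", [], full_access_scp, guardrail))               # False
```

The last call shows that an SCP allowing everything grants nothing when no IAM policy allows the action.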

52

What is AWS Config and how does it enforce compliance? (Medium)

AWS Config continuously records configuration changes to AWS resources and evaluates them against compliance rules.

Configuration History — "what did this security group look like last Tuesday?"
Config Rules — evaluate resources against rules. AWS Managed Rules (hundreds available) include s3-bucket-public-read-prohibited, restricted-ssh, and encrypted-volumes. Custom rules run via Lambda.
Remediation — auto-remediate non-compliant resources via SSM Automation documents (e.g. automatically disable public S3 bucket access when violated).
Conformance Packs — pre-built sets of rules for compliance frameworks (NIST, PCI-DSS, CIS AWS Foundations).

Pair Config with Security Hub to aggregate findings from GuardDuty, Inspector, Macie, and Config rules in one dashboard.

53

What is Amazon EventBridge and how does it enable event-driven architectures? (Medium)

EventBridge is a serverless event bus. Services publish events; rules route matching events to targets.

// Rule: route EC2 state changes to Lambda for notification
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {"state": ["terminated"]}
}
// Target: Lambda, SQS, SNS, Step Functions, API Gateway, another bus

# Custom events from your apps, sent with the AWS CLI:
aws events put-events --entries '[{
  "Source": "com.myapp.orders",
  "DetailType": "OrderPlaced",
  "Detail": "{\"orderId\":\"123\",\"amount\":99.99}"
}]'

EventBridge differs from SNS: schema registry for event discovery, archive and replay, cross-account event routing, partner event sources (Stripe, Zendesk, GitHub directly publish to your bus). The backbone of modern serverless event-driven architectures on AWS.
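
To show how a rule pattern selects events, here is a simplified matcher. It covers only exact-value matching: each pattern field lists acceptable values and nested objects are matched recursively. Real EventBridge patterns also support prefix, numeric, and other operators, which this sketch omits. The function name is an assumption.

```python
def matches_pattern(pattern, event):
    """Every pattern key must be present; a list means 'any of these values'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested object: recurse into the matching sub-document.
            if not isinstance(actual, dict) or not matches_pattern(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True

pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["terminated"]},
}
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "terminated"},
}
print(matches_pattern(pattern, event))  # True
```

Fields that the pattern does not mention, such as `instance-id` here, are ignored, so patterns stay short even when events are large.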

54

What is AWS Cost Explorer and how do you identify and reduce AWS costs? (Medium)

Cost Explorer visualises spending trends by service, account, tag, region. Key cost optimisation actions:

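Right-sizing — use Compute Optimizer to identify over-provisioned EC2 instances. As a rule of thumb, consider downsizing when average CPU stays below 20%, after checking peak load, memory, and network usage as well.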
Savings Plans / Reserved Instances — commit to baseline usage at 30–72% discount. Use Savings Plans recommendations in Cost Explorer.
Spot Instances — for fault-tolerant batch workloads, replace On-Demand with Spot (up to 90% savings).
S3 cost reduction — enable Intelligent-Tiering, delete incomplete multipart uploads, transition old data to Glacier.
Orphaned resources — unattached EBS volumes, unused Elastic IPs, idle NAT Gateways (significant cost), old Load Balancers.
Data transfer costs — use VPC Endpoints to avoid NAT Gateway data processing charges for S3/DynamoDB traffic.
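
The right-sizing rule of thumb can be applied as a first-pass filter over exported utilisation data. The sketch below assumes you already have average CPU per instance (for example from CloudWatch). The instance IDs and the helper name are made up, and a low average alone is not proof that an instance is oversized.

```python
def downsize_candidates(avg_cpu_by_instance, threshold=20.0):
    """Return instance IDs whose average CPU (%) is below the threshold, sorted."""
    return sorted(i for i, cpu in avg_cpu_by_instance.items() if cpu < threshold)

# Hypothetical two-week averages.
usage = {"i-aaa": 7.5, "i-bbb": 63.0, "i-ccc": 19.9, "i-ddd": 20.0}
print(downsize_candidates(usage))  # ['i-aaa', 'i-ccc']
```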

55

What is Amazon Cognito and how does it handle authentication? (Medium)

User Pools — user directory with sign-up/sign-in. Handles MFA, password policies, email/phone verification, social login (Google, Facebook, Apple), SAML/OIDC federation. Returns ID and access tokens (JWTs) plus a refresh token.
Identity Pools (Federated Identities) — exchange User Pool JWT (or any identity provider token) for temporary AWS credentials. Gives users direct access to AWS services (S3, DynamoDB) with fine-grained IAM roles per user group.

// Common flow:
1. User signs in via Cognito User Pool → gets JWT
2. JWT attached to API Gateway request → validated via Cognito authorizer
3. JWT exchanged at Identity Pool → temporary AWS credentials
4. App accesses S3 bucket directly with user-scoped IAM role

Cognito is the go-to for application authentication in AWS without building your own auth server. Handles the heavy lifting of token management, MFA, and federation.

56

What is AWS Backup and how does it differ from service-native backups? (Medium)

AWS Backup is a centralised managed backup service that automates backups across EBS, RDS, DynamoDB, EFS, FSx, S3, and EC2 from one place.

Backup Plans — define backup frequency, retention, lifecycle rules (move to cold storage after 30 days)
Cross-region / cross-account copies — automate copying backups to another region/account for DR
Vault Lock — WORM protection on backup vaults (cannot delete backups for the defined retention period — compliance requirement)

Service-native backups (RDS automated backups, EBS snapshots) still exist but lack centralised management. AWS Backup provides a single audit trail ("show me all backups from the past 90 days across all resources") — important for compliance.

57

How do you migrate an on-premises database to AWS with minimal downtime? (Hard)

AWS Database Migration Service (DMS) enables near-zero downtime migration:

Schema conversion — use AWS Schema Conversion Tool (SCT) to convert stored procedures and DDL to target dialect.
Full load — DMS copies all existing data from source to target while the source remains live.
Ongoing replication (CDC) — DMS uses Change Data Capture to stream ongoing changes from source to target. Source and target stay in sync.
Validation — run application smoke tests against the target DB using a read replica or staging environment.
Cutover — stop writes to the source, wait for final changes to replicate, update connection strings, switch traffic. Downtime: minutes.

DMS supports homogeneous (Oracle → RDS Oracle) and heterogeneous (Oracle → Aurora PostgreSQL) migrations. For very large databases, use Snowball Edge for initial data transfer, then DMS for CDC replication.
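
The cutover step boils down to a small gate: writes are stopped and replication lag has drained. A hedged sketch of that check follows. The two latency inputs are meant to come from the DMS CloudWatch metrics CDCLatencySource and CDCLatencyTarget, while the 5-second threshold and the function name are assumptions.

```python
def ready_for_cutover(cdc_latency_source_s, cdc_latency_target_s,
                      writes_stopped, max_lag_s=5):
    """True when source writes are stopped and both CDC latencies are within max_lag_s."""
    return (
        writes_stopped
        and cdc_latency_source_s <= max_lag_s
        and cdc_latency_target_s <= max_lag_s
    )

print(ready_for_cutover(1, 2, writes_stopped=True))   # True
print(ready_for_cutover(1, 40, writes_stopped=True))  # False: target still catching up
print(ready_for_cutover(0, 0, writes_stopped=False))  # False: source still taking writes
```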

58

What is AWS CodePipeline and how do you structure a CI/CD pipeline for a containerised app? (Hard)

Stage 1: Source
  CodeCommit / GitHub → triggers on push to main

Stage 2: Build (CodeBuild)
  - Run unit tests
  - docker build
  - docker push to ECR
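  - Generate the image artifact for the deploy stage (imageDetail.json for CodeDeploy blue/green; imagedefinitions.json is for standard ECS rolling deploys)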

Stage 3: Test (CodeBuild)
  - Deploy to staging ECS/EKS using task definition with new image
  - Run integration tests
  - Run OWASP ZAP security scan

Stage 4: Deploy to Prod (CodeDeploy)
  - ECS Blue/Green deployment via CodeDeploy
  - Shift 10% traffic to new task set
  - CloudWatch alarm gates (error rate OK?)
  - Shift 100% traffic after 5 min
  - Terminate old task set after 60 min

Notifications:
  EventBridge → SNS → Slack on pipeline failure
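
The build stage hands the deploy stage a small JSON file naming the new image. The sketch below writes both common shapes: imagedefinitions.json, read by the standard ECS deploy action, and imageDetail.json, read by the ECS blue/green action. The container name and ECR URI are placeholders, and the helper names are invented.

```python
import json

def image_definitions(container_name, image_uri):
    """imagedefinitions.json content: a list of container name / image URI pairs."""
    return json.dumps([{"name": container_name, "imageUri": image_uri}])

def image_detail(image_uri):
    """imageDetail.json content: a single object holding the image URI."""
    return json.dumps({"ImageURI": image_uri})

# Placeholder account, region, repository, and tag.
uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:abc123"
print(image_definitions("web", uri))
print(image_detail(uri))
```

Tagging by commit hash rather than `latest` keeps each pipeline run traceable to the exact image it deployed.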

59

What is Amazon Macie and when do you need it? (Medium)

Amazon Macie uses ML to automatically discover, classify, and protect sensitive data in S3. It scans S3 objects and identifies:

PII (names, email addresses, SSNs, passport numbers)
Financial data (credit card numbers, bank account numbers)
Credentials (API keys, private keys)
Healthcare data (PHI)

Macie also monitors S3 bucket security configuration — alerts on public buckets, unencrypted buckets, buckets shared outside the organisation.

When you need it:

GDPR/HIPAA compliance — prove you know where PII is stored
Data breach investigation — did the compromised S3 bucket contain PII?
Shadow data discovery — find PII that developers accidentally stored in non-production buckets

60

Design a highly available, fault-tolerant web application on AWS. Walk through the architecture. (Hard)

Global:
  Route 53 → latency-based routing to nearest region
  CloudFront → CDN, WAF, SSL termination
  Shield Standard → DDoS protection (free)

Region 1 (us-east-1): [primary]
  Application Tier (3 AZs):
    ALB → spans all 3 AZs
    ECS Fargate (or EC2 ASG) in private subnets
    Auto Scaling → target tracking on CPU/request count

  Data Tier (Multi-AZ):
    Aurora Cluster → writer in AZ-1, read replicas in AZ-2 and AZ-3
    ElastiCache Redis → cluster mode, cross-AZ
    RDS Proxy → connection pooling

  Supporting:
    Secrets Manager → DB credentials with auto-rotation
    SSM Parameter Store → app config
    S3 + CloudFront → static assets
    SQS + Lambda → async background work

Networking:
  VPC: 3 public subnets (ALB, NAT GW), 3 private app subnets, 3 private DB subnets
  VPC Endpoints for S3, DynamoDB, Secrets Manager
  Security Groups: layered (ALB SG → App SG → DB SG)

Observability:
  CloudWatch: metrics, logs, alarms → SNS → PagerDuty
  X-Ray: distributed tracing
  CloudTrail + Config: compliance

Disaster Recovery:
  Aurora Global DB → Region 2 as standby (replication lag typically < 1s)
  Route 53 failover record → Region 2 if Region 1 health fails
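
The nine-subnet layout in the Networking block can be planned programmatically. As a sketch, the code below carves a /16 VPC into /20 blocks and assigns one per tier and AZ. The CIDR, the prefix length, and the tier names are assumptions, so adjust them to your own address plan.

```python
import ipaddress

def plan_subnets(vpc_cidr, azs, tiers=("public", "app", "db"), new_prefix=20):
    """Assign one subnet per (tier, AZ) pair, carved sequentially from the VPC CIDR."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    return {(tier, az): str(next(blocks)) for tier in tiers for az in azs}

plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"])
for (tier, az), cidr in plan.items():
    print(f"{tier:<6} {az}  {cidr}")
```

A /16 yields sixteen /20 blocks, so nine are used here and seven stay free for future tiers or AZs.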
