This guide covers the most frequently asked AWS interview questions in 2026 — from core services tested at associate level to architecture design questions asked at senior/solutions architect level. Each answer is written to demonstrate practical, production-relevant knowledge.
# Auto Scaling Group scales horizontally based on CPU:
aws autoscaling put-scaling-policy \
--policy-name scale-out \
--auto-scaling-group-name my-asg \
--policy-type TargetTrackingScaling \
--target-tracking-configuration \
'{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":70}'
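Target tracking behaves roughly like a proportional controller: it resizes the group so the per-instance metric moves back toward the target. The sketch below is a simplified mental model, not AWS's exact algorithm. It assumes load is spread evenly and uses desired = ceil(current × metric / target). It also assumes hypothetical min/max sizes. Real target tracking scales in more conservatively than it scales out.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int = 1, max_size: int = 10) -> int:
    """Approximate target-tracking scaling (simplified model, an assumption).

    Total load is modelled as current * metric; each instance should carry
    `target`, so the group needs ceil(total / target) instances.
    """
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    wanted = math.ceil(current * metric / target)
    # Clamp to the group's min/max size, as the ASG itself does.
    return max(min_size, min(max_size, wanted))


# 4 instances averaging 90% CPU against a 70% target -> scale out to 6.
print(desired_capacity(4, 90.0, 70.0))
# 6 instances averaging 30% CPU -> the model suggests 3.
print(desired_capacity(6, 30.0, 70.0))
```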
# Lifecycle policy: transition objects, then expire them after 2555 days (about 7 years)
{
  "Rules": [{
    "ID": "archive-then-expire",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"},
      {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
    ],
    "Expiration": {"Days": 2555}
  }]
}
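The sketch below shows what the rule does to a single object over time by evaluating the same thresholds against the object's age. It is a local model only. Real S3 applies lifecycle actions asynchronously, so the actual class change can lag a little behind the day the threshold is crossed.

```python
# Thresholds copied from the lifecycle rule above (days since object creation).
TRANSITIONS = [
    (30, "STANDARD_IA"),
    (90, "GLACIER"),
    (365, "DEEP_ARCHIVE"),
]
EXPIRATION_DAYS = 2555  # roughly 7 years


def storage_class_for_age(age_days: int) -> str:
    """Storage class an object of this age should be in, or "EXPIRED"."""
    if age_days >= EXPIRATION_DAYS:
        return "EXPIRED"
    current = "STANDARD"
    # Walk the transitions in order; the last threshold crossed wins.
    for threshold, storage_class in sorted(TRANSITIONS):
        if age_days >= threshold:
            current = storage_class
    return current


for age in (10, 45, 100, 400, 3000):
    print(age, storage_class_for_age(age))
```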
S3 versioning keeps multiple versions of an object. Every upload creates a new version; deletes create a delete marker instead of removing data.
# Enable versioning:
aws s3api put-bucket-versioning \
--bucket my-bucket \
--versioning-configuration Status=Enabled
# List versions:
aws s3api list-object-versions --bucket my-bucket --prefix report.pdf
# Restore deleted object: delete the delete marker
aws s3api delete-object \
--bucket my-bucket --key report.pdf \
--version-id <delete-marker-version-id>
Combine with MFA Delete — requires MFA to delete object versions, protecting against both accidental and malicious deletion. Enable Object Lock (WORM) for compliance requirements where data must be unmodifiable for a defined retention period.
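A VPC (Virtual Private Cloud) is a logically isolated network within AWS where you launch resources. You define its IP address range as a CIDR block (e.g. 10.0.0.0/16) and split that range into subnets. Security groups and network ACLs both filter traffic in a VPC, but at different levels:

| Feature | Security Group | Network ACL |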
|---|---|---|
| Level | Instance / ENI | Subnet |
| Stateful | Yes — return traffic allowed automatically | No — must explicitly allow return traffic |
| Rules | Allow only | Allow and Deny |
| Evaluation | All rules evaluated | Rules evaluated in number order; first match wins |
| Default | New SG: no inbound rules (all inbound denied), all outbound allowed | Default NACL: allow all inbound and outbound; a new custom NACL denies all |
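The evaluation difference in the table is easy to model. The sketch below is minimal and makes several simplifications. It assumes single-port rules. It ignores protocol, CIDR, direction, and the stateful/stateless distinction. Under those assumptions, the NACL returns the action of the lowest-numbered matching rule. The security group simply checks whether any allow rule matches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NaclRule:
    number: int            # lower numbers are evaluated first
    port: Optional[int]    # None means "all ports"
    action: str            # "allow" or "deny"


def nacl_decision(rules: list[NaclRule], port: int) -> str:
    """Network ACL: rules in ascending number order, first match wins.
    If nothing matches, the implicit final rule (*) denies."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port is None or rule.port == port:
            return rule.action
    return "deny"


def security_group_decision(allowed_ports: set[int], port: int) -> str:
    """Security group: allow-only rules; any matching rule allows.
    There are no deny rules, so unmatched traffic is denied."""
    return "allow" if port in allowed_ports else "deny"


nacl = [NaclRule(200, None, "allow"), NaclRule(100, 22, "deny")]
print(nacl_decision(nacl, 22))    # rule 100 matches first -> deny
print(nacl_decision(nacl, 443))   # falls through to rule 200 -> allow
print(security_group_decision({443, 80}, 22))  # no allow rule -> deny
```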
VPC Peering creates a direct network connection between two VPCs (same or different accounts/regions). Traffic stays on AWS backbone — not the public internet.
# Peer VPC-A with VPC-B, then add routes in each:
# VPC-A route table: 10.1.0.0/16 → pcx-xxxxx (peer connection)
# VPC-B route table: 10.0.0.0/16 → pcx-xxxxx (peer connection)
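Limitations: peering is not transitive (if A peers with B and B with C, A still cannot reach C through B), the two VPCs' CIDR blocks must not overlap, and connecting N VPCs in a full mesh takes N(N-1)/2 peering connections.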
Transit Gateway is a managed hub that connects VPCs and on-premises networks. Each network connects once to the TGW — it routes traffic between all connected networks.
Use TGW when: you have more than ~3 VPCs that need to communicate, you need shared services VPCs (centralised logging, security, DNS), or you need transitive routing.
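The arithmetic behind the TGW recommendation is simple. Peering is non-transitive, so N VPCs need N(N-1)/2 peering connections for full connectivity, while a Transit Gateway needs only N attachments. Peering also refuses VPCs whose CIDR blocks overlap. The sketch below uses Python's standard `ipaddress` module; the CIDRs are illustrative.

```python
import ipaddress
from itertools import combinations


def full_mesh_peering_count(n_vpcs: int) -> int:
    """Peering is non-transitive: every pair of VPCs needs its own connection."""
    return n_vpcs * (n_vpcs - 1) // 2


def transit_gateway_attachment_count(n_vpcs: int) -> int:
    """With a Transit Gateway hub, each VPC attaches once."""
    return n_vpcs


def overlapping_pairs(cidrs: list[str]) -> list[tuple[str, str]]:
    """CIDR pairs that overlap; VPC peering cannot connect such VPCs."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


for n in (3, 10, 50):
    print(n, full_mesh_peering_count(n), transit_gateway_attachment_count(n))

print(overlapping_pairs(["10.0.0.0/16", "10.1.0.0/16", "10.0.128.0/17"]))
# -> [('10.0.0.0/16', '10.0.128.0/17')]
```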
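An Application Load Balancer (Layer 7) routes requests by path (e.g. /api/* → API service, /app/* → web service), host header, HTTP method, and query string. It supports WebSockets, HTTP/2, and gRPC, which makes it a good fit for microservices and container-based apps.

Warm-up period: new instances take time to be ready. Configure Default Instance Warmup so that a new instance's metrics are left out of the group's aggregated CloudWatch metrics until it has finished warming up. Otherwise, startup spikes (such as high CPU while the app initialises) can trigger further scale-out before the new capacity is actually serving traffic.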
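{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow",
     "Action": ["s3:GetObject", "s3:PutObject"],
     "Resource": "arn:aws:s3:::my-app-bucket/uploads/*"},
    {"Effect": "Allow",
     "Action": "s3:ListBucket",
     "Resource": "arn:aws:s3:::my-app-bucket",
     "Condition": {"StringLike": {"s3:prefix": ["uploads/*"]}}}
  ]
}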
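Grant only the minimum permissions needed to perform a task. Avoid * in Action or Resource in production policies. The main exception is actions that don't support resource-level permissions (many List/Describe calls), which require "Resource": "*".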
// Bad practice:
{"Effect": "Allow", "Action": "*", "Resource": "*"}
// Good practice (specific service, specific resource):
{
"Effect": "Allow",
"Action": ["s3:GetObject"],
"Resource": "arn:aws:s3:::my-bucket/reports/*"
}
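The sketch below is a simplified model of how IAM evaluates identity policies: an explicit Deny beats any Allow, and no matching Allow means an implicit deny. It uses `fnmatch` as a stand-in for IAM's `*` wildcard. It ignores conditions, NotAction, case-insensitive action names, resource policies, SCPs, and permission boundaries.

```python
from fnmatch import fnmatchcase


def _matches(patterns, value: str) -> bool:
    """IAM-style wildcard match: '*' matches any run of characters."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, p) for p in patterns)


def evaluate(statements: list[dict], action: str, resource: str) -> str:
    """Explicit Deny beats Allow; no matching Allow means implicit deny."""
    allowed = False
    for st in statements:
        if _matches(st["Action"], action) and _matches(st["Resource"], resource):
            if st["Effect"] == "Deny":
                return "ExplicitDeny"
            allowed = True
    return "Allow" if allowed else "ImplicitDeny"


good_policy = [{
    "Effect": "Allow",
    "Action": ["s3:GetObject"],
    "Resource": "arn:aws:s3:::my-bucket/reports/*",
}]

print(evaluate(good_policy, "s3:GetObject", "arn:aws:s3:::my-bucket/reports/q1.csv"))
print(evaluate(good_policy, "s3:GetObject", "arn:aws:s3:::my-bucket/private/keys.txt"))
print(evaluate(good_policy, "s3:DeleteObject", "arn:aws:s3:::my-bucket/reports/q1.csv"))
```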
Implementation: start from narrowly scoped policies and review them regularly. Run aws iam generate-service-last-accessed-details to see which permissions are actually used, then remove the rest.
# Account A (trusted account) assumes a role in Account B (trusting account):
# 1. In Account B: create a role with trust policy allowing Account A:
{
"Statement": [{
"Effect": "Allow",
"Principal": {"AWS": "arn:aws:iam::ACCOUNT_A_ID:root"},
"Action": "sts:AssumeRole"
}]
}
# 2. In Account A: allow users/services to assume the role:
{"Effect": "Allow", "Action": "sts:AssumeRole",
"Resource": "arn:aws:iam::ACCOUNT_B_ID:role/ReadOnlyRole"}
# 3. Application in Account A assumes the role:
aws sts assume-role \
--role-arn arn:aws:iam::ACCOUNT_B_ID:role/ReadOnlyRole \
--role-session-name deploy-session
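Returns temporary credentials (AccessKeyId, SecretAccessKey, SessionToken). They are valid for 1 hour by default, and up to the role's maximum session duration (at most 12 hours; role chaining is capped at 1 hour). Used for: multi-account deployments, CI/CD pipelines, least-privilege service access.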
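KMS (Key Management Service) manages cryptographic keys. You never see the key material of a KMS key (formerly called a CMK, customer master key). All encryption and decryption with it happens inside KMS's hardware security modules.
Envelope encryption (used by S3, RDS, EBS under the hood): the KMS GenerateDataKey call returns a data key in two forms, plaintext and encrypted under your KMS key. The service encrypts your data locally with the plaintext data key, discards that key, and stores the encrypted data key next to the ciphertext. To read the data, it sends the encrypted data key to KMS Decrypt and uses the returned plaintext key. Only small data keys travel to KMS, never your bulk data.
KMS key material never leaves the HSMs unencrypted. You control key policies and rotation (automatic rotation is yearly by default for customer managed keys), and every use of a key is recorded in CloudTrail.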
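The envelope pattern can be sketched locally. This is a teaching sketch only. `ToyKms` is a hypothetical stand-in whose two methods mirror KMS GenerateDataKey and Decrypt. The cipher is a toy SHA-256 keystream XOR: it is unauthenticated and must never be used for real data, so use a vetted AES-GCM library in practice. The real Decrypt call can infer the key from the ciphertext; this toy takes the key ID explicitly.

```python
import hashlib
import secrets


# TOY CIPHER FOR ILLUSTRATION ONLY: SHA-256 counter-mode keystream XORed
# with the data. Unauthenticated; never protect real data with this.
def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class ToyKms:
    """Hypothetical stand-in for KMS: key material never leaves this object."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def create_key(self) -> str:
        key_id = secrets.token_hex(8)
        self._keys[key_id] = secrets.token_bytes(32)
        return key_id

    def generate_data_key(self, key_id: str) -> tuple[bytes, bytes]:
        """Like GenerateDataKey: (plaintext data key, encrypted data key)."""
        data_key = secrets.token_bytes(32)
        return data_key, toy_encrypt(self._keys[key_id], data_key)

    def decrypt(self, key_id: str, encrypted_data_key: bytes) -> bytes:
        """Like Decrypt: unwraps a data key with the KMS key."""
        return toy_decrypt(self._keys[key_id], encrypted_data_key)


def envelope_encrypt(kms: ToyKms, key_id: str, plaintext: bytes) -> tuple[bytes, bytes]:
    data_key, encrypted_data_key = kms.generate_data_key(key_id)
    ciphertext = toy_encrypt(data_key, plaintext)  # bulk data encrypted locally
    del data_key  # drop the plaintext key; only the encrypted copy is stored
    return encrypted_data_key, ciphertext


def envelope_decrypt(kms: ToyKms, key_id: str, encrypted_data_key: bytes,
                     ciphertext: bytes) -> bytes:
    data_key = kms.decrypt(key_id, encrypted_data_key)  # only the small key goes to KMS
    return toy_decrypt(data_key, ciphertext)


kms = ToyKms()
key_id = kms.create_key()
encrypted_key, blob = envelope_encrypt(kms, key_id, b"customer record 42")
print(envelope_decrypt(kms, key_id, encrypted_key, blob))
```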
| Feature | Secrets Manager | SSM Parameter Store |
|---|---|---|
| Cost | ~$0.40/secret/month | Free (Standard), $0.05/adv param |
| Auto-rotation | Built-in (RDS, Redshift, DocumentDB; custom via Lambda) | None built in (build your own, e.g. with Lambda) |
| Versioning | Yes, with staging labels | Yes (all parameter types, with labels) |
| Cross-region | Multi-region replication | Per-region |
| Best for | DB passwords, API keys that need rotation | Config values, feature flags, secrets that don't need rotation (SecureString) |
WAF (Web Application Firewall) filters HTTP/HTTPS traffic at Layer 7 based on rules. Deployed in front of CloudFront, ALB, API Gateway, or AppSync.
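What WAF blocks:
- SQL injection and cross-site scripting attempts.
- Requests from specific IPs or countries (IP sets, geo match).
- Known-bad inputs and bots, via AWS Managed Rules.
- Floods from a single client IP, via rate-based rules.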
Logs go to CloudWatch, S3, or Kinesis Data Firehose. For advanced DDoS protection, add AWS Shield Advanced (covers volumetric L3/L4 attacks, SLAs, DDoS response team access).
Think: CloudTrail = "who did what in the AWS console/API." CloudWatch = "how is my app/infrastructure performing."
Enable CloudTrail in ALL regions (not just your primary region). Enable log file validation and S3 bucket MFA Delete to prevent evidence tampering. Enable CloudTrail Insights to automatically detect unusual API activity.
GuardDuty is a managed threat detection service that analyses CloudTrail, VPC Flow Logs, DNS logs, and S3 data events using ML to detect malicious behaviour. No agents required.
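What it detects:
- Compromised EC2 instances (cryptocurrency mining, communication with known command-and-control hosts).
- Compromised credentials (API calls from unusual locations or Tor exit nodes, attempts to disable logging).
- Reconnaissance such as port scans.
- Suspicious S3 access patterns.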
Findings are sent to EventBridge — automate response (isolate instance, revoke credentials) via Lambda.
Lambda runs code in response to events without managing servers. AWS creates execution environments (lightweight Firecracker microVMs) on demand, runs your function, and scales to thousands of concurrent executions automatically.
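Execution model: the first request to a new execution environment pays a cold start (download the code, start the runtime, run initialisation code outside the handler). Later requests reuse the warm environment. Each environment handles one request at a time, so concurrency equals the number of active environments.
Limits (as of 2026):
- 15-minute maximum timeout.
- 128 MB to 10,240 MB of memory (CPU scales with memory).
- 6 MB synchronous request/response payload.
- 250 MB unzipped deployment package (10 GB for container images).
- 1,000 concurrent executions per region by default (a soft quota).
- 512 MB to 10 GB of ephemeral storage (/tmp).

Use provisioned concurrency for latency-sensitive APIs (payments, authentication) where cold starts are unacceptable. Use reserved concurrency to cap a non-critical function and protect other functions from being starved.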
Lambda also supports synchronous invocation (waits for response) and asynchronous invocation (returns immediately, processes in background with configurable retry and DLQ).
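In a Python function, module-level code runs once per execution environment, during the init phase (the cold start). The handler runs on every invocation. In the minimal sketch below, the counters and the `_load_config` helper are hypothetical; they exist only to make the reuse visible. Cached state is per environment, so concurrent requests that land in different environments each get their own copy.

```python
import time

# --- Init phase: runs once per execution environment (the cold start). ---
# Expensive setup (SDK clients, config loading) belongs here so warm
# invocations can reuse it. The counters exist only for this demo.
INIT_COUNT = 0
INVOCATION_COUNT = 0


def _load_config() -> dict:
    global INIT_COUNT
    INIT_COUNT += 1
    return {"loaded_at": time.time()}


CONFIG = _load_config()


# --- Invoke phase: runs on every request routed to this environment. ---
def lambda_handler(event, context):
    global INVOCATION_COUNT
    INVOCATION_COUNT += 1
    return {"init_count": INIT_COUNT,
            "invocations": INVOCATION_COUNT,
            "config_loaded_at": CONFIG["loaded_at"]}


# Locally, calling the handler twice simulates two warm invocations.
first = lambda_handler({}, None)
second = lambda_handler({}, None)
print(first["init_count"], second["init_count"], second["invocations"])
```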
Choose ECS: small-medium teams, AWS-first stack, want simplicity, need Fargate (truly serverless containers), don't have Kubernetes expertise.
Choose EKS: existing Kubernetes workloads, need to run service meshes (Istio), multi-cloud portability, complex scheduling needs, large platform engineering team.
Fargate benefits: no over-provisioning, tasks start independently (no node capacity to wait for), simplified security (no SSH into nodes, smaller attack surface), works with both ECS and EKS.
Fargate trade-offs: ~20-30% more expensive per vCPU/memory than equivalent EC2 at steady state. Slower cold start than running tasks on warm nodes. No GPU support.
API Gateway is a fully managed service to create, publish, and secure REST, HTTP, and WebSocket APIs at scale. It handles auth, throttling, SSL, versioning, and caching.
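Integration types:
- Lambda (proxy or custom).
- HTTP (proxy to any HTTP endpoint).
- AWS service (call services such as SQS or Step Functions directly, with no Lambda in between).
- Mock (return a fixed response).
- Private resources in a VPC, reached through a VPC Link.

REST API vs HTTP API: HTTP API is up to about 70% cheaper and has lower latency, but offers fewer features (no usage plans or API keys, no native AWS WAF integration, no response caching). Use HTTP API for simple Lambda proxies.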
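With a Lambda proxy integration, API Gateway hands the whole request to the function as an event. It then turns the returned dict into the HTTP response. The sketch below is minimal and assumes an HTTP API with payload format version 2.0. The `/health` and `/hello` routes are hypothetical.

```python
import json


def lambda_handler(event, context):
    """Lambda proxy integration for an HTTP API (payload format 2.0).

    The returned statusCode/headers/body dict becomes the HTTP response.
    """
    method = event["requestContext"]["http"]["method"]
    path = event["rawPath"]
    params = event.get("queryStringParameters") or {}

    if method == "GET" and path == "/health":
        status, payload = 200, {"ok": True}
    elif method == "GET" and path == "/hello":
        status, payload = 200, {"message": f"hello {params.get('name', 'world')}"}
    else:
        status, payload = 404, {"error": f"no route for {method} {path}"}

    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


sample_event = {
    "version": "2.0",
    "rawPath": "/hello",
    "queryStringParameters": {"name": "ana"},
    "requestContext": {"http": {"method": "GET", "path": "/hello"}},
}
print(lambda_handler(sample_event, None))
```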
SQS + SNS fan-out pattern (most common):
Publisher → SNS Topic → SQS Queue A → Lambda/Worker A
                      → SQS Queue B → Lambda/Worker B
                      → Email subscriber
A DLQ is a secondary queue where messages are sent after failing all processing retries. Without a DLQ, poison pill messages (malformed data, logic errors) loop forever or are silently dropped.
# SQS DLQ: after 3 failed processing attempts, move to DLQ
aws sqs set-queue-attributes \
--queue-url https://sqs.us-east-1.amazonaws.com/xxx/main-queue \
--attributes '{
"RedrivePolicy": "{\"deadLetterTargetArn\":\"arn:aws:sqs:...:dlq\",\"maxReceiveCount\":\"3\"}"
}'
After fixing the bug, use SQS DLQ Redrive (console or API) to move messages from DLQ back to the source queue for reprocessing. Set a CloudWatch alarm on DLQ message count — any message there is a bug that needs investigation.
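Here is a local simulation of the redrive policy above, under a few simplifying assumptions. Every receive increments a message's receive count, and a message that fails processing becomes visible again. Once a message has been received maxReceiveCount times without being deleted, SQS moves it to the DLQ instead of delivering it again. Visibility timeouts and parallel consumers are not modelled, and `handle` is a hypothetical consumer.

```python
from collections import deque

MAX_RECEIVE_COUNT = 3  # same value as maxReceiveCount in the redrive policy above


def drain(queue: deque, process) -> tuple[list, list]:
    """Simulate one consumer polling a queue that has a redrive policy.

    A failed message simply goes back to the end of the queue (standing in
    for the visibility timeout expiring).
    """
    processed, dead_letters = [], []
    while queue:
        msg = queue.popleft()
        if msg["receive_count"] >= MAX_RECEIVE_COUNT:
            dead_letters.append(msg)      # redrive: moved to the DLQ
            continue
        msg["receive_count"] += 1         # SQS counts every receive
        try:
            process(msg["body"])
            processed.append(msg)         # success: consumer deletes it
        except Exception:                 # any processing failure
            queue.append(msg)             # not deleted: visible again later
    return processed, dead_letters


def handle(body: str) -> None:
    """Hypothetical consumer that cannot parse a poison-pill message."""
    if body == "malformed":
        raise ValueError("cannot parse message")


main_queue = deque({"body": b, "receive_count": 0}
                   for b in ["order-1", "malformed", "order-2"])
done, dlq = drain(main_queue, handle)
print([m["body"] for m in done])
print([(m["body"], m["receive_count"]) for m in dlq])
```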
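Aurora is AWS's cloud-native relational database engine (MySQL/PostgreSQL compatible). It separates storage from compute:
- Storage is a shared, distributed volume that keeps six copies of your data across three AZs and grows automatically (up to 128 TiB).
- Compute is a writer instance plus up to 15 read replicas, all attached to that same volume.

Because replicas read the shared storage instead of maintaining their own copy, replica lag is typically low and failover to a replica is usually fast (typically around 30 seconds).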
Use Aurora for: production workloads that need high availability and don't want to manage the HA complexity. Standard RDS for: dev/test, specific engine versions, cost sensitivity.
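Aurora's storage layer places two of its six data copies in each of three AZs. It uses quorums: a write needs 4 of 6 copies to acknowledge, and a read needs 3 of 6. That is why Aurora keeps accepting writes through the loss of an entire AZ, and can still serve reads after losing an AZ plus one more copy. The small model below shows that arithmetic; it ignores segment-level repair.

```python
AZS = ("az-1", "az-2", "az-3")
COPIES = [az for az in AZS for _ in range(2)]  # six copies, two per AZ
WRITE_QUORUM = 4  # copies that must acknowledge a write
READ_QUORUM = 3   # copies that must respond to a read


def quorum_status(failed_azs=(), extra_failed_copies: int = 0) -> dict:
    """Surviving copies after losing whole AZs plus some extra copies,
    and whether the read and write quorums still hold."""
    alive = sum(1 for az in COPIES if az not in failed_azs)
    alive = max(alive - extra_failed_copies, 0)
    return {"alive": alive,
            "can_write": alive >= WRITE_QUORUM,
            "can_read": alive >= READ_QUORUM}


print(quorum_status(failed_azs=("az-1",)))                         # AZ outage
print(quorum_status(failed_azs=("az-1",), extra_failed_copies=1))  # AZ + 1 copy
```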
DynamoDB is a fully managed serverless key-value/document NoSQL database with single-digit millisecond latency at any scale.
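Data model: a table holds items (up to 400 KB each), each identified by a primary key. The key is either a partition key alone, or a partition key plus a sort key. Items with the same partition key are stored together and ordered by sort key, so a Query can fetch a range (e.g. all orders for one customer). Secondary indexes (GSI/LSI) add alternative access patterns. Apart from the key, attributes are schemaless.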
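The toy in-memory model below illustrates the partition key + sort key layout. It assumes a single-table design with hypothetical `CUSTOMER#`/`ORDER#` key prefixes. The prefix query mirrors a DynamoDB `Query` with a `begins_with` key condition, but the class itself is a teaching model, not the AWS SDK.

```python
class ToyTable:
    """In-memory model of a DynamoDB table with a composite primary key."""

    def __init__(self) -> None:
        # partition key -> list of (sort key, attributes), kept sorted by sort key
        self._partitions: dict[str, list[tuple[str, dict]]] = {}

    def put_item(self, pk: str, sk: str, attrs: dict) -> None:
        """Like PutItem: insert, or replace the item with the same primary key."""
        rows = [row for row in self._partitions.get(pk, []) if row[0] != sk]
        rows.append((sk, attrs))
        rows.sort(key=lambda row: row[0])
        self._partitions[pk] = rows

    def query(self, pk: str, sk_prefix: str = "") -> list[dict]:
        """Like Query with pk = :pk AND begins_with(sk, :prefix)."""
        return [attrs for sk, attrs in self._partitions.get(pk, [])
                if sk.startswith(sk_prefix)]


table = ToyTable()
table.put_item("CUSTOMER#42", "ORDER#2026-01-05", {"total": 30})
table.put_item("CUSTOMER#42", "ORDER#2026-02-11", {"total": 55})
table.put_item("CUSTOMER#42", "PROFILE", {"name": "Ana"})
table.put_item("CUSTOMER#7", "ORDER#2026-01-09", {"total": 12})

print(table.query("CUSTOMER#42", "ORDER#"))  # this customer's orders, sorted by date
```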
DynamoDB Streams captures every write (INSERT, MODIFY, REMOVE) to a table as a stream of change records, ordered per item and retained for 24 hours. Lambda can be triggered on each change.
# Common patterns enabled by DynamoDB Streams:
1. Event notification:
Order table write → Stream → Lambda → Send confirmation email
2. Cross-region replication (DynamoDB Global Tables now provide this natively):
Primary table → Stream → Lambda → Write to replica table in another region
3. Audit trail:
Any table change → Stream → Lambda → Append to S3 audit log
4. Search sync:
Product table update → Stream → Lambda → Update OpenSearch (Elasticsearch) index
5. Aggregation:
Order line inserts → Stream → Lambda → Update order total in Orders table
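The audit-trail pattern can be written as a Python Lambda handler. The sketch below assumes a hypothetical Orders table keyed by `orderId` with a `status` attribute, and a stream view type that includes new images (NEW_IMAGE or NEW_AND_OLD_IMAGES). Stream records carry attributes in DynamoDB's typed JSON format, for example `{"S": "abc"}` for strings.

```python
def lambda_handler(event, context):
    """Turn DynamoDB Streams records into audit lines."""
    lines = []
    for record in event["Records"]:
        change = record["dynamodb"]
        key = change["Keys"]["orderId"]["S"]
        if record["eventName"] == "REMOVE":
            lines.append(f"{key} deleted")
        else:  # INSERT or MODIFY carry the new item image
            status = change["NewImage"]["status"]["S"]
            lines.append(f"{key} {record['eventName'].lower()} status={status}")
    # A real function would append these lines to the S3 audit log (not shown).
    return lines


sample_event = {"Records": [
    {"eventName": "INSERT",
     "dynamodb": {"Keys": {"orderId": {"S": "o-1"}},
                  "NewImage": {"orderId": {"S": "o-1"}, "status": {"S": "PLACED"}}}},
    {"eventName": "MODIFY",
     "dynamodb": {"Keys": {"orderId": {"S": "o-1"}},
                  "NewImage": {"orderId": {"S": "o-1"}, "status": {"S": "SHIPPED"}}}},
    {"eventName": "REMOVE",
     "dynamodb": {"Keys": {"orderId": {"S": "o-1"}}}},
]}
for line in lambda_handler(sample_event, None):
    print(line)
```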
Use DAX when your app uses DynamoDB and you want read acceleration with minimal code changes (swap in the DAX client; the API calls stay the same). Use ElastiCache when you cache across multiple data sources, need Redis-specific features, or need more control over caching logic.
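Kinesis vs SQS:
- Retention: Kinesis keeps records for a retention period (24 hours by default, up to 365 days) so they can be replayed. SQS removes a message once a consumer deletes it after processing.
- Consumers: Kinesis lets multiple consumers read the same stream independently. SQS is a work queue where each message is processed by one consumer.
- Ordering: Kinesis preserves order within a shard. SQS Standard offers only best-effort ordering (use SQS FIFO for strict ordering).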
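Ordering in Kinesis is per shard, and the partition key decides the shard. Kinesis hashes the key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The sketch below assumes the shards split the range evenly; real streams can have uneven ranges after resharding. The device IDs are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128


def shard_for(partition_key: str, shard_count: int) -> int:
    """MD5(partition key) -> 128-bit integer -> index of the shard whose
    (assumed evenly split) hash key range contains it."""
    hash_key = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    return hash_key * shard_count // HASH_SPACE


# The same key always lands on the same shard, so its records stay ordered.
for device in ("sensor-1", "sensor-2", "sensor-1"):
    print(device, shard_for(device, 4))
```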
-- Query one day's 5xx errors from CloudFront logs in S3 (Athena).
-- Column names are illustrative; match them to your table definition.
SELECT request_time, status_code, error_message
FROM cloudfront_logs
WHERE "date" = '2026-06-23'
AND status_code >= 500
LIMIT 100;
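Cost tip: store data as Parquet with Snappy compression. Athena bills by data scanned. Because Parquet is columnar, Athena reads only the columns you query, which can cut the bytes scanned (and therefore the cost) by up to about 90% compared with raw JSON.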
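RDS Proxy sits between your application and RDS/Aurora, pooling and sharing database connections. It solves several problems at scale:
- Connection exhaustion: bursty clients such as Lambda can open thousands of connections and exhaust the database's connection limit.
- Slow failovers: the proxy keeps client connections open and redirects them to the new writer, which AWS says cuts failover time by up to 66%.
- Credential handling: IAM authentication, with database credentials kept in Secrets Manager.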
CloudFront is AWS's CDN (Content Delivery Network) with 550+ edge locations worldwide. It caches content at edge locations close to users — reducing latency and origin load.
User in Mumbai → nearest CloudFront edge
  Cache HIT?  → return cached content (lowest latency, no origin request)
  Cache MISS? → fetch from origin (S3, ALB, API Gateway, EC2)
              → cache for TTL duration
              → return to user
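Key features:
- Origin Access Control (keeps S3 buckets private, reachable only through CloudFront).
- Signed URLs and signed cookies for private content.
- Cache behaviours and policies per path pattern.
- Edge compute with CloudFront Functions and Lambda@Edge.
- Geo restriction.
- Integration with AWS WAF and Shield.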
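Route 53 is AWS's managed DNS service, backed by a 100% availability SLA. Key routing policies:
- Simple.
- Weighted: split traffic by percentage.
- Latency-based: send users to the lowest-latency region.
- Failover: active/passive with health checks.
- Geolocation: by user location.
- Geoproximity: by distance, with adjustable bias.
- Multivalue answer: up to eight healthy records.
- IP-based: by client CIDR.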
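Weighted routing is the easiest policy to show in code. Route 53 answers with each record in proportion to its weight, which is how canary or blue/green DNS cutovers are done. The simulation below uses hypothetical record names. Real resolvers cache answers for the record's TTL, so observed traffic splits are coarser than this.

```python
import random


def pick_weighted(records: dict[str, int], rng: random.Random) -> str:
    """Return one record name with probability weight / sum(weights)."""
    names = list(records)
    weights = [records[name] for name in names]
    if sum(weights) == 0:
        weights = [1] * len(names)  # all-zero weights: Route 53 treats them as equal
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical canary split: 90% of answers to blue, 10% to green.
records = {"blue.example.com": 90, "green.example.com": 10}
rng = random.Random(7)  # seeded so the simulation is repeatable
answers = [pick_weighted(records, rng) for _ in range(10_000)]
green_share = answers.count("green.example.com") / len(answers)
print(f"green share: {green_share:.3f}")
```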
Best practice: Direct Connect as primary, Site-to-Site VPN as failover. Create a DX Gateway to connect one Direct Connect to multiple VPCs and regions.
VPC Endpoints allow private communication between your VPC and AWS services without traffic leaving the AWS network or going through the public internet.
# Without endpoint: EC2 in private subnet → NAT Gateway → internet → S3
# With S3 Gateway Endpoint: EC2 in private subnet → VPC → S3 (private)
# Result: no NAT charges, no public internet exposure of S3 traffic
Security benefit: a bucket policy can deny any request that doesn't arrive through your VPC endpoint (using the aws:SourceVpce condition key). The data then stays unreachable from the internet even if an IAM policy is misconfigured.
Multi-AZ is the baseline for any production workload. Multi-region adds significant complexity (data replication lag, consistency challenges) and is only justified for global applications or the most strict RTO/RPO requirements.
Key difference: CloudFront caches at edge and may serve from cache. Global Accelerator always routes to your backend, just faster (via AWS backbone instead of public internet).
PrivateLink lets you expose your service privately to other VPCs and AWS accounts without VPC peering or making it public. Traffic never leaves the AWS network.
Your VPC (service provider)
↑ Network Load Balancer
↑ VPC Endpoint Service (PrivateLink)
↑ Interface Endpoint (ENI in consumer VPC)
Consumer VPC (your customer / another team's account)
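Use cases:
- SaaS providers exposing a service to many customer VPCs.
- Sharing an internal service across accounts, even when VPC CIDRs overlap.
- Private access to AWS services through interface endpoints.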
Recommendation: CDK for AWS-only teams that want developer-friendly IaC. Terraform for multi-cloud or existing Terraform expertise.
| Strategy | RTO | RPO | Cost |
|---|---|---|---|
| Backup & Restore | Hours | Hours | Lowest |
| Pilot Light (minimal services running) | Tens of minutes | Minutes | Low |
| Warm Standby (scaled-down replica) | Minutes | Seconds | Medium |
| Multi-Region Active/Active | Near-zero | Near-zero | Highest |
Client
↓ HTTPS
CloudFront (CDN, WAF, SSL termination)
↓
API Gateway (HTTP API — routing, throttling, auth)
↓ trigger
Lambda functions (business logic, per route)
↓ ↓ ↓
DynamoDB Secrets Mgr SQS/SNS
(data) (credentials) (async work)
Supporting services:
- Route 53: DNS
- ACM: SSL certificate (auto-renewed)
- Cognito: user pools + JWT auth for API Gateway
- CloudWatch: logs + metrics + alarms
- X-Ray: distributed tracing
- CodePipeline + SAM/CDK: CI/CD deployment
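Cost: every component is serverless, so you pay per request with no servers sitting idle (storage and a few fixed fees, such as per-secret charges, still apply). The stack auto-scales to millions of requests. The DynamoDB table uses on-demand capacity mode, so it scales with traffic without any capacity planning.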
Ingestion layer:
Kinesis Data Streams ← real-time streams (IoT, clicks, logs)
AWS Database Migration Service ← bulk migration from RDBMS
S3 Direct Upload ← batch file drops
Storage (S3 Data Lake):
s3://data-lake/raw/ ← unprocessed data (never delete)
s3://data-lake/curated/ ← cleaned, partitioned Parquet
s3://data-lake/analytics/← aggregated, business-ready
Processing:
AWS Glue (ETL jobs, data catalog, crawlers)
EMR (Spark for large-scale transformations)
Lambda (lightweight transforms on small events)
Cataloging:
AWS Glue Data Catalog ← schema registry, Athena uses this
Query / Analytics:
Amazon Athena ← ad-hoc SQL on S3
Amazon Redshift Spectrum ← join Redshift tables with S3
QuickSight ← BI dashboards
Governance:
AWS Lake Formation ← fine-grained access control on tables/columns
Step Functions is a serverless workflow orchestration service. You define a state machine (JSON/YAML) and Step Functions handles execution, retries, error handling, and state.
{
"StartAt": "ValidateOrder",
"States": {
"ValidateOrder": {"Type": "Task", "Resource": "arn:...validate", "Next": "ReserveStock"},
"ReserveStock": {"Type": "Task", "Resource": "arn:...reserve", "Next": "ChargePayment",
"Retry": [{"ErrorEquals":["ServiceException"],"MaxAttempts":3}],
"Catch": [{"ErrorEquals":["OutOfStock"],"Next": "NotifyOOS"}]},
"ChargePayment": {"Type": "Task", "Resource": "arn:...charge", "End": true},
"NotifyOOS": {"Type": "Task", "Resource": "arn:...notify", "End": true}
}
}
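The `Retry` block on ReserveStock sets only `MaxAttempts`, so the other fields take their Amazon States Language defaults: IntervalSeconds 1 and BackoffRate 2.0. The small hypothetical helper below (not part of any SDK) computes the resulting wait before each retry. Retry n waits IntervalSeconds × BackoffRate^(n-1), optionally capped by MaxDelaySeconds.

```python
from typing import Optional


def retry_delays(interval_seconds: float = 1.0, backoff_rate: float = 2.0,
                 max_attempts: int = 3,
                 max_delay_seconds: Optional[float] = None) -> list[float]:
    """Seconds Step Functions waits before each retry of one Retrier."""
    delays = []
    for attempt in range(1, max_attempts + 1):
        delay = interval_seconds * backoff_rate ** (attempt - 1)
        if max_delay_seconds is not None:
            delay = min(delay, max_delay_seconds)
        delays.append(delay)
    return delays


print(retry_delays())                                  # ReserveStock retrier: 1, 2, 4 s
print(retry_delays(2, 3.0, 5, max_delay_seconds=60))   # capped exponential backoff
```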
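Use Step Functions over Lambda chaining when:
- The workflow has several steps with retries, branching, or compensating actions (sagas).
- Total run time exceeds Lambda's 15-minute limit (Standard workflows can run for up to a year).
- A step must wait for an external event or human approval (task tokens).
- You need a visual, auditable history of each execution.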
AWS Organizations lets you manage multiple AWS accounts centrally. Typical account structure:
Management/Root Account (billing, SCPs only)
├── Security OU
│ ├── Log Archive Account (centralised CloudTrail/Config logs)
│ └── Security Tooling Account (GuardDuty, Security Hub)
├── Infrastructure OU
│ ├── Shared Services Account (DNS, monitoring, CI/CD)
│ └── Network Account (Transit Gateway, Direct Connect)
├── Workloads OU
│ ├── Dev Account
│ ├── Staging Account
│ └── Production Account
└── Sandbox OU
└── Developer Sandbox Accounts
Benefits: blast radius isolation (prod compromise doesn't affect dev), separate billing per team/product, SCP guardrails enforce security policies across all accounts (e.g. "never disable CloudTrail", "only us-east-1 and eu-west-1 allowed").
SCPs are IAM-like policies attached to the organization root, an OU, or an account. They set the maximum permissions available to any principal (including the root user) in the affected member accounts. They do NOT grant permissions; they restrict the maximum that IAM policies can grant. SCPs do not apply to the management account or to service-linked roles.
// Guardrail SCP: prevent disabling CloudTrail in any member account
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "DenyCloudTrailDisable",
"Effect": "Deny",
"Action": [
"cloudtrail:DeleteTrail",
"cloudtrail:StopLogging",
"cloudtrail:UpdateTrail"
],
"Resource": "*"
}]
}
// Even if an account admin has AdministratorAccess, this Deny overrides it
Common SCPs: deny region usage outside approved regions, deny creating public S3 buckets, deny root account usage, require MFA for sensitive actions.
AWS Config continuously records configuration changes to AWS resources and evaluates them against compliance rules.
Managed rules such as s3-bucket-public-read-prohibited, restricted-ssh, and encrypted-volumes cover common checks. Custom rules run as Lambda functions.
Pair Config with Security Hub to aggregate findings from GuardDuty, Inspector, Macie, and Config rules in one dashboard.
EventBridge is a serverless event bus. Services publish events; rules route matching events to targets.
// Rule: route EC2 state changes to Lambda for notification
{
"source": ["aws.ec2"],
"detail-type": ["EC2 Instance State-change Notification"],
"detail": {"state": ["terminated"]}
}
// Target: Lambda, SQS, SNS, Step Functions, API Gateway, another bus
// Custom events from your apps:
aws events put-events --entries '[{
"Source": "com.myapp.orders",
"DetailType": "OrderPlaced",
"Detail": "{\"orderId\":\"123\",\"amount\":99.99}"
}]'
EventBridge differs from SNS: schema registry for event discovery, archive and replay, cross-account event routing, partner event sources (Stripe, Zendesk, GitHub directly publish to your bus). The backbone of modern serverless event-driven architectures on AWS.
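The core matching rule is simple: every field in the pattern must appear in the event, and a list of values means "any of these". The simplified matcher below assumes exact-value patterns only. EventBridge also supports prefix, numeric, anything-but, and exists filters, which this sketch omits.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge pattern matching (exact values only)."""
    for field, expected in pattern.items():
        if field not in event:
            return False
        actual = event[field]
        if isinstance(expected, dict):
            # Nested object: match recursively.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif isinstance(actual, list):
            # Array in the event: matches if any element is an allowed value.
            if not any(item in expected for item in actual):
                return False
        elif actual not in expected:
            return False
    return True


rule = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["terminated"]},
}
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0abc", "state": "terminated"},
}
print(matches(rule, event))
```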
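Cost Explorer visualises spending trends by service, account, tag, and region. Key cost optimisation actions:
- Rightsize over-provisioned instances (AWS Compute Optimizer gives recommendations).
- Cover steady usage with Savings Plans or Reserved Instances.
- Run fault-tolerant workloads on Spot.
- Move to Graviton instances where supported.
- Use S3 lifecycle rules or Intelligent-Tiering.
- Delete idle resources (unattached EBS volumes, old snapshots, unused Elastic IPs).
- Use VPC endpoints to avoid NAT Gateway data-processing charges.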
// Common flow:
1. User signs in via Cognito User Pool → gets JWT
2. JWT attached to API Gateway request → validated via Cognito authorizer
3. JWT exchanged at Identity Pool → temporary AWS credentials
4. App accesses S3 bucket directly with user-scoped IAM role
Cognito is the go-to for application authentication in AWS without building your own auth server. Handles the heavy lifting of token management, MFA, and federation.
AWS Backup is a centralised managed backup service that automates backups across EBS, RDS, DynamoDB, EFS, FSx, S3, and EC2 from one place.
Service-native backups (RDS automated backups, EBS snapshots) still exist but lack centralised management. AWS Backup provides a single audit trail ("show me all backups from the past 90 days across all resources") — important for compliance.
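AWS Database Migration Service (DMS) enables near-zero downtime migration. It first performs a full load of the existing data, then uses change data capture (CDC) to replicate ongoing changes from the source until you cut over.
DMS supports homogeneous (Oracle → RDS Oracle) and heterogeneous (Oracle → Aurora PostgreSQL) migrations. Heterogeneous migrations also need the schema converted first, with the AWS Schema Conversion Tool or DMS Schema Conversion. For very large databases, use Snowball Edge for the initial data transfer, then DMS for CDC replication.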
Stage 1: Source
CodeCommit / GitHub → triggers on push to main
Stage 2: Build (CodeBuild)
- Run unit tests
- docker build
- docker push to ECR
- Generate imagedefinitions.json with new image tag
Stage 3: Test (CodeBuild)
- Deploy to staging ECS/EKS using task definition with new image
- Run integration tests
- Run OWASP ZAP security scan
Stage 4: Deploy to Prod (CodeDeploy)
- ECS Blue/Green deployment via CodeDeploy
- Shift 10% traffic to new task set
- CloudWatch alarm gates (error rate OK?)
- Shift 100% traffic after 5 min
- Terminate old task set after 60 min
Notifications:
EventBridge → SNS → Slack on pipeline failure
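Amazon Macie uses ML and pattern matching to automatically discover, classify, and protect sensitive data in S3. It scans S3 objects and identifies:
- Personally identifiable information (names, addresses, passport and national ID numbers).
- Financial data (credit card and bank account numbers).
- Credentials (AWS secret keys, private keys).
- Anything matching custom data identifiers you define with regular expressions.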
Macie also monitors S3 bucket security configuration — alerts on public buckets, unencrypted buckets, buckets shared outside the organisation.
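Detectors for data like card numbers typically combine a pattern with a validation step such as a checksum. The rough illustration below uses a regex for 13 to 19 digit candidates plus the Luhn checksum that payment card numbers satisfy. It shows the technique only; it is not Macie's implementation. The sample text is made up, and 4111 1111 1111 1111 is a well-known test number.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or dashes.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Regex candidates that also pass the Luhn check."""
    return [m.group() for m in CANDIDATE.finditer(text) if luhn_valid(m.group())]


sample = "order 1234567890123 paid with 4111 1111 1111 1111, ref 4111111111111112"
print(find_card_numbers(sample))
```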
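When you need it: when buckets may hold regulated data (PCI DSS, HIPAA, GDPR) and you must show where that data lives and who can reach it.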
Global:
Route 53 → latency-based routing to nearest region
CloudFront → CDN, WAF, SSL termination
Shield Standard → DDoS protection (free)
Region 1 (us-east-1): [primary]
Application Tier (3 AZs):
ALB → spans all 3 AZs
ECS Fargate (or EC2 ASG) in private subnets
Auto Scaling → target tracking on CPU/request count
Data Tier (Multi-AZ):
Aurora Cluster → writer in AZ-1, read replicas in AZ-2 and AZ-3
ElastiCache Redis → cluster mode, cross-AZ
RDS Proxy → connection pooling
Supporting:
Secrets Manager → DB credentials with auto-rotation
SSM Parameter Store → app config
S3 + CloudFront → static assets
SQS + Lambda → async background work
Networking:
VPC: 3 public subnets (ALB, NAT GW), 3 private app subnets, 3 private DB subnets
VPC Endpoints for S3, DynamoDB, Secrets Manager
Security Groups: layered (ALB SG → App SG → DB SG)
Observability:
CloudWatch: metrics, logs, alarms → SNS → PagerDuty
X-Ray: distributed tracing
CloudTrail + Config: compliance
Disaster Recovery:
Aurora Global DB → Region 2 as standby (RPO < 1s)
Route 53 failover record → Region 2 if Region 1 health fails