AWS VPC Networking: Subnets, Route Tables, NAT Gateway (2026)
A well-designed VPC is the foundation of every secure, scalable AWS architecture. Get it wrong at the start — wrong CIDR block, flat subnet design, no NAT — and you'll be doing painful rearchitecting later. This guide covers VPC design from CIDR planning through Security Groups, NACLs, VPC Endpoints, and Transit Gateway, with the practical patterns used in real production environments.
VPC CIDR Design
Your VPC's CIDR block determines how many IP addresses are available. AWS supports /16 to /28 blocks. The golden rule: pick a block large enough for future growth, and don't overlap with your on-premises network or other VPCs you might peer with.
RFC 1918 private ranges to choose from:
- 10.0.0.0/8: largest space, 16 million IPs
- 172.16.0.0/12: 1 million IPs
- 192.168.0.0/16: 65,536 IPs (too small for most prod environments)
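As a rough sketch of the rule above, the check below tests whether a candidate IPv4 VPC CIDR sits inside one of the RFC 1918 ranges and uses a /16 to /28 prefix. The function name `is_valid_vpc_cidr` is hypothetical, and the check mirrors this guide's recommendation rather than every range AWS accepts.

```python
import ipaddress

# The three RFC 1918 private ranges listed above.
RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_valid_vpc_cidr(cidr: str) -> bool:
    """Return True if cidr is an RFC 1918 IPv4 block with a /16 to /28 prefix."""
    # strict=True raises ValueError when host bits are set (e.g. 10.0.0.1/16).
    net = ipaddress.ip_network(cidr, strict=True)
    in_private_range = any(net.subnet_of(private) for private in RFC1918)
    return in_private_range and 16 <= net.prefixlen <= 28


print(is_valid_vpc_cidr("10.0.0.0/16"))  # inside 10.0.0.0/8, allowed prefix
print(is_valid_vpc_cidr("10.0.0.0/8"))   # prefix too large for a VPC
```

Running a check like this before creating a VPC catches an oversized or public block while it is still cheap to change.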
A practical multi-account CIDR allocation pattern:
# Account CIDR allocations (no overlap for future Transit Gateway peering)
Production VPC: 10.0.0.0/16 # 65,536 IPs
Staging VPC: 10.1.0.0/16
Development VPC: 10.2.0.0/16
Shared Services: 10.3.0.0/16
# Within Production VPC (10.0.0.0/16), split across 3 AZs:
# Public subnets (ALB, NAT Gateway)
Public-AZ-A: 10.0.0.0/24 # 256 IPs (251 usable; AWS reserves 5 per subnet)
Public-AZ-B: 10.0.1.0/24
Public-AZ-C: 10.0.2.0/24
# Private subnets (app servers, containers)
Private-AZ-A: 10.0.10.0/23 # 512 IPs
Private-AZ-B: 10.0.12.0/23
Private-AZ-C: 10.0.14.0/23
# Database subnets (RDS, ElastiCache)
DB-AZ-A: 10.0.20.0/24
DB-AZ-B: 10.0.21.0/24
DB-AZ-C: 10.0.22.0/24
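A plan like this is easy to get wrong by one bit, so it can help to verify it mechanically. The sketch below, using only Python's standard `ipaddress` module, checks that every subnet from the Production plan above falls inside the VPC and that no two subnets overlap. The helper name `validate_plan` is made up for illustration.

```python
import ipaddress
from itertools import combinations

VPC = ipaddress.ip_network("10.0.0.0/16")

# Subnet plan copied from the Production VPC allocation above.
SUBNETS = {
    "Public-AZ-A": "10.0.0.0/24",
    "Public-AZ-B": "10.0.1.0/24",
    "Public-AZ-C": "10.0.2.0/24",
    "Private-AZ-A": "10.0.10.0/23",
    "Private-AZ-B": "10.0.12.0/23",
    "Private-AZ-C": "10.0.14.0/23",
    "DB-AZ-A": "10.0.20.0/24",
    "DB-AZ-B": "10.0.21.0/24",
    "DB-AZ-C": "10.0.22.0/24",
}


def validate_plan(vpc, subnets):
    """Return a list of problems: subnets outside the VPC or overlapping each other."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = [
        f"{name} is outside {vpc}"
        for name, net in nets.items()
        if not net.subnet_of(vpc)
    ]
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} overlaps {name_b}")
    return problems


print(validate_plan(VPC, SUBNETS))
```

The same function could be pointed at the account-level allocations (10.0.0.0/16 through 10.3.0.0/16) to confirm they stay disjoint before any peering or Transit Gateway work.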
Public vs Private Subnets
The distinction is simple but critical: a public subnet has a route to an Internet Gateway; a private subnet does not. Place only resources that must be publicly reachable (load balancers, NAT Gateways, bastion hosts) in public subnets. Everything else — app servers, databases, caches, internal services — belongs in private subnets.
| Resource | Subnet Type | Reason |
|---|---|---|
| Application Load Balancer | Public | Receives inbound traffic from internet |
| NAT Gateway | Public | Needs an Elastic IP and IGW route |
| EC2 App Servers | Private | Accessed only via ALB, no direct internet |
| RDS / Aurora | Private (DB subnet) | Never exposed to internet |
| ElastiCache | Private | Internal use only |
| Lambda (VPC-attached) | Private | Access to VPC resources, outbound via NAT |
Route Tables and Internet Gateway
Every subnet is associated with a route table. The route table controls where traffic goes based on the destination CIDR. A subnet becomes "public" by associating it with a route table that has a 0.0.0.0/0 route pointing to an Internet Gateway (IGW).
# Create VPC
VPC_ID=$(aws ec2 create-vpc \
--cidr-block 10.0.0.0/16 \
--query Vpc.VpcId --output text)
aws ec2 modify-vpc-attribute \
--vpc-id $VPC_ID \
--enable-dns-hostnames
# Create and attach Internet Gateway
IGW_ID=$(aws ec2 create-internet-gateway \
--query InternetGateway.InternetGatewayId --output text)
aws ec2 attach-internet-gateway \
--internet-gateway-id $IGW_ID \
--vpc-id $VPC_ID
# Create public subnet
PUB_SUBNET=$(aws ec2 create-subnet \
--vpc-id $VPC_ID \
--cidr-block 10.0.0.0/24 \
--availability-zone us-east-1a \
--query Subnet.SubnetId --output text)
# Create public route table, add IGW route, associate
PUB_RT=$(aws ec2 create-route-table \
--vpc-id $VPC_ID \
--query RouteTable.RouteTableId --output text)
aws ec2 create-route \
--route-table-id $PUB_RT \
--destination-cidr-block 0.0.0.0/0 \
--gateway-id $IGW_ID
aws ec2 associate-route-table \
--route-table-id $PUB_RT \
--subnet-id $PUB_SUBNET
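To make the routing behaviour concrete, here is a small model of how a route table picks a target: the most specific (longest-prefix) matching route wins, which is why the `local` route for the VPC CIDR takes precedence over the 0.0.0.0/0 default. This is an illustrative sketch, not AWS code; the route tables are plain dicts and the IDs `igw-example` and `nat-example` are placeholders.

```python
import ipaddress


def lookup_route(routes, destination_ip):
    """Return the target of the most specific route matching destination_ip, or None."""
    dest = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # Longest prefix match: the route with the largest prefix length wins.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


def is_public(routes):
    """A subnet counts as public if its default route targets an Internet Gateway."""
    return routes.get("0.0.0.0/0", "").startswith("igw-")


# Placeholder route tables; the target IDs are made up for illustration.
PUBLIC_RT = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}
PRIVATE_RT = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-example"}

print(lookup_route(PUBLIC_RT, "10.0.10.5"))  # traffic inside the VPC stays local
print(lookup_route(PUBLIC_RT, "8.8.8.8"))    # everything else follows the default route
print(is_public(PUBLIC_RT), is_public(PRIVATE_RT))
```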
NAT Gateway vs NAT Instance
Private subnet instances often need outbound internet access — for OS updates, API calls, pulling container images. NAT Gateway is the managed solution; NAT Instance is a DIY EC2 approach.
| Feature | NAT Gateway | NAT Instance |
|---|---|---|
| Management | Fully managed by AWS | You manage EC2 patching, HA |
| Bandwidth | Up to 100 Gbps (auto-scales) | Limited by instance type |
| High Availability | Per-AZ, create one per AZ | Manual failover needed |
| Cost | $0.045/hr + $0.045/GB processed (us-east-1) | EC2 instance cost only |
| Security Groups | Not supported | Supported |
| Use Case | Production | Dev/test, cost-sensitive |
# Create NAT Gateway in public subnet (requires Elastic IP)
EIP=$(aws ec2 allocate-address --domain vpc \
--query AllocationId --output text)
NAT_GW=$(aws ec2 create-nat-gateway \
--subnet-id $PUB_SUBNET \
--allocation-id $EIP \
--query NatGateway.NatGatewayId --output text)
# Wait for NAT Gateway to become available
aws ec2 wait nat-gateway-available --nat-gateway-ids $NAT_GW
# Route private subnet through NAT Gateway
# (PRIV_RT is the private subnet's route table ID, created the same way as PUB_RT above)
aws ec2 create-route \
--route-table-id $PRIV_RT \
--destination-cidr-block 0.0.0.0/0 \
--nat-gateway-id $NAT_GW
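Because NAT Gateway is billed per gateway-hour and per GB processed, the one-per-AZ pattern has a predictable fixed cost. The sketch below estimates a monthly bill from the rates quoted in the comparison table; the 730 hours-per-month figure and the function name are assumptions, and rates vary by region.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def nat_gateway_monthly_cost(gateways, gb_processed,
                             hourly_rate=0.045, per_gb_rate=0.045):
    """Estimate monthly NAT Gateway cost in USD: fixed hourly charge plus data processing."""
    fixed = gateways * hourly_rate * HOURS_PER_MONTH
    variable = gb_processed * per_gb_rate
    return fixed + variable


# Three gateways (one per AZ) processing 1 TB in total for the month.
print(round(nat_gateway_monthly_cost(3, 1000), 2))
```

At low traffic the hourly charge dominates, which is why a single shared NAT Gateway (or a NAT instance) is a common compromise in dev and test environments.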
Security Groups vs NACLs
Both filter traffic, but they operate at different levels and have different behaviors:
| Feature | Security Groups | Network ACLs |
|---|---|---|
| Scope | Instance (ENI) level | Subnet level |
| State | Stateful (return traffic auto-allowed) | Stateless (must allow inbound AND outbound) |
| Rules | Allow only | Allow and Deny |
| Evaluation | All rules evaluated | Rules evaluated in order (lowest number first) |
| Default | Deny all inbound, allow all outbound | Allow all (default NACL) |
Security Groups are your primary control mechanism. Use them to allow specific ports from specific sources (other Security Groups, CIDR ranges). Use NACLs as a subnet-level firewall for broad rules — blocking a malicious IP range, or as a defense-in-depth layer.
# Create a Security Group for app servers (allow traffic from ALB only)
APP_SG=$(aws ec2 create-security-group \
--group-name app-servers-sg \
--description "App server security group" \
--vpc-id $VPC_ID \
--query GroupId --output text)
# Allow port 8080 from ALB security group only
# (ALB_SG is the ID of the load balancer's existing security group)
aws ec2 authorize-security-group-ingress \
--group-id $APP_SG \
--protocol tcp \
--port 8080 \
--source-group $ALB_SG
# Outbound: a newly created security group already allows all egress,
# so no authorize-security-group-egress call is needed here. Re-adding the
# same allow-all rule fails with InvalidPermission.Duplicate.
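The ordered evaluation of NACL rules is the part that most often surprises people, so here is a toy model of it. Rules are checked from the lowest number up, the first match decides, and anything unmatched hits the implicit deny. The `NaclRule` class and `evaluate_nacl` function are hypothetical names for illustration and ignore protocol and direction.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int
    action: str  # "allow" or "deny"
    cidr: str
    port_from: int
    port_to: int


def evaluate_nacl(rules, source_ip, port):
    """Return the action of the lowest-numbered matching rule, or "deny" if none match."""
    src = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        in_cidr = src in ipaddress.ip_network(rule.cidr)
        in_ports = rule.port_from <= port <= rule.port_to
        if in_cidr and in_ports:
            return rule.action  # first match wins; later rules are ignored
    return "deny"  # the implicit final "*" rule


# Block one range (documentation address space) before the general HTTPS allow.
RULES = [
    NaclRule(100, "allow", "0.0.0.0/0", 443, 443),
    NaclRule(90, "deny", "203.0.113.0/24", 0, 65535),
]

print(evaluate_nacl(RULES, "203.0.113.7", 443))   # matched by rule 90 first
print(evaluate_nacl(RULES, "198.51.100.1", 443))  # falls through to rule 100
```

Because NACLs are stateless, a real inbound allow like rule 100 also needs a matching outbound rule for the ephemeral return ports; the model above covers only a single direction.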
VPC Peering and VPC Endpoints
VPC Peering connects two VPCs so resources can communicate using private IPs. It's non-transitive (A peers with B, B peers with C — A cannot reach C through B). For more than 3–4 VPCs, use Transit Gateway instead.
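The non-transitive rule can be stated in a few lines: two VPCs can talk only if a peering connection exists directly between them. The sketch below models peerings as unordered pairs; the VPC names and the `can_reach` helper are illustrative only.

```python
def can_reach(peerings, src, dst):
    """VPC peering is non-transitive: traffic flows only across a direct peering."""
    return src == dst or frozenset((src, dst)) in peerings


# A peers with B, and B peers with C.
PEERINGS = {frozenset(("A", "B")), frozenset(("B", "C"))}

print(can_reach(PEERINGS, "A", "B"))  # direct peering exists
print(can_reach(PEERINGS, "A", "C"))  # no direct peering, so B cannot relay
```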
VPC Endpoints allow private connectivity to AWS services without going through the internet or NAT Gateway. Two types:
- Gateway Endpoints: For S3 and DynamoDB. Free. Add a route to your route table pointing S3/DynamoDB prefixes to the endpoint.
- Interface Endpoints (AWS PrivateLink): For most other AWS services (SSM, ECR, Secrets Manager, etc.). An ENI is created in your subnet; you pay an hourly charge plus a data processing fee.
# Create a Gateway Endpoint for S3 (free, just a route table entry)
aws ec2 create-vpc-endpoint \
--vpc-id $VPC_ID \
--service-name com.amazonaws.us-east-1.s3 \
--route-table-ids $PRIV_RT_A $PRIV_RT_B $PRIV_RT_C \
--vpc-endpoint-type Gateway
# Create Interface Endpoint for SSM (needed for Session Manager in private subnets without NAT;
# Session Manager also needs an ssmmessages endpoint, created the same way)
aws ec2 create-vpc-endpoint \
--vpc-id $VPC_ID \
--service-name com.amazonaws.us-east-1.ssm \
--subnet-ids $PRIV_SUBNET_A $PRIV_SUBNET_B \
--security-group-ids $ENDPOINT_SG \
--vpc-endpoint-type Interface \
--private-dns-enabled
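Whether an Interface Endpoint pays for itself depends on how much traffic it takes off the NAT Gateway. The sketch below compares the two for a given monthly volume. The NAT rate is the $0.045/GB figure used in this guide; the endpoint rates ($0.01 per AZ-hour and $0.01/GB) and the 730-hour month are assumptions that are not from this guide, so check current pricing for your region before relying on the result.

```python
def interface_endpoint_monthly_savings(gb_per_month, az_count,
                                       nat_per_gb=0.045,
                                       endpoint_hourly=0.01,   # assumed rate per AZ
                                       endpoint_per_gb=0.01,   # assumed rate
                                       hours=730):             # assumed month length
    """Return USD saved per month by sending traffic via an endpoint instead of NAT.

    Only NAT data processing is counted; the NAT Gateway's own hourly charge
    is paid either way. A negative result means the endpoint costs more.
    """
    nat_cost = gb_per_month * nat_per_gb
    endpoint_cost = az_count * endpoint_hourly * hours + gb_per_month * endpoint_per_gb
    return nat_cost - endpoint_cost


print(round(interface_endpoint_monthly_savings(1000, 2), 2))  # high volume
print(round(interface_endpoint_monthly_savings(100, 2), 2))   # low volume
```

Under these assumed rates the endpoint only wins once the service carries several hundred GB per month, whereas Gateway Endpoints for S3 and DynamoDB are free and worth adding regardless of volume.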
For container workloads that pull images from ECR, the relevant endpoints are ecr.api, ecr.dkr, and s3 (Gateway). Without these, every container image pull goes through NAT Gateway at $0.045/GB; in a large cluster this adds up to hundreds of dollars per month.
Transit Gateway Basics
Transit Gateway (TGW) is a regional hub that connects VPCs and on-premises networks. Instead of maintaining N*(N-1)/2 peering connections between N VPCs, each VPC attaches to TGW once. TGW supports transitive routing (unlike VPC Peering), so a spoke VPC can reach any other spoke through the hub.
# CloudFormation: Transit Gateway with a VPC attachment (add one attachment resource per VPC)
TransitGateway:
  Type: AWS::EC2::TransitGateway
  Properties:
    AmazonSideAsn: 64512
    DefaultRouteTableAssociation: enable
    DefaultRouteTablePropagation: enable
    Description: "Central hub for all VPCs"
    Tags:
      - Key: Name
        Value: main-tgw
ProductionAttachment:
  Type: AWS::EC2::TransitGatewayAttachment
  Properties:
    TransitGatewayId: !Ref TransitGateway
    VpcId: !Ref ProductionVPC
    SubnetIds:
      - !Ref PrivateSubnetA
      - !Ref PrivateSubnetB
      - !Ref PrivateSubnetC
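The N*(N-1)/2 figure grows quickly, which is the main operational argument for a hub. A short sketch comparing full-mesh peering connections with Transit Gateway attachments for the same number of VPCs (function names are illustrative):

```python
def peering_connections(vpc_count):
    """Connections needed for a full mesh of VPC peerings: N*(N-1)/2."""
    return vpc_count * (vpc_count - 1) // 2


def tgw_attachments(vpc_count):
    """With a Transit Gateway, each VPC attaches to the hub exactly once."""
    return vpc_count


for n in (3, 5, 10, 20):
    print(n, peering_connections(n), tgw_attachments(n))
```

Each peering connection also needs route entries on both sides, so the route-table maintenance burden grows along with the connection count.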
Frequently Asked Questions
How many AZs should I deploy my VPC across?
For production workloads, always 3 AZs. Two AZs means a single AZ failure halves your capacity; three AZs means you lose 33% on an AZ failure, which Auto Scaling can absorb. Always create public and private subnets in each AZ so resources in any AZ can reach the internet through a local NAT Gateway.
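The capacity argument is simple arithmetic, sketched below under the assumption that load is spread evenly across AZs. The function names are hypothetical.

```python
def capacity_after_az_failure(az_count):
    """Fraction of capacity left when one of az_count evenly loaded AZs fails."""
    return (az_count - 1) / az_count


def required_overprovision(az_count):
    """Capacity multiplier needed so the surviving AZs can carry the full load."""
    return az_count / (az_count - 1)


for azs in (2, 3):
    print(azs, round(capacity_after_az_failure(azs), 2), required_overprovision(azs))
```

Read the other way around: to ride out an AZ loss without scaling, a 2-AZ deployment has to run at twice the needed capacity, while a 3-AZ deployment needs only 1.5 times.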
What CIDR block size should I use for my VPC?
Use /16 for production VPCs. It gives you 65,536 IPs across all subnets. For EKS workloads, you might even consider adding a secondary CIDR block (up to 5 are allowed per VPC) to handle pod IP exhaustion. Don't use anything smaller than /20 (/20 = 4,096 IPs), especially if you might run containerized workloads.
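To put numbers on these sizes, the sketch below uses Python's `ipaddress` module to compute address counts, the usable count per subnet (AWS reserves 5 addresses in every subnet), and how many subnets of a given size fit in a VPC. The helper names are illustrative.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four and the last address of each subnet


def total_ips(cidr):
    """Total addresses in a CIDR block."""
    return ipaddress.ip_network(cidr).num_addresses


def usable_ips(subnet_cidr):
    """Addresses actually assignable in an AWS subnet."""
    return total_ips(subnet_cidr) - AWS_RESERVED_PER_SUBNET


def subnets_available(vpc_cidr, subnet_prefix):
    """How many subnets of the given prefix length fit in the VPC CIDR."""
    return 2 ** (subnet_prefix - ipaddress.ip_network(vpc_cidr).prefixlen)


print(total_ips("10.0.0.0/16"), total_ips("10.0.0.0/20"))
print(usable_ips("10.0.0.0/24"), usable_ips("10.0.10.0/23"))
print(subnets_available("10.0.0.0/16", 24))
```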
What's the difference between a Security Group and a firewall?
Security Groups are virtual firewalls at the ENI (network interface) level. They're stateful — if you allow inbound port 443, the response traffic is automatically allowed outbound. Network Firewall (AWS managed service) is a more capable firewall with deep packet inspection, intrusion detection, and domain filtering. Use Security Groups for standard port-based access control; use Network Firewall when you need IDS/IPS or need to filter by domain name (e.g., restrict outbound to specific API endpoints).
Can I change my VPC CIDR block after creation?
You can't change the primary CIDR block, but you can add secondary CIDR blocks (up to 5 total by default, with some restrictions on overlapping ranges). This is commonly done when the VPC runs out of address space for new subnets: add a secondary CIDR and create new subnets from it. Planning ahead with a /16 primary CIDR avoids this complexity.
When should I use VPC Peering vs Transit Gateway?
VPC Peering is cheaper (no hourly fee, no per-GB processing fee) and fine for 2–4 VPCs with simple connectivity needs. Transit Gateway makes sense when you have 5+ VPCs, need transitive routing (spoke-to-spoke traffic), have on-premises connectivity via Direct Connect or VPN, or need centralized network security inspection. TGW costs ~$0.05/hr per attachment plus $0.02/GB processed.