AWS Terraform Guide: Infrastructure Automation on AWS
Published June 6, 2026 · 18 min read
Terraform has become the de facto Infrastructure as Code (IaC) tool for AWS deployments in 2026. Its declarative HCL syntax, massive community module ecosystem, and multi-cloud portability make it the first choice for teams that want reproducible, auditable infrastructure. This guide covers everything from initial provider setup and remote state management all the way to production-grade VPC, EKS, and RDS patterns, CI/CD integration, and security scanning.
1. Terraform vs CloudFormation vs CDK: Choosing the Right Tool
Before writing a single line of HCL, it is worth understanding where each IaC tool fits. AWS offers two native options, CloudFormation and CDK, while Terraform (by HashiCorp; OpenTofu is its community-maintained open-source fork) is the leading third-party choice.
CloudFormation
CloudFormation is 100% native to AWS. It requires no external tooling, usually supports new AWS services at or soon after launch, and integrates tightly with IAM, AWS Config, Service Catalog, and StackSets for multi-account deployments. The trade-off is verbose JSON/YAML, AWS-only scope, and limited community reuse patterns.
AWS CDK
CDK compiles high-level code (TypeScript, Python, Java, Go) down to CloudFormation. It is ideal when your team already writes application code and wants to use familiar languages with type-safety and IDE autocompletion. CDK still deploys through CloudFormation, so the same native AWS integration applies.
Terraform
Terraform wins on three dimensions:
- Multi-cloud: The same workflow manages AWS, GCP, Azure, Kubernetes, Datadog, GitHub, and hundreds of other providers. Teams running hybrid environments need a single tool.
- Community modules: The Terraform Registry hosts thousands of battle-tested modules. terraform-aws-modules/vpc alone has been downloaded hundreds of millions of times.
- Mature drift detection: terraform plan computes the diff between desired state and real infrastructure. Any manual change made in the console shows up as drift in the next plan.
2. AWS Provider Setup: Authentication and Version Pinning
The AWS provider is the bridge between Terraform and the AWS APIs. Getting authentication and version pinning right from day one prevents the two most common production incidents: accidental credential exposure and provider upgrade breakage.
Authentication Methods
The AWS provider supports these credential sources; explicit provider arguments take precedence over environment variables, which in turn take precedence over shared configuration files:
- Static credentials in the provider block: never use in production; the secrets end up in version control.
- Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN. Suitable for CI environments that inject short-lived credentials.
- Shared credentials file: ~/.aws/credentials with named profiles.
- EC2 instance profile / ECS task role / EKS IRSA: the correct approach for workloads running inside AWS. No credentials to manage at all.
- AWS SSO / IAM Identity Center: preferred for developer workstations in 2026. Run aws sso login --profile dev, and Terraform picks up the temporary token automatically when the provider uses that profile.
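As a rough illustration of the options above, the sketch below configures one provider from an SSO-backed named profile and a second, aliased provider that assumes a deployment role. The alias name, role ARN, and session name are placeholders chosen for this example, not values from a real account.

```hcl
# Sketch: no static keys anywhere in the provider configuration.
provider "aws" {
  region  = "us-east-1"
  profile = "dev" # SSO-backed profile, valid after `aws sso login --profile dev`
}

# Hypothetical cross-account alias that assumes a deployment role.
provider "aws" {
  alias  = "prod_deploy"
  region = "us-east-1"

  assume_role {
    role_arn     = "arn:aws:iam::123456789012:role/terraform-deploy" # placeholder ARN
    session_name = "terraform"
  }
}
```

Resources opt into the aliased provider with provider = aws.prod_deploy; everything else uses the default one.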
Example 1: provider.tf with S3 Backend and DynamoDB State Locking
# provider.tf
# ─────────────────────────────────────────────────────────────────
# Terraform and provider version constraints
terraform {
  required_version = ">= 1.7.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.50" # allow patch/minor upgrades, block major
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.30"
    }
  }
  # ── Remote state: S3 bucket + DynamoDB lock table ──────────────
  backend "s3" {
    bucket         = "techoral-terraform-state-prod"
    key            = "infra/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true # SSE-S3 by default
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/mrk-abc123" # optional CMK
    dynamodb_table = "terraform-state-locks" # PAY_PER_REQUEST table, LockID string key
    profile        = "techoral-prod" # AWS CLI named profile
  }
}
# ── Primary provider (us-east-1) ───────────────────────────────
provider "aws" {
  region  = var.aws_region
  profile = var.aws_profile
  default_tags {
    tags = {
      ManagedBy   = "terraform"
      Environment = var.environment
      Project     = "techoral"
      Owner       = "platform-team"
    }
  }
}
# ── Secondary provider alias for us-west-2 ─────────────────────
provider "aws" {
  alias   = "us_west_2"
  region  = "us-west-2"
  profile = var.aws_profile
  default_tags {
    tags = {
      ManagedBy   = "terraform"
      Environment = var.environment
    }
  }
}
The S3 bucket and DynamoDB table must exist before you run terraform init. Create them manually once (or with a separate bootstrap Terraform workspace). Enable S3 versioning so you can roll back to any previous state version. The DynamoDB table needs a single LockID String attribute as the partition key: no sort key, no additional attributes.
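For teams already on a newer Terraform release, the S3 backend can also take the lock itself, without a DynamoDB table. The sketch below assumes Terraform 1.10 or later, where the use_lockfile argument is available; it reuses the bucket and key from Example 1 and is an alternative to, not a part of, that example.

```hcl
# Sketch: S3-native state locking (assumes Terraform >= 1.10).
terraform {
  required_version = ">= 1.10.0"

  backend "s3" {
    bucket       = "techoral-terraform-state-prod"
    key          = "infra/vpc/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true # lock object is written next to the state object in S3
  }
}
```

The rest of this guide keeps the DynamoDB table, which also works on the 1.7 baseline pinned in Example 1.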
Bootstrapping the Backend Resources
A common pattern is to keep a bootstrap/ folder at the repo root with a minimal Terraform config that uses the local backend to create the S3 bucket and DynamoDB table. This chicken-and-egg setup only runs once per AWS account.
# bootstrap/main.tf (uses local backend — runs once)
resource "aws_s3_bucket" "tf_state" {
  bucket        = "techoral-terraform-state-prod"
  force_destroy = false
}
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  attribute {
    name = "LockID"
    type = "S"
  }
}
3. Terraform Modules: Reusability and the Community Ecosystem
Modules are the fundamental reuse mechanism in Terraform. A module is simply a directory containing .tf files with a defined interface: input variables, outputs, and internal logic in locals and resource blocks.
Community Modules from the Terraform Registry
Rather than writing VPC or EKS resources from scratch, call a community module and pass your configuration as variables:
# Using community modules from registry.terraform.io
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.8.1"

  name            = "techoral-prod-vpc"
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway   = true
  single_nat_gateway   = false # one NAT per AZ for HA
  enable_dns_hostnames = true
  enable_dns_support   = true

  # Required tags for EKS load balancer controller
  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
  }
  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }
}
Example 2: Writing a Reusable VPC Module (variables.tf + outputs.tf)
For production teams, writing your own wrapper module lets you enforce naming conventions, mandatory tags, and security defaults across all environments. The module's interface is defined in variables.tf and outputs.tf:
# modules/vpc/variables.tf
variable "name" {
  description = "Name prefix for all VPC resources"
  type        = string
}
variable "cidr" {
  description = "Primary CIDR block for the VPC (e.g. 10.0.0.0/16)"
  type        = string
}
variable "azs" {
  description = "List of Availability Zones (at least 2; 3 recommended for production)"
  type        = list(string)
  validation {
    condition     = length(var.azs) >= 2
    error_message = "At least 2 Availability Zones are required."
  }
}
variable "private_subnets" {
  description = "CIDR blocks for private subnets (one per AZ)"
  type        = list(string)
}
variable "public_subnets" {
  description = "CIDR blocks for public subnets (one per AZ)"
  type        = list(string)
}
variable "enable_flow_logs" {
  description = "Enable VPC Flow Logs to CloudWatch"
  type        = bool
  default     = true
}
variable "environment" {
  description = "Deployment environment: dev, staging, prod"
  type        = string
}
variable "tags" {
  description = "Additional tags to apply to all resources"
  type        = map(string)
  default     = {}
}
# ─── modules/vpc/outputs.tf ────────────────────────────────────
output "vpc_id" {
  description = "The ID of the VPC"
  value       = aws_vpc.this.id
}
output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = aws_subnet.private[*].id
}
output "public_subnet_ids" {
  description = "List of public subnet IDs"
  value       = aws_subnet.public[*].id
}
output "nat_gateway_ids" {
  description = "List of NAT Gateway IDs (one per AZ)"
  value       = aws_nat_gateway.this[*].id
}
output "vpc_cidr_block" {
  description = "The primary CIDR block of the VPC"
  value       = aws_vpc.this.cidr_block
}
Example 3: VPC Module main.tf — Complete Implementation
# modules/vpc/main.tf
locals {
  az_count = length(var.azs)
  common_tags = merge(var.tags, {
    Module      = "vpc"
    Environment = var.environment
    ManagedBy   = "terraform"
  })
}
# ── VPC ────────────────────────────────────────────────────────
resource "aws_vpc" "this" {
  cidr_block           = var.cidr
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags                 = merge(local.common_tags, { Name = var.name })
}
# ── Internet Gateway ───────────────────────────────────────────
resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id
  tags   = merge(local.common_tags, { Name = "${var.name}-igw" })
}
# ── Public subnets ─────────────────────────────────────────────
resource "aws_subnet" "public" {
  count                   = local.az_count
  vpc_id                  = aws_vpc.this.id
  cidr_block              = var.public_subnets[count.index]
  availability_zone       = var.azs[count.index]
  map_public_ip_on_launch = true
  tags = merge(local.common_tags, {
    Name                     = "${var.name}-public-${var.azs[count.index]}"
    "kubernetes.io/role/elb" = "1"
  })
}
# ── Private subnets ────────────────────────────────────────────
resource "aws_subnet" "private" {
  count             = local.az_count
  vpc_id            = aws_vpc.this.id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = var.azs[count.index]
  tags = merge(local.common_tags, {
    Name                              = "${var.name}-private-${var.azs[count.index]}"
    "kubernetes.io/role/internal-elb" = "1"
  })
}
# ── Elastic IPs for NAT Gateways ───────────────────────────────
resource "aws_eip" "nat" {
  count  = local.az_count
  domain = "vpc"
  tags   = merge(local.common_tags, { Name = "${var.name}-nat-eip-${count.index}" })
}
# ── NAT Gateways (one per AZ for HA) ──────────────────────────
resource "aws_nat_gateway" "this" {
  count         = local.az_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  depends_on    = [aws_internet_gateway.this]
  tags = merge(local.common_tags, {
    Name = "${var.name}-nat-${var.azs[count.index]}"
  })
}
# ── Public route table ─────────────────────────────────────────
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.this.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.this.id
  }
  tags = merge(local.common_tags, { Name = "${var.name}-public-rt" })
}
resource "aws_route_table_association" "public" {
  count          = local.az_count
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}
# ── Private route tables (one per AZ → local NAT) ─────────────
resource "aws_route_table" "private" {
  count  = local.az_count
  vpc_id = aws_vpc.this.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.this[count.index].id
  }
  tags = merge(local.common_tags, {
    Name = "${var.name}-private-rt-${var.azs[count.index]}"
  })
}
resource "aws_route_table_association" "private" {
  count          = local.az_count
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}
# ── VPC Flow Logs ──────────────────────────────────────────────
resource "aws_flow_log" "this" {
  count           = var.enable_flow_logs ? 1 : 0
  vpc_id          = aws_vpc.this.id
  traffic_type    = "ALL"
  iam_role_arn    = aws_iam_role.flow_logs[0].arn
  log_destination = aws_cloudwatch_log_group.flow_logs[0].arn
  tags            = merge(local.common_tags, { Name = "${var.name}-flow-logs" })
}
resource "aws_cloudwatch_log_group" "flow_logs" {
  count             = var.enable_flow_logs ? 1 : 0
  name              = "/aws/vpc/flowlogs/${var.name}"
  retention_in_days = 30
  tags              = local.common_tags
}
resource "aws_iam_role" "flow_logs" {
  count = var.enable_flow_logs ? 1 : 0
  name  = "${var.name}-flow-logs-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "vpc-flow-logs.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}
resource "aws_iam_role_policy" "flow_logs" {
  count = var.enable_flow_logs ? 1 : 0
  name  = "${var.name}-flow-logs-policy"
  role  = aws_iam_role.flow_logs[0].id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents", "logs:DescribeLogGroups", "logs:DescribeLogStreams"]
      Resource = "*"
    }]
  })
}
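To show how the module's interface is consumed, here is a minimal sketch of a root module calling it. The file path, the relative source path (which assumes an envs/prod directory sitting beside modules/), the CostCenter tag, and the output name are illustrative assumptions; the input and output names come from the variables.tf and outputs.tf shown earlier.

```hcl
# envs/prod/main.tf (hypothetical caller of the module above)
module "vpc" {
  source = "../../modules/vpc"

  name            = "techoral-prod-vpc"
  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
  environment     = "prod"

  tags = {
    CostCenter = "platform" # illustrative extra tag
  }
}

# Re-export a module output so other root modules can consume it.
output "prod_private_subnet_ids" {
  value = module.vpc.private_subnet_ids
}
```

Because the module indexes the subnet lists by AZ position, each list should contain exactly one CIDR per entry in azs.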
4. Workspaces: Environment Separation
Terraform workspaces let you maintain multiple state files in the same backend configuration, giving you dev/staging/prod separation without duplicating your entire codebase.
# Create and switch workspaces
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod
terraform workspace select prod
terraform workspace list
# Output (the built-in default workspace is always listed):
#   default
#   dev
# * prod
#   staging
Reference the current workspace in your configuration to make environment-specific decisions:
# Use workspace name in resource sizing
locals {
  is_prod       = terraform.workspace == "prod"
  instance_type = local.is_prod ? "t3.large" : "t3.small"
  min_capacity  = local.is_prod ? 3 : 1
  max_capacity  = local.is_prod ? 20 : 5
}
# Workspace-specific variable files. terraform.workspace only exists inside
# configuration, so let the shell expand the workspace name on the CLI:
# terraform plan -var-file="envs/$(terraform workspace show).tfvars"
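When more than two environments need distinct values, a map keyed by workspace name tends to scale better than chained conditionals. The sketch below is one possible arrangement; the env_config and config names and the staging values are assumptions made for illustration.

```hcl
locals {
  # Hypothetical per-workspace settings; keys must match workspace names.
  env_config = {
    dev     = { instance_type = "t3.small", min_capacity = 1, max_capacity = 5 }
    staging = { instance_type = "t3.small", min_capacity = 1, max_capacity = 5 }
    prod    = { instance_type = "t3.large", min_capacity = 3, max_capacity = 20 }
  }

  # Indexing with an unknown workspace (for example "default") raises an
  # error at plan time instead of silently falling back to non-prod sizing.
  config = local.env_config[terraform.workspace]
}
```

Resources then read local.config.instance_type and friends, so adding an environment means adding one map entry.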
5. EKS Cluster with Terraform: Managed Node Groups and IRSA
Provisioning Amazon EKS with Terraform involves the EKS control plane, managed node groups, and IAM Roles for Service Accounts (IRSA) so pods can access AWS APIs without static credentials.
Example 4: EKS Cluster Resource with Managed Node Group
# eks/main.tf
data "aws_eks_cluster_auth" "this" {
  name = aws_eks_cluster.this.name
}
# ── EKS Control Plane ──────────────────────────────────────────
resource "aws_eks_cluster" "this" {
  name     = var.cluster_name
  version  = var.kubernetes_version # e.g. "1.30"
  role_arn = aws_iam_role.eks_cluster.arn
  vpc_config {
    subnet_ids              = concat(var.private_subnet_ids, var.public_subnet_ids)
    endpoint_private_access = true
    endpoint_public_access  = true # lock down in prod via public_access_cidrs
    public_access_cidrs     = var.allowed_cidrs
    security_group_ids      = [aws_security_group.eks_cluster.id]
  }
  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
  encryption_config {
    resources = ["secrets"]
    provider {
      key_arn = var.kms_key_arn
    }
  }
  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
    aws_iam_role_policy_attachment.eks_vpc_controller,
  ]
  tags = var.tags
}
# ── Managed Node Group ─────────────────────────────────────────
resource "aws_eks_node_group" "app" {
  cluster_name    = aws_eks_cluster.this.name
  node_group_name = "${var.cluster_name}-app-nodes"
  node_role_arn   = aws_iam_role.eks_node.arn
  subnet_ids      = var.private_subnet_ids
  instance_types  = [var.node_instance_type] # e.g. "m6i.xlarge"
  scaling_config {
    desired_size = var.node_desired
    min_size     = var.node_min
    max_size     = var.node_max
  }
  update_config {
    max_unavailable_percentage = 25
  }
  labels = {
    role = "app"
  }
  taint {
    key    = "dedicated"
    value  = "app"
    effect = "NO_SCHEDULE" # remove for general-purpose nodes
  }
  launch_template {
    id      = aws_launch_template.eks_node.id
    version = aws_launch_template.eks_node.latest_version
  }
  depends_on = [
    aws_iam_role_policy_attachment.eks_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.eks_ecr_policy,
  ]
}
# ── IRSA: IAM Role for Service Account ─────────────────────────
# Allows the aws-load-balancer-controller SA to call AWS APIs
data "aws_iam_openid_connect_provider" "eks" {
  url = aws_eks_cluster.this.identity[0].oidc[0].issuer
}
resource "aws_iam_role" "alb_controller" {
  name = "${var.cluster_name}-alb-controller"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = data.aws_iam_openid_connect_provider.eks.arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "${replace(aws_eks_cluster.this.identity[0].oidc[0].issuer, "https://", "")}:sub" = "system:serviceaccount:kube-system:aws-load-balancer-controller"
        }
      }
    }]
  })
}
resource "aws_iam_role_policy_attachment" "alb_controller" {
  role       = aws_iam_role.alb_controller.name
  policy_arn = aws_iam_policy.alb_controller.arn
}
# ── aws-auth ConfigMap (grant node group access) ───────────────
resource "kubernetes_config_map_v1_data" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }
  force = true
  data = {
    mapRoles = yamlencode([
      {
        rolearn  = aws_iam_role.eks_node.arn
        username = "system:node:{{EC2PrivateDNSName}}"
        groups   = ["system:bootstrappers", "system:nodes"]
      },
    ])
  }
}
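The aws_iam_openid_connect_provider data source in Example 4 only looks up an IAM OIDC provider that already exists. For a freshly created cluster, the provider has to be registered first. A minimal sketch of doing that in the same configuration follows; it assumes the hashicorp/tls provider is added to required_providers, and the eks_oidc label is a name chosen for this example.

```hcl
# Assumption: hashicorp/tls is declared in required_providers.
data "tls_certificate" "eks_oidc" {
  url = aws_eks_cluster.this.identity[0].oidc[0].issuer
}

# Register the cluster's OIDC issuer with IAM so that service accounts
# can exchange their tokens for IAM role credentials.
resource "aws_iam_openid_connect_provider" "eks" {
  url             = aws_eks_cluster.this.identity[0].oidc[0].issuer
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks_oidc.certificates[0].sha1_fingerprint]
}
```

With this resource in place, the role's trust policy would reference aws_iam_openid_connect_provider.eks.arn rather than the data source.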
6. RDS Module with Parameter Group and Subnet Group
Provisioning Amazon RDS with Terraform involves the DB instance, a subnet group spanning private subnets, a parameter group for engine tuning, and a security group that restricts access to the application tier only.
Example 5: RDS Module — Complete Resource Set
# modules/rds/main.tf
resource "aws_db_subnet_group" "this" {
  name        = "${var.identifier}-subnet-group"
  subnet_ids  = var.private_subnet_ids
  description = "Subnet group for ${var.identifier} RDS instance"
  tags        = var.tags
}
resource "aws_db_parameter_group" "this" {
  name   = "${var.identifier}-params"
  family = var.parameter_group_family # e.g. "postgres15"
  parameter {
    name  = "log_connections"
    value = "1"
  }
  parameter {
    name  = "log_disconnections"
    value = "1"
  }
  parameter {
    name  = "log_min_duration_statement"
    value = "1000" # log queries slower than 1s
  }
  parameter {
    name         = "shared_preload_libraries"
    value        = "pg_stat_statements"
    apply_method = "pending-reboot"
  }
  tags = var.tags
}
resource "aws_security_group" "rds" {
  name        = "${var.identifier}-rds-sg"
  description = "Allow inbound from app security group only"
  vpc_id      = var.vpc_id
  ingress {
    from_port       = var.db_port
    to_port         = var.db_port
    protocol        = "tcp"
    security_groups = var.allowed_security_group_ids
    description     = "App tier access to RDS"
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  tags = merge(var.tags, { Name = "${var.identifier}-rds-sg" })
}
resource "aws_db_instance" "this" {
  identifier        = var.identifier
  engine            = var.engine            # "postgres"
  engine_version    = var.engine_version    # "15.6"
  instance_class    = var.instance_class    # "db.r6g.large"
  allocated_storage = var.allocated_storage # 100 (GB)
  storage_type      = "gp3"
  storage_encrypted = true
  kms_key_id        = var.kms_key_arn

  db_name  = var.db_name
  username = var.db_username
  password = var.db_password # use aws_secretsmanager_secret_version in production

  db_subnet_group_name   = aws_db_subnet_group.this.name
  parameter_group_name   = aws_db_parameter_group.this.name
  vpc_security_group_ids = [aws_security_group.rds.id]

  multi_az            = var.multi_az # true in prod
  publicly_accessible = false
  deletion_protection = var.deletion_protection

  backup_retention_period = var.backup_retention_days # 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:00-sun:05:00"

  performance_insights_enabled          = true
  performance_insights_retention_period = 7
  enabled_cloudwatch_logs_exports       = ["postgresql", "upgrade"]

  tags = var.tags

  lifecycle {
    prevent_destroy = true # safety net for production databases
  }
}
Avoid passing the database password in as a plain variable. Generate it with random_password, store it in AWS Secrets Manager, and reference it with a data source. The RDS instance can also be configured with manage_master_user_password = true to have RDS itself create and rotate the secret automatically.
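A minimal sketch of the generate-and-store approach is shown below. It assumes the hashicorp/random provider is declared in required_providers; the db resource labels and the secret name are hypothetical, and var.kms_key_arn is the same variable the RDS module already uses.

```hcl
# Assumption: hashicorp/random is declared in required_providers.
resource "random_password" "db" {
  length           = 32
  special          = true
  override_special = "!#$%&*()-_=+[]{}<>:?" # leaves out /, @, double quote, and space, which RDS rejects
}

resource "aws_secretsmanager_secret" "db" {
  name       = "example/rds/master-password" # hypothetical secret name
  kms_key_id = var.kms_key_arn
}

# Store the generated value as the current version of the secret.
resource "aws_secretsmanager_secret_version" "db" {
  secret_id     = aws_secretsmanager_secret.db.id
  secret_string = random_password.db.result
}
```

The caller would then pass random_password.db.result as db_password, while applications read the value from Secrets Manager by ARN.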
7. CI/CD: GitHub Actions Workflow for Terraform
The gold standard Terraform CI/CD pattern is: plan on every pull request (so reviewers see the infrastructure diff), and apply on merge to main (so production changes are gated behind code review).
Example 6: GitHub Actions Workflow — Plan on PR, Apply on Merge
# .github/workflows/terraform.yml
name: Terraform CI/CD
on:
  pull_request:
    branches: [main]
    paths: ["infra/**"]
  push:
    branches: [main]
    paths: ["infra/**"]
permissions:
  id-token: write # required for OIDC authentication
  contents: read
  pull-requests: write # post plan output as PR comment
env:
  TF_VERSION: "1.7.5"
  AWS_REGION: "us-east-1"
  WORKING_DIR: "infra/vpc"
jobs:
  terraform-plan:
    name: Terraform Plan
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    defaults:
      run:
        working-directory: ${{ env.WORKING_DIR }}
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-terraform
          aws-region: ${{ env.AWS_REGION }}
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      - name: Terraform Init
        run: terraform init -input=false
      - name: Terraform Validate
        run: terraform validate
      - name: Run tfsec security scan
        uses: aquasecurity/tfsec-action@v1.0.0
        with:
          working_directory: ${{ env.WORKING_DIR }}
          soft_fail: true
      - name: Run Checkov policy scan
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: ${{ env.WORKING_DIR }}
          soft_fail: true
      - name: Terraform Plan
        id: plan
        run: |
          set -o pipefail # without this, tee's exit code hides a failed plan
          terraform plan \
            -var-file="envs/prod.tfvars" \
            -out=tfplan \
            -input=false \
            -no-color 2>&1 | tee plan_output.txt
      - name: Post Plan to PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('${{ env.WORKING_DIR }}/plan_output.txt', 'utf8');
            const maxLength = 65000;
            const truncated = plan.length > maxLength
              ? plan.substring(0, maxLength) + '\n\n... (truncated)'
              : plan;
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Terraform Plan\n\`\`\`hcl\n${truncated}\n\`\`\``
            });
  terraform-apply:
    name: Terraform Apply
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    environment: production # requires manual approval in GitHub Environments
    defaults:
      run:
        working-directory: ${{ env.WORKING_DIR }}
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-terraform
          aws-region: ${{ env.AWS_REGION }}
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
      - name: Terraform Init
        run: terraform init -input=false
      - name: Terraform Apply
        run: |
          terraform apply \
            -var-file="envs/prod.tfvars" \
            -auto-approve \
            -input=false
OIDC authentication requires an IAM OIDC identity provider for token.actions.githubusercontent.com in your AWS account and a role with a trust policy restricted to your specific GitHub org/repo. See the IAM Roles & Policies guide for the full trust policy pattern.
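As a rough sketch of what that setup can look like in Terraform, the block below registers the GitHub OIDC provider and creates a role that only workflows from one repository can assume. The example-org/example-repo path and the resource labels are placeholders, and the hashicorp/tls provider is assumed to be available; the role name matches the ARN used in the workflow.

```hcl
# Assumption: hashicorp/tls is declared in required_providers.
data "tls_certificate" "github" {
  url = "https://token.actions.githubusercontent.com/.well-known/openid-configuration"
}

resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.github.certificates[0].sha1_fingerprint]
}

data "aws_iam_policy_document" "github_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    # Token must be issued for AWS STS.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Token must come from this repository (placeholder org/repo).
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:example-org/example-repo:*"]
    }
  }
}

resource "aws_iam_role" "github_actions_terraform" {
  name               = "github-actions-terraform"
  assume_role_policy = data.aws_iam_policy_document.github_trust.json
}
```

Tightening the sub pattern further, for instance to a single branch or GitHub environment, narrows which workflow runs can obtain credentials.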
8. Terraform Best Practices for AWS in 2026
Project Structure
Organize your repository so each independently deployable unit of infrastructure is its own Terraform root module (its own state file). A common layout:
infra/
├── bootstrap/ # S3 bucket + DynamoDB lock table (local backend, runs once)
├── modules/
│ ├── vpc/
│ ├── eks/
│ ├── rds/
│ └── iam/
├── envs/
│ ├── dev/
│ │ ├── main.tf # calls modules, dev-specific values
│ │ ├── provider.tf # backend key = "dev/terraform.tfstate"
│ │ └── terraform.tfvars
│ ├── staging/
│ └── prod/
└── .github/workflows/terraform.yml
Security Scanning
Integrate static analysis into your CI pipeline:
- tfsec — fast HCL-aware scanner. Catches issues like S3 buckets without public access blocks, security groups open to 0.0.0.0/0, unencrypted EBS volumes, and missing CloudTrail encryption.
- Checkov — broader policy library including CIS benchmarks for AWS. Run both since they catch different things.
- Infracost — posts a cost estimate for every PR so you know the monthly impact of the planned change before it lands.
DRY Configuration with Terragrunt
Terragrunt (by Gruntwork) is a thin Terraform wrapper that eliminates the repeated backend and provider boilerplate across environments. A root terragrunt.hcl defines the backend template, and each environment folder adds only the variables specific to that environment. The run-all plan command plans all modules in dependency order.
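The shape of that setup might look like the following sketch. It reuses the bucket and lock table names from earlier in this guide; the envs/dev path is an assumption, and the exact file naming conventions can vary between Terragrunt versions.

```hcl
# terragrunt.hcl (repository root): backend template shared by every environment
remote_state {
  backend = "s3"

  # Terragrunt writes this file into each module before running Terraform.
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket         = "techoral-terraform-state-prod"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}
```

```hcl
# envs/dev/terragrunt.hcl (hypothetical child): inherits the backend, adds only inputs
include "root" {
  path = find_in_parent_folders()
}

inputs = {
  environment = "dev"
}
```

Each child folder gets its own state key derived from its relative path, so no backend block is repeated by hand.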
Importing Existing Resources
When adopting Terraform in a brownfield AWS account, import existing resources rather than recreating them:
# Terraform 1.5+ declarative import block
import {
  to = aws_s3_bucket.assets
  id = "my-existing-bucket-name"
}
import {
  to = aws_vpc.main
  id = "vpc-0abc123def456789"
}
# Then run: terraform plan -generate-config-out=generated.tf
# For import targets that have no resource block yet, Terraform writes
# generated configuration to generated.tf; review it before applying.
State Management Rules
- Never edit state files manually. Use terraform state mv, terraform state rm, and terraform import.
- Enable S3 bucket versioning on your state bucket: every apply creates a new object version, giving you a complete audit trail and instant rollback.
- Use DynamoDB locking in all environments, not just production. Concurrent runs in CI against the same state file can corrupt it in ways that are difficult to recover from.
- Keep sensitive outputs out of state where possible. Prefer referencing AWS Secrets Manager ARNs instead of actual secret values.
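For routine refactoring, declarative blocks can replace some of the state commands and have the advantage of going through code review. The sketch below assumes Terraform 1.7 or later for the removed block (moved has been available since 1.1); the resource addresses are hypothetical.

```hcl
# Rename a resource address without destroying and recreating it.
moved {
  from = aws_s3_bucket.assets
  to   = aws_s3_bucket.static_assets
}

# Stop managing a resource while leaving the real object in AWS.
removed {
  from = aws_vpc.legacy

  lifecycle {
    destroy = false
  }
}
```

Both changes show up in terraform plan before anything is written to state, which is harder to guarantee with ad hoc CLI commands.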
Module Versioning
Always pin module sources to specific versions in production:
# Good: pinned to a specific semver tag
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.8.1"
}
# Risky: unpinned, will pull latest on next terraform init
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
}
Plan Discipline
In production, never run terraform apply without reviewing the plan output first. Use saved plan files (terraform plan -out=tfplan then terraform apply tfplan) to guarantee that the apply executes exactly what was reviewed — no drift from the time between plan and apply.
Frequently Asked Questions
What happens if two engineers run terraform apply at the same time?
With DynamoDB state locking, the second apply fails immediately with a lock error showing who holds the lock (or waits, if -lock-timeout is set). Without locking (or with local state), both applies can race and corrupt the state file. This is why remote state with locking is non-negotiable for team environments; never use local state in shared infrastructure.
How do I handle secrets like database passwords in Terraform?
The recommended pattern in 2026 is to use random_password to generate a password, store it in AWS Secrets Manager with aws_secretsmanager_secret_version, and reference the Secrets Manager ARN in your application configuration. The password value still ends up in Terraform state (encrypted at rest with KMS), so the state bucket must be access-controlled. RDS also supports manage_master_user_password = true, which delegates secret creation and rotation entirely to RDS and keeps the value out of state.
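A minimal sketch of the RDS-managed variant follows. All values are placeholders for a throwaway instance, and the resource and output names are chosen for this example.

```hcl
resource "aws_db_instance" "managed_secret_example" {
  identifier        = "example-db" # hypothetical values throughout
  engine            = "postgres"
  instance_class    = "db.t4g.micro"
  allocated_storage = 20
  username          = "app_admin"

  manage_master_user_password = true # do not set `password` together with this
  skip_final_snapshot         = true # acceptable for a throwaway example only
}

# Applications look the password up in Secrets Manager via this ARN.
output "db_master_secret_arn" {
  value = aws_db_instance.managed_secret_example.master_user_secret[0].secret_arn
}
```

Only the secret's ARN is exposed to Terraform here, so the password itself never appears in configuration or outputs.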
Should I use terraform destroy to tear down staging environments?
Yes, but with safeguards. Setting lifecycle { prevent_destroy = true } on databases and S3 buckets, even in staging, makes terraform destroy fail on those resources until the flag is removed, which protects data stores that are painful to recover. For cost savings, consider scaling EC2 Auto Scaling groups and EKS node groups down to zero rather than destroying the whole environment. Terraform Cloud supports automated workspace destruction on a schedule; with Atlantis or plain CI, a scheduled pipeline that runs terraform destroy can serve the same purpose.
What is Terraform Cloud and when should I use it?
Terraform Cloud (renamed HCP Terraform in 2024) provides remote plan/apply execution, policy-as-code with Sentinel/OPA, a web UI for state browsing, and audit logs. It is useful for large teams that want a managed backend without operating their own S3/DynamoDB setup. The free tier covers up to 500 resources. For AWS-centric teams already using GitHub Actions, the S3 backend with the OIDC-authenticated workflow shown above is a cost-effective self-managed alternative.
How do I upgrade the AWS provider without breaking existing resources?
Check the provider changelog for breaking changes between your current version and the target. A constraint such as version = "~> 5.50" allows minor and patch upgrades within 5.x while blocking 6.0. For major version bumps (e.g., 4.x to 5.x), run terraform init -upgrade in a feature branch, run terraform plan, and review every resource for unexpected diffs. Major releases deprecate or remove arguments; the 4.x line, for example, moved S3 bucket settings such as versioning and encryption into standalone resources. Such changes are mechanical but require updating every affected resource.