AWS Terraform Guide: Infrastructure Automation on AWS

Published June 6, 2026 · 18 min read

Terraform has become the de facto Infrastructure as Code (IaC) tool for AWS deployments in 2026. Its declarative HCL syntax, massive community module ecosystem, and multi-cloud portability make it the first choice for teams that want reproducible, auditable infrastructure. This guide covers everything from initial provider setup and remote state management all the way to production-grade VPC, EKS, and RDS patterns, CI/CD integration, and security scanning.

1. Terraform vs CloudFormation vs CDK: Choosing the Right Tool

Before writing a single line of HCL, it is worth understanding where each IaC tool fits. AWS offers two native options, CloudFormation and CDK, while Terraform (by HashiCorp, with OpenTofu as a largely compatible open-source fork) is the leading third-party choice.

CloudFormation

CloudFormation is 100% native to AWS. It requires no external tooling, usually supports new AWS services and features at or shortly after launch, and integrates tightly with IAM, AWS Config, Service Catalog, and StackSets for multi-account deployments. The trade-off is verbose JSON/YAML, AWS-only scope, and limited community reuse patterns.

AWS CDK

CDK compiles high-level code (TypeScript, Python, Java, Go) down to CloudFormation. It is ideal when your team already writes application code and wants to use familiar languages with type-safety and IDE autocompletion. CDK still deploys through CloudFormation, so the same native AWS integration applies.

Terraform

Terraform wins on three dimensions:

  • Multi-cloud: The same workflow manages AWS, GCP, Azure, Kubernetes, Datadog, GitHub, and hundreds of other providers. Teams running hybrid environments need a single tool.
  • Community modules: The Terraform Registry hosts thousands of battle-tested modules. terraform-aws-modules/vpc alone has been downloaded hundreds of millions of times.
  • Mature drift detection: terraform plan refreshes state and computes the diff between your configuration and the real infrastructure. A manual console change to any attribute Terraform manages shows up as drift in the next plan.

Rule of thumb: Use CloudFormation/CDK for purely AWS workloads where you want zero external dependencies and native support for new services as early as possible. Use Terraform when you have multi-cloud requirements, when you want to leverage community modules heavily, or when your platform team wants a single IaC workflow across all providers.

2. AWS Provider Setup: Authentication and Version Pinning

The AWS provider is the bridge between Terraform and the AWS APIs. Getting authentication and version pinning right from day one prevents the two most common production incidents: accidental credential exposure and provider upgrade breakage.

Authentication Methods

The AWS provider looks for credentials in roughly this order:

  1. Static credentials in the provider block: never use in production; hard-coded keys end up in version control.
  2. Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN. Suitable for CI environments that inject short-lived credentials.
  3. Shared config and credentials files: ~/.aws/config and ~/.aws/credentials with named profiles.
  4. EC2 instance profile / ECS task role / EKS IRSA: the correct approach for workloads running inside AWS. No credentials to manage at all.

AWS SSO / IAM Identity Center is the preferred option for developer workstations in 2026. It is not a separate step in the chain; it is resolved through a named profile (step 3). Run aws sso login --profile dev and Terraform picks up the temporary token automatically.
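
As a minimal sketch of keeping credentials out of the configuration, the provider block below names no keys at all and relies on the lookup order above, optionally assuming a deployment role. The role ARN and session name are hypothetical placeholders, not values used elsewhere in this guide.

```hcl
# Sketch: no static credentials in the provider block.
# Credentials come from the environment, a named profile, or an instance role.
provider "aws" {
  region = "us-east-1"

  # Optional: hop into a dedicated deployment role.
  # The ARN below is a hypothetical placeholder.
  assume_role {
    role_arn     = "arn:aws:iam::123456789012:role/terraform-deployer"
    session_name = "terraform"
  }
}
```

The same file then works unchanged on a laptop (SSO profile exported via AWS_PROFILE) and in CI (short-lived keys in environment variables).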

Example 1: provider.tf with S3 Backend and DynamoDB State Locking

# provider.tf
# ─────────────────────────────────────────────────────────────────
# Terraform and provider version constraints
terraform {
  required_version = ">= 1.7.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.50"    # allow patch/minor upgrades, block major
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.30"
    }
  }

  # ── Remote state: S3 bucket + DynamoDB lock table ──────────────
  backend "s3" {
    bucket         = "techoral-terraform-state-prod"
    key            = "infra/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true                          # server-side encryption of the state object
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/mrk-abc123" # optional CMK; omit to use SSE-S3
    dynamodb_table = "terraform-state-locks"       # PAY_PER_REQUEST table, LockID string key
    profile        = "techoral-prod"               # AWS CLI named profile; omit in CI, where credentials come from the environment
  }
}

# ── Primary provider (us-east-1) ───────────────────────────────
provider "aws" {
  region  = var.aws_region
  profile = var.aws_profile

  default_tags {
    tags = {
      ManagedBy   = "terraform"
      Environment = var.environment
      Project     = "techoral"
      Owner       = "platform-team"
    }
  }
}

# ── Secondary provider alias for us-west-2 ─────────────────────
provider "aws" {
  alias   = "us_west_2"
  region  = "us-west-2"
  profile = var.aws_profile

  default_tags {
    tags = {
      ManagedBy   = "terraform"
      Environment = var.environment
    }
  }
}
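
S3 Backend prerequisites: The S3 bucket and DynamoDB table must exist before running terraform init. Create them manually once (or with a separate bootstrap Terraform configuration). Enable S3 versioning so you can restore any previous state version. The DynamoDB table needs a single LockID String attribute as the partition key, with no sort key and no additional attributes. Terraform 1.10 and later can also lock state natively in S3 (use_lockfile = true), and newer releases mark dynamodb_table as deprecated, so check the backend documentation for the version you pin.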

Bootstrapping the Backend Resources

A common pattern is to keep a bootstrap/ folder next to the rest of your infrastructure code, with a minimal Terraform config that uses the local backend to create the S3 bucket and DynamoDB table. This solves the chicken-and-egg problem and only needs to run once per AWS account.

# bootstrap/main.tf  (uses local backend — runs once)
resource "aws_s3_bucket" "tf_state" {
  bucket        = "techoral-terraform-state-prod"
  force_destroy = false
}

resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration { status = "Enabled" }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

3. Terraform Modules: Reusability and the Community Ecosystem

Modules are the fundamental reuse mechanism in Terraform. A module is simply a directory containing .tf files with a defined interface: input variables, output values, and internal logic in locals and resource blocks.

Community Modules from the Terraform Registry

Rather than writing VPC or EKS resources from scratch, call a community module and pass your configuration as variables:

# Using community modules from registry.terraform.io
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.8.1"

  name = "techoral-prod-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway   = true
  single_nat_gateway   = false   # one NAT per AZ for HA
  enable_dns_hostnames = true
  enable_dns_support   = true

  # Required tags for EKS load balancer controller
  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
  }
  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }
}

Example 2: Writing a Reusable VPC Module (variables.tf + outputs.tf)

For production teams, writing your own wrapper module lets you enforce naming conventions, mandatory tags, and security defaults across all environments. The module's interface is defined in variables.tf and outputs.tf:

# modules/vpc/variables.tf
variable "name" {
  description = "Name prefix for all VPC resources"
  type        = string
}

variable "cidr" {
  description = "Primary CIDR block for the VPC (e.g. 10.0.0.0/16)"
  type        = string
}

variable "azs" {
  description = "List of Availability Zones (at least 2; 3 recommended for production)"
  type        = list(string)
  validation {
    condition     = length(var.azs) >= 2
    error_message = "At least 2 Availability Zones are required."
  }
}

variable "private_subnets" {
  description = "CIDR blocks for private subnets (one per AZ)"
  type        = list(string)
}

variable "public_subnets" {
  description = "CIDR blocks for public subnets (one per AZ)"
  type        = list(string)
}

variable "enable_flow_logs" {
  description = "Enable VPC Flow Logs to CloudWatch"
  type        = bool
  default     = true
}

variable "environment" {
  description = "Deployment environment: dev, staging, prod"
  type        = string
}

variable "tags" {
  description = "Additional tags to apply to all resources"
  type        = map(string)
  default     = {}
}

# ─── modules/vpc/outputs.tf ────────────────────────────────────
output "vpc_id" {
  description = "The ID of the VPC"
  value       = aws_vpc.this.id
}

output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = aws_subnet.private[*].id
}

output "public_subnet_ids" {
  description = "List of public subnet IDs"
  value       = aws_subnet.public[*].id
}

output "nat_gateway_ids" {
  description = "List of NAT Gateway IDs (one per AZ)"
  value       = aws_nat_gateway.this[*].id
}

output "vpc_cidr_block" {
  description = "The primary CIDR block of the VPC"
  value       = aws_vpc.this.cidr_block
}
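
To show how this interface is consumed, here is a minimal sketch of a root module calling the wrapper. The file path, the module label network, and the extra CostCenter tag are illustrative assumptions; the CIDRs and AZs reuse the values from the community-module example.

```hcl
# envs/prod/main.tf (hypothetical path): calling the wrapper module
module "network" {
  source = "../../modules/vpc"

  name        = "techoral-prod-vpc"
  cidr        = "10.0.0.0/16"
  environment = "prod"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  tags = { CostCenter = "platform" } # illustrative extra tag
}

# Module outputs are read as module.<label>.<output name>
output "private_subnet_ids" {
  value = module.network.private_subnet_ids
}
```

Only the declared variables and outputs cross the module boundary, which is what lets the wrapper enforce its defaults.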

Example 3: VPC Module main.tf — Complete Implementation

# modules/vpc/main.tf
locals {
  az_count = length(var.azs)
  common_tags = merge(var.tags, {
    Module      = "vpc"
    Environment = var.environment
    ManagedBy   = "terraform"
  })
}

# ── VPC ────────────────────────────────────────────────────────
resource "aws_vpc" "this" {
  cidr_block           = var.cidr
  enable_dns_support   = true
  enable_dns_hostnames = true
  tags = merge(local.common_tags, { Name = var.name })
}

# ── Internet Gateway ───────────────────────────────────────────
resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id
  tags   = merge(local.common_tags, { Name = "${var.name}-igw" })
}

# ── Public subnets ─────────────────────────────────────────────
resource "aws_subnet" "public" {
  count                   = local.az_count
  vpc_id                  = aws_vpc.this.id
  cidr_block              = var.public_subnets[count.index]
  availability_zone       = var.azs[count.index]
  map_public_ip_on_launch = true
  tags = merge(local.common_tags, {
    Name                     = "${var.name}-public-${var.azs[count.index]}"
    "kubernetes.io/role/elb" = "1"
  })
}

# ── Private subnets ────────────────────────────────────────────
resource "aws_subnet" "private" {
  count             = local.az_count
  vpc_id            = aws_vpc.this.id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = var.azs[count.index]
  tags = merge(local.common_tags, {
    Name                              = "${var.name}-private-${var.azs[count.index]}"
    "kubernetes.io/role/internal-elb" = "1"
  })
}

# ── Elastic IPs for NAT Gateways ───────────────────────────────
resource "aws_eip" "nat" {
  count  = local.az_count
  domain = "vpc"
  tags   = merge(local.common_tags, { Name = "${var.name}-nat-eip-${count.index}" })
}

# ── NAT Gateways (one per AZ for HA) ──────────────────────────
resource "aws_nat_gateway" "this" {
  count         = local.az_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  depends_on    = [aws_internet_gateway.this]
  tags = merge(local.common_tags, {
    Name = "${var.name}-nat-${var.azs[count.index]}"
  })
}

# ── Public route table ─────────────────────────────────────────
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.this.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.this.id
  }
  tags = merge(local.common_tags, { Name = "${var.name}-public-rt" })
}

resource "aws_route_table_association" "public" {
  count          = local.az_count
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# ── Private route tables (one per AZ → local NAT) ─────────────
resource "aws_route_table" "private" {
  count  = local.az_count
  vpc_id = aws_vpc.this.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.this[count.index].id
  }
  tags = merge(local.common_tags, {
    Name = "${var.name}-private-rt-${var.azs[count.index]}"
  })
}

resource "aws_route_table_association" "private" {
  count          = local.az_count
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# ── VPC Flow Logs ──────────────────────────────────────────────
resource "aws_flow_log" "this" {
  count           = var.enable_flow_logs ? 1 : 0
  vpc_id          = aws_vpc.this.id
  traffic_type    = "ALL"
  iam_role_arn    = aws_iam_role.flow_logs[0].arn
  log_destination = aws_cloudwatch_log_group.flow_logs[0].arn
  tags            = merge(local.common_tags, { Name = "${var.name}-flow-logs" })
}

resource "aws_cloudwatch_log_group" "flow_logs" {
  count             = var.enable_flow_logs ? 1 : 0
  name              = "/aws/vpc/flowlogs/${var.name}"
  retention_in_days = 30
  tags              = local.common_tags
}

resource "aws_iam_role" "flow_logs" {
  count = var.enable_flow_logs ? 1 : 0
  name  = "${var.name}-flow-logs-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "vpc-flow-logs.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "flow_logs" {
  count  = var.enable_flow_logs ? 1 : 0
  name   = "${var.name}-flow-logs-policy"
  role   = aws_iam_role.flow_logs[0].id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents", "logs:DescribeLogGroups", "logs:DescribeLogStreams"]
      Resource = "*"
    }]
  })
}
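
Because main.tf indexes both subnet lists by AZ position, a list that is shorter than azs fails with a bare index error and a longer one is silently truncated. One way to surface a readable message is a precondition, sketched below under the assumption of Terraform 1.4 or later (for the built-in terraform_data resource); the file name and resource label are hypothetical.

```hcl
# modules/vpc/guards.tf (hypothetical file name)
# terraform_data creates nothing in AWS; it only carries the precondition.
resource "terraform_data" "subnet_count_guard" {
  lifecycle {
    precondition {
      condition = (
        length(var.public_subnets) == length(var.azs) &&
        length(var.private_subnets) == length(var.azs)
      )
      error_message = "public_subnets and private_subnets must each contain exactly one CIDR per entry in azs."
    }
  }
}
```

On Terraform 1.9 and later, the same check could instead live in a validation block that references the other variables directly.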

4. Workspaces: Environment Separation

Terraform workspaces let you maintain multiple state files in the same backend configuration, giving you dev/staging/prod separation without duplicating your entire codebase.

# Create and switch workspaces
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod
terraform workspace select prod
terraform workspace list
# Output (the built-in default workspace is always listed):
#   default
#   dev
# * prod
#   staging

Reference the current workspace in your configuration to make environment-specific decisions:

# Use workspace name in resource sizing
locals {
  is_prod = terraform.workspace == "prod"

  instance_type = local.is_prod ? "t3.large" : "t3.small"
  min_capacity  = local.is_prod ? 3 : 1
  max_capacity  = local.is_prod ? 20 : 5
}

# Workspace-specific variable files (the shell resolves the workspace name, not Terraform)
# terraform plan -var-file="envs/$(terraform workspace show).tfvars"

Workspace limitations: Workspaces share the same backend bucket and the same root module. For large platform teams, fully separate Terraform projects per environment (or Terragrunt with separate state paths) provide better blast-radius isolation. A bug in the shared root module reaches every workspace on its next apply, whereas separate root modules give you independent apply/destroy control.
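
When more than two environments diverge, chained conditionals get hard to read. A common alternative, sketched here with the same sizing values as above, is a settings map keyed by workspace name; the local names env_settings and settings are hypothetical.

```hcl
locals {
  # Per-environment settings keyed by workspace name (illustrative values)
  env_settings = {
    dev     = { instance_type = "t3.small", min_capacity = 1, max_capacity = 5 }
    staging = { instance_type = "t3.small", min_capacity = 1, max_capacity = 5 }
    prod    = { instance_type = "t3.large", min_capacity = 3, max_capacity = 20 }
  }

  # Fall back to dev sizing for any workspace not listed (including "default")
  settings = lookup(local.env_settings, terraform.workspace, local.env_settings["dev"])
}
```

Resources would then read local.settings.instance_type and so on, and adding an environment becomes a one-line change.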

5. EKS Cluster with Terraform: Managed Node Groups and IRSA

Provisioning Amazon EKS with Terraform involves the EKS control plane, managed node groups, and IAM Roles for Service Accounts (IRSA) so pods can access AWS APIs without static credentials.

Example 4: EKS Cluster Resource with Managed Node Group

# eks/main.tf
data "aws_eks_cluster_auth" "this" {
  name = aws_eks_cluster.this.name
}

# ── EKS Control Plane ──────────────────────────────────────────
resource "aws_eks_cluster" "this" {
  name     = var.cluster_name
  version  = var.kubernetes_version   # e.g. "1.30"
  role_arn = aws_iam_role.eks_cluster.arn

  vpc_config {
    subnet_ids              = concat(var.private_subnet_ids, var.public_subnet_ids)
    endpoint_private_access = true
    endpoint_public_access  = true      # lock down in prod via public_access_cidrs
    public_access_cidrs     = var.allowed_cidrs
    security_group_ids      = [aws_security_group.eks_cluster.id]
  }

  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  encryption_config {
    resources = ["secrets"]
    provider {
      key_arn = var.kms_key_arn
    }
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
    aws_iam_role_policy_attachment.eks_vpc_controller,
  ]

  tags = var.tags
}

# ── Managed Node Group ─────────────────────────────────────────
resource "aws_eks_node_group" "app" {
  cluster_name    = aws_eks_cluster.this.name
  node_group_name = "${var.cluster_name}-app-nodes"
  node_role_arn   = aws_iam_role.eks_node.arn
  subnet_ids      = var.private_subnet_ids
  instance_types  = [var.node_instance_type]   # e.g. "m6i.xlarge"

  scaling_config {
    desired_size = var.node_desired
    min_size     = var.node_min
    max_size     = var.node_max
  }

  update_config {
    max_unavailable_percentage = 25
  }

  labels = {
    role = "app"
  }

  taint {
    key    = "dedicated"
    value  = "app"
    effect = "NO_SCHEDULE"    # remove for general-purpose nodes
  }

  launch_template {
    id      = aws_launch_template.eks_node.id
    version = aws_launch_template.eks_node.latest_version
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.eks_ecr_policy,
  ]
}

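# ── IRSA: IAM Role for Service Account ─────────────────────────
# Allows the aws-load-balancer-controller SA to call AWS APIs.
# The data source below only looks up an IAM OIDC provider that is already
# registered for this cluster; it does not create one.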
data "aws_iam_openid_connect_provider" "eks" {
  url = aws_eks_cluster.this.identity[0].oidc[0].issuer
}

resource "aws_iam_role" "alb_controller" {
  name = "${var.cluster_name}-alb-controller"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = data.aws_iam_openid_connect_provider.eks.arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "${replace(aws_eks_cluster.this.identity[0].oidc[0].issuer, "https://", "")}:sub" = "system:serviceaccount:kube-system:aws-load-balancer-controller"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "alb_controller" {
  role       = aws_iam_role.alb_controller.name
  policy_arn = aws_iam_policy.alb_controller.arn
}

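# ── aws-auth ConfigMap (node role mapping) ─────────────────────
# EKS adds managed node group roles to aws-auth on its own; listing the role
# here keeps it in place once Terraform takes over mapRoles (force = true).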
resource "kubernetes_config_map_v1_data" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }
  force = true
  data = {
    mapRoles = yamlencode([
      {
        rolearn  = aws_iam_role.eks_node.arn
        username = "system:node:{{EC2PrivateDNSName}}"
        groups   = ["system:bootstrappers", "system:nodes"]
      },
    ])
  }
}
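
The IRSA role above depends on an IAM OIDC provider existing for the cluster's issuer URL. If nothing else creates it, a sketch along the following lines would register it. It assumes the hashicorp/tls provider has been added to required_providers, and the labels eks_oidc and eks are hypothetical.

```hcl
# Assumes the hashicorp/tls provider is declared in required_providers.
data "tls_certificate" "eks_oidc" {
  url = aws_eks_cluster.this.identity[0].oidc[0].issuer
}

# Registers the cluster's OIDC issuer with IAM so IRSA roles can trust it
resource "aws_iam_openid_connect_provider" "eks" {
  url             = aws_eks_cluster.this.identity[0].oidc[0].issuer
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks_oidc.certificates[0].sha1_fingerprint]
}
```

With this resource in place, the role's Federated principal would reference aws_iam_openid_connect_provider.eks.arn rather than a data source lookup. It is also worth adding a second StringEquals condition on the issuer's :aud claim with the value sts.amazonaws.com.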

6. RDS Module with Parameter Group and Subnet Group

Provisioning Amazon RDS with Terraform involves the DB instance, a subnet group spanning private subnets, a parameter group for engine tuning, and a security group that restricts access to the application tier only.

Example 5: RDS Module — Complete Resource Set

# modules/rds/main.tf
resource "aws_db_subnet_group" "this" {
  name        = "${var.identifier}-subnet-group"
  subnet_ids  = var.private_subnet_ids
  description = "Subnet group for ${var.identifier} RDS instance"
  tags        = var.tags
}

resource "aws_db_parameter_group" "this" {
  name   = "${var.identifier}-params"
  family = var.parameter_group_family   # e.g. "postgres15"

  parameter {
    name  = "log_connections"
    value = "1"
  }
  parameter {
    name  = "log_disconnections"
    value = "1"
  }
  parameter {
    name  = "log_min_duration_statement"
    value = "1000"    # log queries slower than 1s
  }
  parameter {
    name         = "shared_preload_libraries"
    value        = "pg_stat_statements"
    apply_method = "pending-reboot"
  }

  tags = var.tags
}

resource "aws_security_group" "rds" {
  name        = "${var.identifier}-rds-sg"
  description = "Allow inbound from app security group only"
  vpc_id      = var.vpc_id

  ingress {
    from_port       = var.db_port
    to_port         = var.db_port
    protocol        = "tcp"
    security_groups = var.allowed_security_group_ids
    description     = "App tier access to RDS"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(var.tags, { Name = "${var.identifier}-rds-sg" })
}

resource "aws_db_instance" "this" {
  identifier        = var.identifier
  engine            = var.engine             # "postgres"
  engine_version    = var.engine_version     # "15.6"
  instance_class    = var.instance_class     # "db.r6g.large"
  allocated_storage = var.allocated_storage  # 100 (GB)
  storage_type      = "gp3"
  storage_encrypted = true
  kms_key_id        = var.kms_key_arn

  db_name  = var.db_name
  username = var.db_username
  password = var.db_password   # use aws_secretsmanager_secret_version in production

  db_subnet_group_name   = aws_db_subnet_group.this.name
  parameter_group_name   = aws_db_parameter_group.this.name
  vpc_security_group_ids = [aws_security_group.rds.id]

  multi_az               = var.multi_az          # true in prod
  publicly_accessible    = false
  deletion_protection    = var.deletion_protection

  backup_retention_period = var.backup_retention_days  # 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:00-sun:05:00"

  performance_insights_enabled          = true
  performance_insights_retention_period = 7

  enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]

  tags = var.tags

  lifecycle {
    prevent_destroy = true   # safety net for production databases
  }
}

Password management: Never pass plaintext passwords via Terraform variables in production. Instead, generate the password with random_password, store it in AWS Secrets Manager, and reference it with a data source. Alternatively, set manage_master_user_password = true (and omit password) so that RDS creates the secret in Secrets Manager and rotates it automatically.
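
A minimal sketch of the RDS-managed variant follows. The identifier, instance size, and username are hypothetical, and skip_final_snapshot is set only to keep the example short.

```hcl
resource "aws_db_instance" "managed_secret" {
  identifier        = "example-managed-secret" # hypothetical
  engine            = "postgres"
  instance_class    = "db.t4g.micro"
  allocated_storage = 20

  username                    = "appadmin"
  manage_master_user_password = true # do not set password alongside this

  skip_final_snapshot = true # for the sketch only
}

# ARN of the Secrets Manager secret that RDS created for the master user
output "master_secret_arn" {
  value = aws_db_instance.managed_secret.master_user_secret[0].secret_arn
}
```

Applications then read the password from Secrets Manager by ARN, and the value itself never passes through Terraform variables.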

7. CI/CD: GitHub Actions Workflow for Terraform

The gold standard Terraform CI/CD pattern is: plan on every pull request (so reviewers see the infrastructure diff), and apply on merge to main (so production changes are gated behind code review).

Example 6: GitHub Actions Workflow — Plan on PR, Apply on Merge

# .github/workflows/terraform.yml
name: Terraform CI/CD

on:
  pull_request:
    branches: [main]
    paths: ["infra/**"]
  push:
    branches: [main]
    paths: ["infra/**"]

permissions:
  id-token: write       # required for OIDC authentication
  contents: read
  pull-requests: write  # post plan output as PR comment

env:
  TF_VERSION: "1.7.5"
  AWS_REGION: "us-east-1"
  WORKING_DIR: "infra/vpc"

jobs:
  terraform-plan:
    name: Terraform Plan
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    defaults:
      run:
        working-directory: ${{ env.WORKING_DIR }}

    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-terraform
          aws-region: ${{ env.AWS_REGION }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Terraform Format Check
        run: terraform fmt -check -recursive

      - name: Terraform Init
        run: terraform init -input=false

      - name: Terraform Validate
        run: terraform validate

      - name: Run tfsec security scan
        uses: aquasecurity/tfsec-action@v1.0.0
        with:
          working_directory: ${{ env.WORKING_DIR }}
          soft_fail: true

      - name: Run Checkov policy scan
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: ${{ env.WORKING_DIR }}
          soft_fail: true

      - name: Terraform Plan
        id: plan
        run: |
          set -o pipefail   # without this, tee's exit code hides a failed plan
          terraform plan \
            -var-file="envs/prod.tfvars" \
            -out=tfplan \
            -input=false \
            -no-color 2>&1 | tee plan_output.txt

      - name: Post Plan to PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('${{ env.WORKING_DIR }}/plan_output.txt', 'utf8');
            const maxLength = 65000;
            const truncated = plan.length > maxLength
              ? plan.substring(0, maxLength) + '\n\n... (truncated)'
              : plan;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Terraform Plan\n\`\`\`hcl\n${truncated}\n\`\`\``
            });

  terraform-apply:
    name: Terraform Apply
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    environment: production    # pauses for approval only if the environment has required reviewers configured
    defaults:
      run:
        working-directory: ${{ env.WORKING_DIR }}

    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-terraform
          aws-region: ${{ env.AWS_REGION }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Terraform Init
        run: terraform init -input=false

      - name: Terraform Apply
        run: |
          terraform apply \
            -var-file="envs/prod.tfvars" \
            -auto-approve \
            -input=false

OIDC authentication: The workflow uses GitHub OIDC to assume an IAM role without any long-lived AWS keys stored in GitHub secrets. You need to create an IAM OIDC provider for token.actions.githubusercontent.com in your AWS account and a role whose trust policy is restricted to your specific GitHub org/repo. See the IAM Roles & Policies guide for the full trust policy pattern.
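
As a rough sketch of that setup in Terraform, the following creates the OIDC provider and a role with a repo-scoped trust policy. It assumes the hashicorp/tls provider is available; example-org/example-repo is a hypothetical placeholder, and the permissions policy for the role is left out.

```hcl
# Assumes the hashicorp/tls provider is declared in required_providers.
data "tls_certificate" "github" {
  url = "https://token.actions.githubusercontent.com/.well-known/openid-configuration"
}

resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.github.certificates[0].sha1_fingerprint]
}

data "aws_iam_policy_document" "github_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    # Token must be issued for AWS STS
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Token must come from this repository (hypothetical org/repo)
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:example-org/example-repo:*"]
    }
  }
}

resource "aws_iam_role" "github_actions_terraform" {
  name               = "github-actions-terraform"
  assume_role_policy = data.aws_iam_policy_document.github_trust.json
}
```

Tightening the sub pattern further (for example to a single branch or to the production environment) limits which workflow runs can assume the role.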

8. Terraform Best Practices for AWS in 2026

Project Structure

Organize your repository so each independently deployable unit of infrastructure is its own Terraform root module (its own state file). A common layout:

infra/
├── bootstrap/          # S3 bucket + DynamoDB lock table (local backend, runs once)
├── modules/
│   ├── vpc/
│   ├── eks/
│   ├── rds/
│   └── iam/
├── envs/
│   ├── dev/
│   │   ├── main.tf     # calls modules, dev-specific values
│   │   ├── provider.tf # backend key = "dev/terraform.tfstate"
│   │   └── terraform.tfvars
│   ├── staging/
│   └── prod/
└── .github/workflows/terraform.yml

Security Scanning

Integrate static analysis into your CI pipeline:

  • tfsec: fast HCL-aware scanner. Catches issues like S3 buckets without public access blocks, security groups open to 0.0.0.0/0, unencrypted EBS volumes, and missing CloudTrail encryption. Its maintainers now direct new development to Trivy, which runs the same class of checks.
  • Checkov: broader policy library including CIS benchmarks for AWS. Run both since they catch different things.
  • Infracost: posts a cost estimate for every PR so you know the monthly impact of the planned change before it lands.

DRY Configuration with Terragrunt

Terragrunt (by Gruntwork) is a thin Terraform wrapper that eliminates the repeated backend and provider boilerplate across environments. A root terragrunt.hcl defines the backend template, and each environment folder adds only the variables specific to that environment. The run-all plan command (spelled run --all plan in recent Terragrunt releases) plans all modules in dependency order.
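
The layout described above might look like the following sketch. The file paths and input values are assumptions made for illustration, and the child file is shown commented out so the listing stays a single valid file.

```hcl
# infra/terragrunt.hcl (root): backend template shared by every environment
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "techoral-terraform-state-prod"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}

# infra/envs/dev/vpc/terragrunt.hcl (child, hypothetical path)
# include "root" {
#   path = find_in_parent_folders()
# }
#
# terraform {
#   source = "../../../modules/vpc"
# }
#
# inputs = {
#   name        = "techoral-dev-vpc"
#   environment = "dev"
# }
```

Each child folder gets its own state key derived from its path, so the backend block is written once rather than once per environment.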

Importing Existing Resources

When adopting Terraform in a brownfield AWS account, import existing resources rather than recreating them:

# Terraform 1.5+ declarative import block
import {
  to = aws_s3_bucket.assets
  id = "my-existing-bucket-name"
}

import {
  to = aws_vpc.main
  id = "vpc-0abc123def456789"
}

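# Then run: terraform plan
# With a matching resource block in place, the plan shows the pending import
# and any attribute differences between your .tf files and the real resource.
# To have Terraform draft the resource blocks for you instead, run:
#   terraform plan -generate-config-out=generated.tf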

State Management Rules

  • Never edit state files manually. Use terraform state mv, terraform state rm, and terraform import.
  • Enable S3 bucket versioning on your state bucket: every state write creates a new object version, giving you an audit trail and a way to restore an earlier state file. Restoring state does not roll back the infrastructure itself.
  • Use DynamoDB locking in all environments, not just production. Concurrent applies against the same state file can corrupt it in ways that are difficult to recover from.
  • Keep sensitive outputs out of state where possible. Prefer referencing AWS Secrets Manager ARNs instead of actual secret values.
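
For renames and hand-offs, recent Terraform versions also offer declarative blocks that go through the normal plan and review cycle, as an alternative to running state commands by hand. The sketch below uses hypothetical resource addresses.

```hcl
# Rename a resource address without destroying and recreating it (Terraform 1.1+)
moved {
  from = aws_s3_bucket.assets
  to   = aws_s3_bucket.static_assets # hypothetical new name
}

# Stop managing a resource but leave it running in AWS (Terraform 1.7+)
removed {
  from = aws_vpc.legacy # hypothetical address

  lifecycle {
    destroy = false
  }
}
```

Because these blocks live in the configuration, the state change shows up in the pull request's plan output instead of happening out of band.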

Module Versioning

Always pin module sources to specific versions in production:

# Good: pinned to a specific semver tag
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.8.1"
}

# Risky: unpinned, resolves to the latest release on a fresh init or init -upgrade
module "vpc_unpinned" {
  source = "terraform-aws-modules/vpc/aws"
}

Plan Discipline

In production, never run terraform apply without reviewing the plan output first. Use saved plan files (terraform plan -out=tfplan, then terraform apply tfplan) so that the apply executes exactly what was reviewed. If the state changes between plan and apply, Terraform rejects the saved plan as stale instead of applying something different.

Frequently Asked Questions

What happens if two engineers run terraform apply at the same time?

With DynamoDB state locking, the second apply fails immediately with a lock error showing who holds the lock (or waits, if you pass -lock-timeout). Without locking, both applies can race and corrupt the state file. This is why remote state with locking is non-negotiable for team environments: never use local state for shared infrastructure.

How do I handle secrets like database passwords in Terraform?

The recommended pattern in 2026 is to use random_password to generate a password, store it in AWS Secrets Manager with aws_secretsmanager_secret_version, and reference the Secrets Manager ARN in your application configuration. The password value still ends up in Terraform state (encrypted at rest with KMS), so the state bucket must be access-controlled. RDS also supports manage_master_user_password = true which delegates rotation entirely to RDS and keeps the value out of state.
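
A minimal sketch of the random_password pattern is shown below. It assumes the hashicorp/random provider is declared in required_providers; the secret name is a hypothetical placeholder.

```hcl
# Assumes the hashicorp/random provider is declared in required_providers.
resource "random_password" "db" {
  length  = 32
  special = false # avoids characters RDS rejects in master passwords
}

resource "aws_secretsmanager_secret" "db" {
  name = "prod/app/db-master-password" # hypothetical
}

resource "aws_secretsmanager_secret_version" "db" {
  secret_id     = aws_secretsmanager_secret.db.id
  secret_string = random_password.db.result
}

# Hand applications the ARN, not the value
output "db_secret_arn" {
  value = aws_secretsmanager_secret.db.arn
}
```

The generated value is still recorded in state, which is why access to the state bucket has to be locked down.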

Should I use terraform destroy to tear down staging environments?

Yes, but with safeguards. Use lifecycle { prevent_destroy = true } on databases and S3 buckets even in staging, because accidental destroys of data stores are painful to recover from; terraform destroy then stops with an error until the flag is lifted. For cost savings, consider stopping EC2 instances and scaling EKS node groups to zero rather than destroying everything. HCP Terraform (formerly Terraform Cloud) offers ephemeral workspaces that auto-destroy on a schedule; with Atlantis or GitHub Actions you would typically wire up a scheduled destroy job yourself.

What is Terraform Cloud and when should I use it?

Terraform Cloud (renamed HCP Terraform in 2024) provides remote plan/apply execution, policy-as-code with Sentinel/OPA, a web UI for state browsing, and audit logs. It is useful for large teams that want a managed backend without operating their own S3/DynamoDB setup. The free tier covers up to 500 resources. For AWS-centric teams already using GitHub Actions, the S3 backend with the OIDC-authenticated workflow shown above is a cost-effective self-managed alternative.

How do I upgrade the AWS provider without breaking existing resources?

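Check the provider changelog for breaking changes between your current version and the target. A constraint such as version = "~> 5.50" permits any 5.x release from 5.50 upward (minor and patch), but the dependency lock file keeps the selected version fixed until you run terraform init -upgrade. For major version bumps (e.g., 4.x to 5.x), run terraform init -upgrade in a feature branch, run terraform plan, and review every resource for unexpected diffs. Major releases typically remove arguments that were deprecated earlier: the 3.x to 4.x upgrade moved S3 bucket settings such as versioning and encryption into standalone resources, and 5.0 dropped the long-deprecated name argument on aws_db_instance in favor of db_name. Such changes are mechanical but touch every affected resource.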
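
The pessimistic constraint operator is easy to misread, so here is a short sketch of what the two common forms allow.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"

      # "~> 5.50"   allows >= 5.50.0 and < 6.0.0  (minor and patch releases)
      # "~> 5.50.0" allows >= 5.50.0 and < 5.51.0 (patch releases only)
      version = "~> 5.50"
    }
  }
}
```

Committing .terraform.lock.hcl alongside this constraint means every engineer and CI run installs the same provider build until someone deliberately upgrades.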