AWS Trusted Advisor: Automate Cost, Security and Performance Checks (2026)


AWS Trusted Advisor is one of the most underused services in the entire AWS catalog. While most teams know it exists, very few automate it. This guide shows you how to query Trusted Advisor programmatically, build auto-remediation pipelines with Lambda and EventBridge, aggregate checks across an entire AWS Organization, and correlate findings with Cost Explorer — so your environment continuously heals itself instead of waiting for a quarterly review.

1. Trusted Advisor Overview and Tier Differences

AWS Trusted Advisor is a best-practice inspection tool that analyzes your AWS environment across the five categories covered in this guide: Cost Optimization, Performance, Security, Fault Tolerance, and Service Limits. It re-evaluates its checks on a recurring refresh cycle and surfaces actionable findings such as idle EC2 instances, open security groups, over-provisioned EBS volumes, and many other issues.

What separates Trusted Advisor from a one-off audit script is that it keeps comparing your current state against AWS best practices and surfaces drift each time a check refreshes. When a developer accidentally opens port 22 to 0.0.0.0/0 at 2 AM, Trusted Advisor flags it on the next refresh of that check, not at the next quarterly review.

The Three Support Tiers

Access to Trusted Advisor checks is gated by your AWS Support plan:

Tier	Support Plan	Checks Available
Basic	Free / Developer	~7 core checks: S3 bucket permissions, MFA on root, security group unrestricted ports, IAM use, EBS public snapshots, RDS public snapshots, service limits
Standard	Business ($100/mo+)	Full catalog: 115+ checks across all five categories. Programmatic API access. CloudWatch alarms on check status. Organizational view across accounts.
Enterprise	Enterprise ($15,000/mo+)	Everything in Business plus Trusted Advisor Priority, faster support response times, and TAM-assisted remediation reviews.

Practical note: The Business support plan starts at $100/month and pays for itself quickly if you have even one medium-sized AWS account. A single idle m5.2xlarge running 24/7 costs ~$270/month (about $9/day), so finding and stopping just that one instance covers the $100 minimum in roughly 11 days. The ROI is almost always positive within the first month.

The five check categories overlap heavily with the AWS Well-Architected Framework pillars, although the mapping is not one-to-one (Service Limits, for example, is not a pillar):

Cost Optimization — idle resources, reserved instance coverage gaps, unassociated IPs
Performance — over-utilized EC2, EBS throughput bottlenecks, CloudFront tuning
Security — open ports, public S3 buckets, missing MFA, exposed snapshots
Fault Tolerance — missing Multi-AZ, no S3 versioning, ELB without health checks
Service Limits — EC2 instance quotas, VPC limits, API call rate thresholds

2. Cost Optimization Checks and CLI Remediation

The Cost Optimization category is where most teams recover the most money the fastest. The checks flag resources that are running but providing no value — or providing value at a much higher cost than alternatives. The most impactful checks in this category are idle EC2 instances, underutilized Reserved Instances, idle RDS instances, idle load balancers, and unassociated Elastic IPs.

Key Cost Checks

Low Utilization Amazon EC2 Instances — EC2 instances with CPU <10% for 4 of 14 days
Idle Load Balancers — Classic/ALB/NLB with no requests in 7 days
Underutilized Amazon EBS Volumes — volumes with <1 IOPS for 7 days
Unassociated Elastic IP Addresses — $0.005/hour per unassociated EIP
Amazon RDS Idle DB Instances — no connections in 7 days
Amazon Redshift Underutilized Clusters — cluster utilization <5%

Find and release all unassociated Elastic IPs in a region:

# List all unassociated Elastic IPs
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].[AllocationId,PublicIp,Tags]' \
  --output table

# Release a specific EIP (saves $3.60/month per IP)
aws ec2 release-address --allocation-id eipalloc-0abc123def456

# Bulk release all unassociated EIPs in a region (use with caution)
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].AllocationId' \
  --output text | tr '\t' '\n' | while read alloc_id; do
    echo "Releasing $alloc_id"
    aws ec2 release-address --allocation-id "$alloc_id"
done

Find idle RDS instances and stop them (a stopped instance accrues no instance-hour charges, but you still pay for its storage and backups):

# List all RDS instances and their status
aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceClass,DBInstanceStatus,MultiAZ]' \
  --output table

# Confirm there were no connections over the last 7 days via CloudWatch
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DatabaseConnections \
  --dimensions Name=DBInstanceIdentifier,Value=my-dev-database \
  --start-time 2026-06-02T00:00:00Z \
  --end-time 2026-06-09T00:00:00Z \
  --period 86400 \
  --statistics Maximum \
  --output table

# Stop a non-production RDS instance only after confirming it is idle
# (RDS automatically restarts a stopped instance after 7 days)
aws rds stop-db-instance \
  --db-instance-identifier my-dev-database
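
Before stopping anything, the DatabaseConnections datapoints from CloudWatch can be reduced to a yes/no answer. The following is a minimal sketch, assuming the `Datapoints` list of a GetMetricStatistics response requested with the `Maximum` statistic and a one-day period. The function name `looks_idle` and the `expected_days` parameter are illustrative, not part of any AWS API.

```python
def looks_idle(datapoints, expected_days=7):
    """Return True only if every expected daily datapoint exists and peaked at zero."""
    # Fewer datapoints than days usually means the instance was stopped or
    # newly created for part of the window, which is not evidence of idleness.
    if len(datapoints) < expected_days:
        return False
    return all(dp["Maximum"] == 0 for dp in datapoints)


# Hypothetical samples shaped like response["Datapoints"]
quiet_week = [{"Maximum": 0.0} for _ in range(7)]
busy_week = quiet_week[:6] + [{"Maximum": 3.0}]

print(looks_idle(quiet_week))  # True
print(looks_idle(busy_week))   # False
```

Treating missing datapoints as "not idle" is a deliberately conservative choice: it avoids stopping a database simply because it was offline during part of the window.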

Reserved Instance Coverage: Trusted Advisor also flags when your Reserved Instance coverage drops below 80%. You can correlate this with Cost Explorer's RI Utilization report. Standard RIs that you can no longer use can often be listed for sale on the Reserved Instance Marketplace (Convertible RIs cannot be sold there).

3. Security Checks and Auto-Fix Lambda

The Security category is the one that can save you from a breach, not just a bill. Trusted Advisor continuously monitors your account for misconfigurations that attackers actively scan for. The most critical security checks cover network exposure, identity hygiene, and data access.

Critical Security Checks

Security Groups – Unrestricted Access — ports 22 (SSH), 3389 (RDP), 1433 (MSSQL), 3306 (MySQL) open to 0.0.0.0/0
Amazon S3 Bucket Permissions — publicly readable or writable buckets
MFA on Root Account — root credentials without MFA enabled
Amazon RDS Public Snapshots — snapshots shared publicly (data leak risk)
Amazon EBS Public Snapshots — same risk for EBS
IAM Use — whether IAM users/roles are being used instead of root
Exposed Access Keys — access keys committed to public GitHub repos (via partner integration)

Identify and revoke overly permissive security group rules:

# Find security groups with SSH (22) open to the world
aws ec2 describe-security-groups \
  --filters "Name=ip-permission.to-port,Values=22" \
             "Name=ip-permission.cidr,Values=0.0.0.0/0" \
  --query 'SecurityGroups[*].[GroupId,GroupName,Description]' \
  --output table

# Find security groups with RDP open to the world
aws ec2 describe-security-groups \
  --filters "Name=ip-permission.to-port,Values=3389" \
             "Name=ip-permission.cidr,Values=0.0.0.0/0" \
  --query 'SecurityGroups[*].[GroupId,GroupName,Description]' \
  --output table

# Revoke the offending rule — replace sg-xxx with actual SG ID
aws ec2 revoke-security-group-ingress \
  --group-id sg-0abc123def456789 \
  --protocol tcp \
  --port 22 \
  --cidr 0.0.0.0/0
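
One caveat with the filter-based searches above: the port filter and the CIDR filter may be satisfied by different rules in the same group, and an exact to-port match misses wide ranges such as 0-65535. A per-rule check avoids both problems. The sketch below assumes security group dictionaries shaped like the `SecurityGroups` entries returned by `describe_security_groups`; the function name `open_to_world` and the sample groups are hypothetical.

```python
def open_to_world(group, port):
    """Return True if a single ingress rule exposes `port` to 0.0.0.0/0 or ::/0."""
    for rule in group.get("IpPermissions", []):
        proto = rule.get("IpProtocol")
        if proto == "-1":
            covers_port = True  # "-1" means all protocols and all ports
        elif proto == "tcp":
            covers_port = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            covers_port = False
        if not covers_port:
            continue
        v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", []))
        v6 = any(r.get("CidrIpv6") == "::/0" for r in rule.get("Ipv6Ranges", []))
        if v4 or v6:
            return True
    return False


# SSH is internal-only here; only HTTPS is public, so this group is fine for port 22
mixed_group = {"GroupId": "sg-example1", "IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "10.0.0.0/8"}]},
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]}
# A wide port range that includes 22, open to everyone
wide_group = {"GroupId": "sg-example2", "IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 0, "ToPort": 65535,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]}

print(open_to_world(mixed_group, 22))  # False
print(open_to_world(wide_group, 22))   # True
```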

Lambda function that auto-fixes public RDS snapshots by making them private:

import boto3

def lambda_handler(event, context):
    """
    Auto-remediate public RDS snapshots flagged by Trusted Advisor.
    Triggered by EventBridge rule on Trusted Advisor check refresh.
    """
    rds = boto3.client('rds')
    sns = boto3.client('sns')
    SNS_TOPIC = 'arn:aws:sns:us-east-1:123456789012:security-alerts'

    remediated = []
    errors = []

    # Find all public DB snapshots owned by this account
    paginator = rds.get_paginator('describe_db_snapshots')
    for page in paginator.paginate(SnapshotType='manual', IncludePublic=False):
        for snap in page['DBSnapshots']:
            # Check if snapshot is shared publicly
            attrs = rds.describe_db_snapshot_attributes(
                DBSnapshotIdentifier=snap['DBSnapshotIdentifier']
            )
            for attr in attrs['DBSnapshotAttributesResult']['DBSnapshotAttributes']:
                if attr['AttributeName'] == 'restore' and 'all' in attr['AttributeValues']:
                    snap_id = snap['DBSnapshotIdentifier']
                    try:
                        # Remove public access
                        rds.modify_db_snapshot_attribute(
                            DBSnapshotIdentifier=snap_id,
                            AttributeName='restore',
                            ValuesToRemove=['all']
                        )
                        remediated.append(snap_id)
                        print(f"Remediated public snapshot: {snap_id}")
                    except Exception as e:
                        errors.append({'snapshot': snap_id, 'error': str(e)})

    # Notify via SNS/Slack
    if remediated or errors:
        message = f"""Security Auto-Remediation Report
Remediated Public RDS Snapshots: {len(remediated)}
{chr(10).join(remediated)}
Errors: {len(errors)}
{str(errors)}"""
        sns.publish(
            TopicArn=SNS_TOPIC,
            Subject=f'[SECURITY] Auto-fixed {len(remediated)} public RDS snapshots',
            Message=message
        )

    return {'remediated': remediated, 'errors': errors}

Prevention over remediation: Use AWS Config rules (rds-snapshots-public-prohibited, s3-bucket-public-read-prohibited, restricted-ssh) alongside Trusted Advisor for defense-in-depth. Config rules can auto-remediate via SSM Automation documents before Trusted Advisor even runs its next refresh.

4. Performance Checks

The Performance category identifies bottlenecks that are either degrading your application's speed or indicating that you're under-provisioning compute for workloads that are saturating available resources. Unlike cost checks that focus on idle resources, performance checks focus on resources that are too small or misconfigured for actual demand.

Key Performance Checks

High Utilization Amazon EC2 Instances — CPU >90% for 4 of 14 days; you're likely throttling
Amazon EBS Provisioned IOPS (SSD) Over-Provisioned Volumes — paying for IOPS you don't use
Amazon EC2 to EBS Throughput Optimization — throughput limited by instance type, not volume config
CloudFront Alternate Domain Names — CNAME records misconfigured, causing HTTPS errors
CloudFront Header Forwarding and Cache Hit Ratio — low cache hit ratio increases origin load and latency
Amazon Route 53 Alias Resource Record Sets — using CNAME instead of alias for AWS resources (higher latency)

Check EBS throughput performance for a specific instance and volume pair:

# Get EBS read/write throughput for the last 24 hours
aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS \
  --metric-name VolumeReadBytes \
  --dimensions Name=VolumeId,Value=vol-0abc123def456789 \
  --start-time 2026-06-08T00:00:00Z \
  --end-time 2026-06-09T00:00:00Z \
  --period 3600 \
  --statistics Sum \
  --output table

# Compare gp2 vs gp3, then migrate a volume to gp3 (the volume stays in use during the change)
# gp3 includes 3,000 IOPS and 125 MiB/s regardless of size; gp2 ties IOPS to size (3 IOPS per GiB)
aws ec2 modify-volume \
  --volume-id vol-0abc123def456789 \
  --volume-type gp3 \
  --iops 3000 \
  --throughput 125

# Monitor the modification
aws ec2 describe-volumes-modifications \
  --volume-ids vol-0abc123def456789 \
  --query 'VolumesModifications[*].[VolumeId,ModificationState,Progress]' \
  --output table
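
When deciding whether the gp3 settings above are a step up, it helps to know what the gp2 volume was delivering. This is a small sketch, assuming the published gp2 baseline of 3 IOPS per GiB with a floor of 100 and a ceiling of 16,000. The function names are illustrative.

```python
GP3_INCLUDED_IOPS = 3000  # included with every gp3 volume


def gp2_baseline_iops(size_gib):
    """gp2 baseline: 3 IOPS per GiB, floored at 100 and capped at 16,000."""
    return max(100, min(16000, 3 * size_gib))


def gp3_raises_baseline(size_gib):
    """True when gp3's included IOPS exceed what gp2 sustained at this size."""
    return gp2_baseline_iops(size_gib) < GP3_INCLUDED_IOPS


for size in (20, 500, 1000, 6000):
    print(size, gp2_baseline_iops(size), gp3_raises_baseline(size))
```

For volumes of 1,000 GiB and larger, gp2 already sustains 3,000 IOPS or more, so migrating with `--iops 3000` would lower the baseline. In that case, provision gp3 IOPS to at least match the gp2 figure.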

Improve CloudFront cache hit ratio. A low ratio means your origin is handling requests that the CDN should absorb:

# Check CloudFront cache hit rate over 7 days
# CloudFront publishes its metrics in us-east-1, and CacheHitRate is only
# reported once additional metrics are enabled on the distribution
aws cloudwatch get-metric-statistics \
  --region us-east-1 \
  --namespace AWS/CloudFront \
  --metric-name CacheHitRate \
  --dimensions Name=DistributionId,Value=EDFDVBD6EXAMPLE \
               Name=Region,Value=Global \
  --start-time 2026-06-02T00:00:00Z \
  --end-time 2026-06-09T00:00:00Z \
  --period 86400 \
  --statistics Average \
  --output table

CloudFront cache hit ratio: A ratio below 80% usually means you are forwarding too many headers or query strings to the origin. Review your cache behavior settings — forward only the headers and query strings that actually affect the response. Set Cache-Control: max-age=86400 on static assets.

5. Fault Tolerance Checks

Fault tolerance checks identify single points of failure: configurations that will cause an outage when an AZ goes down, a disk fails, or a network partition occurs. These are the checks that keep an AWS service event from becoming your outage. Most are simple to fix and have a meaningful impact on your Recovery Time Objective (RTO).

Key Fault Tolerance Checks

Amazon RDS Multi-AZ — production databases without standby replica
Amazon S3 Bucket Versioning — buckets without versioning enabled (no protection from accidental deletes)
Load Balancer Optimization — ELB not distributing traffic across multiple AZs
Auto Scaling Group Resources — ASGs referencing terminated instances or deleted AMIs
Amazon EC2 Availability Zone Balance — ASG instances heavily skewed toward one AZ
Amazon Aurora DB Instance Accessibility — Aurora clusters that mix public and private instances, so a failover can change reachability

Enable S3 versioning on all buckets that don't have it:

# List all buckets without versioning
for bucket in $(aws s3api list-buckets --query 'Buckets[*].Name' --output text); do
    status=$(aws s3api get-bucket-versioning --bucket "$bucket" \
             --query 'Status' --output text 2>/dev/null)
    if [ "$status" != "Enabled" ]; then
        echo "No versioning: $bucket"
    fi
done

# Enable versioning on a specific bucket
aws s3api put-bucket-versioning \
  --bucket my-critical-data-bucket \
  --versioning-configuration Status=Enabled

# Enable Multi-AZ on an existing RDS instance (no outage, but expect higher I/O latency while the standby is created)
aws rds modify-db-instance \
  --db-instance-identifier my-production-db \
  --multi-az \
  --apply-immediately

Fix an ELB that is only routing traffic to a single AZ:

# List the AZs and subnets currently attached to an ALB
aws elbv2 describe-load-balancers \
  --names my-alb \
  --query 'LoadBalancers[*].AvailabilityZones' \
  --output json

# Add a second AZ. set-subnets replaces the whole list, so include the
# subnets that are already attached along with the new ones
aws elbv2 set-subnets \
  --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/abc123 \
  --subnets subnet-0abc123 subnet-0def456 subnet-0ghi789

Auto Scaling AZ rebalancing: If your ASG instances are concentrated in one AZ (often after a scale-in event), make sure the group is configured for more than one AZ. Rebalancing is on by default, so once the group spans several AZs the ASG will launch new instances in the underrepresented AZ and terminate excess instances elsewhere. Configure the AZs:

aws autoscaling update-auto-scaling-group --auto-scaling-group-name my-asg --availability-zones us-east-1a us-east-1b us-east-1c
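
To see whether a group is actually skewed before changing it, count instances per AZ. This is a minimal sketch, assuming you already have the list of AZ names for the group's instances (for example from `describe_auto_scaling_groups`). The `tolerance` of one instance is an arbitrary threshold chosen for illustration.

```python
from collections import Counter


def az_counts(instance_azs, expected_azs):
    """Count instances per expected AZ, including AZs that have none."""
    counts = Counter(instance_azs)
    return {az: counts.get(az, 0) for az in expected_azs}


def is_skewed(instance_azs, expected_azs, tolerance=1):
    """Skewed when the busiest and emptiest AZ differ by more than `tolerance`."""
    per_az = az_counts(instance_azs, expected_azs).values()
    return max(per_az) - min(per_az) > tolerance


zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
lopsided = ["us-east-1a"] * 4 + ["us-east-1b"]
balanced = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1a"]

print(az_counts(lopsided, zones))
print(is_skewed(lopsided, zones), is_skewed(balanced, zones))
```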

6. Service Limits and Requesting Increases via CLI

AWS imposes default quotas on almost every service: EC2 instance counts, VPCs per region, Lambda concurrency, API Gateway requests per second. When you hit a limit, requests are rejected or throttled, often with errors that do not obviously point at a quota. Trusted Advisor's Service Limits checks warn you when usage passes 80% of a limit they track, giving you time to request an increase before it becomes an incident.

Common Limits to Watch

EC2 Running On-Demand Instances — 32 vCPUs default for most regions on new accounts
VPCs per Region — 5 by default
Elastic IPs per Region — 5 by default
Lambda Concurrent Executions — 1,000 per region by default
API Gateway APIs per Region — 600 by default
ECS Task Definitions — 2,000 task definition families per region

Check current EC2 vCPU limits vs usage:

# Check EC2 On-Demand Instance vCPU limits per instance family
aws service-quotas list-service-quotas \
  --service-code ec2 \
  --query 'Quotas[?contains(QuotaName, `On-Demand`) && contains(QuotaName, `vCPUs`)].[QuotaName,Value]' \
  --output table

# Count running instances by type (a rough proxy: the quota itself is measured in vCPUs)
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].[InstanceType]' \
  --output text | sort | uniq -c | sort -rn | head -20

# Request a limit increase programmatically via Service Quotas
aws service-quotas request-service-quota-increase \
  --service-code ec2 \
  --quota-code L-1216C47A \
  --desired-value 192

# Check status of your quota increase requests
aws service-quotas list-requested-service-quota-change-history-by-service \
  --service-code ec2 \
  --query 'RequestedQuotas[*].[QuotaName,DesiredValue,Status,CaseId]' \
  --output table
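
Once you have quota values and usage figures from commands like the ones above, the 80% rule is easy to apply yourself. This is a sketch under the assumption that you have already collected both into plain dictionaries keyed by a name of your choosing; `quota_alerts` and the sample numbers are hypothetical.

```python
def quota_alerts(usage, quotas, threshold=0.8):
    """Return (name, used, limit, ratio) for each quota at or above the threshold."""
    alerts = []
    for name, limit in quotas.items():
        used = usage.get(name, 0)
        if limit > 0 and used / limit >= threshold:
            alerts.append((name, used, limit, round(used / limit, 2)))
    # Most constrained quota first
    return sorted(alerts, key=lambda alert: alert[3], reverse=True)


sample_quotas = {"vcpus": 32, "vpcs": 5, "eips": 5}
sample_usage = {"vcpus": 26, "vpcs": 2, "eips": 5}

for alert in quota_alerts(sample_usage, sample_quotas):
    print(alert)
```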

List current Lambda concurrency and request an increase:

# Get current Lambda concurrency limit
aws lambda get-account-settings \
  --query '[AccountLimit.ConcurrentExecutions, AccountLimit.UnreservedConcurrentExecutions]' \
  --output table

# Request increase to 5000 concurrent executions
aws service-quotas request-service-quota-increase \
  --service-code lambda \
  --quota-code L-B99A9384 \
  --desired-value 5000

Pro tip: For EC2 vCPU limits, modest increases are often approved automatically and quickly, while larger ones may be routed to AWS Support, who may contact you to understand the use case. Request increases before you need them, not during an incident when you're trying to scale to handle traffic.

7. Trusted Advisor API with Python boto3

The Trusted Advisor API is exposed through the support boto3 client, and it must be called against us-east-1 regardless of where your resources run. The AWS Support API is a global service whose endpoint lives in that region. The API lets you list all available checks, get current check results, and trigger a refresh of stale results.

Important: The Trusted Advisor API requires Business or Enterprise support. Calls from accounts on Basic or Developer support return an empty check list or SubscriptionRequiredException. Always catch this exception in automation code.

import boto3
import json
from datetime import datetime

# TA API is ALWAYS called against us-east-1
support = boto3.client('support', region_name='us-east-1')

def list_all_checks(language='en'):
    """List all available Trusted Advisor checks with metadata."""
    response = support.describe_trusted_advisor_checks(language=language)
    checks = response['checks']
    print(f"Total checks available: {len(checks)}")

    # Group by category
    by_category = {}
    for check in checks:
        cat = check['category']
        by_category.setdefault(cat, []).append({
            'id': check['id'],
            'name': check['name'],
            'description': check['description'][:80] + '...'
        })

    for category, items in by_category.items():
        print(f"\n=== {category.upper()} ({len(items)} checks) ===")
        for item in items:
            print(f"  [{item['id']}] {item['name']}")

    return checks

def get_check_result(check_id):
    """Get the current result for a specific Trusted Advisor check."""
    try:
        response = support.describe_trusted_advisor_check_result(
            checkId=check_id,
            language='en'
        )
        result = response['result']

        print(f"Check ID: {check_id}")
        print(f"Status: {result['status']}")  # ok, warning, error, not_available
        print(f"Timestamp: {result['timestamp']}")
        print(f"Resources Processed: {result['resourcesSummary']['resourcesProcessed']}")
        print(f"Resources Flagged: {result['resourcesSummary']['resourcesFlagged']}")
        print(f"Resources Suppressed: {result['resourcesSummary']['resourcesSuppressed']}")

        if result['flaggedResources']:
            print(f"\nFlagged Resources ({len(result['flaggedResources'])}):")
            for resource in result['flaggedResources'][:10]:  # show first 10
                print(f"  Status: {resource['status']}, Metadata: {resource['metadata']}")

        return result
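    except support.exceptions.ClientError as e:
        # Match on the error code rather than relying on a modeled exception class
        if e.response['Error']['Code'] == 'SubscriptionRequiredException':
            print("ERROR: Business or Enterprise support required for full check access")
            return None
        raise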

def refresh_check(check_id):
    """Request a refresh of a specific Trusted Advisor check."""
    try:
        response = support.refresh_trusted_advisor_check(checkId=check_id)
        status = response['status']
        print(f"Check {check_id} refresh status: {status['status']}")
        print(f"Milliseconds before refreshable: {status['millisUntilNextRefreshable']}")
        return status
    except Exception as e:
        print(f"Error refreshing check {check_id}: {e}")
        return None

def get_all_flagged_resources(category_filter=None):
    """
    Pull all flagged resources across all checks.
    Optionally filter by category: 'cost_optimizing', 'security',
    'performance', 'fault_tolerance', 'service_limits'
    """
    support_client = boto3.client('support', region_name='us-east-1')

    # Get all checks
    checks = support_client.describe_trusted_advisor_checks(language='en')['checks']

    flagged = []
    for check in checks:
        if category_filter and check['category'] != category_filter:
            continue

        try:
            result = support_client.describe_trusted_advisor_check_result(
                checkId=check['id'], language='en'
            )['result']

            if result['status'] in ('warning', 'error'):
                for resource in result.get('flaggedResources', []):
                    flagged.append({
                        'check_name': check['name'],
                        'check_id': check['id'],
                        'category': check['category'],
                        'resource_status': resource['status'],
                        'metadata': resource['metadata'],
                        'timestamp': result['timestamp']
                    })
        except Exception as e:
            # Skip checks that fail (e.g., not enough permissions), but say so
            print(f"Skipping check {check['id']}: {e}")

    print(f"Total flagged resources: {len(flagged)}")
    return flagged

# Example usage:
if __name__ == '__main__':
    # List all checks
    checks = list_all_checks()

    # Get all security findings
    security_flags = get_all_flagged_resources(category_filter='security')

    # Export to JSON
    with open('trusted_advisor_security_report.json', 'w') as f:
        json.dump({
            'generated_at': datetime.utcnow().isoformat(),
            'flagged_count': len(security_flags),
            'findings': security_flags
        }, f, indent=2, default=str)
    print("Report saved to trusted_advisor_security_report.json")
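
Because each check has a minimum interval between refreshes, it is worth asking which checks can be refreshed right now before calling `refresh_trusted_advisor_check` in a loop. The sketch below assumes the `statuses` list returned by `describe_trusted_advisor_check_refresh_statuses`, where each entry carries `checkId` and `millisUntilNextRefreshable`. The helper name and sample data are hypothetical.

```python
def split_by_refreshability(statuses):
    """Separate checks that can be refreshed now from those still cooling down."""
    ready, waiting = [], {}
    for entry in statuses:
        if entry["millisUntilNextRefreshable"] == 0:
            ready.append(entry["checkId"])
        else:
            # Convert to seconds for easier reading in logs
            waiting[entry["checkId"]] = entry["millisUntilNextRefreshable"] / 1000
    return ready, waiting


# Hypothetical sample shaped like response["statuses"]
sample_statuses = [
    {"checkId": "Qch7DwouX1", "status": "none", "millisUntilNextRefreshable": 0},
    {"checkId": "HCP4007jGY", "status": "success", "millisUntilNextRefreshable": 240000},
]

ready, waiting = split_by_refreshability(sample_statuses)
print("Refresh now:", ready)
print("Seconds to wait:", waiting)
```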

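Common check IDs you will use frequently (confirm them against the describe_trusted_advisor_checks output for your account before hard-coding them):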

COMMON_CHECK_IDS = {
    # Cost Optimization
    'low_utilization_ec2': 'Qch7DwouX1',
    'idle_load_balancers': 'hjLMh88uM8',
    'underutilized_ebs': 'DAvU99Dc4C',
    'unassociated_eips': 'Z4AUBRNSmz',
    'idle_rds': 'Ti39halfu8',
    # Security
    'sg_unrestricted_ssh': 'HCP4007jGY',
    'mfa_root_account': '7DAFEmoDos',
    's3_bucket_permissions': 'Pfx0RwqBli',
    'rds_public_snapshots': 'rSs93HQwa1',
    'ebs_public_snapshots': 'ePs02jT06w',
    # Fault Tolerance
    'rds_multi_az': 'f2iK5R6Dep',
    's3_bucket_versioning': 'R365s2Qddf',
    # Service Limits
    'ec2_on_demand_limit': '0Xc6LMYG8P',
    'vpc_limit': '0MVa5of05L',
}
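
Hard-coded IDs can drift if a check is retired or renamed, so it may be worth validating a table like the one above against the live check list at startup. This is a minimal sketch, assuming the `checks` list returned by `describe_trusted_advisor_checks`; the helper names and the sample entries are illustrative.

```python
def index_checks(checks):
    """Map check ID to check name from a describe_trusted_advisor_checks result."""
    return {check["id"]: check["name"] for check in checks}


def stale_ids(expected, checks):
    """Return the keys of `expected` whose IDs are missing from the live check list."""
    live = index_checks(checks)
    return sorted(key for key, check_id in expected.items() if check_id not in live)


# Hypothetical sample shaped like response["checks"]
live_checks = [
    {"id": "Qch7DwouX1", "name": "Low Utilization Amazon EC2 Instances"},
    {"id": "Z4AUBRNSmz", "name": "Unassociated Elastic IP Addresses"},
]
expected_ids = {"low_utilization_ec2": "Qch7DwouX1", "made_up_check": "XXXXXXXXXX"}

print(stale_ids(expected_ids, live_checks))  # ['made_up_check']
```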

8. Automated Remediation with EventBridge and Lambda

Querying Trusted Advisor by hand is useful, but the real power comes from automation that acts without waiting for a person. When a check status changes from ok to warning, you want action within the hour, not after someone reads a weekly report. The architecture is: an EventBridge scheduled rule triggers Lambda every hour, Lambda pulls all flagged resources, auto-remediates safe items, and posts a Slack summary for human review of riskier items.

Lambda function for automated Trusted Advisor remediation:

import boto3
import json
import os
import urllib.request
from datetime import datetime, timezone

support = boto3.client('support', region_name='us-east-1')
ec2 = boto3.client('ec2')
SLACK_WEBHOOK = os.environ.get('SLACK_WEBHOOK_URL', '')

def post_to_slack(message: str, color: str = 'good'):
    """Post a formatted message to Slack via webhook."""
    if not SLACK_WEBHOOK:
        print(f"Slack: {message}")
        return
    payload = json.dumps({
        'attachments': [{
            'color': color,
            'text': message,
            'footer': f'Trusted Advisor Auto-Remediation • {datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")}'
        }]
    }).encode('utf-8')
    req = urllib.request.Request(
        SLACK_WEBHOOK,
        data=payload,
        headers={'Content-Type': 'application/json'}
    )
    # Bound the call so a slow webhook cannot hold the Lambda until it times out
    urllib.request.urlopen(req, timeout=10)

def remediate_unassociated_eips():
    """Auto-release Elastic IPs that have no association (safe to auto-fix)."""
    response = ec2.describe_addresses()
    released = []
    for addr in response['Addresses']:
        if 'AssociationId' not in addr:
            allocation_id = addr['AllocationId']
            public_ip = addr['PublicIp']
            try:
                ec2.release_address(AllocationId=allocation_id)
                released.append(f"{public_ip} ({allocation_id})")
                print(f"Released EIP: {public_ip}")
            except Exception as e:
                print(f"Could not release {allocation_id}: {e}")
    return released

def get_flagged_by_check(check_id):
    """Get flagged resources for a specific check."""
    try:
        result = support.describe_trusted_advisor_check_result(
            checkId=check_id, language='en'
        )['result']
        if result['status'] in ('warning', 'error'):
            return result.get('flaggedResources', [])
    except Exception as e:
        print(f"Error getting check {check_id}: {e}")
    return []

def lambda_handler(event, context):
    report = {
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'auto_remediated': [],
        'needs_review': [],
        'clean_checks': []
    }

    # --- AUTO-REMEDIATE: Unassociated Elastic IPs (low risk, though a released address may not be recoverable) ---
    released_eips = remediate_unassociated_eips()
    if released_eips:
        report['auto_remediated'].append({
            'action': 'Released unassociated Elastic IPs',
            'resources': released_eips,
            'savings_est': f'~${len(released_eips) * 3.6:.2f}/month'
        })

    # --- NOTIFY: Idle EC2 instances (need human confirmation before stopping) ---
    idle_ec2_check = 'Qch7DwouX1'
    idle_instances = get_flagged_by_check(idle_ec2_check)
    for resource in idle_instances:
        # Positional layout assumed here: [0]=region/AZ, [1]=instance ID, [2]=instance name,
        # [3]=instance type, [4]=estimated monthly savings. Confirm against the check's
        # 'metadata' headers from describe_trusted_advisor_checks.
        if len(resource['metadata']) >= 5:
            report['needs_review'].append({
                'type': 'Idle EC2 Instance',
                'region': resource['metadata'][0],
                'instance_id': resource['metadata'][1],
                'instance_type': resource['metadata'][3],
                'estimated_savings': resource['metadata'][4]
            })

    # --- NOTIFY: Security group issues (require change control) ---
    sg_check = 'HCP4007jGY'
    open_sgs = get_flagged_by_check(sg_check)
    for resource in open_sgs:
        # Positional layout assumed here: [0]=region, [1]=group name, [2]=group ID, [3]=protocol.
        # Confirm against the check's 'metadata' headers before relying on it.
        meta = resource['metadata'] or []
        report['needs_review'].append({
            'type': 'Unrestricted Security Group',
            'region': meta[0] if meta else 'unknown',
            'sg_id': meta[2] if len(meta) > 2 else 'unknown',
            'protocol': meta[3] if len(meta) > 3 else 'unknown'
        })

    # Build Slack summary
    auto_count = len(report['auto_remediated'])
    review_count = len(report['needs_review'])

    slack_msg = f"""*AWS Trusted Advisor Auto-Remediation Report*
✅ Auto-remediated: {auto_count} actions
⚠️  Needs human review: {review_count} items

*Auto-Remediated Actions:*
"""
    for action in report['auto_remediated']:
        slack_msg += f"• {action['action']}: {len(action['resources'])} resources (saves {action['savings_est']})\n"

    if report['needs_review']:
        slack_msg += "\n*Items Requiring Review:*\n"
        for item in report['needs_review'][:10]:  # cap at 10 in Slack
            if item['type'] == 'Idle EC2 Instance':
                slack_msg += f"• Idle EC2: `{item['instance_id']}` ({item['instance_type']}) — saves {item['estimated_savings']}/mo\n"
            elif item['type'] == 'Unrestricted Security Group':
                slack_msg += f"• Open SG: `{item['sg_id']}` ({item['protocol']}) in {item['region']}\n"

    color = 'good' if review_count == 0 else ('warning' if review_count < 5 else 'danger')
    post_to_slack(slack_msg, color)

    print(json.dumps(report, indent=2))
    return report
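
Positional indexing into `metadata` is fragile, because each check defines its own column order. A more robust approach is to pair each flagged resource with the column headers that `describe_trusted_advisor_checks` returns in each check's `metadata` field. The sketch below assumes that shape; the helper name and the sample headers and values are illustrative rather than copied from a live account.

```python
def resource_as_dict(check, resource):
    """Pair a flagged resource's positional metadata with the check's column headers."""
    return dict(zip(check["metadata"], resource["metadata"]))


# Hypothetical check definition and flagged resource
sample_check = {
    "id": "Qch7DwouX1",
    "metadata": ["Region/AZ", "Instance ID", "Instance Name",
                 "Instance Type", "Estimated Monthly Savings"],
}
sample_resource = {
    "status": "warning",
    "metadata": ["us-east-1a", "i-0123456789abcdef0", "web-1",
                 "m5.2xlarge", "$276.48"],
}

row = resource_as_dict(sample_check, sample_resource)
print(row["Instance ID"], row["Instance Type"], row["Estimated Monthly Savings"])
```

Looking fields up by header name means the remediation code keeps working if a check gains or reorders columns, and a missing header fails loudly with a KeyError instead of silently reporting the wrong field.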

EventBridge rule to trigger this Lambda hourly:

# Create EventBridge rule - every hour
aws events put-rule \
  --name "TrustedAdvisorAutoRemediation" \
  --schedule-expression "rate(1 hour)" \
  --state ENABLED \
  --description "Hourly Trusted Advisor auto-remediation"

# Add Lambda as target
aws events put-targets \
  --rule "TrustedAdvisorAutoRemediation" \
  --targets '[{
    "Id": "ta-remediation-lambda",
    "Arn": "arn:aws:lambda:us-east-1:123456789012:function:ta-auto-remediation"
  }]'

# Grant EventBridge permission to invoke Lambda
aws lambda add-permission \
  --function-name ta-auto-remediation \
  --statement-id AllowEventBridgeInvoke \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn arn:aws:events:us-east-1:123456789012:rule/TrustedAdvisorAutoRemediation
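
As an alternative to polling on a schedule, Trusted Advisor also publishes events to EventBridge in us-east-1 when a check item is refreshed. The sketch below shows an event pattern and a small parser, assuming the event shape with source `aws.trustedadvisor`, detail-type `Trusted Advisor Check Item Refresh Notification`, and detail keys `check-name`, `status`, `resource_id`, and `check-item-detail`. Verify these field names against a real event before relying on them; `summarize_ta_event` and the sample event are hypothetical.

```python
import json

# Pattern for an EventBridge rule that matches only warning and error items
TA_EVENT_PATTERN = {
    "source": ["aws.trustedadvisor"],
    "detail-type": ["Trusted Advisor Check Item Refresh Notification"],
    "detail": {"status": ["WARN", "ERROR"]},
}


def summarize_ta_event(event):
    """Pull the fields a remediation handler needs out of a Trusted Advisor event."""
    detail = event.get("detail", {})
    return {
        "check": detail.get("check-name"),
        "status": detail.get("status"),
        "resource": detail.get("resource_id"),
        "item": detail.get("check-item-detail", {}),
    }


# Hypothetical event for illustration
sample_event = {
    "source": "aws.trustedadvisor",
    "detail-type": "Trusted Advisor Check Item Refresh Notification",
    "detail": {
        "check-name": "Low Utilization Amazon EC2 Instances",
        "status": "WARN",
        "resource_id": "arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0",
        "check-item-detail": {"Instance ID": "i-0123456789abcdef0"},
    },
}

print(json.dumps(TA_EVENT_PATTERN))  # value to pass as the rule's event pattern
print(summarize_ta_event(sample_event))
```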

9. Organizational View and Delegated Admin

If you run multiple AWS accounts under AWS Organizations, you want a single pane of glass for Trusted Advisor findings across all accounts — not logging into each one individually. The Trusted Advisor organizational view lets a management account (or a delegated administrator account) aggregate checks across the entire organization.

Setting Up Organizational View

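# Enable trusted access for Trusted Advisor from the management account
# (enabling organizational view in the Trusted Advisor console also sets this up)
aws organizations enable-aws-service-access \
  --service-principal reporting.trustedadvisor.amazonaws.com

# Verify it's enabled
aws organizations list-aws-service-access-for-organization \
  --query 'EnabledServicePrincipals[*].ServicePrincipal' \
  --output table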

# List all accounts in the organization
aws organizations list-accounts \
  --query 'Accounts[*].[Id,Name,Status]' \
  --output table

Pull findings across all organization accounts with Python:

import boto3

def get_org_trusted_advisor_report(check_id: str):
    """
    Get the Trusted Advisor summary for one check in the calling account.
    describe_trusted_advisor_check_summaries does not aggregate across the
    organization; use generate_org_cost_report() below for a per-account sweep.
    """
    support = boto3.client('support', region_name='us-east-1')

    try:
        response = support.describe_trusted_advisor_check_summaries(
            checkIds=[check_id]
        )
        summary = response['summaries'][0]
        print(f"Check: {check_id}")
        print(f"Status: {summary['status']}")
        print(f"Flagged: {summary['resourcesSummary']['resourcesFlagged']}")
        print(f"Processed: {summary['resourcesSummary']['resourcesProcessed']}")
        return summary
    except Exception as e:
        print(f"Error: {e}")
        return None

def suppress_check_for_account(check_id: str, account_id: str, reason: str):
    """
    Suppress a Trusted Advisor check for a specific account.
    Useful for known false positives (e.g., intentionally public S3 website buckets).
    """
    # Suppression is done via the Trusted Advisor console or by using
    # resource-level suppression in the individual account
    support_in_account = boto3.client(
        'support',
        region_name='us-east-1',
        # In production, assume a cross-account role here
    )

    # Get the check result to find the resource ID
    result = support_in_account.describe_trusted_advisor_check_result(
        checkId=check_id, language='en'
    )['result']

    print(f"Found {len(result['flaggedResources'])} flagged resources in account {account_id}")
    # The Support API used here has no operation for excluding individual
    # resources, so exclude them from the Trusted Advisor console
    return result

def generate_org_cost_report():
    """Generate a cost optimization report across all org accounts."""
    orgs = boto3.client('organizations')
    sts = boto3.client('sts')

    # list_accounts is paginated, so walk every page
    accounts = []
    for page in orgs.get_paginator('list_accounts').paginate():
        accounts.extend(page['Accounts'])

    findings_by_account = {}

    cost_checks = [
        ('Qch7DwouX1', 'Idle EC2 Instances'),
        ('hjLMh88uM8', 'Idle Load Balancers'),
        ('Z4AUBRNSmz', 'Unassociated Elastic IPs'),
        ('Ti39halfu8', 'Idle RDS Instances'),
    ]

    for account in accounts:
        account_id = account['Id']
        account_name = account['Name']
        account_findings = []

        # Assume the cross-account role once per account, not once per check.
        # Accounts without this role (often the management account) are skipped.
        try:
            assumed = sts.assume_role(
                RoleArn=f'arn:aws:iam::{account_id}:role/OrganizationAccountAccessRole',
                RoleSessionName='ta-org-audit'
            )
        except Exception as e:
            print(f"Skipping {account_name} ({account_id}): {e}")
            continue

        credentials = assumed['Credentials']
        support = boto3.client(
            'support',
            region_name='us-east-1',
            aws_access_key_id=credentials['AccessKeyId'],
            aws_secret_access_key=credentials['SecretAccessKey'],
            aws_session_token=credentials['SessionToken']
        )

        for check_id, check_name in cost_checks:
            try:
                result = support.describe_trusted_advisor_check_result(
                    checkId=check_id, language='en'
                )['result']
            except Exception as e:
                # Typically the account's support tier is too low for this check
                print(f"{account_name}: could not read {check_name}: {e}")
                continue

            flagged = result['resourcesSummary']['resourcesFlagged']
            if flagged > 0:
                account_findings.append({
                    'check': check_name,
                    'flagged': flagged
                })

        if account_findings:
            findings_by_account[account_name] = account_findings

    return findings_by_account

# Run the org report
report = generate_org_cost_report()
for account, findings in report.items():
    print(f"\n{account}:")
    for f in findings:
        print(f"  • {f['check']}: {f['flagged']} flagged resources")
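
The per-account dictionary returned above is convenient for printing but awkward for ranking. A minimal sketch of rolling it up by check, assuming the same {account_name: [{'check': ..., 'flagged': ...}]} shape; the function name summarize_by_check and the sample data are hypothetical and only for illustration:

```python
from collections import defaultdict


def summarize_by_check(findings_by_account):
    """Roll per-account findings up into totals per check, largest first."""
    totals = defaultdict(lambda: {'flagged': 0, 'accounts': []})
    for account_name, findings in findings_by_account.items():
        for finding in findings:
            entry = totals[finding['check']]
            entry['flagged'] += finding['flagged']
            entry['accounts'].append(account_name)
    rows = [
        (check, data['flagged'], sorted(data['accounts']))
        for check, data in totals.items()
    ]
    # Sort by flagged count so the biggest source of waste comes first
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Illustrative input only, not real account data
sample_report = {
    'prod': [{'check': 'Idle EC2 Instances', 'flagged': 4},
             {'check': 'Unassociated Elastic IPs', 'flagged': 2}],
    'dev': [{'check': 'Idle EC2 Instances', 'flagged': 7}],
}

for check, flagged, accounts in summarize_by_check(sample_report):
    print(f"{check}: {flagged} flagged across {', '.join(accounts)}")
```

Ranking by check rather than by account tends to show which remediation is worth automating first.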

Delegated Admin: Instead of running everything from the management account, designate a security or operations account as the Trusted Advisor delegated administrator. This follows the principle of least privilege: the management account should be touched as rarely as possible. Register it with aws organizations register-delegated-administrator --account-id 123456789012 --service-principal reporting.trustedadvisor.amazonaws.com (reporting.trustedadvisor.amazonaws.com is the service principal Trusted Advisor uses with AWS Organizations).

10. Integrating with Cost Explorer and Tracking Improvement

Trusted Advisor tells you what to fix. Cost Explorer tells you whether your fixes actually saved money. The two services are designed to be used together — Trusted Advisor surfaces idle RDS instances; Cost Explorer shows you exactly how much they cost last month and confirms the drop in spending after you stop them.

Get cost breakdown by service for the last 30 days to correlate with Trusted Advisor findings:

import boto3
from datetime import datetime, timedelta

ce = boto3.client('ce', region_name='us-east-1')

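def get_service_costs_last_30_days():
    """Get cost breakdown by service for the last 30 days."""
    today = datetime.utcnow().date()
    end = today.strftime('%Y-%m-%d')  # Cost Explorer treats End as exclusive
    start = (today - timedelta(days=30)).strftime('%Y-%m-%d')

    # A 30-day window usually spans two calendar months, so MONTHLY granularity
    # returns more than one period. Sum every period and every result page.
    totals = {}
    kwargs = {
        'TimePeriod': {'Start': start, 'End': end},
        'Granularity': 'MONTHLY',
        'Metrics': ['UnblendedCost'],
        'GroupBy': [{'Type': 'DIMENSION', 'Key': 'SERVICE'}],
    }
    while True:
        response = ce.get_cost_and_usage(**kwargs)
        for period in response['ResultsByTime']:
            for group in period['Groups']:
                service = group['Keys'][0]
                cost = float(group['Metrics']['UnblendedCost']['Amount'])
                totals[service] = totals.get(service, 0.0) + cost
        if not response.get('NextPageToken'):
            break
        kwargs['NextPageToken'] = response['NextPageToken']

    results = [
        {'service': service, 'cost_usd': round(cost, 2)}
        for service, cost in totals.items()
        if cost > 1.0  # Filter out tiny amounts
    ]

    results.sort(key=lambda x: x['cost_usd'], reverse=True)
    print(f"\nTop services by cost ({start} to {end}):")
    for r in results[:15]:
        print(f"  ${r['cost_usd']:>10.2f}  {r['service']}")
    return results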

def track_savings_over_time(resource_tag_key: str, resource_tag_value: str):
    """Track cost for a tagged resource before and after TA remediation."""
    # The tag must be activated as a cost allocation tag, or Cost Explorer reports no cost for it
    ce = boto3.client('ce', region_name='us-east-1')

    today = datetime.utcnow().date()
    this_month_start = today.replace(day=1)
    last_month_start = (this_month_start - timedelta(days=1)).replace(day=1)

    # Cost Explorer treats End as exclusive, so month-to-date covers day 1
    # through yesterday: today.day - 1 complete days.
    days_elapsed = today.day - 1
    if days_elapsed == 0:
        raise ValueError("No complete days in the current month yet; rerun tomorrow")

    def get_tagged_cost(start, end):
        # Let API errors propagate: returning 0.0 on failure would be
        # reported as if the resource now cost nothing.
        r = ce.get_cost_and_usage(
            TimePeriod={'Start': start.isoformat(), 'End': end.isoformat()},
            Granularity='MONTHLY',
            Filter={
                'Tags': {
                    'Key': resource_tag_key,
                    'Values': [resource_tag_value]
                }
            },
            Metrics=['UnblendedCost']
        )
        return sum(
            float(period['Total']['UnblendedCost']['Amount'])
            for period in r['ResultsByTime']
        )

    # Before remediation (last month) and after remediation (this month to date)
    before = get_tagged_cost(last_month_start, this_month_start)
    after = get_tagged_cost(this_month_start, today)

    # Project month-to-date spend to a 30-day month, then annualize the difference
    monthly_run_rate = (after / days_elapsed) * 30
    monthly_savings = before - monthly_run_rate
    annual_savings = monthly_savings * 12

    print(f"Cost for tag {resource_tag_key}={resource_tag_value}:")
    print(f"  Last month: ${before:.2f}")
    print(f"  This month run rate: ${monthly_run_rate:.2f}")
    print(f"  Estimated monthly savings: ${monthly_savings:.2f}")
    print(f"  Projected annual savings: ${annual_savings:.2f}")
    return {'before': before, 'after_run_rate': monthly_run_rate, 'annual_savings': annual_savings}

# Generate the full report
get_service_costs_last_30_days()
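
The run-rate arithmetic inside track_savings_over_time is easier to check in isolation. A small worked sketch with made-up numbers, assuming a 30-day month; project_savings is a hypothetical helper written for this example, not part of any AWS SDK:

```python
def project_savings(last_month_cost, month_to_date_cost, days_elapsed, days_in_month=30):
    """Project month-to-date spend to a full month and compare with last month."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    # Average daily spend so far, scaled up to a full month
    run_rate = month_to_date_cost / days_elapsed * days_in_month
    monthly_savings = last_month_cost - run_rate
    return {
        'run_rate': round(run_rate, 2),
        'monthly_savings': round(monthly_savings, 2),
        'annual_savings': round(monthly_savings * 12, 2),
    }


# Illustrative figures: $900 last month, $200 spent in the first 10 days of this month
example = project_savings(last_month_cost=900.0, month_to_date_cost=200.0, days_elapsed=10)
print(example)
```

Here $200 over 10 days is $20 per day, or $600 for a 30-day month, so the projected saving is $300 per month and $3,600 per year. The projection is only as good as the days observed so far, so early-month numbers should be treated as rough.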

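Create a CloudWatch dashboard that shows estimated charges alongside the remediation Lambda's invocations and errors (billing metrics exist only in us-east-1 and require billing alerts to be enabled in the account's billing preferences):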

{
  "widgets": [
    {
      "type": "metric",
      "x": 0, "y": 0, "width": 12, "height": 6,
      "properties": {
        "title": "Estimated Charges by Service (Month to Date)",
        "metrics": [
          ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonEC2", "Currency", "USD",
           {"stat": "Maximum", "period": 86400}],
          ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonRDS", "Currency", "USD",
           {"stat": "Maximum", "period": 86400}],
          ["AWS/Billing", "EstimatedCharges", "ServiceName", "AmazonS3", "Currency", "USD",
           {"stat": "Maximum", "period": 86400}]
        ],
        "view": "timeSeries",
        "period": 86400,
        "region": "us-east-1"
      }
    },
    {
      "type": "metric",
      "x": 12, "y": 0, "width": 12, "height": 6,
      "properties": {
        "title": "Lambda Invocations (Remediation Function)",
        "metrics": [
          ["AWS/Lambda", "Invocations",
           "FunctionName", "ta-auto-remediation",
           {"stat": "Sum", "period": 3600}],
          ["AWS/Lambda", "Errors",
           "FunctionName", "ta-auto-remediation",
           {"stat": "Sum", "period": 3600, "color": "#d62728"}]
        ],
        "view": "timeSeries",
        "period": 3600,
        "region": "us-east-1"
      }
    }
  ]
}

Create this dashboard via CLI:

# Save the JSON above to dashboard.json, then:
aws cloudwatch put-dashboard \
  --dashboard-name "TrustedAdvisorOperations" \
  --dashboard-body file://dashboard.json

# Create a CloudWatch alarm when TA remediation Lambda errors
aws cloudwatch put-metric-alarm \
  --alarm-name "TA-Remediation-Lambda-Errors" \
  --metric-name Errors \
  --namespace AWS/Lambda \
  --dimensions Name=FunctionName,Value=ta-auto-remediation \
  --statistic Sum \
  --period 3600 \
  --threshold 3 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:platform-alerts

Closing the loop: Run a monthly comparison report. The Trusted Advisor API returns only current results, so snapshot the flagged resource counts on the first of each month and compare them against actual Cost Explorer spend for the same period. If your TA flagged resources go down month-over-month, your costs should follow. This creates a virtuous cycle: automation finds waste → cost drops → leadership sees ROI → more investment in automation.
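
A minimal sketch of that month-over-month comparison, assuming you already store one snapshot per month somewhere (DynamoDB, S3, or a file). The snapshot shape, the field names flagged_total and cost_usd, and the figures are all hypothetical:

```python
def compare_months(previous, current):
    """Compare two monthly snapshots of flagged counts and spend."""
    flagged_delta = current['flagged_total'] - previous['flagged_total']
    cost_delta = round(current['cost_usd'] - previous['cost_usd'], 2)
    # "aligned" means findings and spend both dropped, or neither dropped
    aligned = (flagged_delta < 0) == (cost_delta < 0)
    return {
        'from': previous['month'],
        'to': current['month'],
        'flagged_delta': flagged_delta,
        'cost_delta_usd': cost_delta,
        'aligned': aligned,
    }


# Illustrative snapshots only
january = {'month': '2026-01', 'flagged_total': 42, 'cost_usd': 18250.0}
february = {'month': '2026-02', 'flagged_total': 30, 'cost_usd': 17100.5}

print(compare_months(january, february))
```

When aligned is False, for example fewer findings but flat spend, that is usually the month worth investigating: the waste may have moved to a service that none of the tracked checks cover.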

FAQ: AWS Trusted Advisor

Q: How often does Trusted Advisor refresh its checks?

Most checks refresh automatically every 24 hours. You can trigger a manual refresh via the console or the API (refresh_trusted_advisor_check), but each check has a minimum refresh interval (typically 5 to 60 minutes depending on the check), and checks that AWS refreshes automatically can reject manual refresh requests with an InvalidParameterValue error. The millisUntilNextRefreshable field in the API response tells you exactly when you can refresh again. A refresh is asynchronous: for production automation, schedule your Lambda to run hourly, request the refresh, wait until describe_trusted_advisor_check_refresh_statuses reports success, and only then read the results.
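
A sketch of the refresh-then-read pattern, written so the Support client is passed in. The helper name refresh_and_wait, the timeout values, and the FakeSupport stand-in (included only so the example runs without AWS credentials) are assumptions; the two client methods are the boto3 Support API calls named above:

```python
import time


def refresh_and_wait(support, check_id, timeout_seconds=300, poll_seconds=10, sleep=time.sleep):
    """Request a refresh, then poll its status. Returns the final status string."""
    try:
        support.refresh_trusted_advisor_check(checkId=check_id)
    except Exception as exc:
        # Some checks cannot be refreshed manually; fall back to the existing result
        print(f"Refresh not accepted for {check_id}: {exc}")
        return 'not_refreshable'

    waited = 0
    while waited <= timeout_seconds:
        statuses = support.describe_trusted_advisor_check_refresh_statuses(
            checkIds=[check_id]
        )['statuses']
        state = statuses[0]['status']
        if state in ('success', 'abandoned', 'none'):
            return state
        sleep(poll_seconds)
        waited += poll_seconds
    return 'timeout'


class FakeSupport:
    """Stand-in for boto3.client('support'); returns a scripted status sequence."""

    def __init__(self):
        self.states = ['enqueued', 'processing', 'success']

    def refresh_trusted_advisor_check(self, checkId):
        return {'status': {'checkId': checkId, 'status': 'enqueued',
                           'millisUntilNextRefreshable': 3600000}}

    def describe_trusted_advisor_check_refresh_statuses(self, checkIds):
        return {'statuses': [{'checkId': checkIds[0], 'status': self.states.pop(0),
                              'millisUntilNextRefreshable': 0}]}


print(refresh_and_wait(FakeSupport(), 'Qch7DwouX1', sleep=lambda seconds: None))
```

In a real Lambda you would pass boto3.client('support', region_name='us-east-1') instead of the fake, and keep the timeout comfortably below the function's own timeout.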

Q: Trusted Advisor shows a check as "green" but I know there's a problem. Why?

Trusted Advisor checks have specific thresholds. For example, the low-utilization EC2 check flags an instance only when its daily CPU utilization was 10% or less and its network I/O was 5 MB or less on at least 4 of the previous 14 days. An instance that idles at a steady 12% CPU, or one with near-zero CPU but constant network traffic, will never be flagged even though it may be badly oversized. For more nuanced analysis, use AWS Compute Optimizer, which runs its own ML-based analysis independently of Trusted Advisor. The two tools complement each other, so use both.
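
To see why a threshold rule can miss an obviously wasteful instance, here is a rough sketch of the rule as a pure function. It assumes the thresholds AWS documents for this check (daily CPU of 10% or less and network I/O of 5 MB or less, on at least 4 of the last 14 days); the function name and the sample series are hypothetical, and this is not how Trusted Advisor is implemented internally:

```python
def is_low_utilization(daily_cpu_percent, daily_network_mb,
                       cpu_threshold=10.0, network_threshold_mb=5.0, min_days=4):
    """True when at least min_days of the supplied days are low on both CPU and network."""
    low_days = sum(
        1
        for cpu, net in zip(daily_cpu_percent, daily_network_mb)
        if cpu <= cpu_threshold and net <= network_threshold_mb
    )
    return low_days >= min_days


# Steady 12% CPU for 14 days: oversized, but never under the threshold
print(is_low_utilization([12.0] * 14, [1.0] * 14))

# Near-idle CPU but busy network: not flagged either
print(is_low_utilization([3.0] * 14, [50.0] * 14))

# Five quiet days out of fourteen: flagged
print(is_low_utilization([3.0] * 5 + [40.0] * 9, [1.0] * 14))
```

The first two cases are the kind of finding that a utilization-history tool such as Compute Optimizer is better placed to catch.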

Q: Can I suppress a Trusted Advisor finding so it doesn't show up as red?

Yes. In the Trusted Advisor console, you can "exclude" specific resources from a check. The resource moves to the "Excluded Items" tab and stops counting against the check's status. This is useful for intentionally public S3 buckets (like static website hosting), or development security groups that are intentionally open. Suppressions are stored at the account level and persist until removed.

Q: How do I integrate Trusted Advisor alerts into PagerDuty or Opsgenie?

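Create an SNS topic and subscribe your alerting service's integration endpoint (email or HTTPS webhook) to it; PagerDuty and Opsgenie both support SNS subscriptions. Then create CloudWatch alarms on Trusted Advisor metrics, which are published in us-east-1: aws cloudwatch put-metric-alarm --namespace "AWS/TrustedAdvisor" --metric-name "RedChecks" --threshold 1 --alarm-actions arn:aws:sns:us-east-1:...:your-topic. The command is abbreviated here: put-metric-alarm also needs --alarm-name, --statistic, --period, --evaluation-periods, and --comparison-operator, and RedChecks is published per check category, so add a Category dimension. When a check in that category turns red, the alarm fires, SNS publishes, and your on-call engineer gets paged.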
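
A sketch of the full set of arguments such an alarm needs, built as a plain dictionary so it can be inspected before anything is created. The helper name, the alarm name pattern, the one-hour period, and the 'Security' category value are assumptions; confirm the exact dimension values in your account with list-metrics on the AWS/TrustedAdvisor namespace before relying on them:

```python
def red_checks_alarm_kwargs(category, sns_topic_arn):
    """Build keyword arguments for CloudWatch put_metric_alarm on RedChecks."""
    return {
        'AlarmName': f"TA-RedChecks-{category.replace(' ', '')}",
        'Namespace': 'AWS/TrustedAdvisor',
        'MetricName': 'RedChecks',
        # Category-level metrics are keyed by the Category dimension (value assumed)
        'Dimensions': [{'Name': 'Category', 'Value': category}],
        'Statistic': 'Maximum',
        'Period': 3600,  # assumed evaluation window, tune to taste
        'EvaluationPeriods': 1,
        'Threshold': 1,
        'ComparisonOperator': 'GreaterThanOrEqualToThreshold',
        # Treat gaps in the metric as healthy rather than paging on missing data
        'TreatMissingData': 'notBreaching',
        'AlarmActions': [sns_topic_arn],
    }


kwargs = red_checks_alarm_kwargs(
    'Security', 'arn:aws:sns:us-east-1:123456789012:platform-alerts'
)
print(kwargs['AlarmName'])

# With credentials configured, the alarm would be created with:
#   import boto3
#   boto3.client('cloudwatch', region_name='us-east-1').put_metric_alarm(**kwargs)
```

Creating one alarm per category keeps pages specific: a red security check and a red cost check rarely deserve the same urgency.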

Q: We have 50 accounts. What's the most efficient way to audit all of them?

Use the organizational view: enable it from your management account in the Trusted Advisor console (this grants trusted access to the reporting.trustedadvisor.amazonaws.com service principal; the aws support CLI has no command for it), then designate a delegated admin account. From that account, you can see aggregated check statuses across all accounts. For per-account details, build an automation Lambda that uses sts:AssumeRole to switch into each account and pulls individual check results. The OrganizationAccountAccessRole is created automatically only in accounts that were created through AWS Organizations; accounts that were invited into the organization need the role created manually. Store results in DynamoDB or S3 for historical trend analysis.