AWS CloudWatch: Metrics, Alarms, Logs and Dashboards (2026)
CloudWatch is AWS's observability platform — it collects metrics from every AWS service, stores application logs, triggers alarms, and surfaces dashboards. This guide covers the full observability stack: custom metrics, structured logging, powerful Logs Insights queries, composite alarms, and Container Insights for Kubernetes.
1. Metrics and Namespaces
CloudWatch organises metrics into namespaces (e.g., AWS/EC2, AWS/RDS, AWS/Lambda). Each metric has dimensions that narrow it to a specific resource. Key built-in metrics:
- EC2: CPUUtilization, NetworkIn/Out, DiskReadOps (note: RAM and disk % require CloudWatch Agent)
- Lambda: Duration, Errors, Throttles, ConcurrentExecutions, IteratorAge (for stream triggers such as Kinesis and DynamoDB Streams)
- RDS: CPUUtilization, DatabaseConnections, FreeStorageSpace, ReadLatency, WriteLatency
- ALB: RequestCount, TargetResponseTime, HTTPCode_Target_5XX_Count, UnHealthyHostCount
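To make the namespace and dimension idea concrete, here is a minimal sketch that builds a request for the Lambda Errors metric of a single function. The helper names (build_stats_request, fetch_error_counts) are hypothetical, and the function name passed in is only an example; the actual call assumes boto3 is installed and AWS credentials are configured.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_stats_request(function_name: str, minutes: int = 60,
                        now: Optional[datetime] = None) -> dict:
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        # The dimension narrows the metric to one specific function.
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,
        "Statistics": ["Sum"],
    }


def fetch_error_counts(function_name: str) -> list:
    """Query CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without it

    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(**build_stats_request(function_name))
    # Datapoints are not guaranteed to be ordered, so sort by timestamp.
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
```

Changing the namespace, metric name, and dimension (for example AWS/RDS, DatabaseConnections, DBInstanceIdentifier) should be enough to reuse the same shape for other services.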
2. Publishing Custom Metrics
Push custom application metrics with the AWS SDK:
import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

def record_order_processed(order_value: float):
    cloudwatch.put_metric_data(
        Namespace='MyApp/Orders',
        MetricData=[
            {
                'MetricName': 'OrdersProcessed',
                'Value': 1,
                'Unit': 'Count',
                'Dimensions': [
                    {'Name': 'Environment', 'Value': 'production'},
                    {'Name': 'Region', 'Value': 'us-east-1'}
                ]
            },
            {
                'MetricName': 'OrderValue',
                'Value': order_value,
                'Unit': 'None',
                'Dimensions': [{'Name': 'Environment', 'Value': 'production'}]
            }
        ]
    )
Or via CLI for quick testing:
aws cloudwatch put-metric-data \
--namespace "MyApp/Orders" \
--metric-name "OrdersProcessed" \
--value 1 \
--unit Count \
--dimensions Environment=production
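When an application publishes many datapoints, calling put_metric_data once per value gets expensive. The sketch below batches entries into fewer calls. The helper names are hypothetical, and the default batch size of 1000 is an assumption based on the documented per-request metric limit; lower it if your payloads approach the request size limit.

```python
from typing import Iterator, List


def chunk_metric_data(metric_data: List[dict],
                      max_per_call: int = 1000) -> Iterator[List[dict]]:
    """Yield slices of metric_data no longer than max_per_call."""
    if max_per_call < 1:
        raise ValueError("max_per_call must be at least 1")
    for start in range(0, len(metric_data), max_per_call):
        yield metric_data[start:start + max_per_call]


def publish(client, namespace: str, metric_data: List[dict],
            max_per_call: int = 1000) -> int:
    """Send metric_data in batches; returns the number of API calls made.

    client is expected to be boto3.client('cloudwatch') or an object
    with a compatible put_metric_data method.
    """
    calls = 0
    for batch in chunk_metric_data(metric_data, max_per_call):
        client.put_metric_data(Namespace=namespace, MetricData=batch)
        calls += 1
    return calls
```

A typical pattern is to collect entries in memory during a request or a work loop and call publish once at the end.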
3. Alarms and Composite Alarms
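Create an alarm on Lambda errors with SNS notification. Because the statistic is Sum, the threshold of 5 is an error count per 60-second period, not a percentage:
aws cloudwatch put-metric-alarm \
--alarm-name "lambda-high-error-rate" \
--alarm-description "Lambda errors >= 5 per minute for 3 consecutive minutes" \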
--namespace AWS/Lambda \
--metric-name Errors \
--dimensions Name=FunctionName,Value=myapp-processor \
--statistic Sum \
--period 60 \
--evaluation-periods 3 \
--threshold 5 \
--comparison-operator GreaterThanOrEqualToThreshold \
--treat-missing-data notBreaching \
--alarm-actions arn:aws:sns:us-east-1:123456789012:myapp-alerts \
--ok-actions arn:aws:sns:us-east-1:123456789012:myapp-alerts
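The CLI alarm above thresholds on the Sum of Errors, which is a count. If a true percentage is wanted, one option is a metric math alarm that divides Errors by Invocations. The following is a sketch of the arguments for boto3's put_metric_alarm; the helper name, the alarm name suffix, and the topic ARN in the usage comment are illustrative assumptions.

```python
def build_error_rate_alarm(function_name: str, threshold_percent: float,
                           topic_arn: str) -> dict:
    """Build put_metric_alarm kwargs for a Lambda error-rate alarm."""
    dimensions = [{"Name": "FunctionName", "Value": function_name}]

    def lambda_sum(metric_id: str, metric_name: str) -> dict:
        # Input metrics feed the expression and are not alarmed on directly.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": 60,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{function_name}-error-rate",
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "EvaluationPeriods": 3,
        "Threshold": threshold_percent,
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
        "Metrics": [
            lambda_sum("errors", "Errors"),
            lambda_sum("invocations", "Invocations"),
            {
                # Only this expression returns data, so the alarm watches it.
                "Id": "error_rate",
                "Expression": "100 * errors / invocations",
                "Label": "Error rate (%)",
                "ReturnData": True,
            },
        ],
    }

# Usage (requires boto3 and credentials; the ARN is a placeholder):
# boto3.client("cloudwatch").put_metric_alarm(
#     **build_error_rate_alarm("myapp-processor", 5.0,
#                              "arn:aws:sns:us-east-1:123456789012:myapp-alerts"))
```

With metric math, the alarm takes a Metrics list instead of the top-level Namespace, MetricName, Statistic, and Period arguments.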
Composite alarms combine multiple alarms with AND/OR logic, which reduces alert noise. This example assumes an alarm named alb-5xx-high already exists:
aws cloudwatch put-composite-alarm \
--alarm-name "service-degraded" \
--alarm-rule "ALARM(lambda-high-error-rate) AND ALARM(alb-5xx-high)" \
--alarm-actions arn:aws:sns:us-east-1:123456789012:myapp-critical
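When the list of child alarms is generated by tooling, the rule string can be assembled programmatically. This is a small sketch with a hypothetical helper name; it uses the quoted ALARM("name") form and does not handle alarm names that themselves contain double quotes.

```python
from typing import Iterable


def build_alarm_rule(alarm_names: Iterable[str], operator: str = "AND") -> str:
    """Join child alarm names into a composite alarm rule expression."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    terms = [f'ALARM("{name}")' for name in alarm_names]
    if not terms:
        raise ValueError("at least one alarm name is required")
    return f" {operator} ".join(terms)
```

The resulting string can be passed as --alarm-rule on the CLI or as AlarmRule to boto3's put_composite_alarm.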
Set treat-missing-data to notBreaching (as in the alarm above) or ignore for metrics that are only published when events occur, such as Lambda invocations during low-traffic hours. Otherwise quiet periods can push the alarm into INSUFFICIENT_DATA or cause false alarms at night.
4. CloudWatch Logs and Log Groups
Structure your logs as JSON for powerful querying — avoid plain text logs in production:
import json, logging, time

class JsonFormatter(logging.Formatter):
    def format(self, record):
        log_obj = {
            # Use the record's creation time, not the time of formatting
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Include the request ID when the caller supplies one via extra={...}
        if hasattr(record, 'request_id'):
            log_obj['request_id'] = record.request_id
        return json.dumps(log_obj)

logger = logging.getLogger()
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
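The formatter only emits request_id when the log record carries that attribute. One way to attach it is the standard extra argument of the logging calls. The sketch below uses a compact variant of the formatter so that it runs on its own; the logger name and the helper log_with_request_id are hypothetical.

```python
import io
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Compact variant of the formatter above, repeated for self-containment."""

    def format(self, record):
        log_obj = {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ",
                                       time.gmtime(record.created)),
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        if hasattr(record, "request_id"):
            log_obj["request_id"] = record.request_id
        return json.dumps(log_obj)


def log_with_request_id(stream, request_id: str, message: str) -> dict:
    """Log one message with a request_id and return the parsed JSON line."""
    logger = logging.getLogger("myapp.orders")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo output on our handler only
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    try:
        # Keys in extra become attributes on the LogRecord.
        logger.info(message, extra={"request_id": request_id})
    finally:
        logger.removeHandler(handler)
    return json.loads(stream.getvalue().strip().splitlines()[-1])
```

In a real service the stream would be stdout or stderr, and the request ID would come from the incoming request or the Lambda context.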
Set log group retention to avoid unbounded storage costs:
aws logs put-retention-policy \
--log-group-name /aws/lambda/myapp-processor \
--retention-in-days 30
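Log groups created automatically (for example by Lambda) have no retention policy unless one is set. The following sketch applies a default to every group that lacks one. The helper names are hypothetical, logs_client is assumed to be boto3.client('logs'), and the 30-day default simply mirrors the CLI example above.

```python
from typing import List


def groups_without_retention(log_groups: List[dict]) -> List[str]:
    """Return names of log groups that have no retentionInDays set."""
    return [g["logGroupName"] for g in log_groups if "retentionInDays" not in g]


def apply_default_retention(logs_client, days: int = 30) -> List[str]:
    """Set a retention policy on every log group that has none."""
    paginator = logs_client.get_paginator("describe_log_groups")
    updated = []
    for page in paginator.paginate():
        for name in groups_without_retention(page["logGroups"]):
            logs_client.put_retention_policy(logGroupName=name,
                                             retentionInDays=days)
            updated.append(name)
    return updated
```

Running this on a schedule is one way to catch new log groups; it leaves groups with an existing policy untouched.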
5. Logs Insights Queries
CloudWatch Logs Insights lets you run SQL-like queries across log groups. Essential queries:
# Find the 10 slowest Lambda invocations (set the time range to the last hour)
fields @timestamp, @duration, @requestId
| filter @type = "REPORT"
| sort @duration desc
| limit 10

# Count errors by message
fields @timestamp, level, message
| filter level = "ERROR"
| stats count(*) as error_count by message
| sort error_count desc

# P99, P95 and average API response times
fields @timestamp, responseTime
| filter ispresent(responseTime)
| stats pct(responseTime, 99) as p99, pct(responseTime, 95) as p95, avg(responseTime) as avg_ms
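Queries like these can also be run from code. Logs Insights is asynchronous: a query is started, then polled until it finishes. The sketch below follows that flow with boto3's start_query and get_query_results; the helper names are hypothetical, and logs_client is assumed to be boto3.client('logs').

```python
import time
from typing import List


def rows_to_dicts(results: List[list]) -> List[dict]:
    """Convert Logs Insights rows ([{field, value}, ...]) into plain dicts."""
    return [{cell["field"]: cell["value"] for cell in row} for row in results]


def run_query(logs_client, log_group: str, query: str,
              start_epoch: int, end_epoch: int,
              poll_seconds: float = 1.0) -> List[dict]:
    """Start a Logs Insights query and poll until it reaches a final state."""
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_epoch,  # epoch seconds
        endTime=end_epoch,
        queryString=query,
    )["queryId"]
    while True:
        response = logs_client.get_query_results(queryId=query_id)
        if response["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            break
        time.sleep(poll_seconds)
    if response["status"] != "Complete":
        raise RuntimeError("query ended with status " + response["status"])
    return rows_to_dicts(response["results"])
```

All values come back as strings, so numeric fields such as @duration need converting before any arithmetic.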
6. CloudWatch Agent
The CloudWatch Agent runs on EC2 and collects OS-level metrics (memory, disk) that aren't available by default. Install and configure:
# Install on Amazon Linux 2023
sudo dnf install amazon-cloudwatch-agent -y
# Create config file
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
A minimal agent config for memory and disk:
{
  "metrics": {
    "namespace": "CWAgent",
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": ["used_percent"],
        "metrics_collection_interval": 60,
        "resources": ["/"]
      }
    }
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [{
          "file_path": "/var/log/myapp/app.log",
          "log_group_name": "/ec2/myapp",
          "log_stream_name": "{instance_id}"
        }]
      }
    }
  }
}
7. Container Insights for EKS
Container Insights collects cluster, node, pod and container-level metrics from EKS. Enable it with the Amazon CloudWatch Observability add-on:
aws eks create-addon \
--cluster-name my-cluster \
--addon-name amazon-cloudwatch-observability \
--service-account-role-arn arn:aws:iam::123456789012:role/CloudWatchAgentRole
This deploys the CloudWatch agent as a DaemonSet and enables Container Insights metrics in the ContainerInsights namespace — including pod_cpu_utilization, pod_memory_utilization, node_cpu_utilization, and cluster_failed_node_count.
8. Dashboards
Create a dashboard via CLI with widgets as JSON:
aws cloudwatch put-dashboard \
--dashboard-name "MyApp-Production" \
--dashboard-body '{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "title": "Lambda Errors",
        "metrics": [["AWS/Lambda", "Errors", "FunctionName", "myapp-processor"]],
        "region": "us-east-1",
        "period": 60,
        "stat": "Sum",
        "view": "timeSeries"
      }
    }
  ]
}'
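Hand-written dashboard JSON becomes hard to maintain once there are more than a few widgets. A possible approach is to generate the body in Python and pass it to boto3's put_dashboard. The helper names below are hypothetical, and the region is included on each widget on the assumption that metric widgets need it.

```python
import json
from typing import List


def metric_widget(title: str, metrics: List[list], region: str,
                  stat: str = "Sum", period: int = 60) -> dict:
    """Build one time-series metric widget."""
    return {
        "type": "metric",
        "properties": {
            "title": title,
            "metrics": metrics,
            "region": region,
            "period": period,
            "stat": stat,
            "view": "timeSeries",
        },
    }


def dashboard_body(widgets: List[dict]) -> str:
    """Serialize widgets into the JSON string expected as DashboardBody."""
    return json.dumps({"widgets": widgets})

# Usage (requires boto3 and credentials):
# body = dashboard_body([metric_widget(
#     "Lambda Errors",
#     [["AWS/Lambda", "Errors", "FunctionName", "myapp-processor"]],
#     "us-east-1")])
# boto3.client("cloudwatch").put_dashboard(
#     DashboardName="MyApp-Production", DashboardBody=body)
```

Keeping the widget definitions in code also makes it straightforward to stamp out one dashboard per environment.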
Frequently Asked Questions
Why can't I see memory usage for my EC2 instances in CloudWatch?
Memory and disk utilization are OS-level metrics that AWS cannot access from the hypervisor. Install and configure the CloudWatch Agent on your EC2 instances to collect and publish these metrics to the custom CWAgent namespace.
How much does CloudWatch cost?
Key cost drivers: custom metrics ($0.30/metric/month after 10 free), Logs ingestion ($0.50/GB), Logs storage ($0.03/GB/month), Logs Insights queries ($0.005 per GB scanned). Avoid dumping huge volumes of verbose logs — use structured JSON and filter at the source.
What is the difference between a metric filter and Logs Insights?
Metric filters run continuously and convert matching log events into CloudWatch metrics in real time — useful for creating alarms on log patterns. Logs Insights is an ad-hoc query engine for historical analysis. Use metric filters for alerting, Logs Insights for investigation.
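As an illustration of the alerting side, the sketch below builds arguments for boto3's put_metric_filter that count JSON log events whose level field is ERROR. The helper name, filter name, and the metric name and namespace are hypothetical choices; the pattern assumes logs are structured JSON with a level field, as in the logging section.

```python
def build_error_metric_filter(log_group: str,
                              namespace: str = "MyApp/Logs",
                              metric_name: str = "ErrorCount") -> dict:
    """Build put_metric_filter kwargs that count ERROR-level JSON events."""
    return {
        "logGroupName": log_group,
        "filterName": "error-count",  # hypothetical filter name
        # JSON filter pattern: match events whose "level" field is ERROR.
        "filterPattern": '{ $.level = "ERROR" }',
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                "metricValue": "1",   # add 1 per matching event
                "defaultValue": 0.0,  # report 0 when nothing matches
            }
        ],
    }

# Usage (requires boto3 and credentials):
# boto3.client("logs").put_metric_filter(
#     **build_error_metric_filter("/aws/lambda/myapp-processor"))
```

The resulting metric can then be used in a regular alarm, in the same way as any other custom metric.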
How do I avoid CloudWatch alarm noise?
Use composite alarms to require multiple conditions before alerting, evaluate N of M datapoints (datapoints-to-alarm) rather than reacting to a single spike, use treat-missing-data notBreaching for sparse metrics, and suppress notifications during maintenance windows with a composite alarm's actions suppressor.
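To see why N of M evaluation cuts noise, here is a simplified model of the rule. It is only a sketch: the function name is hypothetical, it uses a greater-than-or-equal comparison, and it ignores how CloudWatch handles missing datapoints.

```python
from typing import Sequence


def is_alarming(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int, datapoints_to_alarm: int) -> bool:
    """Return True if enough of the most recent datapoints breach.

    evaluation_periods is M (the window size) and datapoints_to_alarm
    is N (how many points in that window must breach).
    """
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value >= threshold)
    return breaching >= datapoints_to_alarm
```

With a threshold of 5 and a 3 of 5 rule, the series 1, 9, 2, 8, 1 has only two breaching points and stays quiet, while 1, 9, 2, 8, 7 has three and would alarm.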