AWS CloudWatch: Comprehensive Monitoring and Observability

AWS CloudWatch is a powerful monitoring and observability service that provides data and actionable insights to optimize application performance, resource utilization, and overall operational health.

Monitoring Metrics

When hosting NailYourInterview.org on EC2 instances and using CloudFront for content delivery, several key metrics can be monitored to ensure optimal performance:

  • 🌐 CPU Utilization: Track the CPU usage of EC2 instances to ensure they aren't overloaded and can handle incoming requests efficiently.

  • 💾 Memory Usage: Monitor memory consumption to prevent performance bottlenecks and ensure that applications have enough resources to run smoothly. EC2 does not publish memory metrics by default; they are collected by the CloudWatch Agent.

  • 📈 Request Count: Keep track of the number of HTTP requests to gauge the traffic load on the site, helping to manage scaling and capacity planning.

  • 🔄 Cache Hit Rate: For CloudFront, monitor how often content is served from the cache versus the origin server to optimize caching strategies and reduce latency.

These metrics help in understanding usage patterns and the overall performance of the infrastructure.
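
As a rough sketch of how these metrics could be read programmatically, the helper below builds the arguments for CloudWatch's GetMetricStatistics API. The resource IDs are hypothetical placeholders, and the commented boto3 call assumes boto3 is installed and credentials are configured.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(namespace, metric_name, dimensions, statistic,
                       hours=1, period_seconds=300):
    """Build keyword arguments for CloudWatch's GetMetricStatistics API."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,  # seconds per data point
        "Statistics": [statistic],
    }


# Hypothetical resource IDs; replace them with your own.
cpu_query = build_metric_query(
    "AWS/EC2", "CPUUtilization",
    {"InstanceId": "i-0123456789abcdef0"}, "Average")
requests_query = build_metric_query(
    "AWS/CloudFront", "Requests",
    {"DistributionId": "E1EXAMPLE", "Region": "Global"}, "Sum")

# With boto3 available, a query could be sent like this:
#   import boto3
#   cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
#   datapoints = cloudwatch.get_metric_statistics(**requests_query)["Datapoints"]
```

CloudFront metrics are published in the US East (N. Virginia) region, while EC2 metrics live in the region of the instance, so the client region should match the metric being queried.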

Types of Metrics

Default Metrics

AWS CloudWatch provides default metrics automatically for AWS services:

  • 🔄 Basic Monitoring: For EC2, data is published every 5 minutes at no additional charge.

  • ⏱️ Detailed Monitoring: For EC2, data is published every minute, providing more granular insights (additional cost applies).

  • 📊 Standard Resolution: The default is a 1-minute resolution for most metrics, which is sufficient for tracking general performance.

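  • 🔬 High Resolution: Allows capturing data points as frequently as every second for more precise monitoring, useful for highly dynamic environments. High resolution is available for custom metrics that are published with a 1-second storage resolution.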
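
The resolution of a metric limits which query periods make sense. The sketch below encodes the period rule as documented for the GetMetricStatistics API; the function name is an illustrative choice, not part of any AWS SDK.

```python
# Sub-minute periods accepted only for high-resolution metrics.
HIGH_RESOLUTION_PERIODS = (1, 5, 10, 30)


def is_valid_period(period_seconds, high_resolution=False):
    """Return True if the period is acceptable for the metric's resolution.

    Standard-resolution metrics need a multiple of 60 seconds.
    High-resolution metrics additionally accept 1, 5, 10, or 30 seconds.
    """
    if period_seconds <= 0:
        return False
    if period_seconds % 60 == 0:
        return True
    return high_resolution and period_seconds in HIGH_RESOLUTION_PERIODS


print(is_valid_period(300))                       # 5-minute period
print(is_valid_period(10))                        # too fine for standard resolution
print(is_valid_period(10, high_resolution=True))  # fine for high resolution
```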

Custom Metrics

To collect application-specific metrics, configure the CloudWatch Agent on EC2 instances or publish data points directly with the PutMetricData API. This setup is useful for monitoring custom data such as user activity, application performance, or any other metric specific to your needs.
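
A minimal sketch of publishing a custom metric through the PutMetricData API is shown below. The namespace `NailYourInterview/App`, the metric name, and the dimension are hypothetical; the commented boto3 call assumes boto3 is installed and credentials are configured.

```python
def build_custom_metric(namespace, name, value, unit="Count",
                        dimensions=None, high_resolution=False):
    """Build keyword arguments for CloudWatch's PutMetricData API."""
    if namespace.startswith("AWS/"):
        # AWS advises against custom namespaces that begin with "AWS/".
        raise ValueError("custom namespaces should not start with 'AWS/'")
    datum = {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        # 1 = high resolution (per second), 60 = standard resolution.
        "StorageResolution": 1 if high_resolution else 60,
    }
    if dimensions:
        datum["Dimensions"] = [
            {"Name": k, "Value": v} for k, v in dimensions.items()
        ]
    return {"Namespace": namespace, "MetricData": [datum]}


# Hypothetical application metric: number of currently active users.
payload = build_custom_metric(
    "NailYourInterview/App", "ActiveUsers", 42,
    dimensions={"Environment": "production"})

# With boto3 available:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**payload)
```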

Setting Up Alarms

CloudWatch Alarms help you stay ahead of potential issues by triggering actions based on predefined thresholds:

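⚠️ High CPU Utilization Alarm: For NailYourInterview.org, configure an alarm that fires when average CPU utilization stays above 80% for 10 minutes (for example, two consecutive 5-minute evaluation periods). The alarm can publish to an SNS topic that notifies you by email or SMS, allowing you to investigate or scale resources.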
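
One way to express the CPU alarm is as arguments for the PutMetricAlarm API, sketched below. The 10-minute window is modeled as two 5-minute periods, which is an assumption about how to split it; the instance ID and SNS topic ARN are hypothetical placeholders.

```python
def build_cpu_alarm(instance_id, sns_topic_arn, threshold_percent=80.0,
                    period_seconds=300, evaluation_periods=2):
    """Build keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "AlarmDescription": "Average CPU above threshold",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_seconds,                  # length of one period
        "EvaluationPeriods": evaluation_periods,   # consecutive breaching periods
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],           # notified when the alarm fires
    }


# Hypothetical identifiers; replace them with real values.
cpu_alarm = build_cpu_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts")

# With boto3 available:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm)
```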

Note

CloudWatch Alarms can trigger automated scaling actions, adjusting resources based on real-time demand.

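🔔 Low Cache Hit Rate Alarm: If CloudFront's cache hit rate falls below 60%, an alarm can notify you to adjust caching settings or investigate content delivery issues. Cache hit rate is one of CloudFront's additional metrics, so it must be enabled on the distribution (at extra cost) before an alarm can use it. Amazon SNS (Simple Notification Service) can be used to send notifications via email or SMS.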
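
The cache hit rate alarm could be sketched the same way, again as PutMetricAlarm arguments. The distribution ID, topic ARN, and the choice of three 5-minute periods are assumptions made for illustration.

```python
def build_cache_hit_alarm(distribution_id, sns_topic_arn,
                          threshold_percent=60.0):
    """Build PutMetricAlarm arguments for a low CloudFront cache hit rate."""
    return {
        "AlarmName": f"low-cache-hit-rate-{distribution_id}",
        "Namespace": "AWS/CloudFront",
        "MetricName": "CacheHitRate",
        "Dimensions": [
            {"Name": "DistributionId", "Value": distribution_id},
            {"Name": "Region", "Value": "Global"},
        ],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 3,  # assumed: 15 minutes below the threshold
        "Threshold": threshold_percent,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical identifiers; replace them with real values.
cache_alarm = build_cache_hit_alarm(
    "E1EXAMPLE", "arn:aws:sns:us-east-1:123456789012:ops-alerts")

# CloudFront metrics live in us-east-1, so the alarm is created there:
#   import boto3
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**cache_alarm)
```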

Managing Logs

CloudWatch Logs capture detailed information from your instances, helping you analyze and troubleshoot issues:

Example Scenario: To capture details like requested URL, timestamp, user IP address, and user agent, configure your server to generate access logs.

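Example Nginx Access Log Entry (shown up to the status code; the combined format also records bytes sent, referrer, and user agent):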

192.168.1.1 - - [03/Aug/2024:12:00:00 +0000] "GET /about HTTP/1.1" 200
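
To illustrate what such an entry contains, the sketch below parses a line into named fields with a regular expression. The pattern is an assumption based on Nginx's combined format and treats the fields after the status code as optional, so it also accepts the shortened entry above.

```python
import re
from datetime import datetime

# Fields after the status code are optional so shortened lines still parse.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" (?P<status>\d{3})'
    r'(?: (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)


def parse_access_log(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


entry = parse_access_log(
    '192.168.1.1 - - [03/Aug/2024:12:00:00 +0000] "GET /about HTTP/1.1" 200')
print(entry["ip"], entry["method"], entry["path"], entry["status"])
```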

Steps to Configure CloudWatch Logs:

  1. Enable Access Logs: Update your Nginx configuration to log requests.

    access_log /var/log/nginx/access.log combined;

  2. Install CloudWatch Agent: Deploy the CloudWatch Agent on your EC2 instance to collect logs and metrics.

  3. Configure CloudWatch Agent: Edit the amazon-cloudwatch-agent.json file to specify which log files to monitor:

    {
      "logs": {
        "logs_collected": {
          "files": {
            "collect_list": [
              {
                "file_path": "/var/log/nginx/access.log",
                "log_group_name": "NailYourInterviewAccessLogs",
                "log_stream_name": "{instance_id}"
              }
            ]
          }
        }
      }
    }

  4. Start CloudWatch Agent: Start or restart the agent to begin sending logs to CloudWatch.

You will now have a log group (e.g., NailYourInterviewAccessLogs) in CloudWatch Logs where you can view and analyze the collected logs.
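
Once logs arrive in the group, they can be searched through the CloudWatch Logs FilterLogEvents API. The sketch below only builds the request arguments; the filter pattern and time window are illustrative choices, and the commented boto3 call assumes boto3 is installed and credentials are configured.

```python
from datetime import datetime, timedelta, timezone


def build_log_filter(log_group_name, filter_pattern, minutes=60, limit=100):
    """Build keyword arguments for the CloudWatch Logs FilterLogEvents API."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    return {
        "logGroupName": log_group_name,
        "filterPattern": filter_pattern,
        # The API expects timestamps in milliseconds since the Unix epoch.
        "startTime": int(start.timestamp() * 1000),
        "endTime": int(end.timestamp() * 1000),
        "limit": limit,
    }


# Match log lines containing the exact phrase GET /about in the last hour.
about_requests = build_log_filter("NailYourInterviewAccessLogs", '"GET /about"')

# With boto3 available:
#   import boto3
#   events = boto3.client("logs").filter_log_events(**about_requests)["events"]
#   for event in events:
#       print(event["message"])
```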

Events

CloudWatch Events (now part of Amazon EventBridge) provides a near real-time stream of system events that describe changes in AWS resources.

Components of CloudWatch Events

  1. Rule: Defines which events to match and where to send them.

    • Source: The origin of the event, such as an AWS service, custom events, or scheduled time.

    • Target: Specifies the action to take when the event matches the rule, such as invoking a Lambda function, sending an SNS notification, or starting a Step Function.

  2. Source: Defined either by an event pattern or by a schedule.

    • Pattern: Filters events based on specific criteria, such as service name or event type. Example: Trigger on PutObject events for an S3 bucket.

    • Schedule: Create rules that run at specific times or intervals, similar to cron jobs.

  3. Event Bus:

    • Default Event Bus: Captures and routes events generated by AWS services (like EC2, RDS).

    • Custom Event Buses: Receive events from your own applications and services.

    • Partner Event Buses: Receive events from third-party SaaS partner applications.
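
The rule components above can be sketched as an event pattern plus PutRule arguments. The pattern below targets S3 PutObject calls and assumes they reach the default event bus through CloudTrail data events for the bucket; the bucket and rule names are hypothetical. The `matches` function is a simplified local model of pattern matching for illustration only, since the real service supports additional operators.

```python
import json


def s3_put_object_pattern(bucket_name):
    """Event pattern for S3 PutObject calls recorded by CloudTrail."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": ["PutObject"],
            "requestParameters": {"bucketName": [bucket_name]},
        },
    }


def matches(pattern, event):
    """Simplified matcher: every pattern field must list the event's value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


pattern = s3_put_object_pattern("example-uploads-bucket")  # hypothetical bucket
pattern_rule = {
    "Name": "on-s3-put-object",
    "EventPattern": json.dumps(pattern),
    "State": "ENABLED",
}
# A scheduled rule uses a schedule expression instead of a pattern.
schedule_rule = {
    "Name": "every-five-minutes",
    "ScheduleExpression": "rate(5 minutes)",
    "State": "ENABLED",
}

# With boto3 available:
#   import boto3
#   boto3.client("events").put_rule(**pattern_rule)
```

Targets such as a Lambda function or an SNS topic would then be attached to the rule in a separate call.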

Note

The key difference between Alarms and Events is that Alarms react to metric thresholds, whereas Events react to state changes and API activity in the AWS account (including user actions) and can also run on predefined schedules.