Metric Filters

🌟 Overview

This document outlines the CloudWatch Metric Filters configured in our infrastructure. Metric filters allow us to extract specific data from log events and transform it into CloudWatch metrics, enabling more advanced monitoring and alerting capabilities.

Note: The metric filter and associated resources are defined in the metric-filter.ts file in our infrastructure code.

🏗️ Metric Filter Structure

A metric filter typically consists of the following components:

Component	Description
Filter Pattern	Defines what to look for in the log events
Metric Name	The name of the metric to create or update
Metric Namespace	The namespace for the metric
Metric Value	The value to publish for the metric when a matched log event is found

🚨 Error Metric Filter

We have configured a metric filter to track error logs across our application.

Error Metric Filter Configuration

const errorMetricFilter = new aws.cloudwatch.LogMetricFilter(`gh-errorfilter-${stack}-cw-${region}-metric-filter`, {
    name: `gh-errorfilter-${stack}-cw-${region}-metric-filter`,
    logGroupName: logGroup.name,
    metricTransformation: {
        name: "ErrorCount",
        namespace: "CustomMetrics",
        value: "1",
    },
    pattern: '{(($.event.status = "500 Internal Server Error") || ($.event.code = 500) || ($.level = "error" && $.event != "Invalid HTTP_HOST header: *" && $.event != "*was sent SIGTERM!") || ($.event.level = "error" && $.event != "Invalid HTTP_HOST header: *" && $.event != "*was sent SIGTERM!"))}',
});

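This filter matches JSON log events that report a 500 error ($.event.status is "500 Internal Server Error" or $.event.code is 500) or that are logged at error level ($.level or $.event.level is "error"). Error-level events whose $.event text matches "Invalid HTTP_HOST header: *" or "*was sent SIGTERM!" are excluded as known issues. Each matching event adds 1 to the ErrorCount metric in the CustomMetrics namespace.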

Important: Regularly review and update the filter pattern to ensure it captures all relevant error scenarios while excluding false positives.
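One way to review the pattern is to replay sample log events against a local approximation of it. The sketch below is not part of metric-filter.ts; the names LogEvent, matchesErrorFilter, and the helper functions are hypothetical, and the handling of wildcards, missing fields, and non-string $.event values is an assumption that may differ from CloudWatch Logs in edge cases.

```typescript
// Local approximation of the error filter pattern, for checking sample events.
// Assumption: wildcard and missing-field behaviour is simplified compared
// with the real CloudWatch Logs filter engine.

type LogEvent = {
    level?: unknown;
    event?: unknown;
};

// Text patterns the filter excludes; "*" stands for any run of characters.
const EXCLUDED_EVENT_PATTERNS = ["Invalid HTTP_HOST header: *", "*was sent SIGTERM!"];

// Convert a "*" wildcard pattern into an anchored regular expression.
function wildcardToRegExp(pattern: string): RegExp {
    const escaped = pattern
        .split("*")
        .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
        .join("[\\s\\S]*");
    return new RegExp(`^${escaped}$`);
}

// Assumption: only a string-valued $.event can match an exclusion.
function isExcluded(event: unknown): boolean {
    return typeof event === "string"
        && EXCLUDED_EVENT_PATTERNS.some((p) => wildcardToRegExp(p).test(event));
}

// Read a property from a value only if that value is an object.
function field(obj: unknown, key: string): unknown {
    return typeof obj === "object" && obj !== null
        ? (obj as Record<string, unknown>)[key]
        : undefined;
}

// Mirrors the four OR-ed branches of the filter pattern.
function matchesErrorFilter(log: LogEvent): boolean {
    const status500 = field(log.event, "status") === "500 Internal Server Error";
    const code500 = field(log.event, "code") === 500;
    const levelError = log.level === "error" && !isExcluded(log.event);
    const nestedLevelError = field(log.event, "level") === "error" && !isExcluded(log.event);
    return status500 || code500 || levelError || nestedLevelError;
}

// Example: a generic error counts, a known HTTP_HOST error does not.
const counted = matchesErrorFilter({ level: "error", event: "Database timeout" });
const ignored = matchesErrorFilter({ level: "error", event: "Invalid HTTP_HOST header: 'x.example'" });
console.log({ counted, ignored });
```

A harness like this can only flag obvious regressions when the pattern changes; the deployed filter remains the source of truth, so it is worth confirming results against real log events as well.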

🔔 Error Alarm

An alarm is set up to trigger when the error count exceeds a specified threshold.

Error Alarm Configuration

const errorAlarm = new aws.cloudwatch.MetricAlarm(`gh-errorfilter-${stack}-cw-${region}-alarm`, {
    name: `gh-errorfilter-${stack}-cw-${region}-alarm`,
    comparisonOperator: "GreaterThanThreshold",
    evaluationPeriods: 1,
    metricName: errorMetricFilter.metricTransformation.name,
    namespace: errorMetricFilter.metricTransformation.namespace,
    period: 10,
    statistic: "Sum",
    threshold: 1,
    alarmDescription: "This alarm is triggered when there are any error logs",
    alarmActions: [snsTopicErrorAlerts.arn],
    treatMissingData: "notBreaching",
});

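With GreaterThanThreshold and a threshold of 1, this alarm triggers when the sum of ErrorCount is greater than 1, that is, on two or more matching log events within a single 10-second period. This is stricter than the alarmDescription, which says the alarm fires on any error log; to alarm on a single error, use GreaterThanOrEqualToThreshold or a threshold of 0. Because treatMissingData is "notBreaching", periods with no matching events count as healthy. Note that 10-second periods are intended for high-resolution metrics, so confirm that the period matches the resolution at which this metric is published.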
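The threshold arithmetic can be checked with a small model of a single evaluation period. This is a simplified sketch, not CloudWatch's actual evaluation logic: the names AlarmConfig and evaluatePeriod are hypothetical, only two treatMissingData modes are modelled, and it assumes that a period with no matching events publishes no data points.

```typescript
type AlarmState = "OK" | "ALARM";

interface AlarmConfig {
    threshold: number;
    treatMissingData: "notBreaching" | "breaching";
}

// Evaluate one period with statistic "Sum" and comparison "GreaterThanThreshold".
// `matchedEvents` is the number of log events the metric filter matched;
// each one contributes a metric value of 1.
function evaluatePeriod(matchedEvents: number, config: AlarmConfig): AlarmState {
    if (matchedEvents === 0) {
        // Assumption: no matches means the period has no data points at all.
        return config.treatMissingData === "breaching" ? "ALARM" : "OK";
    }
    const sum = matchedEvents * 1;
    return sum > config.threshold ? "ALARM" : "OK";
}

// Settings taken from the alarm configuration above.
const errorAlarmConfig: AlarmConfig = { threshold: 1, treatMissingData: "notBreaching" };

console.log(evaluatePeriod(0, errorAlarmConfig)); // no errors
console.log(evaluatePeriod(1, errorAlarmConfig)); // one error
console.log(evaluatePeriod(2, errorAlarmConfig)); // two errors
```

Under these assumptions a single error in a period leaves the alarm in OK, and only the second error in the same period moves it to ALARM.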

📡 Log Subscription

A log subscription filter is set up to send matching log events to a Lambda function for further processing.

Log Subscription Configuration

const logSubscription = new aws.cloudwatch.LogSubscriptionFilter(`gh-errorfilter-${stack}-cw-${region}-subscription`, {
    logGroup: logGroup.name,
    filterPattern: errorMetricFilter.pattern,
    destinationArn: errorFilterLambdaFunction.arn,
}, {
    dependsOn: [lambdaPermission, logRolePolicy, logGroup],
});

This subscription sends matching log events to the errorFilterLambdaFunction for additional processing or alerting.

Note: For details on how the Lambda function processes and sends these logs to Discord, see the Error Filter Discord Lambda Function documentation.
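For orientation, CloudWatch Logs delivers subscription events to Lambda as base64-encoded, gzip-compressed JSON under awslogs.data. The sketch below shows how such a payload could be decoded with Node's zlib module. It is not the implementation of errorFilterLambdaFunction; the names DecodedLogsPayload, decodeLogsEvent, and encodeSample are hypothetical, and the interface lists only a subset of the payload fields.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// Subset of the decoded payload delivered to a subscribed Lambda function.
interface DecodedLogsPayload {
    logGroup: string;
    logStream: string;
    messageType: string;
    logEvents: { id: string; timestamp: number; message: string }[];
}

// Decode the base64 + gzip payload found under `awslogs.data`.
function decodeLogsEvent(event: { awslogs: { data: string } }): DecodedLogsPayload {
    const compressed = Buffer.from(event.awslogs.data, "base64");
    return JSON.parse(gunzipSync(compressed).toString("utf8")) as DecodedLogsPayload;
}

// Build a sample event the same way, to exercise the decoder locally.
function encodeSample(payload: DecodedLogsPayload): { awslogs: { data: string } } {
    const data = gzipSync(Buffer.from(JSON.stringify(payload), "utf8")).toString("base64");
    return { awslogs: { data } };
}

// Illustrative values only; real events carry the actual log group and stream names.
const sampleEvent = encodeSample({
    logGroup: "example-log-group",
    logStream: "example-stream",
    messageType: "DATA_MESSAGE",
    logEvents: [{ id: "1", timestamp: 0, message: '{"level":"error","event":"Database timeout"}' }],
});

const decoded = decodeLogsEvent(sampleEvent);
console.log(decoded.logEvents.map((e) => e.message));
```

Each entry in logEvents carries the raw log line in message, so a handler would typically parse that string as JSON before formatting an alert.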

🔐 IAM Roles and Permissions

Appropriate IAM roles and permissions are set up to allow CloudWatch Logs to invoke the Lambda function.

IAM Role and Policy Configuration

const logsRole = new aws.iam.Role(`gh-errorfilter-${stack}-iam-${region}-role`, {
    // ... role configuration ...
});

const logRolePolicy = new aws.iam.RolePolicy(`gh-errorfilter-${stack}-iam-${region}-role-policy`, {
    // ... policy configuration ...
});

const lambdaPermission = new aws.lambda.Permission(`gh-errorfilter-${stack}-lambda-${region}-permission`, {
    // ... lambda permission configuration ...
});

These configurations ensure that CloudWatch Logs has the necessary permissions to interact with the Lambda function.

📝 Best Practices

Optimize Filter Patterns: Regularly review and refine filter patterns to ensure accuracy and efficiency.
Monitor Costs: Be aware that extracting metrics from logs can increase CloudWatch costs. Monitor usage and adjust as needed.
Use Meaningful Metric Names: Choose clear and descriptive names for your metrics to aid in monitoring and troubleshooting.
Set Appropriate Thresholds: Regularly review and adjust alarm thresholds based on application behavior and requirements.
Leverage Lambda for Complex Processing: Use Lambda functions for more complex log processing that can't be handled by metric filters alone.

Remember to keep this document up-to-date as you modify the metric filter configuration or add new filters. Regular reviews will ensure that your log monitoring strategy remains effective and aligned with your operational needs.
