Amazon CloudWatch monitors AWS resources and the applications running on AWS in real time.
CloudWatch can be used to collect and track metrics, which are the variables to be measured for resources and applications.
CloudWatch alarms can be configured
to send notifications or
to automatically make changes to the resources based on defined rules
In addition to monitoring the built-in metrics that come with AWS, custom metrics can also be monitored
CloudWatch provides system-wide visibility into resource utilization, application performance, and operational health.
By default, CloudWatch stores the log data indefinitely, and the retention can be changed for each log group at any time
CloudWatch Alarm history is stored for only 14 days.
CloudWatch collects various metrics from various resources
These metrics are available to the user as statistics through the console, the CLI, and the API.
CloudWatch allows creation of alarms with defined rules
to perform Auto Scaling actions or to stop or terminate EC2 instances
to send notifications through SNS on your behalf.
CloudWatch can be accessed using
AWS CloudWatch console
CloudWatch CLI
AWS CLI
CloudWatch API
AWS SDKs
CloudWatch offers either basic or detailed monitoring for supported AWS services.
Basic monitoring means that a service sends data points to CloudWatch every five minutes.
Detailed monitoring means that a service sends data points to CloudWatch every minute.
If an AWS service supports both basic and detailed monitoring, basic monitoring is enabled by default, and detailed monitoring must be enabled explicitly to get one-minute metrics.
Amazon CloudWatch can be used to monitor IOPS metrics from an RDS instance, with Amazon Simple Notification Service (SNS) sending a notification when an alarm is triggered.
CloudWatch can also be used to schedule activities: create a CloudWatch Events rule that is scheduled using a cron expression, and configure the Lambda function as its target.
Metrics
Metric is the fundamental concept in CloudWatch.
Metrics are data about the performance of your systems.
Many AWS services provide free metrics for resources by default (such as Amazon EC2 instances, Amazon EBS volumes, and Amazon RDS DB instances).
You can also enable detailed monitoring for some resources, such as your Amazon EC2 instances, or publish your own application metrics. Amazon CloudWatch can load all the metrics in your account (both AWS resource metrics and application metrics that you provide) for search, graphing, and alarms.
Uniquely defined by a name, a namespace, and zero or more dimensions.
Represents a time-ordered set of data points published to CloudWatch.
Think of a metric as a variable to monitor, and the data points as representing the values of that variable over time. For example, the CPU usage of a particular EC2 instance is one metric provided by Amazon EC2. The data points themselves can come from any application or business activity from which you collect data.
Metrics are uniquely defined by a name, a namespace, and zero or more dimensions.
Each data point has a time stamp, and (optionally) a unit of measure
Data points can come either from custom metrics or from metrics published by other AWS services.
Statistics can be retrieved about those data points as an ordered set of time-series data that occur within a specified time window.
When the statistics are requested, the returned data stream is identified by namespace, metric name, dimension, and (optionally) the unit.
Metrics exist only in the region in which they are created
Originally, CloudWatch stored metric data for only two weeks.
Metrics cannot be deleted, but they automatically expire after 15 months if no new data is published to them.
NOTE: Since November 2016, AWS provides extended metric retention:
One minute data points are available for 15 days.
Five minute data points are available for 63 days.
One hour data points are available for 455 days (15 months).
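The identity rule above (namespace, metric name, and zero or more dimensions) can be modeled locally. The sketch below is a plain Python model, not an AWS SDK type. The MetricId class and the instance IDs are hypothetical. It shows that changing a single dimension value, or dropping the dimensions entirely, identifies a different metric.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional, Tuple


@dataclass(frozen=True)
class MetricId:
    """Hypothetical local model of what uniquely identifies a metric."""
    namespace: str
    name: str
    dimensions: FrozenSet[Tuple[str, str]]  # (name, value) pairs; may be empty

    @classmethod
    def of(cls, namespace: str, name: str,
           dimensions: Optional[Mapping[str, str]] = None) -> "MetricId":
        # Dimension order does not matter, so store the pairs as a frozenset.
        return cls(namespace, name, frozenset((dimensions or {}).items()))


# Same namespace, name and dimensions: the same metric.
a = MetricId.of("AWS/EC2", "CPUUtilization", {"InstanceId": "i-1234567890abcdef0"})
b = MetricId.of("AWS/EC2", "CPUUtilization", {"InstanceId": "i-1234567890abcdef0"})
# A different dimension value, or no dimensions at all: a different metric.
c = MetricId.of("AWS/EC2", "CPUUtilization", {"InstanceId": "i-0fedcba0987654321"})
d = MetricId.of("AWS/EC2", "CPUUtilization")

print(a == b, a == c, a == d)  # True False False
print(len({a, b, c, d}))       # 3
```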
Logs
CloudWatch Logs allows you to monitor, store, and access your log files from sources including Amazon EC2 instances, Route 53, CloudTrail, and other AWS services.
For example, you could monitor logs from Amazon EC2 instances in real time. You could track the number of errors that have occurred in your application logs and send a notification if that rate exceeds a previously defined amount.
Events
Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources.
AWS resources can generate events when their state changes. For example, Amazon EC2 generates an event when the state of an EC2 instance changes from pending to running, and Amazon EC2 Auto Scaling generates events when it launches or terminates instances.
Using simple rules that you can quickly set up, you can match events and route them to one or more target functions or streams.
CloudWatch Events becomes aware of operational changes as they occur. CloudWatch Events responds to these operational changes and takes corrective action as necessary by sending messages to respond to the environment, activating functions, making changes, and capturing state information.
You can also use CloudWatch Events to schedule automated actions that self-trigger at certain times using cron or rate expressions.
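A rate expression such as rate(5 minutes) fires at a fixed interval. The sketch below parses the form rate(value unit) with the units minute, hour, and day. It enforces the rule that a value of 1 takes a singular unit and larger values take a plural unit. The helper names are hypothetical. Treating the trigger times as the start time plus whole multiples of the interval is a simplifying assumption, because the service decides when the first invocation actually happens. Cron expressions are not covered.

```python
import re
from datetime import datetime, timedelta, timezone

_SECONDS_PER_UNIT = {"minute": 60, "hour": 3600, "day": 86400}
_RATE = re.compile(r"^rate\((\d+) (minutes?|hours?|days?)\)$")


def parse_rate(expr: str) -> timedelta:
    """Parse a rate expression such as 'rate(5 minutes)' into a timedelta."""
    m = _RATE.match(expr)
    if not m:
        raise ValueError(f"not a rate expression: {expr!r}")
    value, unit = int(m.group(1)), m.group(2)
    if value < 1:
        raise ValueError("the value must be a positive integer")
    singular = unit.rstrip("s")
    # A value of 1 needs the singular unit; any larger value needs the plural.
    if (value == 1) != (unit == singular):
        raise ValueError(f"unit {unit!r} does not agree with value {value}")
    return timedelta(seconds=value * _SECONDS_PER_UNIT[singular])


def next_runs(expr: str, start: datetime, count: int) -> list:
    """Hypothetical helper: the next `count` trigger times after `start`."""
    step = parse_rate(expr)
    return [start + step * (i + 1) for i in range(count)]


start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
for t in next_runs("rate(15 minutes)", start, 3):
    print(t.isoformat())
# 2024-01-01T00:15:00+00:00
# 2024-01-01T00:30:00+00:00
# 2024-01-01T00:45:00+00:00
```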
Rules
A rule matches incoming events and routes them to targets for processing. A single rule can route to multiple targets, all of which are processed in parallel.
Rules are not processed in a particular order.
This enables different parts of an organization to look for and process the events that are of interest to them. A rule can customize the JSON message sent to the target by passing only certain parts or by overwriting it with a constant.
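The matching step of a rule can be illustrated with a simplified event pattern. In this sketch, a pattern field holding a list matches when the event's value is one of the listed values, and a nested object is matched recursively. Real event patterns support more operators (prefix, numeric, exists, and so on), which are omitted here. The matches function is hypothetical. The EC2 state-change event follows the pending-to-running example described above.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified event-pattern matcher: nested objects and lists of exact values only."""
    for key, expected in pattern.items():
        if key not in event:
            return False  # every field named in the pattern must be present
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:  # leaf: the value must be one of the listed values
            return False
    return True


rule_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-1234567890abcdef0", "state": "running"},
}

print(matches(rule_pattern, event))                                      # True
print(matches(rule_pattern, {**event, "detail": {"state": "pending"}}))  # False
```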
Targets
A target processes events.
Targets can include Amazon EC2 instances, AWS Lambda functions, Kinesis data streams, Amazon ECS tasks, Step Functions state machines, Amazon SNS topics, Amazon SQS queues, and built-in targets.
A target receives events in JSON format.
NameSpace
CloudWatch namespaces are containers for metrics.
A namespace allows you to categorize your metrics. If you are collecting a metric specific to a type of AWS resource, you can store the metric within the namespace for the type, such as AWS/EC2, or you can make up your own namespace, such as Customer/PurchaseData.
Metrics in different namespaces are isolated from each other, so that metrics from different applications are not mistakenly aggregated into the same statistics.
AWS namespaces all follow the convention AWS/<service>, for example AWS/EC2 and AWS/ELB.
Namespace names must be fewer than 256 characters in length.
There is no default namespace. Each data element put into CloudWatch must specify a namespace
Dimension
A dimension is a name/value pair that is part of the identity of a metric.
You can assign up to 30 dimensions to a metric.
Dimensions can be used to filter the result sets that CloudWatch queries return.
Every metric has specific characteristics that describe it, and you can think of dimensions as categories for those characteristics. Dimensions help you design a structure for your statistics plan. Because dimensions are part of the unique identifier for a metric, whenever you add a unique name/value pair to one of your metrics, you are creating a new variation of that metric.
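Filtering by dimensions can be sketched locally over records shaped like the list-metrics output (Namespace, MetricName, and a Dimensions list of Name/Value pairs). This is a simplified model with a hypothetical filter_metrics helper and made-up instance IDs. The service-side filter semantics may differ in detail. A dimension requested with the value None matches any value for that dimension name.

```python
def filter_metrics(metrics, namespace=None, name=None, dimensions=None):
    """Keep metrics whose namespace/name match and whose dimensions include every
    requested name/value pair (a requested value of None matches any value)."""
    result = []
    for m in metrics:
        if namespace is not None and m["Namespace"] != namespace:
            continue
        if name is not None and m["MetricName"] != name:
            continue
        dims = {d["Name"]: d["Value"] for d in m["Dimensions"]}
        if all(k in dims and (v is None or dims[k] == v)
               for k, v in (dimensions or {}).items()):
            result.append(m)
    return result


metrics = [
    {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
     "Dimensions": [{"Name": "InstanceId", "Value": "i-aaa"}]},
    {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
     "Dimensions": [{"Name": "InstanceId", "Value": "i-bbb"}]},
    {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization",
     "Dimensions": [{"Name": "InstanceType", "Value": "t3.micro"}]},
    {"Namespace": "Customer/PurchaseData", "MetricName": "Orders",
     "Dimensions": []},
]

by_instance = filter_metrics(metrics, "AWS/EC2", "CPUUtilization", {"InstanceId": None})
print(len(by_instance))  # 2
one = filter_metrics(metrics, dimensions={"InstanceId": "i-bbb"})
print(one[0]["Dimensions"])  # [{'Name': 'InstanceId', 'Value': 'i-bbb'}]
```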
Timestamp
Each metric data point must be marked with a time stamp to identify the data point on a time series.
Time stamp can be up to two weeks in the past and up to two hours into the future.
If no time stamp is provided, CloudWatch creates a time stamp based on the time the data element was received.
All times reflect the UTC time zone when statistics are retrieved.
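The time stamp rules above (no more than two weeks in the past, no more than two hours into the future, the receive time when none is given, UTC on retrieval) can be expressed as a small local check. resolve_timestamp is a hypothetical helper. Treating the window edges as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_PAST = timedelta(weeks=2)    # oldest accepted time stamp, relative to arrival
MAX_FUTURE = timedelta(hours=2)  # furthest accepted time stamp into the future


def resolve_timestamp(ts: Optional[datetime], received_at: datetime) -> datetime:
    """Return the UTC time stamp a data point would be stored with (local model)."""
    if ts is None:
        # No time stamp supplied: fall back to the time the data point was received.
        return received_at.astimezone(timezone.utc)
    if ts.tzinfo is None:
        raise ValueError("pass timezone-aware datetimes")
    if ts < received_at - MAX_PAST or ts > received_at + MAX_FUTURE:
        raise ValueError(f"time stamp {ts.isoformat()} is outside the accepted window")
    return ts.astimezone(timezone.utc)


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
plus_two = timezone(timedelta(hours=2))  # a UTC+2 offset, to show normalization to UTC

print(resolve_timestamp(None, now).isoformat())  # 2024-06-01T12:00:00+00:00
print(resolve_timestamp(datetime(2024, 6, 1, 15, 30, tzinfo=plus_two), now).isoformat())
# 2024-06-01T13:30:00+00:00 (1.5 hours ahead of arrival, so accepted)
```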
Units
Units represent the statistic's unit of measure, for example Count, Bytes, or Percent.
Statistics
Statistics are metric data aggregations over specified periods of time.
Aggregations are made using the namespace, metric name, dimensions, and the data point unit of measure, within the specified time period.
Periods
Period is the length of time associated with a specific statistic.
Each statistic represents an aggregation of the metrics data collected for a specified period of time.
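Periods are expressed in seconds. For standard-resolution metrics the minimum granularity is one minute (60 seconds), while high-resolution custom metrics can also be retrieved with periods of 1, 5, 10, or 30 seconds.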
Aggregation
CloudWatch aggregates statistics according to the period length specified in calls to GetMetricStatistics.
Multiple data points can be published with the same or similar time stamps. CloudWatch aggregates them by period length when the statistics about those data points are requested.
Statistics aggregated across EC2 instances are only available for instances that use detailed monitoring.
Instances that use basic monitoring are not included in those aggregates.
Metric data points that specify a unit of measure are aggregated separately. When you get statistics without specifying a unit, CloudWatch aggregates all data points of the same unit together.
CloudWatch does not aggregate data across regions.
For large datasets, you can publish a pre-aggregated dataset called a statistic set, made up of SampleCount, Sum, Minimum, and Maximum.
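Period aggregation can be sketched with plain Python tuples of (timestamp, value, unit) standing in for published data points. The sketch aligns each point to a period boundary measured from the Unix epoch (an assumption about how boundaries are anchored), keeps different units in separate buckets, and computes the basic statistics per bucket. The aggregate function is hypothetical, and percentiles are left out.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate(points, period_seconds=60):
    """Bucket (timestamp, value, unit) points by period boundary and unit, then
    compute SampleCount, Sum, Minimum, Maximum and Average for each bucket."""
    buckets = defaultdict(list)
    for ts, value, unit in points:
        epoch = int(ts.timestamp())
        start = epoch - epoch % period_seconds  # align to the period boundary
        buckets[(start, unit)].append(value)    # different units never share a bucket
    stats = {}
    for (start, unit), values in sorted(buckets.items()):
        stats[(datetime.fromtimestamp(start, timezone.utc), unit)] = {
            "SampleCount": len(values),
            "Sum": sum(values),
            "Minimum": min(values),
            "Maximum": max(values),
            "Average": sum(values) / len(values),
        }
    return stats


def at(minute, second):
    return datetime(2024, 1, 1, 0, minute, second, tzinfo=timezone.utc)


points = [
    (at(0, 5), 10.0, "Count"),
    (at(0, 40), 20.0, "Count"),
    (at(1, 2), 30.0, "Count"),
    (at(0, 50), 512.0, "Bytes"),  # same minute as the first two, different unit
]
for (start, unit), s in aggregate(points).items():
    print(start.strftime("%H:%M"), unit, s["SampleCount"], s["Average"])
# 00:00 Bytes 1 512.0
# 00:00 Count 2 15.0
# 00:01 Count 1 30.0
```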
Alarm
Alarms can automatically initiate actions on behalf of the user, based on specified parameters.
Alarm watches a single metric over a specified time period, and performs one or more actions based on the value of the metric relative to a given threshold over a number of time periods.
Alarms invoke actions for sustained state changes only i.e. the state must have changed and been maintained for a specified number of periods.
An action can be one of the following:
SNS notification
Auto Scaling policies
EC2 action – stop or terminate EC2 instances
After an alarm invokes an action due to a change in state, its subsequent behavior depends on the type of action associated with the alarm.
For Auto Scaling policy notifications, the alarm continues to invoke the action for every period that the alarm remains in the new state.
For SNS notifications, no additional actions are invoked.
An alarm has three possible states:
OK—The metric is within the defined threshold
ALARM—The metric is outside of the defined threshold
INSUFFICIENT_DATA—Alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state
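The sustained-state rule can be modeled as an M-out-of-N check, which is the idea behind the --evaluation-periods and --datapoints-to-alarm options of put-metric-alarm. This simplified sketch assumes a GreaterThanOrEqualToThreshold comparison and ignores the finer missing-data options. evaluate_alarm is a hypothetical helper, not part of any AWS SDK. A window with no data at all maps to INSUFFICIENT_DATA.

```python
from typing import Optional, Sequence


def evaluate_alarm(
    values: Sequence[Optional[float]],
    threshold: float,
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> str:
    """Return OK, ALARM or INSUFFICIENT_DATA for one value per period (oldest first,
    None for a period with no data), using a >= threshold comparison."""
    window = list(values)[-evaluation_periods:]  # only the last N periods count
    present = [v for v in window if v is not None]
    if not present:
        return "INSUFFICIENT_DATA"  # nothing to compare yet
    breaching = sum(1 for v in present if v >= threshold)
    # A single spike is not enough: at least M of the N periods must breach.
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


print(evaluate_alarm([50, 90, 95], threshold=80, evaluation_periods=3, datapoints_to_alarm=2))  # ALARM
print(evaluate_alarm([50, 60, 99], threshold=80, evaluation_periods=3, datapoints_to_alarm=2))  # OK
print(evaluate_alarm([None, None], threshold=80, evaluation_periods=3, datapoints_to_alarm=2))  # INSUFFICIENT_DATA
```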
Alarms exist only in the region in which they are created.
Alarm actions must reside in the same region as the alarm
Alarm history is available for the last 14 days.
An alarm can be tested by setting it to any state using the SetAlarmState API (set-alarm-state in the AWS CLI, mon-set-alarm-state in the legacy CloudWatch CLI). This temporary state change lasts only until the next alarm comparison occurs.
Alarm actions can be disabled and enabled using the DisableAlarmActions and EnableAlarmActions APIs (disable-alarm-actions and enable-alarm-actions in the AWS CLI, mon-disable-alarm-actions and mon-enable-alarm-actions in the legacy CloudWatch CLI).
Command Example (the table name Notes, the threshold of 240, and the SNS topic ARN are placeholder values):
>> aws cloudwatch put-metric-alarm --alarm-name NotesWriteCapacityUnitsLimit --namespace AWS/DynamoDB --metric-name ConsumedWriteCapacityUnits --dimensions "Name=TableName,Value=Notes" --statistic Sum --period 60 --evaluation-periods 5 --datapoints-to-alarm 5 --threshold 240 --comparison-operator GreaterThanOrEqualToThreshold --treat-missing-data missing --alarm-actions arn:aws:sns:us-east-2:111122223333:Notes-WriteCapacityUnitsLimit-Topic
Regions
CloudWatch does not aggregate data across regions. Therefore, metrics are completely separate between regions.
Resolution
Metrics produced by AWS services are standard resolution by default.
When you publish a custom metric, you can define it as either standard resolution or high resolution.
When you publish a high-resolution metric, CloudWatch stores it with a resolution of 1 second, and you can read and retrieve it with a period of 1 second, 5 seconds, 10 seconds, 30 seconds, or any multiple of 60 seconds.
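The retrieval periods allowed by each resolution can be captured in a small check. This is a local sketch with a hypothetical is_valid_period function. It assumes the rule stated above: standard-resolution metrics accept multiples of 60 seconds, and high-resolution metrics also accept 1, 5, 10, or 30 seconds. It ignores retention, so it does not reflect that sub-minute data points are kept for only 3 hours.

```python
HIGH_RES_SUB_MINUTE_PERIODS = (1, 5, 10, 30)


def is_valid_period(period_seconds: int, high_resolution: bool) -> bool:
    """True if a metric of the given resolution can be retrieved with this period."""
    if period_seconds <= 0:
        return False
    if period_seconds % 60 == 0:
        return True  # any multiple of 60 seconds works for both resolutions
    # Sub-minute periods apply only to high-resolution custom metrics.
    return high_resolution and period_seconds in HIGH_RES_SUB_MINUTE_PERIODS


for period in (1, 10, 45, 60, 300):
    print(period, is_valid_period(period, True), is_valid_period(period, False))
# 1 True False
# 10 True False
# 45 False False
# 60 True True
# 300 True True
```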
CloudWatch allows publishing custom metrics with put-metric-data CLI command (or its Query API equivalent PutMetricData)
CloudWatch creates a new metric if put-metric-data is called with a new metric name, else it associates the data with the specified existing metric.
When called with a single --value, the put-metric-data command publishes one data point; the --metric-data option can publish several data points in one call.
After a new metric is created with the put-metric-data command, it can take up to two minutes before statistics can be retrieved for it using the get-metric-statistics command, and up to fifteen minutes before the new metric appears in the list of metrics retrieved using the list-metrics command.
CloudWatch allows publishing
Single data point
Data points can be published with time stamps as granular as one-thousandth of a second, but for standard-resolution metrics CloudWatch aggregates the data to a minimum granularity of one minute.
CloudWatch records the average (sum of all items divided by number of items) of the values received for every 1-minute period, as well as number of samples, maximum value, and minimum value for the same time period.
CloudWatch uses one-minute boundaries when aggregating data points
Aggregated set of data points called a statistics set
Data can also be aggregated before being published to CloudWatch
Aggregating data minimizes the number of calls, reducing them to a single call per minute that carries the statistic set.
A statistic set includes Sum, Minimum, Maximum, and SampleCount; the Average is derived from Sum and SampleCount.
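Pre-aggregating raw values into a statistic set can be sketched as follows. The to_statistic_set helper is hypothetical. Its keys mirror the SampleCount, Sum, Minimum, and Maximum fields that put-metric-data accepts through --statistic-values, and the commented CLI line uses placeholder namespace and metric names.

```python
def to_statistic_set(values):
    """Pre-aggregate raw values into the four fields of a statistic set."""
    if not values:
        raise ValueError("a statistic set needs at least one value")
    return {
        "SampleCount": float(len(values)),
        "Sum": float(sum(values)),
        "Minimum": float(min(values)),
        "Maximum": float(max(values)),
    }


stats = to_statistic_set([2, 4, 5])
print(stats)  # {'SampleCount': 3.0, 'Sum': 11.0, 'Minimum': 2.0, 'Maximum': 5.0}
average = stats["Sum"] / stats["SampleCount"]  # how the Average is derived
# The same set published in one call (placeholder namespace and metric name):
# aws cloudwatch put-metric-data --namespace MyNamespace --metric-name MyMetric \
#     --statistic-values Sum=11,Minimum=2,Maximum=5,SampleCount=3
```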
If the application produces sporadic data, with periods that have no associated data, either the value zero (0) or no value at all can be published.
However, it can be helpful to publish zero instead of no value
to monitor the health of your application (for example, an alarm can be configured to notify you if no metric data is published within 5 minutes)
to track the total number of data points
to have statistics such as minimum and average to include data points with the value 0.
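The effect of publishing zero rather than no value can be checked with a small local calculation. The per-minute error counts are made up. summarize is a hypothetical helper that, like CloudWatch statistics, only counts data points that were actually published.

```python
def summarize(values):
    """SampleCount, Minimum and Average over the values that were actually published."""
    present = [v for v in values if v is not None]  # None = nothing was published
    if not present:
        return {"SampleCount": 0, "Minimum": None, "Average": None}
    return {
        "SampleCount": len(present),
        "Minimum": min(present),
        "Average": sum(present) / len(present),
    }


# Errors per minute over five minutes; only two of the minutes had any errors.
sparse = [3, None, None, 1, None]  # publish nothing for quiet minutes
zero_filled = [3, 0, 0, 1, 0]      # publish 0 for quiet minutes

print(summarize(sparse))       # {'SampleCount': 2, 'Minimum': 1, 'Average': 2.0}
print(summarize(zero_filled))  # {'SampleCount': 5, 'Minimum': 0, 'Average': 0.8}
```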
Auto Scaling
By default, basic monitoring is enabled when the launch configuration is created using the AWS Management Console, and detailed monitoring is enabled when the launch configuration is created using the AWS CLI or an API.
Auto Scaling sends data to CloudWatch every 5 minutes by default, when created from Console.
For an additional charge, you can enable detailed monitoring for Auto Scaling, which sends data to CloudWatch every minute.
Amazon CloudFront
Amazon CloudFront sends data to CloudWatch every minute by default.
Amazon CloudSearch
Amazon CloudSearch sends data to CloudWatch every minute by default.
Amazon CloudWatch Events
Amazon CloudWatch Events sends data to CloudWatch every minute by default.
Amazon CloudWatch Logs
Amazon CloudWatch Logs sends data to CloudWatch every minute by default.
Amazon DynamoDB
Amazon DynamoDB sends data to CloudWatch every minute for some metrics and every 5 minutes for other metrics.
Amazon EC2 Container Service
Amazon EC2 Container Service sends data to CloudWatch every minute.
Amazon ElastiCache
Amazon ElastiCache sends data to CloudWatch every minute.
Amazon Elastic Block Store
Amazon Elastic Block Store sends data to CloudWatch every 5 minutes.
Provisioned IOPS SSD (io1) volumes automatically send one-minute metrics to CloudWatch.
Amazon Elastic Compute Cloud
Amazon EC2 sends data to CloudWatch every 5 minutes by default. For an additional charge, you can enable detailed monitoring for Amazon EC2, which sends data to CloudWatch every minute.
Elastic Load Balancing
Elastic Load Balancing sends data to CloudWatch every minute.
Amazon Elastic MapReduce
Amazon Elastic MapReduce sends data to CloudWatch every 5 minutes.
Amazon Elasticsearch Service
Amazon Elasticsearch Service sends data to CloudWatch every minute.
Amazon Kinesis Streams
Amazon Kinesis Streams sends data to CloudWatch every minute.
Amazon Kinesis Firehose
Amazon Kinesis Firehose sends data to CloudWatch every minute.
AWS Lambda
AWS Lambda sends data to CloudWatch every minute.
Amazon Machine Learning
Amazon Machine Learning sends data to CloudWatch every 5 minutes.
AWS OpsWorks
AWS OpsWorks sends data to CloudWatch every minute.
Amazon Redshift
Amazon Redshift sends data to CloudWatch every minute.
Amazon Relational Database Service
Amazon Relational Database Service sends data to CloudWatch every minute.
Amazon Route 53
Amazon Route 53 sends data to CloudWatch every minute.
Amazon Simple Notification Service
Amazon Simple Notification Service sends data to CloudWatch every 5 minutes.
Amazon Simple Queue Service
Amazon Simple Queue Service sends data to CloudWatch every 5 minutes.
Amazon Simple Storage Service
Amazon Simple Storage Service sends data to CloudWatch once a day.
Amazon Simple Workflow Service
Amazon Simple Workflow Service sends data to CloudWatch every 5 minutes.
AWS Storage Gateway
AWS Storage Gateway sends data to CloudWatch every 5 minutes.
AWS WAF
AWS WAF sends data to CloudWatch every minute.
Amazon WorkSpaces
Amazon WorkSpaces sends data to CloudWatch every 5 minutes.
CloudWatch Logs can be used to monitor, store, and access log files from EC2 instances, CloudTrail, Route 53, and other sources.
CloudWatch Logs uses the existing log data for monitoring, so no code changes are required.
CloudWatch Logs requires the CloudWatch Logs agent to be installed on the EC2 instances and on-premises servers whose logs it collects.
CloudWatch Logs agent makes it easy to quickly send both rotated and non-rotated log data off of a host and into the log service.
A VPC interface endpoint can be configured to keep traffic between a VPC and CloudWatch Logs from leaving the Amazon network. It doesn't require an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.
CloudWatch Logs allows exporting log data from the log groups to an S3 bucket, which can then be used for custom processing and analysis, or to load onto other systems.
Log data is encrypted while in transit and while it is at rest
Log data can be encrypted using an AWS KMS customer master key (CMK).
Log Events
A log event is a record of some activity recorded by the application or resource being monitored.
Log event record contains two properties: the timestamp of when the event occurred, and the raw event message
Log Streams
A log stream is a sequence of log events that share the same source, for example, log events from an Apache access log on a specific host.
Log Groups
Log groups define groups of log streams that share the same retention, monitoring, and access control settings, for example, Apache access logs from each host, each in its own log stream, grouped into a single log group.
Each log stream has to belong to one log group
There is no limit on the number of log streams that can belong to one log group.
Metric Filters
Metric filters can be used to extract metric observations from ingested events and transform them to data points in a CloudWatch metric.
Metric filters are assigned to log groups, and all of the filters assigned to a log group are applied to their log streams.
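A metric filter can be pictured as a function that turns log events (each with a timestamp and a message, as described under Log Events) into metric data points. In this sketch, a plain substring test stands in for the CloudWatch Logs filter-pattern syntax, which has its own matching rules. apply_metric_filter and the sample log lines are hypothetical. Each matching event contributes one data point whose value defaults to 1, so summing the points counts the matches.

```python
def apply_metric_filter(events, term, metric_value=1.0):
    """Turn log events whose message contains `term` into (timestamp, value) points.
    A substring test stands in for the real filter-pattern syntax."""
    return [(e["timestamp"], metric_value) for e in events if term in e["message"]]


log_stream = [
    {"timestamp": 1591940400000, "message": "GET /index.html 200"},
    {"timestamp": 1591940410000, "message": "GET /missing 404"},
    {"timestamp": 1591940416000, "message": "System.NullReferenceException at Handler"},
]

points = apply_metric_filter(log_stream, "NullReferenceException")
print(points)  # [(1591940416000, 1.0)]
print(sum(v for _, v in apply_metric_filter(log_stream, "GET")))  # 2.0
```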
Retention Settings
Retention settings can be used to specify how long log events are kept in CloudWatch Logs.
Expired log events get deleted automatically.
Retention settings are assigned to log groups, and the retention assigned to a log group is applied to their log streams.
CloudWatch retains metric data as follows:
Data points with a period of less than 60 seconds are available for 3 hours. These data points are high-resolution custom metrics.
Data points with a period of 60 seconds (1 minute) are available for 15 days
Data points with a period of 300 seconds (5 minute) are available for 63 days
Data points with a period of 3600 seconds (1 hour) are available for 455 days (15 months)
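The retention tiers above can be expressed as a lookup from a data point's age to the finest period still available for it. This is a sketch with hypothetical names. Treating each age boundary as inclusive is an assumption.

```python
from datetime import timedelta
from typing import Optional

# (maximum age, finest period in seconds still available at that age), per the list above
RETENTION_TIERS = [
    (timedelta(hours=3), 1),      # sub-minute data: high-resolution custom metrics only
    (timedelta(days=15), 60),
    (timedelta(days=63), 300),
    (timedelta(days=455), 3600),  # 15 months
]


def finest_available_period(age: timedelta, high_resolution: bool = False) -> Optional[int]:
    """Smallest period (seconds) at which a data point of this age can still be read."""
    for max_age, period in RETENTION_TIERS:
        if period < 60 and not high_resolution:
            continue  # standard-resolution data never had sub-minute detail
        if age <= max_age:
            return period
    return None  # older than 455 days: the data point has expired


print(finest_available_period(timedelta(hours=1), high_resolution=True))  # 1
print(finest_available_period(timedelta(days=30)))                        # 300
print(finest_available_period(timedelta(days=500)))                       # None
```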
Supported logging frameworks
Apache log4net
Apache Log4j
NLog
Serilog
Log Example:
{
"level": "Error",
"message": "Error processing notification",
"timestamp": "1591940416",
"context": {
"userId": "StudentA",
"type": "Lambda.Handler",
"env": "dev",
"component": "api",
"correlationId": "41e556-9e5-4c37-856e-3b623be",
"threadId": 16,
"member": "ProcessNotification",
"sourceFile": "D:\\a\\1\\s\\Lambda\\Handler.cs",
"exception": "<exception details>"
}
}
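Structured JSON logs like the example above let a filter select events by field rather than by free text. CloudWatch Logs JSON filter patterns take a form such as { ($.level = "Error") && ($.context.env = "dev") }. The sketch below is a local stand-in for that idea only. It parses a trimmed copy of the example with Python's json module and combines equality tests on dotted field paths with AND. The field and matches_all helpers are hypothetical and do not implement the real pattern syntax.

```python
import json

# A trimmed copy of the log example above.
LOG_LINE = """{
  "level": "Error",
  "message": "Error processing notification",
  "timestamp": "1591940416",
  "context": {"userId": "StudentA", "env": "dev", "component": "api"}
}"""


def field(event: dict, path: str):
    """Follow a dotted path such as 'context.env'; return None if any part is missing."""
    current = event
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches_all(event: dict, conditions: dict) -> bool:
    """True if every path equals its expected value (an AND of equality tests)."""
    return all(field(event, p) == v for p, v in conditions.items())


event = json.loads(LOG_LINE)
print(matches_all(event, {"level": "Error", "context.env": "dev"}))   # True
print(matches_all(event, {"level": "Error", "context.env": "prod"}))  # False
```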
Monitor Logs from EC2 Instances in Real-time
can help monitor applications and systems using log data
can help track the number of errors (for example, HTTP 404 or 500 responses, or even specific literal terms such as "NullReferenceException") occurring in the applications, which can then be compared against a threshold to send a notification
Monitor AWS CloudTrail Logged Events
can be used to monitor particular API activity as captured by CloudTrail by creating alarms in CloudWatch and receive notifications
Archive Log Data
can help store the log data in highly durable storage, an alternative to S3.
log retention setting can be modified, so that any log events older than this setting are automatically deleted.
Log Route 53 DNS Queries
Can help log information about the DNS queries that Route 53 receives.
Real-time Processing of Log Data with Subscriptions
Subscriptions provide access to a real-time feed of log events from CloudWatch Logs and deliver it to other services, such as a Kinesis stream, a Kinesis Data Firehose stream, or AWS Lambda, for custom processing, analysis, or loading to other systems.
A subscription filter defines the filter pattern to use for filtering which log events get delivered to the AWS resource, as well as information about where to send matching log events to.
A CloudWatch Logs log group can also be configured to stream data to an Amazon Elasticsearch Service cluster in near real time.
Searching and Filtering
CloudWatch Logs allows searching and filtering the log data by creating one or more metric filters.
Metric filters define the terms and patterns to look for in log data as it is sent to CloudWatch Logs.
CloudWatch Logs uses these metric filters to turn log data into numerical CloudWatch metrics that can be graphed or used to set alarms.
CloudWatch Application Insights
CloudWatch Application Insights facilitates observability for your applications and the underlying AWS resources.
It helps you set up the best monitors for your application resources to analyze data continuously for signs of problems with your applications.
When you add your applications to Amazon CloudWatch Application Insights, it scans the resources in the applications.
CloudWatch Application Insights then recommends and configures metrics and logs on CloudWatch for application components.
Main capabilities:
Scans applications components to recommend key metrics, logs, and other data sources
Detects problems in your application and automatically creates CloudWatch dashboards with contextual information
Generates CloudWatch Events to notify you of future events