OPS 8: How do you understand the health of your workload?
Define, capture, and analyze workload metrics to gain visibility into workload events so that you can take appropriate action.
Resources
Build a Monitoring Plan
Creating Amazon CloudWatch Alarms
AWS Answers: Centralized Logging
Best Practices:
- Identify key performance indicators: Identify key performance indicators (KPIs) based on desired business outcomes (for example, order rate, customer retention rate, and profit versus operating expense) and customer outcomes (for example, customer satisfaction). Evaluate KPIs to determine workload success.
- Define workload metrics: Define workload metrics to measure the achievement of KPIs (for example, abandoned shopping carts, orders placed, cost, price, and allocated workload expense). Define workload metrics to measure the health of the workload (for example, interface response time, error rate, requests made, requests completed, and utilization). Evaluate metrics to determine if the workload is achieving desired outcomes, and to understand the health of the workload.
- Collect and analyze workload metrics: Perform regular, proactive reviews of metrics to identify trends and determine where appropriate responses are needed.
- Establish workload metrics baselines: Establish baselines for metrics to provide expected values as the basis for comparison and for identifying underperforming and overperforming components. Identify thresholds for improvement, investigation, and intervention.
- Learn expected patterns of activity for workload: Establish patterns of workload activity so that you can identify anomalous behavior and respond appropriately if required.
- Alert when workload outcomes are at risk: Raise an alert when workload outcomes are at risk so that you can respond appropriately if necessary.
- Alert when workload anomalies are detected: Raise an alert when workload anomalies are detected so that you can respond appropriately if necessary.
- Validate the achievement of outcomes and the effectiveness of KPIs and metrics: Create a business-level view of your workload operations to help you determine if you are satisfying needs and to identify areas that need improvement to reach business goals. Validate the effectiveness of KPIs and metrics and revise them if necessary.
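The baseline and threshold practices above can be illustrated with a minimal sketch. It assumes a baseline is simply the mean and sample standard deviation of historical values, and that the investigation and intervention thresholds are expressed as multiples of that deviation. The function names, the sample response times, and the 2-sigma and 3-sigma defaults are all hypothetical choices for illustration, not values prescribed by this guidance.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Return (mean, sample standard deviation) for historical metric values."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return mean(samples), stdev(samples)


def classify(value, baseline, investigate_sigma=2.0, intervene_sigma=3.0):
    """Compare an observed value with the baseline.

    Returns "ok", "investigate", or "intervene". The sigma multipliers are
    illustrative assumptions; tune them to your own workload.
    """
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any change at all is a departure.
        return "ok" if value == mu else "intervene"
    deviation = abs(value - mu) / sigma
    if deviation >= intervene_sigma:
        return "intervene"
    if deviation >= investigate_sigma:
        return "investigate"
    return "ok"


# Hypothetical interface response times in milliseconds.
history_ms = [120, 125, 118, 130, 122, 127, 121, 124]
baseline = build_baseline(history_ms)

for observed in (126, 133, 180):
    print(observed, classify(observed, baseline))
```

A fixed mean and deviation ignores daily or weekly cycles, so a workload with strong seasonal patterns would likely need a baseline per time window instead.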
Improvement Plan
Identify key performance indicators
Define workload metrics
- Publish custom metrics
- Searching and filtering log data
- Amazon CloudWatch metrics and dimensions reference
Collect and analyze workload metrics
- Using Amazon CloudWatch metrics
- Amazon CloudWatch metrics and dimensions reference
- Collect metrics and logs from Amazon EC2 instances and on-premises servers with the CloudWatch Agent
Establish workload metrics baselines
- Creating Amazon CloudWatch alarms
Learn expected patterns of activity for workload
Alert when workload outcomes are at risk
- What is Amazon CloudWatch Events?
- Creating Amazon CloudWatch alarms
- Invoking Lambda functions using Amazon SNS notifications
Alert when workload anomalies are detected
- What is Amazon CloudWatch Events?
- Creating Amazon CloudWatch alarms
- Invoking Lambda functions using Amazon SNS notifications
Validate the achievement of outcomes and the effectiveness of KPIs and metrics
- Using Amazon CloudWatch dashboards
- What is log analytics?
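As one possible starting point for the custom metric and alarm steps in the plan above, the following sketch builds the request parameters for the CloudWatch `PutMetricData` and `PutMetricAlarm` operations. The namespace `ExampleShop/Orders`, the metric name `OrdersPlaced`, the `Environment` dimension, the alarm settings, and the SNS topic ARN are hypothetical placeholders. The block only constructs dictionaries, so it runs without AWS credentials; the boto3 calls that would send them are shown in comments.

```python
from datetime import datetime, timezone

# Hypothetical names chosen for illustration only.
NAMESPACE = "ExampleShop/Orders"
METRIC_NAME = "OrdersPlaced"


def order_metric_payload(orders_placed, environment):
    """Build PutMetricData parameters for one custom metric data point."""
    return {
        "Namespace": NAMESPACE,
        "MetricData": [
            {
                "MetricName": METRIC_NAME,
                "Dimensions": [{"Name": "Environment", "Value": environment}],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(orders_placed),
                "Unit": "Count",
            }
        ],
    }


def low_order_rate_alarm(environment, threshold, sns_topic_arn):
    """Build PutMetricAlarm parameters that alert when orders drop too low."""
    return {
        "AlarmName": f"low-order-rate-{environment}",
        "Namespace": NAMESPACE,
        "MetricName": METRIC_NAME,
        "Dimensions": [{"Name": "Environment", "Value": environment}],
        "Statistic": "Sum",
        "Period": 300,              # evaluate in 5-minute windows (assumed)
        "EvaluationPeriods": 3,     # three consecutive windows (assumed)
        "Threshold": float(threshold),
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # no data is treated as a problem
        "AlarmActions": [sns_topic_arn],
    }


payload = order_metric_payload(42, "prod")
alarm = low_order_rate_alarm(
    "prod",
    threshold=10,
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:example-topic",
)

# To send these to CloudWatch (requires boto3 and AWS credentials):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   cloudwatch.put_metric_data(**payload)
#   cloudwatch.put_metric_alarm(**alarm)
```

Keeping the parameter construction separate from the API call makes the thresholds easy to review and test before anything is created in an account.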