PERF 7: How do you monitor your resources to ensure they are performing?

System performance can degrade over time. Monitor system performance to identify degradation and remediate internal or external factors, such as the operating system or application load.

Resources

Cut through the chaos: Gain operational visibility and insight (MGT301-R1)
X-Ray Documentation
CloudWatch Documentation
Monitoring, Logging, and Performance APN Partners

Best Practices:

Improvement Plan

Record performance-related metrics

  • Record performance data: Identify the relevant performance metrics for your workload and record them. This data helps identify which components are impacting overall performance or efficiency of your workload.
  • Identify performance metrics: Use the customer experience to identify the most important metrics. For each metric, identify the target, measurement approach, and priority. Use these data points to build alarms and notifications to proactively address performance-related issues.
  • Analyze metrics when events or incidents occur

  • Prioritize experience concerns for critical user stories: When you write critical user stories for your architecture, include performance requirements, such as specifying how quickly each critical story should execute. For these critical stories, implement additional scripted user journeys to ensure that you know how the user stories perform against your requirements.
  • Establish Key Performance Indicators (KPIs) to measure workload performance

  • Define the customer experience: Document the performance experience required by customers, including how customers judge the performance of the workload. Use these requirements to establish your KPIs, which indicate how the system is performing overall.
  • Test user journeys: Use synthetic or sanitized versions of production data (remove sensitive or identifying information) for load testing. Exercise your entire architecture by using replayed or pre-programmed user journeys through your application at scale.
  • Use monitoring to generate alarm-based notifications

  • Monitor metrics: Amazon CloudWatch can collect metrics across the resources in your architecture. You can collect and publish custom metrics to surface business or derived metrics. Use CloudWatch or a third-party monitoring service to set alarms that indicate when thresholds are breached.
  • Review metrics at regular intervals

  • Constantly improve metric collection and monitoring: As part of responding to incidents or events, evaluate which metrics were helpful in addressing the issue and which metrics could have helped that are not currently being tracked. Use this method to improve the quality of metrics you collect so that you can prevent or more quickly resolve future incidents.
  • Monitor and alarm proactively

  • Monitor performance during operations: Implement processes that provide visibility into performance as your workload is running. Build monitoring dashboards and establish a baseline for performance expectations.