This content is outdated. This version of the Well-Architected Framework is now found at: https://docs.aws.amazon.com/en_us/wellarchitected/2022-03-31/framework/operational-excellence.html

OPS 4: How do you design your workload so that you can understand its state?

Design your workload so that it provides the information necessary across all components (for example, metrics, logs, and traces) for you to understand its internal state. This enables you to provide effective responses when appropriate.

Resources

Gaining Better Observability of Your VMs with Amazon CloudWatch
Application Performance Management on AWS
Amazon CloudWatch Documentation

Improvement Plan

Implement application telemetry

  • Implement log and metric telemetry: Instrument your application code to emit information about its internal state, status, and the achievement of business outcomes. Use this information to determine when a response is required.
    Gaining better observability of your VMs with Amazon CloudWatch - AWS Online Tech Talks
    How Amazon CloudWatch works
    What is Amazon CloudWatch?
    Using Amazon CloudWatch metrics
    What is Amazon CloudWatch Logs?
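As a rough illustration of log telemetry, the sketch below emits one JSON object per log line using only the Python standard library. The function names, the field names (`order_id`, `duration_ms`, `orders_completed`), and the `telemetry` key passed through `extra` are hypothetical conventions chosen for this example; they are not part of any AWS API. Shipping the resulting lines to a log service is left out.

```python
import io
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "timestamp": time.strftime(
                "%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)
            ),
            "level": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"telemetry": {...}}.
        payload.update(getattr(record, "telemetry", {}))
        return json.dumps(payload)


def build_logger(name, stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def record_order(logger, order_id, succeeded, duration_ms):
    """Emit internal state, status, and a business outcome for one order."""
    logger.info(
        "order processed",
        extra={
            "telemetry": {
                "order_id": order_id,
                "status": "ok" if succeeded else "failed",
                "duration_ms": duration_ms,
                # Business outcome: counts toward completed orders.
                "orders_completed": 1 if succeeded else 0,
            }
        },
    )
```

Calling `record_order(build_logger("checkout", sys.stdout), "o-1", True, 42.0)` (with `sys` imported) would write one such line to standard output. Keeping status and business outcome in machine-readable fields makes it easier for a downstream tool to filter on them or turn them into metrics.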

Implement and configure workload telemetry

  • Implement log and metric telemetry: Instrument your workload to emit information about its internal state, status, and the achievement of business outcomes. Use this information to determine when a response is required.
    Gaining better observability of your VMs with Amazon CloudWatch - AWS Online Tech Talks
    How Amazon CloudWatch works
    What is Amazon CloudWatch?
    Using Amazon CloudWatch metrics
    What is Amazon CloudWatch Logs?
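One way to turn emitted metrics into a decision about when a response is required is to compare a rolling value against a threshold. The sketch below is a minimal in-process version of that idea. The class name, window size, and threshold are assumptions made for illustration; a managed alarm service would normally take this role in production.

```python
from collections import deque


class RollingErrorRate:
    """Track recent request outcomes and flag when a response is required."""

    def __init__(self, window=100, threshold=0.05):
        # Only the most recent `window` outcomes are kept; True means error.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, is_error):
        """Record the outcome of one request."""
        self.outcomes.append(bool(is_error))

    def error_rate(self):
        """Return the fraction of recent requests that failed."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def response_required(self):
        """Return True when the error rate exceeds the threshold."""
        return self.error_rate() > self.threshold
```

A fixed-size window keeps memory bounded and lets the signal recover on its own once errors stop, at the cost of ignoring anything older than the window.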

Implement user activity telemetry

  • Implement user activity telemetry: Design your application code to emit information about user activity (for example, click streams, or transactions that were started, abandoned, or completed). Use this information to understand how the application is used and its patterns of usage, and to determine when a response is required.
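The started, abandoned, and completed transactions mentioned above can be modeled as simple events and summarized into a usage pattern. The following sketch assumes a hypothetical `ActivityEvent` record and a `summarize` helper; both names and the three action strings are illustrative choices, not an existing schema.

```python
import time
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ActivityEvent:
    """One user action within a transaction (hypothetical schema)."""

    user_id: str
    transaction_id: str
    action: str  # expected: "started", "abandoned", or "completed"
    timestamp: float = field(default_factory=time.time)


def summarize(events):
    """Count actions and compute the share of started transactions completed."""
    counts = Counter(event.action for event in events)
    started = counts["started"]
    return {
        "started": started,
        "completed": counts["completed"],
        "abandoned": counts["abandoned"],
        "completion_rate": counts["completed"] / started if started else 0.0,
    }
```

A falling completion rate, or a rise in abandoned transactions, is the kind of usage signal that may indicate a response is required even when infrastructure metrics look normal.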

Implement dependency telemetry

  • Implement dependency telemetry: Design and configure your workload to emit information about the state and status of the systems it depends on. Examples of such dependencies include external databases, DNS, network connectivity, and external credit card processing services.
    Amazon CloudWatch Agent with AWS Systems Manager integration - unified metrics & log collection for Linux & Windows
    Collect metrics and logs from Amazon EC2 instances and on-premises servers with the CloudWatch Agent
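A dependency probe can be as simple as timing a check and recording whether it succeeded. The sketch below uses only the Python standard library. The `probe` and `tcp_check` names and the shape of the result dictionary are assumptions made for this example, and it does not use the CloudWatch agent.

```python
import socket
import time


def tcp_check(host, port, timeout=2.0):
    """Succeed silently if a TCP connection can be opened; raise OSError otherwise."""
    with socket.create_connection((host, port), timeout=timeout):
        return None


def probe(name, check):
    """Run one dependency check and report its status and latency."""
    start = time.perf_counter()
    try:
        check()
        status, error = "healthy", None
    except Exception as exc:  # any failure marks the dependency unhealthy
        status, error = "unhealthy", str(exc)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {
        "dependency": name,
        "status": status,
        "latency_ms": latency_ms,
        "error": error,
    }
```

For example, `probe("database", lambda: tcp_check("db.example.internal", 5432))` would report on a database endpoint (the host name here is a placeholder), and `probe("dns", lambda: socket.getaddrinfo("example.com", 443))` would exercise DNS resolution. Passing the check as a callable keeps the probe logic independent of any particular dependency.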

Implement transaction traceability

  • Implement transaction traceability: Design your application and workload to emit information about the flow of transactions across system components, such as the transaction stage, the active component, and the time taken to complete each activity. Use this information to determine what is in progress, what is complete, and what the results of completed activities are. This helps you determine when a response is required. For example, longer than expected transaction response times within a component can indicate issues with that component.
    AWS X-Ray
    What is AWS X-Ray?
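To make the idea of transaction traceability concrete, the sketch below records the stage, active component, status, and duration of each step under a shared trace ID. It is a hand-rolled illustration using only the Python standard library, not the AWS X-Ray SDK. The `Trace` class, its `span` and `slow_spans` methods, and the field names are all hypothetical.

```python
import time
import uuid
from contextlib import contextmanager


class Trace:
    """Collect per-stage records for one transaction (illustrative only)."""

    def __init__(self, trace_id=None):
        self.trace_id = trace_id or uuid.uuid4().hex
        self.spans = []

    @contextmanager
    def span(self, component, stage):
        """Record one activity: what is in progress, then how it ended."""
        record = {
            "trace_id": self.trace_id,
            "component": component,
            "stage": stage,
            "status": "in_progress",
        }
        self.spans.append(record)
        start = time.perf_counter()
        try:
            yield record
            record["status"] = "complete"
        except Exception:
            record["status"] = "failed"
            raise
        finally:
            record["duration_ms"] = (time.perf_counter() - start) * 1000.0

    def slow_spans(self, limit_ms):
        """Return finished spans that took longer than limit_ms."""
        return [s for s in self.spans if s.get("duration_ms", 0.0) > limit_ms]
```

Wrapping each step in `with trace.span("payments", "charge"):` yields one record per activity, and `slow_spans` gives a simple way to spot a component whose response time is longer than expected.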