OPS 4: How do you design your workload so that you can understand its state?
Design your workload so that it provides the information necessary across all components (for example, metrics, logs, and traces) for you to understand its internal state. This enables you to provide effective responses when appropriate.
Resources
Gaining Better Observability of Your VMs with Amazon CloudWatch
Application Performance Management on AWS
Amazon CloudWatch Documentation
Best Practices:
- Implement application telemetry: Instrument your application code to emit information about its internal state, status, and achievement of business outcomes, for example, queue depth, error messages, and response times. Use this information to determine when a response is required.
- Implement and configure workload telemetry: Design and configure your workload to emit information about its internal state and current status, for example, API call volume, HTTP status codes, and scaling events. Use this information to help determine when a response is required.
- Implement user activity telemetry: Instrument your application code to emit information about user activity, for example, click streams, or started, abandoned, and completed transactions. Use this information to help understand how the application is used, patterns of usage, and to determine when a response is required.
- Implement dependency telemetry: Design and configure your workload to emit information about the status (for example, reachability or response time) of resources it depends on. Examples of external dependencies can include external databases, DNS, and network connectivity. Use this information to determine when a response is required.
- Implement transaction traceability: Implement your application code and configure your workload components to emit information about the flow of transactions across the workload. Use this information to determine when a response is required and to assist you in identifying the factors contributing to an issue.
Improvement Plan
Implement application telemetry
Gaining better observability of your VMs with Amazon CloudWatch - AWS Online Tech Talks
How Amazon CloudWatch works
What is Amazon CloudWatch?
Using Amazon CloudWatch metrics
What is Amazon CloudWatch Logs?
- Implement application telemetry: Design your application code to emit information about its internal state, status, and achievement of business outcomes (for example, queue depth, error messages, and response times).
Collect metrics and logs from Amazon EC2 Instances and on-premises servers with the CloudWatch Agent
Using CloudWatch Logs with container instances
Accessing Amazon CloudWatch Logs for AWS Lambda
Publish custom metrics
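One way to emit application telemetry such as queue depth and response time is to write structured log lines in the CloudWatch embedded metric format, which lets CloudWatch extract metrics from log events. The following is a minimal sketch, not a definitive implementation. It assumes the process output is delivered to CloudWatch Logs (for example, from AWS Lambda or through the CloudWatch agent). The namespace OrderService, the dimension Service, and the metric names are hypothetical.

```python
import json
import time


def emf_record(namespace, dimensions, metrics, now_ms=None):
    """Build one embedded-metric-format log record.

    dimensions: dict of dimension name -> value
    metrics:    dict of metric name -> (value, unit)
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # EMF timestamps are epoch milliseconds
    record = {
        "_aws": {
            "Timestamp": now_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set containing every supplied dimension name.
                    "Dimensions": [list(dimensions)],
                    "Metrics": [
                        {"Name": name, "Unit": unit}
                        for name, (_, unit) in metrics.items()
                    ],
                }
            ],
        }
    }
    # Dimension and metric values live at the top level of the record.
    record.update(dimensions)
    record.update({name: value for name, (value, _) in metrics.items()})
    return record


def emit(record):
    """Write the record as a single JSON line on standard output."""
    print(json.dumps(record), flush=True)


if __name__ == "__main__":
    # Hypothetical names, used for illustration only.
    emit(emf_record(
        "OrderService",
        {"Service": "checkout"},
        {"QueueDepth": (12, "Count"), "ResponseTime": (183.5, "Milliseconds")},
    ))
```

Writing metrics as log lines keeps the instrumentation free of network calls in the request path; the trade-off is that delivery depends on the log pipeline being configured correctly.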
Implement and configure workload telemetry
Gaining better observability of your VMs with Amazon CloudWatch - AWS Online Tech Talks
How Amazon CloudWatch works
What is Amazon CloudWatch?
Using Amazon CloudWatch metrics
What is Amazon CloudWatch Logs?
- Implement and configure workload telemetry: Design and configure your workload to emit information about its internal state and current status (for example, API call volume, HTTP status codes, and scaling events).
Amazon CloudWatch metrics and dimensions reference
AWS CloudTrail
What Is AWS CloudTrail?
VPC Flow Logs
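To illustrate how workload telemetry such as HTTP status codes can help determine when a response is required, the sketch below summarizes a batch of status codes by class and flags a high server-error rate. It is an illustration under stated assumptions: the function names and the 5% threshold are hypothetical, and in practice this evaluation would more likely be done by an alarm on collected metrics than by application code.

```python
from collections import Counter


def summarize_status_codes(status_codes):
    """Group HTTP status codes by class (2xx, 4xx, 5xx, ...) and compute the 5xx rate."""
    by_class = Counter(f"{code // 100}xx" for code in status_codes)
    total = sum(by_class.values())
    server_error_rate = by_class["5xx"] / total if total else 0.0
    return {
        "total": total,
        "by_class": dict(by_class),
        "server_error_rate": server_error_rate,
    }


def needs_response(summary, threshold=0.05):
    """Return True when the share of 5xx responses exceeds the (assumed) threshold."""
    return summary["server_error_rate"] > threshold


if __name__ == "__main__":
    summary = summarize_status_codes([200, 200, 201, 404, 500, 503])
    print(summary, needs_response(summary))
```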
Implement user activity telemetry
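As a rough illustration of user activity telemetry, the sketch below reduces a stream of started, abandoned, and completed transaction events to a small summary that could be emitted as metrics. The event names, the (transaction_id, event_name) tuple shape, and the function name are assumptions made for this example, not part of any AWS API.

```python
from collections import Counter

TERMINAL_EVENTS = {"completed", "abandoned"}


def summarize_transactions(events):
    """Summarize user transactions.

    events: iterable of (transaction_id, event_name) tuples in time order,
            where event_name is "started", "completed", or "abandoned".
    """
    state = {}
    for tx_id, name in events:
        if name == "started":
            state.setdefault(tx_id, "in_progress")
        elif name in TERMINAL_EVENTS and tx_id in state:
            # Terminal events for unknown transactions are ignored.
            state[tx_id] = name
    counts = Counter(state.values())
    started = len(state)
    return {
        "started": started,
        "completed": counts["completed"],
        "abandoned": counts["abandoned"],
        "in_progress": counts["in_progress"],
        "completion_rate": counts["completed"] / started if started else 0.0,
    }
```

A falling completion rate or a rising abandoned count is the kind of signal that can indicate when a response is required, even if no component is reporting errors.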
Implement dependency telemetry
Amazon CloudWatch Agent with AWS Systems Manager integration - unified metrics & log collection for Linux & Windows
Collect metrics and logs from Amazon EC2 instances and on-premises servers with the CloudWatch Agent
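Dependency telemetry covers reachability and response time of the resources a workload depends on. The sketch below measures both for a TCP endpoint using only the Python standard library. It is a simplified example: the function name and the returned field names are hypothetical, a successful TCP connection does not prove the dependency is healthy at the application level, and the host and port in the usage lines are placeholders.

```python
import socket
import time


def probe_tcp(host, port, timeout=2.0):
    """Try to open a TCP connection and report reachability and elapsed time."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        reachable = False
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {
        "dependency": f"{host}:{port}",
        "reachable": reachable,
        "response_time_ms": round(elapsed_ms, 2),
    }


if __name__ == "__main__":
    # Placeholder endpoint; replace with a dependency of your workload.
    print(probe_tcp("localhost", 5432, timeout=1.0))
```

The result can be logged or published as a metric on a schedule, so that a change in reachability or a trend in response time is visible before it affects users.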
Implement transaction traceability
AWS X-Ray
What is AWS X-Ray?
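Transaction traceability relies on every component passing along a shared identifier for the request. AWS X-Ray carries this in the X-Amzn-Trace-Id header, with a trace ID made of a version number, the request start time as eight hexadecimal digits, and a 24-digit random identifier. The sketch below builds and parses such a header to show the idea; the helper names are hypothetical, and in a real workload the X-Ray SDK or an equivalent instrumentation library would normally generate and propagate these values.

```python
import os
import time


def new_trace_id(now=None):
    """Return a trace ID of the form 1-<8 hex digits of epoch time>-<24 random hex digits>."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def new_segment_id():
    """Return a 16-hex-digit identifier for one unit of work in the trace."""
    return os.urandom(8).hex()


def build_trace_header(trace_id, parent_id, sampled=True):
    """Format the value to send in the X-Amzn-Trace-Id header to a downstream call."""
    return f"Root={trace_id};Parent={parent_id};Sampled={1 if sampled else 0}"


def parse_trace_header(value):
    """Split a trace header value into a dict such as {'Root': ..., 'Parent': ...}."""
    fields = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep:
            fields[key] = val
    return fields


if __name__ == "__main__":
    header = build_trace_header(new_trace_id(), new_segment_id())
    print("X-Amzn-Trace-Id:", header)
    print(parse_trace_header(header))
```

A downstream component would parse the incoming header, reuse the Root value, and send its own segment ID as Parent on any further calls, which is what allows the flow of one transaction to be reassembled across the workload.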