OPS 7: How do you know that you are ready to support a workload?
Evaluate the operational readiness of your workload, processes and procedures, and personnel to understand the operational risks related to your workload.
Resources
AWS Config
AWS Systems Manager Features
Best Practices:
- Ensure personnel capability: Have a mechanism to validate that you have the appropriate number of trained personnel to provide support for operational needs. Train personnel and adjust personnel capacity as necessary to maintain effective support.
- Ensure consistent review of operational readiness: Ensure you have a consistent review of your readiness to operate a workload. Reviews must include, at a minimum, the operational readiness of the teams and the workload, and security requirements. Implement review activities in code and trigger automated review in response to events where appropriate, to ensure consistency, speed up execution, and reduce errors caused by manual processes.
- Use runbooks to perform procedures: Runbooks are documented procedures to achieve specific outcomes. Enable consistent and prompt responses to well-understood events by documenting procedures in runbooks. Implement runbooks as code and trigger the execution of runbooks in response to events where appropriate, to ensure consistency, speed up responses, and reduce errors caused by manual processes.
- Use playbooks to investigate issues: Enable consistent and prompt responses to issues that are not well understood, by documenting the investigation process in playbooks. Playbooks are the predefined steps performed to identify the factors contributing to a failure scenario. The results from any process step are used to determine the next steps to take until the issue is identified or escalated.
- Make informed decisions to deploy systems and changes: Evaluate the capabilities of the team to support the workload and the workload's compliance with governance. Evaluate these against the benefits of deployment when determining whether to transition a system or change into production. Understand the benefits and risks to make informed decisions.
Improvement Plan
Ensure personnel capability
- Team size: Ensure that you have enough team members to cover operational activities, including on-call duties.
- Team skill: Ensure that your team members have sufficient training on AWS, your workload, and your operations tools to perform their duties.
AWS Events and Webinars
Welcome to AWS Training and Certification
- Review capabilities: Review team size and skill as operating conditions and workloads change, to ensure there is sufficient capability to maintain operational excellence. Make adjustments to ensure that team size and skill match the operational requirements for the workloads that the team supports.
Ensure consistent review of operational readiness
AWS Systems Manager
AWS Config Rules dynamic compliance checking for cloud resources
How to audit your AWS resources for security compliance by using custom AWS Config Rules
How to track configuration changes to CloudFormation stacks using AWS Config
Amazon Inspector update assessment reporting, proxy support, and more
- Create checklists: Ensure you have a consistent review of your readiness to operate a workload. Create operational readiness checklists and validate them against your business, development, operations, and governance requirements. Ensure they address: governance, best practices, configuration standards, restoration procedures, monitoring, maintenance procedures, IT operations procedures, and staffing.
- Use checklists: Make checklists accessible to developers so that they can develop to the appropriate standards. Evaluate checklists when moving between lifecycle stages and environments so that you can identify issues early, when the level of effort to remediate issues is lower. Use the results of checklists to make informed decisions about benefits and risks when considering promoting changes between environments.
- Implement checklists as code and trigger checklist execution in response to events: Where possible, implement checklists as code and trigger their execution in response to events, to increase speed, ensure consistency, and reduce errors caused by manual processes. Integrate automated checklist execution into deployment pipelines.
AWS Config
What is AWS Config?
AWS Config: evaluating resources with Rules
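To illustrate what a checklist implemented as code might look like, here is a minimal Python sketch. It is not an AWS API: the `Check` structure, the `run_checklist` helper, the workload fields, and the threshold of two on-call engineers are all hypothetical and stand in for your own governance, monitoring, restoration, and staffing requirements. In practice, checks like these might be expressed as AWS Config rules or run as a step in a deployment pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Check:
    """One operational readiness checklist item (hypothetical structure)."""
    name: str
    predicate: Callable[[Dict], bool]


# Hypothetical checklist items; replace with checks derived from your own
# business, development, operations, and governance requirements.
CHECKLIST: List[Check] = [
    Check("monitoring configured", lambda w: bool(w.get("alarms"))),
    Check("restore procedure tested", lambda w: w.get("restore_tested") is True),
    Check("on-call rota staffed", lambda w: w.get("on_call_engineers", 0) >= 2),
]


def run_checklist(workload: Dict, checklist: List[Check] = CHECKLIST) -> Tuple[bool, List[str]]:
    """Evaluate every check and return (ready, names of failed checks)."""
    failed = [check.name for check in checklist if not check.predicate(workload)]
    return (not failed, failed)


if __name__ == "__main__":
    # Assumed description of a workload about to be promoted.
    candidate = {"alarms": ["cpu-high"], "restore_tested": False, "on_call_engineers": 3}
    ready, failed = run_checklist(candidate)
    print("ready" if ready else f"not ready: {failed}")
```

Returning the list of failed checks, rather than only a pass or fail flag, lets a pipeline stage report exactly which readiness items block promotion between environments.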
Use runbooks to perform procedures
- Implement runbooks as code: Perform your operations as code by implementing your runbooks as code, to ensure consistency and reduce errors caused by manual processes.
AWS Systems Manager Run Command
AWS Systems Manager Automation
What is AWS Lambda?
- Trigger runbooks in response to events: Trigger the execution of runbook code in response to observed events when appropriate. This increases the speed of the response and reduces the level of effort to respond.
What is Amazon CloudWatch Events?
Creating a CloudWatch Events rule that triggers on an event
Creating a CloudWatch Events rule that triggers on an AWS API call using AWS CloudTrail
CloudWatch Events event examples from supported services
Using Amazon CloudWatch Alarms
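The two steps in this section, implementing a runbook as code and triggering it from an event, can be sketched in plain Python as follows. This is an illustrative outline only. The event is a simplified dictionary that uses the `source`, `detail-type`, and `detail` keys found in CloudWatch Events events; the step functions, the `RUNBOOKS` mapping, and `handle_event` are hypothetical names. The steps only return strings, so the sketch runs without touching real infrastructure; a real runbook might instead call AWS Systems Manager Automation or run inside an AWS Lambda function.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A runbook is an ordered list of steps. Each step receives the event and
# returns a short description of the action it would take.
Step = Callable[[Dict], str]


def record_incident(event: Dict) -> str:
    """Hypothetical step: note which instance changed state."""
    detail = event["detail"]
    return f"incident recorded for {detail['instance-id']} ({detail['state']})"


def restart_instance(event: Dict) -> str:
    """Hypothetical step: request a restart (no real API call is made here)."""
    return f"restart requested for {event['detail']['instance-id']}"


# Hypothetical mapping from (source, detail-type) to the runbook to execute.
RUNBOOKS: Dict[Tuple[str, str], List[Step]] = {
    ("aws.ec2", "EC2 Instance State-change Notification"): [record_incident, restart_instance],
}


def handle_event(event: Dict) -> Optional[List[str]]:
    """Run the runbook registered for this event; return None if none matches."""
    runbook = RUNBOOKS.get((event.get("source"), event.get("detail-type")))
    if runbook is None:
        return None
    return [step(event) for step in runbook]


if __name__ == "__main__":
    sample = {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
    }
    for line in handle_event(sample) or []:
        print(line)
```

Keeping the event-to-runbook mapping in data rather than in branching logic makes it straightforward to review which well-understood events have an automated response.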
Use playbooks to investigate issues
- Implement playbooks as code: Perform your operations as code by scripting your playbooks, to ensure consistency and reduce errors caused by manual processes. Playbooks can be composed of multiple scripts representing the different steps that might be necessary to identify the contributing factors to an issue.
Runbook activities can be triggered or performed as part of playbook activities, or may prompt for execution of a playbook in response to identified events.
Automate your operational playbooks with AWS Systems Manager
AWS Systems Manager Run Command
AWS Systems Manager Automation
What is AWS Lambda?
What is Amazon CloudWatch Events?
Using Amazon CloudWatch Alarms
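A playbook scripted as code can follow the pattern described earlier: the result of each step determines the next step, until the issue is identified or escalated. The Python sketch below shows one possible shape for that loop. Everything in it is an assumption made for illustration: the step names, the `facts` dictionary fields, the 30-minute and 5% thresholds, and the `investigate` helper are hypothetical and would be replaced by queries against your own monitoring and deployment data.

```python
from typing import Callable, Dict, Optional, Tuple

# Each step inspects the collected facts and returns (finding, next_step).
# A finding ends the investigation. No finding and no next step means the
# playbook is exhausted and the issue should be escalated.
StepResult = Tuple[Optional[str], Optional[str]]
Step = Callable[[Dict], StepResult]


def check_recent_deploy(facts: Dict) -> StepResult:
    """Hypothetical step: was there a deployment in the last 30 minutes?"""
    minutes = facts.get("deployed_minutes_ago")
    if minutes is not None and minutes < 30:
        return "recent deployment is a likely contributing factor", None
    return None, "check_dependency_errors"


def check_dependency_errors(facts: Dict) -> StepResult:
    """Hypothetical step: is a downstream dependency failing above 5%?"""
    if facts.get("dependency_error_rate", 0.0) > 0.05:
        return "downstream dependency is returning errors", None
    return None, None


PLAYBOOK: Dict[str, Step] = {
    "check_recent_deploy": check_recent_deploy,
    "check_dependency_errors": check_dependency_errors,
}


def investigate(facts: Dict, start: str = "check_recent_deploy") -> str:
    """Walk the playbook until a contributing factor is found or steps run out."""
    step_name: Optional[str] = start
    while step_name is not None:
        finding, step_name = PLAYBOOK[step_name](facts)
        if finding is not None:
            return f"identified: {finding}"
    return "escalate: no contributing factor identified"


if __name__ == "__main__":
    print(investigate({"deployed_minutes_ago": 240, "dependency_error_rate": 0.2}))
```

A step that returns a finding could also hand off to a runbook, which matches the note above that runbook activities can be triggered as part of playbook activities.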
Make informed decisions to deploy systems and changes