This content is outdated. This version of the Well-Architected Framework is now found at: https://docs.aws.amazon.com/en_us/wellarchitected/2022-03-31/framework/operational-excellence.html

OPS 7: How do you know that you are ready to support a workload?

Evaluate the operational readiness of your workload, processes and procedures, and personnel to understand the operational risks related to your workload.

Resources

AWS Config
AWS Systems Manager Features

Best Practices:

  • Ensure personnel capability: Validate that you have enough trained personnel to support the workload effectively.

  • Ensure consistent review of operational readiness: Review your readiness to operate a workload in a consistent way. At a minimum, the review must cover the operational readiness of the teams and of the workload, as well as security considerations. Review elements can be hard requirements, or you can make a risk-based decision to operate a workload that does not satisfy all of them. Review elements can be specific to a workload or architecture, or they can be implementation dependent. Implement reviews as code and, where appropriate, trigger them in response to events. This keeps reviews consistent and fast to execute, and it reduces errors caused by manual processes.
    AWS Systems Manager
    AWS Config Rules dynamic compliance checking for cloud resources
    How to audit your AWS resources for security compliance by using custom AWS Config Rules
    How to track configuration changes to CloudFormation stacks using AWS Config
    Amazon Inspector update assessment reporting, proxy support, and more

  • Use runbooks to perform standard procedures: Runbooks are documented procedures for achieving specific outcomes. Documenting procedures in runbooks enables consistent and prompt responses to well-understood events. Runbooks must contain the minimum information an adequately skilled person needs to achieve the desired outcome, such as required permissions, required tools, constraints on performing the procedure (for example, specific maintenance windows), and execution steps.

  • Use playbooks to investigate issues: Playbooks are documented processes for investigating issues. Documenting processes in playbooks enables consistent and prompt responses to failure scenarios. Playbooks must contain the information and guidance an adequately skilled person needs to gather applicable information, identify potential sources of failure, isolate faults, and determine contributing factors (that is, perform root cause analysis).

  • Make informed decisions to deploy workloads and changes: Evaluate your team's capability to support the workload and the workload's compliance with governance. Weigh any gaps you find against the benefits of deployment when deciding whether to transition a system or change into production. Understand both the benefits and the risks, and make informed decisions.
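The "Ensure consistent review of operational readiness" practice asks for reviews implemented as code, covering teams, the workload, and security, with room for risk-based exceptions. The following is a minimal sketch of that idea in plain Python. The check names, the workload fields, and the on_event handler are all hypothetical. In a real setup the facts might be gathered from AWS Config or AWS Systems Manager, but no AWS API is called here.

```python
from dataclasses import dataclass, field
from typing import Callable

# A workload is described here as a plain dict of facts. The keys used below
# are hypothetical examples, not fields from any AWS service.
Workload = dict


@dataclass
class Check:
    name: str
    category: str  # "team", "workload", or "security"
    hard_requirement: bool  # True: must pass; False: may be waived by a risk decision
    evaluate: Callable[[Workload], bool]


@dataclass
class ReviewResult:
    ready: bool = False
    failed_hard: list = field(default_factory=list)
    accepted_risks: list = field(default_factory=list)
    unaccepted_risks: list = field(default_factory=list)


# Hypothetical review elements covering the team, the workload, and security.
CHECKS = [
    Check("on-call rotation staffed", "team", True,
          lambda w: w.get("on_call_engineers", 0) >= 2),
    Check("alarms configured", "workload", True,
          lambda w: bool(w.get("alarms"))),
    Check("encryption at rest enabled", "security", True,
          lambda w: w.get("encrypted", False)),
    Check("runbooks published", "workload", False,
          lambda w: w.get("runbooks", 0) > 0),
]


def review(workload: Workload, risk_acceptances: set) -> ReviewResult:
    """Run every check. A failed soft check is tolerated only if its risk was accepted."""
    result = ReviewResult()
    for check in CHECKS:
        if check.evaluate(workload):
            continue
        if check.hard_requirement:
            result.failed_hard.append(check.name)
        elif check.name in risk_acceptances:
            result.accepted_risks.append(check.name)
        else:
            result.unaccepted_risks.append(check.name)
    result.ready = not result.failed_hard and not result.unaccepted_risks
    return result


def on_event(event: dict, inventory: dict, acceptances: dict) -> ReviewResult:
    """Hypothetical event handler: re-run the review for the workload named in the event."""
    name = event["workload"]
    return review(inventory[name], acceptances.get(name, set()))


inventory = {
    "orders": {"on_call_engineers": 3, "alarms": ["p99-latency"],
               "encrypted": True, "runbooks": 0},
}
result = on_event({"workload": "orders", "type": "configuration-change"},
                  inventory, {"orders": {"runbooks published"}})
print(result.ready, result.accepted_risks)  # True ['runbooks published']
```

Keeping hard requirements and explicitly accepted risks in separate lists makes the risk-based decision visible in the review output instead of hiding it in a pass/fail flag.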
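The "Use runbooks to perform standard procedures" practice lists the minimum content of a runbook: required permissions, required tools, constraints such as maintenance windows, and execution steps. One way to make that requirement checkable is to record runbooks as data. The sketch below assumes a hypothetical Runbook class and a preflight check that an operator runs before starting. The example procedure and its window are illustrative only, not an AWS Systems Manager runbook format.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Runbook:
    """A runbook recorded as data so its minimum content can be checked automatically."""
    title: str
    outcome: str
    required_permissions: list
    required_tools: list
    maintenance_window: tuple  # (start, end) as datetime.time; may span midnight
    steps: list

    def missing_fields(self) -> list:
        """Names of minimum-information fields that are empty."""
        minimum = {
            "outcome": self.outcome,
            "required_permissions": self.required_permissions,
            "required_tools": self.required_tools,
            "steps": self.steps,
        }
        return [name for name, value in minimum.items() if not value]

    def in_window(self, when: datetime) -> bool:
        """True if `when` falls inside the maintenance window (inclusive)."""
        start, end = self.maintenance_window
        t = when.time()
        if start <= end:
            return start <= t <= end
        return t >= start or t <= end  # window wraps past midnight

    def preflight(self, when: datetime) -> list:
        """Problems that should stop an operator from starting the procedure."""
        problems = [f"missing {name}" for name in self.missing_fields()]
        if not self.in_window(when):
            problems.append("outside maintenance window")
        return problems


rotate = Runbook(
    title="Rotate database credentials",
    outcome="Application connects with the new credentials and no login failures",
    required_permissions=["secretsmanager:RotateSecret"],
    required_tools=["AWS CLI"],
    maintenance_window=(time(22, 0), time(2, 0)),
    steps=["Confirm no deployment is in progress",
           "Start the rotation",
           "Verify application health"],
)
print(rotate.preflight(datetime(2024, 5, 1, 23, 30)))  # []
print(rotate.preflight(datetime(2024, 5, 1, 12, 0)))   # ['outside maintenance window']
```

The same missing_fields check could run in a review pipeline so that an incomplete runbook is caught before anyone needs it during an event.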
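The "Use playbooks to investigate issues" practice names four investigation stages: gather applicable information, identify potential sources of failure, isolate faults, and determine contributing factors. The sketch below walks those stages for one health snapshot. It uses a simple dependency-graph heuristic for fault isolation, which is an illustrative technique chosen here and not one the framework prescribes. The component names and the depends_on mapping are hypothetical.

```python
def isolate_faults(depends_on: dict, unhealthy: set) -> set:
    """Unhealthy components none of whose direct dependencies are unhealthy.

    Heuristic: a failing component whose dependencies are all healthy is a likely
    source of failure; failing components that depend on it are more likely symptoms.
    """
    return {c for c in unhealthy
            if not any(dep in unhealthy for dep in depends_on.get(c, []))}


def investigate(depends_on: dict, health: dict) -> dict:
    """Walk the four investigation stages named in the playbook practice."""
    # 1. Gather applicable information: here, a health flag per component.
    unhealthy = {name for name, ok in health.items() if not ok}
    # 2. Identify potential sources of failure: every unhealthy component is a candidate.
    candidates = sorted(unhealthy)
    # 3. Isolate faults: keep candidates not explained by an unhealthy dependency.
    isolated = sorted(isolate_faults(depends_on, unhealthy))
    # 4. Contributing factors still need human analysis; list the likely symptoms
    #    so the investigator can confirm they follow from the isolated faults.
    symptoms = sorted(unhealthy - set(isolated))
    return {"candidates": candidates, "isolated": isolated, "symptoms": symptoms}


depends_on = {"web": ["api"], "api": ["db", "cache"], "db": [], "cache": []}
health = {"web": False, "api": False, "db": False, "cache": True}
print(investigate(depends_on, health))
# {'candidates': ['api', 'db', 'web'], 'isolated': ['db'], 'symptoms': ['api', 'web']}
```

If every failing component depends on another failing one (for example, in a dependency cycle), the heuristic isolates nothing, and the playbook should direct the investigator to other evidence such as logs or recent changes.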
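The "Make informed decisions to deploy workloads and changes" practice asks you to evaluate team capability and governance compliance, then weigh the risks against the benefits of deploying. The sketch below is one way to make that reasoning explicit. The expected-value comparison (likelihood times impact, measured in the same units as the benefit) is a simplifying assumption made here, not a method the framework defines. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Risk:
    description: str
    likelihood: float  # estimated probability, 0.0 to 1.0
    impact: float      # estimated cost if it occurs, in the same units as the benefit


@dataclass
class DeploymentAssessment:
    team_can_support: bool
    governance_compliant: bool
    expected_benefit: float
    risks: list = field(default_factory=list)
    governance_exception: bool = False  # a recorded, risk-based exception


def expected_risk(risks: list) -> float:
    """Sum of likelihood x impact over all identified risks."""
    return sum(r.likelihood * r.impact for r in risks)


def decide(a: DeploymentAssessment) -> tuple:
    """Return (go, reason) for a proposed deployment or change."""
    if not a.team_can_support:
        return False, "team is not yet able to support the workload"
    if not a.governance_compliant and not a.governance_exception:
        return False, "governance requirements unmet and no exception recorded"
    risk = expected_risk(a.risks)
    if a.expected_benefit > risk:
        return True, f"benefit {a.expected_benefit:g} exceeds expected risk {risk:g}"
    return False, f"expected risk {risk:g} is not outweighed by benefit {a.expected_benefit:g}"


proposal = DeploymentAssessment(
    team_can_support=True,
    governance_compliant=True,
    expected_benefit=100,
    risks=[Risk("rollback required", 0.25, 200), Risk("latency regression", 0.5, 40)],
)
print(decide(proposal))  # (True, 'benefit 100 exceeds expected risk 70')
```

Returning a reason alongside the decision keeps a record of why a change went ahead, which helps when the decision is revisited after an incident.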