This content is outdated. This version of the Well-Architected Framework is now found at: https://docs.aws.amazon.com/en_us/wellarchitected/2022-03-31/framework/operational-excellence.html

OPS 11: How do you evolve operations?

Dedicate time and resources for continuous incremental improvement to evolve the effectiveness and efficiency of your operations.

Best Practices:

Improvement Plan

Have a process for continuous improvement

  • Define processes for continuous improvement: Regularly evaluate and prioritize opportunities for improvement to focus efforts where they provide the greatest benefits. Implement changes to improve and evaluate the outcomes to determine success. If the outcomes do not satisfy the goals, and the improvement is still a priority, iterate using alternative courses of action. Your operations processes should include dedicated time and resources to make continuous incremental improvements possible.
  • Perform post-incident analysis

  • Use a process to determine contributing factors: Review all customer impacting incidents. Have a process to identify and document the contributing factors of an incident so that you can develop mitigations to limit or prevent recurrence and you can develop procedures for prompt and effective responses. Communicate root cause as appropriate, tailored to target audiences.
  • Implement feedback loops

  • Feedback loops: Have procedures embedded in your operations activities to capture feedback from their execution and to identify areas for improvement.
  • Perform Knowledge Management

  • Knowledge Management: Ensure mechanisms exist for your team members to discover the information that they are looking for in a timely manner, access it, and identify that it’s current and complete. Maintain mechanisms to identify needed content, content in need of refresh, and content that should be archived so that it’s no longer referenced.
  • Define drivers for improvement

  • Understand drivers for improvement: You should only make changes to a system when a desired outcome is supported.
  • Validate insights

  • Validate Insights: Engage with business owners and subject matter experts to ensure there is common understanding and agreement of the meaning of the data you have collected. Identify additional concerns, potential impacts, and determine a courses of action.
  • Perform operations metrics reviews

  • Operations metrics reviews: Regularly perform retrospective analysis of operations metrics with cross-team participants from different areas of the business. Engage stakeholders, including the business, development, and operations teams, to validate your findings from immediate feedback and retrospective analysis, and to share lessons learned. Use their insights to identify opportunities for improvement and potential courses of action.
    Amazon CloudWatch
    Using Amazon CloudWatch metrics
    Publish custom metrics
    Amazon CloudWatch metrics and dimensions reference
  • Document and share lessons learned

  • Document and share lessons learned: Have procedures to document the lessons learned from the execution of operations activities and retrospective analysis so that they can be used by other teams.
  • Allocate time to make improvements

  • Allocate time to make improvements: Dedicate time and resources within your processes to make continuous incremental improvements possible. Implement changes to improve and evaluate the results to determine success. If the results do not satisfy the goals, and the improvement is still a priority, pursue alternative courses of action.