Correction of Error
Correction of Error is a process for improving quality by documenting and addressing issues. You will want to define a standardized way to document critical root causes, and ensure they are reviewed and addressed.
Process
Applying a COE process helps you ensure that your team understands root causes, that they have been reviewed in a consistent way, and that they have been addressed correctly.
Structure of a COE
- What happened?
- What was the impact on customers and your business?
- What was the root cause?
- What data do you have to support this?
  - Especially metrics and graphs
- What were the critical pillar implications, especially security?
  - When architecting workloads you make trade-offs between pillars based upon your business context. These business decisions can drive your engineering priorities. You might optimize to reduce cost at the expense of reliability in development environments, or, for mission-critical solutions, you might optimize reliability with increased costs. Security is always job zero, as you have to protect your customers.
- What lessons did you learn?
- What corrective actions are you taking?
  - Action items
  - Related items (trouble tickets, etc.)
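One way to standardize the structure above is to capture each COE as a record with one field per question. The following is a minimal sketch, not a prescribed schema: the class name `CorrectionOfError`, its field names, the `missing_sections` helper, and the sample incident text are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CorrectionOfError:
    """One COE record. Field names are illustrative, not a standard schema."""

    what_happened: str
    impact: str  # impact on customers and the business
    root_cause: str
    supporting_data: list[str] = field(default_factory=list)  # metrics, graphs
    pillar_implications: dict[str, str] = field(default_factory=dict)  # pillar -> note
    lessons_learned: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)
    related_items: list[str] = field(default_factory=list)  # trouble tickets, etc.

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical draft COE; the incident details are made up for the example.
coe = CorrectionOfError(
    what_happened="Checkout requests failed after a configuration change.",
    impact="Customers could not complete orders.",
    root_cause="A configuration value was changed without validation.",
)
print(coe.missing_sections())
```

A check like `missing_sections` gives reviewers a quick way to see which questions a draft has not yet answered before it goes to review.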
Review
- You should have your COE reviewed by your team, as well as other teams.
- High impact COEs should be reviewed during your operational meetings.