Runbook
Enable consistent and prompt responses to well-understood events by documenting procedures in runbooks. Runbooks are predefined procedures for achieving a specific outcome, and they should contain the minimum information necessary to perform the procedure successfully. Start with a valid, effective manual process, implement it in code, and trigger automated execution where appropriate. This ensures consistency, speeds responses, and reduces the errors caused by manual processes.
For example, the procedure to update static website content might include staging the content in an uploaded content location, updating the website to reference the new content, validating that the new content displays correctly, and removing the replaced content. The runbook might also include the steps to take if the new content does not display correctly, such as rolling back to the prior content and notifying the content team of the failure.
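The content update procedure above can be sketched as runbook code. This is a minimal Python sketch that assumes a hypothetical in-memory `Site` model in place of a real web server and content store; `update_content`, `displays_correctly`, and the `notify` callback are illustrative names, not part of any particular service.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Site:
    """Hypothetical in-memory stand-in for a static website and its content store."""
    store: dict = field(default_factory=dict)  # content location -> content
    current: str = ""                          # location the site currently references
    broken: set = field(default_factory=set)   # locations that fail to display (for testing)

    def displays_correctly(self) -> bool:
        # Validation stand-in: the referenced content exists and is not marked broken.
        return self.current in self.store and self.current not in self.broken


def update_content(site: Site, new_location: str, content: str,
                   notify: Callable[[str], None]) -> bool:
    """Run the update procedure; roll back and notify on failure. Returns True on success."""
    if new_location in site.store:
        raise ValueError(f"staging location {new_location!r} is already in use")
    previous = site.current
    site.store[new_location] = content   # 1. stage the new content
    site.current = new_location          # 2. reference the new content
    if not site.displays_correctly():    # 3. validate the display
        site.current = previous          # recovery: roll back to the prior content
        del site.store[new_location]
        notify(f"update to {new_location} failed; rolled back to {previous}")
        return False
    if previous != new_location:
        site.store.pop(previous, None)   # 4. remove the replaced content
    return True
```

Passing `notify` as a callback keeps the sketch independent of any specific messaging system; in practice it might post to the content team's channel or ticket queue.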
Runbook development
Runbooks give adequately skilled team members who are unfamiliar with a procedure or the workload the instructions they need to complete an activity successfully. Capturing runbooks preserves your organization's institutional knowledge. It eases the burden on key personnel by sharing their knowledge and enabling more team members to achieve the same outcomes.
Building runbooks
- Where to start building runbooks
  - Prioritize frequently executed procedures to reduce the effort required to conduct operations.
  - Prioritize procedures with high error rates to reduce the probability of negative impacts on the business and operations.
  - Prioritize procedures with significant potential for harm to the business or workload to manage risk.
- When to start building runbooks
  - Document your runbooks as you develop your procedures.
  - The best time to create a runbook is before it is needed during an incident response.
  - Take advantage of new people joining your team: have them shadow experienced personnel and document the procedures as they learn them. This both trains new team members and preserves institutional knowledge by capturing it in runbooks.
- Runbook considerations
  - Implement appropriate controls around your runbooks to ensure that:
    - Runbooks can only be executed by authenticated and appropriately authorized personnel and resources.
    - Runbooks can only be executed against explicitly defined, appropriate targets. For example, tag your environments and invoke the runbook against an explicitly specified environment, or have your runbook verify through metadata that it is executing against an appropriate target.
  - Runbooks should be reversible, either by reverting the change or by executing another runbook or procedure that returns the environment to its previous state. For example, adding a user cannot be undone in place, but you can return to the previous state by deleting the added user.
  - There should be a mechanism to verify that a runbook achieved its intended outcome. This might be internal to the runbook, based on the return codes of the executed actions. Alternatively, success might be confirmed manually by the person invoking the runbook, or programmatically by the system invoking it.
  - Runbooks should be tested with the same engineering discipline that you use for application code.
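The controls, reversibility, and verification considerations above can be combined in a small runbook wrapper. This is a hedged Python sketch: the `Runbook` and `Target` classes, the `environment` tag key, and the role names are hypothetical, and a real deployment would rely on your identity and access management system rather than a role string supplied by the caller. It checks authorization and the target's tags before acting, verifies the outcome, and runs the reversing action if verification fails.

```python
from dataclasses import dataclass, field
from typing import Callable


class RunbookError(Exception):
    """Raised when a control check fails or the intended outcome is not achieved."""


@dataclass
class Target:
    """Hypothetical managed resource identified by name and tags."""
    name: str
    tags: dict
    users: set = field(default_factory=set)


@dataclass
class Runbook:
    name: str
    allowed_roles: set                 # roles authorized to execute this runbook
    allowed_environment: str           # the explicitly defined target environment
    do: Callable[[Target], None]       # the change itself
    undo: Callable[[Target], None]     # returns the target to its previous state
    verify: Callable[[Target], bool]   # True if the intended outcome was achieved

    def execute(self, caller_role: str, target: Target) -> None:
        # Control: only authorized callers may execute the runbook.
        if caller_role not in self.allowed_roles:
            raise RunbookError(f"{caller_role!r} may not run {self.name}")
        # Control: confirm through metadata that the target is appropriate.
        if target.tags.get("environment") != self.allowed_environment:
            raise RunbookError(f"{target.name} is not in {self.allowed_environment!r}")
        self.do(target)
        # Verification: reverse the change if the outcome was not achieved.
        if not self.verify(target):
            self.undo(target)
            raise RunbookError(f"{self.name} failed verification; change reverted")


# Hypothetical example: a runbook that adds a user, reversed by deleting that user.
add_alice = Runbook(
    name="add-user-alice",
    allowed_roles={"operator"},
    allowed_environment="staging",
    do=lambda t: t.users.add("alice"),
    undo=lambda t: t.users.discard("alice"),
    verify=lambda t: "alice" in t.users,
)
```

Because `do`, `undo`, and `verify` are plain functions, the same runbook can be exercised in ordinary unit tests, which is one way to test runbooks with the same discipline as application code.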
What to include in runbooks
- Document the requirements for executing the runbook.
  - Identify required permissions.
  - Identify required tools and configurations.
  - Identify required network connectivity and access.
- Document constraints on the execution of the runbook.
  - Identify maintenance windows.
  - Identify impacted resources.
  - Identify conflicts with other business or operations activities.
- Document procedure steps and expected outcomes.
  - Identify procedure steps.
  - Identify expected outcomes.
- Document escalation procedures.
  - Identify to whom the runbook should be escalated if the active team member is unable to complete it successfully.
  - Identify how much time should pass before the runbook is escalated if the active team member has not yet completed it successfully.
  - Identify any third parties to whom escalation may occur and under what circumstances.
  - Identify any support information required to escalate to third parties (for example, serial numbers, support contact information, and support contract information).
  - Identify any decision makers and the circumstances under which they should be contacted before the procedure is executed.
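One way to keep these elements consistent across runbooks is to capture them in a structured template. The sketch below is an assumption rather than a standard format: the `RunbookDoc`, `Step`, and `Escalation` types and the `review_gaps` check are hypothetical. The check flags empty sections, steps without an expected outcome, and a missing escalation time limit so a reviewer can confirm each was considered; an empty section such as `conflicts` may be intentional.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Step:
    action: str
    expected_outcome: str


@dataclass
class Escalation:
    contact: str                                          # whom to escalate to
    after_minutes: int                                    # time allowed before escalating
    third_parties: list = field(default_factory=list)    # who, and under what circumstances
    support_info: dict = field(default_factory=dict)     # serial numbers, contracts, contacts
    decision_makers: list = field(default_factory=list)  # consult before executing


@dataclass
class RunbookDoc:
    title: str
    # Requirements
    permissions: list
    tools: list
    network_access: list
    # Constraints
    maintenance_windows: list
    impacted_resources: list
    conflicts: list
    # Procedure
    steps: list
    escalation: Escalation


def review_gaps(doc: RunbookDoc) -> list:
    """Return descriptions of sections a reviewer should check before publishing."""
    gaps = [f.name for f in fields(doc)
            if f.name != "escalation" and not getattr(doc, f.name)]
    gaps += [f"step {i}: no expected outcome"
             for i, s in enumerate(doc.steps, 1) if not s.expected_outcome]
    if doc.escalation.after_minutes <= 0:
        gaps.append("escalation: no time limit")
    return gaps
```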
Convert runbooks to code
- Where appropriate, convert manual runbooks into code.
  - Convert the documented process into code.
  - Automate the triggering of runbooks.
    - Identify monitoring tests that detect the triggering events.
    - Implement monitoring tests that trigger the automated runbook execution.
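Automated triggering can be sketched as a monitoring test that invokes a runbook when it detects the triggering event. This minimal Python sketch assumes a hypothetical numeric metric (for example, disk utilization in percent) sampled as a sequence of readings, and a hypothetical `runbook` callable. Requiring several consecutive breaches before triggering is one common way to avoid reacting to a single noisy sample; in practice, the monitoring test would typically be an alarm in your monitoring system that starts the runbook automation.

```python
from typing import Callable, Iterable


def monitor(readings: Iterable[float], threshold: float,
            runbook: Callable[[float], None], consecutive: int = 3) -> int:
    """Invoke `runbook` when `consecutive` readings in a row exceed `threshold`.

    Returns the number of times the runbook was triggered.
    """
    streak = 0
    triggered = 0
    for value in readings:
        # Count consecutive breaches; any reading at or below the threshold resets the count.
        streak = streak + 1 if value > threshold else 0
        if streak == consecutive:
            runbook(value)  # the triggering event was detected
            triggered += 1
            streak = 0      # a sustained breach re-triggers only after another full streak
    return triggered
```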
Revise runbooks as appropriate
- Review the execution of runbooks.
  - Identify appropriate optimizations.
  - Identify required revisions.
- Update runbooks, scripts, and automation as appropriate.
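Reviewing runbook executions can start from per-step execution records. This sketch assumes a hypothetical `StepRecord` log of each step's success and duration, and illustrative thresholds (a 10% failure rate and 300 seconds) that you would tune for your own workload. Steps that fail often are flagged as candidates for revision, and slow steps as candidates for optimization.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class StepRecord:
    step: str
    succeeded: bool
    seconds: float


def review(records: list, max_failure_rate: float = 0.1,
           max_seconds: float = 300.0) -> dict:
    """Group records by step and return review notes for steps that exceed a threshold."""
    by_step = defaultdict(list)
    for r in records:
        by_step[r.step].append(r)
    findings = {}
    for step, rs in by_step.items():
        failure_rate = sum(not r.succeeded for r in rs) / len(rs)
        avg = mean(r.seconds for r in rs)
        notes = []
        if failure_rate > max_failure_rate:
            notes.append(f"revise: fails {failure_rate:.0%} of runs")
        if avg > max_seconds:
            notes.append(f"optimize: averages {avg:.0f}s")
        if notes:
            findings[step] = notes
    return findings
```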