What is a runbook?
Runbooks are the "ops" side of DevOps: they're how you keep your software running, and how you recover when something goes wrong.
Once software is in production and customers rely on it, operations teams quickly find themselves needing to document all the
procedures and routines that they need to follow to keep things running smoothly.
Usually these are wiki pages or documents with step-by-step guides on what a team member should do. You
probably have a bunch of these documented procedures already - plus a few undocumented ones too.
Teams refer to runbooks for two main reasons:
For routine operations tasks, especially for tasks that don't happen very frequently. For example:
- Backing up and testing the restore of databases.
- Requesting and updating SSL certificates.
- Resetting API keys or rotating passwords.
- Provisioning or deprovisioning infrastructure.
- Suspending or taking applications offline for maintenance.
- Updating a test environment with fresh, sanitized production data.
For emergency operations tasks, such as tasks you might need to perform at 3 AM
after an alert is raised. For example:
- Failing over to a disaster recovery site, and then back to the main site.
- Restarting a set of applications, perhaps in a specific order.
- Restarting a web application that experiences a periodic memory leak that is yet to be resolved.
For runbooks to be useful, they need to meet three important criteria:
- Accurate. A runbook that is incorrect or out of date quickly becomes unreliable.
- Discoverable. Team members need to be able to find and follow the correct version of a runbook in a hurry.
- Tested. A runbook that's never been run - or that hasn't been run recently - is next to worthless.
Furthermore, a runbook needs to be executable by the person who needs to execute it - which is often easier said than done.
Even though runbooks may be scripted, actually running those scripts, or following the steps in the runbook, might require high levels of
access to AWS/Azure cloud accounts, RDP or SSH access to production servers, access to secret keys and configuration settings, and many other
permissions an engineer might not have. What's the point of Fred being on-call if he doesn't have permissions to all the systems needed to resolve the issue?
What is runbook automation?
Runbook automation means taking all of the steps from the wiki or Word document and turning them into a script or something that can be executed
automatically. Teams often start by turning these runbooks into PowerShell, Bash, or Python scripts.
This can go a long way to addressing the criteria above.
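As a rough illustration of what "turning a runbook into a script" can look like, here is a minimal Python sketch. It models a runbook as an ordered list of named steps and stops at the first failure, which suits procedures such as restarting applications in a specific order. The `Step` and `run_runbook` names and the example services are hypothetical, and the actions are placeholders rather than real restart commands.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One documented step of a runbook (hypothetical structure)."""
    name: str
    action: Callable[[], bool]  # returns True on success


def run_runbook(steps: List[Step]) -> List[str]:
    """Run steps in order, stop at the first failure, and return a log."""
    log: List[str] = []
    for step in steps:
        try:
            ok = step.action()
        except Exception as exc:
            # An exception counts as a failed step; later steps are skipped.
            log.append(f"FAILED {step.name}: {exc}")
            break
        if not ok:
            log.append(f"FAILED {step.name}")
            break
        log.append(f"OK {step.name}")
    return log


if __name__ == "__main__":
    # Placeholder actions; a real runbook would call out to the systems involved.
    steps = [
        Step("restart database proxy", lambda: True),
        Step("restart api", lambda: True),
        Step("restart web frontend", lambda: True),
    ]
    for line in run_runbook(steps):
        print(line)
```

Keeping each step as a named unit means the log reads like the original wiki page, which makes it easier to check that the script and the documented procedure still agree.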
Unfortunately, scripts alone often aren't enough. Unless credentials are saved within the script - a bad idea - the person running the scripts
is going to require access to the cloud accounts or web/database servers that the script touches. That also means they can make other changes
to production that may not be audited.
What is a runbook automation platform?
A runbook automation platform like Octopus Deploy solves all of these issues. It creates a central place for team members
to manage, control, audit, schedule, and run runbooks. You can see when a runbook was last run, you can see the changes to the runbook, and
you can run the same runbook against different environments. Team members can easily find a runbook, and click a big green button to run it. And everyone
can see the output from the last run and whether it succeeded or not.
Since Octopus Deploy already has access to your cloud accounts and production secrets required to deploy your applications, it's also the perfect
place to run your runbooks. Users in Octopus Deploy are given permissions to run the runbooks, without being granted unfettered access to do whatever
they like in a cloud account. An on-call engineer can have all the permissions they need to run your runbooks, without ever having access to SSH or RDP
into servers directly.
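The idea of granting permission to run a runbook, rather than access to the underlying systems, can be sketched in a few lines of Python. This is a toy model only and does not represent how Octopus Deploy is implemented: the `RunbookRunner` class, the user names, and the runbook names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class RunbookRunner:
    """Toy runner: holds the runbooks, the grants, and an audit trail."""
    runbooks: Dict[str, Callable[[], str]]
    grants: Dict[str, Set[str]]  # user -> names of runbooks they may run
    audit_log: List[str] = field(default_factory=list)

    def run(self, user: str, runbook: str) -> str:
        # The user never touches servers or secrets; they only ask the runner.
        if runbook not in self.grants.get(user, set()):
            self.audit_log.append(f"DENIED {user} {runbook}")
            raise PermissionError(f"{user} may not run {runbook}")
        result = self.runbooks[runbook]()
        self.audit_log.append(f"RAN {user} {runbook}")
        return result


if __name__ == "__main__":
    runner = RunbookRunner(
        runbooks={"restart-web": lambda: "restarted"},
        grants={"fred": {"restart-web"}},
    )
    print(runner.run("fred", "restart-web"))
    print(runner.audit_log)
```

In this sketch the on-call engineer can trigger the runbook and every attempt is recorded, but nothing in the interface lets them make other, unaudited changes.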
And of course, just like with deployments, everything is audited, and there are advanced permissions, a full API, and support for over 300 DevOps
steps out of the box.