How does Octopus handle rollbacks?

Published on: 11 Jul 2012

This question comes up a lot:

How does Octopus Deploy handle rollbacks when a deployment fails?

In answering this question, the first thing to keep in mind is that deployments are complex, and unlike a cloud PaaS solution, Octopus doesn't enforce very many limitations on what kinds of software you can build or deploy with it.

When something goes wrong during deployment, there are three general ways in which we can deal with it:

  1. Rollback - revert to using the previous version of the software
  2. Roll-forward - fix it and deploy a new version
  3. Roll-over - try again, maybe it was an intermittent problem

Automatically rolling back code/applications is usually pretty straightforward - just find the old binaries, update IIS/load balancers/etc. to point to the old binaries, and you are away laughing.

Where it gets hard, though, is persistent storage - and by that I mean databases. During a deployment, you may have run a migration script to rename a column. After rolling back the application, the old version of the application may break because it expects the column to have the old name. And that's a pretty simple schema change.

One option is to restore the database from a backup. But the database may have been in use during the deployment, and perhaps you received an order in the short window between the migration scripts being run and the deployment failing. Automatically rolling back the database would mean losing important data.

Designing applications to support rollback

When designing your application and making changes, there are techniques you can use that make the application more ready to support rollbacks.

One option is to make sure your changes are always backwards compatible with the previous version. You might make schema changes in a way that doesn't break the old application, for example, by exposing the old names through views. Later, say after the third release, you might begin to drop the old columns or tables. However, the complicated part about this is testing. It's not enough to just assume your schema change is backwards compatible; you really should test it, otherwise rolling back could be just as bad as not rolling back.
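To make the view idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customers, customer_records, name, full_name) are invented for illustration, and a real migration would depend on your database. The migration moves the data to a table with the new column name and leaves behind a view that still answers to the old one.

```python
import sqlite3


def old_app_list_names(conn):
    # Query issued by the previous release: it only knows the old column name.
    rows = conn.execute("SELECT name FROM customers ORDER BY id")
    return [row[0] for row in rows]


def new_app_list_names(conn):
    # Query issued by the new release: it uses the renamed column directly.
    rows = conn.execute("SELECT full_name FROM customer_records ORDER BY id")
    return [row[0] for row in rows]


def migrate(conn):
    # Hypothetical migration: copy data into a table with the new column
    # name, then recreate the old name as a view over it.
    conn.executescript(
        """
        CREATE TABLE customer_records (
            id INTEGER PRIMARY KEY,
            full_name TEXT NOT NULL
        );
        INSERT INTO customer_records (id, full_name)
            SELECT id, name FROM customers;
        DROP TABLE customers;
        CREATE VIEW customers AS
            SELECT id, full_name AS name FROM customer_records;
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (name) VALUES ('Ada')")

migrate(conn)

# The new release writes through the new table after the migration.
conn.execute("INSERT INTO customer_records (full_name) VALUES ('Grace')")

print(old_app_list_names(conn))
print(new_app_list_names(conn))
```

Both queries return the same rows, so the old release can still read after a rollback. Note that the view in this sketch is read-only: if the old release also writes to customers, those writes would fail unless you add something like INSTEAD OF triggers. That is exactly the kind of gap that only shows up when you test the old version against the new schema.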

Another approach that may help is to use architectural styles like Event Sourcing. This way you could 'replay' new events using the old code, or old events using the new code, and ideally both versions of the application would work.
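As a rough sketch of what replaying events through two versions of the code might look like, consider the toy example below. The event kinds and the two projection functions are made up for illustration; they are not taken from any particular Event Sourcing framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str
    amount: int = 0


def project_v1(events):
    """Old release: only understands deposits and withdrawals."""
    balance = 0
    for event in events:
        if event.kind == "deposited":
            balance += event.amount
        elif event.kind == "withdrawn":
            balance -= event.amount
        # Unknown kinds are skipped so that newer events do not crash old code.
    return balance


def project_v2(events):
    """New release: also understands a hypothetical 'fee_charged' event."""
    balance = 0
    fees = 0
    for event in events:
        if event.kind == "deposited":
            balance += event.amount
        elif event.kind == "withdrawn":
            balance -= event.amount
        elif event.kind == "fee_charged":
            balance -= event.amount
            fees += event.amount
    return {"balance": balance, "fees": fees}


# The log contains one event that was written by the new release.
log = [Event("deposited", 100), Event("withdrawn", 30), Event("fee_charged", 5)]

print(project_v1(log))
print(project_v2(log))
```

The old code can replay the full log without failing, but it arrives at a different balance because it ignores the fee. Whether that is acceptable after a rollback is a design decision for your application, which is why "ideally" is doing a lot of work in the paragraph above.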

Failed deployments should be exceptions

Unfortunately, as an application deployment tool, Octopus can't assume that you've written code to always support automatic rollbacks. There's no guarantee that automatically rolling back would make things better; in fact it could make things much worse.

Instead, we take the view that when a deployment fails, Octopus should:

  1. provide as much information as possible; and
  2. provide tools to help, but not try to take over.

After all, when a production deployment goes wrong, it's usually a stressful time - the last thing you want is a tool second-guessing you and making things worse!

To do this, Octopus makes it easy to deploy the previous release, or to deploy a new release. Just find the release you want to deploy, and click Deploy. You know what the release contains and how the packages are structured, and so you can decide whether it is safe to try and roll back or roll forward. To Octopus, rolling back or rolling forward are just like any normal deployment.

Failed deployments in a world of continuous deployment

A topic popularized by Eric Ries recently is continuous deployment, and the idea of a Cluster Immune System. The idea is that your system should monitor itself, and if it detects a problem, it can automatically roll back.

It's important to recognise that Eric isn't advocating particular deployment tools to enable this automatic rollback. Rather, you need to think at a systems level when designing an application. Your ability to transparently roll back at the first sign of a problem will be dictated more by your physical infrastructure, database and system design than by your choice of deployment tool.

Octopus does make APIs available that make it possible to automatically deploy new and old releases of your projects, and so it can certainly play a part in a recovery strategy. But no deployment tool is going to be a silver bullet; you'll need to think holistically about your system if that is your goal.
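To illustrate where a deployment tool would fit into that kind of self-monitoring loop, here is a deliberately small sketch. Everything in it is an assumption: the function name, the error-rate samples and the threshold are invented, and the deploy argument is a placeholder for whatever call your deployment tool's API actually exposes. It is not the Octopus API.

```python
from typing import Callable, Iterable


def watch_and_roll_back(
    current: str,
    previous: str,
    error_rates: Iterable[float],
    threshold: float,
    deploy: Callable[[str], None],
) -> str:
    """Return the release left running after examining the samples.

    `deploy` is a hypothetical hook; in a real system it would ask the
    deployment tool to deploy the given release.
    """
    for rate in error_rates:
        if rate > threshold:
            # First unhealthy sample: redeploy the previous release and stop.
            deploy(previous)
            return previous
    # Every sample was healthy, so the new release stays in place.
    return current


deployed = []
result = watch_and_roll_back(
    current="1.1",
    previous="1.0",
    error_rates=[0.01, 0.20, 0.50],
    threshold=0.05,
    deploy=deployed.append,
)
print(result, deployed)
```

The deployment call is a single line. The hard parts are everything around it: deciding what counts as unhealthy, and making sure the previous release can actually run against the current state of the database.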

DeployFailed.ps1

As much as Octopus will never be able to automatically recover from a failed deployment for you, I'm very open to finding ways to make it easier for you to create a recovery strategy. One suggestion is for a DeployFailed.ps1 that would be run if the deployment to a particular machine failed. It's up to you what would go in that script, but the support would be there.

In conclusion

Let's come back to the original question:

How does Octopus Deploy handle rollbacks when a deployment fails?

The simple answer is that Octopus makes it easy to deploy the previous successful release, or to deploy a new release when you've fixed the problem. This can be invoked by the API, so if your system is designed for it, you could automate it.

That said, the ability to automatically recover from a failed deployment is very much a property of your system as a whole, and not just a feature of automated deployment tools. Octopus has (and will have more) features to help, but as is always the case, there are no silver bullets in this business.