In the previous episode, we talked about different service design approaches. This time, we dive deeper into database changes.
Even for teams that can effortlessly deploy their application code, database changes can be more stressful. Changing a schema is a high-stakes operation, and there are many ways to do it badly.
Read on to find out why database deployments are different from application deployments, and what techniques you can use to make them worry-free.
Watch the episode
You can watch the episode below, or read on to find some of the key discussion points.
Watch Continuous Delivery Office Hours Ep.5
Why are databases different
When you change your application’s code and discover an issue, it’s trivial to revert to the previous version. In rare cases where a change affects your data, there may be cleanup to do.
You can use timestamps and a one-off process to fix those rare data-mess-up instances, but they hint at something fundamentally different about database updates. They have different levels and types of risk associated with them.
If you think backups will save you from a bad database deployment, you haven’t yet tried it. Sure, they prevent total data loss, but between your backup and your fix, the data moved. Often by a lot.
There are techniques to apply transactions from the backup up to the point of failure, but if your change caused an issue you needed to roll back, you probably don’t want to apply those transactions automatically. Welcome to the “data remediation project”.
If you added a new column or table as part of your database change and you need to roll back, you have to decide what to do with any data in those tables. Do you forget it, or do you need to keep hold of it and reapply it later when you make a new attempt to extend the schema?
Modern software teams prefer fix-forward for application issues, which are reasonably easy to roll back. Databases take rollbacks to another level.
Crucial modernization steps
Imagine you met a friend for lunch and they told you they store they application’s source code on a network share, rather than in version control. You’d think it was a joke, and when you realize it’s not, you’d form a strong opinion about the kind of sloppy outfit they must be running.
When we meet for lunch, what are you going to tell me about your database schema and static data? Please tell me it’s all in version control, not on a network share.
You should make all database changes by updating the files in version control and deploying them like you would your application code. You progress the change through environments to ensure it works, and you avoid embarrassment caused by the application failing because someone forgot to add the new column in production.
There are further choices to make, which we’ll cover next, but failing to version-control your database is unforgivable.
State-based vs migration-based approaches
On to the first choice for your database project. Do you make state-based or migration-based updates?
State-based schemas describe the desired state of the database. It will list each table with its columns, indexes, and relationships. You use a model-based tool to deploy the database, which compares your current state with the desired state and applies the changes for you.
Some state-based tools convert the differences into standard database scripts, like ALTER TABLE... scripts. Others perform migrations by creating a new table with the changes and moving the data into it. This is important if you’re using a technology like replication, which prevents the model migration mechanism from working.
The alternative to state-based database updates is migration-based updates, where you write your own ALTER TABLE scripts. You keep all your scripts in version control and use a tool that applies them in order and tracks when each was applied, preventing the same script from being applied twice to the same database.
The main difference between the two is procedural. You can code-review the migration scripts on demand, but you’ll need to review state-based migration scripts after your tools generate the implementation plan, which can make the review task more of a large batch.
Tooling and automation
Whichever approach you use, tooling helps it work. Many of our customers use Redgate tools as part of their deployment process to manage database schema changes, and there’s value in leaning on a tool written by folks who care deeply about the problem and how to solve it.
Crucially, you shouldn’t make any database changes outside of tool-based automation.
Test data management
Once you’ve automated your database schema and static data, it’s worth considering your test data. When automated or manual acceptance tests fail, it’s usually because someone unwittingly messed up the data. Prior test runs often leave data in an inconsistent state, especially if a test failure halts the run.
You can resolve this issue by automating your test data setup. Not only does this make it easy to reset the data during your build and test cycle, but it also lets you provide a self-service runbook for the test team to reset the data in their test environment whenever they need to.
There’s an up-front investment in this, but I can promise that it takes fewer hours than fixing your test data a few times.
Database refactoring patterns
The final thought to ponder concerns the steps you take in changing your database schema. When you’re in the habit of deploying your database and application in the same release process, you start to depend on this change coordination.
You delete a column from the database and immediately deploy the application, with all references to the deleted column removed. It looks like it works smoothly, but it’s a trap.
Imagine you had a critical bug in the application and had to redeploy the previous version. Now you can’t, because the previous version will try to read from a column that doesn’t exist. You no longer have a quick, easy back out plan, since you have to re-add the column, and you’ll also need data to put in it.
This approach also prevents you from making seamless deployments, as even if you progressively roll out the application version, the old version will error out due to the database change. You may deploy in the opposite order, application first, then database. You’ll discover the same problem with new columns that the latest version expects to find in the database.
You need to decouple database and application deployments, and there’s a whole book on the topic, called Refactoring Databases (Ambler, Sadalage). You can start by following the expand/contract pattern, which splits updates into steps. The principle is that you don’t delete a column until the production application has no reads or writes. You add a column and don’t reference it in your code until you deploy to production.
This means you can run the current version and the new version of the software against the same database, which means you can progressively roll out the new version and redeploy the prior version without touching the database.
Databases are only as hard as you make them
The database is high-risk, which is why updating it can be scary. I hope you’ve found this post full of practical advice for making database deployments robust and stress-free.
Database deployments, like application deployments, should be a happy time. You should be celebrating the new features and enhancements you’ve delivered to your users, not biting your nails and worrying that something’s about to go horribly wrong.
Happy deployments!
Continuous Delivery Office Hours is a series of conversations about software delivery, with Tony Kelly, Bob Walker, and Steve Fenton.
You can find more episodes on YouTube, Apple Podcasts, and Pocket Casts.
