How to Avoid a Data Migration Disaster

April 7, 2014


When a new IT system is going live, there's often a need to migrate data from old systems into the new one. While that process may seem trivial or uninteresting, it's easy to get wrong, and when it goes wrong, that shiny new IT system that you've just spent a fortune developing is often rendered useless. Here are five things to keep in mind to ensure your boring workaday data migration process doesn't screw up your big launch.

Don't Migrate the Data

I was once asked for my advice by a company that were considering a complete re-write of their software product. My first reaction was that it's never a good idea. Joel Spolsky's classic essay on how Netscape lost the browser wars is a perfect explanation for why the complete re-write is usually bad. However, when they explained their thinking and the particular set of circumstances they were facing, I eventually realised it was their only option. I was still nervous about the impact their re-write might have on their business, though. These things are always more painful than you expect. So I asked them whether there were bits of the old system they could keep around. The database, for example: could they keep the database the same and just re-write the software that talks to it?

Redeveloping a system but keeping the same database removes the need for a data migration process. It also enables you to roll back your upgrade if it doesn't go smoothly, to run systems in parallel if necessary, or to continue developing the old system while the new one is in the works. It gives you all kinds of options for keeping your business moving that a big bang system upgrade doesn't.

If it is at all possible, I recommend keeping your data in its current system and evolving it, rather than starting again from a blank canvas.

Do it From Day One

If you're developing a new system, the best possible test data you can use is the live data from the current production system. Putting that data into your new system will highlight any issues with your data migration process and any issues that your new system would experience with using that data. And given that finding those issues is a good thing, the longer you leave it before doing that, the more time you could be wasting, and the less time you'll have to fix them. So the ideal time to build your data migration process is right at the start of your project.

Automate It

Starting with a copy of the live data is good; having a continually up-to-date copy of the live data is even better. By repeatedly and consistently dropping the new system's development database and rebuilding it from production data, you will iron out all sorts of data migration issues and ensure that everything you need for the big switch-over to the new system is in place, ready and working smoothly.
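To make the drop-and-rebuild idea a bit more concrete, here is a minimal sketch in Python using the built-in sqlite3 module. Everything named here is an assumption for illustration: the `legacy_customers` and `customers` tables, the `copy_customers` step, and the `rebuild_dev_database` function are all hypothetical, and a real system would more likely be talking to a proper database server.

```python
import sqlite3


def copy_customers(production, development):
    """Hypothetical migration step: copy customers into the new schema."""
    # The new schema is stricter (NOT NULL), so bad production rows fail loudly.
    development.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
    )
    rows = production.execute("SELECT id, email FROM legacy_customers").fetchall()
    development.executemany(
        "INSERT INTO customers (id, email) VALUES (?, ?)", rows
    )


def rebuild_dev_database(production, development, steps):
    """Drop every table in the development database, then re-run every step."""
    tables = development.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (name,) in tables:
        development.execute(f'DROP TABLE "{name}"')
    # Re-run the whole migration from scratch against current production data.
    for step in steps:
        step(production, development)
    development.commit()
```

Calling `rebuild_dev_database(production, development, [copy_customers])` from a scheduled job or a build script would give you a fresh rebuild every time. Because the new table in this sketch is stricter than the old one, a production row with a missing email would raise an error during the rebuild, which is exactly the kind of issue you want to find early.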

Batch it Up

If you're moving hundreds of thousands, or even millions, of records from one database to another, it can take quite a long time. And processes that run for a long time can go wrong part way through. Maybe the internet connection goes down, or the process has an error when it's 99% done, or there's just so much data that the server runs out of memory. Whatever it is, you don't want to be starting again from scratch if things go wrong. The ideal way to run a data migration is to break up the data into batches that will either succeed or fail in one go. Then keep track of where you were, so if one batch fails, you can start again from that batch and pick up where you left off.
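Here is a rough sketch of that batch-and-checkpoint approach, again in Python with the built-in sqlite3 module. The table names (`old_users`, `users`, `migration_checkpoint`), the batch size and the `migrate_in_batches` function are all made up for illustration, and the sketch assumes the target `users` table already exists and that the source rows have positive integer ids.

```python
import sqlite3

BATCH_SIZE = 1000  # hypothetical default; tune it for your own data


def migrate_in_batches(source, target, batch_size=BATCH_SIZE):
    """Copy old_users into users one batch at a time, resuming after failures."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS migration_checkpoint (last_id INTEGER NOT NULL)"
    )
    target.commit()
    while True:
        # Find out where the last successful batch finished (0 if none yet).
        row = target.execute(
            "SELECT MAX(last_id) FROM migration_checkpoint"
        ).fetchone()
        last_id = row[0] or 0
        batch = source.execute(
            "SELECT id, email FROM old_users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return  # nothing left to copy
        # One transaction per batch: the rows and the checkpoint are committed
        # together, or rolled back together if anything raises.
        with target:
            target.executemany(
                "INSERT INTO users (id, email) VALUES (?, ?)", batch
            )
            target.execute(
                "INSERT INTO migration_checkpoint (last_id) VALUES (?)",
                (batch[-1][0],),
            )
```

The important design choice is that the checkpoint is written in the same transaction as the batch it describes. If a batch fails, neither its rows nor its checkpoint are saved, so running the function again simply picks up from the last batch that did succeed.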

Use Migrations

If you've decided not to re-develop your application's database, or if you're redeveloping it but you have an automated data migration process in place, you're still going to need to make incremental changes to your database every so often. For this I would strongly recommend you use an approach known as "migrations". Most modern application development frameworks have migrations. Ruby on Rails has them, Django has them, .NET has them, even PHP has them.

Migrations are a way to keep track of, and control, the evolution of your database in a way that links each change in its structure to the version of the app that it was supposed to work with. They also enable you to easily upgrade or downgrade a database when you want to change the version of your application that it works with. Using migrations effectively separates the data that's in your database from the structure of that database, because the structure can easily be changed. This makes deploying upgrades easier, rolling back upgrades easier, restoring backups easier, setting up new servers easier and running automated tests easier. All of which lowers the likelihood of things going wrong.
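If you haven't seen migrations before, the core idea can be sketched in a few lines of Python with the built-in sqlite3 module. This is not how any particular framework implements them; the `MIGRATIONS` list, the `schema_version` table and the `migrate_to` function are all hypothetical names I've picked for illustration. Each migration has a version number, a change to apply when upgrading, and a change to apply when downgrading.

```python
import sqlite3

# Hypothetical migrations: (version, upgrade SQL, downgrade SQL).
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE customers"),
    (2, "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)",
        "DROP TABLE orders"),
]


def current_version(conn):
    """Return the highest migration applied so far (0 for a fresh database)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate_to(conn, target_version):
    """Upgrade or downgrade the database structure to target_version."""
    version = current_version(conn)
    # Upgrade: apply each missing migration in ascending order.
    for number, up_sql, _ in MIGRATIONS:
        if version < number <= target_version:
            conn.execute(up_sql)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (number,)
            )
            conn.commit()
    # Downgrade: undo each unwanted migration in descending order.
    for number, _, down_sql in reversed(MIGRATIONS):
        if target_version < number <= version:
            conn.execute(down_sql)
            conn.execute(
                "DELETE FROM schema_version WHERE version = ?", (number,)
            )
            conn.commit()
```

With this in place, `migrate_to(conn, 2)` brings a fresh database up to the latest structure and `migrate_to(conn, 1)` rolls it back one step. Real frameworks add a lot on top of this, but the version table plus paired upgrade and downgrade steps is the heart of it.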

Migrating data feels like it should be simple, but it's one of those areas where the devil is in the detail. All sorts of things can and do go wrong, sometimes with catastrophic consequences. So it pays to give it plenty of thought from day one of your software development project.