
Preserving legacy URLs

May 16, 2013

URLs have become a ubiquitous part of the web. We memorize the domains of popular sites, share YouTube links, bookmark pages for later, and so on. What a URL really represents, though, is a link to a page with some specific content.

I don't have much experience running large sites with thousands of pages of content, but I got a taste of it when migrating my blog from WordPress to Pelican. While my blog doesn't get a lot of traffic, I was getting some hits on my Django pages: people were landing on them directly from Google after searching for specific keywords.

Here's the problem: with my switch to Pelican, I wanted to change up my URL structure a bit. This meant that the Google results people were clicking on would no longer lead to the content. Naturally, in time, Google would recrawl my site and update those links. But until then, the old links would be broken. And let's not forget about that one guy who might have bookmarked one of my pages and won't be able to find that content later down the line!

Obviously this is a bit dramatized for my small personal blog, but it reflects a larger software problem: backwards compatibility and legacy support. Luckily, I only wanted to preserve a handful of URLs, so I used nginx's rewrite rules to implement HTTP 301 redirects. Example:

location /index.php/facekey-geekdom-checkin/ {
    # "permanent" makes nginx answer with a 301; without a flag this would be a 302
    rewrite ^ http://ralphminderhoud.com/posts/facekey-geekdom-checkin/ permanent;
}

This turned out to be trivially easy, and I now have a few rewrite rules in my configuration file pointing my old posts to their new locations. However, it made me stop and reflect on the possible magnitude of this problem. What if my blog had 100 posts? 1,000? 10,000? It would still have been possible, but it would have been a lot more work and I would have had to look at alternative solutions. It's easy to see why banks often still have legacy mainframes crunching our credit card transactions: even if they wanted to migrate to newer systems, I'm sure the cost and complexity of doing so would be prohibitive.
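
For a blog with hundreds of posts, one alternative to writing each rule by hand would be to generate them from a table of old and new paths. The following is only a rough sketch in Python, not what I actually run: the function name build_redirect_rules and the REDIRECTS table are made up for illustration, and the second entry in the table is a hypothetical post. It emits the same kind of location block as the example above, with the permanent flag so that nginx answers with a 301.

```python
# Sketch: generate nginx redirect rules from a table of old -> new paths.
# build_redirect_rules and REDIRECTS are illustrative names, not part of
# any real tool; only the first table entry comes from the post itself.

BASE_URL = "http://ralphminderhoud.com"

REDIRECTS = {
    "/index.php/facekey-geekdom-checkin/": "/posts/facekey-geekdom-checkin/",
    # Hypothetical entry, included only to show the pattern.
    "/index.php/some-old-post/": "/posts/some-old-post/",
}


def build_redirect_rules(redirects, base_url=BASE_URL):
    """Return nginx config text with one 301 rewrite rule per old path."""
    blocks = []
    # Sort so the generated file is stable from one run to the next.
    for old_path, new_path in sorted(redirects.items()):
        blocks.append(
            f"location {old_path} {{\n"
            f"    rewrite ^ {base_url}{new_path} permanent;\n"
            f"}}\n"
        )
    # Separate the blocks with a blank line for readability.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(build_redirect_rules(REDIRECTS))
```

The output could be saved to its own file and pulled into the server block with nginx's include directive, so the hand-written configuration stays small no matter how many old posts there are.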