The Importance of Distributed Version Control: The Lessons of Spatial

Along with extensive testing, another lesson I wish I’d learned much earlier in my Computer Science career is the use of version control, and particularly distributed version control.

For a solo developer working on a small, comparatively unimportant project, version control can seem like a luxury, and it’s easy enough to improvise something like it with other tools. But once a project becomes somewhat large or somewhat important, or is developed in collaboration with someone else, not using version control (or not using it effectively) can quickly cause problems.

For example, in my senior project at Calvin, I was working on a “virtual reality” system that was originally put together by a brilliant student a couple of years earlier, and then extended, improved, and ported to a more developer-friendly language the previous summer by a friend of mine. However, in the summer’s work, my friend’s unfamiliarity with version control showed: every so often a commit would include a lot of work touching many areas of the code, and, worst of all, code versioning didn’t start until a great deal of work had already been done. This came back to bite us when I uncovered a bug (a segmentation fault) in the system’s flagship application. I tried to track down where it came from, only to find that it entered the versioned code-base in an “initial import” commit. We never did fix that bug; instead, for demos we booted into a “known-good” working copy of the code that my friend had been working in before handing the project off. And that led to the work done during my senior year being apparently lost when someone else took up the project a few years later (more on that later).

Good version control practices (commit early, commit often, commit only one change at a time, don’t commit broken code, write descriptive and possibly extensive commit messages) are practical examples of the Golden Rule. It’s wise to assume that any bit of code may introduce a bug (whose source will need to be tracked down as narrowly as possible), and that someone (quite possibly you, long enough afterward that you no longer remember the details) will be reading the commit messages later.
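To make the “descriptive commit messages” habit a little more concrete, here is a minimal sketch of a message checker in Python. Everything in it is my own illustration rather than part of any version control tool: the function name `check_commit_message`, the 50-character subject limit (a common convention, not a rule), and the small `VAGUE_SUBJECTS` list are all assumptions you would want to adjust.

```python
# Hypothetical helper: the names and thresholds below are illustrative assumptions.
VAGUE_SUBJECTS = {"fix", "fixes", "update", "updates", "changes", "misc", "wip"}


def check_commit_message(message: str, max_subject: int = 50) -> list[str]:
    """Return a list of problems found in a commit message (empty if none)."""
    lines = message.strip("\n").splitlines()
    if not lines or not lines[0].strip():
        return ["message is empty"]

    problems = []
    subject = lines[0].strip()

    # A short subject line stays readable in one-line history listings.
    if len(subject) > max_subject:
        problems.append(f"subject line is longer than {max_subject} characters")

    # A one-word subject says nothing about what actually changed.
    if subject.lower().rstrip(".") in VAGUE_SUBJECTS:
        problems.append("subject line does not describe the change")

    # By convention, a blank line separates the subject from the longer body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")

    return problems
```

Something like this could be run by hand or wired into whatever hook mechanism your version control system provides; the point is only that “descriptive” can be checked, at least crudely, before a message ever reaches the history.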

But I go further than advocating effective version control use, and recommend distributed version control, for two reasons. First, centralized version control makes a major assumption that in practice turns out not to be valid at critical points: that the server will be up and available whenever I need to make a (frequent, atomic) commit, peruse the version history, start working on a different machine, merge others’ work with mine, and so on. Second, centralized version control makes keeping good VCS practice difficult: a commit to the central server immediately starts interacting with others’ work, even if it breaks what they would like to commit. It’s also tremendously difficult to switch from working on a major feature (not yet working well enough to commit) to working on another feature, or fixing a bug, or something like that. With distributed version control it’s easy enough to commit changes to my own local repository without pushing them to others until they’re ready, and it’s quite easy to switch contexts (by switching branches in Git, or “shelving” my changes in Bazaar) as needed. And, most importantly, if the server goes away, I still have all the history at my fingertips.
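The difference can be sketched with a toy model in Python. This is not how Git or Bazaar are implemented; the `Repository` class and its `commit`, `clone`, and `push` methods are hypothetical names of my own, and a “history” here is nothing more than a list of commit messages. It only illustrates the two properties that matter above: commits are local until pushed, and every clone carries the full history.

```python
class Repository:
    """Toy repository: the history is just an ordered list of commit messages."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def commit(self, message):
        # Purely local: no server needs to be reachable.
        self.history.append(message)

    def clone(self):
        # A clone copies the whole history, not just the latest snapshot.
        return Repository(self.history)

    def push(self, remote):
        # Only the simple case: the remote's history must be a prefix of ours.
        known = len(remote.history)
        if remote.history != self.history[:known]:
            raise ValueError("remote has diverged; merge first")
        remote.history.extend(self.history[known:])


server = Repository(["Initial import"])
laptop = server.clone()

laptop.commit("Fix crash in loader")
laptop.commit("Add regression test for loader")

# Nothing has reached the server yet; the commits exist only locally.
unpushed = len(laptop.history) - len(server.history)

laptop.push(server)

# Suppose the server is later decommissioned.
del server

# The clone still holds every commit.
print(laptop.history)
```

In a centralized system, the equivalent of `laptop` would hold only a working copy of the latest files, so losing `server` would mean losing the history with it.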

After the two semesters I spent on my senior project, I continued to “hack” on it, pushing my changes to the central server. I hadn’t yet come to these conclusions about DVCS, so I acted under the unconscious assumption that the server would always be there. And then, several months ago, when I wanted to look something up in the history, I was alarmed to find that the server was down, and apparently entirely decommissioned. I eventually found that the project had been continued, in a DVCS repository, by later students (yay!), but that their work began by importing the state of the project before my year of work on it (no!). If we’d worked in a distributed version control system, this problem of connectivity never would have come up, and if I’d used a DVCS client with the centralized server (such adapters exist), I would never have lost the history, and so could have sent it “upstream.”

I hope that by now I’ve learned my lesson about using version control (though I’m not yet consistently disciplined enough about commit messages); perhaps, if you make these practices a habit, you won’t have to learn them the hard way.


