June 22, 2008

Branching out on my own (Git it?)

Lately I've been working with Git. Git is a revision control system along the lines of CVS or Subversion. It has two main advantages: it is distributed, so every developer has a complete copy of the repository and its history, and it handles branching and merging quite well. It also happens to be the brainchild of Linus Torvalds, of Linux fame.

Having mainly used CVS my entire professional life (all of eight years), I've grown accustomed to its eccentricities, especially when it comes to branching, merging, and file management. Once I saw Linus's Google TechTalk on Git (see notes 1 and 2), I decided to give Git a try. I like it so much more than CVS that I'm plugging it wherever I can. Admittedly, this is probably due less to Git being awesome (it is, after all, similar in spirit to Mercurial, Bazaar, and Darcs) than to CVS being horrible.

I found Git a bit confusing beyond the basic update-and-commit workflow, and much of that confusion came from everything being named by a SHA1 sum. A commit is named by a SHA1 sum. So is a tree. So is every piece of content. Reading articles and blog posts by other folks who had at one time or another been similarly adrift in a sea of "Dur?" helped tremendously, especially Git from the Bottom Up by John Wiegley. Git's own documentation helps too, though I got more out of the online documentation, including the CVS migration manual and the Git User's Manual, than out of the man pages.
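To make the "everything is a SHA1 sum" idea concrete, here is a minimal Python sketch of how a file's content gets its name. It assumes Git's SHA1 object format, where a short header ("blob", the size in bytes, and a NUL byte) is hashed together with the raw content. The function name `git_blob_id` is my own, not part of any Git API; the result should match what `git hash-object` prints for the same bytes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA1 name Git gives `content` when stored as a blob.

    Git hashes a header ("blob", a space, the length in bytes, a NUL byte)
    followed by the raw file content. The file's name plays no part.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Identical content always gets the identical name, whatever the file is called.
    print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    print(git_blob_id(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the name is derived purely from the bytes, two repositories that contain the same file content will always agree on its SHA1 sum without ever talking to each other.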

After reading Git from the Bottom Up, everything pretty much just clicked. The structures Git uses to store content are easy to digest, and they make possible workflows you would never imagine attempting in CVS. Git was at one time or another called "the stupid content tracker," and it really is just that (as an aside, you may find this discussion between Linus and Bram Cohen about merging strategies interesting). This simple content tracking is what makes its distributed nature and painless branching and merging possible.
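Git from the Bottom Up explains these structures far better than I can, but a rough Python sketch may help: a tree lists file names alongside the SHA1 sums of their blobs, and a commit names a tree plus its parent commits. This is a simplification that assumes flat directories of regular files only (mode 100644); the author line is a hypothetical placeholder, and the helper names (`hash_object`, `tree_id`, `commit_id`) are mine, not Git's.

```python
import hashlib


def hash_object(kind: str, body: bytes) -> str:
    """SHA1 over Git's object encoding: '<kind> <size>' plus a NUL byte, then the body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(files: dict[str, bytes]) -> str:
    """Name a flat directory listing; each entry points at a blob by its SHA1.

    Simplification: regular files only (mode 100644), no subdirectories.
    """
    body = b""
    for name in sorted(files):  # Git keeps tree entries in sorted order
        blob = hash_object("blob", files[name])
        body += b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(blob)
    return hash_object("tree", body)


def commit_id(tree: str, parents: list[str], message: str) -> str:
    """Name a commit: a tree, zero or more parent commits, and some metadata."""
    # Hypothetical placeholder identity and timestamp.
    who = "A. Hacker <hacker@example.com> 1214092800 +0000"
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return hash_object("commit", ("\n".join(lines) + "\n").encode("utf-8"))


if __name__ == "__main__":
    first = commit_id(tree_id({"README": b"hello\n"}), [], "Initial commit")
    second = commit_id(tree_id({"README": b"hello, world\n"}), [first], "Greet the world")
    # A branch is nothing more than a name pointing at a commit's SHA1 sum.
    branches = {"master": second}
    print(branches)
    print(tree_id({}))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, Git's well-known empty tree
```

Since a commit's name covers both its tree and its parents, rewriting anything in history changes the SHA1 sum of every commit after it. That is why two repositories can compare histories simply by comparing SHA1 sums, and why creating a branch costs next to nothing.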

Think of all the things one must do to set up centralized revision control. First, you have to find some place for the repository to live. Then, you must grant commit access to everyone who needs it, which may mean giving them login accounts on the machine that hosts the repository. Once everyone can talk to the repository, rules about branching, merging, and tagging are usually set up to avoid problems.

With Git, the repository lives on the developer's machine. When the developer is ready to share that code with a wider audience, it is simply a matter of publishing it over HTTP, which they likely have set up already. (See http://git.kernel.org/ for an example of gitweb, a nice web front end for browsing repositories.) Changes that someone else makes can come back as patches by e-mail, be pulled from that person's own published repository over HTTP (the same way the original clone was made), or travel over SSH. (Git relies on the SHA1 sums of the content, trees, and commit history that both repositories have in common to work out the merge.) This eliminates a lot of the annoying server administrivia of managing a repository.
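Here is a rough Python sketch of that last parenthetical. The history is modeled as a plain dict mapping each commit's SHA1 sum to its parents (short letters stand in for real 40-character sums), and the function names are hypothetical. It finds the commits one side has that the other lacks, and the "best" common ancestors a merge would start from, in the spirit of `git merge-base --all`. It is not Git's actual algorithm, and unlike real Git it assumes you can see both histories at once.

```python
from collections import deque


def ancestors(commit: str, parents: dict[str, list[str]]) -> set[str]:
    """Every commit reachable from `commit` by following parent links (inclusive)."""
    seen = {commit}
    queue = deque([commit])
    while queue:
        for p in parents.get(queue.popleft(), []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


def commits_to_send(mine: str, theirs: str, parents: dict[str, list[str]]) -> set[str]:
    """Commits reachable from my tip that are not reachable from theirs."""
    return ancestors(mine, parents) - ancestors(theirs, parents)


def merge_bases(a: str, b: str, parents: dict[str, list[str]]) -> set[str]:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    redundant: set[str] = set()
    for c in common:
        # Anything strictly older than a common ancestor is not a "best" base.
        redundant |= ancestors(c, parents) - {c}
    return common - redundant


if __name__ == "__main__":
    # A <- B <- C <- E is my history; A <- B <- D is theirs.
    history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["C"]}
    print(sorted(commits_to_send("E", "D", history)))  # ['C', 'E']
    print(merge_bases("E", "D", history))              # {'B'}
```

Because every commit is named by its SHA1 sum, "do you have commit B?" is an exact question with a yes-or-no answer, which is what lets two independent repositories find their shared history cheaply.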

Furthermore, rules for branching and merging become completely unnecessary, because the repository you cloned from the original programmer is yours to do with as you please. You can commit without worrying about whether it will break the code base or whether someone might check out the code later in the day. Things like git rebase --interactive are sufficiently advanced to be freakin' magic to CVS users like me, and they help you turn your work into a single commit or a tidy series of commits to send back to the original (and possibly authoritative) developer for inclusion in their repository.
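For the CVS users in the audience, here is a toy Python model of what the "squash" step of git rebase --interactive accomplishes. It assumes each commit is just a set of whole-file changes replayed in order, with no deletions, no conflicts, and no new base, which is far simpler than what Git really does. `Commit`, `apply`, and `rebase_interactive` are hypothetical names for illustration; only the "pick" and "squash" actions mirror real rebase todo-list commands.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    message: str
    changes: dict[str, str]  # path -> new file content (a simplified "patch")


def apply(tree: dict[str, str], commits: list[Commit]) -> dict[str, str]:
    """Replay each commit's changes on top of `tree`, in order."""
    result = dict(tree)
    for c in commits:
        result.update(c.changes)
    return result


def rebase_interactive(todo: list[tuple[str, Commit]]) -> list[Commit]:
    """Rewrite a series of commits following a pick/squash todo list."""
    rewritten: list[Commit] = []
    for action, commit in todo:
        if action == "pick":
            rewritten.append(Commit(commit.message, dict(commit.changes)))
        elif action == "squash":
            if not rewritten:
                raise ValueError("cannot squash without a previous commit")
            prev = rewritten[-1]
            # Fold into the previous commit: later changes win, both messages are kept.
            merged = {**prev.changes, **commit.changes}
            rewritten[-1] = Commit(prev.message + "\n\n" + commit.message, merged)
        else:
            raise ValueError(f"unknown action: {action}")
    return rewritten


if __name__ == "__main__":
    base = {"main.c": "int main(void) { return 0; }\n"}
    work = [
        Commit("Add feature", {"feature.c": "/* draft */\n"}),
        Commit("Oops, fix typo", {"feature.c": "/* done */\n"}),
        Commit("Call feature from main", {"main.c": "int main(void) { return feature(); }\n"}),
    ]
    clean = rebase_interactive([("pick", work[0]), ("squash", work[1]), ("pick", work[2])])
    print(len(clean))  # 2
    print(apply(base, clean) == apply(base, work))  # True: same end result, tidier history
```

The point of the exercise is that the final tree is unchanged; only the story told by the commit history gets cleaned up before anyone else sees it.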

This has been rather haphazardly put together, but I hope you'll take a look at Git if you haven't already. I highly recommend reading Git from the Bottom Up, because it is interesting from a computer science standpoint and serves as a primer on the staging area and the content storage model Git uses. I'm hoping that, in time, I'll wear down my co-workers' resolve (really, we just have to find the time to do it) and we'll finally port our CVS repository to Git. In the meantime, I'll be happily coding away on my personal projects with Git.

1) As a side note, probably my favorite Linus quote comes from this video: "[...] the way merging is done is the way real security is done--by a network of trust. If you have ever done any security work and it did not involve the concept of network of trust, it wasn't security work. It was masturbation."

2) Randal Schwartz of Perl fame is apparently involved with Git and gave his own TechTalk on it about six months after Linus did.