Repository Guide: Part 3: Branching, Merging and Tagging

Continuing on from Part 1 and [Part 2][2] of the guide, this post covers branching, merging and tagging, extremely powerful tools you can use when managing your repository. The guide approaches these topics from a Subversion-oriented point of view.

Tagging

I think tagging is the simpler of the two concepts so I'll start there. Tagging is simply assigning a tag – a name – to a snapshot of your repository. In Subversion, this is really easy to understand because Subversion uses a single global revision number for the whole repository, so every committed version is automatically "tagged" with a unique number. In CVS, tagging has more appeal because each file has its own version number, so keeping track of which version aligns with what is difficult if it's not managed.

Now, the reason one would want to use tagging is to assign names to important and/or significant milestones in the development of your source files. Early on this might be a "finalised", reviewed and signed off version of your requirements specification, or a version of your architecture that your team is happy to proceed with. Down the track, this will most likely be used for source files when your product hits an initial beta version or a release, such as "1.0", that will be shipped to the client.

That's pretty much all there is to tagging. The only thing you have to keep in mind when working in Subversion (and probably some other VCSes) is that tagging is handled internally in the same way as branching, which I will discuss now; the only difference is one of pure semantics.

Branching and Merging

If you think of your repository development as occurring in a straight line (called a "mainline" or "trunk") then a branch is just what you would expect: an offshoot of your main development that can continue independently. Two important things to understand with branching are that only files which are changed in that branch are actually re-versioned and that any changes you make to a branch can be merged back in to the mainline at a later time. Perhaps the best way to explain this is with an example.

Say you have just reached version 1.0 of your product and you ship this version out but plan on continuing development. Making a branch of your 1.0 code is good practice because it means you'll always be able to easily restore to that point and compile and ship another copy if need be. This, of course, is the same as tagging. Branching will allow you to go one step further – you can continue development on the 1.0 branch. Three weeks later, a customer alerts you to a bug in your 1.0 product. It is obviously not ideal to try to recreate the 1.0 code from your new "2.0" code, nor is it a good idea to ship "2.0" early just to fix this lone bug. With your 1.0 branch, however, this is trivial – you still have an exact copy of the code that was shipped out to the customers, so you can fix the bug there and release a new 1.1 version. Note that you now have a "2.0" mainline, which is where all your new code lives, and a "1.0" branch of your product that contains the old version with the bug fix applied.

Now, another common use for branching (at least in Subversion) is to allow each developer, or development team, to work on their own copy of the code. This keeps conflicts to a minimum across all the constant updates and commits that you'll want to do, and is most useful when a large number of disruptions are expected, such as those that occur when rewriting a core part of a library or adding a major new feature. When developers are finished with their features, fixes or whatever, they can merge their code back into the mainline (or another branch).

Merging can be thought of as the opposite of branching. It's the process of combining files from separate branches back together again. This might result in the effective elimination of a branch or it might just be a way of synchronising some files that are common between branches. In the example given before, a bug fix that is applied to a file from the 1.0 branch (that has not seen significant modifications for the 2.0 branch) is a perfect candidate for merging back into the mainline.

So how does one do these things in Subversion?

Branching:

svn copy trunk branches/version-1.0
svn commit

Pretty easy, right? Subversion just recursively copies everything in the trunk directory into the version-1.0 directory, which then gets committed as per usual. Please note that Subversion uses efficient copies that do not waste space. So, as I said above, the branch will not "keep" its own copies of files that have not changed.
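If you want to try this without touching a real project, here is a minimal sketch that runs the same two commands inside a throwaway repository. It assumes the svn and svnadmin command-line tools are installed; the sandbox directory, the main.c file and the commit messages are made up for illustration.

```shell
#!/bin/sh
set -e

# Build a disposable repository and working copy under a temp directory.
sandbox=$(mktemp -d)
svnadmin create "$sandbox/repo"
svn checkout -q "file://$sandbox/repo" "$sandbox/wc"
cd "$sandbox/wc"

# r1: conventional layout, with trunk as the mainline (main.c is a placeholder).
svn mkdir -q trunk branches
echo 'int main(void) { return 0; }' > trunk/main.c
svn add -q trunk/main.c
svn commit -q -m "Initial import of version 1.0"
svn update -q

# r2: branch using the same two commands as above.
svn copy -q trunk branches/version-1.0
svn commit -q -m "Create the version-1.0 branch from trunk"
svn update -q

# The verbose log records the branch as a copy of /trunk rather than new content.
branch_log=$(svn log -v -l 1 branches/version-1.0)
echo "$branch_log"
```

The log entry should list the branch directory as added "from /trunk" at the revision it was copied from, which is Subversion's record of the cheap copy.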

Merging:

svn merge branches/version-1.0/src/fixed.c trunk/src/fixed.c

Of course, that's just a very basic usage, and the exact arguments svn merge accepts differ between Subversion versions (you will often need to name the revisions to merge with -r), so you'll want to look at the Subversion manual or svn help merge. A merge works by comparing two versions of the source, much like diff and even more like svn diff, and applying a patch based on that comparison to the working copy path (the last argument). If there are conflicts that can't be resolved automatically, you will have to fix them manually. As a side note, the version control software darcs works entirely on the concept of applying patches.
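To make the 1.0 bug-fix scenario concrete, here is a rough sketch in a throwaway repository. It assumes svn and svnadmin are installed and that your client accepts a revision range via -r. The answer() function, the commit messages and the revision numbers 2 and 3 are specific to this made-up sandbox, so treat it as an illustration rather than a recipe.

```shell
#!/bin/sh
set -e

# Disposable repository and working copy.
sandbox=$(mktemp -d)
svnadmin create "$sandbox/repo"
svn checkout -q "file://$sandbox/repo" "$sandbox/wc"
cd "$sandbox/wc"

# r1: trunk holds the code that ships as 1.0 (fixed.c is a stand-in file).
svn mkdir -q trunk trunk/src branches
printf 'int answer(void) { return 41; }\n' > trunk/src/fixed.c
svn add -q trunk/src/fixed.c
svn commit -q -m "Version 1.0"
svn update -q

# r2: create the 1.0 branch.
svn copy -q trunk branches/version-1.0
svn commit -q -m "Branch version-1.0"

# r3: fix the bug on the branch only.
printf 'int answer(void) { return 42; }\n' > branches/version-1.0/src/fixed.c
svn commit -q -m "Fix answer() on the 1.0 branch"
svn update -q

# Apply the change made in r3 (the difference between r2 and r3 on the
# branch) to trunk's copy of the file, review it, then commit the result.
svn merge -r 2:3 branches/version-1.0/src/fixed.c trunk/src/fixed.c
svn diff trunk/src/fixed.c
svn commit -q -m "Merge r3 bug fix from version-1.0 into trunk"
```

Running svn diff before committing is a cheap way to check that the merge changed only what you expected.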

Tagging in Subversion, as you may have guessed by now, is really only creating a branch that you never modify!
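As a small sketch of what that looks like in practice, the following creates a tag in a throwaway repository. It assumes svn and svnadmin are installed and follows the common convention of a top-level tags directory; the tag name release-1.0 is just an example.

```shell
#!/bin/sh
set -e

# Disposable repository, addressed by URL so no working copy is needed.
sandbox=$(mktemp -d)
svnadmin create "$sandbox/repo"
repo="file://$sandbox/repo"

# r1: create the top-level directories directly in the repository.
svn mkdir -q -m "Create trunk and tags" "$repo/trunk" "$repo/tags"

# r2: tag the current trunk. A URL-to-URL copy is committed immediately,
# so there is no separate svn commit step.
svn copy -q -m "Tag release 1.0" "$repo/trunk" "$repo/tags/release-1.0"

# The tag is an ordinary directory that happens to be a copy of trunk.
tag_listing=$(svn list "$repo/tags")
echo "$tag_listing"
```

Nothing in Subversion stops anyone from committing to a tag afterwards, so "never modify" is a team convention unless you enforce it some other way.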

Further Reading

Hopefully this has given you an introduction to these extremely important topics. Branching and merging are two of the compelling reasons for using a repository in the first place. I suggest you now go and read up on the topics in the Subversion book, or whatever manual you have for your chosen VCS. You might also like to read a complete book on version control use, such as Pragmatic Version Control using Subversion.

[2]: http://www.chnorton.com.au/2007/07/18/repository-guide-part-2-how-to-use-a-repository/