Repository Guide: Part 4: Organisation

Organising a repository involves several things. The most obvious, and the most important, is how you organise the directory structure and name your files. Secondly, you need to make sure all the metadata used by your VCS is organised and effective. Finally, you must have some sort of process in place to ensure that these things are kept tidy and manageable.

Directory Structure and File Naming

Your directory structure is very important as it makes things easier to find and makes managing your code base much simpler. If setting up a test is difficult because the tests are strewn all over the place, then chances are that it won't be done. Similarly, if something is difficult to find then it might as well not be there.

What you name things also plays a critical role in this. Names should make their purpose obvious and be easy to remember. Source code should go in src or source, tests in test, libraries in lib or libraries and so on.
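A naming convention like this can be checked mechanically. The following is only a minimal sketch in Python: the function name unconventional_dirs and the allow-list are hypothetical, and "doc" has been added to the names above as an assumption. It lists the immediate sub-directories of a directory whose names are not in the agreed set.

```python
from pathlib import Path

# Names suggested in the text, plus "doc", which is added here as an assumption.
# Treating this set as the complete allow-list is a choice made for the sketch.
CONVENTIONAL_NAMES = frozenset({"src", "source", "test", "lib", "libraries", "doc"})


def unconventional_dirs(root, allowed=CONVENTIONAL_NAMES):
    """Return the sorted names of root's immediate sub-directories that are
    not in allowed. Hidden directories such as .svn are skipped."""
    return sorted(
        entry.name
        for entry in Path(root).iterdir()
        if entry.is_dir()
        and not entry.name.startswith(".")
        and entry.name not in allowed
    )
```

The comparison is deliberately case-sensitive, so a directory named SRC or Test is reported as well. Your team would replace the allow-list with whatever names it has actually agreed on.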

You might think this is all obvious but I've actually seen a repository laid out like so:

trunk/
    src/
        APP/
            external_app_1/
               doc/
               lib/
               src/
            external_app_2/
               lib/
               src/
               test/
            src/
                app/
                lib/

This is, without doubt, the worst layout and naming scheme I have ever seen, and I hope it stays that way. Let me try to explain some of the reasoning here. The first src directory is there to hold any and all code, including tests, the binary libraries the team's code depends on and the external applications that are "mashed" into the overall application. Then the APP directory holds … well, everything I just mentioned. It has no sibling directories, so I'm baffled as to why it was put there, much less in uppercase characters. From there the layout splits into directories containing the external applications and another src directory. This second src holds the code that was created by the team, mostly to glue everything together. Inside it we have another directory, app (which contains, well, source code), and lib (which contains binary libraries the app code depends on). On top of this, you have all the directories that come from the external apps … and some of that code was also modified by the team!

Phew! That actually took longer to explain than I thought! I think it's a perfect example of what not to do though. Can you imagine me trying to find the code that had been edited by the team?

If you are using tags and branches, as I outlined in the previous post in this series, then Subversion has a standard naming convention that you can choose to adopt if you wish. This convention is to use the directories trunk for your mainline development, branches for branch copies and tags for tagged copies. Very simple and obvious! You can, however, choose to use this in two different ways:

Use these directories as the top-level directories in your repository.
Use these directories as sub-directories of the various parts of your project stored in your repository.

The first should be fairly self-explanatory but here's an example:

repository/
    branches/
    tags/
        version-1.0/
            doc/
            src/
            test/
    trunk/
        doc/
        src/
        test/

So basically you just have all your work under one of these three directories.
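As a rough illustration, this first layout can be set up with a few lines of Python. This is only a sketch: the function name create_layout is hypothetical, the directory names are copied from the example above, and the script only creates directories on disk. Getting them into Subversion (by importing or adding and committing them) is a separate step that is not shown.

```python
from pathlib import Path
import tempfile

# Directory names taken from the example layout above.
TOP_LEVEL = ("branches", "tags", "trunk")
PROJECT_DIRS = ("doc", "src", "test")


def create_layout(root):
    """Create branches/, tags/ and trunk/ under root, with the project
    directories inside trunk/. Returns the created paths, sorted."""
    root = Path(root)
    created = []
    for name in TOP_LEVEL:
        directory = root / name
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    for name in PROJECT_DIRS:
        directory = root / "trunk" / name
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return sorted(created)


if __name__ == "__main__":
    # Build the layout in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_layout(Path(tmp) / "repository"):
            print(path.relative_to(tmp))
```

The tags directory starts out empty here; entries such as version-1.0 only appear once you tag a release.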

The other way is more suitable for repositories that store multiple projects or where you wish to have clean separation between various parts of the project. An example would be:

repository/
    doc/
        design/
            branches/
            tags/
            trunk/
    src/
        branches/
        tags/
        trunk/

This can greatly multiply the number of directories you have, since every separately versioned part gets its own trunk, branches and tags, so be careful if you want to go this way.

You of course don't have to choose either of these conventions but whatever you do choose, you will need to document that somewhere and have a clearly defined process for dealing with branches and tags and whatnot.

Metadata

Although Subversion does allow for (nearly) arbitrary metadata to be attached to version controlled files, I will mostly be referring to the most common kind of "extra information" that you can add to your repository: commit logs. A commit log is simply a text message saved with the version that you check in. Commit logs are hard to miss, as you are asked for one when committing: in a text editor such as vim or nano, in the commit dialog of a graphical client (such as TortoiseSVN), or by passing the message on the command line via -m "message" (or the equivalent for your VCS).

This information is quite important as it allows you to more easily recognise versions, changes, bug fixes and so forth when you are inspecting or auditing your repository later. Some people like to have very structured logs similar to the following:

CHANGE:
REASON:
FILES AFFECTED:

I think listing the files changed in the log isn't strictly necessary, as that information can usually be recovered fairly easily (at least on the command line), but I know that there can be some potential pitfalls in Subversion when dealing with binary files. So it's up to your team whether or not you want to list these files. If you do, Subversion makes it quite easy, as a list of the files changed is given in the "template" log and you can easily move this text into your actual log.

Having a brief description of the changes made and the reason for those changes is essential. For someone checking a log later it is nice to be able to see descriptions of each version and track the development of the product. While I don't think it's necessary to have such a formal structure as above, I firmly believe that the information must be present in all logs. If the information is self-explanatory then you can get away with short messages like "Fixed bug #3535" but more information about this bug must be available somewhere else (a bug tracker in this case).
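If your team does adopt the structured template, the "must be present" rule can be checked automatically. Here is a minimal sketch in Python. The function name missing_fields is hypothetical, and treating CHANGE and REASON as required while leaving FILES AFFECTED optional is my assumption, based on the opinions above. It reports which required fields are absent or left empty in a commit message.

```python
import re

# Field names from the structured log template shown earlier.
# Treating CHANGE and REASON as the required ones is this sketch's assumption.
REQUIRED_FIELDS = ("CHANGE", "REASON")


def missing_fields(message):
    """Return the required fields that are absent or empty in message."""
    missing = []
    for field in REQUIRED_FIELDS:
        # A field counts as present only if its line has non-blank text
        # after the colon on the same line.
        pattern = rf"^{field}:[ \t]*\S"
        if not re.search(pattern, message, flags=re.MULTILINE):
            missing.append(field)
    return missing


if __name__ == "__main__":
    log = "CHANGE: Fixed bug #3535\nREASON:\n"
    print(missing_fields(log))
```

A check like this could be run from a Subversion pre-commit hook to reject incomplete logs, though whether that level of enforcement is worth it is for your team to decide.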

Process

Process is one of those things that should be obvious throughout any engineering project – otherwise it's not really engineering! For your repository, make sure that the process you use allows your repository to stay organised. The things I have spoken about above should give you some starting points – like having procedures to use when writing commit logs, naming conventions to use for file and directory names, where to put tags and branches, when to make branches and tags and so on. Management of the repository is a part of your configuration management plan, which also includes descriptions of what your configurations are, who's in charge of them, the methods used for managing the configurations, etc.

Hopefully this has given you some ideas and guidance about how to better organise your repository. Again, I did not intend this to be an exhaustive examination of the subject but something to get you up and running. From here you can look into your own ways of keeping your repository in working order – after all, the repository is for your team so it's for the team to decide what is best.

Chris Norton
