Repository Guide: Part 1: What Is A Repository?

One of the many things in my backlog is writing up a guide to the proper and advanced use of version control repositories for 3rd year software engineering students at my university. Hopefully I'll be more motivated to finish it if I write it as a series of blog posts. I'm sure others will be interested in this topic as well, so if you have any suggestions or queries, please leave feedback in the comments!

What Is A Repository?

This might sound like an obvious question and most people would give an answer along the lines of:

It's where we store all our files.

This is wrong. The repository is not a dumping ground for anything you write. Other people use their repository as a way to transfer files from one person to another. Again, this is wrong. A final common response from students is:

We're only using version control because we were told to.

Whilst we do make you use version control, we do it for the same reason your mother makes you eat your greens – we do it because it's good for you (and we care).

So, now that we know what a repository isn't, let's come up with a definition for what it is.

Firstly, version control is only meant to be used for things that actually need versions. This might sound obvious, but what should and should not be kept under version control is not immediately apparent to many people. So, what sorts of things need versions? What sorts of things change as the project progresses? Source code and documentation are the standard choices: you expect them to change, in some cases fairly frequently. How about minutes and agendas? These may certainly change, but do we want to manage their versions? In the majority of cases the answer is no. Whilst someone may go back to correct the minutes, or items may be added to an agenda, it is not useful to track these changes. (If you really do want to track them, use a wiki.)

Secondly, the repository is for source files only. This means that anything that can be generated from another file does not belong in it. Intermediate output files, such as .toc files from LaTeX compilation or .o files from compiled C code, are examples of this. Importantly, this also includes the final results of generation, so PDF files of your documents and compiled executable programs do not belong in the repository either. If you'd like to keep these somewhere to save time and effort, you may put them in your team directory or link to them from your web page. Note that source files can also include images, Visio files, XML, SQL scripts, Excel spreadsheets and anything else that is necessary for your project and cannot be generated from something else.
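To make the "source files only" rule concrete, here is a minimal sketch of a check you could run over a list of files before committing them. It is an illustration only: the function names and the set of extensions are hypothetical choices of mine, not part of any version control tool. Judging by extension is a heuristic; a PDF handed to you by a client, for example, cannot be regenerated and so may legitimately be a source file.

```python
from pathlib import PurePath

# Hypothetical set of extensions for files that are normally generated from
# other files: intermediate outputs (.o, .toc, .aux, .log, .class) and final
# build results (.pdf, .exe). Adjust this for your own project.
GENERATED_SUFFIXES = {".o", ".toc", ".aux", ".log", ".class", ".pdf", ".exe"}


def is_generated(path: str) -> bool:
    """Return True if the path looks like a generated file that should not be committed."""
    return PurePath(path).suffix.lower() in GENERATED_SUFFIXES


def find_generated(paths: list[str]) -> list[str]:
    """Return, in their original order, the paths that look like generated files."""
    return [p for p in paths if is_generated(p)]


if __name__ == "__main__":
    # Example list of files someone is about to commit.
    about_to_commit = [
        "src/main.c", "src/main.o",
        "docs/report.tex", "docs/report.toc", "docs/report.pdf",
        "docs/diagram.vsd",
    ]
    for p in find_generated(about_to_commit):
        print(f"generated file, do not commit: {p}")
```

Note that the Visio file is not flagged: it is a source file because nothing else produces it.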

Thirdly, and as a corollary to the above, any external libraries or software that are needed for your software to build or run should also be placed in the repository (within reason: you don't really need a copy of the Java runtime, for example). This is especially important because these pieces of software are critical to your product, yet new versions of them may be released while you are developing. If you're not tracking which version your software needs, such a change can have serious consequences. Storing your own copy is therefore a way of managing risk.

Finally, the repository as a whole is the canonical place for all your product files, and it serves as the means of building your product as well as running tests and analysis tools over your code and executables. Should someone want to build a copy of your product, they should only need to check out the repository and have a small number of tools and external software available (such as a certain compiler, runtime libraries, operating system, etc.). This means that you also need to include things like makefiles, configuration files, Visual Studio projects and whatever else is needed to build your product from scratch.
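One cheap way to test the "build from scratch" promise is to check that a fresh checkout actually contains the files the build depends on. The sketch below assumes a hypothetical project whose build needs a `Makefile` and a `config/build.cfg`; both names, and the `missing_build_files` function, are illustrations rather than any standard. A real check would go further and run the build itself in a clean checkout.

```python
import tempfile
from pathlib import Path

# Hypothetical files that a fresh checkout must contain for the product to be
# built from scratch. Replace these with your own makefiles, project files, etc.
REQUIRED_BUILD_FILES = ("Makefile", "config/build.cfg")


def missing_build_files(checkout: str, required=REQUIRED_BUILD_FILES) -> list[str]:
    """Return the required build files that are absent from the checkout directory."""
    root = Path(checkout)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    # Simulate a fresh checkout that is missing its build configuration.
    with tempfile.TemporaryDirectory() as checkout:
        Path(checkout, "Makefile").write_text("all:\n")
        missing = missing_build_files(checkout)
        if missing:
            print("cannot build from a clean checkout, missing:", ", ".join(missing))
```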

From all that, you should now have a basic understanding of what a repository is intended for. Let's create a definition based on this:

A repository is a tool to manage all source files required for the building and running of a product and to help manage the development and configurations of those source files.

This is, of course, a short, simplistic answer but I think it summarises the intention fairly well. The next post will be about the standard workflow in a repository.