Source code is a vital asset to any software development team, and over the decades a set of source code management tools have been developed to keep code in shape. These tools allow changes to be tracked, so we recreate previous versions of the software and see how it develops over time. These tools are also central to the coordination of a team of multiple programmers, all working on a common codebase. By recording the changes each developer makes, these systems can keep track of many lines of work at once, and help developers work out how to merge these lines of work together.
This division of development into lines of work that split and merge is central to the workflow of software development teams, and several patterns have evolved to help us keep a handle on all this activity. Like most software patterns, few of them are gold standards that all teams should follow. Software development workflow is very dependent on context, in particular the social structure of the team and the other practices that the team follows.
My task in this article is to discuss these patterns, and I'm doing so in the context of a single article where I describe the patterns but intersperse the pattern explanations with narrative sections that better explain context and the interrelationships between them. To help make it easier to distinguish them, I've identified the pattern sections with the "✣" dingbat.
In thinking about these patterns, I find it useful to develop two main categories. One group looks at integration, how multiple developers combine their work into a coherent whole. The other looks at the path to production, using branching to help manage the route from an integrated code base to a product running in production. Some patterns underpin both of these, and I'll tackle these now as the base patterns. That leaves a couple of patterns that are neither fundamental, nor fit into the two main groups - so I'll leave those till the end.
Create a copy and record all changes to that copy.
If several people work on the same code base, it quickly becomes impossible for them to work on the same files. If I want to run a compile, and my colleague is the middle of typing an expression, then the compile will fail. We would have to holler at each other: "I'm compiling, don't change anything". Even with two this would be difficult to sustain, with a larger team it would be incomprehensible.
The simple answer to this is for each developer to take a copy of the code base. Now we can easily work on our own features, but a new problem arises: how do we merge our two copies back together again when we're done?
A source code control system makes this process much easier. The key is that it records every change made to each branch as commit. Not just does this ensure nobody forgets the little change they made to utils.java
, recording changes makes it easier to perform the merge, particularly when several people have changed the same file.
This leads me to the definition of branch that I'll use for this article. I define a branch as a particular sequence of commits to the code base. The head, or tip, of a branch is the latest commit in that sequence.
That's the noun, but there's also the verb, "to branch". By this I mean creating a new branch, which we can also think of as splitting the original branch into two. Branches merge when commits from one branch are applied to another.
The definitions I'm using for "branch" correspond to how I observe most developers talking about them. But source code control systems tend to use "branch" in a more particular way.
I can illustrate this with a common situation in a modern development team that's holding their source code in a shared git repository. One developer, Scarlett, needs to make a few changes so she clones that git repository and checks out the master branch. She makes a couple of changes committing back into her master. Meanwhile, another developer, let's call her Violet, clones the repository onto the her desktop and checks out the master branch. Are Scarlett and Violet working on the same branch or a different one? They are both working on "master", but their commits are independent of each other and will need to be merged when they push their changes back to the shared repository. What happens if Scarlett decides she's not sure about the changes that she's made, so she tags the last commit and resets her master branch to origin/master (the last commit she cloned from the shared repository).