Continuous merge or bulk merge

I have been working on a project where we have 75 daily committers (most of us work in pairs, so the team is even bigger) on the same code base. At any point of time we tend to have three branches where work might be going on. Currently we call these streams Live, Release and Dev. These branches are quite active because we release every 4-6 weeks.

Live: branch which is in production.

Release: the branch on which we are preparing the next release, i.e. it is in the regression-testing phase, with only bug fixes going in and the occasional story.

Dev: the branch (trunk) on which active development is happening.

The important decision we had to make last year was who should be merging code between these branches, and how. There were really two options:

Continuous Merge: whoever commits to the live (or release) branch also commits the same change to trunk (and to release).

Bulk Merge: committers check in only to the branch concerned. Periodically (mostly every week) one person does the merge and commits all the accumulated changes to the other branches.
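
To make the two options concrete, here is roughly how they look at the command line. This is only a sketch: the branch paths, revision numbers and commit messages are illustrative, not our actual setup.

```
# Continuous merge: the committer ports the change to every branch,
# running the build on each branch before committing.
svn commit -m "BUG-123: fix on release branch"   # in the release working copy
svn merge -c 4711 ^/branches/release .           # in the trunk working copy
svn commit -m "BUG-123: port r4711 from release"

# Bulk merge: one person periodically sweeps a whole range across.
svn merge -r 4500:4800 ^/branches/release .      # in the trunk working copy
svn commit -m "Weekly merge of release r4500:4800 into trunk"
```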

In the continuous merge strategy the overhead seemed higher, because a commit also means that the person has to run the build beforehand, for every branch s/he commits to. The merge itself is an automated process in SVN and is not time consuming. In bulk merge we would pay that cost only once per 200-300 commits. From an efficiency point of view it seemed a no-brainer.

Our experience over the last year suggests otherwise, and here is the experience report.

Merge is much more than auto-merging the files between two branches

At least the SVN merge (we are moving towards Git) misses deletes, renames and moves. Its mergeinfo feature tries to help, but it ends up creating problems because not everyone on the project uses it in the same way, so we mostly ignore it. When merging 200+ changes we often get messages like "Skipped missing target". On top of this, some areas tend to have design changes between branches, which means the merge has to be done logically: the person doing the merge pulls in the people involved to more or less hand-code the change again. Wherever the unit tests were not refactored along with the code, one can tell that something breaks but not exactly why; in a lot of cases the tests have been refactored too. All of this means that one cannot really test the merge activity.
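
For what it is worth, SVN does let you inspect its merge-tracking state, assuming the mergeinfo property has been maintained consistently (which, as noted above, it was not for us). A sketch, with illustrative paths:

```
# What does SVN think has already been merged into this working copy?
svn propget svn:mergeinfo .

# Which revisions on the release branch are still eligible to merge?
svn mergeinfo --show-revs eligible ^/branches/release .
```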

Longer merge activity creates other issues

In most cases the whole process of merging, building, committing and getting the CI build green took close to a week. As the person doing it moans and curses more and more, we have to lock SVN so that s/he is spared the added pain of constant check-ins happening on the branch where s/he is trying to commit. If this window is not very small it causes a lot of waste.
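
One way to implement such a freeze is a pre-commit hook on the SVN server. The sketch below is hypothetical: the lock-file convention is something the merger would create and remove by hand, not a feature of SVN itself.

```
#!/bin/sh
# pre-commit hook: SVN invokes it with the repository path and the
# transaction id; a non-zero exit rejects the commit.
REPOS="$1"
TXN="$2"

# Hypothetical convention: the merger touches this file to freeze
# commits and deletes it once the merge is committed.
LOCKFILE="$REPOS/hooks/merge-in-progress"

if [ -f "$LOCKFILE" ]; then
  echo "Commits are frozen while a bulk merge is in progress." >&2
  exit 1
fi
exit 0
```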

Weekly merge effectively becomes bi-weekly

A merge started a week after the previous one takes a week to complete, so the cycle is effectively two weeks. Two weeks (an iteration) is a long time on our project, as a release is only 3-4 iterations of development. It disrupts a lot of other activities like testing, story sign-offs and automation-test commits.

Refactoring gets affected

In the past we have avoided refactoring because it makes the merge difficult. Some refactorings do not affect the merge, but those which lead to renames or deletes do.

Bulk merge also introduces delays and waits into the flow of development, and becomes one more thing for the team to manage and communicate. The continuous approach poses a challenge of its own: how do you ensure that everyone commits every check-in to trunk when they commit to a branch? We tracked this with what we called merge tickets: we tagged the bugs that had to be verified on trunk, which at least ensures that every bug is re-tested.

We use SVN as our centralized repository, but a lot of developers on the project have moved to git-svn. Git makes working with branches and merging a lot simpler, apart from its other benefits. Using Git or another distributed version control system is highly recommended if you find yourself in such a situation. There are other ways to tackle this kind of situation too, like releasing even faster, or breaking the deliverables into smaller pieces so that there is less overlap. But overall, if you find yourself in such a situation, consider all the pros and cons before choosing a strategy.
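
For anyone curious, the basic git-svn round trip looks like this (a sketch; the repository URL and layout are placeholders):

```
# Clone the SVN repository, mapping trunk/branches/tags onto Git refs
git svn clone --stdlayout https://svn.example.com/repo myproject
cd myproject

# Work on cheap local branches with ordinary git commits
git checkout -b bugfix-123

# Pull in new SVN revisions, replaying local work on top
git svn rebase

# Push local commits back to SVN as individual revisions
git svn dcommit
```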

An efficient-but-unpredictable approach is worse than a slightly-inefficient-but-predictable one.