Build time
The most important use case of continuous integration is finding out whether my commit succeeded integration, and the ability to pass its output on to downstream activity. This is important because of its frequency. After having heard thousands of opinions I can safely(?) say that about 10 (+/- 5) minutes of build time is what you want to have. Hence this is the most important design requirement.
Failure rate of continuous integration
In a development team setup, a failed continuous integration build causes waste: the committer fixing the build, other committers waiting to push in their changes, testers waiting for a new build to test a story or bug. Arguably, in some cases each of them can continue doing something else while the integration build is being fixed. Even in such cases the context switch is undesirable, causes its own waste, and they probably wouldn't be doing their work in the most optimal way. Continuous integration with a high failure rate reduces the confidence of its users. On the other hand, an integration build which always passes is probably avoiding some important checks, or the committers are too conservative in committing. A pass rate of 80-90% is good enough.
Clean build
A clean build brings the codebase to a clean state before running other build tasks. The clean state should ideally be the same as the checkout state, where the checkout state is the state of the codebase when freshly checked out from the source control system. It is useful for the pre-commit build to be a clean build. Committers whose machines are already set up, and for whom a build without a clean does the job, do not want to run a clean build, as it takes much longer. In such a setup new team members struggle to get their machine to a state where the pre-commit build succeeds, even when they haven't made any change to the codebase.
Continuous integration can help here. Most teams can generally afford to invest in better hardware for the continuous integration environment, which lends itself well to running certain lengthy but less brittle tasks there. The hardware improvement for such environments can be to increase the number of agents as well as to use agents with higher computing power. Personally, I have seen that most projects do not invest in the same quality of hardware for committers.
Continuous Integration output
The deliverable of continuous integration defines the essential artifacts of an integration build. Along with these, continuous integration should also publish all the information which would help in diagnosing problems: a failed build, build performance and such. It needs to be stressed that it is not sufficient that the integration build generates this information; it needs to be published so that subsequent builds do not overwrite it. Since the diagnostics are published per build, they should not contain information which was not collected as part of that build run. For example, if we publish web service logs, it is important that the integration build cleans the logs before they start collecting information about the build run.
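This publish-then-clean discipline can be sketched as below; the function names, the per-build directory layout and the `*.log` pattern are illustrative assumptions, not any particular CI server's API:

```python
import pathlib
import shutil

def publish_artifacts(workspace, archive_root, build_number, patterns=("*.log",)):
    """Copy diagnostic files into a per-build directory so that
    subsequent builds cannot overwrite them."""
    dest = pathlib.Path(archive_root) / f"build-{build_number}"
    dest.mkdir(parents=True, exist_ok=True)
    for pattern in patterns:
        for f in pathlib.Path(workspace).glob(pattern):
            shutil.copy2(f, dest / f.name)
    return dest

def clean_logs(workspace, patterns=("*.log",)):
    """Delete stale logs before the build starts, so the published
    diagnostics describe only this build run."""
    for pattern in patterns:
        for f in pathlib.Path(workspace).glob(pattern):
            f.unlink()
```

A build script would call `clean_logs` as its first task and `publish_artifacts` as its last, whether the build passed or failed.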
Running from a source control
Continuous integration tools allow multiple materials for a pipeline. Such situations arise when the source control is not structured in a way that there is a one-to-one mapping between the development codebase and the source control repository. The continuous integration server monitors its materials for changes. As soon as it detects a change it sleeps for a configured time in case there are more changes to any of the materials; if there is another change, it sleeps again. This is really to ensure that all commits are taken in before the build starts. It is useful when working with source controls which version individual files instead of the whole repository (e.g. CVS). Most modern source controls (Subversion, Git, Mercurial) version the whole repository.
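The quiet-period loop described above can be sketched as follows; the function name and parameters are hypothetical (real CI servers implement this internally, usually calling it a quiet period or build delay):

```python
import time

def wait_for_quiet_period(get_latest_change, quiet_seconds, sleep=time.sleep):
    """Wait until no new change has appeared in any material for a full
    quiet period, then return the last revision seen."""
    last_seen = get_latest_change()
    while True:
        sleep(quiet_seconds)
        current = get_latest_change()
        if current == last_seen:
            # The materials were quiet for a full interval; build this revision.
            return current
        # More commits arrived during the wait; sleep again.
        last_seen = current
```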
From the perspective of continuous integration, a user commit might span multiple source control commits if the development codebase really spans multiple repositories or is fragmented within a single repository. Since the continuous integration server doesn't understand user commits, one might have integration builds running against partial user commits. This can make committing an annoying and wasteful experience, leading to fewer commits. It is recommended to restructure the source control folders to create a one-to-one mapping between the codebase and the integration material.
Running tests crossing environment and network boundary
Automated tests can be classified into four categories based on their run-time footprint.
a) Memory bound tests: these do not call other processes or access files in the file system.
b) Machine bound tests: these are bound to processes and files on the same machine where the test is running (they might interact with a database server running locally). Such tests can be run even when there is no network connection.
c) Network bound tests: such tests interact with processes and files on the local area network (connecting to an Exchange server or LDAP server).
d) Internet bound tests: these tests need you to have internet connectivity, and all the machines and processes they depend on to be running and reachable.
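One way to make these categories actionable is to tag each test with its footprint, so a build can select only the categories its environment supports. A minimal sketch, where the tagging scheme and test names are invented for illustration:

```python
def footprint(category):
    """Decorator that tags a test function with its run-time footprint:
    'memory', 'machine', 'network' or 'internet'."""
    def tag(fn):
        fn.footprint = category
        return fn
    return tag

@footprint("memory")
def test_parser():
    assert int("42") == 42

@footprint("network")
def test_ldap_lookup():
    ...  # would talk to an LDAP server on the local network

def select(tests, allowed):
    """Keep only the tests whose footprint the environment can support."""
    return [t for t in tests if getattr(t, "footprint", None) in allowed]
```

An agent with no network connection would then run `select(tests, {"memory", "machine"})`; test frameworks offer equivalent built-in tagging mechanisms.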
Building multiple commits?
The ability to build only one commit at a time has its advantages. A bad commit can be easily separated from a good one, for instance.
How much to build?