It's not enough simply to test our software; we also need to ensure that our tests actually improve the quality of the final product. There are two tools in particular that many developers find useful for this purpose: continuous integration and code coverage measurement.
Say we have a piece of software, and we've written a great test suite for it. Success, right? But how do we know that the test suite continues to pass as the code changes? We could just agree to run it periodically, or perhaps before each pull request or merge. But people forget things, and once the tests are broken, whose job is it to fix them?
Enter "continuous integration", often abbreviated "CI". CI can do a great deal more than run tests, but we're going to focus on its role in verifying that our tests continue to pass as the code changes. Basically, CI acts as an automatic gatekeeper. If a new version (branch) of the code passes CI, then that version can be merged into the main code base (often the master branch), otherwise is must be reworked until it does pass. This is in addition to other manual steps like code review.
The second tool, code coverage measurement, is somewhat more controversial. It is commonly misused, usually because people misunderstand it. When we talk about measuring "code coverage" we are interested in knowing what fraction of our code is actually exercised by our test suite. For example, let's consider the following code:
def make_greeting(name):
    if name == 'George':
        return 'Oh, hi there!'
    else:
        return 'Nice to meet you'
Now, imagine we were to write a unit test for this simple function.
import hello

assert hello.make_greeting('Megan') == 'Nice to meet you'
This is fine: the test calls the function and verifies the output. But the test doesn't cause every line in the function under test to be run. Specifically, we never hit the "true" branch of the if-statement. So out of five lines, our test executes four, making our code coverage 80%. This is a very simple example, but the idea should be clear.
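If you'd like to see this measured for real, the coverage.py package can do it. Here's a minimal sketch, assuming the function above lives in a file called hello.py (an assumed filename; note also that coverage.py counts statements rather than raw lines, so the exact percentage it reports may differ slightly from our back-of-the-envelope figure):

import coverage

cov = coverage.Coverage()
cov.start()

import hello                     # imported during measurement so the def line is counted
assert hello.make_greeting('Megan') == 'Nice to meet you'

cov.stop()
cov.report(show_missing=True)    # prints hello.py's coverage and the missed line numbers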
There are several ways to measure coverage. Above I demonstrated "line" coverage: measuring coverage as the percentage of the program's lines that are exercised by tests. Another common metric is "branch coverage".
Branch coverage essentially counts the number of "paths" through the program that are covered. At each point where a decision is made (if-statements, switch-statements, and so on) it counts how many of the possibilities are run by the test suite, and how many are not. The code coverage then becomes the fraction of covered paths. In the example above, the branch coverage would be 50%.
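coverage.py can measure branch coverage too. The sketch below is the same as the earlier one except for the branch=True flag; as before, the hello.py filename is an assumption:

import coverage

cov = coverage.Coverage(branch=True)   # switch from line to branch measurement
cov.start()

import hello
assert hello.make_greeting('Megan') == 'Nice to meet you'
# Uncommenting the next line would cover the other branch, raising this to 100%:
# assert hello.make_greeting('George') == 'Oh, hi there!'

cov.stop()
cov.report(show_missing=True)          # now also reports branches that were never taken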
One of the traps that developers and organizations sometimes fall into is "chasing" 100% code coverage. It seems strange to think that we might not want 100% coverage; why wouldn't we? Clearly our test suite above would be improved by adding another test that passed 'George' as the name, right? Maybe not. We have to consider the cost of that additional test, and this isn't always simple to do. There are several costs we have to consider here:

- The direct cost of writing the test.
- The opportunity cost of not spending that time on something else.
- The ongoing cost of maintaining the test in the future.
The first is pretty simple: if the developer earns $N/hr and a test takes X hours to write, then the test costs $NX. If we know roughly how much a test will cost, we need to ask ourselves (preferably before we write it) how much it is worth. A test that verifies extremely simple code may not be worth very much, and we should be cognizant of this fact.
The second cost listed above might sound familiar if you've taken an economics course. The idea behind "opportunity cost" is that if you choose to do a thing you give up the opportunity to do something else, at least during the time you spend doing the thing you chose to do. So, as a developer, if I choose to add another test for a section of code I am giving up the opportunity to spend that time adding an additional feature to my software, or adding a test to another section of code, etc. So, again, when we consider adding additional tests to try to achieve 100% coverage we need to ask ourselves if doing so is really the most valuable use of our time.
Finally, direct and opportunity costs don't end once the code (or test) is written. Software must be maintained, bugs must be fixed, features must be added. Every line of code we add today will continue to incur costs into the future. So when we "chase" 100% coverage we might consider the costs acceptable today, but if we fail to consider the costs we (or our organizations) will pay tomorrow, we may still end up overpaying for test coverage.