Choosing Metrics for Agile Practice

First published in The Agile Zone, 28 February 2014

"Metrics are not a device for restraining the mad" - James Fenton

Recap

In previous weeks we've looked at the importance of achieving transparency in an agile transformation. We've seen how a Kanban board, even when used without any supporting agile best practices, can add immediate value in this regard. We've then made use of the increased visibility to stabilize newly formed teams and remove bottlenecks to their productivity.

This is a significant first step towards process improvement and the elimination of waste. At this point, and before we go any further, we should make sure we are able to gather hard data on the effectiveness of the changes that are made. In agile practice the elicitation of such data is known as gathering metrics.

Basic team metrics

The most fundamental measurement used in lean systems is that of lead time. This is the average time that elapses from the moment a request is accepted by a team until it is satisfied by delivering the necessary value to the consumer.

Imagine the team as a pipe. The clock starts ticking as soon as the team have accepted the request (i.e. it goes into one end of the pipe); at this point it becomes an item of inventory that they are responsible for working on. The clock stops only when the corresponding value is delivered (i.e. when the finished item emerges from the other end of the pipe). Backlog time must be included if the team effectively own the backlog and control the work added to it; all of that time counts as time spent in the team's pipe, along with testing, peer review, and any time wasted on delays. On the other hand, backlogs that are allowed to grow outside of the team's control are considered to lie outside of the pipe. If items are observed to spend an average of 3 days in a team's pipe, then 3 days is the lead time for the team. Note that the lead time for the customer's request will be greater than this if it doesn't go straight into the team's pipe, or if it undergoes some sort of additional processing afterwards.
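As a rough illustration, lead time can be derived from nothing more than a record of when each item entered and left the pipe. Here is a minimal sketch in Python; the record structure and the dates are invented for the example and not taken from any particular tool:

    from datetime import date

    # Each record holds the date an item was accepted into the team's pipe
    # and the date its value was delivered.
    completed_items = [
        {"accepted": date(2014, 2, 3), "delivered": date(2014, 2, 6)},
        {"accepted": date(2014, 2, 4), "delivered": date(2014, 2, 7)},
        {"accepted": date(2014, 2, 5), "delivered": date(2014, 2, 8)},
    ]

    def average_lead_time(items):
        # Mean number of days from acceptance to delivery.
        durations = [(item["delivered"] - item["accepted"]).days for item in items]
        return sum(durations) / len(durations)

    print(average_lead_time(completed_items))  # 3.0 days for this sample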

An equally simple metric is the team's throughput, which is the number of items actually delivered over a given period. If you were to count an average of 10 finished items emerging from the pipe each day, then the throughput would be 10 per day. Multiply throughput by lead time and you can tell how many items, on average, are in the pipe...that is to say, the "Work In Progress". For a lead time of 3 days and a throughput of 10 items per day, we can surmise that the average WIP must be 30. In other words:

WIP = throughput * lead time

or

throughput = WIP / lead time

Clearly then, if you can find ways to halve the lead time while maintaining the same WIP limit, you can expect throughput to double. Conversely, if you were to increase a team's WIP limit, you'd find that the lead time would also have to increase for each item if the team were to maintain the same throughput...meaning the customer making a request will have to wait longer for it to be fulfilled. This is the brutal arithmetic that makes limited WIP so desirable. In short, it allows value to be delivered more swiftly to customers so it can be put to productive use, and it minimizes the depreciation of work that has been partially developed and is still in progress.
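This relationship between the three measures is known in queueing theory as Little's Law, and the arithmetic is easy to check. A minimal sketch in Python, using the figures from the example above (the function names are illustrative only):

    def average_wip(throughput_per_day, lead_time_days):
        # Little's Law: average work in progress = throughput * lead time.
        return throughput_per_day * lead_time_days

    def throughput(wip, lead_time_days):
        # Rearranged: throughput = WIP / lead time.
        return wip / lead_time_days

    print(average_wip(10, 3))    # 30 items in the pipe on average
    print(throughput(30, 3))     # 10.0 items delivered per day
    print(throughput(30, 1.5))   # halve the lead time at the same WIP: 20.0 per day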

The most common metric used on agile projects, however, is velocity. This is essentially the same as throughput, but instead of counting finished items we would count the number of "points" that have been estimated for each item. Unlike throughput, velocity takes into account the fact that not all items are of comparable "size"...some can take markedly longer to complete than others, and so should carry more points. These differences can be very significant in projects where all sorts of new work needs doing, some of which might be quite involved and some of which might be comparatively trivial. The discrepancies can be rather less significant in "Business As Usual" work where amendments to a system have reduced to small and repeatable changes, and in such cases "ticket throughput" might still be the most appropriate measure.
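To see how the two measures differ, consider the following sketch; the items and their point estimates are invented for illustration:

    # Items finished during one Sprint, each with its estimated size in points.
    finished = [
        {"id": "A-101", "points": 8},
        {"id": "A-102", "points": 1},
        {"id": "A-103", "points": 5},
        {"id": "A-104", "points": 1},
    ]

    ticket_throughput = len(finished)                    # 4 items per Sprint
    velocity = sum(item["points"] for item in finished)  # 15 points per Sprint

    # Throughput treats the trivial one-pointers and the involved eight-pointer
    # alike; velocity weights each item by its estimated size.
    print(ticket_throughput, velocity)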

Feeling the burn

The work that a team accepts into its backlog becomes its own responsibility at that point. Certain agile methods, most notably Scrum, prescribe two separate backlogs, only one of which belongs to the Development Team. This is the Sprint Backlog, and it effectively represents a batch of work that the team forecasts for delivery within a specified timebox or "Sprint". In Scrum a Sprint cannot exceed one month, which helps to keep the batch reasonably small. The team will agree to take a suitable Sprint Backlog from a larger backlog of work belonging to a customer representative, or Product Owner. That larger backlog is known as the Product Backlog.

Since a Sprint Backlog represents a batch, albeit a small one, it would be useful to know, part-way through the Sprint, whether the team are on course to deliver it all by the end. If the batch is of an estimated size...possibly expressed in terms of the total points, or perhaps the time any associated tasks are likely to take...then a burndown of the work can be tracked on a daily basis. This shows the amount of work remaining in the Sprint Backlog, day by day, and so the rate of progress can be determined.
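A burndown needs nothing more than a daily tally of the work still outstanding. The sketch below uses invented figures for a ten-day Sprint, printing the points remaining each day alongside an "ideal" straight-line burn for comparison:

    # Points remaining in the Sprint Backlog at the end of each day, starting
    # with the total forecast on day zero of a ten-day Sprint.
    remaining = [40, 38, 35, 35, 30, 26, 24, 20, 14, 9, 3]

    sprint_days = len(remaining) - 1
    total = remaining[0]

    for day, left in enumerate(remaining):
        ideal = total - (total / sprint_days) * day
        print(f"Day {day:2}: {left:2} points remaining (ideal {ideal:5.1f})")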

Cumulative flow

Lead time, throughput, velocity and burndown are elementary measures because they include all of the waste that is incurred by a team using the process it owns and is accountable for. They are blunt instruments. They do not reveal exactly where, within the process, blockages and delays are most likely to arise. None can inform the team of where change should be implemented.

The use of a task or kanban board can help, and we saw how to add diagnostic columns for this purpose in the last article. It is possible to eyeball the board for blockages, and to derive anecdotal information that can be considered in a retrospective. What a board won't do is give you hard numbers to back up these claims.


Clearly though, if we have added diagnostic columns to a board, we will be in a position to elicit the relevant metrics ourselves. A keen observer will be able to record how long items spend "In Development", "In Test", "Externally Blocked" or in any of the other states we have expressed an interest in. The use of electronic boards makes the process less onerous than tracking things manually, and statistics can usually be derived at the click of a button. Cumulative Flow diagrams are often used to illustrate these measurements. They are closely related to the time-and-space diagrams used to model traffic flow, and the effect of congestion shockwaves can become apparent.
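The raw data behind a Cumulative Flow diagram is simply a daily count of how many items sit in each column of the board. The sketch below tallies some invented daily snapshots (the ticket names are made up for the example); plotting those counts as stacked areas, day by day, would give the familiar diagram:

    from collections import Counter

    columns = ["To Do", "In Development", "In Test", "Externally Blocked", "Done"]

    # One snapshot per day recording which column each ticket was observed in.
    snapshots = [
        {"T-1": "In Development", "T-2": "To Do", "T-3": "To Do"},
        {"T-1": "In Test", "T-2": "In Development", "T-3": "To Do"},
        {"T-1": "Done", "T-2": "Externally Blocked", "T-3": "In Development"},
    ]

    # Tally how many items sit in each column on each day; stacking these
    # counts as areas, day by day, produces the Cumulative Flow diagram.
    for day, snapshot in enumerate(snapshots, start=1):
        tally = Counter(snapshot.values())
        print(f"Day {day}:", {column: tally.get(column, 0) for column in columns})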

Conclusion, and a note on other metrics

The measurements we have looked at in this short article have been focused on technical delivery. Lead time, throughput, velocity, burn, and cumulative flow can all be used by team members to inspect and adapt their process, and to confirm or refute any suspicions using hard data. What they don't do is provide an indicator of the value delivered. For all we know, none of the work the team has done and delivered to the customer may have proven genuinely useful.

Measuring "value" is a thorny issue and it can be quite subjective. For example, if we have replaced an e-commerce website, how do we know if the initiative was worthwhile? Do we simply measure any change in revenue? If so, how can we prove the change was not due to other factors...and over what period should we measure it? What about market share, or customer retention and churn, or the number of new customers inducted? Aren't these figures potentially even more important? Shouldn't we be measuring changes in customer demographics? What about improved hit rates on the new site? We can take that measurement easily enough, but how should we figure it in?

Essentially, what we need to do is distinguish "vanity" business metrics, which merely look good, from "actionable" business metrics that can actually be used to form and test a hypothesis as quickly and efficiently as possible. In short, we need to think about our metrics as though we were a "Lean Startup", and see a product not merely as a cash cow for revenue, but as a learning tool for exploring new and bigger opportunities.