Faking It: Estimates and Metrics in Scrum

First published on Scrum.org, 28 February 2018 (syndicated).

"The most important metrics are: did we execute the way in which we said we would, and did we deliver the value to the business that we had promised?" - Jamie S. Miller

In an earlier post we took a critical look at metrics and at how easily they can be abused. Pretty much anything can be measured, and the gratuitous presentation of numbers can give a sheen of science to an undertaking, no matter how absurd it might really be. The problem is that a wealth of data can seem to make a convincing case, even when the numbers have not been correlated to an hypothesis by rigorous empirical means. Hence phrenology, although it is now understood to be a pseudo-science, was thought to be a credible enough discipline by our forebears. Careful measurements of people's skulls were made in an attempt to ascertain their mental condition. Only over time, and through sceptical enquiry, did it eventually become clear that the shape of a person's head relates very poorly indeed to their psychological make-up. We can trust that any measurements taken were accurate and extensive, but the data was informationally useless when applied in pursuit of this supposed science. The measurements could never validate the phrenological method, irrespective of their quality and quantity. Today it is dismissed as the relict superstition of a bygone age.

Simply put, an abundance of metrics, irrespective of the precision with which they might be taken, cannot cheat a fundamentally weak correlation between hypothesis and data. The descendants of yesterday's “bump-readers”, however, can still be found in the board-rooms and management offices of large corporations today. With many people under their assumed control, they demand standardized measures of productivity by means of which employees might be compared, punished, and rewarded. Any straws may be grasped at as long as they can be counted. Thus agile teams, which ought to be assessed empirically by the incremental release of value, are instead gauged by the higher-ups in terms of how much estimated work they appear to have "delivered".

Those are the bumps of today. Estimates proxy for value in this grotesque dystopia. Measures like "story points" have become commoditized as a surrogate currency, inviting bizarre inflationary pressures and market distortions upon any numbers which might be arrived at. The actual provision of value to stakeholders is ignored as a quantity too difficult to measure, and so cock-eyed metrics are appropriated in compensation. Our work is cut out for us in trying to persuade delinquent executives to do the right thing - to master the science of measurement - and to value the empiricism which would allow informed decisions to be made.

A further irony is that these suspect techniques, whereby projections are made which are based on estimates, can be used quite rationally by agile teams themselves. The numbers represent a collaborative assessment of essential criteria, such as how much work a team believes it can take on. Having taken these measurements the teams which own them can then make reasoned forecasts. It is their data which they may use for their own projective purposes, even though other stakeholders can only be assured by the receipt of actual value. One reasonable forecast might be how much work they think is likely to remain at a given point before one of those valuable increments is delivered to customers. The shorter the time-period under consideration, the smaller the leap-of-faith a team will make when determining the likelihood of a valuable, empirical outcome.


The Sprint Burndown

The "Sprint Burndown" is an example of this sort of projective practice. It is based on estimates, and is quite familiar to many Scrum Teams. During Sprint Planning, a Development Team will meet with the Product Owner to agree on a selection of work from the Product Backlog. The selection forms the basis of the Sprint Backlog, which is a forecast of the work needed to achieve a jointly agreed Sprint Goal. This body of work may be revised during the Sprint time-box in order to better meet the Goal. Achieving a Sprint Goal is an accomplishment which is of signal importance in Scrum. Completing the original forecast of work arrived at during Sprint Planning is, in truth, somewhat irrelevant. The critical thing is to have a plan which allows the Goal to be met. It is the Sprint Goal, and not the Sprint Backlog, which represents the more artful team commitment. In essence, measuring how much work is left in the Sprint Backlog ought to be nothing more than an exercise in forecasting goal actualization. It relies on having up-to-date estimates which allow the team's progress itself to be continually estimated, until such time as an increment is delivered which empirically validates the work that has been undertaken.

A Sprint Burndown is a forecast of the work which remains to be done by a team, for which projections can be made based on prior forecasts, and it is updated throughout the Sprint until the goal is met. The Sprint Burndown may therefore be a projection based on estimates, but it is understood that the measurements are made by a team for its own purposes, and for no-one else's. It tells them whether or not they are actually on course to provide empirical evidence, by the end of a Sprint, that the complex challenge they have undertaken has been mitigated. External stakeholders will gauge progress only through the evidence vouched by actual delivery. Story points and other estimates should never proxy for this value, or be traded or commoditized. These measures are only useful to the teams which make them, within the context of their Sprint and their own development concerns.
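For a team which wants to see the arithmetic behind such a chart, the following is a minimal sketch in Python. It assumes the team records its own estimate of remaining work at the end of each day, and it projects forward using the average daily burn so far. The function names, the sample figures, and the choice of a simple average are illustrative assumptions rather than anything prescribed by Scrum.

```python
from typing import List, Optional


def ideal_burndown(total_points: float, sprint_days: int) -> List[float]:
    """Straight-line reference: remaining estimate at the end of each day (day 0 first)."""
    return [total_points * (1 - day / sprint_days) for day in range(sprint_days + 1)]


def projected_days_to_zero(remaining_by_day: List[float]) -> Optional[float]:
    """Project how many more days are needed, from the average daily burn so far.

    remaining_by_day[0] is the estimate at Sprint Planning; later entries are the
    team's re-estimates. The figure may rise when the Sprint Backlog is revised.
    Returns None when there is no net progress to project from.
    """
    days_elapsed = len(remaining_by_day) - 1
    if days_elapsed < 1:
        return None
    burned = remaining_by_day[0] - remaining_by_day[-1]
    if burned <= 0:
        return None
    daily_rate = burned / days_elapsed
    return remaining_by_day[-1] / daily_rate


# Hypothetical figures: a 10-day Sprint, 4 days in, estimates in story points.
remaining = [40, 38, 35, 35, 30]
days_left = 10 - (len(remaining) - 1)
projection = projected_days_to_zero(remaining)
on_course = projection is not None and projection <= days_left
print(projection, days_left, on_course)
```

With these sample figures the projection exceeds the days left in the Sprint, which is a prompt for the team to revisit its plan for meeting the Sprint Goal. The output remains an estimate built on estimates, and it is meaningful only to the team which produced it.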

Advocates of empirical process control may not be entirely satisfied with this. Even if we accept that value will be evidenced empirically by the end of each Sprint, we still see an attempt to measure progress using estimates. We see promissory notes for value instead of work genuinely done. The leap-of-faith being made through a story point Sprint Burndown is admittedly time-boxed and carefully limited, but it is a leap-of-faith nevertheless.

Why Estimate?

So why do it? Why estimate at all? Why not just focus on completing one item on a Sprint Backlog at a time, bringing it to release quality, and so measure progress in terms of the rate of value honestly and genuinely delivered? If we need a burndown to show us progress towards a goal, why not track that progress in terms of actuals rather than estimates? Moreover, wouldn't this allow empirical process control towards that very goal to be brought within the Sprint itself?

The argument is a sound one, and the case for "no estimates" in agile delivery has a lot to be said for it. Certainly, we must understand and accept that measuring progress on the basis of story points is indeed unempirical, even within the narrow confines of a Sprint. The delivery of working features, early and often, is the only measure of progress which can be truly satisfactory at any scale. What a story point burn-down may reasonably do, however, is to give a team transparency over a complex event. You see, that's what a Sprint really is. It isn't just a stream of work where independent and discrete pieces of value are exposed to uniform pull and flow. Their joint purpose is to meet a Sprint Goal. That goal can mitigate a very significant risk which ultimately makes a Sprint Backlog more than the sum of its parts. Incremental release certainly doesn't have to be deferred to the end of a Sprint, and it may indeed occur on the basis of pull-into-production and continuous flow. However, it might only make sense to effect a release at the end of a Sprint where a complex deliverable is at hand, and there are multiple unknowns to be juggled. Scrum makes no prescription about any of these scenarios or about the metrics which a trusted and self-organizing team ought to use. A story point burn-down is an interim construct through which empirical process control can be faked. When release happens, the fakery ends and progress is recalibrated. As long as we understand and accept this as well, then there may not be a problem.

The Product Burndown

Now let's consider another common way of projecting delivery by means of story-point estimates, one which is found in many Scrum implementations. The "Product Burndown", like the Sprint Burndown, is a forecast which shows how much work is likely to remain over time, and projected dates for its likely completion. However, unlike a Sprint Burndown - which constrains a forecast to the Sprint Backlog - a Product Burndown attempts a forecast over perhaps the entire corpus of work. Estimates like story-points may be used to calibrate such burndowns, and to make projections which extend over many months, and possibly even into years of anticipated product development.
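A projection of this kind might be sketched as follows in Python. The sketch assumes that the remaining Product Backlog has been estimated in story points, and that the points completed in recent Sprints are available to project from. It reports a range rather than a single date, so that the uncertainty stays visible. The function name and all figures are hypothetical, and the approach is one plausible technique rather than a method taken from Scrum itself.

```python
import math
from typing import List, Tuple


def sprints_remaining_range(backlog_points: float,
                            recent_velocities: List[float]) -> Tuple[int, int]:
    """Return (optimistic, pessimistic) whole Sprints needed to burn down the backlog.

    The optimistic bound uses the fastest recent Sprint and the pessimistic bound
    the slowest. Both are estimates built on estimates.
    """
    if not recent_velocities:
        raise ValueError("at least one completed Sprint is needed")
    fastest = max(recent_velocities)
    slowest = min(recent_velocities)
    if slowest <= 0:
        raise ValueError("cannot project from a Sprint in which nothing was completed")
    return (math.ceil(backlog_points / fastest),
            math.ceil(backlog_points / slowest))


# Hypothetical figures: 200 points remain; the last three Sprints completed 18, 25 and 20.
forecast = sprints_remaining_range(200, [18, 25, 20])
print(forecast)  # (8, 12)
```

The range should be recomputed after every release, using what was actually delivered, because the further ahead it looks the less it deserves to be trusted.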

Moreover, these estimates are not intended primarily for Development Team consumption, but rather for the benefit of senior stakeholders and other higher-ups who wish to be apprised concerning longer-term delivery outcomes. How reasonable is it to use Development Team estimates for these purposes? Shouldn't those people care more about receiving value iteratively and incrementally, rather than about graphs and charts and projections? Aren't we getting perilously close to the old bump-reading problem, where careful measurements end up being used badly, reality is misrepresented, and empiricism takes a back seat? In short, can executive types be trusted with Development Team measures and metrics?

Let's remind ourselves that, at its root, the only purpose of estimation is to allow a Development Team to figure out how much work it thinks it can take on. When those estimates are exposed beyond the team's circle of trust we may indeed run the risk of abuse, of story points being commoditized, of teams being compared or obliged to bid for work using points as a cryptocurrency, and other abominations. In Scrum this is a risk which lies squarely with a Product Owner to manage. As a member of the Scrum Team, the Product Owner is trusted to understand and respect the Development Team's estimates and to use any associated projections sensibly. The Product Owner will understand the limitations of using estimates to measure progress, and the importance of recalibrating a Product Burndown and any forecasts in light of the empirical evidence brought about by release. If there is doubt about the ability of other stakeholders to consume this data, then the Product Owner - as their representative, advocate and arbiter - must decide whether or not they ought to be exposed to estimated measures and forecasts at all. Perhaps they ought not to be. A Product Owner might be the only trusted consumer of Product Burndown information.

The Product Owner must be respected as the authority who must interpret the available data, including forecasts, and who will make decisions for optimizing and releasing product value. He or she is the one customer representative who must lie within the Scrum Team circle of trust. It is an unprecedented level of responsibility and accountability...and it comes with the job.