Excel Model Transparency Definition and Measure

Post date: Nov 4, 2016 3:29:25 PM

Transparency Formulas:

Model

Formula

Reference

= SUM(Distinct Formula Transparency, Non-Formula Value Transparency)

= SUM(Reference Transparency in Formula)

= COUNT(steps to find VALUE) + COUNT(steps to find LABEL)

Are your models Transparent? Are you sure? How do you know?

Model transparency is a requirement in publicly traded companies so it is critical we know what transparency means. Transparency is also important to auditors and those who commission models because the less transparent a model is the longer it takes to audit and the more it costs to deliver. But you knew all that, right?

Can you provide a concrete, measurable definition for transparency? All scholarly papers I could find report there is no consensus on that term. Opinions differ wildly. Yet modelers claim one method is more transparent than another. How do we know?

The short answer is, we don't because no one, that I know of, measures model transparency. Up untill now, transparency in modeling has been a concept, not a measure. Comparing fuzzy concepts belongs to philosophy, not computer science nor accounting. So let's fix that.

What is Transparency?

I posed this to the EuSpRIG '17 conference. Everyone indicated they knew yet, when asked, each response was quite different. This was expected. Prior research indicated, "Little consensus is found to exist around a single, testable definition for transparency." (quote source)

To develop a measure we need a "single, testable definition" so let us start with the standard:

characterized by visibility or accessibility of information especially concerning business practices

Merriam Webster

In this context Transparency applies to information. A model has some degree of transparency if its information is accessible. The degree of transparency is how easily we can access its information. If we cannot access a model's information, that model is opaque.

Definition (in a spreadsheet context)

Ability to access each cell's information.

Measure

Effort required to access each cell's information.

What is Information?

"There is a subtle difference between data and information. Data are the facts or details from which information is derived. ...For data to become information, data needs to be put into context." (quote source). In Excel, cells displaying values are raw data. Add context with proper labels and values become information. For illustration consider a spreadsheet with a single cell occupied by the number 12. 12 is raw data. Without context we have no idea what 12 represents. But if we add the label "Inches of Rope" we know.

What do we mean by 'Value'?

In modeling, values are any input to, or result of a formula or function. Values can be sub-categorized into Value Types with distinct labeling requirements (see also International Vocabulary of Metrology).

Some cells contain labels. These are not 'values' and are excluded from transparency measures along with a model's documentation, functions and operators.

Transparency vs Understandability

Understandabilty as a concept is based as much on our abilities as anything else. To illustrate, consider a simple financial model which may be easily understood by someone with skills in Excel and Finance but totally beyond a toddler's grasp. Transparency is necessary for understanding a model but transparency has nothing to do with our abilities. Transparency is solely a model property and, as such, can be measured without human complexity considerations.

NOTE! This is not to say there are no formula factors which are measurably more or less understandable within a population. For more on those factors see: Measuring Spreadsheet Formula Understandability by Felienne Hermans, Martin Pinzger & Arie van Deursen.

Transparency vs Complexity

Complexity differs from transparency. To illustrate consider that a simple cell with no formula can be totally opaque when its source cannot be revealed; whereas, a rather complex equation with proper labeling can be quite transparent. Increased complexity can make finding a cell's source information more difficult, but that is accounted for in this measurement.

Measuring Ease of Access

Finding a source's value and label in Excel is accomplished by following well defined steps such as clicking Excel's Trace Precedents icon or clicking in the formula bar to see precedents highlighted and color coded, or pressing F9 to evaluate a formula. Just looking within the immediate vicinity is not considered a step. Each step requires roughly the same effort, thus counting the number of steps measures the ease by which we access source values and labels. No steps is totally transparent. More steps are less transparent.

Reference Types

The number of steps required to find a source's value and label varies by source type and whether local (on the same worksheet), remote (on a different worksheet in the same workbook) or external (from outside the workbook). Source types include: literal constants in functions; literal constants in formulas; cell/range references, named range references, and structured references

Literal Constants in Functions

In the example at right 3.5 is a literal constant. Literal constants expose their values (3.5); thus, there are no steps required to find them. Finding their labels, which tell us what their value means may require many steps.

When we use a literal constant in a function, the function may provide a parameter label sufficient for identifying what the literal is. In the example we know 3.5 is a Rate. The step required to find this label is to double click the formula's cell or click on the formula in the formula bar. Both methods expose a 'tooltip' below the cursor (red circle). If we need additional information we can click the

button in the formula bar to bring up the Insert Function dialog or we can click the function's name in the tooltip to bring up Excel's function help.

Sometimes parameter labeling is too vague. In the example we know 6 is the Number of Periods but we do not know if that is in weeks, months, quarters, etc. Which it is can be important.

Sometimes functions provide no meaningful labels at all. In such cases the literal is opaque.

A literal-constant-in-a-function's transparency is based on the function's parameter labeling being sufficient. We can catalog Excel functions used and classify each parameter as sufficient or not. If a function's parameter is sufficient then a literal constant in that parameter has a transparency of 1 (no steps to find its value plus one step to find the function's parameter label). Otherwise the literal constant is opaque.

Transparency Score

= IF(Function Parameter Label Sufficient, 1, Opaque)

Literal Constants in Formulas

Finding a literal constant's label when used outside functions may require more steps. Consider this example:

=A11*12

12 is a literal constant. We know its value (12). What we don't know is what it means. It could be a dozen. It could be months in a year. It could be inches in a foot. To understand this constant we need to look at the formula's label and hope that makes it clear.

Literal-constant-in-a-formula transparency is the number of steps required to find the formula's label. Judgement may be required to determine if the formula's label is sufficient to identify what the literal constant is. If no label is found or judged insufficient the constant is opaque.

Transparency Score: =If(Label Sufficient, COUNT(Steps to Find Formula Label), Opaque)

Cell/Range References

In our example formula, A11 is a cell reference. To understand this reference we need to find its value and label. We can find its value by clicking the formula, selecting the reference and pressing F9 (considered to be one step). To find its label we must find A11 and hope it has a label next to it.

Cell/range reference transparency is the number of steps required to find its value and label. Remote or External cell references require even more steps to find their labels. If its label is missing the cell reference is opaque.

Transparency Score: =1 + COUNT(Steps to Find Label)

Named Range References

A named range is a defined name containing no functions or operators. It may contain a literal constant or cell/range reference. A name is a label, thus there are no steps required to find a named range reference's label.

Named range reference transparency is 1 because it only takes one step to find its value which is identical to cell/range references: click the formula, select the reference in the formula bar and press F9.

Transparency Score: =1

Structured References

A structured reference is an automatically created named range reference and shares its transparency measure.

Transparency Score: =1

Formulas

A formula's transparency is the sum of its various reference transparencies. Excel often provides multiple methods for achieving the same result. Comparing the transparency of multiple formulas that achieve the same result can help discern which practice is best.

Models

A model's transparency is the sum of distinct formula transparencies (truly duplicate formulas can be ignored) plus the sum of all non-formula value cell transparencies (input cells). This indicates the portion of audit effort required to verify transparency and to identify formula precedents (not fitness for purpose, conformance to standards, nor formula correctness). We should strive to move this number as close to zero (fully transparent) as practical.

Model transparency should never be sacrificed for formula transparency. A common method for making formulas more transparent is to create a local cell referencing a remote reference and then use the local cell in a formula. This merely distributes the transparency measure to other cells with each new cell degrading the model's overall transparency.

Recommendations

Model transparency requires values and labels. As long as all value cells are accessible at the time of auditing, a model's overall transparency is governed by labeling. With that in mind I recommend:

  • Label all cell references appropriately.

  • Label efficiently to reduce clutter.

  • Use freeze panes to keep row and column labels in view (immediate vicinity).

  • Seek alternatives to indirect references and if no practical alternatives exist make sure indirect references are constrained to occupied cells.

  • When formulas require constants use appropriately named references instead of literals.

  • When formulas require remote values use remote references instead of local cells daisy chained to remote cells.

  • Use structured references when practical.

  • Seek alternatives to error values if practical.

  • Fix cells reporting errors not appropriately handled in downstream calculations.

  • Favor model transparency over individual cell transparency.

Summary

Philosophical debates over which methods are more or less transparent can end as we can now rationally measure model transparency and properly discern best practices.

NOTE! Labels can be found in various locations including:

    • Adjacent cells, row or column headings

    • Cell comments

    • Function parameter labels, parameter extended text, and formula help text

    • Data Validation Input Messages

    • Locations dictated by corporate standards

    • Model documentation

References

Using F9: https://exceljet.net/tips/how-to-check-and-debug-a-formula-with-f9