Background

What is a Counterfactual?

A counterfactual explanation describes a hypothetical scenario of the form "if X had not occurred, Y would not have occurred." For example, an explanation of "I studied, so I aced the test" could read: "If I had not studied, I would not have aced the test."

In Machine Learning, a counterfactual analyzes a given instance of cause and effect (or inputs and outputs) and imagines how an outcome could have been changed by contradicting the observed facts. A good counterfactual explanation describes the smallest change in inputs that changes the prediction to the desired output.
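As a concrete illustration, consider a purely hypothetical toy loan model that approves any applicant earning at least 50,000. The original applicant is denied, and the counterfactual is the smallest income change that flips the decision:

```python
# Hypothetical threshold model and numbers, for illustration only.
def toy_loan_model(income):
    """Toy black-box model: approve if income is at least 50,000."""
    return "approved" if income >= 50_000 else "denied"

original_income = 48_000
counterfactual_income = 50_000   # smallest change that flips the prediction

print(toy_loan_model(original_income))        # denied
print(toy_loan_model(counterfactual_income))  # approved
```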

Why is it important?

Counterfactual explanations help users determine what changes they would need to make to produce a different outcome. They apply to a variety of fields, including finance (e.g., getting a loan approved, earning income over a certain threshold), admissions and hiring (e.g., being admitted to a university), and medicine (e.g., a cancer diagnosis, the risk of developing diabetes).

On a more technical level, counterfactual modeling helps the user understand (to a certain degree) black-box algorithms: seeing which input changes would flip an output sheds light on how heavily the model weights each input.

How are these instances generated?

Counterfactual instances can be generated in two ways: either through naive trial and error, or by optimizing a loss function. The loss function takes as inputs the instance to be explained and a candidate counterfactual. The function is then minimized to reduce both the difference between the counterfactual's predicted outcome and the desired outcome, and the distance between the counterfactual and the original instance.

The approach suggested by Wachter et al. (2017) optimizes the following loss function through gradient descent:

L(x, x', y', λ) = λ · (f̂(x') − y')² + d(x, x')

where x is the original instance, x' the counterfactual, y' the desired outcome, f̂ the model's prediction function, and d(x, x') a distance measure between the original instance and the counterfactual.
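A minimal Python sketch of this loss, assuming an arbitrary predict_fn and using a plain L1 distance in place of the feature-wise weighted Manhattan distance from the paper:

```python
import numpy as np

def counterfactual_loss(x, x_cf, y_target, lam, predict_fn):
    """Illustrative counterfactual loss with two competing terms."""
    # How far the counterfactual's prediction is from the desired outcome y'.
    prediction_term = (predict_fn(x_cf) - y_target) ** 2
    # How far the counterfactual has moved from the original instance x
    # (plain L1 distance here; the paper weights each feature by the
    # inverse median absolute deviation).
    distance_term = np.sum(np.abs(x - x_cf))
    return lam * prediction_term + distance_term
```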

The parameter λ balances the two terms: the higher λ, the more the counterfactual's prediction is pushed toward the desired outcome y'; the lower λ, the closer the counterfactual stays to the original instance. Rather than picking λ directly, the user chooses a tolerance ϵ, which determines how large the difference may be between the prediction of the counterfactual instance f̂(x') and the desired outcome y'. The tolerance is expressed as the constraint:

|f̂(x') − y'| ≤ ϵ

λ is then increased until a counterfactual satisfying this constraint is found.
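To see how λ trades the two terms off against each other, here is a small, self-contained numeric example (the instance, candidate counterfactual, and stand-in model are all made up):

```python
import numpy as np

x = np.array([1.0, 2.0])         # original instance
x_cf = np.array([1.2, 2.5])      # candidate counterfactual
y_target = 1.0                   # desired outcome y'
predict = lambda z: float(z.sum() / 5.0)   # hypothetical stand-in for f-hat

for lam in (0.1, 10.0):
    prediction_term = (predict(x_cf) - y_target) ** 2
    distance_term = float(np.sum(np.abs(x - x_cf)))
    loss = lam * prediction_term + distance_term
    # Small lambda: the distance term dominates, so staying near x is cheap.
    # Large lambda: missing the desired outcome y' becomes expensive.
    print(f"lambda={lam:>4}: loss={loss:.3f}")
```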

Overall Steps to Solve for a Counterfactual:

    1. Choose an instance x to be explained, the desired outcome y', a tolerance ϵ, and a low initial value for λ

    2. Use a random instance as an initial counterfactual

    3. Optimize the loss with the initially sampled counterfactual as the starting point

    4. While the difference between the counterfactual's prediction f̂(x') and the desired outcome y' is larger than the tolerance ϵ:

      1. Increase λ

      2. Optimize the loss with the current counterfactual as starting point

      3. Return the counterfactual that minimizes the loss

    5. Repeat steps 2-4 and return the resulting list of counterfactuals (a rough code sketch of this procedure follows below)
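Putting the steps together, here is a rough, self-contained sketch of the loop. All names and optimizer choices are illustrative assumptions, not the authors' reference implementation; Nelder-Mead stands in for gradient descent so the sketch does not require an autograd library.

```python
import numpy as np
from scipy.optimize import minimize

def generate_counterfactual(x, predict_fn, y_target, epsilon=0.05,
                            lam=0.1, lam_growth=10.0, max_rounds=10, seed=0):
    """Illustrative version of steps 1-4; not a reference implementation."""
    rng = np.random.default_rng(seed)

    def loss(x_cf, lam):
        prediction_term = (predict_fn(x_cf) - y_target) ** 2
        distance_term = np.sum(np.abs(x - x_cf))   # plain L1 distance
        return lam * prediction_term + distance_term

    # Step 2: start from a random instance near x.
    x_cf = x + rng.normal(scale=0.5, size=x.shape)

    for _ in range(max_rounds):
        # Steps 3 / 4.2: optimize the loss from the current counterfactual.
        x_cf = minimize(loss, x_cf, args=(lam,), method="Nelder-Mead").x
        # Step 4: stop once the prediction is within the tolerance epsilon.
        if abs(predict_fn(x_cf) - y_target) <= epsilon:
            break
        # Step 4.1: otherwise increase lambda and optimize again.
        lam *= lam_growth
    return x_cf

# Hypothetical logistic "model" over two features, for demonstration only.
predict = lambda z: 1.0 / (1.0 + np.exp(-(z[0] + 2.0 * z[1] - 3.0)))
x = np.array([0.5, 0.5])                       # predicted ~0.18, want ~0.9
x_cf = generate_counterfactual(x, predict, y_target=0.9)
print("counterfactual:", x_cf, "prediction:", round(predict(x_cf), 3))
```

Step 5 then amounts to calling such a function repeatedly with different random starting points and keeping the resulting list of counterfactuals (or only the one with the lowest loss).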