Explainable AI

Which game are the features playing?

The SARS-CoV-2 has mutated, and a new disease, COVID-22, is spreading. You attended a concert two days ago and are feeling COVID-22 symptoms, so you want to get tested and preferably get the result quickly. You call your AI-doctor to get on the fast-track. After reporting some data, the AI-doctor says “sorry, but you are not eligible for the fast-track, have a good day.” You try to argue, but the AI-doctor cannot give an explanation. You would have liked some intelligible reasoning behind the rejection, such as “your reported behavioral patterns combined with the lack of more specific symptoms than a mild cold makes it unlikely that you have COVID-22.”

So how do we get from the raw input data to a reasonable explanation of how the input data led to the conclusion?

Explainable AI is an umbrella term for methods that can bridge the gap between a prediction algorithm and a human’s understanding of its behavior. In this post we will look at Shapley values as a way to attribute an outcome of a prediction algorithm to different features of the input data. Shapley values are not natural language explanations of a prediction, but they are an attempt to quantify how different aspects of the input data contribute to a prediction in a way that is interpretable to humans.

We will use the prediction of fast-track eligibility for COVID-22 testing to illustrate how Shapley values are defined and how they can be computed in the context of a prediction problem. As we show in this post, Shapley values depend on the choice of a value function. For any specific choice of the value function, Shapley values are a reasonable way to distribute the contribution to a prediction among the different features, and there are excellent resources available that explain the theoretical justification of Shapley values. However, the value function is not unique, and the focus in this post is how the choice of the value function affects the interpretation of the Shapley values.

A simple prediction algorithm

We will consider a prediction algorithm of the probability of a binary outcome $Y$ from input data $x = (x_1, \ldots, x_p)$ in $\mathcal{X} = \mathcal{X}_1 \times \ldots \times \mathcal{X}_p$ with each feature set $\mathcal{X}_j$ finite. The algorithm might predict the probability of having COVID-22 based on a number of self reported features such as symptoms and behavior. To illustrate the main ideas we will use a simple Bayesian network model, whose graph is as follows

graph LR
  A([ X<sub>1</sub> ]) --> D([ X<sub>3</sub> ])
  B --> E([ X<sub>2</sub> ])
  A --> B[ Z ] --> C([Y])
  C --> D

Here $Y = 1$ denotes that you have the COVID-22 infection, $Z = 1$ denotes that you have been in close contact with an infected, $X_1 = 1$ that you have been in a crowd, $X_2 = 1$ that you know you have been in close contact with an infected, and $X_3 = 1$ that you have COVID-22 symptoms.

In this example, all variables are binary, and the joint probabilities factorize according to the graph as $$p(x_1, z, x_2, y, x_3) = p_3(x_3 \mid y, x_1)p_y(y \mid z)p_2(x_2 \mid z) p_z(z \mid x_1) p_1(x_1).$$ If $(x_1, x_2, x_3)$ are the reported features, the conditional probability of $Y = 1$ is computable using the formula $$p(y \mid x_1, x_2, x_3) = \frac{\sum_{z} p(x_1, z, x_2, y, x_3)}{\sum_{z,y} p(x_1, z, x_2, y, x_3)}$$ for any joint probability distribution. We will compute these probabilities from the factors in the Bayesian network, but in practice we might instead estimate $p(y \mid x_1, x_2, x_3)$ directly from data.

We will use the following numerical values of the conditional probabilities in the Bayesian network \begin{align*} p_1(1) & = 0.05 \\ p_z(1 \mid 0) & = 0.01 \\
p_z(1 \mid 1) & = 0.05 \\
p_2(1 \mid 0) & = 0 \\
p_2(1 \mid 1) & = 0.5 \\
p_y(1 \mid 0) & = 0.001 \\
p_y(1 \mid 1) & = 0.1 \\
p_3(1 \mid 0, 0) & = 0.02 \\
p_3(1 \mid 1, 0) & = 0.8 \\
p_3(1 \mid 0, 1) & = 0.05 \\
p_3(1 \mid 1, 1) & = 0.83. \\
\end{align*} Observing the feature vector $x = (x_1, x_2, x_3)$, the prediction algorithm of $f(x) = P(Y = 1 \mid X = x)$ is then given by the table

$$ \begin{array}{ccc|c} x_1 & x_2 & x_3 & f(x) \\
\hline 0 & 0 & 0 & 0.0003 \\
1 & 0 & 0 & 0.0006 \\
0 & 1 & 0 & 0.0222 \\
1 & 1 & 0 & 0.0195 \\
0 & 0 & 1 & 0.0566 \\
1 & 0 & 1 & 0.0557 \\
0 & 1 & 1 & 0.8163 \\
1 & 1 & 1 & 0.6484 \\
\end{array} $$

This algorithm is sufficiently simple to be directly interpretable.

Suppose that getting into the fast-track for testing requires $f(x) \geq 0.1$, and your features were $x = (1, 0, 1)$. Since $f(1, 0, 1) = 0.0557$ you are not eligible, and we can see from the table that in addition to the symptoms you should have reported that you had been in close contact with an infected to be eligible.

In fact, the algorithm cuts the feature space into two parts in a very simple way: $f(x) \geq 0.1$ if and only if $(x_2, x_3) = (1, 1)$. That is, you are eligible for the fast-track if you report having been in contact with an infected and have symptoms, and otherwise you are not eligible.

When the prediction task is more complicated, we cannot so easily read of how the different features affect the predicted probabilities. With Shapley values we will instead attempt to explain how the prediction $f(1, 0, 1) = 0.0557$ can be attributed to the individual values of the features.

Prediction: a cooperative game played by features

Shapley values have their origin in cooperative game theory as a principled way to distribute the outcome of a game among the different players. The idea is to regard prediction as a cooperative game played by the features.

In the general setup there are $p$ features and the algorithm is a function $$f : \mathcal{X} \to [0,1].$$ When we evaluate the function and compute $f(x) = f(x_1, \ldots, x_p)$, this value depends on all the $p$ features. The objective is to define numbers, $\varphi_1, \ldots, \varphi_p$, for the $p$ features such that $\varphi_j$ quantifies how much the specific feature value, $x_j$, contributes to the prediction $f(x_1, \ldots, x_j, \ldots, x_p)$.

To understand the contribution of feature $j$, suppose for a moment that its value is missing, but we have the values of all the remaining features. Which prediction will we then make? It’s a frequent practical problem that a feature value is missing, and a simple solution is imputation. That is, we first predict the missing value as $\hat{x}_j$, and then we plug it into the prediction algorithm to give

$$f{}^{-j}(x_{-j}) = f(x_1, \ldots, x_{j-1}, \hat{x}_j, x_{j+1}, \ldots, x_p).$$

The actual imputation algorithm is immaterial here, but let’s say we use the simplest one: predict $\hat{x}_j$ to be the marginally most probable outcome in $\mathcal{X}_j$. We could now attribute the difference

$$f(x) - f{}^{-j} (x_{-j})$$

to feature $j$. But what if feature $k$ is missing as well – shouldn’t the difference

$$f{}^{-k}(x_{-k}) - f{}^{-\{j,k\}}(x_{-\{j,k\}})$$

also be considered to appropriately quantify the contribution of feature $j$?

Letting $f{}^{-R}(x_{-R})$ for $R \subseteq \{1, \ldots, p\}$ denote the prediction based on imputed values, $\hat{x}_j$, for $j \in R$, we arguably need to consider

$$f{}^{-R}(x_{-R}) - f{}^{-R \cup \{j\}}(x_{-R \cup \{j\}})$$

for all $R \subseteq \{1, \ldots, p\} \backslash \{j\}$ to fully understand the contribution to the prediction of feature $j$. The summary of these numbers, written as the weighted sum

\begin{align} \varphi_j = \sum_{R \subseteq \{1, \ldots, p\} \backslash \{j\}} \frac{(p-|R|)!(|R|-1)!}{p!} (f{}^{-R}(x_{-R}) - f{}^{-R \cup \{j\}}(x_{-R \cup \{j\}})), \end{align}

is known as the Shapley value for the $j$-th feature.

Using the game theoretic terminology, for the coalition $S \subseteq \{1, \ldots, p\}$ of features, the outcome of the game is the value function $$v(S) = f{}^{-S^{c}}(x_{-S^c}),$$ that is, the prediction obtained from features in $S$ only and with features in $S^c$ imputed, and \begin{align} \varphi_j = \sum_{S \subseteq \{1, \ldots, p\} \backslash \{j\}} \frac{|S|!(p -|S|-1)!}{p!} (v(S \cup \{j\}) - v(S)) \tag{1} \end{align} is the weighted sum of the contribution of feature $j$ to any coalition. With this definition1 it can be shown that $$f(x) = v(\emptyset) + \varphi_1 + \ldots + \varphi_p.$$

The game theoretic interpretation of Shapley values is covered elsewhere, and it can also be shown that the weighted sum definition above is the unique attribution satisfying a number of natural properties. We gloss over those details here — Shapley values are for a given value function a well established and sensible way to decompose $f(x) = v(\{1, \ldots, p\})$ into contributions from the individual players.

All value functions considered satisfy that $f(x) = v(\{1, \ldots, p\})$ but differ in how they assign values to other coalitions. Different value functions will lead to different quantifications of attribution by Shapley values, and we want to understand in greater detail what these differences mean. Indeed, the choice above via imputation is only one way of defining how feature coalitions act.

For the value function based on imputation, the attributions are justifiable when imputation based prediction is actually used – but we cannot necessarily stretch the meaning of the attributions to have intrinsic value for the individual features.

SHAP: a population game

Any specific algorithm capable of making predictions when some values are missing – whether the algorithm is based on imputation or not – gives rise to a value function and corresponding Shapley values. These Shapley values explain the contributions of the individual features for that specific prediction algorithm.

One particularly interesting prediction algorithm is the theoretically optimal predictor given by $$v(S) = P(Y = 1 \mid X_S = x_S)$$ for any subset $S \subseteq \{1, \ldots, p\}$. With no missing values the prediction algorithm is simply $$f(x) = v(\{1, \ldots,p\}) = P(Y = 1 \mid X = x)$$ and $v(\emptyset) = P(Y = 1)$. The Shapley values arising from this specific value function are called SHAP (SHapley Additive exPlanation)2.

Note that we cannot obtain $v(S)$ from $f$ alone, e.g. via imputation, but we can compute $v(S)$ for all $S$ from the joint distribution of $(Y, X_1, \ldots, X_p)$. The $j$-th Shapley value for this value function can be interpreted as the intrinsic predictive contribution of $j$-th feature. It tells us how much that feature contributes with if we exploit the features in any coalition in an optimal way.

We can also see that for any $S \subseteq \{1, \ldots, p\}$ and with $R = S^c$, \begin{align} v(S) = \sum_{x_R} f(x_S, x_R) P(x_R \mid X_S = x_S), \tag{2} \end{align} thus we can compute the value function and corresponding Shapley values from knowledge of $f$ and the distribution of $X = (X_1, \ldots, X_p)$.

Shapley values for COVID-22 example

We can compute the Shapley values in the COVID-22 example using as value function either $f$ combined with imputation, denoted $v^{\mathrm{Imp}}$, or using the population predictors denoted $v^{\mathrm{Pop}}$. All missing $x$-values are imputed as $0$, which results in $v^{\mathrm{Imp}}(\emptyset) = f(0, 0, 0) = 0.0003$, while $v^{\mathrm{Pop}}(\emptyset) = 0.0022$ is the marginal probability of being infected. All the Shapley values can be found in the table below. Irrespectively of the choice of value function, it is always true that $$v(\emptyset) + \varphi_1 + \varphi_2 + \varphi_3 = f(x).$$

$$ \begin{array}{ccc|c|c|rrr|c} x_1 & x_2 & x_3 & v & v(\emptyset) & \varphi_1 & \varphi_2 & \varphi_3 & f(x) \\
\hline 0 & 0 & 0 & \mathrm{Imp}& 0.0003 & 0 & 0 & 0 & 0.0003\\
0 & 0 & 0 & \mathrm{Pop}& 0.0022 & -0.0001 & -0.0003 & -0.0015 & 0.0003\\
\hline 1 & 0 & 0 & \mathrm{Imp}& 0.0003 & 0.0003 & 0 & 0 & 0.0006 \\
1 & 0 & 0 & \mathrm{Pop}& 0.0022 & 0.0018 & -0.0008 & -0.0026 & 0.0006 \\
\hline 0 & 1 & 0 & \mathrm{Imp}& 0.0003 & 0 & 0.0219 & 0 & 0.0222\\
0 & 1 & 0 & \mathrm{Pop}& 0.0022 & 0.0001 & 0.0597 & -0.0399 & 0.0222\\
\hline 1 & 1 & 0 & \mathrm{Imp}& 0.0003 & -0.0012 & 0.0204 & 0 & 0.0195\\
1 & 1 & 0 & \mathrm{Pop}& 0.0022 & 0.0006 & 0.0580 & -0.0413 & 0.0195\\
\hline 0 & 0 & 1 & \mathrm{Imp}& 0.0003 & 0 & 0 & 0.0563 & 0.0566\\
0 & 0 & 1 & \mathrm{Pop}& 0.0022 & -0.0004 & -0.0093 & 0.0640 & 0.0566\\
\hline 1 & 0 & 1 & \mathrm{Imp}& 0.0003 & -0.0003 & 0 & 0.0557 & 0.0557 \\
1 & 0 & 1 & \mathrm{Pop}& 0.0022 & 0.0037 & -0.0154 & 0.0651 & 0.0557 \\
\hline 0 & 1 & 1 & \mathrm{Imp}& 0.0003 & 0 & 0.3908 & 0.4252 & 0.8163 \\
0 & 1 & 1 & \mathrm{Pop}& 0.0022 & 0.0139 & 0.4127 & 0.3875 & 0.8163 \\
\hline 1 & 1 & 1 & \mathrm{Imp}& 0.0003 & -0.0565 & 0.3346 & 0.3699 & 0.6484\\
1 & 1 & 1 & \mathrm{Pop}& 0.0022 & -0.0380 & 0.3506 & 0.3337 & 0.6484\\
\end{array} $$

The table shows that the value function is decisive for the actual numerical values, but there are also many similarities between the rows corresponding to imputation and the rows corresponding to the population predictors.

Looking at the last four rows, say, corresponding to $x_2 = x_3 = 1$, we see that the Shapley values in both cases attribute the probability $f(x)$ about equally to “having been in contact with an infected” and “having symptoms”. Both methods also agree that $x_1 = 1$ contributes negatively if $x_2 = x_3 = 1$. This is perfectly sensible as the model includes a component capturing that “having been in a crowd” can give other infections that result in symptoms.

A notable difference is seen for $x_2 = 1$ and $x_3 = 0$, where imputation attributes the prediction to $x_2$ with a small positive Shapley value, while the population predictors assign a larger positive attribution value to $x_2$ but a compensating negative attribution to $x_3$.


The computations for the example were done using the formulas directly. We can do that because the example is so simple. In practice, when the number of features is larger, we face several difficulties:

  • The number of terms in the sum (2) required to compute $v$ grows exponentially with $|R|$.
  • The number of subsets $S$ in the sum (1) grows exponentially with $p$.

To address the first of these computational challenges, the original paper2 on SHAP proposes the approximation $$P(x_R \mid X_S = x_S) \simeq P(x_R) \tag{3}$$ where $P(x_R)$ denotes the marginal distribution of $X_R$. Unless $X_1, \ldots, X_p$ are independent, this is, indeed, an approximation.

Once we can compute (approximations of) $v$ efficiently, the sum (1) can be approximated via Monte Carlo methods and/or weighted linear regression. More recently, TreeSHAP3 was introduced as a computationally efficient approximation of SHAP for a tree-based predictor $f$ – still approximating $P(x_R \mid X_S = x_S)$ by the marginal $P(x_R)$.

Without approximating the sum (1), recent results4 indicate that the computation of SHAP is fundamentally hard in most cases. And that this can be the case even if $X_1, \ldots, X_p$ are independent. However, we can compute $v$ without relying on (3), e.g. by using efficient inference algorithms for Bayesian network models of the joint distribution of $X_1, \ldots, X_p$. This combined with approximations of (1) could yield computationally feasible algorithms that do not assume (3).

Interpretations of the explanations

Even if (3) was initially regarded as an approximation, subsequent arguments5 were made that Shapley values based on the value function $$v(S) = \sum_{x_R} f(x_S, x_R) P(x_R) \tag{4}$$ are conceptually more meaningful than SHAP based on the conditional distributions. Whether this is the case is really a matter of what we want the Shapley values to express.

One way to interpret (4) is as an alternative to imputation. Instead of imputing missing values we average over possible values using a marginal distribution. Imputation is really an extreme version of (4) where $P(x_R)$ is replaced by a point measure. When the value function (4) is used to define Shapley values, the number $$v(S \cup \{j\}) - v(S)$$ for $S \subseteq \{1, \ldots, p\} \backslash \{j\}$ quantifies explicitly how sensitive $f$ is to the particular value $x_j$ when joined to the values $x_S$. This is not so for SHAP, based on (2), where the quantification also includes how much additional information $x_j$ includes about the other features indexed by $S^c$.

This brings us to the core of the matter of this post. The numerical attributions given by the Shapley values “explain” how a specific feature value affects the prediction. But how should we interpret this explanation?

  • If the value function (4) is used, the $j$-th Shapley value expresses how the predictor $f$ depends on the specific value $x_j$ of the $j$-th feature.
  • If the value function (2) is used, the $j$-th Shapley value expresses the predictive information in the specific value $x_j$ of the $j$-th feature.

Shapley values based in (4) have recently been called true to the model, while Shapley values based on (2) have been called true to the data6, and either set of Shapley values can be useful depending upon what we try to explain.

It is, however, slightly unfortunate that Shapley values based on (4) have become known as “interventional”. This is unfortunate since the terminology suggests a causal interpretation, but these Shapley values are not at all related to real-world interventions on the features. They are “interventional” in a technical sense of intervening on the inputs to the prediction algorithm, but they are not “interventional” in the conventional sense of intervening on the features in reality.

In fact, if we have a causal model of the features, we could replace $P(x_R \mid X_S = x_S)$ by $P(x_R \mid \mathrm{do}(X_S = x_S))$ in (2) to obtain yet another value function and corresponding causal Shapley values7, which now express the interventional effect of the features.

In conclusion, the explanations obtained from Shapley values should be interpreted with care, and what the correct interpretation is depends on the choice of the value function. That is, which game it is that the features play!

  1. In game theory the value function usually satisfies $v(\emptyset) = 0$, which can be achieved by redefining the value function as $v(S) - v(\emptyset)$. This doesn’t change the Shapley values and makes the notation more cumbersome. ↩︎

  2. Lundberg, S. M. and Lee, S. “A unified approach to interpreting model predictions.” Advances in Neural Information Processing Systems (2017) ↩︎

  3. Lundberg, S. M. et al. “From local explanations to global understanding with explainable AI for trees.” Nature Machine Intelligence, volume 2, 56–67 (2020) ↩︎

  4. Van den Broeck, G., Lykov, A., Schleich M., Suciu, D. “On the Tractability of SHAP Explanations.” The Thirty-Fifth AAAI Conference on Artificial Intelligence (2021). ↩︎

  5. Janzing, D., Minorics L., Blobaum, P. “Feature relevance quantification in explainable AI: A causal problem.” Proceedings of the 23rdInternational Conference on Artificial Intelligence and Statistics (AISTATS) 2020 ↩︎

  6. Chen, H., Janizek J. D., Lundberg, S., Lee, S. “True to the Model or True to the Data?” https://arxiv.org/abs/2006.16234 ↩︎

  7. Heskes, T., Sijben, E., Bucur, I. G., Claassen, T. “Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models.” 34th Conference on Neural Information Processing Systems (NeurIPS 2020). ↩︎

Niels Richard Hansen
Niels Richard Hansen
Professor of Computational Statistics

Works on causal structure learning to achieve explainable, robust and transportable AI.