Weak and strong solutions of random equations

The shadowland beyond the DAG

Many scientific models can be formulated and analyzed as solutions of random equations, that is, equations containing a random component. Structural equation models, widely used as causal models, form one class of such models. The class of structural equations whose corresponding graph is a Directed Acyclic Graph (DAG) is particularly well studied, and a substantial amount of theory, as well as intuition, related to solutions of random equations is based on our understanding of DAG-inducing structural equations.

However, when we move into the shady realms of other types of random equations, the concepts defined in the DAG-world do not generalize easily, and any intuition based on DAGs is greatly challenged in the shadowland beyond the DAG.

This post is the first in a series on how to define causal models in terms of structural equations that do not induce a DAG. That is, structural equations where the variables may depend cyclically on each other, and which are thus able to represent feedback mechanisms. In this post we will introduce weak and strong solutions of random equations, in particular structural equations. We will collect and discuss some basic results regarding existence and uniqueness of solutions in a theorem we call Kurtz’ alternatives.

Solution maps

If we consider any particular equation, it may have no solution, exactly one, or several. We will be interested in the set of solutions and, more specifically, in how solutions depend on input data. We will focus on the case where there is at least one solution for any given input data, and we will consider mappings $$\mathrm{input \ data} \mapsto \mathrm{SOLVE}(\mathrm{input \ data})$$ where $\mathrm{SOLVE}(\mathrm{input \ data})$ denotes one solution for the given input data. If there is always exactly one solution for any given input data, we have a unique solution map $\mathrm{SOLVE}$, but in general there may be several solution maps.

The SIS model

To give a simple example of a solution map, we consider the quadratic equation \begin{align} I = \varepsilon(1 - I)I \tag{1} \end{align} where $\varepsilon > 0$ is considered input data and $I \in [0,1]$ is the variable. We will explain below how the solution of this equation can be understood as the steady-state proportion of infected individuals in a population according to the Susceptible-Infectious-Susceptible (SIS) model. The common cold is an example of an infectious disease that can be modelled by the SIS model. The infection does not induce long-term immunity in an individual, and after some time being infected, an individual becomes susceptible to infection again.

We observe that the equation (1) always has $I = 0$ as a solution, that is, there is always the steady-state solution where no individuals are infected. If $\varepsilon \geq 1$ then $I = 1 - \varepsilon^{-1}$ is also a solution in $[0,1]$. We can thus write down at least two solution maps as functions of $\varepsilon$: $$\mathrm{SOLVE}_0(\varepsilon) = 0$$ and $$\mathrm{SOLVE}_1(\varepsilon) = \max\left(1 - \varepsilon^{-1}, 0 \right).$$
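The two solution maps can be written down directly in code. Below is a minimal Python sketch; the function names `solve_0`, `solve_1` and `residual` are ours, not part of the model:

```python
# The two solution maps for the SIS steady-state equation
# I = eps * (1 - I) * I with I restricted to [0, 1].

def solve_0(eps):
    # The disease-free steady state, which exists for every eps > 0.
    return 0.0

def solve_1(eps):
    # The endemic steady state 1 - 1/eps, truncated to 0 when eps < 1.
    return max(1.0 - 1.0 / eps, 0.0)

def residual(i, eps):
    # Zero exactly when i solves I = eps * (1 - I) * I.
    return i - eps * (1.0 - i) * i

# Both maps return valid solutions for a range of eps values.
for eps in [0.5, 1.0, 2.0, 4.0]:
    assert residual(solve_0(eps), eps) == 0.0
    assert abs(residual(solve_1(eps), eps)) < 1e-12
```

Note that the two maps coincide for $\varepsilon \leq 1$, where the only solution in $[0,1]$ is $I = 0$.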

To explain how we arrive at the equation (1) as the steady-state equation of the SIS model let $I_t \in [0,1]$ denote the proportion of infected in a population at time $t$. The SIS model posits that the evolution of $I_t$ over time satisfies the integral equation

\begin{align*} I_t & = I_0 + \int_0^t \left( \beta(1 - I_s)I_s - \gamma I_s \right) \, \mathrm{d} s \end{align*}

where $\beta, \gamma > 0$ are parameters. The $\beta$ parameter is the infection rate, while the $\gamma$ parameter is the recovery rate.

Letting $\varepsilon = \beta / \gamma$, a steady state of the SIS integral equation is a value of $I$ for which the integrand vanishes, that is, $\beta(1 - I)I - \gamma I = 0$. Dividing by $\gamma$ shows that the steady states are exactly the solutions of $I = \varepsilon(1 - I)I$. We may also observe that if $I_0 = 0$ then $I_t = 0 = \mathrm{SOLVE}_0(\varepsilon)$ for all $t \geq 0$ irrespective of the value of $\varepsilon$, while if $I_0 > 0$ then $I_t$ converges to $\mathrm{SOLVE}_1(\varepsilon)$ as $t \to \infty$. Thus even though the initial condition $I_0$ does not appear in the steady-state equation, it determines which of the two steady-state solutions is reached by the dynamical system defined by the SIS model.
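The role of the initial condition can be seen numerically by discretising the SIS integral equation with a forward Euler scheme. This is only an illustrative sketch; the parameter values $\beta = 0.4$ and $\gamma = 0.2$ (so $\varepsilon = 2$) are an arbitrary choice of ours:

```python
def simulate_sis(i0, beta, gamma, t_max=200.0, dt=0.01):
    # Forward Euler discretisation of
    # I_t = I_0 + \int_0^t (beta (1 - I_s) I_s - gamma I_s) ds.
    i = i0
    for _ in range(int(t_max / dt)):
        i += dt * (beta * (1.0 - i) * i - gamma * i)
    return i

beta, gamma = 0.4, 0.2  # eps = beta / gamma = 2

# With I_0 = 0 the system stays at SOLVE_0(eps) = 0 forever ...
assert simulate_sis(0.0, beta, gamma) == 0.0

# ... while any I_0 > 0 drives it towards SOLVE_1(eps) = 1 - 1/eps = 0.5.
assert abs(simulate_sis(0.01, beta, gamma) - 0.5) < 1e-6
```

The steady state $1 - \varepsilon^{-1}$ is an exact fixed point of the Euler update, so the discretisation converges to it without bias.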

Random equations

Equations, such as the SIS integral equation or its steady-state equation, are used to capture relations among observables in a range of sciences, including infectious disease modelling. In mathematics, equations of all forms and shapes are studied abstractly. Equations express relations among variables via mathematical expressions that involve operators, and the variables represent integers, real or complex numbers, vectors, functions or other elements belonging to sets on which the operators are defined.

The SIS steady-state equation is a simple equation in one variable depending on the one-dimensional input data1 $\varepsilon = \beta / \gamma > 0$. In this post we will regard input data as random. Though $\varepsilon$ in this example is a ratio of two population parameters, we can perceive $\varepsilon$ as a random variable whose distribution reflects variation between populations. In that way the steady-state equation for the SIS model becomes a random equation determined by the random input data $\varepsilon$. We will then be interested in the pushforward distribution of the solution to (1) and how it depends on the distribution of $\varepsilon$.

For a fixed solution map $\mathrm{SOLVE}$ the pushforward distribution of interest is simply the distribution of the random variable $$I = \mathrm{SOLVE}(\varepsilon).$$ If there are multiple solution maps we can have several different pushforward distributions, but in this case an interesting phenomenon arises: the equations may have weak solutions that cannot be written exclusively in terms of a $\mathrm{SOLVE}$ map.
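A pushforward distribution of this kind can be approximated by Monte Carlo: sample $\varepsilon$, apply a fixed solution map, and inspect the resulting samples. The uniform distribution on $(0.5, 2.5)$ for $\varepsilon$ below is our own illustrative choice, not part of the SIS model:

```python
import random

def solve_1(eps):
    # The endemic solution map for the SIS steady-state equation.
    return max(1.0 - 1.0 / eps, 0.0)

random.seed(0)
# Sample eps ~ Uniform(0.5, 2.5) and push it forward through SOLVE_1.
samples = [solve_1(random.uniform(0.5, 2.5)) for _ in range(100_000)]

# Under this eps-distribution, P(I = 0) = P(eps <= 1) = 0.25, so the
# pushforward distribution has an atom at zero plus a continuous part.
p_zero = sum(s == 0.0 for s in samples) / len(samples)
```

Even for the single solution map $\mathrm{SOLVE}_1$, the pushforward distribution can thus be a mixture of a point mass and a continuous distribution.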

Structural equation models: weak and strong solutions

We now proceed to the general setup of structural equation models. Let
$$f_i : \mathbb{R}^p \times \mathbb{R} \to \mathbb{R}$$ for $i = 1, \ldots, p$ be $p$ maps, and let $f : \mathbb{R}^p \times \mathbb{R}^p \to \mathbb{R}^p$ be defined2 as $$f(x, e) = (f_1(x, e_1), \ldots, f_p(x, e_p))^T.$$ We will then be interested in solutions $X$ of the structural equation \begin{align} X = f(X, \varepsilon) \tag{2} \end{align} where $\varepsilon \in \mathbb{R}^p$ is a random variable with some distribution $\nu$.

Definition (Weak solution). Let $\nu$ be a probability distribution on $\mathbb{R}^p$. If $X, \varepsilon \in \mathbb{R}^p$ are $p$-dimensional random variables we say that $(X, \varepsilon)$ is a weak solution to the structural equation (2) if:
  • $\varepsilon$ has distribution $\nu$
  • $X = f(X, \varepsilon)$ almost surely.

In terms of the map $f$ we define a directed graph with nodes $\{1, \ldots, p\}$ and an edge from $j$ to $i$ if the $i$-th coordinate, $f_i$, of $f$ depends on $x_j$. If this graph is a DAG, it is always possible to solve the structural equation (2) recursively, so that $X$ is expressed as a (unique) function of $\varepsilon$. Regarding $\varepsilon$ as input data, we can thus, whenever the graph induced by $f$ is a DAG, write $$X = \mathrm{SOLVE}(\varepsilon)$$ where $\mathrm{SOLVE}$ is the unique solution map. The pair $(\mathrm{SOLVE}(\varepsilon), \varepsilon)$ is a weak solution to (2) according to the definition above, but it has the special structure that $X$ is a deterministic function of $\varepsilon$. In general, if $X = \mathrm{SOLVE}(\varepsilon)$ almost surely for some map $\mathrm{SOLVE}$, we call the weak solution a strong solution.
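The recursive solution strategy in the DAG case can be sketched as follows. The three-variable system below is a hypothetical example of ours with edges $1 \to 2$, $1 \to 3$ and $2 \to 3$:

```python
# A DAG-inducing structural equation with p = 3 coordinates.
def f1(x, e1): return e1              # X1 has no parents
def f2(x, e2): return x[0] + e2       # X2 depends on X1
def f3(x, e3): return x[0] * x[1] + e3  # X3 depends on X1 and X2

def solve(eps):
    # Solve X = f(X, eps) coordinate-by-coordinate in a topological
    # order (1, 2, 3): each coordinate only uses coordinates that have
    # already been computed, so SOLVE is a deterministic map of eps.
    x = [0.0, 0.0, 0.0]
    x[0] = f1(x, eps[0])
    x[1] = f2(x, eps[1])
    x[2] = f3(x, eps[2])
    return tuple(x)

x = solve((1.0, 2.0, 0.5))
# Fixed-point check: x = f(x, eps) holds exactly.
assert x == (f1(x, 1.0), f2(x, 2.0), f3(x, 0.5))
```

Because the recursion visits each node once in topological order, the resulting `solve` is the unique solution map for this system.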

Multiple solution maps

If the graph induced by $f$ is not a DAG we might in some cases still have a unique solution map $\mathrm{SOLVE}$, but in general we cannot expect a solution to exist, nor that it is unique if it does. In fact, the theory of DAG-inducing structural equation models does not generalize easily to the non-DAG case. Many of these difficulties are treated in the foundational paper3 by Bongers, Forré, Peters and Mooij.

The SIS steady-state equation is a simple example with $p = 1$ and $$f(x, \varepsilon) = \varepsilon (1 - x) x$$ with the additional restriction that $x \in [0,1]$. Depending on $\varepsilon$ there may be one or two solutions to the equation. From the discussion above we know that there are at least two different solution maps, corresponding to the two weak solutions $(0, \varepsilon)$ and $(\max\left(1 - \varepsilon^{-1}, 0 \right), \varepsilon)$. Since both of these solutions can be expressed in terms of a solution map applied to $\varepsilon$, they are both strong solutions as well.

To give a slightly more complicated example we will consider the following set of $p = 4$ equations

\begin{align*} X_1 & = \mathrm{sign}(X_2) \varepsilon_1 \\
X_2 & = \mathrm{sign}(X_1) \varepsilon_2 \\
X_3 & = \mathrm{sign}(X_4) \varepsilon_3 \\
X_4 & = \mathrm{sign}(X_3) \varepsilon_4 \end{align*}

where $\nu$ is assumed to have support contained in $(0,\infty)^4$, so that all coordinates of $\varepsilon$ are strictly positive and $\mathrm{sign}(\pm\varepsilon_i) = \pm 1$. Then $(\varepsilon, \varepsilon)$ is clearly a weak solution, and so is $(-\varepsilon, \varepsilon)$. The first of these corresponds to the solution map $\mathrm{SOLVE}_{++}(\varepsilon) = \varepsilon$ and the second to $\mathrm{SOLVE}_{- -}(\varepsilon) = -\varepsilon$, so these two weak solutions are strong solutions as well.
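Both candidate solutions can be checked numerically against the fixed-point condition $x = f(x, e)$. The exponential distribution for $\varepsilon$ below is an arbitrary choice with strictly positive coordinates:

```python
import random

def sign(x):
    # sign(x) in {-1, 0, 1}.
    return (x > 0) - (x < 0)

def f(x, e):
    # The four sign equations as one map f : R^4 x R^4 -> R^4.
    return (sign(x[1]) * e[0], sign(x[0]) * e[1],
            sign(x[3]) * e[2], sign(x[2]) * e[3])

random.seed(1)
for _ in range(1000):
    # eps has strictly positive coordinates (almost surely).
    eps = tuple(random.expovariate(1.0) for _ in range(4))
    x_plus = eps                        # SOLVE_++ candidate
    x_minus = tuple(-e for e in eps)    # SOLVE_-- candidate
    assert f(x_plus, eps) == x_plus
    assert f(x_minus, eps) == x_minus
```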

For both of the examples above we ask: are all weak solutions in fact strong solutions? The slightly surprising answer is “No”, and to understand what such non-strong solutions look like, we will dive into Kurtz’ alternatives in the next section.

Kurtz’ alternatives

We follow the treatment by Thomas Kurtz4. For a structural equation given by $f$ and with $\nu$ a probability measure on $\mathbb{R}^p$, we let $\mathcal{S}_{f,\nu}$ denote the set of probability measures, $\mu$, on $\mathbb{R}^p \times \mathbb{R}^p$ satisfying

  • $\mu(\mathbb{R}^p \times B) = \nu(B)$ for all measurable $B \subseteq \mathbb{R}^p$ (the second marginal of $\mu$ is $\nu$).
  • $x = f(x, e)$ for $\mu$-almost every $(x, e) \in \mathbb{R}^p \times \mathbb{R}^p$.

An element $\mu \in \mathcal{S}_{f,\nu}$ is called a joint solution measure. We see that the pair $(X, \varepsilon)$ of random variables is a weak solution to the structural equation (2) if and only if their joint distribution $\mu$ belongs to $\mathcal{S}_{f,\nu}$. It is thus clear that $\mathcal{S}_{f,\nu} \neq \emptyset$ if and only if there exists a weak solution $(X, \varepsilon)$ to the structural equation.

We say that the structural equation has a weakly unique solution if all weak solutions $(X, \varepsilon)$ have the same distribution of $X$. This is equivalent to saying that all $\mu \in \mathcal{S}_{f,\nu}$ have the same first marginal.

Recall that a weak solution $(X, \varepsilon)$ is a strong solution if there exists a map $$\mathrm{SOLVE} : \mathbb{R}^p \to \mathbb{R}^p$$ such that $X = \mathrm{SOLVE}(\varepsilon)$ almost surely. Combining Lemma 2.6, Corollary 2.8 and Proposition 2.10 by Kurtz4 we get the following result:

Theorem (Kurtz' alternatives). Suppose that $\mathcal{S}_{f,\nu} \neq \emptyset$.
  • Either the structural equation has a weakly unique solution $(X, \varepsilon)$, in which case there exists a map $\mathrm{SOLVE} : \mathbb{R}^p \to \mathbb{R}^p$ such that $$X = \mathrm{SOLVE}(\varepsilon) \quad \text{almost surely}$$ and all weak solutions are strong solutions.
  • Or the structural equation has (at least) two weak solutions $(X, \varepsilon)$ and $(X', \varepsilon')$ with $X$ and $X'$ having different distributions (weak uniqueness does not hold), in which case there exists a weak solution that is not a strong solution.

Proposition 2.10 by Kurtz4 also gives that $\mathcal{S}_{f,\nu}$ is a singleton set if and only if the structural equation has a weakly unique solution. Another way of phrasing Kurtz’ alternatives is therefore as follows:

  • Either $\mathcal{S}_{f,\nu} = \{\mu\}$ contains a single element, in which case $$\mu(A \times B) = \nu(\mathrm{SOLVE}^{-1}(A) \cap B)$$ for a map $\mathrm{SOLVE} : \mathbb{R}^p \to \mathbb{R}^p$.
  • Or $\mathcal{S}_{f,\nu}$ contains at least two elements, in which case there exists a $\mu \in \mathcal{S}_{f,\nu}$ and a weak solution $(X, \varepsilon)$, with joint distribution $\mu$, that is not a strong solution.

The proof of Kurtz’ alternatives is not difficult. It is mostly an exercise in measure theory and conditional distributions. We refer to Kurtz’ paper4 for details.

In terms of our second example above, we clearly do not have weak uniqueness. One of the (strong) solutions is concentrated on $(0,\infty)^4$ and the other on $(-\infty, 0)^4$. An explicit example of a third (weak) solution is $$X = Z \cdot \mathrm{SOLVE}_{++}(\varepsilon) + (1 - Z) \cdot \mathrm{SOLVE}_{- -}(\varepsilon)$$ where $\mathbb{P}(Z = 1) = \mathbb{P}(Z = 0) = 0.5$ and $Z$ is independent of $\varepsilon$. This solution cannot be expressed as a function of $\varepsilon$ alone since it depends on the additional random variable $Z$. It is thus an example of a weak solution that is not a strong solution. Its distribution is the mixture $$\mu = 0.5 \cdot \mu_{++} + 0.5 \cdot \mu_{- -}$$ where $\mu_{++}$ and $\mu_{- -}$ are the distributions of $\mathrm{SOLVE}_{++}(\varepsilon)$ and $\mathrm{SOLVE}_{- -}(\varepsilon)$, respectively.
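The mixture solution can be simulated directly. The sketch below draws the extra coin $Z$ independently of $\varepsilon$, verifies the fixed-point condition, and estimates $\mathbb{P}(X_1 > 0)$, which is $0.5$ under the mixture but $1$ under $\mu_{++}$ and $0$ under $\mu_{- -}$ (the exponential distribution for $\varepsilon$ is again an illustrative choice of ours):

```python
import random

def sign(x):
    return (x > 0) - (x < 0)

def f(x, e):
    # The four sign equations as one map f : R^4 x R^4 -> R^4.
    return (sign(x[1]) * e[0], sign(x[0]) * e[1],
            sign(x[3]) * e[2], sign(x[2]) * e[3])

random.seed(2)
n, pos = 100_000, 0
for _ in range(n):
    eps = tuple(random.expovariate(1.0) for _ in range(4))
    z = random.random() < 0.5            # fair coin, independent of eps
    x = eps if z else tuple(-e for e in eps)
    assert f(x, eps) == x                # each draw is a valid solution
    pos += x[0] > 0

# Under the mixture, X_1 > 0 with probability 1/2, so the distribution
# of X is 0.5 * mu_++ + 0.5 * mu_--, and X is not a function of eps alone.
frac_pos = pos / n
```

The estimate `frac_pos` lands near $0.5$, which is incompatible with $X$ being a deterministic function of $\varepsilon$, since any such function would send the event $\{X_1 > 0\}$ to probability $0$ or $1$ here.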

Does it matter?

A notable technical consequence of Kurtz’ alternatives is that we can establish existence of a (unique) strong solution by showing existence of a weakly unique solution. This has been useful in the context of more general stochastic equations, such as stochastic differential equations, which are also treated in Kurtz’ papers4 5. We will return to this topic in a future post where we will discuss the Yamada-Watanabe-Engelbert theorem in greater detail.

For structural causal models, one consequence of Kurtz’ alternatives is that if we don’t have weak uniqueness, then there exists a weak solution $(X, \varepsilon)$ such that the distribution of $X$ is not the pushforward of the distribution $\nu$ of $\varepsilon$ under any solution map. Such a solution relies on additional randomness not encoded explicitly by the structural equation. For the SIS steady-state equation, this implicit randomness could be the initial condition $I_0$ of the SIS integral equation. Thus it may in specific cases be possible to interpret the source of the additional randomness, which can then be used to pinpoint which of multiple solutions is relevant in a given context.

We will in a future post unfold the implications of non-uniqueness of solutions to structural equations and the existence of weak solutions that are not strong solutions. In particular, we will discuss the consequences for the definitions of interventional and counterfactual distributions, which may not be uniquely defined in terms of the variables and expressions that define the equation itself.


  1. Input data is often called parameters, but this is confusing in the context we consider, where input data can be initial conditions and random variables, like $(I_0, \varepsilon)$ for the SIS model, and where the model might have additional parameters.

  2. It is unimportant for the general theory that we suppose $e \in \mathbb{R}^p$ and that $f_i$ only depends on $e_i$. This does, however, align with standard practice, in particular when the “errors” entering into $f$ are assumed independent.

  3. S. Bongers, P. Forré, J. Peters, J. M. Mooij. Foundations of structural causal models with cycles and latent variables. Ann. Statist. 49(5): 2885–2915 (2021). DOI: 10.1214/21-AOS2064

  4. T. Kurtz. The Yamada–Watanabe–Engelbert theorem for general stochastic equations and inequalities. Electronic Journal of Probability, 12, 951–965 (2007).

  5. T. Kurtz. Weak and strong solutions of general stochastic models. Electronic Communications in Probability, 19, 1–16 (2014).

Niels Richard Hansen
Professor of Computational Statistics

Works on causal structure learning to achieve explainable, robust and transportable AI.