
# Law of total variance



In probability theory, the law of total variance[1] or variance decomposition formula, also known as Eve's law, states that if X and Y are random variables on the same probability space, and the variance of Y is finite, then

\operatorname{Var}[Y]=\operatorname{E}(\operatorname{Var}[Y\mid X])+\operatorname{Var}(\operatorname{E}[Y\mid X]).\,

Some writers on probability call this the "conditional variance formula". In language perhaps better known to statisticians than to probabilists, the two terms are the "unexplained" and the "explained" components of the variance (cf. fraction of variance unexplained, explained variation).
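The decomposition can be verified exactly on a small discrete distribution. The joint pmf below is made up for illustration; the script computes both sides of Eve's law directly from the definition, with the first term as the "unexplained" component and the second as the "explained" one.

```python
# Exact check of Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) on a small
# discrete joint distribution (pmf values are made up for illustration).
joint = {(0, 1): 0.2, (0, 2): 0.3, (1, 4): 0.1, (1, 6): 0.4}  # p(x, y)

# Marginal pmf of X
px = {}
for (x, _), p in joint.items():
    px[x] = px.get(x, 0.0) + p

def cond_mean(x0):
    """E(Y | X = x0)."""
    return sum(p * y for (x, y), p in joint.items() if x == x0) / px[x0]

def cond_var(x0):
    """Var(Y | X = x0)."""
    m = cond_mean(x0)
    return sum(p * (y - m) ** 2 for (x, y), p in joint.items() if x == x0) / px[x0]

EY = sum(p * y for (_, y), p in joint.items())
VarY = sum(p * (y - EY) ** 2 for (_, y), p in joint.items())

# E[E(Y|X)] = E[Y] by the law of total expectation, so we may center at EY.
unexplained = sum(px[x] * cond_var(x) for x in px)             # E[Var(Y|X)]
explained = sum(px[x] * (cond_mean(x) - EY) ** 2 for x in px)  # Var(E[Y|X])

assert abs(VarY - (unexplained + explained)) < 1e-12
```

For this pmf the total variance 4.44 splits into an unexplained part 0.44 and an explained part 4.0.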

There is a general variance decomposition formula for c ≥ 2 components (see below).[2] For example, with two conditioning random variables:

\operatorname{Var}[Y]=\operatorname{E}(\operatorname{Var}[Y\mid X_1 , X_2])+ \operatorname{E}(\operatorname{Var}[\operatorname{E}[Y\mid X_1,X_2]\mid X_1]) + \operatorname{Var}(\operatorname{E}[Y\mid X_1]),\,

which follows from the law of total conditional variance:[2]

\operatorname{Var}[Y\mid X_1]=\operatorname{E}(\operatorname{Var}[Y\mid X_1,X_2]\mid X_1)+\operatorname{Var}(\operatorname{E}[Y\mid X_1,X_2]\mid X_1).\,
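The three-term decomposition above can likewise be checked numerically. The sketch below uses a made-up joint pmf over (X1, X2, Y) and computes each term by direct summation.

```python
# Check of the two-variable decomposition
#   Var(Y) = E[Var(Y|X1,X2)] + E[Var(E[Y|X1,X2] | X1)] + Var(E[Y|X1])
# on a small discrete joint pmf (values made up for illustration).
joint = {
    (0, 0, 1): 0.1, (0, 0, 2): 0.15,
    (0, 1, 3): 0.2, (0, 1, 5): 0.05,
    (1, 0, 2): 0.1, (1, 0, 4): 0.1,
    (1, 1, 1): 0.15, (1, 1, 6): 0.15,
}  # p(x1, x2, y)

# Marginals p(x1) and p(x1, x2)
p1, p12 = {}, {}
for (x1, x2, _), p in joint.items():
    p1[x1] = p1.get(x1, 0.0) + p
    p12[(x1, x2)] = p12.get((x1, x2), 0.0) + p

def g(x1, x2):
    """E(Y | X1 = x1, X2 = x2)."""
    return sum(p * y for (a, b, y), p in joint.items() if (a, b) == (x1, x2)) / p12[(x1, x2)]

def v(x1, x2):
    """Var(Y | X1 = x1, X2 = x2)."""
    m = g(x1, x2)
    return sum(p * (y - m) ** 2 for (a, b, y), p in joint.items() if (a, b) == (x1, x2)) / p12[(x1, x2)]

def h(x1):
    """E(Y | X1 = x1) = E[g(X1, X2) | X1 = x1]."""
    return sum(p12[(a, b)] * g(a, b) for (a, b) in p12 if a == x1) / p1[x1]

EY = sum(p * y for (_, _, y), p in joint.items())
VarY = sum(p * (y - EY) ** 2 for (_, _, y), p in joint.items())

term1 = sum(p12[k] * v(*k) for k in p12)                            # E[Var(Y|X1,X2)]
term2 = sum(p12[(a, b)] * (g(a, b) - h(a)) ** 2 for (a, b) in p12)  # E[Var(E[Y|X1,X2] | X1)]
term3 = sum(p1[a] * (h(a) - EY) ** 2 for a in p1)                   # Var(E[Y|X1])

assert abs(VarY - (term1 + term2 + term3)) < 1e-12
```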

Note that the conditional expected value E( Y | X ) is a random variable in its own right, whose value depends on the value of X. By contrast, the conditional expected value of Y given the event X = x is a function of x (here the case-sensitive notation of probability theory matters: uppercase X denotes the random variable, lowercase x a particular value). If we write E( Y | X = x ) = g(x), then the random variable E( Y | X ) is just g(X). Similar comments apply to the conditional variance.

## Contents

• Proof
• General variance decomposition applicable to dynamic systems
• The square of the correlation and explained (or informational) variation
• Higher moments
• References

## Proof

The law of total variance can be proved using the law of total expectation.[3] First,

\operatorname{Var}[Y] = \operatorname{E}[Y^2] - [\operatorname{E}[Y]]^2

from the definition of variance. Then we apply the law of total expectation to each term by conditioning on the random variable X:

= \operatorname{E}_{X}\!\left[\operatorname{E}[Y^2\mid X]\right] - [\operatorname{E}_{X}[\operatorname{E}[Y\mid X]]]^2

Now we rewrite the conditional second moment of Y in terms of its variance and first moment:

= \operatorname{E}_{X}\!\left[\operatorname{Var}[Y\mid X] + [\operatorname{E}[Y\mid X]]^2\right] - [\operatorname{E}_{X}[\operatorname{E}[Y\mid X]]]^2

Since the expectation of a sum is the sum of expectations, the terms can now be regrouped:

= \operatorname{E}_{X}[\operatorname{Var}[Y\mid X]] + \left(\operatorname{E}_{X}[\operatorname{E}[Y\mid X]]^2 - [\operatorname{E}_{X}[\operatorname{E}[Y\mid X]]]^2\right)

Finally, we recognize the terms in parentheses as the variance of the conditional expectation E[Y|X]:

= \operatorname{E}_{X}[\operatorname{Var}[Y\mid X]] + \operatorname{Var}_{X}[\operatorname{E}[Y\mid X]]

## General variance decomposition applicable to dynamic systems

The following formula shows how to apply the general, measure-theoretic variance decomposition formula[2] to stochastic dynamic systems. Let Y(t) be the value of a system variable at time t. Suppose we have the internal histories (natural filtrations) H_{1t},H_{2t},\ldots,H_{c-1,t}, each one corresponding to the history (trajectory) of a different collection of system variables. The collections need not be disjoint. The variance of Y(t) can be decomposed, for all times t, into c ≥ 2 components as follows:

\operatorname{Var}[Y(t)] = \operatorname{E}(\operatorname{Var}[Y(t)\mid H_{1t},H_{2t},\ldots,H_{c-1,t}])+ \sum_{j=2}^{c-1}\operatorname{E}(\operatorname{Var}[\operatorname{E}[Y(t)\mid H_{1t},H_{2t},\ldots,H_{jt}]\mid H_{1t},H_{2t},\ldots,H_{j-1,t}])+ \operatorname{Var}(\operatorname{E}[Y(t)\mid H_{1t}]).\,

The decomposition is not unique. It depends on the order of the conditioning in the sequential decomposition.

## The square of the correlation and explained (or informational) variation

In cases where X and Y are such that the conditional expected value is linear; i.e., in cases where

\operatorname{E}(Y \mid X)=aX+b,\,

it follows from the bilinearity of Cov(·, ·) that

a={\operatorname{Cov}(Y,X) \over \operatorname{Var}(X)}

and

b=\operatorname{E}(Y)-{\operatorname{Cov}(Y,X)\over \operatorname{Var}(X)} \operatorname{E}(X)

and the explained component of the variance divided by the total variance is just the square of the correlation between Y and X; i.e., in such cases,

{\operatorname{Var}(\operatorname{E}(Y\mid X)) \over \operatorname{Var}(Y)} = \operatorname{Corr}(X,Y)^2.\,

One example of this situation is when (X, Y) have a bivariate normal (Gaussian) distribution.
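This can be seen with a few lines of arithmetic. In the sketch below the bivariate-normal parameters are made-up illustration values; the slope a comes from Cov(Y, X) = ρσXσY, and the explained fraction collapses to ρ² exactly.

```python
# Sketch: for a linear conditional mean E(Y|X) = a*X + b (e.g. the
# bivariate normal case), the explained fraction of the variance of Y
# equals Corr(X, Y)^2.  Parameter values are made up for illustration.
mu_x, mu_y = 1.0, -2.0
sigma_x, sigma_y, rho = 2.0, 0.5, 0.6

# For the bivariate normal, Cov(Y, X) = rho * sigma_x * sigma_y, so
a = rho * sigma_x * sigma_y / sigma_x**2   # a = Cov(Y, X) / Var(X)
b = mu_y - a * mu_x                        # b = E(Y) - a * E(X)

var_cond_mean = a**2 * sigma_x**2          # Var(a*X + b) = a^2 Var(X)
explained_fraction = var_cond_mean / sigma_y**2

assert abs(explained_fraction - rho**2) < 1e-12
```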

More generally, when the conditional expectation E( Y | X ) is a non-linear function of X, the explained (or informational) variation is[2]

\iota_{Y\mid X} = {\operatorname{Var}(\operatorname{E}(Y\mid X)) \over \operatorname{Var}(Y)} = \operatorname{Corr}(\operatorname{E}(Y\mid X),Y)^2,\,

which can be estimated as the R squared from a non-linear regression of Y on X, using data drawn from the joint distribution of (X,Y). When E( Y | X ) has a Gaussian distribution (and is an invertible function of X), or Y itself has a (marginal) Gaussian distribution, this explained component of variation sets a lower bound on the mutual information:[2]

\operatorname{I}(Y;X) \ge \ln ([1 - \iota_{Y\mid X}]^{-1/2}).\,
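In the bivariate normal case the bound is tight: the explained variation is ι = ρ² and the exact mutual information is I(Y;X) = −½ ln(1 − ρ²) nats, which is exactly the right-hand side. A minimal numeric check, with an assumed value of ρ:

```python
import math

# For a bivariate normal with correlation rho, iota = rho^2 and the
# exact mutual information is I(Y;X) = -0.5 * ln(1 - rho^2) nats,
# so the lower bound above holds with equality.
rho = 0.8                               # assumed value for illustration
iota = rho**2                           # Var(E(Y|X)) / Var(Y)
bound = math.log((1 - iota) ** -0.5)    # ln([1 - iota]^(-1/2))
mi = -0.5 * math.log(1 - rho**2)        # exact I(Y;X) for the Gaussian

assert mi >= bound and abs(mi - bound) < 1e-12
```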

## Higher moments

A similar law for the third central moment μ3 says

\mu_3(Y)=\operatorname{E}(\mu_3(Y\mid X))+\mu_3(\operatorname{E}(Y\mid X)) +3\,\operatorname{cov}(\operatorname{E}(Y\mid X),\operatorname{var}(Y\mid X)).\,
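This identity, including the covariance cross-term, can also be verified term by term on a small discrete distribution (the same kind of made-up pmf used earlier):

```python
# Numeric check of
#   mu3(Y) = E[mu3(Y|X)] + mu3(E(Y|X)) + 3*Cov(E(Y|X), Var(Y|X))
# on a small discrete joint pmf (values made up for illustration).
joint = {(0, 1): 0.2, (0, 2): 0.3, (1, 4): 0.1, (1, 6): 0.4}  # p(x, y)

px = {}
for (x, _), p in joint.items():
    px[x] = px.get(x, 0.0) + p

def cond_moment(x0, k, center):
    """E[(Y - center)^k | X = x0]."""
    return sum(p * (y - center) ** k for (x, y), p in joint.items() if x == x0) / px[x0]

cm = {x: cond_moment(x, 1, 0.0) for x in px}      # E(Y|X=x)
cv = {x: cond_moment(x, 2, cm[x]) for x in px}    # Var(Y|X=x)
cmu3 = {x: cond_moment(x, 3, cm[x]) for x in px}  # mu3(Y|X=x)

EY = sum(p * y for (_, y), p in joint.items())
mu3_Y = sum(p * (y - EY) ** 3 for (_, y), p in joint.items())

E_cmu3 = sum(px[x] * cmu3[x] for x in px)                 # E[mu3(Y|X)]
mu3_condmean = sum(px[x] * (cm[x] - EY) ** 3 for x in px) # mu3(E(Y|X))
mean_cv = sum(px[x] * cv[x] for x in px)
cov_term = sum(px[x] * (cm[x] - EY) * (cv[x] - mean_cv) for x in px)

rhs = E_cmu3 + mu3_condmean + 3 * cov_term
assert abs(mu3_Y - rhs) < 1e-12
```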

For higher cumulants, a simple and elegant generalization exists. See law of total cumulance.

## References

1. ^ Neil A. Weiss, A Course in Probability, Addison–Wesley, 2005, pages 385–386.
2. ^ a b c d e Bowsher, C.G. and P.S. Swain, Proc Natl Acad Sci USA, 2012: 109, E1320–29.
3. ^ Neil A. Weiss, A Course in Probability, Addison–Wesley, 2005, pages 380–383.
4. ^ Prof. Blitzstein, Joe. "Stat 110 Final Review (Eve's Law)". http://stat110.net/. Harvard University, Department of Statistics. Retrieved 9 July 2014.
• Billingsley, Patrick (1995). Probability and Measure. New York, NY: John Wiley & Sons, Inc. (Problem 34.10(b))