
# Cramér-Rao

Article Id: WHEBN0023573375
Reproduction Date:

 Title: Cramér-Rao Author: World Heritage Encyclopedia Language: English Subject: Gaussian function Collection: Publisher: World Heritage Encyclopedia Publication Date:


In estimation theory and statistics, the Cramér–Rao bound (CRB) or Cramér–Rao lower bound (CRLB), named in honor of Harald Cramér and Calyampudi Radhakrishna Rao who were among the first to derive it,[1][2][3] expresses a lower bound on the variance of estimators of a deterministic parameter. The bound is also known as the Cramér–Rao inequality or the information inequality.

In its simplest form, the bound states that the variance of any unbiased estimator is at least as high as the inverse of the Fisher information. An unbiased estimator which achieves this lower bound is said to be (fully) efficient. Such an estimator achieves the lowest possible mean squared error among all unbiased estimators, and is therefore the minimum variance unbiased (MVU) estimator. However, in some cases, no unbiased estimator exists which achieves the bound. This may occur even when an MVU estimator exists.

The Cramér–Rao bound can also be used to bound the variance of biased estimators of given bias. In some cases, a biased estimator can have both a variance and a mean squared error that are below the unbiased Cramér–Rao lower bound; see estimator bias.

## Statement

The Cramér–Rao bound is stated in this section for several increasingly general cases, beginning with the case in which the parameter is a scalar and its estimator is unbiased. All versions of the bound require certain regularity conditions, which hold for most well-behaved distributions. These conditions are listed later in this section.

### Scalar unbiased case

Suppose $\theta$ is an unknown deterministic parameter which is to be estimated from measurements $x$, distributed according to some probability density function $f(x;\theta)$. The variance of any unbiased estimator $\hat{\theta}$ of $\theta$ is then bounded by the reciprocal of the Fisher information $I(\theta)$:

$$\mathrm{var}(\hat{\theta}) \geq \frac{1}{I(\theta)}$$

where the Fisher information $I(\theta)$ is defined by

$$I(\theta) = \mathrm{E}\left[\left(\frac{\partial \ell(x;\theta)}{\partial\theta}\right)^2\right] = -\mathrm{E}\left[\frac{\partial^2 \ell(x;\theta)}{\partial\theta^2}\right]$$

and $\ell(x;\theta)=\log f(x;\theta)$ is the natural logarithm of the likelihood function and $\mathrm{E}$ denotes the expected value (over $x$).
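The two expressions for the Fisher information can be checked exactly on a small model. The sketch below assumes a Bernoulli($p$) observation, which is not taken from the article; because the model is discrete, the probability mass function plays the role of $f$ and sums replace integrals. It also assumes that the information in $n$ independent observations is $n$ times the information in one, so the bound for the sample mean is $1/(nI)$. The function and variable names are illustrative.

```python
from fractions import Fraction


def bernoulli_fisher_information(p):
    """Return (E[score^2], -E[second derivative]) for one Bernoulli(p) draw."""
    prob = {1: p, 0: 1 - p}                        # P(X = x)
    score = {1: 1 / p, 0: -1 / (1 - p)}            # d/dp log f(x; p)
    second = {1: -1 / p**2, 0: -1 / (1 - p) ** 2}  # d^2/dp^2 log f(x; p)
    info_from_score = sum(prob[x] * score[x] ** 2 for x in prob)
    info_from_curvature = -sum(prob[x] * second[x] for x in prob)
    return info_from_score, info_from_curvature


p = Fraction(3, 10)   # illustrative parameter value
n = 50                # illustrative sample size
info_score, info_curvature = bernoulli_fisher_information(p)

crb = 1 / (n * info_score)           # bound for n independent observations
var_sample_mean = p * (1 - p) / n    # exact variance of the sample mean

print("I(p) from squared score:", info_score)
print("I(p) from curvature:    ", info_curvature)
print("Cramér–Rao bound:       ", crb)
print("var(sample mean):       ", var_sample_mean)
```

Under these assumptions both forms give $1/(p(1-p))$, and the sample mean attains the bound, so it is an efficient estimator of $p$ in this model.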

The efficiency of an unbiased estimator $\hat{\theta}$ measures how close this estimator's variance comes to this lower bound; estimator efficiency is defined as

$$e(\hat{\theta}) = \frac{I(\theta)^{-1}}{\mathrm{var}(\hat{\theta})},$$

so that the Cramér–Rao bound gives $e(\hat{\theta}) \le 1$.

### Multivariate case

For a parameter vector $\boldsymbol{\theta}$ with Fisher information matrix $I(\boldsymbol{\theta})$, and an estimator $\boldsymbol{T}(X)$ whose expectation is denoted by $\boldsymbol{\psi}(\boldsymbol{\theta})$, the bound on the covariance matrix of $\boldsymbol{T}(X)$ is

$$\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq \frac{\partial \boldsymbol{\psi}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} [I(\boldsymbol{\theta})]^{-1} \left(\frac{\partial \boldsymbol{\psi}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)^T$$

where

• The matrix inequality $A \ge B$ is understood to mean that the matrix $A-B$ is positive semidefinite, and
• $\partial \boldsymbol{\psi}(\boldsymbol{\theta})/\partial \boldsymbol{\theta}$ is the Jacobian matrix whose $ij$th element is given by $\partial \psi_i(\boldsymbol{\theta})/\partial \theta_j$.

If $\boldsymbol{T}(X)$ is an unbiased estimator of $\boldsymbol{\theta}$ (i.e., $\boldsymbol{\psi}(\boldsymbol{\theta}) = \boldsymbol{\theta}$), then the Cramér–Rao bound reduces to

$$\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq I(\boldsymbol{\theta})^{-1}.$$

If it is inconvenient to compute the inverse of the Fisher information matrix, then one can simply take the reciprocal of the corresponding diagonal element to find a (possibly loose) lower bound (for the Bayesian case, see eqn. (11) of Bobrovsky, Mayer-Wolf, Zakai, "Some classes of global Cramer-Rao bounds", Ann. Stats., 15(4):1421-38, 1987):

$$\mathrm{var}_{\boldsymbol{\theta}}\left(T_m(X)\right) = \left[\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right)\right]_{mm} \geq \left[I(\boldsymbol{\theta})^{-1}\right]_{mm} \geq \left(\left[I(\boldsymbol{\theta})\right]_{mm}\right)^{-1}.$$
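The gap between the exact diagonal bound and the looser reciprocal-of-diagonal bound can be seen on a small numerical case. The sketch below uses a hypothetical $2 \times 2$ Fisher information matrix (the values are made up for illustration) and a hand-written inverse so that it needs no external libraries.

```python
def inverse_2x2(matrix):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical Fisher information matrix: symmetric and positive definite.
fisher = [[4.0, 1.5], [1.5, 2.0]]
fisher_inv = inverse_2x2(fisher)

for m in range(2):
    exact_bound = fisher_inv[m][m]      # [I^{-1}]_{mm}
    loose_bound = 1.0 / fisher[m][m]    # ([I]_{mm})^{-1}
    print(f"parameter {m}: exact {exact_bound:.4f} >= loose {loose_bound:.4f}")
```

The two bounds coincide only when the off-diagonal entries vanish, that is, when the parameters carry no information about each other.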

### Regularity conditions

The bound relies on two weak regularity conditions on the probability density function, $f(x; \theta)$, and the estimator $T(X)$:

• The Fisher information is always defined; equivalently, for all $x$ such that $f(x; \theta) > 0$,
$$\frac{\partial}{\partial\theta} \log f(x;\theta)$$
exists, and is finite.
• The operations of integration with respect to $x$ and differentiation with respect to $\theta$ can be interchanged in the expectation of $T$; that is,
$$\frac{\partial}{\partial\theta}\left[\int T(x) f(x;\theta) \,dx\right] = \int T(x) \left[\frac{\partial}{\partial\theta} f(x;\theta)\right] \,dx$$
whenever the right-hand side is finite.
This condition can often be confirmed by using the fact that integration and differentiation can be swapped when either of the following cases holds:
1. The function $f(x;\theta)$ has bounded support in $x$, and the bounds do not depend on $\theta$;
2. The function $f(x;\theta)$ has infinite support, is continuously differentiable, and the integral converges uniformly for all $\theta$.
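To see why the support condition matters, consider a model that violates it. The sketch below assumes $X_i \sim \mathrm{Uniform}(0,\theta)$, an example not taken from the article, whose support depends on $\theta$. Applying the definition of the Fisher information formally to the joint density $\theta^{-n}$ gives $n^2/\theta^2$, and hence a formal "bound" of $\theta^2/n^2$. The unbiased estimator $\frac{n+1}{n}\max_i X_i$ has variance $\theta^2/(n(n+2))$, which is smaller. The simulation settings and function names are illustrative.

```python
import random


def scaled_max_estimator(sample):
    """(n + 1)/n * max(sample): unbiased for theta under Uniform(0, theta)."""
    n = len(sample)
    return (n + 1) / n * max(sample)


def simulate_variance(theta, n, trials, seed=0):
    """Empirical variance of the estimator over repeated samples."""
    rng = random.Random(seed)
    estimates = [
        scaled_max_estimator([rng.uniform(0.0, theta) for _ in range(n)])
        for _ in range(trials)
    ]
    mean = sum(estimates) / trials
    return sum((e - mean) ** 2 for e in estimates) / (trials - 1)


theta, n = 2.0, 5                      # illustrative values
empirical = simulate_variance(theta, n, trials=20000)
exact = theta**2 / (n * (n + 2))       # closed-form variance of the estimator
formal_bound = theta**2 / n**2         # 1 / (n^2 / theta^2), not a valid bound here

print(f"empirical variance: {empirical:.4f}")
print(f"exact variance:     {exact:.4f}")
print(f"formal 'bound':     {formal_bound:.4f}")
```

The estimator falls below the formal value because the interchange of differentiation and integration fails for this model, so the Cramér–Rao inequality does not apply to it.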

### Simplified form of the Fisher information

Suppose, in addition, that the operations of integration and differentiation can be swapped for the second derivative of $f(x;\theta)$ as well, i.e.,

$$\frac{\partial^2}{\partial\theta^2}\left[\int T(x) f(x;\theta) \,dx\right] = \int T(x) \left[\frac{\partial^2}{\partial\theta^2} f(x;\theta)\right] \,dx.$$


In this case, it can be shown that the Fisher information equals

$$I(\theta) = -\mathrm{E}\left[\frac{\partial^2}{\partial\theta^2} \log f(X;\theta)\right].$$


The Cramér–Rao bound can then be written as

$$\mathrm{var}\left(\widehat{\theta}\right) \geq \frac{1}{I(\theta)} = \frac{1}{-\mathrm{E}\left[\frac{\partial^2}{\partial\theta^2} \log f(X;\theta)\right]}.$$

In some cases, this formula gives a more convenient technique for evaluating the bound.
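As a brief worked illustration of why the second-derivative form can be more convenient, take an exponential density $f(x;\lambda) = \lambda e^{-\lambda x}$ on $x \ge 0$. This model is an assumption chosen here, not one from the article; its support does not depend on $\lambda$. The second derivative of the log-density is constant in $x$, so no expectation needs to be evaluated:

```latex
\ell(x;\lambda) = \log\lambda - \lambda x, \qquad
\frac{\partial \ell}{\partial\lambda} = \frac{1}{\lambda} - x, \qquad
\frac{\partial^2 \ell}{\partial\lambda^2} = -\frac{1}{\lambda^2}
\quad\Longrightarrow\quad
I(\lambda) = -\mathrm{E}\left[-\frac{1}{\lambda^2}\right] = \frac{1}{\lambda^2}.
```

The squared-score form gives the same value, $\mathrm{E}\left[(1/\lambda - X)^2\right] = \mathrm{var}(X) = 1/\lambda^2$, but requires knowing the variance of $X$.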

## Single-parameter proof

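The following is a proof of the general scalar case of the Cramér–Rao bound; namely, that if the expectation of $T$ is denoted by $\psi(\theta)$, then, for all $\theta$,

$$\mathrm{var}(T(X)) \geq \frac{[\psi'(\theta)]^2}{I(\theta)}.$$

Let $V = \frac{\partial}{\partial\theta} \log f(X;\theta)$ denote the score. Under the regularity conditions above, $\mathrm{E}(V) = 0$ and $\mathrm{cov}(V, T) = \psi'(\theta)$. The Cauchy–Schwarz inequality gives $\mathrm{var}(T)\,\mathrm{var}(V) \geq [\mathrm{cov}(V,T)]^2$, and since $\mathrm{var}(V) = I(\theta)$, the bound follows.

## Examples

### Multivariate normal distribution

For a multivariate normal distribution with mean $\boldsymbol{\mu}(\boldsymbol{\theta})$ and covariance matrix $\boldsymbol{C}(\boldsymbol{\theta})$, the Fisher information matrix has elements

$$I_{m,k} = \frac{\partial \boldsymbol{\mu}^T}{\partial \theta_m} \boldsymbol{C}^{-1} \frac{\partial \boldsymbol{\mu}}{\partial \theta_k} + \frac{1}{2} \mathrm{tr}\left(\boldsymbol{C}^{-1} \frac{\partial \boldsymbol{C}}{\partial \theta_m} \boldsymbol{C}^{-1} \frac{\partial \boldsymbol{C}}{\partial \theta_k}\right)$$

where "tr" is the trace.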

For example, let $w[n]$ be a sample of $N$ independent observations with unknown mean $\theta$ and known variance $\sigma^2$:

$$w[n] \sim \mathcal{N}_N\left(\theta \boldsymbol{1}, \sigma^2 \boldsymbol{I}\right).$$

Then, with $\boldsymbol{\mu}(\theta) = \theta \boldsymbol{1}$ and $\boldsymbol{C} = \sigma^2 \boldsymbol{I}$, the Fisher information is a scalar given by

$$I(\theta) = \left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right)^T \boldsymbol{C}^{-1} \left(\frac{\partial\boldsymbol{\mu}(\theta)}{\partial\theta}\right) = \sum^N_{i=1}\frac{1}{\sigma^2} = \frac{N}{\sigma^2},$$

and so the Cramér–Rao bound is

$$\mathrm{var}\left(\hat \theta\right) \geq \frac{\sigma^2}{N}.$$
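The calculation above can be sketched in a few lines for the special case used here: a scalar parameter and a diagonal covariance that does not depend on $\theta$, so that only the mean term of the Fisher information contributes. The helper name and the numerical values are illustrative assumptions.

```python
def gaussian_mean_fisher_information(mean_gradient, variances):
    """Fisher information for a scalar parameter entering only the mean of
    independent Gaussian observations: sum_i (d mu_i / d theta)^2 / sigma_i^2."""
    if len(mean_gradient) != len(variances):
        raise ValueError("gradient and variances must have the same length")
    return sum(g * g / v for g, v in zip(mean_gradient, variances))


N, sigma2 = 25, 4.0   # illustrative sample size and known variance

# mu(theta) = theta * 1, so every component of the gradient equals 1.
info = gaussian_mean_fisher_information([1.0] * N, [sigma2] * N)
crb = 1.0 / info

print(f"Fisher information: {info}  (N / sigma^2 = {N / sigma2})")
print(f"Cramér–Rao bound:   {crb}  (sigma^2 / N = {sigma2 / N})")
```

The sample mean has variance exactly $\sigma^2/N$, so it attains this bound.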

### Normal variance with known mean

Suppose $X$ is a normally distributed random variable with known mean $\mu$ and unknown variance $\sigma^2$, and let $X_1, \ldots, X_n$ be $n$ independent observations of $X$. Consider the following statistic:

$$T=\frac{\sum_{i=1}^n\left(X_i-\mu\right)^2}{n}.$$

Then $T$ is unbiased for $\sigma^2$, as $E(T)=\sigma^2$. What is the variance of $T$?

$$\mathrm{var}(T) = \frac{\mathrm{var}\left((X-\mu)^2\right)}{n}=\frac{1}{n} \left[ E\left\{(X-\mu)^4\right\}-\left(E\left\{(X-\mu)^2\right\}\right)^2 \right]$$

(the second equality follows directly from the definition of variance). The first term is the fourth moment about the mean and has value $3(\sigma^2)^2$; the second is the square of the variance, or $(\sigma^2)^2$. Thus

$$\mathrm{var}(T)=\frac{2(\sigma^2)^2}{n}.$$

Now, what is the Fisher information in the sample? Recall that the score $V$ is defined as

$$V=\frac{\partial}{\partial\sigma^2}\log L(\sigma^2,X)$$

where $L$ is the likelihood function. Thus in this case,

$$V=\frac{\partial}{\partial\sigma^2}\log\left[\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(X-\mu)^2/{2\sigma^2}}\right] =\frac{(X-\mu)^2}{2(\sigma^2)^2}-\frac{1}{2\sigma^2}$$

where the second equality is from elementary calculus. Thus, the information in a single observation is just minus the expectation of the derivative of $V$, or

$$I =-E\left(\frac{\partial V}{\partial\sigma^2}\right) =-E\left(-\frac{(X-\mu)^2}{(\sigma^2)^3}+\frac{1}{2(\sigma^2)^2}\right) =\frac{\sigma^2}{(\sigma^2)^3}-\frac{1}{2(\sigma^2)^2} =\frac{1}{2(\sigma^2)^2}.$$

Thus the information in a sample of $n$ independent observations is just $n$ times this, or $\frac{n}{2(\sigma^2)^2}.$

The Cramér–Rao bound states that

$$\mathrm{var}(T)\geq\frac{1}{nI}=\frac{2(\sigma^2)^2}{n},$$

where $nI$ is the Fisher information in the whole sample.

In this case, the inequality is saturated (equality is achieved), showing that the estimator is efficient.
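A small simulation can illustrate both claims about $T$: that it is unbiased for $\sigma^2$ and that its variance matches the bound $2(\sigma^2)^2/n$. This is a minimal sketch; the parameter values, trial count, and function names are assumptions chosen for illustration, and the agreement is only approximate because of sampling noise.

```python
import random


def variance_estimator_known_mean(sample, mu):
    """T = sum((x - mu)^2) / n, using the known mean mu."""
    return sum((x - mu) ** 2 for x in sample) / len(sample)


def simulate(mu, sigma2, n, trials, seed=0):
    """Return the empirical mean and variance of T over repeated samples."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    values = [
        variance_estimator_known_mean(
            [rng.gauss(mu, sigma) for _ in range(n)], mu
        )
        for _ in range(trials)
    ]
    mean = sum(values) / trials
    var = sum((v - mean) ** 2 for v in values) / (trials - 1)
    return mean, var


mu, sigma2, n = 0.0, 1.5, 10           # illustrative values
mean_t, var_t = simulate(mu, sigma2, n, trials=40000)
crb = 2 * sigma2**2 / n                # Cramér–Rao bound for sigma^2

print(f"mean of T:     {mean_t:.4f}  (sigma^2 = {sigma2})")
print(f"variance of T: {var_t:.4f}  (bound = {crb:.4f})")
```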

However, we can achieve a lower mean squared error using a biased estimator. The estimator

$$T=\frac{\sum_{i=1}^n\left(X_i-\mu\right)^2}{n+2}$$

has a smaller variance, namely

$$\mathrm{var}(T)=\frac{2n(\sigma^2)^2}{(n+2)^2}.$$

Its bias is

$$\left(\frac{n}{n+2}-1\right)\sigma^2=-\frac{2\sigma^2}{n+2}$$

so its mean squared error is

$$\mathrm{MSE}(T)=\left(\frac{2n}{(n+2)^2}+\frac{4}{(n+2)^2}\right)(\sigma^2)^2=\frac{2(\sigma^2)^2}{n+2}$$

which is less than the Cramér–Rao bound $2(\sigma^2)^2/n$ found above.

When the mean is not known, the minimum mean squared error estimate of the variance of a sample from a Gaussian distribution is achieved by dividing the sum of squared deviations from the sample mean by $n+1$, rather than $n-1$ or $n+2$.
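The choice of divisor in both cases can be checked with exact arithmetic. The sketch below relies on a standard fact that is assumed here rather than derived in the article: the sum of squares $S$ satisfies $S/\sigma^2 \sim \chi^2_k$, with $k = n$ degrees of freedom when deviations are taken from the known mean and $k = n-1$ when they are taken from the sample mean, so that $E(S) = k\sigma^2$ and $\mathrm{var}(S) = 2k(\sigma^2)^2$. The function names are illustrative.

```python
from fractions import Fraction


def mse_factor(dof, divisor):
    """MSE of S / divisor in units of (sigma^2)^2, where S / sigma^2 is
    chi-squared with `dof` degrees of freedom."""
    variance = Fraction(2 * dof, divisor**2)   # var(S / divisor)
    bias = Fraction(dof, divisor) - 1          # E(S / divisor) / sigma^2 - 1
    return variance + bias**2


def best_divisor(dof, candidates):
    """Return the candidate divisor with the smallest MSE."""
    return min(candidates, key=lambda c: mse_factor(dof, c))


n = 10                       # illustrative sample size
candidates = range(1, 4 * n)

# Known mean: S has n degrees of freedom.
print("divide by n:    ", mse_factor(n, n))       # equals 2/n
print("divide by n + 2:", mse_factor(n, n + 2))   # equals 2/(n+2)
print("best divisor (known mean):  ", best_divisor(n, candidates))

# Unknown mean: S has n - 1 degrees of freedom.
print("best divisor (unknown mean):", best_divisor(n - 1, candidates))
```

For general $k$ the factor $\left(2k + (k-c)^2\right)/c^2$ is minimized at $c = k+2$, which gives $n+2$ for the known-mean case and $n+1$ for the unknown-mean case.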

## References and notes

• . Chapter 3.
• . Section 3.1.3.