 #jsDisabledContent { display:none; } My Account | Register | Help Flag as Inappropriate This article will be permanently flagged as inappropriate and made unaccessible to everyone. Are you certain this article is inappropriate?          Excessive Violence          Sexual Content          Political / Social Email this Article Email Address:

# Sanov's theorem

Article Id: WHEBN0031587409
Reproduction Date:

 Title: Sanov's theorem Author: World Heritage Encyclopedia Language: English Subject: Collection: Publisher: World Heritage Encyclopedia Publication Date:

### Sanov's theorem

In information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution.

Let A be a set of probability distributions over an alphabet X, and let q be an arbitrary distribution over X (where q may or may not be in A). Suppose we draw n i.i.d. samples from q, represented by the vector x^n = x_1, x_2, \ldots, x_n. Further, let us ask that the empirical distribution, \hat{p}_{x^n}, of the samples falls within the set A—formally, we write \{x^n : \hat{p}_{x^n} \in A\}. Then,

q^n(x^n) \le (n+1)^{|X|} 2^{-nD_{\mathrm{KL}}(p^*||q)},

where

In words, the probability of drawing an atypical distribution is proportional to the KL distance from the true distribution to the atypical one; in the case that we consider a set of possible atypical distributions, there is a dominant atypical distribution, given by the information projection.

Furthermore, if A is a closed set,

\lim_{n\to\infty}\frac{1}{n}\log q^n(x^n) = -D_{\mathrm{KL}}(p^*||q).