
Typical set

Article Id: WHEBN0000248717

Title: Typical set  
Author: World Heritage Encyclopedia
Language: English
Subject: Asymptotic equipartition property, Sanov's theorem, Typical subspace, Quantum capacity, Entanglement distillation
Collection: Information Theory, Probability Theory
Publisher: World Heritage Encyclopedia

Typical set

In information theory, the typical set is a set of sequences whose probability is close to 2^{-nH}, that is, two raised to the negative power of n times the entropy of their source distribution, where n is the sequence length. That this set has total probability close to one is a consequence of the asymptotic equipartition property (AEP), which is a kind of law of large numbers. The notion of typicality is concerned only with the probability of a sequence and not with the actual sequence itself.

This is of great use in compression theory, as it provides a theoretical means for compressing data: any sequence X^n can be represented using about nH(X) bits on average, which justifies the use of entropy as a measure of the information from a source.

The AEP can also be proven for a large class of stationary ergodic processes, allowing the typical set to be defined in more general cases.


  • (Weakly) typical sequences (weak typicality, entropy typicality)
  • Strongly typical sequences (strong typicality, letter typicality)
  • Jointly typical sequences
  • Applications of typicality
    • Typical set encoding
    • Typical set decoding
    • Universal null-hypothesis testing
    • Universal channel code
  • See also
  • References

(Weakly) typical sequences (weak typicality, entropy typicality)

If a sequence x_1, ..., x_n is drawn from an i.i.d. distribution X defined over a finite alphabet \mathcal{X}, then the typical set A_\varepsilon^{(n)} \subseteq \mathcal{X}^n is defined as the set of sequences that satisfy:

2^{-n[H(X)+\varepsilon]} \leqslant p(x_1, x_2, \dots , x_n) \leqslant 2^{-n[H(X)-\varepsilon]}


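where

H(X) = - \sum_{y \in \mathcal{X}}p(y)\log_2 p(y)

is the information entropy of X. The probability above need only be within a factor of 2^{n\varepsilon} of 2^{-nH(X)}. Equivalently, taking logarithms, a sequence is typical when -\frac{1}{n}\log_2 p(x_1, x_2, \dots , x_n) lies within ε of H(X).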
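The definition can be checked directly for a given sequence. The following is a minimal sketch, assuming the source is described by a dictionary mapping symbols to probabilities; the function names and the three-symbol example distribution are illustrative choices, not part of the original article.

```python
import math

def entropy_bits(pmf):
    """Shannon entropy, in bits, of a pmf given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def empirical_rate(seq, pmf):
    """-(1/n) log2 p(x_1, ..., x_n) for an i.i.d. source with the given pmf."""
    if any(pmf.get(s, 0.0) <= 0.0 for s in seq):
        return math.inf  # the sequence has probability zero
    return -sum(math.log2(pmf[s]) for s in seq) / len(seq)

def is_weakly_typical(seq, pmf, eps):
    """True if the per-symbol log-probability is within eps of H(X)."""
    return abs(empirical_rate(seq, pmf) - entropy_bits(pmf)) <= eps

# Hypothetical example source with H(X) = 1.5 bits.
pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
print(is_weakly_typical("aabc", pmf, eps=0.1))  # rate 1.5 bits per symbol
print(is_weakly_typical("aaaa", pmf, eps=0.1))  # rate 1.0 bits per symbol
```

The first sequence has a per-symbol log-probability equal to the entropy and is typical; the second is the single most probable sequence of length 4, yet it falls outside the ε = 0.1 band.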

For any \varepsilon > 0 and sufficiently large n, the typical set has the following properties:

  1. The probability that a sequence drawn from X lies in A_\varepsilon^{(n)} is at least 1 − ε, i.e. \Pr[x^{(n)} \in A_\varepsilon^{(n)}] \geqslant 1 - \varepsilon
  2. \left| {A_\varepsilon}^{(n)} \right| \leqslant 2^{n(H(X)+\varepsilon)}
  3. \left| {A_\varepsilon}^{(n)} \right| \geqslant (1-\varepsilon)2^{n(H(X)-\varepsilon)}
  4. Most sequences are not typical. If the distribution over \mathcal{X} is not uniform, then the fraction of sequences that are typical is
\frac{|A_\varepsilon^{(n)}|}{|\mathcal{X}^{n}|} \approx \frac{2^{nH(X)}}{2^{n\log_2|\mathcal{X}|}} = 2^{-n(\log_2|\mathcal{X}|-H(X))} \rightarrow 0
as n becomes very large, since H(X) < \log_2|\mathcal{X}|.
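These properties can be checked numerically for a simple source. The sketch below assumes an i.i.d. Bernoulli source with p(0) = 0.1, block length n = 1000 and ε = 0.1 (all illustrative choices). Every sequence with k zeros has the same probability, so the typical set is counted with binomial coefficients rather than by enumerating 2^n sequences.

```python
import math

def bernoulli_typical_set_stats(p0, n, eps):
    """Return (H, size, probability) of the weakly typical set of an
    i.i.d. Bernoulli source with P(0) = p0 and block length n."""
    p1 = 1.0 - p0
    h = -(p0 * math.log2(p0) + p1 * math.log2(p1))
    size, prob = 0, 0.0
    for k in range(n + 1):  # k = number of zeros in the sequence
        log_p = k * math.log2(p0) + (n - k) * math.log2(p1)
        if abs(-log_p / n - h) <= eps:
            count = math.comb(n, k)  # sequences with exactly k zeros
            size += count
            # Work in the log domain to avoid floating-point underflow.
            prob += 2.0 ** (math.log2(count) + log_p)
    return h, size, prob

n, eps = 1000, 0.1
h, size, prob = bernoulli_typical_set_stats(0.1, n, eps)
log_size = math.log2(size)

print("P(typical set)        :", prob)
print("log2 |A|              :", log_size)
print("upper bound n(H+eps)  :", n * (h + eps))
print("lower bound           :", math.log2(1 - eps) + n * (h - eps))
print("log2 of typical share :", log_size - n)  # log2(|A| / 2^n)
```

The last line prints the base-2 logarithm of the fraction of all 2^n binary sequences that are typical, which is strongly negative: the typical set carries almost all of the probability while containing a vanishing share of the sequences.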

For a general stochastic process {X(t)} with AEP, the (weakly) typical set can be defined similarly with p(x_1, x_2, ..., x_n) replaced by p(x_0^\tau) (i.e. the probability of the sample limited to the time interval [0, τ]), n being the degree of freedom of the process in the time interval and H(X) being the entropy rate. If the process is continuous-valued, differential entropy is used instead.

Counter-intuitively, the most likely sequence is often not a member of the typical set. For example, suppose that X is an i.i.d. Bernoulli random variable with p(0)=0.1 and p(1)=0.9. In n independent trials, since p(1)>p(0), the most likely sequence of outcomes is the sequence of all 1's, (1,1,...,1). Here the entropy of X is H(X)=0.469 bits, while

-\frac{1}{n}\log_2 p(x^{(n)}=(1,1,\ldots,1)) = -\frac{1}{n}\log_2 (0.9^n) = 0.152

So this sequence is not in the typical set (for any ε smaller than the gap 0.469 − 0.152 = 0.317) because its average logarithmic probability stays at 0.152 and cannot come arbitrarily close to the entropy of X no matter how large n is. For Bernoulli random variables, the typical set consists of sequences in which the proportions of 0s and 1s are close to p(0) and p(1). For this example, if n=10 and ε is sufficiently small, the typical set consists of all sequences that have a single 0. If instead p(0)=p(1)=0.5, then every possible binary sequence belongs to the typical set.
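The Bernoulli example can be reproduced by brute force. This is a small sketch that enumerates all 2^10 binary sequences and keeps the weakly typical ones; the choice ε = 0.1 and the helper name are assumptions made for illustration.

```python
import itertools
import math

P = {0: 0.1, 1: 0.9}   # the Bernoulli source from the example
N, EPS = 10, 0.1       # EPS is an illustrative choice

H = -sum(p * math.log2(p) for p in P.values())

def rate(seq):
    """-(1/n) log2 of the probability of seq under the i.i.d. source P."""
    return -sum(math.log2(P[s]) for s in seq) / len(seq)

typical = [seq for seq in itertools.product((0, 1), repeat=N)
           if abs(rate(seq) - H) <= EPS]

all_ones = (1,) * N
print("H(X)               :", round(H, 3))
print("rate of all-ones   :", round(rate(all_ones), 3))
print("typical set size   :", len(typical))
print("zeros per sequence :", sorted({seq.count(0) for seq in typical}))
print("P(typical set)     :", sum(2 ** (-N * rate(s)) for s in typical))
```

With these values the typical set contains exactly the 10 sequences with a single 0, and the all-ones sequence is excluded. Its total probability is only about 0.39, a reminder that n = 10 is far too short for the asymptotic guarantee of probability at least 1 − ε to apply.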

Strongly typical sequences (strong typicality, letter typicality)

If a sequence x_1, ..., x_n is drawn from some specified distribution defined over a finite alphabet \mathcal{X}, then the strongly typical set A_{\varepsilon,\text{strong}}^{(n)} \subseteq \mathcal{X}^n is defined as the set of sequences which satisfy, for every symbol x_i \in \mathcal{X},

\left|\frac{N(x_i)}{n}-p(x_i)\right| < \frac{\varepsilon}{|\mathcal{X}|}

where N(x_i) is the number of occurrences of the symbol x_i in the sequence and |\mathcal{X}| is the size of the alphabet.
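A strong typicality test only needs the symbol counts. The sketch below assumes a finite alphabet given as a dictionary of probabilities and treats any symbol outside that alphabet as disqualifying (an assumption, since the definition above does not address it). It also includes a weak typicality check to show a sequence that is weakly but not strongly typical.

```python
import math
from collections import Counter

def is_strongly_typical(seq, pmf, eps):
    """True if every symbol's relative frequency is within eps/|alphabet|
    of its probability."""
    n = len(seq)
    counts = Counter(seq)
    if any(s not in pmf for s in counts):
        return False  # assumption: symbols outside the alphabet disqualify
    tol = eps / len(pmf)
    return all(abs(counts[s] / n - p) < tol for s, p in pmf.items())

def is_weakly_typical(seq, pmf, eps):
    """True if -(1/n) log2 p(seq) is within eps of the entropy H(X)."""
    h = -sum(p * math.log2(p) for p in pmf.values() if p > 0)
    rate = -sum(math.log2(pmf[s]) for s in seq) / len(seq)
    return abs(rate - h) <= eps

# Hypothetical source in which 'b' and 'c' are equally likely.
pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
balanced = "a" * 10 + "b" * 5 + "c" * 5   # frequencies match the pmf
lopsided = "a" * 10 + "b" * 10            # 'c' never appears

for name, seq in (("balanced", balanced), ("lopsided", lopsided)):
    print(name,
          "strong:", is_strongly_typical(seq, pmf, eps=0.03),
          "weak:", is_weakly_typical(seq, pmf, eps=0.03))
```

The second sequence has exactly the right probability, so it is weakly typical, but its letter frequencies are wrong, so it is not strongly typical.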

It can be shown that strongly typical sequences are also weakly typical (with a different constant ε), and hence the name. The two forms, however, are not equivalent. Strong typicality is often easier to work with in proving theorems for memoryless channels. However, as is apparent from the definition, this form of typicality is only defined for random variables having finite support.

Jointly typical sequences

Two sequences x^n and y^n are jointly ε-typical if the pair (x^n,y^n) is ε-typical with respect to the joint distribution p(x^n,y^n)=\prod_{i=1}^n p(x_i,y_i) and both x^n and y^n are ε-typical with respect to their marginal distributions p(x^n) and p(y^n). The set of all such pairs of sequences (x^n,y^n) is denoted by A_{\varepsilon}^n(X,Y). Jointly ε-typical n-tuple sequences are defined similarly.

Let \tilde{X}^n and \tilde{Y}^n be two independent sequences of random variables with the same marginal distributions p(x^n) and p(y^n). Then for any ε>0, for sufficiently large n, jointly typical sequences satisfy the following properties:

  1. P\left[ (X^n,Y^n) \in A_{\varepsilon}^n(X,Y) \right] \geqslant 1 - \epsilon
  2. \left| A_{\varepsilon}^n(X,Y) \right| \leqslant 2^{n (H(X,Y) + \epsilon)}
  3. \left| A_{\varepsilon}^n(X,Y) \right| \geqslant (1 - \epsilon) 2^{n (H(X,Y) - \epsilon)}
  4. P\left[ (\tilde{X}^n,\tilde{Y}^n) \in A_{\varepsilon}^n(X,Y) \right] \leqslant 2^{-n (I(X;Y) - 3 \epsilon)}
  5. P\left[ (\tilde{X}^n,\tilde{Y}^n) \in A_{\varepsilon}^n(X,Y) \right] \geqslant (1 - \epsilon) 2^{-n (I(X;Y) + 3 \epsilon)}
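Joint typicality can be tested by applying the weak typicality condition three times: to x^n, to y^n, and to the sequence of pairs. The following sketch assumes the joint distribution is supplied as a dictionary keyed by (x, y) pairs; the example distribution (uniform X, with Y differing from X with probability 0.25) and the function names are illustrative.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def rate(seq, pmf):
    """-(1/n) log2 of the i.i.d. probability of seq (inf if impossible)."""
    if any(pmf.get(s, 0.0) <= 0.0 for s in seq):
        return math.inf
    return -sum(math.log2(pmf[s]) for s in seq) / len(seq)

def is_jointly_typical(xs, ys, joint, eps):
    """Check eps-typicality of xs, ys and the pair sequence (xs, ys)."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    px, py = {}, {}
    for (x, y), p in joint.items():  # marginals of the joint pmf
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    pairs = list(zip(xs, ys))
    return (abs(rate(xs, px) - entropy(px)) <= eps
            and abs(rate(ys, py) - entropy(py)) <= eps
            and abs(rate(pairs, joint) - entropy(joint)) <= eps)

# Hypothetical joint pmf: X uniform, Y differs from X with probability 0.25.
joint = {(0, 0): 0.375, (1, 1): 0.375, (0, 1): 0.125, (1, 0): 0.125}
x = (0, 1, 0, 1, 0, 1, 0, 1)
y_related = (0, 1, 0, 1, 0, 1, 1, 0)    # differs from x in 2 of 8 places
y_unrelated = (0, 1, 0, 1, 1, 0, 1, 0)  # differs from x in 4 of 8 places

print(is_jointly_typical(x, y_related, joint, eps=0.1))
print(is_jointly_typical(x, y_unrelated, joint, eps=0.1))
```

Both y sequences are typical on their own, but only the first is jointly typical with x. This is the behaviour that properties 4 and 5 quantify: an independently drawn pair is unlikely to look jointly typical.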

Applications of typicality

Typical set encoding

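In information theory, typical set encoding encodes only the sequences in the typical set of a stochastic source, using fixed-length block codes. By the AEP, the probability that the source emits a sequence outside the typical set vanishes as the block length grows, so the scheme is asymptotically lossless and achieves the minimum rate, equal to the entropy rate of the source.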
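A toy version of this scheme can be written by enumerating the typical set and indexing its members with fixed-length binary strings. The sketch below assumes an i.i.d. Bernoulli source with p(0) = 0.1, block length n = 16 and ε = 0.2; these values and the function name are illustrative, and enumeration is only feasible for small n.

```python
import itertools
import math

def build_typical_codebook(p0, n, eps):
    """Enumerate the weakly typical set of an i.i.d. Bernoulli(P(0)=p0)
    source and assign each member a fixed-length binary index."""
    p1 = 1.0 - p0
    h = -(p0 * math.log2(p0) + p1 * math.log2(p1))
    typical = []
    for seq in itertools.product((0, 1), repeat=n):
        k = seq.count(0)
        rate = -(k * math.log2(p0) + (n - k) * math.log2(p1)) / n
        if abs(rate - h) <= eps:
            typical.append(seq)
    # Number of bits needed to index every typical sequence.
    bits = max(1, (len(typical) - 1).bit_length())
    encode = {seq: format(i, f"0{bits}b") for i, seq in enumerate(typical)}
    decode = {code: seq for seq, code in encode.items()}
    return encode, decode, bits

encode, decode, bits = build_typical_codebook(p0=0.1, n=16, eps=0.2)

x = (1,) * 15 + (0,)          # a typical sequence (a single zero)
code = encode.get(x)          # fixed-length codeword
print(len(encode), "typical sequences,", bits, "bits per block of 16")
print(code, decode[code] == x)
print(encode.get((1,) * 16))  # atypical input: no codeword (encoding error)
```

Here the 136 typical sequences are indexed with 8 bits instead of 16. Atypical sequences have no codeword, which is the source of the (asymptotically vanishing) error; at n = 16 that error probability is still large.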

Typical set decoding

In information theory, typical set decoding is used in conjunction with random coding to estimate the transmitted message as the one whose codeword is jointly ε-typical with the observation. The estimate is declared only when exactly one such message exists, i.e.

\hat{w}=w \iff (x_1^n(w),y_1^n)\in A_{\varepsilon}^n(X,Y) \text{ and } (x_1^n(w'),y_1^n)\notin A_{\varepsilon}^n(X,Y) \text{ for all } w' \neq w

where \hat{w},x_1^n(w),y_1^n are the message estimate, codeword of message w and the observation respectively. A_{\varepsilon}^n(X,Y) is defined with respect to the joint distribution p(x_1^n)p(y_1^n|x_1^n) where p(y_1^n|x_1^n) is the transition probability that characterizes the channel statistics, and p(x_1^n) is some input distribution used to generate the codewords in the random codebook.
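The decoding rule can be sketched as a search over the codebook for a unique jointly typical codeword. The example below assumes a small fixed codebook rather than a randomly generated one, a binary channel that flips each bit with probability 0.25, a uniform input distribution, and ε = 0.1; all of these, and the function names, are hypothetical choices made so that the result is deterministic.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def rate(seq, pmf):
    """-(1/n) log2 of the i.i.d. probability of seq (inf if impossible)."""
    if any(pmf.get(s, 0.0) <= 0.0 for s in seq):
        return math.inf
    return -sum(math.log2(pmf[s]) for s in seq) / len(seq)

def is_jointly_typical(xs, ys, joint, eps):
    """Weak typicality of xs, ys and the pair sequence under `joint`."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    pairs = list(zip(xs, ys))
    return (abs(rate(xs, px) - entropy(px)) <= eps
            and abs(rate(ys, py) - entropy(py)) <= eps
            and abs(rate(pairs, joint) - entropy(joint)) <= eps)

def typical_set_decode(y, codebook, joint, eps):
    """Return the unique message whose codeword is jointly typical with y,
    or None if there is no such message or more than one."""
    hits = [w for w, x in codebook.items()
            if is_jointly_typical(x, y, joint, eps)]
    return hits[0] if len(hits) == 1 else None

# p(x) p(y|x) for a uniform input and crossover probability 0.25.
joint = {(0, 0): 0.375, (1, 1): 0.375, (0, 1): 0.125, (1, 0): 0.125}
codebook = {  # hypothetical fixed codebook, message -> codeword
    0: (0, 0, 0, 0, 0, 0, 0, 0),
    1: (1, 1, 1, 1, 1, 1, 1, 1),
    2: (0, 0, 0, 0, 1, 1, 1, 1),
    3: (1, 1, 1, 1, 0, 0, 0, 0),
}

print(typical_set_decode((0, 1, 0, 0, 0, 0, 1, 0), codebook, joint, 0.1))
print(typical_set_decode((0, 0, 0, 0, 0, 0, 1, 1), codebook, joint, 0.1))
```

The first observation is jointly typical only with the codeword of message 0, so the decoder returns 0. The second is jointly typical with the codewords of both message 0 and message 2, so the decoder declares an error by returning None.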

Universal null-hypothesis testing

Universal channel code

See also

References


  • C. E. Shannon, "A Mathematical Theory of Communication", Bell System Technical Journal, vol. 27, pp. 379–423, 623-656, July, October, 1948
  • Cover, Thomas M. (2006). "Chapter 3: Asymptotic Equipartition Property, Chapter 5: Data Compression, Chapter 8: Channel Capacity". Elements of Information Theory. John Wiley & Sons.  
  • David J. C. MacKay. Information Theory, Inference, and Learning Algorithms Cambridge: Cambridge University Press, 2003. ISBN 0-521-64298-1