World Library  
Flag as Inappropriate
Email this Article

Pinsker's inequality

Article Id: WHEBN0020064401
Reproduction Date:

Title: Pinsker's inequality  
Author: World Heritage Encyclopedia
Language: English
Subject: Catalog of articles in probability theory, Information theory, List of statistics articles
Collection: Information Theory, Probabilistic Inequalities
Publisher: World Heritage Encyclopedia

Pinsker's inequality

In information theory, Pinsker's inequality, named after its inventor Mark Semenovich Pinsker, is an inequality that bounds the total variation distance (or statistical distance) in terms of the Kullback–Leibler divergence. The inequality is tight up to constant factors.[1]


  • Formal statement 1
  • History 2
  • Inverse problem 3
  • References 4
  • Additional reading 5

Formal statement

Pinsker's inequality states that, if P and Q are two probability distributions, then

\delta(P,Q) \le \sqrt{\frac{1}{2} D_{\mathrm{KL}}(P\|Q)}


\delta(P,Q)=\sup \{ |P(A) - Q(A)| : A\text{ is an event to which probabilities are assigned.} \}

is the total variation distance (or statistical distance) between P and Q and

D_{\mathrm{KL}}(P\|Q) = \sum_i \ln\left(\frac{P(i)}{Q(i)}\right) P(i)\!

is the Kullback–Leibler divergence in nats.


Pinsker first proved the inequality with a worse constant. The inequality in the above form was proved independently by Kullback, Csiszár, and Kemperman.[2]

Inverse problem

An inverse of the inequality cannot hold: for every \epsilon > 0, there are distributions with \delta(P,Q)\le\epsilon but D_{\mathrm{KL}}(P\|Q) = \infty.[3]


  1. ^ Csiszár, Imre; Körner, János (2011). Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press. p. 44.  
  2. ^ Tsybakov, Alexandre (2009). Introduction to Nonparametric Estimation. Springer. p. 132.  
  3. ^ The divergence becomes infinite whenever one of the two distributions assigns probability zero to an event while the other assigns it a nonzero probability (no matter how small); see e.g. Basu, Mitra; Ho, Tin Kam (2006). Data Complexity in Pattern Recognition. Springer. p. 161.  .

Additional reading

  • Thomas M. Cover and Joy A. Thomas: Elements of Information Theory, 2nd edition, Willey-Interscience, 2006
  • Nicolo Cesa-Bianchi and Gábor Lugosi: Prediction, Learning, and Games, Cambridge University Press, 2006
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply. World Heritage Encyclopedia content is assembled from numerous content providers, Open Access Publishing, and in compliance with The Fair Access to Science and Technology Research Act (FASTR), Wikimedia Foundation, Inc., Public Library of Science, The Encyclopedia of Life, Open Book Publishers (OBP), PubMed, U.S. National Library of Medicine, National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health (NIH), U.S. Department of Health & Human Services, and, which sources content from all federal, state, local, tribal, and territorial government publication portals (.gov, .mil, .edu). Funding for and content contributors is made possible from the U.S. Congress, E-Government Act of 2002.
Crowd sourced content that is contributed to World Heritage Encyclopedia is peer reviewed and edited by our editorial staff to ensure quality scholarly research articles.
By using this site, you agree to the Terms of Use and Privacy Policy. World Heritage Encyclopedia™ is a registered trademark of the World Public Library Association, a non-profit organization.

Copyright © World Library Foundation. All rights reserved. eBooks from Project Gutenberg are sponsored by the World Library Foundation,
a 501c(4) Member's Support Non-Profit Organization, and is NOT affiliated with any governmental agency or department.