World Library  
Flag as Inappropriate
Email this Article

Jensen–Shannon divergence

Article Id: WHEBN0004019573
Reproduction Date:

Title: Jensen–Shannon divergence  
Author: World Heritage Encyclopedia
Language: English
Subject: List of statistics articles
Publisher: World Heritage Encyclopedia

Jensen–Shannon divergence

In probability theory and statistics, the JensenShannon divergence is a popular method of measuring the similarity between two probability distributions. It is also known as information radius (IRad)[1] or total divergence to the average.[2] It is based on the Kullback–Leibler divergence, with some notable (and useful) differences, including that it is symmetric and it is always a finite value. The square root of the Jensen–Shannon divergence is a metric often referred to as Jensen-Shannon distance.[3][4]


Consider the set M_+^1(A) of probability distributions where A is a set provided with some σ-algebra of measurable subsets. In particular we can take A to be a finite or countable set with all subsets being measurable.

The Jensen–Shannon divergence (JSD) M_+^1(A) \times M_+^1(A) \rightarrow [0,\infty{}) is a symmetrized and smoothed version of the Kullback–Leibler divergence D(P \parallel Q). It is defined by

{\rm JSD}(P \parallel Q)= \frac{1}{2}D(P \parallel M)+\frac{1}{2}D(Q \parallel M)

where M=\frac{1}{2}(P+Q)

A more general definition, allowing for the comparison of more than two probability distributions, is:

{\rm JSD}_{\pi_1, \ldots, \pi_n}(P_1, P_2, \ldots, P_n) = H\left(\sum_{i=1}^n \pi_i P_i\right) - \sum_{i=1}^n \pi_i H(P_i)

where \pi_1, \ldots, \pi_n are weights that are selected for the probability distributions P_1, P_2, \ldots, P_n and H(P) is the Shannon entropy for distribution P. For the two-distribution case described above,

P_1=P, P_2=Q, \pi_1 = \pi_2 = \frac{1}{2}.\


The Jensen–Shannon divergence is bounded by 1, given that one uses the base 2 logarithm.[5]

0 \leq {\rm JSD}( P \parallel Q ) \leq 1

For log base e, or ln, which is commonly used in statistical thermodynamics, the upper bound is ln(2):

0 \leq {\rm JSD}( P \parallel Q ) \leq \ln(2)

Relation to mutual information

The Jensen–Shannon divergence is the mutual information between a random variable X associated to a mixture distribution between P and Q and the binary indicator variable Z that is used to switch between P and Q to produce the mixture. Let X be some abstract function on the underlying set of events that discriminates well between events, and choose the value of X according to P if Z = 0 and according to Q if Z = 1. That is, we are choosing X according to the probability measure M=(P+Q)/2, and its distribution is the mixture distribution. We compute

\begin{align} I(X; Z) &= H(X) - H(X|Z)\\ &= -\sum M \log M + \frac{1}{2} \left[ \sum P \log P + \sum Q \log Q \right] \\ &= -\sum \frac{P}{2} \log M - \sum \frac{Q}{2} \log M + \frac{1}{2} \left[ \sum P \log P + \sum Q \log Q \right] \\ &= \frac{1}{2} \sum P \left( \log P - \log M\right ) + \frac{1}{2} \sum Q \left( \log Q - \log M \right) \\ &= {\rm JSD}(P \parallel Q) \end{align}

It follows from the above result that the Jensen–Shannon divergence is bounded by 0 and 1 because mutual information is non-negative and bounded by H(Z) = 1.

One can apply the same principle to a joint distribution and the product of its two marginal distribution (in analogy to Kullback–Leibler divergence and mutual information) and to measure how reliably one can decide if a given response comes from the joint distribution or the product distribution—subject to the assumption that these are the only two possibilities.[6]

Quantum Jensen–Shannon divergence

The generalization of probability distributions on density matrices allows to define quantum Jensen–Shannon divergence (QJSD).[7][8] It is defined for a set of density matrices (\rho_1,\ldots,\rho_n) and probability distribution \pi=(\pi_1,\ldots,\pi_n) as

{\rm QJSD}(\rho_1,\ldots,\rho_n)= S\left(\sum_{i=1}^n \pi_i \rho_i\right)-\sum_{i=1}^n \pi_i S(\rho_i)

where S(\pi_i) is the von Neumann entropy. This quantity was introduced in quantum information theory, where it is called the Holevo information: it gives the upper bound for amount of classical information encoded by the quantum states (\rho_1,\ldots,\rho_n) under the prior distribution \pi (see Holevo's theorem)[9] Quantum Jensen–Shannon divergence for \pi=\left(\frac{1}{2},\frac{1}{2}\right) and two density matrices is a symmetric function, everywhere defined, bounded and equal to zero only if two density matrices are the same. It is a square of a metric for pure states[10] but it is unknown whether the metric property holds in general.[8] The Bures metric is closely related to the quantum JS divergence; it is the quantum analog of the Fisher information metric.


The Jensen–Shannon divergence has been applied in bioinformatics and genome comparison,[11][12] in protein surface comparison,[13] and in the social sciences[14] and the quantitative study of history, [15] and in machine learning.[16]


  1. ^ Hinrich Schütze; Christopher D. Manning (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass: MIT Press. p. 304.  
  2. ^ Dagan, Ido; Lillian Lee; Fernando Pereira (1997). "Similarity-Based Methods For Word Sense Disambiguation". Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics: pp. 56–63.  
  3. ^ Endres, D. M.; J. E. Schindelin (2003). "A new metric for probability distributions". IEEE Trans. Inf. Theory 49 (7): pp. 1858–1860.  
  4. ^ Ôsterreicher, F.; I. Vajda (2003). "A new class of metric divergences on probability spaces and its statistical applications". Ann. Inst. Statist. Math. 55 (3): pp. 639–653.  
  5. ^ Lin, J. (1991). "Divergence measures based on the shannon entropy". IEEE Transactions on Information Theory 37 (1): 145–151.  
  6. ^ Schneidman, Elad; Bialek, W; Berry, M.J. 2nd (2003). "Synergy, Redundancy, and Independence in Population Codes". Journal of Neuroscience 23 (37): 11539–11553.  
  7. ^ Majtey, A.; Lamberti, P.; Prato, D. (2005). "Jensen-Shannon divergence as a measure of distinguishability between mixed quantum states". Physical Review A 72 (5).  
  8. ^ a b Briët, Jop; Harremoës, Peter (2009). "Properties of classical and quantum Jensen-Shannon divergence". Physical Review A 79 (5).  
  9. ^  . English translation: Probl. Inf. Transm., 9, 177–183 (1975)) MR 456936
  10. ^ Braunstein, Samuel; Caves, Carlton (1994). "Statistical distance and the geometry of quantum states". Physical Review Letters 72 (22): 3439–3443.  
  11. ^ Sims, GE; Jun, SR; Wu, GA; Kim, SH (2009). "Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions". Proceedings of the National Academy of Sciences of the United States of America 106 (8): 2677–82.  
  12. ^ Itzkovitz, S; Hodis, E; Segal, E (2010). "Overlapping codes within protein-coding sequences". Genome Research 20 (11): 1582–9.  
  13. ^ Ofran, Y; Rost, B (2003). "Analysing six types of protein-protein interfaces". Journal of Molecular Biology 325 (2): 377–87.  
  14. ^ DeDeo, Simon; Hawkins, Robert X. D.; Klingenstein, Sara; Hitchcock, Tim (2013). "Bootstrap Methods for the Empirical Study of Decision-Making and Information Flows in Social Systems". "Entropy" 15 (6): 2246–2276.  
  15. ^ Klingenstein, Sara; Hitchcock, Tim; DeDeo, Simon (2014). "The civilizing process in London’s Old Bailey". "Proceedings of the National Academy of Sciences" 111 (26): 9419–9424.  
  16. ^ Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, "Generative Adversarial Networks", NIPS 2014.

Further reading

External links

  • Ruby gem for calculating JS divergence
  • Python code for calculating JS divergence
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply. World Heritage Encyclopedia content is assembled from numerous content providers, Open Access Publishing, and in compliance with The Fair Access to Science and Technology Research Act (FASTR), Wikimedia Foundation, Inc., Public Library of Science, The Encyclopedia of Life, Open Book Publishers (OBP), PubMed, U.S. National Library of Medicine, National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health (NIH), U.S. Department of Health & Human Services, and, which sources content from all federal, state, local, tribal, and territorial government publication portals (.gov, .mil, .edu). Funding for and content contributors is made possible from the U.S. Congress, E-Government Act of 2002.
Crowd sourced content that is contributed to World Heritage Encyclopedia is peer reviewed and edited by our editorial staff to ensure quality scholarly research articles.
By using this site, you agree to the Terms of Use and Privacy Policy. World Heritage Encyclopedia™ is a registered trademark of the World Public Library Association, a non-profit organization.

Copyright © World Library Foundation. All rights reserved. eBooks from Project Gutenberg are sponsored by the World Library Foundation,
a 501c(4) Member's Support Non-Profit Organization, and is NOT affiliated with any governmental agency or department.