
# Additive smoothing

Article Id: WHEBN0017110513
Title: Additive smoothing
Author: World Heritage Encyclopedia
Language: English
Publisher: World Heritage Encyclopedia


In statistics, additive smoothing, also called Laplace smoothing[1] or Lidstone smoothing, is a technique used to smooth categorical data. Given an observation x = (x_1, …, x_d) from a multinomial distribution with N trials and parameter vector θ = (θ_1, …, θ_d), a "smoothed" version of the data gives the estimator:

\hat\theta_i= \frac{x_i + \alpha}{N + \alpha d} \qquad (i=1,\ldots,d),

where α > 0 is the smoothing parameter (α = 0 corresponds to no smoothing). Additive smoothing is a type of shrinkage estimator: the resulting estimate lies between the empirical estimate x_i / N and the uniform probability 1/d. Invoking Laplace's rule of succession, some authors have argued that α should be 1 (in which case the term add-one smoothing[2][3] is also used), though in practice a smaller value is typically chosen.
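The estimator can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `additive_smoothing` and the example counts are assumptions made for this sketch.

```python
from typing import List, Sequence


def additive_smoothing(counts: Sequence[float], alpha: float = 1.0) -> List[float]:
    """Return (x_i + alpha) / (N + alpha * d) for each category i.

    `counts` holds the observed counts x_1, ..., x_d (hypothetical input name).
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    d = len(counts)          # number of categories
    n = sum(counts)          # number of trials N
    denom = n + alpha * d
    if denom == 0:
        raise ValueError("no observations and alpha == 0: estimate undefined")
    return [(x + alpha) / denom for x in counts]


# Illustrative data: three categories, the second one never observed.
counts = [3, 0, 1]
smoothed = additive_smoothing(counts, alpha=1.0)   # add-one smoothing
unsmoothed = additive_smoothing(counts, alpha=0.0) # plain empirical estimate
```

With these counts each smoothed value falls between the empirical estimate x_i / N and the uniform probability 1/3, and the unobserved category receives a non-zero estimate.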

From a Bayesian point of view, this estimator is the expected value of the posterior distribution when a symmetric Dirichlet distribution with parameter α is used as the prior. In the special case of two categories, this is equivalent to using a Beta distribution as the conjugate prior for the parameter of a binomial distribution.
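The Bayesian reading can be spelled out in a short derivation. The steps below are a sketch that uses only the standard conjugacy of the Dirichlet prior with the multinomial likelihood and the standard formula for the mean of a Dirichlet distribution; the notation p(·) for densities is introduced here for illustration.

```latex
% Prior: symmetric Dirichlet with parameter \alpha
p(\theta) \propto \prod_{i=1}^{d} \theta_i^{\alpha - 1}

% Likelihood: multinomial with counts x_1, \ldots, x_d and N = \sum_i x_i
p(x \mid \theta) \propto \prod_{i=1}^{d} \theta_i^{x_i}

% Posterior: Dirichlet with parameters x_i + \alpha
p(\theta \mid x) \propto \prod_{i=1}^{d} \theta_i^{x_i + \alpha - 1}

% Posterior mean: each parameter divided by the sum of all parameters
\mathbb{E}[\theta_i \mid x]
  = \frac{x_i + \alpha}{\sum_{j=1}^{d} (x_j + \alpha)}
  = \frac{x_i + \alpha}{N + \alpha d}
```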

## History

Laplace devised this smoothing technique when he tried to estimate the chance that the sun will rise tomorrow (known as the sunrise problem). His rationale was that even given a large sample of days with the rising sun, we still cannot be completely sure that the sun will rise tomorrow.[4]

## Generalized to the case of known incidence rates

Often one tests the bias of an unknown trial population against a control population with known parameters (incidence rates) μ = (μ_1, …, μ_d). In this case the uniform probability 1/d should be replaced by the known incidence rate of the control population, μ_i, to calculate the smoothed estimator:

\hat\theta_i= \frac{x_i + \mu_i \alpha d }{N + \alpha d } \qquad (i=1,\ldots,d).

As a consistency check, if the empirical estimator happens to equal the incidence rate, i.e. μ_i = x_i / N, the smoothed estimator is independent of α and also equals the incidence rate.
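The generalized estimator and its consistency check can be tried numerically. The following Python sketch is illustrative only; the function name `smooth_toward_rates`, the incidence rates, and the counts are all made up for the example.

```python
from typing import List, Sequence


def smooth_toward_rates(
    counts: Sequence[float], rates: Sequence[float], alpha: float = 1.0
) -> List[float]:
    """Return (x_i + mu_i * alpha * d) / (N + alpha * d) for each category i.

    `rates` holds the known incidence rates mu_1, ..., mu_d (hypothetical name).
    """
    d = len(counts)
    if len(rates) != d:
        raise ValueError("counts and rates must have the same length")
    n = sum(counts)
    denom = n + alpha * d
    return [(x + mu * alpha * d) / denom for x, mu in zip(counts, rates)]


# Consistency check: the empirical estimate x_i / N equals mu_i exactly,
# so the result should not depend on alpha.
rates = [0.5, 0.3, 0.2]
matching_counts = [50, 30, 20]
estimates_by_alpha = {
    alpha: smooth_toward_rates(matching_counts, rates, alpha)
    for alpha in (0.1, 1.0, 10.0)
}

# With uniform rates 1/d the formula reduces to plain additive smoothing.
uniform_case = smooth_toward_rates([3, 0, 1], [1 / 3, 1 / 3, 1 / 3], alpha=1.0)
```

For every α in `estimates_by_alpha` the estimates agree with `rates` up to floating-point rounding, and `uniform_case` matches (x_i + α) / (N + α d).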

## Applications

### Classification

Additive smoothing is commonly a component of naive Bayes classifiers.

### Statistical language modelling

In a bag of words model of natural language processing and information retrieval, the data consists of the number of occurrences of each word in a document. Additive smoothing allows the assignment of non-zero probabilities to words which do not occur in the sample.
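As a small illustration, the Python sketch below smooths word counts from a toy document over a fixed vocabulary. The vocabulary, the document, the choice α = 0.5, and the function name `smoothed_word_probs` are assumptions made for this example; tokens outside the vocabulary are simply ignored here, which is one possible convention among several.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def smoothed_word_probs(
    tokens: Iterable[str], vocabulary: Sequence[str], alpha: float = 1.0
) -> Dict[str, float]:
    """Additively smoothed word probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    # N counts only in-vocabulary tokens (an assumption of this sketch).
    n = sum(counts[w] for w in vocabulary)
    denom = n + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / denom for w in vocabulary}


vocabulary = ["the", "cat", "sat", "dog"]
document = "the cat sat the cat".split()
probs = smoothed_word_probs(document, vocabulary, alpha=0.5)
```

Although "dog" never occurs in the document, `probs["dog"]` is 0.5 / 7, which is small but non-zero, and the four probabilities still sum to 1.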

## References

1. ^ C.D. Manning, P. Raghavan and M. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, p. 260.
2. ^
3. ^
4. ^ Lecture 5 | Machine Learning (Stanford) at 1h10m into the lecture
