Sample mean

Article Id: WHEBN0000060166

Title: Sample mean
Author: World Heritage Encyclopedia
Language: English
Subject: Statistic
Publisher: World Heritage Encyclopedia


The sample mean or empirical mean and the sample covariance are statistics computed from a collection of data on one or more random variables. The sample mean is a vector each of whose elements is the sample mean of one of the random variables – that is, each of whose elements is the arithmetic average of the observed values of one of the variables. The sample covariance matrix is a square matrix whose i, j element is the sample covariance (an estimate of the population covariance) between the sets of observed values of two of the variables and whose i, i element is the sample variance of the observed values of one of the variables. If only one variable has had values observed, then the sample mean is a single number (the arithmetic average of the observed values of that variable) and the sample covariance matrix is also simply a single value (the sample variance of the observed values of that variable).

Sample mean

Main article: Arithmetic mean

Let x_{ij} be the ith independently drawn observation (i=1,...,N) on the jth random variable (j=1,...,K). These observations can be arranged into N column vectors, each with K entries, with the K×1 column vector giving the ith observations of all variables being denoted \mathbf{x}_i (i=1,...,N).

The sample mean vector \mathbf{\bar{x}} is a column vector whose jth element \bar{x}_{j} is the average value of the N observations of the jth variable:

\bar{x}_{j}=\frac{1}{N}\sum_{i=1}^{N}x_{ij},\quad j=1,\ldots,K.

Thus, the sample mean vector contains the average of the observations for each variable, and is written

\mathbf{\bar{x}}=\frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i.
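As a minimal sketch, both forms of the definition can be checked numerically with NumPy. The array `data` and its values are made up for illustration; each row holds one observation vector (stored as a row for convenience), so there are N = 4 observations on K = 2 variables.

```python
import numpy as np

# Hypothetical data: N = 4 observations (rows) on K = 2 variables (columns).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0],
                 [4.0, 40.0]])
N, K = data.shape

# Element-wise definition: average the N observations of each variable j.
x_bar_elementwise = np.array([data[:, j].sum() / N for j in range(K)])

# Vector definition: average the N observation vectors x_i.
x_bar = data.sum(axis=0) / N

print(x_bar_elementwise)  # [ 2.5 25. ]
print(x_bar)              # [ 2.5 25. ]
```

In practice `data.mean(axis=0)` gives the same vector in a single call.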

Sample covariance

The sample covariance matrix is a K-by-K matrix \textstyle \mathbf{Q}=\left[ q_{jk}\right] with entries

q_{jk}=\frac{1}{N-1}\sum_{i=1}^{N}\left( x_{ij}-\bar{x}_j \right) \left( x_{ik}-\bar{x}_k \right),

where q_{jk} is an estimate of the covariance between the jth variable and the kth variable of the population underlying the data. In terms of the observation vectors, the sample covariance is

\mathbf{Q} = {1 \over {N-1}}\sum_{i=1}^N (\mathbf{x}_i-\mathbf{\bar{x}}) (\mathbf{x}_i-\mathbf{\bar{x}})^\mathrm{T}.

Alternatively, the observation vectors can be arranged as the columns of a matrix

\mathbf{F} = \begin{bmatrix}\mathbf{x}_1 & \mathbf{x}_2 & \dots & \mathbf{x}_N \end{bmatrix},

which is a matrix of K rows and N columns. The sample covariance matrix can then be computed as

\mathbf{Q} = \frac{1}{N-1}( \mathbf{F} - \mathbf{\bar{x}} \,\mathbf{1}_N^\mathrm{T} ) ( \mathbf{F} - \mathbf{\bar{x}} \,\mathbf{1}_N^\mathrm{T} )^\mathrm{T},

where \mathbf{1}_N is an N by 1 vector of ones. If the observations are arranged as rows instead of columns, so \mathbf{\bar{x}} is now a 1×K row vector and \mathbf{M}=\mathbf{F}^\mathrm{T} is an N×K matrix whose column j is the vector of N observations on variable j, then applying transposes in the appropriate places yields

\mathbf{Q} = \frac{1}{N-1}( \mathbf{M} - \mathbf{1}_N \mathbf{\bar{x}} )^\mathrm{T} ( \mathbf{M} - \mathbf{1}_N \mathbf{\bar{x}} ).
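The three expressions for \mathbf{Q} can be compared numerically. The following is a sketch using NumPy with made-up data; the variable names mirror the symbols in the formulas, and `np.cov` is included only as an independent check (its default normalization is N − 1).

```python
import numpy as np

# Hypothetical data: N = 4 observations (rows) on K = 2 variables (columns).
M = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 2.0],
              [8.0, 6.0]])
N, K = M.shape
x_bar = M.mean(axis=0)

# Sum of outer products of the centered observation vectors.
Q_outer = sum(np.outer(x - x_bar, x - x_bar) for x in M) / (N - 1)

# Column layout: F is K x N, x_bar is a K x 1 column vector.
F = M.T
ones = np.ones((N, 1))
F_centered = F - x_bar.reshape(K, 1) @ ones.T
Q_cols = F_centered @ F_centered.T / (N - 1)

# Row layout: M is N x K, x_bar is a 1 x K row vector.
M_centered = M - ones @ x_bar.reshape(1, K)
Q_rows = M_centered.T @ M_centered / (N - 1)

# NumPy's estimator; rowvar=False says that columns are the variables.
Q_numpy = np.cov(M, rowvar=False)

print(Q_rows)
```

All four results agree up to floating-point rounding. The matrix products avoid an explicit Python loop, which matters mainly when N is large.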

Discussion

The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random vector \textstyle \mathbf{X}, a row vector whose jth element (j = 1, ..., K) is one of the random variables.[1] The sample covariance matrix has \textstyle N-1 in the denominator rather than \textstyle N due to a variant of Bessel's correction: in short, the sample covariance relies on the difference between each observation and the sample mean, but the sample mean is slightly correlated with each observation since it is defined in terms of all observations. If the population mean \operatorname{E}(\mathbf{X}) is known, the analogous unbiased estimate

q_{jk}=\frac{1}{N}\sum_{i=1}^N \left( x_{ij}-\operatorname{E}(X_j)\right) \left( x_{ik}-\operatorname{E}(X_k)\right),

using the population mean, has \textstyle N in the denominator. This is an example of why in probability and statistics it is essential to distinguish between random variables (upper case letters) and realizations of the random variables (lower case letters).
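The factor N − 1 can be checked directly in the single-variable case. As a sketch, write X_{ij} for the random observations (upper case, as distinct from their realizations x_{ij}), and let \mu_j=\operatorname{E}(X_j) and \sigma_j^2=\operatorname{Var}(X_j); these symbols are introduced here for illustration, and the N observations are assumed to be independent. Expanding around \mu_j gives

\sum_{i=1}^{N}\left( X_{ij}-\bar{X}_j \right)^2=\sum_{i=1}^{N}\left( X_{ij}-\mu_j \right)^2-N\left( \bar{X}_j-\mu_j \right)^2.

The first sum on the right has expectation N\sigma_j^2. Since \operatorname{Var}(\bar{X}_j)=\sigma_j^2/N for independent observations, the last term has expectation \sigma_j^2. Hence

\operatorname{E}\left[ \sum_{i=1}^{N}\left( X_{ij}-\bar{X}_j \right)^2 \right]=(N-1)\,\sigma_j^2,

so dividing by N − 1 rather than N gives an unbiased estimate of \sigma_j^2.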

The maximum likelihood estimate of the covariance

q_{jk}=\frac{1}{N}\sum_{i=1}^N \left( x_{ij}-\bar{x}_j \right) \left( x_{ik}-\bar{x}_k \right)

for the Gaussian distribution case has N in the denominator as well. The ratio of 1/N to 1/(N − 1) approaches 1 for large N, so the maximum likelihood estimate approximately equals the unbiased estimate when the sample is large.
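The relationship between the two estimates can be illustrated with NumPy, as a sketch with made-up data. `np.cov` divides by N − 1 by default and by N when `bias=True` is passed.

```python
import numpy as np

# Hypothetical data: N = 5 observations (rows) on K = 2 variables (columns).
data = np.array([[2.0, 1.0],
                 [4.0, 3.0],
                 [6.0, 2.0],
                 [8.0, 6.0],
                 [5.0, 3.0]])
N = data.shape[0]

Q_unbiased = np.cov(data, rowvar=False)          # divides by N - 1
Q_mle = np.cov(data, rowvar=False, bias=True)    # divides by N

# The two estimates differ only by the scalar factor (N - 1) / N.
print(np.allclose(Q_mle, Q_unbiased * (N - 1) / N))  # True

# The factor approaches 1 as the sample size grows.
for n in (5, 50, 5000):
    print(n, (n - 1) / n)
```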

Variance of the sample mean

For each random variable, the sample mean is a good estimator of the population mean, where a "good" estimator is defined as being efficient and unbiased. The estimate will generally not equal the true value of the population mean, since different samples drawn from the same distribution give different sample means and hence different estimates of the true mean. Thus the sample mean is a random variable, not a constant, and consequently has its own distribution. For a random sample of N independent observations on the jth random variable, the sample mean's distribution has mean equal to the population mean \operatorname{E}(X_j) and variance equal to \frac{\sigma^2_j}{N}, where \sigma^2_j is the variance of the random variable X_j.
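A small simulation can make the distribution of the sample mean concrete. The sketch below assumes a normal population with arbitrarily chosen mean 5 and standard deviation 2. It draws many samples of size N = 25 and compares the spread of their means with \sigma^2/N.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

mu, sigma = 5.0, 2.0      # hypothetical population mean and standard deviation
N = 25                    # observations per sample
n_samples = 20_000        # number of independent samples drawn

# Each row is one random sample of N observations; take the mean of each row.
samples = rng.normal(loc=mu, scale=sigma, size=(n_samples, N))
sample_means = samples.mean(axis=1)

empirical_mean = sample_means.mean()
empirical_var = sample_means.var(ddof=1)
theoretical_var = sigma**2 / N  # sigma^2 / N = 0.16

print(empirical_mean, empirical_var, theoretical_var)
```

The empirical values should be close to 5 and 0.16, not exactly equal, because they are themselves estimates from a finite number of simulated samples.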

Weighted samples

Main article: Weighted mean

In a weighted sample, each vector \textstyle \mathbf{x}_{i} (each set of single observations on each of the K random variables) is assigned a weight \textstyle w_i \geq 0. Without loss of generality, assume that the weights are normalized:

\sum_{i=1}^{N}w_i = 1.

(If they are not, divide the weights by their sum.) Then the weighted mean vector \textstyle \mathbf{\bar{x}} is given by

\mathbf{\bar{x}}=\sum_{i=1}^N w_i \mathbf{x}_i,

and the elements q_{jk} of the weighted covariance matrix \textstyle \mathbf{Q} are[2]

q_{jk}=\frac{\sum_{i=1}^{N}w_i}{\left(\sum_{i=1}^{N}w_i\right)^2-\sum_{i=1}^{N}w_i^2}\sum_{i=1}^N w_i \left( x_{ij}-\bar{x}_j \right) \left( x_{ik}-\bar{x}_k \right) .

If all weights are the same, \textstyle w_{i}=1/N, the weighted mean and covariance reduce to the sample mean and covariance above.
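The weighted formulas can be written as a short NumPy function. This is a sketch, not a library routine: the function name `weighted_mean_cov` and the example data are hypothetical, and observations are assumed to be stored as rows.

```python
import numpy as np

def weighted_mean_cov(data, weights):
    """Weighted mean vector and weighted covariance matrix (rows = observations)."""
    data = np.asarray(data, dtype=float)      # shape (N, K)
    w = np.asarray(weights, dtype=float)      # shape (N,)
    w = w / w.sum()                           # normalize so the weights sum to 1
    x_bar = w @ data                          # weighted mean, shape (K,)
    centered = data - x_bar
    # Scale factor from the formula above; w.sum() is 1 after normalization.
    scale = w.sum() / (w.sum() ** 2 - np.sum(w ** 2))
    Q = scale * ((centered * w[:, None]).T @ centered)
    return x_bar, Q

# Hypothetical data: N = 4 observations on K = 2 variables.
data = np.array([[2.0, 1.0],
                 [4.0, 3.0],
                 [6.0, 2.0],
                 [8.0, 6.0]])

# Equal weights reproduce the unweighted sample mean and covariance.
x_bar_eq, Q_eq = weighted_mean_cov(data, np.ones(4))
print(np.allclose(x_bar_eq, data.mean(axis=0)))        # True
print(np.allclose(Q_eq, np.cov(data, rowvar=False)))   # True
```

Passing unnormalized weights such as `np.ones(4)` is fine here because the function normalizes them first, as the text prescribes.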

Criticism

The sample mean and sample covariance are widely used in statistics and applications, and are probably the most common measures of location and dispersion, respectively: they are easily calculated and possess desirable characteristics.

However, they suffer from certain drawbacks; notably, they are not robust statistics, meaning that they are sensitive to outliers. As robustness is often a desired trait, particularly in real-world applications, robust alternatives may prove desirable, notably quantile-based statistics such as the sample median for location,[3] and the interquartile range (IQR) for dispersion. Other alternatives include trimming and Winsorizing, as in the trimmed mean and the Winsorized mean.
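The sensitivity to outliers is easy to see on a toy example. In the sketch below the data are made up, and the helper `iqr` is a hypothetical convenience function built on `np.percentile`. Replacing a single observation by an extreme value moves the mean and standard deviation substantially, while the median and IQR are unchanged.

```python
import numpy as np

def iqr(x):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q25, q75 = np.percentile(x, [25, 75])
    return q75 - q25

clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
contaminated = np.array([1.0, 2.0, 3.0, 4.0, 500.0])  # one outlier

for name, x in (("clean", clean), ("contaminated", contaminated)):
    print(name,
          "mean:", x.mean(),
          "std:", x.std(ddof=1),
          "median:", np.median(x),
          "IQR:", iqr(x))
```

Here the mean jumps from 3 to 102, whereas the median stays at 3 and the IQR stays at 2.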

This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.
 