
The Central Limit Theorem and Related Topics

The Central Limit Theorem

If measurements are obtained independently and come from a process with finite variance, then the distribution of their mean tends towards a Gaussian (normal) distribution as the sample size increases.

Fig. 30: The standard normal density

Details

The Central Limit Theorem states that if $X_1, X_2, \ldots$ are independent and identically distributed random variables with mean $\mu$ and (finite) variance $\sigma^2$, then the distribution of $\bar{X}_n := \frac{X_1 + \dots + X_n}{n}$ tends towards a normal distribution. It follows that for a large enough sample size $n$, the distribution of the random variable $\bar{X}_n$ can be approximated by $N(\mu, \sigma^2/n)$.
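As an illustration, the following minimal R sketch (using a uniform distribution and an arbitrary sample size purely as examples) checks that the standard deviation of simulated sample means is close to $\sigma/\sqrt{n}$:

n <- 30                                    # arbitrary sample size
means <- replicate(10000, mean(runif(n)))  # means of Uniform(0,1) samples
sd(means)                                  # empirical sd of the mean
sqrt(1/12)/sqrt(n)                         # theoretical sigma/sqrt(n); Var(Unif(0,1)) = 1/12

A histogram of these means, e.g. hist(means), will also look approximately bell-shaped.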

The standard normal distribution is given by the p.d.f.:

$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

for $z \in \mathbb{R}$.
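The R function dnorm evaluates this density, so the formula can be checked directly (the point $z = 1.5$ is an arbitrary choice):

z <- 1.5
1/sqrt(2*pi) * exp(-z^2/2)  # density from the formula
dnorm(z)                    # built-in standard normal density; same value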

The standard normal distribution has an expected value of zero:

$$\mu = \int_{-\infty}^{\infty} z\,\varphi(z)\,dz = 0$$

and a variance of:

$$\sigma^2 = \int_{-\infty}^{\infty} (z-\mu)^2\,\varphi(z)\,dz = 1$$
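Both integrals can be verified numerically in R with the integrate function (a quick sketch, using dnorm for $\varphi$):

integrate(function(z) z * dnorm(z), -Inf, Inf)    # approximately 0
integrate(function(z) z^2 * dnorm(z), -Inf, Inf)  # approximately 1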

If a random variable $Z$ has the standard normal (or Gaussian) distribution, we write $Z \sim N(0,1)$.

If we define a new random variable, $Y$, by writing $Y = \sigma Z + \mu$, then $Y$ has an expected value of $\mu$, a variance of $\sigma^2$ and a density (p.d.f.) given by the formula:

$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y-\mu)^2/(2\sigma^2)}$$

This is the general normal (or Gaussian) density, with mean $\mu$ and variance $\sigma^2$.
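A small simulation sketch (with $\mu = 10$ and $\sigma = 2$ chosen arbitrarily) confirming both the transformation and the density formula:

mu <- 10
sigma <- 2
y <- sigma * rnorm(100000) + mu  # Y = sigma*Z + mu, with Z ~ N(0,1)
mean(y)                          # close to mu = 10
sd(y)                            # close to sigma = 2
dnorm(11, mu, sigma)                                # built-in density at y = 11
1/(sqrt(2*pi)*sigma) * exp(-(11-mu)^2/(2*sigma^2))  # same value from the formula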

The Central Limit Theorem states that if you take the mean of several independent, identically distributed random variables, the distribution of that mean will look more and more like a Gaussian distribution (provided the variance of the original random variables is finite).

More precisely, the cumulative distribution function of:

$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$

converges to $\Phi$, the $N(0,1)$ cumulative distribution function.
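A numerical sketch of this convergence (using Poisson data with $\lambda = 4$, so $\mu = 4$ and $\sigma = 2$; the sample size and replication count are arbitrary) compares the empirical distribution of the standardized mean with $\Phi$ at a single point:

n <- 50
z <- replicate(10000, (mean(rpois(n, 4)) - 4)/(2/sqrt(n)))
mean(z <= 1)  # empirical probability, close to ...
pnorm(1)      # ... Phi(1), approximately 0.8413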

Examples

Example

If we collect measurements on waiting times, these are typically assumed to come from an exponential distribution with density

$$f(t) = \lambda e^{-\lambda t}, \quad \text{for } t > 0$$

The Central Limit Theorem states that the mean of several such waiting times will tend to have a normal distribution.
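This is easy to visualize by simulation; in the sketch below, the rate $\lambda = 1$ and sample size 30 are arbitrary choices, and the histogram of simulated means should look roughly bell-shaped around the true mean $1/\lambda = 1$:

lambda <- 1
xbar <- replicate(10000, mean(rexp(30, lambda)))
hist(xbar)  # approximately normal, centered near 1/lambda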

Example

We are often interested in computing:

$$w = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

which comes from a $T$ distribution (see below), if the $x_i$ are independent outcomes from a normal distribution.

However, if $n$ is large and $\sigma^2$ is finite, then the $w$ values will look as though they came from a normal distribution. This is in part a consequence of the Central Limit Theorem, but also of the fact that $s$ will become close to $\sigma$ as $n$ increases.

Properties of the Binomial and Poisson Distributions

The binomial distribution is really the distribution of a sum of $0$ and $1$ values (counts of failures $= 0$ and successes $= 1$). So the outcome of a single binomial random variable can be approximated by a normal distribution if the count is large enough.

Details

Consider the binomial probabilities:

$$p(x) = \binom{n}{x} p^x (1-p)^{n-x}$$

for $x = 0, 1, 2, 3, \ldots, n$, where $n$ is a non-negative integer. Suppose $p$ is a small positive number; specifically, consider a sequence of decreasing $p$-values given by $p_n = \frac{\lambda}{n}$, and consider the behavior of the probability as $n \to \infty$. We obtain:

$$\begin{aligned} \binom{n}{x} p_n^x (1-p_n)^{n-x} &= \frac{n!}{x!\,(n-x)!} \left(\frac{\lambda}{n}\right)^x \left(1-\frac{\lambda}{n}\right)^{n-x} \\ &= \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!} \frac{(\lambda/n)^x}{\left(1-\frac{\lambda}{n}\right)^x} \left(1-\frac{\lambda}{n}\right)^n \\ &= \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!\,n^x} \frac{\lambda^x}{\left(1-\frac{\lambda}{n}\right)^x} \left(1-\frac{\lambda}{n}\right)^n \end{aligned}$$
Note

Notice that $\frac{n(n-1)(n-2)\cdots(n-x+1)}{n^x} \to 1$ as $n \to \infty$, and that $\left(1-\frac{\lambda}{n}\right)^x \to 1$ as $n \to \infty$. Also,

$$\lim_{n \to \infty} \left(1-\frac{\lambda}{n}\right)^n = e^{-\lambda}$$

and it follows that

$$\lim_{n \to \infty} \binom{n}{x} p_n^x (1-p_n)^{n-x} = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$

and hence the binomial probabilities may be approximated with the corresponding Poisson probabilities.
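The quality of this approximation is easy to inspect in R by comparing dbinom with dpois; here $\lambda = 2$ and $n = 100$ (so $p_n = 0.02$) are arbitrary choices:

lambda <- 2
n <- 100
x <- 0:6
dbinom(x, n, lambda/n)  # binomial probabilities with p = lambda/n
dpois(x, lambda)        # Poisson probabilities; nearly identical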

Examples

Example

The mean of a binomial $(n,p)$ variable is $\mu = np$ and the variance is $\sigma^2 = np(1-p)$.

The R command dbinom(q,n,p) calculates the probability of q successes in n trials assuming that the probability of a success is p in each trial (binomial distribution), and the R command pbinom(q,n,p) calculates the probability of obtaining q or fewer successes in n trials.

The normal approximation to this distribution can be calculated with pnorm(q,mu,sigma), which here becomes pnorm(q, n*p, sqrt(n*p*(1-p))). Three numerical examples follow (note that pbinom and pnorm give similar values for large n):

> pbinom(3,10,0.2)
[1] 0.8791261

> pnorm(3,10*0.2,sqrt(10*0.2*(1-0.2)))
[1] 0.7854023
> pbinom(3,20,0.2)
[1] 0.4114489

> pnorm(3,20*0.2,sqrt(20*0.2*(1-0.2)))
[1] 0.2880751
> pbinom(30,200,0.2)
[1] 0.04302156

> pnorm(30,200*0.2,sqrt(200*0.2*(1-0.2)))
[1] 0.03854994
Example

We are often interested in computing $w = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, which has a $T$ distribution if the $x_i$ are independent outcomes from a normal distribution. If $n$ is large and $\sigma^2$ is finite, this will look as if it comes from a normal distribution.

The numerical examples below demonstrate how the $T$ distribution approaches the normal distribution.

> qnorm(0.7)
[1] 0.5244005 # the value giving cumulative probability p=0.7 for N(0,1)

> qt(0.7,2)
[1] 0.6172134 # the value giving cumulative probability p=0.7 for the t distribution with 2 degrees of freedom

> qt(0.7,5)
[1] 0.5594296

> qt(0.7,10)
[1] 0.541528

> qt(0.7,20)
[1] 0.5328628

> qt(0.7,100)
[1] 0.5260763
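Since qt is vectorized over its degrees-of-freedom argument, the same comparison can be made in a single call:

qt(0.7, c(2, 5, 10, 20, 100))  # approaches qnorm(0.7) = 0.5244005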

Monte Carlo Simulation

If we know the underlying process, we can simulate data from it and evaluate the distribution of any quantity computed from such data.

Fig. 31: A simulated set of $t$-values based on data from an exponential distribution

Examples

Example

Suppose our measurements come from an exponential distribution and we want to compute

$$t = \frac{\overline{x} - \mu}{s/\sqrt{n}}$$

but we want to know the distribution of such $t$-values when $\mu$ is the true mean.

For instance, with $n = 5$ and $\mu = 1$, we can repeatedly simulate $x_1, \ldots, x_5$ and compute a $t$-value for each sample. The following R commands can be used for this:

library(MASS)  # provides truehist
n <- 5         # sample size
mu <- 1        # true mean (for a rate-1 exponential, the mean is 1/lambda = 1)
lambda <- 1    # rate of the exponential distribution
tvec <- NULL
for (sim in 1:10000) {
  x <- rexp(n, lambda)          # simulate n waiting times
  xbar <- mean(x)
  s <- sd(x)
  t <- (xbar - mu)/(s/sqrt(n))  # t-value for this sample
  tvec <- c(tvec, t)
}

truehist(tvec) # truehist (from MASS) gives a nicer histogram than hist

Show the empirical 97.5% and 2.5% quantiles of the simulated $t$-values by using:

> sort(tvec)[9750]
[1] 1.698656

> sort(tvec)[250]
[1] -6.775726
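Equivalently, the quantile function returns both empirical quantiles in one call:

quantile(tvec, c(0.025, 0.975))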