
The Multivariate Normal Distribution and Related Topics

Transformations of Random Variables

Recall that if $X$ is a vector of continuous random variables with a joint probability density function and $Y=h(X)$, where $h$ is a one-to-one, continuously differentiable function with inverse $g$ so that $X=g(Y)$, then the density of $Y$ is given by

$$f_Y(y)=f(g(y))\,\vert J\vert$$

Details

$J$ is the Jacobian determinant of $g$. In particular, if $Y=AX$, then

$$f_Y(y)=f(A^{-1}y)\,\vert\det(A^{-1})\vert$$

provided $A$ is invertible.
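As a quick numerical sanity check (a sketch using NumPy and SciPy with an arbitrary illustrative matrix $A$, not part of the original notes), the transformation formula for $Y=AX$ can be compared against the directly computed density of $Y$ when $X$ is standard bivariate normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

# X ~ N(0, I) in two dimensions; Y = A X has density f_X(A^{-1} y) |det(A^{-1})|
A = np.array([[2.0, 1.0],
              [0.5, 1.5]])
A_inv = np.linalg.inv(A)

y = np.array([1.0, -0.5])
f_X = multivariate_normal(mean=np.zeros(2), cov=np.eye(2)).pdf

# Density of Y via the change-of-variables formula
lhs = f_X(A_inv @ y) * abs(np.linalg.det(A_inv))

# Density of Y computed directly: Y ~ N(0, A A')
rhs = multivariate_normal(mean=np.zeros(2), cov=A @ A.T).pdf(y)

assert np.isclose(lhs, rhs)
```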

The Multivariate Normal Distribution

Details

Consider independent, identically distributed random variables $Z_1,\ldots,Z_n \sim N(0,1)$ and set

$$\underline{Z} = \left( \begin{array}{c} Z_1 \\ \vdots \\ Z_n \end{array} \right)$$

and let $\underline{Y}=A\underline{Z}+\underline{\mu}$, where $A$ is an invertible $n\times n$ matrix and $\underline{\mu}\in\mathbb{R}^n$ is a vector, so that $\underline{Z}=A^{-1}(\underline{Y}-\underline{\mu})$.

Then the p.d.f. of $\underline{Y}$ is given by

$$f_{\underline{Y}}(\underline{y})= f_{\underline{Z}}(A^{-1}(\underline{y}-\underline{\mu}))\,\vert\det(A^{-1})\vert$$

But the joint p.d.f. of $\underline{Z}$ is the product of the p.d.f.s of $Z_1,\ldots,Z_n$, so $f_{\underline{Z}}(\underline{z})= f(z_1)\,f(z_2)\cdots f(z_n)$, where

$$f(z_i) = \frac{1}{\sqrt{2\pi}}\, e^{-z_i^2/2}$$

and hence

$$\begin{aligned} f_{\underline{Z}}(\underline{z}) &= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}}\, e^{-z_i^2/2} \\ &= \left(\frac{1}{\sqrt{2\pi}}\right)^{\!n} e^{-\frac{1}{2}\sum_{i=1}^n z_i^2} \\ &= \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\underline{z}'\underline{z}} \end{aligned}$$

since

$$\sum_{i=1}^n z_i^2 = \Vert \underline{z} \Vert^2 = \underline{z}\cdot\underline{z} = \underline{z}'\underline{z}$$

The joint p.d.f. of $\underline{Y}$ is therefore

$$\begin{aligned} f_{\underline{Y}}(\underline{y}) &= f_{\underline{Z}}(A^{-1}(\underline{y}-\underline{\mu}))\,\vert\det(A^{-1})\vert \\ &= \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}(A^{-1}(\underline{y}-\underline{\mu}))'(A^{-1}(\underline{y}-\underline{\mu}))}\,\frac{1}{\vert\det(A)\vert} \end{aligned}$$

We can write $\det(AA')=\det(A)^2$, so $\vert\det(A)\vert=\sqrt{\det(AA')}$, and if we write $\boldsymbol{\Sigma}=AA'$, then

$$\vert\det(A)\vert = \vert\boldsymbol{\Sigma}\vert^{1/2}$$

Also, note that

$$(A^{-1}(\underline{y}-\underline{\mu}))'(A^{-1}(\underline{y}-\underline{\mu})) = (\underline{y}-\underline{\mu})'(A^{-1})'A^{-1}(\underline{y}-\underline{\mu}) = (\underline{y}-\underline{\mu})'\boldsymbol{\Sigma}^{-1}(\underline{y}-\underline{\mu})$$

since $(A^{-1})'A^{-1} = (A')^{-1}A^{-1} = (AA')^{-1} = \boldsymbol{\Sigma}^{-1}$.

We can now write

$$f_{\underline{Y}}(\underline{y}) = \frac{1}{(2\pi)^{n/2}\,\vert\boldsymbol{\Sigma}\vert^{1/2}}\, e^{-\frac{1}{2}(\underline{y}-\underline{\mu})'\boldsymbol{\Sigma}^{-1}(\underline{y}-\underline{\mu})}$$

This is the density of the multivariate normal distribution.
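As a numerical check (a NumPy/SciPy sketch with arbitrary illustrative values, not part of the original notes), the density above can be coded directly and compared with `scipy.stats.multivariate_normal`; the identity $\vert\det(A)\vert=\vert\boldsymbol{\Sigma}\vert^{1/2}$ used in the derivation is verified as well:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(y, mu, Sigma):
    """Multivariate normal density evaluated directly from the formula above."""
    n = len(mu)
    d = y - mu
    quad = d @ np.linalg.solve(Sigma, d)   # (y - mu)' Sigma^{-1} (y - mu)
    return np.exp(-0.5 * quad) / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))

A = np.array([[1.0, 0.5],
              [0.0, 2.0]])
Sigma = A @ A.T                            # Sigma = A A'
mu = np.array([1.0, -2.0])
y = np.array([0.5, -1.0])

# |det(A)| = |Sigma|^{1/2}, as used in the derivation
assert np.isclose(abs(np.linalg.det(A)), np.sqrt(np.linalg.det(Sigma)))

# The hand-coded density agrees with SciPy's implementation
assert np.isclose(mvn_pdf(y, mu, Sigma), multivariate_normal(mean=mu, cov=Sigma).pdf(y))
```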

Note that

$$E[\underline{Y}] = \underline{\mu}$$

$$Var[\underline{Y}] = Var[A\underline{Z}] = A\,Var[\underline{Z}]\,A' = AIA' = AA' = \boldsymbol{\Sigma}$$

Notation: $\underline{Y}\sim N(\underline{\mu}, \boldsymbol{\Sigma})$

Univariate Normal Transforms

The general univariate normal distribution with density

$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$$

is a special case of the multivariate version.

Details

Further, if $Z\sim N(0,1)$, then clearly $X=aZ+\mu \sim N(\mu,\sigma^2)$, where $\sigma^2=a^2$.
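A small simulation (a NumPy sketch with arbitrary illustrative values for $a$ and $\mu$, not from the original text) makes this concrete: samples of $aZ+\mu$ should have mean close to $\mu$ and standard deviation close to $\vert a\vert$:

```python
import numpy as np

rng = np.random.default_rng(42)
a, mu = 1.5, 3.0

# X = a Z + mu, with Z ~ N(0, 1), should be N(mu, a^2)
z = rng.standard_normal(200_000)
x = a * z + mu

assert abs(x.mean() - mu) < 0.02       # close to mu
assert abs(x.std() - abs(a)) < 0.02    # close to |a|
```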

Transforms to Lower Dimensions

If $Y\sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})$ is a random vector of length $n$ and $A$ is an $m\times n$ matrix of rank $m\leq n$, then $AY \sim N(A\boldsymbol{\mu}, A\boldsymbol{\Sigma}A')$.

Details

If $Y\sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})$ is a random vector of length $n$ and $A$ is an $m\times n$ matrix of rank $m\leq n$, then $AY \sim N(A\boldsymbol{\mu}, A\boldsymbol{\Sigma}A')$. To prove this, set up an $(n-m)\times n$ matrix $B$ so that the $n\times n$ matrix $C$, formed by combining the rows of $A$ and $B$, has full rank $n$. The density of $CY$ is then straightforward to derive, and it factors into a product of terms, only one of which contains $AY$; that factor is the density of $AY$.
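The result can be checked by simulation (a NumPy sketch with arbitrary illustrative values for $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, and $A$, not from the original notes): the empirical mean and covariance of $AY$ should approach $A\boldsymbol{\mu}$ and $A\boldsymbol{\Sigma}A'$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 3, 2, 400_000

mu = np.array([1.0, 0.0, -1.0])
L = rng.standard_normal((n, n))
Sigma = L @ L.T                       # a valid covariance matrix
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])      # m x n, rank m

# Draw Y ~ N(mu, Sigma) and transform each sample
Y = rng.multivariate_normal(mu, Sigma, size=N)
AY = Y @ A.T

# Empirical moments of AY match A mu and A Sigma A'
assert np.allclose(AY.mean(axis=0), A @ mu, atol=0.1)
assert np.allclose(np.cov(AY.T), A @ Sigma @ A.T, atol=0.1)
```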

The OLS Estimator

Suppose $Y \sim N(X\beta, \sigma^2 I)$. The ordinary least squares estimator, when the $n\times p$ design matrix $X$ is of full rank $p$, where $p\leq n$, is:

$$\hat{\beta} = (X'X)^{-1}X'Y$$

The random variable which describes the process giving the data and estimate is:

$$b = (X'X)^{-1}X'Y$$

It follows that

$$\hat{\beta} \sim N(\beta, \sigma^2(X'X)^{-1})$$

Details

Suppose $Y \sim N(X\beta, \sigma^2 I)$. The ordinary least squares estimator, when the $n\times p$ design matrix $X$ is of full rank $p$, is:

$$\hat{\beta} = (X'X)^{-1}X'Y$$

The following random variable describes the process giving the data and the estimate:

$$b = (X'X)^{-1}X'Y$$

If $B = (X'X)^{-1}X'$, then we know that

$$BY \sim N(BX\beta,\, B(\sigma^2 I)B')$$

Note that

$$BX\beta = (X'X)^{-1}X'X\beta = \beta$$

and

$$\begin{aligned} B(\sigma^2 I)B' &= \sigma^2(X'X)^{-1}X'\left[(X'X)^{-1}X'\right]' \\ &= \sigma^2(X'X)^{-1}X'X(X'X)^{-1} \\ &= \sigma^2(X'X)^{-1} \end{aligned}$$

It follows that

$$\hat{\beta} \sim N(\beta, \sigma^2(X'X)^{-1})$$
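A simulation sketch (NumPy, with illustrative values for $X$, $\beta$, and $\sigma$, not part of the original notes) confirms this sampling distribution: over repeated datasets, $\hat{\beta}$ averages to $\beta$ with covariance close to $\sigma^2(X'X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 50, 2, 20_000
sigma = 0.5

X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])  # n x p design, full rank
beta = np.array([2.0, -1.0])
XtX_inv = np.linalg.inv(X.T @ X)

# Simulate many datasets Y ~ N(X beta, sigma^2 I); each row of Y is one dataset
Y = X @ beta + sigma * rng.standard_normal((reps, n))
beta_hat = Y @ X @ XtX_inv           # each row is one OLS estimate

# Empirical mean and covariance of the estimates match beta and sigma^2 (X'X)^{-1}
assert np.allclose(beta_hat.mean(axis=0), beta, atol=0.02)
assert np.allclose(np.cov(beta_hat.T), sigma**2 * XtX_inv, atol=0.01)
```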

Note

The earlier results regarding the multivariate Gaussian distribution also show that the vector of parameter estimates is Gaussian even when the original $Y$-variables are not independent.