
Estimation, Estimates and Estimators

Ordinary Least Squares for a Single Mean

If $\mu$ is unknown and $x_1,\ldots,x_n$ are data, we can estimate $\mu$ by finding

$$\min_{\mu} \sum_{i=1}^{n}(x_i-\mu)^2$$

In this case the resulting estimate is simply

$$\hat{\mu} = \overline{x}$$

and can easily be derived by setting the derivative to zero.
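In detail, differentiating the sum with respect to $\mu$ and setting the result to zero gives

$$\frac{d}{d\mu}\sum_{i=1}^{n}(x_i-\mu)^2 = -2\sum_{i=1}^{n}(x_i-\mu) = 0,$$

so $\sum_{i=1}^{n}x_i = n\mu$ and hence $\hat{\mu} = \overline{x}$.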

Examples

Example

Consider the numbers $x_1, \ldots, x_5$ to be

$$13, 7, 4, 16 \textrm{ and } 9$$

We can plot $\sum(x_i-\mu)^2$ vs. $\mu$ and find the minimum.
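A short R sketch of this plot (the grid of $\mu$ values is an arbitrary choice):

# Sum of squares as a function of mu for the five numbers above
x <- c(13, 7, 4, 16, 9)
mu <- seq(0, 20, by = 0.01)                    # grid of candidate values for mu
sse <- sapply(mu, function(m) sum((x - m)^2))  # sum of squares at each mu
plot(mu, sse, type = "l", xlab = "mu", ylab = "Sum of squares")
abline(v = mean(x), lty = 2)                   # minimum is at the mean, 9.8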

Maximum Likelihood Estimation

If $\left(Y_1, \ldots, Y_n\right)'$ is a random vector from a density $f_{\theta}$, where $\theta$ is an unknown parameter, and $\mathbf{y}$ is a vector of observations, then we define the likelihood function to be

$$L_{\mathbf{y}}(\theta)=f_{\theta}(\mathbf{y})$$

Examples

Example

If $x_1,\ldots,x_n$ are assumed to be observations of independent random variables with a normal distribution with mean $\mu$ and variance $\sigma^2$, then the joint density is

$$\begin{aligned}
f(x_1)\cdot f(x_2)\cdot\ldots\cdot f(x_n) &= \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x_1-\mu)^2}{2\sigma^2}} \cdot\ldots\cdot \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x_n-\mu)^2}{2\sigma^2}} \\
&= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} \\
&= \frac{1}{(2\pi)^{n/2}\sigma^n}e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}
\end{aligned}$$

and if we assume $\sigma^2$ is known, then the likelihood function is

$$L(\mu)=\frac{1}{(2\pi)^{n/2}\sigma^n}e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}$$

Maximizing this is done by maximizing the log, i.e. finding the $\mu$ for which:

$$\frac{d}{d\mu}\ln L(\mu)=0,$$

which again results in the estimate

$$\hat{\mu}=\overline{x}$$
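Writing out the log-likelihood makes the connection to least squares explicit:

$$\ln L(\mu) = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2,$$

so only the last term depends on $\mu$, and maximizing $L(\mu)$ is the same as minimizing $\sum_{i=1}^{n}(x_i-\mu)^2$, the least squares criterion from above.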


Ordinary Least Squares

Consider the regression problem where we fit a line through $(x_i,y_i)$ pairs with $x_1, \ldots, x_n$ fixed numbers but where $y_i$ is measured with error.

Fig. 36: Regression line through data pairs.

Details

The ordinary least squares (OLS) estimates of the parameters $\alpha$ and $\beta$ in the model $y_i=\alpha + \beta x_i + \epsilon_i$ are obtained by minimizing the sum of squares

$$\sum_i \left( y_i -(\alpha +\beta x_i) \right)^2$$

The resulting estimates are

$$\begin{aligned}
a &= \overline{y} - b\overline{x} \\
b &= \frac{\sum^n_{i=1} (x_i-\overline{x})(y_i-\overline{y})}{\sum^n_{i=1} (x_i-\overline{x})^2}
\end{aligned}$$
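As a quick numerical check, here is a minimal R sketch (with made-up data) that computes $a$ and $b$ from these formulas and compares them with the coefficients returned by lm():

# OLS estimates from the closed-form expressions, with made-up data
x <- c(1, 3, 5, 7, 9)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
a <- mean(y) - b * mean(x)                                      # intercept
c(a = a, b = b)
coef(lm(y ~ x))  # should match a and b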

Random Variables and Outcomes

Details

Recall that $X_1, \ldots, X_n$ are random variables (reflecting the population distribution) and $x_1, \ldots, x_n$ are numerical outcomes of these distributions. We use upper case letters to denote random variables and lower case letters to denote outcomes or data.

Examples

Example

Let the mean of a population be zero and the standard deviation $\sigma=4$. Then draw samples of size $n$ equal to $4$, $16$ or $64$ from this population. The sample mean $\bar{X}$ will have a distribution with mean zero and standard deviation $\frac{\sigma}{\sqrt{n}}$, where $n = 4$, $16$ or $64$.
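A small simulation sketch of this (a normal population is assumed here, although the example does not require one):

# Distribution of the sample mean for n = 4, 16, 64
# when the population has mean 0 and sd sigma = 4
# (normal population assumed for the simulation)
sigma <- 4
for (n in c(4, 16, 64)) {
  xbar <- replicate(10000, mean(rnorm(n, 0, sigma)))
  cat("n =", n, "sd(xbar) =", round(sd(xbar), 3),
      "sigma/sqrt(n) =", round(sigma / sqrt(n), 3), "\n")
}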

Estimators and Estimates

In OLS regression, note that the values of $a$ and $b$,

$$a = \overline{y} - b \overline{x}$$

$$b = \frac{\sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y})}{\sum_{i=1}^{n} (x_i - \overline{x})^2},$$

are outcomes of random variables; e.g., $b$ is the outcome of

$$\hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \overline{x})(Y_i - \overline{Y})}{\sum_{i=1}^{n} (x_i - \overline{x})^2},$$

the estimator, which itself has a distribution.

Fig. 37: An example of the distribution of the estimator $\hat{\beta}$.

Details

The following R commands can be used to generate the distribution of the estimator $\hat{\beta}$:

library(MASS)   # for truehist()

nsim <- 1000                 # number of simulated data sets
betahat <- numeric(nsim)     # storage for the slope estimates
n <- 20
x <- 1:n                     # fixed x vector
xbar <- mean(x)
for (i in 1:nsim) {
  y <- 2 + 0.4 * x + rnorm(n, 0, 1)   # simulate y with true slope 0.4
  ybar <- mean(y)
  b <- sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)  # OLS slope
  a <- ybar - b * xbar                                   # OLS intercept
  betahat[i] <- b
}
truehist(betahat)   # histogram of the simulated slope estimates