
Miscellanea

Simple Probabilities In R

R has functions to compute probabilities based on most common distributions.

If $X$ is a random variable with a known distribution, then R can typically compute values of the cumulative distribution function:

$$F(x)=P[X \leq x]$$

Examples

Example

If $X \sim Bin(n,p)$ has a binomial distribution, i.e.

$$P(X = x) = {n \choose x}p^x(1-p)^{n-x},$$

then cumulative probabilities can be computed with pbinom, e.g.

pbinom(5,10,0.5)

gives

$$P[X \leq 5] = 0.623$$

where

$$X \sim Bin\left(n=10,\ p= \frac{1}{2}\right)$$

This can also be computed by hand. Here we have $n=10$, $p=1/2$ and the probability $P[X \leq 5]$ is obtained by adding up the individual probabilities, $P[X=0]+P[X=1]+P[X=2]+P[X=3]+P[X=4]+P[X=5]$:

$$P[X \leq 5] = \sum_{x=0}^5 {10\choose x} \left(\frac{1}{2}\right)^x\left(\frac{1}{2}\right)^{10-x}$$

This becomes

$$P[X \leq 5] = {10 \choose 0} \left(\frac{1}{2}\right)^0\left(\frac{1}{2}\right)^{10-0} + {10 \choose 1} \left(\frac{1}{2}\right)^1\left(\frac{1}{2}\right)^{10-1} + {10 \choose 2} \left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^{10-2} + {10 \choose 3} \left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^{10-3} + {10 \choose 4} \left(\frac{1}{2}\right)^4\left(\frac{1}{2}\right)^{10-4} + {10 \choose 5} \left(\frac{1}{2}\right)^5\left(\frac{1}{2}\right)^{10-5}$$

or

$$P[X \leq 5] = {10 \choose 0} \left(\frac{1}{2}\right)^{10} + {10 \choose 1} \left(\frac{1}{2}\right)^{10} + {10 \choose 2} \left(\frac{1}{2}\right)^{10} + {10 \choose 3} \left(\frac{1}{2}\right)^{10} + {10 \choose 4} \left(\frac{1}{2}\right)^{10} + {10 \choose 5} \left(\frac{1}{2}\right)^{10}=\left(\frac{1}{2}\right)^{10} \left(1+10+45+120+210+252 \right) = \frac{638}{1024} \approx 0.623$$
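The hand computation can be mirrored in R. The following is a minimal sketch using only base R (choose, sum, pbinom); the variable names are chosen here for illustration.

```r
# Binomial setting from the example: n = 10 trials, success probability 1/2.
n <- 10
p <- 0.5
x <- 0:5

# Binomial coefficients appearing in the sum: choose(10, 0), ..., choose(10, 5).
coefs <- choose(n, x)

# Individual probabilities P[X = x], written out from the binomial formula.
terms <- coefs * p^x * (1 - p)^(n - x)

# Adding them up gives the cumulative probability P[X <= 5].
by_hand <- sum(terms)
by_hand

# The result should agree with pbinom up to floating point error.
all.equal(by_hand, pbinom(5, n, p))
```

Printing coefs shows the numbers inside the parentheses of the last formula, and by_hand matches the value returned by pbinom(5,10,0.5).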

Furthermore,

> pbinom(10,10,0.5)
[1] 1

and

> pbinom(0,10,0.5)
[1] 0.0009765625

It is sometimes of interest to compute $P[X=x]$ in this case, and this is given by the dbinom function, e.g.

> dbinom(1,10,0.5)
[1] 0.009765625

or $\frac{10}{1024}$.
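The link between dbinom and pbinom can be checked directly. This is a small sketch in base R, with variable names picked for illustration: a point probability is a difference of two cumulative probabilities, and a cumulative probability is a sum of point probabilities.

```r
n <- 10
p <- 0.5

# P[X = 1] as the difference P[X <= 1] - P[X <= 0].
diff_cdf <- pbinom(1, n, p) - pbinom(0, n, p)
all.equal(diff_cdf, dbinom(1, n, p))

# P[X <= 5] as the sum of the point probabilities P[X = 0], ..., P[X = 5].
sum_pmf <- sum(dbinom(0:5, n, p))
all.equal(sum_pmf, pbinom(5, n, p))

# All point probabilities together add up to one.
total <- sum(dbinom(0:n, n, p))
total
```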

Example

Suppose $X$ has a uniform distribution between 0 and 1, i.e. $X \sim Unf(0,1)$. Then the punif function will return probabilities of the form

$$P[X \leq x]= \int_{-\infty}^{x} f(t)dt= \int_{0}^{x} f(t)dt$$

where $f(t)=1$ if $0 \leq t \leq 1$ and $f(t)=0$ otherwise. For example:

> punif(0.75)
[1] 0.75

To obtain $P[a \leq X \leq b]$, we use punif twice, e.g.

> punif(0.75)-punif(0.25)
[1] 0.5
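As a rough check of the integral form, the uniform density can be integrated numerically and compared with punif. The sketch below uses base R's integrate and dunif; the helper prob_between is a hypothetical name introduced here, not part of R.

```r
# Numerically integrate the Unf(0, 1) density from 0 to 0.75.
area <- integrate(dunif, lower = 0, upper = 0.75)$value
all.equal(area, punif(0.75))

# Hypothetical helper: P[a <= X <= b] for a uniform distribution on (lo, hi),
# computed as a difference of two punif calls.
prob_between <- function(a, b, lo = 0, hi = 1) {
  punif(b, min = lo, max = hi) - punif(a, min = lo, max = hi)
}

prob_between(0.25, 0.75)             # same computation as in the text
prob_between(1, 3, lo = 0, hi = 4)   # uniform on (0, 4) instead of (0, 1)
```

The min and max arguments of punif default to 0 and 1, which is why the calls in the text need only one argument.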

Computing Normal Probabilities In R

To compute probabilities, $X\sim N(\mu,\sigma^2)$ is usually transformed, since we know that

$$Z:=\frac{X-\mu}{\sigma} \sim N(0,1)$$

The probabilities can then be computed for either $X$ or $Z$ with the pnorm function in R.

Details

Suppose $X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$,

$$X\sim N(\mu,\sigma^2)$$

then to compute probabilities, $X$ is usually transformed, since we know that

$$Z=\frac{X-\mu}{\sigma} \sim N(0,1)$$

and the probabilities can be computed for either $X$ or $Z$ with the pnorm function.

Examples

Example

If $Z \sim N(0,1)$ then we can e.g. obtain $P[Z\leq1.96]$ with

> pnorm(1.96)
[1] 0.9750021

> pnorm(0)
[1] 0.5

> pnorm(1.96)-pnorm(1.96)
[1] 0

> pnorm(1.96)-pnorm(-1.96)
[1] 0.9500042

The last one gives the area between -1.96 and 1.96.
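The symmetry of the standard normal distribution gives a few equivalent ways of getting the same area. The following sketch uses only pnorm, including its lower.tail argument; the variable names are illustrative.

```r
z <- 1.96

# Area between -z and z, as in the text.
central <- pnorm(z) - pnorm(-z)

# By symmetry, pnorm(-z) equals 1 - pnorm(z), so the central area is 2 * pnorm(z) - 1.
all.equal(central, 2 * pnorm(z) - 1)

# Upper tail P[Z > z], requested directly with lower.tail = FALSE.
upper <- pnorm(z, lower.tail = FALSE)
all.equal(upper, 1 - pnorm(z))

# The central area and the two tails together cover the whole distribution.
all.equal(central + 2 * upper, 1)
```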

Example

If $X \sim N(42,3^2)$ then we can compute probabilities either by transforming

$$\begin{aligned} P[X\leq x] &= P\left[\frac{X-\mu}{\sigma} \leq \frac{x-\mu}{\sigma}\right] \\ &= P\left[Z \leq \frac{x-\mu}{\sigma}\right] \end{aligned}$$

and calling pnorm with the computed value $z=\frac{x-\mu}{\sigma}$, or by calling pnorm with $x$ and specifying $\mu$ and $\sigma$.

To compute $P[X\leq 48]$, either set $z=(48-42)/3=2$ and obtain

> pnorm(2)
[1] 0.9772499

or specify $\mu$ and $\sigma$

> pnorm(48,42,3)
[1] 0.9772499
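The two routes for a normal probability can be compared side by side. This is a minimal sketch for $X \sim N(42,3^2)$ using pnorm with its mean and sd arguments; the interval from 39 to 45 is an extra illustration not taken from the text.

```r
mu <- 42
sigma <- 3
x <- 48

# Route 1: standardize by hand and use the standard normal distribution.
z <- (x - mu) / sigma
via_z <- pnorm(z)

# Route 2: pass x together with the mean and standard deviation.
# Note that pnorm takes the standard deviation, not the variance.
direct <- pnorm(x, mean = mu, sd = sigma)

all.equal(via_z, direct)

# Illustrative extra: P[39 <= X <= 45], i.e. within one standard deviation of the mean.
within_one_sd <- pnorm(45, mu, sigma) - pnorm(39, mu, sigma)
within_one_sd
```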

Introduction to Hypothesis Testing

Details

If we have a random sample $x_1, \ldots, x_n$ from a normal distribution, then we consider them to be outcomes of independent random variables $X_1, \ldots, X_n$ where $X_i \sim N(\mu, \sigma^2)$. Typically, $\mu$ and $\sigma^2$ are unknown, but assume for now that $\sigma^2$ is known.

Consider the hypotheses

$$H_0: \mu = \mu_0 \text{ vs. } H_1: \mu > \mu_0$$

where $\mu_0$ is a specified number.

Under the assumption of independence, the sample mean

$$\overline{x} = \frac{1}{n}\sum^n_{i=1}x_i$$

is also an observation from a normal distribution, with mean $\mu$ but a smaller variance. Specifically, $\overline{x}$ is the outcome of

$$\overline{X} = \frac{1}{n}\sum^n_{i=1}X_i$$

and

$$\overline{X} \sim N\left(\mu, \frac{ \sigma^2}{n}\right)$$

so the standard deviation of $\overline{X}$ is $\frac{\sigma}{\sqrt{n}}$, and the appropriate error measure for $\overline{x}$ is $\frac{\sigma}{\sqrt{n}}$ when $\sigma$ is known.
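A small simulation can make the reduced spread of the sample mean visible. The sketch below assumes illustrative values $\mu=42$, $\sigma=3$ and $n=25$, none of which come from the text, and uses base R's rnorm, replicate and sd.

```r
set.seed(1)   # fixed seed so the simulation is repeatable

mu <- 42      # assumed true mean (illustrative)
sigma <- 3    # assumed known standard deviation (illustrative)
n <- 25       # sample size (illustrative)

# Draw 10000 samples of size n and record the mean of each one.
means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

# Theoretical standard deviation of the sample mean.
theory <- sigma / sqrt(n)
theory

# Empirical standard deviation of the simulated means; it should be close to theory.
sd(means)
```

The simulated means also centre on $\mu$, which can be seen from mean(means).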

If $H_0$ is true, then

$$z:= \frac{\overline{x}-\mu_0}{\sigma / \sqrt{n}}$$

is an observation from an $N(0,1)$ distribution, i.e. an outcome of

$$Z= \frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}$$

where $Z \sim N(0,1)$ when $H_0$ is correct. It follows that e.g. $P[\vert Z \vert > 1.96] = 0.05$ and if we observe $\vert z \vert > 1.96$ then we reject the null hypothesis.
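The test statistic is straightforward to compute in R. The following is a sketch only: the function z_stat and the data vector are hypothetical, made up here for illustration, and $\sigma=3$ and $\mu_0=42$ are assumed values.

```r
# Hypothetical helper: z statistic for a sample x, null mean mu0 and known sigma.
z_stat <- function(x, mu0, sigma) {
  (mean(x) - mu0) / (sigma / sqrt(length(x)))
}

# Made-up data, purely for illustration.
x <- c(44.1, 41.8, 45.3, 43.0, 46.2, 42.7, 44.9, 43.6, 45.0)

z <- z_stat(x, mu0 = 42, sigma = 3)
z

# Rejection rule from the text: reject H0 when |z| exceeds 1.96.
reject <- abs(z) > 1.96
reject

# Probability of |Z| > 1.96 under H0, using the symmetry of the normal distribution.
tail_prob <- 2 * pnorm(-1.96)
tail_prob
```

With these made-up numbers the sample mean is a little over two standard errors above 42, so the rule rejects.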

Note that the value $z^\ast = 1.96$ is a quantile of the normal distribution and we can obtain other quantiles with the qnorm function, e.g. qnorm(0.975) gives $1.96$.
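In base R, normal quantiles are returned by qnorm, which is the inverse of pnorm. A short sketch, with the probability levels chosen here for illustration:

```r
# 97.5% quantile of N(0, 1): the cut-off for a two-sided 5% test.
q975 <- qnorm(0.975)
q975   # 1.959964

# 95% quantile of N(0, 1): the cut-off for a one-sided 5% test.
q95 <- qnorm(0.95)
q95    # 1.644854

# qnorm undoes pnorm: applying pnorm to a quantile returns the probability.
p <- c(0.9, 0.95, 0.975, 0.995)
all.equal(pnorm(qnorm(p)), p)

# Quantiles for a non-standard normal, here N(42, 3^2), via mean and sd.
qx <- qnorm(0.975, mean = 42, sd = 3)
all.equal(qx, 42 + 3 * q975)
```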