Miscellanea

Simple Probabilities In R

R has functions to compute probabilities based on the most common distributions.
If $X$ is a random variable with a known distribution, then R can typically compute values of the cumulative distribution function:
$$F(x) = P[X \leq x]$$
Examples

If $X \sim \mathrm{Bin}(n,p)$ has a binomial distribution, i.e.
$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x},$$
then cumulative probabilities can be computed with pbinom, e.g.
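> pbinom(5, 10, 0.5)
[1] 0.6230469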
gives
$$P[X \leq 5] = 0.623$$
where
$$X \sim \mathrm{Bin}\left(n=10,\ p=\tfrac{1}{2}\right).$$
This can also be computed by hand.
Here we have $n=10$, $p=1/2$, and the probability $P[X \leq 5]$ is obtained by adding up the individual probabilities $P[X=0]+P[X=1]+P[X=2]+P[X=3]+P[X=4]+P[X=5]$:
$$P[X \leq 5] = \sum_{x=0}^{5} \binom{10}{x} \left(\frac{1}{2}\right)^{x} \left(\frac{1}{2}\right)^{10-x}$$
This becomes
$$P[X \leq 5] = \binom{10}{0}\left(\tfrac{1}{2}\right)^{0}\left(\tfrac{1}{2}\right)^{10-0} + \binom{10}{1}\left(\tfrac{1}{2}\right)^{1}\left(\tfrac{1}{2}\right)^{10-1} + \binom{10}{2}\left(\tfrac{1}{2}\right)^{2}\left(\tfrac{1}{2}\right)^{10-2} + \binom{10}{3}\left(\tfrac{1}{2}\right)^{3}\left(\tfrac{1}{2}\right)^{10-3} + \binom{10}{4}\left(\tfrac{1}{2}\right)^{4}\left(\tfrac{1}{2}\right)^{10-4} + \binom{10}{5}\left(\tfrac{1}{2}\right)^{5}\left(\tfrac{1}{2}\right)^{10-5}$$
or
$$P[X \leq 5] = \binom{10}{0}\left(\tfrac{1}{2}\right)^{10} + \binom{10}{1}\left(\tfrac{1}{2}\right)^{10} + \binom{10}{2}\left(\tfrac{1}{2}\right)^{10} + \binom{10}{3}\left(\tfrac{1}{2}\right)^{10} + \binom{10}{4}\left(\tfrac{1}{2}\right)^{10} + \binom{10}{5}\left(\tfrac{1}{2}\right)^{10} = \left(\tfrac{1}{2}\right)^{10}\left(1+10+45+\dots\right)$$
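The remaining coefficients are $\binom{10}{3}=120$, $\binom{10}{4}=210$ and $\binom{10}{5}=252$, so the sum of the coefficients is $638$ and $P[X \leq 5] = 638/1024 \approx 0.623$, in agreement with pbinom. A quick check of this arithmetic in R (this verification is not part of the original text):

> sum(choose(10, 0:5)) / 2^10
[1] 0.6230469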
Furthermore, $P[X \leq 0] = P[X = 0] = \left(\tfrac{1}{2}\right)^{10} = \tfrac{1}{1024} = 0.0009765625$, and this agrees with
> pbinom(0,10,0.5)
[1] 0.0009765625
It is sometimes of interest to compute $P[X=x]$ in this case, and this is given by the dbinom function, e.g.
> dbinom(1,10,0.5)
[1] 0.009765625
or $\frac{10}{1024}$.
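Summing dbinom values over $x=0,\dots,5$ reproduces the cumulative probability given by pbinom; this cross-check is not in the original text, but it illustrates how the two functions are related:

> sum(dbinom(0:5, 10, 0.5))
[1] 0.6230469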
Suppose $X$ has a uniform distribution between $0$ and $1$, i.e. $X \sim \mathrm{Unif}(0,1)$.
Then the punif function will return probabilities of the form
$$P[X \leq x] = \int_{-\infty}^{x} f(t)\,dt = \int_{0}^{x} f(t)\,dt$$
where $f(t)=1$ if $0 \leq t \leq 1$ and $f(t)=0$ otherwise.
For example:
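with $x = 0.75$, say, the density is $1$ on $[0,1]$, so punif simply returns $P[X \leq 0.75] = 0.75$:

> punif(0.75)
[1] 0.75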
To obtain $P[a \leq X \leq b]$, we use punif twice, e.g.
> punif(0.75)-punif(0.25)
[1] 0.5
Computing Normal Probabilities In R

To compute probabilities, $X \sim N(\mu,\sigma^2)$ is usually transformed, since we know that
$$Z := \frac{X-\mu}{\sigma} \sim N(0,1).$$
The probabilities can then be computed for either $X$ or $Z$ with the pnorm function in R.
Details

Suppose $X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$,
$$X \sim N(\mu,\sigma^2),$$
then to compute probabilities, $X$ is usually transformed, since we know that
$$Z = \frac{X-\mu}{\sigma} \sim N(0,1)$$
and the probabilities can be computed for either $X$ or $Z$ with the pnorm function.
Examples

If $Z \sim N(0,1)$ then we can e.g. obtain $P[Z \leq 1.96]$ with
> pnorm(1.96)
[1] 0.9750021
> pnorm(0)
[1] 0.5
> pnorm(1.96)-pnorm(1.96)
[1] 0
> pnorm(1.96)-pnorm(-1.96)
[1] 0.9500042
The last one gives the area between $-1.96$ and $1.96$.
If $X \sim N(42, 3^2)$ then we can compute probabilities either by transforming,
$$\begin{aligned} P[X\leq x] &= P\left[\frac{X-\mu}{\sigma} \leq \frac{x-\mu}{\sigma}\right] \\ &= P\left[Z \leq \frac{x-\mu}{\sigma}\right], \end{aligned}$$
and calling pnorm with the computed value $z=\frac{x-\mu}{\sigma}$, or by calling pnorm with $x$ and specifying $\mu$ and $\sigma$.
To compute $P[X\leq 48]$, either set $z=(48-42)/3=2$ and obtain
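> pnorm(2)
[1] 0.9772499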
or specify $\mu$ and $\sigma$ directly in the call:
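> pnorm(48, mean = 42, sd = 3)
[1] 0.9772499

Both calls return the same probability, as they should.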
Introduction to Hypothesis Testing

Details

If we have a random sample $x_1, \ldots, x_n$ from a normal distribution, then we consider them to be outcomes of independent random variables $X_1, \ldots, X_n$ where $X_i \sim N(\mu, \sigma^2)$.
Typically, $\mu$ and $\sigma^2$ are unknown, but assume for now that $\sigma^2$ is known.
Consider the hypotheses
$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu > \mu_0$$
where $\mu_0$ is a specified number.
Under the assumption of independence, the sample mean
$$\overline{x} = \frac{1}{n}\sum^{n}_{i=1} x_i$$
is also an observation from a normal distribution, with mean $\mu$ but a smaller variance. Specifically, $\overline{x}$ is the outcome of
$$\overline{X} = \frac{1}{n}\sum^{n}_{i=1} X_i$$
and
$$\overline{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
so the standard deviation of $\overline{X}$ is $\frac{\sigma}{\sqrt{n}}$, and the appropriate error measure for $\overline{x}$ is therefore $\frac{\sigma}{\sqrt{n}}$ (estimated by $s/\sqrt{n}$ when $\sigma$ is unknown).
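The shrinking standard deviation can be seen in a quick simulation sketch (not part of the original derivation; the values $\mu = 42$, $\sigma = 3$ and $n = 25$ below are arbitrary):

x.means <- replicate(10000, mean(rnorm(25, mean = 42, sd = 3)))  # 10000 sample means
sd(x.means)   # approximately 3/sqrt(25) = 0.6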
If $H_0$ is true, then
$$z := \frac{\overline{x}-\mu_0}{\sigma / \sqrt{n}}$$
is an observation from a $N(0,1)$ distribution, i.e. an outcome of
$$Z = \frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}$$
where $Z \sim N(0,1)$ when $H_0$ is correct.
It follows that e.g. $P[\lvert Z \rvert > 1.96] = 0.05$, and if we observe $\lvert Z \rvert > 1.96$ then we reject the null hypothesis.
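As an illustration of this test in R (the numbers below are hypothetical and chosen only to keep the arithmetic simple):

xbar  <- 11.5                      # observed sample mean (hypothetical)
mu0   <- 10                        # hypothesized mean under H0
sigma <- 2                         # known standard deviation
n     <- 9                         # sample size
z     <- (xbar - mu0) / (sigma / sqrt(n))
z                                  # 2.25
abs(z) > 1.96                      # TRUE, so H0 is rejected at the 5% level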
Note that the value $z^\ast = 1.96$ is a quantile of the normal distribution, and we can obtain quantiles with the qnorm function.
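For example (qnorm is the inverse of pnorm, mapping a probability back to the corresponding quantile):

> qnorm(0.975)
[1] 1.959964

which rounds to the familiar $1.96$.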