6) Confidence Intervals¶

Vitor Kamada

econometrics.methods@gmail.com

Last updated 7-21-2020

6.1) What is the confidence interval?¶

Confidence interval is a range of values for a parameter of the sampled population.

Narrow confidence interval suggests precise estimate. Wide confidence interval suggests that little is known about the population.

6.2) What is the z-interval for proportion ($p$)?¶

A confidence interval that assumes a normal model for the sampling distribution.

Confidence interval for proportion ($p$) is:

\[ \hat{p}\pm z_{\frac{\alpha}{2}} \cdot se(\hat{p}) \]

\[ \hat{p}\pm z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

where $P(Z>z_{\alpha})=\alpha$ for $Z\sim N(0,1)$. For a 95% confidence interval, the level of significance ($\alpha$) is 5%, then $z_{2.5\%} = 1.96$.

6.3) Microsoft believes that 75% of Windows users are super satisfied with the operational system. If that’s the case, what range holds 95% of all sample proportions if $n=225$?¶

\[ \hat{p}\pm z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

\[ 0.75\pm 1.96\sqrt{\frac{0.75(1-{0.75})}{225}} \]

\[ 0.693 \leq p \leq 0.807 \]

6.4) What is Student’s t-distribution?¶

Student’s t-distribution is a model for the sampling distribution that compensates for substituting the standard deviation of the sample ($s$) for the standard deviation of the population ($\sigma$) in the standard error.

Student’s t-distribution is the exact sampling distribution of the random variable:

\[T_{n-1} = \frac{\bar{X}-\mu}{\frac{S}{\sqrt{n}}}\]

where $(n-1)$ is the degrees of freedom (df). As $n \rightarrow \infty$, t-distribution converge to normal distribution.

See in the chart that t-distribution with degrees of freedom ($df=5$) has thick tails that accommodates more numbers of outliers than the normal distribution. Note that t-distribution with degrees of freedom ($df=30$) is indistinguishable from normal distribution.

import numpy as np
from scipy.stats import t, norm
import matplotlib.pyplot as plt
%matplotlib inline

x = np.linspace(-4, 4, 100)

t_dist5 = plt.plot(x, t(5).pdf(x), 'b', label='t, df=5')
t_dist30 = plt.plot(x, t(30).pdf(x), 'g--', label='t, df=30')

normal_dist = plt.plot(x, norm.pdf(x), 'r--',label='Normal')

plt.legend()
plt.show()

_images/6)_Confidence_Intervals_10_0.png

6.5) What is the t-interval for the mean ($\mu$)?¶

The $100(1-\alpha)\%$ confidence t-interval for $\mu$ is:

\[ \bar{x}\ \pm t_{\frac{\alpha}{2},n-1}\frac{s}{\sqrt{n}}\]

where $P(T_{n-1} > t_{\frac{\alpha}{2},n-1}) = \frac{\alpha}{2}$ for $T_{n-1}$ distributed as a student’s t-random variable with degrees of freedom ($df = n-1$).

6.6) A manager claims that the clients have on average $6,200 in the bank. How reasonable is the claim?¶

A random sample of 49 clients shows the average $\bar{x}=4,200$ with standard deviation ($s = 3,500$).

We can use the code below to get:

\[t_{\frac{\alpha}{2},n-1} = t_{2.5\%,48} =-2.01 \]

t.ppf(0.025,48)

-2.010634754696446

The 95% confidence interval for $\mu$ is:

\[ \bar{x}\ \pm t_{\frac{\alpha}{2},n-1}\frac{s}{\sqrt{n}}\]

\[ 4,200 \pm 2.01 \frac{3,500}{\sqrt{49}} \]

$$ $3,195 \leq \mu \leq $5,205 $$

The manager’ claim is not compatible with the data. The average $6,200 is far above the confidence interval.

6.7) What is the margin of error?¶

Margin of error (ME) of 95% confidence interval for $\mu$:

\[ ME = t_{2.5\%,n-1}\frac{s}{\sqrt{n}}\]

Margin of error (ME) of 95% confidence interval for $p$:

\[ ME = z_{2.5\%}\frac{\sqrt{\hat{p}(1-\hat{p})}}{\sqrt{n}}\]

It is very common to round and replace the $t_{2.5\%,n-1}$ and $z_{2.5\%}$ by 2.

6.8) Justify the numbers below:¶

alt text

\[ ME = z_{2.5\%}\frac{\sqrt{\hat{p}(1-\hat{p})}}{\sqrt{n}}\]

Replace $z_{2.5\%}$ by 2. The variance, $\hat{p}(1-\hat{p})$, is maximum, when $\hat{p} = \frac{1}{2}$

\[ ME = 2\frac{\sqrt{\frac{1}{2}(1-\frac {1}{2})}}{\sqrt{n}}\]

\[ ME = 2\frac{(\frac {1}{2})}{\sqrt{n}}\]

\[ ME = \frac{1}{\sqrt{n}}\]

Let’s test $n=400$:

\[ ME = \frac{1}{\sqrt{400}} \]

\[= \frac{1}{20} \]

\[= 0.05 \]

\[=5\% \]

Exercises¶

1| Find the 95% z-interval for the parameter $p$, given $\hat{p}=0.5$, $n=36$.

2| A school claims that the students have on average a score of 690 in SAT Math Test. How reasonable is the claim? A random sample of 36 students from this school has average $\bar{x}= 650$, with standard deviation ($s=100$).

3| Do you agree with the statement: “All other things the same, a 90% confidence interval is narrow than a 95% confidence interval.” Justify.

4| What is the probability that $\bar{X}>\mu?$

5| What is the coverage of the confidence interval $[\hat{p}$ to $1]$?

6| Fox News interviewed 500 voters and claimed that the margin of error is less than 3% to predict the outcome of the election. Do you agree? Justify.

Reference¶

Adhikari, A., DeNero, J. (2020). Computational and Inferential Thinking: The Foundations of Data Science. Link

Adhikari, A., Pitman, J. (2020). Probability for Data Science. Link

Diez, D. M., Barr, C. D., Çetinkaya-Rundel, M. (2014). Introductory Statistics with Randomization and Simulation. Link

Lau, S., Gonzalez, J., Nolan, D. (2020). Principles and Techniques of Data Science. Link

Probability and Statistics with Python