Study Guide for Final Exam

This will be a closed-book final exam.

You are permitted one 8.5 inch by 11 inch sheet of handwritten notes (both sides) for the final exam.

You will need a calculator. A calculator that performs arithmetic operations, reciprocals, square roots, powers and logarithms (base 10) is sufficient. Graphing calculators are permitted.

You will have 2 hours to complete the exam. The exam will start promptly at 11:35 AM, so please arrive on time.

There will be 7 questions on the exam: 3 modeled after problems from the previous exams, and 4 from the material covered since Exam 2.

To do well on the new material on the exam, you should be able to do the following:

Chapter 8

Section 8.1: Basic Properties of Confidence Intervals

Give the correct interpretation of the confidence level \(c\) associated with an interval estimator, and explain what is wrong with some common misinterpretations of confidence intervals.
Explain how the width of a confidence interval for the population mean of a Gaussian population with known standard deviation \(\sigma\) varies as you change:
- the confidence level \(c\);
- the sample size \(n\); or
- the population standard deviation \(\sigma\).
Construct confidence level \(c\) upper and lower confidence bounds for the population mean \(\mu\) of a Gaussian population, with the population standard deviation \(\sigma\) either known or unknown.
Sketch confidence level \(c\) upper and lower confidence bounds for the population mean \(\mu\) of a Gaussian population, with the population standard deviation \(\sigma\) either known or unknown.
Relate the form of confidence upper / lower bounds for a population mean to the endpoints of the two-sided confidence interval for the same mean.
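
The width and bound relationships above can be checked numerically. What follows is a minimal sketch for the known-\(\sigma\) case only. It is written in Python using only the standard library, purely for illustration, since the course itself uses R. The function names are hypothetical. For unknown \(\sigma\), replace \(\sigma\) with \(s\) and the \(z\) critical values with \(t_{n-1}\) critical values. The Python standard library does not provide \(t\) quantiles, so that case is not coded here. A level \(c\) upper bound uses \(z_{\alpha}\) where the two-sided interval uses \(z_{\alpha/2}\). It is therefore the upper endpoint of the two-sided interval at level \(2c - 1\).

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Gaussian distribution


def z_interval(xbar, sigma, n, c):
    """Two-sided level-c interval: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - c
    z_crit = Z.inv_cdf(1 - alpha / 2)  # z_{alpha/2}: alpha/2 in each tail
    half_width = z_crit * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


def z_upper_bound(xbar, sigma, n, c):
    """Level-c upper bound: xbar + z_alpha * sigma / sqrt(n)."""
    z_crit = Z.inv_cdf(c)  # z_alpha: all of alpha = 1 - c in one tail
    return xbar + z_crit * sigma / sqrt(n)


def z_lower_bound(xbar, sigma, n, c):
    """Level-c lower bound: xbar - z_alpha * sigma / sqrt(n)."""
    z_crit = Z.inv_cdf(c)
    return xbar - z_crit * sigma / sqrt(n)


def width(interval):
    lo, hi = interval
    return hi - lo


base = width(z_interval(10, 2, 25, 0.95))
print(width(z_interval(10, 2, 25, 0.99)) > base)   # higher c -> wider
print(width(z_interval(10, 2, 100, 0.95)) / base)  # 4x the sample size -> about half as wide
print(width(z_interval(10, 4, 25, 0.95)) > base)   # larger sigma -> wider
```

For example, take \(\bar{x} = 10\), \(\sigma = 2\), and \(n = 25\). Raising \(c\) from 0.95 to 0.99 widens the interval, and quadrupling \(n\) halves its width. The 95% upper bound coincides with the upper endpoint of the 90% two-sided interval.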

Section 8.2: Large Sample Confidence Intervals for a Population Mean and Proportion

Identify when a ‘word problem’ asks for a confidence interval for a population proportion and/or a success probability from a binomial experiment.
State the standardization of the success proportion \(\widehat{p}_{n} = X/n\) used in constructing the Agresti-Coull confidence interval, and explain why this standardized variable approximately follows a standard Gaussian distribution.
Use the binom.agresti.coull function in the R package binom to compute the Agresti-Coull confidence interval for a population proportion.
State how the confidence level passed to binom.agresti.coull must be modified to use binom.agresti.coull to construct confidence upper / lower bounds.
Sketch confidence level \(c\) confidence intervals, confidence upper bounds, and confidence lower bounds for a population proportion based on the Agresti-Coull confidence interval.
Distinguish between ‘word problems’ that call for the use of:
- \(\bar{x}_{n} \pm z_{\alpha/2} \cdot \sigma / \sqrt{n}\);
- \(\bar{x}_{n} \pm t_{\alpha/2, n-1} \cdot s / \sqrt{n}\);
- the Agresti-Coull confidence interval; or
- none of the above
or their upper / lower bound equivalents.
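
The Agresti-Coull interval starts from the standardization \((\widehat{p}_{n} - p)/\sqrt{p(1-p)/n}\). This is approximately standard Gaussian by the Central Limit Theorem, because \(X\) is a sum of \(n\) independent success/failure trials. In its usual form, the interval adds \(z_{\alpha/2}^{2}/2\) successes and \(z_{\alpha/2}^{2}/2\) failures before applying the familiar estimate \(\pm\) critical value times standard error formula.

The Python sketch below shows that arithmetic. It is meant to illustrate what binom.agresti.coull(x, n, conf.level) computes, not to reproduce that function exactly. Clipping the endpoints to [0, 1] and the function names are assumptions of this sketch. It also shows the conf.level adjustment for one-sided bounds: a level \(c\) upper (or lower) bound is the upper (or lower) endpoint of the two-sided level \(2c - 1\) interval. For example, use conf.level = 0.90 for a 95% bound.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Gaussian distribution


def agresti_coull(x, n, conf_level=0.95):
    """Two-sided Agresti-Coull interval for a success probability p.

    Adds z^2/2 successes and z^2/2 failures, then applies the usual
    estimate +/- z * standard-error formula to the adjusted counts.
    Clipping the endpoints to [0, 1] is a choice made in this sketch.
    """
    z = Z.inv_cdf(1 - (1 - conf_level) / 2)  # z_{alpha/2}
    n_tilde = n + z ** 2
    p_tilde = (x + z ** 2 / 2) / n_tilde
    half_width = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return max(0.0, p_tilde - half_width), min(1.0, p_tilde + half_width)


def two_sided_level_for_bound(c):
    """Two-sided level whose upper (or lower) endpoint is a level-c bound."""
    return 2 * c - 1


# 42 successes in 100 trials
lower, upper = agresti_coull(42, 100)  # 95% two-sided interval
upper_bound_95 = agresti_coull(42, 100, two_sided_level_for_bound(0.95))[1]
lower_bound_95 = agresti_coull(42, 100, two_sided_level_for_bound(0.95))[0]
```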

Chapter 9

Section 9.1: Tests of a Hypothesis Based on a Single Sample

Define a statistical hypothesis test.
State what types of quantities are of interest in statistical hypothesis tests.
Given a claim stated in words, determine a relevant population parameter related to the claim, and write the claim as an equality / inequality in terms of that population parameter.
Explain what the null and alternative hypotheses correspond to in a hypothesis test.
Given a claim stated in words, determine whether the claim is a null hypothesis or an alternative hypothesis, and determine the complementary claim.
Explain the main approach of a hypothesis test: looking for evidence against the null hypothesis that, if strong enough, leads us to reject the null hypothesis in favor of the alternative hypothesis.
Define the Type I Error Rate of a hypothesis test.
Define the significance level (‘level’) of a hypothesis test.
Relate the Type I Error Rate of a hypothesis test to the confidence level of an interval estimator used to test the hypothesis.
Identify the two types of errors that we can make while performing a statistical hypothesis test with regard to rejecting / not rejecting the null hypothesis.
Give the standard names for the two types of errors from the previous learning objective.
Define the Type I and Type II error rates of a hypothesis testing procedure.
State which of the two error rates we control by fixing \(\alpha\) for a hypothesis test.
Explain the analogy between Type I and Type II errors and convictions in the US criminal justice system.
Define test statistic.
Explain how the three test statistics from Sections 9.2 and 9.3 measure the discrepancy between the null hypothesis and the data.
Distinguish between a test statistic, which gives the procedure for testing a statistical hypothesis, and the observed test statistic, which is the application of that procedure to a particular sample.
Define the rejection region of a statistical hypothesis test.
Explain the rationale behind the procedure for constructing the rejection region for a hypothesis test of significance level \(\alpha\).
Construct rejection regions for one-sided and two-sided alternative hypotheses.
State and follow the “hypothesis testing recipe” given a claim about a population to test.
Define the Type II Error Rate for a hypothesis test.
Recognize and use the convention of denoting the Type I Error Rate by \(\alpha\) and the Type II Error Rate by \(\beta\).
For a given hypothesis test and a specified alternative value, compute the Type I and Type II Error Rates for the hypothesis test.
Define the power of a hypothesis test, and relate the power to the Type II Error Rate of the test.
Define the effect size \(\Delta\) for a hypothesis test for the population mean of a Gaussian population.
State how the power of a hypothesis test changes with:
- The sample size \(n\)
- The effect size \(\Delta\)
- The significance level \(\alpha\)
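
For the one-sample \(Z\) test of \(H_{0} : \mu = \mu_{0}\) with known \(\sigma\), the rejection region and the Type II Error Rate at a specified alternative \(\mu_{a}\) can be computed directly. When \(\mu = \mu_{a}\), the statistic \(Z\) is Gaussian with mean \((\mu_{a} - \mu_{0})/(\sigma/\sqrt{n})\) and standard deviation 1.

The sketch below is in Python (standard library only) for illustration. The function names are hypothetical, and the "greater" / "less" / "two.sided" labels are borrowed from R's naming convention.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Gaussian distribution


def rejects(z_obs, alpha, alternative):
    """True if the observed Z lies in the level-alpha rejection region."""
    if alternative == "greater":      # Ha: mu > mu0, reject for large Z
        return z_obs >= Z.inv_cdf(1 - alpha)
    if alternative == "less":         # Ha: mu < mu0, reject for small Z
        return z_obs <= Z.inv_cdf(alpha)
    if alternative == "two.sided":    # Ha: mu != mu0, alpha/2 in each tail
        return abs(z_obs) >= Z.inv_cdf(1 - alpha / 2)
    raise ValueError("unknown alternative: " + alternative)


def type_ii_error(mu0, mu_a, sigma, n, alpha, alternative):
    """beta = P(do not reject H0 | mu = mu_a) for the one-sample Z test.

    When mu = mu_a, Z = (Xbar - mu0) / (sigma / sqrt(n)) is Gaussian with
    mean `shift` below and standard deviation 1.
    """
    shift = (mu_a - mu0) / (sigma / sqrt(n))
    if alternative == "greater":
        return Z.cdf(Z.inv_cdf(1 - alpha) - shift)
    if alternative == "less":
        return 1 - Z.cdf(Z.inv_cdf(alpha) - shift)
    if alternative == "two.sided":
        c = Z.inv_cdf(1 - alpha / 2)
        return Z.cdf(c - shift) - Z.cdf(-c - shift)
    raise ValueError("unknown alternative: " + alternative)


def power(mu0, mu_a, sigma, n, alpha, alternative):
    """Power = 1 - beta."""
    return 1 - type_ii_error(mu0, mu_a, sigma, n, alpha, alternative)
```

Setting mu_a equal to mu0 gives \(\beta = 1 - \alpha\), which means the test rejects with probability exactly \(\alpha\) under \(H_{0}\). Any of the following lowers \(\beta\) and so raises the power \(1 - \beta\):
- increasing \(n\);
- moving \(\mu_{a}\) farther from \(\mu_{0}\); or
- increasing \(\alpha\).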

Section 9.2: Tests About a Population Mean

State and perform the procedure for testing a pair of hypotheses of the form \(H_{0} : \mu = \mu_{0}\) vs. \(H_{a} : \mu \neq \mu_{0}\) using a confidence interval.
State and perform the procedure for testing a pair of hypotheses of the form \(H_{0} : \mu \leq \mu_{0}\) vs. \(H_{a} : \mu > \mu_{0}\) using a confidence bound.
State and perform the procedure for testing a pair of hypotheses of the form \(H_{0} : \mu \geq \mu_{0}\) vs. \(H_{a} : \mu < \mu_{0}\) using a confidence bound.
State an appropriate test statistic to test a claim about the population mean of a Gaussian population with known population standard deviation.
State an appropriate test statistic to test a claim about the population mean of a Gaussian population with unknown population standard deviation.
Identify when the \(Z\) or \(T\) statistics are appropriate to test a claim about the mean of a population.
Perform a level \(\alpha\) hypothesis test for a claim about a population mean.
Use the pwr.norm.test function from the R package pwr to perform power calculations for hypothesis tests for the population mean of a Gaussian population.
- Find \(n\) given \(\mu_{0}, \mu_{a}, \sigma, \alpha,\) and \(1 - \beta\).
- Find \(1 - \beta\) given \(\mu_{0}, \mu_{a}, \sigma, \alpha,\) and \(n\).
Determine the distribution of the \(Z\)-statistic for a given hypothesis test for the population mean of a Gaussian population when \(\mu = \mu_{a} \neq \mu_{0}\).
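
The \(Z\) and \(T\) statistics, and the kind of power calculation pwr.norm.test performs, can be written out in a few lines. The following is a Python sketch (standard library only) with hypothetical function names. Its power and sample-size parts handle only \(H_{a} : \mu > \mu_{0}\).

As far as we understand pwr.norm.test, its argument d is the standardized difference \((\mu_{a} - \mu_{0})/\sigma\) used below; the package assumes unit variance. The closed form \(n = \lceil ((z_{\alpha} + z_{\beta})/d)^{2} \rceil\) applies to this one-sided test. pwr.norm.test solves for \(n\) numerically and may report a non-integer value, which you then round up. The \(T\) statistic is computed but not compared to a \(t_{n-1}\) critical value, since the standard library has no \(t\) quantile function.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, stdev

Z = NormalDist()  # standard Gaussian distribution


def z_statistic(xbar, mu0, sigma, n):
    """Z = (xbar - mu0) / (sigma / sqrt(n)), for known sigma."""
    return (xbar - mu0) / (sigma / sqrt(n))


def t_statistic(sample, mu0):
    """T = (xbar - mu0) / (s / sqrt(n)); compare with t_{n-1} critical values."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


def power_greater(mu0, mu_a, sigma, n, alpha):
    """1 - beta for Ha: mu > mu0, with d = (mu_a - mu0) / sigma."""
    d = (mu_a - mu0) / sigma
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - d * sqrt(n))


def n_for_power_greater(mu0, mu_a, sigma, alpha, target_power):
    """Smallest n with power_greater(...) >= target_power (closed form)."""
    d = (mu_a - mu0) / sigma
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(target_power)  # z_beta, where beta = 1 - target_power
    return ceil(((z_alpha + z_beta) / d) ** 2)
```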

Section 9.3: Tests Concerning a Population Proportion

State and perform the procedure for testing a pair of hypotheses of the form \(H_{0} : p = p_{0}\) vs. \(H_{a} : p \neq p_{0}\) using a confidence interval.
State and perform the procedure for testing a pair of hypotheses of the form \(H_{0} : p \leq p_{0}\) vs. \(H_{a} : p > p_{0}\) using a confidence bound.
State and perform the procedure for testing a pair of hypotheses of the form \(H_{0} : p \geq p_{0}\) vs. \(H_{a} : p < p_{0}\) using a confidence bound.
State an appropriate test statistic to test a claim about the success probability / population proportion from a binomial experiment.
Identify when the \(Z\) statistic is appropriate to test a claim about the success probability of a binomial experiment.
Perform a level \(\alpha\) hypothesis test for a claim about a binomial success probability or population proportion.
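
The large-sample test for a success probability standardizes \(\widehat{p} = X/n\) using the value \(p_{0}\) assumed under \(H_{0}\): \(Z = (\widehat{p} - p_{0})/\sqrt{p_{0}(1 - p_{0})/n}\). The Gaussian approximation is reasonable when \(n p_{0}\) and \(n(1 - p_{0})\) are both large. A common rule of thumb asks for at least 10 each, but use the condition stated in the course. The following is a Python sketch (standard library only, hypothetical names).

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Gaussian distribution


def prop_z_test(x, n, p0, alpha, alternative):
    """Large-sample Z test about a binomial success probability p.

    Returns (observed z, whether H0 is rejected at level alpha). The
    standard error uses p0, the value of p assumed under H0, not p_hat.
    """
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    if alternative == "greater":       # Ha: p > p0
        reject = z >= Z.inv_cdf(1 - alpha)
    elif alternative == "less":        # Ha: p < p0
        reject = z <= Z.inv_cdf(alpha)
    elif alternative == "two.sided":   # Ha: p != p0
        reject = abs(z) >= Z.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("unknown alternative: " + alternative)
    return z, reject
```

For example, 60 successes in 100 trials with \(p_{0} = 0.5\) give \(z = 2\). That rejects \(H_{0}\) against \(H_{a} : p > 0.5\) at \(\alpha = 0.05\), but not the two-sided test at \(\alpha = 0.01\).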

Section 9.4: \(P\)-values

Define the \(P\)-value for an observed test statistic.
Compute the \(P\)-value for the observed test statistic in a hypothesis test for:
- The population mean of a Gaussian population where \(\sigma\) is known.
- The population mean of a Gaussian population where \(\sigma\) is unknown.
- The success probability / population proportion for a binomial experiment.
Use a \(P\)-value to perform a hypothesis test at a significance level \(\alpha\).
(In the best of all possible worlds:) Avoid common mis-definitions and mis-interpretations of the \(P\)-value.
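
A \(P\)-value is a tail area under the null distribution of the test statistic. The tail is taken on the side(s) named by \(H_{a}\), starting at the observed value. The sketch below (Python, standard library only, hypothetical names) covers the two cases that use a standard Gaussian null distribution:
- the mean with known \(\sigma\); and
- the large-sample proportion test.

For the \(T\) statistic, the same tail areas come from the \(t\) distribution with \(n - 1\) degrees of freedom (for example via R's pt), which Python's standard library does not provide.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Gaussian distribution


def z_p_value(z_obs, alternative):
    """P-value of an observed Z statistic: the probability, computed under H0,
    of a statistic at least as extreme (in the direction of Ha) as z_obs."""
    if alternative == "greater":
        return 1 - Z.cdf(z_obs)             # P(Z >= z_obs)
    if alternative == "less":
        return Z.cdf(z_obs)                 # P(Z <= z_obs)
    if alternative == "two.sided":
        return 2 * (1 - Z.cdf(abs(z_obs)))  # P(|Z| >= |z_obs|)
    raise ValueError("unknown alternative: " + alternative)


def reject_h0(p_value, alpha):
    """Level-alpha decision: reject H0 exactly when the P-value is <= alpha."""
    return p_value <= alpha
```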

Chapter 10

Section 10.1: \(z\) Tests and Confidence Intervals for a Difference Between Two Population Means

Distinguish between inferential questions about a single population and about two populations.
Given a claim about two populations, identify relevant population parameters and state the claim as an equality / inequality involving the population parameters.
State a point estimator for the difference between two population means, given a sample from each population.
Determine the mean and variance of the point estimator \(D = \bar{X} - \bar{Y}\).
Compute the mean and variance of sums and differences of two or more independent random variables.
Determine the distribution of the sum or difference of two independent Gaussian random variables.
State the \(Z\)-statistic for a two sample test for the difference between two population means, including its sampling distribution under the null hypothesis when the populations are Gaussian.
Construct confidence intervals and upper/lower confidence bounds for the difference between two population means when the population standard deviations are known.
Perform a hypothesis test for a claim about the difference between two population means when the population standard deviations are known.
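
Because the two samples are independent, the variances add even though the means subtract: \(\mathrm{Var}(\bar{X} - \bar{Y}) = \sigma_{1}^{2}/m + \sigma_{2}^{2}/n\). This standard error drives both the two-sample \(Z\) statistic and the known-\(\sigma\) confidence interval. The following is a Python sketch (standard library only). The names, and the use of \(m\) and \(n\) for the two sample sizes, are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Gaussian distribution


def two_sample_se(sigma1, sigma2, m, n):
    """sqrt(Var(Xbar - Ybar)) = sqrt(sigma1^2/m + sigma2^2/n) for independent samples."""
    return sqrt(sigma1 ** 2 / m + sigma2 ** 2 / n)


def two_sample_z(xbar, ybar, sigma1, sigma2, m, n, delta0=0.0):
    """Z statistic for H0: mu1 - mu2 = delta0 with known sigmas."""
    return (xbar - ybar - delta0) / two_sample_se(sigma1, sigma2, m, n)


def two_sample_z_interval(xbar, ybar, sigma1, sigma2, m, n, c):
    """Two-sided level-c interval for mu1 - mu2 with known sigmas."""
    z_crit = Z.inv_cdf(1 - (1 - c) / 2)
    d = xbar - ybar
    half_width = z_crit * two_sample_se(sigma1, sigma2, m, n)
    return d - half_width, d + half_width
```

For example, take \(\bar{x} = 52\), \(\bar{y} = 50\), \(\sigma_{1} = 3\), \(\sigma_{2} = 4\), \(m = 9\), and \(n = 16\). The standard error is \(\sqrt{2}\), so \(z = \sqrt{2}\). The 95% interval contains 0, consistent with not rejecting \(H_{0} : \mu_{1} = \mu_{2}\) at \(\alpha = 0.05\).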

Section 10.2: The Two-Sample \(t\) Test and Confidence Interval

State the \(T\)-statistic for a two sample test for the difference between two population means, including its sampling distribution under the null hypothesis when the populations are Gaussian.
Construct confidence intervals and upper/lower confidence bounds for the difference between two population means when the population standard deviations are unknown.
Perform a hypothesis test for a claim about the difference between two population means when the population standard deviations are unknown.
Use the R function welch.two.sample.t to perform a two sample \(t\)-test using summarized data.
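
From summarized data, the Welch \(T\) statistic replaces \(\sigma_{1}, \sigma_{2}\) by the sample standard deviations \(s_{1}, s_{2}\). Its approximate degrees of freedom come from the Welch-Satterthwaite formula, which is often rounded down when using a \(t\) table. The arguments of the course's welch.two.sample.t function are not given here, so the Python sketch below (standard library only, hypothetical names) does not try to mimic it. It only shows the two quantities such a function would need to compute.

```python
from math import sqrt


def welch_t(xbar, ybar, s1, s2, m, n, delta0=0.0):
    """Welch T statistic for H0: mu1 - mu2 = delta0 and its approximate
    (Welch-Satterthwaite) degrees of freedom, from summary statistics.

    xbar, s1, m: sample mean, sample SD, and size of the first sample;
    ybar, s2, n: the same for the second sample.
    """
    v1 = s1 ** 2 / m  # estimated Var(Xbar)
    v2 = s2 ** 2 / n  # estimated Var(Ybar)
    t = (xbar - ybar - delta0) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (m - 1) + v2 ** 2 / (n - 1))
    return t, df
```

With equal sample sizes and equal sample standard deviations, the degrees of freedom reduce to \(m + n - 2\). In general they fall between \(\min(m, n) - 1\) and \(m + n - 2\).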

Chapter 12

Section 12.1: The Simple Linear Regression Model

Define a response and a predictor in the context of predicting one variable using another.
Distinguish between a response \(y\) and a prediction \(\widehat{y}(x) = f(x)\) of the response.
Give the form of the prediction model used in a simple linear regression.
Recognize the terminology “performing a regression” or “fitting a regression” for determining the parameters of a regression model using data.

Section 12.2: Estimating Model Parameters

Fit a simple linear regression model to data using R’s lm function.
Recognize the slope and intercept of a simple linear regression model, and interpret the values of a slope and intercept in the context of a particular problem.
Use a regression model to predict a response given a value of the predictor.
Define the residual (error) of a prediction in a simple linear regression.
Explain why the distribution of the residuals provides information about how well a regression function predicts a response.
Define the residual standard error (standard error of prediction).
Define the median absolute error of prediction, and interpret the median absolute error in the context of a particular problem.
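
The least-squares slope and intercept that R's lm(y ~ x) reports can be computed by hand. The same is true of the residuals and the two prediction-error summaries above. The following is a Python sketch (standard library only, hypothetical names). The residual standard error uses \(n - 2\) in the denominator, as in the summary of an lm fit.

```python
from math import sqrt
from statistics import mean, median


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y_hat(x) = b0 + b1 * x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def residuals(x, y, b0, b1):
    """Residual = observed response - predicted response."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def residual_standard_error(res):
    """sqrt(SSE / (n - 2))."""
    return sqrt(sum(r ** 2 for r in res) / (len(res) - 2))


def median_absolute_error(res):
    """Median of |residual|: a typical size of a prediction error."""
    return median(abs(r) for r in res)
```

For example, with x = 1, ..., 5 and y = 1, 3, 2, 5, 4, the fitted line is \(\widehat{y}(x) = 0.6 + 0.8x\). The residuals are -0.4, 0.8, -1.0, 1.2, and -0.6, so the median absolute error is 0.8.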

Section 12.5: Correlation

State examples of paired data.
Construct a scatter plot by hand for a (small) data set of paired data.
Construct a scatter plot in R using plot.
Define the sample covariance, and explain why it quantifies linear association between two outcomes.
Define the sample correlation, and relate the sample correlation to the sample covariance.
State the main properties of the sample correlation in terms of its range, its invariance (up to a change of sign) to affine transformations of the data, and its symmetry in \(X\) and \(Y\).
Define the population covariance and population correlation.
State the assumptions made on the population used in constructing the hypothesis tests and confidence intervals presented in this section for the population correlation.
State the null and alternative hypotheses for a claim about a population correlation.
State a test statistic for testing a claim about a population correlation, including its sampling distribution under the null hypothesis when the population is bivariate Gaussian.
Test a claim about a population correlation by either constructing a rejection region or computing a \(P\)-value for an appropriate test statistic.
Use cor.test and interpret its output to perform hypothesis tests or construct (approximate) confidence intervals / bounds for a population correlation.
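
The following is a Python sketch (standard library only, hypothetical names) of three quantities from this section:
- the sample covariance;
- the sample correlation \(r = s_{xy}/(s_{x} s_{y})\); and
- the statistic \(T = r\sqrt{n - 2}/\sqrt{1 - r^{2}}\).

Under \(H_{0} : \rho = 0\) and a bivariate Gaussian population, \(T\) has a \(t\) distribution with \(n - 2\) degrees of freedom. This is the statistic cor.test reports for its default Pearson method. The tests below also check symmetry in \(X\) and \(Y\) and invariance up to sign under affine transformations.

```python
from math import sqrt
from statistics import mean, stdev


def sample_cov(x, y):
    """s_xy = sum((x_i - xbar) * (y_i - ybar)) / (n - 1)."""
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)


def sample_cor(x, y):
    """r = s_xy / (s_x * s_y), always between -1 and 1."""
    return sample_cov(x, y) / (stdev(x) * stdev(y))


def cor_t_statistic(r, n):
    """T = r * sqrt(n - 2) / sqrt(1 - r^2); t with n - 2 df under H0: rho = 0
    when the population is bivariate Gaussian."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)
```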