Study Guide for Final Exam
This will be a closed-book final exam.
You are permitted one 8.5 inch by 11 inch sheet of handwritten notes (both sides) for the final exam.
You will need a calculator. A calculator that performs arithmetic operations, reciprocals, square roots, powers and logarithms (base 10) is sufficient. Graphing calculators are permitted.
You will have 2 hours to complete the exam. The exam will start promptly at 11:35 AM, so please arrive on time.
There will be 7 questions on the exam: 3 modeled after problems from the previous exams, and 4 from the material covered since Exam 2.
To do well on the new material on the exam, you should be able to do the following:
Chapter 8
Section 8.1: Basic Properties of Confidence Intervals
- Give the correct interpretation of the confidence level \(c\) associated with an interval estimator, and explain what is wrong with some common misinterpretations of confidence intervals.
- Explain how the width of a confidence interval for the population mean of a Gaussian population with known standard deviation \(\sigma\) varies as you change:
- the confidence level \(c\);
- the sample size \(n\); or
- the population standard deviation \(\sigma\).
- Construct confidence level \(c\) confidence upper bounds and lower bounds for a population mean \(\mu\) for a Gaussian population with both known and unknown population standard deviation \(\sigma\).
- Sketch confidence level \(c\) upper bounds and lower bounds for a population mean \(\mu\) for a Gaussian population with both known and unknown population standard deviation \(\sigma\).
- Relate the form of confidence upper / lower bounds for a population mean to the endpoints of the two-sided confidence interval for the same mean.
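For checking hand calculations, the known-\(\sigma\) interval and bounds can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not part of the course materials: the function names are made up for illustration, and it covers only the known-\(\sigma\) case (the unknown-\(\sigma\) case swaps in \(s\) and a \(t\) critical value).

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, c):
    """Two-sided confidence level c interval for mu (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - c) / 2)  # z_{alpha/2}, alpha = 1 - c
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def z_upper_bound(xbar, sigma, n, c):
    """Confidence level c upper bound for mu (sigma known)."""
    z = NormalDist().inv_cdf(c)  # z_alpha: all of alpha sits in one tail
    return xbar + z * sigma / sqrt(n)

def z_lower_bound(xbar, sigma, n, c):
    """Confidence level c lower bound for mu (sigma known)."""
    z = NormalDist().inv_cdf(c)
    return xbar - z * sigma / sqrt(n)

# Hypothetical numbers: the width grows with c and sigma, shrinks with n.
for c in (0.90, 0.95, 0.99):
    lo, hi = z_interval(10.0, 2.0, 25, c)
    print(c, round(hi - lo, 4))
```

Note that the level \(c\) upper bound uses \(z_{\alpha}\) rather than \(z_{\alpha/2}\), so it coincides with the upper endpoint of the two-sided interval at level \(2c - 1\).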
Section 8.2: Large Sample Confidence Intervals for a Population Mean and Proportion
- Identify when a ‘word problem’ asks for a confidence interval for a population proportion and/or a success probability from a binomial experiment.
- State the standardization of the success proportion \(\widehat{p}_{n} = X/n\) used in constructing the Agresti-Coull confidence interval, and explain why this standardized variable approximately follows a standard Gaussian distribution.
- Use the binom.agresti.coull function in the R package binom to compute the Agresti-Coull confidence interval for a population proportion.
- State how the confidence level passed to binom.agresti.coull must be modified to construct confidence upper / lower bounds.
- Sketch confidence level \(c\) confidence intervals, confidence upper bounds, and confidence lower bounds for a population proportion based on the Agresti-Coull confidence interval.
- Distinguish between ‘word problems’ that call for the use of:
- \(\bar{x}_{n} \pm z_{\alpha/2} \cdot \sigma / \sqrt{n}\);
- \(\bar{x}_{n} \pm t_{\alpha/2, n-1} \cdot s / \sqrt{n}\);
- the Agresti-Coull confidence interval; or
- none of the above
or their upper / lower bound equivalents.
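The Agresti-Coull interval can also be reproduced by hand, which helps when checking output from binom.agresti.coull. The sketch below uses the standard form of the interval (add \(z^{2}/2\) successes and \(z^{2}\) trials, then apply the usual Gaussian interval). The function names are hypothetical, and the course's derivation may present the formula differently.

```python
from math import sqrt
from statistics import NormalDist

def agresti_coull(x, n, c):
    """Two-sided confidence level c Agresti-Coull interval for p."""
    z = NormalDist().inv_cdf(1 - (1 - c) / 2)
    n_tilde = n + z**2                   # adjusted number of trials
    p_tilde = (x + z**2 / 2) / n_tilde   # adjusted success proportion
    half_width = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half_width, p_tilde + half_width

def agresti_coull_upper_bound(x, n, c):
    """Level c upper bound: upper endpoint of the level 2c - 1 interval."""
    return agresti_coull(x, n, 2 * c - 1)[1]

def agresti_coull_lower_bound(x, n, c):
    """Level c lower bound: lower endpoint of the level 2c - 1 interval."""
    return agresti_coull(x, n, 2 * c - 1)[0]

# Hypothetical data: 10 successes in 20 trials.
print(agresti_coull(10, 20, 0.95))
print(agresti_coull_upper_bound(10, 20, 0.95))
```

The bound functions show the level adjustment directly: a level 0.95 bound comes from a two-sided interval computed at level 0.90.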
Chapter 9
Section 9.1: Tests of a Hypothesis Based on a Single Sample
- Define a statistical hypothesis test.
- State what types of quantities are of interest in statistical hypothesis tests.
- Given a claim stated in words, determine a relevant population parameter related to the claim, and write the claim as an equality / inequality in terms of that population parameter.
- Explain what the null and alternative hypotheses correspond to in a hypothesis test.
- Given a claim stated in words, determine whether the claim is a null hypothesis or an alternative hypothesis, and determine the complementary claim.
- Explain the main approach of a hypothesis test as finding evidence either for or against each of the null hypothesis and the alternative hypothesis.
- Define the Type I Error Rate of a hypothesis test.
- Define the significance level (‘level’) of a hypothesis test.
- Relate the Type I Error Rate of a hypothesis test to the confidence level of an interval estimator used to test the hypothesis.
- Identify the two types of errors that we can make while performing a statistical hypothesis test with regard to rejecting / not rejecting the null hypothesis.
- Give the standard names for the two types of errors from the previous learning objective.
- Define the Type I and Type II error rates of a hypothesis testing procedure.
- State which of the two error rates we control by fixing \(\alpha\) for a hypothesis test.
- Explain the analogy between Type I and Type II errors and convictions in the US criminal justice system.
- Define test statistic.
- Explain how the three test statistics from Sections 9.2 and 9.3 measure the discrepancy between the null hypothesis and the data.
- Distinguish between a test statistic, which gives the procedure for testing a statistical hypothesis, and the observed test statistic, which is the application of that procedure to a particular sample.
- Define the rejection region of a statistical hypothesis test.
- Explain the rationale behind the procedure for constructing the rejection region for a hypothesis test of significance level \(\alpha\).
- Construct rejection regions for one-sided and two-sided alternative hypotheses.
- State and follow the “hypothesis testing recipe” given a claim about a population to test.
- Define the Type II Error Rate for a hypothesis test.
- Recognize and use the convention of denoting the Type I Error Rate by \(\alpha\) and the Type II Error Rate by \(\beta\).
- For a given hypothesis test and a specified alternative value, compute the Type I and Type II Error Rates for the hypothesis test.
- Define the power of a hypothesis test, and relate the power to the Type II Error Rate of the test.
- Define the effect size \(\Delta\) for a hypothesis test for the population mean of a Gaussian population.
- State how the power of a hypothesis test changes with:
- The sample size \(n\)
- The effect size \(\Delta\)
- The significance level \(\alpha\)
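The dependence of power on \(n\), \(\Delta\), and \(\alpha\) can be explored numerically. The sketch below assumes an upper-tailed \(Z\) test (\(H_{0} : \mu \leq \mu_{0}\) vs. \(H_{a} : \mu > \mu_{0}\)) with known \(\sigma\), and takes the effect size to be \(\Delta = (\mu_{a} - \mu_{0})/\sigma\). Both choices are assumptions made for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tailed(mu0, mua, sigma, n, alpha):
    """Power of the level alpha upper-tailed Z test when mu = mua."""
    delta = (mua - mu0) / sigma               # effect size (assumed definition)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # reject H0 when Z >= z_alpha
    # When mu = mua, Z is Gaussian with mean delta * sqrt(n) and variance 1.
    beta = NormalDist().cdf(z_alpha - delta * sqrt(n))  # Type II Error Rate
    return 1 - beta

# Hypothetical numbers: power rises with n for fixed delta and alpha.
for n in (10, 25, 50):
    print(n, round(power_upper_tailed(100, 101, 2, n, 0.05), 4))
```

Setting \(\mu_{a} = \mu_{0}\) returns \(\alpha\) itself, which is a quick sanity check on the rejection region.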
Section 9.2: Tests About a Population Mean
- State and perform the procedure for testing a pair of hypotheses of the form \(H_{0} : \mu = \mu_{0}\) vs. \(H_{a} : \mu \neq \mu_{0}\) using a confidence interval.
- State and perform the procedure for testing a pair of hypotheses of the form \(H_{0} : \mu \leq \mu_{0}\) vs. \(H_{a} : \mu > \mu_{0}\) using a confidence bound.
- State and perform the procedure for testing a pair of hypotheses of the form \(H_{0} : \mu \geq \mu_{0}\) vs. \(H_{a} : \mu < \mu_{0}\) using a confidence bound.
- State an appropriate test statistic to test a claim about the population mean of a Gaussian population with known population standard deviation.
- State an appropriate test statistic to test a claim about the population mean of a Gaussian population with unknown population standard deviation.
- Identify when the \(Z\) or \(T\) statistics are appropriate to test a claim about the mean of a population.
- Perform a level \(\alpha\) hypothesis test for a claim about a population mean.
- Use the pwr.norm.test function from the R package pwr to perform power calculations for hypothesis tests for the population mean of a Gaussian population.
- Find \(n\) given \(\mu_{0}, \mu_{a}, \sigma, \alpha,\) and \(1 - \beta\).
- Find \(1 - \beta\) given \(\mu_{0}, \mu_{a}, \sigma, \alpha,\) and \(n\).
- Determine the distribution of the \(Z\)-statistic for a given hypothesis test for the population mean of a Gaussian population when \(\mu = \mu_{a} \neq \mu_{0}\).
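The sample size calculation can be done by hand as a check on software output. This sketch again assumes a one-sided \(Z\) test with known \(\sigma\). It uses the fact that, when \(\mu = \mu_{a}\), the \(Z\)-statistic is Gaussian with mean \((\mu_{a} - \mu_{0})/(\sigma/\sqrt{n})\) and variance 1. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_mean_under_alternative(mu0, mua, sigma, n):
    """Mean of the Z-statistic when the true mean is mua (variance stays 1)."""
    return (mua - mu0) / (sigma / sqrt(n))

def sample_size_one_sided(mu0, mua, sigma, alpha, power):
    """Smallest n giving at least the requested power for a one-sided Z test."""
    delta = abs(mua - mu0) / sigma
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)  # power = 1 - beta
    return ceil(((z_alpha + z_beta) / delta) ** 2)

# Hypothetical numbers: detect a shift of half a standard deviation.
print(sample_size_one_sided(mu0=0.0, mua=0.5, sigma=1.0, alpha=0.05, power=0.80))
```

The result is rounded up because \(n\) must be a whole number and rounding down would fall short of the target power.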
Section 9.3: Tests Concerning a Population Proportion
- State and perform the procedure for testing a pair of hypotheses of the form \(H_{0} : p = p_{0}\) vs. \(H_{a} : p \neq p_{0}\) using a confidence interval.
- State and perform the procedure for testing a pair of hypotheses of the form \(H_{0} : p \leq p_{0}\) vs. \(H_{a} : p > p_{0}\) using a confidence bound.
- State and perform the procedure for testing a pair of hypotheses of the form \(H_{0} : p \geq p_{0}\) vs. \(H_{a} : p < p_{0}\) using a confidence bound.
- State an appropriate test statistic to test a claim about the success probability / population proportion from a binomial experiment.
- Identify when the \(Z\) statistic is appropriate to test a claim about the success probability of a binomial experiment.
- Perform a level \(\alpha\) hypothesis test for a claim about a binomial success probability or population proportion.
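As a worked check for this section, the \(Z\)-statistic for a proportion standardizes \(\widehat{p}_{n}\) using the null value \(p_{0}\) in the standard error. The sketch below shows an upper-tailed test only; the function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prop_z_statistic(x, n, p0):
    """Observed Z-statistic for a claim about p, standardized under p = p0."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def reject_upper_tailed(x, n, p0, alpha):
    """Level alpha test of H0: p <= p0 vs. Ha: p > p0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return prop_z_statistic(x, n, p0) >= z_alpha

# Hypothetical data: 60 successes in 100 trials, testing against p0 = 0.5.
print(prop_z_statistic(60, 100, 0.5))
print(reject_upper_tailed(60, 100, 0.5, 0.05))
```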
Section 9.4: \(P\)-values
- Define the \(P\)-value for an observed test statistic.
- Compute the \(P\)-value for the observed test statistic in a hypothesis test for:
- The population mean of a Gaussian population where \(\sigma\) is known.
- The population mean of a Gaussian population where \(\sigma\) is unknown.
- The success probability / population proportion for a binomial experiment.
- Use a \(P\)-value to perform a hypothesis test at a significance level \(\alpha\).
- (In the best of all possible worlds:) Avoid common mis-definitions and mis-interpretations of the \(P\)-value.
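For test statistics that are standard Gaussian under the null hypothesis (the known-\(\sigma\) mean and the proportion cases), the \(P\)-value is a tail area that depends on the direction of the alternative. A minimal sketch follows. The `alternative` labels are chosen for illustration, and the unknown-\(\sigma\) case would use a \(t\) distribution instead, which the Python standard library does not provide.

```python
from statistics import NormalDist

def z_p_value(z_obs, alternative):
    """P-value for an observed Z-statistic under a standard Gaussian null."""
    phi = NormalDist().cdf
    if alternative == "greater":      # Ha: parameter > null value
        return 1 - phi(z_obs)
    if alternative == "less":         # Ha: parameter < null value
        return phi(z_obs)
    if alternative == "two-sided":    # Ha: parameter != null value
        return 2 * (1 - phi(abs(z_obs)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

def reject(p_value, alpha):
    """Reject H0 at level alpha exactly when the P-value is at most alpha."""
    return p_value <= alpha

print(z_p_value(2.0, "greater"), z_p_value(2.0, "two-sided"))
```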
Chapter 10
Section 10.1: \(z\) Tests and Confidence Intervals for a Difference Between Two Population Means
- Distinguish between inferential questions about a single population and about two populations.
- Given a claim about two populations, identify relevant population parameters and state the claim as an equality / inequality involving the population parameters.
- State a point estimator for the difference between two population means, given a sample from each population.
- Determine the mean and variance of the point estimator \(D = \bar{X} - \bar{Y}\).
- Compute the mean and variance of sums and differences of two or more independent random variables.
- Determine the distribution of the sum or difference of two independent Gaussian random variables.
- State the \(Z\)-statistic for a two sample test for the difference between two population means, including its sampling distribution under the null hypothesis when the populations are Gaussian.
- Construct confidence intervals and upper/lower confidence bounds for the difference between two population means when the population standard deviations are known.
- Perform a hypothesis test for a claim about the difference between two population means when the population standard deviations are known.
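The two-sample \(Z\) procedures follow from the variance of \(D = \bar{X} - \bar{Y}\), which for independent samples is \(\sigma_{X}^{2}/m + \sigma_{Y}^{2}/n\). The sketch below assumes sample sizes \(m\) and \(n\) and a null difference \(\Delta_{0}\). That notation and the function names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar, ybar, sigma_x, sigma_y, m, n, delta0=0.0):
    """Z-statistic for a claim about mu_X - mu_Y with known sigmas."""
    se = sqrt(sigma_x**2 / m + sigma_y**2 / n)  # standard deviation of D
    return (xbar - ybar - delta0) / se

def two_sample_z_interval(xbar, ybar, sigma_x, sigma_y, m, n, c):
    """Two-sided confidence level c interval for mu_X - mu_Y."""
    z = NormalDist().inv_cdf(1 - (1 - c) / 2)
    se = sqrt(sigma_x**2 / m + sigma_y**2 / n)
    d = xbar - ybar
    return d - z * se, d + z * se

# Hypothetical summary statistics.
print(two_sample_z(10.0, 8.0, 3.0, 4.0, 9, 16))
print(two_sample_z_interval(10.0, 8.0, 3.0, 4.0, 9, 16, 0.95))
```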
Section 10.2: The Two-Sample \(t\) Test and Confidence Interval
- State the \(T\)-statistic for a two sample test for the difference between two population means, including its sampling distribution under the null hypothesis when the populations are Gaussian.
- Construct confidence intervals and upper/lower confidence bounds for the difference between two population means when the population standard deviations are unknown.
- Perform a hypothesis test for a claim about the difference between two population means when the population standard deviations are unknown.
- Use the R function welch.two.sample.t to perform a two sample \(t\)-test using summarized data.
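When only summary statistics are available, the Welch \(T\)-statistic and its approximate degrees of freedom can be computed directly. The sketch below uses the standard Welch-Satterthwaite formula. It is not the course's welch.two.sample.t function, whose interface is not reproduced here, and some textbooks round the degrees of freedom down to an integer while this sketch leaves them fractional.

```python
from math import sqrt

def welch_t(xbar, ybar, s_x, s_y, m, n, delta0=0.0):
    """Welch T-statistic and approximate degrees of freedom from summaries."""
    a = s_x**2 / m  # estimated variance of the first sample mean
    b = s_y**2 / n  # estimated variance of the second sample mean
    t = (xbar - ybar - delta0) / sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1))
    return t, df

# Hypothetical summary statistics.
t_obs, df = welch_t(xbar=10.0, ybar=8.0, s_x=2.0, s_y=2.0, m=10, n=10)
print(t_obs, df)
```

With equal sample sizes and equal sample standard deviations the degrees of freedom reduce to \(2(n - 1)\), which gives a quick check on the formula.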
Chapter 12
Section 12.1: The Simple Linear Regression Model
- Define a response and a predictor in the context of predicting one variable using another.
- Distinguish between a response \(y\) and a prediction \(\widehat{y}(x) = f(x)\) of the response.
- Give the form of the prediction model used in a simple linear regression.
- Recognize the terminology “performing a regression” or “fitting a regression” for determining the parameters of a regression model using data.
Section 12.2: Estimating Model Parameters
- Fit a simple linear regression model to data using R’s lm function.
- Recognize the slope and intercept of a simple linear regression model, and interpret the values of a slope and intercept in the context of a particular problem.
- Use a regression model to predict a response given a value of the predictor.
- Define the residual (error) of a prediction in a simple linear regression.
- Explain why the distribution of the residuals provides information about how well a regression function predicts a response.
- Define the residual standard error (standard error of prediction).
- Define the median absolute error of prediction, and interpret the median absolute error in the context of a particular problem.
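The quantities in this section can be computed by hand for a small data set, which is a useful check on lm output. The sketch below uses the usual least squares formulas. It assumes the residual standard error divides the sum of squared residuals by \(n - 2\), and that the median absolute error is the median of the absolute residuals. The data and function names are made up.

```python
from math import sqrt
from statistics import mean, median

def fit_line(x, y):
    """Least squares intercept and slope for the model y-hat = b0 + b1 * x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

def residuals(x, y, intercept, slope):
    """Residual = observed response minus predicted response."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def residual_standard_error(res):
    """sqrt(SSE / (n - 2)): two parameters were estimated (assumed definition)."""
    return sqrt(sum(r**2 for r in res) / (len(res) - 2))

def median_absolute_error(res):
    """Median of the absolute residuals (assumed definition)."""
    return median(abs(r) for r in res)

# Hypothetical paired data.
x = [1, 2, 3, 4]
y = [2, 4, 5, 7]
b0, b1 = fit_line(x, y)
res = residuals(x, y, b0, b1)
print(b0, b1, residual_standard_error(res), median_absolute_error(res))
```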
Section 12.5: Correlation
- State examples of paired data.
- Construct a scatter plot by hand for a (small) data set of paired data.
- Construct a scatter plot in R using plot.
- Define the sample covariance, and explain why it quantifies linear association between two outcomes.
- Define the sample correlation, and relate the sample correlation to the sample covariance.
- State the main properties of the sample correlation in terms of its range, invariance to affine transformations of the data, and symmetry in \(X\) and \(Y\).
- Define the population covariance and population correlation.
- State the assumptions made on the population used in constructing the hypothesis tests and confidence intervals presented in this section for the population correlation.
- State the null and alternative hypotheses for a claim about a population correlation.
- State a test statistic for testing a claim about a population correlation, including its sampling distribution under the null hypothesis when the population is bivariate Gaussian.
- Test a claim about a population correlation by either constructing a rejection region or computing a \(P\)-value for an appropriate test statistic.
- Use cor.test and interpret its output to perform hypothesis tests or construct (approximate) confidence intervals / bounds for a population correlation.
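The sample covariance, sample correlation, and the test statistic for \(H_{0} : \rho = 0\) can be computed by hand for small data sets. The sketch below assumes the \(n - 1\) divisor for the sample covariance and the statistic \(T = R\sqrt{n-2}/\sqrt{1-R^{2}}\), which follows a \(t\) distribution with \(n - 2\) degrees of freedom under the null hypothesis when the population is bivariate Gaussian. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean

def sample_covariance(x, y):
    """Sample covariance with the n - 1 divisor."""
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)

def sample_correlation(x, y):
    """Sample covariance divided by the product of sample standard deviations."""
    return sample_covariance(x, y) / sqrt(sample_covariance(x, x) * sample_covariance(y, y))

def correlation_t_statistic(x, y):
    """Observed T-statistic and degrees of freedom for H0: rho = 0."""
    r = sample_correlation(x, y)
    n = len(x)
    return r * sqrt(n - 2) / sqrt(1 - r**2), n - 2

# Hypothetical paired data.
x = [1, 2, 3, 4]
y = [2, 4, 5, 7]
print(sample_covariance(x, y), sample_correlation(x, y), correlation_t_statistic(x, y))
```

Rescaling either variable by a positive affine transformation, or swapping the roles of \(X\) and \(Y\), leaves the sample correlation unchanged, which can be confirmed by rerunning the functions on transformed data.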