Study Guide for Final Exam

This will be a closed-book final exam.

You are permitted one 8.5 inch by 11 inch sheet of handwritten notes (both sides) for the final exam.

You will need a calculator. A calculator that performs arithmetic operations, reciprocals, square roots, powers and logarithms (base 10) is sufficient. Graphing calculators are permitted.

You will have 2 hours to complete the exam. The exam will start promptly at 11:35 AM, so please arrive on time.

There will be 7 questions on the exam: 4 modeled after problems from the previous exams, and 3 from the material covered since Exam 3.

To do well on the new material on the exam, you should be able to do the following:

Chapter 5

Section 5.7: Assessing Normality

  1. Explain why, when the sample size is less than 30, we should investigate whether the values in a sample are approximately normally distributed before using any of the inferential procedures we developed for population means.
  2. Follow the three-step procedure for normality testing.
  3. Explain what a normal probability plot should look like when the population is: (a) normally distributed and (b) not normally distributed.
  4. State the names of at least two test statistics for testing the normality of the density histogram of a population.
  5. State the null and alternative hypotheses for a normality test.
  6. Interpret a \(P\)-value from a normality test in terms of a claim about whether the population is normally distributed.
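
To make concrete what a normal probability plot compares, here is a minimal Python sketch that pairs each sorted sample value with the standard normal z-score expected at its rank. The sample values are made up for illustration, and the plotting positions \((i - 0.5)/n\) are an assumed convention; Minitab may use a slightly different one.

```python
from statistics import NormalDist

def normal_plot_points(sample):
    """Return (data value, expected z-score) pairs for a normal probability plot.

    Assumes plotting positions (i - 0.5) / n for the i-th smallest value;
    other software may use a slightly different convention.
    """
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (value, std_normal.inv_cdf((i - 0.5) / n))
        for i, value in enumerate(ordered, start=1)
    ]

# Hypothetical sample, for illustration only.
points = normal_plot_points([4.1, 5.0, 3.2, 6.3, 4.8])
for value, z in points:
    print(f"{value:5.1f}  {z:6.3f}")
```

When the population is approximately normal, these points should fall close to a straight line; systematic curvature suggests that it is not.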

Chapter 8

Important: For all of the hypothesis tests in this chapter, you do not need to know the formulas for the test statistics, but you should be able to state the name of the appropriate test statistic for a given type of hypothesis test, and determine its sampling distribution under the null hypothesis.

Section 8.2: Inferences About Two Proportions

  1. State scientific or medical claims that could be tested using a hypothesis test for the difference in two population proportions.
  2. Given a claim about two population proportions, determine the appropriate null and alternative hypotheses for testing that claim.
  3. Distinguish between when a right-tailed, left-tailed, or two-tailed test is appropriate for testing a claim about two population proportions.
  4. State when the normal approximation-based hypothesis test for the difference between two population proportions is appropriate.
  5. Given a ‘word problem’ about two population proportions and their sample estimates, identify the number of ‘successes’ in each sample, and the number of ‘trials’ in each sample.
  6. Use Minitab to perform a hypothesis test for the difference between two population proportions.
  7. Interpret a printout from Minitab’s Two-Sample Proportion procedure, and use such a printout to test a claim about the difference between two population proportions.
  8. Given a printout from Minitab’s Two-Sample Proportion procedure, determine the appropriate \(Z\) statistic from the \(P\)-value and vice versa.
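
The conversion between a \(Z\) statistic and its \(P\)-value in item 8 can be sketched in Python using the standard normal distribution. The function names below are hypothetical, and the example values are chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def p_value_from_z(z, tail="two"):
    """P-value for a Z statistic; tail is 'left', 'right', or 'two'."""
    if tail == "left":
        return STD_NORMAL.cdf(z)
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    if tail == "two":
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

def abs_z_from_two_tailed_p(p):
    """Magnitude of Z for a two-tailed P-value p (the sign is not recoverable)."""
    return STD_NORMAL.inv_cdf(1 - p / 2)

print(round(p_value_from_z(1.96), 3))           # two-tailed P-value for Z = 1.96
print(round(abs_z_from_two_tailed_p(0.05), 2))  # |Z| for a two-tailed P-value of 0.05
```

Note that a two-tailed \(P\)-value determines only the magnitude of \(Z\); the sign has to come from the direction of the sample difference.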

Section 8.3: Inferences About Two Means: Independent Samples

  1. Give example claims in science, medicine, and nutrition involving two population means.
  2. State a claim about two population means using an equality / inequality. For example, \(\mu_{1} > \mu_{2}\), \(\mu_{1} \neq \mu_{2}\), etc.
  3. State the null and alternative hypotheses resulting from a claim about two population means both in terms of \(\mu_{1}\) and \(\mu_{2}\), and in terms of the difference parameter \(\delta = \mu_{1} - \mu_{2}\).
  4. State the conditions under which the two-sample \(T\)-test with independent samples is appropriate for testing a claim about two population means using samples from those populations.
  5. State the test statistic for the two-sample \(T\)-test with independent samples.
  6. Use Minitab to perform a two-sample \(T\)-test with independent samples.
  7. Interpret the output of Minitab’s two-sample \(T\)-test with independent samples as it relates to a claim about two population means.

Section 8.4: Inferences from Matched Pairs

  1. Explain what it means to have ‘matched pairs’ in a study with two samples.
  2. Explain why matched pairs violate one of the assumptions of the 2-sample \(T\)-test with independent samples.
  3. State claims about population means for matched pairs in terms of the individual means and the difference in the means.
  4. Give examples of studies in nutrition, exercise, and medicine that might use a matched pair design.
  5. State the test statistic for a paired \(T\)-test, and give its sampling distribution under the null hypothesis.
  6. Explain, roughly, why the 2-sample \(T\)-test with independent samples and the paired \(T\)-test might result in different conclusions using the same data.
  7. Determine, based on the design of a study, whether a 2-sample \(T\)-test with independent samples or a paired \(T\)-test would be most appropriate for testing a claim about the population means.
  8. Interpret Minitab’s output from its paired \(T\)-test routine.
  9. Given summary statistics about the mean and standard deviation of difference scores from matched samples, use the appropriate \(T\)-statistic to test a claim about the population means.
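
For item 9, the computation from summary statistics can be sketched as follows. This is a minimal Python illustration, assuming the usual paired statistic \(T = (\bar{d} - \delta_{0}) / (s_{d} / \sqrt{n})\) with \(n - 1\) degrees of freedom; the function name and the summary values are hypothetical.

```python
import math

def paired_t_statistic(mean_diff, sd_diff, n, hypothesized_mean_diff=0.0):
    """Return (T statistic, degrees of freedom) for a paired T-test.

    mean_diff and sd_diff are the sample mean and standard deviation
    of the n difference scores.
    """
    standard_error = sd_diff / math.sqrt(n)
    t = (mean_diff - hypothesized_mean_diff) / standard_error
    return t, n - 1

# Hypothetical summary statistics, for illustration only.
t_stat, df = paired_t_statistic(mean_diff=2.5, sd_diff=4.0, n=16)
print(f"T = {t_stat:.2f} with {df} degrees of freedom")
```

The resulting \(T\) value is then compared with a critical value from the \(t\) table with the stated degrees of freedom, or converted to a \(P\)-value by software.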

Chapter 9

Important: For all of the hypothesis tests in this chapter, you do not need to know the formulas for the test statistics, but you should be able to state the name of the appropriate test statistic for a given type of hypothesis test, and determine its sampling distribution under the null hypothesis.

Section 9.2: Correlation

  1. State what it means, loosely, for two quantitative variables to be associated.
  2. Interpret a scatter plot of two quantitative variables.
  3. State the trend of two quantitative variables given a scatter plot.
  4. State the strength of the trend of two quantitative variables given a scatter plot.
  5. Recognize the names linear correlation coefficient, Pearson correlation coefficient, and ‘the’ correlation coefficient as synonyms.
  6. Identify the notation used for the sample correlation \(r\) and the population correlation \(\rho\) (the lowercase Greek letter rho).
  7. State the range of values that the sample correlation \(r\) and the population correlation \(\rho\) can take, and what those values correspond to in terms of the presence / absence of a linear trend in the sample data.
  8. State when the sampling distribution used for hypothesis testing and confidence intervals is appropriate for a given data set.
  9. State the null and alternative hypotheses for a claim about the correlation between two outcomes in a population.
  10. Use the \(P\)-value provided by Minitab to test a claim about the correlation between two outcomes in a population.
  11. Perform the hypothesis testing routine in the right column of page 334 of Triola & Triola for a claim about a population correlation.
  12. Interpret a confidence interval, such as one provided by a web applet, in terms of the correlation between two outcomes in a population.
  13. Distinguish between correlative and causative statements about two outcomes, and give examples of correlative statements that do not imply causative statements.
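
To connect the notation \(r\) with the data behind a scatter plot, here is a minimal Python sketch of the sample correlation computed from paired values. The data are made up for illustration, and the function name is hypothetical.

```python
import math

def sample_correlation(x, y):
    """Sample (Pearson) linear correlation coefficient r for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(f"r = {r:.4f}")

# Points lying exactly on a decreasing line give r = -1.
print(sample_correlation([1, 2, 3], [6, 4, 2]))
```

Values of \(r\) near \(+1\) or \(-1\) indicate a strong linear trend in the sample, while values near 0 indicate little or no linear trend.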

Section 9.3: Regression

  1. Define the terms “response” and “predictor” in the context of predicting one outcome from another, and identify which outcome is the response and which is the predictor when given a prediction problem.
  2. Specify how a regression function is related to the task of predicting one outcome from another.
  3. Specify the form of a simple linear regression of a response \(y\) on a predictor \(x\). Equivalently, identify the form of a simple linear regression model for predicting a response \(y\) using a predictor \(x\).
  4. Perform a simple linear regression using Minitab.
  5. Recall, from either high school or college algebra / precalculus, the equation for a line and the interpretation of the slope and intercept of the line.
  6. Identify the slope and intercept from a simple linear regression model, and interpret the slope and intercept in the context of the prediction problem.
  7. Explain, with an example, why the causal interpretation of the slope of a simple linear regression model as ‘the amount that the response increases as the predictor increases by 1 unit’ is generally not correct.
  8. Use a simple linear regression model to identify the best prediction of the response at a given value of the predictor.
  9. Explain, qualitatively, in what sense the line determined by simple linear regression is the ‘line of best fit.’ That is, what do we mean by ‘fit’ and what do we mean by ‘best’?
  10. Determine the error of a prediction given a simple linear regression model, a value for the predictor, and a response at that value of the predictor.
  11. Identify the sample residuals / errors
    \[e = y - \widehat{y} = y - (b_{0} + b_{1} x)\]
    from a regression of \(y\) on \(x\) (aka from predicting a response \(y\) using a predictor \(x\)).
  12. Describe and interpret a residual plot of \(e\) versus \(x\) or equivalently of \(e\) versus \(\widehat{y}\).
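
The steps of fitting the line, predicting, and computing residuals can be sketched in Python as follows. This is an illustration on made-up data, assuming the usual least-squares estimates \(b_{1} = S_{xy} / S_{xx}\) and \(b_{0} = \bar{y} - b_{1} \bar{x}\); on the exam these values come from Minitab output.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical paired data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)

# Best prediction of the response at a given value of the predictor.
y_hat_at_3 = b0 + b1 * 3

# Residuals e = y - y_hat, one per observation.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"y_hat = {b0:.2f} + {b1:.2f} x")
print(f"prediction at x = 3: {y_hat_at_3:.2f}")
print([round(e, 2) for e in residuals])
```

Plotting `residuals` against `x` gives the residual plot described in item 12; for a least-squares line with an intercept, the residuals always sum to zero.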

Section 9.4: Variation and Prediction Intervals

  1. Explain why the variation of the frequency histogram for the prediction errors of a simple linear regression gives one indication of how good the model is at prediction.
  2. Define the root-mean-square error (RMSE), aka standard error of prediction (\(S\)), in terms of the distribution of the prediction errors.
  3. Interpret the RMSE / \(S\) in terms of the proportion of errors that fall within \(\pm k \cdot S\) for \(k = 1, 2, 3\).
  4. Give two equivalent definitions of the coefficient of determination, aka \(R^{2}\), aka R-sq.
  5. State the common (though erroneous) way to use \(R^{2}\) as a way to evaluate a simple linear regression model.
  6. Relate the coefficient of determination to the sample linear correlation coefficient \(r_{xy}\).
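
The quantities in this section can be computed from the residuals as in the following Python sketch. It assumes the \(n - 2\) divisor for \(S\) and the definition \(R^{2} = 1 - \text{SSE} / \text{SST}\); the data and function name are hypothetical.

```python
import math

def regression_summary(x, y):
    """Return (S, R-squared, r) for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sst = sum((yi - mean_y) ** 2 for yi in y)  # total variation in y
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))      # assumed n - 2 divisor
    r_squared = 1 - sse / sst         # proportion of variation explained
    r = sxy / math.sqrt(sxx * sst)    # sample correlation
    return s, r_squared, r

# Hypothetical paired data, for illustration only.
s, r_squared, r = regression_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"S = {s:.4f}, R-sq = {r_squared:.2f}, r^2 = {r ** 2:.2f}")
```

In simple linear regression the last two printed values agree, which is the relationship asked for in item 6.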

Chapter 9 (Additional Handout): Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Errors

  1. Identify the five main assumptions of the simple linear regression model with normal residuals.
  2. Explain what the above assumptions indicate about how the true errors / residuals \(\epsilon_{i}\) should look when the simple linear regression model with normal residuals is appropriate.
  3. Use the diagnostic plots generated by Minitab to determine whether the simple linear regression model with normal residuals is appropriate for the paired data under consideration.
  4. Interpret the standard errors for the estimators \(b_{0}\) and \(b_{1}\) reported by Minitab in terms of margins of error. Minitab calls the standard errors SE Coef.
  5. Identify the hypothesis tests associated with the \(P\)-values reported by Minitab.
  6. Interpret the \(P\)-values reported by Minitab in terms of hypothesis tests for the population intercept \(\beta_{0}\) and slope \(\beta_{1}\).

Chapter 10

Important: For all of the hypothesis tests in this chapter, you do not need to know the formulas for the test statistics, but you should be able to state the name of the appropriate test statistic for a given type of hypothesis test, and determine its sampling distribution under the null hypothesis.

Section 10.2: Multinomial Experiments: Goodness-of-Fit

  1. Relate a multinomial procedure to a binomial procedure.
  2. State the requirements necessary for the outcome of a procedure to follow a multinomial distribution.
  3. State examples from everyday life that follow a multinomial distribution.
  4. State the definition of a goodness-of-fit test.
  5. Explain, qualitatively, how the \(\chi^{2}\) statistic compares the observed outcomes to the expected outcomes in a multinomial procedure.
  6. State the large-sample sampling distribution of the \(\chi^{2}\) statistic under the null hypothesis for a multinomial procedure with \(k\) categories for the outcomes.
  7. State conditions on the expected frequencies for when the \(\chi^{2}\) statistic can be used.
  8. Use Table A–4 to determine critical values for a one-sample \(\chi^{2}\) test for goodness-of-fit at significance level \(\alpha\).
  9. Use Minitab to perform a one-sample \(\chi^{2}\) test for goodness-of-fit.
  10. Interpret the output of a one-sample \(\chi^{2}\) test for goodness-of-fit in terms of given null and alternative hypotheses.
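
How the \(\chi^{2}\) statistic compares observed with expected counts can be sketched in Python as below. The die-roll counts are made up for illustration, the function name is hypothetical, and the check that every expected count is at least 5 reflects the commonly stated requirement.

```python
def chi_square_gof(observed, expected_probs):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    n = sum(observed)
    expected = [n * p for p in expected_probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k categories give k - 1 degrees of freedom
    return chi2, df, expected

# Hypothetical counts from 60 rolls of a die; H0: all six faces equally likely.
observed_rolls = [8, 12, 9, 11, 10, 10]
chi2, df, expected = chi_square_gof(observed_rolls, [1 / 6] * 6)

# Commonly stated requirement: every expected count is at least 5.
requirement_met = min(expected) >= 5

print(f"chi-square = {chi2:.2f}, df = {df}, requirement met: {requirement_met}")
```

Large values of the statistic indicate that the observed counts are far from the expected counts; the statistic is compared with the critical value from Table A–4 for the given degrees of freedom.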

Section 10.3: Contingency Tables: Independence and Homogeneity

  1. Relate association to statistical independence.
  2. State an appropriate test statistic for testing a claim of association between two categorical variables.
  3. State under what conditions the \(\chi^{2}\) test for independence is appropriate.
  4. Relate the contingency tables from this section to the frequency / count tables from Chapter 3.
  5. Compute estimates of probabilities from contingency tables for two categorical variables \(X\) and \(Y\), including estimates of \(P(X = x, Y = y)\), \(P(X = x)\), \(P(Y = y)\), \(P(X = x \mid Y = y)\), and \(P(Y = y \mid X = x)\).
  6. Explain how to compute the expected frequencies (counts) under the null model of independence using the multiplication rule for independent events.
  7. Distinguish between statistical association and causation.
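
The probability estimates in item 5 and the expected counts in item 6 can be sketched in Python as follows. The 2 × 2 table is made up for illustration, with rows as values of \(X\) and columns as values of \(Y\); the function name is hypothetical.

```python
def expected_counts(table):
    """Expected counts under independence: (row total * column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical contingency table, for illustration only.
table = [[10, 20],
         [30, 40]]
n = sum(sum(row) for row in table)

# Estimates for the first row value x and first column value y.
p_joint = table[0][0] / n                           # P(X = x, Y = y)
p_x = sum(table[0]) / n                             # P(X = x)
p_y = sum(row[0] for row in table) / n              # P(Y = y)
p_x_given_y = table[0][0] / sum(row[0] for row in table)  # P(X = x | Y = y)

expected = expected_counts(table)
print(p_joint, p_x, p_y, p_x_given_y)
print(expected)
```

The expected counts follow from the multiplication rule: under independence \(P(X = x, Y = y) = P(X = x) P(Y = y)\), so the expected count is \(n\) times the product of the two marginal estimates.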