Study Guide for Final Exam
This will be a closed-book exam.
You are permitted one 8.5 inch by 11 inch sheet of handwritten notes (both sides) for the final exam. This sheet of notes should be clearly visible through your web camera during the entire exam. Submit photos of your notes to the eCampus Assignment for the Final Exam as a separate PDF. Do not include the notes in the collated PDF of your exam solutions.
You will need a calculator. A calculator that performs arithmetic operations, reciprocals, square roots, powers and logarithms (base 10) is sufficient. Graphing calculators are permitted.
You will have 2 hours to complete the exam. The exam will start promptly at 8:30 AM, so please log into the Zoom session early.
To do well on the exam, you should be able to do the following:
Chapter 27
Comparing two samples: the Wilcoxon rank sum test
- Describe useful exploratory plots to generate before performing Wilcoxon’s Rank Sum Test, and interpret exploratory plots in this context.
- Define the rank of a data value in a sample.
- Explain the rationale for Wilcoxon’s Rank Sum Test for comparing two population distributions.
- State Wilcoxon’s Rank Sum Test Statistic \(W\).
- Given a (small) combined sample with the associated ranks, compute Wilcoxon’s Rank Sum Test Statistic.
- State appropriate null and alternative hypotheses for Wilcoxon’s Rank Sum Test, given a claim about two populations.
- Perform Wilcoxon’s Rank Sum Test using wilcox.test in R.
- Interpret the output of wilcox.test for the Rank Sum Test.
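To practice computing \(W\) by hand, here is a minimal sketch in Python (standard library only; the course itself uses R) that ranks a small combined sample, giving tied values the average of the ranks they occupy, and then sums the ranks belonging to the first sample. The function names are illustrative, not part of any course package.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_W(sample1, sample2):
    """Wilcoxon rank sum statistic W: sum of the ranks of sample1 within the combined sample."""
    combined = list(sample1) + list(sample2)
    ranks = average_ranks(combined)
    return sum(ranks[: len(sample1)])
```

For the samples 3.1, 4.5, 2.2 and 5.0, 4.5, 6.3, 1.9, the two 4.5s share rank 4.5 and \(W = 3 + 4.5 + 2 = 9.5\). Note that R's wilcox.test reports its statistic W as this rank sum minus \(n_1(n_1+1)/2\) (here 9.5 - 6 = 3.5), so its W will not match a hand-computed rank sum.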
Matched pairs: the Wilcoxon signed rank test
- Describe useful exploratory plots to generate before performing Wilcoxon’s Signed Rank Test, and interpret exploratory plots in this context.
- Explain the rationale for Wilcoxon’s Signed Rank Test for the median of a distribution.
- State the assumption needed for Wilcoxon’s Signed Rank Test.
- State Wilcoxon’s Signed Rank Test Statistic \(W^{+}\).
- State appropriate null and alternative hypotheses for Wilcoxon’s Signed Rank Test, given a claim about a single population or two matched populations.
- Perform Wilcoxon’s Signed Rank Test using wilcox.test in R.
- Interpret the output of wilcox.test for the Signed Rank Test.
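A similar sketch for \(W^{+}\), again in Python for illustration only: drop zero differences (the usual convention), rank the absolute values of the remaining differences with average ranks for ties, and add up the ranks that belong to positive differences. The function name is hypothetical.

```python
def signed_rank_W_plus(differences):
    """W+ = sum of the ranks of |d| over the positive differences d.

    Zero differences are dropped; tied |d| values share the average of their ranks."""
    d = [x for x in differences if x != 0]
    abs_d = [abs(x) for x in d]
    order = sorted(range(len(abs_d)), key=lambda i: abs_d[i])
    ranks = [0.0] * len(abs_d)
    i = 0
    while i < len(order):
        j = i
        # find the run of tied absolute differences starting at sorted position i
        while j + 1 < len(order) and abs_d[order[j + 1]] == abs_d[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    # keep only the ranks attached to positive differences
    return sum(r for r, x in zip(ranks, d) if x > 0)
```

For matched pairs, pass the differences, e.g. `[a - b for a, b in zip(after, before)]`. With differences 2, -1, 4, 0, 3 the zero is dropped, the ranks of \(|d|\) are 2, 1, 4, 3, and \(W^{+} = 2 + 4 + 3 = 9\). R's wilcox.test labels this statistic V in its paired and one-sample output.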
Chapter 24
Comparing several means
- Use boxplots generated using gf_boxplot to compare samples from two or more populations.
- State the null and alternative hypotheses for a claim about equality amongst three or more population means.
- Explain why the alternative hypothesis of an “all means are equal” null hypothesis does not specify precisely which population means, if any, differ.
- Recognize that a test of equality among three or more population means is called an Analysis of Variance.
- Recognize the acronym ANOVA for Analysis of Variance.
The \(F\) statistic
- State the test statistic for a one-way ANOVA.
- Explain the effect on the \(F\)-statistic of increasing / decreasing the variability amongst the sample means and increasing / decreasing the variability within each sample.
- Indicate what values of an \(F\)-statistic indicate evidence against a null hypothesis.
- Given rug plots and box plots for several samples, indicate whether the \(F\)-statistic will be small or large.
- State the sampling distribution of the \(F\) statistic when the null hypothesis is true.
- State the type of \(P\)-value (right-sided, left-sided, or two-sided) computed from the \(F\) distribution, and explain why this type of \(P\)-value is used.
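The \(F\) statistic compares variability among the sample means to variability within the samples, \(F = \text{MSG}/\text{MSE}\). The Python sketch below (illustrative only; in R you would use aov and summary) computes it directly so the effect of each kind of variability is visible. The function name is hypothetical.

```python
def one_way_f(groups):
    """One-way ANOVA F = MSG / MSE for a list of samples (one list per group).

    Returns F and its (numerator, denominator) degrees of freedom (k - 1, n - k)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # between-group variability: spread of the sample means around the grand mean
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # within-group variability: spread of each observation around its own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msg = ssg / (k - 1)
    mse = sse / (n - k)
    return msg / mse, (k - 1, n - k)
```

The groups [1, 2, 3], [4, 5, 6], [7, 8, 9] give \(F = 27\) on (2, 6) degrees of freedom. Spreading each group out to [0, 2, 4], [3, 5, 7], [6, 8, 10] keeps the same means but quadruples MSE, so \(F\) drops to 6.75. Only large values of \(F\) count as evidence against \(H_0\), which is why the \(P\)-value is the right-tail area of the \(F(k-1, n-k)\) distribution, e.g. pf(F, k - 1, n - k, lower.tail = FALSE) in R.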
The analysis of variance \(F\) test
- Use the functions aov and summary to perform an \(F\) test in R.
- Interpret the output of an aov object passed to summary.
Conditions for ANOVA
- State the assumptions of a one-way ANOVA.
- State an alternative to a one-way ANOVA when the assumptions of the one-way ANOVA fail or cannot be checked.
Pairwise Comparisons (Lecture Notes for Lecture 21)
- Explain why pairwise comparisons are necessary after finding a statistically significant \(F\) statistic.
- Recognize Tukey’s Honest Significant Difference as a pairwise comparison method.
- Use the function TukeyHSD to perform pairwise comparisons in R.
- Interpret the output of TukeyHSD when passed an output from aov, including:
  - the estimated pairwise differences
  - the lower and upper endpoints of the confidence intervals for the pairwise differences
  - the adjusted \(P\)-value for the pairwise comparisons
- State the null hypothesis implicit in the \(P\)-values reported by TukeyHSD.
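TukeyHSD reports one row per pair of groups, and with \(k\) groups there are \(k(k-1)/2\) pairs. The Python sketch below (illustration only, hypothetical function name) computes just the estimated pairwise differences in sample means, labeled in the "later level minus earlier level" style that TukeyHSD's diff column uses. It does not compute Tukey's adjusted intervals or \(P\)-values, which require the studentized range distribution.

```python
from itertools import combinations


def pairwise_mean_differences(groups):
    """Difference in sample means for every pair of groups.

    groups: dict mapping level name -> list of observations; insertion order is
    taken as the level order, and each label reads 'later-earlier'."""
    means = {name: sum(xs) / len(xs) for name, xs in groups.items()}
    # combinations() walks the levels in order, giving k * (k - 1) / 2 pairs
    return {f"{b}-{a}": means[b] - means[a] for a, b in combinations(means, 2)}
```

Because many comparisons are made at once, running a separate two-sample test at level 0.05 for every pair would raise the chance of at least one false positive above 0.05; Tukey's method widens the intervals and adjusts the \(P\)-values to control that familywise error rate.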
Chapter 21
Hypotheses for goodness of fit
- Recognize and state a claim about population proportions for categories of a categorical variable in a population.
- Given claimed proportions of categories in a population, state the null and alternative hypothesis corresponding to that claim.
- Explain why the alternative hypothesis of a “the population proportions equal the specified values” null hypothesis does not specify precisely which population proportions, if any, differ.
Expected counts and chi-square statistic
- Given a one-way table of counts and null values for population proportions, compute the expected count for each category.
- Compute the deviation between the observed counts in a sample and the expected counts under the null model.
- Compute the \(\chi^{2}\)-statistic given observed counts and population proportions.
- Recognize the Greek letter \(\chi\) (“chi”, pronounced “ki” as in “kite”) as the Greek analog to the Roman letter \(x\).
- Compute the \(\chi^{2}\)-statistic from observed counts and null proportions using xchisq.test from mosaic.
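Under \(H_0\) the expected count in category \(i\) is \(n p_{i}\), and \(\chi^{2} = \sum (\text{observed} - \text{expected})^{2}/\text{expected}\), with \(k - 1\) degrees of freedom for \(k\) categories. A minimal Python sketch of that arithmetic, useful for checking hand computations (the course uses xchisq.test; this function is a hypothetical stand-in):

```python
def chisq_gof(observed, null_props):
    """Expected counts, deviations, chi-square statistic, and df for a one-way table."""
    n = sum(observed)
    # expected count for each category under the null proportions
    expected = [n * p for p in null_props]
    deviations = [o - e for o, e in zip(observed, expected)]
    chisq = sum(d ** 2 / e for d, e in zip(deviations, expected))
    df = len(observed) - 1
    return expected, deviations, chisq, df
```

With observed counts 30, 50, 20 and null proportions 0.25, 0.5, 0.25, the expected counts are 25, 50, 25, the deviations are 5, 0, -5, and \(\chi^{2} = 1 + 0 + 1 = 2\) on 2 degrees of freedom.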
The chi-square test for goodness of fit
- Explain why it is more appropriate to call the \(\chi^{2}\) “goodness-of-fit” test a “lack-of-fit” test.
- Interpret the output of xchisq.test in terms of a \(\chi^{2}\) lack-of-fit test.
- Use the output of xchisq.test to test a hypothesis about the proportions of the categories in a population.
Interpreting significant chi-square results
- Construct simultaneous confidence intervals for the population proportions of a categorical variable using gf_pop_props from MUsaic.
- Interpret the output of gf_pop_props.
- State the implicit null hypothesis tested by checking for inclusion of a null population proportion in a confidence interval returned by gf_pop_props.
Conditions for the chi-square test
- State the assumptions on a sample for the \(\chi^{2}\) statistic to follow a \(\chi^{2}\) distribution.
Chapter 22
Two-way tables
- Explain how a two-way table can be used to summarize counts of a categorical variable across two or more populations.
- Give an analogy between one-, two-, and multi-sample tests for population means and a two-way table for testing claims about proportions of a category across one, two, and more than two populations.
- State the convention we will use in the class in terms of what rows and columns correspond to in a two-way table.
Hypotheses for two-way tables of counts
- Explain what we mean by two variables being associated.
- Explain what we mean by two variables being independent.
- Describe how independence between two variables manifests in a two-way table.
- State the generic form of the null and alternative hypotheses about two categorical variables in a population.
- Given a problem about two categorical variables, identify the relevant null and alternative hypotheses from the problem.
Expected counts and the chi-square statistic
- Explain under what hypothesis the expected counts for the \(\chi^{2}\) statistic for association are calculated.
- State the \(\chi^{2}\) statistic for association.
- State the sampling distribution of the \(\chi^{2}\) statistic for association under the null hypothesis.
- Construct a matrix from a two-way table using matrix in R.
- Compute the \(\chi^{2}\) statistic for association using xchisq.test from mosaic.
- Interpret the output of xchisq.test in terms of the \(\chi^{2}\) statistic for association.
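Under the null hypothesis of no association, the expected count in each cell is (row total × column total) / grand total, and the statistic has \((r-1)(c-1)\) degrees of freedom for an \(r \times c\) table. The Python sketch below is for checking hand computations (hypothetical function; in R you would build the table with matrix and pass it to xchisq.test). It computes the plain Pearson statistic with no continuity correction, so results for 2 × 2 tables may differ from R's default output.

```python
def chisq_association(table):
    """Expected counts, chi-square statistic, and df for an r x c table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # expected count under independence: (row total * column total) / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chisq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chisq, df
```

For rows (10, 20, 30) and (20, 20, 20), the expected counts are 15, 20, 25 in each row and \(\chi^{2} = 16/3 \approx 5.33\) on 2 degrees of freedom.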
The chi-square test
- Interpret the output of xchisq.test in terms of the \(\chi^{2}\) test for association.
- Perform a hypothesis test for association using xchisq.test.
Conditions for the chi-square test
- State the assumptions on the data for the \(\chi^{2}\) statistic for association to follow a \(\chi^{2}\) distribution.
Chapter 9
Risk and odds
- Define the population risk of a negative outcome.
- Define the population odds of a negative outcome.
- Compute the sample risk and sample odds given a table summarizing positive and negative outcomes in a sample.
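Sample risk is the proportion of a group with the negative outcome, and sample odds are the number with the negative outcome per one with the positive outcome. A short Python sketch (hypothetical function name):

```python
def risk_and_odds(negative, positive):
    """Sample risk and sample odds of the negative outcome from the counts in one group."""
    risk = negative / (negative + positive)  # proportion of the group with the negative outcome
    odds = negative / positive               # negative outcomes per positive outcome
    return risk, odds
```

A group with 20 negative and 80 positive outcomes has risk 20/100 = 0.2 but odds 20/80 = 0.25.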
Chapter 20
Two-sample problems: proportions
- Give examples of two-sample problems that involve proportions.
Relative risk and odds ratios
- Define the population relative risk of a negative outcome in a treatment group compared to a control group.
- Define the population odds ratio of a negative outcome in a treatment group compared to a control group.
- Compute the sample relative risk and sample odds ratio given a two-way table.
- State what value of relative risk / odds ratio corresponds to “no difference” in risk between the treatment and control populations.
- Explain why relative risks and odds ratios are most appropriately considered on a logarithmic scale.
- Recognize that odds are not risks, and odds ratios are not relative risks.
- Interpret relative risks and odds ratios in terms of whether they indicate the treatment or control condition leads to lower risk.
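The relative risk divides the treatment risk by the control risk, and the odds ratio divides the treatment odds by the control odds; a value of 1 means no difference. The Python sketch below (hypothetical names; the argument order, treatment counts first, is an assumption) also prints logarithms to show why a log scale is natural: swapping which group is called "treatment" turns 0.5 into 2, but on the log scale the two values are simply negatives of each other.

```python
import math


def relative_risk_and_odds_ratio(treat_neg, treat_pos, ctrl_neg, ctrl_pos):
    """Sample relative risk and odds ratio of the negative outcome, treatment vs control.

    Argument order (treatment counts first, negative before positive) is an
    assumption made for this sketch."""
    risk_t = treat_neg / (treat_neg + treat_pos)
    risk_c = ctrl_neg / (ctrl_neg + ctrl_pos)
    odds_t = treat_neg / treat_pos
    odds_c = ctrl_neg / ctrl_pos
    return risk_t / risk_c, odds_t / odds_c


if __name__ == "__main__":
    rr, _ = relative_risk_and_odds_ratio(10, 90, 20, 80)
    rr_swapped, _ = relative_risk_and_odds_ratio(20, 80, 10, 90)
    print(rr, rr_swapped)                      # reciprocals: 0.5 and 2.0
    print(math.log(rr), math.log(rr_swapped))  # equal magnitude, opposite sign
```

With 10 of 100 treated and 20 of 100 control subjects having the negative outcome, RR = 0.5 and OR = 4/9 ≈ 0.44. Both are below 1, so the treatment group has the lower risk, and the OR lies further from 1 than the RR.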
Inferences for Relative Risks and Odds Ratios Using R (Lecture Notes for Lecture 24)
- State the four conditions on the count of a categorical variable for it to be binomial.
- Compute the sample risks, relative risk, and odds ratio from a two-way table using oddsRatio from the mosaic package.
- Compute confidence intervals for the population relative risk and odds ratio from a two-way table using oddsRatio from the mosaic package.
- Use a confidence interval for either a population relative risk or population odds ratio to test for a difference in risk between a treatment and control population.
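One standard way to get confidence intervals for these ratios is to build a large-sample interval on the log scale and exponentiate the endpoints: \(\log(\widehat{OR}) \pm z^{*}\sqrt{1/a + 1/b + 1/c + 1/d}\) for the odds ratio (Woolf's method) and \(\log(\widehat{RR}) \pm z^{*}\sqrt{1/a - 1/(a+b) + 1/c - 1/(c+d)}\) for the relative risk. The Python sketch below assumes that method; mosaic's oddsRatio may use a different one, so treat this as a way to understand the idea rather than a reproduction of its output. The layout (a, b = treatment negative/positive; c, d = control negative/positive) and the function name are assumptions.

```python
import math
from statistics import NormalDist


def log_scale_intervals(a, b, c, d, conf_level=0.95):
    """Large-sample CIs for the relative risk and odds ratio of the negative outcome.

    a, b: treatment negative/positive counts; c, d: control negative/positive counts."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)  # about 1.96 for 95%
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a / b) / (c / d)
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # build each interval on the log scale, then exponentiate the endpoints back
    rr_ci = (math.exp(math.log(rr) - z * se_log_rr), math.exp(math.log(rr) + z * se_log_rr))
    or_ci = (math.exp(math.log(odds_ratio) - z * se_log_or),
             math.exp(math.log(odds_ratio) + z * se_log_or))
    return rr_ci, or_ci
```

If an interval contains 1, the data are consistent with no difference in risk at that confidence level. For 10 of 100 treated versus 30 of 100 control subjects with the negative outcome, both 95% intervals lie entirely below 1; for 10 of 100 versus 20 of 100, both contain 1.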
Chapter 23
The regression parameters
- State the assumptions of the Simple Linear Regression with Normal Noise (SLRNN) model.
- State the three parameters of the SLRNN model.
- Relate the components of the SLRNN model (intercept, slope, errors) to their sample analogs (intercept, slope, residuals).
- Explain in what sense the SLRNN model is “a line plus noise.”
- State the estimator for the standard deviation of the noise term in the SLRNN model.
- Identify the estimates for the intercept, slope, and standard deviation of the noise term in an output from summary.
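Least squares gives \(b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^{2}\) and \(b_0 = \bar{y} - b_1 \bar{x}\), and the noise standard deviation \(\sigma\) is estimated by \(s = \sqrt{\sum e_i^{2} / (n-2)}\), where the \(e_i\) are the residuals. A Python sketch of these estimates (hypothetical function name; in R, summary of an lm fit reports \(s\) as the "Residual standard error"):

```python
import math


def fit_slr(x, y):
    """Least-squares intercept b0, slope b1, residuals, and s (estimate of the noise SD)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    # residuals are the sample analogs of the model's error terms
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    # divide by n - 2 because two parameters (b0 and b1) were estimated
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return b0, b1, residuals, s
```

For x = 1, 2, 3, 4, 5 and y = 2, 4, 5, 4, 5, this gives \(b_1 = 0.6\), \(b_0 = 2.2\), and \(s = \sqrt{0.8} \approx 0.894\) on 3 degrees of freedom.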
Checking the conditions for inference
- Use the following four plots to diagnose the validity of the SLRNN model from residuals of a simple linear regression:
a. A Q-Q plot of the residuals.
b. A plot of the residuals against the fitted values of the response.
c. A plot of the squared residuals against fitted values of the response.
d. A plot of the residuals against the individual index.
- Identify which SLRNN model assumption is checked using the four residual plots from the previous learning objective.
- Use mplot and gf_line in mosaic to generate residual diagnostic plots from an output from lm.
Testing the hypothesis of no linear relationship
- State the \(T\)-statistic for the sample slope and its sampling distribution under the null hypothesis when the SLRNN model assumptions hold.
- State the estimate of the standard error of the slope estimate in simple linear regression.
- Identify the estimate of the standard error of the slope estimate in an output from summary.
- Perform a hypothesis test for a claim about a population slope.
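The standard error of the slope is \(\mathrm{SE}_{b_1} = s / \sqrt{\sum (x_i - \bar{x})^{2}}\), and \(T = (b_1 - \text{null slope}) / \mathrm{SE}_{b_1}\) follows a \(t\) distribution with \(n-2\) degrees of freedom when the null hypothesis and the SLRNN assumptions hold. A Python sketch (hypothetical function; the null slope defaults to 0, the case summary reports in its "t value" column):

```python
import math


def slope_t_statistic(x, y, null_slope=0.0):
    """Return b1, SE(b1), T = (b1 - null_slope) / SE(b1), and the degrees of freedom n - 2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # estimate of the noise standard deviation
    se_b1 = s / math.sqrt(sxx)     # standard error of the slope estimate
    return b1, se_b1, (b1 - null_slope) / se_b1, n - 2
```

For x = 1, 2, 3, 4, 5 and y = 2, 4, 5, 4, 5, \(\mathrm{SE}_{b_1} = \sqrt{0.08} \approx 0.283\) and \(T = 0.6/0.283 \approx 2.12\) on 3 degrees of freedom.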
Confidence intervals for the regression slope
- State the confidence interval for a population slope under the SLRNN model assumptions.
- Construct a confidence interval for a population slope using the output from summary and qt.
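The confidence interval is \(b_1 \pm t^{*}\,\mathrm{SE}_{b_1}\), where \(t^{*}\) is the upper critical value of the \(t\) distribution with \(n-2\) degrees of freedom, for example qt(0.975, df = n - 2) in R for 95% confidence. A Python sketch that takes \(b_1\) and \(\mathrm{SE}_{b_1}\) as read from summary and \(t^{*}\) as computed by qt (hypothetical function name):

```python
def slope_confidence_interval(b1, se_b1, t_star):
    """Confidence interval b1 +/- t_star * SE(b1) for the population slope.

    t_star: upper critical value of t with n - 2 df, e.g. from R's qt(0.975, df = n - 2)."""
    margin = t_star * se_b1
    return b1 - margin, b1 + margin
```

With \(b_1 = 0.6\), \(\mathrm{SE}_{b_1} \approx 0.283\), and \(t^{*} =\) qt(0.975, 3) \(\approx 3.182\), the 95% interval is roughly (-0.30, 1.50). Because it contains 0, these five points do not give significant evidence of a nonzero slope at the 5% level.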