Study Guide for Exam 1
This will be a closed-book exam. You will need a calculator. A calculator that performs arithmetic operations, reciprocals, square roots, powers and logarithms (base 10) is sufficient. Graphing calculators are permitted.
You will not be using R or RStudio during the exam, but you should know the R commands to generate a given graphic, compute a given statistic, etc.
To do well on the exam, you should be able to do the following:
What is Statistics? (Lecture Notes)
- State examples of statistical questions.
- State the five main steps in the statistical analysis of data.
Chapter 1
Individuals and variables
- Define individual and variable in the context of a data set.
- Identify the individuals and variables given the description of a data set.
- Explain how the concept of individual and variable relates to the rows and columns of a typical spreadsheet containing a data set.
Identifying categorical and quantitative variables
- State the characteristics of quantitative and categorical variables.
- Given a variable, determine whether it is quantitative or categorical.
Quantitative variables: histograms
- Construct a rug plot by hand given a (small) data set.
- Construct a histogram by hand given a (small) data set and the desired bin width and starting boundary.
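The binning step behind a hand-drawn histogram can be checked with a short base R sketch. The data values, the bin width of 5, and the starting boundary of 0 are made up for illustration. The sketch assumes each bin includes its right endpoint; that choice only matters for values that fall exactly on a boundary, so follow the convention used in lecture.

```r
# Hypothetical small data set (not from the course materials)
x <- c(2, 4, 4, 7, 13)

# Bin boundaries: start at 0 and step by the bin width of 5
breaks <- seq(from = 0, to = 15, by = 5)

# Assign each value to a bin; cut() uses intervals of the form (a, b] by default
bins <- cut(x, breaks = breaks)

# Count the observations in each bin; these counts are the bar heights
counts <- table(bins)
print(counts)
```

Each count is the height of the bar drawn over the corresponding bin.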
Interpreting histograms
- Identify key characteristics of the shape of a histogram.
- Identify key characteristics of the center of a histogram.
- Identify key characteristics of the spread of a histogram.
- Identify potential outliers using a histogram in conjunction with a rug plot.
- Interpret a histogram in the context of the data set it summarizes.
Chapter 2
- State the formula for the sample mean of a data set.
- Recognize the notation \(\bar{x}\) for the sample mean.
- Compute the sample mean of a (small) data set by hand.
- State a physical interpretation of the sample mean in terms of the rug plot of the data.
- State the definition of the sample median.
- Compute the sample median of a (small) data set by hand.
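For reference, with observations \(x_{1}, \ldots, x_{n}\), the sample mean is usually written as below. The worked numbers use a made-up data set, and the rule for an even number of observations is the usual convention, assumed here rather than taken from the course notes.

\[
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}
\]

For the data 2, 4, 4, 7, 13: \(\bar{x} = (2 + 4 + 4 + 7 + 13)/5 = 30/5 = 6\), and the sample median is the middle sorted value, 4. With an even number of observations, the median is typically taken as the average of the two middle sorted values. The mean is commonly described as the balance point of the rug plot.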
Measures of spread: percentiles, standard deviation
- State the formula for the sample variance and sample standard deviation.
- Recognize the notation \(s\) for the sample standard deviation.
- Compute the sample variance and sample standard deviation of a (small) data set by hand.
- Define the sample quartiles of a data set.
- Recognize the notation \(Q_{1}\), \(Q_{2}\), and \(Q_{3}\) for the first, second, and third sample quartiles.
- Define the interquartile range of a data set.
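For reference, the sample variance and sample standard deviation are usually defined as below. This assumes the common \(n - 1\) divisor, and the data values are made up for illustration.

\[
s^{2} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_{i} - \bar{x})^{2}, \qquad s = \sqrt{s^{2}}
\]

For the data 2, 4, 4, 7, 13 with \(\bar{x} = 6\), the squared deviations are 16, 4, 4, 1, 49, which sum to 74, so \(s^{2} = 74/4 = 18.5\) and \(s = \sqrt{18.5} \approx 4.30\). The interquartile range is \(\mathrm{IQR} = Q_{3} - Q_{1}\), and \(Q_{2}\) is the sample median. Rules for computing \(Q_{1}\) and \(Q_{3}\) by hand differ between textbooks, so use the one from the course.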
Graphical displays of numerical summaries
- State the five components of the “five-number summary.”
- Draw a boxplot for a data set given a five-number summary of the data set.
- Interpret a boxplot in terms of symmetry versus skewness of a distribution and identification of outliers.
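A five-number summary can be checked in base R with a short sketch. The data are hypothetical. Note that fivenum() uses Tukey's hinges, which can differ slightly from textbook quartiles, and the 1.5 × IQR rule for flagging potential outliers is a common convention assumed here, not something stated in this guide.

```r
# Hypothetical small data set (not from the course materials)
x <- c(1, 3, 4, 6, 7, 9, 30)

# Five-number summary: minimum, Q1, median, Q3, maximum
five <- fivenum(x)
print(five)

# Interquartile range computed from the summary
iqr <- five[4] - five[2]

# Assumed 1.5 * IQR rule for flagging potential outliers
lower_fence <- five[2] - 1.5 * iqr
upper_fence <- five[4] + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]
print(outliers)
```

In a boxplot of these data, the box would span \(Q_{1}\) to \(Q_{3}\) with a line at the median, and the flagged value would be drawn as a separate point.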
Chapter 3
Explanatory and response variables
- Define explanatory and response variables in terms of their roles in predicting one variable from another.
- Given a predictive question about two variables, identify which variable is the explanatory variable and which is the response variable.
Relationship between two quantitative variables: scatterplots
- Interpret a scatter plot showing the relationship between an explanatory variable and response variable in a data set.
- Construct a scatter plot from a (small) data set with two quantitative variables.
Adding categorical variables to scatterplots
- Explain how to construct a scatter plot that also includes the value of a categorical variable for each individual in a data set.
Measuring linear association: correlation
- State the formula for the sample covariance and sample correlation.
- State the major properties of the sample correlation.
- Given a scatter plot, identify whether the sample correlation for the points is positive, negative, or nearly zero.
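For reference, the sample covariance and sample correlation are usually written as below. The symbols \(s_{xy}\), \(s_{x}\), \(s_{y}\), and \(r\) are an assumed notation; the course may use different ones.

\[
s_{xy} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y}), \qquad r = \frac{s_{xy}}{s_{x} \, s_{y}}
\]

Here \(s_{x}\) and \(s_{y}\) are the sample standard deviations of the two variables. Properties typically listed for \(r\): it lies between \(-1\) and \(1\), it has no units, it does not change if \(x\) and \(y\) are swapped, its sign matches the direction of the association, and it measures only linear association.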
Chapter 4
The least-squares regression line
- Specify how a regression function is related to the task of predicting one outcome from another.
- Specify the form of a simple linear regression of a response variable \(y\) on an explanatory variable \(x\). Equivalently, identify the form of a simple linear regression model for predicting a response variable \(y\) using a predictor variable \(x\).
- Recall, from either high school or college algebra / precalculus, the equation for a line and the interpretation of the slope and intercept of the line.
- Identify the slope and intercept from a simple linear regression model, and interpret the slope and intercept in the context of a prediction problem.
- Use a simple linear regression model to predict a response variable at a given value of the predictor.
- Determine the residual / error of a prediction given a simple linear regression model, a value for the explanatory variable, and a response variable at that value of the explanatory variable.
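A simple linear regression line is often written as below. The symbols \(a\) (intercept) and \(b\) (slope) are an assumed notation, and the numbers in the worked example are hypothetical.

\[
\hat{y} = a + b x, \qquad \text{residual} = y - \hat{y}
\]

Suppose the fitted line is \(\hat{y} = 3 + 2x\). At \(x = 4\) the prediction is \(\hat{y} = 3 + 2(4) = 11\). If the observed response at \(x = 4\) is \(y = 12.5\), the residual is \(12.5 - 11 = 1.5\). The slope says the predicted response rises by 2 units for each one-unit increase in \(x\), and the intercept is the predicted response at \(x = 0\).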
Association does not imply causation
- Give examples where an observed association between two variables does not result from a causal influence from one variable to the other.
- Explain the motto “association does not imply causation,” or its more popular counterpart “correlation does not imply causation.”
Nonlinear Relationships (Handout)
- Explain why it is sometimes appropriate to transform either the explanatory variable, the response variable, or both, before computing a sample correlation.
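A small base R sketch shows why a transformation can matter. The data are made up so that the response grows roughly linearly in \(\log_{10}(x)\) rather than in \(x\) itself; the variable names are hypothetical.

```r
# Hypothetical data: y is roughly linear in log10(x), not in x
x <- c(1, 10, 100, 1000, 10000)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Correlation on the original scale
r_raw <- cor(x, y)

# Correlation after transforming the explanatory variable
r_log <- cor(log10(x), y)

print(c(raw = r_raw, log10 = r_log))
```

The correlation is noticeably closer to 1 after the transformation, because the sample correlation measures only linear association and the relationship is close to linear on the log scale.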
Chapter 5
Marginal distributions
- Identify when a two-way table is appropriate for summarizing the relationship between an explanatory variable and a response variable.
- Given a two-way table, compute the marginal distributions of the two-way table both as counts and percentages / proportions.
- Identify what a marginal distribution, when given as a percentage or proportion, should sum to.
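Marginal distributions can be checked with a short base R sketch. The two-way table below is hypothetical, as are the labels group and outcome.

```r
# Hypothetical two-way table of counts (not from the course materials)
tab <- matrix(c(30, 20,
                10, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"), outcome = c("Yes", "No")))

# Marginal distributions as counts: add across rows and down columns
row_counts <- rowSums(tab)
col_counts <- colSums(tab)

# Marginal distributions as proportions: divide by the grand total
total <- sum(tab)
row_props <- row_counts / total
col_props <- col_counts / total

print(row_props)
print(col_props)
```

Each marginal distribution, expressed as proportions, sums to 1 (or to 100% as percentages).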
Conditional Distributions
- Given a “word problem” asking for the proportion of individuals with one characteristic who have another characteristic, identify the appropriate numerator and denominator using a two-way table.
- Explain why a conditional distribution is so-named.
- Given a two-way table, compute the conditional distributions for one of the categorical variables as percentages / proportions.
- Identify what a conditional distribution, when given as a percentage or proportion, should sum to.
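Conditional distributions can be sketched in base R as well. The table is hypothetical; the point is that the denominator is the total for the group being conditioned on, not the grand total.

```r
# Hypothetical two-way table of counts (not from the course materials)
tab <- matrix(c(30, 20,
                10, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"), outcome = c("Yes", "No")))

# Conditional distribution of outcome given group:
# divide each row by its own row total (margin = 1 means rows)
cond <- prop.table(tab, margin = 1)
print(cond)

# Each row is a distribution on its own, so each row sums to 1
print(rowSums(cond))
```

For example, the proportion of group A with outcome Yes is 30 / (30 + 20), with the group A total as the denominator.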
Association versus Causation for Categorical Variables (Lecture Notes)
- Distinguish between association and causation for categorical variables.
- Posit potential “lurking variables” for an observed association between two categorical variables.
Chapter 6
Observation versus experiment
- Compare and contrast observational and experimental studies.
- Given the description of a study, identify whether the study was observational or experimental.
- Define confounding in terms of the observed association between an explanatory variable and a response variable.
- Propose confounding variables that might explain an association in an observational study.
Sampling
- Compare and contrast a population and a sample.
- Given the description of a study, identify the target population, the sample, and the actual population.
Sampling Designs
- Define sampling design.
- Explain bias in a sampling design.
- Given the description of a study and its sampling design, identify possible biases in its sampling design.
- Define a convenience sampling design.
- Define a simple random sample (SRS) sampling design.
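Drawing a simple random sample can be sketched in base R. The population of 100 ID numbers and the sample size of 10 are hypothetical.

```r
# Fix the random seed so the same sample is drawn each time the code runs
set.seed(1)

# Hypothetical list of individuals in the population, labeled 1 to 100
population_ids <- 1:100

# sample() draws without replacement by default, so every set of
# 10 individuals is equally likely to be chosen
srs <- sample(population_ids, size = 10)
print(sort(srs))
```

A convenience sample, by contrast, would take whichever individuals are easiest to reach, such as the first 10 on the list.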
R (Lecture Notes)
- Use R for basic arithmetic.
- Load a data frame into RStudio.
- Load a package (such as mosaic) into RStudio.
- State the grammar used by functions in mosaic.
- Generate a rug plot using mosaic’s gf_rugx function.
- Generate a histogram using mosaic’s gf_histogram function, and specify the bin width and starting boundary by passing arguments to gf_histogram.
- Add a rug plot to a histogram generated by mosaic using the %>% operator.
- Compute the sample mean, sample median, sample standard deviation, and five-number summary using mosaic in R.
- Plot a boxplot using mosaic in R.
- Construct a scatter plot using mosaic in R.
- Construct a scatter plot that includes a categorical variable using mosaic in R.
- Compute the sample correlation between two variables using mosaic in R.
- Compute \(\log_{10} (x)\) in R.
- Compute the sample correlation between transformed explanatory and response variables.
- Perform a simple linear regression using R.
- Relate the output of lm to the slope and intercept of the fitted simple linear regression equation.
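The R items above can be tied together in one short session. This is a minimal sketch, assuming the mosaic package is installed. The data frame study and its columns hours, score, and section are hypothetical, and the bin width and boundary values are arbitrary choices.

```r
library(mosaic)  # also makes the gf_* plotting functions and %>% available

# Basic arithmetic and a base-10 logarithm
(3 + 4) * 2
log10(100)

# Hypothetical data frame: one row per individual, one column per variable
study <- data.frame(
  hours   = c(1, 2, 3, 4, 5, 6, 7, 8),
  score   = c(52, 55, 61, 64, 70, 71, 78, 83),
  section = c("A", "B", "A", "B", "A", "B", "A", "B")
)

# mosaic grammar: goal(y ~ x, data = mydata); one-variable goals use ~ x

# Histogram with a chosen bin width and starting boundary, plus a rug plot
gf_histogram(~ score, data = study, binwidth = 10, boundary = 50) %>%
  gf_rugx(~ score)

# Numerical summaries; favstats() includes the five-number summary
mean(~ score, data = study)
median(~ score, data = study)
sd(~ score, data = study)
favstats(~ score, data = study)

# Boxplot of one quantitative variable
gf_boxplot(~ score, data = study)

# Scatter plot, then the same plot with a categorical variable as color
gf_point(score ~ hours, data = study)
gf_point(score ~ hours, data = study, color = ~ section)

# Sample correlation, before and after transforming the explanatory variable
cor(score ~ hours, data = study)
study$log_hours <- log10(study$hours)
cor(score ~ log_hours, data = study)

# Simple linear regression of score on hours
fit <- lm(score ~ hours, data = study)
coef(fit)  # "(Intercept)" is the intercept; "hours" is the slope
```

In the output of coef(fit), the entry labeled (Intercept) is the intercept of the fitted line and the entry labeled with the explanatory variable's name is the slope.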