David Darmon

# MA 151-02, Statistics with Applications

## Fall 2019

Section 02: Tuesday, 1:15 PM – 2:35 PM; Thursday, 11:40 AM – 1:00 PM, Howard Hall 308

Here's the official description:

Analysis of data, probability, random variables, normal distribution, sampling theory, confidence intervals, and statistical inference.

This course covers the process of statistical analysis from beginning to end. That process, in broad strokes, is as follows: we pose a scientific question, determine what experiments or observations might provide data towards answering that question, develop approaches to collecting that data, summarize the resulting data, and derive inferences relevant to the original scientific question. In the process, you will learn about sampling, descriptive statistics, probability, probability models, inferential statistics, confidence intervals, hypothesis tests, and regression. You will also learn how to analyze data using R, a programming language ideally suited for statistical computing.

#### Prerequisites

MA 101 or MA 105 passed with a grade of C- or higher, or Math Placement Level 3 or 4. Not open to computer science majors or to students required to complete MA 125, except software engineering majors.

#### Professor

Dr. David Darmon
ddarmon [at] monmouth.edu
Howard Hall 241

#### Topics

This is a tentative listing of topics, in order.

Introduction: What is statistics? What types of questions can statistics answer? Types of data.
Descriptive statistics for quantitative data: Summaries of the entire data distribution: rug plot, dot plot, histogram, box plot. Measures of center: mean, median, mode. Measures of variation: range, standard deviation, quartiles.
Descriptive statistics for two quantitative variables: Scatterplots. Trendlines to summarize an association. Regression. Correlation.
Descriptive statistics for categorical data: Two-way tables. Marginals and conditionals of two-way tables. Lurking variables and confounding via three-way tables.
Origins of data: Experimental versus observational studies. Causation versus association. Methods of data collection and biases in data collection.
Probability: The origin of probability in games of chance. The interpretation of probability. Probabilities from random sampling of a two-way table. Probabilities from Venn diagrams.
Random variables: Random variables as 'numbers that could have been otherwise.' Random variables as an idealization of the data collection process. Relationship between random variables and histograms / density plots.
The normal distribution: The normal distribution and its properties. The normal distribution as an idealized model for a population distribution. Putting a population variable on a standard scale. Probabilities and quantiles for normally distributed populations.
Sampling distributions: The connection between a population distribution and the distribution of a sample statistic. Statistical properties of the sample mean under simple random sampling. The central limit theorem.
Confidence intervals: Confidence intervals for population means. Confidence intervals as interval estimators. The interpretation of confidence intervals. Using confidence intervals to distinguish between practical and statistical significance.
Hypothesis tests: Hypothesis tests for population means. Components of a hypothesis test. Types of error in hypothesis testing. Scientific hypotheses and statistical hypotheses. Statistical significance and practical significance. P-values.
Confidence intervals and hypothesis tests for two-sample problems: Two-sample tests for population means and their associated confidence intervals. One-sample and two-sample tests for population proportions. Tests for independence and homogeneity using two-way tables.
Correlation and regression: Simple linear regression. Interpretation of regression coefficients in the context of a population. Diagnostic plots and model checking for simple linear regression. Statistical properties of estimates of regression coefficients. Hypothesis tests and confidence intervals for population regression coefficients.
Analysis of Variance: One-way ANOVA. Relationship of one-way ANOVA to the two-sample $$t$$-test. The problem of multiple comparisons. Diagnostic plots for one-way ANOVA. Contrasts from one-way ANOVA.
Nonparametric tests for population means: Violations of the distributional assumptions of $$t$$-tests and ANOVA. Rank-based test statistics. Mann-Whitney-Wilcoxon test for two independent samples. Kruskal-Wallis test for one-way ANOVA.
See the Schedule section at the end of this page for the current lecture schedule, which is subject to revision. Homework and additional resources will be linked there as they become available.

## Course Mechanics

#### Office Hours

Tuesday, 3:00–4:00 PM, Howard Hall 241
Thursday, 10:00–11:00 AM, Howard Hall 241
Thursday, 1:30–2:30 PM, Howard Hall 241
Friday, 9:00–10:00 AM, Howard Hall 241

If you cannot make the scheduled office hours, please e-mail me about making an appointment.

#### Grading

60% for 3 in-class exams (20% each)
25% for a non-cumulative final exam
12% for homework problem sets
3% for class participation
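
As a rough illustration of how these weights combine, write $$E_1, E_2, E_3$$ for the three in-class exam scores, $$F$$ for the final exam, $$H$$ for homework, and $$P$$ for participation, each on a 0–100 scale. The symbols and the sample scores below are hypothetical, chosen only to show the arithmetic:

$$
\text{Course grade} = 0.20\,(E_1 + E_2 + E_3) + 0.25\,F + 0.12\,H + 0.03\,P
$$

For example, with exam scores of 80, 85, and 90, a final exam score of 88, a homework score of 95, and a participation score of 100:

$$
0.20\,(80 + 85 + 90) + 0.25\,(88) + 0.12\,(95) + 0.03\,(100) = 51 + 22 + 11.4 + 3 = 87.4
$$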

#### Extra Credit

In addition to the main categories above, there are two opportunities for extra credit:

+5% for use of Anki (Instructions)
+5% for post-class reflections (Instructions)

Note: These are the only opportunities for extra credit in this course.

#### Homework

Homework will be assigned at the end of every class meeting, and listed on the Sapling page for this course. Homework assignments are due at the beginning of the next class meeting.

#### Attendance

Required. If you expect to miss 2-3 sessions of the course, you should take the course during another semester.

#### Examination Absences

If you miss an examination, your grade for that exam will be zero. If you know you will be absent for an exam, you must let me know at least one week in advance so that we can schedule a make-up exam.

#### Textbook

The required textbook is:

• Brigitte Baldi and David S. Moore, The Practice of Statistics in the Life Sciences, 4th Edition (W. H. Freeman and Company, 2018, ISBN: 9781319013370).

#### Collaboration, Cheating, and Plagiarism

All submitted work should be your own. You are welcome and encouraged to consult with others while working on an assignment, including other students in the class and tutors in the Mathematics Learning Center. However, whenever you have had assistance with a problem, you must state so at the beginning of the problem solution. Unless this mechanism is abused, there will be no reduction in credit for using and reporting such assistance. This policy applies to both individual and group work. In group work, you only need to acknowledge help from outside the group. This policy does not apply to examinations.

#### Statement on Special Accommodations

Students with disabilities who need special accommodations for this class are encouraged to meet with me or the appropriate disability service provider on campus as soon as possible. In order to receive accommodations, students must be registered with the appropriate disability service provider on campus as set forth in the student handbook and must follow the University procedure for self-disclosure, which is stated in the University Guide to Services and Accommodations for Students with Disabilities. Students will not be afforded any special accommodations for academic work completed prior to the disclosure of the disability, nor will they be afforded any special accommodations prior to the completion of the documentation process with the appropriate disability office.

## R

We will use R, a programming language for statistical computing, throughout the semester for in-class activities and homework assignments. I will cover the relevant features of R throughout the course.

You can access R from any web-accessible computer using RStudio Cloud. You will need to create an account on RStudio Cloud from their Registration page. I will send out a link via email for you to join a Space on RStudio Cloud for this course. Resources for homework, labs, etc., will be hosted on RStudio Cloud for easy access.

You can also install R on your personal computer, if you have one. You can install R by following the instructions for Windows here, for macOS here, or for Linux here. You will also want to install RStudio, an Integrated Development Environment for R, which you can find here.

We will use R as a scripting language and statistical calculator, and thus will not get into the nitty-gritty of programming in R. We will largely use functionality built into the mosaic library in R. You can find a comprehensive tutorial on using R and mosaic here.

## Anki

As stated in the Extra Credit section, you will have the opportunity to use Anki for spaced retrieval practice throughout the semester. Anki is open-source, free (as in both gratis and libre) software. You can download Anki to your personal computer from this link. If you have ever used flashcards, then Anki should be fairly intuitive. If you would like more details, you can find Anki's User Manual here.

## Schedule

Subject to revision. Assignments and solutions will all be linked here, as they are available. All readings are from the textbook by Baldi and Moore unless otherwise noted.
September 3, Lecture 1:
Topics: Introduction to class. What is statistics? Types of data. Visual summaries of data: rug plots, histograms, and densities. Using R to generate visual summaries.
Sections: Chapter 1
Lab 1. Due Lecture 2.
September 5, Lecture 2:
Topics: Numerical summaries of data. Measures of center: mean, median, mode. Measures of spread: standard deviation, percentiles / quantiles, and quartiles. Boxplots.
Sections: Chapter 2
Demo for Numerical Summaries of Center and Spread
September 10, Lecture 3:
Topics: Summaries of two quantitative variables. Scatterplots. Including a categorical variable using color. Correlation.
Sections: Chapter 3
Demo for the Properties of the Sample Correlation
September 12, Lecture 4:
Topics: Summaries of two quantitative variables. Trendlines as a data summary device. Refresher on the equation of a line: $$y = mx + b$$. Trendlines and interpreting the slope and intercept. Trendlines as a prediction. How well does a trendline summarize the data?
Sections: Chapter 4
Demo for Refresher on Equation of a Line
September 17, Lecture 5:
Topics: Summaries of two categorical variables. Two-way tables. Marginals of a two-way table. Conditionals of a two-way table. Lurking variables and confounding via a three-way table.
Sections: Chapter 5
September 19, Lecture 6:
Topics: Where do data come from? Experimental versus observational studies. Causation versus association. Methods of data collection. Bias in data collection.
Sections: Chapter 6
September 24, Lecture 7:
Topics: Exam 1.
Sections: Exam on Chapters 1-6
Exam 1 Study Guide
September 26, Lecture 8:
Topics: Random chance and probability, and their relation to random sampling. Probabilities and their interpretation. Random variables: discrete and continuous. Probability distributions for discrete random variables. Density curves for continuous random variables. Querying probability distributions and density curves to determine the probability that a random variable $$X$$ takes a value.
Sections: Chapter 9
Demo for Simple Random Sampling and Random Variables
October 1, Lecture 9:
Topics: Normal random variables. A bell-shaped curve. Examples of quantities distributed according to a bell-shaped curve. The mean and standard deviation of a normal distribution. The standard normal distribution via re-centering and scaling. $$Z$$-scores for standardizing bell-shaped data. Probabilities and quantiles for normally distributed data using R.
Sections: Chapter 11
Demo for Properties of a Normal Distribution
Demo for Computing Normal Probability Queries Using R
Practice Computing Normal Probability Queries Using R
October 3, Lecture 10:
Topics: Connecting data to statistical models. Populations and samples. Parameters and statistics. The sampling distribution of a statistic. The sample mean. The mean and standard deviation of the sample mean. The central limit theorem.
Sections: Chapter 13
Demo of a Sampling Distribution via Enumeration
Demo of the Sampling Distribution of the Sample Mean $$\bar{X}$$ Under Random Sampling from a Population
Age at Time of Death By Current Age and Other Demographics by Kevin Stadler
October 10, Lecture 11:
Topics: Estimation of population parameters using a sample statistic. The sample mean as a procedure for estimating the population mean. Standard error of the sample mean. Estimating the standard error of the sample mean. Standardizing the sample mean. $$T$$-scores in place of $$Z$$-scores.
Sections: Chapter 14, Chapter 17
Demo of the "Black Box" Model for Inferential Statistics
October 15, Lecture 12:
Topics: Added uncertainty from estimating the standard error. The $$t$$-distribution. Degrees of freedom. $$t$$-values using R. A reasonable guess at the population mean. The $$T$$-based confidence interval for a population mean. Interpreting the confidence level of an interval estimator. How the width of a confidence interval depends on the sample size, precision of measurement, and confidence level.
Sections: Chapter 14, Chapter 15, Chapter 17
Demo of the Density Curve for the $$t$$-distribution with Varying Degrees of Freedom
Demo of the Dependence of a Confidence Interval for the Population Mean on $$\bar{x}, s_{X}, n$$ and $$c$$
Demo on Interpretation of the Confidence Level of an Interval Estimator
October 17, Lecture 13:
Topics: Exam 2.
Sections: Exam on Chapters 9, 11, 13, 14, and 17
Exam 2 Study Guide
October 22, Lecture 14:
Topics: Hypothesis testing. Making a claim about a population parameter. Identifying the null and alternative hypothesis based on the claim. Statistical hypotheses are always about populations. A rough hypothesis test for a population mean: how many standard errors is the sample mean from the null value?
Sections: Chapter 14, Chapter 17
Demo on the Rationale Behind Using a $$T$$-score to Test a Claim About a Population Mean
October 24, Lecture 15:
Topics: Hypothesis testing. Types of errors in hypothesis testing. The logic of hypothesis testing: all is fair in law and war. How likely is the observed result when the null hypothesis is true? Rejection regions for the one-sample $$t$$-test.
Sections: Chapter 14, Chapter 15, Chapter 17
Homework 13
Demo of Rejection Regions and $$P$$-values for a One-sample $$t$$-test
October 29, Lecture 16:
Topics: Hypothesis testing using $$P$$-values. Hypothesis testing using confidence intervals. The one-sample $$t$$-test using R. Practice with the one-sample $$t$$-test.
Sections: Chapter 17
Statistical Inference Worksheet. Due at beginning of Lecture 17.
October 31, Lecture 17:
Topics: Statistical hypotheses about two population means. Differences between population means. Unmatched populations. Estimating differences between populations with independent samples. Confidence intervals for differences of unmatched population means. The unpaired (independent) two-sample $$t$$-test.
Sections: Chapter 18
November 5, Lecture 18:
Topics: Matched populations. Estimating differences between matched population means using average difference scores. Confidence intervals for differences of matched population means. The paired two-sample $$t$$-test. Reminder of the assumptions of $$T$$-based inferential procedures. Diagnosing violations of Normality assumptions: density plots, Q-Q plots, and formal hypothesis tests. The perils of Normality testing.
Sections: Chapter 11, Chapter 17
Demo of Normality Diagnostics
November 7, Lecture 19:
Topics: Rank-based methods for testing population centers. Rank-based methods for one-sample and two-sample problems.
Sections: Chapter 27
Demo of Rank-based Tests (Wilcoxon's Rank-Sum and Signed-Rank Tests)
November 12, Lecture 20:
Topics: Exam 3.
Sections: Exam on Chapters 14, 15, 17, and 18
Exam 3 Study Guide
November 14, Lecture 21:
Topics: Parametric one-way ANOVA. Nonparametric one-way ANOVA.
Sections: Chapter 24, Chapter 27
One-way ANOVA Applet from Sapling
November 19, Lecture 22:
Topics: Hypothesis testing for categorical data in one-way tables. Statistical hypotheses about categorical data. The $$\chi^{2}$$ (chi-squared) statistic for testing claims about categorical data.
Sections: Chapter 21
November 21, Lecture 23:
Topics: Hypothesis testing for categorical data in two-way tables. Independence and association between two categorical variables. Tests of association between two categorical variables using the chi-squared statistic. Tests of association between two categorical variables using Fisher's exact test.
Sections: Chapter 22
November 26, Lecture 24:
Topics: Inferences about proportions for binary outcomes in two populations. Binary variables and binomial counts. Binomial proportions. Relative risks, odds, and odds ratios. Interpreting relative risks and odds ratios. Not interpreting the odds ratio as a relative risk. Confidence intervals for the relative risk and the odds ratio.
Sections: Chapter 9, Chapter 20
Demo of Relative Risk, Odds, and Odds Ratios
December 3, Lecture 25:
Topics: Statistical association between two quantitative variables. Simple linear regression. The simple linear regression model. The population intercept and slope. Assumptions of the simple linear regression model. Checking the validity of the simple linear regression model with diagnostic plots.
Sections: Chapter 23
Topics: The $$T$$-statistic for the sample slope. The estimate of the standard error of the sample slope. Confidence intervals for population slopes. Hypothesis tests for population slopes.