David Darmon

# MA 151-02, Statistics with Applications

## Fall 2019

Section 02: Tuesday, 1:15 PM – 2:35 PM; Thursday, 11:40 AM – 1:00 PM, Howard Hall 308

Here's the official description:

Analysis of data, probability, random variables, normal distribution, sampling theory, confidence intervals, and statistical inference.

This course covers the process of statistical analysis from beginning to end. That process, in broad strokes, is as follows: we pose a scientific question, determine what experiments or observations might provide data towards answering that question, develop approaches to collecting that data, summarize the resulting data, and derive inferences relevant to the original scientific question. In the process, you will learn about sampling, descriptive statistics, probability, probability models, inferential statistics, confidence intervals, hypothesis tests, and regression. You will also learn how to analyze data using R, a programming language ideally suited for statistical computing.

#### Prerequisites

MA 101 or MA 105 passed with a grade of C- or higher, or Math Placement Level 3 or 4. Not open to computer science majors or to students required to complete MA 125, except software engineering majors.

#### Professor

Dr. David Darmon
ddarmon [at] monmouth.edu
Howard Hall 241

## Topics, Notes, Readings

This is currently a tentative listing of topics, in order.

Introduction: What is statistics? What types of questions can statistics answer? Types of data.
Descriptive statistics for quantitative data: Summaries of the entire data distribution: rug plot, dot plot, histogram, box plot. Measures of center: mean, median, mode. Measures of variation: range, standard deviation, quartiles.
Descriptive statistics for two quantitative variables: Scatterplots. Trendlines to summarize an association. Regression. Correlation.
Descriptive statistics for categorical data: Two-way tables. Marginals and conditionals of two-way tables. Lurking variables and confounding via three-way tables.
Origins of data: Experimental versus observational studies. Causation versus association. Methods of data collection and biases in data collection.
Probability: The origin of probability in games of chance. The interpretation of probability. Probabilities from random sampling of a two-way table. Probabilities from Venn diagrams.
Random variables: Random variables as 'numbers that could have been otherwise.' Random variables as an idealization of the data collection process. Relationship between random variables and histograms / density plots.
The normal distribution: The normal distribution and its properties. The normal distribution as an idealized model for a population distribution. Putting a population variable on a standard scale. Probabilities and quantiles for normally distributed populations.
Sampling distributions: The connection between a population distribution and the distribution of a sample statistic. Statistical properties of the sample mean under simple random sampling. The central limit theorem.
Confidence intervals: Confidence intervals for population means. Confidence intervals as interval estimators. The interpretation of confidence intervals. Using confidence intervals to distinguish between practical and statistical significance.
Hypothesis tests: Hypothesis tests for population means. Components of a hypothesis test. Types of error in hypothesis testing. Scientific hypotheses and statistical hypotheses. Statistical significance and practical significance. P-values.
Confidence intervals and hypothesis tests for two-sample problems: Two-sample tests for population means and their associated confidence intervals. One-sample and two-sample tests for population proportions. Tests for independence and homogeneity using two-way tables.
Correlation and regression: Simple linear regression. Interpretation of regression coefficients in the context of a population. Diagnostic plots and model checking for simple linear regression. Statistical properties of estimates of regression coefficients. Hypothesis tests and confidence intervals for population regression coefficients.
Analysis of Variance: One-way ANOVA. Relationship of one-way ANOVA to the two-sample $$t$$-test. The problem of multiple comparisons. Diagnostic plots for one-way ANOVA. Contrasts from one-way ANOVA.
Nonparametric tests for population means: Violations of the distributional assumptions of $$t$$-tests and ANOVA. Rank-based test statistics. Mann-Whitney-Wilcoxon test for two independent samples. Kruskal-Wallis test for one-way ANOVA.

See the Schedule section at the end of this page for the current lecture schedule, which is subject to revision. Homework and additional resources will be linked there as they become available.

## Course Mechanics

#### Office Hours

Tuesday, 3:00–4:00 PM, Howard Hall 241
Thursday, 10:00–11:00 AM, Howard Hall 241
Thursday, 1:30–2:30 PM, Howard Hall 241
Friday, 9:00–10:00 AM, Howard Hall 241

If you cannot make the scheduled office hours, please e-mail me about making an appointment.

#### Grading

Your final grade will be determined by:
60% for 3 in-class exams (20% each)
25% for a non-cumulative final exam
12% for homework problem sets
3% for class participation
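
As a rough illustration of how these weights combine, suppose every component is scored on a 0 to 100 scale. The symbols here are illustrative only and not official notation: $$E_1$$, $$E_2$$, $$E_3$$ are the three in-class exam scores, $$F$$ is the final exam score, $$H$$ is the homework average, and $$P$$ is the participation score. Under that assumption, the weighted total $$G$$ would be:

$$G = 0.20\,(E_1 + E_2 + E_3) + 0.25\,F + 0.12\,H + 0.03\,P$$

For example, with hypothetical scores $$E_1 = 80$$, $$E_2 = 85$$, $$E_3 = 90$$, $$F = 88$$, $$H = 95$$, and $$P = 100$$:

$$G = 0.20\,(255) + 0.25\,(88) + 0.12\,(95) + 0.03\,(100) = 51 + 22 + 11.4 + 3 = 87.4$$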

#### Extra Credit

In addition to the main categories above, there are two opportunities for extra credit:

+5% for use of Anki (Instructions)
+5% for post-class reflections (Instructions)

Note: These are the only opportunities for extra credit in this course.

#### Homework

Homework will be assigned at the end of every class meeting, and listed on the Sapling page for this course. Homework assignments are due at the beginning of the next class meeting.

#### Attendance

Attendance is required. If you expect to miss two or three sessions of the course, you should take the course during another semester.

#### Examination Absences

If you miss an examination, your grade for that exam will be zero. If you know you will be absent for an exam, you must let me know at least one week in advance to schedule a make-up exam.

#### Textbook

The required textbook is:

• Brigitte Baldi and David S. Moore, The Practice of Statistics in the Life Sciences, 4th Edition (W. H. Freeman and Company, 2018, ISBN: 9781319013370).

#### Collaboration, Cheating, and Plagiarism

All submitted work should be your own. You are welcome and encouraged to consult with others while working on an assignment, including other students in the class and tutors in the Mathematics Learning Center. However, whenever you have had assistance with a problem, you must state so at the beginning of the problem solution. Unless this mechanism is abused, there will be no reduction in credit for using and reporting such assistance. This policy applies to both individual and group work. In group work, you only need to acknowledge help from outside the group. This policy does not apply to examinations.

#### Statement on Special Accommodations

Students with disabilities who need special accommodations for this class are encouraged to meet with me or the appropriate disability service provider on campus as soon as possible. In order to receive accommodations, students must be registered with the appropriate disability service provider on campus as set forth in the student handbook and must follow the University procedure for self-disclosure, which is stated in the University Guide to Services and Accommodations for Students with Disabilities. Students will not be afforded any special accommodations for academic work completed prior to the disclosure of the disability, nor will they be afforded any special accommodations prior to the completion of the documentation process with the appropriate disability office.

## R

We will use R, a programming language for statistical computing, throughout the semester for in-class activities and homework assignments. I will cover the relevant features of R throughout the course.

You can access R from any web-accessible computer using RStudio Cloud. You will need to create an account on RStudio Cloud from their Registration page. I will send out a link via email for you to join a Space on RStudio Cloud for this course. Resources for homework, labs, etc., will be hosted on RStudio Cloud for easy access.

You can also install R on your personal computer, if you have one. You can install R by following the instructions for Windows here, for macOS here, or for Linux here. You will also want to install RStudio, an Integrated Development Environment for R, which you can find here.

We will use R as a scripting language and statistical calculator, and thus will not get into the nitty-gritty of programming in R. We will largely use functionality built into the mosaic library in R. You can find a comprehensive tutorial on using R and mosaic here.

## Anki

As stated in the Extra Credit section, you will have the opportunity to use Anki for spaced retrieval practice throughout the semester. Anki is open-source, free (as in both gratis and libre) software. You can download Anki to your personal computer from this link. If you have ever used flashcards, then Anki should be fairly intuitive. If you would like more details you can find Anki's User Manual here.

## Schedule

Subject to revision. Assignments and solutions will all be linked here, as they are available. All readings are from the textbook by Baldi and Moore unless otherwise noted.
September 3, Lecture 1:
Topics: Introduction to class. What is statistics? Types of data. Visual summaries of data: rug plots, histograms, and densities. Using R to generate visual summaries.
Sections: Chapter 1
Assigned Reading and Learning Objectives
Lab 1. Due Lecture 2.
September 5, Lecture 2:
Topics: Numerical summaries of data. Measures of center: mean, median, mode. Measures of spread: standard deviation, percentiles / quantiles, and quartiles. Boxplots.
Sections: Chapter 2
Assigned Reading and Learning Objectives
Shiny Demo for Numerical Summaries of Center and Spread
September 10, Lecture 3:
Topics: Summaries of two quantitative variables. Scatterplots. Including a categorical variable using color. Correlation.
Sections: Chapter 3
Assigned Reading and Learning Objectives
Shiny Demo for the Properties of the Sample Correlation
September 12, Lecture 4:
Topics: Summaries of two quantitative variables. Trendlines as a data summary device. Refresher on the equation of a line: $$y = mx + b$$. Trendlines and interpreting the slope and intercept. Trendlines as a prediction. How well does a trendline summarize the data?
Sections: Chapter 4
Assigned Reading and Learning Objectives
Demo for Refresher on Equation of a Line
September 17, Lecture 5:
Topics: Summaries of two categorical variables. Two-way tables. Marginals of a two-way table. Conditionals of a two-way table. Lurking variables and confounding via a three-way table.
Sections: Chapter 5
Assigned Reading and Learning Objectives
September 19, Lecture 6:
Topics: Where do data come from? Experimental versus observational studies. Causation versus association. Methods of data collection. Bias in data collection.
Sections: Chapter 6
Assigned Reading and Learning Objectives
September 24, Lecture 7:
Topics: Exam 1.
Sections: Exam on Chapters 1-6
Exam 1 Study Guide
September 26, Lecture 8:
Topics: Random chance and probability. Probabilities and their interpretation. Probabilities from a two-way table. Probabilities from a Venn diagram. Random variables from histograms. Mean and standard deviation of a random variable.
Sections: Chapter 9
October 1, Lecture 9:
Topics: Normal random variables. A bell-shaped curve. Examples of quantities distributed according to a bell-shaped curve. The mean and standard deviation of a normal distribution. The standard normal distribution via re-centering and scaling. $$Z$$-scores for standardizing bell-shaped data. Probabilities and quantiles for normally distributed data using R.
Sections: Chapter 11
October 3, Lecture 10:
Topics: Connecting data to statistical models. Populations and samples. Parameters and statistics. The sampling distribution of a statistic. The sample mean. The mean and standard deviation of the sample mean. The central limit theorem.
Sections: Chapter 13
October 10, Lecture 11:
Topics: Estimation of population parameters using a sample statistic. The sample mean as a procedure for estimating the population mean. Standard error of the sample mean. Estimating the standard error of the sample mean. Standardizing the sample mean. $$T$$-scores in place of $$Z$$-scores.
Sections: Chapter 14, Chapter 17
October 15, Lecture 12:
Topics: Added uncertainty from estimating the standard error. The $$t$$-distribution. Degrees of freedom. $$t$$-values using R. A reasonable guess at the population mean. The $$T$$-based confidence interval for a population mean. Interpreting the confidence level of an interval estimator. How the width of a confidence interval depends on the sample size, precision of measurement, and confidence level.
Sections: Chapter 14, Chapter 15, Chapter 17
October 17, Lecture 13:
Topics: Exam 2.
Sections: Exam on Chapters 9, 11, 13, 14, and 17
October 22, Lecture 14:
Topics: Hypothesis testing. Making a claim about a population parameter. Identifying the null and alternative hypotheses based on the claim. Statistical hypotheses are always about populations. A rough hypothesis test for a population mean: how many standard errors is the sample mean from the null value?
Sections: Chapter 14, Chapter 17
October 24, Lecture 15:
Topics: Hypothesis testing. Types of errors in hypothesis testing. The logic of hypothesis testing: all is fair in law and war. How likely is the observed result when the null hypothesis is true? $$P$$-values for $$T$$-statistics. Hypothesis testing using $$P$$-values. Hypothesis testing using confidence intervals.
Sections: Chapter 14, Chapter 15, Chapter 17
October 29, Lecture 16:
Topics: Statistical hypotheses about two populations. Differences between population means. Estimating differences between population means using differences between sample means. Estimating differences with independent samples. Confidence intervals for differences in population means. Two-sample $$T$$-test.
Sections: Chapter 18
October 31, Lecture 17:
Topics: Statistical hypotheses about two matched populations. Differences between matched population means. Estimating differences between matched population means using average difference scores. Confidence intervals for differences of matched population means. Paired-sample $$T$$-test.
Sections: Chapter 17
November 5, Lecture 18:
Topics: Reminder of the assumptions of $$T$$-based inferential procedures. Testing normality assumptions: histograms, Q-Q plots, and formal hypothesis tests. The perils of normality testing. Rank-based methods for testing population centers. Rank-based methods for one-sample and two-sample problems.
Sections: Chapter 11, Chapter 27
November 7, Lecture 19:
Topics: Hypothesis testing for categorical data in one-way tables. Statistical hypotheses about categorical data. The $$\chi^{2}$$ (chi-squared) statistic for testing claims about categorical data.
Sections: Chapter 21
November 12, Lecture 20:
Topics: Exam 3.
Sections: Exam on Chapters 14, 15, 17, 18, and 27
November 14, Lecture 21:
Topics: Statistical hypotheses about two population proportions. Claims about two population proportions. Testing claims about two population proportions using the chi-squared statistic.
Sections: Chapter 22
November 19, Lecture 22:
Topics: Hypothesis testing for categorical data in two-way tables. Independence and association between two categorical variables. Tests of association between two categorical variables using the chi-squared statistic. Tests of association between two categorical variables using Fisher's exact test.
Sections: Chapter 22
November 21, Lecture 23:
Topics: Statistical association between two quantitative variables. Simple linear regression. The simple linear regression model. The population intercept and slope. Assumptions of the simple linear regression model. Checking the validity of the simple linear regression model with diagnostic plots.
Sections: Chapter 23
November 26, Lecture 24:
Topics: Confidence intervals for the population intercept and slope. Hypothesis tests for the population intercept and slope. A reminder about statistical versus practical significance. A reminder that association does not imply causation.
Sections: Chapter 23
December 3, Lecture 25:
Topics: Parametric one-way ANOVA. Nonparametric one-way ANOVA.
Sections: Chapter 24, Chapter 27
December 5, Lecture 26:
Topics: Review for Final Exam.
Sections: Exam on Chapters 21, 22, 23, 24, and 27