Here's the official description:

Analysis of data, probability, random variables, normal distribution, sampling theory, confidence intervals, and statistical inference.

This course covers the process of statistical analysis from beginning to end. That process, in broad strokes, is as follows: we pose a scientific question, determine what experiments or observations might provide data towards answering that question, develop approaches to collecting that data, summarize the resulting data, and derive inferences relevant to the original scientific question. In the process, you will learn about sampling, descriptive statistics, probability, probability models, inferential statistics, confidence intervals, hypothesis tests, and regression. You will also learn how to analyze data using R, a programming language ideally suited for statistical computing.

MA 101 or MA 105 passed with a grade of C- or higher, or Math Placement Level 3 or 4. Not open to computer science majors or to students required to complete MA 125, except software engineering majors.

Dr. David Darmon | ddarmon [at] monmouth.edu | Howard Hall 241

This is currently a *tentative* listing of topics, in order.

- *Introduction:* What is statistics? What types of questions can statistics answer? Types of data.
- *Descriptive statistics for quantitative data:* Summaries of the entire data distribution: rug plot, dot plot, histogram, box plot. Measures of center: mean, median, mode. Measures of variation: range, standard deviation, quartiles.
- *Descriptive statistics for two quantitative variables:* Scatterplots. Trendlines to summarize an association. Regression. Correlation.
- *Descriptive statistics for categorical data:* Two-way tables. Marginals and conditionals of two-way tables. Lurking variables and confounding via three-way tables.
- *Origins of data:* Experimental versus observational studies. Causation versus association. Methods of data collection and biases in data collection.
- *Probability:* The origin of probability in games of chance. The interpretation of probability. Probabilities from random sampling of a two-way table. Probabilities from Venn diagrams.
- *Random variables:* Random variables as 'numbers that could have been otherwise.' Random variables as an idealization of the data collection process. Relationship between random variables and histograms / density plots.
- *The normal distribution:* The normal distribution and its properties. The normal distribution as an idealized model for a population distribution. Putting a population variable on a standard scale. Probabilities and quantiles for normally distributed populations.
- *Sampling distributions:* The connection between a population distribution and the distribution of a sample statistic. Statistical properties of the sample mean under simple random sampling. The central limit theorem.
- *Confidence intervals:* Confidence intervals for population means. Confidence intervals as interval estimators. The interpretation of confidence intervals. Using confidence intervals to distinguish between practical and statistical significance.
- *Hypothesis tests:* Hypothesis tests for population means. Components of a hypothesis test. Types of error in hypothesis testing. Scientific hypotheses and statistical hypotheses. Statistical significance and practical significance. *P*-values.
- *Confidence intervals and hypothesis tests for two-sample problems:* Two-sample tests for population means and their associated confidence intervals. One-sample and two-sample tests for population proportions. Tests for independence and homogeneity using two-way tables.
- *Correlation and regression:* Simple linear regression. Interpretation of regression coefficients in the context of a population. Diagnostic plots and model checking for simple linear regression. Statistical properties of estimates of regression coefficients. Hypothesis tests and confidence intervals for population regression coefficients.
- *Analysis of Variance:* One-way ANOVA. Relationship of one-way ANOVA to the two-sample \(t\)-test. The problem of multiple comparisons. Diagnostic plots for one-way ANOVA. Contrasts from one-way ANOVA.
- *Nonparametric tests for population means:* Violations of the distributional assumptions of \(t\)-tests and ANOVA. Rank-based test statistics. Mann-Whitney-Wilcoxon test for two independent samples. Kruskal-Wallis test for one-way ANOVA.

- Tuesday, 03:00–04:00 PM, Howard Hall 241
- Thursday, 10:00–11:00 AM, Howard Hall 241
- Thursday, 01:30–02:30 PM, Howard Hall 241
- Friday, 09:00–10:00 AM, Howard Hall 241

If you cannot make the scheduled office hours, please e-mail me about making an appointment.

- 60% for 3 in-class exams (20% each)
- 25% for a non-cumulative final exam
- 12% for homework problem sets
- 3% for class participation

In addition to the main categories above, there are **two** opportunities for extra credit:

- +5% for use of Anki (Instructions)
- +5% for post-class reflections (Instructions)

**Note:** These are the **only** opportunities for extra credit in this course.
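
The weights above can be combined into a single course percentage. The formula below is only a sketch: the symbols are notation introduced here, each score is assumed to be a percentage between 0 and 100, and it assumes that extra credit is added directly to the weighted total, which the list above does not spell out.

\[
G = 0.20\,(E_1 + E_2 + E_3) + 0.25\,F + 0.12\,H + 0.03\,P + A + R
\]

Here \(E_1, E_2, E_3\) are the three in-class exam scores, \(F\) is the final exam score, \(H\) is the homework score, \(P\) is the participation score, and \(A\) and \(R\) are the extra-credit points earned for Anki and post-class reflections (each between 0 and 5). For example, with hypothetical exam scores of 80, 85, and 90, a final exam score of 88, a homework score of 95, full participation, and no extra credit:

\[
G = 0.20\,(255) + 0.25\,(88) + 0.12\,(95) + 0.03\,(100) = 51 + 22 + 11.4 + 3 = 87.4
\]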

The **required** textbook is:

- Brigitte Baldi and David S. Moore, *The Practice of Statistics in the Life Sciences*, 4th Edition (W. H. Freeman and Company, 2018, ISBN: 9781319013370).

We will use R, a programming language for statistical computing, throughout the semester for in-class activities and homework assignments. I will cover the relevant features of R throughout the course.

You can access R from any web-accessible computer using RStudio Cloud. You will need to create an account on RStudio Cloud from their Registration page. I will send out a link via email for you to join a Space on RStudio Cloud for this course. Resources for homework, labs, etc., will be hosted on RStudio Cloud for easy access.

You can also install R on your personal computer, if you have one. You can install R by following the instructions for Windows here, for macOS here, or for Linux here. You will also want to install RStudio, an Integrated Development Environment for R, which you can find here.

We will use R as a scripting language and statistical calculator, and thus will not get into the nitty-gritty of programming in R. We will largely use functionality built into the `mosaic` package in R. You can find a comprehensive tutorial on using R and `mosaic` here.
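
To give a rough sense of what using R as a statistical calculator looks like, here is a minimal sketch. It uses only base R and the built-in `iris` data set, which is an assumption made for illustration and not necessarily a data set used in this course.

```r
# Built-in example data (chosen for illustration only, not a course data set)
x <- iris$Sepal.Length

# Measures of center
mean(x)
median(x)

# Measures of variation
sd(x)
range(x)
quantile(x, probs = c(0.25, 0.50, 0.75))  # quartiles

# Visual summaries of the distribution
hist(x, main = "Sepal length", xlab = "Length (cm)")
boxplot(x, horizontal = TRUE)

# With the mosaic package loaded via library(mosaic), its formula
# interface produces a similar set of summaries in one call:
# favstats(~ Sepal.Length, data = iris)
```

Each line can be run on its own in the RStudio console, which is how most in-class calculations are likely to go.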

As stated in the **Extra Credit** section, you will have the opportunity to use Anki for spaced retrieval practice throughout the semester. Anki is open-source, free (as in both *gratis* and *libre*) software. You can download Anki to your personal computer from this link. If you have ever used flashcards, then Anki should be fairly intuitive. If you would like more details you can find Anki's User Manual here.

- September 3, Lecture 1:
  **Topics:** Introduction to class. What is statistics? Types of data. Visual summaries of data: rug plots, histograms, and densities. Using R to generate visual summaries.
  **Sections:** Chapter 1
  - Assigned Reading and Learning Objectives
  - **Lab 1.** Due Lecture 2.
- September 5, Lecture 2:
  **Topics:** Numerical summaries of data. Measures of center: mean, median, mode. Measures of spread: standard deviation, percentiles / quantiles, and quartiles. Boxplots.
  **Sections:** Chapter 2
  - Assigned Reading and Learning Objectives
  - Shiny Demo for Numerical Summaries of Center and Spread
- September 10, Lecture 3:
  **Topics:** Summaries of two quantitative variables. Scatterplots. Including a categorical variable using color. Correlation.
  **Sections:** Chapter 3
  - Assigned Reading and Learning Objectives
  - Shiny Demo for the Properties of the Sample Correlation
- September 12, Lecture 4:
  **Topics:** Summaries of two quantitative variables. Trendlines as a data summary device. Refresher on the equation of a line: \(y = mx + b\). Trendlines and interpreting the slope and intercept. Trendlines as a prediction. How well does a trendline summarize the data?
  **Sections:** Chapter 4
  - Assigned Reading and Learning Objectives
  - Demo for Refresher on Equation of a Line
- September 17, Lecture 5:
  **Topics:** Summaries of two categorical variables. Two-way tables. Marginals of a two-way table. Conditionals of a two-way table. Lurking variables and confounding via a three-way table.
  **Sections:** Chapter 5
  - Assigned Reading and Learning Objectives
- September 19, Lecture 6:
  **Topics:** Where do data come from? Experimental versus observational studies. Causation versus association. Methods of data collection. Bias in data collection.
  **Sections:** Chapter 6
  - Assigned Reading and Learning Objectives
- September 24, Lecture 7:
  **Topics:** Exam 1.
  **Sections:** Exam on Chapters 1-6
  - Exam 1 Study Guide
- September 26, Lecture 8:
  **Topics:** Random chance and probability. Probabilities and their interpretation. Probabilities from a two-way table. Probabilities from a Venn diagram. Random variables from histograms. Mean and standard deviation of a random variable.
  **Sections:** Chapter 9
- October 1, Lecture 9:
  **Topics:** Normal random variables. *A* bell-shaped curve. Examples of quantities distributed according to a bell-shaped curve. The mean and standard deviation of a normal distribution. The standard normal distribution via re-centering and scaling. \(Z\)-scores for standardizing bell-shaped data. Probabilities and quantiles for normally distributed data using R.
  **Sections:** Chapter 11
- October 3, Lecture 10:
  **Topics:** Connecting data to statistical models. Populations and samples. Parameters and statistics. The sampling distribution of a statistic. The sample mean. The mean and standard deviation of the sample mean. The central limit theorem.
  **Sections:** Chapter 13
- October 10, Lecture 11:
  **Topics:** Estimation of population parameters using a sample statistic. The sample mean as a procedure for estimating the population mean. Standard error of the sample mean. Estimating the standard error of the sample mean. Standardizing the sample mean. \(T\)-scores in place of \(Z\)-scores.
  **Sections:** Chapter 14, Chapter 17
- October 15, Lecture 12:
  **Topics:** Added uncertainty from estimating the standard error. The \(t\)-distribution. Degrees of freedom. \(t\)-values using R. A reasonable guess at the population mean. The \(T\)-based confidence interval for a population mean. Interpreting the confidence level of an interval estimator. How the width of a confidence interval depends on the sample size, precision of measurement, and confidence level.
  **Sections:** Chapter 14, Chapter 15, Chapter 17
- October 17, Lecture 13:
  **Topics:** Exam 2.
  **Sections:** Exam on Chapters 9, 11, 13, 14, and 17
- October 22, Lecture 14:
  **Topics:** Hypothesis testing. Making a claim about a population parameter. Identifying the null and alternative hypotheses based on the claim. Statistical hypotheses are always about populations. A rough hypothesis test for a population mean: how many standard errors is the sample mean from the null value?
  **Sections:** Chapter 14, Chapter 17
- October 24, Lecture 15:
  **Topics:** Hypothesis testing. Types of errors in hypothesis testing. The logic of hypothesis testing: all is fair in law and war. How likely is the observed result when the null hypothesis is true? \(P\)-values for \(T\)-statistics. Hypothesis testing using \(P\)-values. Hypothesis testing using confidence intervals.
  **Sections:** Chapter 14, Chapter 15, Chapter 17
- October 29, Lecture 16:
  **Topics:** Statistical hypotheses about two populations. Differences between population means. Estimating differences between population means using differences between sample means. Estimating differences with independent samples. Confidence intervals for differences in population means. Two-sample \(T\)-test.
  **Sections:** Chapter 18
- October 31, Lecture 17:
  **Topics:** Statistical hypotheses about two matched populations. Differences between matched population means. Estimating differences between matched population means using average difference scores. Confidence intervals for differences of matched population means. Paired-sample \(T\)-test.
  **Sections:** Chapter 17
- November 5, Lecture 18:
  **Topics:** Reminder of the assumptions of \(T\)-based inferential procedures. Testing normality assumptions: histograms, Q-Q plots, and formal hypothesis tests. The perils of normality testing. Rank-based methods for testing population centers. Rank-based methods for one-sample and two-sample problems.
  **Sections:** Chapter 11, Chapter 27
- November 7, Lecture 19:
  **Topics:** Hypothesis testing for categorical data in one-way tables. Statistical hypotheses about categorical data. The \(\chi^{2}\) (chi-squared) statistic for testing claims about categorical data.
  **Sections:** Chapter 21
- November 12, Lecture 20:
  **Topics:** Exam 3.
  **Sections:** Exam on Chapters 14, 15, 17, 18, and 27
- November 14, Lecture 21:
  **Topics:** Statistical hypotheses about two population proportions. Claims about two population proportions. Testing claims about two population proportions using the chi-squared statistic.
  **Sections:** Chapter 22
- November 19, Lecture 22:
  **Topics:** Hypothesis testing for categorical data in two-way tables. Independence and association between two categorical variables. Tests of association between two categorical variables using the chi-squared statistic. Tests of association between two categorical variables using Fisher's exact test.
  **Sections:** Chapter 22
- November 21, Lecture 23:
  **Topics:** Statistical association between two quantitative variables. Simple linear regression. The simple linear regression model. The population intercept and slope. Assumptions of the simple linear regression model. Checking the validity of the simple linear regression model with diagnostic plots.
  **Sections:** Chapter 23
- November 26, Lecture 24:
  **Topics:** Confidence intervals for the population intercept and slope. Hypothesis tests for the population intercept and slope. A reminder about statistical versus practical significance. A reminder that association does not imply causation.
  **Sections:** Chapter 23
- December 3, Lecture 25:
  **Topics:** Parametric one-way ANOVA. Nonparametric one-way ANOVA.
  **Sections:** Chapter 24, Chapter 27
- December 5, Lecture 26:
  **Topics:** Review for Final Exam.
  **Sections:** Exam on Chapters 21, 22, 23, 24, and 27