Empirical Distributions from Data
Quantiles
- Define the sample quantile function \(\widehat{q}(p)\) from a data set \(X_{1}, X_{2}, \ldots, X_{n}\).
- Use R to compute the quantiles of a data set.
- Given the graph of a sample quantile function, identify the first, second, and third quartiles of the data set.
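
As a minimal sketch of the R objective above, the quartiles of a small data set can be computed with `quantile()`. The data vector `x` is hypothetical and chosen only for illustration. R supports several quantile definitions through the `type` argument, so the definition of \(\widehat{q}(p)\) used in this course may correspond to a different `type` than the default shown here.

```r
# Hypothetical data set, used only for illustration.
x <- c(2.1, 3.4, 1.7, 5.0, 4.2, 3.9, 2.8)

# Default (type = 7): linear interpolation between order statistics.
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
print(q)

# type = 1: inverse of the empirical distribution function (a step function).
q1 <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 1)
print(q1)
```

The two calls agree on the median here but differ on the first and third quartiles, which shows why the choice of definition matters for small data sets.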
Empirical Cumulative Distribution Function
- Define the empirical cumulative distribution function \(\widehat{F}(x)\) from a data set \(X_{1}, X_{2}, \ldots, X_{n}\).
- Given a (small) data set, evaluate \(\widehat{F}\) at a given argument \(x\) by hand.
- Use R to generate and evaluate the empirical cumulative distribution function from a data set.
- Explain how the sample quantile function and the empirical cumulative distribution function are related.
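
The objectives above can be sketched in R with `ecdf()`, which returns a function that evaluates \(\widehat{F}\). The data vector `x` is hypothetical. The last lines illustrate one way to relate the two functions, assuming the quantile is defined as the inverse of the empirical distribution function (`type = 1` in `quantile()`); other quantile definitions satisfy this relation only approximately.

```r
# Hypothetical data set, used only for illustration.
x <- c(2.1, 3.4, 1.7, 5.0, 4.2, 3.9, 2.8)

# ecdf() returns a function; Fhat(t) is the proportion of observations <= t.
Fhat <- ecdf(x)
Fhat(3.4)

# The same value computed by hand: count the observations <= 3.4 and divide by n.
by_hand <- mean(x <= 3.4)
by_hand

# Relation to quantiles (type = 1): q(p) is the smallest data value
# whose empirical CDF is at least p.
p <- 0.5
qp <- unname(quantile(x, probs = p, type = 1))
smallest <- min(x[Fhat(x) >= p])
c(qp, smallest)
```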
Histograms
- Explain the difference between frequency and density histograms and use `hist()` to plot both.
- Set the number of bins for a histogram generated by `hist()`.
- Set the bin selection algorithm used by `hist()`.
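
A minimal sketch of the histogram objectives above, using simulated data (the vector `x` and the seed are assumptions made for illustration). In `hist()`, the `freq` argument switches between frequency and density scales, and the `breaks` argument controls binning. A single number passed to `breaks` is treated only as a suggestion, so an exact bin count requires an explicit vector of breakpoints.

```r
# Simulated data, used only for illustration.
set.seed(1)
x <- rnorm(200)

# Frequency histogram: bar heights are counts.
h_freq <- hist(x, main = "Frequency")

# Density histogram: bar areas sum to 1.
h_dens <- hist(x, freq = FALSE, main = "Density")

# Suggest about 20 bins; hist() may adjust this to get "pretty" breakpoints.
h_suggest <- hist(x, breaks = 20)

# Force exactly 20 bins by supplying 21 breakpoints that span the data.
h_exact <- hist(x, breaks = seq(min(x), max(x), length.out = 21))

# Choose the bin selection algorithm by name:
# "Sturges" (the default), "Scott", or "FD" (Freedman-Diaconis).
h_fd <- hist(x, breaks = "FD")
```

Each call returns an object whose `counts`, `density`, and `breaks` components can be inspected to confirm what was plotted.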
Kernel Density Estimates
- State the form of a kernel density estimate from a data set \(X_{1}, X_{2}, \ldots, X_{n}\).
- Draw a rough sketch of a kernel density estimate from a data set.
- State the properties that the kernel function used in a kernel density estimate must have.
- State the expression for the Gaussian kernel function.
- Explain how the bandwidth of the kernel function affects the kernel density estimate.
- Set the bandwidth used by `density()` to a specific value.
- Set the bandwidth selection method used by `density()`.
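
A minimal sketch of the kernel density objectives above, using simulated data (the vector `x`, the seed, and the helper `kde_manual` are assumptions made for illustration). It assumes the common form \(\widehat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_{i}}{h}\right)\) with the Gaussian kernel \(K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^{2}/2}\); the notation used in the course may differ. In `density()`, the `bw` argument accepts either a number or the name of a selection method.

```r
# Simulated data, used only for illustration.
set.seed(1)
x <- rnorm(200)

# Hypothetical helper: evaluate a Gaussian kernel density estimate
# at a single point x0 with bandwidth h, directly from the formula.
kde_manual <- function(x0, data, h) {
  mean(dnorm((x0 - data) / h)) / h
}
manual_at_0 <- kde_manual(0, x, h = 0.2)

# Default: Gaussian kernel, bandwidth chosen by the "nrd0" rule of thumb.
d_default <- density(x)

# Set the bandwidth to a specific value
# (for the Gaussian kernel, bw is the kernel's standard deviation).
d_fixed <- density(x, bw = 0.2)

# Set the bandwidth selection method by name, here Sheather-Jones.
d_sj <- density(x, bw = "SJ")

# Compare the estimates: a smaller bandwidth gives a rougher curve.
plot(d_default, main = "Kernel density estimates")
lines(d_fixed, lty = 2)
lines(d_sj, lty = 3)
```

The bandwidth actually used is stored in the `bw` component of each result, for example `d_sj$bw`.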