Empirical Distributions from Data

Quantiles

  1. Define the sample quantile function \(\widehat{q}(p)\) from a data set \(X_{1}, X_{2}, \ldots, X_{n}\).
  2. Use R to compute the quantiles of a data set.
  3. Given the graph of a sample quantile function, identify the first, second, and third quartiles of the data set.
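A minimal R sketch of the quantile objectives. It uses a small made-up data set that is for illustration only. R's quantile() implements nine sample-quantile definitions, chosen with its type argument. The default, type = 7, interpolates linearly between order statistics. type = 1 is the generalized inverse of the empirical CDF. Check your own definition of \(\widehat{q}(p)\) against these; we assume it matches one of them. The plot draws the type 1 quantile function with dashed lines at p = 0.25, 0.5 and 0.75. The first, second and third quartiles are the heights of the curve at those lines. The helper name qhat is hypothetical.

```r
# Small made-up data set (already sorted), for illustration only
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# Default definition (type 7): linear interpolation between order statistics
q7 <- quantile(x, probs = c(0.25, 0.5, 0.75))
q7   # 25%: 4, 50%: 6, 75%: 9.25

# Type 1: generalized inverse of the empirical CDF (a step function)
q1 <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 1)
q1   # 25%: 4, 50%: 5, 75%: 9

# Hypothetical helper: the type 1 sample quantile function of p
qhat <- function(p) quantile(x, probs = p, type = 1, names = FALSE)

# Plot qhat on [0, 1]; the quartiles are the heights at the dashed lines
curve(qhat, from = 0, to = 1, type = "s", xlab = "p", ylab = "sample quantile")
abline(v = c(0.25, 0.5, 0.75), lty = 2)
```

The two medians differ here because the sample size is even. Type 7 averages the two middle values, giving 6. Type 1 returns the smaller of the two, giving 5.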

Empirical Cumulative Distribution Function

  1. Define the empirical cumulative distribution function \(\widehat{F}(x)\) from a data set \(X_{1}, X_{2}, \ldots, X_{n}\).
  2. Given a (small) data set, evaluate \(\widehat{F}\) at a given argument \(x\) by hand.
  3. Use R to generate and evaluate the empirical cumulative distribution function from a data set.
  4. Explain how the sample quantile function and the empirical cumulative distribution function are related.
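A sketch of the empirical CDF \(\widehat{F}(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}\), which is the fraction of observations at or below \(x\). It uses the same made-up data as the quantile sketch. ecdf() returns a step function that can be called like any other R function. mean(x <= t) reproduces the by-hand count. The last lines show one way to relate the two functions, assuming the type 1 convention for \(\widehat{q}\). Under that convention, \(\widehat{q}(p)\) is the smallest observation at which \(\widehat{F}\) reaches \(p\).

```r
# Small made-up data set, for illustration only
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# ecdf() returns a right-continuous step function
Fhat <- ecdf(x)
Fhat(4)      # 0.375: three of the eight observations are <= 4
Fhat(6.5)    # 0.5: four of the eight observations are <= 6.5

# The same values by hand: proportion of observations <= t
mean(x <= 4)
mean(x <= 6.5)

plot(Fhat, main = "Empirical CDF")

# Inverse relationship: smallest observation at which Fhat reaches p
p <- c(0.1, 0.3, 0.5, 0.8)
inv <- sapply(p, function(pp) min(x[Fhat(x) >= pp]))
inv                                                # 2 4 5 10
quantile(x, probs = p, type = 1, names = FALSE)    # 2 4 5 10
```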

Histograms

  1. Explain the difference between frequency and density histograms and use hist() to plot both.
  2. Set the number of bins for a histogram generated by hist().
  3. Set the bin selection algorithm used by hist().
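A sketch of the histogram objectives on simulated standard normal data. The data and the seed are arbitrary choices for illustration. With the default freq = TRUE, bar heights are counts. With freq = FALSE, bar heights are densities, scaled so that the bar areas sum to 1. The breaks argument accepts three kinds of value:

- A single number. hist() treats it only as a suggestion, because it rounds the break points to "pretty" values.
- An explicit vector of break points, which fixes the bins exactly.
- The name of a bin selection algorithm: "Sturges" (the default), "Scott", or "FD" (Freedman-Diaconis).

```r
set.seed(1)
y <- rnorm(200)   # simulated data, for illustration only

par(mfrow = c(2, 2))

# Frequency histogram: bar heights are counts
hist(y, main = "Frequency")

# Density histogram: bar heights are densities, total bar area is 1
hist(y, freq = FALSE, main = "Density")

# Number of bins: a single number is only a suggestion...
hist(y, breaks = 20, main = "breaks = 20 (suggested)")

# ...so pass explicit break points to get exactly 10 bins
h10 <- hist(y, breaks = seq(min(y), max(y), length.out = 11),
            main = "Exactly 10 bins")

par(mfrow = c(1, 1))

# Bin selection algorithm: "Sturges" (default), "Scott", or "FD"
h_fd    <- hist(y, breaks = "FD", plot = FALSE)
h_scott <- hist(y, breaks = "Scott", plot = FALSE)

length(h10$counts)                             # 10
sum(h_scott$density * diff(h_scott$breaks))    # 1 (up to rounding)
```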

Kernel Density Estimates

  1. State the form of a kernel density estimate from a data set \(X_{1}, X_{2}, \ldots, X_{n}\).
  2. Draw a rough sketch of a kernel density estimate from a data set.
  3. State the properties that the kernel function used in a kernel density estimate must have.
  4. State the expression for the Gaussian kernel function.
  5. Explain how the bandwidth of the kernel function affects the kernel density estimate.
  6. Set the bandwidth used by density() to a specific value.
  7. Set the bandwidth selection method used by density().
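A sketch of a kernel density estimate \(\widehat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)\) with the Gaussian kernel \(K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\). This kernel is nonnegative, symmetric, and integrates to 1. The sketch makes several arbitrary choices: the data set, the bandwidth h = 1.5, and the helper names K and kde_by_hand are all hypothetical.

In density(), bw is the standard deviation of the smoothing kernel, so for the Gaussian kernel it equals \(h\). The direct sum is compared with density(), which uses a binned FFT approximation, so the two agree only closely rather than exactly. The dotted curves are the individual kernel "bumps", one per observation, and they add up to the estimate. This is also how to sketch a KDE by hand.

The second half of the block uses simulated data to compare fixed bandwidths with two of density()'s bandwidth selection methods.

```r
# Small made-up data set, for illustration only
x <- c(2, 4, 4, 5, 7, 9, 10, 12)
n <- length(x)
h <- 1.5   # bandwidth, chosen arbitrarily

# Gaussian kernel (same as dnorm(u))
K <- function(u) exp(-u^2 / 2) / sqrt(2 * pi)

# Direct evaluation of fhat_h(t) = (1 / (n h)) * sum_i K((t - X_i) / h)
kde_by_hand <- function(t, data, h) {
  sapply(t, function(t0) sum(K((t0 - data) / h)) / (length(data) * h))
}

# For the Gaussian kernel, density()'s bw is the kernel's standard deviation, i.e. h
d <- density(x, bw = h, kernel = "gaussian")
manual <- kde_by_hand(d$x, x, h)
max(abs(manual - d$y))   # small: density() uses a binned FFT approximation

plot(d, main = "Gaussian KDE, bw = 1.5")
rug(x)   # data points along the axis
# One scaled kernel bump per observation (inside curve(), x is the grid variable)
for (xi in x) curve(K((x - xi) / h) / (n * h), add = TRUE, lty = 3)

# Effect of the bandwidth and bandwidth selection, on simulated data
set.seed(1)
y <- rnorm(200)
d_small <- density(y, bw = 0.1)    # small bw: wiggly, follows individual points
d_large <- density(y, bw = 1)      # large bw: oversmoothed, flattened
d_nrd0  <- density(y)              # default bw = "nrd0" (Silverman's rule of thumb)
d_sj    <- density(y, bw = "SJ")   # Sheather-Jones selector

plot(d_small, main = "Bandwidth comparison", lty = 3)
lines(d_large, lty = 2)
lines(d_sj, lty = 1)

c(nrd0 = d_nrd0$bw, SJ = d_sj$bw)  # bandwidths chosen by the two methods
```

Other selectors accepted by bw include "nrd", "ucv" and "bcv". The adjust argument of density() multiplies whichever bandwidth is used.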