STAT 250: Introduction to Biostatistics
Dr. Kari Lock Morgan
By the end of the course, you should be able to...
- Identify cases and variables in a dataset, and classify variables as categorical or quantitative.
- Recognize that data and knowledge of statistics allow you to investigate a wide variety of interesting phenomena.
- Distinguish between a sample and a population.
- Recognize when it is, and is not, appropriate to use sample data to infer information about a population.
- Recognize that not every association implies causation.
- Identify potential confounding variables in an observational study.
- Distinguish between an observational study and a randomized experiment.
- Recognize that only randomized experiments can lead to claims of causation and explain why randomization is important for causality.
- Explain how and why placebos and blinding are used in experiments.
- Distinguish between a completely randomized experiment and a matched pairs experiment.
- Design and implement a basic randomized experiment.
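As one illustration of the last two objectives, random assignment for a completely randomized experiment and for a matched pairs experiment might be sketched as follows. This is a minimal sketch, not a prescribed course method: the function names, the group labels, and the subject identifiers are hypothetical, and the course's own technology may handle randomization differently.

```python
import random


def randomize_completely(units, treatments=("treatment", "control"), seed=None):
    """Randomly assign every unit to a group, keeping group sizes as equal as possible."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the groups in turn, like dealing cards.
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}


def randomize_matched_pairs(pairs, treatments=("treatment", "control"), seed=None):
    """Within each matched pair, randomly decide which member gets which treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # random order of the two labels for this pair
        assignment[first], assignment[second] = order
    return assignment


# Hypothetical subjects, invented for illustration only.
subjects = [f"subject_{i}" for i in range(1, 11)]
completely_randomized = randomize_completely(subjects, seed=250)

# Hypothetical pairs (for example, subjects matched on age).
matched = randomize_matched_pairs([("A1", "A2"), ("B1", "B2"), ("C1", "C2")], seed=250)
```

In the completely randomized design, chance alone decides each unit's group. In the matched pairs design, chance only decides the order within each pair, so every pair contributes one unit to each group.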
Exploratory Data Analysis
- Create (with technology) and interpret a dotplot, boxplot, or histogram, and side-by-side dotplots, boxplots, or histograms.
- Calculate (with technology) and interpret summary statistics for a quantitative variable, including mean, median, standard deviation, five-number summary, range, and IQR, and be able to calculate and compare these within groups.
- Compute and interpret a z-score for an individual value.
- Interpret percentiles.
- Create (with technology) a scatterplot between two quantitative variables, and use the plot to describe the association.
- Explain what a positive or negative association means between two quantitative variables.
- Calculate (with technology) and interpret a correlation.
- Identify outliers (informally or formally) and explain how they affect different statistics.
- Realize that it is important to plot your data if any variables are quantitative.
- Create (with technology) bar graphs and side-by-side or segmented bar graphs for categorical variables.
- Create a frequency, relative frequency, or two-way table to summarize categorical variables.
- Use a frequency, relative frequency, or two-way table to calculate proportions, difference in proportions, odds, and odds ratios.
- Calculate and interpret conditional probabilities for categorical variables.
- Determine appropriate numerical summary statistics and an appropriate visualization for any one or two variables being analyzed.
- Distinguish between a population parameter and a sample statistic, recognizing that a parameter is fixed while a statistic varies from sample to sample.
- Determine and define an appropriate parameter of interest, based on a question.
- Compute a point estimate for a parameter using an appropriate statistic from a sample.
- Recognize that a sampling distribution shows how sample statistics tend to vary, but that in reality a sampling distribution can never be obtained in situations where estimation is needed.
- Recognize that statistics from random samples tend to be centered at the population parameter.
- Explain how to generate a bootstrap distribution for a given sample and statistic.
- Use technology to generate a bootstrap distribution, and recognize that it will be centered around the sample statistic.
- Demonstrate an understanding of standard error as the standard deviation of the statistic.
- Calculate a standard error from a bootstrap distribution (using technology), and from a formula for means, difference in means, proportions, and difference in proportions.
- Recognize that a confidence interval will capture the true parameter for the specified percentage of all random samples.
- Use a bootstrap distribution to construct a 95% confidence interval using the formula statistic ± 2 × SE.
- Use a bootstrap distribution to construct a confidence interval using percentiles of the bootstrap distribution.
- Use the normal or t-distribution to construct a confidence interval for a mean, proportion, difference in means, difference in proportions, or correlation using technology.
- Use the normal or t-distribution and the standard error formulas to construct a confidence interval using the formula statistic ± z* × SE for proportions and difference in proportions or statistic ± t* × SE for means, difference in means, and slope.
- Interpret a confidence interval in context.
- Explain how sample size affects standard error and the width of a confidence interval.
- Demonstrate an understanding of the central limit theorem.
- Determine whether the conditions are met for the chosen method to be valid.
- Recognize when and why statistical tests are needed.
- Specify null and alternative hypotheses based on a question of interest, defining relevant parameters.
- Demonstrate an understanding of the concept of statistical significance.
- Recognize that the strength of evidence against the null hypothesis depends on how unlikely it would be to get a statistic as extreme just by random chance, if the null hypothesis were true.
- Use technology to generate a randomization distribution, and realize that it will be centered around the null parameter value.
- For a given sample and null hypothesis, describe the process of creating a randomization distribution.
- Use a randomization distribution to calculate a p-value.
- Connect the definition of a p-value to the motivation behind a randomization distribution.
- Distinguish between one- and two-tailed tests in stating the alternative hypothesis and calculating the p-value.
- Interpret a p-value.
- Make a formal decision in a hypothesis test by comparing the p-value to the significance level.
- State the conclusion to a hypothesis test in context.
- Recognize that two types of errors can occur, and interpret false positives (Type I) and false negatives (Type II) in context.
- Recognize a significance level as the tolerable chance of getting a false positive (making a Type I error).
- Explain the problem of multiple testing and publication bias.
- Recognize that statistical significance is not always the same as practical significance.
- Make a less formal statement about the strength of evidence in a p-value.
- Determine the decision for a two-tailed hypothesis test from the corresponding confidence interval.
- Use technology and the normal or t-distribution to calculate a p-value for tests for means, difference in means, proportions, difference in proportions, correlation, and slope.
- Use the normal or t-distribution, the standard error formulas, and the formula (statistic - null value)/SE to calculate a p-value for tests for means, difference in means, proportions, difference in proportions, correlation, and slope.
- Determine whether a chi-square goodness-of-fit test or a chi-square test for association is appropriate to answer a question of interest.
- State hypotheses for a chi-square goodness-of-fit test for one categorical variable and for a chi-square test for association for two categorical variables.
- Calculate the test statistic for a chi-square goodness-of-fit test and a chi-square test for association both with and without technology.
- Use a randomization distribution or a chi-square distribution to calculate a p-value for a chi-square test.
- State the conclusion in context for a chi-square goodness-of-fit test and a chi-square test for association.
- Determine whether the conditions are met to use a normal, t, or chi-square distribution for inference.
- Conduct a hypothesis test from start to finish for a variety of different situations.
- Determine whether a confidence interval, a hypothesis test, both, or neither is most appropriate for answering a question of interest.
- Use technology to find the regression line for two quantitative variables, giving the equation and plotting the line on a scatterplot.
- Calculate predicted values from a regression equation.
- Interpret the slope (and intercept, when appropriate) of a regression line in context.
- Calculate residuals and visualize residuals on a scatterplot.
- Beware of extrapolating when making predictions, fitting a line to nonlinear data, and the effect of outliers.
- Recognize the importance of plotting your data.
- Check a scatterplot for obvious violations of the assumptions of simple linear regression.
- Construct a confidence interval and test a hypothesis about the slope in a linear regression model.
- Compute (with technology) and interpret R² in a regression model.
- Use technology to fit a multiple regression model.
- Interpret coefficients in a multiple regression model, recognizing that care should be taken when interpreting coefficients of predictors that are strongly associated with each other.
- Use a multiple regression model to make predictions.
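The simple linear regression objectives above (finding the line, calculating predicted values and residuals, and computing R²) can be illustrated with a short sketch. It assumes ordinary least squares with one predictor and uses a tiny dataset invented purely for illustration. The function names are hypothetical, and in the course itself these quantities would come from whatever technology is used.

```python
from statistics import mean


def fit_line(x, y):
    """Return the least-squares (slope, intercept) for the line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x_new):
    """Predicted response for a single explanatory value."""
    return intercept + slope * x_new


def residuals(x, y, slope, intercept):
    """Residual = observed response minus predicted response, one per case."""
    return [yi - predict(slope, intercept, xi) for xi, yi in zip(x, y)]


def r_squared(x, y, slope, intercept):
    """Proportion of the variability in y explained by the fitted line."""
    y_bar = mean(y)
    ss_residual = sum(r ** 2 for r in residuals(x, y, slope, intercept))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_residual / ss_total


# Made-up data for illustration only (not course data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)
print(f"predicted y = {intercept:.2f} + {slope:.2f} * x")
print("prediction at x = 3:", round(predict(slope, intercept, 3), 2))
print("R^2:", round(r_squared(x, y, slope, intercept), 2))
```

Predicting at a value such as x = 3 stays inside the observed range of x. Using the same line far outside that range would be the extrapolation the objectives warn against.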