Data Science in Infectious Disease Modeling using R
Online Lab Schedule:
- Monday, June 22, 1:00 - 2:30 PM ET and 3:00 - 4:30 PM ET
- Tuesday, June 23, 1:00 - 2:30 PM ET and 3:00 - 4:30 PM ET
Classroom: Virtual
Module Summary:
This course will foster a problem-solving mindset while exploring advanced concepts in R programming and data science. The course is designed for participants who already have solid experience coding in R (both base R and Tidyverse). We will explore multiple approaches to coding analyses, and the content will serve as a guide to help you decide which approach is best for your particular situation.
The course draws on concepts from the books R for Data Science, Advanced R, and R for Epidemiology, along with the instructors' experience working with infectious disease research data. We will work in both base R and the Tidyverse to wrangle messy data and build analytic workflows tailored to public health applications.
Prerequisites:
Prior experience coding in R is essential for success in this course. This is an intermediate course that builds on existing R programming skills. Participants must have hands-on experience with all of the following:
- RStudio/Posit Studio: Comfortable navigating the RStudio interface, managing projects, and writing scripts
- Base R fundamentals: Experience with data structures (vectors, lists, data frames), functions, control flow, and subsetting
- Tidyverse: Working knowledge of dplyr, tidyr, and ggplot2 for data manipulation and visualization
- Pipes: Regular use of pipe operators (%>% and/or |>) to chain operations
- R Markdown: Ability to create reproducible documents combining code, output, and narrative text
Note: If you are new to R or have only completed an introductory R course, we recommend first taking a foundational R programming course before enrolling in this module. Participants without the prerequisite experience may find it difficult to keep pace with the course material.
Course Objectives:
By the end of this course, participants will be able to:
- Apply advanced data wrangling techniques to clean, transform, and prepare complex public health datasets for analysis
- Develop robust, reproducible analytic workflows using both base R and Tidyverse approaches
- Evaluate multiple coding approaches and select optimal methods for specific data challenges
- Handle specialized public health data types including survey data, simulation outputs, time series data, and line lists
- Implement best practices for working with personally identifiable information (PII) and sensitive health data
- Debug complex R code and troubleshoot common data wrangling challenges
Module Content:
This course will consist of three interconnected themes:
- Theme 1: Foundations in Data Science – Advanced concepts and best practices not typically covered in introductory R programming courses
- Theme 2: Advanced Data Wrangling – Techniques tailored to different types of public health data
- Theme 3: Special Considerations – SISMID-specific applications and public health data challenges
Required Software:
- R Statistical Programming Language (version 4.0 or higher recommended)
- RStudio/Posit Studio (desktop or cloud)
Recommended Reading:
- R for Data Science by Hadley Wickham and Garrett Grolemund
- Advanced R by Hadley Wickham
- The Epidemiologist R Handbook
Instructors:

Sarah Bowden, PhD
Lead Data Scientist, Division of Global Migration Health at CDC
Dr. Sarah Bowden is a Lead Data Scientist in the Division of Global Migration Health at the CDC. She has been coding in R since 2007 and has enjoyed seeing the Tidyverse develop and grow over time. Dr. Bowden uses Tidyverse tools and best practices in her day-to-day coding and, over the past 9 years, has trained and mentored more than 20 undergraduate students, graduate students, and postdoctoral fellows in data science and public health analytics.

Raj Reni Kaul, PhD
Health Scientist (Data Scientist), Immunization Services Division at CDC
Dr. Reni Kaul is a Health Scientist in the Immunization Services Division at the CDC. She is a certified Carpentries Instructor and is committed to creating an inclusive learning environment. She has previously designed and taught coding courses in R for undergraduate and graduate students.