bcapistrant@smith.edu
, Burton 215, 413-585-3870).

Population health refers to the health outcomes of a group of individuals, including the distribution of such outcomes within the group. More recently, the field of population health science has emerged quickly as an interdisciplinary endeavor to measure health at a population level, identify multi-level mechanisms and pathways to health, and evaluate programs and policies to improve population health.
This course uses a hands-on, data-centered, analysis-focused approach to introduce population health and how to measure health within or between groups. The best illustrations of core population health concepts will be those that you do yourself with real-world data. For students with more of a background in population/social sciences or in biological sciences, this course will be an opportunity to develop greater familiarity with employing data and using statistical and empirical approaches to understand population health. For students with more of a background or interest in statistics and data science but less exposure to health, this course will be an opportunity to apply those skills while developing some content expertise in health.
Some of the course assignments will vary depending on your background so that everyone is appropriately challenged and gets to learn relevant skills. More concretely, there will be an “application” track for SDS majors or students who have taken multiple statistics courses and a “computational/statistical development” track for non-majors and those looking to develop more skill with statistical analyses using R. As this course may serve as the “application domain” for the SDS major, the “application” track activities may involve more reading, particularly of population health journal articles, to see how the statistical and data science skills you have learned in previous courses are used in population health. There may also be more advanced methods introduced for SDS students who have taken courses in regression modeling (SDS 291 or 293). Students in the “computational/statistical development” track will have more opportunity to develop their familiarity with R and get an introduction to regression modeling. If you are unsure which “track” is better for you, we can discuss that during student hours.
This course will:
The course website will be the primary source for static course information - the course schedule, links to additional readings, lecture notes, etc. There will be a Slack channel for this course, which you should check regularly for questions about course readings and assignments. Assignments should be submitted to the course Moodle, which will also contain the gradebook where you can check your grade for the course.
Prerequisites:
One textbook is required:
Other readings (peer-reviewed journal articles, reports, blog posts, etc.) will be assigned as well; they will be linked from the schedule page.
If it has been a while since your introductory statistics course, you may wish to refer to the open-source textbook used in SDS 220, Intro Stat with Randomization and Simulation, to refresh your understanding of the basics of types of data (Ch. 1), hypothesis testing (Ch. 2-4), and especially linear regression (Ch. 5-6) and logistic regression (Ch. 6).
Attendance is effectively required, as class participation and completion of in-class activities are both considerable parts of the grade for the course. In cases where you are unable to attend class, you are expected to find out from a classmate what material was covered.
Please plan to treat me and your classmates with respect. This includes things like arriving at class on time, coming in quietly if you are late, and focusing on the task at hand.
Much of this course will operate on a collaborative basis, and you are expected and encouraged to work together with a partner or in small groups to study and complete assignments. However, every word that you write must be your own. Copying and pasting sentences, paragraphs, or blocks of code from another student is not acceptable and will receive no credit. No interaction with anyone but the instructors is allowed on any exams or quizzes. All students, staff and faculty are bound by the Smith College Honor Code, which Smith has had since 1944.
Smith College expects all students to be honest and committed to the principles of academic and intellectual integrity in their preparation and submission of course work and examinations. Students and faculty at Smith are part of an academic community defined by its commitment to scholarship, which depends on scrupulous and attentive acknowledgement of all sources of information, and honest and respectful use of college resources.
Cases of dishonesty, plagiarism, etc., will be reported to the Academic Honor Board.
Homework [20%]: Homework refers broadly to pre- or post-class work that prepares you for, or applies your understanding of, course material. It will typically take one of three forms. First, there will be several problem sets over the course of the semester that involve computational assignments in R with written explanations. Second, homework may also involve completion of modules via DataCamp to get up to speed with R. Third, homework will also include brief reflection papers based on course readings. All homework must be submitted electronically by 11:55 pm on the due date. Late assignments will lose points at the rate of 20% per day.
Quizzes [10%]: There will be 4 quizzes on Moodle throughout the semester. These will provide an opportunity for you to get some feedback on what you are learning and to ensure you have a solid understanding of key terminology and concepts.
Project & Presentation [50%]: You will work on a term project in a group of three over the course of the semester - an applied data analysis of a population health issue of your choosing (in consultation with Prof. Capistrant). This project will require developing content expertise in your topic to understand what has already been done and applying the analytical skills we develop in class to this topic. The project will primarily take the form of a written paper (~3500 words, 5 tables and figures, 20-40 references) and a poster presentation. The project will be broken into smaller assignments due throughout the semester.
Attendance and Participation [20%]: This class is a higher level class (i.e., a seminar) and relies on your active engagement. You are expected to attend class, and to come prepared to participate fully in class discussions and in-class activities based on the course readings.
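As a rough illustration of how the four weights above combine, here is a sketch of the arithmetic. The symbols \(H\), \(Q\), \(P\), and \(A\) are introduced here only for illustration and stand for percentage scores (0 to 100) in homework, quizzes, the project and presentation, and attendance and participation. The scores in the example are hypothetical.

\[
\text{Course grade} = 0.20\,H + 0.10\,Q + 0.50\,P + 0.20\,A
\]

For example, with \(H = 90\), \(Q = 80\), \(P = 85\), and \(A = 100\):

\[
0.20(90) + 0.10(80) + 0.50(85) + 0.20(100) = 18 + 8 + 42.5 + 20 = 88.5
\]

Because the project carries half of the weight, a 10-point change in \(P\) moves the course grade by 5 points, while the same change in \(Q\) moves it by only 1 point.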
When grading your written work, we are looking for solutions that are technically correct and reasoning that is clearly explained. The explanation and context of an answer are its most important components. Numerically correct answers, alone, are not sufficient on homework, tests or quizzes. Neatness and organization are valued, with brief, clear answers that explain your thinking. If I cannot read or follow your work, I cannot give you full credit for it.
The use of the R statistical computing environment with the RStudio interface is thoroughly integrated into the course. You have two options for using RStudio:

- RStudio on the web. The advantage of using the server version is that all of your work will be stored in the cloud, where it is automatically saved and backed up. This means that you can access your work from any computer with a web browser and an internet connection. The downside is that there are only 2 CPUs and 8 GB of RAM allocated to this machine, and you have to share those resources with each other and with all of the MTH220 students!
- RStudio installed on your machine. Your laptop likely has at least 2 CPUs and at least 4 GB of RAM. The downside to this approach is that your work is only stored locally, but I get around this problem by keeping all of my work in a Dropbox folder.

Note that you do not have to choose one or the other – you may use both. However, it is important that you understand the distinction so that you can keep track of your work. Both R and RStudio are free and open-source, and are installed in most computer labs on campus. Please see the Resources page for help with R.
Unless otherwise noted, you should assume that it will be helpful to bring a laptop to class. It is not required, but since there are only three workstations in the classroom, we will need a critical mass of computers (i.e., at least \(n/2\)) in the classroom pretty much every day.
Success in the final project will require the ability to write clearly: about your population health topic, your analysis methods, your results, and a clear discussion about what those results mean more broadly for your given topic. More generally, as a data analyst, your ability to communicate results, which may be technical in nature, to your audience, which is likely to be non-technical, is critical to your success. And as a population health scientist, your ability to explain why the topic and results matter in a way that engages the reader is also crucial.
There are Statistics TAs available from 7:00-9:00pm on Sunday–Thursday evenings in Burton 301. Although this course material is outside their purview, they may be able to help with implementation and basic data wrangling issues in R. In addition, the Spinelli Center for Quantitative Learning (2nd Level of Neilson Library) supports students doing quantitative work across the curriculum. Data assistants are available for appointments; they will be a very useful resource for the data wrangling needed for the project. Your fellow students are also an excellent source for explanations, tips, etc. Multiple groups may be using the same datasets; use each other as resources.
The following is a brief outline of the course. Please refer to the complete day-to-day schedule for more detailed information.
Week | Dates | Reading | Topic |
---|---|---|---|
1 | 1/29-31 | Ch. 1,4 | Introduction to & Determinants of Population Health |
2 | 2/5-7 | Ch. 2,3 | Measures of Disease Occurrence; Health Indicators & Data |
3 | 2/12-14 | Ch. 3,5 | Sensitivity & Specificity; Causation |
4 | 2/19-21 | Ch. 5 | Associations & Effects; Risk, Odds, Probabilities; Absolute & Relative Measures |
5 | 2/26-28 | Ch. 5 | Modeling Risk - Regression |
6 | 3/5-7 | Ch. 5 | Error, Confounding, Interaction; Study Designs |
7 | 3/12-14 | | Spring Recess |
8 | 3/19-21 | Ch. 5 | Sampling - Sampling, Complex Sampling |
9 | 3/26-28 | Ch. 5 | Sampling - Weights & Standard Errors |
10 | 4/2-4 | TBD | Population Health Topics / Case Studies |
11 | 4/9-11 | TBD | Population Health Topics / Case Studies |
12 | 4/16-18 | Ch. 8 | Interventions to Improve Population Health |
13 | 4/23-25 | | Other Population Health Data Sources: Social Media, Electronic Health Records |
14 | 4/30-5/2 | | Project Presentations |
15 | 5/10 | | All work due |