amcnamara@smith.edu, Burton 215, 413-585-3851). Same as MTH 291. Formerly MTH 247. Theory and applications of regression techniques; linear and nonlinear multiple regression models, residual and influence analysis, correlation, covariance analysis, indicator variables and time series analysis. This course includes methods for choosing, fitting, evaluating and comparing statistical models and analyzes data sets taken from the natural, physical and social sciences.
The course website will be updated regularly with lecture handouts, project information, assignments, and other course resources. Homework will be submitted, and grades posted, on Moodle. Please check both regularly.
Prerequisites:
One textbook is required:
If you are new to R, there is an optional text you can buy in the Smith bookstore or get as a PDF online:
Attendance is recommended, but not required. You are an adult and set your own priorities. If you are not in class, I will assume you understand the material that was covered. If you need an extended absence and feel it will affect your learning (e.g., an illness, a death in the family, or a conference), please let me know so we can find a way for you to make up the material.
Please plan to treat me and your classmates with respect. This includes things like arriving at class on time, coming in quietly if you are late, and focusing on the task at hand.
Much of this course will operate on a collaborative basis, and you are expected and encouraged to work together with a partner or in small groups to study, complete homework assignments, and prepare for exams. However, every word that you write must be your own. Copying and pasting sentences, paragraphs, or blocks of code from another student is not acceptable and will receive no credit. No interaction with anyone but the instructors is allowed on any exams or quizzes. All students, staff and faculty are bound by the Smith College Honor Code, which Smith has had since 1944.
Smith College expects all students to be honest and committed to the principles of academic and intellectual integrity in their preparation and submission of course work and examinations. Students and faculty at Smith are part of an academic community defined by its commitment to scholarship, which depends on scrupulous and attentive acknowledgement of all sources of information, and honest and respectful use of college resources.
Cases of dishonesty, plagiarism, etc., will be reported to the Academic Honor Board.
Homework [20%]: There will be several problem sets over the course of the semester. Problem sets will involve computational assignments in R with written explanations. You must complete all of your homework assignments in R Markdown. All homework will be submitted electronically by midnight (11:55 pm) on the due date. Late assignments will lose points at the rate of 20% per day.
Quizzes [5%]: There will be bi-weekly reading quizzes on Moodle. These will provide an opportunity for you to get instant feedback on what you are learning.
Project & Presentation [25%]: You will work on a term project in a group of three over the course of the semester. This is an opportunity for you to demonstrate your understanding of the material and put it into practice. More details about the project will follow.
Participation [5%]: Engagement with your group work (both in class and during the project). You are expected to participate fully. If you do so, you should receive full credit for this portion of the grade.
Exams [45%]: There will be two self-scheduled, closed-book exams. You may bring a calculator and one double-sided sheet of hand-written notes. After each exam, you will have the opportunity to raise your score by submitting corrections for any questions you got wrong. Smith College has had an academic honor code since 1944, and all students, staff and faculty are bound by this code. Cases of dishonesty, plagiarism, etc., will be reported to the Academic Honor Board.
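The five weights above sum to 100%, so the course grade can be read as a weighted average. As a rough sketch only: the symbols below are introduced here for illustration and are not official notation, each score is assumed to be a percentage out of 100, and \(E\) is assumed to be a single combined exam score (how the two exams are combined is not specified above).

\[
G = 0.20\,H + 0.05\,Q + 0.25\,P + 0.05\,A + 0.45\,E
\]

Here \(H\) is homework, \(Q\) quizzes, \(P\) the project and presentation, \(A\) participation, and \(E\) exams. For example, with hypothetical scores \(H = 90\), \(Q = 100\), \(P = 85\), \(A = 100\), and \(E = 80\):

\[
G = 0.20(90) + 0.05(100) + 0.25(85) + 0.05(100) + 0.45(80) = 18 + 5 + 21.25 + 5 + 36 = 85.25
\]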
When grading your written work, we are looking for solutions that are technically correct and reasoning that is clearly explained. The explanation and context of an answer are its most important components. Numerically correct answers alone are not sufficient on homework, tests or quizzes. Neatness and organization are valued, with brief, clear answers that explain your thinking. If we cannot read or follow your work, we cannot give you full credit for it.
The use of the R statistical computing environment with the RStudio interface is thoroughly integrated into the course. You have two options for using RStudio:
RStudio on the web. The advantage of using the server version is that all of your work will be stored in the cloud, where it is automatically saved and backed up. This means that you can access your work from any computer with a web browser and an internet connection. The downside is that there are only 2 CPUs and 8 GB of RAM allocated to this machine, and you have to share those resources with each other and with all of the MTH/SDS 220 students!
RStudio installed on your machine. Your laptop likely has at least 2 CPUs and at least 4 GB of RAM. The downside to this approach is that your work is only stored locally, but I get around this problem by keeping all of my work in a Dropbox folder.
Note that you do not have to choose one or the other; you may use both. However, it is important that you understand the distinction so that you can keep track of your work. Both R and RStudio are free and open-source, and are installed in most computer labs on campus. Please see the Resources page for help with R.
Unless otherwise noted, you should assume that it will be helpful to bring a laptop to class. It is not required, but since there are only three workstations in the classroom, we will need a critical mass of computers (i.e., at least \(n/2\)) in the classroom pretty much every day.
Your ability to communicate results, which may be technical in nature, to your audience, which is likely to be non-technical, is critical to your success as a data analyst. The assignments in this class will place an emphasis on the clarity of your writing.
There are Statistics TAs available from 7:00 to 9:00 pm on Sunday–Thursday evenings in Burton 301. In addition, the Spinelli Center for Quantitative Learning (2nd Level of Neilson Library) supports students doing quantitative work across the curriculum, and has a Statistics Counselor available for appointments. Your fellow students are also an excellent source for explanations, tips, etc.
The following is a brief outline of the course. Please refer to the complete day-to-day schedule for more detailed information.
Week | Reading | Topic |
---|---|---|
0 | Ch. 0 | Introduction to the course |
1 | Ch. 1 | Introduction to modeling. Understanding statistical models, and the four-step process (Choose, Fit, Assess, Use). |
2 | Ch. 1 | Simple Linear Regression. Least squares estimates, fitting models, checking conditions, making transformations. |
3 | Ch. 2, MDSR | Inference in simple linear regression, hypothesis tests and confidence intervals. Regression and correlation. |
4 | Ch. 3 | Multiple Linear Regression. Fitting models, checking conditions, comparing models. |
5 | Ch. 3 | Oct. 8-11: Autumn Recess. Thursday: CFAU in multiple regression. |
6 | Ch. 3 | Second-order models. Interaction terms and polynomials. Multicollinearity. Exam 1. |
7 | Ch. 4 | Model Selection. Using nested F-tests, added variable plots and stepwise regression. |
8 | Ch. 4 | Randomization & the Bootstrap as methods of inference. |
9 | Ch. 5 | ANOVA. Fitting models, checking conditions, interpretation of results. |
10 | Ch. 5 | Multiple Comparisons. Bonferroni corrections, Fisher’s LSD, Tukey’s HSD. |
Nov. 23-27 | | Thanksgiving Recess |
11 | Ch. 9, 10 | Logistic Regression and multiple logistic regression. Fitting and interpreting models. Checking conditions, formal inference. |
12 | | Data wrangling. |
13 | | Exam 2 and Project Presentations |
Dec. 22 | | All work due |