You will conduct a quantitative research study in collaboration with the Smith Botanic Garden. A list of potential project ideas is presented below, and you will work within your chosen framework. Most (if not all) studies will be experimental, meaning you will manipulate factors and randomly assign levels of those factors to units. This task will require you to write a project proposal, design a study, implement that study, collect and analyze relevant data, present your results on a poster, and hand in a written report describing your study and its findings. Your project must involve fitting an analysis of variance (ANOVA) model. ANOVA is the primary statistical technique we are learning in this course. The project is an opportunity to apply and synthesize the material we are learning in class to a specific context. It will also give you the opportunity to show off what you've learned about data analysis, visualization, and variance decomposition in this course and in other SDS courses you may have taken. It is a major component of the class, and successful completion is required to pass.

Group Formation

You will work in groups of four to five, chosen by you (with my help). Because these groups are relatively large, I will give recommendations for how to divide up the work effectively based on your strengths. Each group will pick a group name and use that name in all communications from that point forward.

Assignment

After a class visit from Tim Johnson, the Director of the Smith Botanic Garden, and a field trip to the Botanic Garden, you will pick one of the potential project ideas below. Within this framework, you should pose a clear research question that you find interesting and decide what factor(s) you will manipulate to address it. You will also need to decide on an appropriate response variable to measure and on the experimental units for your study.

Think carefully about your experimental units: they are not necessarily plants or people. A unit might be a Ziploc bag full of seeds, a day's worth of social media activity, etc. (see below).

Your projects might examine the following research questions:

  1. What factors contribute to milkweed seed germination?

  2. Can exposure to the Botanic Garden improve mental health?

  3. Can pictures, or messages, in the Botanic Garden lobby influence which plants visitors see/remember from their visit?

  4. What factors contribute to perceptions of the Botanic Garden as an inclusive space? Ideas include special student hours during the bulb show (e.g., for unity groups) or a talk-back wall.

  5. What is the typical route visitors on self-guided tours take around the Garden? Can these routes be changed by manipulating signage, placing arrows on the ground, etc.?

  6. What features of the Botanic Garden's Instagram posts lead to the most views, likes, comments, and shares?

These questions are meant to get you started; you should pose the problem that you want to solve as precisely as possible at the outset. Next, identify the factor(s) of interest, the levels of these factors (i.e., treatment conditions), and the experimental unit, and think about how you will measure your response variable. What kind of design will give you the clearest answer to your question while being the most efficient and the most feasible? You should also state a hypothesis a priori (before you analyze the data) about the results you expect to see. Remember that each design comes with a set of questions that it can answer. For example, think about the three important questions from Kelly's hamster study.

After Tim's visit, our field trip to the Botanic Garden, and the submission of your project proposal, I would like to meet with each group to get you started. We will set up these meetings during the two weeks after your proposal is due. A few weeks after the proposal, you will need to submit a project update/method section draft.

General Rules

You may discuss your project with other students, but each group will have a different topic, so there is a limit to how much you can help each other. You may consult other sources for information about the non-statistical, substantive issues in your problem, but you should credit these sources in your report. Feel free to consult Randi or the Stats TAs about statistical questions.

Timeline

Please see the project schedule for due dates.

Submission

All deliverables described in this document must be submitted electronically via Moodle by 11:55pm (five minutes before midnight) on the dates in the schedule. Only one person from the group should submit the group's product for each checkpoint (with the exception of the Group Dynamic Report, which is individual).

Components

Proposal

Count on brainstorming at least half a dozen serious ideas before you can groom one of them into a mature proposal. For the most part, within each general topic, the choice of factors to manipulate is left up to you. Try to pick something that’s interesting yet substantial and worth studying.

Content

Your project proposal should be an R Markdown file (.Rmd) containing the following content:

  1. Group Members: List the members of your group

  2. Title: Your title

  3. Purpose: Describe the general topic/phenomenon you chose to study, as well as some focused research questions that you hope to answer. Finally, state some general hypotheses that you intend to test.

  4. Response Variable: What is the response variable? What is its scale of measurement? Estimate the range of possible values that it may take on. What instrument/questions will you use to measure it?

  5. Experimental Units: Specify what the experimental units will be (i.e. what are the things that you will be assigning treatments to?). Is there any blocking? That is, can you (or will you have to) cluster the smallest units into larger units? Further, describe the larger population/phenomenon to which you’ll try to generalize.

  6. Factors and Levels: Describe each factor and the levels of these factors. If you are examining more than one factor, are these factors crossed? How many conditions (cells) does that create? How will you actually conduct this manipulation? Be as specific as you can at this point. What stimuli and materials will you need to create?

  7. Overall Design: Formally describe the overall research design that you plan to use for this study. What is its name? How do the factors relate to each other? Is there any blocking? Any factorial crossing? Are some factors manipulated between blocks while other factors are manipulated within blocks?
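The questions in items 6 and 7 about crossing and cells come down to simple arithmetic. Fully crossing a factor with \(a\) levels and a factor with \(b\) levels gives \(a \times b\) cells. The Python sketch below enumerates the cells of a hypothetical 2 × 3 milkweed design and then randomly assigns 24 hypothetical bags of seeds to those cells in a balanced way. The factor names, levels, and bag labels are illustrative assumptions, not requirements of the assignment, and your own course work will most likely be done in R.

```python
import itertools
import random
from collections import Counter


def crossed_cells(factors):
    """List every treatment combination (cell) of fully crossed factors."""
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(factors[name] for name in names))]


def assign_units(units, cells, seed=None):
    """Randomly assign units to cells as evenly as possible.

    Units are shuffled and then dealt round-robin, so cell sizes differ by at
    most one (and are exactly equal when len(units) is a multiple of len(cells)).
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: cells[i % len(cells)] for i, unit in enumerate(shuffled)}


if __name__ == "__main__":
    # Hypothetical 2 x 3 design for a milkweed germination study.
    factors = {
        "cold_stratified": ["yes", "no"],
        "soak_hours": [0, 12, 24],
    }
    cells = crossed_cells(factors)
    print(len(cells), "cells")  # 2 levels x 3 levels = 6 cells

    # Hypothetical experimental units: 24 Ziploc bags of seeds.
    bags = [f"bag{i:02d}" for i in range(1, 25)]
    plan = assign_units(bags, cells, seed=2024)

    # Check balance: each of the 6 cells should receive 24 / 6 = 4 bags.
    sizes = Counter((c["cold_stratified"], c["soak_hours"]) for c in plan.values())
    for cell, size in sorted(sizes.items()):
        print(cell, size)
```

The sketch assigns whole bags rather than individual seeds, because the experimental unit is whatever receives the treatment. In a blocked design, you would repeat the same shuffle separately within each block.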

Technical Report

Your technical report will be an annotated R Markdown file (.Rmd) that contains your R code, interspersed with explanations of what the code is doing, and what it tells you about the problem.

Content

You should not need to present all of the R code that you wrote throughout the process of working on this project. Rather, the technical report should contain the minimal set of R code that is necessary to understand your results and findings in full. If you make a claim, it must be justified by explicit calculation. A knowledgeable reviewer should be able to compile your .Rmd file without modification, and verify every statement that you have made. All of the R code necessary to produce your figures and tables must appear in the technical report. In short, the technical report should enable a reviewer to reproduce your work in full.

Tone

This document should be written for peer reviewers and/or Botanic Garden staff. You should aim for a level of complexity that is more statistically sophisticated than an article in the Science section of The New York Times, but less sophisticated than an academic journal article. For example, you may use terms that you will likely never see in the Times (e.g., bootstrap), but you should not dwell on technical points with no obvious ramifications for the reader (e.g., explaining why the F-distribution is used for ratios of chi-square distributed random variables). Your goal for this paper is to convince a statistically minded reader (e.g., a student in this class, a student from another school who has taken an introductory statistics class, or a Botanic Garden staff member) that you have addressed an interesting research question in a meaningful way. Even a reader with no background in statistics should be able to read your paper and get the gist of it.

Format

Your technical report should follow this basic format:

  1. Abstract: a short, one paragraph explanation of your project. The abstract should not consist of more than 5 or 6 sentences, but should relate what you studied and what you found. It need only convey a general sense of what you actually did. The purpose of the abstract is to give a prospective reader enough information to decide if they want to read the full paper.

  2. Introduction: an overview of your project. In a few paragraphs, you should explain clearly and precisely what your research question(s) are, why they are interesting, and what contribution you have made toward answering them. It is important that you say why your study is novel, that is, how it moves toward previously unknown answers to important questions. Most readers never make it past the introduction, so this is your chance to hook the reader, and it is in many ways the most important part of the paper! You need to cite at least three domain-area (non-statistics) papers in your introduction. This is a great place to divide up the work! At the end of your introduction, you should give an overview of some specifics of your study design, but not the full details, and end with your hypotheses.

  3. Method: a much more detailed description of your study design. Describe the factors, each level, any crossing, blocks, etc. This is also where you describe your materials/stimuli. What were your experimental units? How did you measure your response variable? How did you deliver your manipulation? When did you collect your data, and why was this choice made? The method section of a research study is extremely important for others to assess the validity of your findings and to replicate what you found in future studies.

  4. Results: an explanation of what your model tells us about the research question. You will be using ANOVA here, but you should also present visualizations that show the data and check the Fisher assumptions. You should interpret effect sizes and CIs in context and explain their relevance. What does your model tell us that we didn't already know? You also want to include null results, but be careful about how you interpret them. For example, you may want to say something along the lines of: "we found no evidence that factor \(A\) has an effect on response variable \(y\)." On the other hand, you probably shouldn't claim: "there is no effect of factor \(A\) on \(y\)."

  5. Conclusion: a summary of your findings and a discussion of their limitations. First, remind the reader of the question that you originally set out to answer, and summarize your findings. Second, discuss the limitations of your model, and what could be done to improve it. You might also want to do the same for your data. This is your last opportunity to clarify the scope of your findings before a journalist misinterprets them and makes wild extrapolations! Protect yourself by being clear about what is not implied by your research.
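To show what the ANOVA in the Results section decomposes, the Python sketch below computes a one-way ANOVA by hand. It calculates the between-group and within-group sums of squares, the \(F\) statistic, and \(\eta^2\) (the share of total variation attributed to the factor), which is one simple effect size among several. The germination counts are hypothetical. In practice you will probably fit the model in R, for example with `summary(aov(y ~ level, data = d))`, where `y`, `level`, and `d` are placeholder names. Unlike this sketch, that call also reports the p-value.

```python
from statistics import mean


def one_way_anova(groups):
    """Decompose variation for a one-way ANOVA.

    groups: dict mapping each factor level to a list of responses.
    Returns (F, df_between, df_within, eta_squared).
    """
    all_y = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_y)
    k, n = len(groups), len(all_y)

    # Between-group (treatment) SS: size-weighted squared deviations of
    # each group mean from the grand mean.
    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2
                     for ys in groups.values())
    # Within-group (residual) SS: deviations from each group's own mean.
    ss_within = sum((y - mean(ys)) ** 2
                    for ys in groups.values() for y in ys)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Proportion of total variation attributed to the factor.
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


if __name__ == "__main__":
    # Hypothetical number of seeds germinated per bag under three treatments.
    data = {
        "control": [10, 12, 11],
        "cold": [15, 17, 16],
        "soak": [12, 13, 14],
    }
    f_stat, df_b, df_w, eta_sq = one_way_anova(data)
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}, eta^2 = {eta_sq:.3f}")
    # prints: F(2, 6) = 19.00, eta^2 = 0.864
```

Interpreting these numbers still requires the checks described above. The residuals \(y - \bar{y}_{\text{group}}\) used in `ss_within` are what you would plot when you assess the Fisher assumptions.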

Additional Thoughts

The technical report is not simply a dump of all the R code you wrote during this project. Rather, it is a narrative, with technical details, that describes how you addressed your research question. You should not present tables or figures without a written explanation of the information that is supposed to be conveyed by that table or figure. Keep in mind the distinction between data and information. Data is just numbers, whereas information is the result of analyzing that data and digesting it into meaningful ideas that human beings can understand. Your technical report should allow a reviewer to follow your steps from converting data into information. There is no limit to the length of the technical report, but it should not be longer than it needs to be. You will not receive extra credit for simply describing your data ad infinitum. For example, simply displaying a table with the means and standard deviations of your variables is not meaningful. Writing a sentence that reiterates the content of the table (e.g. “the mean of variable \(x\) was 34.5 and the standard deviation was 2.8…”) is equally meaningless. What you should strive to do is interpret these values in context (e.g. “although variables \(x_1\) and \(x_2\) have similar means, the spread of \(x_1\) is much larger, suggesting…”).

You should present figures and tables in your technical report in context. These items should be understandable on their own, in the sense that they have understandable titles, axis labels, legends, and captions. Someone glancing through your technical report should be able to make sense of your figures and tables without having to read the entire report. That said, you should also include a discussion of what you want the reader to learn from your figures and tables.

Your report should be submitted via Moodle as an R Markdown (.Rmd) file and the corresponding rendered output (.html) file.

Poster Presentation

An effective presentation is an integral part of this project and the scientific process. One of the objectives of this class is to give you experience conveying the results of a technical investigation to a non-technical audience in a way that they can understand. Whether you choose to stay in academia or pursue a career in industry, the ability to communicate clearly is of paramount importance. As a data analyst, the burden of proof is on you to convince your audience that what you are saying is true. If your audience (who may very well be less knowledgeable about statistics than you are) cannot understand your results or their interpretations, then the technical merit of your project is irrelevant.

You will make a poster and present it in a poster session on the last day of class. Your goal should be to give your audience a clear understanding of your research question, a basic understanding of your model, and a sense of how well the model addresses the research question you posed. I will give you more details about the poster, as well as a poster template, toward the end of the semester.

Group Dynamic Report

Ideally, all group members would be equally involved, able, and committed to the project. In reality, it doesn't always work that way. We'd like to reward people fairly for their efforts in this group endeavor, because there will inevitably be variation in how high a priority people place on this class and how much effort they put into this project.

To this end, I’d like each of you (individually) to describe how well (or how poorly!) your project group worked together and shared the load. Also give some specific comments describing each member’s overall effort. Were there certain group members who really put out exceptional effort and deserve special recognition? Conversely, were there group members who really weren’t carrying their own weight? And then, at the end of your assessment, estimate the percentage of the total amount of work/effort done by each member. (Be sure your percentages sum to 100%!)

For example, suppose you have 3 group members: X, Y and Z. In the (unlikely) event that each member contributed equally, you could assign:

  • 33⅓% for member X, 33⅓% for member Y, and 33⅓% for member Z

Or in case person Z did twice as much work as each other member, you could assign:

  • 25% for member X, 25% for member Y, and 50% for member Z

Or if member Y didn’t really do squat, you could assign:

  • 45% for member X, 10% for member Y, and 45% for member Z

I’ll find a fair way to synthesize the (possibly conflicting) assessments within each group. And eventually I’ll find a way to fairly incorporate this assessment of effort and cooperation in each individual’s overall grade. Don’t pressure one another to give everyone glowing reports unless it’s warranted, and don’t feel pressured to share your reports with one another. Just be fair to yourselves and to one another. Let us know if you have any questions or if you run into any problems.

Assessment Criteria

Your project will be evaluated based on the following criteria:

  • General: Are the research question and the chosen factors interesting and substantial, or are they trite, pedantic, and trivial? How much creativity, initiative, and ambition did the group demonstrate? Is the basic question driving the project worth investigating, or is it obviously answerable without a data-based study?

  • Design: Are the variables chosen appropriately and defined clearly, and is it clear how they were measured/observed? Can the effects of nuisance factors be accounted for? Is there sufficient data to draw meaningful conclusions?

  • Analysis: Are the chosen analyses appropriate for the variables/relationships under investigation, and are the assumptions underlying these analyses met? Do the analyses involve fitting and interpreting an ANOVA model? Are the analyses carried out correctly? Is there an effective mix of graphical, numerical, and inferential analyses? Did the group make appropriate conclusions from the analyses, and are these conclusions justified?

  • Technical Report: How effectively does the written report communicate the goals, procedures, and results of the study? Are the claims adequately supported? How well is the report structured and organized? Does the writing style enhance what the group is trying to communicate? How well is the report edited? Are the statistical claims justified? Are text and analyses effectively interwoven in the technical report? Clear writing, correct spelling, and good grammar are important.

  • Poster Presentation: How effectively does the poster presentation communicate the goals, procedures, and results of the study? Do the presenters seem well rehearsed? Do they appear confident in what they are saying? Are their arguments persuasive?