R Packages

You need to load these packages in your working environment. We do this with the library() function. Run the following four lines in your console.

library(dplyr)
library(tidyr)
library(ggplot2)
library(mosaic)

Note that you need to load the packages with the library() function only once per session, that is, each time you relaunch RStudio.
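Installing a package and loading it are separate steps. Installation happens once per computer; loading happens once per session. The sketch below shows one way to handle both. It assumes you have internet access and uses the public CRAN mirror https://cloud.r-project.org, which is an assumption you can change. The package list matches the library() calls above.

```r
# Packages used in this lab (same as the library() calls above)
pkgs <- c("dplyr", "tidyr", "ggplot2", "mosaic")

# One-time step: install only the packages that are not already installed
# (assumes internet access; the CRAN mirror URL is an assumption)
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Every-session step: attach each package, as library() does above
invisible(lapply(pkgs, library, character.only = TRUE))
```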

Creating a reproducible lab report

Remember that we will be using R Markdown to create reproducible homework reports. See the following video describing how to get started with creating these reports for this homework, and all future homeworks involving R:

Basic R Markdown with an OpenIntro Lab


Compound Within-Block Factors

Creating a Data Frame in R

To create a data frame for data from a CB[2] design, we will want 4 variables: 1) the response variable observations, 2) the level for each observation for the first factor, 3) the level for each observation on the second factor, and 4) the block index.

The code below creates a data frame for the imagery and working memory example on page 289 of your textbook. First, notice that the observations on the response variable, resp_time, are specified inside the c() function. The c stands for concatenate (combine): c() joins values together into a vector. Next, the block index is created. This variable is called subjects because in this experiment the blocks are subjects. Then, the levels of the task variable are created, again with the c() function, but also making use of the rep() function. The rep() function repeats its first argument however many times you specify in its second argument. Levels of the report variable are created in the same way. Finally, all four variables are combined into a data frame with the data.frame() function, and the data frame is saved in an object called imagery.

imagery <- data.frame(resp_time = c(11.60, 22.71, 20.96, 13.96, 14.60, 10.98, 21.08, 
                                    15.85, 15.68, 16.10, 11.87, 17.49, 24.40, 23.35, 
                                    11.24, 20.24, 15.52, 13.70, 28.15, 33.98, 13.06, 
                                    6.27, 7.77, 6.48, 6.01, 7.60, 18.77, 10.29, 9.18, 
                                    5.88, 6.91, 5.66, 6.68, 11.97, 7.50, 11.61, 10.90, 
                                    5.74, 9.32, 12.64, 16.05, 13.16, 15.87, 12.49, 
                                    14.69, 8.64, 17.24, 11.69, 17.23, 8.77, 8.44, 
                                    9.05, 18.45, 24.38, 14.49, 12.19, 10.50, 11.11, 
                                    13.85, 15.48, 11.51, 23.86, 9.51, 13.20, 12.31, 
                                    12.26, 12.68, 11.37, 18.28, 8.33, 10.60, 8.24, 
                                    8.53, 15.85, 10.91, 11.13, 10.90, 9.33, 10.01, 28.18),
                      subjects = factor(rep(1:20, 4)),  # block index, stored as a factor so R treats it as categorical
                      task = c(rep("visual", 40), rep("verbal", 40)),
                      report = c(rep("visual", 20), rep("verbal", 20),
                                 rep("visual", 20), rep("verbal", 20)))
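You can check how the rep() calls above lay out the design. The sketch below rebuilds the three design columns on their own, using rep()'s standard `times =` and `each =` arguments as an equivalent alternative. It then confirms that the layout is balanced, with each subject appearing exactly once in every task-by-report cell. The standalone vector names are only for illustration.

```r
# rep(x, n) repeats the whole vector x, n times
rep(1:3, 2)                            # 1 2 3 1 2 3
# rep(x, each = n) repeats each element n times in place
rep(c("visual", "verbal"), each = 2)   # "visual" "visual" "verbal" "verbal"

# The imagery design columns rebuilt with times = / each =
subjects <- factor(rep(1:20, times = 4))
task     <- rep(c("visual", "verbal"), each = 40)
report   <- rep(rep(c("visual", "verbal"), each = 20), times = 2)

# Balanced CB[2] layout: 20 observations per task x report cell ...
table(task, report)
# ... and every subject appears exactly once in each cell
all(table(subjects, task, report) == 1)   # TRUE
```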

Calculating means and SDs for cells

Next, we compute descriptive statistics (the mean and SD of resp_time) for each of the four task-by-report combinations.

imagery %>%
  group_by(task, report) %>%
  summarize(mean = mean(resp_time),
            sd = sd(resp_time))

Checking our Fisher assumption of same standard deviation (S), we appear to be in good shape because the largest cell SD is less than 3 times the smallest cell SD.

6.12/3.37  # largest cell SD / smallest cell SD, about 1.82
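Typing the largest and smallest SDs by hand is error-prone. The sketch below computes the ratio directly from the data instead. `sd_ratio()` is a hypothetical helper written for this example, not a function from the textbook or from any package. It uses base R's tapply() to get one SD per cell and then divides the largest by the smallest.

```r
# Hypothetical helper: ratio of the largest to the smallest cell SD,
# where cells are defined by one or more grouping vectors passed in ...
sd_ratio <- function(y, ...) {
  cell_sds <- tapply(y, list(...), sd)   # one SD per cell
  max(cell_sds, na.rm = TRUE) / min(cell_sds, na.rm = TRUE)
}

# Toy check: the SDs of c(1, 3) and c(2, 6) are sqrt(2) and sqrt(8), so the ratio is 2
sd_ratio(c(1, 3, 2, 6), c("a", "a", "b", "b"))
```

With the lab's data, the call would be `sd_ratio(imagery$resp_time, imagery$task, imagery$report)`. You would then compare the result with the rule-of-thumb cutoff of 3.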

ANOVA for Designs with Compound Within-Block Factors

If we can assume the additive error model, we fit an ordinary linear model that simply “controls” for blocks by including subjects as a term alongside task, report, and their interaction.

mod <- lm(resp_time ~ subjects + task*report, data = imagery)

anova(mod)
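The subjects term controls for blocks only when R treats it as a categorical factor. If it is stored as a number, lm() fits a single slope for it. The sketch below uses simulated, hypothetical data (not the textbook's) with the same 20 x 2 x 2 layout, and compares the degrees of freedom R assigns to the block term in each case.

```r
# Simulated (hypothetical) balanced layout: 20 subjects x 2 tasks x 2 reports
set.seed(1)
sim <- expand.grid(subject = 1:20,
                   task = c("visual", "verbal"),
                   report = c("visual", "verbal"))
sim$y <- rnorm(nrow(sim), mean = 10)

fit_num <- lm(y ~ subject + task * report, data = sim)          # subject as a number
fit_fac <- lm(y ~ factor(subject) + task * report, data = sim)  # subject as a factor

anova(fit_num)["subject", "Df"]           # 1: a single linear trend, not blocking
anova(fit_fac)["factor(subject)", "Df"]   # 19: one parameter per block (20 - 1)
```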

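For non-additive error models, we can get the appropriate error terms with the following code. Note that the aov() function is used instead of lm(). The Error() term splits the variation into strata, so each effect is tested against its own interaction with subjects: task against subjects:task, report against subjects:report, and task:report against subjects:task:report.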

mod_nonadd <- aov(resp_time ~ task*report + Error(subjects/(task*report)), data = imagery)

summary(mod_nonadd)
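To see those strata directly, here is a sketch on simulated, hypothetical data with the same 20 x 2 x 2 layout. The numbers themselves mean nothing. The point is the structure of the output. As in the lab's model, the block variable must be a factor for Error() to build the strata correctly.

```r
# Simulated (hypothetical) balanced CB[2] data: 20 subjects x 2 tasks x 2 reports
set.seed(1)
sim <- expand.grid(subject = factor(1:20),
                   task = factor(c("visual", "verbal")),
                   report = factor(c("visual", "verbal")))
sim$y <- rnorm(nrow(sim), mean = 12, sd = 3)

fit <- aov(y ~ task * report + Error(subject / (task * report)), data = sim)

# One stratum per subject-by-factor combination:
#   task is tested in the subject:task stratum,
#   report in subject:report, and task:report in subject:task:report
names(fit)
summary(fit)
```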

Informal Analysis and Interaction Graph

We next want to look at the patterns in the data. Because this is a CB[2] design, we will want to make an interaction graph.

#parallel dot graph
ggplot(imagery, aes(x = task, y = resp_time, color = report)) +
  geom_point()

When you have more than just a few points of data, a side-by-side boxplot is often a better choice than a dot graph.

#side-by-side boxplot
ggplot(imagery, aes(x = task, y = resp_time, color = report)) +
  geom_boxplot()

There appear to be a few outlying points, and the boxplots suggest some skew in all groups, which would violate the Fisher assumption of normal errors (N). We may want to consider a transformation of our data.
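For positive, right-skewed data such as response times, a log transformation is one common choice. It is an assumption here, and your textbook may suggest a different one. The sketch below builds a simulated, hypothetical stand-in for imagery with log-normal times, adds a log-transformed column with dplyr, and redraws the side-by-side boxplot on the new scale. With the lab's data you would use imagery in place of sim.

```r
library(dplyr)
library(ggplot2)

# Simulated (hypothetical) stand-in for imagery: 20 subjects x 2 tasks x 2 reports
# with right-skewed (log-normal) response times
set.seed(289)
sim <- expand.grid(subjects = factor(1:20),
                   task = c("visual", "verbal"),
                   report = c("visual", "verbal"))
sim$resp_time <- rlnorm(nrow(sim), meanlog = log(12), sdlog = 0.4)

# Add a log-transformed response; log() requires every time to be > 0
sim <- sim %>% mutate(log_time = log(resp_time))

# Redraw the side-by-side boxplot on the log scale to see whether the skew is reduced
ggplot(sim, aes(x = task, y = log_time, color = report)) +
  geom_boxplot()
```

If the log scale looks more symmetric, the same lm() and aov() code can be rerun with log_time as the response.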

#interaction graph
ggplot(imagery, aes(x = task, y = resp_time, color = report,
                    group = report)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)  # with two task levels, each line joins the two cell means

The interaction graph shows an interaction between task mode and report mode. For verbal tasks, reporting answers verbally versus visually makes little difference to response time, but for visual tasks, subjects were faster when reporting verbally than when reporting visually.
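One way to put a number on what the graph shows is the interaction contrast. This is a difference of differences: the report-mode difference at one task level minus the report-mode difference at the other. It is zero when the lines are parallel. `interaction_contrast()` below is a hypothetical helper for a 2 x 2 layout, not a textbook or package function. tapply() orders levels alphabetically, so the sign depends on that ordering.

```r
# Hypothetical helper: interaction contrast for a 2 x 2 layout,
# computed as a difference of differences between cell means
interaction_contrast <- function(y, a, b) {
  m <- tapply(y, list(a, b), mean)  # 2 x 2 matrix of cell means (rows = levels of a)
  (m[1, 1] - m[1, 2]) - (m[2, 1] - m[2, 2])
}

# Toy check: cell means are 1, 2 (first row) and 3, 6 (second row),
# so the contrast is (1 - 2) - (3 - 6) = 2
interaction_contrast(c(1, 2, 3, 6), c("x", "x", "y", "y"), c("p", "q", "p", "q"))
```

With the lab's data, the call would be `interaction_contrast(imagery$resp_time, imagery$task, imagery$report)`.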

Three-way Interactions

Note that we can also create interaction graphs in R when we have only summary data. For example, for problem D1 (page 396), we only have the cell means. Below, these cell means are entered into a data frame called schizophrenia, and we use ggplot() to create an interaction graph.

schizophrenia <- data.frame(resp_time = c(78, 56, 25, 25,
                                          44, 28, 25, 23),
                            diagnosis = c(rep("schizophrenic", 4), rep("normal", 4)),
                            slope = rep(c("steep", "flat"), 4),
                            instructions = c(rep("free", 2), rep("idiosyncratic", 2),
                                             rep("free", 2), rep("idiosyncratic", 2)))

For three-way interactions, we can easily see how the pattern of the two-way interaction might differ across levels of the third factor by using the facet_wrap() function. Inside facet_wrap(), write ~variable, where variable is the third factor of interest. Don’t forget the ~!

ggplot(schizophrenia, aes(x = diagnosis, y = resp_time, 
                          color = slope, group = slope)) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~instructions)
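The facets can be backed up with arithmetic on the cell means. The sketch below is an illustration, not a method taken from the textbook. It computes the diagnosis-by-slope interaction contrast separately within each level of instructions. The difference between those two contrasts is the three-way interaction contrast, which would be zero if both facets showed the same two-way pattern. The data frame is re-entered exactly as above so the block runs on its own.

```r
schizophrenia <- data.frame(resp_time = c(78, 56, 25, 25,
                                          44, 28, 25, 23),
                            diagnosis = c(rep("schizophrenic", 4), rep("normal", 4)),
                            slope = rep(c("steep", "flat"), 4),
                            instructions = c(rep("free", 2), rep("idiosyncratic", 2),
                                             rep("free", 2), rep("idiosyncratic", 2)))

# Two-way (diagnosis x slope) interaction contrast within each instructions level
two_way <- sapply(split(schizophrenia, schizophrenia$instructions), function(d) {
  m <- tapply(d$resp_time, list(d$diagnosis, d$slope), mean)  # rows: diagnosis, cols: slope
  (m["schizophrenic", "steep"] - m["schizophrenic", "flat"]) -
    (m["normal", "steep"] - m["normal", "flat"])
})
two_way                                          # free = 6, idiosyncratic = -2

# Three-way interaction contrast: how much the two-way pattern changes across facets
two_way[["free"]] - two_way[["idiosyncratic"]]   # 8
```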

Homework 9 Problems

For homework 9, complete the following exercises. When you are done, please knit your homework to an HTML file and submit the HTML file on Moodle.

  1. This is problem C2 on page 387 of your textbook. Using the data in Table 9.9, enter all of the data in R, and compute averages for all basic factors of interest.

  2. Create an interaction graph. Is there evidence of an interaction between the two basic factors?

  3. Problem D2 on page 396 of your textbook. Enter the summary data into R just as shown above for the schizophrenia data.

This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license. This lab was adapted by Randi Garcia from labs originally written by Mark Hansen, further adapted for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel.


Resources for learning R and working in RStudio

That was a short introduction to R and RStudio, but we will provide you with more functions and a more complete sense of the language as the course progresses.

In this course we will be using R packages called dplyr for data wrangling and ggplot2 for data visualization. If you are googling for R code, make sure to also include these package names in your search query. For example, instead of googling “scatterplot in R”, google “scatterplot in R with ggplot2”.

These cheatsheets may come in handy throughout the semester:

Chester Ismay has put together a resource for new users of R, RStudio, and R Markdown here. It includes examples showing working with R Markdown files in RStudio recorded as GIFs.

Note that some of the code on these cheatsheets may be too advanced for this course; however, the majority of it will become useful throughout the semester.