Multiple Comparisons

Prof Randi Garcia
April 11, 2018

Reading Free-Write

  • For the walking babies example (pg. 150) below are (rounded) average times to walk (months) for the four groups. Compute the estimates for the following set of three comparisons: i) Exercise vs. no exercise, ii) Special exercise vs. exercise control, and iii) Weekly report vs. final report.

    1. Special exercise: 10.1
    2. No exercise, weekly report: 11.6
    3. Exercise control: 11.4
    4. No exercise, final report: 12.4
  • Draw a diagram with arrows depicting the top-down approach taken with this set of comparisons.
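One way to check the three estimates is a small Python sketch like the one below. It treats each comparison as a contrast, that is, a weighted sum of the four group means with weights that add to zero. The weight layout and all variable names are assumptions made for illustration, not taken from the textbook.

```python
# Rounded group means (months to walk), copied from the list above.
means = {
    "special_exercise": 10.1,
    "no_exercise_weekly": 11.6,
    "exercise_control": 11.4,
    "no_exercise_final": 12.4,
}

# Assumed contrast weights; each set of weights sums to zero.
contrasts = {
    "exercise_vs_no_exercise": {
        "special_exercise": 0.5,
        "exercise_control": 0.5,
        "no_exercise_weekly": -0.5,
        "no_exercise_final": -0.5,
    },
    "special_vs_exercise_control": {
        "special_exercise": 1.0,
        "exercise_control": -1.0,
    },
    "weekly_vs_final_report": {
        "no_exercise_weekly": 1.0,
        "no_exercise_final": -1.0,
    },
}


def estimate(weights, group_means):
    """Return the contrast estimate: the weighted sum of group means."""
    return sum(w * group_means[group] for group, w in weights.items())


estimates = {name: estimate(w, means) for name, w in contrasts.items()}
for name, value in estimates.items():
    print(f"{name}: {value:+.2f} months")
```

A negative estimate means the first-named group in the comparison walked earlier on average.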


  • Project data collection should start now!
  • HW 9 (last HW!) has been assigned.
  • Book falling apart? Show it to Paula Thompson by 5pm today for possible reimbursement.


  • Extending designs by factorial crossing
  • Multiple comparisons and contrasts

Extensions by Factorial Crossing

We can now imagine adding complexity to these four basic designs by including additional factors crossed with our structural factors.

Take our diabetic dogs example, and now let us add in the fact that the order of the two methods was randomly assigned. What design do we have now?
- We have an order factor and there are two levels: order 1 and order 2
- The new design is a SP/RM[2,1]


The purpose of this experiment was to study the way one species of crabgrass competed with itself and with another species for nitrogen (N), phosphorus (P), and potassium (K). Bunches of crabgrass were planted in vermiculite in 16 Styrofoam cups; after the seeds had sprouted, the plants were thinned to 20 plants per cup. Each of the 16 cups was randomly assigned to get one of 8 nutrient combinations added to its vermiculite, for example yes-nitrogen/no-phosphorus/yes-potassium. The response is mean dry weight per plant, in milligrams.


Worms that live at the mouth of a river must deal with varying concentrations of salt. Osmoregulating worms are able to maintain a relatively constant concentration of salt in the body. An experiment tested the effects of mixtures of salt water on two species of worms: Nereis virens (N) and Goldfingia gouldii (G). Eighteen worms of each species were weighed, then randomly assigned in equal numbers to one of three conditions, so six worms of each species were placed in 100% sea water, six in 67% sea water, and six in 33% sea water. The worms were then weighed after 30, 60, and 90 minutes, then placed in 100% sea water and weighed one last time 30 minutes later. The response was body weight as a percentage of initial body weight.

Compound within Block Factors

In an experiment, researchers wanted to compare how easy it is to remember four different kinds of words: 1) concrete, frequent: fork, brother, radio, … 2) concrete, infrequent: blimp, warthog, fedora, … 3) abstract, frequent: truth, anger, foolishness, … and 4) abstract, infrequent: slot, vastness, apostasy, …

Ten students in a psychology lab served as subjects. During each of the 4 time slots, each subject heard a list of words of one of the four kinds and was then tested for recall.

There are two possible models for chance error in models with compound within-block factors.

  1. The additive model assumes that the chance error is the same size for every within-block factor, so the residual terms can be pooled.
  2. The non-additive model does not make this (often incorrect) assumption, but tests based on it have lower power.

How can we decide?

  • Think about whether or not you would expect block × treatment interaction effects. If you would, then the additive model will be wrong.

Rule for Compound within Block F-ratios (non-additive)

\[ F = \frac{{MS}_{Factor}}{{MS}_{Blocks\times Factor}} \]
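As a rough illustration of this rule, the Python sketch below computes the F-ratio for one within-block factor by dividing its mean square by the mean square of its own interaction with blocks. The recall scores are invented for illustration (three subjects, with abstraction and frequency as the two within-block factors), and the function name `within_block_f` is hypothetical.

```python
# Made-up recall scores, indexed as scores[subject][abstraction][frequency].
scores = [
    [[8, 6], [5, 4]],
    [[9, 6], [6, 3]],
    [[7, 5], [6, 5]],
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def within_block_f(data):
    """F for the first within-block factor (A), tested against Blocks x A."""
    n_blocks, n_a, n_b = len(data), len(data[0]), len(data[0][0])
    # Average over the second factor to get one value per block-by-A cell.
    cell = [[mean(data[s][i]) for i in range(n_a)] for s in range(n_blocks)]
    grand = mean(v for row in cell for v in row)
    block_mean = [mean(row) for row in cell]
    a_mean = [mean(cell[s][i] for s in range(n_blocks)) for i in range(n_a)]

    # Sum of squares for A, and for the Blocks x A interaction.
    ss_a = n_blocks * n_b * sum((m - grand) ** 2 for m in a_mean)
    ss_blocks_a = n_b * sum(
        (cell[s][i] - block_mean[s] - a_mean[i] + grand) ** 2
        for s in range(n_blocks)
        for i in range(n_a)
    )

    df_a = n_a - 1
    df_blocks_a = (n_blocks - 1) * (n_a - 1)
    ms_a = ss_a / df_a
    ms_blocks_a = ss_blocks_a / df_blocks_a
    return ms_a / ms_blocks_a, (df_a, df_blocks_a)


f_ratio, df = within_block_f(scores)
print(f"F({df[0]}, {df[1]}) = {f_ratio:.2f}")
```

The same function could be applied to the second factor by swapping the last two indices of the data; under the non-additive model each within-block factor gets its own denominator.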

Rule for Compound between Block F-ratios

\[ F = \frac{{MS}_{Factor}}{{MS}_{Blocks}} \]

Analysis in R


  • When we have more than two levels of a factor of interest, we might want to compare specific groups to see which ones differ from each other.
  • We can do a set of pairwise comparisons, or custom comparisons (contrasts) that address more complex questions.
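To make the idea of a full set of pairwise comparisons concrete, here is a minimal Python sketch that lists every pair among the four walking-babies groups from the reading free-write, along with the difference in rounded means. It only enumerates the differences and does not test them; the variable names are illustrative.

```python
from itertools import combinations

# Rounded group means (months to walk) from the reading free-write.
means = {
    "special_exercise": 10.1,
    "no_exercise_weekly": 11.6,
    "exercise_control": 11.4,
    "no_exercise_final": 12.4,
}

# Every unordered pair of groups, with the difference in means.
pairwise = {
    (a, b): means[a] - means[b] for a, b in combinations(means, 2)
}

for (a, b), diff in pairwise.items():
    print(f"{a} - {b}: {diff:+.1f}")

print(f"{len(pairwise)} pairwise comparisons for {len(means)} groups")
```

With k groups there are k(k-1)/2 pairs, so the number of tests grows quickly as groups are added.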

Adjusting for Multiple Comparisons

When we do multiple significance tests, our effective type I error rate is inflated. Most statisticians agree that we should adjust our type I error rate to account for our multiple tests, and control the experiment-wise error rate.
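A quick way to see the inflation is the sketch below. It assumes the tests are independent, which is a simplification (pairwise comparisons on the same data are not independent), so the numbers are only a rough guide. The function name is hypothetical.

```python
def experimentwise_error_rate(alpha, n_tests):
    """Chance of at least one type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


# Four groups give six pairwise comparisons; show how the rate grows.
for n_tests in (1, 3, 6, 10):
    rate = experimentwise_error_rate(0.05, n_tests)
    print(f"{n_tests:2d} tests at alpha = 0.05: {rate:.3f}")
```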

There are four methods discussed in the chapter:

  1. Fisher Least Significant Difference (LSD)
  2. Tukey Honest Significant Difference (HSD)
  3. Scheffé test
  4. The Bonferroni correction
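Of the four, the Bonferroni correction is the simplest to sketch: each test is compared against alpha divided by the number of tests (equivalently, each p-value is multiplied by the number of tests and capped at 1). The p-values below are invented for illustration, and the function name is hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and reject/retain decisions."""
    n_tests = len(p_values)
    adjusted = [min(1.0, p * n_tests) for p in p_values]
    reject = [p <= alpha / n_tests for p in p_values]
    return adjusted, reject


# Made-up p-values for three comparisons.
raw_p = [0.004, 0.020, 0.300]
adjusted_p, decisions = bonferroni(raw_p)

for p, adj, rej in zip(raw_p, adjusted_p, decisions):
    label = "reject" if rej else "retain"
    print(f"raw p = {p:.3f}, adjusted p = {adj:.3f}: {label} H0")
```

In this made-up example the second comparison would be significant at 0.05 on its own but not after the correction, which is the price paid for controlling the experiment-wise rate.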