Prof Randi Garcia

March 28, 2018

- When we make a scatterplot of observations between two treatments within-blocks, what are we hoping to see? Can you explain why (OK if you cannot)?
- When you plot the fitted values versus residual values after running a model, what are the issues you should look out for?
- What sorts of things are confusing/fuzzy from Ch 7?

- Jenny Smetzer, Lab Instructor candidate, talk/demo today 12:00-1:00p in Bass 002
- Pizza will be served!

- HW 7 is posted
- You can complete your homework in R if you'd like

- Project Method draft due Monday at midnight on Moodle

- Fred Conrad, Research Professor and Director, Michigan Program in Survey Methodology
- ANOVA for CB[1]
- Latin square designs
- Split plot designs

Modern zoos try to reproduce natural habitats in their exhibits as much as possible. They try to use appropriate plants, but these plants can be infested with inappropriate insects. Cycads (plants that look vaguely like palms) can be infected with mealybugs, and the zoo wishes to test three treatments: 1) water, 2) horticultural oil, and 3) fungal spores in water. Five infested cycads are taken to the testing area. Three branches are randomly selected from each tree, and 3 cm by 3 cm patches are marked on each branch. The number of mealybugs on the patch is counted. The three treatments then get randomly assigned to the three branches for each tree. After three days the mealybugs are counted again. The change in number of mealybugs is computed (\( before-after \)).

treatment | tree1 | tree2 | tree3 | tree4 | tree5 |
---|---|---|---|---|---|
oil | 4 | 29 | 14 | 14 | 7 |
spores | -4 | 29 | 4 | -2 | 11 |
water | -9 | 18 | 10 | 9 | -6 |

Draw the factor diagram, labeling the inside and outside factors.

\[ {y}_{ij}={\mu}+{\tau}_{i}+{\beta}_{j}+{e}_{ij} \]

Source | SS | df | MS | F |
---|---|---|---|---|
Treatment | \( \sum_{i=1}^{a}b(\bar{y}_{i.}-\bar{y}_{..})^{2} \) | \( a-1 \) | \( \frac{{SS}_{T}}{{df}_{T}} \) | \( \frac{{MS}_{T}}{{MS}_{E}} \) |
Blocks | \( \sum_{j=1}^{b}a(\bar{y}_{.j}-\bar{y}_{..})^{2} \) | \( b-1 \) | \( \frac{{SS}_{B}}{{df}_{B}} \) | \( \frac{{MS}_{B}}{{MS}_{E}} \) |
Error | \( \sum_{i=1}^{a}\sum_{j=1}^{b}({y}_{ij}-\bar{y}_{i.}-\bar{y}_{.j}+\bar{y}_{..})^{2} \) | \( (a-1)(b-1) \) | \( \frac{{SS}_{E}}{{df}_{E}} \) | |
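As a rough check on the formulas in this table, here is a minimal sketch that computes each sum of squares by hand for the mealybug data, using base R only. The object names (`y`, `ss_trt`, `ss_blk`, `ss_err`, and so on) are made up for illustration; the results should agree with the `anova(mod)` output shown later.

```r
# Mealybug data: rows are treatments (i), columns are trees, i.e. blocks (j)
y <- rbind(
  oil    = c(4, 29, 14, 14, 7),
  spores = c(-4, 29, 4, -2, 11),
  water  = c(-9, 18, 10, 9, -6)
)
colnames(y) <- paste0("tree", 1:5)

a <- nrow(y)  # number of treatments
b <- ncol(y)  # number of blocks

grand_mean <- mean(y)      # ybar_..
trt_means  <- rowMeans(y)  # ybar_i.
blk_means  <- colMeans(y)  # ybar_.j

ss_trt <- b * sum((trt_means - grand_mean)^2)
ss_blk <- a * sum((blk_means - grand_mean)^2)

# Residual for each cell: y_ij - ybar_i. - ybar_.j + ybar_..
res    <- y - outer(trt_means, blk_means, "+") + grand_mean
ss_err <- sum(res^2)

df_trt <- a - 1
df_blk <- b - 1
df_err <- (a - 1) * (b - 1)

ms_err <- ss_err / df_err
f_trt  <- (ss_trt / df_trt) / ms_err
f_blk  <- (ss_blk / df_blk) / ms_err

# Upper-tail F probabilities
p_trt <- pf(f_trt, df_trt, df_err, lower.tail = FALSE)
p_blk <- pf(f_blk, df_blk, df_err, lower.tail = FALSE)

round(c(ss_trt = ss_trt, ss_blk = ss_blk, ss_err = ss_err), 2)
round(c(f_trt = f_trt, f_blk = f_blk, p_trt = p_trt, p_blk = p_blk), 4)
```

Working from the wide matrix keeps the row and column means lined up with \( \bar{y}_{i.} \) and \( \bar{y}_{.j} \) in the table.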

```
mealybugs
```

```
tree treatment bugs_change
1 tree1 water -9
2 tree1 spores -4
3 tree1 oil 4
4 tree2 water 18
5 tree2 spores 29
6 tree2 oil 29
7 tree3 water 10
8 tree3 spores 4
9 tree3 oil 14
10 tree4 water 9
11 tree4 spores -2
12 tree4 oil 14
13 tree5 water -6
14 tree5 spores 11
15 tree5 oil 7
```

```
mod <- lm(bugs_change ~ treatment + tree, data = mealybugs)
anova(mod)
```

```
Analysis of Variance Table
Response: bugs_change
Df Sum Sq Mean Sq F value Pr(>F)
treatment 2 218.13 109.07 2.9963 0.106846
tree 4 1316.40 329.10 9.0412 0.004603 **
Residuals 8 291.20 36.40
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
library(tidyr)
library(ggplot2)
# Reshape from long to wide: one row per tree, one column per treatment
mealybugs %>%
  spread(treatment, bugs_change)
```

```
tree oil spores water
1 tree1 4 -4 -9
2 tree2 29 29 18
3 tree3 14 4 10
4 tree4 14 -2 9
5 tree5 7 11 -6
```

Spores versus oil

```
mealybugs %>%
  spread(treatment, bugs_change) %>%
  ggplot(aes(x = spores, y = oil)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 8)  # reference line with slope 1
```

Spores versus water

```
mealybugs %>%
  spread(treatment, bugs_change) %>%
  ggplot(aes(x = spores, y = water)) +
  geom_point() +
  geom_abline(slope = 1, intercept = -5)  # reference line with slope 1
```

Oil versus water

```
mealybugs %>%
  spread(treatment, bugs_change) %>%
  ggplot(aes(x = oil, y = water)) +
  geom_point() +
  geom_abline(slope = 1, intercept = -13)  # reference line with slope 1
```
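One possible way to choose the intercept for a slope-1 reference line: a line with slope 1 through the point of means \( (\bar{x}, \bar{y}) \) has intercept \( \bar{y}-\bar{x} \), which is the mean within-tree difference between the two treatments. The sketch below computes that value for each pair. The intercepts used in the plots above are different numbers, so treat this as an alternative rather than the method used there; `wide` and `slope1_intercept()` are names invented for this example.

```r
# Wide version of the mealybug data (one row per tree)
wide <- data.frame(
  tree   = paste0("tree", 1:5),
  oil    = c(4, 29, 14, 14, 7),
  spores = c(-4, 29, 4, -2, 11),
  water  = c(-9, 18, 10, 9, -6)
)

# Intercept of the slope-1 line through the point of means: mean(y) - mean(x),
# which equals the mean of the within-tree differences y - x
slope1_intercept <- function(x, y) mean(y - x)

intercepts <- c(
  oil_vs_spores   = slope1_intercept(wide$spores, wide$oil),
  water_vs_spores = slope1_intercept(wide$spores, wide$water),
  water_vs_oil    = slope1_intercept(wide$oil, wide$water)
)
intercepts
```

If the block design is working as hoped, the points in each plot should scatter roughly evenly around a line with slope 1.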

This experiment is interested in the blood concentration of a drug after it has been administered. The concentration will start at zero, then go up, and back down as it is metabolized. This curve may differ depending on the form of the drug (a solution, a tablet, or a capsule). We will use three subjects, and each subject will be given the drug three times, once for each method. The area under the time-concentration curve is recorded for each subject after each method of drug delivery.

In the bioequivalence example, because the body may adapt to the drug in some way, each drug will be used once in the first period, once in the second period, and once in the third period.

- We can use a Latin Square design to control the order of drug administration
- In this way, time is a second blocking factor (subject is the first)

Treatments:

- Solution is treatment A
- Tablet is treatment B
- Capsule is treatment C

period | subject 1 | subject 2 | subject 3 |
---|---|---|---|
1 | A 1799 | C 2075 | B 1396 |
2 | C 1846 | B 1156 | A 868 |
3 | B 2147 | A 1777 | C 2291 |
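The defining property of a Latin square is that every treatment appears exactly once in each row and exactly once in each column. Here is a small sketch that checks this for the layout in the table, with rows as periods and columns as subjects; `square` and `is_latin()` are illustrative names, not part of any package.

```r
# Treatment letters from the table: rows are periods, columns are subjects
square <- matrix(
  c("A", "C", "B",
    "C", "B", "A",
    "B", "A", "C"),
  nrow = 3, byrow = TRUE,
  dimnames = list(period = as.character(1:3), subject = as.character(1:3))
)

# A square is Latin if it is p x p with p treatments, and every row and
# every column contains each treatment exactly once
is_latin <- function(sq) {
  trts <- sort(unique(as.vector(sq)))
  each_once <- function(v) all(table(factor(v, levels = trts)) == 1)
  nrow(sq) == ncol(sq) &&
    length(trts) == nrow(sq) &&
    all(apply(sq, 1, each_once)) &&
    all(apply(sq, 2, each_once))
}

is_latin(square)
```

Because each treatment meets each period and each subject once, period and subject effects can be separated from the treatment effect.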

What is the factor diagram for the Latin square?

The actual data structure for analysis is “long.”

subject | treatment | period | group | c_curve |
---|---|---|---|---|
1 | solution | 1 | A | 1799 |
1 | capsule | 2 | C | 1846 |
1 | tablet | 3 | B | 2147 |
2 | capsule | 1 | C | 2075 |
2 | tablet | 2 | B | 1156 |
2 | solution | 3 | A | 1777 |
3 | tablet | 1 | B | 1396 |
3 | solution | 2 | A | 868 |
3 | capsule | 3 | C | 2291 |
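A minimal sketch of how the long table could be built in base R from the wide period-by-subject layout. Storing `subject`, `treatment`, and `period` as factors is an assumption here (it matches the 2 degrees of freedom each gets in the ANOVA output later); `auc`, `group`, and `labels` are helper names made up for this example.

```r
# Responses laid out as in the wide table: rows are periods, columns are subjects
auc <- matrix(
  c(1799, 2075, 1396,
    1846, 1156,  868,
    2147, 1777, 2291),
  nrow = 3, byrow = TRUE
)
group <- matrix(
  c("A", "C", "B",
    "C", "B", "A",
    "B", "A", "C"),
  nrow = 3, byrow = TRUE
)
labels <- c(A = "solution", B = "tablet", C = "capsule")

# as.vector() reads a matrix column by column, so subject varies slowest
# and period cycles 1, 2, 3 within each subject
bioequivalence <- data.frame(
  subject   = factor(rep(1:3, each = 3)),
  treatment = factor(unname(labels[as.vector(group)])),
  period    = factor(rep(1:3, times = 3)),
  group     = as.vector(group),
  c_curve   = as.vector(auc)
)
bioequivalence
```

Each row is one measurement, which is the shape `lm()` and `ggplot()` expect.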

We can make a parallel dot graph

And check for equal standard deviations

```
library(mosaic)
# Ratio of largest to smallest group SD; favstats() reports the SD in its `sd` column
sds <- favstats(c_curve ~ treatment, data = bioequivalence)$sd
max(sds) / min(sds)
```

```
[1] 2.387418
```

\[ {y}_{ijk}={\mu}+{\alpha}_{i}+{\beta}_{j}+{\tau}_{k}+{e}_{ijk} \]

- \( {\mu} \) is the benchmark
- \( {\alpha}_{i} \) is the row effect
- \( {\beta}_{j} \) is the column effect
- \( {\tau}_{k} \) is the treatment effect
- There are p rows, columns, and treatments

Source | SS | df | MS | F |
---|---|---|---|---|
rows | \( \sum_{i=1}^{p}p(\bar{y}_{i..}-\bar{y}_{...})^{2} \) | \( p-1 \) | \( \frac{{SS}_{A}}{{df}_{A}} \) | \( \frac{{MS}_{A}}{{MS}_{E}} \) |
columns | \( \sum_{j=1}^{p}p(\bar{y}_{.j.}-\bar{y}_{...})^{2} \) | \( p-1 \) | \( \frac{{SS}_{B}}{{df}_{B}} \) | \( \frac{{MS}_{B}}{{MS}_{E}} \) |
treatment | \( \sum_{k=1}^{p}p(\bar{y}_{..k}-\bar{y}_{...})^{2} \) | \( p-1 \) | \( \frac{{SS}_{T}}{{df}_{T}} \) | \( \frac{{MS}_{T}}{{MS}_{E}} \) |
Error | \( \sum_{i=1}^{p}\sum_{j=1}^{p}\sum_{k=1}^{p}({y}_{ijk}-\bar{y}_{i..}-\bar{y}_{.j.}-\bar{y}_{..k}+2\bar{y}_{...})^{2} \) | \( (p-1)(p-2) \) | \( \frac{{SS}_{E}}{{df}_{E}} \) | |
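To see the Latin square formulas at work, here is a sketch that computes each sum of squares by hand for the bioequivalence data, treating periods as rows and subjects as columns. The data frame `d` and the `ss_*` names are invented for this example. Only the \( p^{2} \) observed (row, column, treatment) combinations contribute to the error sum.

```r
# Bioequivalence data in long form, one row per observed cell
d <- data.frame(
  subject   = rep(1:3, each = 3),
  period    = rep(1:3, times = 3),
  treatment = c("A", "C", "B",  "C", "B", "A",  "B", "A", "C"),
  y         = c(1799, 1846, 2147,  2075, 1156, 1777,  1396, 868, 2291)
)
p     <- 3
grand <- mean(d$y)  # ybar_...

# Means for each row (period), column (subject), and treatment level
row_means <- tapply(d$y, d$period, mean)
col_means <- tapply(d$y, d$subject, mean)
trt_means <- tapply(d$y, d$treatment, mean)

ss_rows <- p * sum((row_means - grand)^2)
ss_cols <- p * sum((col_means - grand)^2)
ss_trt  <- p * sum((trt_means - grand)^2)

# Residual for each observed cell:
# y - row mean - column mean - treatment mean + 2 * grand mean
# (ave() returns the group mean repeated for every observation in the group)
res <- d$y - ave(d$y, d$period) - ave(d$y, d$subject) -
  ave(d$y, d$treatment) + 2 * grand
ss_err <- sum(res^2)

df_err <- (p - 1) * (p - 2)
f_trt  <- (ss_trt / (p - 1)) / (ss_err / df_err)

round(c(rows = ss_rows, columns = ss_cols, treatment = ss_trt, error = ss_err))
```

With \( p = 3 \) there are only 2 error degrees of freedom, so the F tests rest on very little information about the error variance.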

```
ls_mod <- lm(c_curve ~ treatment + period + subject, data = bioequivalence)
anova(ls_mod)
```

```
Analysis of Variance Table
Response: c_curve
Df Sum Sq Mean Sq F value Pr(>F)
treatment 2 608891 304445 67.733 0.014549 *
period 2 928006 464003 103.231 0.009594 **
subject 2 261115 130557 29.047 0.033282 *
Residuals 2 8990 4495
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
# Attach fitted values and residuals, then plot residuals against fitted values
bioequivalence <- bioequivalence %>%
  mutate(fitted = fitted(ls_mod),
         residuals = residuals(ls_mod))
ggplot(bioequivalence, aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, color = "red")
```

If you suspect a design is a split-plot design, you should be able to answer the following questions:

- What are the whole plots, that is, what is the nuisance factor?
- What is the between-blocks factor? Is it observational or experimental?
- What is the within-blocks factor? Is it observational or experimental?

The Canada goose is a magnificent bird, but in large numbers it can be a nuisance in urban areas. One method of population control is to addle eggs in nests, but this method can harm adult females. Would removal of the eggs at the usual hatch date prevent harm? It is suspected that females nesting together at the same site are similar to each other. We randomly select 5 different sites, and we then randomly assign 5 nests per site to the addle with no removal condition, and 5 nests per site to the addle plus removal condition. The females at the nests are banded so that survival age can be measured later.

**Crossing**: Two sets of treatments are crossed if all possible combinations of treatments occur in the design. The design is called a two-way factorial and has factorial treatment structure.

**Nesting**: One factor is nested within another if each level of the first (“inside”) factor occurs with exactly one level of the second (“outside”) factor.
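These two definitions can be checked mechanically with a cross-tabulation. The sketch below builds a hypothetical layout for the goose study (nest labels are made up, and conditions are assigned in a fixed pattern rather than at random, purely for illustration) and tests which pairs of factors are crossed or nested. `is_crossed()` and `is_nested()` are helper names invented here.

```r
# Hypothetical layout: 5 sites, 10 nests per site, 5 nests per condition per site.
# Nest labels and the assignment pattern are assumptions for illustration only.
geese <- expand.grid(nest_in_site = 1:10, site = paste0("site", 1:5))
geese$nest <- paste(geese$site, geese$nest_in_site, sep = "_")
geese$condition <- ifelse(geese$nest_in_site <= 5, "addle", "addle_plus_removal")

# Crossed: every combination of levels occurs at least once
is_crossed <- function(f1, f2) all(table(f1, f2) > 0)

# Nested: each level of the inside factor occurs with exactly one level
# of the outside factor
is_nested <- function(inside, outside) {
  all(rowSums(table(inside, outside) > 0) == 1)
}

is_crossed(geese$condition, geese$site)  # TRUE: both conditions occur at every site
is_nested(geese$nest, geese$site)        # TRUE: each nest belongs to one site
is_nested(geese$condition, geese$site)   # FALSE: each condition occurs at all 5 sites
```

Running the same checks on a real design table is a quick way to sort factors into inside and outside before drawing the factor diagram.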

Diabetes affects the rate of turnover of lactic acid in a system of biochemical reactions called the Cori cycle. This experiment compares two methods of using radioactive carbon-14 to measure the rate of turnover. Method 1 is injection all at once, and method 2 is continuous infusion. Ten dogs were sorted into two groups: 5 were controls and 5 had their pancreas removed (to make them diabetic). The rate of turnover was then measured twice for each dog, once with each method. The order of the two methods was randomly assigned.

Draw the factor diagram for the data on page 263.