In this lab you will build multiple regression models, explore interactions between a pair of continuous predictors, and you will be introduced to the SEM package `lavaan`

. The `lavaan`

package can be used to fit regression models as well as systems of structural regression models.

The data you will be using for this consists of people currently involved in romantic relationships living in the city of Detroit. You can read more about the study that goes with this data here. We will be particularly interested in understanding predictors of relationship satisfaction including gender, how much tension they perceive in their relationships, and how positively they see their partners.

In this lab we will explore the data using the `dplyr`

package and visualize it using the `ggplot2`

package for data visualization. The data can be found at the URL given below. Later we will also use the `lavaan`

package, so please install it now.

Let’s load the packages.

```
library(dplyr)
library(ggplot2)
#install.packages("lavaan")
library(lavaan)
```

Please write your lab code and answers in a R Markdown file. You are to submit the knitted HTML file to Moodle when your lab is complete.

The data set for this lab consists of 296 persons involved heterosexual dating relationships from the city of Detroit.

Let’s load the data:

`acitelli <- read.csv("http://www.science.smith.edu/~rgarcia/sds390-S19/acitelli-individual.csv")`

Take a look at the data. You can use `glimpse()`

, `head()`

, `View()`

, however you like to look at your data frames. We have observations on 8 different variables, some are categorical and some numerical.

The `gender`

variable is coded -1(= Women) and 1 (= Men). You will change this next. The `self_pos`

is the mean of positive self variables and `other_pos`

is the mean of positive partner variables, where both variables the average of Cooperative, Mature, Friendly, Hard Working and Care about Others. Our main response variable, `satisfaction`

is the mean of four variables (theoretical range from 1 to 4). The variable `tension`

is the mean of two: perceived disagreement and perceived tension (theoretical range from 1 to 4). Lastly, `simhob`

is the perception of having similar hobbies to one’s partner (1 = very true; 0 = somewhat true; -1 = not at all true). We won’t be using this variable today, but you can play with it if you have time.

The

`gender`

variable is currently coded as`-1`

for women and`1`

for men, create two new indicator variables, one called`woman`

, and another`man`

.How many observations are in each level of

`cuplid`

? What are the counts of women and men in the data? Based on these two summaries, do you think the`cuplid`

variable means?Select two variables in the data set and visualize their relationship with each other. Also, calculate an appropriate summary statistic for this relationship. Describe the relationship.

The most important research question we ask today is does perceiving tension in one’s relationship lead to lower relationship satisfaction. Let’s create a scatterplot to see if this appears to be the case:

```
ggplot(data = acitelli, aes(x = tension, y = satisfaction)) +
geom_point()
```

Before we draw conclusions about the trend, compare the number of observations in the data frame with the approximate number of points on the scatterplot. Is anything awry?

Replot the scatterplot, but this time using a

`geom`

that corrects the problem or another visualization solution. Comment on your plot.*Hint:*You can jitter the points, create transparency, or check out`geom_beeswarm()`

from the`ggbeeswarm`

package for funzies!Let’s see if the apparent trend in the plot is something more than natural variation. Fit a linear model called

`mod_t`

to predict satisfaction score by tension. Interpret the slope and intercept of this model. Is tension a statistically significant predictor of satisfaction? Does it appear to be a practically significant predictor? Also, add a line of best fit to your visualization.Use residual plots,

`plot(mod_t)`

, to evaluate whether the conditions of least squares regression are reasonable. Comment on each of the regression assumptions—remember LINE: Linearity, Independence, Normality, and Error homogeneity.

If you suspected above that these participants might be in romantic relationships with *each other*, you’re correct! This indeed violates the independence assumption. For this lab we will be ignoring this fact, but we will return to it later in the course for path analysis.

The data set contains several relationship variables including the participants’ feelings about themselves and their partner. Let’s take a look at the relationship between one of these positivity variables, `other_pos`

, and `tension`

. **Before** running the code below:

- What do you expect the direction of this relationship between
`other_pos`

and`tension`

to be?

Now run the following:

```
ggplot(data = acitelli, aes(x = other_pos, y = tension)) +
geom_jitter()
acitelli %>%
summarise(cor(other_pos, tension))
```

As expected the relationship is negative. In order to see if perceived tension is still a significant predictor of satisfaction after we’ve accounted for perceptions of other positivity, we can add the `other_pos`

term into the model.

Fit a multiple regression model called

`mod_t_op`

with both tension and other positivity as predictors of satisfaction. Interpret the intercept and slope coefficients. Are they statistically significant predictors? Has the addition of`other_pos`

to the model changed the parameter estimate for`tension`

?What is the R

^{2}value for this model? Give its interpretation.

Now let’s add gender into the model that already contains `tension`

and `other_pos`

. First fit a model using the `woman`

indicator you created earlier. Call it `mod_wo`

. Then, fit a model with the original `gender`

variable. Call it `mod_gen`

.

- Interpret the coefficients for
`woman`

in`mod_wo`

and`gender`

in`mod_gen`

. What do the intercepts mean in these two models? Are the coefficients for`tension`

and`other_pos`

different across these two models?

The interpretation of the coefficients in multiple regression is slightly different from that of simple regression. The estimate for `tension`

reflects how much more satisfied a person is expected be if they perceive tension in their relationship one point higher *while holding all other variables constant*. In this case, that translates into considering only people of the same gender, who feel the same amount of positivity towards their partner, with `tension`

scores that are one point apart.

Now we can ask if positive perceptions of your partner *difuse* the negative effect of tension on relationship satisfaction. That is, we can ask, is the effect of tension on satisfaction weakened (or zero) when other positivity is high? Questions like this are Answered with interaction terms. .

- Add the interaction of
`tension`

and`other_pos`

to the model and leave`gender`

out of this model. Is there a statistically significant interaction of`tension`

and`other_pos`

? Can you give an interpretation?*Hint:*see below.

To aid in interpretation of this interaction we can estimate the effect of `tension`

on `satisfaction`

when `other_pos`

is low (=1) and when `other_pos`

is high (=5). We will need to create two new variables, `other_pos_low`

and `other_pos_high`

to do this. For the low version, everyone who’s `1`

will be made `0`

, and for the high version, everyone who is `5`

will be made `0`

.

```
acitelli <- acitelli %>%
mutate(other_pos_low = other_pos - 1,
other_pos_high = other_pos - 5)
```

Then we simply re-run the model two more times and look at the main effect of `tension`

in each model.

```
mod_low <- lm(satisfaction ~ tension*other_pos_low, data = acitelli)
summary(mod_low)
```

```
mod_high <- lm(satisfaction ~ tension*other_pos_high, data = acitelli)
summary(mod_high)
```

**Interpretation:** When other positivity is low, we would expect a strong negative effect of tension on satisfaction, but when other positivity is high, we would still expect a statistically significant negative effect of tension on satisfaction, but it is far less strong.

`lavaan`

The package that we will be using to fit structural equation models is called `lavaan`

. `lavaan`

can be used to run path analyses (PA), confirmatory factor analyses (CFA), and the combination of the two, which is a SEM. It stands for **La**tent **Va**riable **An**alysis. It is based a three step system where you first specify a model as a character string, second you pass this model to a function (along with the data) that finds parameter estimates, and finally you inspect the output. There are `lavaan`

tutorials online here.

SEM lavaan has three steps:

* Specify the model.

* Estimate that model.

* Examine the output.

`lavaan`

relational symbols used to specify a model:

* x ~ y + z — regression paths from y and z to x

* x ~~ y — covariance between x and y * x ~~ x — variance of x

We can actually use `lavaan`

to fit the `mod_wo`

model we ran above.

`library(lavaan)`

After installing `lavaan`

and loading it, use the following code to specify, fit, and examine your model output.

```
model <- 'satisfaction ~ tension + other_pos + woman'
fit <- sem(model, data = acitelli)
summary(fit)
```

Compare the path estimates obtained from

`lavaan`

to the regression coefficient estimates you got using the`lm()`

function for`mod_wo`

. Are there any differences?In order to add interactions to your model in

`lavaan`

, you will need to first create the product term in your data set. For this last exercise, please add that term to your data set and fit the model you ran in exercise 11. Are the estimates similar across models?

This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported. This lab was adapted from a lab written by Mine Çetinkaya-Rundel and Andrew Bray.