Measuring Abstract Constructs

In this lab you will explore how psychologists measure abstract constructs like personality features, aspirations for the future, self-esteem, and feelings of control over one’s life. You will be introduced to the package psych, a package that can be used for a variety of different tasks, including calculating reliability. Lastly, this lab will also serve as a review of basic data preparation techniques.

The data you will be using for this lab was generated by YOU in the last class period. We will be constructing summaries and visualizing overall trends, not creating profiles of individuals.

Getting Started

Load packages

In this lab we will load the data using the readr package, explore and reshape it using the dplyr and tidyr packages, and visualize it using the ggplot2 package. The data can be found at the URL given below. Later we will also use the psych package, so please install it now.

Let’s load the packages.

library(readr)
library(dplyr)
library(tidyr)   # needed later for gather()
library(ggplot2)
#install.packages("psych")
library(psych)

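Note that I recommend always loading the psych package after ggplot2 because the two packages contain functions with the same name, such as alpha(). When both packages are loaded, the same-named functions from the package loaded later mask those from the package loaded first, so loading psych last means that alpha() refers to the psych version.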
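As a quick check of the masking behavior described above, here is a minimal sketch. It assumes both packages are installed and that it is run in a fresh R session; the object name transparent_red is only for illustration.

```r
library(ggplot2)
library(psych)

# Both packages export alpha(); the version attached last is the one
# that a bare call to alpha() finds.
environmentName(environment(alpha))

# With psych loaded last, the bare name is the psych function.
identical(alpha, psych::alpha)

# A namespace prefix reaches the intended function regardless of load order.
transparent_red <- ggplot2::alpha("red", 0.5)
```

Writing psych::alpha() or ggplot2::alpha() explicitly is a safe habit whenever the load order is in doubt.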

Creating a reproducible lab report

Please write your lab code and answers in an R Markdown file. You are to submit the knitted HTML file to Moodle when your lab is complete.

The data

The data set for this lab consists of students in SDS 390. Refer to the survey you took here.

Let’s load the data using the read_csv() function from the readr package:

personality <- read_csv("http://www.science.smith.edu/~rgarcia/sds390-S19/psychometrics.csv")

Take a look at the data. You can use glimpse(), head(), or View(), whichever you prefer, to look at your data frames.

glimpse(personality)

The first thing you should notice is that the data is very messy. This is raw data downloaded from Qualtrics. As you scroll through the variables, you’ll start to see variables with the prefix maacl_ then procrast_ then career_, self_esteem_, big_five_, and loc.

head(personality)

You will also notice that the first case contains the variable labels, the second case contains some metadata, and the third case is completely missing.

Now let’s remove the missing case and the metadata with the slice() function from dplyr. Let’s also get rid of some of the variables at the beginning of the data frame with select() and rename some variables with rename(). We will do this all at once with a dplyr pipeline.

personality <- personality %>%
  slice(c(1, 4:n())) %>%   # n() counts rows; length() would count columns
  select(ResponseId, age:loc29) %>%
  rename(r_exp = age,
         icecream = Q17,
         icecream_other = Q17_6_TEXT,
         movies = Q18,
         movies_other = Q18_8_TEXT,
         birth_order = Q19)
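To see what this kind of slice() call keeps, here is a small sketch on a made-up data frame. The object raw and its values are hypothetical stand-ins for a Qualtrics export, not the class data. Inside slice(), n() gives the number of rows, whereas length() applied to a data frame gives the number of columns.

```r
library(dplyr)

# Hypothetical toy export: row 1 = labels, row 2 = metadata,
# row 3 = completely missing, rows 4 and 5 = real responses.
raw <- tibble(
  ResponseId = c("Response ID", "{ImportId: _recordId}", NA, "R_1", "R_2"),
  Q17        = c("Favorite ice cream", "{ImportId: QID17}", NA, "2", "5")
)

cleaned <- raw %>%
  slice(c(1, 4:n())) %>%   # keep the label row plus every response row
  rename(icecream = Q17)   # new_name = old_name

nrow(cleaned)   # label row + 2 responses
length(raw)     # number of columns, not rows
```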

Our next task is to save the variable labels into a data frame called variable_view. This will be helpful later.

variable_view <- personality %>% 
  slice(1) %>%
  gather(ResponseId:loc29, key = "variable", value = "label")
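The reshaping step can be illustrated on a tiny example. This is only a sketch: labels_row and its contents are invented for illustration. It also shows pivot_longer(), which newer versions of tidyr recommend in place of gather().

```r
library(dplyr)
library(tidyr)

# Hypothetical one-row data frame holding a label for each column.
labels_row <- tibble(
  id     = "Response ID",
  item_1 = "I feel calm",
  item_2 = "I feel tense"
)

# gather() stacks the columns into variable/label pairs, one row per column.
long_labels <- labels_row %>%
  gather(id:item_2, key = "variable", value = "label")

# The same reshape written with pivot_longer().
long_labels2 <- labels_row %>%
  pivot_longer(everything(), names_to = "variable", values_to = "label")

long_labels
```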

And finally, remove the first case from personality.

personality <- personality %>%
  slice(2:n())   # drop the label row; n() is the number of rows

We can see a compact view of all of our variables with the names() function.

names(personality)

Exploring and preparing the data

  1. Look at the variable list. What do you think the procrast_ variables are measuring? Why are there 20 of them? Check out the variable_view information, or the PDF file of the Qualtrics survey on Moodle, if you need to.

Because psychological measurements often have many variables recorded for each construct, we often need to perform data cleaning tasks on many columns at once. The mutate_at() function from the dplyr package can help us with that task.

The mutate_at() function takes a select()-style list of variables, wrapped in the function vars(), followed by a function name. Here I am passing the function name as.numeric. mutate_at() then applies this function to every variable in the list.

personality <- personality %>%
  mutate_at(vars(maacl_1:maacl_26), as.numeric)

Check for yourself that the maacl_ variables are now numeric type variables.

Luckily, the read_csv() function loaded the variables as chr (character) and not fct (factor); otherwise we would have needed to convert to character first, and then to numeric.
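The sketch below shows both points on invented data: converting a range of character columns at once, and why a factor must go through character before numeric. The names survey and f are hypothetical. The across() form is the equivalent that newer versions of dplyr recommend.

```r
library(dplyr)

# Hypothetical survey items that were read in as character.
survey <- tibble(
  item_1 = c("1", "4", "5"),
  item_2 = c("2", "2", "3"),
  note   = c("a", "b", "c")
)

# Convert a range of columns in one step.
survey_num <- survey %>%
  mutate_at(vars(item_1:item_2), as.numeric)

# Equivalent call using across().
survey_num2 <- survey %>%
  mutate(across(item_1:item_2, as.numeric))

# The factor pitfall: as.numeric() on a factor returns the internal
# level codes, not the values printed on screen.
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # level codes: 1 2 1
as.numeric(as.character(f))   # intended values: 10 20 10
```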

  1. Now it is your turn: change the type of all of the procrast_ variables to numeric with the mutate_at() function.

Item Correlations

The MAACL, or the Multiple Affect Adjective Checklist, measures the current emotional experience of participants. It measures both positive and negative moods in two sub-scales. Let’s see if we can find the items that measure each of the two sub-scales.

One way to do this is to look at the inter-item correlation matrix. We can ask R for a correlation matrix in many ways; one way is to use the corr.test() function from the psych package. We will use select() to pick out the maacl_ items.

personality %>%
  select(maacl_1:maacl_26) %>%
  corr.test()

From the correlation matrix, we can see that some items are negatively correlated with the first item. That is, items 2, 4, 5, 10, 14, 15, 16, 17, 18, 20, 21, 22, 23, 25, and 26 are negatively correlated with item 1. On the other hand, 3, 7, 8, 9, 11, 12, 13, 19, and 24 are positively correlated with item 1. The correlation between item 1 and item 7 is basically zero.

These two sets of items might represent negative and positive emotions. Looking at correlation matrices is also helpful for identifying reverse-worded items on scales.
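A toy illustration of why the sign of a correlation hints at item direction: the three items below are invented (they are not MAACL items), and the sketch uses base R's cor() rather than corr.test() to stay minimal.

```r
# Hypothetical responses from seven people on a 1-to-5 scale.
toy <- data.frame(
  calm    = c(1, 2, 3, 4, 5, 4, 2),
  relaxed = c(2, 2, 3, 5, 5, 4, 1),
  tense   = c(5, 4, 3, 1, 1, 2, 5)   # worded in the opposite direction
)

r <- cor(toy)

# Items pointing the same way as "calm" correlate positively with it;
# the oppositely worded item correlates negatively.
sign(r["calm", ])
```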

  1. Try this with the procrast_ items. Which items appear to be worded in the opposite direction from item 1?

  2. Look at the items in the procrastination scale in variable_view. If we wanted all of the items to measure more procrastination with higher numbers, which items would we need to reverse code? Does this list match your answer to the exercise above?

Another helpful tool is the cor.plot() function, which comes from the psych package (recent versions of psych also call it corPlot()). If we first save just the correlation matrix into an object, here called r_matrix, we can then visualize the correlation matrix as a heat map.

Note that the $r at the end of the corr.test() function plucks out the correlation matrix.

# Optional: cor.plot() itself does not need GPArotation;
# psych uses it for some factor rotations.
#install.packages("GPArotation")
library(GPArotation)

r_matrix <- corr.test(select(personality, maacl_1:maacl_26))$r

cor.plot(r_matrix)

  1. Create a heat map of the correlation matrix for the procrast_ items using the cor.plot() function.

Reverse scoring items

If I wanted to combine all of the information from the maacl_ items into a single measure representing how “good” participants felt, I would first have to reverse score all of the negative emotion items. Based on the item labels in variable_view, these are items 2, 4, 5, 7, 10, 13, 15, 18, 19, and 20. Subtracting each response from 6, as the code below does, flips a 1-to-5 response scale so that 1 becomes 5 and 5 becomes 1.

personality <- personality %>%
  mutate(maacl_2.r = 6 - maacl_2,
         maacl_4.r = 6 - maacl_4,
         maacl_5.r = 6 - maacl_5,
         maacl_7.r = 6 - maacl_7,
         maacl_10.r = 6 - maacl_10,
         maacl_13.r = 6 - maacl_13,
         maacl_15.r = 6 - maacl_15,
         maacl_18.r = 6 - maacl_18,
         maacl_19.r = 6 - maacl_19,
         maacl_20.r = 6 - maacl_20)

  1. Try this with the procrast_ items. Reverse score the items that are negatively worded on the procrastination scale.

  2. Create another correlation matrix, using either corr.test() or cor.plot() if you prefer, replacing the original items with the new reversed versions. Do the items now appear to be going in the same direction?
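When many items need reversing, typing one line per item gets tedious. The sketch below is one possible shortcut, not the only way to do it. It assumes a 1-to-5 response scale and uses invented item names, and it relies on across() with the .names argument (dplyr 1.0 or later) to add the same .r suffix used in this lab.

```r
library(dplyr)

# Hypothetical items on a 1-to-5 scale; item_2 and item_3 are reverse worded.
toy <- tibble(
  item_1 = c(1, 3, 5),
  item_2 = c(2, 4, 5),
  item_3 = c(5, 1, 2)
)

# General rule: reversed = (scale minimum + scale maximum) - original.
# For a 1-to-5 scale that is 6 - x.
toy <- toy %>%
  mutate(across(c(item_2, item_3), ~ 6 - .x, .names = "{.col}.r"))

names(toy)
```

The original columns are kept, so you can always check the reversed versions against them.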

Scale Reliability

Before combining information to create a scale score for each person we should check to see if the scale is reliable. We can do this with the alpha() function in the psych package. We want to be sure to use our reverse coded variables.

personality %>%
  select(maacl_1, maacl_2.r, maacl_3, maacl_4.r, maacl_5.r,
         maacl_6, maacl_7.r, maacl_8, maacl_9, maacl_10.r,
         maacl_11, maacl_12, maacl_13.r, maacl_14, maacl_15.r,
         maacl_16, maacl_17, maacl_18.r, maacl_19.r, maacl_20.r,
         maacl_21, maacl_22, maacl_23, maacl_24, maacl_25, maacl_26) %>%
  alpha()

Note that there are many issues with the maacl_ items. This could be for many reasons including the nature of data collection. To deal with this problem, we could find a smaller subset of items that work better together.

personality %>%
  select(maacl_21, maacl_22, maacl_23, maacl_24, maacl_25, maacl_26) %>%
  alpha()

The raw_alpha for this smaller scale is 0.86, well above the conventional .70 threshold for calling a scale “reliable.”
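To make raw_alpha less of a black box, here is a sketch that computes Cronbach's alpha by hand, as k / (k - 1) times one minus the ratio of the summed item variances to the variance of the total score, and compares it with the value reported by psych. The data frame toy is invented and contains no missing values.

```r
library(psych)

# Hypothetical responses from eight people to three items.
toy <- data.frame(
  item_1 = c(1, 2, 3, 4, 5, 3, 2, 4),
  item_2 = c(2, 2, 3, 5, 4, 3, 1, 4),
  item_3 = c(1, 3, 3, 4, 5, 2, 2, 5)
)

k <- ncol(toy)                          # number of items
item_vars <- apply(toy, 2, var)         # variance of each item
total_var <- var(rowSums(toy))          # variance of the summed score

alpha_by_hand <- (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# The same quantity as reported by psych.
alpha_psych <- psych::alpha(toy)$total$raw_alpha

all.equal(alpha_by_hand, alpha_psych)
```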

  1. Using the procrastination items, including the items you reverse scored, find a set of reliable items. Try ALL of the items first.

Scale Scores

The last step in this process would be to make an average score that we could use, perhaps, in a larger structural equation model. To do this, we will use the rowMeans() function in base R.

personality$positivity <- personality %>%
  select(maacl_21, maacl_22, maacl_23, maacl_24, maacl_25, maacl_26) %>%
  rowMeans(na.rm = TRUE)
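It is worth knowing how rowMeans() treats missing answers when na.rm = TRUE. The sketch below uses invented data: a person who skipped some items is averaged over the items they did answer, and a person who skipped every item gets NaN.

```r
library(dplyr)

# Hypothetical responses: person 2 skipped one item, person 3 skipped all.
toy <- tibble(
  item_1 = c(4, 2, NA),
  item_2 = c(5, NA, NA),
  item_3 = c(3, 4, NA)
)

toy$score <- toy %>%
  select(item_1:item_3) %>%
  rowMeans(na.rm = TRUE)

toy$score   # 4, 3, NaN
```

Averaging over the available items is a choice, not a given: if someone answered only one item, you may not want to treat that single answer as their scale score.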

Then we can look at the distribution of this new positivity variable.

# qplot() is deprecated in recent versions of ggplot2, so we use ggplot() directly
ggplot(personality, aes(x = positivity)) +
  geom_histogram(bins = 8)

  1. Please do the same for the procrastination scale you have been working on.

This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license. This lab was adapted from a lab written by Mine Çetinkaya-Rundel and Andrew Bray.