Throughout the lab, you'll occasionally see questions marked with an asterisk ('*'). To get credit for this lab, post your answers to these questions on Piazza.
Scatterplot matrices are a useful tool to help visualize relationships between multiple quantitative (numerical) variables.
In Tableau, you create a scatterplot matrix by placing multiple measures on the Columns and Rows shelves. Today, we'll create a scatterplot matrix comparing five dimensions.
Start by downloading the CSV file for the College dataset, which contains statistics for a large number of US colleges from the 1995 issue of US News and World Report.
Load the dataset into Tableau and explore the dimensions, which look something like this:
Let's try dragging the Apps, Accept, and Enroll measures onto both the Rows and Columns shelves to see how they relate to one another. Don't forget to turn off Analysis>Aggregate Measures!
Notice that there is a positive correlation between all pairs of these variables. This makes sense: colleges that receive large numbers of applications may in fact be larger institutions overall, and so would be likely to accept more students. Similarly, an institution that accepts more students intuitively would have a larger number of students enroll.
Bear in mind: these are absolute comparisons, meaning we are looking at the actual number of students, rather than a percentage.
*Piazza Q1:* Create calculated fields to convert the Accept and Enroll measures to percentages.
Now create a scatterplot matrix (SPLOM) using these measures. What is different?
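To make the absolute-versus-percentage distinction concrete, here is a minimal Python sketch of the conversion, outside Tableau. It rests on two assumptions that the lab does not spell out: that the acceptance percentage is Accept relative to Apps, and that the enrollment percentage is Enroll relative to Accept. The rows and the `to_percentages` helper are hypothetical, and the numbers are made up for illustration.

```python
# Hypothetical rows standing in for the College data; the numbers are made up.
colleges = [
    {"name": "College A", "Apps": 2000, "Accept": 1500, "Enroll": 600},
    {"name": "College B", "Apps": 500, "Accept": 400, "Enroll": 100},
]


def to_percentages(row):
    """Return (acceptance %, enrollment %) for one college."""
    # Assumption: acceptance is measured against applications,
    # and enrollment is measured against accepted students.
    accept_pct = 100 * row["Accept"] / row["Apps"]
    enroll_pct = 100 * row["Enroll"] / row["Accept"]
    return accept_pct, enroll_pct


for row in colleges:
    accept_pct, enroll_pct = to_percentages(row)
    print(f'{row["name"]}: accepted {accept_pct:.1f}%, enrolled {enroll_pct:.1f}%')
```

Note that the larger college in this toy data has the lower acceptance percentage: once size is divided out, the ordering of institutions can change, which is the kind of difference to look for in the new SPLOM.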
Parallel coordinates is a common way of visualizing and exploring high-dimensional, multivariate data. While Tableau unfortunately doesn't directly support parallel coordinates, we'll use a little bit of magic to convince it to do our bidding.
We'll start by dragging the Measure Names dimension onto the Columns shelf, and the Measure Values measure onto the Rows shelf. If we turn off Analysis>Aggregate Measures, we get something that looks like this:
Each dimension appears as its own column along the bottom, and its values are arranged along the vertical axis. Good!
Uh-oh! We wanted an individual line for each observation. To tell Tableau to draw them separately, we can drag a unique identifier onto Detail on the Marks card. In this case, we'll use the F1 dimension, which contains the name of each college:
That's better, but we've got another problem: Tableau is mapping all our variables to a single vertical axis, which is the wrong scale for most of them. Because Tableau doesn't understand that we want a separate scale for each column, we'll have to trick it by scaling the data instead.
We can do this by creating, for each dimension we want to include in the parallel coordinates plot, a Calculated Field whose values range from 0 to 100 (think of this as a percentage of the total range). To do this for a variable called X, we'll use the following formula:
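A common way to rescale a variable onto 0 to 100 is min-max scaling: subtract the smallest value of X, divide by the range (largest minus smallest), and multiply by 100. The Python sketch below assumes that form; it illustrates the idea and is not necessarily the exact Tableau expression intended here. The function name `min_max_scale` and the sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly onto the range 0 to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so the range is zero; return 0 by convention.
        return [0.0 for _ in values]
    return [100 * (v - lo) / (hi - lo) for v in values]


# Made-up enrollment counts, for illustration only.
enroll = [100, 350, 600]
print(min_max_scale(enroll))
```

After this rescaling, the smallest observation of every variable sits at 0 and the largest at 100, so all columns can share a single vertical axis without one variable flattening the others.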
*Piazza Q2:* Create calculated fields to convert several of the measures in the College dataset to percentages using the formula above. Next, remove the unscaled measures from your parallel coordinates plot, leaving only the ones with a common 0-100 scale. What is different? What patterns do you notice?