So far I have used the qplot() function in the ggplot2 package to make visualizations. But there is a far more flexible function in the ggplot2 package: ggplot(). Eventually you will want to learn how to use this function, because you can build all kinds of beautiful graphics with it.
Goal: by the end of this lab, you will be able to use ggplot2 to build several different data visualizations.
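Remember: before we can use a library like ggplot2, we have to load it. Later steps in this lab also use filter() and %>% from the dplyr package, so we load that as well:
library(ggplot2)
library(dplyr)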
Why the ggplot2 package?
Among its advantages is a theme system for polishing plot appearance (more on this later).
The big idea: independently specify plot building blocks and combine them to create just about any kind of graphical display you want. The building blocks covered in this lab are the data, aesthetic mappings, geometric objects, and facets.
Using the ggplot() function in the ggplot2 package, we can specify different parts of the plot and combine them using the + operator.
Housing prices
Let’s start by taking a look at some data on housing prices:
housing <- read.csv("http://www.science.smith.edu/~jcrouser/SDS192/landdata-states.csv",
                    header = TRUE, stringsAsFactors = FALSE)
head(housing[1:5])
## State region Date Home.Value Structure.Cost
## 1 AK West 2010.25 224952 160599
## 2 AK West 2010.50 225511 160252
## 3 AK West 2009.75 225820 163791
## 4 AK West 2010.00 224994 161787
## 5 AK West 2008.00 234590 155400
## 6 AK West 2008.25 233714 157458
The ggplot() Function
Let’s start with an example. Say we want to make a scatterplot of the relationship between the cost of a structure and the value of the land it sits on. We might use the following qplot() code.
qplot(y = Structure.Cost, x = Land.Value, data = housing)
Now we would like to make this same plot using the ggplot() function instead of qplot(). It’s helpful to first see what the ggplot() function produces on its own; we will then add to that empty plot.
ggplot(housing)
Notice that when you run that code, it produces a gray rectangle. That is exactly what it should produce! We have not yet told ggplot() which variables we’d like to map to which aesthetics, or which geometric objects we’d like it to draw.
Aesthetic Mapping (aes)
In ggplot-land, aesthetic means “something you can see”, and we want to map variables to these different aesthetics. Examples include position (on the x- and y-axes) and color.
For our example, we will map Structure.Cost to the y-axis and Land.Value to the x-axis.
ggplot(housing, aes(x = Land.Value, y = Structure.Cost))
Now ggplot() has all of the information that qplot() had, but there are no points! Why? qplot() guesses which geometric object you probably want to draw based on the types of the variables you are mapping, but you have to tell ggplot() explicitly what to draw.
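A minimal sketch of this point, using a small made-up data frame (toy is a hypothetical stand-in for the housing data, not part of the lab). It inspects the mapping and layers components of the plot object, which is a convenient way to peek inside a plot in the ggplot2 versions I am aware of rather than a documented workflow:

```r
library(ggplot2)

# Hypothetical stand-in for a few rows of the housing data
toy <- data.frame(
  Land.Value     = c(30000, 55000, 120000),
  Structure.Cost = c(150000, 160000, 210000)
)

p_empty  <- ggplot(toy)                                            # data only
p_mapped <- ggplot(toy, aes(x = Land.Value, y = Structure.Cost))   # data + mapping

length(p_empty$mapping)   # number of aesthetics mapped in the bare plot
names(p_mapped$mapping)   # which aesthetics are mapped after aes()
length(p_mapped$layers)   # number of geoms (layers) added so far
```

Even after the aesthetics are mapped, the plot still has no layers, which is why no points are drawn.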
Geometric Objects (geom)
Geometric objects, or geoms, are the actual marks we put on a plot. Examples include:
points (geom_point, for scatter plots, dot plots, etc.)
lines (geom_line, for time series, trend lines, etc.)
boxplots (geom_boxplot, for, well, boxplots!)
A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.
You can get a list of available geometric objects by typing geom_ in RStudio and waiting for the autocomplete menu. Give it a try.
Finally, we can add (+) geom_point() to our ggplot() statement to reproduce what qplot(y = Structure.Cost, x = Land.Value, data = housing) gave us.
ggplot(housing, aes(x = Land.Value, y = Structure.Cost)) +
geom_point()
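Because + returns a new plot object, a plot can also be built up one step at a time. The sketch below again uses a hypothetical toy data frame and looks at the layers component of each plot to count its geoms; the variable names are only illustrative:

```r
library(ggplot2)

# Hypothetical stand-in for a few rows of the housing data
toy <- data.frame(
  Land.Value     = c(30000, 55000, 120000),
  Structure.Cost = c(150000, 160000, 210000)
)

base_plot <- ggplot(toy, aes(x = Land.Value, y = Structure.Cost))
scatter   <- base_plot + geom_point()   # one geom
both      <- scatter + geom_line()      # a second geom on the same plot

length(base_plot$layers)             # layers before any geom is added
length(scatter$layers)               # layers after adding geom_point()
length(both$layers)                  # layers after also adding geom_line()
class(scatter$layers[[1]]$geom)[1]   # the kind of geom stored in the first layer
```

Printing scatter or both draws the corresponding plot; base_plot on its own shows only the empty axes.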
Each type of geom accepts only a subset of all aesthetics; refer to the geom help pages to see which mappings each geom accepts. Aesthetic mappings are set with the aes() function.
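The help pages remain the authoritative reference, but as a rough sketch you can also compare the aesthetics two geoms understand from the console. This assumes the GeomPoint and GeomLine objects exported by ggplot2 and their aesthetics() method, which is more of an internal detail than a teaching interface:

```r
library(ggplot2)

# Aesthetics understood by points and by lines
point_aes <- GeomPoint$aesthetics()
line_aes  <- GeomLine$aesthetics()

setdiff(point_aes, line_aes)   # aesthetics accepted by points but not lines
setdiff(line_aes, point_aes)   # aesthetics accepted by lines but not points
```

For example, shape only makes sense for points, while linetype only makes sense for lines.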
Now you’re ready to make your first ggplot: a scatterplot. Start by keeping only the rows of housing where Date is 2013.25:
hp2013Q1 <- filter(housing, Date == 2013.25)
Then, using hp2013Q1, make a scatterplot with ggplot() showing the relationship between Structure.Cost (y-axis) and Land.Value (x-axis).
You can change the color of a geom by adding color = followed by any of the allowed color names in double quotes.
ggplot(housing, aes(x = Land.Value, y = Structure.Cost)) +
geom_point(color = "red")
You can change colors in ggplot() simply by typing the name of another color. Run colors() in R to list the built-in color names.
Try it: change the color of the points by supplying a different color = "nameOfcolor" argument to the geom_point() function.
Other aesthetics are mapped in the same way as x and y in the previous example. We can map a third variable, region, to color. This turns our bivariate scatterplot into a multivariate data visualization.
ggplot(housing, aes(x = Land.Value, y = Structure.Cost, color = region)) +
geom_point()
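The difference between setting a color (color = "red" inside geom_point()) and mapping a variable to color (color = region inside aes()) can be checked with a small sketch. The toy data frame is hypothetical, and layer_data() is used here only to look at the colors ggplot2 computed for each point:

```r
library(ggplot2)

# Hypothetical stand-in for a few rows of the housing data
toy <- data.frame(
  Land.Value     = c(30000, 55000, 120000, 90000),
  Structure.Cost = c(150000, 160000, 210000, 180000),
  region         = c("West", "West", "South", "South")
)

# Setting: every point gets the same fixed color
p_set <- ggplot(toy, aes(x = Land.Value, y = Structure.Cost)) +
  geom_point(color = "red")

# Mapping: the color of each point depends on its region
p_mapped <- ggplot(toy, aes(x = Land.Value, y = Structure.Cost, color = region)) +
  geom_point()

unique(layer_data(p_set)$colour)      # a single color for all points
unique(layer_data(p_mapped)$colour)   # one color per region
```

Only the mapped version gets a legend, because only there does color carry information about the data.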
Exercise: make your own scatterplot that maps region to color.
Faceting
Faceting in ggplot2 creates separate graphs for subsets of data.
ggplot2 offers two functions for creating these subsets:
facet_wrap(): define subsets as the levels of a single grouping variable
facet_grid(): define subsets as the crossing of two grouping variables
Faceting makes it easier to compare across plots, not just across geoms within a plot.
Let’s start by using a technique we already know: mapping State to color. We’ll do this for the states in the “West” region only.
West <- housing %>%
filter(region == "West")
ggplot(West, aes(x = Date, y = Home.Value, color = State)) +
geom_line()
There are two problems here: there are too many states to distinguish each one by color, and the lines obscure one another.
We can fix the previous plot by faceting by State rather than mapping State to color:
ggplot(West, aes(x = Date, y = Home.Value)) +
geom_line() +
facet_wrap(~State)
There is also a facet_grid() function for faceting in two dimensions.
Exercise: instead of mapping region to color, re-create the region scatterplot with a separate facet for each region. Hint: use the facet_wrap() function.
This lab is based on the “Introduction to R Graphics with ggplot2” workshop, which is a product of the Data Science Services team at Harvard University. The original source is released under a Creative Commons Attribution-ShareAlike 4.0 Unported license. This lab was adapted for SDS192: Introduction to Data Science in Spring 2017 by R. Jordan Crouser at Smith College and then further adapted for SDS201: Statistical Methods for Undergraduate Research by Randi L. Garcia at Smith College.