Network analysis

Amelia McNamara

August 19, 2016

Networks

In the 1930s, Jacob Moreno was thinking about the “social atom” and making “sociograms” (network diagrams).

(Image via Mark Hansen)

More examples?

Identifying slumlords with SNA

Game of Thrones data

I am a little out of the cultural loop, so I don’t have a lot of contextual knowledge, but the hottest network analysis lately is about Game of Thrones, so…

library(readr)
storm <- read_csv("stormofswords.csv")
head(storm)
## # A tibble: 6 x 3
##   Source  Target Weight
##    <chr>   <chr>  <int>
## 1  Aemon   Grenn      5
## 2  Aemon Samwell     31
## 3  Aerys   Jaime     18
## 4  Aerys  Robert      6
## 5  Aerys  Tyrion      5
## 6  Aerys   Tywin      8

The data is available on the same page as the analysis.

Working with networks

R packages:

Other ways:

Lets try igraph first

library(igraph)
library(GGally)
g <- graph.data.frame(storm)
plot(g)

Defaults are pretty ugly.

Making igraph a bit more readable

plot(g, edge.arrow.size=0.1, vertex.size=7, vertex.label.cex=0.7)

Changing the layout– circle

plot(g, edge.arrow.size=0, vertex.size=7, vertex.label.cex=0.7, layout=layout_in_circle)

Changing the layout – sphere

plot(g, edge.arrow.size=0, vertex.size=7, vertex.label.cex=0.7, layout=layout_on_sphere)

But really, we want the “random” network to be less collapsed

l <- layout_with_fr(g,niter=500)
plot(g, edge.arrow.size=0, vertex.size=7, vertex.label.cex=0.7, layout=l)

The Fruchterman-Reingold layout algorithm used to have more parameters in igraph, but doesn’t seem to allow you to add a repulsion parameter anymore.

(Aside: It can be good to save your layout separately so you don’t get a subtle variation every time you plot.)

Trying to get closer to the Gephi image – curved edges

# Weirdly, you can't pass a variable name as a parameter to the igraph plot method.
E(g)$width <- E(g)$Weight/6
plot(g, edge.arrow.size=0.1, vertex.size = 10, vertex.label.cex=0.7, vertex.color="grey", layout=l, edge.curved=TRUE, vertex.label.color="black")

What about GGally?

Lots of dependencies necessary.

library(GGally)
library(network) # Uses the network package
library(sna)
library(intergraph) # But, we can translate from igraph to network with intergraph
ggnet2(g, label=TRUE)

But, defaults are prettier.

Weighted edges

# ggnet2 allows you to pass variable names!
ggnet2(g, label=TRUE, edge.size = "Weight", edge.alpha=0.5)

AAAAA

Okay, lets do some rescaling and see how nice we can get it

library(dplyr)
storm <- storm %>%
  mutate(ScaledWeight = Weight / max(Weight))
g <- graph.data.frame(storm)
ggnet2(g, label=TRUE, edge.size = "ScaledWeight", node.size = 4, mode = "fruchtermanreingold", layout.par = list(cell.jitter = 0.75, repulse.rad=0.7), label.size = 3, edge.color="black", edge.alpha=0.7)

ggraph

This package was not installing on some systems, so it’s totally optional.

# We're going off the rails a bit here
# library(devtools)
# install_github('hadley/ggplot2')
# install_github('thomasp85/ggforce')
# install_github("thomasp85/ggraph")
library(ggraph)
ggraph(g, 'igraph',algorithm = 'kk') +
  geom_edge_fan(aes(alpha = ..index..)) + 
  geom_node_point() +
  ggforce::theme_no_axes()

Networks in d3 – the lazy way!

library(networkD3)
gd <- get.data.frame(g)
simpleNetwork(storm)

There is more to networks than just node-edge diagrams

Lots of “centralities”

Eigenvector Centrality

library(ggplot2)
ec <- eigen_centrality(g)$vector
ec <- data.frame(ec)
ec$names <- rownames(ec)
ec %>%
  arrange(desc(ec)) %>%
  slice(1:10) %>%
  ggplot() + geom_bar(aes(x=reorder(names, ec), y=ec), stat="identity")

Looks like what Andrew Beveridge and Jie Shan got!

Sizing nodes by a measure

V(g)$size <- ec$ec*10
plot(g, edge.arrow.size=0, vertex.label.cex=0.7, vertex.color="grey", 
     layout=l, edge.curved=TRUE, vertex.label.color="black")

Betweenness centrality

bc <- betweenness.estimate(g, cutoff=10)
bc <- data.frame(bc)
bc$names <- rownames(bc)
bc %>% 
  arrange(desc(bc)) %>%
  slice(1:10)
##           bc   names
## 1  332.97460  Tyrion
## 2  244.63571 Samwell
## 3  226.20476 Stannis
## 4  208.62302  Robert
## 5  138.66667   Mance
## 6  119.99563   Jaime
## 7  114.33333  Sandor
## 8  111.26667     Jon
## 9   90.65000   Janos
## 10  64.59762   Aemon

Not quite what they got in the paper. Hmm.

Page rank

pageRank <- page_rank(g)$vector
pageRank <- page_rank(g, damping = 0.15)$vector
head(as.matrix(sort(pageRank,decreasing = TRUE)))
##                [,1]
## Margaery 0.01282288
## Samwell  0.01238905
## Loras    0.01208873
## Drogo    0.01108128
## Qhorin   0.01097118
## Roslin   0.01088275

Again, not the same as the paper.

Coloring by community

wc <- cluster_walktrap(g)
V(g)$color <- membership(wc)
plot(g, edge.arrow.size=0, vertex.label.cex=0.7, 
     layout=l, edge.curved=TRUE, vertex.label.color="black")

Adjacency matrix

p <- ggplot(storm, aes(x=Source, y=Target, fill=Weight)) +
  geom_raster() +
  theme_bw() +
  scale_x_discrete(drop = FALSE) +
  scale_y_discrete(drop = FALSE) +
  theme(
    # Rotate the x-axis lables so they are legible
    axis.text.x = element_text(angle = 270, hjust = 0),
    # Force the plot into a square aspect ratio
    aspect.ratio = 1,
    # Hide the legend (optional)
    legend.position = "none")
p

Whoa!

Reordering

# library(devtools)
# install_github("hadley/forcats")
library(forcats)
storm_ordered <- storm %>%
  left_join(ec, by = c("Source" = "names")) %>%
  left_join(ec, by = c("Target" = "names"))

storm_ordered <- storm_ordered %>%
  mutate(Source = fct_reorder(Source, ec.x),
         Target = fct_reorder(Target, ec.y))

# swap the data in the plot
p %+% storm_ordered

Change ordering to by betweenness centrality?

storm_ordered2 <- storm %>%
  left_join(bc, by = c("Source" = "names")) %>%
  left_join(bc, by = c("Target" = "names"))

storm_ordered2 <- storm_ordered2 %>%
  mutate(Source = fct_reorder(Source, bc.x),
         Target = fct_reorder(Target, bc.y))

# swap the data in the plot
p %+% storm_ordered2

Light aside – Bertin matrices

Another one

In R

# install.packages("bertin",repos="http://r-forge.r-project.org")
library(bertin)
data(Hotel)
image.bertin(bertinrank(Hotel, ties.method="first"), main= "Hotel data")

plot.bertin(bertinrank(Hotel, ties.method = "first"), main= "Hotel data")

plot.bertin(Hotel, palette=c("white","black"))

On our data, doesn’t work well because it is so big!

mini_storm <- storm_ordered %>%
  filter(Source %in% ec$names[1:10],
         Target %in% ec$names[1:10]) %>%
  mutate(Source = droplevels(Source),
         Target = droplevels(Target)) %>%
  select(Source, Target, Weight)
g2 <- graph.data.frame(mini_storm)
mat <- get.adjacency(g2)
plot.bertin(bertinrank(mat, ties.method="first"))

Hmm…