Useful Network Analysis Packages

Two packages commonly used for network analysis in R are igraph and statnet. Full documentation on igraph is available at http://igraph.org/redirect.html.

If you have not previously installed these packages, please install them before beginning.
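
One way to check which of these packages are still missing is sketched below. The helper name `missing_packages` is hypothetical and not part of any package. It only reports what needs installing, so the `install.packages()` call is left commented out.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to.install <- missing_packages(c("igraph", "statnet", "Matrix"))
if (length(to.install) > 0) {
  message("Not installed yet: ", paste(to.install, collapse = ", "))
  # install.packages(to.install)  # uncomment to install from CRAN
}
```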

library(igraph, quietly = TRUE)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(statnet, quietly = TRUE)
## 
## Attaching package: 'network'
## The following objects are masked from 'package:igraph':
## 
##     %c%, %s%, add.edges, add.vertices, delete.edges,
##     delete.vertices, get.edge.attribute, get.edges,
##     get.vertex.attribute, is.bipartite, is.directed,
##     list.edge.attributes, list.vertex.attributes,
##     set.edge.attribute, set.vertex.attribute
## 
## Attaching package: 'sna'
## The following objects are masked from 'package:igraph':
## 
##     betweenness, bonpow, closeness, components, degree,
##     dyad.census, evcent, hierarchy, is.connected, neighborhood,
##     triad.census
## 
library(Matrix, quietly = TRUE)

Calculating In-Degree, Out-Degree, Eigenvector, Betweenness, and Closeness Centralities

  • in-degree centrality: number of neighbors that point to a node in a directed network
  • out-degree centrality: number of neighbors that a node points to in a directed network
  • eigenvector centrality: measures the importance of a node based on how influential its neighbors are in the network
  • betweenness centrality: the number of shortest paths between other pairs of nodes that pass through a node
  • closeness centrality: the reciprocal of the average shortest-path length from a node to all other nodes, so nodes that are near everything else score high
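
As a minimal sketch of these five measures, the code below computes them with igraph on a small made-up directed graph (the `toy.*` names and edges are illustrative assumptions, not the blog data). The calls are prefixed with `igraph::` because loading statnet masks `degree`, `betweenness`, and `closeness` with the sna versions, as the startup messages above show.

```r
library(igraph)

# Toy directed graph (hypothetical data): each row is one edge, from -> to
toy.edgelist <- matrix(c(1, 2,
                         1, 3,
                         2, 3,
                         3, 4,
                         4, 1), ncol = 2, byrow = TRUE)
toy.igraph <- graph_from_edgelist(toy.edgelist, directed = TRUE)

# in-degree and out-degree
in.deg  <- igraph::degree(toy.igraph, mode = "in")    # 1 1 2 1
out.deg <- igraph::degree(toy.igraph, mode = "out")   # 2 1 1 1

# eigenvector centrality (scaled so that the largest value is 1)
eig <- igraph::eigen_centrality(toy.igraph, directed = TRUE)$vector

# betweenness: number of directed shortest paths passing through each node
btw <- igraph::betweenness(toy.igraph, directed = TRUE)  # 3 0 3 3

# closeness based on outgoing paths: 1 / (sum of distances to the other nodes)
clo <- igraph::closeness(toy.igraph, mode = "out")
```

Node 2 has betweenness 0 because no shortest path between two other nodes runs through it, while node 3 has the highest in-degree because both node 1 and node 2 link to it.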

EXAMPLE: Political Blogs Network

Download Data and Convert to Matrix

For demonstration, we will look at the political blogs data set. It contains roughly 1200 blogs, each labeled as either Republican or Democrat. Directed edges represent hyperlinks from one blog to another.

# download the raw edgelist (one row per hyperlink: from, to)
pblog.data <- read.table("https://raw.githubusercontent.com/jdwilson4/Network-Analysis-I/master/Data/polblogs.txt", sep = " ", header = TRUE, stringsAsFactors = FALSE)
# add 1 to every node id; igraph vertex ids must be >= 1
pblog.edgelist <- as.matrix(pblog.data) + 1

# download the party label of each blog
pblog.labels <- as.matrix(read.table("https://raw.githubusercontent.com/jdwilson4/Network-Analysis-I/master/Data/polblogs_labels.txt", header = FALSE, stringsAsFactors = FALSE))

# recode the labels as plotting colors
# 0 --> 4 = blue
pblog.labels <- replace(pblog.labels, pblog.labels == 0, 4)
# 1 --> 2 = red
pblog.labels <- replace(pblog.labels, pblog.labels == 1, 2)

Convert Network Edge Data

# create a directed igraph object from the edgelist
pblog.igraph <- graph_from_edgelist(pblog.edgelist)
# sparse adjacency matrix from the igraph object
pblog.adjacency <- as_adjacency_matrix(pblog.igraph)
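
To make the edgelist-to-adjacency conversion concrete, here is a small sketch on made-up data (the `toy.*` objects are assumptions for illustration only). Entry `[i, j]` of the adjacency matrix is 1 when there is an edge from node i to node j, so row sums give out-degrees and column sums give in-degrees.

```r
library(igraph)

# Toy edgelist (hypothetical data): each row is one directed edge, from -> to
toy.edgelist <- matrix(c(1, 2,
                         2, 3,
                         3, 1,
                         3, 4), ncol = 2, byrow = TRUE)
toy.igraph <- graph_from_edgelist(toy.edgelist, directed = TRUE)

# dense adjacency matrix for easy printing (the default is a sparse Matrix)
toy.adjacency <- as_adjacency_matrix(toy.igraph, sparse = FALSE)

rowSums(toy.adjacency)  # out-degrees: 1 1 2 0
colSums(toy.adjacency)  # in-degrees:  1 1 1 1
```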

First, let’s compute the shortest-path distance between each pair of nodes. With mode = "all", distances() ignores edge direction, so the result is the distance in the corresponding undirected graph. We then calculate the diameter of the graph: the longest finite shortest path in the network.

shortest.distances <- distances(pblog.igraph, mode = "all")

# network diameter: the longest shortest path that is finite
max(shortest.distances[is.finite(shortest.distances)])
## [1] 8
# what is the maximum over all pairs, including unreachable ones?
max(shortest.distances)
## [1] Inf
# Inf means some pairs of blogs are not joined by any path, which is what isolates produce
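
The same effect can be reproduced on a tiny made-up graph. In this sketch (the `toy` graph is an assumption for illustration), nodes 1 to 4 form a path and node 5 has no edges, so the distance matrix contains `Inf` while the longest finite distance is 3. igraph's `diameter()` with `unconnected = TRUE` reports that finite value directly.

```r
library(igraph)

# Toy graph (hypothetical data): path 1 -> 2 -> 3 -> 4, plus isolated node 5
toy <- make_graph(edges = c(1, 2, 2, 3, 3, 4), n = 5, directed = TRUE)

toy.distances <- igraph::distances(toy, mode = "all")

max(toy.distances)                              # Inf, because node 5 is unreachable
max(toy.distances[is.finite(toy.distances)])    # 3, the distance between nodes 1 and 4

# longest finite shortest path, ignoring edge direction
igraph::diameter(toy, directed = FALSE, unconnected = TRUE)  # 3
```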

So the network contains isolated blogs. Isolates have no shortest paths to other nodes, so they can distort path-based measures such as closeness. They are also visible in the visualization below.

# create a statnet network object from the edgelist and plot it
pblog.network <- network(pblog.edgelist)
plot(pblog.network, main = "Political Blog Network", usearrows = TRUE, edge.col = "grey50", vertex.col = pblog.labels)

Let’s remove the isolates and then re-plot the resulting subnetwork.

# remove isolates: keep nodes with at least one incoming or outgoing edge
# (sna::degree is called explicitly because sna masks igraph's degree)
connected.nodes <- sna::degree(pblog.network) >= 1
sub.pblog.igraph <- induced_subgraph(pblog.igraph, V(pblog.igraph)[connected.nodes])
sub.pblog.edgelist <- as_edgelist(sub.pblog.igraph)
sub.pblog.adj <- as_adjacency_matrix(sub.pblog.igraph)

# convert to statnet network
sub.pblog.network <- network(sub.pblog.edgelist)

# subset labels of connected nodes
sub.labels <- subset(pblog.labels, connected.nodes)

# plot the subnetwork; plot() invisibly returns the x and y coordinates
# of the vertices, which we save for later use
sub.pblog.x <- plot(sub.pblog.network, main = "Political Blog Subnetwork", usearrows = TRUE, edge.col = "grey50", vertex.col = sub.labels)
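
The isolate-removal step can be checked on a small made-up graph. This sketch uses only igraph, and the `toy` objects and their color labels are assumptions for illustration. It shows that the same logical filter must be applied to the vertices and to the labels so that the two stay aligned after `induced_subgraph()` renumbers the remaining vertices.

```r
library(igraph)

# Toy graph (hypothetical data): edges 1->2, 2->3, 4->2; nodes 5 and 6 are isolates
toy <- make_graph(edges = c(1, 2, 2, 3, 4, 2), n = 6, directed = TRUE)
toy.labels <- c(4, 2, 4, 2, 4, 2)  # one color code per node

# keep nodes with at least one incoming or outgoing edge
keep <- igraph::degree(toy, mode = "all") >= 1

sub.toy <- induced_subgraph(toy, V(toy)[keep])
sub.toy.labels <- toy.labels[keep]

vcount(sub.toy)  # 4 nodes remain
ecount(sub.toy)  # all 3 edges are preserved

# with the isolates gone, every undirected distance is finite
max(igraph::distances(sub.toy, mode = "all"))  # 2
```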