Section 7 Other graphics packages

7.1 ggplot2

I won’t attempt to cover ggplot2 in any detail here. For an easy-to-read guide, I recommend Chang (2013). For a shorter guide, you could try Chapter 8 of the Cookbook for R, which is by the same author and presented in a similar style, but is less comprehensive. The following examples should give you some idea of how ggplot2 works. The basic ideas are as follows.

  • The data you want to plot must be in a dataframe. Start off with a ggplot command, specifying the dataframe you are using.
  • Columns in the dataframe are mapped to ‘aesthetics’ using aes(). You might map one column to the \(x\)-axis, another column to the \(y\)-axis, a third column (a factor variable) to the colour and so on.
  • ‘Layers’ can be added to plots, e.g. one layer for the points, another for lines, another specifying axis labels etc. For example, adding the layer geom_point() will add points to your existing plot, using the columns mapped to the \(x\)- and \(y\)-axes.
  • Plots can be stored as objects, and added to or modified later on.
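As a minimal sketch of the last two points, the code below builds a plot in stages from the built-in mtcars data. The object name p is just an illustrative choice.

```r
library(ggplot2)

# Store the initial plot as an object: data and aesthetic mappings only
p <- ggplot(data = mtcars, aes(x = wt, y = mpg))

# Add a layer of points to the stored object, then add axis labels
p <- p + geom_point()
p <- p + labs(x = "Weight (lb/1000)", y = "Miles per (US) gallon")

# Printing the object draws the plot
print(p)
```

Because each step returns a new plot object, you can keep a basic version of a plot and try out different layers on top of it.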

7.1.1 A scatterplot

library(ggplot2)
ggplot(data = mtcars, aes(x = wt, y = mpg, colour = as.factor(cyl), 
                          shape = as.factor(cyl))) +
  geom_point(size = 2) +
  labs(x = "Weight (lb/1000)", y = "Miles per (US) gallon", 
       colour = "number of\ncylinders",
       shape = "number of\ncylinders") 

Figure 7.1: A scatter plot with ggplot2. One of the advantages of ggplot2 is that the legend colours and symbols are correctly matched to the points automatically.

Here we have ‘initialised’ the scatter plot with the ggplot() command, and then used geom_point() to add the points. We don’t actually need any arguments in geom_point(), because we have already ‘mapped’ wt to the \(x\)-axis and mpg to the \(y\)-axis using aes() in the ggplot() command, but I have chosen to include a size argument to draw larger points.
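To illustrate how the mapping is passed on to layers, here is a small sketch comparing two ways of writing the same plot. The object names p1, p2, d1 and d2 are illustrative; layer_data() is a ggplot2 helper that returns the data a layer ends up drawing.

```r
library(ggplot2)

# Mapping given in ggplot(): inherited by every layer added afterwards
p1 <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# The same mapping given inside the layer: applies to that layer only
p2 <- ggplot(data = mtcars) +
  geom_point(aes(x = wt, y = mpg))

# Extract the x and y positions that each version will plot
d1 <- layer_data(p1)
d2 <- layer_data(p2)
all.equal(d1[, c("x", "y")], d2[, c("x", "y")])
```

Putting the mapping in ggplot() is usually more convenient when several layers share the same axes.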

7.1.2 Box plots

ggplot(data = na.omit(mvscores), aes(x = innings, y = runs, colour = captain)) +
  geom_boxplot() + 
  scale_x_discrete(labels = c("First innings", "Second innings")) + 
  labs(x = "", y = "runs scored")

Figure 7.2: A box plot for the mvscores data, drawn with ggplot2. Here, we’ve mapped colour to captain, and the ‘blocking’ variable innings to the x-axis. This helps draw attention to the within-block comparisons.
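If you don’t have the mvscores data to hand, the same idea can be tried with a data set that ships with R. The sketch below uses ToothGrowth as a stand-in (my choice, not part of the example above), with dose playing the role of the blocking variable and supp the comparison of interest.

```r
library(ggplot2)

# dose is numeric in ToothGrowth, so convert it to a factor for the x-axis;
# supp (the supplement type) is mapped to colour
bp <- ggplot(data = ToothGrowth,
             aes(x = factor(dose), y = len, colour = supp)) +
  geom_boxplot() +
  labs(x = "dose (mg/day)", y = "tooth length", colour = "supplement")
bp
```

The two supplements are drawn side by side within each dose, so the comparison at each dose level is easy to see.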

7.1.3 Comparing histograms

labels <- c(no = "Not played as captain", yes = "Played as captain")

ggplot(data = na.omit(mvscores), aes(x = runs, y = after_stat(density))) +
  geom_histogram(binwidth = 10, center = 5, fill = "white", col = "black") +
  facet_wrap(~ captain, labeller = labeller(captain = labels)) +
  labs(x = "runs scored")

Figure 7.3: Using facets to align plots corresponding to different groups. This can look a little neater than our previous (simple) attempt using par(mfrow = c(1, 2)), as we don’t need quite so much axis labelling.
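Here is a similar sketch that can be run without the mvscores data, using mtcars instead (my substitution). It assumes ggplot2 version 3.3.0 or later for after_stat(); the names am_labels and hp are illustrative.

```r
library(ggplot2)

# Named vector: the names are the values of am, the entries are panel titles
am_labels <- c("0" = "Automatic", "1" = "Manual")

hp <- ggplot(data = mtcars, aes(x = mpg, y = after_stat(density))) +
  geom_histogram(binwidth = 5, center = 2.5, fill = "white", col = "black") +
  # One panel per value of am, titled using am_labels
  facet_wrap(~ am, labeller = labeller(am = am_labels)) +
  labs(x = "Miles per (US) gallon")
hp
```

By default the panels share the same axis scales, which is what makes the two histograms directly comparable.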

7.1.4 Scatterplot with log axis

ggplot(data = medals, aes(x = population / 1000000, y = total)) +
  geom_point() + 
  scale_x_log10(breaks = c(1, 10, 100, 1000)) + 
  labs(x = "population (millions)", y = "total number of medals won") 

Figure 7.4: A scatter plot with a log axis.
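To see what scale_x_log10() does, here is a small sketch with a made-up data frame (toy is hypothetical, not one of the data sets used in these notes). The points are positioned according to the base-10 logarithm of the values, while the axis is still labelled in the original units.

```r
library(ggplot2)

# Hypothetical data, made up for illustration
toy <- data.frame(x = c(1, 10, 100, 1000), y = 1:4)

lp <- ggplot(data = toy, aes(x = x, y = y)) +
  geom_point() +
  scale_x_log10(breaks = c(1, 10, 100, 1000))
lp

# The x positions used internally are log10 of the original values
layer_data(lp)$x
```

The points are equally spaced along the axis, because each value of x is ten times the previous one.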

7.1.5 Bar chart

ggplot(data = inequality, aes(x = factor(country,
                                         levels = country[order(gini)]),
                              y = gini)) + 
  geom_col(fill = "grey") + 
  coord_flip() + 
  geom_hline(yintercept = c(0.1, 0.2, 0.3, 0.4), col = "white") +
  labs(y = "Gini coefficient", x = "") + 
  theme_classic() +
  theme(axis.line = element_blank())

Figure 7.5: A ‘Tufte-style’ bar chart. Note that to order the bars, we can’t simply reorder the rows in the dataframe (as last time): we need to create a new factor variable, with the levels specified in the order that we want to plot.
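The reordering step can be seen in isolation with a small made-up data frame (toy and its columns are hypothetical). The base R function reorder() is an alternative that gives the same ordering of levels here.

```r
library(ggplot2)

# Hypothetical data, made up for illustration
toy <- data.frame(country = c("A", "B", "C"), value = c(0.3, 0.1, 0.2))

# New factor whose levels are listed in increasing order of value
toy$country_ordered <- factor(toy$country,
                              levels = toy$country[order(toy$value)])
levels(toy$country_ordered)

# reorder() sorts the levels by value in the same way
levels(reorder(toy$country, toy$value))

# The bars are drawn in the order of the factor levels
ggplot(data = toy, aes(x = country_ordered, y = value)) +
  geom_col() +
  coord_flip()
```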

7.2 plotly

With plotly, we can produce interactive graphics, where, for example, we can hover the cursor over points to get more information. Here, we produce a scatterplot, where hovering the cursor over a point will display the model of car (the row name in the mtcars dataframe).

library(plotly)
plot_ly(mtcars, x = ~wt, y = ~mpg, 
        type = 'scatter',
        mode = 'markers',
        hoverinfo = 'text',
        text = ~rownames(mtcars))

Figure 7.6: An interactive plot, drawn with plotly. Try hovering the mouse over individual points, as well as dragging the mouse to zoom in.
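The hover text can be any character vector with one entry per point. As a sketch of one possible variation, the code below pastes the number of cylinders onto the model name; the object name fig is illustrative, and "<br>" is used to start a new line within the hover label.

```r
library(plotly)

fig <- plot_ly(mtcars, x = ~wt, y = ~mpg,
               type = 'scatter',
               mode = 'markers',
               hoverinfo = 'text',
               # Model name on the first line, cylinder count on the second
               text = ~paste0(rownames(mtcars), "<br>", cyl, " cylinders"))
fig
```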

References

Chang, Winston. 2013. R Graphics Cookbook. Farnham: O’Reilly.