Section 6 Axes scaling

Here, we consider the medals dataset from the MAS6005 library. The aim is to compare the total number of medals won against population size for countries at the 2016 Rio Olympics.

plot(medals$population,
     medals$total,
     xlab = "Population size",
     ylab = "Total number of medals won")

There are a couple of problems with this plot:

  • the scientific notation (“e+”) on the \(x\)-axis isn’t very friendly;
  • there are two countries with much larger populations than the rest. This ‘distorts’ the plot somewhat, in that a lot of the remaining points are bunched together.
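As an aside on the first problem, the tick labels can be written in fixed notation by drawing the axis manually. The following is a minimal sketch only: the values in `pop` are hypothetical and are not taken from the medals dataset, and `pdf(NULL)` is used simply so that no file is written (remove it and `dev.off()` to see the plot on screen).

```r
# Hypothetical population values, for illustration only
# (not taken from the medals dataset).
pop <- c(5e5, 2e7, 6.5e7, 3.2e8, 1.4e9)

# Fixed (non-scientific) notation with thousands separators.
labels_fixed <- format(pop, big.mark = ",", scientific = FALSE, trim = TRUE)
labels_fixed

pdf(NULL)                      # null graphics device: nothing is saved
plot(pop, seq_along(pop),
     xaxt = "n",               # suppress the default x-axis
     xlab = "Population size",
     ylab = "Index")
ticks <- axTicks(side = 1)     # tick positions R would have chosen
axis(side = 1, at = ticks,
     labels = format(ticks, big.mark = ",", scientific = FALSE, trim = TRUE))
invisible(dev.off())
```

This deals with the labelling only; the bunching of points caused by the two largest populations remains.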

We could use a log transformation of the population size (perhaps the first thing you’d try if you were fitting a regression model). We’ll use a \(\log_{10}\) transformation:

plot(log10(medals$population),
     medals$total,
     xlab = "log (base 10) population size",
     ylab = "Total number of medals won")

This fixes the two problems, but if you’re writing for a general audience, the reader may struggle with the idea of ‘log population size’.

An alternative is to use a log (base 10) scale on the \(x\)-axis. Visually, this gives the same pattern as plotting total medals against \(\log_{10}\) population, but the plot is (perhaps) a little easier to interpret, as the axis labels still refer to the raw population, not the \(\log_{10}\) population. We can tidy things up further by plotting the population in millions.

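# Population in millions, with a log (base 10) scale on the x-axis
plot(medals$population / 1e6,
     medals$total,
     log = "x",
     axes = FALSE,   # suppress default axes so we can draw our own
     xlab = "population (millions)",
     ylab = "total number of medals won")
# Label the x-axis at powers of 10, in the original (millions) units
ticks <- c(1, 10, 100, 1000)
axis(side = 1, at = ticks, labels = ticks)
axis(side = 2)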
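To see why the log-scaled axis gives the same visual pattern as plotting against \(\log_{10}\) population, we can compare the plotting coordinates R sets up in each case. This is a rough sketch under stated assumptions: `pop_millions` and `medals_total` are made-up values, not the medals data, and `pdf(NULL)` is used only so that no file is written.

```r
# Hypothetical values, for illustration only (not the medals dataset).
pop_millions <- c(0.5, 20, 65, 320, 1400)
medals_total <- c(1, 8, 42, 121, 70)

pdf(NULL)  # null graphics device: nothing is saved

# Log-scaled x-axis: for a log axis, par("usr") reports the
# x-limits on the log10 scale.
plot(pop_millions, medals_total, log = "x")
xlog_flag <- par("xlog")
usr_log <- par("usr")[1:2]

# Explicit log10 transformation of the x variable.
plot(log10(pop_millions), medals_total)
usr_transformed <- par("usr")[1:2]

invisible(dev.off())

# The two sets of x-limits should agree (up to rounding error),
# so the points are drawn in the same horizontal positions.
all.equal(usr_log, usr_transformed)
```

Only the tick labels differ between the two plots: raw populations in the first, \(\log_{10}\) populations in the second.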

However, you may now need to explain to your reader how to interpret the log-scaled axis! In the following, for example, some readers may think the dashed line represents a population of 50 million (whereas it actually represents a population of \(100^{\frac{3}{4}}\) million).
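The value quoted above can be checked with a little arithmetic. On a \(\log_{10}\) axis, a point's position is determined by its logarithm, so a value of \(100^{\frac{3}{4}} = 10^{1.5}\) sits at position 1.5, halfway between the ticks at 10 and 100. The variable names below are illustrative only.

```r
# Value (in millions) at the position described in the text.
x_dashed <- 100^(3/4)
x_dashed                  # roughly 31.6, not 50

# Its position on the log10 scale: halfway between
# log10(10) = 1 and log10(100) = 2.
pos_dashed <- log10(x_dashed)
pos_dashed

# Equivalently, it is the geometric mean of 10 and 100 ...
geo_mean <- sqrt(10 * 100)

# ... whereas the arithmetic midpoint of 10 and 100 is much larger.
arith_mid <- (10 + 100) / 2
```

In other words, equal distances along a log-scaled axis correspond to equal ratios, not equal differences, which is the point a general reader is most likely to miss.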