Section 3 The basics
Using the mtcars dataset in R (type ?mtcars for details), suppose we want to show the relationship between fuel economy and car weight, and that we use a minimal plot command with no additional arguments:
attach(mtcars)
plot(wt, mpg)
Such a figure would be unacceptable in a report! As a bare minimum, you should always make sure that you
- include proper axis labels: never simply use the R variable names;
- specify the units;
- give sufficiently detailed captions so that the figure can be understood on its own, and include a conclusion: what do we learn from the figure?
The first two points are obvious, but the third perhaps less so, so we’ll discuss this a little more.
3.2 Choice of plotting symbol
Although this is a matter of taste, you may prefer to use filled circles rather than empty ones. Empty circles may give a better effect if there are lots of points in a scatter plot; overlapping points are easier to see. To change the plotting symbol, use the pch argument. Type ?points to see the options.
Putting this all together, an improvement on the previous attempt would be as follows (with the caption handled separately).
plot(wt, mpg, xlab = "Weight (lb/1000)", ylab = "Miles/(US) gallon", pch = 16)
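To compare the available symbols side by side, it can help to draw them all on one chart. The following is only a sketch: the names pch.values, x.pos and y.pos and the grid layout are illustrative choices, not part of the example above.

```r
# Draw the plotting symbols 0 to 25 on a grid, each labelled with its pch number
pch.values <- 0:25
x.pos <- pch.values %% 7        # column position (7 symbols per row)
y.pos <- 3 - pch.values %/% 7   # row position, with the first row at the top

plot(x.pos, y.pos, pch = pch.values, cex = 2, bg = "grey",
     xlim = c(-0.5, 6.5), ylim = c(-0.5, 3.5),
     axes = FALSE, xlab = "", ylab = "")
# Print each pch number just below its symbol
text(x.pos, y.pos, labels = pch.values, pos = 1, offset = 1)
```

Symbols 21 to 25 have a separate fill colour, which is set here with the bg argument.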
3.3 Further customising scatter plots
You can use different symbols and colours to represent additional variables in scatter plots. You need to be careful, however, as
- your reader may print your report in black and white;
- your reader may have some form of colour blindness;
- plots can look messy with large numbers of colours and symbols, particularly if the corresponding groups are not well separated.
(Also, if you’re preparing a plot for a talk, it’s possible the colours won’t project on the screen as you’d expect.) Continuing with the mtcars dataset, suppose we want to display the number of cylinders for each model of car on our scatter plot. The variable cyl is a numeric variable, so if we use cyl to specify a vector of colours and symbols, we will get colour and symbol numbers 4 (blue cross), 6 (pink triangle) and 8 (grey star) used in the plot:
plot(wt, mpg, xlab = "Weight (lb/1000)", ylab = "Miles/(US) gallon", col = cyl, pch = cyl)
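Before relying on a variable to pick colours and symbols, it is worth checking which values it actually takes. A minimal check is sketched below; the name cyl.counts is illustrative, and mtcars$cyl is used so that the code does not depend on attach(mtcars) having been run.

```r
# Count how many cars have each number of cylinders
cyl.counts <- table(mtcars$cyl)
print(cyl.counts)

# These distinct values are what R interprets as colour and symbol numbers
print(sort(unique(mtcars$cyl)))
```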
This may not be desirable (I think the choice of colours and symbols above looks odd). We can create new vectors to specify plot colours and symbols directly, which is a little more effort, but worth it in the end.
cyl.symbol <- rep(15, 32)     # use a square for 4 cylinders
cyl.symbol[cyl == 6] <- 19    # use a (slightly larger) circle for 6 cylinders
cyl.symbol[cyl == 8] <- 17    # use a triangle for 8 cylinders
cyl.colour <- rep("black", 32)
cyl.colour[cyl == 6] <- "red"
cyl.colour[cyl == 8] <- "blue"
Now we can use these new vectors in the plot. We must also add a legend (and modify the caption).
plot(wt, mpg, xlab = "Weight (lb/1000)", ylab = "Miles/(US) gallon", col = cyl.colour, pch = cyl.symbol)
legend("topright", legend = c(4, 6, 8), col = c("black", "red", "blue"), pch = c(15, 19, 17), title = "no. of cylinders")
(We have to be careful here to make sure the legend matches the data correctly. Legends are easier to manage in
Plots with large numbers of different symbols/colours can look a little messy, particularly if the groups are not very well distinguished. In this case, it may be worth merging groups. For example, if there were particular interest in cars with 8 cylinders, we could treat cars with 4 and 6 cylinders as a single group.
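One way to merge the groups is sketched below. The names cyl.group, group.symbol and group.colour, and the particular symbols and colours chosen, are illustrative assumptions. Both the points and the legend are built from the same named vectors, which makes it harder for the legend to drift out of step with the data.

```r
# Merge 4- and 6-cylinder cars into one group; keep 8-cylinder cars separate
cyl.group <- ifelse(mtcars$cyl == 8, "8", "4 or 6")

# One lookup per aesthetic, named by group, so that the points and the
# legend are derived from the same definitions
group.symbol <- c("4 or 6" = 16, "8" = 17)
group.colour <- c("4 or 6" = "black", "8" = "blue")

plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (lb/1000)", ylab = "Miles/(US) gallon",
     col = group.colour[cyl.group], pch = group.symbol[cyl.group])
legend("topright", legend = names(group.symbol),
       col = group.colour, pch = group.symbol,
       title = "no. of cylinders")
```

Indexing a named vector by the group labels returns one symbol or colour per car, so no hard-coded vector length is needed.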