Section 1 Introduction

R is a popular programming language designed for statistical computing and data analysis. We will be using R in this module and many others. In this tutorial, we will cover the basics of working with R, defining variables and using functions. (In Semester 2, we will introduce data analysis with R.)

As a motivating example, we will study a powerful technique called the Monte Carlo method, which can be used to solve a wide range of problems in probability and statistics.

1.1 Monte Carlo methods

1.1.1 Example 1: winning a game of cards

Canfield Solitaire is a card game where you have to arrange all 52 cards into four ‘foundation’ piles, one for each suit, with the cards in increasing order, following rules about when to turn over cards and when to move cards between the various piles. (You don’t actually need to know the rules for this tutorial, and they are quite hard to understand until you’ve seen someone play the game!)

Could you work out the probability of winning the game?

Working this out analytically would be incredibly hard, but there is another way: the Monte Carlo method, invented by the Polish mathematician and physicist Stanislaw Ulam (1909-1984). He wrote1:

“The first thoughts and attempts I made to practice [the Monte Carlo Method] were suggested by a question which occurred to me in 1946 as I was convalescing from an illness and playing solitaires. The question was what are the chances that a Canfield solitaire laid out with 52 cards will come out successfully? After spending a lot of time trying to estimate them by pure combinatorial calculations, I wondered whether a more practical method than”abstract thinking" might not be to lay it out say one hundred times and simply observe and count the number of successful plays. This was already possible to envisage with the beginning of the new era of fast computers, and I immediately thought of problems of neutron diffusion and other questions of mathematical physics, and more generally how to change processes described by certain differential equations into an equivalent form interpretable as a succession of random operations. Later [in 1946], I described the idea to John von Neumann, and we began to plan actual calculations."

So, we could get an estimate of the probability by

  1. playing the game lots of times;
  2. counting the proportion of times we win.

This might still take a long time, but we could instead get a computer to simulate playing the game a large number of times, and then see how many times the computer wins. Using (computer generated) simulations to estimate probabilities (and other quantities) is known as the Monte Carlo method. We can use the Monte Carlo method to get approximate solutions to lots of problems that would be too difficult to solve otherwise.

Programming the Monte Carlo method for the Canfield solitaire problem would be hard, but we can illustrate the method on some simpler examples.

1.1.2 Example 2: the birthday problem

Suppose there are twenty people in a room. Ignore leap years, and suppose that each person is equally likely to have been born on any day of the year. What is the probability that at least two people in the room have the same birthday?

We can actually solve this problem analytically (which we’ll do later; see also Q13 in the Exercises booklet), but it’s a good example for learning about R and the Monte Carlo method. The Monte Carlo method will involve

  1. getting R to simulate twenty random birthdays;
  2. determining whether any of the simulated birthdays are the same;
  3. repeating steps 1 and 2 lots of times, and counting the proportion of times we observe two or more of the twenty random birthdays to be the same.

We’ll take a first look at R in the next section.

  1. Eckhardt, Roger (1987). Stan Ulam, John von Neumann, and the Monte Carlo method. Los Alamos Science (15): 131–137.