Eliciting a distribution for the EU election turnout

This is my second attempt to elicit my own distribution, and to see how well I can justify my judgements from the evidence available to me.

The quantity of interest

My quantity of interest \(X\) is the turnout (as a percentage of registered voters) in the UK for the EU election in May 2019. (I was planning to do the EU-wide turnout instead, but the UK situation is more interesting!)

My evidence dossier

  1. The turnout in the UK for previous EU elections was as follows.
year          1979   1984   1989   1994   1999   2004   2009   2014
turnout (%)  32.35  32.57  36.37  36.43  24.00  38.52  34.70  35.60

The 1999 election stands out as having a low turnout. This election was held on the 10th June, whereas the UK local elections were held on 6th May; EU and local elections have been on the same date since.

  2. Turnout for other UK polls is given here, where I note the 72.2% turnout for the EU referendum.

  3. Opinion polls typically ask participants for their likelihood of voting, on a scale of 0 (certain not to vote) to 10 (certain to vote). Poll results are collated at www.WhatUKThinks.org/EU, run by NatCen Social Research. The four most recent are shown below (earlier polls in April had similar results).

Fieldwork end date  Pollster         0    1    2    3    4    5    6    7    8    9   10
9 May 2019          Survation      14%   2%   2%   2%   2%  11%   4%   3%   6%   7%  48%
9 May 2019          YouGov         13%   2%   3%   2%   2%  10%   4%   4%   6%   6%  49%
10 May 2019         BMG Research   15%   2%   2%   2%   2%  10%   3%   4%   5%   7%  48%
10 May 2019         Opinium         9%   1%   1%   1%   0%  15%   1%   2%   6%   5%  58%

  4. Ideally, I’d try to find out a bit more about the Opinium poll, which stands out from the other three. I don’t know how accurate these sorts of polls typically are, whether participants answer truthfully, or what to make of responses in categories 1 to 9. From a quick investigation, a YouGov poll for the 2017 UK general election had 67% of participants in the ‘10’ (certain to vote) category, and the actual turnout was 69%. Presumably not everyone responding ‘10’ will actually vote, but this might be offset by some of the ‘undecideds’ choosing to vote.

  5. I’d like to know what the turnout was in the recent local elections, but I’ve not been able to find a summary. Informally, looking at some Sheffield results, the turnout for 2019 seemed fairly similar to 2018; the current Brexit situation didn’t seem to result in a surge in turnout.
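
To get a feel for how much the reading of the poll table depends on what one does with the middle categories, here is a rough base-R sketch with two crude summaries per poll: the share answering ‘10’, and a weighted figure that treats a score of k as a k/10 chance of voting. That linear weighting is purely an assumption made for illustration; nothing in the evidence says respondents behave this way. The percentages are copied from the table, and rows are renormalised because rounding means they do not sum to exactly 100.

```r
# Likelihood-of-voting scores, 0 (certain not to vote) to 10 (certain to vote)
scores <- 0:10

# Percentages copied from the four polls in the table
polls <- rbind(
  Survation = c(14, 2, 2, 2, 2, 11, 4, 3, 6, 7, 48),
  YouGov    = c(13, 2, 3, 2, 2, 10, 4, 4, 6, 6, 49),
  BMG       = c(15, 2, 2, 2, 2, 10, 3, 4, 5, 7, 48),
  Opinium   = c( 9, 1, 1, 1, 0, 15, 1, 2, 6, 5, 58)
)
colnames(polls) <- scores

# Summary 1: only those answering '10' vote
certain <- polls[, "10"]

# Summary 2 (assumption): a respondent scoring k votes with probability k/10.
# Mean score per poll (0 to 10), rescaled to a percentage.
linear <- as.vector(polls %*% scores) / rowSums(polls) * 10

print(round(cbind(certain, linear), 1))
```

The gap between the two columns is one way of seeing how much hangs on the ‘undecideds’.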

My plausible range

I’m going to set an upper plausible limit of 80%. A turnout this high would exceed that for the EU referendum; I can’t see that happening.

Choosing a lower plausible limit is a bit trickier. Without the opinion polls, I’d look at the 1999 turnout where, as here, there was no other election at the same time; based on that, a turnout lower than 24% has to be at least plausible. I had imagined that some Brexit-supporting voters might choose to boycott the election, and I was a bit surprised to see the poll numbers as high as they were. At this point I’ll choose a lower limit of 20%, although I think I’ll end up giving negligible probability anywhere close to this limit.
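
For reference, the historical figures from item 1 can be summarised in a few lines of base R, to show where a range based on history alone would sit. This is only a descriptive sketch; the variable names are my own.

```r
# UK turnout (%) at previous EU elections, from item 1 of the dossier
turnout <- data.frame(
  year = c(1979, 1984, 1989, 1994, 1999, 2004, 2009, 2014),
  pct  = c(32.35, 32.57, 36.37, 36.43, 24.00, 38.52, 34.70, 35.60)
)

# The election with the lowest turnout
lowest <- turnout[which.min(turnout$pct), ]
print(lowest)

# Range over all elections, and the mean with the low outlier left out
print(range(turnout$pct))
mean_excl_lowest <- mean(turnout$pct[-which.min(turnout$pct)])
print(round(mean_excl_lowest, 2))
```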

My probability judgements

I think item 3 is the most important piece of evidence, but I have little or no evidence to help me understand (a) what the ‘undecideds’ will do; (b) differences between the polls; (c) whether some participants don’t respond honestly; (d) whether anyone might change their mind about voting in the last few days in the run-up to the election; (e) what the margin of error would be. My only relevant evidence here is the single observation in item 4.

I’ll judge a median of 50%, favouring the three polls with similar results. If I was going to give high weight to item 4, I should expect \(X\) to be fairly close to 50%, but given the issues above, I think I should express a little more uncertainty.

I’ll judge \(P(X>60\%) = 0.15\), and \(P(X\le 40\%) = 0.05\). A turnout below 40% would be in line with previous EU elections, but would mean quite a large error from the polls. I’ve given a higher chance to a turnout above 60%, which I think could result from some ‘undecideds’ changing their minds.
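
As a quick coherence check, the median and the two tail judgements can be turned into cumulative probabilities and then into the probability implied for each 10-point band. A minimal base-R sketch (the band labels are my own):

```r
# Judgements as cumulative probabilities P(X <= x), with x as a proportion
cuts     <- c(0.40, 0.50, 0.60)
cum_prob <- c(0.05, 0.50, 1 - 0.15)   # P(X > 0.6) = 0.15 means P(X <= 0.6) = 0.85

# Cumulative probabilities must be increasing to be coherent
stopifnot(!is.unsorted(cum_prob, strictly = TRUE))

# Probability implied for each band between the plausible limits
bins <- diff(c(0, cum_prob, 1))
names(bins) <- c("<=40%", "40-50%", "50-60%", ">60%")
print(bins)
```

The two middle bands are not equal, which reflects the asymmetry in the tail judgements.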

Fitting a distribution

This will fit all the distributions currently available in the SHELF package:

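# Elicited judgements: P(X <= 0.4) = 0.05, P(X <= 0.5) = 0.5, P(X <= 0.6) = 0.85
myfit <- SHELF::fitdist(vals = c(0.4, 0.5, 0.6),
                        probs = c(0.05, 0.5, 0.85),
                        lower = 0,   # turnout as a proportion cannot be below 0
                        upper = 1)   # or above 1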

I’ll now compare the different distributions in the tails, and also look at the tertiles:

SHELF::feedback(myfit, quantiles = c(0.01, 0.33, 0.66, 0.99))$fitted.quantiles
##      normal     t gamma lognormal  logt  beta  hist
## 0.01  0.322 0.222 0.342     0.349 0.285 0.326 0.080
## 0.33  0.472 0.474 0.470     0.470 0.473 0.471 0.462
## 0.66  0.539 0.532 0.538     0.537 0.532 0.539 0.546
## 0.99  0.690 0.786 0.712     0.726 0.886 0.685 0.973

I like the lognormal distribution: a one in a hundred chance of beating the referendum turnout sounds right to me. Perhaps giving only a one in three chance to the turnout lying in the range 47%-54% is being a little too cautious, but we will see…
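
As a rough cross-check that does not need SHELF, a lognormal can be fitted to the same three judgements in base R by minimising the squared differences between fitted and elicited cumulative probabilities. I believe this is the general idea behind `fitdist`, but the code below is my own sketch and not SHELF's implementation, so small differences from the table are to be expected. The starting values are arbitrary guesses.

```r
# Elicited points: P(X <= 0.4) = 0.05, P(X <= 0.5) = 0.5, P(X <= 0.6) = 0.85
vals  <- c(0.4, 0.5, 0.6)
probs <- c(0.05, 0.5, 0.85)

# Sum of squared differences between fitted and elicited cumulative probabilities.
# sdlog is optimised on the log scale so that it stays positive.
loss <- function(par) {
  fitted <- plnorm(vals, meanlog = par[1], sdlog = exp(par[2]))
  sum((fitted - probs)^2)
}

# Starting values are arbitrary guesses (median near 0.5, sdlog near 0.2)
opt     <- optim(c(log(0.5), log(0.2)), loss)
meanlog <- opt$par[1]
sdlog   <- exp(opt$par[2])

# Tail and tertile quantiles, for comparison with the lognormal column
fitted_q <- qlnorm(c(0.01, 0.33, 0.66, 0.99), meanlog, sdlog)
print(round(fitted_q, 3))

# Probability of exceeding the 72.2% EU referendum turnout
p_beat_referendum <- plnorm(0.722, meanlog, sdlog, lower.tail = FALSE)
print(p_beat_referendum)
```

With only two parameters and three judgements, the fit cannot match all three probabilities exactly, so it is worth checking the fitted values against what was elicited.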

The fitted distribution is shown below.

SHELF::plotfit(myfit,
               d = "lognormal",
               percentages = TRUE,
               ql = 0.01, qu = 0.99,
               xl = 0, xu = 1)

Postscript

The actual turnout was announced about ten days after writing this post: 37%. I could have got much closer to this, had I ignored the poll data, but I don’t think I had good reason to do so at the time, so I’m not too embarrassed with the distribution I came up with. (It’s interesting to see here that turnout did increase quite a lot in some countries, e.g. Germany, from previous years; it just didn’t happen in the UK.)
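
To see how far into the tail the outcome fell, the lognormal parameters can be recovered approximately from the 1% and 99% quantiles reported in the feedback table, and the fitted cumulative probability evaluated at 37%. This is a sketch based on the rounded quantiles, so the result is approximate.

```r
# 1% and 99% quantiles of the fitted lognormal, as reported in the feedback table
q01 <- 0.349
q99 <- 0.726

# On the log scale these quantiles are meanlog -/+ qnorm(0.99) * sdlog
meanlog <- (log(q01) + log(q99)) / 2
sdlog   <- (log(q99) - log(q01)) / (2 * qnorm(0.99))

# Probability the fitted distribution gave to a turnout of 37% or less
p_at_or_below_actual <- plnorm(0.37, meanlog, sdlog)
print(round(p_at_or_below_actual, 3))
```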
