In the lead-up to the UK referendum on EU membership, Doug Rivers and I posted an analysis of several weeks of YouGov polling data, using a methodology called multilevel regression and post-stratification (MRP). This is a different approach to analysing polling responses from the one YouGov uses for most of its UK polls, including those released immediately before and after the referendum on 23 June. In addition to yielding several interesting findings about the interactions of age, educational qualifications, party and referendum vote, which we discussed in that post on YouGov’s site, the MRP approach aims to correct better for demographic imbalances in raw polling samples. Now that we know the results, we can say that the MRP analysis provided a better, if still not perfect, estimate of the referendum result than the conventional methods used by YouGov and most other pollsters.
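For readers who have not encountered MRP before, here is a minimal sketch of the two stages: fit a model of individual vote intention on the survey responses, then post-stratify the cell-level predictions using population counts. A plain logistic regression stands in for the actual multilevel model, and the column names, categories and counts are illustrative rather than taken from the YouGov data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Stage 1: model individual vote intention from the survey responses.
# A plain logistic regression stands in for the multilevel model here.
survey = pd.DataFrame({
    "age_band":  ["18-24", "25-49", "50-64", "65+", "25-49", "65+"] * 50,
    "education": ["degree", "none", "gcse", "none", "degree", "gcse"] * 50,
    "region":    ["London", "North", "Wales", "North", "London", "Wales"] * 50,
    "leave":     rng.binomial(1, 0.5, 300),   # fake 0/1 leave responses
})
X = pd.get_dummies(survey[["age_band", "education", "region"]])
model = LogisticRegression().fit(X, survey["leave"])

# Stage 2: post-stratify. Predict leave support for every demographic cell,
# then weight each cell by its population count (placeholder counts here,
# in place of census-style data).
cells = survey[["age_band", "education", "region"]].drop_duplicates().copy()
cells["count"] = 1000
X_cells = pd.get_dummies(cells[["age_band", "education", "region"]])
X_cells = X_cells.reindex(columns=X.columns, fill_value=0)
cells["p_leave"] = model.predict_proba(X_cells)[:, 1]

national_leave = np.average(cells["p_leave"], weights=cells["count"])
print(f"Post-stratified leave estimate: {national_leave:.1%}")
```

Because the second stage weights by population counts rather than by whoever happened to answer the poll, an unrepresentative raw sample matters much less, provided the model captures how vote intention varies across the cells.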
In our article several days before the referendum, we reported an estimate of leave support at 51.0%. The day before the referendum, we re-ran the model and estimated leave at 50.1%. Once we were able to include that evening’s polling responses on the morning of the referendum, our final estimate put leave at 50.6%, versus the final result of 51.9%. Given my stated level of confidence, I consider this a success:
If the result is between 48 and 52, I will consider the prediction a success. Outside 46 and 54, not so much. In between: ¯\_(ツ)_/¯
— Benjamin Lauderdale (@benlauderdale) June 22, 2016
One of the powerful features of the MRP approach is that we could calculate estimates for different geographies. Our original article included a map of the results by parliamentary constituency; here we look at the estimates at the level of the reporting areas, which were local authority districts. Below, we show the projected and actual leave vote shares for the 382 reporting areas.
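To make concrete what calculating estimates for different geographies involves: once the model has produced a predicted leave share for every demographic cell, those predictions can be rolled up to any geography for which cell-level population counts are available, simply by changing the grouping variable. The counts and predictions below are placeholders, not our actual estimates.

```python
import pandas as pd

# One row per demographic cell per local authority; values are illustrative.
frame = pd.DataFrame({
    "local_authority": ["Burnley", "Burnley", "East Renfrewshire", "East Renfrewshire"],
    "cell":            ["18-24 / degree", "65+ / none", "18-24 / degree", "65+ / none"],
    "count":           [12000, 30000, 25000, 15000],
    "p_leave":         [0.35, 0.62, 0.30, 0.55],
})

# Expected leave voters per cell, summed within each local authority and
# divided by the local authority's total count. Swapping the grouping
# variable gives constituency, regional or national estimates instead.
frame["expected_leave"] = frame["p_leave"] * frame["count"]
grouped = frame.groupby("local_authority")
la_estimates = grouped["expected_leave"].sum() / grouped["count"].sum()
print(la_estimates)
```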
Our local authority and regional estimates proved to be highly accurate. The correlation between the predicted local authority leave shares and the actual leave shares in the referendum was 0.92. Very few local authority results were far off the predicted levels of leave support: 97% of the local authority predictions were within 10 percentage points of the true leave share, and 77% were within 5 percentage points. The worst predictions in either direction were off by less than 14 percentage points: Burnley (predicted 53.3% leave, actual 66.6%) and East Renfrewshire (predicted 39.4% leave, actual 25.7%).
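These accuracy summaries are straightforward to compute once the predicted and actual leave shares are lined up by local authority. The sketch below uses the Burnley and East Renfrewshire figures quoted above plus two invented pairs, rather than the full series of 382.

```python
import numpy as np

# Predicted and actual leave shares by local authority, in percentage points.
predicted = np.array([53.3, 39.4, 48.0, 55.2])
actual    = np.array([66.6, 25.7, 50.1, 57.0])

errors = np.abs(predicted - actual)
correlation = np.corrcoef(predicted, actual)[0, 1]
within_5  = np.mean(errors <= 5)    # share of areas within 5 points
within_10 = np.mean(errors <= 10)   # share of areas within 10 points

print(f"correlation: {correlation:.2f}")
print(f"within 5 points: {within_5:.0%}, within 10 points: {within_10:.0%}")
```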
Our turnout model aimed to replicate 2015 general election turnout patterns. Because turnout in the referendum was higher than in 2015, this meant our turnout model underestimated turnout overall; it also overestimated turnout in Scotland (highlighted in red), where turnout was very high in the 2015 general election and relatively low in the referendum. However, excluding Scotland, the local authorities where we expected relatively high turnout based on 2015 patterns and their demographic composition did indeed have relatively high turnout. Overall, the turnout model did well: it got the general patterns right, and the Scotland over-estimate was responsible for only a very small amount of the error in our prediction.
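As a rough illustration of how a turnout model of this kind feeds into the estimates, the sketch below down-weights each demographic cell by an assumed probability of voting before averaging. The counts, turnout probabilities and leave shares are invented for the example; in our actual model the turnout probabilities were anchored to 2015 general election patterns.

```python
import numpy as np
import pandas as pd

# Invented demographic cells with counts, turnout probabilities and leave support.
cells = pd.DataFrame({
    "count":     [40000, 25000, 35000],   # adults in each cell
    "p_turnout": [0.45, 0.70, 0.85],      # modelled probability of voting
    "p_leave":   [0.40, 0.55, 0.60],      # modelled leave support among voters
})

# Down-weight each cell by its turnout probability so the post-stratified
# average reflects likely voters rather than all adults.
cells["voter_weight"] = cells["count"] * cells["p_turnout"]
leave_among_voters = np.average(cells["p_leave"], weights=cells["voter_weight"])
print(f"Projected leave share among voters: {leave_among_voters:.1%}")
```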
YouGov released an “online exit poll” at 10pm on the day of the referendum, which reported leave at 48 and remain at 52. Since that poll needed to be analysed very quickly, it could not use the MRP analysis that we are discussing here. We have now been able to go back and re-analyse that data, and when we do so, we estimate leave and remain tied at 50, halving the error in the poll as originally reported. This suggests that the MRP analysis is improving on the standard methods for adjusting polling data, but there is still work to be done.
Aside from better publicising what we were doing before the vote, we can identify two major areas for improvement in the future. First, our turnout model was based on the assumption that turnout patterns would look like those of the 2015 general election. This was not a bad assumption, but it did not capture the increase in turnout in the referendum, and it led us to over-estimate turnout in Scotland, where turnout was unusually high in the general election and those levels were not maintained in the referendum. Second, our model used educational qualifications, which were tremendously predictive of referendum vote, but not work status, social class or occupation. Given the voting patterns in the referendum, including these might have further closed the gap between our estimates and the result, and they might prove even more important in a UK general election.
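To give a sense of what adding a variable like social class involves in practice: the post-stratification frame needs a population count for every combination of predictor categories, so each extra variable multiplies the number of cells. The category labels below are illustrative only; the real frame would use whatever categories joint population counts are available for.

```python
import itertools
import pandas as pd

age_bands     = ["18-24", "25-49", "50-64", "65+"]
educations    = ["degree", "a_level", "gcse", "none"]
social_grades = ["AB", "C1", "C2", "DE"]   # hypothetical extra predictor

# Cross all predictor categories to form the post-stratification cells.
frame = pd.DataFrame(
    list(itertools.product(age_bands, educations, social_grades)),
    columns=["age_band", "education", "social_grade"],
)
print(len(frame))  # 64 cells per geography, each needing a joint population count
```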