Overconfidence
Conflicting state and national polls for the 2020 Democratic presidential nomination are a common and ongoing occurrence. The polls illustrate the substantial theoretical limitations of survey sampling. Pollsters, poll sponsors, and polling aggregators tend to ignore those limitations in favor of assigning the polls a level of accuracy that is unjustified and unrealistic. The 95% confidence interval reported by pollsters is as misleading as it is misunderstood. Here is an example:
In late August, Monmouth University released the results of a national survey with a subsample of 298 respondents identifying themselves as Democrats or leaning Democratic. It showed Elizabeth Warren and Joe Biden tied in the race for the Democratic presidential nomination at 20% each, with Bernie Sanders at 16%. Compared to a Monmouth poll completed in June, Biden had dropped from 32%, while Warren was up from 15% and Sanders was up from 14%.
At the same time, a nationwide YouGov survey including a subsample of 559 Democrats had Biden at 22%, Sanders at 19%, and Warren at 17%. Four polls released later in the week, however, had Biden up by 13 points in three of them and by 18 points in the fourth.
In response to this, Patrick Murray, the director of the Monmouth University Poll, released a statement that included the following:
As other national polls of the 2020 Democratic presidential race have been released this week, it is clear that the Monmouth University Poll published Monday is an outlier. This is a product of the uncertainty that is inherent in the polling process. We tend to focus on the margin of sampling error, but that margin is driven by something called the confidence interval which states that every so often you will naturally have a poll that falls outside the standard margin of error. It occurs very infrequently, but every pollster who has been in this business a while recognizes that outliers happen. This appears to be one of those instances.
Nate Silver followed up on FiveThirtyEight.com with an article titled "How to Handle an Outlier Poll." He writes, in part:
But Murray doesn’t have any real reason to apologize. Outliers are a part of the business. In theory, 1 in 20 polls should fall outside the margin of error as a result of chance alone. One out of 20 might not sound like a lot, but by the time we get to the stretch run of the Democratic primary campaign in January, we’ll be getting literally dozens of new state and national polls every week. Inevitably, some of them are going to be outliers. Not to mention that the margin of error, which traditionally describes sampling error — what you get from surveying only a subset of voters rather than the whole population — is only one of several major sources of error in polls.
Murray and Silver provide nonsensical technical definitions of outliers. It is impossible for the results of a poll to be outside of its own margin of error. It is easy enough to know what they mean by an outlier: simply a poll that differs from other polls. But, more importantly, they are incorrect about the frequency of outliers in individual election polls.
Sampling theory is based on "frequently and independently" repeating the same survey. Using the sample size and the results from each repeated sampling, margins of error and the resulting confidence intervals are calculated for each sample. As the sampling is repeated using the same sample size and methodology, the theory states that, if an infinite number of samples are drawn, the population values should be contained within the confidence intervals at very close to the desired level of confidence, usually 95% for public polls.
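To make that calculation concrete, here is a minimal sketch of the textbook margin of error for a sample proportion, assuming simple random sampling and the usual normal approximation (actual pollsters apply design adjustments that vary). It uses Warren's 20% and the 298-person subsample from the Monmouth survey above:

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Warren at 20% in a subsample of 298 Democratic primary voters
p, n = 0.20, 298
moe = margin_of_error(p, n)
print(f"MOE: +/-{moe:.1%}")                       # roughly +/-4.5 points
print(f"95% CI: {p - moe:.1%} to {p + moe:.1%}")  # roughly 15.5% to 24.5%
```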
Jerzy Neyman pointed out when introducing the concept of confidence intervals that frequentist probability theory is "helpless" in providing the true population values from any single sample, because the values obtained from a single sample provide no information about the actual population values.
Here is a demonstration of sampling error at work. Note that it is unlikely that any of the 50 samples exactly matches the population values. The average of the 50 samples is more likely than any single sample to match the population. This is why polling aggregation works.
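The demonstration can be reproduced with a short simulation. This is an illustrative sketch, not the original demonstration: it assumes a true population value of 20% and borrows the 298-person sample size from the Monmouth subsample above.

```python
import random

TRUE_P = 0.20   # assumed true population value
N = 298         # sample size, borrowed from the Monmouth subsample above
SAMPLES = 50

estimates = []
for _ in range(SAMPLES):
    # one simple random sample of N respondents
    hits = sum(random.random() < TRUE_P for _ in range(N))
    estimates.append(hits / N)

# How many samples land exactly on the population value (to the nearest point)?
exact = sum(round(e * 100) == TRUE_P * 100 for e in estimates)
average = sum(estimates) / SAMPLES

print(f"{exact} of {SAMPLES} samples round to exactly {TRUE_P:.0%}")
print(f"average of all {SAMPLES} samples: {average:.1%}")  # close to 20%
```

Typically only a handful of the 50 samples land on the true value, while their average comes in within a fraction of a point of it.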
Pollsters generally, and incorrectly, assume not only that their poll results reflect the actual population values (which the sampling error demonstration proves to be untrue), but also that the population values fall within the confidence intervals 95% of the time for a single poll. This is why Murray writes that outliers happen "every so often" and "very infrequently" and Silver puts outliers at "1 in 20 polls" (or 5% of the time).
In fact, actual outliers should occur about 25% of the time for individual election polls, or about 1 in 4 polls.
While it would seem that an event with a 95% probability of occurring in the long run would also be very likely to occur in a single event, this is not the case. Calculating for the worst case shows that an event with a 95% probability of occurring in the long run has a worst-case uncertainty of about 29% for a single event, not 5%. It takes an event with a 99.5% probability of occurring in the long run to have a worst-case uncertainty of 5% for a single event. (Hard to believe, but an event with a 71.4% probability of occurring in the long run has a worst-case uncertainty of about 86% for a single event.)
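The formula behind these figures is not named here, but all three match the binary (Shannon) entropy of a single two-outcome event, measured in bits. A minimal sketch, assuming that is the uncertainty measure intended:

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of a single event with probability p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.95, 0.995, 0.714):
    print(f"P = {p:.1%} -> single-event uncertainty ~ {binary_entropy(p):.1%}")
# P = 95.0%  -> ~28.6%
# P = 99.5%  -> ~4.5%
# P = 71.4%  -> ~86.3%
```

Under this reading, single-event uncertainty peaks at 50/50 odds (a full bit) and only approaches zero as the long-run probability approaches certainty.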
This can be tested empirically by comparing election polls to election results. Our election polling accuracy ratings for over 5,000 final election surveys from 45 pollsters show, on average, that 75% of election poll results are within the 95% confidence intervals when compared to the actual vote totals (the population values), confirming the theory for individual polls. (This paper has election polls within the theoretical margins of error 73% of the time for senatorial polls, 74% of the time for gubernatorial polls, and 88% of the time for presidential polls.)
Monmouth University election poll results have been within their respective theoretical margins of error about 83% of the time, which is above average. But that still means about 1 in 6 election polls from Monmouth fell outside their respective theoretical margins of error when compared to the actual election outcomes.
Our accuracy ratings include sample sizes, which allow the accuracy of any poll to be compared to the accuracy of any other poll. The smaller the sample size of a poll, the wider the confidence interval; conversely, the larger the sample size, the narrower the confidence interval.
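Using the same simple-random-sampling approximation as the sketch above, the effect of sample size on interval width is easy to see; the sample sizes below are the ones that appear in this article:

```python
import math

def moe(p, n, z=1.96):
    """95% margin of error for a proportion p at sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Margin of error at p = 50% (the widest case) for this article's sample sizes
for n in (298, 403, 1255):
    print(f"n = {n:>4}: MOE is +/-{moe(0.50, n):.1%}")
# n =  298: +/-5.7%
# n =  403: +/-4.9%
# n = 1255: +/-2.8%
```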
Monmouth polls tend to have smaller sample sizes and, therefore, wider confidence intervals (note the sample size of just 298 Democratic primary voters in the national survey above).
In the 2016 presidential race in Wisconsin, for example, the final Monmouth poll had Clinton at 47% and Trump at 40% with a sample size of 403. When the poll results are compared to the actual Wisconsin results of 46.45% for Clinton and 47.22% for Trump, the Monmouth poll results were within the poll's theoretical margin of error, with an accuracy score of 0.18. The final Marquette Law School poll in Wisconsin had Clinton at 46% and Trump at 40% with a sample size of 1,255. The results from the Marquette Law School poll, because of the larger sample size, fell outside of that poll's theoretical margin of error, but the poll had an accuracy score of 0.16. The Monmouth poll in Wisconsin was not very accurate, but the results were what could be expected from sampling theory with such a small sample size. The Marquette Law School poll was more accurate than the Monmouth poll, but it should have been even more accurate based on its larger sample size.
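The accuracy scores above are this site's own metric, but the within/outside classification can be checked with a standard margin-of-error calculation on the Clinton minus Trump spread. A sketch assuming simple random sampling (an assumption, since the polls' actual designs differ):

```python
import math

def spread_moe(p1, p2, n, z=1.96):
    """95% margin of error on the difference of two proportions
    drawn from the same sample (Clinton minus Trump)."""
    var = (p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n
    return z * math.sqrt(var)

ACTUAL_SPREAD = 46.45 - 47.22  # Clinton minus Trump, in points

for name, p1, p2, n in [("Monmouth", 0.47, 0.40, 403),
                        ("Marquette", 0.46, 0.40, 1255)]:
    moe = spread_moe(p1, p2, n) * 100
    err = abs((p1 - p2) * 100 - ACTUAL_SPREAD)
    verdict = "within" if err <= moe else "outside"
    print(f"{name}: spread error {err:.1f} pts, MOE {moe:.1f} pts -> {verdict}")
# Monmouth:  spread error 7.8 pts, MOE 9.1 pts -> within
# Marquette: spread error 6.8 pts, MOE 5.1 pts -> outside
```

The larger Marquette sample produces a much tighter interval, which is why a numerically closer result can still land outside its own margin of error.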
For elections from 1978 through 2018, results from most pollsters perform as sampling theory predicts. Looking at all 45 pollsters and constructing a 95% confidence interval for accuracy by pollster shows that the results from only 4 pollsters fall outside of that 95% confidence interval: Harris Interactive, the Trafalgar Group, SurveyMonkey, and We Ask America. In 2014 election polling, for example, SurveyMonkey polls were outside their respective theoretical margins of error for the actual vote about 48% of the time, and that trend continued in 2016.
Silver writes that sampling error "is only one of several major sources of error in polls." Sampling error can be determined from the survey data. In the polls we have tested, the average non-sampling error (including "house effects") accounts for less than one-third of all error, making sampling error the major source of error in those polls. This is reflected in our polling accuracy scores. The average absolute accuracy for the 75% of election polls that fall within their respective 95% confidence intervals is 0.08, while the average absolute accuracy for the 25% of election polls that fall outside their respective 95% confidence intervals is 0.29. Unfortunately, sampling error can be substantial, and there is no way to control for it.
As for Monmouth's election polling accuracy, the average absolute accuracy for the 83% of Monmouth University election polls that fall within their respective 95% confidence intervals is 0.09, and the average absolute accuracy for the 17% of election polls that fall outside of their respective 95% confidence intervals is 0.34.
When looking at 2020 election polls, remember that (1) the results from about 1 in 4 polls on average will fall outside the respective confidence intervals for the population values based on sampling theory alone, (2) the average of all polls should be a better indication of the actual state of the race than any single poll, and (3) the results from most pollsters should be included in polling averages.
Update: Election polls from the University of New Hampshire have been within their respective theoretical margins of error about 72% of the time, which is below average. The average absolute accuracy for the 72% of University of New Hampshire election polls that fall within their respective 95% confidence intervals is 0.11, and the average absolute accuracy for the 28% of election polls that fall outside of their respective 95% confidence intervals is 0.33.