On Polls, Political Risk, and Trump

After the first debate in the US presidential contest, Donald Trump was able to point to many online polls declaring him the winner. Had he really won? How well do polls help us to assess political risk?

As the air cleared after the first debate between US presidential candidates Donald Trump and Hillary Clinton, Trump was able to point to many online polls declaring him the winner. This was quite a surprise to many who watched the debate. Was our judgement wrong? Had he really won the debate? Should we now assume a Trump presidency would be a certainty?

In a previous article, I pointed out the problem of relying on bookmakers' odds to assess political risk. Surely polls must be a more reliable guide.

There are a number of dangers we should always consider before we rely on polls to assess risk:

Selection Bias
Self-Selection Bias (or Non-Response Bias)
Sample Size
Manipulation by the Pollster
Manipulation by the Poll Subjects

Selection Bias

When we see a poll of viewers by Fox News or of readers by Socialist Worker, it's easy to recognize that the poll may represent the views of a distinct group, but is unlikely to represent the views of the whole population. Unless the results of such polls are counter-intuitive ("Fox viewers think Bernie Sanders should be the next president"), they are of little interest. All they tell us is: "If you are the sort of person who supports Donald Trump, then you support Donald Trump."

But it's not necessarily that easy to spot selection bias. If you interviewed 1,000 random people in Times Square, New York, what would that tell you? How about 1,000 Facebook users? 1,000 commuters? Clearly, how you pick the people you interview matters.

Even if you try to get as wide a sample as possible, things can go badly wrong. The 1936 Literary Digest Poll is the classic example.

The candidates in the US presidential election of 1936 were Alfred Landon, the governor of Kansas, and the incumbent, Franklin Roosevelt. The Literary Digest mailed out an "election ballot" to 10 million people to predict the result. That was a quarter of all voters. In terms of logistics, it was a massive achievement. They received 2.4 million responses. Their prediction: Landon 57%, Roosevelt 43%.

The fact that you haven't heard of President Landon suggests that something went badly wrong.

The actual results were Landon 38%, Roosevelt 62%. A landslide for Roosevelt.

The primary problem here was selection bias. The addresses used for the sample were culled from telephone directories and magazine subscription lists. In 1936 telephones were luxury items. Magazine subscribers tended to be richer than the average population. An unemployed person was unlikely to have either a telephone or a magazine subscription. At the time, there were about 9 million unemployed, a significant fraction of voters. The people surveyed were much better off than average, and economic issues and policies were of major concern to voters due to the Great Depression.
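The arithmetic behind a biased sampling frame can be sketched in a few lines. Every number below is invented purely for illustration (the group sizes, the coverage of the mailing lists, and the support levels are assumptions, not historical estimates), and the function names are hypothetical. The point is only that when reachability is correlated with opinion, the poll measures the reachable group rather than the electorate, no matter how many ballots are mailed.

```python
# Illustrative only: all figures are made-up assumptions, not 1936 data.
groups = [
    # population_share: fraction of all voters in the group
    # frame_coverage:   fraction of the group listed in the mailing sources
    # support:          fraction of the group backing the challenger
    {"name": "better off", "population_share": 0.3, "frame_coverage": 0.8, "support": 0.6},
    {"name": "worse off", "population_share": 0.7, "frame_coverage": 0.1, "support": 0.3},
]


def true_share(groups):
    """Challenger's share among all voters."""
    return sum(g["population_share"] * g["support"] for g in groups)


def frame_share(groups):
    """Challenger's expected share among voters reachable through the frame."""
    reached = sum(g["population_share"] * g["frame_coverage"] for g in groups)
    supporters = sum(
        g["population_share"] * g["frame_coverage"] * g["support"] for g in groups
    )
    return supporters / reached


print(f"Whole electorate: {true_share(groups):.1%}")
print(f"Sampling frame:   {frame_share(groups):.1%}")
```

With these assumed inputs the challenger trails in the electorate but leads among the people the poll can reach, and a larger mailing would not change that.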

Self-Selection Bias (or Non-Response Bias)

Another criticism of the Literary Digest poll, although it's impossible to know its effects, was self-selection or non-response bias. Only 24% of those polled returned the "ballot paper". Were these disproportionately Landon or Roosevelt supporters? Were those supporting Roosevelt more likely to view the poll as worthless and not bother to respond? It's impossible to tell.

[We note as an aside that today such a response to a piece of direct mail would be viewed as an overwhelming success: a response rate of 2.4% would now be regarded as good.]

Self-selection or non-response is always a problem.

A charity (which I will not name) conducted a survey to determine the frequency of sexual assaults on female athletes. It reported that, according to its survey, over 75% of female athletes had been sexually assaulted. The problem: it was a mail survey with a 3% response rate. Athletes who had been sexually assaulted were more likely to reply than those who hadn't. All we can safely say is that at least 2.25% of surveyed female athletes reported they had been sexually assaulted (75% of the 3% who replied). The actual number could have been anywhere between 2.25% and 99.25%, depending on what the 97% who did not reply would have said.
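The 2.25% and 99.25% figures follow from two extreme assumptions about the people who did not reply: either none of them would have answered "yes", or all of them would have. A minimal sketch of that calculation, using a hypothetical helper name and treating "over 75%" as exactly 75%:

```python
def nonresponse_bounds(response_rate, yes_share_among_respondents):
    """Return (lower, upper) bounds on the 'yes' share among everyone surveyed.

    Both arguments are fractions between 0 and 1.
    """
    if not (0.0 <= response_rate <= 1.0):
        raise ValueError("response_rate must be between 0 and 1")
    if not (0.0 <= yes_share_among_respondents <= 1.0):
        raise ValueError("yes_share_among_respondents must be between 0 and 1")

    confirmed_yes = response_rate * yes_share_among_respondents
    non_respondents = 1.0 - response_rate

    # Lower bound: no non-respondent would have answered yes.
    # Upper bound: every non-respondent would have answered yes.
    return confirmed_yes, confirmed_yes + non_respondents


lower, upper = nonresponse_bounds(0.03, 0.75)
print(f"{lower:.2%} to {upper:.2%}")  # 2.25% to 99.25%
```

The width of that range is simply the non-response rate, which is why a 3% response rate leaves the survey almost uninformative about the group as a whole.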

Sample Size

"In a recent survey, 75% of Americans said they would vote for Clinton." It's perfectly true. Of the last four US voters I spoke to, three indicated that they would vote for Clinton and one for Trump.

So based on my survey should Trump supporters give up on the presidency as a lost cause? Obviously not.

A key question to ask of any survey is "What was the size of the sample?" In this case, the sample size was 4. The good news for Mr. Trump is that there is a 31% chance of getting this result or worse even if the actual chances are 50:50. The sample size is just too small to accurately predict anything. Legitimate polls will tell us the sample size ("1,000 voters"), how these voters were selected ("telephone subscribers in Seattle"), and the confidence interval in the result ("plus or minus three percent 19 times out of twenty").
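Both figures in this section can be checked with a short calculation. The sketch below assumes a simple random sample; the 31% comes from the binomial distribution (three or more Clinton answers out of four when each answer is a coin flip), and the "plus or minus three percent" comes from the usual normal approximation at 95% confidence ("19 times out of twenty"). The function names are hypothetical.

```python
from math import comb, sqrt


def tail_probability(n, k, p=0.5):
    """Probability of at least k 'successes' in n independent answers."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p from n respondents.

    Uses the normal approximation; p=0.5 gives the widest (worst-case) margin.
    """
    return z * sqrt(p * (1 - p) / n)


# Four voters, three or more for Clinton, true support split 50:50.
print(f"{tail_probability(4, 3):.2%}")  # 31.25%

# How the margin shrinks as the sample grows.
for n in (4, 100, 1000):
    print(n, f"+/- {margin_of_error(n):.1%}")
```

For 1,000 respondents the margin works out to roughly three percentage points, matching the wording pollsters use. For four respondents the normal approximation is not really valid, but the result (a margin close to fifty points) makes the same point as the binomial calculation.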

Manipulation by the Pollster

In the brilliant political satire Yes, Prime Minister, there is a wonderful scene that explains one method of manipulating poll results if you are the person designing the poll. Faced with the British Prime Minister seeking to introduce National Service (conscription) because a poll suggests widespread support for the idea, the Permanent Secretary (Sir Humphrey) explains to the Prime Minister's Principal Private Secretary (Bernard) how to produce a contradictory poll result. If you've never seen it, I suggest you watch the YouTube clip now. Even though you know what is happening, I bet you found yourself mirroring Bernard's answers to the poll questions.

This is the least subtle form of manipulation, using context provided by previous questions to get the desired result.

However, there are more subtle forms of manipulation. Psychologists have repeatedly shown that people can easily be affected ("anchored") by irrelevant information (such as the last two digits of a social security number) presented to them just before a question is asked, as well as by what they believe similar (or dissimilar) people have answered to the same question.

Manipulation by the Poll Subjects

It's not always the pollster doing the manipulation. Sometimes it's the people providing the answers:

Lying.
In face-to-face or telephone polling, it is generally recognized that answers viewed as unpopular will be suppressed. People will lie to avoid the embarrassment of holding an unpopular view. Candidates or views that are unpopular tend to be under-represented as a result.
Cheating.
I have a confession to make. I once rigged an online poll. The poll was to choose the name of a new in-house magazine. It was a simple matter of modifying a web browser's cookies to allow me to vote multiple times. I probably overdid this a bit, as more people voted for my choice than worked for the company, but I still won. Vote early, as the saying goes, and vote often.
Collusion.
Special interest groups have been known to manipulate the results of online polls to fit their agenda. If a member identifies a poll relevant to the group's aims, it is circulated to all the members, who then register on the web site and take part in the poll. Members of the anarchic 4chan website are notorious for colluding to manipulate online polls - (mostly) for the fun of it. Notable examples include manipulating Time Magazine's Top 100, and trying to get Taylor Swift to play a concert at a school for the deaf. The 4chan (NSFW) and Reddit websites are widely credited with fixing the results of online polls after the presidential debates.

Conclusion

Polls can only be trusted if you know a lot about how they were conducted, and about the measures put in place to prevent abuse by either the pollsters or the participants. Polls produced by partisan organizations or websites are easily recognized as worthless because of the opportunities and incentives for abuse, but even honest polls may go badly wrong if there is significant selection bias or participant manipulation in the results.

So when assessing political risk (e.g. the chances of Trump or Clinton winning the US presidential election), be careful about trusting polls too much. For US politics, the FiveThirtyEight website is a good place to start. (You can find a lot of additional information on their methods in Wikipedia.) For UK politics and policies, it's worth looking at the YouGov website. Their post-mortem on the Brexit poll results makes interesting reading.

And if you disregard all poll results as being meaningless, remember that our own assessments are being influenced by confirmation bias and our personal filter bubble. Polls may be bad, but in the absence of data our own opinions may be badly distorted too.

Michael Z. Bell

6 November 2016