How to Read a Poll
Jun 18th, 2008 | By Jonathan Golob | Category: 2008, Featured Articles, Stats
As we approach November, I anticipate a tidal wave of blog posts on polls. Reading the polling data improperly is hazardous to your health. The disconnect between the polling and the 2004 election results nearly resulted in my death. Avoid my mistakes.
1. Remember that polls are always of a population that may or may not resemble who actually goes to the polls. Only pay attention to polls that randomly select respondents. Consider how the poll selects the respondents.
For example, almost all polls used in the presidential race are based on random telephone surveys of landline telephones. I only have a cell phone. Therefore, I am not part of the statistical population being surveyed.
Thus, even if the poll is perfect, it might not reflect the reality at the polls in the fall, as the populations might not match.
2. A poll only shows a statistically meaningful difference between two candidates if the difference between them is more than twice the margin of error. Most political polls in the United States are designed to have a margin of error of +/- 3%. Therefore, the difference between the candidates must be greater than 6% to be anything other than a tie.
A margin of error of 3% tells us that the true percentage in the population has a 95% chance of lying within three percentage points above or below the number reported by the survey.
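Where does a figure like 3% come from? As a rough sketch, assuming a simple random sample and the textbook normal approximation (real pollsters apply weighting and other adjustments that this ignores), the 95% margin of error depends mostly on how many people were surveyed. The function names here are made up for illustration.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level (normal approximation)


def margin_of_error(sample_size, proportion=0.5):
    """Approximate 95% margin of error for a simple random sample.

    proportion=0.5 is the worst case and gives the widest margin.
    The result is a fraction, so 0.03 means 3 percentage points.
    """
    return Z_95 * math.sqrt(proportion * (1 - proportion) / sample_size)


def sample_size_for(margin, proportion=0.5):
    """Smallest sample size whose margin of error is at most `margin`."""
    return math.ceil((Z_95 / margin) ** 2 * proportion * (1 - proportion))


# Margin, in percentage points, for a poll of 1,000 respondents.
print(round(margin_of_error(1000) * 100, 1))

# Respondents needed for a margin of 3 percentage points.
print(sample_size_for(0.03))
```

Under these assumptions, a poll of roughly a thousand respondents lands at about a three-point margin, and shrinking the margin further gets expensive fast because the error falls only with the square root of the sample size.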
For example, the Rasmussen poll of Michigan voters from June 9, 2008 has Obama at 45% and McCain at 42%. Statistically, they are tied: the actual percentage of the population for Obama ranges from 42% to 48%, and for McCain from 39% to 45%. The ranges overlap, and therefore we cannot say that one is leading the other.
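The overlap check is simple enough to write down. This is a minimal sketch of the rule of thumb described here, using the Michigan numbers; the helper names are hypothetical.

```python
def interval(percent, margin):
    """Return the (low, high) range implied by a reported percentage."""
    return (percent - margin, percent + margin)


def is_statistical_tie(a, b, margin):
    """True when the two candidates' ranges overlap or touch.

    This is the same as saying the gap between them is no more
    than twice the margin of error.
    """
    low_a, high_a = interval(a, margin)
    low_b, high_b = interval(b, margin)
    return low_a <= high_b and low_b <= high_a


print(interval(45, 3))                # Obama's range
print(interval(42, 3))                # McCain's range
print(is_statistical_tie(45, 42, 3))  # do the ranges overlap?
```

With a three-point margin, the candidates would need to be more than six points apart before this check stops calling it a tie.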
Another fun thing to consider: 95% confidence means that for about one in twenty polls, the true population percentage will fall outside this range.
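To put a number on that, here is a back-of-the-envelope sketch. It assumes every poll is independent and that each one really does hit its advertised 95% confidence level, neither of which is guaranteed in practice.

```python
CONFIDENCE = 0.95  # advertised confidence level of each poll


def expected_misses(num_polls):
    """Average number of polls whose range misses the true value."""
    return num_polls * (1 - CONFIDENCE)


def chance_of_at_least_one_miss(num_polls):
    """Probability that at least one of num_polls independent polls misses."""
    return 1 - CONFIDENCE ** num_polls


print(round(expected_misses(20), 2))
print(round(chance_of_at_least_one_miss(20), 2))
```

So if you read twenty polls, you should expect about one of them to be off by more than its margin of error, and the odds that at least one is off are better than even.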
The practical meaning of all this? Beware of looking at poll results selectively! If you are selective enough, you will see only the error you want to see. Net result? Suicidal thoughts in November.
3. Often the real trends are smaller than the error ranges of the surveys. We can employ two math tricks to make things better.
First, we can aggregate many surveys together and take the average of their percentages. Provided the surveys are independent of one another (that is, the results of one survey don’t affect another), the random errors tend to cancel, so the average has a smaller margin of error than any single poll, and by the central limit theorem its error distribution is closer to normal.
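As a rough illustration of the first trick: if we assume the polls are independent, unbiased, and all carry the same margin of error, the margin of their average shrinks with the square root of the number of polls. The poll numbers below are made up, not real survey results, and the function names are hypothetical.

```python
import math


def average(percentages):
    """Plain average of the reported percentages."""
    return sum(percentages) / len(percentages)


def combined_margin(single_margin, num_polls):
    """Margin of error of the average of num_polls independent polls.

    Assumes every poll has the same margin and no shared bias.
    """
    return single_margin / math.sqrt(num_polls)


polls = [45, 47, 44, 46]  # made-up percentages for one candidate

print(average(polls))
print(combined_margin(3, len(polls)))
```

Averaging four such polls cuts a three-point margin in half. It does nothing, however, for an error that every poll shares, such as all of them missing cell-phone-only voters.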
The second trick is to use moving averages as a mathematically safe way to sort out random ups-and-downs in the poll numbers from the real longer term changes in the sampled population.
Think of how much your weight changes each day, depending on when you last went to the bathroom, how much water you’ve drunk and so on. The change on a day-by-day basis is far larger than what you’ll typically gain or lose in a week. So, if you measure your weight each day, and then average together the last seven days, you end up smoothing out most of the day-to-day variance. Left behind is the actual change on a week-long basis. We can use the same math on the polls.
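Here is a minimal sketch of that seven-day average. The daily weights are invented for the example, and a real poll tracker would probably weight polls by date and sample size rather than treat them all equally.

```python
def moving_average(values, window=7):
    """Trailing moving average.

    Each output is the mean of `window` consecutive values, so the
    result has len(values) - window + 1 entries (none if the list
    is shorter than the window).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        averages.append(sum(chunk) / window)
    return averages


# Made-up daily weights, in pounds, bouncing around by a few pounds a day.
daily_weights = [180, 182, 179, 181, 183, 180, 182, 181, 179, 180]

for value in moving_average(daily_weights):
    print(round(value, 1))
```

The raw numbers swing by up to four pounds from one day to the next, while the averaged series moves by only a fraction of a pound. Swap the weights for a candidate's poll percentages and the same function applies.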
Quite a few websites do basically all of this for us: they limit themselves to polls with some statistical rigor, base their analysis on the confidence intervals, and aggregate multiple polls together in a moving average. None are perfect, but I’ve taken a shine to electoral-vote.com for its non-commercial goodness and openness. I think the site is too aggressive in calling states (Michigan is listed as barely Obama, and I think it should be a toss-up), but overall it’s a decent place to start.