Update

The polls still disagree on Quebec Solidaire support

Ask (on Twitter) and you shall receive!

Joseph Angolano, Mainstreet vice-president, sent me the link to the full report, so I got to update my voting intention graphs (with error bars!) before the undecided are allocated.

Voting intentions before distribution in polls completed between 7 and 10 September 2018

I’ve only included the last Mainstreet and Léger polls because those of CROP, Ipsos, and Forum were conducted before the campaign had even officially started.

You can see that all results are compatible (the error bars overlap) except for Quebec Solidaire:

Voting intentions for QS before distribution in polls completed between 7 and 10 September 2018

Indeed, none of the error bars touch the 12.5% line.

Last night, when the Léger results were announced, Too Close To Call’s Bryan Breguet pointed out that Léger and Mainstreet disagreed on voting intentions for the CAQ and for QS:

Basically, Léger and Mainstreet agree perfectly on the PLQ and the PQ, but Léger has the CAQ 5 points higher and QS 5 points lower. Interesting that the differences are between the CAQ and QS.[1]

He comes to that conclusion using scores after distribution. As we have seen, before distribution both polling firms agree that CAQ voting intentions lie somewhere between 27% and 31%.

In this morning’s blog post, Bryan expanded on the topic. He returned to the difference between Mainstreet’s local and province-wide polls, which he had mentioned in yesterday’s blog post:

Mainstreet and Léger actually agree perfectly on the PLQ and the PQ. However, they have very different numbers for the CAQ and QS. Mainstreet has these parties at 31% and 16% respectively, whereas Léger has them at 35% and 11%. A difference of 4-5 points for each party. Who’s right? Impossible to say for sure, but Mainstreet’s riding polls are much more consistent with a Coalition at 35%-36% and QS at 11%. So I’d be tempted to say that Léger is possibly right here. But we’ll have to wait for other polls (and, in fact, the election) to be sure.[2]

If you can read French, I highly encourage you to read his blog post on the disagreement between Mainstreet’s province-wide and riding polls. To entice you to read the whole thing for yourselves, here’s the table he comments on:

Average party results in the 31 ridings with local polls
Source: Breguet, Bryan. “Les sondages par circonscription indiquent un raz de marée CAQ. Ont-ils raison?” Too Close To Call (blog), 10 September 2018.

So let’s recap. On the one hand, province-wide Léger and Mainstreet polls disagree on QS support. On the other hand, the results of Mainstreet’s riding polls fit better with the picture painted by Léger (CAQ higher, QS lower).

Source data

You can access the spreadsheet from which the charts were generated on Google Spreadsheets.

Notes


Data visualization and collection mode

I left you hanging last Friday when I promised a new data visualization of the most recent polls. To refresh your memory, the margin of error depends on the score in the poll (it increases when the score gets closer to 50%) and the sample size (one goes up while the other goes down). It does not depend on the size of the population of which you want to know the opinion.
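To make those two dependencies concrete, here is a minimal sketch in Python. It assumes a simple random sample and the usual normal approximation; the function name `margin_of_error` is mine, and the scores other than 6% are illustrative rather than taken from any poll. The sample sizes are ones that come up in these posts.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a share p observed among n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# Same sample size, different scores: the margin widens as the score nears 50%.
for p in (0.06, 0.20, 0.35, 0.50):
    print(f"score {p:.0%}, n = 1010 -> ±{margin_of_error(p, 1010):.1%}")

# Same score, different sample sizes: the margin narrows as the sample grows.
for n in (150, 525, 1010, 2650):
    print(f"score 35%, n = {n} -> ±{margin_of_error(0.35, n):.1%}")
```

Note that the size of the population never appears as an argument, which is the point made above.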

As promised

I made a graph similar to the one in Qc125 (with margins of error this time) for the last three polls in that Qc125 diagram. I added the Forum poll (conducted on 23 August with 965 respondents) and the last Léger (conducted from 24 to 28 August with 1010 respondents).[1]

I first tried to do it in Google Spreadsheets, so you could access the file and check everything out. However, I could only add an error bar that was either a constant or a percentage. As we saw on Friday, polling margins of error are a bit more complicated than that.

I also tried with Excel and its open-source equivalent LibreOffice but bumped into the same problem: there was no way of defining a different error bar for each point. It doesn’t come as much of a surprise, then, that there are so few representations of polling data with margins of error.

I had managed just fine by using candlestick charts (used to describe movements in the stock market), but Martin objected that they were ugly. Hence, to please the half of our tandem in charge of graphics, I pulled out the big guns and programmed the graph in R, an open-source environment for statistical analysis.

After too many hours fiddling about, here’s what I got [2]:

Voting intentions before distribution in polls completed between 20 and 28 August 2018

Each point situates the party’s score in the poll. The vertical line contained within the two horizontal lines describes the confidence interval if you take into account the margin of error at 95% (or 19 times out of 20). You can see that the lines higher up are longer than the lower ones. As we said at the beginning, the margin of error increases with the proportion (or rather with its proximity to 50%).

By comparing the scores of different parties vertically within a single poll, we see that:

  • in CROP, the CAQ and the Liberals are statistically tied;
  • in Forum, the Liberals are statistically tied with the PQ instead (with the CAQ way ahead);
  • in Léger, voting intentions for the CAQ and the Liberals overlap and are therefore statistically tied as in CROP.
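The "statistically tied" reading above boils down to checking whether two confidence intervals overlap. Here is a rough sketch of that check in Python. The function names and the scores in the example are hypothetical (they are not the CROP, Forum, or Léger numbers), and the overlap rule mirrors what you do by eye on the chart rather than a formal test of the gap between two parties.

```python
import math

def confidence_interval(p, n, z=1.96):
    """Lower and upper bounds of the 95% interval for a share p among n respondents."""
    moe = z * math.sqrt(p * (1 - p) / n)
    return p - moe, p + moe

def statistically_tied(p_a, p_b, n):
    """True when the two parties' intervals overlap within the same poll."""
    low_a, high_a = confidence_interval(p_a, n)
    low_b, high_b = confidence_interval(p_b, n)
    return low_a <= high_b and low_b <= high_a

# Hypothetical scores in a poll of 1,000 respondents.
print(statistically_tied(0.32, 0.30, 1000))  # two points apart
print(statistically_tied(0.38, 0.28, 1000))  # ten points apart
```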

Differences in data collection mode

Too Close To Call’s Bryan Breguet looked into QS’s diverging polling scores in a blog post last Thursday. He was troubled by the fact that the disagreement follows data collection lines:

Voting intentions for QS before distribution in polls completed between 20 and 28 August 2018

You can see that three polls place the party below 10% and two above (the ones that use IVR or robocalls). More importantly, the results of these two groups don’t overlap, even if we take into account the margin of error. (None of the horizontal bars touches the 10% line.)

Mainstreet and Forum use IVR and get significantly higher results than CROP and Léger, which poll online, and Ipsos, which combines online polling with good old live callers: humans talking to other humans over the phone to ask them polling questions.

Same old?

Bryan ran 10,000 simulations and came to the conclusion that either Mainstreet or Léger was wrong. He assumed that “real” voting intentions for QS were at the 10% mark and simulated polls with a sample size of 1,010 respondents, as was the case in Léger.

On the horizontal axis are voting intentions for Québec Solidaire (centred at 10% because that’s his starting assumption). On the vertical axis is the number of simulations for which QS got a given score.

Distribution of 10,000 simulations with QS at 10% and a sample size of 1,010
Source: Breguet, Bryan. “@Alex_Blanchet Voici la distribution de 10000 simulations avec QS à 10% et taille d’échantillon de 1010. Est-ce possible d’etre sous les 8? oui mais peu probable. Et être au-dessus de 13 (avec taille ech de 2650) est quasi impossible”. Tweet. @2closetocall, 30 August 2018.

Léger has QS at 6%, but we see very few simulations peg the left-wing party under 7%. For Mainstreet, Bryan uses data from the nightly polls (available through a paid subscription). Québec Solidaire was at 13.1% at the time (it has since smashed the 15% barrier). Once again, nearly no simulations at all came up with such a high result.
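For the curious, this kind of simulation can be sketched in a few lines of Python. This is not Bryan's code, only my guess at the general idea: every simulated respondent supports QS with a 10% probability, and we count how often a whole simulated poll lands as low as Léger's 6% or as high as Mainstreet's 13.1%. The function names are mine, the 2,650 sample size is the one mentioned in his tweet, and I run 1,000 simulations instead of his 10,000 to keep it quick.

```python
import random

def simulate_poll_scores(true_share, sample_size, n_sims, seed=2018):
    """Return the share observed in each of n_sims simulated polls."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    scores = []
    for _ in range(n_sims):
        # Each respondent independently supports the party with probability true_share.
        supporters = sum(rng.random() < true_share for _ in range(sample_size))
        scores.append(supporters / sample_size)
    return scores

def fraction(scores, condition):
    """Fraction of simulated polls whose score satisfies the condition."""
    return sum(1 for s in scores if condition(s)) / len(scores)

N_SIMS = 1000  # Bryan ran 10,000; fewer keeps this sketch fast

leger_like = simulate_poll_scores(0.10, 1010, N_SIMS)
mainstreet_like = simulate_poll_scores(0.10, 2650, N_SIMS)

print("At or below 6% with n = 1,010:", fraction(leger_like, lambda s: s <= 0.06))
print("At or above 13.1% with n = 2,650:", fraction(mainstreet_like, lambda s: s >= 0.131))
```

Plotting a histogram of `leger_like` should give a bell shape centred on 10%, much like the one in the tweet.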

An effect limited to QS voting intentions

When we turn to the other parties, we see that there is no systematic bias according to the data collection mode.

Using IVR, Forum places the CAQ and the PQ way ahead of other pollsters, beyond the margins of error.

Voting intentions for CAQ before distribution in polls completed between 20 and 28 August 2018

Voting intentions for PQ before distribution in polls completed between 20 and 28 August 2018

In the case of the Liberals, CROP is the pollster that pegs them uncharacteristically high.

Voting intentions for Liberals before distribution in polls completed between 20 and 28 August 2018

We’ll therefore be keeping a close eye on how the differences in scores between pollsters evolve. They only seem to matter when trying to determine the composition of the National Assembly, because it seems that we already know which party would take over the government if the election were held today: Too Close To Call’s Sunday post discusses the CAQ’s better than 99% chance of winning.

Source data

You can access the spreadsheet from which the charts were generated on Google Spreadsheets.

Notes


A tale of disagreeing polls: margin of error edition

It seems that Bryan Breguet replied a tad too quickly to Marc-Antoine Berthiaume’s Tuesday tweet pointing out the enormous difference between Léger and Mainstreet polls regarding Quebec Solidaire support amongst voters aged 18 to 34. (To find out what the heck I’m talking about or to refresh your memory, read my Wednesday post, “Younger voters and polling variability.”)

Mainstreet bigwigs have launched a campaign on Twitter to assert just how confident they are about their polling results (claiming in passing that Léger’s are out of whack). Here is one of their most recent tweets:

We are seeing real growth for QS over the last few days. We feel it anecdotally, and we see it in our “sondages nocturnes.” Something is happening.[1]

(I suppose that by “sondages nocturnes,” the Mainstreet vice-president means “nightly polls.”)

Observing the debate on Twitter, Suzanne Lachance, a former spokesperson for the Rassemblement pour l’alternative progressiste (RAP), one of Quebec Solidaire’s grandparents, summed up the situation:

Well, on top of the politicians’ squabbles, we’re now treated to pollsters’ squabbles… 😉[2]

Bryan’s simulations

To settle the matter, Bryan ran 20,000 simulations, starting from the assumption that “actual” support for QS in that age group is in fact the average of the polls’ scores: 18.4%. He posited a sub-sample size of 150 respondents (the size of Léger’s sub-sample).

He found that it was highly improbable, though not completely impossible, that, if QS is actually at 18.4% amongst voters aged 18 to 34, one poll would get 8% and another would get 25.9%. The bar chart below shows the number of polling simulations (vertical axis) for which a given score (horizontal axis) was reached for QS support with young people aged 18 to 34.

Simulations of voting intentions for Québec Solidaire among 18- to 34-year-olds
Source: Breguet, Bryan. “Mais possiblement en raison des faibles tailles d’échantillons (150 chez Léger, 525 chez Mainstreet, les autres entre les deux). J’ai fait 20,000 simulations avec #QS en centrant à la moyenne des sondages (18.4%). Taille d’échantillon théorique pour les simulations: 150.” Tweet. @2closetocall, 30 August 2018.

He came to the conclusion that one of the two polls is probably out of whack (but there’s no way of knowing which one because there would need to be an election right now, not in a month’s time).

Actually, support for QS amongst voters aged 18 to 34 must be either higher or lower than 18.4%. If it were higher, the curve would be shifted to the right, and the Mainstreet score (25.9%) would no longer be as improbable. In contrast, if it were lower, the curve would be shifted to the left, and the Léger score (8%) would no longer be as improbable.
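To put rough numbers on that shift, here is a sketch in Python that swaps Bryan's simulations for an exact binomial calculation (a different technique, though it answers the same question). It keeps his sub-sample size of 150 for both polls. The alternative "true" values of 12% and 25% are hypothetical values I picked to illustrate the point, not estimates, and the function names are mine.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k supporters among n respondents when true support is p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_most(k, n, p):
    """Probability of k supporters or fewer."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

def prob_at_least(k, n, p):
    """Probability of k supporters or more."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

N = 150                              # sub-sample size used in the simulations
LEGER_COUNT = round(0.08 * N)        # 8% of 150 respondents
MAINSTREET_COUNT = round(0.259 * N)  # 25.9% of 150 respondents

for true_share in (0.12, 0.184, 0.25):  # 12% and 25% are hypothetical
    low = prob_at_most(LEGER_COUNT, N, true_share)
    high = prob_at_least(MAINSTREET_COUNT, N, true_share)
    print(f"true support {true_share:.1%}: "
          f"P(Léger-like or lower) = {low:.4f}, "
          f"P(Mainstreet-like or higher) = {high:.4f}")
```

Moving the assumed true value down makes the Léger-like result more likely and the Mainstreet-like one less likely, and vice versa.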

Léger and Mainstreet are the extremes, but neither one nor the other is completely isolated, as can be seen in this bar chart of QS voting intentions for 18- to 34-year-olds:

Bar chart of QS voting intentions among 18- to 34-year-olds for each of the five polling firms
Source: Breguet, Bryan. “Ok, dernier regard sur les 18-34 ans pour @QuebecSolidaire et les différences entre sondeurs. Tout d’abord, voici le score de QS aprmi les 18-34 ans (électeurs décidés et enclin) chez les 5 firmes.” Tweet. @2closetocall, 30 August 2018.

Here’s how Bryan sums up the situation:

In conclusion: the differences observed between pollsters for QS among 18- to 34-year-olds cannot be completely explained by margins of error and sample sizes. There is something else. That said, I admit I don’t have an explanation at the moment.[3]

So what’s this margin of error he’s talking about? Is it always ±3 points, 19 times out of 20?

What factors into the margin of error

Ok, so I’m going to include a formula for those for whom it makes life easier, but don’t worry, I’ll jump directly to the implications.

The margin of error at the 95% level (hence 19 times out of 20) is 1.96 standard deviations or:

$$\text{margin of error} = 1.96 \sqrt{\frac{p(1-p)}{n}}$$
where p is the proportion (the percentage for that answer in the poll: 8% in Léger and 25.9% in Mainstreet) and n is the sample size (the number of respondents).

That means that:

  • The margin of error is not dependent on the size of the population you want to study. Whether you want to find out the opinion in a single riding or in the entire province of Quebec does not affect the margin of error of a given poll.

In other words, it’s not because you’re studying a smaller population that you can settle for a smaller sample: the margin of error depends on the sample size, not the size of the population.

  • The margin of error goes up when the sample size goes down (that’s much more intuitive).
  • The margin of error also depends on the poll result (the proportion): the lower the percentage (or, more accurately, the further away from 50%), the smaller the margin of error. It’s therefore not always ± 3 (or the margin of error given at the beginning of the poll), 19 times out of 20.

The confidence interval spreads from the value of the percentage minus the margin of error to the value of the percentage plus the margin of error.
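As a worked example, here is the arithmetic for the two 18-to-34 results, using the standard formula for a proportion, 1.96 × √(p(1 − p)/n). The sub-sample sizes (150 for Léger, 525 for Mainstreet) are the ones Bryan cites in his tweet; the rounding is mine.

$$
\begin{aligned}
\text{Léger:}\quad & 1.96\sqrt{\frac{0.08 \times 0.92}{150}} \approx 1.96 \times 0.0222 \approx 0.043
&&\Rightarrow\ \text{roughly } 3.7\% \text{ to } 12.3\% \\
\text{Mainstreet:}\quad & 1.96\sqrt{\frac{0.259 \times 0.741}{525}} \approx 1.96 \times 0.0191 \approx 0.037
&&\Rightarrow\ \text{roughly } 22.2\% \text{ to } 29.6\%
\end{aligned}
$$

Neither margin is ±3 points, and the two confidence intervals don't come anywhere near each other.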

Visualizing the margin of error

Qc125 charts presenting polling results do not show the margin of error and give the impression that they show a variation across time (with the line joining the observations). I don’t like these data visualization decisions.

Comparison of voting intentions from 10 to 21 August 2018 as presented by Qc125
Source: Fournier, Philippe J. “La CAQ se maintient en territoire majoritaire.” L’actualité, 27 August 2018.

At least, the visualization contains all the information needed to calculate the margins of error for each observation: the percentage (p) is written in the circles and the sample size (n) is at the bottom of each “column” (on top of the data collection mode and field dates, which don’t influence the margin of error[4]).

In my next post, I’ll offer you a slightly different way of visualizing poll results and dig deeper into the differences between polling firms.

Notes