Statistricks, part 5: the remainder

“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write” – Samuel S Wilks, 1951

Authority figures can no longer be trusted to tell the truth. And since most of the news media is now in the hands of private owners with conspicuous agendas, and the few remaining outlets with a shred of integrity are running on fumes, journalists can no longer be relied upon to catch those authority figures out.

Which means there’s really only one gatekeeper left to protect you from disinformation: you.

The lies we’re most familiar with – and therefore the best at seeing through – are the verbal kind: sequences of words that bend, break or obfuscate the truth. But as I hope my last few posts have illustrated, those who wish to mislead us are just as adept at manipulating sequences of numbers, and it turns out we’re not half as good at spotting that.

This is a problem. And since these lies have real, measurable impact (if they didn’t, no one would bother lying), it’s your problem.

No one’s asking you to sign up for a master’s in statistics. You just need to know enough to be able to spot the red flags. So this, my final post on the subject, will be a recap of my previous warnings, plus a wee list of other common examples of statistical chicanery.

Beware big numbers

We can all easily imagine what 10 items looks like. With a bit of effort, 100. And most of us can probably conjure a vague picture of 1,000 things. But when it comes to millions and billions and trillions, our mental gearboxes just seize up. This is what the propagandists are counting on.

The example I chose, because it is arguably one of the best known and certainly one of the most damaging, was the “£350m a week” claim by the Brexit campaign.

The Remainers were quick(ish) to point out the falsehood. But the battle was already lost. People were no less outraged by the true figure of £150m a week, because all that mattered was that it was a bafflingly large amount.

Big numbers in isolation are meaningless to the average person. To get an idea of their true significance, we need context: in this case, the cost of things of a comparable scale, like, say, the NHS budget (£2bn a week), or defence spending (£1bn a week). Most of all, we need to know exactly what that money bought. Sure, EU membership cost a lot of money, but did it offer value for that money?

Since its wildly successful field test in the Brexit debate, this tactic has been deployed on a daily basis. Whenever something within state competence is revealed as being even slightly less than ideal, the response from the state press officers is the same: trot out a big number.

“A spokesperson for the DfE said education was a top priority for the government, with an extra £2bn for schools for each of the next two years included in the autumn statement.”

Ooh, two billion! That’s a lot! Everything must be fine then.

But without the proper context, this means nothing. An extra £2bn on top of what? Was the annual increase in base funding, if it existed at all, in line with inflation? How does the total compare with the funding levels last year, or 10 years ago? How much is being spent per pupil, and how does this compare with other countries’ efforts? Most importantly of all, is this money enough to meet the current needs of the education system?

To sum up, don’t let your brain switch off when it sees big numbers. If anything, it should move to high alert.

Be on your guard against glitter

Advertisers have long made liberal use of “glitter” – words or phrases that make things sound superficially attractive, but are devoid of substance. Two of the more popular zingers are “more than” and “over”. I once saw a billboard ad for a breakfast cereal that proudly proclaimed: “Contains more than 12 vitamins!”

The reason this works (on the unwary, anyway) is the anchoring effect: the tendency of the human brain to evaluate everything with reference to the first value it encounters. In this case, the anchor value is 12, and “more than 12” signals the set of all numbers greater than 12 – loads! – when a moment’s reflection will tell us that the true figure is 13.

Now it seems politicians and journalists have learned a trick or two from copywriters, and no figure is deemed complete unless it comes with a side of comparatives or superlatives.

Recently, in the course of my subediting duties, I happened across an (unedited) article containing the line “the family were awarded over £8,129 11s 5d in reparation”. My God! Are you telling me those lucky sods received more than 8,129 pounds, 11 shillings and five pence in compensation?

Another word that sounds great but never survives scrutiny is “record”.

“That is why, despite facing challenging economic circumstances, we are investing a record amount in our schools and colleges.”

Well, Department for Education, I should hope you were investing a record amount every year, given that the population rises every year and that inflation is a thing.

One of the truth-twisters’ favourite buzzwords in the early days of Brexit was “fastest-growing”. Never mind those tired old European countries; we’re going to concentrate on trading with countries that actually have a future!

Here again, crucial context is missing, and the context is that these wonderful new trading partners are growing so fast because they’re starting from a much lower base. As even one prominent Brexit advocate once admitted (about a year before it became their favourite go-to gotcha), the real meaning of “fastest-growing” is “tiny”.

“Of course, if you start from nothing, it’s not hard to become the ‘fastest-growing’ campaign” – Isabel Oakeshott, 20/11/2015

Look at the IMF’s predictions for 2024.

The top five performers on this metric are Guyana (GDP $15bn), Macao ($24bn), Palau ($233m), Niger ($15bn) and Senegal ($28bn).

The GDP of the EU (even without the UK that it desperately needed to survive) is $17.2 TRILLION. That’s more than 200 times the GDP of those five countries combined. Not to mention that they’re all a lot closer and they make a lot more things that British people actually want to buy. Who is it more important to have barrier-free trade with?

Reporters and politicians are still making this same blunder today (“Next PM likely to inherit improved economy after UK growth revised up”).

If this were a sustained trend, it might tell us something significant. But the period over which the data was measured is three months. This is more likely just a course correction after a rough patch for the UK economy than a sign of sunlit uplands. At the very least, we should wait a while before leaping to any conclusions.

Be vigilant with visuals

Graphical representations of information – data visualisations, or datavis – are useful ways of communicating a lot of information quickly. And because creating them requires a modicum of expertise, they are often deployed as gotchas: “Quiver, mortal, as I blow your puny argument out of the water with my BAR CHART!”

The trouble is, in the wrong hands, datavis is as susceptible to abuse as any other mode of expression.

Be sceptical of surveys

Polling firms are businesses. Businesses serve the needs of customers. And customers have political or commercial interests, which do not necessarily align with yours, or society’s. (Moreover, it seems an increasing number of polling firms have agendas of their own.)

Pollsters regularly use samples that are too small, fail to publish their methodology, and use daft or leading questions. Even broadly decent organisations like the WHO are not above such silliness.

One of the questions in one such WHO survey was “Have you ever tried alcohol?” 57% of 15-year-olds in the UK said they had. The WHO then quoted this answer, in the press release (which is all most time-strapped journalists ever read), under the heading “Alcohol use widespread”.

Suddenly, sipping a shandy once on a family visit to a pub garden is lumped in together with downing a bottle of Jack Daniel’s a day. Furthermore, we have no way of knowing whether these answers were completely honest. How many British 15-year-olds would be embarrassed to admit they’d never tried booze?

Polls can be tools to shape opinion as much as reflect it, first because they can influence government policies, and second because waverers in the general populace can be won over to what they perceive to be the majority view.

I could caution you to be wary of surveys that aren’t upfront about their methodology, surveys with a small sample size, surveys conducted by firms with murky political connections, or surveys whose funding is not declared. But to keep things simple: ignore polls.

Are those figures really significant?

Something else that should set the alarm bells ringing, along with big numbers, is long strings of numbers, as seen in this article.

“The data released on Monday, from the Chinese ministry of public security, showed the number of new birth registrations in 2020 was 10.035 million, compared with 11.8 million in 2019.”

The second figure in this sentence is expressed with three significant figures: 1, 1, 8. So why is the first given to five significant figures? Did data collection methods become a hundred times more reliable in a year?

Most sums bandied around in the public domain – especially those derived from polls, but also anything involving average values, like fuel prices, which are also estimated using samples – are only approximations to begin with. That is, the true value may deviate from the estimated value by 1% or more.

Say 78.5% of 1,000 people surveyed think Dominic Cummings is a giant Gollum-faced twat, and about a third of those want to punch him in his stupid Gollum face. A sizeable proportion of reporters these days would whip out their calculators and proudly conclude that 26.1666666% of all people want to assault Specsavers Boy. While that’s mathematically precise, it’s not accurate (it can’t be, unless there’s a fraction of a person out there somewhere who wants to lay Cummings out). To say anything beyond 26% is meaningless and misleading.
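To see how little those extra digits buy you, here’s a quick sketch of the sum above (all figures hypothetical, mirroring the example):

```python
respondents = 1000
first_view = 785              # 78.5% of the sample
second_view = first_view / 3  # "about a third" of those favour the punch

share = 100 * second_view / respondents
print(share)         # 26.1666... -- mathematically precise, statistically absurd
print(round(share))  # 26 -- all the precision a 1,000-person sample supports
```

Anything past the whole-number figure is noise dressed up as knowledge.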

Similarly, if you’re performing an operation on a quantity that’s already been rounded, then it’s senseless to use more significant figures for the result.

“A slew of commercial and critical hits, including The Super Mario Bros Movie, which made $1.36bn (£1.094bn) at the global box office, has led to market experts comparing them to Marvel adaptations.”

Long strings of numbers are invariably a sign of false precision. If a politician, journalist or broadcaster is being hyper-precise with their figures in this way, they’re not necessarily consciously lying to you. But they are conveying an important truth: while they may know how to type numbers on a keypad, and even use basic mathematical operations, they haven’t a clue how statistics works, and therefore can’t be trusted to properly understand, verify or convey the information they’ve been given.

On a related point, thanks to the uncertainty inherent in big data, running news stories about a “rise” or “fall” in something when the change is infinitesimal is just. Plain. Wrong.

In January 2018, the BBC published an article claiming that unemployment in the UK had fallen by 3,000 to 1.44 million.

That’s a whopping drop of 0.2%. But there’s no way there isn’t at least 0.5% room for error in these figures – so it may well be the case that unemployment has risen slightly. What you’re looking at here is not a news story; it’s a rubber-stamped government press release.
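A back-of-envelope check makes the point, assuming (as argued above) a margin of error of at least ±0.5% in the estimate:

```python
unemployed = 1_440_000
reported_fall = 3_000

relative_change = reported_fall / unemployed  # about 0.2%
margin_of_error = 0.005                       # assumed: at least +/- 0.5%

# If the change is smaller than the error margin, it's noise, not news.
print(relative_change < margin_of_error)  # True: the "fall" is within the noise
```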

Why aggregates don’t add up

A few years ago, a newspaper I worked for (rightly) banned the practice of adding together jail sentences in the headlines of articles on court cases with multiple defendants. You know the sort of thing: “Members of Rochdale paedophile ring sentenced to total of 440 years”. The reasoning was that it was a) sensationalist and b) meaningless.

Because, uh, how many people were involved? (Sure, you could work it out by reading the article, but that’s an extravagance that fewer and fewer people seem willing to stretch to.) Moreover, how do those numbers break down? If 48 people were involved, did four get put away for 55 years, and the other 44 for five? Or was the punishment more evenly spread, and they got just over nine years each?

Similar practices, however, still abound in other areas.

“UK homeowners face £19bn rise in mortgage costs as fixed-rate deals expire”

Wow, that’s going to put a dent in the holiday fund! Oh wait, they mean all UK mortgagors combined. But … context. How many people even have mortgages in the UK?

Recent figures suggest about 15.5 million homes in England and Wales are occupied by their owners, of which just under half are mortgaged. (There are separate figures for Scotland and Northern Ireland, but they’re relatively small and for our current purposes can be disregarded.) That means on average, mortgage payments would rise by about £2,600 per year per household, or £217 a month. Woop. That’s how much my rent just went up by.
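The division behind those figures looks like this (the 7.3 million mortgaged households is my assumed reading of “just under half” of 15.5 million):

```python
total_rise = 19_000_000_000       # the headline £19bn
mortgaged_households = 7_300_000  # assumed: "just under half" of 15.5m homes

per_year = total_rise / mortgaged_households
per_month = per_year / 12
print(f"£{per_year:,.0f} a year, or £{per_month:,.0f} a month")
```

Divide any aggregate figure by the number of people it’s spread across before deciding whether to be impressed.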

A deeper dive into the figures reveals that fewer than a million households were facing monthly rises of £500 or more by 2026. Not half as sexy as the £19bn figure (and certainly not deserving of the lead slot on the front page of a global news provider), but twice as informative.

Unhappy mediums

People toss the word “average” around a lot, but as you may dimly recall from your schooldays, in the mathematical sphere, there are three distinct types: the mean, the median, and the mode. While they often give similar results, there’s sometimes significant divergence, and one kind of average is often more useful than another.

Take wages. Using the mean on a given group of people (adding up all the salaries and dividing that figure by the number of subjects) isn’t always terribly informative, because if the variance in wages is high, extreme figures skew the picture. Let’s say you have 10 people: two earn £10,000 a year, seven earn £20,000 a year, and one earns £200,000 a year. Calculating the mean would give you ((2 x £10,000) + (7 x £20,000) + (1 x £200,000))/10 = £36,000, which is a million miles from what any of the participants actually earn. The median, however – the figure in the middle if you line them up from smallest to largest – gives you £20,000, which is a much better reflection of the situation. (The mode – the figure that occurs most frequently – in this case gives the same result.)
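Python’s standard library will happily confirm the arithmetic on that example:

```python
from statistics import mean, median, mode

# Two people on £10k, seven on £20k, one on £200k -- the group above
salaries = [10_000] * 2 + [20_000] * 7 + [200_000]

print(mean(salaries))    # £36,000: dragged upwards by the single high earner
print(median(salaries))  # £20,000: the middle of the pack
print(mode(salaries))    # £20,000: the most common salary
```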

So it’s vital to know, when someone is talking about averages, which kind they median.

Pushing your panic buttons

Barely a week goes by without the Daily Mail’s health pages shrieking about the latest thing that gives you cancer. They’re usually drawing on a “landmark report” – that is, a press release from a no-mark university – and they’re almost always lying with numbers.

The headline “Eating bacon increases your chances of getting cancer by 18%” is quite alarming, but remember, this is a relative risk, compared with the chances of someone who doesn’t eat bacon. It turns out that the absolute probability of succumbing to cancer among non-bacon eaters is pretty low – about six in 100 will get bowel cancer in their lifetimes – so an 18% increase on that doesn’t actually represent that big a jump. The unimaginable will strike only seven in 100 bacon eaters.
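The relative-versus-absolute distinction takes two lines to check, using the rough figures above:

```python
baseline_risk = 6 / 100  # lifetime bowel-cancer risk for non-bacon eaters
relative_rise = 0.18     # the headline's scary "18%"

absolute_risk = baseline_risk * (1 + relative_rise)
print(round(absolute_risk * 100, 1))  # about 7 in 100 -- one extra case per hundred
```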

(There’s a fab and doubtless far from complete list of everything the Daily Mail says can give you cancer here, although the links are a bit screwy.)

Proportional misrepresentation

Some news organisations have improved their efforts in this department lately, but it’s a pit they still fall into depressingly often.

Before it was spotted and corrected, an article published in 2021 about the impact of Covid on education said: “While there was an across-the-board fall of a fifth in the proportion of children working at a level consistent with their age, those pupils in year 1 in 2019-20 appear to have suffered the most significant losses … 81% of year 1 pupils achieved age-related expectations in March 2020 … by the summer of 2020, this had dropped to 60%.”

The reporter is starting from the wrong baseline. The actual numbers are irrelevant, but for the sake of argument, let’s say there were 100 kids. If 81% of them (ie 81 kids) met the requirements in March and only 60% in June, that’s a fall of 21 percentage points, not 21 per cent. Comparing the new figure with the baseline, 81, gives a drop of a quarter rather than a fifth.
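Percentage points and percentages, side by side:

```python
before, after = 81, 60  # % of pupils meeting expectations, March vs summer

point_fall = before - after                # fall in percentage points
relative_fall = (before - after) / before  # fall relative to the baseline

print(point_fall)              # 21 percentage points
print(f"{relative_fall:.0%}")  # about 26% -- a quarter, not a fifth
```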

If you lack confidence in your ability to check percentages, use an online percentage checker, like this one: https://percentagecalculator.net/

Unusual? Suspect

I’m singling out the Mirror here, but virtually all the major news outlets reported this story in the same uncritical fashion. “The Royal National Lifeboat Institution has raised more than £200,000 in a single day … Its donations had increased by 2,000% from Tuesday, when it raised just £100.”

The alpha numerics among you will notice that the Mirror – and most other news providers – got their basic maths wrong here: £200,000 is an increase of not 2,000%, but almost two hundred thousand per cent on £100. But that’s not my main gripe.
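Percentage increase is one of the easiest things to get wrong, and one of the easiest to check:

```python
def percent_increase(old, new):
    """Percentage increase going from old to new."""
    return 100 * (new - old) / old

# From £100 to £200,000: nearly 200,000%, not the reported 2,000%
print(percent_increase(100, 200_000))
```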

The Mirror reporters (or should I say, the writers of the RNLI’s press release) have compared the latest figure with the figure from the day before – which ordinarily would not be a problem. However, we’re dealing here with not one, but two highly unusual days. Later in the piece, we discover that the average daily donation to the RNLI is not £100 (a very low outlier for the lifeboat folk), but £7,000 – a much more instructive figure against which to stand today’s total.

The most useful way to present the information would be “£200,000, around 30 times the average daily donations that RNLI receives”– but once again, the drive for a sexy headline has trumped all considerations of sense.

Finktanks

It doesn’t matter if it’s a study, a survey, a graph or a sweetie. Show nothing but scorn to anything that comes from a self-declared “thinktank” that refuses to declare its funding. The list currently includes, but is by no means limited to, the TaxPayers’ Alliance, the Adam Smith Institute, Civitas, Policy Exchange, the Centre for Policy Studies and the Institute for Economic Affairs. All are front organisations set up to advance the cause of neoliberal economics by whatever means necessary, and all are proven experts in weasel words, sharp practice and low-quality “studies”.

Things that should make you go “Hmm”

If you’re baffled as to why I’ve spent so much time droning on about this tedious statistics malarkey, it’s because it’s really fucking important to know when people are lying to you with numbers.

An awful lot of what’s wrong with the UK today – high prices, low pay, crumbling services, the erosion of workers’ rights, medicine shortages, rivers full of shit – has come about at least in part because people have failed to robustly challenge the falsehoods of politicians, thinktanks and the media.

Some will shrug and say, “Meh, politicians have always lied, and things have always worked out OK.”

But disinformation is now being pumped out on a scale beyond anything we’ve ever seen. Whereas just a few years ago, politicians would do the honourable thing and resign if they were caught lying, now they’re happy to lie repeatedly, on TV, on social media, in parliament.

Campaign organisations and rogue nations are pouring unprecedented resources into their propaganda ops, much of it targeting people directly through social media and thus bypassing all scrutiny. Soon AI will be churning this stuff out faster than checkers can find it, never mind check it. All at a time when our traditional defences against disinformation are collapsing.

And because of our lack of confidence with numbers, it’s the statistical lies that are most likely to slip through.

If that sounds scary … well, good. You should be scared. But don’t panic. What I’ve been trying to communicate with these posts is that spotting this sort of deviousness isn’t as hard as you think. 

Just bear the above points in mind. Don’t assume that something’s true just because a source you personally approve of published or repeated it. Is the source reliable? Does this claim tally with what others say? Do these numbers support a particular political agenda rather too neatly?

Or to boil it down to one rule of thumb: if a number seems too good or too interesting to be true, it almost certainly is.