Lies and statistics

This week The Economist has an interesting article, Unreliable research: trouble at the lab, on the worrying level of poor quality scientific research, and the weak mechanisms for correcting mistakes. Recently a drug company, Amgen, tried to reproduce 53 key studies in cancer research, but could reproduce the original results in only six. This does not appear to be unusual for attempts to reproduce research findings. The Economist points to a number of aspects of this problem, such as the way in which scientific research is published. But of particular interest is how poorly the logic of statistics is understood, not only in the world at large, but in the scientific community. This applies particularly, of course, to the economic and social science research so beloved of political policy think tanks.

One particular aspect of this is the significance of a concept generally known as “prior probability”, or just “prior” for short, in interpreting statistical results. This is how inherently likely or unlikely a hypothesis is considered to be, absent any new evidence. The article includes an illustrative example. Hypotheses are usually tested to a 95% confidence level (a can of worms in itself, but let's leave that to one side). Common sense might suggest that this means there is only a 5% chance of a false positive result – i.e. that the hypothesis is incorrect in spite of experimental validation. But the lower the prior (i.e. the less inherently probable the hypothesis), the higher the chance of a false positive. At the extreme, if the prior is zero, no positive experimental result should convince you, since any positive result must be false – the product of random effects. If the prior is 10%, there is a 4.5% chance that any given test yields a false positive, compared with an 8% chance of a true positive. So there is a 36% chance that any positive result is false (and, for completeness, a 97% chance that a negative result is truly negative). Very few people seem to appreciate this.
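
The arithmetic behind those figures is worth making explicit. The 8% figure implies the tests are assumed to have a statistical power of 80% (an 80% chance of detecting a genuinely true hypothesis). On that assumption, a minimal sketch of the calculation in Python:

```python
# False-discovery arithmetic: 10% prior, 5% significance level and
# (as the 8% figure implies) 80% statistical power.
prior = 0.10   # chance the hypothesis is true before the experiment
alpha = 0.05   # chance of a positive result when the hypothesis is false
power = 0.80   # chance of a positive result when the hypothesis is true

false_positives = (1 - prior) * alpha       # 4.5% of all tests
true_positives = prior * power              # 8% of all tests
true_negatives = (1 - prior) * (1 - alpha)  # 85.5% of all tests
false_negatives = prior * (1 - power)       # 2% of all tests

# Chance that a positive result is in fact false: about 36%
print(false_positives / (false_positives + true_positives))
# Chance that a negative result is truly negative: about 97%
print(true_negatives / (true_negatives + false_negatives))
```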

The problem is this: an alternative description of “low prior” is “interesting”. Most of the attention goes to results with low priors. So most of the experimental results people talk about are much less reliable than many people assume – even before other weaknesses in statistical method (such as false assumptions of data independence, for example) are taken into account. There is, in fact, a much better statistical method for dealing with the priors problem, called Bayesian inference. This explicitly recognises the prior, and uses the experimental data to update it to a “posterior”. So a positive experimental result would raise the prior, to something over 10% in the example depending on the data, while a negative one would reduce it. This would then form the basis for the next experiment.
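
To illustrate what that updating looks like, here is a minimal sketch, reusing the 5% false-positive rate and 80% power from the example above; the particular run of experimental results is invented purely for illustration:

```python
def bayes_update(prior, positive_result, alpha=0.05, power=0.80):
    """Update the probability that a hypothesis is true after one experiment.

    alpha: chance of a positive result if the hypothesis is false.
    power: chance of a positive result if the hypothesis is true.
    """
    if positive_result:
        likelihood_true, likelihood_false = power, alpha
    else:
        likelihood_true, likelihood_false = 1 - power, 1 - alpha
    evidence_for = prior * likelihood_true
    evidence_against = (1 - prior) * likelihood_false
    return evidence_for / (evidence_for + evidence_against)

# Start from a 10% prior and feed in a purely hypothetical run of results.
belief = 0.10
for result in [True, True, False, True]:
    belief = bayes_update(belief, result)
    print(f"{'positive' if result else 'negative'} result -> posterior {belief:.0%}")
```

On these assumptions a single positive result lifts the 10% prior to around 64%, and each subsequent experiment nudges the belief up or down from there.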

But the prior is an inherently subjective concept, albeit one that becomes less subjective as the evidence mounts. The scientific establishment hates to make such subjective elements so explicit, so it is much happier to go through the logical contortions required by the standard statistical method (to accept or reject a null hypothesis up to a given confidence level). This method has now become holy writ, in spite of its manifest logical flaws. And, as the article makes clear, few people using the method actually seem to understand it, so errors of both method and interpretation are rife.

One example of the scope for mischief is interesting. The UN's Intergovernmental Panel on Climate Change (IPCC) presented its conclusion recently in a Bayesian format. It said that the probability that global warming is induced by human activity had been raised from 90% to 95% (from memory). This is, of course, the most sensible way of presenting its conclusion. The day this was announced the BBC's World at One radio news programme gave high prominence to somebody from a sceptical think tank. His first line of attack was that this conclusion was invalid because the standard statistical presentation was not used. In fact, if the standard statistical presentation is ever appropriate, it is for presenting a single set of experimental results, and even then it would conceal much about the thinness or otherwise of the conclusion. But the waters had been muddied; neither the interviewer nor anybody else was able to challenge this flawed line of argument.

Currently I am reading a book on UK educational policy (I'll blog about it when I'm finished). I am struck by how much emphasis is being put on a very thin base of statistical evidence – and indeed by how statistical analysis is being applied to inappropriate questions. This seems par for the course in political policy research.

Philosophy and statistics should be part of every physical and social science curriculum, and politicians and journalists should bone up on them too. Better still, scientists should bring subjectivity out into the open by adopting Bayesian statistical techniques.

Positive linking: what do networks mean for public policy?

Independent and identically distributed. This assumption about data subject to statistical analysis is so routine that most students reduce it to the acronym “IID”. It means that each observation is drawn independently from the same underlying distribution, and on that basis a routine set of analytical tools, usually resting on the normal distribution, becomes available for the calculation of such things as confidence levels. Most of the evidence used by economists and other social scientists to support their theories is based on this type of analysis, and on an IID assumption about the data. And yet human societies do not behave in accordance with this assumption; most of the choices we make are shaped by choices that other people have made, and are not independent. They are subject to network effects. It is a problem that most academic economists would rather not acknowledge. But the implications are profound.
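
A small simulation illustrates the worry (this is my own sketch, not anything from the book): estimate a population average from a sample in which each person either chooses independently or, most of the time, simply copies the previous person. The textbook IID formula promises the same margin of error in both cases, but the imitative sample turns out to be several times noisier than that formula admits.

```python
import random
import statistics

def sample_average(n, copy_prob):
    """Average of n yes/no choices. With probability copy_prob each person
    copies the previous person's choice; otherwise they choose 50/50."""
    choices = [random.random() < 0.5]
    for _ in range(n - 1):
        if random.random() < copy_prob:
            choices.append(choices[-1])            # network effect: imitate
        else:
            choices.append(random.random() < 0.5)  # independent choice
    return sum(choices) / n

random.seed(1)
n, trials = 1000, 2000
independent = [sample_average(n, copy_prob=0.0) for _ in range(trials)]
imitative = [sample_average(n, copy_prob=0.9) for _ in range(trials)]

print((0.5 * 0.5 / n) ** 0.5)         # margin of error promised by the IID formula
print(statistics.stdev(independent))  # matches the formula
print(statistics.stdev(imitative))    # several times larger: the formula misleads
```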

This reflection comes to me after reading the book Positive Linking by Paul Ormerod. In it Mr Ormerod attempts to show that all modern economics is deeply flawed because it ignores network effects, and that in future public policy should promote “positive linking”: working through network connections, rather than simply designing incentives. He is only tangentially concerned with my worry over statistical analysis: he is more focused on the models built by economists, in which rational people (“agents” in the jargon) make independent choices based on an analysis of their options and preferences. These theoretical models lie behind the bulk of modern economic analysis, such as how people might respond to taxes or to changes in interest rates.

Unfortunately it is a very disappointing piece of writing. The language flows well enough, but it is full of repetition and digression. This sort of style probably works better orally than on the page, where it is a drag. But it is worse than that. His main concern seems to be to debunk conventional economic analysis rather than to promote a clearer understanding of networks and their implications. This sometimes verges on the unhinged, and you do not get the impression that the arguments of conventional economics' defenders get a fair hearing, or therefore that they are dealt with adequately. There are a lot of illustrations and “evidence”, but these are used anecdotally rather than to build up a coherent logical case. There are many digressions, for example about the rise of Protestantism in Tudor England, which seem to be included because they are good stories rather than because they take his argument forward. The debunking of conventional economics is all rather old hat in any case, and it has been done more coherently and entertainingly by authors such as Nassim Nicholas Taleb (of Black Swan fame).

The diatribes and digressions leave Mr Ormerod with inadequate space to develop his “twenty-first century model of rational behaviour”. His suggestions about how this might work in practice are confined to a few pages at the end, and even these tend to drift into diatribes about how things are done now. For example he claims that sixty years of centralised, big-state social democratic government since the War have been a failure – on the grounds that unemployment is much the same on average as it was beforehand. But you can easily argue that this has been the most successful period of government the world has ever seen – look at the rise in life expectancy, for example. Neither is it at all clear that everything, or even most, of what these governments did was based on conventional economic models of human behaviour. Instead of explaining the religious dynamics of 16th Century England, he could have spent some time and space developing his argument here.

What a pity: because in the end I think he is right, and his suggestions for the way forward are sound. It isn't that government since the War has failed; it is that its methods have run their course, and its policies now seem to benefit only an elite. Conventional economic analysis has more going for it than he suggests, but it is a blind alley now. Yet many economists and policy makers are in denial, to judge by the public debate – though some clearly network-based ideas, like “nudge” theory, are making their presence felt.

But there is a problem at the heart of the new twenty-first century network thinking, which Mr Ormerod acknowledges but dismisses too easily. The new models have weak predictive power. The point about normal distributions, and the IID assumption they rest on, is that they produce a relatively tight distribution of data around a mean, with few extreme results – “thin tails” in the jargon. There is a sleight of hand here: statisticians' use of randomised data makes their analysis sound more robust than it is; the IID assumption in fact keeps the data tightly constrained. Consider a random walk, made up of a series of equal steps, each either forward or backward. If the probability that your next step will be forward or backward is always 50% each, and the direction of earlier steps does not affect the direction of the next, then the steps are IID. It sounds truly random. But you are unlikely to end up very far from the starting point, which isn't really very random at all. If, instead, your next step is more likely than not to be in the same direction as your last, you can end up anywhere. That's real randomness, but it isn't IID, and there is no normal distribution. What looks like a soft assumption is in fact a hard one.
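
Here is a minimal sketch of that contrast, with a 95% chance of repeating the previous step standing in for “more likely than not” (the exact figure is just for illustration):

```python
import random
import statistics

def final_position(n_steps, persistence):
    """End point of a walk of n_steps unit steps. `persistence` is the chance
    that a step repeats the direction of the previous one; 0.5 is the IID case."""
    step = random.choice([-1, 1])
    position = step
    for _ in range(n_steps - 1):
        if random.random() >= persistence:
            step = -step              # change direction
        position += step
    return position

random.seed(1)
walks = 2000
iid_ends = [final_position(1000, persistence=0.5) for _ in range(walks)]
sticky_ends = [final_position(1000, persistence=0.95) for _ in range(walks)]

print(statistics.stdev(iid_ends))     # typically a few dozen steps from the start
print(statistics.stdev(sticky_ends))  # several times further afield
```

The persistent walk wanders far more widely than the IID one, which is the point: drop the independence assumption and the comfortable thin-tailed predictions go with it.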

So it's not just a question of changing the maths and updating the models. It is about accepting that social systems are fundamentally more unpredictable than we have previously admitted. It is not hard to see why policy makers and social scientists have struggled to accept this. I like to describe it by invoking the idea of “zeitgeist” – the spirit of the time, an ephemeral and unpredictable quality that in fact lies at the heart of everything. This is closely linked to Mr Ormerod's ideas about networks, since it is networks that sustain the zeitgeist.

What to do? Mr Ormerod offers some useful rules of thumb. He also suggests investing more in research into network effects, which is self-interested but sensible – so long as we do not expect it to yield insights with anything like the theoretical precision claimed for conventional methods. But ultimately his big idea, which he does woefully little to develop, is much greater delegation and localisation of decision making. Amen to that.