This week The Economist has an interesting article, Unreliable research: trouble at the lab, on the worrying level of poor quality scientific research, and the weak mechanisms for correcting mistakes. Recently a drug company, Amgen, tried to reproduce 53 key studies in cancer research, but could reproduce the original results of only six. This does not appear to be untypical of attempts to reproduce research findings. The Economist points to a number of aspects of this problem, such as the way in which scientific research is published. But of particular interest is how poorly the logic of statistics is understood, not only in the world at large, but in the scientific community. This, of course, applies particularly to the economic and social science research so beloved of political policy think tanks.
One particular aspect of this is the significance of a concept generally known as “prior probability”, or just “prior” for short, in interpreting statistical results. This is how inherently likely or unlikely a hypothesis is considered to be, absent any new evidence. The article includes an illustrative example. Hypotheses are usually tested to a 95% confidence level (a can of worms in itself, but let’s leave that to one side). Common sense might suggest that this means there is only a 5% chance of a false positive result – i.e. that the hypothesis is incorrect in spite of experimental validation. But the lower the prior (i.e. the less inherently probable the hypothesis), the higher the chance of a false positive (at the extreme, if the prior is zero, no positive experimental result would convince you, as any positive result would be false – the product of random effects). If the prior is 10%, and the test has a statistical power of 80%, there is a 4.5% probability of a false positive, compared to an 8% chance of a true positive. So there is a 36% chance that any positive result is false (and, for completeness, a 97% chance that a negative result is truly negative). Very few people appreciate this.
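The arithmetic is worth spelling out. Here is a minimal sketch in Python of the example’s calculation (the variable names are mine; the 5% significance level and 80% power are the assumptions behind the article’s figures):

```python
prior = 0.10   # inherent probability the hypothesis is true
alpha = 0.05   # false-positive rate of the test (95% confidence level)
power = 0.80   # probability the test detects a genuinely true effect

true_pos = prior * power              # 0.080: true hypothesis, positive result
false_pos = (1 - prior) * alpha       # 0.045: false hypothesis, positive result
true_neg = (1 - prior) * (1 - alpha)  # 0.855: false hypothesis, negative result
false_neg = prior * (1 - power)       # 0.020: true hypothesis, negative result

# Share of positive results that are actually false: 0.045 / 0.125 = 36%
print(false_pos / (false_pos + true_pos))  # 0.36

# Share of negative results that are truly negative: 0.855 / 0.875,
# roughly the 97% figure quoted above
print(true_neg / (true_neg + false_neg))   # ~0.977
```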
The problem is this: an alternative description of “low prior” is “interesting”. Most of the attention goes to results with low priors. So most of the experimental results people talk about are much less reliable than many people assume – even before other weaknesses in statistical method (such as false assumptions of data independence) are taken into account. There is, in fact, a much better statistical method for dealing with the priors problem, called Bayesian inference. This explicitly recognises the prior, and uses the experimental data to update it to a “posterior”. So a positive experimental result would raise the prior, to something over 10% in the example depending on the data, while a negative one would reduce it. This posterior would then form the basis for the next experiment.
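To make the update mechanics concrete, here is a minimal sketch in Python of Bayes’ rule applied to the example above. The function and its default likelihoods (80% power, 5% significance) are my assumptions carried over from the earlier figures, not anything in the article:

```python
def update(prior: float, positive: bool,
           power: float = 0.8, alpha: float = 0.05) -> float:
    """Return the posterior probability that the hypothesis is true,
    given a positive or negative experimental result."""
    if positive:
        likelihood_true, likelihood_false = power, alpha
    else:
        likelihood_true, likelihood_false = 1 - power, 1 - alpha
    # Bayes' rule: P(H | result) = P(result | H) P(H) / P(result)
    evidence = prior * likelihood_true + (1 - prior) * likelihood_false
    return prior * likelihood_true / evidence

print(update(0.10, positive=True))   # 0.64: a positive result raises the prior
print(update(0.10, positive=False))  # ~0.023: a negative result lowers it

# The posterior becomes the prior for the next experiment:
print(update(update(0.10, True), positive=True))  # ~0.966 after two positives
```

Note how a single positive result takes a 10% prior only to 64% – well short of certainty – while replication quickly strengthens the case. That is exactly the intuition the standard method obscures.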
But the prior is an inherently subjective concept, albeit one that becomes less subjective as the evidence mounts. The scientific establishment hates to make such subjective elements so explicit, so it is much happier to go through the logical contortions required by the standard statistical method (to accept or reject a null hypothesis up to a given confidence level). This method has now become holy writ, in spite of its manifest logical flaws. And, as the article makes clear, few people using the method actually seem to understand it, so errors of both method and interpretation are rife.
One example of the scope for mischief is interesting. The UN’s Intergovernmental Panel on Climate Change recently presented its conclusions in a Bayesian format. It said that the probability that global warming is induced by human activity had been raised from 90% to 95% (quoting from memory). This is, of course, the most sensible way of presenting its conclusion. The day this was announced the BBC’s World at One radio news programme gave high prominence to somebody from a sceptical think tank. His first line of attack was that this conclusion was invalid because the standard statistical presentation was not used. In fact, if the standard statistical presentation is ever appropriate, it would be for the presentation of a single set of experimental results, and even then it would conceal much about the thinness or otherwise of the evidence behind the conclusion. But the waters had been muddied; neither the interviewer nor anybody else was able to challenge this flawed line of argument.
Currently I am reading a book on UK educational policy (I’ll blog about it when I’m finished). I am struck by how much emphasis is being put on a very thin base of statistical evidence – and indeed by how statistical analysis is being applied to inappropriate questions. This seems par for the course in political policy research.
Philosophy and statistics should be part of every physical and social science curriculum, and politicians and journalists should bone up too. Better still, scientists should bring subjectivity out into the open by adopting Bayesian statistical techniques.