On Scientific Replication

Within the course of a few days, a series of articles appeared on the Web that addressed the issue of replicating scientific findings. Most of the articles, though not all, dealt with psychological investigations. For example, the journal Science withdrew a political science study because of concerns about faked data.

And in a short note in The Lancet (4/23/15) Richard Horton claims that much of science is untrue. He puts it this way:

“The case against science is straightforward: much of the scientific literature, perhaps half, may simply be untrue. Afflicted with studies with small sample sizes, tiny effects, invalid exploratory analysis, and flagrant conflicts of interest, together with an obsession for pursuing fashionable trends of dubious importance, science has taken a turn toward darkness.”

Later, Horton levels a broadside against tests of statistical significance: “Our love of ‘significance’ pollutes the literature with many a statistical fairy tale.”

Questionable research findings that are eventually retracted are more prevalent than you might imagine. Bourree Lam reports (The Atlantic, September 2015) on a study by Ivan Oransky and Adam Marcus of 2,047 retractions in the Proceedings of the National Academy of Sciences. They found that only 21.3 percent stemmed from error, while 67.4 percent resulted from “misconduct,” a category that included fabrication, faked data, and interpretive bias.

Benedict Carey reports in the Times (8/27/15) on a major study by Brian Nosek and his team of researchers at the Center for Open Science. Carey writes:

“… a painstaking years-long effort to reproduce 100 studies published in three leading psychology journals has found that more than half of the findings did not hold up when retested.”

The importance of replicating scientific research cannot be overemphasized. Confirming a finding strengthens our confidence in it. Yet journals are reluctant to publish replication studies, so investigators have little if any incentive to conduct them. As a result, the problem is simply ignored until someone like Nosek realizes its importance.

Commenting on his findings, he said, “We see this is a call to action, both to the research community to do more replication, and to funders and journals to address the dysfunctional incentives.”

The same seems to be true for medical and biological research. In Don’t Swallow Your Gum, a book about medical myths, Aaron Carroll and Rachel Vreeman note that much of what a doctor diagnoses and prescribes for a particular ailment has not been proven. By that they mean proven on the basis of a randomized, controlled experiment, ideally one that has been replicated. But these studies require a great deal of time and money and so are rarely conducted.

Several other factors are at work when findings fail to replicate. Proper control conditions may have been omitted from the original experiments, or the samples may not have been randomly selected, or they may have consisted of a highly uniform, unrepresentative group of individuals, usually college sophomores.

Or the results may have occurred because of experimenter biases that produced evidence supporting the experimenters' hypothesis. Few experimenters really design studies to disprove, rather than confirm, their hypothesis. This is a point Karl Popper emphasized many years ago.

Then there is the publication bias characteristic of most scientific journals. Researchers who do not report positive outcomes cannot get their findings published. According to one study, ninety-seven percent of psychology studies confirmed their hypothesis. We know that a success rate this high cannot reflect reality.

As one investigator (Richard Palmer, a biologist) noted, “Once I realized that selective reporting is everywhere in science, I got quite depressed.”

These were some of the reasons I stopped doing research in psychology and turned instead to literature, where the emphasis is on the particularities of human experience rather than its generalities.