Psychological Research May Not Have a Replication Crisis at All

Research in the field of psychology may not have a replication crisis at all, say psychologists.

Replication, or repeating experiments to validate the results of an original study, is a crucial part of the scientific process. Efforts to enhance the reproducibility of science are essential to the flow of accurate knowledge.

The psychology community was shocked when a paper from the Open Science Collaboration (OSC), published in the journal Science on Aug. 28, 2015, concluded that the reproducibility of psychological research is surprisingly low.

The Original Study

First off, let us review what the original study said.

The researchers, led by Brian Nosek, repeated 100 experimental and correlational studies published in three psychology journals. According to the authors, they used high-powered designs and original materials when possible.

The findings show that only 39 out of the 100 replication efforts were successful.

No single indicator adequately captured replication success, and the five indicators investigated were not the only means of assessing reproducibility, the researchers noted.

In the end, the conclusion was clear: a majority of the replications produced weaker evidence for the original results, even though the investigators used materials provided by the original researchers, had the protocols reviewed in advance and designed the replications for high statistical power.

The Rebuttal

Almost seven months after the study was published, a group of psychologists issued their own paper commenting on the attention-grabbing research. In the paper, they made three key points that challenge the conclusions of the OSC study.

Point No. 1: Error

First, the group raised the issue of error. They say that replicating a study introduces several kinds of error.

For example, if an original study documents a true effect and the replication uses the same methods but with a new group of subjects drawn from the same population, then sampling error alone will cause some replications to fail by chance.

They cited one example in which the original study asked participants to imagine being called on by a professor, and the OSC replicated it with a group who had never been to college.

This and many other infidelities, or departures from the original procedures, may introduce error that, according to the authors, the OSC's benchmark did not account for.
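The sampling-error part of this argument can be illustrated with a toy simulation. The sketch below is not the analysis from either paper. It assumes a two-group study with a true effect of 0.5 standard deviations and 50 subjects per group (both values are made up for illustration), runs an "original" and a perfectly faithful "replication," and counts how often the original estimate falls inside the replication's approximate 95 percent confidence interval.

```python
import random
import statistics


def mean_difference(true_effect, n, rng):
    """Simulate one two-group study; return (estimated effect, standard error)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(control) / n + statistics.variance(treated) / n) ** 0.5
    return diff, se


def replication_rate(true_effect=0.5, n=50, trials=2000, seed=0):
    """Share of trials where the original estimate lies inside the
    replication's approximate 95% confidence interval (hypothetical setup)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        original, _ = mean_difference(true_effect, n, rng)
        replica, se = mean_difference(true_effect, n, rng)
        if abs(original - replica) <= 1.96 * se:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"Replications counted as successful: {replication_rate():.1%}")
```

Under these assumptions the rate comes out noticeably below 100 percent even though the effect is real and the methods are identical, because both the original and the replication estimates carry sampling noise.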

Point No. 2: Power

The second point that Daniel Gilbert and colleagues raised is the statistical power of the replication effort.

Given the potential errors described above, it is valuable to conduct multiple replications to determine how much of a single replication's result is due to chance. Because the OSC did not have that information, Nosek referred Gilbert's group to another of his projects, the "Many Labs" project (MLP).

The OSC replicated each of its 100 studies only once. The result was an uninspiring 47 percent rate of successful replication.

MLP, however, replicated each of its studies 35 or 36 times. Such a powerful technique yielded a whopping 85 percent rate of successful replication.
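Why repeating a replication many times helps can be sketched with another small simulation. It is an illustration only, not MLP's method: the true effect (0.5), the noise in a single lab's estimate (0.2) and the function name are all hypothetical. The code compares how widely a single-lab estimate scatters around the truth with how widely an average over 36 labs scatters.

```python
import random
import statistics


def spread_of_estimates(labs_per_estimate, true_effect=0.5, se_single=0.2,
                        trials=5000, seed=1):
    """Standard deviation of the pooled estimate when each estimate is the
    average of `labs_per_estimate` independent lab results (assumed values)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        lab_results = [rng.gauss(true_effect, se_single)
                       for _ in range(labs_per_estimate)]
        estimates.append(statistics.fmean(lab_results))
    return statistics.stdev(estimates)


if __name__ == "__main__":
    print(f"one lab:  {spread_of_estimates(1):.3f}")
    print(f"36 labs:  {spread_of_estimates(36):.3f}")
```

With independent labs, the scatter of the pooled estimate shrinks by roughly the square root of the number of labs, so an average over 36 labs is about six times less noisy than a single replication.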

Point No. 3: Bias

Before its replication efforts, the OSC contacted the original authors and asked whether they would endorse the methods to be used in the replication studies. Only 69 percent gave their endorsement.

Gilbert's group compared the replication rates of the endorsed and unendorsed protocols. They found that the endorsed protocols yielded successful replications nearly four times as often as the unendorsed ones, with success rates of 59.7 percent and 15.4 percent, respectively.
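The size of that gap can be checked with simple arithmetic on the two reported rates. The snippet below uses only the figures quoted above; the variable names are chosen for illustration.

```python
# Success rates reported for endorsed and unendorsed replication protocols.
endorsed_rate = 59.7    # percent
unendorsed_rate = 15.4  # percent

# How many times more often endorsed protocols replicated successfully.
ratio = endorsed_rate / unendorsed_rate

# Absolute gap between the two rates, in percentage points.
gap_points = endorsed_rate - unendorsed_rate

if __name__ == "__main__":
    print(f"ratio: {ratio:.2f}x, gap: {gap_points:.1f} percentage points")
```

The ratio works out to a little under four, which matches the "nearly four times" description.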

This suggests that the infidelities not only produced random error but also created a bias toward replication failure.

Conclusion

While Gilbert and colleagues applaud the efforts to enhance the field of psychology in a cautious, trustworthy and efficient manner, and appreciate the work that went into the OSC, they nonetheless believe that "metascience is not exempt from the rules of science."

The OSC used a benchmark that did not consider numerous sources of error in the data. It also did not use a powerful enough design, and so underestimated the true rate of replication. Lastly, it allowed significant infidelities that almost certainly biased the replications in the direction of failure.

"As a result, OSC seriously underestimated the reproducibility of psychological science," the authors write.