Observing many researchers using the same data and hypothesis reveals a hidden universe of uncertainty
https://www.pnas.org/doi/10.1073/pnas.2203150119
This study explores how researchers’ analytical choices affect the reliability of scientific findings. Most discussions of reliability problems in science focus on systematic biases. We broaden the lens to emphasize the idiosyncrasy of conscious and unconscious decisions that researchers make during data analysis.
We coordinated 161 researchers in 73 research teams and observed their research decisions as they used the same data to independently test the same prominent social science hypothesis: that greater immigration reduces support for social policies among the public. In this typical case of social science research, research teams reported widely diverging numerical findings and substantive conclusions despite identical starting conditions.
Researchers’ expertise, prior beliefs, and expectations barely predict the wide variation in research outcomes. More than 95% of the total variance in numerical results remains unexplained even after qualitative coding of all identifiable decisions in each team’s workflow.
This reveals a universe of uncertainty that remains hidden when considering a single study in isolation. The idiosyncratic nature of how researchers’ results and conclusions varied is a previously underappreciated explanation for why many scientific hypotheses remain contested. These results call for greater epistemic humility and clarity in reporting scientific findings.
Results from our controlled research design in a large-scale crowdsourced research effort involving 73 teams demonstrate that analyzing the same hypothesis with the same data can lead to substantial differences in statistical estimates and substantive conclusions. In fact, no two teams arrived at the same set of numerical results or made the same major decisions during data analysis. Our finding of outcome variability echoes that of recent many-analyst studies across scientific disciplines. The study reported here differs from these previous efforts in that it attempted to catalog every decision in the research process within each team and then to use those decisions, through predictive modeling, to explain why outcomes varied so much.
Despite this highly granular decomposition of the analytical process, we could explain less than 2.6% of the total variance in numerical outcomes. We also tested whether expertise, beliefs, and attitudes observed among the teams biased their results, but these factors explained little. Even highly skilled scientists motivated to reach accurate results varied tremendously in what they found when given the same data and hypothesis to test. The standard way scientific results are presented and consumed does not disclose the full set of decisions made during the research process. Our conclusion is that we have tapped into a hidden universe of idiosyncratic researcher variability.
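To make the "explained variance" framing concrete, here is a minimal illustrative sketch, not the study's actual analysis: it uses synthetic data and a simple ordinary-least-squares R^2 in place of the paper's modeling, and the number of coded decisions and all variable names are assumptions. The idea it illustrates is regressing teams' reported effect estimates on dummy-coded analytical decisions and asking how much of the variance those decisions account for.

```python
# Illustrative sketch only: synthetic data, not the study's dataset or model.
# Shows the general idea of estimating how much of the variance in teams'
# reported effect sizes is explained by dummy-coded analytical decisions.
import numpy as np

rng = np.random.default_rng(0)

n_teams = 73          # number of analysis teams (as in the study)
n_decisions = 20      # hypothetical number of coded binary decision variables

# Hypothetical coded workflow decisions (1 = decision taken, 0 = not taken).
decisions = rng.integers(0, 2, size=(n_teams, n_decisions))

# Synthetic reported effect sizes: mostly idiosyncratic noise plus a weak
# contribution from the coded decisions, mimicking a low-explained-variance setting.
weak_weights = rng.normal(0.0, 0.02, size=n_decisions)
effects = decisions @ weak_weights + rng.normal(0.0, 0.5, size=n_teams)

# Ordinary least squares of effect sizes on the decision dummies.
X = np.column_stack([np.ones(n_teams), decisions])
coef, *_ = np.linalg.lstsq(X, effects, rcond=None)
fitted = X @ coef

# R^2: share of variance in the reported effects accounted for by the decisions.
ss_res = np.sum((effects - fitted) ** 2)
ss_tot = np.sum((effects - effects.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print(f"Variance explained by coded decisions (R^2): {r_squared:.3f}")
```

A low R^2 in this kind of setup corresponds to the paper's point: even after coding the identifiable decisions, most of the variation across teams remains unaccounted for.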