Tools for Reproducibility

Reproducibility or Replicability

Reproducibility and replicability have meant different things to different disciplines. While most individuals are aware that disciplines have identified issues with reproducibility and replicability, the disciplines themselves have not agreed on a common description and definition of these two terms. If you want to start a heated argument, add repeatability to the list!

For a brief discussion of this topic, please see: Plesser, Hans E. "Reproducibility vs. Replicability: A Brief History of a Confused Terminology." Frontiers in Neuroinformatics, 18 January 2018. https://doi.org/10.3389/fninf.2017.00076

We use the following working definitions, although we recognize that different disciplines have used the terms interchangeably.

  • Replicability (different team, same experimental setup): The measurement can be obtained with stated precision by a different team using the same measurement procedure and the same measuring system, under the same operating conditions, in the same or a different location, on multiple trials. For computational experiments, this means that an independent group can obtain the same result using the author's own artifacts. (Repeatability often refers to the same concept, but with the original team of researchers.)
  • Reproducibility (different team, different experimental setup): The measurement can be obtained with stated precision by a different team, with a different measuring system, in a different location, on multiple trials. For computational experiments, this means that an independent group can obtain the same result using artifacts that they develop completely independently.

Of course, the "sameness" of procedures and results can be defined further within each discipline. For example:

  • The "same experimental procedure" may mean: downloading the original software and data and running it; downloading the original software, compiling it for a different machine, and running it with the original data; downloading software that carries out the same operations and applying it to the original data; reading a paper, producing a new implementation of the algorithms, and running it on the original data; or running any of the above on a refined or updated data set.
  • "Same results" could mean: identical output at the bit level; exactly the same numbers; exactly the same numbers on a similar machine; numbers that are within some error bound; or outputs that share certain characteristics.
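To make the different senses of "same results" more concrete, here is a minimal Python sketch that checks one pair of outputs at three of the levels listed above: bit-level identity, exactly the same numbers, and numbers within an error bound. The helper names (bitwise_identical, parse_numbers, same_numbers, within_bound, compare), the one-number-per-line output format, and the 1e-9 relative tolerance are hypothetical choices for illustration, not part of any standard tool or community guideline.

```python
import hashlib
import math
from typing import Dict, List, Sequence


def bitwise_identical(out_a: bytes, out_b: bytes) -> bool:
    """Strictest sense: the output files match byte for byte.

    Comparing SHA-256 digests also works when the original output is
    published only as a checksum.
    """
    return hashlib.sha256(out_a).hexdigest() == hashlib.sha256(out_b).hexdigest()


def parse_numbers(out: bytes) -> List[float]:
    """Read whitespace-separated numbers from a plain-text output (assumed format)."""
    return [float(token) for token in out.decode().split()]


def same_numbers(xs: Sequence[float], ys: Sequence[float]) -> bool:
    """Exactly the same numbers, regardless of how they were formatted."""
    return len(xs) == len(ys) and all(x == y for x, y in zip(xs, ys))


def within_bound(xs: Sequence[float], ys: Sequence[float],
                 rel_tol: float = 1e-9, abs_tol: float = 0.0) -> bool:
    """Numbers that agree within a stated error bound (tolerance is illustrative)."""
    return len(xs) == len(ys) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(xs, ys)
    )


def compare(out_a: bytes, out_b: bytes) -> Dict[str, bool]:
    """Report which notions of "same results" a pair of outputs satisfies."""
    xs, ys = parse_numbers(out_a), parse_numbers(out_b)
    return {
        "bitwise identical": bitwise_identical(out_a, out_b),
        "same numbers": same_numbers(xs, ys),
        "within 1e-9": within_bound(xs, ys),
    }


# Hypothetical case 1: the same value written with different formatting.
print(compare(b"0.6\n", b"0.600\n"))
# → {'bitwise identical': False, 'same numbers': True, 'within 1e-9': True}

# Hypothetical case 2: the same sum evaluated in a different order, as can
# happen with a different compiler, library, or machine.
original = repr(0.1 + 0.2 + 0.3).encode()  # b'0.6000000000000001'
rerun = repr(0.3 + 0.2 + 0.1).encode()     # b'0.6'
print(compare(original, rerun))
# → {'bitwise identical': False, 'same numbers': False, 'within 1e-9': True}
```

The first case shows that bit-level identity is stricter than numerical identity. The second shows why many communities accept agreement within a stated bound: floating-point addition is not associative, so simply reordering a sum can change the last digit of a result.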

Reproducibility also comes in several flavors: computational reproducibility and transparency, scientific reproducibility and transparency, computational correctness and evidence, and statistical reproducibility.

Reproducibility and replicability are not governed by one standard set of guidelines; instead, individual research communities and organizations set their own standards.

Methods that support these types of activities are considered part of Open Science.

Reproducibility and the Scientific Method

Kitzes, Turek, and Deniz suggest a link between early and current research practice (Kitzes, Justin, Daniel Turek, and Fatma Deniz. 2018. The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press).

Since the 17th century, budding scientists have been trained in the scientific method, consisting of systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses. This training helps standardize methodologies and communication between researchers and research groups.

Journal articles have long been the main way of reporting research results. With the rise of the Internet, however, sharing data and code has gained greater potential to support the goals of the scientific method. Extending the argument of the Kitzes et al. introduction, the goals of reproducibility and replicability take "the basic principles of the scientific method that you learned at the lab bench and translate them to your laptop" (p. xxii).

For a brief article on the scientific method, please see: Broad, C. "Francis Bacon and Scientific Method." Nature 118, 523–524 (1926). https://doi.org/10.1038/118523a0

Relationship to Open Science

Open science typically refers to conducting science with the recognition that the process is often collaborative in nature. It also emphasizes the need to communicate research. Open scholarship is a broader term that encompasses open science.

What we know today as open science comprises both principles (transparency, reuse, participation, accountability, etc.) and practices (open publications, data sharing, citizen science, etc.).

Open science is an ambitious effort to ensure the availability and usability of scholarly publications, the data that result from scholarly research, and the methodology, including code or algorithms, used to generate those data.

Although complete openness is ambitious, you can use many methods and tools to make your research more open.

For more details on open science practices, please see the Open Science Training Handbook.

Benefits of Reproducible and Replicable Practices

In recent years, researchers have made many strides in promoting reproducible and replicable practices. New tools to assist in the process appear practically every day.

Why might you want to adopt some of these practices?

  • Rigor and reliability. New standards for data and code sharing in many fields make it easier for researchers to reproduce and replicate reported work, thereby strengthening scientific rigor and reliability.
  • Ability to address new questions. Open science and reproducibility and replicability (R&R) practices allow researchers to bring data and perspectives from multiple fields to bear on their work, opening up new areas of inquiry and expanding the opportunities for interdisciplinary collaboration.
  • Faster and more inclusive dissemination of knowledge. The increase in open publication accelerates the process of disseminating research and building on results. In addition, it allows for more inclusive participation in research and greater collaboration.
  • Broader participation in research. Large-scale projects in fields such as astronomy and ecology are utilizing open data and expanding opportunities for citizen scientists to contribute to scientific advances.
  • Effective use of resources. Reuse of data in fields such as clinical research facilitates the aggregation of multiple studies for meta-analysis and allows for more effective testing of new hypotheses.
  • Improved performance of research tasks. New tools enable more accurate recording of research workflows and automate various data curation tasks.
  • Open publication for public benefit. In the case of publicly funded research, the ultimate sponsor is the taxpayer. The public benefits from open science as new knowledge is used more rapidly to improve health, protect environmental quality, and deliver new products and services.

Barriers

There are barriers to full reproducibility and replicability. By acknowledging these barriers, we can make better-informed decisions that move our work toward openness.

  • Costs and infrastructure. There remain significant cost barriers to widespread implementation of open publication and open data. Some disciplines and institutions have helped to remove this barrier; however, in other disciplines, appropriate responses to openness requirements are difficult to identify.
  • Structure of scholarly communications. Most publications are still only available on a subscription basis, and some potential pathways to open publication may disrupt the current scholarly communications ecosystem, including scientific society publishers, or may disadvantage early career researchers, researchers working in the developing world, or those in institutions with fewer resources.
  • Lack of supportive culture, incentives, and training. Open practices such as preparing datasets and code for sharing and making preprints available are not generally rewarded and may even be discouraged by current incentive and reward systems. This may unintentionally disadvantage early career researchers.
  • Privacy, security, and proprietary barriers to sharing. Sharing data, code, and other research products is becoming more common, but barriers related to ensuring patient confidentiality and the protection of national security information exist in some domains. Proprietary research also presents barriers. Ultimately, some parts of the research enterprise may not be open.
  • Disciplinary differences. The nature of research and practices surrounding treatment of data and code differ by discipline and even within a discipline. The size of datasets and the nature of some data may prevent immediate, complete sharing. Safeguards to prevent misuse or misrepresentation of data will be needed.