The terminology ‘repeatability, replicability, reproducibility’ has multiple uses, multiple meanings and different domains of application.
The first conceptual clarification that the project will bring to the debate concerns the distinction between measurements that are nonreplicable owing to contingent factors and measurements that are nonreplicable because they lack certain intrinsic features. For instance, a measurement presented in the literature may not be replicable because of a lack of transparent communication (the scientists fail to disclose relevant details of the measurement); or it may not be replicable because the procedures involve subjective or arbitrary elements that elude replication (for instance, they are based on participant observation). Moreover, some measurements may not be replicable because their objects of investigation are extremely perishable and rare, or because they consist of new, unique organisms (for instance, the first cloned animal or a single instance of a disease).
Along these lines, another aspect that the project will take into consideration is that replicability may refer to measurements that, if replicated, lead to the same results, but it may also refer to measurements that can, in principle, be made again (without necessarily leading to the same results). The first meaning of replicability focuses on the result or outcome of the measurement, while the second focuses on the procedure. So, whenever scientists say that a certain experiment is replicable, they might mean either that the experiment, when repeated, leads to the very same results, or that it can, in principle, be repeated because we have all the information and conditions necessary to recreate the same measurement process. In the first case we speak of result replicability, and in the second of methodological replicability.
While these terminological remarks may lack philosophical depth, other kinds of clarifications are of great philosophical interest and will be investigated.
The first concerns the distinction between the terms ‘repeatability’, ‘replicability’ and ‘reproducibility’. While in some of the literature they have for the most part been used interchangeably, as if they referred to the same concept, different concepts clearly call for different terms. Moreover, the terms have been used differently across disciplines: thus, we need not only a clear definition of each term but also a definition that works equally well for all scientific disciplines, so as to avoid excessive localism and parochialism.
According to the proposal offered in Plesser 2018, repeatability, replicability and reproducibility should be differentiated in the following way:
Repeatability (Same team, same experimental setup): The measurement can be obtained with stated precision by the same team using the same measurement procedure.
Replicability (Different team, same experimental setup): The measurement can be obtained with stated precision by a different team using the same measurement procedure.
Reproducibility (Different team, different experimental setup): The measurement can be obtained with stated precision by a different team but with a different measuring system (Plesser 2018, pp. 76-77).
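For readers who find a decision rule easier to scan than a list of definitions, the threefold scheme quoted above can be written as a small function. This is only an illustrative sketch: the function name plesser_term and its two boolean parameters are hypothetical and do not come from Plesser 2018. The quoted definitions do not name the combination of the same team with a different setup, so the sketch returns None for that case rather than guessing.

```python
from typing import Optional


def plesser_term(same_team: bool, same_setup: bool) -> Optional[str]:
    """Map a (team, setup) combination to the term in the threefold scheme.

    Hypothetical helper for illustration only; the names are assumptions.
    """
    if same_team and same_setup:
        return "repeatability"
    if not same_team and same_setup:
        return "replicability"
    if not same_team and not same_setup:
        return "reproducibility"
    # Same team, different setup: not named in the quoted definitions.
    return None
```

For example, plesser_term(same_team=False, same_setup=True) yields "replicability".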
A second proposal (Milkowski et al. 2018) drops the distinction between repeatability and replicability as meaningless and rests on only two notions: direct and conceptual replications. While direct replications consist of recreating the very same experimental setup used in a previous study and following the very same procedure to the letter, conceptual replications aim to achieve the very same results by pursuing an original and novel procedure, with different experimental setups and instruments. Given these definitions, one question is whether there is a grey area between the two notions, since it may be claimed that there is a continuum between replicability and reproducibility. To support this view, one may point out that it is almost never possible to replicate the very same experiment in the very same way, keeping the same measurement procedures, instruments, location and objects. On this view, the difference between replicability and reproducibility may have to be drawn from the intention of the experimenter.
Given these two terminological characterizations, the project will inquire which of the two proposals is preferable, and thus whether the term ‘repeatability’ should be dropped. To reach an answer, we will investigate whether it is possible to have measurements that are repeatable but not replicable. This could be the case in psychology, where the personality of the psychologist may influence the outcome of a patient’s survey, or in medicine, where some measurements depend on the doctor’s physical skills (for example, the ability to hear heartbeats or phlegm in the lungs). Another question to be asked is whether it is possible to have measurements that are replicable but not repeatable. This may happen, for instance, whenever the measurement damages or changes the experimenters in such a way that they cannot conduct the experiment again. The P.I.’s working hypothesis is that there is no complete overlap between repeatable and replicable experiments, which would lead to the conclusion that it is better to keep repeatability as a third, distinct concept.
Finally, a further goal of the first part of this project is to check whether this universal distinction can be meaningfully applied across disciplines, by looking closely at how these terms are used in each discipline and by describing the specificities and needs of each. One problem we anticipate is that a distinction framed in terms of experimental setups overlooks measurements in theoretical science. For instance, a computer or data scientist may find it problematic to adopt a terminology based on experimental setups.