


Open Science 2.0: Towards a truly collaborative research ecosystem [1]

Robert T. Thibault (Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, United States of America); Olavo B. Amaral (Institute of Medical Biochemistry Leopoldo de Meis, Universidade Federal do Rio de Janeiro)

Date: 2023-10

Conversations about open science have reached the mainstream, yet many open science practices such as data sharing remain uncommon. Our efforts towards openness therefore need to increase in scale and aim for a more ambitious target. We need an ecosystem not only where research outputs are openly shared but also in which transparency permeates the research process from the start and lends itself to more rigorous and collaborative research. To support this vision, this Essay provides an overview of a selection of open science initiatives from the past 2 decades, focusing on methods transparency, scholarly communication, team science, and research culture, and speculates about what the future of open science could look like. It then draws on these examples to provide recommendations for how funders, institutions, journals, regulators, and other stakeholders can create an environment that is ripe for improvement.

Abbreviations: ARRIVE, Animal Research: Reporting of In Vivo Experiments; BIDS, Brain Imaging Data Structure; CONSORT, Consolidated Standards of Reporting Trials; COS, Center for Open Science; EQUATOR, Enhancing the QUAlity and Transparency Of health Research; FAIR, findable, accessible, interoperable, and reusable; FDA, Food and Drug Administration; ISRCTN, International Standard Randomised Controlled Trial Number; MDAR, Materials, Design, Analysis, and Reporting; NIH, National Institutes of Health; OSF, Open Science Framework; OSTP, Office of Science and Technology Policy; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses; REF, Research Excellence Framework; RRFP, Registered Report Funding Partnership; RRID, Research Resource Identifier; SciELO, Scientific Electronic Library Online; TACS, Transparency in Author Contributions in Science; TOPS, Transform to Open Science; UKRN, UK Reproducibility Network; WHO, World Health Organization

Competing interests: We have read the journal’s policy and the authors of this manuscript have the following competing interests: AB is an Academic Editor for PLOS Biology. AB is a founder of SciCrunch Inc, a company that works with publishers to improve the representation of research resources in scientific literature. She is also a member of the board, and serves as the CEO. The terms of this arrangement have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. OBA is a member of the eLife Global South Advisory Committee and co-founder of the Brazilian Reproducibility Network. N.D. is an external consultant and animal welfare officer at Medizinisches Kompetenzzentrum c/o HCx Consulting, Brandenburg, Germany. All other authors declare no conflict of interest.

Copyright: © 2023 Thibault et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

The past decades have seen a shift in the nature of human communication. With the advent of the World Wide Web, accessing information from across the globe became commonplace. But it was not until Web 2.0—also known as the participatory web [1]—that users transformed from passive consumers of information into engaged participants interacting across a dynamic landscape. In a similar vein, the past 20 years have seen information about research become more accessible, through developments such as open access and clinical trial registration. More recently, open science initiatives have increasingly pushed beyond the goal of simply sharing research products and towards creating a more rigorous research ecosystem. These advancements not only facilitate human collaboration but also enable the development and deployment of automated tools for data synthesis and analysis, which thrive on large quantities of open and high-quality data.

This Essay reviews achievements in open science over the past few decades and outlines a vision for Open Science 2.0: a research environment in which the entire scientific process, from idea generation to data analysis, is openly available; in which researchers seamlessly interact to build on the work of others; and in which research infrastructure and cultural norms have evolved to foster efficient and widespread collaboration. We use this term not simply to suggest a large step forward but to invoke transformational change in the capacity and purpose of a system, as was observed with Web 2.0.

Realizing this vision requires that we challenge traditional research norms and embrace a collaborative spirit to iteratively improve our research practices and infrastructures. To this end, we close this Essay with recommendations for how funders, institutions, publishers, regulators, and other stakeholders can foster a research environment that cultivates openness, rigor, and collaboration. We argue for concerted and persistent efforts, supported by sustained public funding mechanisms, that treat open science as a milepost toward a more effective research ecosystem. But first things first: What do we mean by “open science”?

Open science: A primer

A strict definition for open science has yet to emerge, but most explanations overlap substantially. UNESCO has recently defined open science as “an inclusive construct that combines various movements and practices aiming to make multilingual scientific knowledge openly available, accessible, and reusable for everyone, to increase scientific collaborations and sharing of information for the benefits of science and society, and to open the processes of scientific knowledge creation, evaluation, and communication to societal actors beyond the traditional scientific community.” Increasingly, definitions are extending beyond transparency (e.g., sharing of research outputs) to emphasize its downstream goals (e.g., increased collaboration and greater rigor).

Every step of the research process can benefit from openness, including idea generation, study design, data collection, data analysis, results reporting, and related activities such as grant applications, peer review, and policy development. Openness makes the process and outputs of scientific research more available and easier to evaluate. However, openness by itself does not necessarily imply that research is rigorous, collaborative, efficient, equitable, or conducted with societal priorities in mind. Instead, it allows people to more accurately assess these factors.

Open science is an umbrella term that emerged from several parallel initiatives. Open access aimed to make research publications freely available to the public [2–5]. Open source software and open educational resources strove to dissolve access barriers and foster collaborative communities. Meanwhile, the “replication crisis” made headlines and catalyzed the uptake of open science as a means to improve the trustworthiness of scientific findings [6–9] (see Box 1 for a first-hand account). Many of these initiatives became possible with the widespread adoption of the internet and the ability to share large amounts of information across the globe at low cost. They have now coalesced into a multifaceted movement to open up the research process and its outputs [10].

Box 1. A personal journey through the reproducibility timescape

A perspective written by Marcus Munafò, co-founder of the UK Reproducibility Network and Associate Pro Vice Chancellor for Research Culture at the University of Bristol.

My own experience of the problems of reproducibility began early. During my PhD about 25 years ago, I was unable to replicate a key finding that the literature would have me believe was absolutely robust. This was meant to be the foundation of three years of research, and it did not work! It was only because I was fortunate enough to speak to a senior academic, who reassured me that the finding was surprisingly flaky, that I did not simply decide I was not cut out for a career as an academic scientist. But that knowledge was hidden from view. More than 20 years later, there is far greater awareness of the problem, even if we are still some way from implementing potential solutions.

During my postdoctoral career, I started to explore patterns within the published literature, such as the decline effect, where the strength of evidence for scientific claims declines over time. I also saw my own field—the study of genetic associations with complex behavioral phenotypes—transform from what was effectively an enterprise in generating noise (the candidate gene era) to one of collaboration, data and code sharing, statistical stringency, and unprecedented replicability (the genome-wide association era). Publications such as “Why Most Published Research Findings Are False” [11,12] reassured me that I was not the only one to see the problems, and that they were not unique to any one field. But my various attempts to draw attention to this didn’t make me popular; one senior scientist dubbed me “Dr No”, and later told me he had assumed I was a curmudgeonly 60-year-old statistician rather than a 30-year-old psychologist (I took it as a compliment!).

For many years I despaired. Having been talking about the problems for almost 20 years, I have recently found myself focusing much more on potential solutions and on all of the exciting innovations and grassroots enthusiasm for change (particularly among early career researchers). Revolutions happen very slowly, then all at once. Although there is much more to do, it finally feels like we are making progress.

In this Essay, we define Open Science 2.0 as a state in which the research ecosystem meets 2 criteria: the vast majority of research products and processes (i.e., scholarship) are openly available; and scientific actors directly and regularly interact with the openly available scholarship of others to increase research impact and rigor. These collaborative activities would be fostered by appropriate infrastructure, incentives, and cultural norms. These aims appear prominently in recent overviews of open science, including the UNESCO Recommendation on Open Science [10]. We differentiate this state from Open Science 1.0, a retronym we propose for a state that meets only the first criterion—widespread openness. We are not implying that current efforts focus only on Open Science 1.0 or that we are close to achieving its more modest goals. Instead, we propose this framework to reflect on how current open science initiatives and cultural norms align with the loftier goals of Open Science 2.0.

Methods transparency

The methods section of many publications lacks key information that would be necessary to repeat an experiment. In response to this lack of transparency, researchers across a range of health disciplines have come together to develop standardized reporting guidelines. The EQUATOR Network (Enhancing the QUAlity and Transparency Of health Research) now includes over 500 reporting guidelines for different types of health research. Some of the most widely adopted checklists include CONSORT (Consolidated Standards of Reporting Trials) [19,20], ARRIVE (Animal Research: Reporting of In Vivo Experiments) [13,14], and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) [21]. To achieve their current impact, these guidelines have gone through updates informed by wide-reaching consensus processes. For example, despite the first iteration of the ARRIVE guidelines being endorsed by over a thousand journals [22], they had limited impact on improving transparent reporting, even when authors were explicitly requested to use the ARRIVE checklist [23]. The guidelines were then revised and updated to focus on feasibility and to include educational resources and examples. Development of reporting standards is an ongoing process, and some are now being harmonized through initiatives such as the MDAR Checklist (Materials, Design, Analysis, and Reporting) [24,25] and the alignment of guidelines for reporting trial protocols (SPIRIT) and results (CONSORT) [26].

Beyond guidelines that outline what details to include in a publication, research transparency also depends on standardized structures for how to report this information. A few decades ago, catalogs of reagents for biological experiments contained a few hundred listings, and a company name and antibody target were generally sufficient to unambiguously identify a reagent. Today, a catalog from a single company can list over 100,000 antibodies, with hundreds of antibodies targeting the same protein. Simply citing a company name and target leaves much ambiguity and, in a surprisingly large percentage of cases, leads scientists to waste money and time trying to optimize the wrong reagent [27–29]. To tackle this issue, researchers convened meetings and workshops with the editors-in-chief of 25 major neuroscience journals, officers from the US National Institutes of Health (NIH), and representatives of several nonprofit organizations to develop a plan to address the underreporting of reagents. They then proposed a 3-month pilot project in which journals requested that antibodies, organisms, and other tools listed in publications be reported with the reagent name, catalog or stock number, company name, and Research Resource Identifier (RRID), a reagent identifier that persists regardless of whether companies merge or stock centers move. This RRID initiative [30] is now in its ninth year, and over a thousand journals request RRIDs. In 2020, nearly half of published references to antibodies included sufficient information to track the antibody down, a substantial improvement over the 15% observed in the 1990s [31]. Asking researchers to publish RRIDs also inadvertently encouraged them to double-check their reagents, reducing not only errors in antibodies but also the use of problematic cell lines, with no additional effort on the part of journals [29].

The success of the RRID initiative depended on a dedicated group of volunteers who worked for nearly a decade to overcome the initial unwillingness of actors who held the power to make change. The initiative was initially contentious because it added to the workload of journal editors, and simply updating author guidelines to request RRIDs proved ineffective. Achieving greater compliance required convincing journals to take an active approach, which depended on the persistence of the RRID initiative’s leadership, alongside sufficient infrastructure for authors to easily find their reagents and a responsive helpdesk for when the infrastructure fails to perform as expected. When prominent journals such as Cell began to visibly request RRIDs, the conversation shifted. While we could celebrate the success of the RRID initiative as an example of the benefits of grassroots initiatives, an alternative argument can be made: that similar initiatives would be far more common if supported by standard funding mechanisms and greater stakeholder involvement.

Scholarly communication

Publishing technology has undergone remarkable transformations, and scientists can now instantaneously share nearly all aspects of their scholarship with a worldwide audience. However, the academic research community continues to treat journal articles as the principal way of sharing research, and efforts for change generally remain tied to this journal-centric system. One unfortunate legacy of the print era—when publishing was expensive and limited in length and structure—is that publications often serve as an advertisement of research rather than a complete record of the research process and outcomes [32]. This state of affairs, combined with an incentive structure that rewards groundbreaking and positive findings, has led to a muddled scientific record riddled with irreproducible studies and wasted resources. The past few decades, however, have seen several open science initiatives make stepwise progress toward sharing the components of research. These efforts include preregistration of study designs and outcome measures, as well as open sharing of materials, protocols, data, and code. Some disciplines have been much more successful than others in these endeavors.

ClinicalTrials.gov and the International Standard Randomised Controlled Trial Number (ISRCTN) registry were launched in the year 2000 and now contain over half a million registrations. These registries brought transparency to the research process by allowing anyone with access to the internet not only to see what clinical trials were being run but also to access information on the methods, including the study intervention, the inclusion criteria, the outcome measures of interest, and, increasingly, the results. Their uptake was made possible by funded infrastructure from key organizations such as the US NIH, the European Commission, and the World Health Organization (WHO), and their adoption was fostered by 2 decades of policies from the International Committee of Medical Journal Editors [33], the Declaration of Helsinki [34], and the US Food and Drug Administration (FDA), among others. While the purpose of trial registration was initially to recruit participants and reduce duplication, the infrastructure was iteratively updated: first to make study plans transparent, and later to serve as a database of clinical trial results, with the aim of reducing selective reporting and wasted research efforts. These updates came with new policies from regulatory agencies, including a requirement for researchers to post their trial results. Notably, policies alone were not enough, and advocacy and external monitoring have been key to pressing researchers to adhere [35]. Today, most clinical trials are registered and report their results [36–38].

In disciplines beyond clinical trials, preregistration has yet to become standard practice. In psychology, recent estimates for the prevalence of preregistration are lacking, but it likely remains around or below 10% [39,40]. In the social sciences, preregistration prevalence is much lower [41], and in preclinical research, one of the main registries has only 161 registrations as of September 2023 [42–44]. This low prevalence may stem from research protocols in more exploratory fields being less strictly defined in advance than those of clinical trials. Nevertheless, these disciplines could draw on the experience of clinical trial registration to encourage uptake where applicable and also explore alternative interventions that may prove more viable (e.g., blinded data analysis of electronic health records, as done on OpenSAFELY) [33].

Beyond increasing the uptake of preregistration, we can benefit from ensuring that preregistration serves its intended purpose. One study found that 2 researchers could agree on the number of hypotheses in only 14% of the preregistrations they assessed [45]. A meta-analysis also found that about one-third of clinical trials published at least 1 primary outcome that differed from what was registered and that these deviations were rarely disclosed [46]. These data underscore that, although conversations about preregistration appear to have reached the mainstream, concerted and persistent efforts are needed to ensure its uptake and achieve its intended impact.

Sharing of research data and code has also recently entered mainstream discussions. At the more advanced end of the spectrum, some manuscripts are now entirely reproducible with a button press [47]. However, a recent meta-analysis of over 2 million publications revealed that while 5% to 11% (95% confidence interval) of publications declared that their data were publicly available, only 1% to 3% actually made the data available [48]. For code sharing, the estimate was below 0.5%. The meta-analysis also found that declarations of data sharing, but not actual sharing, increased over time. Whether shared data are findable, accessible, interoperable, and reusable (FAIR) is yet another question, and some evidence, at least in the field of psychology, suggests that this is often not the case [49,50]. Meanwhile, several national-level funding agencies, including the US NIH and Canada’s Tri-Agency, are quickly moving towards mandating the open sharing of data. While these policies are a step in the right direction, ensuring their success will take substantial effort beyond the policy alone [51,52].

Team science

To improve methods transparency and data sharing, we could benefit from employing individuals who specialize in these tasks. The predominant model of academic research—where a senior researcher supervises several more junior researchers who each lead almost every aspect of their own project [53]—remains a vestige of an outdated apprenticeship approach to scientific research. In practice, each aspect of a research project can benefit from distinct expertise, including domain-specific knowledge (e.g., designing a study), technical capabilities (e.g., statistical analysis), and procedural proficiencies (e.g., data curation and data deposit). Poor distribution of labor and lack of task specialization may be part of the reason that data and code sharing remain rare [48,54], that publications regularly overlook previous research conducted on the same topic [55], and that the majority of studies in some disciplines use sample sizes too small to reasonably answer their research question [56].

Efforts to recognize diverse research contributions are helping usher in a new research model that fosters open science. The Contributor Roles Taxonomy (CRediT), launched in 2014, brings attention to the need for diverse contributions by outlining 14 standardized contributor roles, such as conceptualization, data curation, and writing (review and editing). Dozens of notable publishers have adopted CRediT, and some (e.g., PLOS) require a CRediT statement when submitting a manuscript [57]. While the concept of authorship continues to overshadow “contributorship,” the widespread adoption of CRediT is a first step in recognizing diverse research inputs, including efforts related to open science and reproducibility, through roles such as data curation and validation. CRediT statements also provide a dataset that meta-researchers can use to study the research ecosystem and realign incentives [53,58]. The US National Academy of Sciences has taken a step towards this goal by establishing the TACS (Transparency in Author Contributions in Science) website, which will list journals committed to setting authorship standards, defining corresponding authors’ responsibilities, requiring ORCID identifiers, and adopting the CRediT taxonomy.

Promoting role specialization can also help foster the creation of large research teams and, in turn, valuable large-scale research resources. For example, the UK Biobank contains detailed genetic, biological, and questionnaire data from over 500,000 individuals and has been analyzed by over 30,000 researchers in about 100 countries [59–61]. Another initiative, the Brain Imaging Data Structure (BIDS), is a standard for file structure and metadata that allows results from expensive brain imaging studies to be more easily reproduced and meta-analyzed [62]. These efforts, however, require large specialized groups: the UK Biobank includes 15 distinct teams, including imaging, executive, data analyst, laboratory, study administration, and finance [63]; BIDS credits over 250 contributors across 26 roles [64].

Academic funding schemes, however, mainly support small to medium-sized teams. When larger teams are funded, they generally comprise several smaller teams and sometimes lack the organizational structure and efficiency that specialization can provide, including staff dedicated to human resources, information technology, and project management. Several exceptions exist across the biological sciences, where large consortia are becoming more common (e.g., the European Commission’s Human Brain Project and the US NIH’s Knockout Mouse Program), and in high-energy physics, where CERN has served as a model for large-scale scientific collaboration. Consortia in other disciplines, however, continue to have difficulty securing funding and largely comprise volunteers whose main responsibilities lie elsewhere (e.g., the Psychological Science Accelerator) [65]. In the absence of mainstream funding opportunities for large and enduring research teams, the possibility of answering certain questions is left to those who can afford it, such as industry, government, and exceptional philanthropists. These actors may not prioritize the advancement of science and the betterment of society in the way one would hope impartial academics do. For academia to remain competitive across the landscape of research questions, we envision a future where the systems for funding, hiring, and promotion prioritize the flourishing of large and long-lasting research teams.

---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002362
