


An analysis of retracted papers in Computer Science [1]

Martin Shepperd (Department of Computer Science, Chalmers University, Gothenburg, and Brunel University London, Uxbridge, United Kingdom) and Leila Yousefi (Department of Life Sciences)

Date: 2023-06

Unfortunately, retraction seems to be a sufficiently common outcome for a scientific paper that we as a research community need to take it more seriously, e.g., by standardising procedures and taxonomies across publishers and by providing appropriate research tools. Finally, we recommend particular caution when undertaking secondary analyses and meta-analyses, which are at risk of becoming contaminated by these problem primary studies.

We find that of the 33,955 entries in the Retraction Watch database (16 May 2022), 2,816 are classified as CS, i.e., ≈ 8%. For CS, 56% of retracted papers provide little or no information as to the reasons. This contrasts with 26% for other disciplines. There is also some disparity between different publishers, a tendency for multiple versions of a retracted paper to remain available beyond the Version of Record (VoR) (median = 3 versions; maximum = 18), and for new citations to appear long after a paper is officially retracted. Systematic reviews are also impacted, with ≈ 30% of the retracted papers having one or more citations from a review.

The retraction of research papers, for whatever reason, is a growing phenomenon. However, although retracted paper information is publicly available via publishers, it is somewhat distributed and inconsistent.

Data Availability: The complete raw data cannot be shared publicly because of a data usage agreement with Retraction Watch that prohibits publishing more than 2% of the data set. This requirement arises because, in order to fund Retraction Watch’s continued operations, given that their initial grants have ended, they are licensing their data to commercial entities. Therefore researchers will need to approach Retraction Watch directly (retractionwatch.org) to obtain the same data set. We have placed our relevant code for this study in the Zenodo repository ( https://doi.org/10.5281/zenodo.6634462 ).

Copyright: © 2023 Shepperd, Yousefi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

The remainder of the paper is structured as follows. The next section provides general information about retraction, reasons for retraction, a brief history and some indicators of the overall scale. We then describe our methods, based on the Retraction Watch [ 5 ] and Google Scholar citation databases, and present the detailed results from our data analysis organised by research question. We conclude with a discussion of some threats to validity, a summary of our findings and the practical implications that arise from them.

Therefore, we investigate the nature and prevalence of retracted papers in CS. In addition, we consider how such papers are cited, in particular how they are used by secondary analyses, systematic reviews and meta-analyses, and their post-retraction citation behaviour.

Although various other scientific disciplines have raised retraction as a concern, it has not been explicitly investigated within the discipline of Computer Science (CS). The sole exception, to the best of our knowledge, is Al-Hidabi and Teh [ 4 ], who examined 36 CS retractions, constituting approximately 2% of those actually retracted at the time of their analysis (2017). Fortunately, we have been able to benefit from the pioneering work of Retraction Watch (RW) and their database [ 5 ] to enable a considerably more comprehensive and up-to-date analysis of retraction in CS.

This problem is exacerbated by the increasing use of open archives, e.g., arXiv, to publish pre- or post-prints of papers in addition to publishers' official publication websites, which are frequently protected by paywalls. The challenge here is that some archives are relatively informally organised and often rely on voluntary effort or temporary income streams. This means there may not necessarily be any formal mechanism for dealing with expressions of concern or retraction, and even if such decisions are made, they might not propagate across all versions of the paper in question. Indeed, as we will show, they do not.

An ever-increasing number of scientific papers is published every year. For example, Elsevier alone published in excess of 550,000 new scientific articles in 2020 across roughly 4,300 journals; its archives contain over 17 million documents (2020 RELX Company Report). Similarly, the rising count of retracted papers has been noted (e.g., [ 1 – 3 ]). What is unclear is whether the growth is due to rising misconduct or to improvements in our ability to detect such situations, as argued by Fanelli [ 2 ]. In addition, new questions arise concerning our ability to police or manage those papers that are retracted. This is a particularly important matter if such papers are subsequently cited to support some argument, or worse, contribute to, or contaminate, a meta-analysis so that weight is given to one or more primary studies that are no longer considered valid.

Despite the foregoing discussion, there are almost no similar studies in Computer Science and none that examine the potential impact on systematic reviews, mapping studies, and meta-analyses. The only work we have located is by Al-Hidabi and Teh [ 4 ], which retrieved 36 retracted CS papers and classified the reasons as random (4/36), non-random (31/36), and no reason given (1/36). By random the authors appear to mean so-called ‘honest’ errors, and non-random refers to various forms of misconduct such as plagiarism and duplicate publication. However, we do feel constrained to observe that 36 out of the 1,818 retracted papers in the RW database at the point of their analysis (2017) is quite a small (2%) proportion.

The complexities of actual citation practice, we believe, demonstrate the need for care when analysing real citation behaviour in published scientific papers. Interestingly, a number of researchers have sought to automate the process of determining citation purpose using a range of contextual data along with linguistic cues, e.g., Teufel et al. [ 22 ], more recently Heibi and Peroni [ 23 ], and in the specific domain of algorithm citation, Tuarob et al. [ 24 ]. This supports the notion that treating all citations as equal in quantitative citation analysis may be misleading. It is therefore likely that not all citations to retracted articles will be equally impacted; those from the argumentation and data categories will tend to be most vulnerable.

Other relevant work has been undertaken by researchers trying to understand the reasons for citations in scientific papers. Key themes are that citations need to be understood in context, that they cannot be seen as homogeneous, and that they are not “simply a function of scientific argumentation in a pure sense, there are many motives for citing authors, a point hidden by, or only implicit in, many functionalist accounts of citation” (Erikson and Erlandson [ 21 ]). This certainly seems to be borne out by the number of papers retracted for inappropriate citations.

This analysis of post-retraction citation patterns was followed up by Mott et al. [ 16 ] and Schneider et al. [ 17 ], both examining the field of clinical trials, and more recently by Heibi and Peroni [ 18 ] and Bolland et al. [ 19 ], looking more generally. Even secondary studies and systematic reviews are not immune, e.g., the “Bibliometric study of Electronic Commerce Research in Information Systems & MIS Journals”. (Note, we are intentionally not citing retracted papers but rather referring to them by their title. This enables the curious to locate them should they wish.) This review was published in 2016, retracted in 2020, and has still been cited 11 times subsequently. If systematic reviews are seen as the ‘pinnacle’ of the evidence hierarchy then we should be particularly vigilant about their possible contamination through the inclusion of retracted primary studies. Brown et al. [ 20 ] explored these phenomena in pharmacology and found that of 1,396 retracted publications in the field approximately 20% (283) were cited by systematic reviews. Of these, 40% were retracted for data problems including falsification, and 26% for concerns about methods and analysis.

Another area of investigation has been the citation patterns of papers after they have been retracted. As mentioned, back in 1990 Pfeifer and Snodgrass [ 12 ] found that citations continued, although at a reduced rate. More recently, Bar-Ilan and Halevi [ 15 ] analysed 15 retracted papers from 2015-16 for which a total of 238 citing documents could be obtained. Given that the papers were all publicly retracted, this is a disturbingly high level. However, the authors pointed out that not all reasons for retraction necessarily invalidate the findings, e.g., self-plagiarism means the authors are unjustifiably attempting to obtain extra credit for their research, not that the research is flawed. Of the 15 papers, 8 constituted more serious cases where the data or the images were manipulated, hence the results cannot be trusted. Of the citations to these papers with unsafe results, 83% were judged to be positive, 12% neutral and only 5% negative. This is despite being made after the papers were publicly retracted, leading to concerns about the way this information is disseminated or the state of scientists’ scholarship.

Within retraction research, one topic that has been quite widely investigated is identifying possible predictors for papers likely to be retracted. However, Bordignon [ 13 ] found that the obvious predictor, the prevalence of negative citations (i.e., those questioning the target paper), had surprisingly little impact on the likelihood of retraction. More positively, Lesk et al. [ 14 ] reported that PLoS papers that included open data were substantially less likely to be retracted than others.

Investigating retracted papers is not new. More than 30 years ago, Pfeifer and Snodgrass [ 12 ] looked into this phenomenon within medical research. They found 82 retracted papers and then assessed their subsequent impact via post-retraction citations, which totalled 733. They computed that this constituted an approximate 35% reduction in what might otherwise have been expected, but it is still disturbing, particularly in a critical domain such as medicine.

A major contributor to our understanding of the phenomenon of retracted papers is the not-for-profit organisation RW, founded by Adam Marcus and Ivan Oransky in 2010, which maintains the largest and most definitive database of retracted papers. This was kindly made available for analysis in this paper. For an overview of the role of the RW organisation see Brainard [ 11 ].

Note that not all of the above reasons are related to research misconduct. So we should be clear that whilst retraction means something has gone wrong, it does not necessarily imply culpability or some kind of moral lapse on the part of one or more of the authors. We should perhaps find encouragement in the findings of Fanelli [ 10 ] that only 1-2% of researchers admitted to ever fabricating data or results. In contrast, the survey of 2,000 experimental psychologists by John et al. [ 8 ] found that the use of so-called questionable research practices (QRPs) was disturbingly high. Thus, whilst misconduct is likely very rare, poor research practices may be less so.

A research paper may be retracted for one or more reasons. Note that the list of reasons is growing over time as publishers and editors encounter new situations and find the need for new categories. Retraction Watch presently identifies 102 reasons [ 9 ]. This is of course a very fine-grained classification scheme. A non-exhaustive list of the more commonplace reasons includes:

Analysis and results

In order to explore the phenomenon of retracted scientific papers, we made use of the Retraction Watch database [5], dated 16th May 2022. This comprises 33,955 records, one per retracted paper, of which 2,816 are classified as Computer Science, i.e., approximately 8.3%. The RW data set covers papers in any scientific discipline (very broadly defined), including Business and Technology, Life Sciences, Environmental Sciences, Health Sciences, Humanities, Physical Sciences, and Social Sciences. In addition to bibliographic data for each retracted paper, the database contains the publication date, retraction date, and retraction reason(s), and classifies the paper in terms of discipline(s) and article type(s). Note that our analysis covers retractions only, so whilst important, expressions of concern (EoCs) and corrigenda are excluded.
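To make the bookkeeping concrete, the following is a minimal sketch (in Python, not the authors' released code) of how such a tally might be computed from a CSV export of the database; the file name and the "Subject" column handling are assumptions, since Retraction Watch records typically carry several subject tags per paper.

```python
# A minimal, hypothetical sketch: tallying Computer Science entries in a CSV
# export of the Retraction Watch database. The file name and the "Subject"
# column are assumptions; multiple subject tags per record are handled by
# matching on a substring.
import pandas as pd

rw = pd.read_csv("retraction_watch_2022-05-16.csv")          # hypothetical export file
is_cs = rw["Subject"].str.contains("Computer Science", na=False)
cs = rw[is_cs]

print(f"Total retractions: {len(rw)}")                        # 33,955 in the snapshot analysed here
print(f"CS retractions:    {len(cs)} ({is_cs.mean():.1%})")   # 2,816, i.e., roughly 8%
```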

RQ2: The post-retraction citation behaviour of retracted works

For our detailed analysis, we selected all retracted reviews, including informal narrative reviews, systematic reviews, and meta-analyses. This amounted to 25 reviews, with a further 4 papers being excluded because they were not actually reviews of other work or, in one case, no longer available. We then undertook a random sample, stratified by retraction year, of a further 95 retracted non-review articles, making a total of 120 articles (a sketch of this sampling step is given at the end of this subsection). We resorted to sampling because an automated approach, for instance using the R package Scholar, proved impractical given our need to manually identify how many versions were available, determine the language, identify and count meaningful citations, handle inconsistencies between publishers, and so forth.

Of these articles, 115/120 are available in the sense that the full text could be obtained using a Google Scholar search, albeit possibly behind a paywall. Ideally, all retracted articles should remain visible so that there is a public record of the history of the paper. The Committee on Publication Ethics (COPE) guidelines state that retracted papers should be labelled as retracted and be accessible (see https://publicationethics.org/files/cope-retraction-guidelines-v2.pdf). More worrying is the finding that only 98/120 (≈ 82%) of the papers are clearly labelled as retracted in at least one, though not necessarily all, publicly available versions. It would seem that different publishers have different practices. Indeed, individual journals and conferences may have different practices, and these may evolve over time.

Another observation, and most likely relevant to the issue of post-retraction citations, is the proliferation of versions (see Fig 4). The median = 3 but, as is clear from the histogram, some papers have many more, with a maximum of 18. A feature of Google Scholar is that it identifies links to multiple versions of a paper, and often these will include informally published versions that might reside in the authors' institutional archive or repositories such as arXiv. The potential problem is that if there is no link to the VoR, then even if the publisher marks a paper as retracted, this decision may not, and indeed frequently does not, propagate across all other versions; thus the reader could be unaware of any retraction issues, or for that matter corrections.
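As a minimal sketch of the sampling step described above (not the authors' released code, and with assumed column names), stratification by retraction year could be implemented roughly as follows, continuing from the hypothetical `cs` table in the earlier sketch:

```python
# A minimal, hypothetical sketch of stratified random sampling by retraction
# year. Assumes a DataFrame `cs` of retracted CS papers (see the earlier
# sketch) with assumed columns "RetractionYear" and "ArticleType".
import pandas as pd

def stratified_sample(df: pd.DataFrame, n: int, by: str, seed: int = 1) -> pd.DataFrame:
    """Draw roughly n rows, allocating the sample proportionally to each stratum's size."""
    frac = n / len(df)

    def take(group: pd.DataFrame) -> pd.DataFrame:
        k = min(len(group), max(1, round(len(group) * frac)))
        return group.sample(k, random_state=seed)

    return df.groupby(by, group_keys=False).apply(take)

non_reviews = cs[~cs["ArticleType"].str.contains("Review", na=False)]
sample = stratified_sample(non_reviews, n=95, by="RetractionYear")
print(len(sample))   # roughly 95; adding the 25 retracted reviews gives ~120 papers
```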


Fig 4. Online versions. Histogram and density plot of the available version count for 120 sampled CS retracted papers. https://doi.org/10.1371/journal.pone.0285383.g004

Next, we identified all meaningful citations (i.e., from an English-language research article, including pre-review items such as arXiv reports, as well as books and dissertations) to each retracted paper. In total, our sample of 120 retracted papers was cited 1,472 times or, put another way, potentially ‘contaminated’ 1,472 other CS papers. Since this sample represents less than 5% of all retracted CS papers (120/2,816), one can begin to imagine the impact. A very crude linear extrapolation, (2,816/120) × 1,472 ≈ 34,543, might suggest that of the order of 30,000+ CS papers may be impacted by citations to retracted papers, although one possible distortion is that our sample contains all the review articles, which one might expect to be more heavily cited, although this does not appear to be strongly the case (see the discussion for RQ3).

These citations to retracted CS papers were classified as being either before the year of retraction, during the year of retraction, or subsequent to the retraction, i.e., post-retraction citations. Total citations ranged from zero to 145 with a median = 4, but the distribution is strongly positively skewed with a mean = 13.03. Unsurprisingly, we observed very disparate citation behaviours between the retracted CS papers, noting that, of course, older papers have more opportunities to be cited. Interestingly, the relationship between paper age and citation count is less strong than one might imagine (see Fig 6), and there are other factors such as the venue and the topic.

Focusing now on post-retraction citations, and allowing for concurrent submission (i.e., where the citing paper is published at the same time as the cited paper is being retracted), we define post-retraction as being at least the year following the retraction. On this basis we find a total of 672 post-retraction citations to the 120 retracted papers in our sample. Again there is a huge disparity: counts ranged from zero to a rather disturbing 82, with a median = 2 and a mean = 5.9 (see Fig 5). This is of course disturbing. It suggests either poor scholarship or ineffective promulgation of the retraction notice. Recall that for 18% of our sample of 120 CS papers there was no clear indication that the paper had been retracted. Of course, one wonders about the remaining 82%.
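The year-based classification and the crude extrapolation above amount to the following; this is a minimal, hypothetical sketch rather than the authors' released code.

```python
# A minimal, hypothetical sketch of (1) classifying a citation relative to the
# cited paper's retraction year and (2) the crude linear extrapolation from
# the 120-paper sample to all 2,816 retracted CS papers.

def classify_citation(citing_year: int, retraction_year: int) -> str:
    """Label a citation as pre-retraction, concurrent, or post-retraction."""
    if citing_year < retraction_year:
        return "pre-retraction"
    if citing_year == retraction_year:
        return "concurrent"
    return "post-retraction"

# Crude extrapolation: scale the 1,472 citations observed for the 120 sampled
# papers up to the full population of 2,816 retracted CS papers.
estimated_contaminated = (2816 / 120) * 1472

print(classify_citation(2021, 2019))   # -> "post-retraction"
print(round(estimated_contaminated))   # -> 34543, i.e., of the order of 30,000+
```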


Fig 5. Post-retraction citation counts. Histogram of the distribution of the post-retraction citation count of 120 sampled papers. https://doi.org/10.1371/journal.pone.0285383.g005

Next, we consider the relationship with the year of retraction, which is depicted by the scatter plot in Fig 6, along with a log-linear regression line and 95% confidence interval. This suggests the relationship with age is surprisingly weak. We also distinguish retracted review papers from regular papers, but again the distinction is not strong (the respective medians, when normalised by years since retraction, are: review papers = 0.32 and non-reviews = 0.20).
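A minimal, hypothetical sketch of the normalisation described above, together with a simple log-linear fit of the kind plotted in Fig 6 (the file name, column names, and model form are assumptions, not the authors' released code):

```python
# A minimal, hypothetical sketch: post-retraction citations per year since
# retraction, compared between review and non-review papers, plus a simple
# log-linear trend of post-retraction citations against retraction year.
import numpy as np
import pandas as pd

# Hypothetical table of the 120 sampled papers; column names are assumptions.
papers = pd.read_csv("sampled_cs_retractions.csv")

snapshot_year = 2022                                        # year of the RW snapshot used here
years_since = (snapshot_year - papers["RetractionYear"]).clip(lower=1)
papers["rate"] = papers["post_citations"] / years_since     # post-retraction citations per year

print(papers.groupby("is_review")["rate"].median())         # reviews ≈ 0.32, non-reviews ≈ 0.20 here

# Log-linear trend: regress log(1 + post-retraction citations) on retraction year.
slope, intercept = np.polyfit(papers["RetractionYear"],
                              np.log1p(papers["post_citations"]), deg=1)
print(f"trend slope = {slope:.3f} (log scale) per year")
```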


Fig 6. Post-retraction citations by year. Scatterplot of the post-retraction citation count by retraction year. https://doi.org/10.1371/journal.pone.0285383.g006

Although not of much comfort, we observe that the post-retraction citation patterns for CS are not really distinct from other disciplines, and thus the problems of continuing to cite retracted papers are widespread and seem to run across all research communities [15, 17–19].

---
[1] Url: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0285383
