(C) PLOS One

This story was originally published by PLOS One and is unaltered.

Four principles to establish a universal virus taxonomy [1]

Authors: Peter Simmonds (Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom); Evelien M. Adriaenssens (Quadram Institute Bioscience, Norwich Research Park, Norwich); F. Murilo Zerbini

Date: 2023-02

The International Committee on Taxonomy of Viruses (ICTV) is the official body mandated by the International Union of Microbiological Societies to develop and maintain a taxonomy of viruses and the naming of their taxa. Throughout its history, the rules and codes associated with taxonomy have been updated many times in response to new discoveries, changes in understanding of evolutionary relationships among viruses, and, importantly, the advent of new technologies, such as high-throughput sequencing (HTS), that have vastly increased our global knowledge of viral diversity.

With roots in a pregenomic age, the criteria used for virus classification (see Box 1 for definitions of terms used in virus taxonomy) and taxon nomenclature were originally and by necessity based on observational properties of virus isolates, including the morphology of virion particles [1], type of nucleic acid in their genomes [2], and physical attributes such as susceptibility to inactivation by high temperature, organic solvents, and low pH [3,4]. While the vast majority of viruses now included in the ICTV taxonomy have been characterized at the genomic level (and this has recently been introduced as a prerequisite for classification), there remains active debate on the extent to which historical reliance on physical and biological properties might continue to be useful as classification criteria and, indeed, whether viruses need to be characterized in in vitro culture or by virion visualization to be eligible for taxonomic assignment [5,6]. This topic is hotly debated among virologists, as it is among prokaryotic and fungal taxonomists, who are discussing whether to require strain isolation, phenotypic characterization, and placement in publicly available collections. Current prokaryotic and fungal species lists capture only a small fraction of the true genetic diversity of these organisms in the wider environment, with species totals in the tens of thousands rather than the millions that genomic surveys estimate to exist [7,8].

Nomenclature: The naming of viruses or taxa. Taxon nomenclature is regulated by the ICTV and has a number of typographical restrictions concerning italicization and capitalization; taxon names above the rank of species possess suffixes to indicate taxonomic rank. Species nomenclature follows a binomial format (genus name + species epithet). In contrast, the naming of viruses is not regulated by the ICTV.
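As an illustration of how rank-indicating suffixes work, the following minimal Python sketch infers a rank from a taxon name. The suffix table is not given in this article; it is an assumption based on the suffixes commonly listed in the ICTV Code and should be checked against the current Code. Special cases such as viroid or satellite taxa are not covered, and the function name `rank_from_name` is hypothetical.

```python
# Assumed suffix table (not from this article); verify against the ICTV Code.
# Genus and subgenus share the suffix "-virus", so that case stays ambiguous.
RANK_SUFFIXES = {
    "viria": "realm",
    "vira": "subrealm",
    "virae": "kingdom",
    "virites": "subkingdom",
    "viricota": "phylum",
    "viricotina": "subphylum",
    "viricetes": "class",
    "viricetidae": "subclass",
    "virales": "order",
    "virineae": "suborder",
    "viridae": "family",
    "virinae": "subfamily",
    "virus": "genus or subgenus",
}


def rank_from_name(taxon_name):
    """Return the rank implied by a taxon name's suffix, or None if unknown."""
    name = taxon_name.strip().lower()
    # Try longer suffixes first so a short suffix never shadows a longer one.
    for suffix in sorted(RANK_SUFFIXES, key=len, reverse=True):
        if name.endswith(suffix):
            return RANK_SUFFIXES[suffix]
    return None


for example in ("Riboviria", "Orthornavirae", "Tymovirales", "Alphaflexiviridae"):
    print(example, "->", rank_from_name(example))
```

The example names are taxa mentioned elsewhere in this article. Species names are deliberately excluded, because they follow the binomial format rather than a rank suffix.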

An expert group convened by the ICTV in 2016 debated and affirmed a policy to allow viruses known from their genome sequences alone to be incorporated into virus taxonomy. This policy enables taxonomic assignments without requiring prior knowledge of a virus's phenotypic properties, such as host range or pathogenicity, isolation of viruses in cell culture or local lesion hosts, or visualization of virions [9]. Subsequent discussions led to the publication of guidelines for minimum standards for virus sequence data to ensure that viruses assigned to the ICTV taxonomy are represented by complete or coding-complete genomic sequences, which are accurately assembled and free from artifacts [10,11]. This development has led to large numbers of new taxa being incorporated into the official taxonomy, primarily from genomic data accrued from large-scale metagenomic surveys [12–18]. It also led to a renewed debate on the merits of using different criteria for taxonomic assignments among different groups of viruses, in particular the emphasis on biological properties for many viruses infecting animals and plants versus the almost exclusive use of nucleic acid–based features for viruses infecting prokaryotes.

The creation of a unified evolutionary taxonomy that incorporates viruses classified both by traditional and metagenomics-based analyses requires considerable knowledge of, and insight into, how virus properties are genomically encoded, the evolutionary histories of viruses, and the influence of past recombination or reassortment of genomic regions on phylogenetic congruence. Furthermore, viruses have multiple, independent, and likely ancient evolutionary origins (reviewed in [15,19,20]). To develop criteria for assigning viruses to taxa, consensus is required on which genes are most informative in recovering relationships that best represent the evolutionary histories of each of these different clades.

A group of 45 basic and clinical virologists, bioinformaticians, and evolutionary and structural biologists met in Oxford, United Kingdom, in April 2022, to develop a community-wide consensus on methodologies used for virus classification and to establish an integrated and internally consistent taxonomic framework. The discussions focused primarily on how an evolutionary taxonomy of all viruses infecting eukaryotes, archaea, and bacteria might be constructed, which tools and approaches could be used, and how this process could be guided by identification of the most evolutionarily informative attributes of virus genome sequences and their organization. The group also considered the broader issue of how to reconcile an expanding genetic and structural classification with a partly phenetic classification developed by virologists over many decades that takes into account, among other properties, clinical and regulatory utility, virus/host ecology, and epidemiology.

The meeting achieved a substantial consensus on a range of approaches and challenges for taxonomy development, with all but two of the 45 participants endorsing a series of agreed recommendations in the form of four virus taxonomy principles (Box 2). We believe these will have long-term relevance and practical utility to inform the continued development of a universal virus taxonomy by the ICTV for many years to come.

1: Virus taxonomy should reflect the evolutionary history of viruses. Most viruses can be assigned to independent virus realms, each with an inferred separate evolutionary origin. Members of each realm possess sets of ancestral orthologous genes, termed hallmark genes, typically corresponding to replication or virion formation modules within their genomes. Their evolutionary relationships define monophyletic taxonomic assignments within each of these virus groups.

2: Virus properties may guide assignment of ranks to maximize their utility. While evolutionary relationships determine the topology of virus taxonomies, the ranks assigned within it are human-made constructs, with up to 15 available from realm to species. Placement of viruses should follow patterns of evolutionary, genomic, and phenotypic properties; for example, species assignments may be based on host range, disease associations, or epidemiology, provided that such categories result in monophyletic groups.

3: Taxonomy is but one of many possible means to classify viruses. The taxonomy produced by the ICTV provides an overarching framework for classifying viruses based on evolutionary relationships. However, alternative classifications based on, for example, clinical or epidemiological properties or regulatory requirements have their own utilities in specific circumstances. These may not follow evolutionary relationships (like the Baltimore classification) or may include polyphyletic categories, such as arboviruses or human immunodeficiency viruses, that have epidemiological or clinical value but cannot be represented within an evolutionary taxonomy.

4: Taxonomic assignments of viruses inferred from metagenomic sequences require strict sequence quality control. Sequence-based assignment of a new taxon in the absence of other virus characterization requires it to be both accurate and complete. Published guidelines for minimum information about an uncultivated virus genome for taxonomic assignment have been produced [10].

Principle 1. Virus taxonomy should reflect the evolutionary history of viruses

Ranks used for virus taxonomy (realm, kingdom, phylum, class, order, family, genus, and species) must reflect degrees of evolutionary relatedness of the viruses assigned at each rank (Fig 1). This implies that viruses assigned to an individual rank form a monophyletic clade, i.e., all members of a rank share a most recent common ancestor that is distinct from all other evolutionary lineages assigned to the taxonomy despite the impact of gene acquisition, recombination, or reassortment events on genome organizations. This statement may seem obvious, but it is, in fact, the first formal recognition by the ICTV that virus taxonomy should be guided at all ranks by the inference of evolutionary history. This principle provides the necessary route forward for a taxonomy that can incorporate viruses characterized from metagenomics studies.
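The monophyly requirement can be made concrete with a small check on a rooted tree. The following is a minimal Python sketch, assuming a toy tree stored as a dictionary from each internal node to its children; the tree, the virus labels, and the function names are hypothetical and only illustrate the definition, not any ICTV procedure.

```python
def leaves_under(tree, node):
    """Collect all leaf names below `node` (a leaf returns itself)."""
    children = tree.get(node, [])
    if not children:
        return {node}
    found = set()
    for child in children:
        found |= leaves_under(tree, child)
    return found


def is_monophyletic(tree, root, members):
    """True if `members` is exactly the leaf set of one node's subtree."""
    members = set(members)
    node = root
    # Descend to the smallest subtree that still contains every member.
    while True:
        narrower = [c for c in tree.get(node, [])
                    if members <= leaves_under(tree, c)]
        if not narrower:
            break
        node = narrower[0]
    # Monophyletic only if that subtree contains nothing but the members.
    return leaves_under(tree, node) == members


# Hypothetical rooted tree: internal node -> list of children.
toy_tree = {
    "root": ["n1", "virusE"],
    "n1": ["n2", "n3"],
    "n2": ["virusA", "virusB"],
    "n3": ["virusC", "virusD"],
}

print(is_monophyletic(toy_tree, "root", {"virusA", "virusB"}))  # True
print(is_monophyletic(toy_tree, "root", {"virusA", "virusC"}))  # False
```

In the second call, the smallest subtree containing virusA and virusC also contains virusB and virusD, so a taxon made up of only those two viruses would not be a monophyletic clade.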

Fig 1. Ranks used in virus taxonomy. Schematic depiction of the 15-rank taxonomic framework used by the ICTV. It includes the methodologies that may be used to determine virus evolutionary relationships and make assignments at each rank. The pyramid shape indicates that the number of taxa increases from the top rank (realm) to the most basal rank (species, Sp.). The names of the 15 ranks are shown on the left of the pyramid, and the methodologies are on the right (AAS, amino acid sequence similarity; NS, nucleotide sequence similarity). The pyramid includes a hypothetical example of the taxonomy of a realm, indicating the number of taxa at each rank (filled circles). The phenotypic properties of classified viruses that may inform rank placements are depicted below the pyramid. https://doi.org/10.1371/journal.pbio.3001922.g001

Virus evolutionary histories and choice of hallmark genes. We recognize that the establishment of a coherent virus taxonomy requires a variety of tools and approaches to reconstruct the underlying evolutionary relationships of viruses across their spectrum of diversity. Reconstruction of the deeper evolutionary histories of viruses is particularly challenging due to the lack of conserved genes across all virus genomes. This reflects the growing certainty that viruses have emerged on multiple independent occasions [20–22]. The impossibility of creating a taxonomic structure for all viruses with a single common ancestor contrasts with the biological classification of cellular life forms that possess a set of core genes, such as those encoding ribosomal proteins and ribosomal RNAs. While acknowledging the reticulate nature of the tree of life, these universal genes testify to the shared ancestry of genes present in bacteria, archaea, and eukaryotes linking back to a last universal cellular ancestor (LUCA) and that can be aligned to infer the deepest evolutionary relationships among all domains of cellular life forms [19,23]. Despite the lack of universal virus genes, considerable progress has been made recently in better defining virus groups that share common ancestry [15,24]. The majority of viruses can be assigned to one of several independent realms, each of which is unified through possession of a shared orthologous gene or gene set, termed hallmark gene(s) [15]. Each realm is inferred to represent a distinct, independent origin of its constituent members. Two major functional components, the genome replication module and the virion formation module [25], are currently used for realm definition. These hallmark genes are thus considered to be ancestral to the members of each realm [25]. 
Virion morphogenesis modules were chosen as the defining characters for DNA viruses with larger genomes and govern assignments into the realms Adnaviria, Duplodnaviria, and Varidnaviria. Viruses in these three realms encode major capsid proteins (MCPs) that are structurally radically different, as well as distinct virion assembly and genome packaging machineries [15,19,26], suggesting independent evolutionary origins. The evolutionary relationships of the genes involved in replication were not considered suitable for defining the realms of large DNA viruses because even relatively closely related viruses within the same realm often have distinct genome replication modules. For example, related viruses can encode nonhomologous or distantly related DNA polymerase genes of families A, B, or C that are interspersed with cellular counterparts. Some may lack DNA polymerase genes altogether and instead encode diverse replication initiators that facilitate the recruitment of the host replisome [25,27]. On the other hand, the key features of the genome replication machinery are the most suitable for defining the realms Riboviria [15], Monodnaviria (ICTV Taxonomy proposal 2019.005G.R.Monodnaviria), and Ribozyviria (ICTV Taxonomy proposal 2020.012D.R.Ribozyviria). The realm Riboviria unifies RNA viruses (kingdom Orthornavirae) and reverse-transcribing viruses (kingdom Pararnavirae), all of which encode homologous right-handed palm-domain RNA-directed RNA polymerase (RdRP) or reverse transcriptase (RT) genes, respectively. The phylogeny of these RdRPs and RTs was, therefore, used to guide the taxonomy within the Riboviria. In contrast, the capsid genes of RNA viruses fall into several unrelated groups, many of which are likely to have been separately acquired from their hosts [28], or are absent altogether.
Analogously, all members of the realm Monodnaviria encode homologous histidine–hydrophobic residue–histidine (HUH) superfamily endonucleases [15,29], but the virion morphogenesis modules are distinct for viruses from different phyla within this realm. Finally, members of the Kolmioviridae, currently the sole family in the realm Ribozyviria, have small circular negative-sense RNA genomes that do not encode an RNA polymerase but contain a particular ribozyme that serves to define the realm.
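The realm definitions described above can be summarized as a simple lookup. The following Python sketch is only a reading aid under stated assumptions: the hallmark labels are shorthand taken from the text, the mapping is not an ICTV data structure, and the function name `candidate_realms` is hypothetical. In practice, detecting a hallmark requires the sequence and structure comparisons discussed later in this section.

```python
# Shorthand hallmark labels taken from the text; illustrative only.
HALLMARK_TO_REALMS = {
    "RdRP": {"Riboviria"},
    "RT": {"Riboviria"},
    "HUH endonuclease": {"Monodnaviria"},
    # The text refers to "a particular ribozyme" without naming it here.
    "ribozyme": {"Ribozyviria"},
    # Three realms are defined by structurally distinct MCPs, so the bare
    # label "MCP" cannot choose between them without structural information.
    "MCP": {"Adnaviria", "Duplodnaviria", "Varidnaviria"},
}


def candidate_realms(detected_hallmarks):
    """Return the sorted realms compatible with the detected hallmark labels."""
    realms = set()
    for label in detected_hallmarks:
        realms |= HALLMARK_TO_REALMS.get(label, set())
    return sorted(realms)


print(candidate_realms(["RdRP"]))              # ['Riboviria']
print(candidate_realms(["HUH endonuclease"]))  # ['Monodnaviria']
print(candidate_realms(["MCP"]))               # three candidate realms
```

The ambiguous MCP case reflects the point made in the text: for large DNA viruses, the realm is decided by which structurally distinct capsid protein and virion assembly machinery a virus encodes, not by the mere presence of a capsid gene.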

Modular evolution of viruses. Virus evolution is frequently punctuated by large-scale genome reorganizations and the exchange of gene modules analogous to horizontal gene transfer in prokaryotes. For example, alpha-, beta-, gamma-, and deltaflexiviruses and tymoviruses possess an evolutionarily conserved set of replication genes (Rep) that define their classification in the order Tymovirales in the realm Riboviria. However, their capsid morphologies are diverse, including particles that are isometric (members of the Tymoviridae), filamentous/helical (viruses in the Alphaflexiviridae, Betaflexiviridae, and Gammaflexiviridae), or form no particles at all (members of the Deltaflexiviridae and fungus-infecting members of the Alphaflexiviridae). Even within a family, the phylogeny of capsid genes may be noncongruent with that of the replication genes, such as between genera of Alphaflexiviridae [30]. Similarly, members of the order Martellivirales share relatively closely related RdRPs and other genes involved in replication, such as helicases and capping enzymes, but produce flexible filamentous, rod-shaped, or icosahedral particles constructed from unrelated capsid proteins [28], or no classic virions at all (i.e., endornaviruses), suggesting the acquisition or loss of capsid morphogenesis genome modules from taxonomically distant viruses. Furthermore, capsid genes can be exchanged between viruses that are otherwise evolutionarily unrelated. For example, a range of plant and animal RNA viruses and small single-stranded (ss) DNA viruses encode homologous horizontal single jelly-roll capsid proteins, despite the RNA viruses being assigned to the realm Riboviria and the ssDNA viruses to the realm Monodnaviria [31–33].
Some prokaryotic viruses, in particular those alternating between lysogenic and lytic infections (“temperate” viruses), such as λ-like phages and those in the realm Duplodnaviria infecting Mycolicibacterium species, are substantially influenced by horizontal gene transfer [34]. These viruses possess a so-called mosaic genome structure, in which different parts of the genome can have quite different evolutionary histories [34]. In such cases, the placement of taxonomic boundaries to form monophyletic groups at certain ranks is arbitrary as there are multiple possible evolutionary histories. Although gene-sharing networks are informative for tracking gene exchange across virus groups [35], the relationships they depict violate the principles of ancestral descent that are used in taxonomy. Therefore, while different gene components are equally parts of the evolutionary histories of viruses and contribute to their phenotypes, for pragmatic purposes, we assign primacy to the most evolutionarily conserved hallmark genes in the construction of a hierarchical taxonomy. The use of hallmark genes for virus taxonomy is conceptually analogous to the use of a core set of conserved genes (primarily those for translation system components) for taxonomy of cellular life forms and eschews the use of the much more variable complements of genes subjected to horizontal gene transfer and loss [36]. Alternative taxonomies could be developed by selection of different genes to determine relatedness (for example, through basing the taxonomy of RNA viruses on capsid gene relationships, or of large DNA viruses by DNA polymerase genes). However, these typically yield a much greater number of unrelated virus groups and a less parsimonious association with virus properties.
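To illustrate why gene sharing links genomes in a network rather than a tree, the following minimal Python sketch connects genomes whose gene-family content overlaps. It uses the Jaccard index as a simple stand-in; this is an assumption for illustration and is not the statistic used by any published gene-sharing tool. The genomes, gene-family labels, and the `min_jaccard` threshold are hypothetical.

```python
from itertools import combinations


def gene_sharing_edges(genomes, min_jaccard=0.3):
    """Return (genome_a, genome_b, jaccard) for pairs that share enough gene families."""
    edges = []
    for a, b in combinations(sorted(genomes), 2):
        shared = genomes[a] & genomes[b]
        union = genomes[a] | genomes[b]
        score = len(shared) / len(union) if union else 0.0
        if score >= min_jaccard:
            edges.append((a, b, round(score, 3)))
    return edges


# Hypothetical genomes described by the gene families they contain.
toy_genomes = {
    "phage1": {"mcp", "terminase", "portal", "integrase"},
    "phage2": {"mcp", "terminase", "portal", "lysin"},
    "phage3": {"integrase", "lysin", "repressor"},
}

print(gene_sharing_edges(toy_genomes))                   # [('phage1', 'phage2', 0.6)]
print(gene_sharing_edges(toy_genomes, min_jaccard=0.1))  # adds the weaker links to phage3
```

With the lower threshold, phage3 is linked to both other genomes through different genes, which is the kind of reticulate relationship a strictly hierarchical taxonomy cannot represent; this is the pragmatic reason given above for assigning primacy to conserved hallmark genes.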

Methodology for virus phylogenetics and taxonomy. Within individual virus realms, a range of genome sequence comparison methods is currently needed to describe and assign viruses to different taxonomic ranks. For viruses with similar genome sequences, i.e., within the same species and genus, genetic relationships may be inferred from alignments of nucleotide or amino acid sequences of (near) complete genomes or of specific genes. Relationships among viruses can be further explored by phylogenetic tree inference and analysis and, where this is not practical, by clustering by sequence similarity and analysis of pairwise distance distributions using tools such as PASC [37], DEmARC [38], and VIRIDIC [39]. However, such distance values only serve as an approximation of evolutionary relatedness [40,41]. The latter may be better inferred by phylogenetic methods that are also capable of calculating clade support, such as VICTOR [42] (Table 1).
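The idea of clustering by pairwise sequence similarity can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of PASC, DEmARC, or VIRIDIC: sequences are assumed to be already aligned to equal length, identity is a plain fraction of matching columns, clustering is single-linkage, and the threshold and sequences are hypothetical.

```python
def pairwise_identity(seq_a, seq_b):
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" and y == "-":
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def cluster_by_identity(aligned, threshold):
    """Single-linkage clustering: link any pair with identity >= threshold."""
    names = sorted(aligned)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pairwise_identity(aligned[a], aligned[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


# Hypothetical aligned sequences and an arbitrary demarcation threshold.
toy_alignment = {"v1": "ACGTACGTAC", "v2": "ACGTACGTAT", "v3": "TTTTACGGGG"}
print(cluster_by_identity(toy_alignment, threshold=0.85))  # [['v1', 'v2'], ['v3']]
```

As the text notes, such distance-based groupings only approximate evolutionary relatedness, so a threshold like the one used here would need to be justified per virus group and checked against phylogenetic inference.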

Table 1. Examples of methodologies used for virus classification at different taxonomic ranks*. https://doi.org/10.1371/journal.pbio.3001922.t001

At the intermediate levels of family, order, and class, relationships can be inferred by comparing sequences of evolutionarily conserved hallmark genes using sensitive methods for protein family profile comparison, such as HHPred [43], with subsequent phylogenetic analysis using appropriate methods for tree inference, for example, maximum likelihood methods. Comparison of hallmark proteins can be combined with metrics based on gene content, gene order and orientation (synteny), and other aspects of genome organization, using tools such as vConTACT2 [44,45] and GRAViTy [46], which are based on hierarchical clustering of gene sharing networks and the detection of hidden Markov model profiles of conserved protein families, with GRAViTy also taking into account metrics based on gene order and genome organization [47,48]. ViPTree [49] has also been used to define family level taxa of prokaryotic double-stranded (ds) DNA viruses, whereas VICTOR [42] can classify all prokaryotic viruses at the species, genus, subfamily, and family ranks through a joint clustering- and phylogeny-driven approach. Taxonomic assignments at higher ranks, such as phyla, kingdoms, and realms, are based either on sequence comparison of the most highly conserved hallmark proteins and/or on protein structure comparisons. The latter can be informative for making evolutionary comparisons because homologous proteins typically retain similar structures, even when the corresponding amino acid sequences have diverged to the point that they are no longer sufficiently similar to infer homology based on sequence alone. Structure-based comparison methods include clustering based on estimates of distances between structures and structure-based phylogenetic analysis [50].
Much of the data used for this purpose originates from structures resolved experimentally with X-ray crystallography and, more recently, cryo-electron microscopy [51,52]. However, protein structure prediction methods have become much more accurate and insightful, with the potential to enable large-scale bioinformatics-based reconstructions of structural features from sequence data alone. An important caveat is that, at this time, the recently developed and highly successful programs AlphaFold [53] and RoseTTAFold [54] generalize from known protein structures in the Protein Data Bank (PDB), a dataset in which virus proteins are substantially underrepresented [55], thus limiting their predictive power for analysis of relationships among viruses. Hallmark gene-based assignments at the levels of kingdom and realm can be hampered by high levels of protein sequence divergence, with homology only detectable once high-resolution structures for the corresponding proteins become available. For this reason, the validity of the phylogenetic analyses used to designate kingdoms and phyla through evolutionary relationships among RdRP and RT genes of Riboviria using sequence analysis alone has been questioned on the grounds of the purported arbitrariness of aligning highly divergent sequences [56]. Alignment methodologies continue to be refined [57], but, ultimately, a range of sequence and protein structure comparison methods is likely to be required to delineate the higher ranks with confidence. Indeed, while protein structure can be influenced by environmental conditions (such as temperature and ionic strength), the optimal fold determined under standardized conditions is a highly evolutionarily conserved attribute of a protein coding sequence, and structural homology may be recoverable even when detectable sequence homology is lost.
Encouragingly, a phylogeny based on protein structure comparisons of the viral RdRPs of members of Riboviria [58] matched the relationships inferred by aligned sequence comparison methods [15,59] at all but the highest ranks, as well as by the known functional diversification of these enzymes (i.e., transcription and priming mechanisms) and replication complex morphology [58,60]. Along similar lines, structure-based clustering and phylogeny of capsid proteins can provide a powerful approach when the reliable inference of evolutionary relationships by sequence comparisons (“traceability”) is lost [61–63]. Thus, deeper evolutionary relationships that underpin capsid protein structure and virion architecture may be used to classify large DNA viruses into realms and kingdoms. As an example, the structure-based PRD1-adenovirus lineage, whose members encode MCPs with a vertical double jelly-roll fold [61,63] (Fig 2), can be assigned to the kingdom Bamfordvirae, which falls within the realm Varidnaviria. Conversely, established structural relationships can now be used to inform sequence alignments and allow the incorporation of the ever-expanding wealth of virus sequence data into taxonomy [59,64]. Detection of subtle sequence conservation among structurally similar major capsid proteins of large DNA viruses further validates the use of these proteins as hallmarks for Varidnaviria and Duplodnaviria [64–66].
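Clustering from pairwise structural distances can be illustrated with a short Python sketch. It uses UPGMA (average linkage), chosen here only because it is simple to show; it is not claimed to be the procedure behind any analysis cited in this article. The distance values and the labels A to D are hypothetical, and the input format (a dictionary keyed by unordered pairs) is an assumption of this sketch.

```python
def upgma(distances):
    """Average-linkage clustering; returns the tree as nested tuples.

    `distances` maps frozenset({a, b}) -> distance for every pair of items.
    """
    sizes = {}
    for pair in distances:
        for item in pair:
            sizes[item] = 1
    d = dict(distances)
    while len(sizes) > 1:
        # Pick the closest pair; ties are broken by label for determinism.
        pair = min(d, key=lambda p: (d[p], sorted(map(str, p))))
        a, b = sorted(pair, key=str)
        merged = (a, b)
        new_size = sizes[a] + sizes[b]
        for c in list(sizes):
            if c == a or c == b:
                continue
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            # Size-weighted average of the distances to the merged clusters.
            d[frozenset((merged, c))] = (sizes[a] * dac + sizes[b] * dbc) / new_size
        del d[pair]
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes))


# Hypothetical pairwise structural distances between four capsid proteins.
toy_distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}

print(upgma(toy_distances))  # ((('A', 'B'), 'C'), 'D')
```

UPGMA assumes a constant rate of change along all branches, which is a strong assumption for diverged proteins, so this sketch shows only how a dendrogram can arise from a distance matrix.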

Fig 2. Structure-based dendrogram of capsid proteins of members of the kingdom Bamfordvirae. Structure-based phylogenetic tree inferred from major capsid protein (MCP) structures of the members of the kingdom Bamfordvirae in the Varidnaviria realm. Members of Bamfordvirae encode a vertical double-jelly roll fold MCP, which is the hallmark protein of this group of viruses. Next to each MCP structure are the virus name (top), the phylum (middle), and family (bottom), with “Faustovirus” not yet officially classified and Finnlakeviridae not yet assigned to any higher taxon. The evolutionary distances across the depicted members of the originally called PRD1-adenovirus viral lineage [67] were calculated with the Homologous Structure Finder software [50] and depicted with PHYLIP (https://evolution.genetics.washington.edu/phylip.html); the evolutionary distances are shown next to each branch. The protein data bank identifiers (PDBid) for the structures are as follows: PRD1: PDBid 1HX6; PBCV-1: 1M3Y; adenovirus: 1P2Z; STIV: 2BBD; Vaccinia D13: 2YGB; Sputnik: 3J26; Faustovirus: 5J7O; FLiP: 5OAC; ASFV p72: 6KU9; PM2: 2W0C. Adapted from [62]. https://doi.org/10.1371/journal.pbio.3001922.g002

The ranges of sequence divergence (and, consequently, rank levels) over which the various analytical methods used in virus taxonomy are defined overlap substantially (Table 1). The recent delineation and assignment of a new family of bacterial viruses (Herelleviridae) [48] is an illustrative example of the value of such a combined approach. Concordance between multiple methods using different approaches increases the reliability of the taxonomic placement of novel taxa, whereas conflicts are informative regarding both the suitability of different comparison methods, and the nature of the relationships among viruses. Such conflicts can also arise from gene sharing networks and have led to several examples for which ICTV taxonomic revisions were needed [45].
Conflicts between different methods may also indicate the need to postpone taxonomic assignments until more data become available.
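One simple way to look for conflicts between two classification methods is to compare which pairs of viruses each method groups together. The following minimal Python sketch does this; the method names, virus labels, and group labels are hypothetical, and this pairwise comparison is an illustrative assumption rather than a procedure described in the article.

```python
from itertools import combinations


def co_clustered_pairs(assignment):
    """Set of unordered virus pairs that an assignment places in the same group."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(assignment), 2)
        if assignment[a] == assignment[b]
    }


def conflicting_pairs(method_a, method_b):
    """Pairs grouped together by one method but separated by the other."""
    shared = set(method_a) & set(method_b)
    pairs_a = co_clustered_pairs({v: method_a[v] for v in shared})
    pairs_b = co_clustered_pairs({v: method_b[v] for v in shared})
    return sorted(tuple(sorted(p)) for p in pairs_a ^ pairs_b)


# Hypothetical group assignments from two different approaches.
by_phylogeny = {"v1": "X", "v2": "X", "v3": "Y", "v4": "Y"}
by_network = {"v1": "1", "v2": "1", "v3": "1", "v4": "2"}

print(conflicting_pairs(by_phylogeny, by_network))
# [('v1', 'v3'), ('v2', 'v3'), ('v3', 'v4')]
```

Here every disagreement involves v3, which would single it out as the virus whose placement might be postponed or examined with further data, while the grouping of v1 with v2 is supported by both approaches.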

[END]
---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001922

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
