(C) PLOS One
This story was originally published by PLOS One and is unaltered.



Eukaryotic CD-NTase, STING, and viperin proteins evolved via domain shuffling, horizontal transfer, and ancient inheritance from prokaryotes [1]

Edward M. Culbertson (University of Pittsburgh, Department of Biological Sciences, Pittsburgh, Pennsylvania, United States of America); Tera C. Levin

Date: 2023-12

Animals use a variety of cell-autonomous innate immune proteins to detect viral infections and prevent replication. Recent studies have discovered that a subset of mammalian antiviral proteins have homology to antiphage defense proteins in bacteria, implying that there are aspects of innate immunity that are shared across the Tree of Life. While the majority of these studies have focused on characterizing the diversity and biochemical functions of the bacterial proteins, the evolutionary relationships between animal and bacterial proteins are less clear. This ambiguity is partly due to the long evolutionary distances separating animal and bacterial proteins, which obscures their relationships. Here, we tackle this problem for 3 innate immune families (CD-NTases [including cGAS], STINGs, and viperins) by deeply sampling protein diversity across eukaryotes. We find that viperins and OAS family CD-NTases are ancient immune proteins, likely inherited since the earliest eukaryotes first arose. In contrast, we find other immune proteins that were acquired via at least 4 independent events of horizontal gene transfer (HGT) from bacteria. Two of these events allowed algae to acquire new bacterial viperins, while 2 more HGT events gave rise to distinct superfamilies of eukaryotic CD-NTases: the cGLR superfamily (containing cGAS) that has diversified via a series of animal-specific duplications and a previously undefined eSMODS superfamily, which more closely resembles bacterial CD-NTases. Finally, we found that cGAS and STING proteins have substantially different histories, with STING protein domains undergoing convergent domain shuffling in bacteria and eukaryotes. Overall, our findings paint a picture of eukaryotic innate immunity as highly dynamic, where eukaryotes build upon their ancient antiviral repertoires through the reuse of protein domains and by repeatedly sampling a rich reservoir of bacterial antiphage genes.

Funding: This research was supported in part by the University of Pittsburgh Center for Research Computing, RRID:SCR_022735, through the resources provided. Specifically, this work used the HTC cluster, which is supported by NIH award number S10OD028483. EMC was supported by NSF Postdoctoral fellowship 2208971 and TCL was supported by NIH R00AI139344 and R35GM150681. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

We found that eukaryotic CD-NTases arose following multiple HGT events between bacteria and eukaryotes. cGAS proteins fall within a unique, mainly metazoan clade. In contrast, OAS-like proteins were independently acquired and are the predominant type of CD-NTase found across most eukaryotes. Separately, we have discovered diverged eukaryotic STING proteins that bridge the evolutionary gap between metazoan and bacterial STINGs, as well as 2 separate instances where bacteria and eukaryotes have acquired similar proteins via convergent domain shuffling. Finally, we find that viperin was likely present in the LECA and possibly earlier, with both broad representation across the eukaryotic tree of life and evidence of 2 additional HGT events where eukaryotes recently acquired new bacterial viperins. Overall, our results demonstrate that immune proteins shared between bacteria and eukaryotes are evolutionarily dynamic, with eukaryotes taking multiple routes to acquire and deploy these ancient immune modules.

We investigated the ancestry of 3 gene families that are shared between animal and bacterial immunity: Stimulator of Interferon Genes (STING), cyclic GMP-AMP synthase (cGAS) and its broader family of CD-NTases, and viperin. STING, CD-NTases, and viperin are all interferon-stimulated genes that function as antiviral immune modules, disrupting the viral life cycle by activating downstream immune genes, sensing viral infection, or disrupting viral processes, respectively [20]. We chose to focus on cGAS, STING, and viperin for a number of reasons. First, in metazoans cGAS and STING are part of the same signaling pathway, whereas bacterial CD-NTases often act independently of bacterial STINGs [21], raising interesting questions about how eukaryotic immune proteins have gained their signaling partners. Also, given the vast breadth of bacterial CD-NTase diversity, we were curious whether any eukaryotes had acquired CD-NTases distinct from cGAS. For similar reasons, we investigated viperin, which also has a wide diversity in bacteria but a much narrower described function in eukaryotes.

One common hypothesis in the field is that these immune proteins are ancient and have been inherited since the last common ancestor of bacteria and eukaryotes [5]. In other cases, horizontal gene transfer (HGT) between bacteria and eukaryotes has been invoked to explain the similarities [6,19]. However, because most papers in this field have focused on searching genomic databases for new bacterial immune genes and biochemically characterizing them, the evolution of these proteins in eukaryotes has not been as thoroughly investigated.

As the first line of defense against pathogens, all forms of life rely on cell-autonomous innate immunity to recognize threats and respond with countermeasures. Until recently, many components of innate immunity were thought to be lineage-specific [1]. However, new studies have revealed that an ever-growing number of proteins used in mammalian antiviral immunity are homologous to bacterial immune proteins used to fight off bacteriophage infections. This list includes Argonaute, CARD domains, cGAS and other CD-NTases, Death-like domains, Gasdermin, NACHT domains, STING, SamHD1, TRADD-N domains, TIR domains, and viperin, among others [2–13]. Perhaps one of the most exciting discoveries from these bacterial defense systems is the highly varied biochemical functions carried out by these bacterial proteins. For example, bacterial cGAS-DncV-like nucleotidyltransferases (CD-NTases), which generate cyclic nucleotide messengers (similar to cGAS), are massively diverse, with over 6,000 CD-NTase proteins discovered to date. Beyond the cyclic GMP-AMP signals produced by animal cGAS proteins, bacterial CD-NTases are capable of producing a wide array of nucleotide signals including cyclic dinucleotides, cyclic trinucleotides, and linear oligonucleotides [11,14]. Many of these bacterial CD-NTase products are critical for bacterial defense against viral infection [8]. Interestingly, these discoveries with the CD-NTases mirror what has been discovered with bacterial viperins. In mammals, viperin proteins restrict viral replication by generating 3′-deoxy-3′,4′-didehydro (ddh) nucleotides [4,15–17], which block RNA synthesis and thereby inhibit viral replication [15,18]. Mammalian viperin generates ddhCTP molecules, while bacterial viperins can generate ddhCTP, ddhUTP, and ddhGTP. In some cases, a single bacterial protein is capable of synthesizing 2 or 3 of these ddh derivatives [4].

These discoveries have been surprising and exciting, as they imply that some cellular defenses have deep commonalities spanning across the entire Tree of Life, with additional new mechanisms of immunity waiting to be discovered within diverse microbial lineages. But despite significant homology, these bacterial and animal immune proteins are often distinct in their molecular functions and operate within dramatically different signaling pathways (reviewed here [5]). How, then, have animals and other eukaryotes acquired these immune proteins?

Results

Discovering immune homologs across the eukaryotic tree of life

The first step to understanding the evolution of CD-NTases, STINGs, and viperins was to acquire sequences for these proteins from across the eukaryotic tree. To search for diverse immune homologs, we employed a hidden Markov model (HMM) strategy, which has high sensitivity, a low number of false positives, and the ability to separately analyze multiple (potentially independently evolving) domains in the same protein [22–24]. We used this HMM strategy to search the EukProt database, which has been developed to reflect the true scope of eukaryotic diversity through the genomes and transcriptomes of nearly 1,000 species, specifically selected to span the eukaryotic tree [25]. EukProt contains sequences from NCBI and Ensembl, plus many diverged eukaryotic species not found in any other database, making it a unique resource for eukaryotic diversity [25]. While it can be challenging to acquire diverse eukaryotic sequences from traditional databases due to an overrepresentation of metazoan data [26], EukProt ameliorates this bias by downsampling traditionally overrepresented taxa. To broaden our searches from initial animal homologs to eukaryotic sequences more generally, we used iterative HMM searches of the EukProt database, incorporating the hits from each search into the subsequent HMM. After using this approach to create pan-eukaryotic HMMs for each protein family, we then added in bacterial homologs to generate universal HMMs (Figs 1A and S1), continuing our iterative searches until we either failed to find any new protein sequences or began finding proteins outside of the family of interest (S1 Fig).
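
The iterative search loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `search` callable is a hypothetical stand-in for "build a profile from the current members, then search the database with it" (a step that an HMM toolkit would perform in practice), and the toy detection table and the `max_rounds` cap are invented for the example. The second stopping rule mentioned in the text (hits that fall outside the family of interest) is not modeled here.

```python
from typing import Callable, Dict, Iterable, Set, Tuple


def iterative_search(
    seeds: Iterable[str],
    search: Callable[[Set[str]], Set[str]],
    max_rounds: int = 5,
) -> Tuple[Set[str], int]:
    """Grow a protein family by repeated profile searches.

    `search` is a hypothetical stand-in for building a profile from the
    current members and searching a database with it. The loop stops when
    a round recovers no new sequences or `max_rounds` is reached.
    Returns the final member set and the number of rounds that were run.
    """
    members = set(seeds)
    rounds = 0
    for _ in range(max_rounds):
        rounds += 1
        new_hits = search(members) - members
        if not new_hits:
            break  # converged: this round recovered nothing new
        members |= new_hits  # fold the hits into the next round's profile
    return members, rounds


# Toy database (assumption): each sequence maps to the sequences that a
# profile containing it would be able to detect.
TOY_NEIGHBORS: Dict[str, Set[str]] = {
    "animal_1": {"animal_2", "fungus_1"},
    "animal_2": {"animal_1"},
    "fungus_1": {"alga_1"},
    "alga_1": {"bacterium_like_1"},
    "bacterium_like_1": set(),
    "unrelated_1": set(),
}


def toy_search(members: Set[str]) -> Set[str]:
    """Return everything detectable from any current member."""
    hits: Set[str] = set()
    for name in members:
        hits |= TOY_NEIGHBORS.get(name, set())
    return hits


family, rounds_run = iterative_search({"animal_1"}, toy_search)
print(sorted(family), rounds_run)
```

In this toy run, each round reaches one step further from the animal seed, and the loop ends on the first round that adds nothing. Sequences that no profile can reach, such as `unrelated_1`, are never pulled into the family.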
To define the boundaries that separated our proteins of interest from neighboring gene families, we focused on including homologs that shared the protein domains that define each family (see Materials and methods for domain designations) and were closer to in-group sequences than to the outgroup sequences on a phylogenetic tree (outgroup sequences are noted in the Materials and methods). Our searches for CD-NTases, STINGs, and viperins recovered hundreds of eukaryotic proteins from each family, including a particularly large number of metazoan sequences (red bars, Fig 1B). It is not surprising that we found so many metazoan homologs, as each of these proteins was discovered and characterized in metazoans and these animal genomes tend to be of higher quality than those of other taxa (S2 Fig). We also recovered homologs from other species spread across the eukaryotic tree, demonstrating that our approach could successfully identify deeply diverged homologs (Fig 1B). However, outside of Metazoa, these homologs were sparsely distributed, such that for most species in our dataset (711/993), we did not recover proteins from any of the 3 immune families examined (white space, lack of colored bars, Fig 1B). While some of these absences may be due to technical errors or dataset incompleteness (S2 Fig), we interpret this pattern as a reflection of ongoing, repeated gene losses across eukaryotes, as has been found for other innate immune proteins [27–29] and other types of gene families surveyed across eukaryotes [28,30–32]. Indeed, many of the species that lacked any of the immune homologs were represented by high-quality datasets (e.g., Metazoa, Chloroplastida, and Fungi). Thus, although it is always possible that our approach has missed some homologs, we believe the resulting data represents a fair assessment of the diversity across eukaryotes, at least for those species currently included within EukProt.
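
The two inclusion criteria used to delimit each family (shared family-defining domains, and closer placement to in-group than to outgroup sequences) can be written as a simple filter. The sketch below is illustrative only: the `Hit` record, the domain names, and the distance values are hypothetical, and comparing one nearest in-group distance against one nearest outgroup distance is a simplification of reading a sequence's placement off a phylogenetic tree.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Hit:
    name: str
    domains: FrozenSet[str]  # domains annotated on the candidate protein
    dist_ingroup: float      # tree distance to the nearest in-group sequence
    dist_outgroup: float     # tree distance to the nearest outgroup sequence


def keep_family_members(hits: List[Hit], required: FrozenSet[str]) -> List[Hit]:
    """Keep hits that carry every required domain and sit closer to the
    in-group than to the outgroup."""
    return [
        h for h in hits
        if required <= h.domains and h.dist_ingroup < h.dist_outgroup
    ]


# Hypothetical example values, not data from the study.
REQUIRED = frozenset({"family_domain"})
hits = [
    Hit("hit_A", frozenset({"family_domain", "extra_domain"}), 0.8, 2.1),
    Hit("hit_B", frozenset({"family_domain"}), 2.5, 1.0),  # nearer the outgroup
    Hit("hit_C", frozenset({"other_domain"}), 0.5, 3.0),   # lacks the domain
]

kept = keep_family_members(hits, REQUIRED)
print([h.name for h in kept])
```

Requiring both conditions together is the conservative choice: a hit with the right domain but an outgroup-like placement, or a well-placed hit lacking the defining domain, is excluded.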

Fig 1. HMM searches to find homologs across the eukaryotic Tree of Life. (A) A schematic of the HMM search process. Starting from initial, animal-dominated HMM profiles for each protein family, we used iterative HMM searches of the EukProt database to generate pan-eukaryotic HMMs. These were combined with bacterial sequences to enable discovery of bacteria-like homologs in eukaryotes. Each set of searches was repeated until few or no additional eukaryotic sequences were recovered, which was between 3 and 5 times in all cases. (B) Phylogenetic tree of eukaryotes, with major supergroups color coded. The height of the colored rectangles for each group is proportional to its species representation in EukProt. Horizontal, colored bars mark each eukaryotic species in which we found homologs of STINGs, CD-NTases, or viperins. White space indicates species where we searched but did not recover any homologs. The CD-NTase hits are divided into the 3 eukaryotic superfamilies, defined in Fig 2. Individual data are available in S1 File. CD-NTase, cGAS-DncV-like nucleotidyltransferase; HMM, hidden Markov model; STING, Stimulator of Interferon Genes. https://doi.org/10.1371/journal.pbio.3002436.g001

---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002436

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
