(C) PLOS One
This story was originally published by PLOS One and is unaltered.
. . . . . . . . . .
Genome size distributions in bacteria and archaea are strongly linked to evolutionary history at broad phylogenetic scales [1]
Authors: Carolina A. Martinez-Gutierrez (Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, United States of America); Frank O. Aylward (Center for Emerging, Zoonotic, Arthropod-Borne Pathogens)
Date: 2022-07
The evolutionary forces that determine genome size in bacteria and archaea have been the subject of intense debate over the last few decades. Although the preferential loss of genes observed in prokaryotes is explained by a deletional bias, the factors that promote or prevent the fixation of such gene losses often remain unclear. Importantly, statistical analyses on this topic typically do not consider the potential bias introduced by the shared ancestry of many lineages. This is critical when species are used as data points, because shared ancestry can make residuals non-independent. In this study, we investigated genome size distributions across a broad diversity of bacteria and archaea to evaluate whether this trait is phylogenetically conserved at broad phylogenetic scales. After model fitting, Pagel's lambda indicated a strong phylogenetic signal in genome size data, suggesting that the diversification of this trait is influenced by shared evolutionary history. We used a phylogenetic generalized least-squares (PGLS) analysis to test whether phylogeny influences the predictability of genome size from two variables previously linked to it: dN/dS ratios and 16S rRNA gene copy number. The PGLS results confirm that failing to account for evolutionary history can lead to biased interpretations of genome size predictors. Bacteria and archaea can rapidly gain genetic material through gene transfer and lose it through deletion. Even so, our results indicate that a phylogenetic signal in genome size distributions can still be recovered at broad phylogenetic scales, and this signal should be taken into account when inferring the drivers of genome size evolution.
The evolutionary forces driving genome size in bacteria and archaea have been the subject of debate over recent decades. Separate comparative analyses have typically proposed single variables as the main drivers of this trait, such as the strength of selection, environmental complexity, or mutation rate. These analyses have generally not considered potential biases derived from shared ancestry. Here, we applied a phylogeny-based statistical approach to assess how tightly genome size in bacteria and archaea is linked to evolutionary history. Within a phylogenetic comparative framework, we also evaluated how well the strength of purifying selection and ecological strategy predict genome size across a broad diversity of bacterial and archaeal genomes. Our approach indicates that, despite the ability of bacteria and archaea to rapidly exchange genes, a strong phylogenetic signal in genome size distributions can be recovered at broad phylogenetic scales.
Funding: This investigation was supported by grants from the Institute for Critical Technology and Applied Science, the National Science Foundation (grant IIBR-1918271), and a Simons Early Career Award in Marine Microbial Ecology and Evolution to F.O.A. The funders had no role in study design, data collection, analysis, decision to publish, or preparation of the manuscript.
Bacterial and archaeal genomes are densely packed with genes and contain relatively little noncoding DNA, so an increase in genome size translates directly into more genes [1–3]. In contrast, multicellular eukaryotes generally show genome expansion through the proliferation of noncoding DNA, a consequence of high genetic drift [2]. The depletion of nonfunctional elements in prokaryotes is explained by a bias towards deletions over insertions: newly acquired or existing genes are removed if selection on them is insufficient to maintain them in the population [4–6]. Prokaryotic genome sizes are narrowly constrained compared with those of eukaryotes, yet they still vary by more than an order of magnitude. Even if a deletion bias is intrinsic to all prokaryotes, it remains unclear which evolutionary forces determine which genes are maintained and which are lost. It is also unclear what determines the variability of genome size across the broad diversity of bacteria and archaea.
Multiple individual factors have been hypothesized to be primary drivers of genome size in bacteria and archaea. Early studies suggested that effective population size (Ne) may be the primary force determining genome size and fluidity in prokaryotes [7,8]. For example, genome reduction has been observed in host-dependent bacteria that have small Ne and correspondingly high levels of genetic drift due to population contractions. Under such evolutionary constraints, slightly deleterious deletions accumulate and cause overall genome reduction [9–13]. Paradoxically, later studies focusing on abundant free-living planktonic lineages in the ocean suggested that genome reduction can also occur in bacteria with large Ne that experience strong purifying selection [14–17]. In this case, selection favors genome economization, such as the removal of paralogs and intergenic sequences. Factors other than Ne and the strength of purifying selection have also been postulated to shape prokaryotic genome size. One recent study suggested that environmental stress leads to genome streamlining in soil bacteria [18]. Other genomic studies have suggested that habitat complexity and ecological strategy [19], as well as the capacity to use oxygen [20], may also play major roles in determining genome size in bacteria and archaea [19]. Mutation rate has also been proposed as a major determinant of genome size [21,22]. In particular, a high mutation rate has been suggested as the primary cause of genome reduction in both streamlined and host-dependent bacteria, through gene erosion, loss of function, and subsequent deletion [21–23]. However, other studies of the mutation rate of the abundant picocyanobacterium Prochlorococcus report estimates similar to those of Escherichia coli. This casts doubt on the view that high mutation rates drive genome reduction in all cases [24,25].
Given the large number of forces proposed as primary determinants of genome size, it remains largely unknown how genome size in prokaryotes is driven. It could be driven by single variables, by their interaction, or by variables whose influence depends on the lineage.
Importantly, most statistical analyses exploring the association between genome size and other traits have not used the phylogenetic comparative methods that are necessary when species are used as data points. Shared evolutionary history can obscure the relationship between traits, because phylogenetic dependence among lineages violates the statistical assumption that residuals are independent. As a result, conventional statistical methods can overestimate the strength of the association between traits [26,27]. In this study, we estimated the phylogenetic signal of genome size across a broad diversity of bacterial and archaeal genomes available in the Genome Taxonomy Database (GTDB) [28,29]. Genome size has been shown to change rapidly in prokaryotes through horizontal gene transfer (HGT) and gene loss. We nevertheless sought to test whether this trait still bears a phylogenetic signal at broad phylogenetic scales. Moreover, previous studies have suggested that effective population size or ecological niche are potential drivers of genome size [3,8]. We therefore evaluated whether correlations with these factors change when evolutionary history is taken into account. Our work provides insights into the complex mechanisms that shape genome size in bacteria and archaea. It also highlights the importance of considering shared evolutionary relationships when studying genome size evolution, to avoid biased estimates of the association between traits.
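Generalized least squares (GLS) makes concrete why non-independent residuals matter. Under a Brownian-motion model of trait evolution, the expected covariance between two species is proportional to the branch length they share from the root. PGLS weights the regression by the inverse of that covariance matrix C, giving beta = (X' C^-1 X)^-1 X' C^-1 y. The pure-Python sketch below is illustrative only. The four-taxon tree, its branch lengths, the trait values, and the helper names `solve` and `gls` are hypothetical and are not taken from this study's data or pipeline.

```python
# Hypothetical four-taxon tree ((A:1,B:1):1,(C:1,D:1):1). Under Brownian motion,
# entry (i, j) is the branch length shared by tips i and j on their paths from the root.
BM_COV = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
]
# Identity covariance: every species treated as an independent data point.
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [a[i][:] + [b[i]] for i in range(n)]  # augmented copy; inputs stay untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gls(x_cols, y, cov):
    """Return GLS coefficients beta = (X' C^-1 X)^-1 X' C^-1 y.

    x_cols is a list of predictor columns; include a column of ones for the intercept.
    """
    cinv_cols = [solve(cov, col) for col in x_cols]  # C^-1 X, one column at a time
    cinv_y = solve(cov, y)                           # C^-1 y
    k = len(x_cols)
    xtcx = [[sum(p * q for p, q in zip(x_cols[i], cinv_cols[j])) for j in range(k)]
            for i in range(k)]
    xtcy = [sum(p * q for p, q in zip(x_cols[i], cinv_y)) for i in range(k)]
    return solve(xtcx, xtcy)


predictor = [1.0, 2.0, 3.0, 4.0]   # hypothetical predictor values
genome_mb = [2.0, 2.5, 5.0, 5.5]   # hypothetical genome sizes (Mbp)
x_cols = [[1.0] * 4, predictor]    # intercept column + predictor column

ols_beta = gls(x_cols, genome_mb, IDENTITY)  # independent residuals: ordinary least squares
pgls_beta = gls(x_cols, genome_mb, BM_COV)   # phylogenetically correlated residuals
print([round(b, 3) for b in ols_beta])   # [0.5, 1.3]
print([round(b, 3) for b in pgls_beta])  # [1.071, 1.071]
```

With the identity matrix, `gls` reduces to ordinary least squares. With the Brownian-motion matrix, each pair of sister taxa is down-weighted as partly redundant evidence, and in this toy example the estimated slope drops from 1.3 to about 1.07.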
Results and discussion
Ecological strategy plays a role in genome size evolution in bacteria and archaea
In addition to testing the effect of the strength of selection on genome size, we assessed how well 16S rRNA gene copy number predicts genome size, using copy number as an approximation of ecological strategy and fitting both GLS and PGLS. Previous studies have shown that the number of rrn operon copies can predict the number of ribosomes that a cell can produce simultaneously, and that this reflects the ecological strategy of microorganisms [54,55]. A large number of rrn copies is associated with the ability to adapt quickly to fluctuating environmental conditions (i.e., "boom and bust" strategies) [56]. In contrast, multiple rrn copies would impose a metabolic burden on slow-growing microorganisms living in stable or nutrient-poor environments, because of ribosome overproduction [54]. As with dN/dS, we found a weak, positive, and significant relationship between genome size and 16S rRNA copies when using GLS (P<0.001, Pseudo-R2 = 0.01, Table 2, Fig 3B). Interestingly, the relationship remained significant when we accounted for phylogenetic signal in the residuals through a PGLS analysis (P = 0.003, Pseudo-R2 = 0.01, Table 2, Fig 3B). However, the Pagel's lambda of this model was not significantly different from 1 (Table 2). This indicates that the residuals of this model have a distribution closer to the expectation under Brownian motion (BM). After fitting under BM, we still observed a positive and significant relationship between genome size and 16S rRNA copies (P<0.001, Pseudo-R2 = 0.02). 16S rRNA copy number is a weak predictor under both the BM and Pagel's lambda models. Even so, our findings suggest that environmental complexity plays a role in genome size independently of phylogenetic relationships. This is consistent with the observation that larger genomes tend to be found in environments with temporal variability and a diversity of resources [57,58].
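Pagel's lambda rescales the off-diagonal elements of the BM covariance matrix. A value of lambda = 0 removes all phylogenetic covariance (a star phylogeny), and lambda = 1 recovers the BM expectation. The sketch below is a hypothetical pure-Python illustration, not the authors' pipeline. It computes the maximized GLS log-likelihood for a given covariance matrix and estimates lambda by a simple grid search over [0, 1]. The tree, the 16S rRNA copy numbers, the genome sizes, and the helper names are all invented for illustration. Software packages typically maximize the likelihood numerically; the grid search is used here only to keep the example short.

```python
import math

# Hypothetical four-taxon tree ((A:1,B:1):1,(C:1,D:1):1) under Brownian motion.
BM_COV = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
]


def pagel_lambda(cov, lam):
    """Multiply off-diagonal covariances by lam; keep the diagonal (tip variances) unchanged."""
    n = len(cov)
    return [[cov[i][j] if i == j else lam * cov[i][j] for j in range(n)] for i in range(n)]


def solve_multi(a, rhs_cols):
    """Solve a @ x = b for every b in rhs_cols; also return log|det(a)|.

    Gaussian elimination with partial pivoting on an augmented copy of a.
    """
    n, k = len(a), len(rhs_cols)
    m = [a[i][:] + [col[i] for col in rhs_cols] for i in range(n)]
    logdet = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        logdet += math.log(abs(m[c][c]))  # |det| is the product of the pivot magnitudes
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + k):
                m[r][j] -= f * m[c][j]
    sols = []
    for j in range(n, n + k):  # back substitution for each right-hand side
        x = [0.0] * n
        for r in range(n - 1, -1, -1):
            s = sum(m[r][q] * x[q] for q in range(r + 1, n))
            x[r] = (m[r][j] - s) / m[r][r]
        sols.append(x)
    return sols, logdet


def gls_loglik(y, x_cols, cov):
    """Maximized Gaussian log-likelihood of a GLS fit with residual covariance sigma^2 * cov."""
    n, k = len(y), len(x_cols)
    sols, logdet = solve_multi(cov, x_cols + [y])
    cinv_x, cinv_y = sols[:k], sols[k]
    xtcx = [[sum(p * q for p, q in zip(x_cols[i], cinv_x[j])) for j in range(k)]
            for i in range(k)]
    xtcy = [sum(p * q for p, q in zip(x_cols[i], cinv_y)) for i in range(k)]
    (beta,), _ = solve_multi(xtcx, [xtcy])
    resid = [y[i] - sum(beta[j] * x_cols[j][i] for j in range(k)) for i in range(n)]
    (cinv_r,), _ = solve_multi(cov, [resid])
    sigma2 = sum(r * q for r, q in zip(resid, cinv_r)) / n  # ML estimate of sigma^2
    return -0.5 * (n * math.log(2 * math.pi * sigma2) + logdet + n)


def estimate_lambda(y, x_cols, cov, steps=101):
    """Grid-search maximum-likelihood estimate of Pagel's lambda on [0, 1]."""
    best = None
    for i in range(steps):
        lam = i / (steps - 1)
        ll = gls_loglik(y, x_cols, pagel_lambda(cov, lam))
        if best is None or ll > best[1]:
            best = (lam, ll)
    return best


copies_16s = [1.0, 2.0, 3.0, 4.0]  # hypothetical 16S rRNA gene copy numbers
genome_mb = [2.0, 2.5, 5.0, 5.5]   # hypothetical genome sizes (Mbp)
x_cols = [[1.0] * 4, copies_16s]   # intercept + predictor

lam_hat, ll_hat = estimate_lambda(genome_mb, x_cols, BM_COV)
ll_bm = gls_loglik(genome_mb, x_cols, pagel_lambda(BM_COV, 1.0))
print(lam_hat, ll_hat, ll_bm)
```

Twice the gap between `ll_hat` and `ll_bm` could serve as a likelihood-ratio statistic for asking whether the estimated lambda differs from the BM value of 1.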
In addition to fitting models with dN/dS and 16S rRNA copies as individual predictors, we fitted an additive model including both variables (Table 2). An ANOVA showed that including both variables did not significantly improve the fit compared with the model with 16S rRNA copies as the sole predictor (P = 0.48).
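The ANOVA described here compares nested models. One common way to compare nested models fitted by maximum likelihood is a likelihood-ratio test. In this test, twice the gain in log-likelihood from adding a predictor is compared against a chi-squared distribution with as many degrees of freedom as added parameters. This passage does not state which test statistic was used, so the sketch below is an assumption, not the study's method. It covers only the one-parameter case (for example, adding dN/dS to a 16S-only model), for which the chi-squared survival function equals erfc(sqrt(x / 2)). The function name and the log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue_1df(loglik_reduced, loglik_full):
    """Likelihood-ratio test p-value for nested models that differ by one parameter.

    The statistic 2 * (logL_full - logL_reduced) is compared with a chi-squared
    distribution with 1 degree of freedom, whose survival function is erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    # Numerical noise can make the statistic slightly negative; clamp it at zero.
    stat = max(stat, 0.0)
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical maximized log-likelihoods: 16S-only model vs. 16S + dN/dS model.
p = lrt_pvalue_1df(-120.0, -118.0)
print(round(p, 4))  # statistic = 4.0, so p is about 0.0455
```

For more than one added parameter, the chi-squared survival function with the matching degrees of freedom would be needed instead of the closed-form erfc expression.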
[END]
---
[1] Url:
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010220
Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/