(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org.
Licensed under Creative Commons Attribution (CC BY) license.
url:https://journals.plos.org/plosone/s/licenses-and-copyright

------------



Unusual mammalian usage of TGA stop codons reveals that sequence conservation need not imply purifying selection

['Alexander Thomas Ho', 'Milner Centre For Evolution', 'University Of Bath', 'Bath', 'United Kingdom', 'Laurence Daniel Hurst']

Date: 2022-05

The assumption that conservation of sequence implies the action of purifying selection is central to diverse methodologies to infer functional importance. GC-biased gene conversion (gBGC), a meiotic mismatch repair bias strongly favouring GC over AT, can in principle mimic the action of selection, this being thought to be especially important in mammals. As mutation is GC→AT biased, demonstrating that gBGC does indeed cause false signals requires evidence that an AT-rich residue is selectively optimal compared to its more GC-rich allele, while showing also that the GC-rich alternative is conserved. We propose that mammalian stop codon evolution provides a robust test case. Although in most taxa TAA is the optimal stop codon, TGA is both abundant and conserved in mammalian genomes. We show that this mammalian exceptionalism is well explained by gBGC mimicking purifying selection and that TAA is the selectively optimal codon. Supportive of gBGC, we observe (i) that TGA usage trends are consistent at the focal stop codon and elsewhere (in UTR sequences); (ii) that higher TGA usage and higher TAA→TGA substitution rates are predicted by a high recombination rate; and (iii) that across species the difference in TAA↔TGA substitution rates between GC-rich and GC-poor genes is largest in genomes that possess higher between-gene GC variation. TAA optimality is supported both by enrichment in highly expressed genes and trends associated with effective population size. High TGA usage and high TAA→TGA rates in mammals are thus consistent with gBGC’s predicted ability to “drive” deleterious mutations and support the hypothesis that sequence conservation need not be indicative of purifying selection. A general trend for GC-rich trinucleotides to reside at frequencies far above their mutational equilibrium in high recombining domains supports the generality of these results.

Data Availability: The raw data used are all publicly available from the repositories outlined in the methods section. All data manipulation was performed using bespoke Python 3.6 scripts, while statistical analyses and data visualisations were performed using R 3.3.3. These scripts are hosted at https://github.com/ath32/gBGC where they are organised into folders corresponding to each part of the methods section. The underlying data for each figure may be found in the corresponding source data as outlined in the figure legends.

Introduction

If at a given site in DNA a mutation appears in a population and is eliminated by selection owing to its deleterious effects, the site in question will tend to be more conserved between species than comparable neutrally evolving sequence. This simple logic underpins the notion that the functionality of sequence can be inferred from its degree of conservation (for discussion, see Ponting [1]). It is explicit in, for example, molecular evolutionary tests for purifying selection (e.g., Ka/Ks test [2–5]), attempts to identify sites prone to disease-causing mutations [6,7], and estimates of the proportion of DNA within a genome that is “functional” [8].

These methods assume, however, that no force other than selection can deterministically act to alter the frequency of extant alleles. Over the past 2 decades, GC-biased gene conversion (gBGC) has been established as a potentially important influence on allele frequencies [9], mimicking selection [10–12]. The process of gBGC results from a repair bias favouring G/C alleles over A/T alleles during GC:AT mismatch repair in a (commonly assumed to be meiotic) heteroduplex [13,14]. In humans, at non-crossover gene conversion events, 67.6% of GC:AT mismatches favour the GC allele [15]. It is probably as a consequence of this bias, coupled with the regionalisation of recombination domains over extended time periods, that mammals, alongside birds and possibly other amniotes [16], have genomes with large (>300 kb) blocks of relatively homogeneous higher or lower GC content (isochores) [10,11,17]. Importantly, assuming consistency of local recombination rates over evolutionary time and a correlation between crossover rates and non-crossover rates [18], gBGC can also explain the relatively strong correlation between GC content of these blocks and local recombination rates in mammals [19–22] (but see also [23,24]). Consistent with such models, SNP analysis reveals the predicted fixation bias for AT→GC mutations in GC-rich domains, even after allowing for nonequilibrium GC content [25,26].

While the human conversion bias is strong, defining the expected impact of gBGC on the human genome is not trivial. For example, in any given generation, the net effect of bias is a function of the length of the relevant conversion tracts, the commonality of AT:GC mismatches within the tracts, and the rate of initiation of such tracts. Williams and colleagues [18] estimate a mean rate in human non-crossover events (where there is the strong GC:AT bias) of 5.9 × 10^−6 per bp per generation. More generally, Glemin and colleagues [27] estimate that the net effect on substitutions is on average in the nearly neutral area. However, as recombination occurs primarily within recombination hotspots, approximately 2% of the human genome is subject to strong gBGC in any generation [27]. Over the longer term, as the location of recombination hotspots evolves rapidly, they predict that a large fraction of the genome is affected by short episodes of strong gBGC [27]. Galtier [28] estimates that approximately 60% of all synonymous AT→GC substitutions are influenced by gBGC.
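As a rough illustration of why the genome-wide average effect can be weak even when the per-event bias is strong, the two human figures quoted above can be combined in a back-of-envelope calculation. The sketch below is not the method of the cited studies. It assumes that the per-site distortion is simply the per-bp conversion rate multiplied by the excess transmission of GC, and it uses a hypothetical effective population size of 10,000 to scale the result; conventions for the scaling factor differ between authors.

```python
# Back-of-envelope sketch (assumptions labelled; not the cited authors' model).

GC_TRANSMISSION = 0.676          # fraction of GC:AT mismatches resolved to GC (from the text)
CONVERSION_RATE_PER_BP = 5.9e-6  # non-crossover rate per bp per generation (from the text)
ASSUMED_NE = 10_000              # hypothetical effective population size (assumption)


def per_site_gbgc_coefficient(gc_transmission, conversion_rate_per_bp):
    """Average per-generation distortion at a heterozygous GC/AT site.

    Assumption: a site is converted with probability `conversion_rate_per_bp`
    and, when converted, GC is transmitted with probability `gc_transmission`
    instead of 0.5. The distortion is then rate * (2p - 1).
    """
    excess = 2.0 * gc_transmission - 1.0
    return conversion_rate_per_bp * excess


def population_scaled_coefficient(b, effective_pop_size):
    """Scale the distortion by 4*Ne (one common convention; others use 2*Ne)."""
    return 4.0 * effective_pop_size * b


b = per_site_gbgc_coefficient(GC_TRANSMISSION, CONVERSION_RATE_PER_BP)
B = population_scaled_coefficient(b, ASSUMED_NE)
print("per-site coefficient b:", b)
print("population-scaled B (assumed Ne):", B)
```

Under these assumptions the scaled coefficient is well below 1, which is in line with the statement that the average effect sits in the nearly neutral range, while a hotspot with a much higher local conversion rate would give a correspondingly larger value.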

Strong gene conversion is, however, not phylogenetically universal. In the best-resolved instance, yeast, where meiotic tetrads can be directly studied, the bias is extremely weak at best. The highest estimate suggests that the GC allele is the donor allele in 50.62% of cases [11,29]. Further analyses report a lesser bias [30], with a further large study reporting weak bias in the opposite direction [31]. Meta-analysis of over 100,000 GC:AT mismatch resolutions in Saccharomyces cerevisiae determined a net segregation of 50.03%, only just in favour of the GC alleles and not significantly different from 50:50 segregation [31]. To date, strong conversion has been observed in only a few taxa [31], with mammals [11] and birds [32,33] being the 2 well-described exceptions, though weaker and nonregionalised gBGC is suspected in many taxa [21].
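To see why a 50.03% segregation ratio is statistically indistinguishable from 50:50 even with a very large sample, a simple two-sided test of a proportion can be sketched. This is only an illustration using a normal approximation. The sample size of exactly 100,000 is an assumption standing in for "over 100,000", and the test is not necessarily the one used in the cited meta-analysis.

```python
import math


def proportion_z_test(observed_fraction, n, null_fraction=0.5):
    """Two-sided z-test of an observed proportion against a null proportion.

    Uses the normal approximation to the binomial, which is reasonable
    for large n. Returns (z, two_sided_p_value).
    """
    standard_error = math.sqrt(null_fraction * (1.0 - null_fraction) / n)
    z = (observed_fraction - null_fraction) / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


ASSUMED_N = 100_000  # assumption: the text says "over 100,000" resolutions
z, p = proportion_z_test(0.5003, ASSUMED_N)
print("z =", round(z, 3), "p =", round(p, 3))
```

With these inputs the z statistic is far below conventional significance thresholds, so a deviation of 0.03 percentage points would need a vastly larger sample to be detectable.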

In terms of the population genetical influence, the action of gBGC is directly comparable to meiotic drive (alias segregation distortion) [34]. In this sense, gBGC may be said to “drive” alleles. In turn, such drive can mimic positive selection [35]. Importantly, it has previously been noted that gBGC can (and in birds and mammals regularly does) create false signals of positive selection by promoting the spread from rare to common of AT→GC mutations [12,36–40]. However, as is implicit in all such models [41], gBGC could also mimic the action of purifying selection. If a GC allele at fixation mutated to a selectively advantageous AT allele, gBGC would tend to eliminate the new AT allele, causing conservation of the deleterious GC allele.
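The argument that drive can override selection can be made concrete with a standard fixation-probability calculation. The sketch below is illustrative only. It assumes that gBGC acts like genic selection, so that a new AT allele with selective advantage s and conversion disadvantage b behaves as if its net coefficient were s − b, and it uses Kimura's diffusion approximation with census size equal to effective size. The values of s, b, and the population size are hypothetical and are not estimates from this study.

```python
import math


def fixation_probability(net_coefficient, pop_size):
    """Kimura's diffusion approximation for a single new mutation.

    Assumes a diploid population of `pop_size` individuals (census size
    equal to effective size) and a genic (additive) net coefficient.
    """
    if abs(net_coefficient) < 1e-12:
        return 1.0 / (2.0 * pop_size)  # neutral limit
    # expm1 keeps precision when the exponent is small
    return math.expm1(-2.0 * net_coefficient) / math.expm1(-4.0 * pop_size * net_coefficient)


# Hypothetical parameter values, chosen only for illustration
POP_SIZE = 10_000
S_ADVANTAGE_AT = 5e-5   # selective advantage of the AT allele (assumption)
B_GBGC = 1e-4           # gBGC coefficient favouring the GC allele (assumption)

neutral = fixation_probability(0.0, POP_SIZE)
selection_only = fixation_probability(S_ADVANTAGE_AT, POP_SIZE)
selection_plus_gbgc = fixation_probability(S_ADVANTAGE_AT - B_GBGC, POP_SIZE)

print("relative to neutral, selection only:", selection_only / neutral)
print("relative to neutral, selection plus gBGC:", selection_plus_gbgc / neutral)
```

With these illustrative values the advantageous AT mutation fixes more often than a neutral one when only selection acts, but less often than a neutral one once the opposing conversion bias is included. A site evolving this way would look conserved, as if the GC state were protected by purifying selection.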

Mimicry of positive selection owing to gBGC in mammals is thought to be common and, to date, analyses have focused on the substitutional process, rather than the conservation process [12,36–40]. We are aware of no clear example of gBGC causing false signals of purifying selection. A core difficulty is finding a circumstance where gBGC makes predictions different from those of mutation bias and selectionist models. Differentiating between the effects of gBGC and mutation bias tends to be relatively straightforward, as mutation is near-universally GC→AT biased [42–46], while gBGC is biased in the opposite direction. More problematic is the possibility that the GC state is also the selectively optimal state. If so, then both gBGC and selection make the same predictions of conservation of GC and covariation with the recombination rate. Given Bengtsson’s argument that gBGC may be biased in this direction to counter a deleterious GC→AT biased mutational process [47], it may well be unusual to have the selectively optimal state being promoted by mutation bias but not by gBGC. Indeed, in Drosophila, for example, “optimal” codons tend to end in G or C [48]. Codon optimality may, however, also not be adequate to define the direction of selection, as such selection may also be contingent on the overall GC-richness of the sequence (owing to RNA structure effects [41]). Thus, the core difficulty in establishing gBGC as a cause of false signals of purifying selection and of conservation of deleterious alleles is to identify a case where we can have confidence (and independently verify) that the AT state is selectively optimal compared to its GC-richer allele.

Here, we suggest that mammalian stop codon usage may provide an exceptional test case. Across all domains of life, the 3 stop codons, TAA, TGA, and TAG, are not used equally [49], with TAA being commonly, if not universally, selectively favoured [49]. This is probably owing, in large part, to selective avoidance of translational readthrough (TR). During TR, the stop codon is missed by its cognate release factor [50] due to the misbinding of a near-cognate tRNA [51,52], leading to the erroneous translation of the 3′ UTR and the generation of potentially deleterious protein products [53]. Each stop codon has a distinct intrinsic error rate such that TGA>TAG>TAA in bacteria [54–59] and eukaryotes [55,60] (including humans [61]). TR rate reduction in any given gene might thus be achieved by selection for TAA.

Evocation of such selection presumes that TR is usually deleterious [62,63]. This is likely, as the formation of C-terminal extensions causes energetic wastage [64] as well as problems with protein stability [65–67], aggregation [68,69], and localisation [70,71]. Alternatively, in the absence of another 3′ in-frame stop codon, both the readthrough transcript and nascent protein are likely to be degraded when the translational machinery reaches the polyA+ tail [72,73]. In addition to reducing TR costs, TAA also has several other benefits: There may be selection for fast release of the ribosome to prevent ribosomal traffic jams [74], and it is robust to 2 mistranscription events (TAA→TGA, TAA→TAG) while the 2 other stop codons are resilient to just one (TGA→TAA, TAG→TAA).
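The robustness claim can be checked by enumerating every single-nucleotide change to each stop codon and counting how many still yield a stop. The following is a minimal sketch assuming the standard genetic code and DNA-style notation (T rather than U); the function name is illustrative.

```python
STOP_CODONS = ("TAA", "TGA", "TAG")  # standard genetic code, DNA notation
NUCLEOTIDES = "ACGT"


def stop_preserving_changes(codon):
    """List the single-nucleotide changes to `codon` that still give a stop codon."""
    preserved = []
    for position, original in enumerate(codon):
        for replacement in NUCLEOTIDES:
            if replacement == original:
                continue
            mutant = codon[:position] + replacement + codon[position + 1:]
            if mutant in STOP_CODONS:
                preserved.append(mutant)
    return preserved


robustness = {codon: stop_preserving_changes(codon) for codon in STOP_CODONS}
for codon, mutants in robustness.items():
    print(codon, "->", mutants, "(", len(mutants), "of 9 changes keep a stop )")
```

Of the nine possible single-nucleotide changes to each codon, two keep TAA as a stop (giving TGA or TAG), whereas only one does so for TGA and for TAG (giving TAA in both cases).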

It is then noteworthy that stop codon usage in mammals is different from that seen elsewhere [49,75]: TGA is more often conserved than TAA [76] and, unusually, the substitution rate of TAA→TGA is higher than the reverse [49]. Despite the fact that in humans TAA is disproportionately employed in highly expressed genes (HEGs) [77], this signal of conservation has been interpreted as evidence that purifying selection is operating to preserve TGA in mammals [76]. Gene conversion would, however, oppose fixation of TGA→TAA mutations (while also favouring TAA→TGA) and hence mimic purifying selection on TGA, even if selection were operating in the opposite direction. Biased gene conversion, thought to be especially influential in humans [15], could thus resolve the exceptionalism of TGA conservation in mammals.

Here, we evaluate this suggestion. Duret and Galtier [11] provide a series of tests for differentiating gBGC from selection, noting that the trend to the higher GC state should be correlated with recombination and common to all sites regardless of functional status. We conduct several analyses that examine these predictions, finding all to be robustly supported. However, to be confident that TAA underusage at the focal stop codon is indeed maladaptive, we also need evidence that TAA is the optimal stop codon. We consider several tests, all of which support this. Finally, we show that complex mutational biases cannot fully explain the TAA/TGA usage trends and confirm a general pattern for GC-rich trinucleotides to reside at frequencies far above their mutational equilibria in GC-rich (high recombining) domains. The latter results are consistent with broadscale patterns of conservation of GC-rich residues owing to gBGC. The same analysis shows that trinucleotide usage in domains unlikely to be subject to gBGC is as expected from a model of complex mutation bias. Indeed, these models predict higher TGA usage than TAG usage in these domains. However, different trinucleotides of the same nucleotide content (such as TGA and TAG) have repeatable differences in the extent to which they are subject to fixation bias in GC-rich isochores. The cause of these previously unknown complex fixation biases is unresolved.


[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001588



via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/