(C) PLOS One

This story was originally published by PLOS One and is unaltered.

Spontaneous single-nucleotide substitutions and microsatellite mutations have distinct distributions of fitness effects

Authors: Yevgeniy Plavskin; Maria Stella De Biase. Affiliations (as listed): Center for Genomics and Systems Biology, New York University, New York, United States of America; Department of Biology; Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association.

Date: 2024-07

The fitness effects of new mutations determine key properties of evolutionary processes. Beneficial mutations drive evolution, yet selection is also shaped by the frequency of small-effect deleterious mutations, whose combined effect can burden otherwise adaptive lineages and alter evolutionary trajectories and outcomes in clonally evolving organisms such as viruses, microbes, and tumors. The small effect sizes of these important mutations have made accurate measurements of their rates difficult. In microbes, assessing the effect of mutations on growth can be especially instructive, as this complex phenotype is closely linked to fitness in clonally evolving organisms. Here, we perform high-throughput time-lapse microscopy on cells from mutation-accumulation strains to precisely infer the distribution of mutational effects on growth rate in the budding yeast, Saccharomyces cerevisiae. We show that mutational effects on growth rate are overwhelmingly negative, highly skewed towards very small effect sizes, and frequent enough to suggest that deleterious hitchhikers may impose a significant burden on evolving lineages. By using lines that accumulated mutations in either wild-type or slippage repair-defective backgrounds, we further disentangle the effects of 2 common types of mutations, single-nucleotide substitutions and simple sequence repeat indels, and show that they have distinct effects on yeast growth rate. Although the average effect of a simple sequence repeat mutation is very small (approximately 0.3%), many do alter growth rate, implying that this class of frequent mutations has an important evolutionary impact.

Funding: This work was supported by National Institutes of Health grants R35GM118170 and R35GM148344 (to MLS), and National Institutes of Health Grant R01GM097415 (to MLS and DWH). RFS is a Professor at the Cancer Research Center Cologne Essen (CCCE) funded by the Ministry of Culture and Science of the State of North Rhine-Westphalia. This work was partially funded by the German Ministry for Education and Research as BIFOLD - Berlin Institute for the Foundations of Learning and Data (ref. 01IS18025A and ref. 01IS18037A to RFS). LJ was supported by a NYU Dean’s Undergraduate Research Fund Grant. YOZ was supported by the A*STAR National Science Scholarship PhD. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Here, we combine sequence information, precise growth rate measurements, and modeling to interrogate the properties of spontaneous mutations in yeast. We are particularly interested in 3 key questions: How frequent are small-effect deleterious mutations? What proportion of the genome affects growth when mutated? And do different classes of mutations differ in their effects on growth? To answer these questions, we estimate the effects of spontaneous mutations on growth rate, a complex phenotype closely related to microbial fitness, in 2 sets of MA lines with different mutation spectra. We first show that our microscopy-based growth rate assay allows us to accurately and precisely estimate the net effect on growth of the mutations in each line, notwithstanding stochastic variation in the proportion of slow-growing petite cells across the experimental samples. We next use these individual-level growth data, along with substitution rate data from MA lines, to fit a distribution of mutational effects. Our results demonstrate that the distribution of effects of spontaneous SNMs is highly skewed towards mutations with extremely small effects on growth rate, and that the vast majority of these mutations decrease growth rate in rich media. Finally, we use an additional, slippage repair-deficient set of MA lines to show that spontaneous indels in SSRs significantly affect growth rate. By applying high-throughput phenotyping and integrating genotype and phenotype data into a single framework for fitting mutational effects, we show that the effects of spontaneous mutations accumulated in MA experiments can be parsed into multiple classes, and that SSR mutations make important contributions to trait variation, on the order of a quarter of the combined effect of SNMs. Our results underscore the role that deleterious load from a range of mutational types is likely to play in clonal evolution.

Precise and accurate phenotypic measurements are especially important if mutations of small effect dominate. In microbes, batch culture can be used to generate growth rate measurements averaged across tens of millions of cells within a population. However, such measurements can still have appreciable errors, likely caused by the interaction of small biological and technical variations with the exponential growth process. For example, in one study, yeast growth rate measured by optical density in batch culture varied across replicates with a standard deviation of 3% of the mean [ 31 ], limiting reliable detection of mutational effects to strains with the most extreme effects or the largest numbers of mutations. Moreover, in laboratory strains of budding yeast, frequently occurring respiration-deficient, slow-growing “petite” cells can stochastically bias average growth rates downwards by amounts that are independent of the genetic properties of each individual strain [ 32 , 33 ]. We have developed an alternative to batch culture measurements that uses time-lapse microscopy to measure growth rates simultaneously in tens of thousands of microbial microcolonies [ 34 , 35 ]. Because the assay is highly replicated, it yields very precise estimates of strains’ mean growth rates [ 35 , 36 ].
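To make the cost of a 3% replicate standard deviation concrete, the sketch below applies the textbook two-sample normal approximation for sample size. This calculation is not from the study: it assumes independent, normally distributed replicate measurements, a two-sided test at α = 0.05, and 80% power, and `replicates_per_group` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def replicates_per_group(delta_pct, sd_pct, alpha=0.05, power=0.80):
    """Approximate replicates per strain needed to detect a mean difference.

    Two-sample normal approximation:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) * sd / delta) ** 2, rounded up.
    delta_pct and sd_pct are both expressed in percent of the mean growth rate.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # quantile giving the desired power
    return math.ceil(2 * ((z_alpha + z_power) * sd_pct / delta_pct) ** 2)


for delta in (5.0, 3.0, 1.0):
    n = replicates_per_group(delta, 3.0)
    print(f"{delta}% effect, 3% SD: {n} replicates per strain")
```

Under these assumptions, a 3% difference needs 16 replicates per strain and a 1% difference needs 142, which illustrates why small mutational effects are hard to resolve with a handful of batch-culture replicates.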

Interpretation of mean mutational effect sizes is further complicated by the fact that, although precise estimates of single-nucleotide mutation (SNM) rate are now available from MA studies across a wide range of organisms and conditions, other frequent mutation types, especially mutations in simple sequence repeats (SSRs), are more difficult to identify using conventional analyses of next-generation sequencing data [ 22 , 23 ]. Yet because of their repetitive nature, these regions are particularly prone to mutation: loops that form during replication (polymerase slippage events) lead to contraction or expansion of the repeat locus. Recent advances in genome-wide SSR genotyping (e.g., [ 24 ]) have enabled high-throughput studies of the effects of SSR variants, which demonstrated that variation in these difficult-to-genotype mutation types contributes significantly to phenotypic variation in nature: thousands of short SSR loci contribute substantially to the variance attributed to common polymorphisms affecting gene expression across human tissues and cell lines [ 6 , 25 , 26 ], rare variants and de novo mutations in SSRs are associated with autism spectrum disorder [ 27 , 28 ], and among closely related yeast species, the expression of genes whose promoters contain these repeats diverges more than that of genes with SSR-free promoters [ 29 ]. Evidence that mutations in short repeats may contribute significantly to the spectrum of mutational effects is also emerging from MA studies. For example, it has been suggested that the higher estimated rate of fitness-altering mutations in Dictyostelium discoideum, compared with other single-celled organisms, may be explained by the large number of SSRs in its genome and the resulting high frequency of expansion/contraction events at these highly mutable loci [ 9 ]. More direct evidence comes from estimating the frequency of SSR mutations in MA experiments in Daphnia pulex [ 30 ].
Selection against SSR mutations was demonstrated by comparing their prevalence in an MA experiment to a control in which selection was active [ 30 ]. However, with the exception of that study, little is known about the relative contribution of SNMs as compared to SSR indels and other mutation types to the full spectrum of mutational effects.

Several recent studies have sequenced MA lines to make independent measurements of mutation rate. The numbers of mutations per line expected from these sequencing-based mutation rates tend to far exceed the numbers of non-neutral mutations estimated from phenotypic measurements in MA lines. This observation has led to the conclusion that in most cases, the majority of substitutions are neutral or nearly neutral with respect to the observed phenotype (reviewed in [ 10 ]). However, mutation rate estimates must be transferred between different MA experiments with caution. There is ample evidence that mutation rate is highly experiment-dependent even within a species, with substitution rates differing with strain ploidy, genetic background, and even the environmental conditions in which the mutations accumulated [ 13 – 16 ]. Recent work that either directly measured the number of accumulated mutations in phenotyped MA lines in Chlamydomonas reinhardtii [ 17 ], Drosophila melanogaster [ 18 ], mice [ 19 ], and Escherichia coli [ 20 ], or measured mutation number and phenotype in parallel MA experiments in a mismatch repair-deficient strain of E. coli [ 21 ], has provided more precise estimates of the distribution of mutational effect sizes in these species. For example, Robert and colleagues [ 21 ] and Böndel and colleagues [ 17 ] both show strong evidence for highly leptokurtic (L-shaped, with most mutations having very small effect sizes) distributions of fitness effects in E. coli and C. reinhardtii, respectively, and Sane and colleagues [ 20 ] identify significant differences in the rate of beneficial mutations between transitions and transversions in E. coli.

Mutation-accumulation (MA) lines in model organisms have allowed unbiased exploration of the properties of new mutations. Repeatedly passaging organisms through extreme bottlenecks for many generations allows mutations to accumulate while largely shielded from selection. The phenotypes of these MA lines can then be assayed, revealing the spectrum of mutational effects of new mutations. Studies have used mutation accumulation to probe mutational effects in diverse organisms, but the resulting estimates of typical effect sizes vary widely, even among studies assaying closely related phenotypes in the same species (reviewed in [ 9 , 10 ]). Two culprits likely explain the discrepancies. First, MA studies have historically lacked genotypic information: it was not known how many mutations were present in each strain, let alone how many of them altered the trait. Many studies addressed this issue by assuming a parametric distribution for the effect of a single mutation and modeling each MA strain as containing a Poisson-distributed random number of mutations with an unknown mean. The parameters of the distribution of mutational effects were then fitted jointly with a parameter representing the mean number of mutations present across the MA strains of interest. However, these estimates of mutation rate are difficult to interpret because, in most cases, their confidence intervals have no upper bound (see for example [ 9 , 11 , 12 ]). This problem is caused in part by the second culprit: noisy phenotype measurements. The smaller a mutational effect, the more easily it is masked by measurement noise. In addition, because estimates of mutational parameters are confounded with each other [ 11 ], the lack of a precise mutation rate estimate translates into uncertainty in the estimates of the other mutational parameters, which describe the shape of the effect distribution.
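The confounding between mutation number and effect size can be seen from the first 2 moments of the Poisson model described above. The sketch assumes a Poisson(U) number of mutations per line, each with a gamma-distributed effect of mean m and shape k, all acting in the same direction. It also uses the classic Bateman–Mukai estimator (U ≈ mean²/variance), a standard MA technique that is not described in this section; the function names are hypothetical.

```python
def line_effect_moments(U, m, k):
    """Mean and variance of a line's total effect under a compound Poisson model.

    Number of mutations ~ Poisson(U); each effect ~ Gamma(shape=k, mean=m),
    all in the same direction. For a compound Poisson sum, Var = U * E[x^2],
    and for a gamma effect E[x^2] = Var(x) + m^2 = m^2 / k + m^2.
    """
    mean = U * m
    variance = U * m ** 2 * (k + 1) / k
    return mean, variance


def bateman_mukai_u(mean, variance):
    """Classic Bateman-Mukai estimate of U, which assumes equal effects."""
    return mean ** 2 / variance


# Two hypothetical parameter sets with different numbers and sizes of mutations.
for U, m, k in [(2, 0.01, 1.0), (4, 0.005, 1 / 3)]:
    mean, var = line_effect_moments(U, m, k)
    print(f"U={U}, m={m}, k={k:.3f}: mean={mean:.4f}, var={var:.2e}, "
          f"U_BM={bateman_mukai_u(mean, var):.2f}")
```

Both parameter sets give the same mean and variance of line effects, so these moments alone cannot tell them apart; the Bateman–Mukai estimate is the same for both and falls below each true U by the factor k/(k + 1).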

Mutations constitute the raw material upon which selection acts. Understanding the properties of new mutations is therefore of central importance to evolutionary biology [ 1 ]. For example, the frequency and effect sizes of mutations that increase fitness are key determinants of the rate of evolutionary adaptation [ 2 ]. The frequencies of mutations that decrease fitness also affect adaptation, as well as patterns of genetic diversity [ 3 ]. In addition, mutational properties are informative about the structure of genetic networks: if a large proportion of mutations is non-neutral with respect to a phenotype, then that phenotype can be altered by changes to the function of many genes across the genome, suggesting a high degree of interconnectedness among the gene-regulatory networks operating in the cell. For example, evidence that large numbers of variants affect complex traits in humans has recently been proposed to support a model of widespread interconnectedness among gene-regulatory networks [ 4 ]. The relative contributions of different mutational types (e.g., single-nucleotide substitutions, copy-number variants, repetitive sequence expansions/contractions) to phenotypic differences among organisms are another poorly understood property of mutations. Shedding light on this property is critical not only for understanding a phenotype’s propensity to change, but also for selecting appropriate technologies to assay the phenotype’s genetic basis [ 5 , 6 ]. Finally, the properties of new mutations are also of interest because of their relevance to human health: de novo mutations are thought to constitute a major set of causative variants for many genetic disorders [ 7 ], and the rate of small-effect deleterious mutations has been shown to play a significant role in tumor evolution [ 8 ].

Results

Statistical modeling accounts for across-strain variability in the proportion of respiration-deficient (petite) colonies

Our study seeks to infer the effects of spontaneous mutations on yeast growth rate. However, estimating the growth rates of interest is nontrivial. Laboratory strains of Saccharomyces cerevisiae are prone to the spontaneous formation of petites, mutants with impaired mitochondrial function that grow more slowly than their non-petite counterparts [32,33]. Variation in petite numbers across samples can arise from chance events that cause different numbers of petites in the original founder populations for each sample. Such variation would affect the mean growth rate estimated for each strain, so that the estimates would reflect stochastic inter-strain differences in petite proportions and obscure the genetic effect of mutations on the growth rate of non-petite cells. To determine whether differences in petite proportions across MA lines could be affecting growth rate estimates, we first tested whether experimental aliquots of genetically similar strains truly differ in the proportion of petite cells. On petri dishes, petite colonies in ade2 mutant strains can be identified by color, as they lack the red color typical of mutants in the end stages of the adenine biosynthesis pathway [37]. We therefore assayed the proportion of petites in a set of 18 MA lines described in [38]. Because these strains differed from each other by only approximately 2 mutations on average, large variation in the proportion of petites across these strains was unlikely to be explained by genetic differences among them. In the line with the highest proportion of non-red colonies, many of the non-red colonies were non-petite (large and rapidly growing), indicating a decoupling between colony color and respiratory ability; this line was excluded from further analysis.
We found significant variation in the proportion of red colonies across the remaining lines (p << 0.001 by likelihood ratio test, see Methods) (S1A Fig). We next sought to determine whether we could accurately estimate the petite proportion in each strain directly from microcolony growth rate data. Unlike batch culture-based measures, which estimate population-average growth rates, the microcolony growth rate assay yields a distribution of individual microcolony growth rates for each sample (Fig 1A) [35,39]. We can therefore estimate the proportion of petites directly from microcolony growth rate data, while simultaneously estimating the mean growth rate of the non-petite microcolonies. We model the distribution of colony-wise growth rates as a mixture of 2 Gaussian distributions, with the parameters of the petite growth rate distribution estimated from independent petite strains derived from the MA ancestor (see Materials and methods). Petite proportions derived from the microcolony assay are highly correlated with those estimated from colony color (Pearson correlation coefficient = 0.83), indicating that microcolony growth rate data can be used directly to partition the growth rates of petite and respiring colonies. Petite proportions estimated from the microcolony assay are approximately 4% lower than the colony color-based estimates (S1B Fig). This discrepancy may reflect underestimation of the proportion of petites from growth rate data; alternatively, it may arise from a small number of non-petite white colonies (which we have observed in these strains [40]).
However, because mutational effects on growth rate are measured relative to the ancestral strain, a consistent offset in the estimated proportion of petites would bias the mean growth rate estimates of all strains, including the ancestor, in the same way, and would therefore still yield accurate estimates of relative growth rates.
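The 2-component mixture described above can be fitted by maximum likelihood. The sketch below uses expectation–maximization (EM), one standard way to do so and not necessarily the procedure used in the study. It assumes the petite component's mean and SD are fixed in advance (as described above, from independent petite strains) and estimates only the petite proportion and the non-petite mean and SD. `fit_petite_mixture` and the growth-rate values are hypothetical.

```python
import numpy as np


def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def fit_petite_mixture(rates, petite_mu, petite_sd, n_iter=200):
    """Estimate petite proportion and non-petite mean/SD by EM.

    The petite component's mean and SD are held fixed (assumed to come from
    separately measured petite strains); only the mixing proportion and the
    non-petite component are updated.
    """
    rates = np.asarray(rates, dtype=float)
    p = 0.5                # initial petite proportion
    mu = np.median(rates)  # initial non-petite mean
    sd = rates.std()       # initial non-petite SD
    for _ in range(n_iter):
        # E-step: posterior probability that each colony is a petite.
        w_petite = p * normal_pdf(rates, petite_mu, petite_sd)
        w_normal = (1.0 - p) * normal_pdf(rates, mu, sd)
        resp = w_petite / (w_petite + w_normal)
        # M-step: update the proportion and the non-petite component only.
        p = resp.mean()
        w = 1.0 - resp
        mu = np.sum(w * rates) / np.sum(w)
        sd = np.sqrt(np.sum(w * (rates - mu) ** 2) / np.sum(w))
    return p, mu, sd


# Hypothetical colony growth rates (per hour): 10% petites, 90% respiring.
rng = np.random.default_rng(0)
rates = np.concatenate([rng.normal(0.15, 0.03, 2_000), rng.normal(0.40, 0.03, 18_000)])
p_hat, mu_hat, sd_hat = fit_petite_mixture(rates, petite_mu=0.15, petite_sd=0.03)
print(f"petite proportion ~ {p_hat:.3f}, non-petite mean ~ {mu_hat:.3f} per h")
```

The non-petite mean from such a fit plays the role of a strain's growth rate in the comparisons that follow; fixing the petite component keeps the 2 components from trading places when petites are rare.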


Fig 1. Measuring the cumulative effects of spontaneously accumulated mutations on growth rate across MA strains. (A) The microcolony growth rate assay. Microcolony growth rates are measured in parallel using automated microscopy and image analysis. Fluorescence imaging at the end of the growth period is used to distinguish the MA strain from the ancestor-derived reference strain grown in each well. The brightfield images from 3 time points show automated colony detection for 2 representative colonies, a GFP-marked ancestral reference (red) and an MA line (blue), growing side-by-side with different starting sizes and lag times but at similar rates. A fluorescence image taken after the final time point shows GFP expression in the reference colony. Points on the plot represent log(area) of these colonies over a 10-h period; lines show the best fit (over a 7-time-point window) used to determine each colony’s growth rate. (B) Mutational effects (s) in MA lines relative to the ancestral reference strain. Points in the bottom plot are colored yellow if their s value differs significantly from the ancestor at an FDR of 0.05. Blue points represent mutational effects calculated for 2 control strains derived from independent haploid spores of the ancestral diploid (these are not included in the histogram or boxplot). The boxplot shows the median and the 25th and 75th percentiles of s across all MA strains. The data and code needed to generate this figure can be found on OSF: https://doi.org/10.17605/OSF.IO/H4J9F. FDR, false discovery rate; MA, mutation-accumulation. https://doi.org/10.1371/journal.pbio.3002698.g001
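The caption states that each colony's growth rate comes from a fit of log(area) over a 7-time-point window. The sketch below assumes this means fitting a straight line to log(area) versus time within every 7-point sliding window and taking the steepest slope as the colony's exponential growth rate; the study's exact rule may differ, and `colony_growth_rate` is a hypothetical helper.

```python
import numpy as np


def colony_growth_rate(times_h, areas, window=7):
    """Growth rate of one colony from its area time series.

    Assumption for illustration: fit a line to log(area) vs. time in every
    window of `window` consecutive time points and return the steepest slope
    (units: 1/h when times are in hours).
    """
    t = np.asarray(times_h, dtype=float)
    log_area = np.log(np.asarray(areas, dtype=float))
    if t.size < window:
        raise ValueError("need at least `window` time points")
    slopes = []
    for start in range(t.size - window + 1):
        stop = start + window
        slope, _intercept = np.polyfit(t[start:stop], log_area[start:stop], 1)
        slopes.append(slope)
    return max(slopes)


# Hypothetical colony: 3-h lag, then exponential growth at 0.3 per hour,
# imaged every 30 min over 10 h.
times = np.arange(0.0, 10.5, 0.5)
areas = np.where(times < 3.0, 100.0, 100.0 * np.exp(0.3 * (times - 3.0)))
rate = colony_growth_rate(times, areas)
print(f"estimated growth rate: {rate:.3f} per h")
```

Working on log(area) turns exponential growth into a straight line, so differences in starting size shift only the intercept, and windows that overlap the lag phase give shallower slopes, which is consistent with the caption's note that colonies with different sizes and lag times can grow at similar rates.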

Accumulated mutations have primarily negative effects on haploid growth rate

To assess the effects of spontaneous mutations, we first examined the distribution of growth rates of a set of haploid MA lines, each likely harboring a unique set of mutations. These lines are derived from diploid parent strains that accumulated mutations over the course of 2,000 generations [31,40,41], acquiring an average of approximately 8 SNMs each [42] (or approximately 4 SNMs per haploid strain assayed here). Hall and colleagues reported that 2.5% of these diploid yeast MA strains, and 14% of viable haploid progeny of these strains, had a significant increase in growth rate compared with the ancestral line [43]. Sequencing revealed that about a fifth of these diploid MA lines harbored aneuploidies [42]; we excluded the progeny of these aneuploid strains from our study to avoid confounding the effects of smaller mutations, although we did not eliminate strains that underwent aneuploidization events during meiosis. We assayed the growth rates of 70 viable haploid MA line progeny (each derived from a different diploid parent MA strain) using the microcolony growth assay [34,35,39]. Cells of each MA strain were grown and imaged in independent wells of 96-well plates; each well also contained cells of a reference strain (a GFP-marked haploid line derived from the MA ancestor) to control for well effects on growth (Fig 1A). Using haploid MA lines allows us to assay the effects of all mutations in these lines in the absence of dominance effects (i.e., recessive mutations are not masked). We are interested in the distribution of the changes in growth rate among MA-derived haploid strains relative to the growth rate of the ancestral strain.
We measure these differences as the selection coefficient, s, which is positive when mutations are beneficial (increase growth rate) and negative when they are deleterious:

$$ s_{MA} = \frac{g_{MA} - g_{anc}}{g_{anc}} \qquad (1) $$

where s_MA is the selection coefficient representing the combined effect of the mutations in a given MA line on growth rate, and g_MA and g_anc are the growth rates of the MA and ancestral strains, respectively. We estimate the proportion of petites in each strain by fitting a mixture of Gaussians as described above (and in Materials and methods), and estimate g_MA and g_anc as the respective means of the non-petite colony growth rates for the MA and ancestral strains. Before examining the distribution of MA-line selection coefficients, we first tested the effect of partitioning petite and non-petite growth rates. As expected, modeling colony growth rates in each strain as a mixture of 2 Gaussians that allows for a subpopulation of petites produces a significantly better fit to the data (p << 0.001 by likelihood ratio test; see Methods) than fitting a single Gaussian distribution to the colony growth rates of each MA line (S1B Fig). In addition, the strains subjected to growth rate assays included 2 non-GFP-marked control strains derived from independent haploid spores of the ancestral diploid; these served as independent controls in the experiment, as their growth rates should be the same as that of the ancestral reference strain. As expected, the confidence interval for the selection coefficient s estimated for each of these strains overlaps 0, indicating that they do not differ significantly in growth rate from the ancestral strain. However, these strains do differ from the reference strain in the proportion of petite colonies estimated by our modeling.
As a result, if mutational effects are estimated without accounting for petites, these ancestral control strains are incorrectly estimated to have a significant mutational effect relative to the GFP-marked ancestral reference (S1C Fig). Together, these results underscore the importance of using modeling to separate the effects of stochastically variable petite proportions across strains from the genetic effects of spontaneous mutations on growth rate. The distribution of MA-line mutational effects (Fig 1B and S1 Table) reveals that the majority of strains contain at least 1 mutation that alters growth rate, and that mutations tend to be deleterious. At a false discovery rate (FDR) of 0.05, approximately 4% of strains have a significant increase in non-petite growth rate (positive s value), and 56% have a significant decrease (negative s value) relative to the ancestral strain. The growth rate differences tend to be small: 37% of strains have an s value between −0.01 and −0.05 (a 1% to 5% decrease relative to the ancestral growth rate), only 10% of strains have an s value below −0.05 (a decrease of more than 5%), and an additional 9% of strains have significant decreases with an s value above −0.01. Only a single strain has an s value greater than 0.01.
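The classification of strains above can be sketched as follows. The section states only that significance was assessed at an FDR of 0.05; the sketch assumes the Benjamini–Hochberg step-up procedure applied to per-strain p-values, and the bin edges mirror the s ranges quoted in the text. All function names and the example estimates are hypothetical.

```python
def benjamini_hochberg(pvals, fdr=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the null hypothesis (s = 0) is
    rejected while controlling the false discovery rate at `fdr`.
    """
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    # Largest rank whose p-value falls under its BH threshold (rank / n * fdr).
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / n * fdr:
            max_rank = rank
    rejected = [False] * n
    # Step-up: reject every hypothesis ranked at or below max_rank.
    for i in order[:max_rank]:
        rejected[i] = True
    return rejected


def classify_strain(s, significant):
    """Bin a strain's selection coefficient using the ranges quoted in the text."""
    if not significant:
        return "not significant"
    if s > 0:
        return "increase"
    if s < -0.05:
        return "large decrease (s < -0.05)"
    if s <= -0.01:
        return "moderate decrease (-0.05 <= s <= -0.01)"
    return "small decrease (-0.01 < s < 0)"


# Hypothetical per-strain estimates as (s, p-value); not data from the study.
strains = [(-0.03, 0.001), (-0.008, 0.02), (0.012, 0.03), (-0.07, 0.5)]
significant = benjamini_hochberg([p for _, p in strains])
labels = [classify_strain(s, sig) for (s, _), sig in zip(strains, significant)]
print(labels)
```

Because Benjamini–Hochberg is a step-up procedure, a p-value above its own rank threshold can still be rejected if a larger p-value further down the ranking passes its threshold.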

SNMs do not fully explain observed mutational effects

One likely source of variation in s values across strains is differences in how the accumulated mutations affect protein-coding genes. To test whether strains’ s values were explained by the predicted severity of the mutations found in these strains, we sequenced the haploid MA strains and identified SNMs and short indels in non-repetitive regions relative to the ancestral strain, as described in [38]. We identified a total of 307 SNMs and 3 indels across 68 strains. In some cases, multiple nearby SNMs comprised complex mutations at a single locus; by grouping together mutations within 50 bp of each other, we identified 271 mutated loci (S2 Table). We then predicted the putative effect of each mutation using snpEff [44]. snpEff assigns each mutation to one of 4 impact categories: “high” effect mutations, such as nonsense mutations and frameshifts; “moderate” effect mutations, such as in-frame indels and nonsynonymous substitutions; “low” effect mutations, such as synonymous substitutions; and “modifier” mutations, such as mutations outside the coding regions of genes. We identified mutations in 262 unique genes (with a small number of genes mutated in more than 1 strain). Counting each mutated gene once per strain, 8 had at least 1 mutation categorized as “high” impact; for 148, the most severe mutation was “moderate”; and for 47 and 65, it was “low” and “modifier,” respectively (S2 Table). We also identified aneuploidies in 2 strains (S1 Table); both of these strains also had additional mutations. Six strains lacked any identified non-repeat mutations. Because many of our strains contain mutations in multiple loci, we first grouped the mutations in each strain and identified the highest-impact mutation that each strain contained.
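The grouping of nearby mutations into loci and the per-strain summary of predicted severity could look like the sketch below. It assumes that “within 50 bp of each other” means chaining consecutive mutations on the same chromosome whose positions differ by at most 50 bp (single-linkage grouping), and it ranks snpEff's 4 impact labels (HIGH, MODERATE, LOW, MODIFIER) by severity. The tuple layout, function names, and example mutations are hypothetical.

```python
# snpEff impact categories, ranked from least to most severe.
SEVERITY = {"MODIFIER": 0, "LOW": 1, "MODERATE": 2, "HIGH": 3}


def group_into_loci(mutations, max_gap=50):
    """Chain mutations on the same chromosome into loci.

    `mutations` is a list of (chromosome, position, impact) tuples. A mutation
    joins the current locus if it lies on the same chromosome and at most
    `max_gap` bp from the previous mutation (single-linkage grouping).
    """
    loci = []
    for mut in sorted(mutations, key=lambda m: (m[0], m[1])):
        chrom, pos, _impact = mut
        if loci:
            prev_chrom, prev_pos, _ = loci[-1][-1]
            if chrom == prev_chrom and pos - prev_pos <= max_gap:
                loci[-1].append(mut)
                continue
        loci.append([mut])
    return loci


def most_severe_impact(mutations):
    """Return the highest snpEff impact among a strain's mutations, or None."""
    if not mutations:
        return None
    return max((impact for _, _, impact in mutations), key=SEVERITY.__getitem__)


# Hypothetical mutations found in one strain.
strain_muts = [
    ("chrIV", 1000, "LOW"),
    ("chrIV", 1030, "MODERATE"),  # 30 bp away: joins the previous locus
    ("chrIV", 2000, "MODIFIER"),
    ("chrII", 500, "HIGH"),
]
print(len(group_into_loci(strain_muts)), most_severe_impact(strain_muts))
```

Strains with no identified mutations map to None, which corresponds to the “no mutations” group compared in the next paragraph.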
We then compared the magnitude (absolute value) of s across strains in which the most severe mutations had high, moderate, low, or modifier effects, as well as s magnitudes in strains with no identified mutations (S2 Fig). Note that this analysis does not take into account the total number of mutations found in each strain. Although median s magnitudes were higher for strains that included at least 1 moderate- or high-impact mutation, the most severe impact category had no significant effect on s magnitude (Kruskal–Wallis test p-value = 0.35). Critically, 4 of the 6 strains that did not have any identified mutations had significant growth defects, including 1 strain with an s value of −0.047. This finding indicates that the mutations identified outside of repeat regions in these strains do not fully explain the variation in MA strain s values, and strongly suggests that additional, unidentified mutations are affecting yeast growth rate.

Modeling reveals distinct distributions of the effects of SNMs and unidentified mutations

We next sought to determine the properties of the distribution of individual mutational effects (DME) whose combined effects were observed in Fig 1B. To model the DME, we expanded on the approach proposed by Keightley [11]: individual mutational effects are modeled as drawn from a reflected gamma distribution, with the 2 sides weighted to represent the different proportions of mutations with positive versus negative effects on the phenotype of interest. The gamma distribution is advantageous because it captures a range of distribution shapes, from highly peaked to exponential, with only 2 parameters; here, we use the mean (m) and shape (k) of the distribution. To account for a possible bias in the direction of mutational effects, the 2 sides of the reflected gamma distribution are weighted by q, a parameter representing the proportion of mutations with a positive effect on the observed phenotype (see http://shiny.bio.nyu.edu/ms4131/MAmodel/ to interactively explore how changes to parameters affect the distribution of mutational effects in MA lines). We treat individual mutations as additive: the net mutational effect in each strain (s_MA from Eq 1) is the sum of the effects of the individual mutations found in that strain. Unlike earlier work, in which the number of mutations per strain was not known, here we leverage sequence information to constrain the model. The mean number of non-neutral mutations per strain, U, is modeled as half the average number of mutations in the MA strains’ diploid parents [42], corrected by a fitted parameter (p_0) estimating the proportion of mutations that are neutral with respect to growth rate (Eq 13). (Note that U has also been used to denote the deleterious mutation rate specifically [45], which here would be (1 − q)U.)
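Written out, the reflected gamma density described above, with shape k and scale m/k so that each side has mean magnitude m, and one reading of how U follows from the diploid-parent mutation count, are shown below. This is our restatement rather than the paper's Eq 13; the symbol \bar{N}_{dip} (average number of mutations in the diploid parents) is introduced here for illustration only.

$$
f(x \mid m, k, q) =
\begin{cases}
q \, \dfrac{x^{k-1} e^{-x k / m}}{\Gamma(k)\,(m/k)^{k}}, & x > 0,\\[8pt]
(1-q) \, \dfrac{(-x)^{k-1} e^{x k / m}}{\Gamma(k)\,(m/k)^{k}}, & x < 0,
\end{cases}
\qquad
U = (1 - p_0)\,\frac{\bar{N}_{dip}}{2}
$$

Under this reading, each non-neutral mutation has expected effect m(2q − 1), so the expected net effect per strain is E[s_MA] = U m (2q − 1); with \bar{N}_{dip} ≈ 8 SNMs [42], U ≈ 4(1 − p_0). Shapes k < 1 give the L-shaped distributions discussed in the introduction, and k = 1 gives the exponential case.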
The distribution of observed mutational effects in the MA strains, s_MA, is therefore modeled as a multifold convolution of the distribution of individual mutational effects. Although we expected that constraining the model by the known number of mutations per diploid parent strain would improve fitting, the existence of growth defects in strains lacking identified mutations suggests that a substantial number of mutations may have been missed in the initial sequence analysis of the MA lines whose haploid derivatives are phenotyped here. In particular, the analysis in [42] and the analysis described in the previous section disregarded all repetitive regions, including SSRs, which have a higher mutation rate than the surrounding genome [46]. As a result, the true number of mutations in the MA lines may be the sum of the number of known mutations (almost all SNMs) and an additional set of “unidentified” mutations, which would include mutations in SSRs. Therefore, in addition to the “SNM-only” model described above, we considered 3 approaches to modeling the distribution of the effects of unidentified mutations: the “single DME” model, in which substitution effects and unidentified mutation effects are drawn from a single distribution of mutational effects; the “two-gamma” model, in which the effects of unidentified mutations are drawn from a separate reflected gamma distribution; and the “Gaussian” model, in which substitution effects follow a reflected gamma distribution and the combined effects of unidentified mutations in each strain follow a Gaussian distribution. Below, we lay out the properties and justifications for each of these models in more detail, and then present the results of fitting these models to our data.
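The multifold convolution of reflected gamma effects is awkward to evaluate directly, but it can be approximated by simulation. The sketch below draws a Poisson(U) number of non-neutral mutations per strain (the Poisson assumption follows the classic Keightley-style setup and is ours here), gives each a reflected-gamma effect with mean magnitude m, shape k, and probability q of being positive, sums them additively, and adds Gaussian measurement noise with a hypothetical SD `sigma_e`. `simulate_s_ma` and all parameter values are illustrative, not estimates from the study.

```python
import numpy as np


def simulate_s_ma(n_strains, mean_diploid_mutations, p0, m, k, q, sigma_e, rng):
    """Simulate net mutational effects s_MA for haploid MA strains.

    Assumptions (for illustration): the number of non-neutral mutations per
    strain is Poisson with mean U = (1 - p0) * mean_diploid_mutations / 2;
    each effect has a gamma-distributed magnitude with mean m and shape k and
    is positive with probability q; effects add; measurement noise is normal.
    """
    U = (1.0 - p0) * mean_diploid_mutations / 2.0
    n_mut = rng.poisson(U, size=n_strains)
    total = int(n_mut.sum())
    # Index of the strain that carries each simulated mutation.
    strain_idx = np.repeat(np.arange(n_strains), n_mut)
    magnitudes = rng.gamma(shape=k, scale=m / k, size=total)  # mean = k * (m / k) = m
    signs = np.where(rng.random(total) < q, 1.0, -1.0)
    # Sum individual effects within each strain (the multifold convolution).
    s_ma = np.bincount(strain_idx, weights=signs * magnitudes, minlength=n_strains)
    s_ma = s_ma + rng.normal(0.0, sigma_e, size=n_strains)
    return U, s_ma


rng = np.random.default_rng(1)
U, s = simulate_s_ma(70, mean_diploid_mutations=8.0, p0=0.5, m=0.01, k=0.5,
                     q=0.1, sigma_e=0.005, rng=rng)
print(f"U = {U}, median s_MA = {np.median(s):+.4f}")
```

A histogram of many such draws approximates the model's predicted distribution of s_MA for a given parameter set, which is the quantity that would be compared with observed strain effects.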
If the distributions of effects of “unidentified” mutations and of SNMs do not fundamentally differ, it should be possible to model their effects by releasing the constraint on the average number of mutations per strain (essentially the model proposed by Keightley [11], with no constraint on the value of U); the difference between the estimate of U in this model and the number of non-neutral mutations estimated by our SNM-only model would then provide an estimate of the typical number of unidentified mutations per strain. We fit this model to our data as the “single DME” model. The other 2 approaches for modeling unidentified mutational effects are rooted in the possibility that SNMs and unidentified mutations have distinct distributions of phenotypic effects, and that our phenotyping data are precise enough to distinguish these 2 distributions. In this case, the effects of SNMs are described as in the “SNM-only” model above, but the DME for unidentified mutations is modeled separately in one of 2 ways. First, the effects of these mutations can be modeled as a reflected gamma distribution with an unknown number of mutations (the “two-gamma” model). This is the same model described above for SNMs, except that the average number of unidentified mutations affecting growth rate is fitted by the model (unlike for SNMs), along with the proportion of positive versus negative mutational effects and the shape and mean of the gamma distribution.
However, we hypothesized that the parameter estimates from this model would not be very informative, because mutation number is confounded with mutation effect size and distribution shape when the total number of mutations is unknown. This problem is compounded because the effects modeled by this distribution represent an unknown portion of the total observed effects, and because the rate of non-neutral mutations must be high enough to be consistent with most strains differing in growth rate from the ancestral strain. Given the lack of information about the number of unidentified mutations in each MA strain, we can instead seek to understand the typical combined per-strain contribution of these mutations. To do so, we modeled the combined effects of unidentified mutations in each line as being drawn from a Gaussian distribution with mean μ_unid and standard deviation σ_unid (the “Gaussian” model). In this model, the σ_unid term fits variance not explained by experimental noise or by the distribution of mutational effects fit to SNMs. Although this model is not informative about the parameters of the distribution of single unidentified mutations, it provides useful information about the distribution of the cumulative effects of the unidentified mutations on the growth of each MA strain. In all 3 cases, the observed growth rate of each MA line is the result of the sum of the effects of its SNMs (whose average number per line is known), its unidentified mutations (whose number is unknown), and experimental noise. We initially fit all 3 models, as well as the “SNM-only” model that includes only the effects of sequenced substitution mutations, to the mutational effects estimated for each strain (see Materials and methods for the maximum-likelihood estimation procedure, S1 Table for the data used as input to the models, and S3 Table for model results).
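Under the “Gaussian” model, each strain's observed effect is the sum of 3 parts, which fixes how the model partitions variance: assuming the parts are independent, the SNM term contributes U·E[X] to the mean and U·E[X²] to the variance (a compound-Poisson result), and σ_unid² absorbs whatever variance remains after that term and measurement noise. The sketch assumes a Poisson SNM count with known mean and, for simplicity, a single noise SD shared by all strains (the text uses per-strain error variances); names and values are hypothetical.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Poisson draw via Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_reflected_gamma(shape, mean_abs, p_pos, rng):
    """Gamma-distributed magnitude with mean mean_abs; positive with probability p_pos."""
    magnitude = rng.gammavariate(shape, mean_abs / shape)
    return magnitude if rng.random() < p_pos else -magnitude


def simulate_gaussian_model_strain(u_snm, shape, mean_abs, p_pos,
                                   mu_unid, sigma_unid, sigma_noise, rng):
    """Observed effect of one MA strain under a 'Gaussian'-style model (sketch):
    reflected-gamma SNM effects + one Gaussian draw for all unidentified
    mutations + Gaussian measurement noise with a constant SD (a simplification)."""
    n_snm = sample_poisson(u_snm, rng)
    snm_total = sum(sample_reflected_gamma(shape, mean_abs, p_pos, rng) for _ in range(n_snm))
    unidentified_total = rng.gauss(mu_unid, sigma_unid)
    noise = rng.gauss(0.0, sigma_noise)
    return snm_total + unidentified_total + noise


def expected_moments(u_snm, shape, mean_abs, p_pos, mu_unid, sigma_unid, sigma_noise):
    """Mean and variance implied by the model: the compound-Poisson SNM term
    contributes u * E[X] and u * E[X^2]; the Gaussian terms add their own moments."""
    mean = u_snm * mean_abs * (2.0 * p_pos - 1.0) + mu_unid
    second_moment = mean_abs ** 2 * (1.0 + 1.0 / shape)  # E[X^2] for one effect
    variance = u_snm * second_moment + sigma_unid ** 2 + sigma_noise ** 2
    return mean, variance
```

Comparing `expected_moments(...)` with the sample mean and variance of many `simulate_gaussian_model_strain(...)` draws under the same hypothetical parameters is a quick way to confirm the decomposition.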
The Akaike information criterion (AIC) score was lowest for the “Gaussian” model, suggesting that this model fits the data best and that the fit of the “two-gamma” model was not improved enough to warrant its extra parameters. As expected, the mutational parameter estimates in the “two-gamma” model, which have large confidence intervals, could not be interpreted; this is likely the result of confounding among all parameter values when fitting a distribution of individual mutational effects with an unknown total mutation number. Importantly, the significantly improved fit of the “Gaussian” model over both the “SNM-only” and “single DME” models indicates that our phenotypic data were precise enough to identify distinct distributions of the effects of SNMs and unidentified mutations. The better fit of the “Gaussian” model relative to the “single DME” model in particular implies that the effect distributions of the 2 mutation classes are distinct. Although our approach of fitting the DME to summary statistics of the mutational effects observed in each strain is computationally efficient, it treats the uncertainties in different strains' mutational effect estimates as uncorrelated. In practice, these estimates depend on a number of shared parameters, such as the estimated means and standard deviations of growth rate of the reference strain and of petite yeast microcolonies. These dependencies mean the parameter space is likely more constrained than it appears when the model is fit to uncorrelated mutational effect estimates: for example, an overestimate of the mean growth rate of the reference strain would lead to a consistent overestimate of the magnitude of s across all slow-growing MA strains.
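The correlation problem can be illustrated by simulation: when every strain's s is computed against the same noisy estimate of the reference growth rate, the estimation errors of different strains move together. This sketch assumes a hypothetical definition s = r / r_ref − 1 and Gaussian estimation errors; the paper's exact definition of s is not restated in this section, and the role of the petite microcolony parameters is not modeled here.

```python
import math
import random
import statistics


def selection_coefficient(strain_rate, reference_rate):
    """Hypothetical definition: relative growth-rate effect s = r / r_ref - 1."""
    return strain_rate / reference_rate - 1.0


def estimate_errors(true_ref, true_rates, ref_se, strain_se, n_rep, rng):
    """In each replicate, draw ONE noisy reference estimate shared by all strains
    plus an independent noisy estimate per strain; return each strain's errors in s."""
    errors = [[] for _ in true_rates]
    for _ in range(n_rep):
        ref_hat = rng.gauss(true_ref, ref_se)  # shared across strains
        for i, rate in enumerate(true_rates):
            rate_hat = rng.gauss(rate, strain_se)  # strain-specific
            errors[i].append(selection_coefficient(rate_hat, ref_hat)
                             - selection_coefficient(rate, true_ref))
    return errors


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

For example, `errs = estimate_errors(1.0, [0.9, 0.8], 0.02, 0.01, 20_000, random.Random(3))` followed by `pearson(errs[0], errs[1])` gives a clearly positive correlation even though each strain's own growth-rate estimate is drawn independently; and with this definition, an overestimated reference rate makes s more negative for every strain slower than the reference.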
The correlated uncertainty in strain estimates should propagate to the estimates of DME parameters (in this example, likely leading to an overestimate of the mean effect size of a single mutation). Failing to account for the correlated structure of strain estimates can lead to incorrect estimates of the uncertainty in DME parameters and of the relative goodness of fit of different models. We therefore repeated the DME model fits using the microcolony growth rate data directly. We limited this analysis to the “SNM-only” model and the “Gaussian” model, the latter of which had provided the best fit to the summary statistic-based data. Parameter estimates and confidence intervals were very similar to those from the summary statistic-based fit, with slightly less uncertainty in the parameter estimates of the “SNM-only” model when using the microcolony data directly (Tables 1 and S3). Consistent with our previous finding, the “Gaussian” model, which models SNMs and unidentified mutations as having 2 independent DMEs, provided the best fit to the data (ΔAIC = −10.8, LRT-based p = 0.00002 compared with the “SNM-only” model) (Table 1 and Figs 2A and S3).
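For readers reproducing model comparisons, the generic formulas are AIC = 2k − 2 ln L and, for nested models, the likelihood-ratio statistic 2(ln L_alt − ln L_null), compared against a chi-square distribution whose degrees of freedom equal the number of extra parameters. A minimal standard-library sketch follows; the log-likelihoods in any example call are hypothetical, and the per-model parameter counts used in the paper are not restated in this section.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, integer df >= 1,
    using closed forms (Abramowitz & Stegun 26.4.4 for odd df, Poisson sum for even df)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))  # 2 * (1 - Phi(chi))
    term, total = chi, 0.0  # term = chi^(2r-1) / (1 * 3 * ... * (2r-1))
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * total


def likelihood_ratio_test(loglik_null, loglik_alt, extra_params):
    """LRT for nested models; returns (statistic, chi-square p-value)."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, extra_params)
```

When an added parameter such as a standard deviation can sit on the boundary of its range (for example, σ_unid = 0), the plain chi-square reference is only approximate, so a p-value from this helper is best read as a guide.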

Fig 2. The distributions of mutational effects estimated by a model with independent distributions for sequenced and unidentified mutations. (A) The cumulative distribution function of the fit of the “Gaussian” model (which models SNM effects as a reflected gamma distribution and the sum of all unidentified mutations in a strain as a normal distribution) to all individual MA strain mutational effects s. Inset: histogram of mutational effects, with the PDF of the model overlaid. To account for the effect of experimental noise on the estimates of s, the model density function is shown convolved with a Gaussian noise kernel whose variance is the mean of the error variances of the strains' mutational effect estimates. (B) The distributions corresponding to the maximum likelihood estimates of individual effects of SNMs (pink line) and combined effects of unidentified mutations per strain (orange line), plotted over the distribution of MA strain mutational effects. The data and code needed to generate this figure can be found on OSF: https://doi.org/10.17605/OSF.IO/H4J9F. MA, mutation-accumulation; PDF, probability density function; SNM, single-nucleotide mutation. https://doi.org/10.1371/journal.pbio.3002698.g002
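The noise-convolved curve in the Fig 2A inset can be sketched numerically: tabulate the model density on a uniform grid, set the kernel variance to the mean of the per-strain error variances, and convolve. The Riemann-sum approach and all names below are illustrative assumptions, not the authors' plotting code (which is available on OSF).

```python
import math
import statistics


def normal_pdf(x, sd):
    """Zero-mean normal density."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def convolve_with_gaussian(grid, density, noise_var):
    """Riemann-sum convolution of a density tabulated on a uniform grid with a
    zero-mean Gaussian kernel of variance noise_var."""
    dx = grid[1] - grid[0]
    sd = math.sqrt(noise_var)
    smoothed = []
    for x in grid:
        total = sum(f * normal_pdf(x - y, sd) for y, f in zip(grid, density))
        smoothed.append(total * dx)
    return smoothed


# Hypothetical per-strain error variances; the kernel uses their mean.
error_variances = [0.0002, 0.0004]
noise_var = statistics.fmean(error_variances)
```

A convenient sanity check: convolving a normal density with standard deviation 0.02 and a kernel of variance 0.0003 should return, up to discretization error, a normal density with variance 0.0004 + 0.0003.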

Table 1. Properties of DMEs identified by alternative models on full data. A model that accounts for unidentified mutations by fitting a Gaussian distribution representing the effects of these mutations across strains performs better than a model that only accounts for SNMs. Parameter values for each model are shown with 95% confidence intervals; ΔAIC is calculated relative to the “SNMs only” model. https://doi.org/10.1371/journal.pbio.3002698.t001

We find that the vast majority of non-neutral SNMs are deleterious. We further find that the inferred distribution of SNM effects is highly skewed towards mutations with an effect size approaching 0 (Fig 2B). As a result, there is large uncertainty regarding the proportion of SNMs that are completely neutral with respect to growth rate; however, at a selection coefficient cutoff of 10⁻⁶ (larger than the reciprocal of the effective population size of wild yeast populations, which has been estimated to be on the order of 3.4 × 10⁶ [47]), our best-fit model indicates that 3% of all substitutions have a significant positive effect on growth rate and 39% have a significant negative effect. Our model estimates that the mean effect of unidentified mutations across the MA lines is likely to be moderately deleterious, and that the typical combined effect of all the unidentified mutations in an MA line is comparable to the effect of a single SNM.
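A proportion of this kind can be obtained by evaluating the tails of the fitted DME beyond ±10⁻⁶. A minimal sketch of that tail calculation for a reflected gamma uses the regularized incomplete gamma function; any parameter values passed in are hypothetical (the fitted values are in Table 1, not reproduced here), and representing exactly neutral mutations through a `frac_nonneutral` weight is an assumption of this sketch.

```python
import math


def regularized_lower_gamma(a, x, tol=1e-15, max_iter=100_000):
    """P(a, x) by its power series; accurate for the small x used here
    (for large x a continued-fraction expansion is preferable)."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_iter):
        term *= x / (a + n)
        total += term
        if term < total * tol:
            break
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total


def fractions_beyond_cutoff(cutoff, shape, mean_abs, p_pos, frac_nonneutral=1.0):
    """Fractions of all mutations with s > cutoff and s < -cutoff under a reflected
    gamma DME; frac_nonneutral is the (assumed) share of mutations that are non-neutral."""
    scale = mean_abs / shape
    tail = 1.0 - regularized_lower_gamma(shape, cutoff / scale)
    return frac_nonneutral * p_pos * tail, frac_nonneutral * (1.0 - p_pos) * tail


# The cutoff exceeds 1/Ne for Ne on the order of 3.4e6: 1 / 3.4e6 is about 2.9e-7.
print(1 / 3.4e6 < 1e-6)  # True
```

With a small gamma shape parameter, a large share of the mass lies close to 0, so the estimated fractions can shift substantially with the choice of cutoff; this is the practical consequence of the skew described above.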

[END]
---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002698

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
