(C) PLOS One
This story was originally published by PLOS One and is unaltered.
The clock-like accumulation of germline and somatic mutations can arise from the interplay of DNA damage and repair [1]
Authors: Natanael Spisak, Marc de Manuel, William Milligan, Guy Sella, Molly Przeworski
Affiliations: Department of Biological Sciences, Columbia University, New York, United States of America; Program for Mathematical Genomics
Date: 2024-06
In reality, more than one of these phenomena is likely operating at once, and it is therefore important, in disentangling the origins of clock-like mutations, to distinguish those that accrue with age from those that contribute to the intercept, and may or may not be clock-like. Concretely, in data sets for which there is a significant positive intercept, the goal is to tease apart signatures that contribute at a constant level in all samples from signatures for which mutations increase in number with age. To this end, we extend the standard signature decomposition method to allow for a mixture of 2 components, a constant and an age-dependent one, and attribute mutational signatures to the two jointly.
Conversely, an increase in the rate of cell division or the rate of damage at $x > x_0$ (such as occurs in the lung epithelia of regular smokers [21]) will lead to an increased slope in the number of mutations with age. If the mutation rate were significantly lower at earlier stages (say, before the person started smoking), i.e., if $\mu_0 < \mu$, then a regression of the total number of mutations on age will yield a negative intercept.
A nonzero intercept can also be generated by a clock-like process occurring at a varying pace over development. We discuss model expectations in this case with a toy piecewise linear model, in which mutations accumulate at 2 different rates over 2 time periods (S1 Fig): initially, the number of mutations grows at rate $\mu_0$, and after some time $x_0$, the underlying parameters of mutagenesis (Eq 1) change and mutations accumulate at rate $\mu \neq \mu_0$. For data points collected after time $x_0$, the expected number of mutations is then

$$m(x) = \mu_0 x_0 + \mu (x - x_0) = a x + b, \qquad (3)$$

where $a = \mu$ and $b = (\mu_0 - \mu) x_0$. Hence, the intercept is positive if $\mu_0 > \mu$, as would be the case, for example, if cells entered a postmitotic state at $x_0$.
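A minimal numerical check of this toy model, offered as a sketch: the Python below (NumPy assumed available; all parameter values are hypothetical) generates noise-free counts from the two-rate model and regresses them on age for samples collected after $x_0$. The fitted slope should equal μ and the intercept should equal (μ0 − μ)x0, which is positive when μ0 > μ and negative in the opposite case.

```python
import numpy as np

def piecewise_mutations(ages, mu0, mu, x0):
    """Expected mutation count under the toy two-rate model:
    rate mu0 up to age x0, rate mu afterwards."""
    ages = np.asarray(ages, dtype=float)
    return np.where(ages <= x0, mu0 * ages, mu0 * x0 + mu * (ages - x0))

def regress_on_age(ages, counts):
    """Ordinary least-squares fit counts = a * age + b; returns (a, b)."""
    a, b = np.polyfit(ages, counts, 1)
    return a, b

# Hypothetical parameters (mutations per year; ages in years), not fitted to any data.
ages = np.linspace(20, 80, 13)   # every sample is collected after x0

# Faster early accumulation (mu0 > mu): positive intercept.
a_pos, b_pos = regress_on_age(ages, piecewise_mutations(ages, mu0=60.0, mu=15.0, x0=1.0))

# Slower early accumulation (mu0 < mu), e.g. before an exposure starts: negative intercept.
a_neg, b_neg = regress_on_age(ages, piecewise_mutations(ages, mu0=5.0, mu=20.0, x0=10.0))

print(f"mu0 > mu: a = {a_pos:.2f}, b = {b_pos:.2f}")   # → mu0 > mu: a = 15.00, b = 45.00
print(f"mu0 < mu: a = {a_neg:.2f}, b = {b_neg:.2f}")   # → mu0 < mu: a = 20.00, b = -150.00
```

With noisy counts the same regression would only recover these values approximately; the noise-free case isolates the purely geometric origin of the intercept.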
In any given cell, a combination of clock-like and non-clock-like mutagenic processes contributes to the mutation rate. Thus, although our model focuses on the conditions under which a clock-like accumulation of mutations will arise, in practice, the total number of mutations of any signature observed in a cell will likely not be strictly proportional to time (Eqs 1 and 2). Instead, there may be a non-negligible contribution of mutations that do not increase with age (i.e., that are “age-independent”). These mutations can have several sources: for example, they may have accrued during the first few cell divisions of development [39], during the last stages of cell differentiation [21], or in response to an acute exposure to exogenous mutagens [40]. If these mutations occur in a burst, i.e., over a short period of time, then they will contribute a set number of mutations regardless of age, and a regression of the number of mutations against age can yield a positive intercept.
Some CpG transitions could arise from DNA replication errors, if methylated cytosines are a difficult template for DNA polymerases [37,38]. On average, replication errors contribute wlP transitions per cell division, where w denotes the probability of a misincorporation per base pair, and P the probability that such a misincorporation is left unrepaired by mismatch repair and leads to a mutation.
Under an alternative hypothesis that is not mutually exclusive (Fig 2B), CpG transitions result from the deamination of methylated cytosines in single-stranded DNA immediately prior to replication. Mismatches arising at this point cannot be repaired. The number of mutations that arise from such deamination events is proportional to the expected number of cell divisions, ϕx, and to the time of transient single-strandedness, Δt. The rate of deamination of methylated cytosines in single-stranded DNA is estimated to be orders of magnitude higher than in double-stranded DNA ($u_s = 3.5 \times 10^{-3}$ per year [33]), making this scenario a plausible explanation for the observed SBS1 mutation rates: to account for 1 mutation in 10 divisions, the time of transient single-strandedness would need to be of order Δt ∼ 1 min. This order of magnitude can be compared with the typical velocity of replication forks in human cells, of order 1 kb per minute [36].
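The order-of-magnitude argument can be checked with a few lines of arithmetic. In this sketch, the number of mutations per division is taken to be roughly $u_s l \Delta t$, O(1) factors are ignored, and the number of methylated cytosines, l = 2×10^7, is an assumed round figure rather than a value given in the text.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60

def required_single_strand_minutes(mutations_per_division, u_s_per_year, n_methylated_c):
    """Single-stranded time Delta t (minutes) such that u_s * l * Delta t equals
    the target number of mutations per division. O(1) factors (for example, which
    daughter cell inherits the mutation) are ignored in this order-of-magnitude check."""
    dt_years = mutations_per_division / (u_s_per_year * n_methylated_c)
    return dt_years * MINUTES_PER_YEAR

u_s = 3.5e-3    # deamination rate in single-stranded DNA, per year (value quoted in the text)
l = 2e7         # ASSUMED number of methylated cytosines; illustrative round number only
target = 0.1    # 1 mutation in 10 divisions, as in the text

dt_min = required_single_strand_minutes(target, u_s, l)
fork_kb_per_min = 1.0   # order-of-magnitude fork velocity quoted in the text
print(f"Delta t ~ {dt_min:.2f} min, during which a fork moves ~{dt_min * fork_kb_per_min:.2f} kb")
# → Delta t ~ 0.75 min, during which a fork moves ~0.75 kb
```

Changing the assumed l by a factor of 2 in either direction still gives Δt of order 1 min, which is the point of the estimate.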
As in the more general model, at early times ($t < r^{-1}$), the number of mismatches grows linearly at rate $u_d l$, eventually reaching a steady state in which the expected number of mismatches equals $u_d l / r$, where l stands for the total number of methylated cytosines. Given the high estimated rate of spontaneous deamination and the relatively low observed rates of SBS1 mutations [13], repair must be efficient and have time to act before cell division. The implication is that the repair rate is higher than the cell division rate, r≫ϕ, and that the cell divides after the number of mismatches has reached a steady state. At DNA replication, unrepaired mismatches lead to mutations in one of the 2 daughter cells. Therefore, the number of mutations m at age x arising from unrepaired mismatches depends on the cell division rate ϕ. In contrast, the number of mutations due to defective repair follows absolute time, and such mutations accumulate regardless of cell divisions.
(A) Model of the consequences of spontaneous deamination of methylated cytosines during the entire cell cycle [23]. Methylated cytosines in double-stranded DNA deaminate at a constant rate $u_d$ per base pair, and the resulting mismatches are repaired at rate r. With probability ϵ, the mismatch resolution is incorrect and leads to a mutation. We assume that the cell divides immediately after DNA replication and treat the 2 processes as simultaneous (occurring at rate ϕ). Mismatches unresolved at cell division lead to a mutation in one of the 2 daughter cells. We provide the prediction of the model for the number of mutations m at a given age x: inefficiently repaired mismatches accumulate with cell divisions, which occur at rate ϕ, whereas repair errors accumulate with absolute time, independent of cell divisions. The number of methylated cytosines is denoted by l. (B) An alternative source of CpG transitions is the deamination of methylated cytosines in single-stranded DNA (at rate $u_s$) during DNA replication. The model assumes that a deamination immediately preceding the polymerization of the second strand is not repaired and leads to a mutation in one of the daughter cells. The expected number of mutations is proportional to the rate of cell divisions and to the time of transient single-strandedness, Δt.
The process of spontaneous cytosine deamination contributes substantially to the mutation rate [31]. The standard explanation is that because deamination of a methylated cytosine produces thymine, one of the canonical bases, the efficiency of repair is low [32]. To investigate the dynamics underlying the accumulation of CpG transitions, we first consider the consequences of methylated cytosine deamination during the cell cycle. Given that deamination leads directly to a mismatch, we can employ a simpler model, originally introduced in [23]. The dynamics are analogous to those of the general model and can be recovered in the q→0 limit (see Methods, Eq 12).
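Combining the contributions of translesion synthesis, unresolved mismatches, replication errors, and repair errors, the model predicts that the number of mutations at age x is

$$m(x) = \left[\phi l \left(\frac{uR}{2r} + \frac{u\epsilon}{2q} + wP\right) + u l \epsilon (1 - p)\right] x. \qquad (1)$$

The 2 terms in this equation correspond to 2 kinds of clock-like behavior. The first depends on the cell division rate ϕ and includes damage-induced mutations as well as DNA replication errors; assuming, as we do, that cell division rates are fixed, these mutations accrue with age at a constant rate. The second type of mutation is driven by errors of DNA repair in response to damage; assuming damage rates are constant, it also accrues at a constant rate, but tracks absolute time rather than the number of cell divisions. A distinguishing feature of the 2 types of clock-like mutations is therefore how they behave as a function of cellular turnover rates; in particular, postmitotic cells should show no increase in mutations of the first type.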
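To make the distinction concrete, here is a hedged Python sketch that evaluates the four rates named in the model (ulRϕ/2r for translesion synthesis, ulϵϕ/2q for unresolved mismatches, wlPϕ for replication errors, and ulϵ(1−p) for repair errors) and groups them into a division-dependent and a time-dependent term. The class and function names and all parameter values are hypothetical, chosen only for illustration; setting ϕ = 0 mimics a postmitotic cell.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DamageRepairParams:
    u: float      # lesion rate per base pair per year
    l: float      # genome length (bp)
    r: float      # lesion repair rate (per year)
    eps: float    # probability that repair synthesis creates a mismatch
    q: float      # mismatch resolution rate (per year)
    p: float      # probability that a mismatch is resolved correctly
    R: float      # probability that translesion synthesis inserts a wrong base
    w: float      # replicative misincorporation probability per bp
    P: float      # probability that a replication error escapes mismatch repair
    phi: float    # cell divisions per year

def mutation_rates(k: DamageRepairParams):
    """Split the expected yearly mutation rate into the two clock-like terms."""
    translesion = k.u * k.l * k.R * k.phi / (2 * k.r)
    unresolved_mismatches = k.u * k.l * k.eps * k.phi / (2 * k.q)
    replication_errors = k.w * k.l * k.P * k.phi
    per_division = translesion + unresolved_mismatches + replication_errors
    repair_errors = k.u * k.l * k.eps * (1 - k.p)   # independent of phi
    return per_division, repair_errors

def expected_mutations(k: DamageRepairParams, age: float) -> float:
    """Expected mutation count at a given age: both terms grow linearly with age."""
    per_division, repair_errors = mutation_rates(k)
    return (per_division + repair_errors) * age

# Hypothetical parameter values, chosen only to make the two terms comparable.
dividing = DamageRepairParams(u=1e-6, l=6e9, r=1e3, eps=1e-3, q=1e3, p=0.5,
                              R=0.1, w=1e-10, P=1e-2, phi=20.0)
postmitotic = replace(dividing, phi=0.0)

for name, k in [("dividing", dividing), ("postmitotic", postmitotic)]:
    div_term, time_term = mutation_rates(k)
    print(f"{name}: {div_term:.2f}/yr via divisions, {time_term:.2f}/yr via repair errors")
```

Doubling ϕ doubles only the first term, while the repair-error term is identical for dividing and postmitotic cells, which is the contrast the text proposes to test across cell types.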
Lastly, mismatches also arise during DNA replication due to the misincorporation of nucleotides by replicative polymerases. We denote the probability of a misincorporation per base pair by w and the probability that such a misincorporation leads to a mutation by P. Importantly, replicating DNA carries transient features that distinguish the newly synthesized strand from its template [30] and help mismatch repair substantially decrease the number of replication errors that become mutations, i.e., P≪1. In sum, the number of mutations due to replication errors increases with age at a rate of wlPϕ.
Mismatches unresolved during the cell cycle cause a mutation in one of the 2 daughter cells. Given our assumption that the rate of mismatch resolution far exceeds the rate of cell division (i.e., q≫ϕ), these mutations track cell division and accumulate at a rate of ulϵϕ/(2q). In contrast, mutations caused by repair errors are independent of cell divisions and accumulate with absolute time, at a rate of ulϵ(1−p).
The next part of the model describes the mutational processes that occur during replication (Fig 1B). For simplicity, we assume that the cell divides immediately after DNA replication and treat the 2 processes as simultaneous, occurring at a fixed rate ϕ. An unrepaired lesion stalls DNA replication and triggers the recruitment of polymerases that can replicate over the lesion, through translesion synthesis [28]. This synthesis leads to the incorporation of the incorrect nucleotide opposite the lesion on the template strand with probability R, which depends on the type of lesion and possibly on replication timing [29]. If instead the correct nucleotide is incorporated (with probability 1−R), then, given our assumption that repair is rapid relative to the rate of cell division (i.e., that r≫ϕ), the lesion is likely to be repaired during the next cell cycle. If the translesion polymerase incorporates an incorrect nucleotide, however, the repair process will propagate the error to the complementary strand, generating a mutation. Overall, erroneous translesion synthesis introduces mutations at a rate of ulRϕ/(2r).
From independent lines of evidence, we know that the vast majority of DNA damage in healthy cells does not lead to mutations: the numbers of mutations in healthy cells are substantially lower than estimated damage rates would suggest [24], and mutation rates in individuals with DNA repair deficiencies are orders of magnitude higher [25]. These observations imply that in healthy cells, the repair rate is on the order of the total damage rate, i.e., that r∼ul. Therefore, the steady state between damage and repair is likely established long before the next cell division. For simplicity, we further assume that the rate of mismatch resolution is of the order of the repair rate, q∼r. We note that the dynamics of cancer cells may differ from those of the healthy cells on which we focus here; in particular, the repair machinery may be debilitated or overwhelmed, and lesions may persist over multiple cell divisions, a phenomenon termed “lesion segregation” [26,27].
Based on these assumptions, we derive the expected numbers of lesions, mismatches, and mutations and their variances as a function of the time t since the last round of DNA replication, in a genome of length l. The full analysis of the model is presented in Methods; here, we describe the main findings. Over short time periods ($t < r^{-1}$), during which repair has had little chance to occur, the number of lesions grows at rate ul and the number of mismatches at rate ulϵ. Over longer times ($t \gg r^{-1}$ and $t \gg q^{-1}$), the expected numbers of lesions and mismatches reach a steady state, at which they are approximately equal to ul/r and ulϵ/q, respectively. In turn, the number of mutations due to incorrect repair increases at rate ulϵ(1−p).
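As a sanity check on the steady states, the sketch below integrates a deterministic mean-field version of the cell-cycle dynamics with forward Euler. The equations dL/dt = ul − rL and dM/dt = ϵrL − qM are our reading of the Fig 1A scheme, not the stochastic analysis in Methods, and the rates (in arbitrary time units) are hypothetical. Run well past 1/r and 1/q, lesions should settle at ul/r, mismatches at ulϵ/q, and repair-error mutations should accrue at ulϵ(1−p) per unit time.

```python
def integrate_lesions_mismatches(u, l, r, eps, q, p, t_end, dt=1e-3):
    """Forward-Euler integration of an assumed mean-field reading of the cell-cycle model:
        dL/dt = u*l - r*L          lesions appear and are repaired
        dM/dt = eps*r*L - q*M      erroneous repair creates mismatches, which are resolved
    Mutations from incorrect resolution accrue at rate (1 - p)*q*M.
    Returns lesions, mismatches and accumulated repair-error mutations at t_end."""
    L = M = mutations = 0.0
    for _ in range(int(round(t_end / dt))):
        dL = u * l - r * L
        dM = eps * r * L - q * M
        mutations += (1 - p) * q * M * dt
        L += dL * dt
        M += dM * dt
    return L, M, mutations

# Hypothetical rates in arbitrary time units; t_end is much longer than 1/r and 1/q.
u, l, r, eps, q, p = 2e-7, 5e9, 50.0, 0.01, 20.0, 0.5
L, M, _ = integrate_lesions_mismatches(u, l, r, eps, q, p, t_end=2.0)
print(f"lesions    {L:.4f} vs ul/r     = {u * l / r:.4f}")
print(f"mismatches {M:.4f} vs ul*eps/q = {u * l * eps / q:.4f}")
```

Forward Euler has the same fixed point as the ODE, so the steady-state values are reproduced once transients decay; the step size only needs r·dt < 1 and q·dt < 1 for monotone, stable convergence.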
(A) Interplay of DNA damage and repair during the cell cycle. DNA damage leads to lesions at rate u per base pair. Lesions are repaired at rate r and lead to mismatches with probability ϵ, due to the misincorporation of nucleotides by the DNA polymerase used in repair. Mismatches are resolved at rate q, resulting in an incorrect base pair, and thus a mutation, with probability 1−p. (B) Consequences of DNA replication. Replicating DNA over a lesion requires translesion synthesis. This process is not always accurate: it causes an error and a mutation in one of the 2 daughter cells with probability R (assuming that the lesion is repaired in the next cell cycle, i.e., that r≫ϕ). Unresolved mismatches cause a mutation in one of the 2 daughter cells. (C) The predicted number of mutations, m, at age x contributed by the different mechanisms, in a genome of length l with cell division rate ϕ.
We study the interplay between DNA replication, damage, and repair by extending the model introduced in [23]. We first consider the mutational processes that occur between cell divisions (Fig 1A). A given source of damage causes lesions at a rate u per base pair, which are detected and repaired at a rate r. Repair leads to mismatches with probability ϵ, due to the misincorporation of a nucleotide. We assume that once repair is complete, there are no mechanisms that differentiate the newly synthesized strand from the strand used as a template by the DNA repair polymerase. We denote the rate of mismatch resolution by q and the probability of correct resolution by p. The outcome of mismatch resolution can vary depending on the type of the mismatch and the local sequence context. We discuss the special case of T:G mismatches in CpG contexts in more detail below.
Clock-like signatures across cell types
In order to gain insight into the origins of mutations that accumulate with age, we analyze patterns of mutation accumulation across cell types with different characteristics. To this end, we consider data sets that provide single-cell resolution mutation data, collected using a variety of experimental approaches (see Methods), including mutations in neurons and muscle cells [3], liver hepatocytes [41], lung epithelium [21], small bowel epithelium [42], colonic epithelium, and testis seminiferous tubules [2], as well as germline mutations identified from blood samples of pedigrees [43,44]. We rely on mutation data from donors without a disease diagnosis and on lung samples from non-smokers (see Methods).
For each cell or tissue type, we attribute mutational signatures by relying on the COSMIC database of signatures inferred from a large collection of cancer samples [8], as also done previously to describe mutational landscapes in noncancerous soma (e.g., in ref. [41]) and germline mutations (e.g., in [12]). Most of these signatures have been linked to specific mechanisms or are associated with exposures to mutagens, and they therefore provide a useful basis for analyzing and comparing mutation accumulation across tissues and cell types. Nonetheless, because the signatures were originally inferred from tumor samples, they may not fully capture the mutational processes acting in normal cell types, particularly in the male and female germ cells. This limitation could lead to a poorer fit, as well as to incorrect assignments of mutations to signatures to which they do not, in fact, belong. Here, we focus on clock-like signatures and choose a method of signature attribution that limits erroneous assignments of mutations to SBS1 and SBS5 and thus avoids overestimating their contributions (see Methods for details).
To compare observations with our model predictions, we develop an approach that focuses on the subset of mutations in a given cell type that accumulate with age. This is done in 2 steps. First, we fit a linear model for all mutations jointly, i.e., y = ax + b, where y denotes the number of mutations per genome and x denotes age (see Fig 3A for mutations in neurons). Second, we model the distribution over the 96 substitution types as a mixture of 2 components: the constant component (Fig 3B, yellow) contributes on average the same number of mutations in samples of all ages, whereas the age-dependent component (Fig 3B, blue) contributes an increasing number of mutations with age. We decompose the slope and the intercept into COSMIC signatures jointly, such that the number of mutations $y_s$ attributed to a given signature s depends on age as

$$y_s = a P_a(s)\, x + b P_b(s), \qquad (4)$$

where $P_a(s)$ and $P_b(s)$ denote the loadings of signature s in the age-dependent and constant components, respectively. We estimate the loadings by extending the standard methods of signature attribution (see Methods); see Fig 3C and 3D for the loadings estimated for mutations in neurons. We note that this method can only be applied if the intercept b is large enough (i.e., if the data contain enough age-independent mutations to attribute mutations to signatures in the constant component).
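The two-step procedure can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not the authors' implementation (see Methods for that): it uses 3 made-up signatures over 6 channels instead of the 96-channel COSMIC catalog, noise-free counts, per-channel linear fits to obtain slope and intercept spectra, and Lee-Seung multiplicative updates as a generic nonnegative least-squares solver. Eq 4 is read here as $y_s = a P_a(s) x + b P_b(s)$; all function names are hypothetical.

```python
import numpy as np

def nnls_multiplicative(S, v, n_iter=5000):
    """Nonnegative least squares, min ||v - S h|| with h >= 0, via Lee-Seung
    multiplicative updates (requires nonnegative S and v)."""
    StS, Stv = S.T @ S, S.T @ v
    h = np.full(S.shape[1], v.sum() / S.shape[1])
    for _ in range(n_iter):
        h *= Stv / np.maximum(StS @ h, 1e-12)
    return h

def two_component_loadings(ages, spectra, S):
    """ages: (n_samples,); spectra: (n_samples, n_channels) mutation counts;
    S: (n_channels, n_signatures) with columns summing to 1.
    Returns slope a, intercept b, and loadings P_a, P_b."""
    a, b = np.polyfit(ages, spectra.sum(axis=1), 1)            # step 1: y = a x + b
    slope_spec, intercept_spec = np.polyfit(ages, spectra, 1)   # per-channel slopes, intercepts
    h_a = nnls_multiplicative(S, np.clip(slope_spec, 0.0, None))
    h_b = nnls_multiplicative(S, np.clip(intercept_spec, 0.0, None))
    return a, b, h_a / h_a.sum(), h_b / h_b.sum()

def signature_counts(age, a, b, P_a, P_b):
    """Expected mutations per signature at a given age, y_s = a P_a(s) x + b P_b(s)."""
    return P_a * a * age + P_b * b

# Toy example: 3 made-up "signatures" over 6 channels.
S = np.array([[0.5, 0.0, 0.1],
              [0.3, 0.1, 0.1],
              [0.1, 0.5, 0.1],
              [0.1, 0.3, 0.1],
              [0.0, 0.1, 0.3],
              [0.0, 0.0, 0.3]])
P_a_true, P_b_true = np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.3, 0.5])
ages = np.linspace(10, 80, 15)
spectra = np.outer(ages, 20.0 * S @ P_a_true) + 50.0 * S @ P_b_true   # noise-free counts

a, b, P_a, P_b = two_component_loadings(ages, spectra, S)
print(np.round([a, b], 3), np.round(P_a, 3), np.round(P_b, 3))
print(signature_counts(60.0, a, b, P_a, P_b))
```

With real data, per-channel intercepts can be noisy or negative; the sketch simply clips them at zero, which is only one of several possible choices.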
Fig 3. Clock-like mutational signatures in different cell types. (A) Age-dependent signature attribution of mutations in neurons. Shown is the increase in the number of mutations with age (reported per haploid genome); each point corresponds to a single donor. (B) The relative contributions of different signatures vary with age. The decomposition of the mutation spectrum into age-dependent (blue, C) and constant signature distributions (yellow, D). Mutational signatures are indicated by their COSMIC label; asterisks indicate unattributed signatures (see Methods). (E–L) The number of mutations assigned to clock-like signatures, SBS1 (red) and SBS5 (turquoise) in: (E) neurons, (F) maternal germline mutations, (G) paternal germline mutations, (H) smooth muscle from bladder, (I) liver hepatocytes, (J) lung epithelium, (K) small bowel epithelium, (L) colon epithelium (*for this data set, in which the intercept is negative, we assume both signatures increase with age). Throughout (E–L), shaded areas represent 95% confidence intervals, estimated by bootstrapping (see Methods). Underlying data for this figure can be found in S2 Data.
https://doi.org/10.1371/journal.pbio.3002678.g003
In all cell types except colonic epithelium, we find significant positive intercepts in the regression of the number of mutations on age, consistent with, for example, a burst of mutations in early development. In these cases, we decompose mutations into the age-dependent distribution $P_a$ and the constant component distribution $P_b$. In the case of colonic epithelium (Fig 3L), the intercept is significantly negative, possibly because the mutation rate during ontogenesis is lower than in adult life (S1B Fig), when the exposure to damage and the cell division rate are higher [45]. To proceed with our analysis in this case, we assume (rather than infer, as for other cell types) that all signatures increase at constant rates with age.
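Whether an intercept differs significantly from zero can be assessed, for instance, with a donor-level bootstrap, in the spirit of the bootstrapped confidence intervals shown in Fig 3. The sketch below is a generic percentile bootstrap written with NumPy; the resampling scheme, number of replicates, and simulated donor data are assumptions for illustration, not the procedure described in Methods.

```python
import numpy as np

def bootstrap_line_ci(ages, counts, n_boot=2000, level=0.95, seed=0):
    """Fit counts = a * age + b and return the point estimate (a, b) with
    percentile-bootstrap intervals obtained by resampling donors with replacement."""
    ages = np.asarray(ages, dtype=float)
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(ages)
    boot = np.empty((n_boot, 2))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)            # resample donors
        boot[i] = np.polyfit(ages[idx], counts[idx], 1)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot, [alpha, 1.0 - alpha], axis=0)
    point = np.polyfit(ages, counts, 1)
    return point, lower, upper

# Simulated donors (hypothetical numbers): 15 mutations/year plus a constant 100.
data_rng = np.random.default_rng(1)
ages = np.linspace(20, 80, 30)
counts = 15.0 * ages + 100.0 + data_rng.normal(0.0, 5.0, size=ages.size)

point, lower, upper = bootstrap_line_ci(ages, counts)
print(f"intercept b = {point[1]:.1f}, 95% CI [{lower[1]:.1f}, {upper[1]:.1f}]")
print("intercept significantly positive:", lower[1] > 0)
```

Because the intercept is an extrapolation to age 0, its interval is much wider than that of the slope when donors are mostly adults, which is worth keeping in mind when the intercept is small.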
Our analysis reveals multiple signatures that increase with age (see S2 Fig), including the 2 that were previously reported [13], SBS1 and SBS5, and 2 additional signatures that are common across the cell types considered, SBS12 and SBS16. SBS16 may not be independent of SBS5 [8,10]. In turn, SBS12 contributes up to 7% to 10% of mutations in neurons and the female germline, approximately 7% of mutations in lung and liver, approximately 3% in small bowel and colon, and approximately 1% in the paternal germline, but is not found at detectable levels in muscle. Both SBS12 and SBS16 are of unknown etiology and are dominated by T to C/A to G transitions.
In examining the mutation accumulation across cell types that vary in their division rates, we focus on SBS1 and SBS5, the 2 ubiquitous clock-like signatures, and use our decomposition method to examine possible sources for their age dependencies. If driven by cell division, we predict that the rate at which they will accumulate should vary substantially with cellular turnover rates. In contrast, if driven by damage rates, the rate should be much less sensitive to turnover rates, but may vary among tissues owing to differences in endogenous and exogenous damage rates.
In this regard, the accumulation of mutations with age observed in neurons is particularly informative, given that neurons are fully postmitotic cells. Despite the lack of cell divisions, mutations accumulate at rates similar to those in actively dividing lineages [3]. Our decomposition shows that the signatures whose mutation numbers increase with age are distinct from those that do not (Fig 3C and 3D). Notably, the increase with age is predominantly driven by mutations assigned to signature SBS5, with secondary contributions from SBS16 and SBS12. Strikingly, there is no discernible contribution of SBS1, as we discuss in more detail below. In turn, mutations in the constant component are attributed primarily to signatures SBS5 and SBS1, as well as signature SBS89, which is of unknown etiology but has been reported to be active in the first decade of life [45].
Mutations assigned to the clock-like signature SBS5 are found across cell types and increase significantly with age in every one (Fig 3E–3L). Moreover, SBS5 is the prevalent mutation signature in all cell types, except for small bowel and colon, for which more mutations are attributed to SBS1. That SBS5 is the dominant signature in postmitotic cells such as neurons, as well as in maternal mutations, most of which arose in oocytes, indicates that such mutations can arise independently of DNA replication cycles and points to errors in DNA repair, which accumulate with damage rates (Eq 1). Similarly, SBS12 and SBS16 contribute to both neurons and female germline mutations as well as to mutations in rapidly dividing cells (S2 Fig), suggesting that the age dependencies of these signatures are not driven by DNA replication cycles either.
When not arising from replication errors, our model predicts that the number of mutations will be clock-like only if the damage rate u is constant. If we assume that probabilities ϵ and p are fixed, as seems sensible if they are primarily determined by inherent properties of DNA repair (e.g., the error rate of a polymerase), then the variation in the rate of SBS5 mutation across cell types reflects differences in rates of endogenous and exogenous damage. Consistent with this notion, the rate of SBS5 mutation is highest for epithelia in the colon and lung (Fig 3), which plausibly experience high rates of damage, and lowest for mutations assigned to the maternal genome, potentially reflecting the fact that oocytes are particularly well protected [46,47]. This model also helps to explain the observation that increasing the damage rate by exogenous factors, such as long-term exposure to tobacco, significantly increases SBS5 mutation rate in lung cells [21,48].
In that light, it may seem puzzling that in such different cell types, which presumably experience distinct sources of damage, SBS5 consistently accounts for a large fraction of mutations. As an explanation, we propose that SBS5 reflects errors in DNA synthesis during repair, a critical step in many repair pathways (e.g., nucleotide-excision repair or homologous recombination) [4]. These pathways often involve the synthesis of multiple nucleotides surrounding the lesion, using the intact strand as a template. The errors of the gap-filling polymerase may be displaced from the position of the original lesion, dissociating the mutational signature from the context of the original damage. We therefore hypothesize that this mutational signature reflects the error profile of the polymerase (ϵ) and the asymmetry of mismatch resolution (p).
The second signature to increase with age, SBS1, does so in all cell types considered, except for liver, where hepatocytes are routinely dormant in the cell cycle [49], and neurons, which are postmitotic. More generally, the rate at which mutations assigned to SBS1 increase with age varies widely among cell types and is highest in those characterized by the highest turnover rates (such as intestinal epithelia, where turnover time estimates are of the order of 3 days [50]). Thus, SBS1 appears to be driven by cell division rates. A possible exception is the observation of a slight increase with age in maternal germline mutations, most of which arose in oocytes (see the discussion of germline mutations below). These findings agree with previous observations from cancer studies [6,13,51].
The origin of CpG transitions remains unclear. If they arise because methylated CpGs are a poor template for replication or from spontaneous deamination of single strands during replication, their dependence on cell divisions is expected. Less intuitively perhaps, the same expectation holds if they arise from spontaneous deamination and are efficiently and accurately repaired during the cell cycle (Fig 2). Current data do not allow us to pinpoint when in the cell cycle the damage accrues, however. Two plausible sources are unrepaired mismatches that accrue during the cell cycle and deamination of single-stranded cytosines during DNA replication. As we show, their relative importance will depend on the efficiency of mismatch repair as well as the length of time spent single-stranded during replication, parameters that are, to our knowledge, unknown.
---
[1] Url:
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002678
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/