(C) PLOS One
This story was originally published by PLOS One and is unaltered.
. . . . . . . . . .
Unveiling recent and ongoing adaptive selection in human populations [1]
Ziyue Gao, Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
Date: 2024-02
Genome-wide scans for signals of selection have become a routine part of the analysis of population genomic variation datasets and have resulted in compelling evidence of selection during recent human evolution. This Essay spotlights methodological innovations that have enabled the detection of selection over very recent timescales, even in contemporary human populations. By harnessing large-scale genomic and phenotypic datasets, these new methods use different strategies to uncover connections between genotype, phenotype, and fitness. This Essay outlines the rationale and key findings of each strategy, discusses challenges in interpretation, and describes opportunities to improve detection and understanding of ongoing selection in human populations.
Funding: This work is supported by a Research Fellowship (FG-2021-15702) from the Alfred P. Sloan Foundation (https://sloan.org/) and a grant (R35GM146810) from the National Institute of General Medical Sciences to ZG. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Fig 3. (A) A “genotype-focused” strategy focuses on the cumulative effects of historical selection on genetic variation patterns and relies on population genetics modeling to tease apart the influence of other evolutionary forces. Ancient DNA data provide direct information on allele frequency changes, which helps reduce inference uncertainty and confounding by demographic history. (B) A “fitness-focused” strategy focuses on direct associations between genotype and fitness component(s) and uses allele frequency changes within one generation to detect selection in contemporary populations. As a special case of this strategy, between-sex differences in adult allele frequency, or in effect sizes of association with fitness components, can be leveraged to detect sex-differential selection. (C) A “phenotype-focused” strategy relies on aggregating the selection signals revealed by genotype-focused or fitness-focused strategies across trait-associated variants identified by genome-wide association studies (GWAS).
This Essay aims to highlight growing evidence for very recent and ongoing genetic adaptation in the human genome, with a focus on positive selection and directional selection on polygenic traits, as these modes of selection may contribute to genetic and phenotypic differences across populations. It is important to note that the effects of negative selection (such as purifying selection and background selection; Box 1) are evident and prevalent in the human genome. However, due to space limitations, this Essay does not discuss the advances made in the past decade in identifying genomic regions and phenotypes subject to recent and ongoing negative and stabilizing selection (e.g., [5–8]). Instead, it only briefly discusses the challenges associated with detecting and interpreting signals of positive and directional selection in the context of pervasive negative selection. The Essay starts with the latest methodological innovations in inference of positive selection at individual genomic loci, and then discusses techniques for detecting aggregate selection signals across genetic loci that collectively influence a quantitative trait. Rather than delving deeply into the technical details, it emphasizes the connections and distinctions among “genotype-focused,” “phenotype-focused,” and “fitness-focused” strategies, as well as the advantages and limitations of each (Fig 3). Some major findings stemming from these approaches are discussed, along with challenges in interpreting the signals.
Numerous scans have been carried out in the human genome for targets of selection on intermediate timescales (e.g., over 1,000 generations), but it remains challenging to demonstrate that selection on the identified targets is still ongoing or to detect selection that started recently. Enabled by the recent availability of population-scale genomic data and the development of efficient algorithms for inferring local genealogical trees, many new methods have been developed in the past 20 years to detect signals of selection from the past few millennia (e.g., [1–4]). Complementary to this approach, ancient DNA data provide direct estimates of past allele frequencies in human populations across time and geography and have refined estimates of the tempo and strength of selection for many signals identified in modern genomes. Most recently, population-scale biobank datasets, which combine genomic information with phenotypic data on reproduction, disease, mortality, and other quantitative traits, have pinpointed variants associated with various fitness components, at times in a sex-specific manner. These findings indicate ongoing selection operating within just one or a few generations.
Polygenicity refers to a scenario in which variation in a trait within a population is contributed to by genetic variants at multiple genes or genomic loci rather than by just one or a few. Many complex traits in humans, such as height and disease susceptibility, are highly polygenic.
Stabilizing selection is a type of natural selection that favors individuals with an intermediate value of a fitness-relevant trait. Individuals who deviate from the optimal trait value are selected against, so the trait is stabilized around a specific value. Stabilizing selection concerns the relationship between phenotype and fitness, regardless of the genetic basis. Other modes of selection on phenotypes include disruptive selection, which favors individuals with extreme trait values, and directional selection, which favors individuals at only one end of the phenotypic spectrum.
Adaptation is the process by which organisms evolve heritable characteristics or traits that help them to better survive and reproduce in their specific environment. In many cases, adaptation is used synonymously with positive selection, but adaptation also encompasses other selection modes such as balancing selection and polygenic adaptation.
Positive selection and negative selection are two inseparable concepts that describe the same phenomenon from different angles. To facilitate communication, population geneticists often use these terms with reference to the effect of selection on the derived allele, such that positive selection tends to speed up molecular evolution, whereas negative selection slows or prevents it. Nonetheless, in many cases the identity of the derived allele is ambiguous or less relevant (e.g., during transient selection), and the direction of selection then refers to the effect of selection on the rare allele (for example, a scenario in which the rare allele is beneficial is often considered positive selection, although one could equally describe it as negative selection against the more common allele).
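To make the distinction between stabilizing, directional, and disruptive selection concrete, the Python sketch below applies three illustrative fitness functions to a symmetric phenotype distribution and reports how the mean and variance change after one round of selection. The Gaussian, exponential, and inverted-Gaussian fitness functions, their parameters (optimum, width, beta), and the discretized phenotype grid are assumptions chosen for illustration, not models taken from the studies discussed in this Essay.

```python
import math


def fitness_stabilizing(z, optimum=0.0, width=1.0):
    # Gaussian fitness peak: phenotypes near `optimum` are favored
    return math.exp(-((z - optimum) ** 2) / (2 * width ** 2))


def fitness_directional(z, beta=0.5):
    # Exponential fitness: larger phenotypes are always favored when beta > 0
    return math.exp(beta * z)


def fitness_disruptive(z, optimum=0.0, width=1.0):
    # Illustrative fitness valley at `optimum`: both extremes are favored
    return 2.0 - fitness_stabilizing(z, optimum, width)


def selection_summary(zs, weights, fitness):
    """Mean shift and variance of a phenotype distribution before/after selection.

    zs: phenotype values; weights: their frequencies (need not sum to 1);
    fitness: function mapping a phenotype to its relative fitness.
    """
    total = sum(weights)
    mean_before = sum(w * z for z, w in zip(zs, weights)) / total
    var_before = sum(w * (z - mean_before) ** 2 for z, w in zip(zs, weights)) / total
    # Reweight each phenotype class by its fitness (one round of selection)
    post = [w * fitness(z) for z, w in zip(zs, weights)]
    post_total = sum(post)
    mean_after = sum(w * z for z, w in zip(zs, post)) / post_total
    var_after = sum(w * (z - mean_after) ** 2 for z, w in zip(zs, post)) / post_total
    return {
        "shift_in_mean": mean_after - mean_before,
        "var_before": var_before,
        "var_after": var_after,
    }


# Hypothetical, discretized standard-normal phenotype distribution on [-4, 4]
zs = [i / 10 for i in range(-40, 41)]
weights = [math.exp(-z * z / 2) for z in zs]
```

With the optimum at the population mean, stabilizing selection leaves the mean unchanged but shrinks the variance, directional selection shifts the mean upward, and disruptive selection inflates the variance. The shift in the mean is the selection differential, cov(z, w) divided by mean fitness.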
Fig 2. (A) The hallmark of positive selection is a faster increase in allele frequency than expected under neutrality. (B) The rapid allele frequency change leaves footprints in the surrounding genomic region, although the specific patterns depend on the strength, tempo, and mode of selection (e.g., selection on standing variation versus on de novo variants). (C) Major methods for detecting positive selection based on present-day genetic variation.
Fig 1. In this conceptual framework, selection on genotype is mediated by fitness-relevant phenotype and manifests in allele frequency changes and genetic variation patterns. In any specific environment, genotype and environment together shape the phenotype of an individual, which in turn determines the fitness. In addition to its direct effect on the phenotype (solid purple arrow), the environment also modifies the genotype-to-phenotype mapping (i.e., genotype-by-environment interaction; indicated by the dotted purple arrow) and phenotype-to-fitness mapping (dashed purple arrow). Through interactions with other evolutionary forces (indicated by the brown plus sign), natural selection shapes the allele frequency trajectory over time and leaves footprints in genomic variation in present-day populations.
A central question in human evolutionary genetics concerns the functions and evolutionary history of genes or genomic regions that are under natural selection. Selection favors genetic variants that lead to advantageous phenotypic changes in specific environments, resulting in increases in allele frequency over time and distinctive patterns of genetic variation in present-day populations (Figs 1, 2A and 2B). Beyond unraveling the origin and evolutionary history of these selected genetic changes, it is of great interest to gauge their contribution to phenotypic diversity in present-day human populations, as well as their impacts on disease risk and overall fitness (Box 1) in contemporary environments. Therefore, research has increasingly shifted toward identifying and characterizing extremely recent and even ongoing selection.
Positive selection at individual genomic loci
Genomic footprints in present-day genetic variation
Traditional methods for detecting selection take a genotype-focused approach (Fig 3A) by adopting classic population genetics models. Specifically, these models predict changes in allele frequency and patterns of surrounding genomic variation by assuming arbitrary fitness effects of different genotypes at a single genetic locus. The obvious advantage of this modeling approach is that it establishes expectations for genomic signatures of selection while requiring very little phenotypic information, such as how genotypes map to phenotypes or which phenotypes are under selective pressure. Typical genomic signatures of positive selection include extreme differentiation in allele frequencies across populations, extended haplotypes or linkage disequilibrium, and distortion of the site frequency spectrum of segregating variants (reviewed in [9–11]; Fig 2C(i–iii)). These statistics capture complementary features of genomic variation, but most are well powered only for selection on intermediate timescales (i.e., hundreds of generations or longer). More recent methods increase detection power by considering multiple summary statistics jointly. This idea was initially implemented using a few basic summary statistics [12] and later expanded through techniques such as Approximate Bayesian Computation [13] or supervised machine learning (reviewed in [14]). Thanks to newly available population-scale genomic data and continuing theoretical and methodological developments, genome-wide scans based on population genetic summary statistics have identified thousands of putative targets of selection, largely independently of biological knowledge about the corresponding phenotypes or selective pressures.
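As one concrete example of a site frequency spectrum statistic, the sketch below computes Tajima's D, which compares the average number of pairwise differences with the number of segregating sites scaled by a1. The input format (an unfolded spectrum where entry i-1 counts sites whose derived allele appears in i of n chromosomes) is an assumption of this sketch. Under the standard neutral model the expected spectrum is proportional to 1/i and D is close to zero; an excess of rare variants, as expected shortly after a selective sweep, pushes D negative, whereas an excess of intermediate-frequency variants pushes it positive. Demographic history can produce similar distortions, so D alone does not establish selection.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] = number of segregating sites at which the derived allele
    appears in exactly i of the n sampled chromosomes (i = 1 .. n - 1).
    """
    n = len(sfs) + 1
    S = sum(sfs)  # number of segregating sites
    if S == 0:
        raise ValueError("no segregating sites")
    # Mean pairwise differences (theta_pi) computed from the spectrum
    pi = sum(i * (n - i) * xi for i, xi in enumerate(sfs, start=1)) / (n * (n - 1) / 2)
    # Standard constants for the variance of (pi - S / a1)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Hypothetical spectra for n = 10 chromosomes
excess_rare = [50, 1, 1, 1, 1, 1, 1, 1, 1]
excess_intermediate = [1, 1, 1, 1, 20, 1, 1, 1, 1]
```

Passing the neutral expectation (counts proportional to 1/i) returns a value of essentially zero, while `excess_rare` gives a negative D and `excess_intermediate` a positive one.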
Despite being able to pick up selection signals over the past hundreds or thousands of generations, these scans have limited power to detect very recent selection, because such a narrow time window leaves only subtle footprints in the site frequency spectrum or haplotype structure. From the perspective of the local genealogical tree, very recent selection affects only branches near the leaf nodes and leaves most of the tree unchanged. Recognizing this, researchers have developed methods that explicitly leverage features of the terminal branches of the local genealogical tree. The singleton density score (SDS) is one such method; it detects recent allele frequency changes based on extremely rare variants [15]. Specifically, SDS tests for a deficiency of singletons (i.e., variants that appear exactly once in the entire sample) on haplotypes carrying the putatively favored allele; fewer singletons imply shorter terminal branches and thus a faster coalescent rate in the recent past (Fig 2C(iv)). Along these lines, another method, the ascertained sequentially Markovian coalescent (ASMC), detects targets of recent positive selection by inferring pairwise coalescent times and looking for unusually high densities of coalescent events in the recent past (Fig 2C(v)) [16,17]. When applied to whole-genome sequences of approximately 3,200 individuals of European ancestry, SDS detected signals of selection over the past 2,000 to 3,000 years in the major histocompatibility complex (MHC) region and at variants associated with lactose tolerance and pigmentation [15]. In comparison, application of ASMC to over 487,000 British individuals identified signals of selection in the past 1,500 years, including those detected by SDS as well as several new candidate loci harboring genes related to immune response, tumor growth, and other phenotypes [17].
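The intuition behind SDS can be illustrated with a deliberately simplified score: for each haplotype, measure the singleton-free interval around the focal site, then compare the average interval on haplotypes carrying the derived allele with that on haplotypes carrying the ancestral allele. This is not the published SDS estimator, which models terminal branch length more carefully and standardizes scores within derived allele frequency bins; the function names, input format, and window edges below are hypothetical.

```python
import math
from statistics import mean


def singleton_free_span(focal_pos, singleton_positions, window_start, window_end):
    """Length of the singleton-free interval around focal_pos on one haplotype.

    Window edges stand in for the nearest singleton when none occurs on a side.
    """
    left = max((p for p in singleton_positions if p < focal_pos), default=window_start)
    right = min((p for p in singleton_positions if p > focal_pos), default=window_end)
    return right - left


def sds_like_score(haplotypes, focal_pos, window_start, window_end):
    """Log ratio of mean singleton-free spans, derived vs ancestral carriers.

    haplotypes: list of (allele, singleton_positions) with allele 1 = derived.
    A positive score means derived-allele haplotypes carry fewer nearby
    singletons (shorter terminal branches), as expected if that allele rose
    in frequency recently.
    """
    spans = {0: [], 1: []}
    for allele, singletons in haplotypes:
        spans[allele].append(
            singleton_free_span(focal_pos, singletons, window_start, window_end)
        )
    return math.log(mean(spans[1]) / mean(spans[0]))


# Hypothetical toy data: derived haplotypes have singletons far from the focal site
toy_haplotypes = [(1, [100, 900]), (1, [50, 950]), (0, [450, 550]), (0, [480, 520])]
```

For the toy data with the focal site at 500 and a window of 0 to 1,000, derived haplotypes have much longer singleton-free spans than ancestral ones, so the score is positive.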
With the recent development of algorithms for inferring the ancestral recombination graph or its proxies, several tree-based statistics have been developed for detecting positive selection (reviewed in [18]; Fig 2C(vi)). One of these methods, Relate, estimates local genealogies from sequence data and detects selection by searching for rapid propagation of lineages carrying a putatively beneficial allele relative to other lineages, effectively testing for differences in the coalescent rate between haplotypes carrying different alleles [19]. However, this selection metric is calculated from a single point estimate of the local genealogy. By contrast, a likelihood method called CLUES leverages the posterior distribution of local genealogical trees to infer selection coefficients and allele frequency trajectories at individual loci [20]. These new methods have confirmed strong selection over the past few thousand years on variants associated with lactase persistence, immune response, and pigmentation in Europeans, as well as some signals in other populations (such as at the EDAR gene in East Asians), although very few new signals have been detected.
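The question Relate asks, whether a lineage carrying an allele spread faster than its competitors, can be illustrated with a standard neutral-coalescent result: if there are k ancestral lineages at some time in the past, the number of present-day descendants (out of n sampled chromosomes) of any one of them satisfies P(J = j) = C(n - j - 1, k - 2) / C(n - 1, k - 1) for 1 <= j <= n - k + 1. The sketch below turns this into a tail probability for observing at least j carriers. It is a simplified illustration, not Relate's actual statistic; how k is chosen (for example, the number of lineages on the inferred tree when the mutation arose) is an assumption left to the user.

```python
from math import comb


def descendant_count_pmf(j, n, k):
    """P(one of k ancestral lineages has exactly j descendants among n samples).

    Neutral Kingman coalescent (Polya-urn argument):
    P(J = j) = C(n - j - 1, k - 2) / C(n - 1, k - 1), for 1 <= j <= n - k + 1.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    if not 1 <= j <= n - k + 1:
        return 0.0
    return comb(n - j - 1, k - 2) / comb(n - 1, k - 1)


def spread_p_value(j, n, k):
    """Neutral probability that one lineage spreads to at least j of n samples."""
    return sum(descendant_count_pmf(x, n, k) for x in range(j, n - k + 2))
```

With k = 2 the split at the root is uniform, so spread_p_value(9, 10, 2) equals 1/9; a small p-value for large k and j flags a lineage that expanded unusually quickly under neutrality.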
Selection signals in ancient genomes
While modern genomes provide a snapshot of population evolution and allow indirect inference of past demographic and selective events, genomic sequences from ancient samples offer direct glimpses into the genetic history of human populations. By providing estimates of allele frequencies at multiple time points (Fig 2A and 2B), ancient DNA has yielded valuable insights into the evolutionary histories of multiple selected variants during the past 15,000 years of human evolution (reviewed in [21–23]). Analyses based on ancient DNA have also been particularly helpful in detecting candidates under spatially or temporally restricted selection. Ancient DNA has transformed our understanding of selection in humans by resolving complex interactions between selection and demographic history. Because recent human history features many episodes of population splits and admixture, signals of selection are often obscured by changes in ancestry [24]. One instance is the evolutionary history of the FADS locus, which contains genes encoding enzymes involved in the biosynthesis of long-chain polyunsaturated fatty acids. Using present-day genomic data, studies detected strong selection signals at FADS genes in human populations from multiple continents, with different alleles being favored across time and geography [25–29]. However, analysis of ancient DNA showed that the selection signal in Native Americans was largely an artifact driven by parallel selection in European and Asian populations [30]. Another intriguing case is the evolution of pigmentation in west Eurasia in the context of several major admixture events revealed by ancient DNA.
The derived alleles associated with lighter skin or eye color at several pigmentation-associated genes exhibited distinct frequencies in different ancestral populations, potentially reflecting differential selective pressures across geography prior to the Mesolithic period (i.e., before 9,000 to 10,000 years ago) [31,32]. Moreover, the observed allele frequencies and ancestry fractions at these pigmentation-associated variants in later admixed populations significantly deviated from neutral expectations, suggesting subsequent selection during the Neolithic, Bronze Age, and historical periods [33–35]. These findings point to continued selective pressure for light pigmentation over the past 2,000 years in west Eurasia and support the concept that admixture may facilitate rapid adaptation by introducing advantageous alleles [34–37]. Ancient DNA data have also refined our knowledge of the onset, duration, and strength of selection events. For example, selection on the variant conferring lactase persistence was initially estimated to begin around 7,500 years ago based on modern genomic data and archeological evidence of dairy production [38]. Surprisingly, ancient DNA data have shown that the selected allele was rare in Bronze Age Europe until 3,000 years ago, suggesting a much later onset of positive selection than was previously inferred [31]. In addition, based on the allele frequency trajectory in ancient DNA samples, the positive selection for this allele was inferred to be strong 100 to 150 generations ago but drastically reduced in the past 100 generations [39]. Significant variation in selection strength has also been found at several other previously identified selected loci [39]. 
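A simple way to turn an ancient-DNA allele frequency trajectory into a selection strength is to note that, under deterministic genic selection with coefficient s, the odds p/(1 - p) of the favored allele grow by a factor of (1 + s) per generation, so the change in logit frequency over t generations yields an estimate of s. The sketch below assumes this model, error-free frequencies, and a 28-year generation time; the example frequencies are hypothetical and are not estimates from the studies cited. Published analyses instead fit trajectories with models that also account for genetic drift, sampling noise in ancient samples, and changes in ancestry.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def selection_coefficient(p_start, p_end, generations):
    """Per-generation s under deterministic genic selection.

    Assumes the odds of the favored allele grow by (1 + s) each generation:
    p_t / (1 - p_t) = p_0 / (1 - p_0) * (1 + s) ** t.
    Ignores drift, sampling noise, and admixture.
    """
    return math.exp((logit(p_end) - logit(p_start)) / generations) - 1


def years_to_generations(years, generation_time=28.0):
    # 28 years per generation is an assumed round value, not a fitted one
    return years / generation_time
```

For instance, a hypothetical rise from 5% to 50% over 100 generations corresponds to s of roughly 0.03 per generation.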
Overall, ancient DNA studies have confirmed selection signals near multiple genes associated with diet, pigmentation, and immune response revealed in modern genomic data, and have provided fine-resolution insights into the temporal dynamics and geographic distribution of the selected variants and the corresponding selection strengths [26,34,39,40]. Given recurrent observations of selection targeting genes in immune pathways, identifying the specific pathogens that drove these selective pressures has attracted intense interest. One strategy for linking selection signals to causative pathogens is to search for variants with unusual allele frequency changes during well-documented catastrophic pandemics. A recent investigation examined ancient genomes of roughly 200 individuals who died before, during, and after the Black Death pandemic in the fourteenth century [41]. This study reported an overall enrichment of allele frequency differentiation in immune genes as well as a handful of potential targets of positive selection. However, serious skepticism has been raised about these findings due to technical concerns [42], and other studies adopting similar designs (though with smaller sample sizes) failed to replicate the selection signals at immune genes overall or at individual candidates [43,44]. These results suggest that the selective effects of historical pandemics at individual genomic loci are relatively modest, so that detecting them requires very large sample sizes.
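The sample-size problem can be made concrete with a standard power calculation. The sketch below uses the normal approximation for a two-sided, two-proportion z-test to estimate how many sampled chromosomes each group (for example, individuals who died before versus after a pandemic) would need to detect a given change in allele frequency. It treats chromosomes as independent and ignores the low coverage, genotype uncertainty, and population structure typical of ancient samples, so its answers are likely optimistic; the default genome-wide threshold alpha = 5e-8 and the function name are assumptions of this sketch.

```python
import math
from statistics import NormalDist


def chromosomes_per_group(p_before, p_after, alpha=5e-8, power=0.8):
    """Approximate chromosomes needed per group to detect p_before -> p_after.

    Normal approximation to a two-sided two-proportion z-test.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    variance = p_before * (1 - p_before) + p_after * (1 - p_after)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_after - p_before) ** 2)
```

With alpha = 0.05 and 80% power, detecting a shift from 50% to 60% needs roughly 385 chromosomes per group; halving the shift roughly quadruples the requirement, and a genome-wide threshold raises it much further.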
Fitness-focused strategy for detecting selection in contemporary populations
The fitness of an individual consists of several components such as viability, mating success, and fecundity. A genetic variant that influences any of these components is subject to natural selection unless its effects on all components cancel out. Based on this reasoning, one can identify loci under ongoing selection using a fitness-focused approach, by performing GWAS on proxies for fitness components (Fig 3B). However, traits closely associated with fitness are expected to have low heritability [45], and fitness-related variants tend to be rare. Therefore, identifying these variants through association requires exceedingly large sample sizes, which have become feasible only in the past decade. It is worth noting that, due to limited power, this association approach is biased towards detecting common variants and misses fitness-influencing variants under strong negative selection, which keeps such variants rare. One of the most studied proxies for fertility is the number of children ever born to or fathered by an individual, because it can be easily surveyed and approximates overall fitness well in modern populations with low mortality. Using data from hundreds of thousands of individuals born in the 1950s to 1970s, dozens of genomic loci have been associated with the number of children [46–48]. Interestingly, the top associations include the FADS locus, which also harbors strong signals of historical positive selection in both ancient and present-day DNA samples [26,28,29,49]. By contrast, the two most significant association regions lack evidence of historical positive selection but show signals of balancing selection, possibly due to pleiotropic effects (Box 1) on other fitness components or to temporally fluctuating selection [46,50,51]. Besides reproduction, viability is a key component of fitness.
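At its core, each test in such a GWAS regresses the fitness proxy on allele dosage, one variant at a time. The sketch below shows only that core calculation (ordinary least squares slope, standard error, and t statistic) using the Python standard library. Real analyses add covariates such as age, sex, and ancestry principal components and typically use mixed models; the function name and toy data are hypothetical.

```python
import math


def association_test(dosages, phenotype):
    """Simple linear regression of a fitness proxy on allele dosage (0, 1, 2).

    Returns (beta, standard error, t statistic). No covariates are included.
    """
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(phenotype) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx  # change in the proxy per copy of the allele
    intercept = my - beta * mx
    # Residual sum of squares with n - 2 degrees of freedom
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se


# Hypothetical toy data: dosage of one allele and number of children
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_children = [1, 2, 2, 3, 3, 4]
```

For the toy data, each extra copy of the allele is associated with one additional child on average (beta = 1); with effect sizes as small as those expected for fitness-related traits, the standard error only becomes small enough with hundreds of thousands of individuals.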
In principle, an individual's number of children closely reflects that individual's contribution to the gene pool of the next generation, but current association studies for this trait include only individuals who survived to the end of their reproductive lifespan, leaving out those who did not reach adulthood. To detect common variants linked to early-life survival, Wu and colleagues performed a clever GWAS on the time- and location-matched infant mortality rate (IMR) for living individuals in the UK Biobank [52]. The rationale is that individuals who survived tougher environments during infancy, as indexed by a higher local IMR in their birth years, tend to have higher “relative viability.” Interestingly, the two genome-wide significant loci identified by this approach, LCT and TLR6-TLR1-TLR10, are both known targets of recent positive selection in Europeans, with the survival-increasing alleles matching the evolutionarily favored alleles [15,26]. A more direct approach to identifying variants that affect viability is to look for shifts in allele frequency across individuals of different ages [2]. Limited by the age distribution of participants in current cohorts, this method is underpowered to detect allele frequency changes in early life, when selective pressure is expected to be strong. However, in humans, even variants that exclusively affect viability late in life may be under selection, due to late male reproduction, intergenerational resource transfer, and other reasons [53,54]. By testing for changes in allele frequency with age, one study found and replicated two genome-wide significant signals in two independent datasets: one overlaps the APOE ε4 allele, which is associated with reduced lifespan and increased risk of Alzheimer’s disease and cardiovascular diseases [55,56]; the other contains variants near the nicotinic acetylcholine receptor gene CHRNA3 that are associated with increased smoking quantity [57].
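The age-based design can be sketched as a comparison of allele frequencies between younger and older participants. The version below splits individuals at an age cutoff and applies a two-proportion z-test to allele counts; the cutoff, function names, and choice of test are assumptions for illustration and do not reproduce the published method. Any real analysis must also guard against confounding by birth cohort, ancestry, and age-dependent participation.

```python
import math
from statistics import NormalDist


def allele_counts_by_age(genotypes, ages, age_cutoff):
    """Split individuals at age_cutoff; genotypes are alt-allele dosages 0/1/2.

    Returns ((alt_count, total_alleles) for younger, same for older).
    """
    young = [g for g, a in zip(genotypes, ages) if a < age_cutoff]
    old = [g for g, a in zip(genotypes, ages) if a >= age_cutoff]
    return (sum(young), 2 * len(young)), (sum(old), 2 * len(old))


def frequency_shift_test(young_counts, old_counts):
    """Two-sided two-proportion z-test on allele counts (pooled variance).

    Returns (old minus young frequency, z, p-value). Treats alleles as
    independent, i.e. assumes Hardy-Weinberg proportions.
    """
    (x1, n1), (x2, n2) = young_counts, old_counts
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p2 - p1, z, p_value
```

A negative shift means the allele is rarer among older participants, the pattern expected for a variant that reduces survival to older ages.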
Intriguingly, the relatively common frequencies of these survival-reducing variants in present-day populations suggest that they were not under strong negative selection in the recent past. The authors interpreted the lack of abundant associations as evidence for purifying selection against variants with large effects on late-onset disease and speculated that the APOE and CHRNA3 loci were found because their deleterious effects have recently increased in humans due to environmental changes.
Fitness-focused strategy for detecting sex-differential selection
The extraordinary level of sexual dimorphism in many animal species, including humans, reflects sex-specific phenotypic effects and sex differences in the fitness landscape. The fitness effect of a genetic variant may differ between the sexes in magnitude or sometimes in direction. Such sex-differential selection is challenging to study because Mendelian inheritance equalizes autosomal allele frequencies between the two sexes at fertilization in each generation. Nevertheless, the special case of sex-differential selection on viability is expected to leave a distinctive signature in population genetic variation: allele frequency differences between adult females and males (Fig 3B, right). An early study seeking this signature reported signals at hundreds of genetic regions and an enrichment of signals on the X chromosome compared with autosomes [58]. Unfortunately, these findings turned out to be largely false positives driven by random noise, sex-biased genotyping error, and biases due to hemizygosity of the X chromosome in males. Later studies on much larger biobank datasets failed to detect robust signals at any autosomal locus [59] or enrichment on the X chromosome [60]. While signals of sex-differential viability selection are expected to be exceptionally weak at individual loci [61,62], subtle between-sex allele frequency differences across many variants may be detectable in aggregate. Leveraging the genomic and reproductive history data of approximately 250,000 adults in the UK Biobank, Ruzicka and colleagues developed new metrics to measure between-sex allele frequency differentiation at different stages of the life cycle. They found significant shifts in the genome-wide distributions of these metrics, consistent with effects of sex-differential selection on survival, reproductive success, and overall fitness [4].
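The contrast between weak single-locus signals and a detectable aggregate signal can be sketched as follows: compute a z-score for the female-male allele frequency difference at each autosomal variant, then ask whether the z-scores are, on average, larger than expected with no sex-differential viability selection. This is a hypothetical, simplified test, not the metrics developed by Ruzicka and colleagues. It assumes independent variants (in practice, linkage disequilibrium inflates the variance, so variants are pruned or significance is assessed by permutation) and error-free genotypes, which the false positives described above show to be a strong assumption.

```python
import math
from statistics import NormalDist


def between_sex_z(alt_f, n_f, alt_m, n_m):
    """z-score for an autosomal allele frequency difference, females vs males.

    alt_f, alt_m: alternative-allele counts; n_f, n_m: numbers of adults
    (two autosomal copies each).
    """
    pf, pm = alt_f / (2 * n_f), alt_m / (2 * n_m)
    p = (alt_f + alt_m) / (2 * (n_f + n_m))  # pooled frequency under the null
    se = math.sqrt(p * (1 - p) * (1 / (2 * n_f) + 1 / (2 * n_m)))
    return (pf - pm) / se


def aggregate_test(z_scores):
    """Compare the sum of squared z-scores with its null expectation.

    Under the null and with independent variants, sum(z^2) follows a
    chi-square distribution with len(z_scores) degrees of freedom; a normal
    approximation gives a one-sided p-value for excess differentiation.
    Returns (mean z^2, p-value).
    """
    L = len(z_scores)
    stat = sum(z * z for z in z_scores)
    z_agg = (stat - L) / math.sqrt(2 * L)
    return stat / L, 1 - NormalDist().cdf(z_agg)
```

A mean squared z-score near 1 is what neutrality predicts; a consistent excess across many variants can reach significance even when no single variant does.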
[END]
---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002469
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/