(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org.
Licensed under Creative Commons Attribution (CC BY) license.
url:
https://journals.plos.org/plosone/s/licenses-and-copyright
------------
Powerful detection of polygenic selection and evidence of environmental adaptation in US beef cattle
['Troy N. Rowan', 'Division Of Animal Sciences', 'University Of Missouri', 'Columbia', 'Missouri', 'United States Of America', 'Genetics Area Program', 'Department Of Animal Science', 'University Of Tennessee', 'Knoxville']
Date: 2021-09
Selection on complex traits can rapidly drive evolution, especially in stressful environments. This polygenic selection does not leave intense sweep signatures on the genome, rather many loci experience small allele frequency shifts, resulting in large cumulative phenotypic changes. Directional selection and local adaptation are changing populations; but, identifying loci underlying polygenic or environmental selection has been difficult. We use genomic data on tens of thousands of cattle from three populations, distributed over time and landscapes, in linear mixed models with novel dependent variables to map signatures of selection on complex traits and local adaptation. We identify 207 genomic loci associated with an animal’s birth date, representing ongoing selection for monogenic and polygenic traits. Additionally, hundreds of additional loci are associated with continuous and discrete environments, providing evidence for historical local adaptation. These candidate loci highlight the nervous system’s possible role in local adaptation. While advanced technologies have increased the rate of directional selection in cattle, it has likely been at the expense of historically generated local adaptation, which is especially problematic in changing climates. When applied to large, diverse cattle datasets, these selection mapping methods provide an insight into how selection on complex traits continually shapes the genome. Further, understanding the genomic loci implicated in adaptation may help us breed more adapted and efficient cattle, and begin to understand the basis for mammalian adaptation, especially in changing climates. These selection mapping approaches help clarify selective forces and loci in evolutionary, model, and agricultural contexts.
Interest in mapping the impacts of selection and local adaptation on the genome is increasing due to the novel stressors presented by climate change. Until now, approaches have largely focused on mapping “sweeps” on large-effect loci. Highly powered datasets that are both temporally and geographically distributed have not existed. Recently, large numbers of beef cattle have been genotyped across the United States, including influential individuals with cryopreserved semen. This has created multiple powerful datasets distributed over time and landscapes. Here, we map the recent effects of selection and local adaptation in three cattle populations. The results provide insight into the biology of mammalian adaptation and generate useful tools for selecting and breeding better-adapted cattle for a changing environment.
Funding: This project was supported by Agriculture and Food Research Initiative Competitive Grant no. 2016-68004-24827, no. MO-HAAS0027, and NSRP-8 from the USDA National Institute of Food and Agriculture awarded to JED.
https://reeis.usda.gov/web/crisprojectpages/1008909-identifying-local-adaptation-and-creating-region-specific-genomic-predictions-in-beef-cattle.html Funders did not play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data Availability: The raw data underlying the results presented in the study are available from the Red Angus Association of America (
https://redangus.org/contact-us/ , Ryan Boldt, Director of Breed Improvement,
[email protected] , (940) 387-3502 ext. 12), American Simmental Association (
https://simmental.org/site/index.php/contact#asadirectors , Jackie Atkins, Director Science and Education Operations,
[email protected] , (406)587-4531), and American Gelbvieh Association (
https://gelbvieh.org/about/contact , Megan Slater, Executive Director,
[email protected] , (303)465-2333, Ext 485) under a Data Use Agreement, but are not publicly available. Derived data (analytical results) are however available as supplementary files or as a Zenodo repository (doi: 10.5281/zenodo.4455543 ) associated with this publication.
Herein, we use two methods ( Fig 1 ), the first for detecting complex polygenic selection (Generation Proxy Selection Mapping, GPSM), and the second for identifying local adaptation (environmental Genome-Wide Association Studies, envGWAS). When applied to three US beef cattle populations, each with ~15,000 genotyped individuals, we identified numerous genomic regions harboring directional or environmentally associated mutations. Further, using a meta-analysis approach, we identified loci with ecoregion-specific allele frequency changes ( Fig 1E and 1F ), largely due to the erosion of local adaptation caused by gene flow among ecoregions from the use of artificial insemination sires. This study is the first step in assisting beef cattle producers to identify locally adapted individuals, which will reduce the industry’s environmental footprint by increasing efficiency and resilience to stressors. Further, this repurposing of commercially-generated genomic data provides us unprecedented power to gain insight into the biology of polygenic selection and adaptation in mammalian species.
To identify genomic regions potentially contributing to local adaptation, we used continuous environmental variables as quantitative phenotypes or discrete ecoregions as case-control phenotypes in a linear mixed model framework. We refer to these approaches as “environmental genome-wide association studies”, or envGWAS. Using a genomic relationship matrix in a LMM allows us to control the high levels of relatedness between spatially close individuals, and more confidently identify real environmentally-associated alleles. This method builds on the theory of the Bayenv approach from Coop et al. (2010) [ 30 , 31 ] that uses allele frequency correlations along environmental gradients to identify potential local adaptation.
Though domesticated, beef cattle are exposed to a broad spectrum of unique environments and local selection pressures, as compared to other more intensely managed livestock populations. This suggests that local adaptation and genotype-by-environment interactions play important roles in the expression of complex traits. Local adaptation and genotype-by-environment (GxE) interactions are known to exist in closely related cattle populations [ 22 , 23 ]. Previous work has identified the presence of extensive GxE in beef cattle populations [ 24 – 28 ], but limited work exploring the genomic basis of local adaptation has occurred [ 29 ]. Further, we anticipate that the increased use of artificial insemination, responsible for dramatic increases in production efficiency, may be eroding environmentally adaptive allele frequency differences in populations. Understanding genetic interactions with the environment, and their presence in cattle populations will become increasingly important in the face of changing climates.
From domestication to the present, humans have used phenotypic selection to change cattle. Since the 1980s and since the 2010s, genetic and genomic predictions, respectively, have been available to U.S. beef producers. However, even with these advanced tools, most beef producers still rely, at least partially, on phenotypic selection [ 21 ]. Prior to the 1980s, cattle were selected via phenotypic selection and there was very little movement of animals (i.e. gene flow) between regions. This strong, artificial phenotypic selection allows unintended selection on naturally-occurring abiotic and biotic stressor traits, akin to natural selection. Further, phenotypic selection could act on loci with genotype-by-environment effects (which BLUP breeding value-based selection would not), thus creating local adaptation. As an example, phenotypic selection could select for animals with better innate immune systems as they will grow faster and look more vigorous.
(a-c) Allele frequency trajectories for 20 SNPs colored by relative effect sizes from stochastic selection simulations. (a) Effect size = 0, representing stochastic changes in allele frequency due to genetic drift. (b) Large-effect alleles rapidly becoming fixed in the population representing selective sweeps. (c) Moderate-to-small effect size SNPs changing in frequency slowly over time, representing polygenic selection. (d) An overview of the linear mixed model approach used for Generation Proxy Selection Mapping and environmental GWAS. (e-f) A single SNP under ecoregion-specific selection. Different colors represent the trajectory of a given SNP in one of five different ecoregions. Ecoregion-specific selection can lead to allele frequencies that (e) diverge from or (f) converge to the population mean. Maps were plotted using public domain data from the US Department of Commerce, Census Bureau via the R package maps (version 3.1,
https://cran.r-project.org/web/packages/maps/ ).
Though the first cattle single nucleotide polymorphism (SNP) genotyping assay was developed just over a decade ago [ 18 ], numerous influential males who have been deceased for 30 to 40 years have been genotyped from cryopreserved semen ( S1 Fig , Table A in S1 Text ). These bulls add a temporally-stratified, multi-generational component to the datasets of thousands of contemporary animals genotyped from the numerically largest US beef breeds. Furthermore, the large number of animals genotyped from the most recent generations provide remarkable power for detecting small allele frequency changes due to ongoing selection. Under directional selection, alleles will be at significantly different frequencies in more recent generations compared with distant ones ( Fig 1C ). This creates a statistical association between allele frequencies at a selected locus and an individual’s generation number. With multiple generations sampled and genotyped, we can disentangle small shifts in allele frequency due to directional selection from the stochastic small changes caused by drift ( Fig 1A ) using Generation Proxy Selection Mapping (GPSM) [ 17 , 19 ]. The GPSM method searches for allele frequency changes by identifying allelic associations with an individual’s generation (or some proxy), while accounting for confounding population and family structure with a genomic relationship matrix (GRM) ( Fig 1D ) [ 20 ]. When pedigrees are missing, are missing large amounts of data, or have complex, overlapping generations, a proxy for generation can be used such as variety release date or birth date. Cattle producers are selecting on various combinations of growth, maternal, and carcass traits, but the genomic changes that result from this selection are not well-understood. Numerous genome-wide association studies have been undertaken on individual traits, but these say nothing about the underlying genomic changes that populations are experiencing due to selection. The GPSM method identifies the allele frequency changes underlying selection in a trait-agnostic manner, allowing us to observe the impacts of selection decisions in real time and understand how strong selection alters the genome over very short timescales.
As climate changes, organisms either migrate, rapidly adapt, or perish. The genes and alleles that underlie adaptation have been difficult to identify, except for a handful of large-effect variants that underwent selective sweeps [ 1 ]. It is becoming increasingly apparent that for adaptation, hard sweeps are likely to be the exception, rather than the rule [ 2 ]. Polygenic selection on complex traits can cause a significant change in the mean phenotype while producing only subtle changes in allele frequencies throughout the genome [ 3 ]. Additionally, we expect that polygenic selection is the major selective force both during and after domestication in agricultural species. Many selection mapping methods rely on allele frequency differences between diverged or artificially defined populations (e.g. F ST , FLK, XP-CLR) [ 4 – 6 ], making the detection of selection within a largely panmictic population difficult. Others rely on detecting the disruption of normal LD patterns (iHS, EHH, ROH, etc.) [ 7 – 9 ]. In cattle, these methods have successfully identified genomic regions under selection that control Mendelian and simple traits like coat color, the absence of horns, or large-effect genes involved in domestication [ 10 – 15 ]. Further, in many cases these models are unable to derive additional power from massive increases in sample size [ 16 ]. Millions of North American Bos taurus beef cattle have been exposed to strong artificial and environmental selection for more than 50 years (~10 generations) [ 17 ], making them a powerful model for studying the impacts selection has on genomes over short time periods and across diverse environments.
Results
Simulations To identify the robustness of GPSM to distinguish selection from drift, we performed two major sets of simulations, stochastic and gene drop. First, we performed a set of stochastic simulations to demonstrate how selection in the context of different effective population sizes, selection intensities, generational sampling, and genomic architectures produce GPSM signal. Stochastic simulations under multiple selection intensities, time periods, and trait architectures showed consistently that GPSM is able to map polygenic selection (S1 Table). Across all simulated scenarios and architectures, we identified an average of 38.5 selected loci (min 5.2, max 64.1) that reached genome-wide significance (q < 0.1) in GPSM. Depending on the genomic architecture of the simulated trait under selection, this represented between 0.57% and 32.54% of possible true positives (median = 9.94%). In most cases, we observed that significant hits were not the largest effect simulated QTL, but the loci that underwent the greatest allele frequency shifts over the course of genotype sampling. Usually, the largest effect QTL were fixed in the population during burn-in simulations, making their detection by GPSM impossible. Simulations suggested that GPSM effectively distinguished allele frequency changes due to selection from those associated with drift. Across all 36 scenarios of random selection, with drift acting as the only force by which alleles could change in frequency, we detected an average of 0.15 GPSM false positive SNPs (q-value < 0.1) per replicate. These rare false positives were not driven by changes in any single component of the simulations. The average number of false positive SNPs detected per replicate ranged from 0.0 to 0.45 across all scenarios, accounting for, at most, 1 false positive GPSM SNP per 100,000 tests. Genotype sampling also significantly impacted the number of true and false positives detected by GPSM across scenarios. When genotypes were evenly sampled across generations, GPSM detected, on average, 18.05 more true positives (paired t-test p < 2 x 10−16) and 0.90 less false positives (paired t-test p < 2 x 10−16) compared with heavier sampling of recently born individuals. That said, across all scenarios, the uneven sampling scheme that more closely resembled our real datasets still detected over 20 selected loci on average across all simulations (min 1.9, max 36.4) The proportion variation explained (PVE, S1 Table) of birth date across simulation scenarios reflects the number of generations and the number of crosses in the simulations across both selection and random scenarios (p < 2e-16). For random scenarios, the number of segregating QTL (p = 9.94e-06) was also associated with PVE. Importantly, for selection scenarios, the proportion of true positives (p = 2.87e-07) and QTL distribution (p = 2.91e-13) were also significantly associated with the PVE. Gene dropping simulations using the Red Angus pedigree generated an average of 0.4 (sd = 0.52) significant GPSM loci (q < 0.1) per 200K markers tested, equating to 1–2 in our real genotype data (Table A in S1 Text). Thus, pedigree structure is not responsible for the significant SNPs that we detected in real datasets.
Detecting environmental associations using envGWAS Using an equivalent form of model to GPSM, but with continuous environmental variables (30 year normals for temperature, precipitation, and elevation) or statistically-derived discrete ecoregions as the dependent variable (rather than birth date in GPSM) allows us to identify environmentally-associated loci that have been subjected to artificial and, perhaps in this context more importantly, natural selection [49]. We refer to this method as environmental GWAS (envGWAS). envGWAS extends the theory of the Bayenv approach of Coop et al. (2010) which searches for allele frequency correlations along environmental gradients to identify potentially adaptive loci [30]. Similar approaches have been applied to plant datasets [50,51], but we extend envGWAS to panmictic, biobank-sized mammalian populations. Unlike many genome-environment association analyses which only used linear models [51,52], our large dataset and the use of multivariate models provides power to identify association while importantly controlling for geographic dependence between samples using a genomic relationship matrix (S3 and S7 Figs). We used K-means clustering with 30-year normal values for temperature, precipitation, and elevation to partition the United States into 9 discrete ecoregions (Fig 3A). These ecoregions are largely consistent with those represented in previously-published maps from the environmetrics and atmospheric science literature [53], and reflect well-known differences in cattle production environments. The resulting ecoregions capture not only combinations of climate and environmental variables, but associated differences in forage type, local pathogens, and ecoregion-wide management differences to which animals are exposed. Thus, using these ecoregions as case-control phenotypes in envGWAS allowed us to detect more complex environmental associations. The three studied populations are not universally present in all ecoregions (Figs 3B, S4B and S5B, Table F in S1 Text). Since the development of these US populations in the late 1960s and early 1970s, registered seedstock animals from these populations have a small footprint in desert regions with extreme temperatures and low rainfall. PPT PowerPoint slide
PNG larger image
TIFF original image Download: Fig 3. Manhattan plots for discrete and continuous envGWAS in Red Angus cattle. (a) Nine continental US ecoregions defined by K-means clustering of 30-year normal temperatures, precipitations, and elevations. (b) Locations of sampled Red Angus animals colored by breeder’s ecoregion and sized by the number of animals at that location. (c) Multivariate discrete envGWAS (case-control for six regions with > 600 animals). Locations of sampled Red Angus animals colored by (d) 30-year normal temperature, (e) 30-year normal precipitation, and (f) elevation. (g) Multivariate continuous envGWAS with temperature, precipitation, and elevation as dependent variables. For all Manhattan plots the red line indicates the empirically-derived p-value significance threshold from permutation analysis (p < 1⨯10−5). Maps were plotted using public domain data from the US Department of Commerce, Census Bureau via the R package maps (version 3.1,
https://cran.r-project.org/web/packages/maps/).
https://doi.org/10.1371/journal.pgen.1009652.g003 Although environmental variables and ecoregions are not inherited, the estimated PVE measures the extent to which genome-wide genotypes change in frequency across the environments in which the animals were born and lived. The proportion of variance explained by SNPs ranged from 0.586 to 0.691 for temperature, 0.526 to 0.677 for precipitation, and 0.585 to 0.644 for elevation (Table G in S1 Text). In Red Angus, PVE for ecoregion membership ranged from 0.463 for the Arid Prairie to 0.673 for the Fescue Belt (Table H in S1 Text). We observe similar environmental PVE in both Simmental and Gelbvieh datasets. These measures suggest that genetic associations exist along both continuous environmental gradients and within discrete ecoregions. Permutation tests that shuffled environmental dependent variables, removing the relationship between the environment and the animal’s genotype, resulted in all PVEs being reduced to ~ 0, strongly suggesting that the detected associations between genotype and environment were not spurious. An additional permutation test that permuted animals’ zip codes, such that all animals from a given zip code were assigned the same “new” zip code from a potentially different ecoregion provided similar results, indicating that bias due to sampling at certain zip codes was not producing envGWAS signals. From 10 rounds of permutation, there were no SNP associations with p-values < 1⨯10−5. Consequently, we used this empirically-derived p-value threshold to determine SNP significance in all of the envGWAS analyses. Gene drop simulations suggested that a small portion of the identified associations are likely due to pedigree structure or founder effects (average of 1.36 false positive envGWAS loci per 200,000 tests). However, in this data, the pedigree structure reflects selection decisions of farmers and ranchers that are not beyond the influence of performance differences relative to environmental differences.
Discrete ecoregion envGWAS In Red Angus, we identified 54 variants defining 18 genomic loci significantly associated with membership of an ecoregion in the discrete multivariate envGWAS analysis (Fig 3C). Despite locus-specific signal, principal component analysis (PCA) does not suggest that ecoregion-driven population structure exists in any of the populations (S6 Fig). Of these loci, only two overlapped with loci identified in the continuous envGWAS analyses, suggesting that using alternative definitions of environment in envGWAS may detect different sources of adaptation. Of the 18 significant loci, 17 were within or near (< 100 kb) positional candidate genes (Table I in S1 Text, S6 Table), many of which have potentially adaptive functions. For example, envGWAS identified SNPs immediately (22.13 kb) upstream of CUX1 (Cut Like Homeobox 1) gene on chromosome 25. CUX1 controls hair coat phenotypes in mice [54], and alleles within CUX1 can be used to differentiate between breeds of goats raised for meat versus those raised for fiber [55]. The role of CUX1 in hair coat phenotypes makes it a strong adaptive candidate in environments where animals are exposed to heat, cold, or toxic ergot alkaloids from fescue stress [56]. In Simmental, we identified 11 loci tagged by 39 variants significantly associated with membership of an ecoregion in the multivariate envGWAS analysis (S4 Fig). In Gelbvieh, 66 variants identified 33 local adaptation loci (S5 Fig). In the analyses of all three datasets, we identified a common local adaptation signature on chromosome 23 (peak SNP rs1023574). Multivariate analyses in all three populations identified alleles at this SNP to be significantly associated with one or more ecoregions (q = 1.24 x 10−13, 3.15 x 10−12, 4.82 x 10−5 in RAN, SIM, and GEL, respectively). In all three datasets, we identified rs1023574 as a univariate envGWAS association with membership of the Forested Mountains ecoregion. However, the most significant univariate association in Red Angus was with the Arid Prairie region which was excluded from both the Simmental and Gelbvieh analyses due to low within-region sample size. In the multivariate analysis for Red Angus, the associated locus spanned 18 SNPs from (1,708,914 to 1,780,836 bp) and contained the pseudogene LOC782044. The nearest annotated gene, KHDRBS2 (KH RNA Binding Domain Containing, Signal Transduction Associated 2) has previously been identified by other adaptation studies in cattle, sheep, and pigs [57–59]. This variant was not significantly associated with any continuous environmental variable in Red Angus. However, rs1023574 was significantly associated with temperature, elevation, and humidity variables in Simmental. The KHDRBS2 locus was preferentially introgressed between Bos taurus and domestic yak [60]. Further, this locus shows an abnormal allele frequency trajectory (Fig 4C), indicating that it may be a target of balancing selection. PPT PowerPoint slide
PNG larger image
TIFF original image Download: Fig 4. Meta-analysis of within-ecoregion GPSM for Red Angus cattle. (a) Manhattan plot of per-variant Cochran’s Q p-values. Points colored green had significant Cochran’s Q (p < 1⨯10−5) and were significant in at least one within-region GPSM analysis (p < 1⨯10−5). (b) Ecoregion effect plots for lead SNPs from six loci from (a). Points are colored by ecoregion and are sized based on Cochran’s Q value. (c) Ecoregion-specific allele frequency histories for SNPs from (b), colored by ecoregion.
https://doi.org/10.1371/journal.pgen.1009652.g004
Continuous environmental variable envGWAS Using continuous temperature, precipitation, and elevation data as quantitative dependent variables in a multivariate envGWAS analysis of Red Angus animals, we identified 46 significantly associated SNPs (Fig 3G). These SNPs tag 17 loci, many of which are within 100 kb of positional candidate genes. Univariate envGWAS identified 23, 17, and 10 variants associated with temperature, precipitation, and elevation, respectively (S7 Fig). The most significant multivariate association in Red Angus is located on chromosome 29 within BBS1 (Bardet-Biedl syndrome 1), which is involved in energy homeostasis [61]. BBS1 mutant knock-in mice show irregularities in photoreceptors and olfactory sensory cilia [62] functions that are likely important to an individual’s ability to sense its local environment. This region was not significantly associated in any of the univariate analyses of environmental variables, and was not identified in any of the discrete ecoregion envGWAS. Of the other positional candidate genes identified in this Red Angus analysis, 9 have previously been implicated in adaptive functions in humans or cattle (Table J in S1 Text). For example, SNPs near GRIA4 were implicated in body temperature maintenance in cold stressed Siberian cattle [63]. Significant SNPs and their corresponding candidate genes for all three datasets are reported in S6 Table. While we found minimal candidate gene overlap between populations, we identified multiple shared biological pathways and processes (S7 Table) derived from lists of envGWAS positional candidate genes. Pathways in common between populations were driven by largely different gene sets. Across all populations, we identified the “axon guidance” pathway, and numerous other gene ontology (GO) terms related to axon development and guidance as enriched with environmentally-associated loci. Ai et al. (2015) suggested that axon development and migration in the central nervous system is essential for the maintenance of homeostatic temperatures by modulating heat loss or production [64]. In addition to axonal development, a host of other neural signaling pathways were identified in multiple populations. A genome-wide association study for gene-by-environment interactions with production traits in Simmental cattle by Braz et al. (2020) identified a similar set of enriched pathways [28]. These common neural signaling pathways identified by envGWAS are regulators of stress response, temperature homeostasis, and vasoconstriction [65]. We identified other shared pathways involved in the control of vasodilation and vasoconstriction (relaxin signaling, renin secretion, and insulin secretion). Vasodilation and vasoconstriction are essential to physiological temperature control in cattle and other species [66]. The ability to mount a physiological response to temperature stress has a direct impact on cattle performance, making vasodilation a prime candidate for environment-specific selection. Further, vasodilation and vasoconstriction likely also represent adaptation to hypoxic, high elevation environments. Pathways and processes identified by envGWAS signals are reported in S7 Table. To further explore the biology underlying adaptive signatures, we performed Tissue Set Enrichment Analysis of our envGWAS candidate gene lists. These analyses using expression data from humans and distantly related worms (C. elegans) both identified brain and nerve tissues as the lone tissues where envGWAS candidate genes show significantly enriched expression (S8–S11 Tables). Tissue-specific expression in the brain further supports our observed enrichment of local adaptation pathways involved in neural signaling and development.
[END]
[1] Url:
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1009652
(C) Plos One. "Accelerating the publication of peer-reviewed science."
Licensed under Creative Commons Attribution (CC BY 4.0)
URL:
https://creativecommons.org/licenses/by/4.0/
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/