(C) PLOS One
This story was originally published by PLOS One and is unaltered.
. . . . . . . . . .
Genomes from historical Drosophila melanogaster specimens illuminate adaptive and demographic changes across more than 200 years of evolution [1]
['Max Shpak', 'Laboratory Of Genetics', 'University Of Wisconsin Madison', 'Madison', 'Wisconsin', 'United States Of America', 'Hamid R. Ghanavi', 'Department Of Biology', 'Lund University', 'Lund']
Date: 2023-10
The ability to perform genomic sequencing on long-dead organisms is opening new frontiers in evolutionary research. These opportunities are especially notable in the case of museum collections, from which countless documented specimens may now be suitable for genomic analysis—if data of sufficient quality can be obtained. Here, we report 25 newly sequenced genomes from museum specimens of the model organism Drosophila melanogaster, including the oldest extant specimens of this species. By comparing historical samples ranging from the early 1800s to 1933 against modern-day genomes, we document evolution across thousands of generations, including time periods that encompass the species’ initial occupation of northern Europe and an era of rapidly increasing human activity. We also find that the Lund, Sweden population underwent local genetic differentiation during the early 1800s to 1933 interval (potentially due to drift in a small population) but then became more similar to other European populations thereafter (potentially due to increased migration). Within each century-scale time period, our temporal sampling allows us to document compelling candidates for recent natural selection. In some cases, we gain insights regarding previously implicated selection candidates, such as ChKov1, for which our inferred timing of selection favors the hypothesis of antiviral resistance over insecticide resistance. Other candidates are novel, such as the circadian-related gene Ahcy, which yields a selection signal that rivals that of the DDT resistance gene Cyp6g1. These insights deepen our understanding of recent evolution in a model system, and highlight the potential of future museomic studies.
Funding: Funding was provided from Vetenskapsrådet (2019-03536) and the Max Planck Center on Next Generation Insect Chemical Ecology to MCS, and from NIH NIGMS award 144 AAH7544 to JEP. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data Availability: All new sequence data used in the present study can be found in NIH SRA Bioproject PRJNA945389. All scripts used in the present study can be found at:
https://github.com/mshpak76/Museum_Drosophila_Genomes . These scripts can also be found, along with full SNP-level PBE results, at:
https://doi.org/10.5281/zenodo.8290185 .
In all, the Swedish and Danish museum collections contain 26 D. melanogaster specimens suitable for analysis. We here provide whole-genome analysis from these flies, obtaining from most of them relatively complete and high-quality genomes, and we compare these against modern fly genomes ( Fig 1J ). The roughly 200 year time frame of this analysis may encompass 3,000 fly generations [ 16 , 17 ], analogous to studying human evolution across 75,000 years (approximately the time before present when modern humans first colonized Eurasia from Africa (e.g., [ 18 – 21 ])). Examining changes in genomic diversity across 2 roughly century-scale time intervals, we find that the relationship between the Lund population and other European populations has changed over time. We identify a variety of strong candidates for the action of positive selection in each time interval, providing temporal context for previously known cases of selection, while also identifying novel putative selection targets. These findings illustrate the potential of museomic studies to deepen our understanding of recent evolution in a rapidly changing world.
From the collection in Copenhagen, there are 2 specimens from Passau, Germany ( Fig 1F ) collected by Joseph Waltl (1805 to 1888), undated but likely collected in the 1850s. Also from Copenhagen, there is 1 specimen collected by Rasmus William Traugott Schlick (1839 to 1916) on northeast Zealand, Denmark ( Fig 1G ), undated but likely collected in the 1880s. Lastly, we have a set of 15 specimens from the collection in Lund ( Fig 1H ), collector unknown, labeled “Zootis 1933”—the informal name of the former Zoological Department at Lund University ( Fig 1I ).
The precise timing, location, and collector for the 6 examined specimens is not easy to decipher. The Fallén samples, collected in Lund, are accompanied by an original note, stating (translated) “Imago and larvae in sawdust [unintelligible] can of raisins.” If the flies were caught by Fallén himself, a potential locality would then be his lodgings in central Lund ( Fig 1D ), and that the material was collected before 1811, when Fallén began to spend considerable time elsewhere. His first mention of Drosophila can be found in a letter to Zetterstedt dated 14 September 1810 (Zetterstedt archive, Lund University). The paralectotype samples examined from Zetterstedt’s collection are labeled Lund and Smol (i.e., Småland) respectively, and they are likewise difficult to precisely date. Based on Zetterstedt’s Diptera Scandinaviae disposita et descripta, he collected D. approximata in Lund (possibly at his home in central Lund; Fig 1E ). The specimen examined here from Småland (and likewise the lectotype) was attributed to Sven Ingemar Ljungh (1757 to 1828), civil servant and prominent naturalist, who resided at Skärsjö manor (Jönköping County, Småland—roughly 200 km from Lund). It is unclear whether these specimens are the ones mentioned in his treatise on Scandinavian Diptera, but if so, the flies would likely have been caught within a decade or so of Fallén’s samples. However, the owner of a given specimen may or may not have been its collector (e.g., Zetterstedt kept many of Fallén’s specimens in his collection). These Swedish specimens are, to the best of our knowledge, the earliest D. melanogaster available in any collection, potentially predating the Meigen holotypes kept in Paris by 2 decades.
The insect collections held in Lund and Stockholm are among the oldest in the world and contain many important samples, including the specimens used to erect the genus Drosophila (Fallén, 1814 to 1826) and multiple early 1800s specimens of D. melanogaster. These include 4 specimens from the collection of Carl Fredrik Fallén of Lund University (1764 to 1830; Fig 1B ). From the same time period, 3 specimens exist from the collection of Johan Wilhelm Zetterstedt (1785 to 1874, Fig 1C ), a student of Fallén, and from 1822 professor at Lund University. This material comprises the lectotype (not examined here) and 2 paralectotypes of Drosophila approximata, a junior synonym of D. melanogaster.
(A) Timeline showing the chronology of the analyzed fly samples, notable human events, and climate trends in Sweden (temperature data from SMHI, Sweden). (B) One of the 4 D. melanogaster specimens from the Fallén collection at the Swedish Natural History Museum (Stockholm). Insert shows the label (in Fallén’s handwriting) accompanying the specimen. Scale bar: 1 mm. (C) One of 3 D. melanogaster specimens from the Zetterstedt collection at the Biological Museum (Lund). Insert depicts the attached label. Scale bar: 1 mm. (D) Lydbecksa Gården in Lund, where Fallén had his lodgnings until 1811. (E) Sandgatan 5 in Lund, where Zetterstedt lived. (F) D. melanogaster specimen from the Natural History Museum of Denmark (Copenhagen), collected by Joseph Waltl in Passau, Germany. Scale bar: 1 mm. (G) D. melanogaster specimen from the Natural History Museum of Denmark (Copenhagen), collected by William Schlick on NE Själland, Denmark. Scale bar: 1 mm. (H) Fifteen D. melanogaster specimens from Lund, kept in collections of the Biological Museum (Lund). Scale bar: 5 mm. (I) The old department of Zoology building at Lund University. (J) A representation of the 6 contemporary population samples included in the present analysis. Photos: (B, C, F–H) C. Fägerström, (D) Leonard Axel Jägerskiöld/Göteborgs Naturhistoriska Museum, (E) Joseph Magnus Stäck/Lund University, (I) Lund University, (J) M. Stensmyr.
Here, we obtain and analyze genomes from historical D. melanogaster specimens of approximately 100 to 200 years in age, using minimally destructive techniques. The roughly 200 year time frame of our analysis should encompass the earliest stages of this ancestrally tropical species’ adaptation to a novel high latitude environment [ 15 ], along with profound human-mediated changes to the environment of this commensal species ( Fig 1A ). Hence, museomic study of European D. melanogaster offers the potential to reveal both demographic and adaptive changes during this dynamic time interval, against the backdrop of a well-annotated genome with detailed knowledge of gene function.
Drosophila melanogaster is a primary model system for population genetics. Recently, the availability of data from multiple time points has begun to improve the prospects for distinguishing natural selection from neutral evolution in this system. Genome sequencing of wild-caught and laboratory-maintained D. melanogaster individuals has enabled the study of genomic evolution across seasonal (e.g., [ 12 , 13 ]) and decadal [ 14 ] time scales. However, no previous study has investigated genome-scale variation from historical D. melanogaster specimens nor from any species of such a minute animal.
A handful of prior studies have targeted whole-genome sequences from museum insect specimens. These have included proof-of-concept sequencing studies [ 3 , 4 ] and investigations of the taxonomic status of museum specimens [ 5 – 7 ], with 1 study including butterfly specimens up to 150 years old [ 8 ]. A few insect museomic studies have sequenced multiple individuals to examine potential targets of recent positive selection, such as focusing on the responses of honey bees to the introduction of a parasitic mite [ 9 , 10 ] and the emergence of the Colorado potato beetle as a crop pest [ 11 ].
Museum collections around the world contain billions of specimens collected over the last centuries. These collections serve as a window back in time—to an era before industrialization and modern agricultural practices—and could as such provide insights into issues ranging from insecticide resistance to climate change. The analysis of so-called historical DNA (hDNA), including from museum samples, accordingly holds much promise, but obtaining high-quality genomic data from older specimens remains a significant challenge [ 1 , 2 ].
Results and discussion
Most historical flies yielded genomes of comparable quality to contemporary flies For proof-of-principle, we first attempted to extract DNA from one of the 1933 “Zootis” specimens (Fig 1H). Briefly, the whole fly—still attached to the pin—was first incubated in a lysis buffer at 56°C for 2 h, after which the fly was washed, dehydrated, and returned to the collection (Fig 2A). The extraction process left the fly largely unharmed, except for a slight loss in coloration and some shrinkage of the abdomen (Fig 2B). The extracted DNA was subsequently prepared for Illumina short read sequencing. As expected, the DNA was highly fragmented with an average fragment length of about 50 bp. The Illumina run yielded approximately 24 million reads, and after reference mapping, the ensuing genome was 91% complete with a 15.7× mean sequencing depth. In short, the employed method was indeed able to extract enough DNA to piece together a largely complete genome, while doing minimal harm to the specimen. PPT PowerPoint slide
PNG larger image
TIFF original image Download: Fig 2. Sequencing of historical Drosophila specimens yields genomes with slightly reduced genomic coverage and some evidence of cytosine deamination. (A) Illustration of the process of extracting and sequencing DNA from pinned insect specimens. Flies were submerged in a lysis buffer for 2 h, then put through 3 washing steps and dried, before being restored to the museum repository. Genomic libraries were then prepared and sequenced using the Illumina NovaSeq 6000 platform. Further details appear in the Methods section. (B) Images of the appearance of a representative specimen (from 1933) before and after the DNA extraction protocol, illustrating minor changes in coloration. Photo: C. Fägerström. (C) The fraction of genomic coverage is plotted as a function of mean per-site read depth for each historical fly genome and for a representative set of contemporary inbred line genomes from Lyon, France. For a given level of depth, most historical fly genomes achieve levels of genomic coverage only slightly lower than contemporary genomes. (D) The relationship between depth and unique (singleton) variant-calling. Only the lowest quality historical fly genomes show notable elevations in the rate of singleton variants called. (E) The fraction of singleton variants called as adenine or thymine is elevated for historical fly genomes, particularly those with greater singleton rates, potentially due to cytosine deamination. Data underlying panels C–E can be found in S1 Table.
https://doi.org/10.1371/journal.pbio.3002333.g002 Having verified the method, we applied it to the other 25 specimens, and we were able to generate genomes from all but one. While the mean depth of coverage per site varies among the historical fly genomes (Fig 2C), the median value among these genomes was approximately 20×, which is comparable to typical D. melanogaster genomes generated from contemporary source material [22]. We found, however, that with similar mean depth, genomic coverage tends to be slightly lower in historical versus contemporary genomes (often with only 2% to 3% less of the genome covered; Fig 2C and S1 Table). The modest reduction in genomic coverage for most of the historical fly genomes may relate to their consistently shorter insert sizes—typically only about 50 bp (S1 Table), compared to contemporary genomes that are generally sequenced from fragments of a few hundred bp in length. The shorter DNA fragments from the old specimens are expected to limit the potential of reads to map uniquely in repetitive genomic regions, and hence such regions may not receive base calls in the historical genomes, leading to somewhat reduced genomic coverage as observed above. Genomes from the 1800s flies had mean insert sizes averaging 47.6 bp, while those from the 1933 collection averaged 50.8 bp. Although the distributions of mean insert sizes overlap between time points, they nevertheless differ significantly (P = 0.0188; Mann–Whitney two-tailed U test). There was a marginal correlation between fragment length and depth of coverage (correlation r = 0.40, P = 0.055). A stronger and significant association was found between fragment length and genomic coverage (r = 0.63, P = 0.0001). The shorter length of historical fly DNA fragments may be due to DNA degradation, which may also explain the reduced depth and genomic coverage obtained from a minority of the examined specimens. Among the nine 1800s samples, 4 genomes were characterized by a low average read depth of under 10 reads per site and/or a low genomic coverage of less than 80%. From the 1933 collection, 1 additional sample was excluded from further analyses due to extremely low coverage and read depth. Most samples with low coverage also had low mean read depth (Fig 2C and S1 Table), and the overall correlation between these quality metrics was statistically significant (r = 0.404, P = 0.018). We note that all 7 low-quality genomes (as defined above) were from female flies (out of 14 total females), whereas all 11 males yielded genomes with higher coverage and depth. This sex difference is statistically significant (two-tailed binomial P = 0.0057) and may be due to female flies being larger and potentially experiencing greater decay before dehydration. Regarding the 6 historical fly genomes retained with lower sequencing depth, inferences of diploid genotype are not statistically robust at sites for which an individual has few reads. Therefore, we treated the lower quality genomes as though they were haploid for autosomal and female X chromosomes by assigning only a single, most frequent nucleotide at each site. Consequently, there were effectively 14 rather than 18 autosome alleles for the 1800s samples and 28 rather than 30 for the 1933 autosomes. Similarly, treating each low-quality female X as haploid reduced allele counts to 11 and 21 for 1800s and 1933 X chromosomes, respectively (from what would otherwise be X chromosome allele counts of 15 and 23). The DNA degradation inferred above could also lead to incorrect base identification via anomalous sample-specific variants at individual sites. Therefore, we assessed whether singleton variants (defined as an allele called only once among the full panel of 1800s, 1933, and contemporary genomes analyzed, for the subset of sites with no missing data among these genomes) were enriched in samples with low depth and coverage. Indeed, we found a higher relative incidence of singleton variants unique to genomes with lower genomic coverage (two-tailed binomial P < 0.01; Fig 2B) and to a lesser extent those with low depth as well (P = 0.05). One of the most common types of nucleotide degradation is cytosine deamination, which can lead to spurious thymine enrichment, which has been documented extensively in ancient DNA samples (e.g., [23,24]). We therefore assessed the extent to which A/T nucleotides are overrepresented among singletons from each genome. We found a higher fraction of A/T sites in singletons among the historical samples than in modern genomes (Fig 2D and 2E and S1 Table), which could reflect the presence of some degradation-associated errors. Whereas the modern genomes had singleton A/T proportions of approximately 55%, many of the historical genomes’ singletons were 60% to 74% AT, presumably due to cytosine deamination. Although most historical genomes do not show meaningfully elevated singleton rates, our findings regarding singleton AT% justify the exclusion of singleton alleles for analyses that are sensitive to rare alleles and small frequency differences, such as demographic inferences based on allele frequency change over time. In contrast, analyses that search for window-scale outliers for elevated allele frequency differentiation between samples should be little affected by the low rates of spurious singleton variants indicated by these analyses.
The historical fly genomes show signs of relatedness and inbreeding A preliminary analysis of pairwise genetic distances among the genomes indicated that some specific pairs of individuals from within the early 1800s and 1933 Sweden samples had genetic similarity implying relatedness (S2 Table). Because the inclusion of related individuals in estimates of allele frequency generates artifactual nonindependent sampling, we sought to identify and mask individual chromosomal intervals displaying “identity by descent” (IBD) from downstream population genetic analyses (Fig 3A). PPT PowerPoint slide
PNG larger image
TIFF original image Download: Fig 3. Significant levels of IBD were observed within and between many historical fly genomes, reflecting inbreeding and relatedness. (A) Illustration of the genealogical relationships that may result in IBD due to relatedness between individuals (brown) or inbreeding within individuals (blue). Dashed lines illustrate masked segments, excluded from downstream analysis. (B) For the 1800s Sweden-Lund and Germany-Passau samples, the proportions of each genome detected as IBD due to apparent inbreeding (given along the diagonal) were low or zero. Above the diagonal, the proportions of each pairwise genomic comparison detected as IBD due to apparent relatedness varied within the Lund sample from zero to over 75% (the latter indicating a close familial relationship), while the 2 Passau genomes showed low-level relatedness IBD. Individuals are labeled based on their collection century, location, and arbitrary individual number—e.g., 18SL4 is 1800s Sweden-Lund individual 4, whereas DZ and GP refer to the Denmark (Zealand) and Germany (Passau) samples, respectively (S1 Table). (C) The same quantities are given for the Lund 1933 population sample, revealing some genomes with high levels of inbreeding, and a broad spectrum of relatedness among genomes. Since the probability of IBD detection is different for diploid and haploid sequences, males (with a single X chromosome) are given in italics. Lower quality genomes treated as haploids are shown in bold. IBD regions were masked in subsequent analyses.
https://doi.org/10.1371/journal.pbio.3002333.g003 We identified IBD regions between a given pair of individuals based on windows in which they had unusually low genetic distance between them (closer to the distance expected if at least 1 haplotype from each individual was IBD than to the distance expected for unrelated haplotypes). We inferred IBD windows spanning various lengths, up to whole chromosome arms (S2 and S3 Tables). Results indicated varying levels of relatedness among individuals within the 1800s and 1933 Sweden samples, up to levels expected for first or second order relatives, and a much lower level of relatedness between the 2 specimens from 1840s Germany (Fig 3B and 3C). In addition to IBD between genomes, we found that a few of the 1933 Lund samples had higher levels of within-genome IBD due to inbreeding—several samples from 1933 had long regions of chromosomes with depleted heterozygosity (Fig 3C and S2 and S4 Tables), consistent with mating between close relatives. The 6 Swedish flies from the early 1800s also all showed relatedness, with some pairs even at sibling-level IBD. Since close relatedness is only expected from spatially and temporally proximate samples, we conclude that in spite of museum records tentatively linking them with 3 different scientists and 2 localities, these specimens were all from the same Lund collection event. These results may reflect sharing of specimens among contemporary scientists and/or imperfect record keeping through time. The extensive IBD observed here contrasts with the low levels of relatedness observed from contemporary population samples, even when multiple fly lines founded from the same trap site were analyzed [22]. The elevated IBD of the historical fly genomes could have resulted from sampling methodology (e.g., propagation of collected insects in the lab) and/or from a lower population density of flies in earlier eras (as suggested below). We masked individual chromosomal intervals to prevent IBD from biasing downstream analyses—masking one individual’s genotypes within each pairwise relatedness IBD block, and counting only 1 allele per individual within an inbreeding IBD block (see Methods). Following IBD masking, we were left with averages of 9 to 11 sampled chromosomes for autosomal arms for the combined 1800s samples (versus 14 before masking) and 10 to 14 (versus 28) for the 1933 samples. The average sample size of X chromosome sites was then 8 for the 1800s set and 11 for 1933, versus 12 and 20 before masking. Based on the mosaic pattern of window masking due to intergenomic and intragenomic IBD across individuals on each chromosome arm, post-filtering sample size varied around these averages, and downstream analyses included sample size thresholds for site inclusion (see Methods).
Genomic diversity suggests transiently elevated genetic differentiation We then sought to examine how within-population genetic diversity and between-population genetic differentiation have changed across more than 2 centuries. We focused initially on the X chromosome in order to avoid the influence of inversions (e.g., [25]). Patterns of nucleotide diversity (π) in Lund appeared to show a decline with time (Fig 4A), which could reflect the action of genetic drift. However, we note that damage-induced errors may inflate diversity estimates, especially for the 1800s sample (as potentially indicated by its slightly elevated D xy values). In addition, the modern Lund sample represents a published pool-seq data set, analyzed via a distinct pipeline including the masking of rare variants, which may deflate its diversity estimate. PPT PowerPoint slide
PNG larger image
TIFF original image Download: Fig 4. Genetic differences among historical and modern fly populations suggest transient differentiation in the Lund population. (A) Genetic differentiation between population samples is summarized by F ST (below diagonal, red heat map) and by the rate of pairwise sequence differences D xy (above diagonal, blue heat map). Along the diagonal, the rate of pairwise sequence differences within populations (nucleotide diversity) is given. Data reflects the X chromosome only (to avoid effects of inversions), with down-sampling to 4 alleles per population at each site. The Lund population appears to show decreasing diversity with time, subject to caveats regarding historical genome quality and the differentially processed Lund 2010s pool-seq data. When Lund allele frequencies are compared to other European populations (via F ST ), differentiation increases between the 19th and 20th century Lund samples, then decreases again between the 20th and 21st century samples. (B) PCA results are shown for the same individually sequenced population samples included in (A), in terms of loadings from the first 2 principal components. Here again, the Lund 1933 population shows elevated differentiation, whereas genomes from the three 1800s locations cluster with modern European genomes. These results reflect an analysis of X chromosome variation, with each population sample downsampled to 5 individuals, with non-inbred historical genomes lacking close relatedness chosen for this analysis. Data underlying this figure panel can be found in S5 Table.
https://doi.org/10.1371/journal.pbio.3002333.g004 Patterns of genetic differentiation (as indicated by F ST ) revealed a curious temporal trajectory (Fig 4A). When Lund samples are compared to modern samples from around Europe, between the 1800s and 1900s sampling points, Lund became more differentiated from other populations. In contrast, between the 1900s and 2000s sampling points, Lund became more similar to other modern European populations. Concordant patterns were observed from principal components analysis (PCA) based on X-linked variation (Fig 4B and S5 Table). Here, 1800s Lund genomes were seen to cluster with modern European genomes, whereas the 1933 Lund genomes form a distinct cluster. This unexpected pattern of transient local differentiation could indicate that distinct evolutionary forces had predominant influences on genomic diversity between these time points. Increased genetic differentiation during the 1800s to 1933 period could reflect an elevated influence of genetic drift. This interval likely represents the earliest days of D. melanogaster occupying northern Europe [15], perhaps as shifting human activities only just permitted the species to survive in this region, and hence we might predict that the initial abundance of this species was low. The climate during this period was also colder than at present (Fig 1A), which may have also limited local population sizes. Consistent with this hypothesis, Zetterstedt [26] described the species as rare in southern and middle Sweden, and it was absent from the earlier Scandinavian insect descriptions of Linnaeus [27,28] and Fabricius [29]. In contrast, the genomic homogenization between Lund and other European regions that occurred between 1933 and the 2010s could reflect increased migration through time, in conjunction with reduced genetic drift in fly populations that may have grown with time. Both of those demographic changes might be predicted based on concurrent changes in human activity during the 20th century. This time period featured a 5-fold increase in the human population of Lund and the surrounding region, along with a warming climate (Fig 1A), both of which may have been conducive to population growth in this human commensal insect (and hence reduced drift). In parallel, increased human transportation, particularly the expanded shipping of fruit and other commodities, would be expected to facilitate increased long-distance dispersal of D. melanogaster. We note that such temporal shifts in demography would not be detected by conventional demographic inference based on modern samples, and it is unclear how well simple demographic models can recapitulate the effects of more complex histories on genomic diversity. We also examined whether changes in the frequencies of common polymorphic inversions may have influenced genetic diversity between time points (see Methods). We found only a single copy of (2L)t from one 1933 Lund genome, and no inversions among the 1800s genomes (S6 Table). In the modern Lund sample, (2L)t is at 13% frequency, while other tested inversions appear to be absent (S6 Table). Hence, although we cannot rule out changes in inversion frequencies through time, inversions do not appear to be a likely driver of genomic differentiation between time points. We further investigated a simple model of the demographic history of Lund by using a Bayesian approach to identify the best-fitting effective population size (N e ) for each time interval. For each chromosome arm separately, we used simulations to identify the value of N e that best matched the empirical mean change in allele frequency at non-singleton SNPs between time points (see Methods). In contrast to longer-term estimates of N e in this species, which are often on the order of 1 million (e.g. [30]), our point estimates of Lund N e were only 2,500 to 3,300 diploid individuals for the 1800s to 1933 time interval and 2,400 to 3,200 for the 1933 to 2010s interval (S7 Table). These values are of the same order of magnitude as an estimate of 9,500 from a northeast United States population between 1975 and 2015 [14]. Estimates from both studies share, however, a key limitation in assuming that genetic drift (along with random sampling variance) is responsible for all observed frequency differences. To address how well this assumption holds, we compared π from coalescent simulations based on our estimated N e values to those from the empirical data, using a demographic model previously estimated for a French population [30] as a starting point for the pre-1800s history. While this history allowed us to match the Lund 1800s π reasonably well, our low estimates of N e yielded simulated values of π that were less than half of those actually observed in our empirical data (S7 Table). These results indicated that drift-only models did not provide an accurate approximation of the Lund population’s demographic history and that true N e values were probably greater than our estimates. Instead, migration may have played an important role in shifting allele frequencies without decreasing π, as suggested above. Estimating a reasonable spatiotemporal demographic model that incorporates both local population sizes and geographic patterns of genetic structure and gene flow may require more extensive sampling of population genomic data across space and time.
Genome scans to detect recent shifts in allele frequencies One principal goal of this study is to identify instances of elevated genetic differentiation between time points that may reflect the action of recent positive selection. Our SNP-based genome scan focused on population branch excess (PBE [31]), a statistic that quantifies elevated allele frequency differentiation in a focal population compared to 2 other reference/outgroup populations. PBE builds upon the F ST -based population branch statistic (PBS [32]) but is more tailored to detecting selection that is specific to the focal population. To search for selection between the early 1800s and 1933, we defined the 1800s samples as the focal population and included the Lund 1933 and Lund 2010s samples as outgroups. To focus on the 1933 to 2010s interval, Lund 1933 was the focal population and Lund 2010s and France were the outgroups. PBE was evaluated in diversity-scaled windows averaging 4.6 kb in length. In the absence of a suitable demographic model to provide a null hypothesis for the extent of genetic differentiation, we focused our analysis on the top 1% of window PBE values. However, we emphasize that any outlier-based genome scan may entail both false positives and false negatives. Our SNP-based PBE scans revealed notable outlier peaks reflecting elevated allele frequency change for each time period (Fig 5A and 5B). Overall, we identified 190 outlier regions for the 1800s to 1933 time interval (referred to below as scan A) and 173 for the 1933 to 2010s interval (scan B), with some of these regions incorporating multiple outlier windows (S8 Table). Since not all outliers may represent true positive targets of recent natural selection, we only discuss loci that had among the most extreme frequency changes across each century, referring to these outlier regions based on their ranked maximal window PBE values. This restricted focus is motivated by uncertainty regarding the actual number of loci under selection in each time interval, which we do not attempt to estimate due to limitations in the data and in our ability to identify a realistic neutral model. PPT PowerPoint slide
PNG larger image
TIFF original image Download: Fig 5. Allele frequency changes across centuries reveal candidates for positive selection. (A) Allele frequency change for approximately 5 kb genomic windows, focusing on the 1800s–1933 time interval (scan A), as quantified by PBE. Genes within top outlier regions that are discussed in the text are indicated. (B) PBE is shown for the same genomic windows, focused on the 1933–2010s time interval (scan B). (C) Window-level PBE is shown for a roughly 50 kb region focusing on the top outlier region from scan A, with the position of the known selection target ChKov1 indicated. (D) Window-level PBE is shown for a roughly 100 kb region focusing on the top outlier region from scan B, with the position of the known selection target Cyp6g1 indicated. Data underlying this figure can be found in S8 Table.
https://doi.org/10.1371/journal.pbio.3002333.g005
Top SNP-based outliers inform classic examples of positive selection In both time intervals, the top window PBE value was also associated with the widest outlier region, which is consistent with relatively strong positive selection. For the 1800s to 1933 interval, the top outlier (labeled A1) was a 31 kb region that included the gene CHKov1 (Fig 5C), which is thought to encode an ecdysteroid kinase [33]. This gene is known to be associated with a protein-truncating transposon insertion under recent positive selection in non-African D. melanogaster populations, which was reported to correlate with resistance to an organophosphate pesticide [34]. Transposon insertion and gene duplication at the CHKov1/CHKov2 locus has, however, also been found to correlate with resistance to infection by the D. melanogaster sigmavirus (DmelSV) [35]. Given the lack of widespread insecticide usage during this period, the timing of selection indicated by our analysis rather favors the antiviral role of this locus as a potential selective advantage. It was estimated that D. melanogaster acquired this rhabdovirus around 200 years ago [36], which predicts that our 1800s interval should have included key stages in the evolution of resistance to DmelSV. Consistent with this hypothesis, we also found the gene refractory to Sigma P (ref(2)p) within outlier region A20. Ref2p activates the Toll pathway and has been shown to confer resistance to DmelSV [37]. CHKov1 and ref(2)p are of particular interest because mutational variants associated with viral resistance have been previously identified [35,38]. In CHKov1, several SNPs have alleles in linkage disequilibrium with a transposon insertion in the gene [34] associated with viral resistance [35]. Based on 7 such SNPs scored in our data, we estimated that the resistant haplotype increased from 16.7% in the 1800s sample to 100% in the 1933 and 2015 Lund samples. The occurrence of 3 resistance-associated haplotypes among the 1800s genomes (carried in heterozygous form by 2 early 1800s Lund flies, as well as the later Zealand sample) could indicate that selection on this allele began prior to the collection of our 1800s flies. The fixation of the resistant haplotype in the 1933 and 2010s Lund samples suggests a complete (or nearly complete) sweep, mirroring findings from some, but not all recently collected D. melanogaster population samples [34]. The outlier PBE score for ref(2)p for the 1800s time interval appeared not to be driven by the short, complex deletion identified by [38]. Based on inspecting reads, this variant appeared to exist in 2 early 1800s Lund flies and two 1933 Lund flies, implying modest frequency change between temporal samples. The highest SNP-level PBE score at this gene was at a non-synonymous variant over 1 kb downstream from the complex deletion (at R5 position 2L:19544138, a threonine/serine polymorphism). The top outlier for the 1933 to 2010s interval (denoted region B1) spanned 75 kb and included a known target of insecticide resistance evolution, Cytochrome P450 6g1 (Cyp6g1, shown in Fig 5D). This gene is associated with resistance to dichlorodiphenyl-trichloroethane (DDT) and other insecticides in D. melanogaster. As with ChKov1, there is prior evidence for positive selection associated with transposon insertions and gene duplication [39–42]. Here, the novel and widespread deployment of DDT, introduced in 1944, provides a clear hypothesis for a selective pressure driving the observed frequency changes at SNPs linked to the locus in question.
Genomic and temporal distributions of outlier frequency changes As hinted by the exclusively X-linked examples shown in Fig 7, for the 1933 to 2010s scan specifically, there was an abundance of X-linked loci among the top outliers: although the X chromosome constitutes less than one fifth of the analyzed genome, 12 of the top 15 outlier regions were X-linked (S8 Table). Explanations of this enrichment could include greater X chromosome effects of sampling variance (although this effect should have been stronger for the 1800s scan), genetic drift, or positive selection (potentially facilitated by X chromosome hemizygosity [70]). However, neither scan was meaningfully enriched for X-linked loci if the full list of outliers was considered (with the X contributing 21.7% of outliers for scan A and 18.6% for scan B). The excess of top outliers specifically for scan B could indicate that more of the X chromosome’s selection in this interval was from hard sweeps (as suggested by [71]) and therefore more readily detectable from window genetic differentiation [43]. However, the narrower peaks of differentiation shown in Fig 7 could indicate a role of X-linked soft sweeps as well. Among the top outliers discussed from each period, none appeared within the full list of 1% PBE outliers from the alternate time period. At least for genes promoting adaptation to local environmental conditions (such as temperature or day length), we might predict that alleles driving adaptation in the 1933 to 2010s time interval would have been adaptive in the 1800s to 1933 time interval as well. It is possible that each local population initially possessed only a subset of adaptive variants, and others arrived later via migration (or mutation). Alternatively, in the 1800s to 1933 interval, the frequencies of causative variants at some genes may have still been too early in their rise to generate extreme outlier PBE signals, but poising them to rise more quickly in the 1933 to 2010s interval. Another possibility is contingency—some variants may have only become beneficial after other adaptive changes had occurred due to epistatic interactions among circadian genes. Alternatively, this lack of outlier overlap could indicate that selective pressures differed between eras or that adaptive events tended to complete within a given time period rather than spanning between them.
Further insights into previously implicated selection targets Three other top outliers for the 1933 to 2010s interval included genes previously implicated in recent positive selection in European D. melanogaster. Prosap (in region B6) is thought to encode a structural component of the postsynaptic density [72] and affects circadian rhythm [73]. This gene was previously found to display signals of parallel adaptation across cold-adapted D. melanogaster populations [64], and it harbors a polymorphic transposon insertion associated with signals of selection [74]. Region B7 is centered on phantom (phm), which encodes a cytochrome P450 protein involved in ecdysteroid biosynthesis [75]. Variation at this gene revealed a selective sweep signal in a European D. melanogaster population [76], and the protein displays rapid evolution between species [77]. Further down the list, the region B21 contained polyhomeotic-proximal (ph-p), a developmental and chromatin-modifying gene (e.g., [78,79]), which also displayed evidence of a selective sweep in a European population [80], associated with altered thermosensitivity of gene expression [81]. For these outliers, our findings offer insights regarding the timing of positive selection.
Potential shifts in membranes and metabolism We also applied a Gene Ontology (GO) enrichment analysis to the broader sets of top 1% PBE outlier regions, using the permutation approach initially described in [82], separately to the outliers detected from each temporal scan. From the 1800s to 1933 scan, some of the most enriched categories are related to cell shape and morphogenesis, including actin-mediated contraction and membrane invagination (S9 Table). Enrichment was also observed for membrane (and vesicle) organization, compatible with the existence of population differences in membrane lipid composition [83] and this time interval coinciding with the species’ earliest known occupation of northern Europe [15]. Curiously, “response to DDT” was also enriched in the 1800s to 1933 interval, due to 3 cytochrome P450 outlier genes other than Cyp6g1 in separate outlier regions, potentially reflecting pre-insecticide roles of insecticide-related genes (as with the ChKov locus described above). For the 1933 to 2010s interval, insecticide-related GO categories were not enriched among outliers. Instead, metabolic processes dominated the top results (including amide, fatty acid, lipid, and peptide metabolism), and categories related to cell division were enriched as well (S9 Table). As with membrane composition, metabolism is also known to vary significantly between D. melanogaster populations from contrasting thermal environments (e.g., [84,85]).
[END]
---
[1] Url:
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002333
Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/