(C) PLOS One
This story was originally published by PLOS One and is unaltered.
Reconstructing the history of founder events using genome-wide patterns of allele sharing across individuals [1]
Rémi Tournebize (Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America; Center for Computational Biology); Gillian Chu (Department of Electrical Engineering)
Date: 2022-09
Founder events play a critical role in shaping genetic diversity, fitness and disease risk in a population. Yet our understanding of the prevalence and distribution of founder events in humans and other species remains incomplete, as most existing methods require large sample sizes or phased genomes. We therefore developed ASCEND, a method that measures the correlation in allele sharing between pairs of individuals across the genome to infer the age and strength of founder events. We show that ASCEND can reliably estimate the parameters of founder events under a range of demographic scenarios. We then apply ASCEND to two species with contrasting evolutionary histories: ~460 worldwide human populations and ~40 modern dog breeds. In humans, we find that over half of the analyzed populations have evidence for recent founder events, associated with geographic isolation, modes of sustenance, or cultural practices such as endogamy. Notably, island populations have lower population sizes than continental groups, and most hunter-gatherer, nomadic and indigenous groups have evidence of recent founder events. Many present-day groups, including Native Americans, Oceanians and South Asians, have experienced more extreme founder events than Ashkenazi Jews, who have high rates of recessive diseases due to their known history of founder events. Using ancient genomes, we show that the strength of founder events differs markedly across geographic regions and time, with three major founder events related to the peopling of the Americas and a trend of decreasing strength of founder events in Europe following the Neolithic transition and steppe migrations. In dogs, we estimate that most breeds experienced extreme founder events within the last 25 generations, concordant with the establishment of many dog breeds during the Victorian era. Our analysis highlights a widespread history of founder events in humans and dogs and elucidates some of the demographic and cultural practices related to these events.
A founder event occurs when a small number of ancestral individuals gives rise to a large fraction of the population. Founder events reduce genetic variation and increase the risk of recessive diseases. Despite their importance in evolutionary and disease studies, our understanding of their prevalence and properties in humans and other species remains limited, as most existing methods require large sample sizes or phased genomes. Here, we present ASCEND, a flexible method for inferring the timing and strength of founder events that is suitable for sparse datasets with few samples or limited coverage. ASCEND provides reliable estimates across a wide range of demographic scenarios. By applying it to data from two species (humans and dogs), we document a widespread history of recent founder events in both species and provide insights into the demographic processes related to these events. Our analysis helps to identify groups with strong founder events that should be prioritized in future studies, as they offer a unique opportunity for biological discovery and for reducing disease burden through mapping of recessive disease-causing genes and pathways, as previously shown in studies of Ashkenazi Jews and Finns.
In this article, we introduce ASCEND (Allele Sharing Correlation for the Estimation of Non-equilibrium Demography), which extends the idea introduced in Reich et al. (2009) with two major improvements: (i) we estimate the strength of founder events in addition to their timing, and (ii) we use the fast Fourier transform to make the approach computationally tractable, allowing us to survey large datasets. Further, we provide theoretical expectations for using allele sharing correlation to estimate the parameters of founder events. We report simulations under a range of demographic scenarios to assess the reliability of ASCEND and apply the method to empirical datasets from two species, humans (using present-day and ancient samples) and modern dog populations, to characterize the spatiotemporal patterns of founder events in their history.
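The computational gain from the fast Fourier transform can be illustrated generically. The sketch below assumes, for illustration only, that per-SNP values (for example, centered allele-sharing scores) have already been summed into equally spaced genetic-distance bins; the function names are hypothetical and this is not ASCEND's implementation. Summing products of bin values at every lag naively costs O(n^2), whereas the FFT (via the Wiener-Khinchin relation) returns all lags in O(n log n).

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    sign = 1 if inverse else -1
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def lagged_sums_fft(binned, max_lag):
    """Return [sum_i binned[i] * binned[i + lag] for lag in 0..max_lag] via the FFT."""
    n = len(binned)
    size = 1
    while size < 2 * n:  # zero-pad so the circular correlation has no wrap-around
        size *= 2
    spectrum = fft(list(binned) + [0.0] * (size - n))
    power = [z * z.conjugate() for z in spectrum]  # |F_k|^2
    corr = fft(power, inverse=True)  # inverse transform, normalized below
    return [corr[lag].real / size for lag in range(max_lag + 1)]


def lagged_sums_naive(binned, max_lag):
    """Reference O(n * max_lag) computation of the same quantity."""
    n = len(binned)
    return [sum(binned[i] * binned[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


if __name__ == "__main__":
    signal = [0.3, -0.1, 0.5, 0.2, -0.4, 0.0, 0.1, 0.6, -0.2, 0.05]
    fast = lagged_sums_fft(signal, 5)
    slow = lagged_sums_naive(signal, 5)
    print(all(abs(a - b) < 1e-9 for a, b in zip(fast, slow)))  # True
```

Zero-padding to at least twice the signal length is the design choice that turns the FFT's circular correlation into the ordinary (linear) sum over pairs.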
A third class of methods, introduced by Reich et al. (2009), uses the average allele sharing correlation across individuals in a population to infer the time of the founder event [ 18 ]. This approach relies on the insight that a founder event introduces long-range linkage disequilibrium (LD), or allelic correlation, among nearby loci co-inherited from a common ancestor by a pair of individuals in the population. Because recombination occurs in each generation, these associations break down over time. Thus, by measuring the decay of allelic association across the genome at sites that are shared between pairs of individuals (i.e., identical by state (IBS)), the time of the founder event can be inferred. A major advantage of this approach is that it requires neither phased data nor explicit identification of IBD segments, making it suitable for sparse datasets.
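To make the decay argument concrete, the sketch below fits an exponential curve to an allele sharing correlation measured at increasing genetic distances. It assumes a model of the form a(d) = A exp(-2td) + c, with d in Morgans and t the founder age in generations; the factor of 2 (recombination along the two lineages leading to the pair of individuals), the offset c and the 0.1 to 30 cM distance range are assumptions of this sketch. Because the model is linear in A and c once t is fixed, a profile least-squares search over integer t suffices for illustration; this is not necessarily how ASCEND performs its fit.

```python
import math


def fit_founder_decay(dist_cm, corr, max_t=500):
    """Profile least squares for corr(d) ~ A * exp(-2 * t * d) + c, d in Morgans (assumed model).

    For each candidate integer t, A and c follow from ordinary linear regression
    of corr on exp(-2 t d); the t with the smallest residual sum of squares wins.
    """
    d_morgans = [d / 100.0 for d in dist_cm]
    n = len(corr)
    mean_y = sum(corr) / n
    best = None
    for t in range(1, max_t + 1):
        x = [math.exp(-2.0 * t * d) for d in d_morgans]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, corr))
        amp = sxy / sxx
        offset = mean_y - amp * mean_x
        sse = sum((yi - (amp * xi + offset)) ** 2 for xi, yi in zip(x, corr))
        if best is None or sse < best[0]:
            best = (sse, t, amp, offset)
    _, t_hat, a_hat, c_hat = best
    return t_hat, a_hat, c_hat


if __name__ == "__main__":
    # Hypothetical noise-free curve: founder event 40 generations ago.
    dists = [0.1 * i for i in range(1, 301)]  # 0.1 to 30 cM
    curve = [0.02 * math.exp(-2 * 40 * d / 100) + 0.001 for d in dists]
    print(fit_founder_decay(dists, curve))  # first element is 40
```

Profiling out A and c keeps the example dependency-free; a real analysis would typically also use a finer or continuous search over t and report uncertainty (for example, by resampling over chromosomes).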
There are two main classes of methods currently available for characterizing founder events: polymorphism-based and haplotype-based approaches. Polymorphism-based approaches leverage the observed patterns of genetic variation, either by studying the density of heterozygous sites in a region (e.g., PSMC and MSMC [ 10 , 11 ]) or by analyzing the allele frequencies of markers in a population (such as ∂a∂i, PopSizeABC or fastsimcoal [ 12 – 14 ]), to recover the time to the most recent common ancestor across the genome. These methods make inferences based on the mutation clock and thus have low resolution at recent timescales [ 10 , 11 ]. Haplotype-based methods characterize the distribution of IBD segments in a population to infer recent demographic history [ 15 , 16 ]. The most commonly used IBD-based methods, e.g., DoRIS [ 16 ] and IBDNe [ 15 ], recommend phased data, which can be obtained by computational phasing of population data; however, this typically requires large numbers of high-quality samples. Errors in computational phasing (“switch errors”) can bias estimates of IBD segment lengths and, in turn, population size inference in real data [ 16 ]. As the rate of phasing errors is inversely proportional to the sample size and the length of IBD segments [ 17 ], IBD-based methods tend to be noisy and less accurate at older timescales and when applied to sparse datasets with small sample sizes or limited coverage, such as ancient genomes.
Despite the importance of founder events in evolutionary and disease studies, our understanding of their number, tempo, and strength in humans and other species remains limited. Characterizing the timing and strength of founder events is the first step toward understanding their impact on neutral and deleterious genetic variation. In particular, the estimated timing of the founder event (henceforth, founder age) can inform us about the cultural or environmental factors underlying it. Further, it offers insights about the expected length of genomic segments that are inherited identical-by-descent (IBD) among individuals in the population. The strength of a founder event, measured as the reduction in population size due to the bottleneck, is informative about the probability of fixation of alleles, including deleterious and disease-associated variants [ 9 ]. Together, these parameters can reveal the evolutionary history of founder events and their impact on genetic diversity and disease risk.
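The link between founder age and IBD segment length can be made explicit with a standard population-genetic approximation (a textbook result, not a finding of this study): two individuals whose shared ancestor lived t generations ago are separated by 2t meioses, so the length in Morgans of a segment they share is approximately exponentially distributed with rate 2t, i.e., with mean 1/(2t) Morgans or 50/t cM. The founder ages used below are hypothetical.

```python
import math


def expected_ibd_length_cm(founder_age_generations):
    """Mean length (cM) of a segment shared IBD via an ancestor t generations ago.

    Textbook approximation: with 2t meioses separating the two individuals,
    segment length in Morgans is ~ Exponential(rate = 2t), so the mean is 1/(2t) M.
    """
    if founder_age_generations <= 0:
        raise ValueError("founder age must be positive")
    return 100.0 / (2.0 * founder_age_generations)


def prob_longer_than(length_cm, founder_age_generations):
    """P(segment length > length_cm) under the same exponential approximation."""
    return math.exp(-2.0 * founder_age_generations * length_cm / 100.0)


if __name__ == "__main__":
    for t in (10, 50, 200):  # hypothetical founder ages (generations)
        print(t, expected_ibd_length_cm(t))
    # 10 5.0
    # 50 1.0
    # 200 0.25
```

Recent founder events therefore leave long shared segments, which is why their signal is detectable as long-range allelic correlation.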
Founder events reduce genetic variation in a population, decrease the efficacy of selection to remove deleterious variants, and increase the risk of recessive diseases [ 1 ]. Understanding the history of founder events can thus be helpful for learning about the cultural and demographic events leading to population bottlenecks, and importantly, for mapping functional and disease variants. Gene mapping efforts in founder populations—including Ashkenazi Jews, Finns, Amish, and French Canadians—have resulted in the discovery of numerous disease-causing mutations in each group [ 7 ] and refined our understanding of disease architectures [ 8 ].
A founder event occurs when a new population is formed by a subset of individuals from a larger group or when the original population goes through a reduction in size due to a bottleneck [ 1 ]. Founder events have played a critical role in shaping genetic diversity in many species, including humans. For instance, anatomically modern humans spread worldwide in the past ~50,000–100,000 years through successive bottlenecks and admixture events [ 2 ]. Many human populations have further undergone severe founder events in the recent past (within the past few hundred generations) due to geographical isolation (e.g., Finns [ 3 ]), historical migrations (e.g., Roma [ 4 ]) or cultural practices (e.g., Amish [ 5 ] and Ashkenazi Jews [ 6 ]).
Results of ASCEND for all dog breeds that passed the filtering criteria and showed evidence for significant founder events in the Hayward dataset (see Methods ). The breeds are grouped according to their role as reported by the American Kennel Club database. The outer (colored) rim represents the founder intensity, with bar height proportional to the founder intensity. Within each role category, breeds are ordered by decreasing founder intensity in the clockwise direction. The inner (gray) rim represents the founder age (in generations before present), with bar height proportional to the founder age. The width of the bars is inversely proportional to the number of breeds within each role category. Icons were retrieved from openclipart.org.
Application of ASCEND to canids revealed significant evidence of founder events in all populations analyzed, with an average intensity of ~25.3% across breeds (ranging from 1.3% in village dogs to 77.7% in Boxers) ( Fig 4 and S5 Table ). Note that a subset of the SNPs used in this analysis was ascertained in Boxers, and the canid reference genome (CanFam) also derives from a Boxer. These factors could increase the power to detect founder events in Boxers, though they should have minimal impact on other breeds [ 52 – 54 ]. Interestingly, founder intensities differed significantly across the traditional roles of dog breeds, with higher estimates in traditionally agricultural or sedentary breeds (non-sporting dogs or working dogs) than in breeds used for hunting or sports (hounds or sporting dogs) (P = 6×10 −4 , Kruskal-Wallis test) ( Fig 4 ). We obtained highly correlated results in the Sams and Hayward datasets (founder age: Pearson’s r = 0.9, P = 4×10 −4 ; founder intensity: r = 0.92, P = 1.9×10 −4 ) (Fig AD in S1 Text ). Founder events in all breeds occurred very recently, within the past 25 generations ( Fig 4 and S5 Table ). The most recent founder event we inferred occurred ~6 generations ago (in Gordon Setters) and the oldest ~24 generations ago (in Bulldogs) ( Fig 4 ). Assuming a generation time of 3–5 years [ 55 , 56 ], the 25-generation window corresponds to ~75–125 years.
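The conversion from generations to years above is simple arithmetic; the sketch below makes the bounds explicit using the 3–5 year generation time stated in the text. The function name is illustrative. It shows that the 25-generation window corresponds to ~75–125 years and that the oldest individual estimate (~24 generations, Bulldogs) corresponds to ~72–120 years.

```python
def generations_to_years(generations, years_per_generation):
    """Convert a generation count into a (low, high) range of years.

    years_per_generation is a (min, max) tuple, e.g. (3, 5) for dogs.
    """
    low, high = years_per_generation
    return generations * low, generations * high


if __name__ == "__main__":
    dog_generation_time = (3, 5)  # years per generation, as assumed in the text
    for label, gens in [("Gordon Setter", 6), ("Bulldog", 24), ("upper bound", 25)]:
        print(label, generations_to_years(gens, dog_generation_time))
    # Gordon Setter (18, 30)
    # Bulldog (72, 120)
    # upper bound (75, 125)
```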
To demonstrate the general applicability of our method, we applied ASCEND to data from modern dog breeds. The domestication and establishment of various dog breeds were accompanied by severe founder events and selection [ 49 ]. To reconstruct the history of strength and timing of these founder events, we used data from ~6,000 domesticated dogs (~200 breeds, including populations of village dogs and mixed breeds) from two publicly available datasets from the Sams [ 50 ] and Hayward [ 51 ] studies. We excluded individuals with evidence of recent inbreeding or close relatedness and considered only groups with more than 5 individuals. After filtering, we retained 52 populations belonging to 40 unique breeds and two village dog populations (Methods, S4 Table , Notes S6 in S1 Text ).
We estimated the oldest founder event in Upper Paleolithic individuals sampled from the Taforalt Cave cemetery (eastern Morocco). These individuals were associated with the Iberomaurusian culture of microlithic bladelet technology from North Africa [ 47 ]. A recent study found that these individuals can be modeled as a mixture of Near Eastern hunter-gatherers (Natufians) and sub-Saharan Africans, with no apparent gene flow from the Epigravettian culture of Paleolithic southern Europe [ 47 ]. Given the staged appearance of microlithic bladelet technologies and their rare geometric forms, it has been suggested that population structure, population bottlenecks or intermittent isolation of populations in North Africa could explain the lack of continuity in stone tool cultures in this region [ 48 ]. Indeed, we found evidence for a significant founder event ~16,700 years BP with a founder intensity of ~20% [16.3%–23.7%], comparable to the strongest founder event inferred in present-day human populations ( S3 and S5 Figs). Our results support the hypothesis that a relatively small group of individuals developed the Iberomaurusian tool culture.
Turning to time transects across Europe, we compared hunter-gatherers, Neolithic farmers and Bronze Age individuals across diverse regions. These groups span major changes in lifestyle, technology and genetic ancestry across Europe, coupled with high rates of population growth. The frequency of founder events decreased over time, from 100% in hunter-gatherers to 78% in Neolithic Europeans and 55% in Bronze Age individuals. Further, the intensity of founder events decreased significantly across these three periods (Kruskal-Wallis test, P = 1×10 −5 ) ( Fig 3C ). Interestingly, founder ages in hunter-gatherer, Neolithic and Bronze Age populations have overlapping intervals, with medians across groups ranging from 9,000 to 15,000 years (P > 0.05). Moreover, founder intensity in these populations was strongly and positively correlated with their sampling age (P < 10 −7 ). These results suggest that the founder events could be partly or fully related to a shared founding bottleneck in the ancestors of Europeans, and that founder intensity decreased over time, possibly due to gene flow from Near Eastern farmers and Steppe pastoralists.
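Several comparisons in this section rely on the Kruskal-Wallis rank test. As an illustration only, the sketch below computes the H statistic (with the usual tie correction) for three groups; with three groups, H is compared with a chi-squared distribution with 2 degrees of freedom, whose upper tail has the closed form exp(-H/2). The intensity values in the usage example are hypothetical, not the study's data, and in practice a library routine such as scipy.stats.kruskal would be used.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(g1, g2, g3):
    """Return (H, p) for three groups; p uses the chi-squared(df=2) tail exp(-H/2)."""
    groups = [g1, g2, g3]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]
        h += sum(r) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n); undefined if all values tie.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    h /= 1.0 - ties / (n ** 3 - n)
    return h, math.exp(-h / 2.0)


if __name__ == "__main__":
    hunter_gatherers = [0.25, 0.30, 0.22, 0.28]  # hypothetical intensities
    neolithic = [0.12, 0.15, 0.10, 0.14]
    bronze_age = [0.05, 0.08, 0.06, 0.09]
    print(kruskal_wallis_three_groups(hunter_gatherers, neolithic, bronze_age))
```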
Recent analyses have shown that present-day Europeans are a mixture of three major ancestry groups related to ancient European hunter-gatherers, Anatolian farmers, and Eurasian Steppe pastoralists [ 44 ]. Previous comparisons of diversity patterns in ancient samples suggested that ancient hunter-gatherers had very low genetic diversity [ 45 ]. However, these inferences were based on low-coverage data, for which calling diploid genotypes is challenging and estimates of heterozygosity can therefore be uncertain [ 45 , 46 ]. We applied ASCEND to ancient west Eurasian samples and compared the intensity of founder events (a proxy for the loss of genetic diversity in the population) across groups. Specifically, we used European hunter-gatherers (sampled 6,394–9,721 years BP), Near Eastern farmers (4,749–9,958 years BP) and Eurasian Steppe pastoralists (3,505–7,530 years BP). To expand our sample size, we considered founder events that occurred within 300 generations before the individuals lived (this is older than the 200-generation threshold used elsewhere, but we inspected the fitted curves to confirm that they were reliable). Across the three groups, the frequency of founder events was similar, ranging from 90% to 100%. However, the average founder intensity was significantly higher in European hunter-gatherers (15.4%–29.5%, 95% interpercentile range) than in Near Eastern farmers (9.1%–12.4%; P = 0.0053) or Steppe pastoralists (4.9%–10.6%; P < 10 −20 ) ( Fig 3C ). Similar results were obtained when we restricted the comparison to groups with founder ages below 200 generations (P < 0.01 for both pairwise comparisons). This highlights the impact of modes of sustenance (foraging, farming and pastoralism) on human population sizes, mirroring the pattern seen in present-day individuals.
In the ancient Americas, we observed evidence for three main founder events. Using the Lapa do Santo individuals from Brazil, dated to 9,500 years BP, we estimated a founder event that occurred ~11,900–12,700 years ago. We obtained similar dates using the Sumidouro samples from Brazil, though many samples in this group have unusually high transition-to-transversion ratios, suggesting potential data quality issues [ 42 ]. These founder ages overlap estimated dates of the settlement of the Americas [ 43 ]; however, they should be viewed as tentative given the data quality issues. Second, we inferred a bottleneck ~5,500–6,000 years BP in ancient samples from three Caribbean islands (Cuba, Dominican Republic and Bahamas) with diverse sampling ages, ranging from 470 to 2,300 years BP. Third, two ancient populations from the Pacific coast of North America (San Nicolas Island from California and Aleut from Alaska) showed evidence for founder events between 2,400 and 2,800 years BP. Unlike the estimates from present-day samples, these dates overlap archaeological evidence for the peopling of the respective islands, as the ancient samples lived closer in time to the first settlements of these islands. Finally, we found that ancient groups from the Americas had markedly stronger founder events than present-day individuals from this region, with on average four-fold higher intensity.
Applying ASCEND, we found that 36% (n = 60) of the ancient groups had significant founder events that occurred within 200 generations before the individuals lived ( S5 Fig and S2 Table ). Overall, the median intensity across these groups was ~9.2%, around four-fold higher than the median in present-day individuals. The strongest founder intensity was estimated in the ancient samples from Cuba, nearly two-fold higher than in present-day Onge, suggesting that Cuba was settled by a small group of individuals or has maintained historically low population sizes for many generations. In general, we observed that both the frequency and the average founder intensity were significantly higher in groups from the Americas (frequency of 53% out of 9 analyzed groups with intensity of 10.4%–45.9%) than in groups from West Eurasia (32% of 127 groups with intensity of 2.2%–30.7%) (P = 3×10 −5 ). We inferred that the founder events occurred ~5–200 generations before the individuals lived. Accounting for the mean generation time and sampling age of the ancient specimens, these results translate to estimated founder ages of ~500–17,000 years before present (BP). We also observed that the frequency of founder events decreased toward the present, with the lowest frequency in the most recent periods despite large sample sizes there ( S7 Fig , Kolmogorov–Smirnov test: P = 2×10 −12 ), reflecting recent human population growth.
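The conversion from a founder age measured in generations before the individuals lived to years BP can be written as founder age (BP) = sampling age (BP) + generations × generation time. The sketch below applies it in reverse to the Lapa do Santo estimate reported above (sampled ~9,500 years BP; founder event ~11,900–12,700 years BP), assuming for illustration the 28-year generation time used elsewhere in this article for present-day groups; the generation time actually used for ancient groups is not specified in this section, and the function names are hypothetical.

```python
def founder_age_bp(sampling_age_bp, generations_before_sampling, years_per_generation=28):
    """Years BP of a founder event dated relative to the sampled individuals.

    years_per_generation=28 is an illustrative assumption, not a value from this section.
    """
    return sampling_age_bp + generations_before_sampling * years_per_generation


def generations_before_sampling(founder_bp, sampling_age_bp, years_per_generation=28):
    """Inverse conversion: generations between the founder event and the sampling age."""
    return (founder_bp - sampling_age_bp) / years_per_generation


if __name__ == "__main__":
    # Lapa do Santo: sampled ~9,500 years BP, founder event ~11,900-12,700 years BP.
    low = generations_before_sampling(11_900, 9_500)
    high = generations_before_sampling(12_700, 9_500)
    print(round(low), round(high))  # 86 114
    print(founder_age_bp(9_500, 100))  # 12300
```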
To investigate founder events deeper in the human evolutionary past, we applied ASCEND to ancient DNA samples from the v44.3 release of AADR, which includes more ancient genomes than v37.2. We limited the analysis to unrelated individuals from populations with at least 5 individuals. For ancient DNA samples, it is difficult to find a matching outgroup population, as samples come from different timescales and geographic locations. We therefore used the within-population allele sharing weighted covariance, which avoids bias introduced by the choice of outgroup and by the use of pseudo-haploid genotypes (Methods). We verified the reliability of this approach by simulations (Note S2.7.3 in S1 Text ). After filtering, we retained 1,947 individuals from 164 worldwide ancient populations, though the vast majority of the samples were from the Americas and West Eurasia (Notes S3 in S1 Text ). The sampling ages range from ~100 to 15,000 years, as inferred from radiocarbon dating or the cultural context of the specimens ( S1 Table ). Most groups older than 15,000 years had fewer than five samples and were therefore excluded from further analysis.
To understand the demographic processes leading to these extreme founder events in South Asia, we investigated the timing of the founder events across diverse ethno-linguistic groups. Previous studies have shown that most present-day Indians have ancestry from two divergent ancestral populations: Ancestral North Indians (ANI), related to Central Asians and Iranians, and Ancestral South Indians (ASI), distantly related to the Onge population [ 18 , 41 ]. Ancient DNA analyses have further shown that both ANI and ASI are in turn mixtures of ancient groups of South Asian hunter-gatherers, Iran Neolithic farmers, and Eurasian Steppe pastoralists [ 27 ]. ASCEND revealed founder ages ranging from ~115 years ago (Gujjar) to ~3,500 years ago (Gujaratis), with most founder events occurring within the past 1,000 years ( S2 Table ). Comparing the founder ages with dates of ANI-ASI admixture [ 27 ], we found that founder ages significantly postdated the admixture dates in most groups (P = 2.2×10 −5 ) [ 27 ]. There were no significant differences in founder ages across speakers of the four major language families spoken in India (i.e., Austro-Asiatic, Dravidian, Indo-European and Tibeto-Burman) (Kruskal-Wallis test: P > 0.05), suggesting that the spread of languages is not associated with the founder events in these groups.
Among contemporary populations, we found that the majority (64%) of South Asian groups in the HO37 dataset had more extreme founder events than in AJs ( Fig 2 ). To investigate this history in more detail, we analyzed a larger dataset of 1,662 individuals from 249 South Asian ethno-linguistic groups genotyped on the Human Origins array [ 25 ] (referred to as IndiaHO dataset). After filtering, we found that 56% (66 out of the 118 groups) showed significant evidence of founder events (Methods, S1 and S3 Figs). Estimated founder intensities were strongly correlated with the IBD scores (a measure of the strength of founder events) calculated in an earlier study [ 25 ] (Pearson’s r = 0.95, P < 10 −5 ), highlighting the reliability of using allele sharing correlations to infer the strength of the founder event (Notes S5.2 in S1 Text ). In concordance with patterns seen in other worldwide regions, we observed indigenous and tribal groups (n = 23) had significantly stronger founder events than other groups (n = 15) (Kruskal-Wallis test: P = 0.024) ( S2 Table and Fig 3B ).
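The agreement between founder intensities and previously published IBD scores is summarized by Pearson's correlation coefficient. A minimal, dependency-free sketch of that statistic follows; the paired values in the usage example are hypothetical, and the significance test behind the reported P values is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    intensity = [0.02, 0.05, 0.08, 0.11, 0.15]  # hypothetical founder intensities
    ibd_score = [1.1, 2.3, 3.8, 5.2, 7.4]       # hypothetical IBD scores
    print(round(pearson_r(intensity, ibd_score), 3))
```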
Native American groups have experienced major population declines associated with European colonization. The precise extent and timing of this decline, however, are still debated [ 37 ]. Application of ASCEND to seven Native American groups from Central and South America showed evidence for significant recent founder events ( S2 Table ). Despite recent European gene flow in most groups [ 38 ], the median founder intensity in Native Americans was almost three-fold higher than in AJ, ranging from two-fold (Quechua) to seven-fold (Rapa Nui) higher than in AJ ( Fig 2 ). We inferred that the founder events occurred ~200–500 years ago ( Fig 2 ), postdating the European colonization of the Americas [ 39 ]. The strongest founder event, inferred in Rapa Nui, occurred ~260 years ago, coinciding with the migration of Europeans to the island [ 40 ].
We detected strong founder events related to nomadic lifestyle and modes of sustenance across the analyzed populations. In Africa, many hunter-gatherer groups had significant founder events, including Biaka pygmies, Mbuti pygmies and Ju|ʾhoan hunter-gatherers from South Africa ( Fig 2 ). These founder events occurred recently, within the past 10–20 generations. Our estimate for Mbuti pygmies (18–24 generations ago) is consistent with, but more precise than, previous estimates (10–100 generations [ 34 ]). We also documented strong founder events in nomadic groups from Yemen Desert and in Bedouins ( Fig 2 ). Previous studies have highlighted high rates of consanguineous marriages in Bedouins [ 35 , 36 ]. Given that we removed recent relatives from our analysis, these results indicate that Bedouins likely experienced strong recent founder events in addition to a history of consanguineous marriages [ 36 ]. Most of the studied indigenous Northeast Asian groups, including Aleut, Chukchi, Eskimos and Yakut, had evidence for extreme founder events (with median intensity almost three-fold higher than in AJ) in the past 1,000 years ( Fig 2 ).
We show the variation in estimated founder intensity across groups as violin plots, in three panels. Each violin plot includes a boxplot, the number of populations (n) in each group, the mean ± standard deviation and the total number of individuals used in the analysis. Note that within each panel, the violins have equal areas. (A) Continental vs. island populations. This plot shows the variation in founder intensities estimated for present-day populations in the HO37 dataset classified according to geography. (B) Tribal vs. non-tribal groups in South Asia. This plot shows the variation in founder intensities estimated in the South Asian groups from the IndiaHO dataset. (C) Ancient hunter-gatherers, Near Eastern farmers, European Neolithic farmers, Steppe pastoralists and Bronze Age populations. This plot shows the variation in estimated founder intensities for ancient groups in the HO44 dataset, classified based on their mode of sustenance. Below the number of individuals used in the analysis, we report the median radiocarbon age of each category in years BP (yBP). To increase the number of groups in each category, we included populations whose estimated founder age was less than 300 generations before the sampling age (the default for other analyses was 200 generations).
We observed that island populations had more extreme founder events than continental groups. Using data from 16 island and 97 continental groups, we found that island groups had on average ~2.5-fold higher founder intensity than continental groups (P = 3×10 −3 , bootstrap resampling) ( Fig 3A ). After the Onge in South Asia, populations in Oceania (n = 4 populations) and Southeast Asia (n = 6) had some of the strongest founder events. For instance, the founder intensity inferred in island groups from Papua New Guinea, the Philippines and Taiwan is almost five- to ten-fold higher than in AJ. In Europe, groups from Iceland, Malta, Orkney and Sardinia had founder intensities on par with or higher than AJ, suggesting that these groups experienced strong historical bottlenecks. In most cases, the founder ages postdate the estimates for the first settlement of the islands, suggesting ongoing population bottlenecks in many groups after their initial habitation ( S2 Table ).
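The island-versus-continental comparison reports a fold difference with a bootstrap P value. The exact resampling scheme is not described in this section, so the sketch below shows one common variant under stated assumptions: groups are resampled with replacement within each class to obtain a percentile interval for the ratio of mean founder intensities, and the fraction of bootstrap ratios at or below 1 serves as a rough one-sided measure. The values and function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def bootstrap_ratio(island, continental, n_boot=10_000, seed=1):
    """Percentile bootstrap for the ratio mean(island) / mean(continental).

    Each class is resampled with replacement, keeping its own sample size.
    Returns (observed ratio, 2.5th percentile, 97.5th percentile,
    fraction of bootstrap ratios <= 1).
    """
    rng = random.Random(seed)
    observed = mean(island) / mean(continental)
    ratios = []
    for _ in range(n_boot):
        isl = rng.choices(island, k=len(island))
        con = rng.choices(continental, k=len(continental))
        ratios.append(mean(isl) / mean(con))
    ratios.sort()
    low = ratios[int(0.025 * n_boot)]
    high = ratios[int(0.975 * n_boot) - 1]
    frac_at_or_below_one = sum(r <= 1.0 for r in ratios) / n_boot
    return observed, low, high, frac_at_or_below_one


if __name__ == "__main__":
    island = [0.12, 0.08, 0.20, 0.05, 0.15]  # hypothetical intensities
    continental = [0.03, 0.06, 0.02, 0.05, 0.04, 0.07]
    print(bootstrap_ratio(island, continental))
```

A fixed seed makes the resampling reproducible; the number of bootstrap replicates bounds how small a resampling-based P value can be reported.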
Our dataset includes samples from 11 Jewish groups, including Ashkenazi, Caucasus (Georgian), Middle Eastern (Turkish, Iranian and Iraqi), African (Moroccan, Libyan, Tunisian and Ethiopian), and Indian (Cochin) Jewish communities. We observed significant evidence of founder events in most Jewish groups, except in Ethiopian and Turkish Jews ( S2 Fig ). While many studies have focused on the founder events in AJ, we found that the founder intensity in most other Jewish groups was higher than in AJ, with the exception of Middle Eastern Jews. These results are in line with previous analyses based on runs of homozygosity (ROH) or IBD [ 31 , 32 ]. The estimated founder ages varied significantly across Jewish communities, ranging from 280 to 1,300 years ago. These dates point to recent population bottlenecks that have affected the genetic diversity of each group, yet they are older than would be expected from consanguineous marriages, which leave very long ROH [ 32 ]. The strongest and most recent founder event was inferred in Cochin Jews, with a timing similar to previous reports [ 33 ].
Across worldwide populations, we identified 53 groups that have experienced more extreme founder events (with significantly higher founder intensity) than AJs, who have high rates of recessive diseases due to their history of founder events [ 1 , 21 – 23 ]. These populations are particularly interesting from a population and medical genetics perspective to understand the genetic consequences of population bottlenecks. Below we highlight a few notable patterns of founder events across worldwide groups and provide detailed results in S2 Table .
Results of ASCEND for present-day populations in the Human Origins v37 dataset that passed filtering criteria and showed significant evidence of founder events (see Methods ). Each point shown represents a population and the vertical segment represents the age of its associated founder event (where the segment length is proportional to the founder age). To avoid overplotting in certain areas, we shifted the location of a few populations and indicated their original location (black diamond point) with an arrow getting darker towards the original location. The color gradient of the points and segments is proportional to the estimated founder intensity. Points with a black border represent populations which have experienced significantly stronger founder events than Ashkenazi Jews (shown in legend for reference). The strongest founder event is estimated for the Andamanese population Onge (21.2%). The world map was obtained from the R package maps with GPL-2 public license.
We applied ASCEND to study the global patterns of founder events in recent human history. We found that 61% of the analyzed populations (113 out of 184) experienced a significant recent founder event within the past 200 generations ( Fig 2 ). The most extreme founder event (with the highest intensity) was observed in the Onge population from the Andaman Islands (20.6%–21.2%), almost 10-fold higher than in AJ ( Fig 2 ). The Onge are a demographically small and historically isolated population [ 28 ]. Demographic records suggest that this population has maintained a historically low population size and has a current census size of ~100 individuals ( http://censusindia.gov.in/ ). Across continental groups, the frequency of founder events varied significantly, with the highest frequency in Oceania (80% out of 5 groups) and the Americas (78% out of 9 groups) and the lowest in Europe (38% out of 30 populations) ( Fig 2 ). The average founder intensity also differed significantly across continental groups (Kruskal-Wallis test, P = 7×10 −5 ). The founder ages ranged from ~10 generations (in Aleuts) to 195 generations (in Icelanders), i.e., ~280–5,460 years, assuming 28 years per generation [ 29 , 30 ]. We found no correlation between founder age and founder intensity (P = 0.99), suggesting that ASCEND can reliably disentangle the two parameters.
Lastly, we confirmed that our results are reliable in groups with complex demography, particularly those involving admixture events. We compared the direct estimates of founder ages and dates of admixture inferred using genomic dating methods such as GLOBETROTTER and ALDER [ 26 , 27 ]. Across 64 worldwide populations, there was no significant correlation between estimated founder ages and average dates of admixture (P = 0.77 for ASCEND and GLOBETROTTER; P = 0.10 for ASCEND and ALDER) ( S3 Table ). This suggests that the inferred founder ages are not confounded by long ancestry blocks inherited through admixture, as seen in simulations (Notes S2.4 in S1 Text ).
We first assessed the reliability of ASCEND in real data by comparing our results with previous publications. ASCEND extends the allele sharing correlation statistic introduced by Reich et al. (2009), which was applied to date founder events in India. Applying ASCEND to that dataset, we obtained highly concordant results for all groups except one (Sahariya), for which the fit in the original study appeared noisy (Notes S5.1 in S1 Text). We also applied ASCEND to Ashkenazi Jews (AJ) and Finns, whose histories of founder events have been studied previously [3,21–23]. Using nine Finns and seven AJ individuals in the HO37 dataset, we obtained significant evidence of founder events in both groups (S2 Table). We inferred that the founder event in Finns occurred ~120–245 generations ago, consistent with the separation of the western and eastern areas of Finland and the arrival of the Corded Ware Complex in this region [3,24]. As in previous reports, we found that the founder intensity in Finns was higher than in AJ [25]. In AJ, we inferred that the founder event occurred 23–51 generations ago with an intensity of 0.013–0.021 (95% confidence intervals; henceforth, intensities are reported as percentages for readability, i.e., 1.3%–2.1%). Our estimates are consistent with a previous study that used 128 whole genome sequences of AJ and inferred a founder age of ~25–50 generations ago and an effective population size during the bottleneck of ~250–420, which translates to an intensity of ~1.8–3% (assuming an average bottleneck duration of 15 generations) [21,23]. This demonstrates that ASCEND is reliable in real data with few individuals and SNP genotypes alone.
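The conversion from a bottleneck effective size to a founder intensity quoted above can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only, using the definition I_f = D_f/(2N_f) given with Fig 1 and the assumed 15-generation bottleneck duration; the function name is hypothetical.

```python
# Illustrative sketch; the function name is hypothetical, not part of ASCEND.
def founder_intensity(duration_gens: float, bottleneck_ne: float) -> float:
    """Founder intensity I_f = D_f / (2 * N_f)."""
    if duration_gens < 0 or bottleneck_ne <= 0:
        raise ValueError("need D_f >= 0 and N_f > 0")
    return duration_gens / (2.0 * bottleneck_ne)


# Bottleneck N_f of ~250-420 over an assumed duration of 15 generations.
weakest = founder_intensity(15, 420)    # the larger N_f gives the weaker event
strongest = founder_intensity(15, 250)
print(f"I_f range: {weakest:.1%} to {strongest:.1%}")
```

This prints `I_f range: 1.8% to 3.0%`, matching the ~1.8–3% range in the text.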
We applied ASCEND to genome-wide data from 3,102 present-day individuals genotyped on the Affymetrix Human Origins array, which are part of the Allen Ancient DNA Resource (AADR, v37.2 release) (referred to as HO37 henceforth). We limited our analysis to groups with at least 5 samples. To ensure we were characterizing founder events, and not consanguinity, which can also lead to long-range IBD sharing among individuals, we removed all individuals with evidence of recent relatedness (Methods). After filtering, we retained 2,310 present-day individuals from 184 groups (S1 Table and Notes S3 in S1 Text). Unless otherwise stated, we used a random set of 15 unrelated individuals to estimate the cross-population allele sharing correlation.
Finally, we used the fast Fourier transform (FFT) to make the allele sharing correlation calculations computationally tractable. The naïve approach of computing pairwise correlations across hundreds of thousands of markers (n) in the genome can be exceedingly slow in large datasets, as its runtime scales as O(n^2). Following Loh et al. (2013), and given the similarity with admixture LD calculations, we computed allele sharing correlations using the FFT (Methods). Using simulations across a range of parameters, we show that the FFT approach is up to 50 times faster and provides nearly identical results to the naïve implementation (Table D in S1 Text). This allows us to apply the method to large genomic datasets.
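The FFT idea can be illustrated on a simplified version of the problem. In the sketch below, markers are assigned to bins on a grid of genetic distance, and for one vector of per-marker values (for example, centered allele sharing values for a single pair of individuals) we sum the products over all marker pairs whose bins are k steps apart. The naïve double loop is quadratic in the number of markers, whereas the FFT obtains the same sums as an autocorrelation of the binned vector in O(n + B log B) for B bins. This is a sketch under those assumptions, not ASCEND's implementation; the function names, the 0.1 cM bin width and the restriction to lags of at least one bin are our own choices.

```python
# Illustrative sketch; function names and binning choices are hypothetical.
import numpy as np


def binned_pair_sums_naive(values, bins, max_lag):
    """Sum values[i] * values[j] over marker pairs whose bins differ by 1..max_lag (O(n^2))."""
    out = np.zeros(max_lag + 1)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            lag = abs(int(bins[j]) - int(bins[i]))
            if 1 <= lag <= max_lag:
                out[lag] += values[i] * values[j]
    return out


def binned_pair_sums_fft(values, bins, max_lag):
    """Same sums, computed as an FFT autocorrelation of the binned values."""
    n_bins = int(bins.max()) + 1
    if max_lag >= n_bins:
        raise ValueError("max_lag must be smaller than the number of bins")
    grid = np.zeros(n_bins)
    np.add.at(grid, bins, values)          # pool markers that fall in the same bin
    size = 2 * n_bins                      # zero-pad so the circular correlation does not wrap
    spectrum = np.fft.rfft(grid, n=size)
    autocorr = np.fft.irfft(spectrum * np.conj(spectrum), n=size)[: max_lag + 1]
    autocorr[0] = 0.0                      # lag 0 (same-bin pairs and self-products) is not used
    return autocorr


rng = np.random.default_rng(0)
pos_cm = np.sort(rng.uniform(0, 30, size=500))   # toy genetic map positions
bins = (pos_cm / 0.1).astype(int)                # 0.1 cM bins
vals = rng.normal(size=pos_cm.size)
naive = binned_pair_sums_naive(vals, bins, 50)
fast = binned_pair_sums_fft(vals, bins, 50)
print(np.allclose(naive, fast))
```

In a full analysis, the same binned sums would be accumulated over all pairs of individuals, together with the normalizing terms needed to turn sums of products into correlations.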
An important feature of ASCEND is that it does not require phased data, which makes it suitable for datasets with small sample sizes and low coverage, such as ancient genomes. To test the reliability of ASCEND on sparse datasets, we investigated the impact of: (i) sample size; (ii) missing data; and (iii) features of ancient DNA samples, which include (i) and (ii) along with the use of pseudo-haploid genotypes. Pseudo-haploid calling is a common practice in ancient DNA studies: because sequencing coverage is low, the genotype at each site is represented by a single allele drawn at random from the reads mapped to that site [20]. We observed that ASCEND estimated the parameters of the founder event accurately for target groups with more than 5 samples (Fig R in S1 Text), even with high rates (up to 70%) of missing data (Fig S in S1 Text). Comparing the results obtained using pseudo-haploid and diploid genotypes, we found that the estimated founder ages were similar, though the founder intensity was underestimated with pseudo-haploid genotypes (Fig V in S1 Text). This is expected because pseudo-haploid data lack heterozygous sites, so cross-over events at short distances are missed, leading to an inflation in the variance of the allele sharing correlation. We show that by applying a correction based on sample heterozygosity (referred to as the weighted allele sharing covariance), we obtain unbiased estimates of the founder intensity even in samples with large amounts of missing data (Methods, Fig W in S1 Text). We therefore use the weighted allele sharing covariance for ancient DNA and sparse present-day samples.
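Why pseudo-haploid data contain no heterozygous sites can be seen by converting known diploid genotypes. This sketch assumes genotypes coded as 0/1/2 copies of the alternative allele, with NaN for missing data, and writes each pseudo-haploid call as a homozygote (0 or 2), a common convention for pseudo-haploid samples. Sampling directly from the diploid genotype stands in for sampling a random read, which is a simplification; the function name is hypothetical.

```python
# Illustrative sketch; pseudo_haploidize is a hypothetical helper, not an ASCEND function.
import numpy as np


def pseudo_haploidize(genotypes, rng):
    """Turn diploid 0/1/2 genotypes (NaN = missing) into pseudo-haploid calls.

    At each heterozygous site one allele is drawn with probability 1/2 and the
    call is written as a homozygote (0 or 2). Homozygous and missing entries are
    left unchanged, so the output contains no heterozygous sites.
    """
    g = np.asarray(genotypes, dtype=float)
    out = g.copy()
    het = g == 1
    picks_alt = rng.random(g.shape) < 0.5
    out[het] = np.where(picks_alt[het], 2.0, 0.0)
    return out


rng = np.random.default_rng(42)
diploid = np.array([[0, 1, 2, np.nan],
                    [1, 1, 0, 2]], dtype=float)
pseudo = pseudo_haploidize(diploid, rng)
print(pseudo)
```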
Next, we simulated a target population with a more complex history involving gene flow and multiple population bottlenecks. We generated data for a target population that received gene flow ~100 generations ago, followed by a founder event (Notes S2.4.1 in S1 Text). Applying ASCEND to the target population, with one of the ancestral groups as the outgroup for computing cross-population allele sharing, we obtained accurate estimates of both founder age and intensity and observed no significant impact of admixture on the inferred parameters (Fig K in S1 Text). We also simulated data for a target group with a history of two successive founder events separated by 10–200 generations (Notes S2.3 in S1 Text). Under this scenario, ASCEND reliably recovered the intensity of the stronger founder event. For severe bottlenecks (N_f = 5), we recovered the age of the more recent founder event, whereas for less severe bottlenecks the estimated age fell between the two founder ages, close to an average of their timings weighted roughly by intensity (Fig I in S1 Text).
(A) Model of founder event. Consider a population that has experienced a founder event in its past. This history can be divided into three main periods (from the most ancient to the most recent): a period P_o in which the population has a constant effective population size N_o, followed by a period P_b in which the effective population size is reduced to N_f for a duration of D_f generations, until T_f generations before present. The population then recovers and its size returns to N_o during the period P_p. We simulated two populations, population A (target), which experienced a founder event, and population O (outgroup, no founder event, constant size N_o), which diverged 1,800 generations ago. We ran ASCEND and compared the estimated parameters with the true parameters of the founder event in population A. (B) Accuracy in estimating founder age. The X-axis shows the true simulated founder age in generations before present (gBP) and the Y-axis shows the founder age estimated by ASCEND. The diagonal represents the expectation (i.e., estimated values equal to true values). For D_f > 0, the diagonal is drawn as a thick band proportional to the duration of the founder event. (C) Accuracy in estimating founder intensity. We define the founder intensity as the ratio of the bottleneck duration to twice the effective population size during the bottleneck, i.e., I_f = D_f/(2N_f). The X-axis shows the true founder intensity and the Y-axis shows the estimated founder intensity. The diagonal represents the expectation (i.e., estimated values equal to true values).
To characterize the reliability of ASCEND, we simulated data under a range of demographic scenarios. First, we generated data for a single epoch bottleneck model in which the target population had a founder event T_f (= 10 to 300) generations ago, during which the population size was reduced to N_f (= 5 to 500) for a short duration of D_f (= 1 to 30) generations. After the founder event, the population recovered to its original size N_o (= 12,500) (Fig 1A). Applying ASCEND to the target population and accounting for the cross-population allele sharing correlation with an outgroup (a distantly related group that does not share the bottleneck with the target), we found that ASCEND reliably inferred the age and intensity of the bottleneck when the founder event occurred within the past 200 generations (Fig 1B). In addition to its high sensitivity for detecting recent founder events, ASCEND also has a low false discovery rate and gives reliable results in the absence of founder events. We simulated a three-population demographic model representing modern human populations: a West African population with constant population size and two non-African groups (Northern Europeans and East Asians) with a history of an out-of-Africa bottleneck and recent expansion but no recent founder events (in the past 200 generations). As expected, ASCEND detected no significant founder event in any of the three populations using the criteria described in Methods (Notes S2.8 in S1 Text).
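The single epoch bottleneck model and the intensity parameter can be written out directly. The sketch below encodes N_e(t) for the model in Fig 1A and compares I_f = D_f/(2N_f) with the probability that two lineages coalesce during the bottleneck under a discrete Wright-Fisher model (coalescence probability 1/(2N_f) per generation). The boundary convention, the example parameter values and the function names are our own assumptions. For mild bottlenecks the two quantities nearly coincide, which illustrates why the intensity tracks coalescence during the bottleneck; for severe bottlenecks the probability saturates while I_f keeps growing.

```python
# Illustrative sketch; function names and the boundary convention are hypothetical.
def effective_size(t, n_o, n_f, t_f, d_f):
    """N_e at t generations before present under the single epoch bottleneck model.

    Assumed convention: the bottleneck occupies t_f <= t < t_f + d_f; before and
    after it, the population has size n_o.
    """
    return n_f if t_f <= t < t_f + d_f else n_o


def founder_intensity(n_f, d_f):
    """I_f = D_f / (2 N_f)."""
    return d_f / (2.0 * n_f)


def bottleneck_coalescence_prob(n_f, d_f):
    """P(two lineages coalesce during the bottleneck), Wright-Fisher with 2N_f gene copies."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n_f)) ** d_f


# N_o = 12,500 as in the simulations; T_f and the (N_f, D_f) pairs are illustrative.
sizes = [effective_size(t, 12_500, 100, 50, 10) for t in (49, 50, 59, 60)]
print(sizes)
for n_f, d_f in [(500, 1), (100, 10), (5, 5)]:
    print(n_f, d_f, founder_intensity(n_f, d_f), round(bottleneck_coalescence_prob(n_f, d_f), 4))
```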
ASCEND leverages the allele sharing correlation across the genome to infer the time and strength of the founder event in a population. There are three main steps in ASCEND. First, for each single nucleotide polymorphism (SNP) in the genome, we infer the number of alleles shared identical by state (IBS) between each pair of individuals in the population (assuming one shared allele at heterozygous sites, regardless of haplotype phase). Next, for pairs of SNPs in the target population, we compute the correlation of these shared allele counts across pairs of individuals, instead of using individual genotypes (referred to as the within-population allele sharing correlation). To account for ancestral allele sharing inherited from the common ancestor, we subtract the same correlation computed across pairs formed by one individual from the target population and one from an outgroup (referred to as the cross-population correlation). Finally, we measure the decay of the allele sharing correlation across SNPs separated by increasing genetic distances (Methods). This statistic is expected to decay exponentially with genetic distance: the rate of decay is informative of the founder age (in generations), while the amplitude is related to the strength of the founder event. Intuitively, a stronger founder event makes shared alleles more correlated at short genetic distances, increasing the amplitude of the exponential decay, while a more recent founder event leaves longer shared segments, slowing the rate of decay. The strength of the founder event is captured as a composite parameter, the founder intensity (I_f), which depends on both the duration of the bottleneck (D_f) and the population size during the bottleneck (N_f); following [9], we define I_f = D_f/(2N_f). This parameter is proportional to the probability of coalescence during the bottleneck and provides insights about the non-equilibrium response of variant frequencies to a population bottleneck [9,19].
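The first two steps can be sketched with NumPy. The sketch assumes unphased genotypes coded as 0/1/2 and reads the text's convention as: two shared alleles when both individuals are homozygous for the same allele, none when they are homozygous for different alleles, and one whenever either individual is heterozygous. It then computes, for a single pair of SNPs, the within-population correlation of shared allele counts across pairs of target individuals minus the cross-population correlation across target-outgroup pairs. The function names are hypothetical, and ASCEND's actual normalization and averaging over SNP pairs in distance bins (Methods) are not reproduced here.

```python
# Illustrative sketch; all function names are hypothetical, not ASCEND's API.
import itertools

import numpy as np


def allele_sharing(g1, g2):
    """Shared allele counts per SNP for two unphased 0/1/2 genotype vectors."""
    g1, g2 = np.asarray(g1), np.asarray(g2)
    share = np.where(g1 == g2, 2, 0)                   # same homozygote -> 2, opposite -> 0
    return np.where((g1 == 1) | (g2 == 1), 1, share)   # any heterozygote -> 1 (assumed convention)


def sharing_matrix(target, outgroup=None):
    """Rows are pairs of individuals, columns are SNPs.

    Without an outgroup, all pairs within `target` are used; otherwise each row
    pairs one target individual with one outgroup individual.
    """
    if outgroup is None:
        pairs = itertools.combinations(range(len(target)), 2)
        return np.array([allele_sharing(target[i], target[j]) for i, j in pairs])
    return np.array([allele_sharing(a, b) for a in target for b in outgroup])


def snp_pair_statistic(target, outgroup, i, j):
    """Within-population minus cross-population correlation for SNPs i and j."""
    within = sharing_matrix(target)
    cross = sharing_matrix(target, outgroup)
    r_within = np.corrcoef(within[:, i], within[:, j])[0, 1]
    r_cross = np.corrcoef(cross[:, i], cross[:, j])[0, 1]
    return r_within - r_cross


rng = np.random.default_rng(7)
target = rng.integers(0, 3, size=(6, 20))      # 6 individuals x 20 SNPs (toy data)
outgroup = rng.integers(0, 3, size=(4, 20))
print(snp_pair_statistic(target, outgroup, 0, 1))
```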
Discussion
We introduce ASCEND, a two-locus approach that leverages the allele sharing correlation across the genome to infer the time and strength of the founder event in a population. ASCEND is complementary to IBD-based methods, such as DoRIS and IBDNe [15,16], in its time range, but is more flexible because it does not require phased data and hence is suitable for sparse datasets. By applying ASCEND to around 300 present-day and 160 ancient human populations, we document that more than half of the groups in our study experienced a strong founder event during the past 10,000 years. To our knowledge, this is the first comprehensive survey of founder events across worldwide human populations, and it provides insights about the frequency of, and demographic processes underlying, population bottlenecks during human evolution. We note that the sampling of human populations in our study is not random, as the Human Origins and Allen Ancient DNA Resource datasets are enriched for small, isolated groups and some regions are better represented than others. Thus, the reported frequencies could differ in future worldwide surveys of human populations.
We recover previously reported signals of founder events in AJs, Finns and South Asians, and provide new insights into events in many groups such as Oceanians, Native Americans and Northeast Asians. Our results suggest that geographic isolation, modes of sustenance and cultural practices are notable predictors of founder intensity. Specifically, we document that populations living on islands have experienced stronger founder events than continental groups. This could be because island populations were founded by small numbers of individuals or because they have maintained small population sizes due to limited resources. Most hunter-gatherer, nomadic and indigenous groups in our dataset had strong founder events, possibly linked to limited resources and extreme environmental pressures. Across diverse present-day populations, including those in South Asia and the Americas, we document significant founder events that postdate periods of historical migration and admixture. Finally, cultural practices such as endogamous marriage contributed to founder events, as seen in Ashkenazi Jews and South Asians.
Applying ASCEND to ancient human genomes, we surveyed founder events in deep human history and prehistory. We inferred that founder intensities in ancient groups were markedly higher than in present-day groups. As observed in present-day samples, modes of sustenance (foraging, agriculture, and pastoralism) were associated with the strength of founder events: ancient hunter-gatherers had stronger founder events than Near Eastern farmers and Eurasian Steppe pastoralists. Moreover, using time transect samples from Europe, we found that local hunter-gatherer groups had more extreme founder events than Neolithic farmers or Bronze Age individuals. This suggests that population sizes in Europe increased over time, alongside changes in ancestry and transitions in lifestyle. Our results are consistent with a recent study that measured short runs of homozygosity in ancient Europeans and found a similar increase in population size during the Neolithic period [57], and with archeological evidence for increased population size during the Neolithic transition [58]. This underscores the power of simple, flexible statistics for making inferences from the limited data available from ancient DNA samples.
Our estimates of founder ages help to shed light on the historical events that led to population isolation and bottlenecks. In the Americas, we detected four main episodes of founder events, which can be related to important historical events in this region. First, using ancient samples, we obtained tentative evidence for a strong founder event ~12,000 years BP, concordant with the early settlement of the Americas [43]. Future ancient DNA studies with more samples should provide more precise and reliable estimates of the founding bottleneck in the Americas. The second episode was inferred ~5,500–6,000 years ago using data from the Caribbean, in agreement with archeological dates for the peopling of the Caribbean [59]. A third founder event, ~2,500 years ago in individuals from the Aleutian Islands and San Nicolas Island, could be related to the coastal settlement of North America. Finally, we inferred recent founder events ~200–500 years ago in present-day Native Americans, postdating the European colonization of the Americas [60]. Together, these results help to reconstruct the key founder events that have shaped genetic variation in the Americas.
The large and comprehensive set of samples from India in our study, including samples from most geographical regions, speakers of all major language families, and tribal and caste groups, highlights the widespread history of founder events in this region and provides insights about the origin of endogamy in India (S2 Table). In many Indian communities, marriages across castes (varṇa) and sub-castes (jāti) are restricted. Earlier writings describe the caste system, comprising Brahmana, Kshatriya, Vaishya, and Shudra, as a class structure based on occupation. Later writings, especially the law code of Manu (Manusmṛiti), introduced restrictions against intermarriage across castes [61], though the chronology of the Manusmṛiti remains debated [62]. Alternatively, endogamy has been proposed to be very recent in origin, tracing back to restrictions against intermarriage introduced in the past few hundred years during the British Raj [63]. Our direct estimates of founder ages provide an independent line of evidence on the origin of endogamy in India. We inferred that these founder events occurred between ~120 and 3,500 years ago across 78 ethno-linguistic groups in India. Our dates are consistent with a previous, smaller survey of 13 ethno-linguistic groups from India [18]. In a majority of the populations, the founder events occurred within the past 600–1,000 years, suggesting that this period was integral to shaping endogamy in India. These estimates predate the British colonization of India but postdate the ANI-ASI admixture (or the spread of Iranian farmer or Steppe pastoralist ancestry to the subcontinent) [27,41]. Endogamy likely became stronger during the British Raj, which could have further contributed to the founder events in many groups. In this scenario, our dates would reflect average estimates over multiple founder events, though the patterns we observe cannot be fully explained by recent events alone.
The oldest founder event (~16,700 years BP) was dated in the ancient Taforalt individuals from Morocco, who experienced a ten-fold more extreme population bottleneck than present-day AJs. This group has been associated with the Iberomaurusian microlithic bladelet technology, and an open question is how large the population that introduced this technology was. Using the direct estimate of founder intensity in Taforalt individuals and assuming a founder duration of 15 generations (as for AJ [23]), we infer an approximate effective population size of ~40. Analyses of modern human populations suggest that the effective population size tends to be around one-third of the census size [64]. This translates to an estimated census size for the Taforalt population of ~120 individuals. This is similar in scale to the census size of the Andamanese islanders, whose founder intensity is similar to that of the Taforalt individuals (S2 Table). We note that these calculations involve a number of simplifications and that the estimated population sizes may vary depending on the demographic model (e.g., bottleneck duration, migration). Nevertheless, they provide qualitative evidence that the Taforalt individuals were descendants of a relatively small, isolated group on the order of hundreds of individuals.
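The back-of-the-envelope calculation above can be written out as follows. It inverts I_f = D_f/(2N_f) for N_f under the assumed 15-generation duration and then scales by the assumed effective-to-census ratio of about one-third. The intensity value used here, 0.1875, is simply the value implied by N_e = 40 and D_f = 15 and is shown for illustration only; it is not the reported Taforalt estimate. Function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.
def bottleneck_ne(intensity, duration_gens=15.0):
    """Invert I_f = D_f / (2 N_f) to obtain the bottleneck effective size N_f."""
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    return duration_gens / (2.0 * intensity)


def census_from_ne(ne, census_per_ne=3.0):
    """Census size, assuming N_e is roughly one-third of the census size."""
    return ne * census_per_ne


ne = bottleneck_ne(0.1875)    # illustrative intensity, not the published estimate
print(ne, census_from_ne(ne))
```

This prints `40.0 120.0`. Because N_f scales with D_f and inversely with I_f, an error of a given relative size in either the assumed duration or the estimated intensity shifts the census estimate by a comparable relative amount.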
We show the wide applicability of our approach to non-human species by applying ASCEND to domestic dogs. Among the 42 unique populations that we analyzed, all breed dogs and two village dog populations had significant evidence for a recent founder event. These founder events may be due to: (i) inbreeding, which involves mating between closely related individuals; or (ii) the sire effect, whereby a very few highly valued individuals (selected for phenotypic traits) are bred repeatedly and contribute disproportionately to subsequent generations [65]. Such strong founder events can lead to high rates of homozygosity and, in turn, increased disease risk. Accordingly, some breeds, such as Boxers and Bulldogs, that have among the strongest founder events in our analysis are also known to be affected by high rates of recessive diseases. For instance, arrhythmogenic right ventricular cardiomyopathy, a cardiac disease that causes sudden death in dogs, is seen at high frequency in both breeds [66,67]. We found that the temporal distribution of founder events in dogs (~75–125 years ago) overlaps with the Victorian era, when a large number of modern dog breeds were created in Great Britain amid the growing popularity of dog fancying and dog shows [68]. Our results are consistent with a recent study that measured IBD and ROH patterns in ~4,000 breed dogs and found severe bottlenecks in most breeds [69].
A caveat to our results is that we estimated the parameters of founder events assuming an epoch model, and we have not distinguished between the patterns expected under this model and under other models, such as gradual exponential growth or a bottleneck with no recovery. In Notes S2 in S1 Text, we report simulations for these scenarios. When the population maintains a small size from the time of the bottleneck to the present (no-recovery model), we were only able to recover the founder intensity, not the time at which the bottleneck started. Under gradual exponential growth, we did not recover the founder parameters reliably. In ASCEND, we fit an exponential model with two parameters, age and intensity. This means that parameters such as the rate of population size increase after the founder event are not captured (the increase is implicitly assumed to be rapid), and the results should be interpreted with caution if there is evidence supporting an exponential growth model (Notes S2.5 in S1 Text). We note that non-parametric methods that characterize the distribution of IBD segment lengths can also provide biased estimates of historical effective population sizes despite modeling exponential growth. This is because IBD segments inherited from common ancestors living during periods of large population size are rare and short, and thus hard to detect [70]. Further, these methods are not applicable to groups with few individuals or to low-coverage ancient genomes that cannot be reliably phased. Indeed, when we applied IBDNe to the IndiaHO dataset, with its small sample sizes, we obtained very unstable and noisy results (Notes S5.3 and Fig AC in S1 Text).
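A two-parameter exponential fit of the kind described above can be sketched as follows. We assume the curve has the form A * exp(-2 d T) + c, with d the genetic distance in Morgans and T the founder age in generations; the factor of 2 reflects that a segment shared by two individuals is broken down by recombination along both of their lineages back to the founders. This functional form, the constant offset c, the grid search and all names are our assumptions rather than ASCEND's fitting code, and the mapping from the fitted amplitude A to I_f (Methods) is not reproduced here. Because A and c enter linearly, the sketch solves for them by least squares at each candidate T and keeps the T with the smallest residual sum of squares.

```python
# Illustrative sketch; the model form and fit_exponential_decay are assumptions, not ASCEND's code.
import numpy as np


def fit_exponential_decay(d_morgans, y, t_grid):
    """Fit y ~ A * exp(-2 * d * T) + c by grid search over the founder age T.

    For each candidate T, the amplitude A and offset c are obtained by linear
    least squares; the T with the lowest residual sum of squares is returned.
    """
    d = np.asarray(d_morgans, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for t in t_grid:
        design = np.column_stack([np.exp(-2.0 * d * t), np.ones_like(d)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, float(t), float(coef[0]), float(coef[1]))
    _, t_hat, amp_hat, offset_hat = best
    return t_hat, amp_hat, offset_hat


# Synthetic curve for a founder event 40 generations ago (toy values).
rng = np.random.default_rng(0)
d = np.arange(0.001, 0.3, 0.001)                  # 0.1 cM to ~30 cM, in Morgans
y = 0.02 * np.exp(-2 * d * 40) + 0.001 + rng.normal(0, 2e-4, d.size)
t_hat, amp_hat, offset_hat = fit_exponential_decay(d, y, np.arange(1, 301))
print(t_hat, round(amp_hat, 4), round(offset_hat, 4))
```

A fit of this shape has no term for growth after the founder event, which is one way to see why gradual exponential growth falls outside what the two parameters can describe.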
Finally, we note that complex demographic scenarios in which admixture events postdate founder events are hard to interpret, as the founder event could have occurred in the target population or in one of the ancestral populations. In this scenario, the target population has a mosaic genome with chromosomal segments from multiple ancestral groups, and if the founder event occurred in one of the ancestral populations, its signature may be present only in a subset of genomic regions. We explored this scenario by simulating data in which the founder event(s) occurred in one or both ancestral populations of an admixed target population (Notes S2.4.2 in S1 Text). As expected, the inferred intensity was lower than the simulated intensity, because admixture increases diversity. The inferred founder age could be biased depending on the number of founder events, which ancestral group experienced them, and the proportion of admixture. For scenarios with very low admixture, we recovered the founder age accurately. However, for higher proportions of admixture, the founder age was underestimated or close to the time of admixture (Fig M in S1 Text). In practice, this bias should not affect the empirical results reported above, as we observed minimal correlation between the inferred time of admixture (GLOBETROTTER or ALDER) and the founder ages (S3 Table). However, when there is evidence of recent admixture in the target population that postdates the founder event, we advise performing local ancestry inference in the admixed population and then applying ASCEND separately to the genomic regions confidently assigned to each ancestral population. This reduces ambiguity about the source of the founder event (the target or one of the ancestral groups) and should provide more reliable results.
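Restricting the analysis to regions of a single ancestry can be prepared by masking genotypes. This sketch assumes that the output of a local ancestry caller has already been converted into two arrays with the same shape as the genotype matrix (individuals x SNPs): the number of chromosome copies (0, 1 or 2) assigned to ancestry A, and the posterior probability of that call. Only sites where both copies are confidently assigned to the ancestry of interest are kept; everything else becomes missing data. The function name, array layout and 0.9 threshold are hypothetical choices.

```python
# Illustrative sketch; mask_to_single_ancestry and its inputs are hypothetical.
import numpy as np


def mask_to_single_ancestry(genotypes, copies_a, posterior, keep_copies=2, min_posterior=0.9):
    """Set genotypes to NaN unless the site carries `keep_copies` copies of
    ancestry A with posterior probability >= min_posterior.

    keep_copies=2 keeps regions where both copies are ancestry A; in a two-way
    admixed population, keep_copies=0 keeps regions where both copies are the
    other ancestry.
    """
    g = np.asarray(genotypes, dtype=float).copy()
    confident = np.asarray(posterior, dtype=float) >= min_posterior
    keep = (np.asarray(copies_a) == keep_copies) & confident
    g[~keep] = np.nan
    return g


geno = np.array([[0, 1, 2, 1],
                 [2, 2, 0, 1]])
copies_a = np.array([[2, 2, 1, 0],
                     [2, 0, 2, 2]])
post = np.array([[0.99, 0.80, 0.95, 0.99],
                 [0.97, 0.99, 0.99, 0.92]])
print(mask_to_single_ancestry(geno, copies_a, post))
print(mask_to_single_ancestry(geno, copies_a, post, keep_copies=0))
```

Each masked matrix would then be analyzed on its own, with masked sites treated as missing data.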
In summary, we document founder events across space and time in two species and shortlist groups that have experienced significant founder events in their recent history. These results imply that many present-day human populations could have an increased risk of recessive diseases, as previously documented in Finns and AJs [3,71,72]. Future disease mapping efforts should prioritize founder populations as they offer immense potential for biological discovery and reducing disease burden through the discovery and testing of recessive disease-associated genes and pathways.
---
[1] Url:
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010243
Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/