(C) PLOS One
This story was originally published by PLOS One and is unaltered.
. . . . . . . . . .
Pan-European study of genotypes and phenotypes in the Arabidopsis relative Cardamine hirsuta reveals how adaptation, demography, and development shape diversity patterns [1]
['Lukas Baumgarten', 'Department Of Comparative Development', 'Genetics', 'Max Planck Institute For Plant Breeding Research', 'Cologne', 'Bjorn Pieper', 'Baoxing Song', 'Sébastien Mane', 'Janne Lempe', 'Jonathan Lamb']
Date: 2023-07
To understand the nature of the evolutionary processes that shaped patterns of variation in C. hirsuta, we conducted a series of demographic inference analyses aiming at estimating changes in population sizes, the ages of population divergences, and the level of gene flow between populations. We estimated piece-wise constant distributions of population sizes and cross-coalescence rates through time for IBE, BAL, and NCE using the software relate [ 47 ] and MSMC2 [ 48 – 50 ] (Figs 1E , 1F , S1G and S1H ). In both cases, results indicated that the oldest event in the history of our sample is the divergence between the ancestral lineages of IBE and the one ancestral to both BAL and NCE. The divergence between BAL and NCE, on the other hand, occurred more recently and was followed by a pronounced reduction in effective population size of NCE, which might reflect the colonization of higher latitudes (Fig 1E and 1F ). To obtain point estimates for those divergence times, we conducted model choice and parameter optimization for 4 alternative demographic scenarios using the maximum-likelihood optimization method implemented in fastsimcoal2 [ 51 , 52 ]. We assumed the population tree suggested by the relate and MSMC2 analyses to be correct and focused our model choice analysis only on the absence or presence of gene flow and population size bottlenecks. Our best model ( S1I Fig and S3 Table ) was characterized by gene flow between all populations and population size bottlenecks in IBE and NCE. Estimates for the 2 divergence times (indicated by triangles in Fig 1E and 1F ) were in line with results obtained using relate and confirmed an ancient divergence of ancestral IBE lineages (approximately 534,000 years) and a more recent split between NCE and BAL (approximately 16,000 years). Also, ancestral and present population size estimates largely agreed with the estimates from relate and MSMC2, and all methods identified a reduction in size for NCE following its divergence. Overall, our observations indicate the presence of a relict group in Iberia, thus highlighting striking parallelism in spatial population structure with A. thaliana despite the estimated divergence time of the 2 species being over 25 million years ago [ 53 ]. Furthermore, the recent colonization of higher latitudes by NCE suggests that potential associated adaptive events, which underpinned associated shifts in photoperiod and seasonality, might be detectable at the genomic level in this population [ 54 , 55 ].
We further compared ancestry group–specific patterns of diversity between C. hirsuta and A. thaliana by calculating nucleotide diversity, Tajima’s D, and F ST values in the genetic groups described for both species ( S2 Table ). We found that in C. hirsuta, IBE is the most diverse followed by BAL and NCE. The nucleotide diversity in IBE (0.14%) is similar to the one observed in A. thaliana strains belonging to the Iberian group (relicts and nonrelicts). The lower diversity in NCE is consistent with the effects of recent demographic processes accompanying the colonization of northern latitudes by Southern European lineages after the last glacial maximum [ 46 ] and is further confirmed by a slower linkage disequilibrium (LD) decay in NCE compared to other populations from C. hirsuta and A. thaliana ( S1F Fig ). Tajima’s D values in C. hirsuta are positive and more positive than in A. thaliana (with the exception of IBE) and potentially result from more pronounced population size bottlenecks ( S2 Table ), which may also be responsible for the high variance in Tajima’s D in NCE. F ST values indicate that the genetic differentiation between IBE and the other groups in C. hirsuta (F ST = 0.55) is 45% larger than the highest genetic differentiation observed between A. thaliana relict and nonrelict populations in the 1001 Genomes Project (F ST = 0.38; S2 Table ). Thus, the genetic differentiation of IBE versus non-IBE C. hirsuta is more substantial than relict versus nonrelict A. thaliana, while NCE presents a bottleneck signature not found in geographically equivalent A. thaliana populations.
In A. thaliana, a set of genetically differentiated strains (so-called relicts) have been identified based on the examination of distributions of pairwise genetic distances (PGDs) between individual strains [ 17 ]. We conducted a similar analysis to compare the genetic differentiation between populations in our dataset to the reported differentiation between relicts and nonrelicts in A. thaliana. Unlike A. thaliana, for which this distribution has a single major mode around 0.475% (Fig 3A in [ 17 ]), PGDs in C. hirsuta display 2 major modes around 0.4% and 0.8% ( Fig 1C ). The center of the second mode is above the highest values observed for A. thaliana [ 17 ], and it reflects pairwise comparisons between IBE and non-IBE strains (Figs 1D , S1C and S1D ). Notably, the Iberian C. hirsuta relict-like group shows broader distribution outside the Iberian Peninsula than its A. thaliana counterpart ( S1E Fig ). This observation suggests that C. hirsuta IBE relicts may have had higher potential to colonize diverse European environments than their A. thaliana counterparts.
( A ) ADMIXTURE analysis of C. hirsuta strains after filtering for close relatedness (n = 358) reveals 3 major ancestry groups. The number of clusters that best fitted the data was found to be 3 (see also in S1A Fig ). Each vertical bar represents a strain, where the colors indicate admixture proportions for the 3 ancestry groups. Strains were assigned to the ancestry group for which the proportion of ancestry was at least 0.5. The ancestry groups were named according to the main sampling location of their respective strains: BAL–Balkan; IBE–Iberia; NCE–Northern Central Europe. Strains with proportions less than 0.5 for all ancestry groups were categorized as ungrouped (top right). ( B ) The geographical distribution of the C. hirsuta ancestry groups in Western Europe. Each point represents the collection site of a strain and is colored according to the ancestry group it belonged to, with ungrouped strains shown in gray. The Macaronesian islands of the Azores and Madeira are shown at a smaller scale below the map. Map layers were made with Natural Earth and [ 142 ]. ( C ) The distribution of pairwise genetic distances (PGDs) indicates a deep split between groups of C. hirsuta strains. A histogram is shown of PGD between all possible pairs of strains in which the numbers of pairs in each bin are plotted against the PGD. The black outline shows the PGD of all strains in our sample. The presence of 2 major modes in the distribution, of which one at high genetic distance, indicated a group of strains in our sample that is highly differentiated from the others. Hierarchical clustering revealed a group of relict-like strains that was responsible for the second major mode in the distribution. PGDs including 1 or 2 relict-like strains are shown in blue, and PGDs not including those are shown in gray. ( D ) Identification of groups of C. hirsuta strains that are highly differentiated from each other based on multidimensional scaling and hierarchical clustering of the PGD. The first 2 PCs are plotted against each other where each point is a strain, colored according to the ADMIXTURE ancestry group it belonged to, with ungrouped strains shown in dark gray. Strains with ancestry in only a single ancestry group in the ADMIXTURE analysis are shown by darker shades versus lighter shades for admixed strains. Hierarchical clustering of the PGD matrix revealed that the separation of the strains along PC1 represented the 3 distinct groups of strains shown enclosed by dashed lines. The groups on the left and in the middle were responsible for the second major mode in the distribution of PGD (Figs 1C , S1B and S1C ). Those 2 groups are shown here in blue and gray. ( E , F ) Piecewise constant effective population sizes (N e ) as a function of time for the 3 ancestry groups using MSMC2 ( E ) and relate ( F ), and estimates of split times between them considering a mutation rate of 4 × 10 −9 mutations per base, per generation. The split times for BAL-NCE and BAL-IBE estimated with fastsimcoal2 ( S1I Fig ) are indicated by red and blue triangles on the x-axes, respectively. The top panel shows ancestral changes in N e within the groups plotted against time in years, when considering 1 generation per year. With MSMC2 ( E ), 20 random sets of 4 strains were analyzed, which are all plotted, while with relate ( F ), all strains were analyzed jointly, hence a single line. The bottom panels show the RCCRs in BAL vs. NCE (solid lines) and IBE vs. BAL (dashed lines). Light blue shaded areas in the plots show ancient periods of glaciation according to MISs 2–4, 6, 8, 10, 12, 14, 16, and 18 [ 45 ], respectively, from left to right. The period of the LGM [ 46 ] is likewise indicated by the darker blue shade embedded in MIS2–4. The data underlying the graphs shown in this figure can be found at
https://doi.org/10.5281/zenodo.7907435 . BAL, Balkan; IBE, Iberian; LGM, last glacial maximum; MIS, marine isotope stage; NCE, Northern Central European; PC, principal coordinate; PGD, pairwise genetic distance; RCCR, relative cross coalescence rate.
As a first step in these comparisons, we resequenced a set of 488 C. hirsuta strains and investigated patterns of genetic structure based on 5,336,586 biallelic high-quality SNPs. The geographic distribution of our panel is centered on Western Europe but also includes a small number of strains from the United States of America, New Zealand, Australia, and Japan ( S1 Table ). We used the software ADMIXTURE [ 44 ] to estimate the most likely number of ancestry groups and for each strain the proportion of ancestry from each group (Figs 1A , 1B and S1A ). The results of this analysis revealed the presence of 3 genetically differentiated groups: (1) the Iberian (IBE) ancestry group; 2) the Balkan (BAL) ancestry group; and (3) the Northern Central European (NCE) ancestry group. IBE included mainly strains that were collected from the Iberian Peninsula but also from France, Great Britain, the Netherlands, Ireland, USA, and New Zealand ( Fig 1B ). Strains with substantial ancestry from IBE were also found to be widespread in the Macaronesian archipelagos of Madeira and the Azores. Strains from BAL were predominantly sampled in Croatia and Austria, but admixed strains with substantial ancestry from BAL were widely distributed over the European continent, Madeira, Ethiopia, Japan, and the USA. NCE was mainly composed of strains sampled from Europe between latitudes 50° and 60° north. ADMIXTURE results also indicated a large number of strains with shared ancestry from BAL and NCE indicating gene flow, while IBE appeared more isolated in this respect. Those findings were corroborated by a principal component analysis (PCA) in which nonadmixed individuals from the 3 populations represented well-differentiated genetic groups, while admixed strains were mainly observed between BAL and NCE ( S1B Fig ).
The significantly associated region on chromosome 8 harbored a block of extended LD that contained FRIGIDA (FRI), a well-studied gene controlling flowering time through the activation of FLC expression [ 60 , 61 ]. FRI thus emerged as a candidate for determining natural variation in flowering time and leaflet number in C. hirsuta. We found 3 independent polymorphisms predicted to truncate the FRI protein. C. hirsuta Oxford (Ox) lines constitutively expressing the coding sequence (CDS) of any truncated FRI allele driven by the Ubiquitin10-promoter (UBQ10) did not show increased rosette leaf number (RLN), a proxy for flowering time, while lines expressing a nontruncated CDS (FRIfunc) increased RLN by 20%, confirming the functional significance of the truncations ( Fig 2C ). Of the 3 alleles, only one (FRIstop) occurred at high frequency and could therefore be responsible for the association on chromosome 8. This allele is specific to NCE only and was present in 45 out of 57 nonadmixed strains ( S2B Fig ). These observations coupled with QTL mapping in recombinant inbreeding line (RIL) populations derived from biparental crosses ([ 25 ] and Fig 3E ) indicate that the FRI/FLC module is a major determinant of natural variation for flowering time and leaflet number progression in C. hirsuta ( S2D Fig ).
( A ) GWAs for flowering time and leaflet number on leaf 7 using 352 C. hirsuta strains. The negative log base 10 transformed P values for association tests of individual SNPs are plotted against physical position on the 8 chromosomes. Horizontal dashed lines show thresholds of significance with correction for multiple testing according to Bonferroni (magenta) and fdr (cyan; for both α = 0.05). SNPs with transformed P values above the fdr threshold are shown in red, and others in gray. Two regions with strongly associated SNPs were detected on chromosomes 6 and 8 that contained the candidate genes FLC/TPPI and FRI. ( B ) Close-up view of the locus with the strongest associations showing GWAS for flowering time with the most significant SNP in TPPI used as covariate. Forward regression using a multilocus mixed model GWAS indicated that the highly significant association on chromosome 6 consisted of 2 independent associations. Yellow areas indicate the 2 candidate genes FLC (left) and TPPI (right) where the lighter shades indicate the promoter region (−3,000 bp) and the darker shades indicate the ORF. GWAS without covariates is shown in red, and with the SNP indicated by the blue encircled red point in TPPI as a covariate in blue. This result revealed significant associations for SNPs in the first intron of FLC that were independent of the associations for SNPs linked to TPPI. The associated SNPs in FLC shown in blue were the most significant genome-wide in this analysis. ( C ) Functional validation of 3 distinct truncated FRI alleles that exist within predominantly European samples of C. hirsuta. In contrast to a significant increase in rosette leaf number in plants transformed with a full-length FRI allele, the truncated FRI alleles showed no effect (Dunn test with Bonferroni adjusted P value, ***: P value < 0.001). One of the 3 alleles was found at high frequency (FRIstop) and exclusively in NCE strains (see also S2B Fig ). ( D ) Flowering time in DAG until anthesis for all genotype combinations at the 3 candidate genes identified by GWA. The genotypes at the representative SNPs for each gene are shown as either anc or der. The bars indicate the mean flowering time, and the points show the individual observations for each strain. Points are colored according to the ancestry group of strains ( Fig 1A ). ( E , F ) Correlation between North–South genetic differentiation (PC2 in S1B Fig ) and flowering time ( E ) as well as leaflet number on leaf 8 ( F ). The points are observations for individual strains colored according to their ancestry group ( Fig 1A ) such that strains with ancestry in only 1 group are shown in darker shades vs. lighter shades for admixed. The lines show linear models fitted to the data from the BAL and NCE populations (P<0.001, R2 = 0.54, r = -0.73, Fig 2E; P<0.001 R2 = 0.38, r = 0.62, Fig 2F). Large dots show non admixed samples in those two populations. ( G ) Evidence for a selective sweep at the FRI locus (see also S2 Fig ). A sliding window analysis of nucleotide diversity (π, top), Tajima’s D (middle), and CLR calculated by SweepFinder2 [ 59 ] (bottom) is shown for chromosome 8. The analyses were performed separately in strains with the FRIstop (blue) and the FRIfunc (black) alleles from the NCE group ( Fig 1A ). Note how the region, which includes the FRI locus (orange dashed line) displays reduced π, reduced Tajima’s D, and high CLR, consistent with a selective sweep, exclusively in strains with FRIstop. The horizontal dashed lines in the top and middle panels indicate the genome-wide averages for the respective groups in blue or gray, and in the lower panel the horizontal dashed line indicates the threshold (α = 0.05) derived from neutral simulations using our best demographic model. ( H ) The geographic distribution of full-length and truncated FRI alleles on the map and their projection on latitude in A. thaliana (*) and C. hirsuta (triangle) exhibited high similarity. The colors represent distinct truncated FRI alleles. The rectangles represent areas of high sampling density for both species. The pie charts show the proportion of functional and nonfunctional alleles in C. hirsuta (left) and full-length and truncated alleles in A. thaliana (right). The total number of strains inside the respective rectangles is shown inside the pie chart. The histograms on the right side show functional/full-length (black) and nonfunctional/truncated FRI alleles (different colors represent different truncated alleles) along the latitude. FRIstop is the major truncated FRI allele. Note that only one of all mainland European C. hirsuta strains harbors FRIstop2 and none FRIstop3. Map layers were made with Natural Earth and [ 142 ]. The data underlying the graphs shown in the figure can be found at
https://doi.org/10.5281/zenodo.7907435 . anc, ancestral; BAL, Balkan; CLR, composite likelihood ratio; DAG, days after germination; der, derived; fdr, false discovery rate; FLC, FLOWERING LOCUS C; FRI, FRIGIDA; GWA, genome-wide association; IBE, Iberian; NCE, Northern Central European; ORF, open reading frame; SNP, single nucleotide polymorphism; TPPI, TREHALOSE-6-PHOSPHATE-PHOSPHATASE I.
Having established the above framework for understanding demographic events that shaped European C. hirsuta genetic diversity, we next sought to link phenotype with genotype in corresponding germplasm. To this end, we measured flowering time and leaflet number in 352 strains from our diversity panel. We observed considerable phenotypic variation after accounting for population structure ( S2A Fig ). We used genome-wide association (GWA) analysis to identify genetic variants associated with flowering time and leaflet number. Two highly significant associations were detected for flowering time on chromosomes 6 and 8, respectively ( Fig 2A ), and those same loci were also found to contain SNPs strongly associated with leaflet number on the later leaf nodes ( Fig 2A ). A closer inspection of the associations on chromosome 6 revealed 2 candidate genes within a 20-kb region: FLC and the C. hirsuta orthologue of trehalose-6-phosphate phosphatase I (TPPI). FLC influences flowering time and leaf development in both A. thaliana and C. hirsuta [ 25 , 56 , 57 ]. TPPI catalyzes the dephosphorylation of trehalose-6-phosphate, the concentration of which influences flowering time and heterochronic shoot development in A. thaliana [ 58 ]. The 15 most significantly associated SNPs on chromosome 6 were found in close proximity to FLC and TPPI and 6 of them inside their open reading frames or promoter regions. Multilocus GWAS showed that SNPs in both genes had independent effects, indicating that both contribute to phenotypic diversity in our strain panel ( Fig 2B ).
In summary, our data indicate evolution of the same genetic mechanism of reduced FRI activity in both C. hirsuta and A. thaliana to confer fast cycling during colonization of Northern and Central Europe after the last glacial maximum. This conserved mechanism, even in a genetically complex trait like flowering time [ 70 , 71 ], highlights how evolution reuses specific genetic modules in a predictable way [ 5 ]. Within this framework of conservation, which indicates the necessity of fast cycling for adaptation after a glacial maximum, an important difference is that a different number of loss-of-function alleles appears to have supported evolution of fast cycling in the 2 species. This difference might be traced to differences in demography and genomic architectures, which led to a repeated selection of multiple alleles in A. thaliana versus what is essentially a single event in C. hirsuta (see Discussion ).
In the candidate genes FLC and TPPI from GWA analyses, the derived allele was associated with increased flowering time compared to the ancestral allele and vice versa for FRI ( Fig 2D ). Strains harboring all 3 alleles that reduced flowering time belonged to the NCE population and indeed displayed the lowest average and variance in flowering time ( Fig 2D ). Furthermore, the principal component (PC) accounting for the differentiation between BAL and NCE ( S1B Fig ) negatively correlated with flowering time and positively with leaflet number (Fig 2E and 2F ). Considering that FRI alleles with reduced function have been previously identified as targets of natural selection in A. thaliana [ 29 , 43 , 62 – 66 ], we hypothesized that the functional variation we identified at FRI may have allowed C. hirsuta to recently adapt to northern European climatic conditions. To test this hypothesis, we conducted a genome-wide scan for signatures of positive selection using the program SweepFinder2 [ 59 ], which uses a composite likelihood ratio statistic to identify areas whose polymorphism patterns deviate from neutral expectations. We found that the FRI locus contained the most significant composite likelihood ratio values after controlling for confounding demographic effects using our best demographic model for the NCE population (Figs 2G and S2E ). This region also displayed low nucleotide diversity (π) and Tajima’s D compared to the genomic background indicating reduced genetic variation and an excess of low-frequency alleles, the hallmark of a recent selective sweep [ 67 ] (Figs 2G and S2E–S2G ). Notably, the FRI locus is located on the distal extreme of the pericentromeric region of chromosome 8 [ 53 ] ( S9 Table ). Consequently, the selective sweep spans genomic regions with very different recombination rates per generation and nucleotide (r), ranging from 4.7 × 10 −10 to 6.6 × 10 −9 within and flanking the pericentromeric region. We hypothesized that the reduced r in the pericentromeric region could explain why the selective sweep extends further in the proximal pericentromeric area of the FRI locus, in strains with the FRIstop allele. We tested this hypothesis by simulations and found that the sweep profile is consistent with a hard sweep of a single strongly selected beneficial mutation that occurred after the divergence time between lineages of NCE and BAL ( S2H Fig ). We then investigated the geographic distribution of FRI loss-of-function alleles in A. thaliana and C. hirsuta. Given that both species have similar life cycles and face, at least in part, similar ecological challenges, we reasoned that if parallel evolution of FRI loss-of-function alleles contributed to local adaptation, this allele should exhibit similar geographic distributions in the 2 species. To test this idea, we exploited genome resequencing data from 1,115 A. thaliana and 426 C. hirsuta strains from Europe. We found that FRI loss-of-function alleles showed a similar geographical distribution in both species with high frequencies in Central Europe and Northern Britain, low frequency in Sweden, and even lower frequency in Southern Europe ( Fig 2H ). This observation, together with the evidence for a selective sweep, is consistent with ChFRIstop evolving adaptively in this range. Like in A. thaliana, environmental conditions such as sufficient precipitation and relatively short winters in these latitudes would allow for both summer annual and winter annual behavior [ 35 , 68 ]. Fast cycling lines harboring FRI loss-of-function alleles would be more likely to complete a full cycle before the onset of winter when germinating in late summer and, therefore, produce more offspring [ 69 ]. A striking difference between the 2 species was the number of distinct FRI loss-of-function alleles. We detected in total 3 distinct truncated FRI alleles in C. hirsuta, all of which showed a loss-of-function phenotype in transgenics experiments ( Fig 2C ). Contrastingly, we found 29 different truncated and likely loss-of-function FRI alleles in the entire set of A. thaliana strains ( S2I Fig and S4 Table ). The most abundant loss-of-function allele in A. thaliana [ 60 ] occurred in 40% of the European A. thaliana strains that harbored FRI loss-of-function alleles (112 out of 280 strains), while FRIstop was found in 98.2% of the European C. hirsuta strains with loss-of-function alleles (111 out of 113 strains).
Evidence for local adaptation in the Azores archipelago mediated by the transcription factor SPL9
The FRI/FLC module studied above affects flowering time concomitantly with the rate of vegetative development including node-dependent changes in leaflet number. This finding highlights that age-dependent progression of leaflet number provides an attractive trait for monitoring the progress of developmental time. To gain further insight into natural variation for heterochronic mechanisms in C. hirsuta and their possible role in local adaptation, we evaluated variation in leaflet number relative to flowering time. In the GWA panel, cumulative leaflet number of the first 8 rosette leaves and flowering time correlated negatively (P < 0.001, cor = −0.34, Pearson’s correlation). We reasoned that if heterochronic processes have a central role in shoot morphological variation in this species, they might also underlie variation between strains that do not differ in flowering time. This would be similar to the situation in snapdragon, where genetic control of age-dependent leaf shape variation is independent of flowering time [72], and it would also be consistent with findings that vegetative phase change and reproductive competence are, to some degree, genetically separable in A. thaliana [73,74]. We also postulated that if such heterochronic variation independent of flowering time exists, then its biogeographical distribution might help us understand its adaptive relevance. We noted that among the IBE strains in our GWA panel, flowering time and leaflet number did not correlate (S3A Fig). On average, the IBE strains produced the highest leaflet number compared to NCE and BAL. Interestingly, one IBE strain, Azores1 (Az1), originating from the Azorean island Faial, produced a conspicuously low number of leaflets (Fig 3A and 3B) despite being early flowering and thus, in contrast to the accelerated heteroblastic progression, typically found in early flowering strains [25]. We hypothesized that this strong shift in leaflet number with respect to other IBE strains might be a consequence of local adaptation in the Azores.
PPT PowerPoint slide
PNG larger image
TIFF original image Download: Fig 3. A QTL cluster on chromosome 4 contributes to low leaflet number in the Azorean C. hirsuta strain. (A) Leaflet number progression from the first to the eighth leaf indicates a strong deviation of the Az1 strain from other strains of the IBE group (see also S3A Fig). The leaflet number per leaf node of IBE strains is shown by blue points with that of Az1 shown by blue asterisks. The shaded area highlights the difference of Az1 compared to other IBE strains. (B) Representative silhouettes of the first 8 rosette leaves of 4 week-old C. hirsuta Ox and Az1 strains grown in long days, showing the lower leaflet number of the latter. (C) ADMIXTURE analysis with 421 C. hirsuta strains remaining out of 753 after filtering for close relatedness. The number of ancestry groups that best fit the data was found to be 4. Each vertical bar represents a strain where the colors indicate admixture proportions for the 4 ancestry groups. Strains were assigned to the ancestry group for which the proportion of ancestry was at least 0.5. The 3 clusters from Fig 1A were found again, and strains with maximum ancestry in the additional cluster were exclusively from the AZ. Strains with ancestry lower than 0.5 in all clusters are indicated as “ungrouped” on the right side of the figure. (D) Piecewise constant effective population sizes (N e ) of the 4 ancestry groups from Fig 3C using MSMC2, and estimates of split times between them considering a mutation rate of 4 × 10−9 mutations per base, per generation. The top panel shows ancestral changes in N e considering 1 generation per year. Colors indicate ancestry groups according to Fig 3C. Twenty random sets of 4 strains were analyzed, which are all plotted individually. The bottom panel shows the RCCRs in AZ vs. IBE (solid line), IBE vs. BAL (long dash line), and NCE vs. BAL (short dash line). Light blue shaded areas in the plots show ancient periods of glaciation according to MIS 2-4, 6, 8, 10, 12, 14, 16, and 18 [45], respectively, from left to right. The period of the LGM [46] is likewise indicated by the darker blue shade embedded in MIS 2–4. (E) Multiple trait QTL mapping of leaflet number from the first to the 10th rosette leaf and total RLN (a proxy for flowering time) in the Ox x Az1 RIL population. The negative log base 10 transformed P values of a composite interval mapping scan are plotted against position on the linkage groups of the chromosomes indicated in the top left corners of the upper panel. The horizontal dashed red line indicates the threshold of significance (α = 0.05). Significant allelic effects for each QTL on each trait are shown in the lower panel where red and blue colors indicate the direction, and the shade the magnitude of the effect according to the legend in the top left. (F) Multiple QTL models for leaflet number on different leaf nodes on chromosome 4. QTL detected using MQM mapping for the traits indicated on the y-axis are shown by black dots, and the 1.5 LOD intervals are indicated by shaded regions. The color of the 1.5 LOD intervals indicates the variance explained by the QTL according to the legend above the figure. Note that the direction of effect of both QTL agree with the parental differences in leaflet number (i.e., Az1 had lower leaflet number than Ox). (G) Leaflet number of HIFs segregating for different genomic regions of chromosome 4. Leaflet numbers of lines homozygous for Ox or Az1 alleles are shown in yellow and blue, respectively. Vertical bars indicate the standard errors of the means, and the points show the leaflet numbers of individual replicates. Significant differences in leaflet number for specific leaf nodes are shown as: *, P ≤ 0.05; **, P ≤ 0.01; ***, P < 0.001. Note that plants with Az1 alleles of HIFs LLN4_1A and LLN4_1B both show reduced leaflet number, but on earlier or later leaf nodes, respectively. By contrast, plants with Az1 alleles in HIF LLN4_1, which carries a larger introgression including LLN4_1A and LLN4_1B, show reduced leaflet number on earlier and later leaf nodes. (H) Graphical representation of the genotype of chromosome 4 in HIFs. Yellow and blue colors indicate homozygous Ox and Az1 alleles, respectively, while segregating regions are colored in red. Map positions of the 3 distinct QTL found in this region are depicted as black boxes. The data underlying the graphs shown in the figure can be found at
https://doi.org/10.5281/zenodo.7907435. AZ, Azores; Az1, Azores1; BAL, Balkan; HIF, heterogeneous inbred family; IBE, Iberia; LGM, last glacial maximum; MIS, marine isotope stage; NCE, Northern Central Europe; Ox, Oxford; QTL, quantitative trait locus; RCCR, relative cross coalescence rate; RIL, recombinant inbreeding line; RLN, rosette leaf number.
https://doi.org/10.1371/journal.pbio.3002191.g003
To test this hypothesis, we first explored the prevalence of the low-leaflet phenotype. We carried out denser sampling and sequenced 267 C. hirsuta strains from across the Azores archipelago and found that the low leaflet number occurred in approximately 24% of them (S3B Fig). ADMIXTURE analysis revealed a well-differentiated ancestry group specific to the Azores (AZ; Figs 3C and S3C) that included the Az1 strain. Demographic analyses using MSMC2 and relate estimated that the AZ and IBE ancestry groups diverged at least 30,000 years ago, thus indicating that C. hirsuta populations on the Azores might have been locally adapting for a long period of time (Figs 3D and S3D). To determine the genetic basis of this low-leaflet phenotype, we performed QTL mapping of leaflet number and RLN in an RIL population derived from a cross between Az1 and the NCE strain Ox (Fig 3E and S6 Table). We detected 7 QTL affecting both RLN and leaflet number (Fig 3E). Five out of the 7 loci had opposite effects on leaflet number and RLN similar to the heterochronic effect described for TPPI/FLC and FRI (Fig 2A; [25]). The remaining loci included a highly significant Leaflet number QTL on chromosome 4 (LLN4) that strongly affected leaflet number but not flowering time (Fig 3E). Genetic dissection of LLN4 with 4 heterogeneous inbred families (HIFs) identified a cluster of 3 closely linked QTL for leaflet number located between positions 17.39 Mb and 22.83 Mb, which are referred to as LLN4_1A, LLN4_1B, and LLN4_2 (Figs 3F–3H and S3E and S6 Table). In agreement with the parental behavior and the detected effects at LLN4, Az1 alleles at each of these loci reduced leaflet number. The effects of these 3 QTL were also validated in introgression lines (ILs; S3F Fig), showing that the most distal locus, LLN4_2, had the strongest effect on leaflet number. To further characterize LLN4_2, we first tested whether its effect on leaflet number was heterochronic and independent of flowering time. To this end, we carried out a photoperiod shift experiment using near-isogenic lines (NILs), which is based on the attribute of juvenile plants to not respond to the flowering-inducing stimulus of long photoperiod [75]. We found that a line homozygous for the Az1 allele delayed the juvenile-to-adult phase transition by 1.97 days compared to a line homozygous for the Ox allele (Fig 4A). However, both lines only showed a small but statistically insignificant difference in RLN in both short and long photoperiods. We concluded that the effect of LLN4_2 is heterochronic, affecting the timing of the juvenile-to-adult transition and, consequently, leaflet number, largely independent of the flowering transition.
PPT PowerPoint slide
PNG larger image
TIFF original image Download: Fig 4. A missense polymorphism in SPL9 underlies leaflet number QTL LLN4_2. (A) Photoperiod shift experiment showing that the Az1 alleles at the QTL LLN4_2 delay the juvenile-to-adult phase transition. Plants of a HIF homozygous for Ox (yellow) or Az1 (blue) alleles at the SPL9 locus were shifted from flowering inducing long photoperiod to a noninductive short photoperiod. The RLN of the plants is plotted against the time spent in long photoperiod. Points show the RLN of individual plants, while the lines show a logistic model fitted to the data. The inflection point of the model is indicated by vertical arrows on the x-axis. (B) Fine-mapping of the leaflet number QTL LLN4_2. The genotype information of the HIF LLN4_2 is shown in the top panel. The graphical genotypes of the homozygous progeny of 4 different recombinant lines segregating in the LLN4_2 genomic region are shown below including the positions (Mb) of the genetic markers in the top axis. The bar chart on the right shows the number of leaflets produced on leaves 1 through 8 for the respective genotypes on the left. The bars show the mean leaflet numbers, and the points the leaflet numbers of the individual replicates. Kruskal–Wallis tests were performed to test for leaflet number differences between the 2 homozygous progenies of the same heterozygous recombinant: *** P < 0.001, n.s. nonsignificant. On the right side, the genotype at the LLN4_2 locus (Ox or Az1) inferred from the phenotype of each line is depicted. The LLN4_2 fine-mapped region of 49 kb contains 14 genes shown in the lower part of the panel with wider rectangles indicating exons and narrow rectangles introns and UTRs. The region containing SPL9 is expanded at the bottom with the 2 missense SNPs differing between Ox and Az1 colored in red and other SNPs in blue (see also S4A Fig). (C) Transgenic complementation of the Chspl9 mutant with the genomic constructs of SPL9Ox (gSPL9Ox) and SPL9Az1 (gSPL9Az1). The estimated copy number of the transgene is indicated in parentheses. As a control, the Chspl9 mutant was transformed with an empty vector. Two copies of gSPL9Ox and gSPL9Az1 could complement the phenotype to the level of Ox wt and IL LLN4_2Az1, respectively. Dots correspond to individual T2 transgenic plants derived from 27 independent T1 plants, and their mean and standard error for cumulative leaflet number on the first 8 leaves is shown by the bars. The compact letter display shows significant differences between genotypes according to a Dunn test with a Benjamin–Holm post hoc correction of the P values for multiple pairwise comparisons. (D) Allele swaps for the 2 SPL9 missense SNPs differing between Ox and Az1. The line HIF_LLN4_2 homozygous for the Az1 allele at the SPL9 locus (Fig 4A and 4B) was transformed with the genomic constructs shown in Fig 4C, and with 2 additional chimeric genomic constructs carrying the Ox and Az1 alleles, or the Az1 and Ox alleles for the SNPs (SPL9mixAz1_Ox and SPL9mixOx_Az1). The HIF was transformed with an empty vector as a control. Dots correspond to individual independent T1 transgenic plants, and their mean cumulative leaflet number on the first 8 leaves is shown by the bars. The compact letter display shows significant differences between genotypes according to a Dunn test with a Benjamin–Holm post hoc correction of the P values for multiple pairwise comparisons. The data underlying the graphs shown in the figure can be found at
https://doi.org/10.5281/zenodo.7907435. Az1, Azores1; HIF, heterogeneous inbred family; Ox, Oxford; RLN, rosette leaf number; wt, wild type.
https://doi.org/10.1371/journal.pbio.3002191.g004
To identify the gene underlying LLN4_2, we first fine-mapped it to a 49-kb region containing 14 predicted genes. Among these, the C. hirsuta orthologue of the transcription factor encoding gene SQUAMOSA PROMOTER BINDING-LIKE 9 (SPL9; Fig 4B) emerged as the best candidate for underlying this QTL because of its role in the regulation of developmental timing in A. thaliana and potentially leaflet number in C. hirsuta [75–77]. Consistent with this idea, analysis of a C. hirsuta loss-of-function allele of SPL9 (Chspl9) generated by CRISPR-Cas9 genome editing showed reduced leaflet number, similar to the phenotype of the Az1 strain (S4A Fig). To directly compare the effects of both SPL9 alleles, we analyzed transgenic Chspl9 lines carrying genomic constructs of SPL9 from either Ox or Az1. The SPL9Az1 was less able than the SPL9Ox allele to increase leaflet number compared to the Chspl9 mutant background (Fig 4C), thus indicating that it is a weaker allele. To test if these differential effects might be caused by altered SPL9 expression of the 2 alleles, we conducted RNA-seq analyses in 2 pairs of genotypes and did not find SPL9 among the list of differentially expressed genes (S4B and S4C Fig). However, analysis of the predicted coding sequence revealed 2 missense SNPs (E242Q and D352N) that could potentially affect the function of SPL9 (Figs 4B and S4D). We evaluated the effect of each of the 2 SNPs in HIFs transformed with the parental genes as well as with 2 chimeric constructs differing in only one of the 2 missense SNPs, respectively. Phenotypic analyses of leaflet number showed that transgenic lines with the Az1 or Ox alleles for SNP E242Q behaved similar to those transformed with Az1 or Ox genomic constructs, respectively (Fig 4D). Therefore, we concluded that allelic variation at SPL9 underlies LLN4_2 and that the missense polymorphism E242Q contributes to the low leaflet number of the Az1 strain.
To test whether genetic diversity in SPL9 and its linked QTL which we collectively refer to as SPL9 QTL cluster might be maintained by adaptive processes, we first confirmed the phenotypic effect of the SPL9Az1 allele in field grown plants on the Azores (S5A Fig) and then searched for signatures of natural selection using pcadapt on the 753 Azorean and Eurasian strains. Similar to F ST -based methods, pcadapt identifies genomic regions with extreme genetic differentiation [78]. However, unlike F ST -based methods, it requires no prior population assignment. This feature can be advantageous for identifying regions that evolve nonneutrally in the presence of complex population structure and admixture, as is the case in C. hirsuta populations [78] (Figs 1 and S1). These analyses showed that the SPL9 QTL cluster overlapped with the most significant peak detected by pcadapt (Figs 5A and S5B) and that 83.8% of the 1,000 genome-wide most significant SNPs fell within a 6.4-Mb genomic interval including the SPL9 QTL cluster. Therefore, SNPs in this genomic region displayed a stronger pattern of genetic differentiation than the genomic background, consistent with the action of positive selection [78]. We then explored the genetic structure responsible for the pcadapt peak in the SPL9 QTL cluster by performing PCA using only the SNPs within the interval that were above the threshold (Fig 5B). Two groups of 103 and 50 strains from the Azores showed stronger genetic differentiation at the pcadapt peak compared to the genomic background (S5C Fig). Accordingly, these 2 groups showed F ST values above the 95th percentile of the genome-wide background, at the pcadapt peak (Fig 5A). Additionally, both groups displayed significantly lower nucleotide diversity and Tajima’s D in the SPL9 QTL cluster region than in the remaining genomic background (Fig 5C), providing additional evidence that the SPL9 QTL cluster genomic region has evolved under positive selection.
PPT PowerPoint slide
PNG larger image
TIFF original image Download: Fig 5. The SPL9 QTL cluster as a driver of local adaptation in the Azores. (A) Signatures of selection at the SPL9 QTL cluster. The upper panel shows a Manhattan plot with the pcadapt results for 753 C. hirsuta strains (see also S5B Fig). The negative log base 10 transformed P values for SNPs on chromosome 4 are plotted against their physical positions. The dashed horizontal line indicates the P value separating the 1,000 genome-wide most significant SNPs, among which 83.8% are located in the vicinity of the SPL9 QTL cluster. The SPL9 missense SNP E242Q that was found to be responsible for QTL LLN4_2 is highlighted by a red circle. Yellow boxes in the lower part of the panel depict the location of QTL LLN4_1A, LLN4_1B and LLLN4_2 (SPL9). The bottom panel shows a sliding window analysis along chromosome 4 of weighted F ST between the 2 groups of strains from the Azores, grp1 and grp2, which were found as highly differentiated in the SPL9 QTL cluster region (see also Fig 5B). The horizontal dashed line shows the genome-wide 95th percentile of weighted F ST . (B) PCA of the 838 outlier SNPs detected by pcadapt analysis that were located within the SPL9 QTL cluster region, in 753 worldwide strains. Two groups of strains were highly differentiated from each other (grp1 –blue, grp2 –red) and from the other strains (gray and black). Strains with recombinations in the SPL9 QTL cluster are shown in black. (C) Nucleotide diversity (π) and Tajima’s D in the region of the pcadapt peak surrounding the SPL9 QTL cluster and the GBG outside of the peak for grp1 (blue) and grp 2 (red) strains from the Azores. (D) Geographic distribution of C. hirsuta strains within the Azores archipelago. The pie charts show the proportions of strains from the different groups in our sample colored according to Fig 5B (blue—grp1; red—grp2; black—recombinant in the SPL9 QTL cluster; gray—others). Strains from grp1 and grp2 were exclusively sampled on the Azores and show a nonuniform distribution with highest frequencies in the east and the west, respectively. Map layers were made with Natural Earth and [142]. (E) Phenological field data of C. hirsuta plants from 4 islands along the west–east transect colored according to alleles for the functional missense SNP E242Q of SPL9 (blue—Az1; red—not Az1; yellow—heterozygous;). Plants were classified for developmental stage from 1 (very young seedling) to 6 (advanced seed shedding). On Flores, Faial, and Pico, SPL9Az1-harboring strains were found in more advanced stages of development than strains carrying an alternative allele (*** P < 0.001, Kruskal–Wallis). (F) Weather station data from the Azores indicates reduced precipitation in the east when compared to the west. Data from 1970–2020 were analyzed in sliding windows of 11 days with 5-day stride and in each window the fraction of days with rain (> 0 mm) out of the total number of observed days was calculated. The fraction of days with rain in each window is shown by the points, and the lines show smoothing splines fitted to the data to reveal the trends. Note how on Santa Maria, in the east, from April until October, the fraction of days with rain is as low or lower than the lowest annual value for Flores and Faial in the west. The data underlying the graphs shown in the figure can be found at
https://doi.org/10.5281/zenodo.7907435. GBG, genomic background; PCA, principal component analysis; QTL, quantitative trait locus; SPL9, SQUAMOSA PROMOTER BINDING PROTEN-LIKE 9.
https://doi.org/10.1371/journal.pbio.3002191.g005
To investigate possible drivers of selection at the SPL9 QTL cluster, we explored geographic and climatic associations of the 2 Azorean haplogroups in this genomic location (Fig 5B). Both groups showed a strong geographic structure along the West–East longitudinal axis of the Azorean islands (Fig 5D), which was stable across sampling years (S5D Fig). Haplotypes carrying the low-leaflet SPL9 allele dominated in the West (grp1), while the alternative allele of grp2 was prevalent in the East (Fig 5E). Notably, this west–east clinal distribution mirrors that of other lineages in the endemic Azorean flora (e.g., Leontodon, Ranunculus, and Tolpis), which is thought to reflect a climatic gradient between warm summers with punctuated precipitations in the West, and extended dry summers in the East, in a context of reduced annual variation in temperature and daylength in the entire archipelago compared to mainland Europe [78–82] (Figs 5F and S5E–S5G). Thus, population genomics and QTL analyses in C. hirsuta revealed potentially adaptive allelic variation in the key heterochronic transcription factor SPL9, which appears as a major driver of morphological differentiation across heterogeneous climatic conditions.
[END]
---
[1] Url:
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002191
Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/