(C) PLOS One
This story was originally published by PLOS One and is unaltered.
. . . . . . . . . .
Transposon dynamics in the emerging oilseed crop Thlaspi arvense [1]
['Adrián Contreras-Garrido', 'Department Of Molecular Biology', 'Max Planck Institute For Biology Tübingen', 'Tübingen', 'Dario Galanti', 'Plant Evolutionary Ecology', 'University Of Tübingen', 'Andrea Movilli', 'Claude Becker', 'Lmu Biocenter']
Date: 2024-02
Genome evolution is partly driven by the mobility of transposable elements (TEs), which often has deleterious effects but can also facilitate genetic novelty and catalyze local adaptation. We explored how the intraspecific diversity of TE polymorphisms might contribute to the broad geographic success and adaptive capacity of the emerging oil crop Thlaspi arvense (field pennycress). We classified the TE inventory based on a high-quality genome assembly, estimated the age of retrotransposon TE families and comprehensively assessed their mobilization potential. A survey of 280 accessions from 12 regions across the Northern Hemisphere allowed us to quantify over 90,000 TE insertion polymorphisms (TIPs). Their distribution mirrored the genetic differentiation measured by single nucleotide polymorphisms (SNPs). The number and types of mobile TE families vary substantially across populations, but there are also shared patterns common to all accessions. Ty3/Athila elements are the main drivers of TE diversity in T. arvense populations, while a single Ty1/Alesia lineage might be particularly important for transcriptome divergence. The number of retrotransposon TIPs is associated with variation at genes related to epigenetic regulation, including an apparent knockout mutation in BROMODOMAIN AND ATPase DOMAIN-CONTAINING PROTEIN 1 (BRAT1), while the number of DNA transposon TIPs is associated with variation at the HSP19 heat shock protein gene. We propose that the high rate of mobilization activity can be harnessed for targeted gene expression diversification, which may ultimately provide a toolbox for the potential use of transposition in breeding and domestication of T. arvense.
Transposable elements (TEs) are often considered genomic parasites, but they can also generate phenotypic novelty that helps organisms to adapt to new environments. To understand how TEs might contribute to phenotypic diversity and adaptive potential in the emerging oilseed crop Thlaspi arvense (field pennycress), we examined the dynamics of TE variation in a geographically diverse sample of this species. By surveying almost 300 wild accessions from North America and Eurasia we discovered over 90,000 polymorphic TE insertions. We identified not only genetic factors that vary between populations and that are associated with TE mobilization, but also TE families that are most likely to generate genetic diversity of interest to breeders.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: D.W. holds equity in Computomics, which advises breeders. D.W. advises KWS SE, a plant breeder and seed producer. All the other authors have declared that no competing interests exist.
Funding: The study was supported by Marie Skłodowska-Curie ETN EpiDiverse (EU Horizon 2020 Grant Agreement No. 764965; C.B., O.B., D.W.), the European Research Council (Grant Agreement No. 716823 “FEAR-SAP”; C.B.), the Novo Nordisk Foundation Novozymes Prize and the Max Planck Society (D.W.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data Availability: Code used for analysis and figures can be found at:
https://github.com/acontrerasg/Tarvense_transposon_dynamics . Sequencing reads can be found at the European Nucleotide Archive (ENA) under accession number PRJEB62093. See S3 Table for details of the datasets. Datasets were uploaded to Zenodo under the DOI: 10.5281/zenodo.6372331 . The workflow was based on custom bash and Python scripts available at
https://github.com/acontrerasg/Tarvense_transposon_dynamics . All code for short-variant calling, filtering and imputation can be found on GitHub (
https://github.com/Dario-Galanti/BinAC_varcalling ).
Copyright: © 2024 Contreras-Garrido et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Transposable elements (TEs) are often neglected, mobile genetic elements that make up large fractions of most eukaryotic genomes [ 1 ]. In plants with large genomes, such as wheat, TEs can account for up to 85% of the entire genome [ 2 , 3 ]. Due to their mobility, TEs can significantly shape genome dynamics and thus both long- and short-term genome evolution across the eukaryotic tree of life. TEs are typically present in multiple copies per genome and are broadly classified by their replication mechanism as copy-and-paste (class I or retrotransposons) or cut-and-paste (class II or DNA transposons) elements. The two classes can be broken down into superfamilies based on the arrangement and function of their open reading frames [ 4 ]. Further distinctions can be made based on the phylogenetic relatedness of the TE-encoded proteins [ 5 , 6 ]. To minimize the mutagenic effects of TE mobilization, host genomes tightly regulate TE load through an array of repressive epigenetic marks that suppress TE activity [ 7 – 9 ].
While epigenetic silencing of TEs is important for the maintenance of genome integrity and species-specific gene expression, TE mobilization can also generate substantial phenotypic variation by changing the expression of adjacent genes, either through local epigenetic remodeling or through direct effects on transcriptional regulation [ 10 ]. Because TE activity is often responsive to environmental stress [ 11 – 13 ] and other environmental factors [ 14 – 17 ], it has been proposed that externally controlled activation of transposition could be used for speed breeding [ 18 ].
Thlaspi arvense (field pennycress) yields large quantities of oil-rich seeds and is emerging as a new high-energy crop for biofuel production [ 19 – 21 ]. As plant-derived biofuels can be a renewable source of energy [ 22 ], the past decade has seen efforts to domesticate this species and to understand its underlying genetics in the context of seed development and oil production. Thlaspi arvense is particularly attractive as a crop because it can be grown as winter cover during the fallow period, protecting the soil from erosion [ 19 ]. Natural accessions of T. arvense are either summer or winter annuals, with winter annuals being particularly useful as potential cover crops [ 23 ]. Native to Eurasia, T. arvense was introduced and has become naturalized mainly in North America [ 24 ].
As a member of the Brassicaceae family, T. arvense is closely related to the oilseed crops Brassica rapa and Brassica napus, as well as to the undomesticated model plant Arabidopsis thaliana [ 25 ]. A large proportion of the T. arvense genome consists of TEs [ 26 ], and TE co-option has been proposed as a source of genetic novelty and as a mechanism for short-term adaptation in particular [ 27 ]. As in many other species, differences in TE content are likely to be a major source of epigenetic variation as well, especially through remodeling of DNA methylation [ 28 ].
Here, we use whole-genome resequencing data from 280 geographically diverse T. arvense accessions to characterize the inventory of mobile TEs (the ‘mobilome’), the insertion patterns of class I and class II elements, and their association with variation in the DNA methylation landscape. We highlight a small TE family with a preference for insertion near genes, which may be particularly useful for identifying new alleles for T. arvense domestication.
Results
Phylogenetically distinct transposon lineages shape the genome of T. arvense
To understand TE dynamics in Thlaspi arvense, we first reanalyzed its latest reference genome, MN106-Ref [26]. In total, 423,251 transposable elements were categorized into 1,984 unique families and grouped into 14 superfamilies (S1 Table). Together they constitute 64% of the ~526 Mb MN106-Ref genome. Over half of the genome consists of LTR (long terminal repeat) TEs. We used the TE model of each LTR family, previously generated by structural de novo prediction of TEs [26]. Based on the similarity of their reverse transcriptase domains [5], we assigned 858 (~70%) of the 1,205 Ty1 and Ty3 LTR-TE families to known lineages (Fig 1A).
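Overlapping TE annotations must not be double-counted when calculating a genome fraction like the 64% quoted above. Below is a minimal sketch of that calculation. It assumes the annotations are 0-based, half-open (chromosome, start, end) tuples, as in BED files. The function names and toy coordinates are hypothetical, and this is not the authors' pipeline.

```python
from collections import defaultdict


def merged_coverage(intervals):
    """Total bp covered by (chrom, start, end) intervals, counting overlaps once.

    Coordinates are assumed to be 0-based and half-open, as in BED files.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent annotation: extend the current block.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current block and start a new one.
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def genome_fraction(intervals, genome_size):
    """Fraction of the assembly covered by at least one TE annotation."""
    return merged_coverage(intervals) / genome_size


# Hypothetical toy annotation: the two overlapping TEs on Chr1 are counted once.
tes = [("Chr1", 0, 100), ("Chr1", 50, 150), ("Chr1", 300, 400), ("Chr2", 10, 20)]
print(merged_coverage(tes))        # 260
print(genome_fraction(tes, 1000))  # 0.26
```

The function sorts each chromosome's spans and sweeps through them once. This keeps the cost at O(n log n), even for hundreds of thousands of annotations.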
Fig 1. Genome-wide distribution and classification of TE families and superfamilies in the T. arvense reference genome MN106-Ref.
(A) Phylogenetic tree of LTR retrotransposons based on the reverse transcriptase domain.
(B) Genome-wide distribution of TE family and superfamily abundances. The tracks denote, from the outside to the inside, (1) protein-coding loci, (2) Athila, (3) Retand, (4) CRM, (5) Tekay, (6) Reina, (7) Ale, (8) Alesia, (9) Bianca, (10) Ivana and (11) all DNA TEs.
(C) Evolutionary age estimates of intact copies of autonomous versus non-autonomous TE families. The P-value is from a Wilcoxon rank-sum test.
(D) Total number of intact TEs in different lineages.
(E) Distribution of insertion time estimates for intact LTR elements across LTR TE lineages, shown for lineages with more than 10 intact TEs.
https://doi.org/10.1371/journal.pgen.1011141.g001
The most abundant LTR-TE lineage in T. arvense is Ty3 Athila, with ~180,000 copies (S2 Table). This is about three times as many as the next most common lineage, Ty3 Tekay (~57,000), and six times as many as Ty3 CRM (~30,000). The most abundant Ty1 elements belonged to the Ale lineage, with 108 families, whereas the Alesia and Angela lineages were each represented by a single family (S2 Table). Next, we compared the genomic distribution of lineages within the same TE superfamily (Fig 1B). In the Ty3 superfamily, CRM showed a strong centromeric preference, whereas Athila was more common in the wider pericentromeric region. In the Ty1 superfamily, Ale elements were enriched in centromeric regions, whereas Alesia showed a preference for gene-rich regions.
Thlaspi arvense LTR retrotransposons present signatures of recent activity
To assess the transposition potential of TEs and its natural variation across accessions, we classified each family as either potentially autonomous or non-autonomous. The classification used the complete set of protein domains identified for the corresponding TE model (see Methods). About 60% of all TE families (1,260 of 2,038) encoded at least one TE-related protein domain. Only about a quarter had all the protein domains necessary for transposition, and we classified these 537 families as autonomous. Autonomous TE families had on average more and longer copies than non-autonomous ones, although both contributed similarly to the total TE load in the genome (S1 Fig). Next, we focused on individual intact LTR-TE copies, since these are often the source of ongoing mobilization activity [13,18,56]. Overall, the 193 autonomous LTR-TE families had more members without apparent deletions than the 1,027 non-autonomous LTR-TE families (2,039 versus 339). Intact LTR-TEs from autonomous families tended to be evolutionarily younger and more abundant than their non-autonomous counterparts (Fig 1C). Among lineages, Athila had the most intact members, followed by Tekay and CRM (Fig 1D). However, insertion time estimates identified the Ty1 lineages Ale and Alesia as the drivers of the most recent transposition bursts (Fig 1E).
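The two steps described above are classifying families by domain completeness and dating intact LTR copies. Both can be sketched briefly, with several assumptions, all hypothetical:

- The domain requirements per superfamily use REXdb-style abbreviations.
- Insertion age is estimated with the widely used LTR-divergence approach, T = K / 2r. Here K is the Jukes–Cantor-corrected distance between the aligned 5′ and 3′ LTRs of one copy.
- The substitution rate r is a placeholder value, assuming one generation per year. It is not necessarily the value or method used in this study.

```python
import math

# Hypothetical domain requirements per superfamily (REXdb-style abbreviations).
# A family is called "potentially autonomous" only if all required domains occur.
REQUIRED_DOMAINS = {
    "Ty1": {"GAG", "PROT", "RT", "RH", "INT"},
    "Ty3": {"GAG", "PROT", "RT", "RH", "INT"},
    "TIR": {"TPase"},
}


def is_autonomous(superfamily, domains):
    required = REQUIRED_DOMAINS.get(superfamily)
    if required is None:
        return False  # no rule for this superfamily: be conservative
    return required <= set(domains)


def ltr_divergence(ltr5, ltr3):
    """p-distance between aligned 5' and 3' LTRs, skipping gapped columns."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to equal length")
    compared = diffs = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        diffs += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


def insertion_age(ltr5, ltr3, rate=7e-9):
    """Years since insertion, T = K / (2 * rate).

    K is the Jukes-Cantor corrected distance (defined only for p < 0.75);
    `rate` (substitutions per site per year) is a placeholder assumption.
    """
    p = ltr_divergence(ltr5, ltr3)
    k = -0.75 * math.log(1 - 4 * p / 3)
    return k / (2 * rate)


print(is_autonomous("Ty1", ["GAG", "PROT", "RT", "RH", "INT"]))  # True
print(is_autonomous("Ty3", ["RT", "RH"]))                        # False
# 7 differences over 1,000 aligned bp: roughly 500,000 years at the assumed rate
print(round(insertion_age("A" * 1000, "C" * 7 + "A" * 993)))
```

Because both LTRs are identical at the moment of insertion, their divergence grows with time. The factor of 2 accounts for mutations accumulating independently in each LTR.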
Host control of TE mobility
In A. thaliana, natural genetic variation affects TE mobility and genome-wide patterns of TE distribution, driven by functional changes in key epigenetic regulators [14,30–32]. The rich inventory of TE polymorphisms in T. arvense offered an opportunity to investigate the genetic basis of TE mobility in a species with a more complex TE landscape. We tested for genome-wide associations (GWA) between genetic variants (SNPs and short indels) and the TIP load of different TE classes, orders and superfamilies [4]. We found several GWA hits next to genes that are known to affect TE activity or are good candidates for a role in TE regulation (Fig 4A–4D). The results differed strongly between class I and class II TEs. Class I TEs were associated with variation near a wide range of genes, most of which encode components of the DNA methylation machinery (Fig 4A–4D). Class II TEs, in contrast, were mostly associated with allelic variation at an ortholog of Oryza sativa HEAT SHOCK PROTEIN 19 (HSP19). Only class I TE superfamilies were enriched for significant associations close to DNA methylation machinery genes (Fig 4B), and this difference held for most class I and class II superfamilies (S7 Fig). The most prominent hits for class I TIPs were near orthologs of A. thaliana BROMODOMAIN AND ATPase DOMAIN-CONTAINING PROTEIN 1 (BRAT1), which prevents transcriptional silencing and promotes DNA demethylation [7]. Other prominent hits were near components of the RNA-directed DNA methylation machinery, such as DOMAINS REARRANGED METHYLTRANSFERASE 1 (DRM1), ARGONAUTE PROTEIN 9 (AGO9) and DICER LIKE PROTEIN 4 (DCL4) [33] (Figs 4A–4D, S7 and S8). Another category of genes that emerged from our GWA comprises genes encoding DNA and RNA helicases, such as RECQL1 and RECQL2 (Figs 4 and S8).
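The core of a GWA scan is a per-variant association test followed by a multiple-testing threshold. The sketch below illustrates that idea with TIP load as the phenotype, under simplifying assumptions:

- Accessions are homozygous, with genotypes coded 0/1.
- Each variant is tested by Pearson correlation, using a normal approximation to the t statistic.
- Significance follows a Bonferroni threshold, like the red line in Fig 4A.

It deliberately omits the kinship or population-structure correction that a real GWA requires. Function names and simulated data are hypothetical and do not reproduce the authors' analysis.

```python
import math
import random


def assoc_pvalue(genotypes, phenotype):
    """Two-sided p-value for a linear association between genotype and phenotype.

    Uses the Pearson correlation t statistic with a normal approximation,
    which is reasonable only for large sample sizes.
    """
    n = len(genotypes)
    mg = sum(genotypes) / n
    mp = sum(phenotype) / n
    sxy = sum((g - mg) * (y - mp) for g, y in zip(genotypes, phenotype))
    sxx = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((y - mp) ** 2 for y in phenotype)
    if sxx == 0 or syy == 0:
        return 1.0  # monomorphic variant or constant phenotype: nothing to test
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1:
        return 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


def gwa_scan(variants, tip_load, alpha=0.05):
    """Test every variant; keep hits passing the Bonferroni threshold alpha / m."""
    threshold = alpha / len(variants)
    hits = []
    for name, genotypes in variants.items():
        p = assoc_pvalue(genotypes, tip_load)
        if p < threshold:
            hits.append((name, p))
    hits.sort(key=lambda hit: hit[1])
    return hits, -math.log10(threshold)


# Simulated (hypothetical) data: 280 homozygous accessions coded 0/1,
# one causal variant shifting TIP load, plus 200 null variants.
random.seed(1)
n = 280
causal = [random.randint(0, 1) for _ in range(n)]
tip_load = [100 + 30 * g + random.gauss(0, 10) for g in causal]
variants = {"causal": causal}
for i in range(200):
    variants[f"null_{i}"] = [random.randint(0, 1) for _ in range(n)]

hits, neg_log_threshold = gwa_scan(variants, tip_load)
print(round(neg_log_threshold, 2), hits[0][0])
```

Bonferroni divides the error budget evenly across all m tests. That is why the red line in Fig 4A sits higher than the fixed −log(p) = 5 line when many variants are tested.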
Some of our GWA peaks extend over several genes and might reflect associations with less well-characterized genes. Others show the strongest associations within individual genes, such as HSP19 and BRAT1 (S8 Fig). For HSP19, the top SNPs are located in introns, and their effect is difficult to predict. BRAT1 has two highly significant, fully linked SNPs in exons 1 and 4. The SNP in exon 4 (Chr1:63627484) introduces a stop codon that removes part of the ATPase domain and the entire chromatin-binding bromodomain. This mutation almost certainly abolishes BRAT1’s anti-silencing activity completely [7].
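A nonsense SNP like the one in BRAT1 exon 4 can be recognized by checking whether the substitution turns its codon into a stop codon. The remaining protein is then everything encoded before that codon. The sketch below assumes the genomic position, such as Chr1:63627484, has already been converted to a 0-based coordinate within the coding sequence. It also assumes the CDS is in transcript orientation, which for minus-strand genes means reverse-complemented. The toy sequence and function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_change(cds, pos, alt):
    """Reference and alternative codon for a SNP at 0-based position `pos` of a CDS."""
    cds = cds.upper()
    start = (pos // 3) * 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:pos - start] + alt.upper() + ref_codon[pos - start + 1:]
    return ref_codon, alt_codon


def protein_length(cds):
    """Number of codons translated before the first in-frame stop codon."""
    cds = cds.upper()
    length = 0
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            break
        length += 1
    return length


def retained_fraction(cds, pos, alt):
    """Fraction of the reference protein kept if the SNP creates a premature stop, else None."""
    ref_codon, alt_codon = codon_change(cds, pos, alt)
    if alt_codon not in STOP_CODONS or ref_codon in STOP_CODONS:
        return None  # not a nonsense mutation
    return (pos // 3) / protein_length(cds)


# Hypothetical toy CDS: ATG GCA CAA TGG TAA (four amino acids, then stop)
cds = "ATGGCACAATGGTAA"
print(codon_change(cds, 6, "T"))       # ('CAA', 'TAA')
print(retained_fraction(cds, 6, "T"))  # 0.5
print(retained_fraction(cds, 4, "T"))  # None (GCA -> GTA, missense)
```

Mapping the truncation point onto annotated domain boundaries would show which domains are lost. For BRAT1, these would be the C-terminal part of the ATPase domain and the bromodomain.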
Fig 4. GWA analysis for TIP load of a class I and a class II TE superfamily. Results for all superfamilies are shown in S7 Fig.
(A) Manhattan plots with candidate genes indicated next to neighboring variants. The red line corresponds to genome-wide significance with full Bonferroni correction and the blue line to a more permissive threshold of −log(p) = 5.
(B) Enrichment and expected FDR of a priori candidate DNA methylation machinery genes, for stepwise significance thresholds [28,34].
(C) Allelic effects of the red-circled variants from the corresponding Manhattan plots on the left.
(D) Candidate genes marked in A, their putative functions and distances to the top variant of the neighboring peaks. Blue font denotes DNA methylation machinery genes included in the enrichment analyses.
(E) DNA methylation around class I and class II TIPs in carrier vs. non-carrier individuals.
https://doi.org/10.1371/journal.pgen.1011141.g004
Accessions that diverged earlier from the reference potentially had more time to accumulate TIPs. We therefore also estimated the age of all insertions [14] and repeated the GWA using only TIPs younger than 500,000 years. The results were similar to those obtained with all TIPs, suggesting that this potential reference bias is unlikely to drive any of the identified associations (S9 Fig). To further confirm the association between the DNA methylation pathway and class I TE polymorphisms, we used published bisulfite sequencing data to quantify methylation levels in the regions flanking TIPs [28]. We examined all three sequence contexts: CG, CHG and CHH, where H stands for A, C or T. In each context, methylation increased significantly up to 1 kb around class I, but not class II, TE insertions (Fig 4E). Taken together, we interpret these results to indicate that class I TE mobility is primarily controlled by the DNA methylation machinery. This control leads to the spreading of RNA-directed DNA methylation (RdDM) around novel insertions, thus creating substantial epigenetic variation beyond TE loci.
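The carrier versus non-carrier comparison in Fig 4E can be illustrated for a single insertion. For each accession, compute the read-weighted methylation level within ±1 kb of the TIP, then take the difference in means between carriers and non-carriers. This minimal sketch assumes per-cytosine calls given as (position, methylated reads, total reads), already split by sequence context. In practice it would be run once each for CG, CHG and CHH. The sample names and counts are hypothetical, and aggregating across many TIPs and testing for significance are left out.

```python
from statistics import mean


def window_methylation(calls, center, flank=1000):
    """Weighted methylation level within +/- flank bp of `center`.

    `calls` holds (position, methylated_reads, total_reads) per cytosine.
    Returns None if no cytosine in the window is covered.
    """
    methylated = total = 0
    for pos, meth_reads, all_reads in calls:
        if abs(pos - center) <= flank:
            methylated += meth_reads
            total += all_reads
    return methylated / total if total else None


def carrier_contrast(calls_by_sample, tip_pos, carriers, flank=1000):
    """Mean flanking methylation in carriers minus that in non-carriers of a TIP."""
    carrier_levels, other_levels = [], []
    for sample, calls in calls_by_sample.items():
        level = window_methylation(calls, tip_pos, flank)
        if level is None:
            continue  # sample has no coverage around this insertion
        (carrier_levels if sample in carriers else other_levels).append(level)
    # statistics.mean raises StatisticsError if either group is empty.
    return mean(carrier_levels) - mean(other_levels)


# Hypothetical CG calls around a TIP at position 5,000; calls at 9,000 lie outside the window.
calls_by_sample = {
    "acc1": [(4800, 8, 10), (5300, 9, 10), (9000, 0, 10)],
    "acc2": [(4800, 9, 10), (5300, 8, 10)],
    "acc3": [(4800, 2, 10), (5300, 1, 10), (9000, 10, 10)],
    "acc4": [(4800, 1, 10), (5300, 2, 10)],
}
print(carrier_contrast(calls_by_sample, 5000, carriers={"acc1", "acc2"}))  # about 0.7
```

Weighting by read counts, instead of averaging per-cytosine ratios, keeps poorly covered cytosines from dominating the window estimate.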
[END]
---
[1] Url:
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1011141
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/