(C) PLOS One
This story was originally published by PLOS One and is unaltered.



Rapid evolutionary diversification of the flamenco locus across simulans clade Drosophila species [1]

Authors: Sarah Signor (Biological Sciences, North Dakota State University, Fargo, North Dakota, United States of America); Jeffrey Vedanayagam (Developmental Biology Program, Sloan-Kettering Institute, New York)

Date: 2023-09

Suppression of transposable elements (TEs) is paramount to maintain genomic integrity and organismal fitness. In D. melanogaster, the flamenco locus is a master suppressor of TEs, preventing the mobilization of certain endogenous retrovirus-like TEs from somatic ovarian support cells to the germline. It is transcribed by Pol II as a long (100s of kb), single-stranded, primary transcript, and metabolized into ~24–32 nt Piwi-interacting RNAs (piRNAs) that target active TEs via antisense complementarity. flamenco is thought to operate as a trap, owing to its high content of recent horizontally transferred TEs that are enriched in antisense orientation. Using newly-generated long read genome data, which is critical for accurate assembly of repetitive sequences, we find that flamenco has undergone radical transformations in sequence content and even copy number across simulans clade Drosophilid species. Drosophila simulans flamenco has duplicated and diverged, and neither copy exhibits synteny with D. melanogaster beyond the core promoter. Moreover, flamenco organization is highly variable across D. simulans individuals. Next, we find that D. simulans and D. mauritiana flamenco display signatures of a dual-stranded cluster, with ping-pong signals in the testis and/or embryo. This is accompanied by increased copy numbers of germline TEs, consistent with these regions operating as functional dual-stranded clusters. Overall, the physical and functional diversity of flamenco orthologs is testament to the extremely dynamic consequences of TE arms races on genome organization, not only amongst highly related species, but even amongst individuals.

Transposable element suppression is essential for genomic stability and fertility. To date, many insights have been gained by studying the major suppression loci in D. melanogaster, flamenco and 42AB. While 42AB is an exemplar germline locus, flamenco is the master regulator of TEs in the somatic cells of the ovary. Here, we take a closer look at flamenco in simulans-clade species, to see if what we have learned about flamenco in D. melanogaster holds true. Certain aspects of flamenco are conserved, including enrichment for LTR class TEs arising from horizontal transfer, but other features are diverged. flamenco has duplicated in D. simulans and may also be serving as a germline suppression cluster. This is also true in D. mauritiana, while D. sechellia flamenco retains D. melanogaster-like features. There is also incredible diversity at flamenco within D. simulans populations, suggesting important fitness effects at this locus. Overall, our data provide unique insights into the evolutionary dynamics of TE suppression and turnover of piRNA cluster properties.

Data Availability: All data has been made available in the following repositories: The genomes referenced in this study have been deposited at NCBI Genome under the accession number PRJNA907284. The small RNA data is available at NCBI SRA under the accession number PRJNA913883. These repositories will be made public upon acceptance of the manuscript. The RepeatMasker annotations are available at https://github.com/SignorLab/Flamenco_manuscript .

Cross-species comparisons further support that functions of flamenco have diversified. Data from D. sechellia and D. melanogaster conform with the current understanding of flamenco as a uni-strand cluster. However, we find evidence that D. simulans and D. mauritiana flamenco can act as a dual strand cluster in testis (D. mauritiana) and embryos (D. mauritiana and D. simulans), yielding piRNAs from both strands with a ping-pong signal. Overall, we infer that the rapid evolution of flamenco alleles across individuals and species reflects highly adaptive functions and dynamic biogenesis capacities.

In this study, we compare the flamenco locus across long-read assemblies of the three simulans-clade sister species, including 10 strains of D. simulans, and one strain each of D. mauritiana and D. sechellia. Analysis of piRNAs from ovaries of five genotypes of D. simulans found that flamenco is duplicated in D. simulans. There is no sequence synteny across copies, even though their core promoter regions and the adjacent dip1 gene duplications are conserved. flamenco has also been colonized by abundant (>40) copies of R1, a TE that was thought to insert only at ribosomal genes, and to evolve at the same rate as nuclear genes [ 27 ]. Furthermore, between different genotypes, up to 63% of TE insertions are not shared within any given copy of flamenco. Despite this, several full length TEs are shared between all genotypes in a similar sequence context. This incredible diversity at the flamenco locus, even within a single species, suggests there may be considerable variation in its ability to suppress transposable elements across individuals.

flamenco has been a favored model for understanding the piRNA pathway since the discovery of piRNA mediated silencing of transposable elements [ 6 ]. flamenco spans >350 kb of repetitive sequences located in β-heterochromatin of the X chromosome [ 20 ]. Of note, flamenco was initially identified, prior to the formal recognition of piRNAs, via transposon insertions that de-repress mdg4 (also known as gypsy), ZAM, and Idefix elements [ 20 – 24 ]. These mutant alleles disrupt the flamenco promoter, and consequently abrogate transcription and piRNA production across the length of this locus. By contrast, the deletion of multiple model germline piRNA clusters, which eliminate the biogenesis of a bulk of cognate piRNAs, surprisingly did not de-repress their cognate TEs [ 9 ]. Thus, flamenco evolution is potentially more consequential for TE dynamics. Analysis of flamenco in various strains of D. melanogaster supports that this locus traps horizontally derived TEs to achieve silencing of newly invaded TEs [ 17 ]. The flamenco locus exhibits synteny across the D. melanogaster sub-group [ 25 ]; however, the sequence composition of flamenco outside D. melanogaster has not been well-characterized [ 3 , 26 ].

The location and activity of germline piRNA clusters are stochastic and evolutionarily dynamic, as there are many copies of TE families in different locations that may produce piRNAs [ 9 , 19 ]. By contrast, somatic piRNA clusters are not redundant and a single insertion of a TE into a somatic piRNA cluster should be sufficient to largely repress that TE from further transposition [ 1 , 17 ]. Thus, flamenco should contain only one copy per TE, which is largely true in the flamenco locus of D. melanogaster [ 17 ]. Notably, flamenco is also the only piRNA cluster known to produce a phenotypic effect when mutated, since deletions of multiple germline clusters did not activate corresponding TE classes [ 9 ].

Mature piRNAs are processed from non-coding piRNA cluster transcripts, which derive from genomic regions that are densely populated with TE sequences [ 7 – 9 ]. However, the mechanisms of piRNA biogenesis differ between gonadal cell types. In the germline, piRNA clusters are transcribed from both DNA strands through non-canonical Pol II activity [ 6 , 10 – 12 ], which is initiated by chromatin marks rather than specific core promoter motifs. Moreover, co-transcriptional processes such as splicing and polyadenylation are suppressed within dual strand piRNA clusters [ 12 , 13 ]. On the other hand, in ovarian somatic support cells, piRNA clusters are transcribed from a typical promoter as a single stranded transcript, which can be alternatively spliced as with protein-coding mRNAs [ 14 – 17 ]. These rules derive in large part from the study of model piRNA clusters (i.e. the germline 42AB and somatic flamenco piRNA clusters). For both types, their capacity to repress invading TEs is thought to result from random integration of new transposons into the cluster [ 18 ]. As such, piRNA clusters are adaptive loci that play central roles in the conflict between hosts and TEs.

Drosophila gonads exemplify two important fronts in the conflict between transposable elements (TEs) and the host–the germline (which directly generates gametes), and somatic support cells (from which TEs can invade the germline) [ 1 , 2 ]. The strategies by which TEs are suppressed in these settings are distinct [ 3 ], but share their utilization of Piwi-interacting RNAs (piRNAs). These are ~24–32 nt RNAs that are bound by the Piwi subclass of Argonaute effector proteins, and guide them and associated cofactors to targets for transcriptional and/or post-transcriptional silencing [ 4 – 7 ].

Results

flamenco loci across simulans-clade Drosophilid species

We identified D. simulans flamenco from several lines of evidence: piRNA cluster calls from proTRAC, its location adjacent to the divergently transcribed dip1, the existence of conserved core flamenco promoter sequences, and enrichment of Ty3/mdg4 elements (Figs 1 and 2 and S1 and S2 Tables). The flamenco locus is at least 376 kb in D. simulans. This is similar to D. melanogaster, where flamenco is typically up to 350 kb, though this appears to vary by genotype [28]. In D. sechellia flamenco is at least 363 kb; in D. mauritiana, however, the locus has expanded to at least 840 kb (S2 Table). This is a large expansion, and it is possible that not all of the region acts to control somatic TEs. However, evidence that it does includes uniquely mapping piRNAs found throughout the region and Ty3/mdg4 enrichment consistent with a flamenco-like locus (S1 Fig). There are no protein-coding genes within the 840 kb putative flamenco region. The genes that are downstream of flamenco in D. melanogaster have moved in D. mauritiana (CG40813-CG41562 at 21.5 Mb in D. melanogaster), and flamenco is now flanked by the group of genes beginning with CG14621 (22.4 Mb in D. melanogaster). Thus in D. melanogaster flamenco is flanked by dip1 upstream and CG40813 downstream, while in D. mauritiana it is flanked by dip1 upstream and CG14621 downstream (but note that flamenco does not extend all the way to these genes). Across all species the flamenco promoter and surrounding region, including a dip1 gene, are alignable and conserved (Fig 2D).


Fig 1. A) The duplication of flamenco in D. simulans. Both copies are flanked by copies of the dip1 gene and copies of the putative flamenco promoter. The top portion of the alignment shows ~2 kb around the promoter. SNPs are shown if they differentiate copies of flamenco within a single genotype of D. simulans. Dots do not indicate a single nucleotide, but rather a sequence region where no SNPs differentiate the two copies of flamenco within a single genotype. The lower portion illustrates the promoter region with all SNPs illustrated in D. melanogaster, D. sechellia, D. mauritiana, and D. simulans. B) A schematic of the restriction digest used to verify the duplicate of flamenco. The targeted region is a 1 kb fragment adjacent to the promoter of flamenco. Within this region the original flamenco copy does not contain a YACGTR site and is not cut by the restriction enzyme BsaAI. The duplicate of flamenco is cut into two pieces (750 bp and 250 bp). C) A gel showing the fragments of the original and duplicated copy of flamenco before and after digestion with BsaAI. In column two of the gel, both copies of flamenco are amplified by the primers (Supplemental File 2). In column three of the gel, the original copy of flamenco is uncut (band 1), while the duplicate of flamenco forms two bands at 750 bp (band 2) and 250 bp (band 3). https://doi.org/10.1371/journal.pgen.1010914.g001
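The digest logic of Fig 1B can be illustrated with a minimal sketch that searches a sequence for the degenerate YACGTR site and reports the expected fragment lengths. The function name `digest`, the toy sequences, and the assumption of a blunt cut in the middle of the site are illustrative only; they are not taken from the study's amplicons or protocol.

```python
import re

# IUPAC degenerate bases used by the YACGTR site (Y = C or T, R = A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}


def digest(seq, site="YACGTR", cut_offset=3):
    """Return fragment lengths after cutting `seq` at every match of `site`.

    cut_offset=3 is an assumption: a blunt cut in the middle of the site.
    """
    motif = "".join(IUPAC[base] for base in site)
    # A lookahead reports every site, including overlapping ones.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={motif})", seq.upper())]
    edges = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Toy 1 kb amplicons (not the real sequences): one without a site, and one
# with a single site placed so that the cut falls at position 750.
original = "A" * 1000
duplicate = "A" * 747 + "TACGTA" + "A" * 247

print(digest(original))   # [1000]
print(digest(duplicate))  # [750, 250]
```

Because the site is palindromic, scanning one strand is sufficient in this sketch.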


Fig 2. A) Unique piRNA from the ovary and Ty3/mdg4 enrichment around flamenco and its duplicate in D. simulans and D. melanogaster. piRNA mapping to the entire contig that contains flamenco is shown for both species. The top of the panel shows piRNA mapping to flamenco, split into antisense (blue) and sense (red) piRNA. The bottom of the panel shows the frequency of Ty3/mdg4 transposon annotations across the contig containing flamenco, counted in 100 kb windows. There is a clear enrichment of mdg4 in the area of flamenco and, in D. simulans, its duplicate compared to the rest of the contig. B) The distribution of read size for small RNA mapping to flamenco. The peak is at approximately 26 nt, within the expected range for piRNA. C) The percent of TEs in flamenco in each species which are in the antisense orientation (first bar) and the percent of TEs in the antisense orientation that are also LTR class elements (second bar). D) A phylogenetic tree of the dip1 and flamenco enhancer region for D. melanogaster and the simulans clade. This region is conserved and alignable between all species. The tree was generated with MrBayes 3.2.7a [74]. Branch lengths are indicated by the scale bar at the bottom, in units of expected changes per site. https://doi.org/10.1371/journal.pgen.1010914.g002
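The windowed annotation counts described for Fig 2A could be computed along the following lines. This is a sketch only: `window_counts` is a hypothetical helper, coordinates are assumed to be 0-based annotation start positions, and the example values are invented.

```python
from collections import Counter


def window_counts(starts, contig_length, window=100_000):
    """Count annotation start coordinates in fixed, non-overlapping windows."""
    n_windows = -(-contig_length // window)  # ceiling division
    counts = Counter(pos // window for pos in starts)
    return [counts.get(i, 0) for i in range(n_windows)]


# Invented start coordinates on a 300 kb toy contig.
print(window_counts([5, 99_999, 100_000, 250_000], 300_000))  # [2, 1, 1]
```

Plotting these counts along the contig would show enrichment as a run of windows with elevated values.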

Structure of the flamenco locus

D. melanogaster flamenco bears a characteristic structure, in which the majority of TEs are Ty3/mdg4 elements in the antisense orientation (79% antisense orientation, 85% of which are Ty3/mdg4 elements) (Fig 2C and S3 Table). In D. simulans, flamenco has been colonized by large expansions of R1 transposable element repeats, such that on average only 50% of TEs are antisense and LTR elements make up 55% of the locus. However, 76% of antisense insertions are LTR insertions, so the underlying flamenco structure is apparent when the R1 insertions are disregarded (Fig 2C). In D. mauritiana, flamenco is 71% antisense, and 85% of those antisense elements are LTRs. Likewise in D. sechellia 78% of elements are antisense, and of those 81% are LTRs. flamenco thus retains the overall structure of a canonical D. melanogaster-like flamenco locus in all of these species: Ty3/mdg4 enrichment, the flamenco promoter region, and an enrichment of antisense LTR elements (Fig 2A–2D).
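The two summary percentages used above (share of TEs in antisense orientation, and share of those antisense TEs that are LTR elements) can be sketched as follows. The input format, the function name `orientation_summary`, and the toy annotations are assumptions for illustration; the actual values in the text derive from the RepeatMasker annotations.

```python
def orientation_summary(insertions):
    """Summarize a list of (strand, te_class) annotations.

    strand is '-' for antisense and '+' for sense relative to cluster
    transcription. Returns (percent antisense, percent of antisense
    insertions that are LTR elements).
    """
    if not insertions:
        return 0.0, 0.0
    antisense = [te_class for strand, te_class in insertions if strand == "-"]
    pct_antisense = 100 * len(antisense) / len(insertions)
    if not antisense:
        return pct_antisense, 0.0
    pct_ltr = 100 * sum(c == "LTR" for c in antisense) / len(antisense)
    return pct_antisense, pct_ltr


# Invented example: 8 of 10 insertions antisense, 6 of those 8 are LTRs.
toy = [("-", "LTR")] * 6 + [("-", "LINE")] * 2 + [("+", "LINE")] * 2
print(orientation_summary(toy))  # (80.0, 75.0)
```

Note that the second percentage is conditional on antisense orientation, which is why a locus heavily colonized by sense-strand non-LTR repeats can still show a high LTR share among its antisense insertions.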

flamenco is duplicated in D. simulans

In D. simulans, we unexpectedly observed that flamenco is duplicated on the X chromosome; the duplication was confirmed with PCR and a restriction digest (Figs 1 and S2 and S2 File). While this might in principle represent a second allele of flamenco that is very diverged and found in one copy of each genome, the high quality of the assemblies of this region makes this unlikely (S1 File). Furthermore, it is found in every assembled D. simulans genome and thus is unlikely to be a high frequency balanced polymorphism. These duplications are associated with a conserved copy of the putative flamenco enhancer as well as copies of the dip1 gene located proximal to flamenco in D. melanogaster (Figs 1 and 3A). While it is unclear which copy is orthologous to D. melanogaster flamenco, all D. simulans lines bear one copy that aligns across genotypes. We refer to this copy as D. simulans flamenco, and to the other copies as duplicates. Outside of the promoter and dip1 region, the two copies of flamenco do not align with one another and lack synteny amongst their resident TEs. Possible evolutionary scenarios are that the flamenco duplication occurred early in the simulans lineage, that the clusters evolved very rapidly, or that the duplication encompassed only the promoter region and was subsequently colonized by TEs (Figs 1A and 3A). The duplicate retains the structure of flamenco, with an average of 67% of TEs in the antisense orientation, 91% of which are LTRs. The duplicate of flamenco is less impacted by R1, with some genotypes having as few as 8 R1 insertions (Fig 3C).


Fig 3. A) A representation of flamenco and its duplicate from genotype LNP-01-062. R1 repeat regions are shown in blue. Full length transposable elements are labeled. There is no synteny conservation between flamenco and its duplicate. Figure is not to scale. B) Divergence between copies of flamenco. This is a phylogenetic tree of dip1 and the flamenco promoter region from each genome. Between dip1 and the promoter is a series of G5/INE1 repeats that are found in every genome. Overall this region is fairly conserved, with the duplicate copies all grouping together with short branch lengths (shown in pink). The original copy of flamenco is more diverse, with some outliers (shown in light blue), but there is good branch support for all the deep branches of the tree. C) The proportion of insertions that are shared by one through seven genotypes (genotypes with complete flamenco assemblies). D) Divergence of flamenco within D. simulans. Labeled TEs correspond to elements which are present in a full length copy in at least one genome. If they are shared between genomes they are labeled in red; if they are unique they are black. If they are full length in one genome and degraded in other genomes they are represented by stacked dashes. If a TE is present in the majority of genomes but missing in one, it is shown as missing, which is agnostic to whether it was deleted or was never present. https://doi.org/10.1371/journal.pgen.1010914.g003

The flamenco duplicate is absent in the D. simulans reference assembly, w501 (GCA_000754195.3), but present in wxD1, suggesting that it was polymorphic, that the duplication had not yet occurred, or, most likely, that it was not assembled. A second flamenco promoter is present on a 750 bp scaffold in w501, but that is not enough to determine whether it is a flamenco duplicate or an assembly artifact.

flamenco piRNA is expressed in the testis and the maternal fraction

Canonically, flamenco piRNA is expressed in the somatic follicular cells of the ovary and not in the germline, and does not produce a ping-pong signal [23]. It was not thought to be present in the maternal fraction of piRNAs or in other tissues. However, this appears to vary between species (Fig 4). We examined single mapping reads in the flamenco region from testes and embryos (maternal fraction) in D. simulans, D. mauritiana, D. sechellia, and D. melanogaster. As a control we also included D. melanogaster ovarian somatic cells, where Aub and Ago3 are not expressed and therefore there should be no ping-pong signals. In D. simulans and D. mauritiana flamenco is expressed bidirectionally in the maternal fraction and the testis, including ping-pong signals on both strands (Figs 4A, 4C and S1). In D. sechellia, there is no expression of flamenco in either of these tissues. Discarding multimappers, in the maternal fraction between 36% (D. simulans) and 63% (D. mauritiana) of the ping-pong signatures on the X with a z-score of at least 0.9 are located within flamenco (Fig 4C). In the testis the picture is more complicated: in D. mauritiana 50% of ping-pong signals on the X with a z-score of at least 0.9 are located within flamenco (S1 Fig). While mapping of piRNA to both strands was observed in D. simulans testis, there is very little apparent ping-pong activity (5 positions in flamenco with z > 0.9; 15 potential ping-pong signals on the X). In D. melanogaster, there is uni-strand expression in the maternal fraction, but it is limited to the region close to the promoter. In D. melanogaster no ping-pong signals have a z-score above 0.8 in the maternal fraction or the ovarian somatic cells. There are ping-pong stacks in flamenco in the testis of D. melanogaster (2% of the total on the contig); however, they are limited to a single region and are not abundant enough to be strong evidence of ping-pong activity.
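A ping-pong signal is conventionally recognized as an excess of sense/antisense read pairs whose 5' ends overlap by 10 nt. The sketch below only tallies such overlaps from 5' end coordinates; it is not the scoring used by pingpongpro, and the function name, coordinate convention (plus-strand 5' end is the leftmost base, minus-strand 5' end is the rightmost base), and toy positions are assumptions for illustration.

```python
from collections import Counter


def overlap_histogram(plus_5p, minus_5p, max_overlap=20):
    """Count plus/minus read pairs by the length of their 5' overlap.

    plus_5p:  5' end positions of plus-strand reads (leftmost base).
    minus_5p: 5' end positions of minus-strand reads (rightmost base).
    A pair with minus 5' end q and plus 5' end p overlaps by q - p + 1 nt.
    """
    minus_counts = Counter(minus_5p)
    hist = Counter()
    for p, n_plus in Counter(plus_5p).items():
        for overlap in range(1, max_overlap + 1):
            q = p + overlap - 1
            hist[overlap] += n_plus * minus_counts.get(q, 0)
    return hist


# Invented positions: every overlapping pair here is offset by exactly 10 nt.
hist = overlap_histogram([100, 100, 200], [109, 109, 209, 150])
print(hist[10])  # 5
```

In practice the height of the 10 nt bin is compared against the other overlap lengths, and dedicated tools assign a score to each candidate position.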


Fig 4. A. Expression of single mapping piRNAs in the maternal fraction and testis (gray) of D. melanogaster and the simulans clade. Sense mapping reads are shown in blue, antisense in red. Libraries are RPM normalized and the axes are the same for each library type, e.g. embryo. D. sechellia has no expression of flamenco in the maternal fraction or the testis. D. melanogaster has low expression in the maternal fraction and very little ping-pong activity. D. simulans and D. mauritiana show dual stranded expression in the testis and maternal fraction. B. The total number of uniquely mapping reads for each of the libraries illustrated in A. This is included to demonstrate that a low number of mapping reads does not explain the patterns seen in D. sechellia versus D. mauritiana. C. The height of 10 nt ping-pong stacks at flamenco in D. melanogaster maternal fraction, testis and ovarian somatic cells is shown on the left. Below each schematic of the height of the stacks is the position of z-scores over 0.8, indicating the likelihood that this is a real ping-pong signal as opposed to an artifact. Scores were produced by pingpongpro [76]. Signals move from red to blue as they approach 1. In the testis, a few ping-pong signals reach this threshold but not enough to indicate ping-pong activity convincingly. On the right are the ping-pong stacks and z-scores for the maternal fraction and testis in D. simulans. Only in the maternal fraction is the density of z-scores over 0.8 convincing enough to indicate an active ping-pong cycle in the flamenco region. However, stacks are enriched in testis, so this may warrant further investigation. D. mauritiana also has convincing ping-pong signals in this region (S1 Fig). D. A schematic of the evolution of flamenco and its mode of expression in the simulans and melanogaster clades. https://doi.org/10.1371/journal.pgen.1010914.g004

In the maternal fraction, 15% of the ping-pong signals on the X with a z-score above 0.9 are within the flamenco duplicate. The flamenco duplicate does not have a strong signal of the ping-pong pathway in the testis. In addition, flamenco in these species has been colonized by full length TEs thought to be active in the germline such as blood, burdock, mdg-3, Transpac, and Bel [29,30]. The differences in ping-pong signals between species and the presence of germline TEs in D. simulans and D. mauritiana suggest that the role of flamenco in these tissues has evolved between species.

R1 LINE elements at the flamenco locus

R1 elements are well known to insert into rDNA genes, to be transmitted vertically, and to evolve at a rate similar to the genome background [27]. They have also been found outside of rDNA genes, but only as fragments. R1 elements are abundant within flamenco loci in the simulans clade. Outside of flamenco, R1 elements in D. simulans are distributed according to expectation, with full length elements occurring only within rDNA (S3 File). Within flamenco, most copies of R1 occur as tandem duplicates, creating large islands of fragmented R1 copies (Fig 3A). They are on average 3.7% diverged from the reference R1 from D. simulans. Across individual D. simulans genomes, ~99 kb of flamenco loci consists of R1 elements, i.e. 26% of their average total length. SZ45, LNP-15-062, NS40, MD251, and MD242 contain 4–7 full length copies of R1 in the sense orientation, even though all but SZ45 bear fragmented R1 copies on the antisense strand. (The SZ45 flamenco assembly is incomplete, as the scaffold ends before the end of Ty3/mdg4 enrichment.) As the antisense R1 copies are expected to suppress R1 transposition, flamenco may not suppress these elements effectively. Alternatively, it is possible that D. simulans flamenco is still mostly active in the soma, while R1 is active in the germline and thus escapes host control by flamenco. In D. mauritiana, flamenco harbors abundant fragments or copies of R1 (19 on the reverse strand and 20 on the forward strand), and one large island of R1 elements. In total, D. mauritiana contains 84 kb of R1 sequence within flamenco. In D. mauritiana there are 8 full length copies of R1 at the flamenco locus, 7 in antisense, which are not obviously due to a segmental or local duplication. Finally, we find that D. sechellia flamenco lacks full length copies of R1, and it contains only 18 kb of R1 sequence (16 fragments on the reverse strand). Yet, all the copies are on the sense strand, which would not produce fragments that can suppress R1 TEs. In essence, the antisense copies of R1 in D. mauritiana should be suppressing the TE, yet we see multiple full length antisense insertions; conversely, D. sechellia has no antisense copies, yet we see no evidence of recent R1 insertions. From this it would appear that whatever is controlling the transposition of R1 lies outside of flamenco. The presence of long sense-strand R1 elements within flamenco is a departure from expectation [17,27]. There is no evidence of an rDNA gene within the flamenco locus, or of the R1 insertion site from the 28S rDNA gene, that would explain the insertion of R1 elements there, nor is there precedent for the large expansion of R1 fragments within the locus. Furthermore, the suppression of R1 transposition does not appear to be controlled by flamenco.

piRNA production from R1

On average, R1 elements within the flamenco locus of D. simulans produce more piRNA than any other TE within flamenco. R1 reads mapping to the forward strand constitute an average of 51% of the total piRNAs within the flamenco locus from the maternal fraction, ovary, and testis using weighted mapping. The maternal fraction constitutes the piRNA deposited by the mother into the embryo. Weighted mapping refers to mapping where read counts are divided by the number of places they map, e.g. a read that maps to 50 locations is counted as 1/50. The only exception is the ovarian sample from SZ232, which is a large outlier at only 5%. However, R1 reads mapping to the reverse strand account for an average of 84% of the piRNA being produced from the reverse strand in every genotype and tissue (maternal fraction, testis, or ovary). If unique mapping is considered instead of weighted mapping, these percentages are reduced by approximately 20%, which is to be expected given that R1 is present in many repeated copies. Production of piRNA from the reverse strand seems to be correlated with elements inserted in the sense orientation, of which the vast majority are R1 elements in D. simulans (S3 Fig). The production of large quantities of piRNA cognate to the R1 element seemingly has no function: if R1 only inserts at rDNA genes and is vertically transmitted, there is little reason to produce the majority of piRNA in response to this element. In D. sechellia there are very few piRNA produced from flamenco in the maternal fraction or testis (which is expected for a cluster that is only active in ovarian somatic tissue), and there are no full length copies of R1. Likewise, overall weighted piRNA production from R1 elements on either strand is 2.8–5.9% of the total mapping piRNA. In contrast, in D. mauritiana there are full length R1 elements and abundant piRNA production in the maternal fraction and testis. In D. mauritiana an average of 28% of piRNAs mapping to the forward strand of flamenco arise from R1, and 33% of those mapping to the reverse strand. In D. mauritiana R1 elements make up a smaller proportion of the total elements in the sense orientation (24%) than in D. simulans (55%).
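The weighted mapping scheme defined above (each read contributes 1 divided by the number of locations it maps to) can be sketched as follows. The input format, the function name `weighted_counts`, and the toy alignments are assumptions for illustration, not the study's pipeline.

```python
from collections import Counter, defaultdict


def weighted_counts(alignments):
    """Sum fractional read counts per feature.

    alignments: iterable of (read_id, feature) pairs, one pair per mapping
    location. A read mapping to n locations adds 1/n at each location.
    """
    alignments = list(alignments)
    n_locations = Counter(read_id for read_id, _ in alignments)
    totals = defaultdict(float)
    for read_id, feature in alignments:
        totals[feature] += 1 / n_locations[read_id]
    return dict(totals)


# Invented alignments: r1 maps twice within R1, r2 maps uniquely to mdg4,
# and r3 maps once to each element.
toy = [("r1", "R1"), ("r1", "R1"), ("r2", "mdg4"), ("r3", "R1"), ("r3", "mdg4")]
print(weighted_counts(toy))  # {'R1': 1.5, 'mdg4': 1.5}
```

Under unique mapping only r2 would be counted, which illustrates why repeated elements such as R1 lose signal when multimappers are discarded.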

Conservation of flamenco

The dip1 gene and promoter region adjacent to each copy of flamenco are highly conserved both within and between copies of flamenco (Fig 3A). The phylogenetic tree of this region supports our labeling of the two copies as the original flamenco locus and the duplicate (Fig 3A). The original flamenco locus is more diverged amongst genotypes of D. simulans, while the duplicate copies cluster closely together with short branch lengths (Fig 3A). The promoter region is also conserved and alignable between D. melanogaster, D. sechellia, D. mauritiana, and D. simulans (Fig 2D). However, the same is not true of the flamenco locus itself. Approximately 3 kb from the promoter, flamenco diverges amongst genotypes and species and is no longer alignable by traditional sequence-based algorithms, as the TEs are essentially presence/absence polymorphisms that span multiple kb. There is no conservation of flamenco between D. melanogaster, D. simulans, D. sechellia, and D. mauritiana (Fig 5). However, within the simulans clade many of the same TEs occupy the locus, suggesting that they are the current genomic invaders in each of these species (Fig 5).


Fig 5. A. Copy number of a subset of transposable elements at flamenco. Solo LTRs are indicated by a lighter shade at the top of the bar. The black line on each bar graph indicates a copy number of one. Values for D. simulans are the average for all genotypes with a complete flamenco assembly. Note that in D. melanogaster (green) most TEs have a low copy number. The expansion of R1 elements in the simulans clade is indicated on the right hand panel with a dotted box. Many elements within flamenco are multicopy in the simulans clade. While some of this is likely due to local duplications, it is clearly a different pattern than in D. melanogaster. Enrichment of LTR elements on the antisense strand is clear for all species. B. Alignment of flamenco in D. melanogaster, D. simulans, D. sechellia, and D. mauritiana. There is no conserved synteny between species but there are clearly shared TEs, particularly within the simulans clade. The expansion of D. mauritiana compared to the other species is apparent. https://doi.org/10.1371/journal.pgen.1010914.g005

In D. simulans the majority of full length TEs are private insertions: 54% in flamenco and 64% in the duplicate. Copies that are full length in one genotype but fragmented in others are counted as shared, not private. However, the TE must be full length in at least one genotype to be included in this grouping. Almost half of the private insertions in the duplicate are due to a single genotype with a unique section of sequence, in this case MD251. In flamenco, private insertions are the single largest category of transposable element insertions, followed by fixed insertions. Thus even within a single population there is considerable diversity at the flamenco locus, which could potentially lead to differences in the ability to suppress TEs in the somatic cells of the ovary.
For example, full length copies of 297 are present in four genotypes either in flamenco or the duplicate, which would suggest that these genotypes are able to suppress this transposable element while the other genotypes are not. Germline suppression is redundant, thus absence of a TE in flamenco would not necessarily mean it is not suppressed in the germline. In contrast mdg4-3 is present in more than one full length copy in flamenco and its duplicate in every genotype but one where it is present in a single copy. There are a number of these conserved full length TEs that are present in all or nearly all genotypes, including Chimpo, mdg4-2, Tirant, and mdg4-4. In addition, INE1 elements adjacent to the promoter are conserved. It is notable that any full length TEs are shared across all genotypes, given that wxD1 was likely collected 30–50 years prior to the others, and the collections span continents (Jerry Coyne pers. comm.). Two facts are relevant to this observation: (1) TEs were shown not to correlate with geography [31] and (2) D. simulans is more diverse within populations than between different populations [32–34]. Other explanations are also plausible. Selection could be maintaining these full length TEs because TE deletions allow for TE reactivation that reduces fitness, wxD1 could have had introgression from other lab strains, or a combination of these explanations.
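The private, shared, and fixed categories discussed above amount to counting how many genotypes carry each insertion. The sketch below assumes a presence table keyed by insertion; the names `sharing_spectrum` and `label`, the genotype labels g1 to g3, and the insertions are hypothetical. It treats any copy as presence, whereas the analysis above additionally requires a full length copy in at least one genotype.

```python
from collections import Counter


def label(n_carriers, n_genotypes):
    """Classify an insertion by how many genotypes carry it."""
    if n_carriers == 1:
        return "private"
    if n_carriers == n_genotypes:
        return "fixed"
    return "shared"


def sharing_spectrum(presence, n_genotypes):
    """Proportion of insertions carried by exactly k genotypes, k = 1..n.

    presence: dict mapping an insertion id to the set of genotypes with it.
    """
    counts = Counter(len(carriers) for carriers in presence.values())
    total = len(presence)
    return {k: counts.get(k, 0) / total for k in range(1, n_genotypes + 1)}


# Invented presence table for three hypothetical genotypes.
presence = {
    "ins_a": {"g1"},
    "ins_b": {"g1", "g2", "g3"},
    "ins_c": {"g2"},
    "ins_d": {"g1", "g3"},
}
print(sharing_spectrum(presence, 3))  # {1: 0.5, 2: 0.25, 3: 0.25}
print(label(1, 3), label(2, 3), label(3, 3))  # private shared fixed
```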

Suppression of TEs by the flamenco locus and the trap model of TE control

In D. melanogaster, it was proposed that while germline clusters may have many insertions of a single TE, the somatic ‘master regulator’ flamenco will have a single insertion of each transposon, after which they are silenced and no longer able to transpose [17]. While the ‘single copy’ rule remains a hypothesis, it is largely supported in D. melanogaster, where the two observed multicopy elements likely arose from segmental duplications. However, this observation comes from an older and partially misassembled flamenco [18]. In the past, this ‘single copy’ rule has appeared to apply only to full length insertions, with older degraded copies not effectively suppressing TEs [17]. To evaluate this model we will determine each of the following for full length TEs: (1) How many TEs have antisense oriented multicopy elements within flamenco? (2) How many de novo insertions of TEs in the flamenco duplicate of D. simulans are also present in the original flamenco copy? (3) How many TEs have full length and fragmented insertions, suggesting the older fragments did not suppress the newer insertion?

First we will evaluate the presence of antisense oriented multicopy elements within flamenco in each species. Because degraded elements are difficult to classify accurately, for example when distinguishing among multiple Ty3/mdg4 elements, we will focus here on full length TEs, which suggest recent transposition. In D. melanogaster there are 17 full length TEs (sense and antisense), one of which is present in multiple antisense copies. In D. sechellia there are 22 full length TEs within the flamenco locus, two of which are multicopy in antisense. D. mauritiana contains 41 full length TEs within the flamenco locus. Five of these are present in multiple antisense full length copies: mdg4-5, R1, Stalker-4, jockey-3, and Cr1a. In D. simulans there are 26 full length TEs present in any of the seven complete flamenco assemblies.
Six of these are present in multiple antisense copies within a single genome: INE1, Chimpo, mdg4-4, 412, Tirant, and BEL-unknown. The two Tirant copies are likely a segmental duplication, as they flank an R1 repeat region. In the duplicate of flamenco in D. simulans there are 30 full length TEs, none of which are multicopy in antisense. However, there are TEs that are multicopy in antisense with respect to the original copy of flamenco: mdg4-3, BEL-unknown, Nomad-1, Chimpo, mdg4-53A, R1, and INE1. The fact that these elements are full length in both copies suggests independent insertions in each cluster rather than inheritance from duplication. Thus D. simulans and D. mauritiana overall do not meet the expectation that flamenco will contain a single insertion of any given TE.

Full length elements are generally younger insertions than fragmented insertions. Although we cannot know the order of insertions and deletions for sure, if a full length element is inserted in flamenco and there are fragments in the antisense orientation elsewhere in flamenco, this suggests that flamenco did not successfully suppress the transposition of this element. In D. melanogaster six elements have fragments in antisense that are less than 10% diverged from a full length TE (excluding TEs present in multiple antisense copies). In D. sechellia and D. mauritiana this is nine and five elements, respectively. In D. simulans ten TEs fit this criterion in flamenco, including mdg4-2, mdg4-3, mdg4-5, 412, INE1, R1 and Zam. In the duplicate of flamenco in D. simulans there are nine TEs that fit this criterion, including mdg4-2, mdg4-3, mdg4-5, 297, Stalker-4, and R1. In the simulans clade, either fragments of TEs are not sufficient to suppress transposable elements, or some elements are able to transpose despite the host's efforts to suppress them.
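
The two tallies used above (TEs with more than one full length antisense copy, and TEs with a full length copy plus antisense fragments less than 10% diverged from it) can be sketched in Python as follows. This is an illustrative sketch, not the analysis code behind the reported counts. The `Insertion` record, its field names, the family names `TE_A` to `TE_D`, and the toy values are hypothetical, and the sketch assumes that the full length copy may be in either orientation and that divergence is stored as a fraction.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    """One annotated TE insertion in a cluster (hypothetical record)."""
    family: str         # TE family name
    antisense: bool     # True if antisense relative to cluster transcription
    full_length: bool   # True for full length copies, False for fragments
    divergence: float   # fraction diverged from the full length element


def multicopy_antisense(insertions):
    """Families with more than one full length antisense copy."""
    counts = Counter(
        i.family for i in insertions if i.full_length and i.antisense
    )
    return {family for family, n in counts.items() if n > 1}


def full_length_with_recent_fragments(insertions, max_divergence=0.10):
    """Families with a full length copy and a low-divergence antisense fragment.

    Families already counted as multicopy in antisense are excluded,
    mirroring the exclusion stated in the text.
    """
    full = {i.family for i in insertions if i.full_length}
    fragments = {
        i.family
        for i in insertions
        if not i.full_length and i.antisense and i.divergence < max_divergence
    }
    return (full & fragments) - multicopy_antisense(insertions)


# Invented toy locus, not taken from the paper.
toy_locus = [
    Insertion("TE_A", True, True, 0.00),
    Insertion("TE_A", True, True, 0.01),
    Insertion("TE_A", True, False, 0.02),
    Insertion("TE_B", True, True, 0.00),
    Insertion("TE_B", True, False, 0.05),
    Insertion("TE_C", False, True, 0.00),
    Insertion("TE_C", True, False, 0.25),
    Insertion("TE_D", True, False, 0.03),
]
multicopy = multicopy_antisense(toy_locus)
recent = full_length_with_recent_fragments(toy_locus)
print(sorted(multicopy), sorted(recent))
```

Here `TE_A` is multicopy in antisense and is therefore excluded from the second tally, `TE_C` is excluded because its fragment is too diverged, and `TE_D` is excluded because it has no full length copy.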

[END]
---
[1] Url: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010914

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.

via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/