(C) PLOS One
This story was originally published by PLOS One and is unaltered.
. . . . . . . . . .
Genomic survey maps differences in the molecular complement of vesicle formation machinery between Giardia intestinalis assemblages [1]
['Shweta V. Pipaliya', 'Division Of Infectious Diseases', 'Department Of Medicine', 'University Of Alberta', 'Edmonton', 'Joel B. Dacks', 'Institute Of Parasitology', 'Biology Centre', 'Czech Academy Of Sciences', 'České Budějovice']
Date: 2024-01
Giardia intestinalis is a globally important microbial pathogen with considerable public health, agricultural, and economic burden. Genome sequencing and comparative analyses have elucidated G. intestinalis to be a taxonomically diverse species consisting of at least eight different sub-types (assemblages A-H) that can infect a great variety of animal hosts, including humans. The best studied of these are assemblages A and B which have a broad host range and have zoonotic transmissibility towards humans where clinical Giardiasis can range from asymptomatic to diarrheal disease. Epidemiological surveys as well as previous molecular investigations have pointed towards critical genomic level differences within numerous molecular pathways and families of parasite virulence factors within assemblage A and B isolates. In this study, we explored the necessary machinery for the formation of vesicles and cargo transport in 89 Canadian isolates of assemblage A and B G. intestinalis. Considerable variability within the molecular complement of the endolysosomal ESCRT protein machinery, adaptor coat protein complexes, and ARF regulatory system have previously been reported. Here, we confirm inter-assemblage, but find no intra-assemblage variation within the trafficking systems examined. This variation includes losses of subunits belonging to the ESCRTIII as well as novel lineage specific duplications in components of the COPII machinery, ARF1, and ARFGEF families (BIG and CYTH). Since differences in disease manifestation between assemblages A and B have been controversially reported, our findings may well have clinical implications and even taxonomic, as the membrane trafficking system underpin parasite survival, pathogenesis, and propagation.
Giardia intestinalis, a globally significant microbial pathogen, displays taxonomic diversity, with at least eight sub-types (assemblages A-H) capable of infecting various hosts, including humans. This study examined 89 Canadian genomes of assemblage A and B G. intestinalis to explore vesicle formation and cargo transport machinery using comparative genomics. We found considerable variability in the molecular components of endolysosomal ESCRT protein machinery, adaptor coat protein complexes, and the ARF regulatory system. Notably, differences existed between assemblages A and B but not within the same assemblage in the examined trafficking systems, including the loss of ESCRTIII subunits and lineage-specific duplications in COPII machinery, ARF1, and ARFGEF families. These findings could have clinical and taxonomic implications, as the membrane trafficking system is critical for parasite survival, pathogenesis, and propagation.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: Shweta V. Pipaliya is employed at F. Hoffmann La-Roche (Zurich, Switzerland). JBD and MAC have no competing interests.
Funding: SVP received support and a stipend from a CIHR Doctoral Scholarship (CIHR CGM 175863) and an Alberta Innovates Graduate Student Scholarship. Research in the Dacks lab is funded by NSERC Discovery Grants (RES0043758, RES0046091), awarded to JBD. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data Availability: All assembled genome have been made available through Figshare (
https://doi.org/10.6084/m9.figshare.12668813 ). For each resulting assembled genome, GFF and unmapped features files have also been made available through Figshare (
https://doi.org/10.6084/m9.figshare.13567655.v2 ).
Copyright: © 2023 Pipaliya et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Large-scale genomic data from assemblage A and B isolates are therefore essential to evaluate these trends. Recently, the British Columbia Centre for Disease Control Public Health Laboratory (BCCDC PHL) sequenced 89 G. intestinalis isolates belonging to either assemblage A or B [ 24 ]. Archived isolates were collected from fecal samples, surface waters, and beavers during periods of sporadic and regional Giardiasis outbreaks in across the province of British Columbia (Canada) between the years 1989 and 1995 as well as some from Hamilton, Ontario (Canada) [ 25 ]. A combination of PCR and genome sequencing classified these outbreak-associated Giardia isolates within assemblage AI, AII, and B [ 24 , 25 ]. In this study we leveraged the short-read sequencing data generated by the BCCDC PHL by first assembling the Giardia genomes, followed by a population-scale survey of the molecular complement with three important families of vesicle formation machinery, the ESCRT complexes, vesicle coats and adaptor proteins, and the ARF regulatory system proteins in these large collection of Giardia isolates [ 24 ].
Differences in the other encoded protein machinery, particularly those belonging to the membrane trafficking system pathways, are also evident. Previous comparative genomics and phylogenetic investigations performed by us and others showed the complement and evolution of the ESCRT proteins, vesicle coats, and the ARF regulatory system proteins. Within each of these systems, consistent patterns of inter-assemblage variabilities were identified that would have substantial implications on the molecular intricacies underpinning the processes occurring at the parasite endomembrane organelles. Given the small number of isolates from each of the two human-infecting assemblages included in these studies, a deeper investigation at a population level is necessary to confirm the initial findings of intra-assemblage differences and to assess what variability exists, if any within assemblages, i.e. at the intra-assemblage level.
The increased availability of genome sequencing data from numerous assemblage A and B isolates previously enabled large-scale and multi-family comparative genomic investigation and has brought forward growing evidence that suggests considerable inter-assemblage variability at the genetic level. Much of this has been found within multiple Giardia-specific virulence gene families, such as the cysteine-rich variant-specific surface proteins (VSPs) expressed by the trophozoites during a gut infection for parasite antigenic variation in order to modulate and evade the host immune cells and other high-cysteine proteins (HC) [ 22 ]. Recent genome-wide surveys have highlighted that isolates belonging to assemblage B encode upwards of 340 different VSPs compared to their assemblage A counterparts, which have smaller VSP repertoires (i.e., 136 in WB)[ 9 , 23 ]. Most notably, the assemblages show high proportions of assemblage-specific VSP and HCs [ 9 ].
From a clinical standpoint, differences in human Giardiasis outcomes caused due to assemblages A and B are also variable and often conflicting. These reports have been documented through several cross-sectional clinical studies conducted across numerous countries (e.g., United Kingdom, Scotland, Sweden, United States, Saudi Arabia, Egypt) to understand if specific strains are attributable to a higher probability of developing symptomatic or chronic disease outcomes [ 17 – 21 ]. While the precise impact of molecular-level attributes and variances within Giardia assemblage biology on the manifestation of symptoms and disease remains uncertain, gaining insight into the underlying parasite biology and genomic content of these assemblages is undeniably valuable for a more comprehensive understanding of this discourse.
Fundamental biological and genetic differences have been enumerated between the two human-infecting strains. Although structurally otherwise identical, various cytogenetic differences are also present, such as the number of chromosomes per nuclei and their sizes [ 14 , 15 ]. Although generally lower compared to other polyploid eukaryotes, genetic allelic heterozygosity (ASH) comparisons between assemblages A and B have shown variances in the levels between the two. Assemblage A isolate WB is markedly lower in overall ASH (<0.01%) compared to assemblage B isolate, GS, which has a much higher genomic allelic divergence (0.5%) [ 16 ].
Advances in genome sequencing technologies have resulted in an abundance of new genomic data from numerous human and animal-infecting Giardia isolates for comparative analyses. Of the human-infecting isolates, assemblage A, isolate WB was the first available genome and lent tremendous insights into molecular-level losses and sequence divergence in this parasite compared to other eukaryotes [ 4 ]. Since then, other assemblage A isolates, such as AS175 and AS98, as well as assemblage B isolates, GS, GS_B, and BAH15c1, have also been sequenced and made available [ 5 – 8 ]. Recent advances in long-read sequencing technologies have also allowed cost-efficient improvements of these previous assemblies [ 8 – 10 ]. Other animal-infecting isolates from dog assemblages, C and D, and cattle assemblage E have also been sequenced and assembled, which have shed light on the molecular differences between the various Giardia assemblages [ 11 , 12 ]. Investigating these additional animal isolates is necessary due to their implications on potential for anthropozoonotic transmission between humans and domesticated animals [ 13 ].
Although morphologically identical, multi-locus sequence genotyping has sub-divided G. intestinalis into eight distinct assemblages (i.e., genotypes), A through H, with broad animal host tropism, including humans [ 2 ]. Of these, A and B have the greatest host range and are the predominant assemblages that infect humans to cause human giardiasis. Despite similarities in their ability to cause human disease, phylogenetic analyses place assemblage A closely related to the cattle-infecting assemblage E and cat assemblage F. Meanwhile, assemblage B is closer in relation to assemblage G, which infects murine hosts such as mice and rats [ 3 ].
Giardia intestinalis is a protist pathogen responsible for the globally occurring water and food-borne diarrheal disease, Giardiasis. The World Health Organization estimates roughly 280 million cases, with greater incidence of disease occurring in resource poor regions with limited access to clean drinking water and sanitation infrastructure [ 1 ]. Ingestion of Giardia cysts contaminating food and water leads to establishment of gut infection. Chronic and recurring infections in children have developmental consequences and can extend beyond the acute intestinal disease.
Using these identified ORFs, we extracted the nucleotide sequences and performed their translation using Bedtools getfasta ( S4 File ). Both nucleotide as well as translated amino acid coding regions from each trafficking protein analyzed are summarized in S6 Table .
Genome output files were used for presence/absence analyses of the vesicle formation machinery in isolates of Giardia assemblage A and B, previously curated through comprehensive comparative genomics and phylogenetics in preceding investigations [ 37 – 40 ]. Specifically, WB and GS accessions corresponding to subunits of the giardial ESCRT machinery, adaptins, retromer, COPII, COPI, clathrin, and the ARF regulatory system proteins, were searched into the newly generated GFF files. To consolidate these preliminary hits, assign orthology to any unmapped genes, and cross-identify and validate assemblage-specific orthologs/paralogs, TBLASTN (v. 2.2.29+) searches, with an e-value threshold set to 0.01, were also performed [ 41 ]. Orthologs of the identified vesicle formation machinery from G. intestinalis ADH, WB, GS, and GS_B were used as queries for homology searching into the contigs and scaffolds of all newly assembled genomes. Results of this survey are summarized as tile-plots, but specific scaffold locations from the TBLASTN hits have been made available through S5 Table .
Of the currently available assemblage A and B genomes, those belonging to G. intestinalis assemblage A, isolate WB (UU_20 assembly, GCA_011634545.1) and assemblage B, isolate GS (GL50801 assembly, GCA_011634595.1) have been richly annotated and publicly available [ 5 , 8 ]. Therefore, UU_20 and GL50801 assemblies were used as references for gene prediction in the A and B assemblage genomes, respectively. Input and reference genomes were first indexed and aligned using Minimap2 v. 2.17 using the following parameters: -a -eqx–end-bonus 5 -N 50 -p 0.5 [ 36 ]. Genes were successfully aligned if the coverage along the coding sequence (CDS) met a threshold of 50% or higher in sequence similarity. They were used to generate output GFF files detailing coordinates, accessions of the mapped ORFs, and accompanying gene features. Unmapped gene accessions were parsed out in a separate file. Program usage parameters were specified as per Liftoff instructions to generate corresponding GFF files and the per-sample Liftoff mapped and unmapped summaries have been made available through S4 Table .
The nucleotide contigs from all isolates were used to predict gene features and annotations to screen for protein orthologs. To do so, Liftoff v.1.5.1 was used for annotation mapping from closely related reference genomes [ 35 ].
In addition to genome contiguity, to assess gene set completeness and report on the number of evolutionarily conserved near-universal single-copy orthologs, Benchmarking Universal Single-Copy Orthologs (BUSCO) v. 4.1.1 was used as a tool for a translated Hidden Markov Model-based searching against OrthoDB v.10 [ 31 , 32 ]. Run parameters for BUSCO consisted of the following: lineage dataset selection was eukaryote_odb10, analysis mode set to ’genome,’ and TBLASTN e-value cut-off was kept as the default 1x10 -3 . All other BUSCO parameters, including those for the in-built AUGUSTUS v. 3.2.3 gene prediction software, were kept to default. All BUSCO output scores were plotted and visualized using the ggplot2 library available through the R Tidyverse package [ 33 ]. Data visualization specifications were provided through an R script, which is available as S3 File [ 34 ].
The resulting nucleotide assemblies were subject to post-assembly quality assessments by computing genome contiguity and completeness statistics. Contiguity was assessed using the N50 metric with the Quality Assessment Tool for Genome Assemblies (QUAST) v. 5.0.2, which also determined other contig statistics such as total length in base pairs and the final percent GC for each assembly [ 30 ]. This detailed report is made available through S3 Table .
Kraken2 analyses flagged three datasets to be highly contaminated with bacterial sequences and three with mixed assemblage taxonomic assignment. These six samples were not analyzed further. The remaining 83 paired-end filtered datasets were kept for de novo genome assembly using the ploidy-aware assembler MaSuRCA v.3.3.0, using 200,000,000 hashes from Jellyfish v.3.2.4 to generate contigs and scaffolds [10-13Mb] in FASTA format. MaSuRCA configuration script has been made available as S2 File [ 29 ].
Prior to assembly, the overall quality of the paired-end inserts was assessed using FastQC [ 26 ]. All datasets were inspected for non-Giardia contaminating sequences. To do so, paired-end reads belonging to each biosample were subject to taxonomic classification using the metagenomic classification software, Kraken2 (v. 2.0.8-beta), and against a custom-built database for k-mer matching against query sequences [ 27 ]. The database was built by retrieving taxonomic information and full-length genomes corresponding to archaeal, bacterial, viral, human, fungal, protozoa, and vector contaminant (UniVec_Core) datasets from the NCBI Reference Sequence Database [RefSeq]. In addition, genomes belonging to Giardia muris (GCA_006247105.1 UU_GM_1.1), G. intestinalis EP15 (GCA_000182665.1), G. intestinalis BAH15c1 (GCA_001543975.1), G. intestinalis GS (GCA_011634595.1), G. intestinalis GS_B (GCA_000498735.1), G. intestinalis WB (GCA_011634545.1), G. intestinalis AS175 (GCA_001493575.1), G. intestinalis ADH (GCA_000498715.1), G. intestinalis assemblage C pooled cysts (GCA_902221545.1), and G. intestinalis assemblage D pooled cysts (GCA_902221535.1) were also added to the library using the kraken2-build option. Sequences that remained unclassified or were assigned to taxa other than the Giardia genus (NCBI taxid: 5741) were considered contaminants. Datasets were flagged if 20% or higher number of the total reads were not characterized as Giardia. These bio-samples were removed from analyses and not subject to downstream assembly process or comparative genomics. For the remainder datasets with low contamination levels, decontamination was performed using the Python script, extract_kraken_reads.py, available through KrakenTools (
https://github.com/jenniferlu717/KrakenTools ) [ 27 ]. Post-decontamination Kraken2 output was visualized using KronaTools [ 28 ]. Kraken-build shell script with all parameters has been made available as S1 File . Per-sample read taxonomic assignment is summarized in S2 Table .
Raw sequencing reads are available at National Centre for Biotechnology Information (NCBI) Short Read Archive under the BioProject ID PRJNA280606. We downloaded the sequencing data from the European Nucleotide Archive (The European Molecular Biology Laboratory, Heidelberg, Germany). Metadata detailing sequencing run statistics and biosample accessions for the 89 isolates is provided through the S1 Table as well as in the previous study [ 24 ].
De novo genome assembly analyses with the BCCDC PHL isolates of G. intestinalis assemblage A and B were performed using the previously sequenced and archived paired-end inserts that were generated using the Illumina MiSeq. Cysts belonging to each isolate had been previously obtained from surface water, beaver, and human fecal samples and archived by the BCCDC PHL [ 24 , 25 ]. Detailed geographical sources for isolate retrieval and assemblage classification were provided through Tables 1 and S1 in the previous study by Tsui et al [ 24 ].
Results
Genome contiguity and completeness analyses yield uniform results across all isolates Previously, Tsui and colleagues performed genome sequencing and assembly with 89 G. intestinalis assemblage A and B isolates archived at the BCCDC PHL to evaluate specific disease transmission dynamics underpinning the 1980s Giardiasis outbreak in British Columbia, with some of the isolates originating from New Zealand and Ontario to be used as reference [24]. Although the sequenced short paired-end reads from this analysis are publicly available, the assembled genomes are not. Therefore, it was necessary to perform de novo genome assembly prior to comparative genomic analyses with the vesicle formation machinery, as per the summarized workflow (Fig 1). To do so, read taxonomic classification was performed first to assess contamination and remove sequences that did not correspond to Giardia (Fig 1). Of the 89 initially retrieved datasets, six were removed from final genome analyses, either due to considerable levels of bacterial contamination or if the samples were characterized as ‘mixed’ assemblages. In the latter case, according to Tsui et al. [24], a mixed assemblage assignment was denoted if the sequencing reads mapped equally to both Giardia assemblage A and B. This does not imply existence of a genetically hybrid strain but is instead a by-product of gDNA contamination or its isolation from mixed cultures containing trophozoites belonging to both strains. Therefore, these data were treated as metagenomic. Because this investigation aimed to elucidate nuanced molecular-level differences between the two assemblages, samples with ‘mixed’ assemblage classification were also removed from analyses. This resulted in a final total of 83 datasets being used for downstream analyses, 42 of which were classified as assemblage A and 41 as assemblage B (S5 and S6 Tables). PPT PowerPoint slide
PNG larger image
TIFF original image Download: Fig 1. Workflow for de novo assembly and reference-based gene predictions. Steps included read-quality assessment and filtering using Kraken2 and FastQC, followed by de novo genome assembly using MaSuRCA. Assembled contigs and scaffolds were used to determine contiguity and completeness using metrics such as N50, the total number of contigs, and BUSCO scores. The resulting contigs were used for reference-based genome annotation using Liftoff for orthology mapping of genes and presence/absence analyses of the vesicle formation proteins. False negatives and existing orthology assignments were cross-validated using TBLASTN.
https://doi.org/10.1371/journal.pntd.0011837.g001 N50 scores and contig sizes for all assemblies were determined using QUAST. The results of this analysis calculated an average N50 of 70,543 bp and a median of 74,269 bp across all assemblies (S1 and S3 Tables). Genomes were also assembled in an average of 382 contigs that were ≥ 1kb in size (S1 and S3 Tables). In addition to genome contiguity, completeness was also assessed by determining the number of evolutionarily conserved single-copy orthologs of eukaryotic proteins by generating BUSCO scores where overall the percent BUSCO of complete plus fragmented genes was approximately 25 to 30% across all assemblies (Figs 1 and S1). While these are much lower than what is typically expected of a eukaryotic genome from animal, plant, or fungal lineages (i.e., 85–95%), low BUSCO scores in Giardia are a drawback of limited inclusion of diverse protist lineages within the eukaryote_odb10 database. In general, metamonad lineages are highly divergent in their sequences and ancestrally lack many of the proteins present in model eukaryotes, and therefore all suffer from poor BUSCO scores [42–44]. Instead of using these as absolute measures, BUSCO scores were evaluated against those published for other Giardia assemblies. The short-read reference genome of assemblage A, isolate WB (GCA_011634545.1) was previously determined to have a BUSCO score of C:23.9%(S:0%, D:23.9%), F:5.9%, M:70.2%, n = 255, while the short-read reference genome from assemblage B, isolate GS has a BUSCO completeness score of 23.1% (
https://www.uniprot.org/proteomes/UP000001548;
https://www.uniprot.org/proteomes/UP000002488) [4,6]. Therefore, our re-assembled Canadian Giardia genomes are comparable, if not slightly more complete, in their BUSCO scores to those of the previously published short-read reference genomes belonging to assemblage A isolate WB and assemblage B isolate GS.
Gene predictions and functional annotations suggests close similarity to reference genomes Past analyses have identified some differences in membrane-trafficking machinery complement between assemblages A and B [39–41]. However, these trends were based on a limited number of samples (three to five) per assemblage, and so a much larger sampling is needed to confirm the results. Moreover, the question of whether there are intra-assemblage differences within A or B vesicle coat machinery (Adaptins, COPI, and Retromer), COPII subcomplex, ESCRT protein complexes, and the ARF regulatory protein machinery has not been assessed at a population-level and as a result, survey of these proteins was performed with these 83 Canadian genomes to comprehend the trends previously observed in the repertoire of the endo-lysosomal vesicle formation machinery. To do so, we first used Liftoff to map gene features and annotations from closely related species of organisms by providing the genomes from assemblage A, isolate WB (UU_20 assembly; GCA_011634545.1) and assemblage B, isolate GS (GL50801 assembly; GCA_011634595.1) as references for mapping. Many of the annotations have been curated based on experimental evidence (i.e., molecular protein characterizations, transcriptomics, and proteomics) and are regularly updated by the Giardia research community. To supplement and cross-validate Liftoff annotations, TBLASTN analyses were also performed for genes of interest to eliminate false-negative absence artifacts due to improper or missed gene mapping. As per recent estimations, the newly updated genome of G. intestinalis WB (UU_20 assembly) is predicted to encode 4,963 protein-coding genes and 85 non-coding RNA (ncRNAs) genes, yielding a total of 5,048 genes [8]. From this total, Liftoff mapped an average of 4,688 genes from the BC assemblage A genomes, with approximately 72 remaining unmapped (S4 Table). Similarly, albeit slightly lower, the total number of protein-coding and ncRNAs in the assemblage B reference GS genome (GL50801 assembly) is estimated at 4,470 and 92, respectively, resulting in a total of 4,562 predicted genes. An average of 4,481 genes were mapped across the BC assemblage B genomes, with 62 remaining unmapped (S4 Table). Unmapped genes may have resulted from sequence divergence or fragmented orthologs that did not share enough similarity with the target for a Minimap2 cut-off of 50%. The other possibility is that a subset of these unmapped genes are protein repertoires exclusive to the reference WB and GS genomes. Nonetheless, these findings suggest that for both assemblages A and B, approximately 98.2% of the genes and annotations from WB and GS genomes were successfully mapped, and therefore, encoded in the genomes of the newly assembled isolates.
ESCRT repertoire varies between assemblage A and B isolates but is consistent within assemblages Previously, comparative genomic and phylogenetic analyses with the late-endosomal ESCRT subcomplexes were performed by us to trace their evolution across fornicates, including several human-infecting isolates of Giardia [39]. Here patterns of universal streamlining within ESCRTs were observed across the entire Giardia genus, such as the complete loss of the ESCRTI subcomplex [39]. However, aside from absences, Giardia-specific duplication events also occurred to yield several paralogs in the following subunits: ESCRTII-Vps36 (i.e., Vps36A, B, and C), ESCRTIII-Vps24 (i.e., Vps24A and Vps24B), ESCRTIIIA-Vps4 (i.e., Vps4A, B, and C), ESCRTIIIA-Vps46 (i.e., Vps46A and B), and ESCRTIIIA-Vps31 (i.e., Vps31A, B, and C) [39]. Comparisons between various pan-global isolates revealed distinct differences within the repertoire of the individual subunits and the number of paralogs that comprise the giardial ESCRTII, ESCRTIII, and ESCRTIIIA. To better elucidate these inter-assemblage differences within ESCRTs and to assess if these trends exist at a population level, the repertoire of the giardial ESCRT machinery in the newly assembled BC Giardia genomes was reconstructed. This was done by surveying the Liftoff annotations combined with TBLASTN searches into the nucleotide contigs and scaffolds. Overall, our findings confirm and extend past results of ESCRT retention and loss within and between assemblages. All 42 assemblage A isolates are conserved in their Giardia repertoire of the ESCRT complexes without any variabilities at the individual isolate level (Fig 4A). Contrary to this, previously noted streamlining exists in several ESCRT machinery components in assemblage B isolates (Fig 4B). The most prominent of these was the loss of Vps20L. The survey of 41 new assemblage B isolates confirms that this crucial ESCRTIII component is universally absent across all sampled Giardia assemblage B genomes (Fig 4B). TBLASTN searches using Vps20L orthologs identified in pan-global assemblage A isolates (i.e., DH, WB, and AS175) were used as queries to rule out false-negative artifacts. Despite this approach, no protein hits resembling Vps20L orthology were identified. Similar trends were observed within one of the three Vps4 paralogs (i.e., Vps4C), which also remained unidentified in the pan-global isolates of assemblage B. Using both Liftoff and TBLASTN, only Vps4A and Vps4B were retrieved across all 41 isolates (Fig 4B). Therefore, the absence of Vps20L and Vps4 points towards a global loss of these components in this assemblage. PPT PowerPoint slide
PNG larger image
TIFF original image Download: Fig 4. Tile-plot depictions of the ESCRT repertoire in the BCCDC PHL isolates. [A] depicts the giardial ESCRT repertoire distribution in the newly assembled BCCDC PHL isolates compared to the two pan-global reference assemblage A isolates, WB [AI] and DH [AII]. No absences were identified within any assemblage A isolates. [B] shows the distribution of the giardial ESCRT repertoire in the newly assembled BCCDC PHL assemblage B isolates compared to the two pan-global reference assemblage B isolates, GS and GS_B. Assemblage-specific losses were identified within the ESCRTIII-Vps20L and ESCRTIII-Vps4C, and are consistent with the findings in the pan-global isolates, indicated in grey.
https://doi.org/10.1371/journal.pntd.0011837.g004 Altogether, these findings validate previous observations of molecular differences within the endo-lysosomal ESCRT subunits between the two assemblages and allow for an extension of those conclusions at a greater population level.
Vesicle coat complexes follow a pattern of inter-assemblage molecular differences As with the ESCRTs, the evolution and the molecular complement of the heterotetrameric complexes (adaptins and COPI) has been addressed [38]. To consolidate those findings as well as to identify any other possible isolate-level differences, the Giardia complement of heterotetrameric adaptor complexes (HTACs) from 83 BCCDC assemblage A and B genomes were determined. The giardial repertoire of adaptins, COPI, retromer, and clathrin heavy chain were conserved from an inter-assemblage standpoint (Fig 5). However, isolate-level variabilities were present within the adaptins. PPT PowerPoint slide
PNG larger image
TIFF original image Download: Fig 5. Tile-plot depictions of COPII and retromer repertoire in the BCCDC PHL isolates. [A] depicts the distribution of the giardial COPII and retromer components in the newly assembled genomes of the BCCDC PHL isolates, compared to the two pan-global reference assemblage A isolates, WB [AI] and DH [AII]. No absences were identified within any of the assemblage A genomes. [B] depicts the distribution of the giardial COPII and retromer repertoire in the newly assembled BCCDC PHL assemblage B genomes and compared with the two pan-global reference assemblage B isolates, GS and GS_B. Assemblage B-specific losses are evident within one of the paralogs of COPII-Sec24FII [Sec24C].
https://doi.org/10.1371/journal.pntd.0011837.g005 Unexpectedly, two instances of gene absences within components of the AP-1 subcomplex in genomes of assemblage B were noted. First was the lack of AP-1μ in the SRR1957167 assembly and the second absence was within the AP-1σ subunit in the VANC/96/UBC/129 assembly (Fig 6B). TBLASTN searching using AP-1μ and AP-1σ orthologs from other assemblage B genomes could not identify μ1 or σ1 subunits in these genomes other than those belonging to the AP-2 sub-complex. The absence of AP-1μ and AP-1σ, although surprising, is highly unlikely and should be ascertained as a technical assembly or sequencing-related issue. AP-1μ is absent from the SRR1957167 assembly belonging to assemblage B, which is an outlier as it has a comparatively smaller genome size (10.9 Mbp) and lower N50 (44, 795 bp) with its counterparts. Both attributes are likely a result of low starting sequencing depth compared to the other isolates and hence a cause for gene absence artifacts. In the case of AP-1σ, which is missing from the VANC/96/UBC/129 assembly, may have been due to low coverage k-mers in this region and therefore discarded during the assembly process. These are more likely scenarios than instances of actual loss, as the remaining 39 isolates from assemblage B encode both adaptin subunits. PPT PowerPoint slide
PNG larger image
TIFF original image Download: Fig 6. Tile-plot depictions of heterotetrameric adaptor complexes and clathrin repertoire in the BCCDC PHL isolates. [A] depicts the distribution of the previously identified HTACs and clathrin components in the newly assembled BCCDC PHL isolates and compared with the two pan-global reference assemblage A isolates, WB [AI] and DH [AII]. No absences were identified in any assemblage A genomes. [B] depicts the distribution of adaptin, COPI, and clathrin components in the newly assembled BCCDC PHL assemblage B genomes in comparison to the two pan-global reference assemblage B isolates, GS and GS_B. Although no large assemblage-wide losses are evident, instances of individual isolate-specific absences within AP-1 subunits [i.e., AP-1μ and AP-1σ] were present.
https://doi.org/10.1371/journal.pntd.0011837.g006 The molecular complement of COPII, retromer, and clathrin have not been examined at the same depth and as recently as the above systems and therefore were examined here. For nearly all components we found the identical repertoire across Retromer, Clathrin, and most of COPII. Unexpectedly, we find multiple paralogs of the COPII component Sec24, which we have termed Sec24A, B, and C across all 41 assemblage A genomes (Fig 5). However, COPII-Sec24C was not found in any of the assemblage B genomes, despite searching by TBLASTN analyses using assemblage A Sec24C sequences (Fig 5B).
[END]
---
[1] Url:
https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0011837
Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/