(C) PLOS One
This story was originally published by PLOS One and is unaltered.
The NCBI Comparative Genome Viewer (CGV) is an interactive visualization tool for the analysis of whole-genome eukaryotic alignments [1]
Authors: Sanjida H. Rangwala (National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America), Dmitry V. Rudnev, Victor V. Ananiev
Date: 2024-05
Overview of CGV
We developed a web application, the CGV (https://ncbi.nlm.nih.gov/genome/cgv/), to aid in comparing genome structures between 2 eukaryotic assemblies. CGV facilitates analyses of genome variation and evolution between different strains or species, as well as evaluation of assembly quality between older and newer assemblies from the same species.
Alignments are generated at NCBI using BLAST [22] or LASTZ-based algorithms [21] or imported from the UCSC Genomics Institute (https://hgdownload.soe.ucsc.edu/downloads.html) and other research groups (e.g., T2T/HPRC, https://humanpangenome.org/). Shorter alignments are merged where possible; however, because of repeats and gaps, even very similar genomic regions may be broken down into multiple alignment segments. More closely related genomes will provide more contiguous alignments, while more distant species may align only to short highly conserved regions. In addition, while we are often able to provide alignments for polyploid genomes, it is more difficult to distinguish orthologs (identity by descent) from homeologues (identity by duplication) for species pairs with more recent whole-genome duplications. CGV is therefore more suited to analyzing alignments between more distinct genomes, e.g., allopolyploids or older autopolyploids (S1 Appendix). Refer to Materials and methods for more details on how we generate whole-genome assembly alignments and load them into the viewer.
The CGV home page provides a menu where users can select from available species and assembly combinations (Fig 1A). We add new whole-genome alignments as high-profile assemblies become available and in response to requests from the scientific community. As of February 2024, we provided a selection of about 800 alignments from over 350 eukaryotic species (Fig 1B). Whole-genome sequence alignments between more distantly related species may be sparse or low-quality with limited analytical utility; therefore, most of the alignments we offer are between assemblies of the same species or more closely related species within the same class or order.
Fig 1. Overview of CGV. (A) CGV selection menu. (B) Taxonomic distribution of species represented by alignments in CGV. Numbers in the pie charts are current as of February 5, 2024. (C) CGV ideogram view of whole-genome assembly alignment. Buttons in the lower right provide download access for complete whole-genome alignment data or an SVG image of the current alignment view. The view can be recreated at https://www.ncbi.nlm.nih.gov/genome/cgv/browse/GCF_015227675.2/GCF_000001635.27/27835/10116. (D) CGV zoomed to a chromosome-by-chromosome view, with an information panel shown. This panel can be viewed by clicking to select an alignment segment. (i) Flip orientation; (ii) zoom in/out; (iii) pan left/right; (iv) view assembly in GDV; (v) information panel. The view can be recreated at https://www.ncbi.nlm.nih.gov/genome/cgv/browse/GCF_015227675.2/GCF_000001635.27/27835/10116#NC_051343.1:44657546-50471623/NC_000075.7:41365809-52285446/size=10000. (E) CGV search interface and sample search results. (F) Adjust Your View configure options for the ideogram view. Webpage source: National Library of Medicine. CGV, Comparative Genome Viewer; GDV, Genome Data Viewer.
https://doi.org/10.1371/journal.pbio.3002405.g001
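The shareable view URLs quoted in the Fig 1 caption appear to follow a regular pattern: two assembly accessions and two numeric identifiers in the path, then the displayed regions and options after the "#". The Python sketch below splits such a URL into those visible parts. It is only an illustration based on the two example URLs; the function name and the field names (assembly_1, numeric_ids, and so on) are hypothetical, the meaning of the numeric identifiers is not stated in the text, and the URL layout is not a documented interface.

```python
from urllib.parse import urlsplit


def parse_cgv_view_url(url):
    """Split a CGV 'browse' URL into the parts visible in the Fig 1 examples.

    Assumption: the layout is inferred from the example URLs only.
    """
    parts = urlsplit(url)
    path_items = [p for p in parts.path.split("/") if p]
    # Items after "browse" are taken as: assembly 1, assembly 2, then numeric ids.
    browse_index = path_items.index("browse")
    asm1, asm2, *ids = path_items[browse_index + 1:]
    view = {
        "assembly_1": asm1,
        "assembly_2": asm2,
        "numeric_ids": ids,   # meaning not stated in the text
        "regions": [],        # (sequence accession, start, end)
        "options": {},        # e.g. {"size": "10000"}
    }
    for item in parts.fragment.split("/"):
        if not item:
            continue
        if "=" in item:
            key, value = item.split("=", 1)
            view["options"][key] = value
        else:
            seq, span = item.rsplit(":", 1)
            start, end = span.split("-")
            view["regions"].append((seq, int(start), int(end)))
    return view


EXAMPLE_URL = (
    "https://www.ncbi.nlm.nih.gov/genome/cgv/browse/"
    "GCF_015227675.2/GCF_000001635.27/27835/10116"
    "#NC_051343.1:44657546-50471623/NC_000075.7:41365809-52285446/size=10000"
)
example_view = parse_cgv_view_url(EXAMPLE_URL)
```

Here example_view["regions"] holds one coordinate range per assembly, which is enough to log or compare saved views without opening the browser.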
CGV’s main view (the “ideogram view”) displays pairwise alignments as colored connectors linking the chromosomes in the 2 assemblies (Fig 1C). The view is filtered by default to show only reciprocal best hits between assemblies in order to facilitate the analysis of orthologous genomic regions. Researchers can choose to show the non-best placed alignments to reveal additional closely related sequence duplications or ancestral homologues (Fig 1F and S1 Appendix). Users of CGV can also filter alignments in view by size (e.g., to only show large alignment blocks) or by orientation (e.g., to only show regions that have undergone a potential inversion). The complete whole-genome alignment data in GFF3 and human-readable formats like XLSX can be downloaded from the viewer for a researcher’s own use.
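The size and orientation filters described above can also be reproduced offline on the downloaded GFF3 data. The following Python sketch is a minimal illustration, not the viewer's own code: it assumes the standard nine-column GFF3 layout, assumes that column 7 (strand) records the relative orientation of the alignment, and assumes the aligned partner is given in a Target attribute. The sample records, sequence names, and helper names are invented for the example, and the actual CGV download may organize its attributes differently.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    ident: str
    seqid: str
    start: int
    end: int
    strand: str
    target: str

    @property
    def length(self):
        # GFF3 coordinates are 1-based and inclusive.
        return self.end - self.start + 1


def read_gff3_segments(lines):
    """Parse alignment records from GFF3 lines, skipping comments and blanks."""
    segments = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        segments.append(
            Segment(
                ident=attrs.get("ID", ""),
                seqid=cols[0],
                start=int(cols[3]),
                end=int(cols[4]),
                strand=cols[6],
                target=attrs.get("Target", ""),
            )
        )
    return segments


def filter_segments(segments, min_length=0, strand=None):
    """Keep segments of at least min_length and, optionally, one orientation."""
    return [
        s for s in segments
        if s.length >= min_length and (strand is None or s.strand == strand)
    ]


# Invented sample data for illustration only.
SAMPLE = [
    "##gff-version 3",
    "chrA\texample\tmatch\t1000\t250999\t.\t+\t.\tID=aln1;Target=chrB 5000 254999 +",
    "chrA\texample\tmatch\t300000\t304999\t.\t-\t.\tID=aln2;Target=chrB 400000 404999 +",
    "chrA\texample\tmatch\t500000\t899999\t.\t-\t.\tID=aln3;Target=chrC 1 400000 +",
]

segments = read_gff3_segments(SAMPLE)
large_blocks = filter_segments(segments, min_length=100_000)
inverted = filter_segments(segments, strand="-")
large_inverted = filter_segments(segments, min_length=100_000, strand="-")
```

Combining both criteria, as in large_inverted, mirrors applying the size and orientation filters together in the viewer to isolate large candidate inversions.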
Users can click to select a chromosome from each assembly to zoom to the alignments for the selected chromosome. They can navigate further within this chromosome comparison using the zoom in/out and pan buttons or by pinch-zoom or drag to pan. Users can zoom directly to a particular region of a chromosome by dragging their cursor over the coordinate ruler or the ideogram for either assembly. Double-clicking on a selected alignment segment will synchronously zoom both the top and bottom assembly on the aligned coordinates so that they are stacked on top of one another (Fig 1D).
Where available, RefSeq or assembly-submitter provided gene annotation is displayed on the chromosomes (Fig 1D). Similarities in gene order denote regions of synteny, while discrepancies can point to evolutionarily or biologically significant differences. Differences may also result from assembly errors, particularly if evaluating different assemblies from the same species or strain. Researchers can use the search feature in CGV to find their gene of interest by name or keyword, and subsequently navigate to the location of the gene in the viewer (Fig 1E). If the gene region is aligned, the viewer will simultaneously navigate to the aligned location, which may contain the gene’s known or putative ortholog on the second assembly. The “flip” button allows the user to reverse one chromosome to see inverted alignments displayed in the same relative orientation, which may aid in the detection of discrepancies in gene annotation in regions that are locally syntenic between the 2 assemblies. Once a user has completed their analysis of a region of interest, they can export the image as an SVG to adapt for use in publications and presentations.
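To make the idea of navigating to "the aligned location" concrete, the Python sketch below projects a gene interval on the first assembly onto the second assembly through a single alignment segment. It is a simplified illustration under stated assumptions, not a description of how CGV computes the location: the segment is treated as ungapped (so offsets carry over one-to-one), coordinates are 1-based and inclusive, and the dictionary keys and function name are hypothetical.

```python
def project_interval(gene_start, gene_end, seg):
    """Project [gene_start, gene_end] on assembly A onto assembly B.

    seg is a dict with keys a_start, a_end, b_start, b_end (inclusive
    coordinates) and same_orientation (bool). Returns None when the gene
    does not overlap the segment. Gaps inside the segment are ignored.
    """
    # Clip the gene to the part covered by the alignment segment.
    lo = max(gene_start, seg["a_start"])
    hi = min(gene_end, seg["a_end"])
    if lo > hi:
        return None
    off_lo = lo - seg["a_start"]
    off_hi = hi - seg["a_start"]
    if seg["same_orientation"]:
        return seg["b_start"] + off_lo, seg["b_start"] + off_hi
    # Reverse orientation: offsets count back from the end on assembly B.
    return seg["b_end"] - off_hi, seg["b_end"] - off_lo


forward_seg = {"a_start": 1000, "a_end": 2000, "b_start": 5000, "b_end": 6000,
               "same_orientation": True}
reverse_seg = dict(forward_seg, same_orientation=False)

same_strand_hit = project_interval(1200, 1300, forward_seg)
inverted_hit = project_interval(1200, 1300, reverse_seg)
no_hit = project_interval(2500, 2600, forward_seg)
```

With real alignments, gaps shift the offsets, so a projection like this only gives an approximate window in which to look for the putative ortholog.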
Users can click on an alignment segment to show an information panel (Fig 1D). This panel reports the chromosome scaffold accession and sequence coordinates of the alignment on each assembly, as well as the percent identity, number of gaps and mismatches, and alignment length. While the ideogram view in CGV does not display specific nucleotide bases, users can open another panel from the right-click menu that shows the alignment sequence. They can also download the alignment FASTA file of a particular alignment segment for downstream analysis, such as BLAST search or primer design. Researchers can also navigate from CGV to NCBI’s genome browser, the GDV [2]. GDV can display the assembly-alignment data viewed in CGV as a linear track alongside additional data mapped onto a genome assembly, such as detailed transcript and CDS annotation, repeats, GC content, variation data, or user-provided annotations. Zooming to a location within GDV can reveal differences in nucleotide sequence or gene exon or CDS annotation between the 2 assemblies.
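The statistics shown in the information panel can be recomputed from a downloaded pairwise alignment. The Python sketch below is one possible way to do so, under assumptions that the text does not confirm: percent identity is taken as matching columns divided by total alignment columns, and "gaps" is counted as the number of gapped columns rather than the number of gap openings. The function name and the example sequences are invented for illustration.

```python
def alignment_stats(aligned_a, aligned_b, gap="-"):
    """Count matches, mismatches, and gapped columns in a pairwise alignment.

    Both inputs must be the same length, with gaps written as `gap`.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gap_columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            gap_columns += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    length = len(aligned_a)
    # Assumed definition: identity = matching columns / alignment length.
    identity = 100.0 * matches / length if length else 0.0
    return {
        "length": length,
        "matches": matches,
        "mismatches": mismatches,
        "gap_columns": gap_columns,
        "percent_identity": identity,
    }


# Invented example: 9 columns with one mismatch and one gapped column.
stats = alignment_stats("ACGTAC-GT", "ACCTACGGT")
```

Other tools define identity over ungapped columns only, so values computed this way may differ slightly from those reported elsewhere.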
In addition to the main ideogram-based view, the Comparative Genome Viewer also provides a 2D dotplot view of the pairwise genome alignment (Fig 2A). The dotplot shows aligned sequence locations in one assembly on the X-axis plotted against aligned locations on the second assembly on the Y-axis. Alignments in the reverse orientation are plotted with an opposite slope and in a different color (purple) than alignments in the same orientation (green), making it easier to identify inversions and inverted translocations. The CGV dotplot shows both reciprocal best-placed and non-best placed alignments. As a result, compared to the ideogram view, this plot may more easily expose differences in copy number between 2 assemblies, such as segmental duplications or differences in genome or chromosome ploidy. Users can select and zoom to a view showing the comparison between a pair of chromosomes in the whole-genome plot (i.e., a “cell” in the plot) (Fig 2B). Once a researcher has discovered a chromosome pair of interest in the dotplot, they can navigate back to the ideogram view to conduct even more detailed analysis, including examining gene annotation and investigating short alignment segments that were beyond the resolution of the dotplot.
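The geometry of the dotplot can be sketched in a few lines of Python. Each alignment becomes a line segment whose X coordinates come from one assembly and whose Y coordinates come from the other; reverse-orientation alignments are drawn from the high Y coordinate to the low one, which gives them the opposite slope. The colors follow the description above (green for same orientation, purple for reverse); the function names and coordinates are hypothetical, and this is an illustration of the plotting idea rather than the viewer's implementation.

```python
def dotplot_segment(x_start, x_end, y_start, y_end, same_orientation):
    """Return ((x0, y0), (x1, y1), color) for one alignment in a dotplot."""
    if same_orientation:
        # Both coordinates increase together: positive slope.
        return (x_start, y_start), (x_end, y_end), "green"
    # Reverse orientation: Y decreases as X increases, so the slope is negative.
    return (x_start, y_end), (x_end, y_start), "purple"


def slope(p0, p1):
    """Slope of the line through two points (assumes distinct X values)."""
    return (p1[1] - p0[1]) / (p1[0] - p0[0])


# Invented coordinates for illustration only.
fwd_p0, fwd_p1, fwd_color = dotplot_segment(1000, 2000, 5000, 6000, True)
rev_p0, rev_p1, rev_color = dotplot_segment(3000, 4000, 7000, 8000, False)
```

The point pairs can be passed to any plotting library; an inversion then shows up as a purple segment with negative slope interrupting a green diagonal.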
---
[1] URL: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002405
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/