(C) PLOS One
This story was originally published by PLOS One and is unaltered.
. . . . . . . . . .
Trackplot: A flexible toolkit for combinatorial analysis of genomic data [1]
Yiming Zhang; affiliations: Department of Neurosurgery, State Key Laboratory of Biotherapy, Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan, Institute of Thoracic Oncology, Department of Thoracic Surgery
Date: 2023-11
Here, we introduce Trackplot, a Python package for generating publication-quality visualizations through a programmable and interactive web-based approach. Compared with existing programs for generating sashimi plots, Trackplot offers a versatile platform for visually interpreting genomic data from a wide variety of sources without any preprocessing, including gene annotation with functional domain mapping, isoform expression, isoform structures identified by scRNA-seq and long-read sequencing, and chromatin accessibility and architecture. It also offers broad flexibility in output file formats to satisfy the requirements of major journals. Trackplot is open-source software, freely available on Bioconda (https://anaconda.org/bioconda/trackplot), Docker (https://hub.docker.com/r/ygidtu/trackplot), PyPI (https://pypi.org/project/trackplot/), and GitHub (https://github.com/ygidtu/trackplot); a built-in web server for local deployment is also provided.
Simultaneously visualizing how isoform expression, protein-DNA/RNA interactions, and chromatin accessibility and architecture differ across conditions and cell types could inform our understanding of the regulatory mechanisms and functional consequences of alternative splicing. However, existing tools for generating sashimi plots remain inflexible, complicated, and user-unfriendly when integrating data from multiple bioinformatic formats or genomics assays. A more scalable visualization tool is therefore needed to broaden the scope of sashimi plots. To overcome these limitations, we present Trackplot, a comprehensive tool that delivers high-quality plots via a programmable and interactive web-based platform. Trackplot seamlessly integrates diverse data sources and uses multithreaded processing, enabling users to explore genomic signals in large-scale sequencing datasets.
Funding: This work is supported by the National Natural Science Foundation of China (82303975 to R.Z. and 82273117 to Y.W.), the National Key Research and Development Program of China, Stem Cell and Translational Research (2022YFA1105200 to Y.W.), the China Postdoctoral Science Foundation (2022TQ0226 to R.Z.), and Post-Doctor Research Project, West China Hospital, Sichuan University (2023HXBH100 to R.Z.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
To generate plots, Trackplot requires the genomic coordinates of the region of interest and a meta file specifying, for each data track, the file path, data category, display label, color, and strandedness. Additional annotations, such as cell metadata for demultiplexing or highlighted regions marking polyadenylation sites, can be supplied as parameters. When the domain model is activated, Trackplot automatically queries the UniProt [9] REST API (https://rest.uniprot.org/uniprotkb/search?&query="ID ") with the transcript ID to retrieve the corresponding protein IDs, of which there may be several. Each protein ID associated with the transcript is then used to query the Ensembl [10] API (https://www.ebi.ac.uk/proteins/api/features/"uniprot_ID "). The tool checks whether the length of the coding sequence (CDS) equals three times the protein length; if so, it collects the domain information and visualizes it subject to user-defined filters. All configuration information is then stored in a Plot object for further processing. Once configuration is complete, Trackplot uses packages such as pysam, pyBigWig, or hicmatrix to parse the input track files. It extracts the read abundance, splice junctions, gene annotation, and protein domains of the target region and stores them in a pandas DataFrame, which can be used for further analysis and processing. Finally, Trackplot generates plots with matplotlib, which allows the size and resolution (dots per inch, DPI) of the figure to be adjusted, and supports multiple output formats, including PNG, PDF, and TIFF, ensuring compatibility with the requirements of major scientific journals. In addition to supporting features already present in existing software, such as sample aggregation, reads per kilobase of transcript per million mapped reads (RPKM) / reads per million (RPM) normalization, and intron shrinkage, Trackplot outperforms the existing sashimi tool in speed and efficiency (S1 Fig). In summary, Trackplot provides a highly accessible, reproducible, and flexible tool for plotting genomic data.
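The CDS/protein length check described above can be sketched as follows. This is a minimal illustration, not Trackplot's code: it assumes the CDS is given as half-open (start, end) intervals and that protein lengths have already been fetched from UniProt. The `ProteinCandidate` container and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProteinCandidate:
    """Hypothetical container for one UniProt entry linked to a transcript."""
    uniprot_id: str
    length: int  # number of amino acid residues


def cds_length(cds_blocks):
    """Total CDS length in nucleotides from half-open (start, end) blocks."""
    return sum(end - start for start, end in cds_blocks)


def matching_proteins(cds_blocks, candidates):
    """Keep candidates whose length times three equals the CDS length.

    This follows the check as described (CDS length == 3 x protein length).
    Whether the annotated CDS includes the stop codon is an assumption the
    caller must settle before relying on an exact match.
    """
    n_nt = cds_length(cds_blocks)
    return [c for c in candidates if 3 * c.length == n_nt]
```

For CDS blocks (100, 250) and (400, 550), the CDS spans 300 nt, so only a 100-residue candidate passes the filter.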
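The RPM and RPKM normalizations mentioned above follow standard definitions: RPM = count × 10^6 / N and RPKM = count × 10^9 / (L × N), where N is the number of mapped reads and L is the feature length in base pairs. The sketch below is illustrative only. The text does not state which read total or feature length Trackplot uses as denominators, so both are caller-supplied assumptions here.

```python
def rpm(count, total_mapped_reads):
    """Reads per million: scale a raw count by library size."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return count * 1e6 / total_mapped_reads


def rpkm(count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0:
        raise ValueError("feature_length_bp must be positive")
    # Dividing RPM by the length in kilobases gives count * 1e9 / (L * N).
    return rpm(count, total_mapped_reads) / (feature_length_bp / 1e3)
```

For example, 500 reads on a 2,000 bp feature in a library of 10 million mapped reads give an RPM of 50 and an RPKM of 25.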
Trackplot is a platform built with Python and JavaScript that visualizes genomic data from diverse sources and generates publication-ready plots. It is easy to access and highly reproducible. Users can download Trackplot freely from GitHub and install it from source code, PyPI, Pipenv, Bioconda, AppImage, or a Docker image. It provides multiple ways to generate plots: an application programming interface (API) for scripts and Jupyter Notebooks, a command-line interface (CLI), and a user-friendly web interface. Trackplot supports most standard bioinformatics data formats, including BAM, BED, bigWig, bigBed, GTF, BedGraph, HiCExplorer’s native h5 format, and depth files generated by samtools [8] (Fig 1).
Uncovering differential isoform expression is crucial for understanding proteome diversity and transcript functionality [1]. Various library protocols and sequencing methods, such as single-cell RNA sequencing (scRNA-seq) [2] and long-read sequencing [3], have been developed and are widely used to explore the heterogeneity of isoform expression in single cells. Despite the availability of advanced tools for analyzing and visualizing genomic data, several challenges persist. Existing tools such as sashimi [4], ggsashimi [5], and SplicePlot [6] have limited efficiency and flexibility when handling the ever-growing volume and size of datasets. Moreover, these tools often provide only a command-line interface, which can be daunting for inexperienced programmers. Conventional interactive genome browsers such as the Integrative Genomics Viewer (IGV) [7] also lack flexibility in output format. To address these limitations, we introduce Trackplot, a comprehensive tool that generates high-quality plots in a programmable and interactive web-based format. Trackplot offers integrated visualization of diverse data sources, including gene annotation with functional domain mapping, isoform expression, isoform structures identified by scRNA-seq and long-read sequencing, and chromatin accessibility and architecture.
Results
Like previous sashimi plot packages, Trackplot takes all spliced reads, including those spanning novel junctions, from BAM files, together with gene model annotations from GTF or BED files, to visualize the differential usage of exons or transcripts. An example plot generated by Trackplot for eight bulk RNA-seq samples from the TNP GBM model [11] is shown in S1A Fig and suggests gradual exclusion of the middle exon during tumorigenesis. The plot shows that the long isoform, which encodes a protein with key functional domains, is gradually lost, while the short isoform lacking these domains becomes the major isoform (S1A Fig). Trackplot also accepts input in various bioinformatics formats, which makes it flexible in integrating data from multiple sources. By integrating RNA-binding signal (bigWig) with coverage data (BAM), Trackplot illustrates the enrichment of PTBP1 at exon 2 of PTBP3, suggesting that PTBP1 likely regulates the alternative splicing of PTBP3 exon 2 directly, consistent with previous findings [12] (S2B Fig).
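Splice junctions in a sashimi plot come from the `N` (skipped region) operations in each read's CIGAR string. Trackplot reads BAM files through pysam. The sketch below avoids that dependency and works on plain (POS, CIGAR) pairs so it runs standalone. It is a generic illustration of junction counting under the SAM convention, not Trackplot's own implementation, and the function names are hypothetical.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# CIGAR operations that advance along the reference (SAM specification).
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(pos, cigar):
    """Return (intron_start, intron_end) pairs for each N operation.

    pos is the 1-based leftmost mapping position (SAM POS); the returned
    intron coordinates are 1-based and inclusive.
    """
    junctions = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


def count_junctions(alignments):
    """Count reads supporting each junction across (pos, cigar) pairs."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(junctions_from_cigar(pos, cigar))
    return counts
```

For a read at POS 100 with CIGAR `50M200N50M`, the skipped intron spans positions 150 to 349. Soft clips (`S`) and insertions (`I`) do not shift the junction because they do not consume the reference.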
The advent of long-read sequencing platforms, such as Pacific Biosciences and Oxford Nanopore Technologies, has revolutionized transcriptome analysis by capturing full-length transcript structures without the need for assembly. However, existing sashimi plot tools are designed primarily for short-read data and visualize reads by aggregating the depth at each coordinate, thereby losing the exon connectivity within individual reads. Trackplot addresses this limitation with a read-by-read visualization style and exon-sort options, which clearly present the exon-intron structure of each isoform and provide a more comprehensive view of the transcriptome (S3A Fig). Trackplot can also extract and visualize additional information from BAM file tags, such as poly(A) tail length or the modification status of each nucleobase (S3B Fig). Together, these features offer deeper insight into the complexity and diversity of transcriptomes.
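A read-by-read view needs each read's exon chain, the aligned blocks separated by introns, and an ordering that stacks reads with the same structure together. The sketch below shows one plausible way to do this. The text does not describe Trackplot's actual exon-sort criteria, so sorting by the full block tuple, the (name, pos, cigar) read format, and the function names are all assumptions.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def exon_blocks(pos, cigar):
    """Split an alignment into aligned blocks separated by N (intron) gaps.

    Returns 1-based inclusive (start, end) blocks. Deletions stay inside a
    block; insertions and clips do not move along the reference.
    """
    blocks = []
    start = ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=XD":
            ref += length
        elif op == "N":
            if ref > start:
                blocks.append((start, ref - 1))
            ref += length
            start = ref
    if ref > start:
        blocks.append((start, ref - 1))
    return blocks


def sort_reads_by_exons(reads):
    """Order (name, pos, cigar) reads so identical exon chains are adjacent.

    Reads are sorted by their tuple of exon blocks, with the read name
    breaking ties so the order is deterministic.
    """
    return sorted(reads, key=lambda r: (tuple(exon_blocks(r[1], r[2])), r[0]))
```

Because identical keys end up next to each other after sorting, reads from the same isoform form contiguous rows when drawn, which is what makes each exon-intron structure visually distinct.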
Several methods, including SCAPE [13], have recently been proposed to identify and quantify alternative polyadenylation (APA) events at the single-cell level. Existing tools, however, cannot accurately demultiplex gene expression into distinct cell populations, so users often have to split and deduplicate BAM files manually before analysis. Trackplot automates this step by demultiplexing and deduplicating reads based on a user-provided meta file that lists cell barcodes and their corresponding cell types. This yields a clearer and more accurate representation of differential APA events in 3’-enriched scRNA-seq data (S4A Fig). Trackplot also supports single-cell data that simultaneously profile the transcriptome and chromatin accessibility. In an example analysis, Trackplot shows a differential chromatin accessibility pattern at U2AF1L4 between CD4 naïve T cells and CD16 monocytes, which correlates with distinct usage of the alternative polyadenylation sites pA1 and pA2 in these two cell populations (S4B Fig). These findings highlight the utility of Trackplot for exploring the relationship between transcriptional enhancers and 3’ end processing. In summary, Trackplot provides a comprehensive platform for investigating isoform diversity within cell populations and for exploring potential enhancer elements involved in regulating gene and isoform expression. Its automated demultiplexing and its integration of transcriptomic and chromatin accessibility data make it a valuable tool for unraveling the complex regulatory mechanisms underlying gene expression.
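Demultiplexing and deduplication driven by a barcode-to-cell-type meta file can be sketched as below. This is a minimal illustration under stated assumptions, not Trackplot's implementation. It assumes each read has already been reduced to a (cell barcode, UMI, position) triple, for example from the CB/UB tags that 10x Genomics pipelines write and the alignment start. It also treats reads sharing all three values as PCR duplicates. The function name is hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_celltype):
    """Split reads by cell type and drop PCR duplicates.

    reads: iterable of (cell_barcode, umi, position) tuples.
    barcode_to_celltype: mapping built from the user's meta file.
    A read is a duplicate if its (barcode, UMI, position) triple was already
    seen; reads whose barcode is absent from the meta file are discarded.
    Returns a dict mapping each cell type to its retained read positions.
    """
    seen = set()
    groups = defaultdict(list)
    for barcode, umi, position in reads:
        cell_type = barcode_to_celltype.get(barcode)
        if cell_type is None:
            continue  # barcode not annotated in the meta file
        key = (barcode, umi, position)
        if key in seen:
            continue  # duplicate molecule
        seen.add(key)
        groups[cell_type].append(position)
    return dict(groups)
```

Grouping by cell type after deduplication means each population's coverage track counts each molecule once. This is what lets per-population differences in polyadenylation site usage stand out.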
[END]
---
[1] Url:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011477
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/