(C) PlosOne
This story was originally published on plosone.org. The content has not been altered[1]
Licensed under Creative Commons Attribution (CC BY) license .
url:
https://journals.plos.org/plosone/s/licenses-and-copyright
--------------------
Inferring latent temporal progression and regulatory networks from cross-sectional transcriptomic data of cancer samples
Xiaoqiang Sun, Key Laboratory of Tropical Disease Control, Chinese Ministry of Education, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, School of Mathematics, Ji Zhang, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine
Abstract Unraveling molecular regulatory networks underlying disease progression is critically important for understanding disease mechanisms and identifying drug targets. The existing methods for inferring gene regulatory networks (GRNs) rely mainly on time-course gene expression data. However, most available omics data from cross-sectional studies of cancer patients lack sufficient temporal information, which poses a key challenge for GRN inference. By quantifying the latent progression using a random walk-based manifold distance, we propose a latent-temporal progression-based Bayesian method, PROB, for inferring GRNs from the cross-sectional transcriptomic data of tumor samples. The robustness of PROB to measurement variability in the data is mathematically proved and numerically verified. Performance evaluation on real data indicates that PROB outperforms other methods in both pseudotime inference and GRN inference. Applications to bladder cancer and breast cancer demonstrate that our method is effective in identifying key regulators of cancer progression or drug targets. The identified ACSS1 is experimentally validated to promote epithelial-to-mesenchymal transition of bladder cancer cells, and the predicted FOXM1-target interactions are verified and are predictive of relapse in breast cancer. Our study suggests effective new approaches to modeling clinical transcriptomic data for characterizing cancer progression and facilitates the translation of regulatory network-based approaches into precision medicine.
Author summary Reconstructing gene regulatory networks (GRNs) is an essential problem in systems biology. The lack of temporal information in sample-based transcriptomic data poses a major challenge for inferring GRNs and translating them to precision medicine. To address this challenge, we propose to decode the latent temporal information underlying cancer progression by ordering patient samples based on transcriptomic similarity, and we design a latent-temporal progression-based Bayesian method to infer GRNs from sample-based transcriptomic data of cancer patients. The advantages of our method include its capability to infer causal GRNs (with directed and signed edges) and its robustness to the measurement variability in the data. Performance evaluation using both simulated data and real data demonstrates that our method outperforms other existing methods in both pseudotime inference and GRN inference. Our method is then applied to reconstruct EMT regulatory networks in bladder cancer and to identify key regulators underlying progression of breast cancer. Importantly, the predicted key regulators/interactions are experimentally validated. Our study suggests that inferring a dynamic progression trajectory from static expression data of tumor samples helps to uncover regulatory mechanisms underlying cancer progression and to discover key regulators that may be used as candidate drug targets.
Citation: Sun X, Zhang J, Nie Q (2021) Inferring latent temporal progression and regulatory networks from cross-sectional transcriptomic data of cancer samples. PLoS Comput Biol 17(3): e1008379.
https://doi.org/10.1371/journal.pcbi.1008379
Editor: Sushmita Roy, University of Wisconsin, Madison, UNITED STATES
Received: October 3, 2020; Accepted: February 15, 2021; Published: March 5, 2021
Copyright: © 2021 Sun et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The gene expression dataset of the bladder cancer was downloaded from the NCBI GEO database (GSE128192). The gene expression datasets as well as clinical information of the breast cancer patients used for network prediction were downloaded from the NCBI GEO database (GSE7390). The microarray and ChIP-seq data used for network validation were downloaded from the NCBI GEO database (GSE40766, GSE40762, GSE62425, GSE2222, GSE58626 and GSE27830). The clinical gene expression data used for survival analysis were downloaded from the NCBI GEO database (GSE2990, GSE12093, GSE5327, GSE1456, GSE2034, GSE3494, GSE6532 and GSE9195). The gene expression RNAseq and phenotype information associated with the TCGA COAD dataset were downloaded from the UCSC Xena website (https://xena.ucsc.edu/) via the following links: https://tcga-xena-hub.s3.us-east-1.amazonaws.com/latest/TCGA.COAD.sampleMap/HiSeqV2.gz and https://tcga-xena-hub.s3.us-east-1.amazonaws.com/latest/TCGA.COAD.sampleMap/COAD_clinicalMatrix; The gene expression RNAseq and phenotype information associated with the TCGA SKCM dataset were also downloaded from the UCSC Xena website (https://xena.ucsc.edu/) via the following links: https://tcga-xena-hub.s3.us-east-1.amazonaws.com/latest/TCGA.SKCM.sampleMap/HiSeqV2.gz and https://tcga-xena-hub.s3.us-east-1.amazonaws.com/latest/TCGA.SKCM.sampleMap/SKCM_clinicalMatrix. A recent re-quantification of the LPS scRNA-seq dataset (GSE48968) was downloaded from the conquer database (http://imlspenticton.uzh.ch:3838/conquer/). The code for PROB is available at https://github.com/SunXQlab/PROB. The numerical data underlying graphs in the manuscript is available at S1_Data.xlsx in the Supporting Information.
Funding: XS was supported by grants from the National Natural Science Foundation of China (11871070, 11931019), the Guangdong Basic and Applied Basic Research Foundation (2020B151502120), the Fundamental Research Funds for the Central Universities (20ykzd20). QN was partially supported by a National Science Foundation grant DMS1736272, a Simons Foundation grant (594598), and a National Institute of Health grant U54CA217378. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction Inferring gene regulatory networks (GRNs) from molecular profiling of large-scale patient samples is important for identifying master regulators of disease at the systems level [1]. Detecting the causal relationships between genes from biomedical big data, such as clinical omics data, has recently emerged as an appealing yet unresolved task, particularly for clinical purposes (e.g., diagnosis, prognosis and treatment) in the era of precision medicine [2]. Many methods have been developed for inferring GRNs from gene expression data [3]. These methods can be grouped into at least four categories: Boolean network methods [4], ordinary differential equation (ODE) model-based methods [5], Bayesian network methods [6] and tree-based ensemble learning methods [7]. They rely mainly on two types of gene expression data, i.e., gene perturbation experiments [8,9] or time-course gene expression data [10]. Temporal changes in expression, resulting from the interactions between genes, could potentially imply causal regulation. Meanwhile, a wealth of time-course transcriptomic data has been generated from laboratory experiments. Consequently, many GRN inference methods were designed on the assumption that the expression data are temporal [11]. However, the transcriptomic data of tumor samples often lack explicit temporal information [12]. In fact, large time-course datasets are rarely available in clinical settings, at least for now, since longitudinal studies are often challenging to conduct. In contrast, cross-sectional studies (i.e., a snapshot of a particular group of people at a given point in time) based on high-throughput molecular omics data are more prevalent due to their relative feasibility.
As such, for cross-sectional transcriptomic data at population scale, most of the current methods, such as Pearson correlation coefficient (PCC)-based methods [13], mutual information [14], regression methods [15] and machine learning methods [16], can only infer co-expression or associations between genes. Moreover, although some correlation network-based methods have been used to identify disease-associated genes [17], it is hard to identify the causal drivers or the regulatory roadmap underlying phenotypic abnormality in the absence of regulatory network information [18]. Therefore, the lack of temporal information in clinical transcriptomic data poses a key challenge for inferring directed GRNs and translating them to systems medicine. Decoding temporal information that traces the underlying biological process from cross-sectional data is a promising way to address this challenge. The sample similarity-based approach has shown great promise in recovering evolutionary dynamics in evolution and genetics studies [19], for instance, phylogenetic trees based on microarray data [20] and genetic linkage maps based on genetic markers [21]. To this end, we propose that the latent temporal order of cancer progression status (i.e., latent-temporal progression) could be estimated from the cross-sectional data based on transcriptomic similarity between patient samples. Leveraging the latent-temporal ordering, we could represent the GRN as a nonlinear dynamical system. Moreover, considering the technical variability or measurement error in RNA-sequencing or microarray data (e.g., variations in sample preparation, sequencing depth and measurement noise and bias) [22,23], it is essential to guarantee the robustness of the GRN inference. In this study, we present PROB, a latent-temporal progression-based Bayesian method of GRN inference designed for population-scale transcriptomic data.
To estimate the temporal order of cancer progression from the cross-sectional transcriptomic data, we develop a staging information-guided random walk approach to efficiently measure manifold distance between patients in a large cohort. In this way, the cross-sectional data could be reordered to be analogous to time-course data. This transformation enables us to formulate the GRN inference as an inverse problem of progression-dependent dynamic model of gene interactions, which is solved using a Bayesian method. The robustness of the estimates of regulatory coefficients is justified through mathematical analysis and simulations. Furthermore, applications to real data not only demonstrate better performance of PROB than other existing methods but also show good capacity of PROB in identifying key regulators of cancer progression or potential drug targets. The identified ACSS1 in bladder cancer and predicted FOXM1-targets interactions in breast cancer are both validated. In addition, we also discuss potential clinical applications of our method.
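The ordering step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PROB implementation: the form of the stage weighting (`stage_penalty`), the use of total variation distance between random-walk distributions, the root-selection rule, and all function names are hypothetical choices made for this sketch. The paper's own definitions are its Eqs (8) and (9), which are not reproduced here.

```python
import math


def stage_weighted_kernel(X, stages, sigma=1.0, stage_penalty=0.5):
    """Gaussian similarity on expression, damped when stages differ.

    ASSUMPTION: the exp(-stage_penalty * |stage gap|) factor is illustrative
    only; it is not the stage-weighted kernel defined in the paper.
    """
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            stage_gap = abs(stages[i] - stages[j])
            K[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2)
                               - stage_penalty * stage_gap)
    return K


def row_normalize(K):
    """Turn similarities into random-walk transition probabilities."""
    P = []
    for row in K:
        total = sum(row)
        P.append([v / total for v in row])
    return P


def mat_mul(A, B):
    """Plain matrix product for lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def walk_distributions(P, n_steps):
    """Row i is the distribution of a walker started at sample i after n_steps."""
    M = P
    for _ in range(n_steps - 1):
        M = mat_mul(M, P)
    return M


def tv_distance(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def order_samples(X, stages, n_steps=3, sigma=1.0, stage_penalty=0.5):
    """Return (root index, pseudotime per sample, sample order)."""
    K = stage_weighted_kernel(X, stages, sigma, stage_penalty)
    M = walk_distributions(row_normalize(K), n_steps)
    n = len(X)
    early = [i for i in range(n) if stages[i] == min(stages)]
    late = [i for i in range(n) if stages[i] == max(stages)]
    # ASSUMPTION: the root is the earliest-stage sample that is farthest,
    # in total, from the latest-stage samples.
    root = max(early, key=lambda i: sum(tv_distance(M[i], M[j]) for j in late))
    pseudotime = [tv_distance(M[root], M[i]) for i in range(n)]
    order = sorted(range(n), key=lambda i: pseudotime[i])
    return root, pseudotime, order
```

For example, `order_samples([[3.0], [0.0], [5.0], [1.0], [4.0], [2.0]], [2, 1, 3, 1, 3, 2])` picks the stage-1 sample with expression 0.0 as the root and orders the remaining samples by increasing expression, which is the progression one would expect for points lying on a line. Real cohorts are high-dimensional and noisy, so the kernel width and number of walk steps would need tuning.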
Discussion PROB provides a novel tool for inferring cancer progression and GRNs from cross-sectional data. Our approach is based on a dynamical systems representation of gene interactions during cancer progression. The inverse problem of GRN reconstruction was solved by combining latent progression estimation and Bayesian inference for high-dimensional dynamic systems. PROB can be used to generate experimentally testable hypotheses on the molecular mechanisms of gene regulation during cancer progression and to identify network-based gene biomarkers for predicting cancer prognosis and treatment response. Besides cross-sectional bulk transcriptomic data, our method can be naturally applied to time-course scRNA-seq data (Fig 3). Although scRNA-seq data can be used to infer GRNs during cell differentiation or development, it is currently not feasible to use scRNA-seq to investigate long-term cancer progression because of patient heterogeneity, the difficulty of acquiring large numbers of samples and the high cost. In view of this, clinical transcriptomic data of cancer patients provide an alternative way to infer GRNs underlying cancer progression. The novelty and superiority of PROB can be first attributed to the successful ordering of tumor samples by using both gene expression data and staging information. Our proposed stage-weighted Gaussian kernel allows construction of diffusion-like random walks to quantify the temporal progression distance (TPD) between two patients (Eq (8)). The diffusion map, as a manifold-based nonlinear dimension reduction method, has been recently applied to scRNA-seq data analysis [26,57–59]. One major difficulty in applying diffusion maps to infer pseudo trajectories lies in identifying the rooting point from the scRNA-seq data alone, which often requires additional biological knowledge.
An advantage of clinical transcriptomic data is that staging or grading information is usually available for samples as well, allowing development of an algorithm that automatically identifies the rooting point (Eq (9)). We demonstrated that incorporating staging information into the temporal progression inference significantly improved its accuracy (S1 Fig) and that our method significantly outperformed existing pseudotime inference methods (Figs 3B and S6). Considering technical variabilities in the sample-based transcriptomic data, it is important that the interaction coefficients in the GRN model be robust to perturbations of the temporal progression. In addition to proving this property mathematically, through simulations we found that PROB inference of both the progression trajectory and the gene network structure is rather robust to noise in the data (Figs 2, S4 and S5). In addition, PROB is computationally efficient for GRN inference, which could be completed within 1 minute on the three real datasets analyzed in this study (S4 Table). For clinical applications, our method can be used to identify key genes for early detection of cancer progression and design of therapeutic targets. By recovering the temporal dynamics of gene expression in terms of the disease progression, PROB provides insights into exploiting kinetic features of functionally important genes that may be used as predictive biomarkers or drug targets. In the case study of bladder cancer progression, we have demonstrated that ACSS1 and PTPN12 played important roles in EMT during bladder cancer progression from UC to SARC and that their expression changed dynamically over the progression (Figs 4 and 5). Therefore, we hypothesized that the temporal dynamics of EMT regulatory genes (e.g., ACSS1 or PTPN12) could be exploited to predict cancer progression. To this end, a logistic regression model was developed to predict EMT states or histological subtypes (UC vs.
SARC) of bladder cancer based on the expression levels of ACSS1 and PTPN12, which showed good predictive accuracy (S11 Fig). As such, the early changes in expression of ACSS1 and PTPN12 during the progression of UC to SARC may be relevant for the early detection of SARC. In another case study of breast cancer, FOXM1, a druggable target, was identified as a key regulator underlying breast cancer progression (Fig 6) and, importantly, the predicted FOXM1-target regulations were validated (Fig 7). Furthermore, we propose a GRN kinetic signature (S8 Text) based on FOXM1-targeted gene interactions to prognosticate relapse in breast cancer. Kaplan-Meier (K-M) survival curves were plotted for the high-risk group (green) and low-risk group (red) of patients with respect to relapse-free survival (RFS) (S12A–S12C Fig). The log-rank test p values for all three datasets were less than 1e-4. Moreover, we tested the statistical significance of the FOXM1-target interactions in predicting relapse in breast cancer using a bootstrapping approach (S8 Text). We compared the prognostic power (Wald test p value) of the FOXM1-predicted targets with that of 10000 sets of 8 randomly selected genes. The permutation test p values for all three datasets were less than 0.05 (S12D–S12F Fig), verifying the non-randomness of the predicted target genes of FOXM1. These results demonstrated that the predicted FOXM1-target interactions could be used as a biomarker for prognosticating relapse in breast cancer. The latent-temporal progression-based causal network reconstruction method proposed in this study is likely to inspire other network-based methodologies, such as those in systems genetics [60,61], network pharmacology [62,63], and network medicine [64,65]. Our method has several limitations that could be addressed in future studies.
For example, in the current method, only gene expression profiles and staging information from patient samples have been used for latent-temporal progression modeling. Other covariates, for example, age, genetic mutation, and molecular subtypes, might also be useful for progression inference [66]. Statistical models that integrate multiple aspects of clinical information will provide better inference of disease progression. In summary, we have developed a novel latent-temporal progression-based Bayesian Lasso method, PROB, to infer directed and signed gene networks from prevalent cross-sectional transcriptomic data. PROB provides a dynamic and systems perspective for characterizing and understanding cancer progression based on patients’ data. Our study also sheds light on facilitating the regulatory network-based approach to identifying key genes or therapeutic targets for the prognosis or treatment of cancers.
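To make the idea of reading directed, signed edges off progression-ordered data concrete, here is a minimal sketch under simplifying assumptions. It replaces the paper's nonlinear dynamic model and Bayesian Lasso with a linear model, dx_i/dt ≈ Σ_j a_ij x_j, fitted by ordinary (non-Bayesian) lasso via coordinate descent, and it estimates derivatives by forward differences along the inferred progression. All function names and the penalty `lam` are hypothetical; the authors' implementation is the one in the PROB repository.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w starting at zero
    col_sq = [sum(X[r][j] ** 2 for r in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes
            # column j's own current contribution.
            rho = sum(X[r][j] * resid[r] for r in range(n)) / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            delta = w_new - w[j]
            if delta != 0.0:
                for r in range(n):
                    resid[r] -= delta * X[r][j]
                w[j] = w_new
    return w


def forward_differences(X, t):
    """Approximate dX/dt between consecutive samples ordered by pseudotime t.

    Requires strictly increasing t (no tied pseudotime values).
    """
    n_genes = len(X[0])
    return [[(X[k + 1][g] - X[k][g]) / (t[k + 1] - t[k]) for g in range(n_genes)]
            for k in range(len(X) - 1)]


def infer_network(X, t, lam=0.05):
    """Return A where A[i][j] is the estimated effect of gene j on gene i.

    ASSUMPTION: linear dynamics and a plain lasso point estimate stand in for
    the paper's nonlinear model and Bayesian Lasso. A positive entry is read
    as activation, a negative entry as repression, and zero as no edge.
    """
    dX = forward_differences(X, t)
    predictors = X[:-1]  # expression at the start of each interval
    A = []
    for i in range(len(X[0])):
        target = [row[i] for row in dX]
        A.append(lasso_cd(predictors, target, lam))
    return A
```

Because each gene's rate of change is regressed on the expression of all genes, the resulting matrix is generally asymmetric, which is what gives the edges a direction; the sign of each coefficient gives the edge a sign. A Bayesian treatment would additionally provide posterior uncertainty for each coefficient, which this point estimate does not.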
Acknowledgments We would like to acknowledge Profs. Tianshou Zhou, Jinzhi Lei, and Yong Wang for valuable discussion. We would also like to acknowledge Drs. Zifeng Wang and Dongliang Leng for processing the ChIP-seq data.
[1] Url:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008379