(C) PLOS One

(C) PLOS One
This story was originally published by PLOS One and is unaltered.
. . . . . . . . . .

One model fits all: Combining inference and simulation of gene regulatory networks [1]

['Elias Ventre', 'Laboratoire De Biologie Et Modélisation De La Cellule', 'École Normale Supérieure De Lyon', 'Cnrs', 'Umr', 'Inserm', 'Université Claude Bernard Lyon', 'Lyon', 'Inria Center Grenoble Rhône-Alpes', 'Équipe Dracula']

Date: 2023-04

The rise of single-cell data highlights the need for a nondeterministic view of gene expression, while offering new opportunities regarding gene regulatory network inference. We recently introduced two strategies that specifically exploit time-course data, where single-cell profiling is performed after a stimulus: HARISSA, a mechanistic network model with a highly efficient simulation procedure, and CARDAMOM, a scalable inference method seen as model calibration. Here, we combine the two approaches and show that the same model driven by transcriptional bursting can be used simultaneously as an inference tool, to reconstruct biologically relevant networks, and as a simulation tool, to generate realistic transcriptional profiles emerging from gene interactions. We verify that CARDAMOM quantitatively reconstructs causal links when the data is simulated from HARISSA, and demonstrate its performance on experimental data collected on in vitro differentiating mouse embryonic stem cells. Overall, this integrated strategy largely overcomes the limitations of disconnected inference and simulation.

Gene regulatory network (GRN) inference is an old problem, to which single-cell data has recently offered new challenges and breakthrough potential. Many GRN inference methods based on single-cell transcriptomic data have been developed over the last few years, while GRN simulation tools have also been proposed for generating synthetic datasets with realistic features. However, except for benchmarking purposes, these two fields remain largely disconnected. In this work, building on a combination of two methods we recently described, we show that a particular GRN model can be used simultaneously as an inference tool, to reconstruct a biologically relevant network from time-course single-cell gene expression data, and as a simulation tool, to generate realistic transcriptional profiles in a non-trivial way through gene interactions. This integrated strategy demonstrates the benefits of using the same executable model for both simulation and inference.

In a second step, we use CARDAMOM to calibrate the model with a real time-stamped scRNA-seq dataset of differentiating mouse embryonic stem (ES) cells [ 22 ]. We demonstrate the ability of the model to reproduce the global features of real time-course transcriptomic profiles. We also show that most of the inferred interactions are indeed supported by biological evidence such as ChIP-seq experiments, although this evidence was not used during the inference process. Altogether, these results establish the ability of an executable network model not only to simulate realistic single-cell datasets, but also to provide an effective reverse-engineering algorithm capable of reproducing the main gene expression patterns of an experimental dataset as emergent properties of the underlying GRN.

After introducing the setup of our benchmark made from in silico datasets generated with the mechanistic model, we first evaluate the performances of HARISSA and CARDAMOM together with four state-of-the-art GRN inference algorithms: GENIE3 [ 18 ], PIDC [ 19 ], SINCERITIES [ 20 ] and SCRIBE [ 21 ]. We study the limits of the different categories of inference methods in the case of transcriptional bursting, and verify that the two model-based methods perform better than the others on these datasets. CARDAMOM appears as the best performing algorithm during this benchmark step, which only considers network structures. Importantly, the output of this algorithm is not only a matrix of interaction scores, but also a set of quantitative parameters that can be plugged into the GRN model for simulations.

In this work, we sought to investigate the benefits of using this model as an integrated tool for both GRN inference and data simulation. We therefore assessed its ability to allow for efficient network reconstruction from time-course scRNA-seq data, while accurately reproducing the dataset main features from the functioning of the inferred network. Note that to the best of our knowledge, this is not performed by existing GRN-based simulation tools, which are generally based on more phenomenological than mechanistic models, with at least some important aspects of gene expression patterns, such as transitions between cell types [ 16 ] or gene expression variability [ 16 , 17 ], being hard-coded instead of arising from biological mechanisms.

We recently developed several methods for inferring GRNs from single-cell data based on a particular mechanistic network model, defined as a ‘multi-agent’ generalization of the well-known two-state stochastic model of gene expression [ 8 ] where genes are now being described by interacting two-state models [ 6 ]. These methods are well suited for single-cell RNA-seq (scRNA-seq) time-course data, each dataset being considered as a partial observation of the model at a certain time. Crucially, they do not require the observation of cell trajectories, whose inference is a problem in itself [ 12 , 13 ], but only that the cells sampled at each timepoint are driven by the same dynamical process, i.e., resulting from the same GRN. Our first proposal was called WASABI [ 14 ], which uses a divide-and-conquer approach where the problem of GRN inference is solved one gene at a time. Although able to propose relevant GRNs, this approach suffered from two drawbacks: it required days of computation for a GRN with 50 genes, and proposed a potentially long list of candidate networks. We therefore developed two other methods: HARISSA [ 6 ], a GRN simulation algorithm based on the mechanistic model together with a proof-of-concept inference method derived from likelihood maximization, and CARDAMOM [ 15 ], a simplified and scalable alternative for the GRN inference part that crucially exploits the notions of landscape and metastability.

Moreover, the results of a method based on a mechanistic model can only be considered relevant if the model is able to correctly reproduce single-cell datasets. For instance, it is now widely accepted that the transcriptional bursting phenomenon is associated to specific patterns of gene expression products [ 8 , 9 ], making continuous single-cell data close to Gamma distributions [ 10 ] and discrete data close to negative binomial distributions [ 11 ], the latter being themselves mathematically equivalent to Poisson distributions with Gamma-distributed random parameters. Thus, executable network models should at least be able to generate these patterns in their marginal distributions. In any case, the use of a mechanistic model-based method requires prior strong evidence that the underlying model is relevant for simulating realistic single-cell transcriptomic datasets.

In this article, we make a distinction between mechanistic GRN models (e.g., built on the biological understanding of differentiation mechanisms) for which cell behavior appears as an emergent property of gene interactions, and phenomenological models, for which the expected outcome is directly prescribed by some dedicated parameters. In this case, although such parameters can still have a biological meaning, the cellular behavior is not biologically emerging but rather ‘hard-coded’ by the model. As developed afterwards, many GRN models fall in between: some aspects of gene expression patterns are then hard-coded instead of emerging from interactions between genes, and gene expression stochasticity is often assumed to be driven by Gaussian white noise only, requiring ad hoc additional noise to match the data.

Reconstructing most-likely GRNs from transcriptomic datasets has therefore become a major goal in systems biology [ 3 ] but is also notoriously difficult, especially in the case of single-cell transcriptomic data. Indeed, the bursty synthesis of mRNAs, now clearly evidenced [ 4 , 5 ], gives rise to highly variable and non-Gaussian expression data [ 1 , 6 ], and current GRN inference methods employ a wide range of statistical and modeling tools [ 7 ]. Methods based on a specific dynamical model, called here GRN models, have the great advantage of providing biological interpretability, since each inferred interaction between genes can be understood in terms of model behavior. Moreover, such approach generally provides interactions with their direction and intensity, which is not the case for most purely statistical methods.

The “how” question can now be approached using single-cell-based technologies, offering an unprecedented resolution and a much finer view than population-based measures [ 1 , 2 ]. The “why” question relates to the functioning of an underlying gene regulatory network (GRN) which describes interactions between genes through their expression products. GRNs are thus a central notion for understanding and predicting cellular behavior, but their construction from literature is a very laborious task, sometimes even impossible due to the lack of knowledge.

Cell decision making as a response to exogenous or endogenous stimuli (e.g., differentiation, proliferation, cell death or biological activity modulation) is often supported by time-dependent modulation of gene expression upon stimulation. Understanding how and why gene expression changes as a function of time in response to specific stimuli is therefore critical to understand the underlying biological processes.

Results

HARISSA simulates single-cell datasets from a mechanistic GRN model We first wanted to benchmark the ability of the different inference algorithms to reconstruct correct network structures from in silico generated datasets, i.e., when the ground truth is known. For this, we used the simulation module of HARISSA [6], which generates trajectories of a mechanistic model describing gene expression dynamics (both mRNA and the corresponding proteins) within a single cell, these dynamics being influenced by an underlying GRN and driven by transcriptional bursting (see Simulation of the inferred network reproduces the original dataset and S1 Fig). As shown in previous work, this model is indeed able to generate scRNA-seq datasets with realistic marginal distributions [23]. We simulated nine datasets corresponding to different network structures (Fig 1): a network of 4 genes with a branching structure and inhibition feedback loop (FN4); a network of 5 genes with a cycling structure (CN5); a network of 8 genes with multiple branching structure and feedback loops (FN8); a network of 8 genes with branching trajectories (BN8); networks with a tree structure of 5, 10, 20, 50, and 100 genes (Trees). These networks represent the main types of network structures that have been used for benchmarking GRN inference algorithms [17]. Overall, the objective was to reproduce time-course experiments in which single-cell profiling is performed after a given stimulus, typically a change of medium [22, 24, 25]. This stimulus was therefore taken into account in all the simulations, in the form of a virtual gene defined as being inactive before the beginning of the experiment and fully activated afterwards. PPT PowerPoint slide

PNG larger image

TIFF original image Download: Fig 1. Single-cell data simulation using HARISSA. (A) Networks used for subsequent tests, including feedback loop networks (FN), a cycling network (CN) and a branching network (BN). Genes and stimulus are represented by numbered nodes and an empty node, respectively, while green arrows indicate activation and red blunt arrows indicate inhibition. (B) Corresponding trajectories, defined as time-dependent mRNA levels (in copies per cell). For each network, the first plot shows one example of single-cell trajectory M while the second plot shows the population average 〈M〉 from 1000 cells. The transcriptional bursting model underlying HARISSA implies that every single-cell trajectory differs strongly from the more usual population average. (C) Two-dimensional UMAP representations of corresponding single-cell snapshots, defined as mRNA levels sampled at 10 timepoints in different cells from 0h to 96h, with 100 cells per timepoint. Such snapshots are called time-stamped data in the text and are fundamentally different from single-cell trajectories, which are currently not available experimentally. https://doi.org/10.1371/journal.pcbi.1010962.g001 For each network structure (Fig 1A), the transcriptional bursting model implies that typical single-cell trajectories do not follow a diffusion-like process (at least in the space of mRNA levels), and differ strongly from the more usual and intuitive population-average trajectory (Fig 1B). The practical datasets were obtained by sampling independent cells at a specific sequence of timepoints, therefore not keeping the real cell trajectories but rather considering different cells at each timepoint, forming time-stamped snapshots (Fig 1C). Interestingly, both feedback networks (FN4 and especially FN8) produce a recognizable “differentiation trajectory” across the UMAP space with a clear temporal order of cells. Due to the stochastic nature of cell trajectories generated by the mechanistic model, branching trajectories in snapshots only appear in specific cases, generally when a toggle-switch is dominating the GRN structure and then generates distinct branches in the UMAP representation (see BN8 in Fig 1). As mentioned previously, HARISSA consists of two modules for performing respectively simulation and inference. Whereas the original inference module of HARISSA was limited to a few genes [6], it recently integrated an effective CARDAMOM-inspired simplification [23] that allows to infer networks with a much larger number of genes. We therefore also benchmarked this method along with the others.

CARDAMOM quantitatively reconstructs causal GRN links We then inferred GRN structures from the in silico generated datasets using the six algorithms presented in the Simulation of the inferred network reproduces the original dataset section (HARISSA, CARDAMOM, GENIE3, PIDC, SINCERITIES, and SCRIBE). Note that neither GENIE3 nor PIDC are able to use the temporal information (except for the stimulus state information, which they are also provided with), giving them a disadvantage compared to the other algorithms. They were nevertheless used in the benchmark as they are considered to be among the best algorithms for single-cell data, and given that very few algorithms are specifically adapted to time-stamped datasets. Indeed, most methods are limited to static data, and those that are not (such as SCRIBE) require temporally-ordered cell trajectories instead of independent snapshots, thus requiring a pre-processing step that can itself be subject to errors. Moreover, it was not known how they would fare in a time-course setting with transcriptional bursting, which was an interesting question per se. We also emphasize that among these algorithms, only CARDAMOM and HARISSA have the significant advantage of providing biological interpretability, thanks to the mechanistic model on which they are based: here the network parameters are not mere interaction scores, but quantitative parameters that can be plugged into the model for simulations. Besides, the main objective is really to reverse engineer such a model: from this perspective, despite the obvious advantage of CARDAMOM and HARISSA being built on the same mathematical framework as the one used to generate the data, even similar performances compared to the other algorithms would be satisfying. Inference was performed on ten independent datasets for each network, and the results were merged into the area under the precision-recall curve (AUPR) which measures the quality of the inferred GRN structure. We also compared the inferred GRNs with a naive method consisting in assigning to each edge of the network the value given by the Pearson correlation coefficient between the corresponding genes (abbreviated as PEARSON): this comparison with Pearson coefficients makes it possible to verify, when the algorithms show good performances, that these are not only due to highly correlated data which are thus not difficult to analyze. The results are presented in Fig 2A and 2B for the first five algorithms. We present the results for SCRIBE separately in Fig 2C because this algorithm requires temporally or pseudo-temporally ordered trajectories, and the results then depend on the pre-processing that is applied on the time-stamped data. PPT PowerPoint slide

PNG larger image

TIFF original image Download: Fig 2. Benchmark of inference methods for five different network structures. For each network, inference is performed on ten independently simulated datasets, each dataset containing the same 10 timepoints with 100 cells per timepoint (1000 cells sampled per dataset). The performance on each dataset is then measured as the area under the precision-recall curve (AUPR), based on the unsigned inferred weights of edges. Finally, the performance of each method is summarized as a box plot of the corresponding AUPR values, or the average AUPR value for the tree-structure activation networks (Trees). For each plot, the dashed gray line indicates the average performance of the random estimator (assigning to each edge a weight 0 or 1 with 0.5 probability). For the Trees networks, each dataset corresponds to a random tree structure of fixed size (5, 10, 20, 50, and 100 genes) sampled from the uniform distribution over trees of this size. (A) Performance of all methods when considering only undirected interactions. (B) Performance of the methods able to infer directed interactions. (C) Performance of the SCRIBE inference method for the same networks, in three conditions: when one has access to real single-cell trajectories (in brown), when pseudo-trajectories are reconstructed from time-stamped data using a coupling method similar to Waddington-OT (in pink), and when a single pseudo-trajectory is reconstructed using the pseudotime algorithm SLINGSHOT (in light green). For the last two conditions, the datasets used are therefore the same as those used for the other methods. https://doi.org/10.1371/journal.pcbi.1010962.g002 CARDAMOM and HARISSA appeared to outperform the other algorithms for most of these datasets. In particular, in terms of directed interactions, these two methods always clearly performed better than the others. The undirected networks for which GENIE3 and PIDC have similar performances (CN5 and BN8) correspond to cases where the Pearson correlation method is also accurate. Also, if GENIE3 and PIDC represent an improvement over the Pearson correlation method, they seem to perform poorly when the correlation between genes is not sufficient to infer a reliable GRN. More precisely, we observe that GENIE3 and PIDC are accurate for tree-like networks (Trees), even with bifurcating trajectories (BN8) and cycling (CN5), which was not the case in [17]. On the contrary, SINCERITIES performs very poorly for these type of networks, but seems however competitive for networks with feedback loops (FN4 and FN8) where GENIE3 and PIDC have lower performances. These networks are more difficult to reconstruct. Indeed, as visible in Fig 1, the population-average trajectories of some genes are completely similar. Some genes also have the same marginal distribution of mRNA levels: for example in the network FN4, gene 2 and gene 3 have the same input (gene 1), so their marginal distributions evolve similarly at each timepoint. Then SINCERITIES, which bases the inference procedure on the approximate distribution for each gene, fails to make this subtle distinction, illustrating the improvement that is typically expected from HARISSA and CARDAMOM. On all the networks, GENIE3 fails to infer reliably the direction of the interaction, i.e., to distinguish the interaction i → j from the interaction j → i. On the contrary, because of their mechanistic assumptions, CARDAMOM, HARISSA and SINCERITIES have always quite similar results for directed and undirected inference. Finally, we observed that CARDAMOM outperforms HARISSA on most of the networks. Regarding SCRIBE, which is a trajectory-based method, we tested its performances in three scenarios (Fig 2C): When we have access to real trajectories (Real traj.): each cell at each timepoint is being associated to a real ancestor at the previous timepoint and a real descendant at the following one. Such knowledge can of course only be accessed with in silico generated datasets or in vitro for a very limited number of genes by using live-cell imaging of short-lived transcriptional reporters [26]; When the dataset is the same as for the other methods (i.e., no access to real trajectories), and each cell at each timepoint is associated to a pseudo-ancestor at the previous timepoint and a pseudo-descendant at the following one, using the Waddington-OT method described in [27] (Coupling); When the dataset is the same as for the other methods (i.e., no access to real trajectories), and the algorithm SLINGSHOT [28] is used for reconstructing a pseudo-temporally ordered trajectory (Pseudotime). We observed that SCRIBE performs well in scenario 1, but poorly in scenarios 2 and 3, at least on the tested networks (Fig 2C). These poor performances are due to the loss of temporal coupling between measurements of genes that interact. They suggest that neither optimal coupling nor pseudotime reconstruction are sufficiently efficient for GRN inference in case of transcriptional bursting. Concerning the optimal coupling method, we notice that this might be due to the “movement by diffusion” assumption on which the Waddington-OT method is built, which does not take into account the constraints on the trajectories imposed by the GRN. When computing the average runtime of each algorithm on the tree-like networks, we observed that except for SCRIBE, all algorithms are suitable for inferring GRN with a realistic number of genes (see S1 Table). Thus, due to this computational limit and its poor performances when using time-stamped data, we did not consider SCRIBE for further analysis. We then investigated the limit performances with respect to the number of cells and/or timepoints. We observed that the performances of the first five algorithms decrease for the tree-like networks when the number of genes increases (Fig 2). This can be due to three main factors: A sequence of timepoints too coarse in relation to the dynamics would directly lead to a lack of inference accuracy; A sequence of timepoints which is too restricted may not allow to see interactions involving some genes that are regulated late in the process. For example, in Fig 2, we observe that the inference on the Trees networks is very poor for more than ten genes: it comes from the fact that some genes are never activated before 96h; The number of cells at each timepoint can simply not be enough to infer a reliable GRN. We therefore investigated the effects of these three factors on the accuracy of the algorithms by studying their performances in terms of AUPR for ten datasets generated from ten randomly-generated tree-like network of ten genes, when varying the number of cells at each timepoint (Fig 3A), the length of the interval for a fixed time gap between each timepoint (Fig 3B), and the density of the sequence of timepoints for an interval with fixed length (Fig 3C). As anticipated, all these conditions have an impact on the quality of the inference: augmenting their values tends to produce a better quality of inference. We also observed that the number of sampled cells seems less critical than the other factors, confirming that few cells at a sequence of timepoints which is dense and long-enough is preferable to many cells on a sequence of timepoints which is too coarse and/or too short. This should be kept in mind when designing single-cell transcriptomics experiments aiming at GRN inference. PPT PowerPoint slide

PNG larger image

TIFF original image Download: Fig 3. Dependence of inference methods on data collection parameters. For simplicity, only the case of undirected interactions is considered here and the datasets are restricted to 10-gene tree-structure networks (see Fig 2 for the general benchmark). Inference is performed for each method and condition on ten independently simulated datasets and summarized by box plots of AUPR values as in Fig 2. For each plot, the dashed gray line indicates the average performance of the random estimator (assigning to each edge a weight 0 or 1 with 0.5 probability). (A) Performance as a function of the number of cells per timepoint, while keeping the same timepoints. (B) Performance as a function of the length of the measurement period, while keeping the same gap between timepoints and the same total number of cells. (C) Performance as a function of the gap between timepoints, while keeping the same final timepoint and the same total number of cells. https://doi.org/10.1371/journal.pcbi.1010962.g003 Hence, both CARDAMOM and HARISSA, with a benefit for using CARDAMOM, allowed to efficiently reconstruct network structures by reverse engineering the generative model on which they are based. We then needed to test its ability to reproduce an experimental dataset from the literature after network inference.

[END]
---
[1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010962

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.

via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/