(C) PLOS One
This story was originally published by PLOS One and is unaltered.
A Physics-Informed Neural Network approach for compartmental epidemiological models [1]
['Caterina Millevoi', 'Department Of Civil', 'Environmental', 'Architectural Engineering', 'University Of Padova', 'Via Marzolo', 'Padova', 'Damiano Pasetto', 'Department Of Environmental Sciences', 'Informatics']
Date: 2024-10
Compartmental models provide simple and efficient tools to analyze the relevant transmission processes during an outbreak, to produce short-term forecasts or transmission scenarios, and to assess the impact of vaccination campaigns. However, their calibration is not straightforward, since many factors contribute to the rapid change of the transmission dynamics. For example, there might be changes in individual awareness, the imposition of non-pharmaceutical interventions, and the emergence of new variants. As a consequence, model parameters such as the transmission rate are bound to vary in time, making their assessment more challenging. Here, we propose to use Physics-Informed Neural Networks (PINNs) to track the temporal changes in the model parameters and the state variables. PINNs recently gained attention in many engineering applications thanks to their ability to consider both the information from data (typically uncertain) and the governing equations of the system. The ability of PINNs to identify unknown model parameters makes them particularly suitable to solve ill-posed inverse problems, such as those arising in the application of epidemiological models. Here, we develop a reduced-split approach for the implementation of PINNs to estimate the temporal changes in the state variables and transmission rate of an epidemic based on the SIR model equation and infectious data. The main idea is to split the training, first on the epidemiological data and then on the residual of the system equations. The proposed method is applied to five synthetic test cases and two real scenarios reproducing the first months of the Italian COVID-19 pandemic. Our results show that the split implementation of PINNs outperforms the joint approach in terms of accuracy (up to one order of magnitude) and computational times (speed-up of 20%). Finally, we illustrate that the proposed PINN method can also be adopted to produce short-term forecasts of the dynamics of an epidemic.
The proposed PINN implementations are tested in different scenarios using both synthetic and real-world data referred to the COVID-19 pandemic outbreak in Italy. The promising results can pave the way for a wider use of PINNs in epidemiological applications.
In this paper, we explore the use of a recently developed technique called Physics-Informed Neural Network, which tries to combine the two approaches and to simultaneously fit the data, infer the dynamics of the unknown parameters, and solve the model equations.
During the recent COVID-19 pandemic, we all became familiar with the reproduction number, a crucial quantity to determine if the number of infections is going to increase or decrease. Understanding the past changes of this quantity is fundamental to produce realistic forecasts of the epidemic and to plan possible containment strategies. There are several methods to infer the values of the reproduction number and, thus, the number of new infections. Statistical methods are based on the analysis of the collected epidemiological data. Modeling approaches (such as the popular SIR model) instead attempt to construct a set of mathematical equations whose solution approximates the dynamics underlying the data.
Data Availability: The epidemiological data for the Italian COVID-19 epidemic are available at the following link:
https://www.epicentro.iss.it/en/coronavirus/sars-cov-2-integrated-surveillance-data . The source code in Python is available at the following repository:
https://github.com/cmillevoi/EpiPINN .
Copyright: © 2024 Millevoi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The paper is organized as follows. Section 2 presents the mathematical formulation of the proposed methods. It starts with the equations of the SIR model (Section 2.1), then it describes the joint and split implementations of PINNs (Section 2.2), and finishes with the modified schemes for the reduced approach (Section 2.3) and the extension to the hospitalized data (Section 2.4). The numerical results of the application of the proposed PINNs to seven test cases are illustrated in Section 3. The method is tested and validated on synthetic cases (Section 3.1) and then applied in a real-world scenario (Section 3.2) for both parameter estimation and forecast. Finally, Section 4 presents the discussion of the results and sums up the main conclusions.
Our analysis compares the joint, split, and reduced approaches in a sequence of synthetic test cases where we progressively challenge the structure of the transmission rate from constant, to a sinusoidal-like dependence on time, to a real scenario, and increase the noise on the synthetic reported data. The proposed test cases assume model parameters that are inspired by the first months of the COVID-19 outbreak in Italy. As an example of application, the PINN strategies are adapted in order to fit the real epidemiological data reported in Italy. Due to the large uncertainties that characterize the real data on the reported infections, in this last setting we propose to include in the loss function also the data on the daily hospitalizations, which are a more reliable representation of the number of individuals with severe symptoms. Finally, we consider this scenario to explore the accuracy of the short-term forecasts produced by PINNs.
The second proposed modification reduces the number of NNs considered in the PINN approximation and, consequently, simplifies the structure of the loss function. This simplification is possible because, in simple SIR-based models, the transmission parameter and the infected compartment control the system dynamic. In fact, these functions allow to directly evaluate the other state variables, which are then redundant in the formulation of the loss function.
The first modification splits the PINN implementation into two steps. The motivation for this approach is that, in the common PINN implementation for SIR-like models, the NNs representing the model state variables and, if present, the time-dependent parameters are calibrated together through the minimization of the loss function on the data and the model residual. This inverse problem is particularly complex and many epochs might be required to achieve convergence. Starting from the idea that the available epidemiological data, which are typically the daily or weekly reported infections, are directly associated with a model state variable, the split PINN approach is based on the following two steps: first, construct the NN of the state variable associated with the data, e.g., the infected compartment, by minimizing the loss function based on the data; second, calibrate the other NNs for the remaining state variables and parameters based on the NN computed in the first step and the minimization of the residuals of the governing equations. We will refer to the traditional PINN approach as the joint approach, in contrast to the described split approach. A graphical sketch of the two approaches is shown in Fig 1.
In particular, we propose two modifications of the PINNs algorithm that grant faster convergence and more stable results, thus providing a step forward in the use of PINNs in real epidemiological models.
Building on these examples, our work aims to explore more deeply the properties of PINNs as an inverse solver for the estimation of time-dependent transmission rates or reproduction numbers in SIR models. Our analysis also aims to show some benefits of using PINNs that are not directly available with more traditional approaches, such as: the simultaneous estimation of multiple parameters that change in time; the joint use of different types of data for the inference; the possibility of providing a future projection for the evaluated parameters; and the possibility of training the model even if there are gaps, large errors, or uncertainties in the quality of the data.
The application of PINNs to epidemiological models became particularly relevant during the COVID-19 pandemic. Many studies used PINNs as an inverse-problem solver to calibrate the parameters of epidemiological compartmental models. However, the model parameters have frequently been considered constant in time, e.g., [21, 22], or with particular periodic dependencies on time [23]. Schiassi et al. [24] showed the computational efficacy of using PINNs to estimate constant parameters of different basic compartmental models under increasing levels of noise in the data. Long et al. [25] considered a more realistic scenario, and used PINNs to accurately identify the time-varying transmission parameter in a SIRD model of COVID-19 when assimilating the reported infected cases in three USA states. Feng et al. [26] proposed a similar approach to predict the number of active cases and removed cases in the US. Olumoyin et al. [27] used PINNs to track the changes in transmission rate and the number of asymptomatic individuals for COVID-19. Ning et al. [28] and He et al. [29] presented applications of PINNs to COVID-19 outbreaks in Italy and China, respectively. Bertaglia et al. [30] constrained PINNs to satisfy an asymptotic-preservation property to avoid poor results caused by the multiscale nature of the residual terms in the loss function.
Here, we propose to adopt a deterministic approach based on Physics-Informed Neural Networks (PINNs). The idea behind PINNs is to exploit the universal approximation property of Neural Networks (NNs) [18, 19] to estimate the solution of differential equations [20]. In practice, this is done by describing the state variables and, where present, the time-dependent parameters using NNs. The parameters of the NNs are trained by seeking the minimum of a loss function based on both the misfit on the available data and the residual of the differential equations governing the problem at hand, i.e., the SIR model equations in our case. Thus, the PINN functions fit the data and, at the same time, provide good approximations of the solutions of the differential equations. The use of the epidemiological model equations is fundamental in PINNs and constitutes the main innovation with respect to simpler NNs or Deep Neural Networks (DNNs), which are completely data-driven.
Tracking the temporal variations in the model parameters is an essential but complex problem to follow and predict the spreading of a disease. Many studies tackle this problem using Bayesian inference, i.e., searching for the posterior distribution of the unknown parameters based on the available reported cases and the prior distribution. Among these approaches, we recall the iterative particle filter [ 16 ], sequential data-assimilation schemes [ 17 ], or the use of subsequent Markov chain Monte Carlo (MCMC) [ 4 , 7 ]. Being based on random sampling, these approaches might result in low quality results and large computational times, due to the slow Monte Carlo convergence.
Data-driven methods provide effective estimates of the effective reproduction number $\mathcal{R}_t$ based on the renewal equation [10–12], i.e., a convolution on the reported cases having as kernel the serial interval (the time interval between the symptom onset of an individual and its secondary infections). These data-driven estimates do not explicitly provide a relationship between the changes in $\mathcal{R}_t$ and its possible causes, such as the implemented non-pharmaceutical interventions or the vaccination campaigns. Compartmental models give a deeper understanding of the ongoing spreading of the disease and, at the same time, allow the computation of $\mathcal{R}_t$ using the spectral radius of the next generation matrix [13–15]. However, they require the assessment and calibration of time-dependent parameters.
Epidemiological models are nowadays fundamental to assist and guide policy makers in the fight against the spreading of diseases. This has been evident during the recent COVID-19 pandemic, when epidemiologists and scientists all over the world devoted their research to develop ad-hoc transmission models. Focusing, for example, on Italy, where the European outbreak started in February 2020, epidemiological models have been adopted to analyze different aspects of the epidemic: to determine the urgency to impose regional restrictions [1]; to analyze the impact of the national lockdown [2, 3]; to explore the results of transmission scenarios after the release of the restrictions [4]; to study the impact of the different variants and the vaccination campaign [5–7]; and to compute optimal strategies for the vaccine deployment in order to minimize the number of cases or deaths [8, 9]. Most of these studies describe the SARS-CoV-2 transmission using different variations of compartmental models. The basic SIR model is at the core of those more complex epidemiological models. It subdivides the population of interest into compartments indicating the infectious status of each individual (i.e., susceptible, infected, or recovered individuals). The dynamic describes the mean contacts between susceptible and infected individuals and, thus, the average rate at which susceptible individuals transit to the infected compartment. The main model parameter is the rate of transmission of the infection, β. This is strictly related to the well-known basic reproduction number, $\mathcal{R}_0$, representing the average number of secondary infections generated by one infected individual in a totally susceptible population. The value of this quantity changes during an outbreak due to the temporal variations in human behavior (caused, for example, by changes in individual awareness or social distancing policies) and in the infectiousness of the virus. The effective reproduction number, $\mathcal{R}_t$, aims at describing the ongoing transmission in a changing system.
2 Methods
2.1 The basic SIR model
The well-known SIR model is largely adopted for the theoretical analysis of epidemics, and lies at the core of several more complex epidemiological models for real applications. At a given time t [T], the individuals in a population of dimension N [-] are subdivided into compartments on the basis of their epidemiological status, in this case the susceptible (S), the infected (I), and the recovered (R) individuals. The number of individuals in the three compartments changes in time under the assumption that, in a well-mixed population, any susceptible individual can come into contact with any infected individual, thus possibly becoming infected itself. From a mathematical point of view, the strong form of the ordinary differential problem governing these dynamics can be stated as follows. Let $[t_0, t_f]$ be the time domain of interest, with t_0 and t_f [T] the initial and final times of the simulation, respectively. Given the continuous functions β(t) and δ(t), find S(t), I(t), and R(t) such that:
$$\frac{dS}{dt} = -\frac{\beta(t)}{N} S(t) I(t), \qquad \frac{dI}{dt} = \frac{\beta(t)}{N} S(t) I(t) - \delta(t) I(t), \qquad \frac{dR}{dt} = \delta(t) I(t), \qquad t \in [t_0, t_f] \tag{1}$$
and satisfying the initial conditions:
$$S(t_0) = N - I_0, \qquad I(t_0) = I_0, \qquad R(t_0) = 0 \tag{2}$$
In Eqs (1) and (2), β [T^-1] is the transmission rate controlling the average rate of the infection, and δ [T^-1] is the mean rate of removal of the infected individuals that become recovered. Another relevant quantity used to set up the model is D = δ^-1, i.e., the mean reproduction period [T] representing the average time spent by an individual in compartment I. Initial conditions for the spreading of a new disease assume that the population at the initial time is completely susceptible besides a small number I_0 of infected individuals (typically 1, but not necessarily). The basic reproduction number $\mathcal{R}_0$ [-] associated with this model reads $\mathcal{R}_0 = \beta / \delta$ and provides an estimate of the number of secondary infections generated by one infectious individual in a susceptible population, i.e., at the beginning of the epidemic. The threshold $\mathcal{R}_0 > 1$ indicates the occurrence of an outbreak, while $\mathcal{R}_0 < 1$ indicates that the number of infected individuals is rapidly decreasing.
Note that, in a real population, the number of individuals in each compartment is a discrete variable, whose dynamic can be described by stochastic approaches, e.g., the Gillespie method or discrete Markov chains. Hence, the continuous deterministic model in Eqs (1) and (2) is a valuable representation of the mean process in large populations. Standard numerical ODE solvers, such as Runge-Kutta-based methods, can provide an accurate solution to the differential problem (1) and (2). For $\mathcal{R}_0 > 1$ and constant parameters, the solution depicts an initial exponential-like increase in the number of infections up to a peak, and then a fast decrease due to the depletion of susceptible individuals. However, it is clear that this dynamic does not correspond to what happens during an outbreak. The main challenge when using a model based on (1) to describe a real epidemic is that the transmission rate β and the mean reproduction period δ^-1 can change in time because of many factors: social behaviors (individual awareness, increase or decrease of gatherings, mobility, social distancing), non-pharmaceutical interventions (use of devices that reduce transmission, such as masks, and introduction of lockdowns), changes in the pathogen infectiousness due to new variants, and reduction of the susceptibility of the population due to vaccination campaigns. In this evolving scenario, the effective reproduction number $\mathcal{R}_t$ [-] is the critical quantity that controls the spreading of the disease. $\mathcal{R}_t$ is the equivalent of $\mathcal{R}_0$ in time, i.e., $\mathcal{R}_t(t) = \frac{\beta(t)}{\delta(t)} \frac{S(t)}{N}$, taking into account that the number of susceptible individuals decreases and the main parameters controlling the spreading of the disease generally change. An essential element for a reliable simulation is therefore the assessment of $\mathcal{R}_t$, hence β(t) and δ(t) along with the compartment S(t), from the available epidemiological data. In the following we will assume that δ is constant in time, an assumption made in many epidemiological applications (see, e.g., [2, 4, 5]).
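As a minimal sketch of the forward problem described above, the following pure-Python code integrates the standard SIR equations with a classical fourth-order Runge-Kutta scheme and evaluates the effective reproduction number along the trajectory. The function names, the step size, and the value β = 0.5 per day are illustrative assumptions, not taken from the paper; only N = 56 × 10^6, D = 5 days, and I_0 = 1 follow the setup stated in Section 2.5.

```python
def sir_rhs(state, beta, delta, n_pop):
    """Right-hand side of the SIR system: returns (dS/dt, dI/dt, dR/dt)."""
    s, i, _ = state
    new_infections = beta * s * i / n_pop
    return (-new_infections, new_infections - delta * i, delta * i)


def rk4_step(state, dt, beta, delta, n_pop):
    """One classical Runge-Kutta step; beta is held fixed within the step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = sir_rhs(state, beta, delta, n_pop)
    k2 = sir_rhs(shifted(state, k1, dt / 2), beta, delta, n_pop)
    k3 = sir_rhs(shifted(state, k2, dt / 2), beta, delta, n_pop)
    k4 = sir_rhs(shifted(state, k3, dt), beta, delta, n_pop)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(beta_fn, delta, n_pop, i0, days, steps_per_day=10):
    """Return the daily states (S, I, R), starting from S = N - I0, R = 0."""
    state = (n_pop - i0, float(i0), 0.0)
    dt = 1.0 / steps_per_day
    daily = [state]
    for day in range(days):
        for k in range(steps_per_day):
            t = day + k * dt
            state = rk4_step(state, dt, beta_fn(t), delta, n_pop)
        daily.append(state)
    return daily


def effective_reproduction_number(beta, delta, s, n_pop):
    """R_t = (beta / delta) * S / N for the basic SIR model."""
    return beta / delta * s / n_pop


# Illustrative values: beta = 0.5 per day is a hypothetical choice.
N_POP, DELTA, BETA = 56e6, 1.0 / 5.0, 0.5
trajectory = simulate_sir(lambda t: BETA, DELTA, N_POP, i0=1, days=90)
rt_series = [effective_reproduction_number(BETA, DELTA, s, N_POP)
             for s, _, _ in trajectory]
```

With a constant β the effective reproduction number starts close to β/δ and decreases only because S is depleted; a time-dependent `beta_fn` can be passed in the same way to mimic interventions.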
2.2 PINN solution to the SIR model Here we develop and analyze a PINN-based approach to simultaneously solve the problem (1) and (2) and estimate the temporal values of the reproduction number by using a time series of infectious individuals as basic epidemiological information. A standard NN aims to reconstruct an unknown function u from the knowledge of some training data points. The NN approximating a generic u, denoted throughout this work by , is the recursive composition of the function: (3) where , , and ϕ(l) are weights, biases, and activation functions of the l-th layer, respectively. The last layer is the output layer, the others are the hidden layers. We denote with n l the number of neurons in layer l. Activation functions are user-specified functions with limited range, which are generally non linear in order to provide a source of non linearity to the NN and maintain low weight values. The Matlab-inspired notation ϕ.(x) means that the function ϕ is applied to each component of the vector x. Let L be the number of hidden layers. If is the solution of an ordinary differential equation in the domain , the input of the first layer reads , so n 0 = 1, and the output of the last layer is a scalar, so n L+1 = 1. Then, the NN for u formally reads: (4) The NN depends on the set of weights and biases, which are trained through an optimization algorithm so as to minimize an appropriate loss function defined as the mean squared error of over the set of training points. In the case of PINNs, the information from the governing equations of the physical system is introduced in a weak way in the loss function by adding the residual of the differential equations evaluated at some collocation points [31, 32]. For the SIR model (1) and (2), we assume that the training data points for the fitting are the reported infections. Let be the number of reported infected individuals at times , j = 1, …, N D . This might be subject to reporting errors, thus, in general . 
The residual of the governing equations is computed over N C collocation points. We aim at finding a NN representation for the susceptible, infected, and recovered individuals ( , , and , respectively) along with the transmission rate ( ). Since the state variables S, I, R span an extremely wide range of values (from zero to the population size N > 106), the functional search is optimized by a proper scaling: (5) where C [-] is an appropriate constant and t s is the dimensionless scaled temporal variable, t s = (t − t 0 )/(t f − t 0 ). The system of ODEs (1) for the scaled variables becomes: (6) where , C 1 = (t f − t 0 )C/N and C 2 = (t f − t 0 )δC. The initial conditions (2) are correspondingly scaled as well as the infectious data at times . The SIR model (6) does not consider death and birth processes and assumes a negligible mortality rate of the disease. Thus, the total population N is constant in time and equal to N = S + I + R. Under these hypotheses, the PINN model needs only two NNs representing the behavior of the population: one for the state variable of the susceptible individuals , and one for the infected individuals . The number of recovered individuals is computed as . A third NN is included for the estimation of the transmission rate . In this way, the number of parameters to be tuned during the training is consistently reduced. It is important to underline that the state variables represent the number of individuals in a compartment, thus they all have positive outputs. Training the model without imposing this condition could lead to nonphysical negative NN outputs. The non-negative constraint can be imposed in the NN in two alternative ways: inserting a penalty term for the negative values of the NNs (weak constraint) or building the NN architecture so as to allow for positive values only (hard constraint). 
The latter prescription can be met by setting the output activation function, i.e., the one related to the last layer, ϕ^(L+1), equal for example to the square function. An experimental comparison between the two approaches shows that the latter is generally more effective and provides more robust results. The numerical outcomes that follow are therefore obtained by using the hard constraint prescription for the non-negativity of the solution. The same constraint is adopted to entail a positive value for β_s. The selection of the loss function is one of the most sensitive steps in the PINN approach, given the multi-objective nature of the method. Using the Mean Squared Error (MSE) as loss measure, the objective is to minimize the mismatch on the N_D data: (7) the squared norm of the residual of Eq (6) evaluated on N_C collocation points: (8) and the misfit on the initial conditions: (9) where ω_* are proper weights needed to balance the relative importance of the entries arising from each contribution to the global MSE value. Fig 2 shows a diagram of the PINN implementation for the solution of the scaled SIR model (6).
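The hard non-negativity constraint mentioned above can be illustrated with a small, framework-free sketch: a fully connected network with scalar input and scalar output whose last layer uses the square function as activation, so that its output cannot be negative whatever the weights are. The tanh hidden activation, the layer sizes, and the random initialization are assumptions made for this example only; they are not the SciANN configuration used by the authors.

```python
import math
import random


def init_mlp(layer_sizes, seed=0):
    """Random weights and biases for a dense network, e.g. sizes [1, 8, 8, 1]."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
        params.append((weights, biases))
    return params


def square(z):
    """Output activation enforcing non-negative values (hard constraint)."""
    return z * z


def forward(params, t, output_activation=square):
    """Evaluate the network at scalar time t by composing its layers."""
    x = [t]
    last = len(params) - 1
    for idx, (weights, biases) in enumerate(params):
        # Affine map of the layer: W x + b
        z = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        # Bounded activation in hidden layers, square on the output layer
        activation = output_activation if idx == last else math.tanh
        x = [activation(v) for v in z]
    return x[0]


params = init_mlp([1, 8, 8, 1])
values = [forward(params, k / 20) for k in range(21)]  # scaled times in [0, 1]
```

The weak alternative would instead keep a linear output and add a penalty on negative values to the loss, which discourages but does not exclude negative outputs.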
Fig 2. Diagram of the PINN model for the SIR equations with unknown β(t). The parameters of the NNs for β, S, I are obtained by minimizing the loss functions on the infectious data, and on the residual and initial conditions of the model equations.
https://doi.org/10.1371/journal.pcbi.1012387.g002
We explore two possible approaches for the construction of the PINN model, indicated as joint or split. The joint approach aims to simultaneously calibrate the NNs for β_s, S_s, and I_s by minimizing the joint loss function corresponding to the sum of the data, residual, and initial-condition contributions of Eqs (7), (8), and (9): (10) By distinction, the split approach subdivides the overall problem. First, the NN for I_s is independently calibrated on the data error (7) only. In this case, a standard NN is used with weight ω_D = 1, thus obtaining a differentiable regression function for the data. The only-data regression is followed by a fully-physics-informed regression, where the parameters defining the NNs for S_s and β_s are trained by minimizing: (11) It is important to underline that in standard data-driven NNs a regularization term is frequently added to the loss function to avoid overfitting on the data. The term related to the residual in the loss function (Eq (8) in our case) acts as a regularization in PINNs, therefore no additional regularization has been added (see [20] for more details).
2.3 Reduced SIR model The system of ODEs in (1) can be further reduced by directly considering the definition of the effective reproduction number . By easy developments, the model (1) becomes: (12) where the unknown functions are I(t) and S(t), and the state variable R(t) is simply obtained from the consistency relationship R(t) = N − S(t) − I(t). The initial conditions (2) still hold. The new system (12) can be solved sequentially by integrating the upper equation first and then computing S(t) from the second equation. This approach reduces the number of functions that are approximated by NNs to two, i.e., I and , and eliminates any redundant term in the loss function minimized in the PINN approach. The same scaling as in Eq (5) is used for the state variable I, so that the upper equation in (12) reads: (13) The NNs approximating the variables of interest, i.e., and , can be obtained by minimizing the mismatch on data (Eq (7)) and the squared norm of the residual of Eq (13) on N C collocation points: (14) Notice that in this case the contributions in and have a consistent size, hence there is no need for introducing the weight parameters ω * to balance the loss function terms. For this reason, we simply set ω D = 1 in the expression (7). The joint and split approaches can be formulated for this PINN-based model as well. The joint approach consists in training simultaneously the NNs and by minimizing the total loss function: (15) By distinction, the split approach implies training on the data only by the minimization of in Eq (7). Then, the time-dependent parameter is obtained by minimizing: (16) Notice that in the reduced PINN model no initial condition is set, but we let the model deduce it from the data. From a theoretical viewpoint, initial conditions are not necessary because is obtained from the data, while the governing differential Eq (13) is used to calibrate . 
This outcome is relevant because it replicates what typically happens in a real-case scenario, where there is no actual knowledge about the instant of beginning of the outbreak. In fact, the case 0 in most outbreaks is unknown and the conventional start of the epidemic has a number of infected individuals that is usually largely underestimated. The use of the reduced modeling approach makes it possible to remove the term related to the initial condition from the loss function.
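The reduced formulation can be made concrete with a short worked example. Assuming the standard SIR form dI/dt = (β S/N − δ) I and the definition of the effective reproduction number as β S/(δ N), substitution gives dI/dt = δ (R_t − 1) I and dS/dt = −δ R_t I, which is consistent with the description of system (12) above: once I(t) is known from the data, the first equation alone determines R_t, with no initial condition involved. The sketch below replaces the differentiable NN regression of the split approach with simple finite differences of log I, so it is a stand-in for illustration and not the PINN procedure itself; the function name and the test values are hypothetical.

```python
import math


def rt_from_infected(infected, delta, dt=1.0):
    """Estimate R_t from a positive time series of infected individuals.

    Uses dI/dt = delta * (R_t - 1) * I, i.e. R_t = 1 + (d ln I / dt) / delta,
    with central differences inside the series and one-sided ones at the ends.
    """
    if len(infected) < 2:
        raise ValueError("at least two samples are required")
    if any(v <= 0 for v in infected):
        raise ValueError("infected values must be strictly positive")
    log_i = [math.log(v) for v in infected]
    n = len(log_i)
    estimates = []
    for j in range(n):
        lo, hi = max(j - 1, 0), min(j + 1, n - 1)
        slope = (log_i[hi] - log_i[lo]) / ((hi - lo) * dt)
        estimates.append(1.0 + slope / delta)
    return estimates


# Exponential growth at rate 0.3 per day with delta = 0.2 per day
# corresponds to a constant R_t of 1 + 0.3 / 0.2.
series = [math.exp(0.3 * day) for day in range(10)]
rt = rt_from_infected(series, delta=0.2)
```

Finite differences amplify reporting noise, which is one reason a smooth regression of the data, as in the first step of the split approach, is preferable before evaluating the residual of the governing equation.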
2.4 SIR model with the hospitalization compartment The reported infections can be often affected by large uncertainties. Especially at the beginning of an epidemic outbreak, the disease cannot be easily recognized, either because of the difficulty of correctly identifying the symptoms, or the absence of well-established detection and surveillance procedures, or the impossibility of reaching and testing all the people infected by the disease. Moreover, these data can be strongly affected by territorial peculiarities and the logistic of testing facilities. Hence, founding an epidemiological model on these pieces of information can undermine its reliability. A much less uncertain epidemiological datum is the daily number of individuals that require to be hospitalized. This fraction of the overall number of infected individuals is representative of the entire I compartment by assuming that hospitalization is needed over a certain common threshold level of symptoms in the population. We introduce a new variable, H, defined as: (17) where σ represents the fraction of infected individuals moving to the hospitalized compartment. Note that also parameter σ might change in time, for example because of the insurgence of more aggressive variants or the improvement of home treatment. A more convenient formulation uses the cumulative number Σ H of hospitalized individuals: (18) The new formulation of the updated SIR model can be therefore stated as follows. Given , , and , find , , and such that: (19) with R(t) = N − I(t) − S(t), the initial conditions (2) and Σ H (t 0 ) = 0. The available information from the actual epidemiological data is the daily variation Δ H of the cumulative number of hospitalized individuals: (20) whose values represents the training dataset for the PINN approximation of system (19). 
As previously done, the functional search of the approximating NNs is carried out on the properly scaled quantities I(t) = CI s (t s ) (see Eq (5)) and: (21) with C H the scaling factor. The upper equation in system (19) with the scaled quantities reads: (22) with , while the second scaled equation is the same as in (13). Hence, the NNs needed to solve the SIR model with hospitalization data are , , , and . The training data points for the fitting are both the scaled reported infections and the hospitalizations at the scaled times , j = 1, …, N D . The NNs can be obtained by minimizing the mismatch (7) on the infection data and on the hospitalization data: (23) and the squared norm of the residuals of Eqs (13) and (22) on N C collocation points: (24) The joint approach consists in the simultaneous estimate of , , , and by finding the minimum to the functional: (25) As for the PINN solution to the reduced SIR model, it is not necessary to include the mismatch on the initial conditions into the global loss function (25) because they are met through the available training data. Moreover, also the use of non-unitary weights ω * for the different contributions to is not required since all terms are likely to have a similar magnitude. In the split approach, is directly trained with the hospitalization data only by minimizing in Eq (23). Then, is computed from (22) as: (26) and and are trained by minimizing: (27) In real-world scenarios, the new daily infections is a more common piece of information than the total number of infected individuals. 
In order to include these data in the PINN model, we introduce the cumulative number Σ I of infected individuals: (28) The variation of Σ I in time coincides with negative variation of the class of susceptible individuals S(t), so we can simply update the SIR model with hospitalization data (19) by replacing the last equation with: (29) Since the available information is the daily variation Δ I of the cumulative number of infected individuals: (30) we use these values as training data set. As usual, scaled values are considered such as Δ I = CΔ I,s and we assume that the set of scaled values is available at the training scaled times , j = 1, …, N D , instead of . The mismatch of , i.e., the NN approximating Δ I,s , with the data is measured by: (31) while the squared norm of the residual reads: (32) Hence, with the joint approach we aim at minimizing the functional: (33) By distinction, with the split approach we first train by the available data (see Eq (23)). Then, we use Eq (26) for and: (34) for , and minimize the functional: (35) This choice for the split approach is based on the fact that hospitalization data are usually more reliable than infected individuals, hence they are more appropriate for an only-data regression training.
2.5 Simulation setup
The PINN-based approaches are here implemented by making use of the SciANN software library [33], a Keras and TensorFlow wrapper specifically developed for physics-informed deep learning. We analyze the performance of the PINN-based approaches to estimate the state variables and identify the governing parameters of an epidemiological model mimicking the setup of the first 90 days of a COVID-like disease outbreak in Italy. The total population is set to N = 56 × 10^6 and the mean infectious period to D = 5 days, which is an estimate used for COVID-19 [4]. The initial value of infectious individuals I_0 is set to 1. The accuracy of the trained NNs is evaluated by the 2-norm of the error with respect to the 2-norm of the reference solution:
$$\varepsilon_y = \frac{\| \hat{y} - y \|_2}{\| y \|_2} \tag{36}$$
where y can be either one of the state variables or a time-dependent parameter, and $\hat{y}$ is its NN approximation. The relative error (36) is numerically computed by using 90 points equally spaced in the domain. We consider a number of scenarios, summarized in Table 1, differing for the reference SIR model and state variables of interest, the selection of the estimated governing parameters, and the available training data. The first five scenarios are used to validate the numerical model, while the last two consist of a real application to the Italian COVID-19 epidemic.
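The accuracy measure described above, the 2-norm of the error divided by the 2-norm of the reference solution, can be computed on sampled values as in the following sketch. The function name is hypothetical, and the inputs are assumed to be the NN output and the reference solution evaluated at the same equally spaced points.

```python
import math


def relative_l2_error(approx, reference):
    """Return ||approx - reference||_2 / ||reference||_2 for two sequences."""
    if len(approx) != len(reference):
        raise ValueError("sequences must have the same length")
    error_norm = math.sqrt(sum((a - r) ** 2 for a, r in zip(approx, reference)))
    reference_norm = math.sqrt(sum(r * r for r in reference))
    if reference_norm == 0.0:
        raise ValueError("reference solution has zero norm")
    return error_norm / reference_norm
```

Because the measure is relative, it can be compared across quantities with very different magnitudes, such as the number of infected individuals and the transmission rate.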
Table 1. Scenarios adopted to analyze the proposed PINN-based approaches.
https://doi.org/10.1371/journal.pcbi.1012387.t001
For the estimation of the transmission rate β(t) in the basic SIR model (1) and (2), we consider three different scenarios:
Case 1: constant β. We use this scenario to compare the efficiency of the joint and split approaches (10) and (11), respectively.
Case 2: synthetic time-dependent β(t), where the reference values are provided as an analytical function.
Case 3: the reference β(t) is obtained from the estimates of
In each case, the training data are the daily numbers of infectious individuals. These are generated synthetically by numerically integrating system (1) with the selected reference function for β(t). In particular, we use N D = 90 training data points (one value per day). To account for possible reporting errors, the data at each time are obtained by sampling from a Poisson distribution whose mean is the reference solution at that time. This kind of Poisson error is frequently assumed for data arising from a counting process. The infectious data and the epidemiological model are scaled by a factor C = 10^5 (see Eq (5)).
The reduced SIR model (12) is then used to explore a more realistic scenario with strongly perturbed data on the infected individuals and accurate data on the number of hospitalizations. The joint and split approaches (Eqs (15) and (16)) are used to estimate the governing parameter in the following inverse problems:
Case 4: synthetic time-dependent β(t) (as in Case 2), subject to larger noise on the infectious data.
Case 5: an adaptation of Case 4 that also considers the hospitalization data and a time-dependent hospitalization fraction σ (to be estimated).
In Case 5, the synthetic daily hospitalization data are obtained by sampling from a Poisson distribution whose mean is the reference solution. The scaled values are obtained by setting C H = 10^3.
Finally, we apply the PINN approaches to the infected and hospitalized data reported in Italy during the first months of the COVID-19 pandemic:
Case 6: infers a time-dependent
Case 7: simultaneously infers
In these scenarios we consider the epidemiological data provided by the Italian surveillance system [34] from February 21st, 2020 to May 20th, 2020. This period coincides with the emergence of the disease and its initial spread. The vaccination campaign had not yet started and possible reinfections are negligible. The Italian dataset contains the numbers of new daily hospitalizations and reported infections, and supplies an estimate of the COVID-19 reproduction number based on [10]. The scaled values are obtained by setting C and C H equal to the maximum values observed for Δ I and Δ H , respectively, in the 90 days under consideration. New infections are multiplied by a reporting ratio α r = 6, following the estimate from the Italian National Institute of Statistics based on serological data [35].
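The synthetic data generation used in the validation cases (integrate the SIR system, perturb the daily infectious counts with Poisson noise, then scale by C) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the standard SIR right-hand side, the hand-written RK4 integrator, the function names and the constant value β = 0.3 are all hypothetical choices, while N, D, I 0, the 90 daily points and C = 10^5 follow the setup described above.

```python
import numpy as np

def sir_rhs(state, beta, N, D):
    """Right-hand side of a standard SIR model (assumed form of system (1))."""
    S, I, R = state
    new_infections = beta * S * I / N
    return np.array([-new_infections, new_infections - I / D, I / D])

def integrate_sir(beta_fn, N=56e6, D=5.0, I0=1.0, days=90, substeps=20):
    """Return I(t_j) at the end of each day j = 1..days, using classical RK4."""
    state = np.array([N - I0, I0, 0.0])
    h = 1.0 / substeps
    daily_I = np.empty(days)
    t = 0.0
    for j in range(days):
        for _ in range(substeps):
            k1 = sir_rhs(state, beta_fn(t), N, D)
            k2 = sir_rhs(state + 0.5 * h * k1, beta_fn(t + 0.5 * h), N, D)
            k3 = sir_rhs(state + 0.5 * h * k2, beta_fn(t + 0.5 * h), N, D)
            k4 = sir_rhs(state + h * k3, beta_fn(t + h), N, D)
            state = state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
            t += h
        daily_I[j] = state[1]
    return daily_I

def beta_ref(t):
    """Hypothetical constant transmission rate (a Case 1-like scenario)."""
    return 0.3

rng = np.random.default_rng(0)            # fixed seed for reproducibility
I_ref = integrate_sir(beta_ref)           # reference infectious curve, 90 daily values
I_noisy = rng.poisson(np.maximum(I_ref, 0.0))  # Poisson sample, mean = reference value
C = 1e5                                   # scaling factor from the setup above
I_scaled = I_noisy / C                    # scaled training data
```

A time-dependent scenario only requires swapping `beta_ref` for another function of `t`. The real-data cases would skip the integration and sampling steps and apply the scaling directly to the reported counts.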
[END]
---
[1] Url:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012387
Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.