(C) PLOS One
This story was originally published by PLOS One and is unaltered.
. . . . . . . . . .



How fast are viruses spreading in the wild? [1]

['Simon Dellicour', 'Spatial Epidemiology Lab', 'Spell', 'Université Libre De Bruxelles', 'Brussels', 'Department Of Microbiology', 'Immunology', 'Transplantation', 'Rega Institute', 'Ku Leuven']

Date: 2024-12

Genomic data collected from viral outbreaks can be exploited to reconstruct the dispersal history of viral lineages in a two-dimensional space using continuous phylogeographic inference. These spatially explicit reconstructions can subsequently be used to estimate dispersal metrics that can be informative of the dispersal dynamics and the capacity to spread among hosts. Heterogeneous sampling efforts of genomic sequences can however impact the accuracy of phylogeographic dispersal metrics. While the impact of spatial sampling bias on the outcomes of continuous phylogeographic inference has previously been explored, the impact of sampling intensity (i.e., sampling size) when aiming to characterise dispersal patterns through continuous phylogeographic reconstructions has not yet been thoroughly evaluated. In our study, we use simulations to evaluate the robustness of 3 dispersal metrics — a lineage dispersal velocity, a diffusion coefficient, and an isolation-by-distance (IBD) signal metric — to the sampling intensity. Our results reveal that both the diffusion coefficient and IBD signal metrics appear to be the most robust to the number of samples considered for the phylogeographic reconstruction. We then use these 2 dispersal metrics to compare the dispersal pattern and capacity of various viruses spreading in animal populations. Our comparative analysis reveals a broad range of IBD patterns and diffusion coefficients mostly reflecting the dispersal capacity of the main infected host species but also, in some cases, the likely signature of rapid and/or long-distance dispersal events driven by human-mediated movements through animal trade. Overall, our study provides key recommendations for the use of lineage dispersal metrics to consider in future studies and illustrates their application to compare the spread of viruses in various settings.

Funding: SD and PL acknowledge support from the European Union Horizon 2020 project MOOD (grant agreement n°874850). SD also acknowledges support from the Fonds National de la Recherche Scientifique (F.R.S.-FNRS, Belgium; grant n°F.4515.22), from the Research Foundation — Flanders (Fonds voor Wetenschappelijk Onderzoek — Vlaanderen, FWO, Belgium; grant n°G098321N), and from the European Union Horizon 2020 project LEAPS (grant agreement n°101094685). PR’s internship at the University of Montpellier was founded by the I-SITE MUSE through the Key Initiative “Data and Life Sciences”. MAS and PL acknowledge support from the European Union's Horizon 2020 research and innovation programme (grant agreement no. 725422-ReservoirDOCS), from the Wellcome Trust through project 206298/Z/17/Z, and from the National Institutes of Health grants R01 AI153044, R01 AI162611 and U19 AI135995. PL also acknowledges support from the Research Foundation — Flanders (Fonds voor Wetenschappelijk Onderzoek — Vlaanderen, FWO, Belgium; grants n°G0D5117N and G051322N). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability: R scripts related to the analyses based on simulated and real datasets are all available, along with the associated input/output files, at https://github.com/sdellicour/dispersal_capacities . Continuous phylogeographic simulations and dispersal statistics were respectively conducted and computed using the R package “seraphim” available at https://github.com/sdellicour/seraphim (see also the updated “seraphim” tutorial on the estimation of dispersal statistics available at: https://github.com/sdellicour/seraphim/blob/master/tutorials/ ).

At relatively large scales, dispersal dynamics of viruses primarily spreading within human populations are generally better captured by discrete diffusion processes involving the rapid exchange of viral lineages among remote locations — a typical pattern for remote locations mainly connected by the air traffic network (see e.g., [ 16 , 17 ]). This is for instance the case for seasonal human influenza virus H3N2, for which dispersal at the global scale has been demonstrated to be better predicted by international air travel than geographic distances [ 18 ]. While continuous phylogeographic analyses can also be performed to reconstruct the dispersal history of viruses spreading in human populations (see e.g., [ 19 – 21 ]), our study focuses on viruses circulating in animal populations to target dispersal processes that are relatively strongly dependent on geographic distance. Such dispersal patterns can in essence be modelled through a continuous diffusion process, such as the relaxed random walk (RRW) diffusion model implemented in the software package BEAST 1.10 [ 22 ], which is often used for continuous phylogeographic inference.

In the context of these heterogeneous sampling intensities, the goal of the present study is to assess the usefulness of dispersal metrics to evaluate and compare the dispersal dynamic of viral lineages. Specifically, we first aim to assess the robustness of 3 metrics — a lineage dispersal velocity, a diffusion coefficient, and an isolation-by-distance (IBD) signal metric — to the sampling intensity, i.e., the number of genomic samples considered in the phylogeographic analysis. In addition, we aim to exploit the metrics identified as robust to the sampling intensity (i.e., sampling size) to compare the dispersal capacity across a variety of viruses of public and one health importance (e.g., Lassa, rabies, avian influenza, or West Nile viruses).

In particular, continuous phylogeographic inference allows estimation of a spatially explicit reconstruction of the dispersal history of the lineages leading to the sampled genomic sequences. The reconstructions in two-dimensional space can subsequently be exploited to gain valuable information about the dispersal dynamics of circulating lineages [ 15 ], including their capacity to disperse in space and time. However, for a given outbreak, phylogeographic reconstructions are generally conducted on genomic sequences from a limited proportion of cases, with sampling intensities that can vary considerably among studies.

Unravelling the spatiotemporal dynamics of pathogenic spread constitutes a long-standing challenge in epidemiology. When designing and implementing intervention strategies to mitigate outbreaks or endemic circulation of pathogens, evaluating the speed at which they spread and circulate within host populations can be crucial. While geo-localised infectious cases can be used to model and quantify the wavefront progression of an outbreak during its expansion phase (see e.g., [ 1 , 2 ]), their analysis usually provides little information about the routes taken by the underlying transmission chains. Genomic sequencing of fast-evolving pathogens, however, offers the possibility to infer evolutionary relationships among sampled cases. Estimated through a phylogenetic tree, such inferred evolutionary links can for instance provide key insights into the dispersal history and dynamics of a transmission chain [ 3 ]. This can be achieved through phylogeographic inference, either performed in discrete [ 4 – 6 ] or continuous space [ 7 , 8 ], which basically consists in mapping time-scaled phylogenetic trees of fast-evolving organisms such as RNA viruses (see e.g., [ 9 , 10 ]) or, sometimes, DNA viruses (see e.g., [ 11 , 12 ]) or bacteria (see e.g., [ 13 , 14 ]).

Results and discussion

We here assess 3 distinct dispersal metrics: the weighted lineage dispersal velocity (WLDV) [23], the weighted diffusion coefficient (WDC) [24], and an IBD signal metric that we here propose to estimate as the Pearson correlation coefficient between the patristic and log-transformed great-circle geographic distances computed for each pair of tip nodes (which is similar to the metric estimated by Pekar and colleagues [25]). The great-circle distance is the geographic distance between points on the surface of the Earth and the patristic distance is the sum of the branch lengths that link 2 tip nodes in a phylogenetic tree. While the WLDV aims at measuring the distance covered per unit of time across phylogenetic history, the WDC measures the “diffusivity” [24], i.e., the velocity at which inferred lineages invade a two-dimensional space: where d i and t i are, respectively, the great-circle distance and the time elapsed on the i-th branch of the phylogeny. The concept of IBD was first introduced by Sewall Wright [26] to describe the accumulation of local genetic differences under geographically restricted dispersal [27,28]. IBD analyses generally consist in determining the strength and significance of the relationship between a genetic and geographic distance between individuals [29]. The IBD signal metric considered here aims to measure to what extent phylogenetic branches are spatially structured or the tendency of phylogenetically closely related tip nodes to be sampled from geographically close locations. Note that this third metric is therefore not based on a continuous phylogeographic reconstruction but directly on a time-scaled phylogeny and sampling locations.

To evaluate the robustness of these 3 dispersal metrics to the sampling intensity (i.e., sampling size), we have conducted phylogeographic simulations either based on a Brownian random walk (BRW) or a RRW diffusion process. For this purpose, we have first implemented 2 distinct forward-in-time simulators, both jointly simulating time-scaled phylogenies and the dispersal history of their branches on an underlying geo-referenced grid. At each time step of the BRW simulation, both the longitudinal and latitudinal displacements of evolving lineages are randomly drawn from a Gaussian distribution, while in the RRW simulations such longitudinal and latitudinal displacements are randomly drawn from a Cauchy distribution before randomly rotating the resulting displacement around its origin (see the Materials and methods section for further detail). An example of a geo-referenced phylogeny simulated through a Brownian diffusion process is displayed in Fig 1 (see Figure A in S1 Text for the example of a RRW simulation). To investigate the potential impact of the phylogenetic tree inference on the different dispersal statistic estimates, we have also conducted alternative phylogeographic simulations based on tree topologies simulated under a coalescent model considering a population with a constant size (Figure B in S1 Text). The aim of all these simulations is to obtain continuous phylogeographic reconstructions that can subsequently be subsampled to artificially generate data sets that would have been obtained through various levels of sampling intensity.

PPT PowerPoint slide

PNG larger image

TIFF original image Download: Fig 1. Example of a continuous phylogeographic simulation based on a BRW diffusion process. Both graphs display the phylogenetic tree sampled during a unique simulation, with its time-scaled visualisation in the left panel and its mapped visualisation in the right panel. Tree nodes are coloured according to time, with internal and tip nodes coloured according to their time of occurrence and collection time, respectively. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.13984927. BRW, Brownian random walk. https://doi.org/10.1371/journal.pbio.3002914.g001

For each diffusion model considered, we have simulated 50 continuous phylogeographic processes of more than 500 tips and then subsampled the resulting locations to only keep 500, 450, 400, 350, 300, 250, 200, 150, 100, and 50 tip locations (see the Materials and methods section for further detail). We subsequently computed the 3 dispersal metrics on all the resulting data sets to explore the impact of the sampling size on their estimates (Fig 2). Although associated with slightly more variability when estimated on the smaller data sets, both the diffusion coefficient and IBD signal metrics appear to be robust to the sampling intensity (Fig 2). This is however not the case for the lineage dispersal velocity metric for which estimates drastically decrease with the number of tips (Fig 2). While Fig 2 reports the results obtained with the continuous phylogeographic simulations based on a Brownian diffusion process, we reach the same conclusions based on the results obtained by the simulations following an RRW diffusion process (Figure C in S1 Text) or when tree topologies are simulated under a coalescent instead of a birth-death model (Figure D in S1 Text).

PPT PowerPoint slide

PNG larger image

TIFF original image Download: Fig 2. Robustness of lineage dispersal metrics to the sampling intensity. We here report 3 dispersal statistics estimated on 50 geo-referenced phylogenetic trees simulated under a Brownian diffusion process: the WLDV (km/year), the WDC (km2/year), and the IBD signal has been estimated by the Pearson correlation coefficient (r P ) between the patristic and log-transformed great-circle geographic distances computed for each pair of tip nodes. Each specific tree is represented by a specific grey curve obtained when re-estimating the dispersal metric on subsampled versions of the tree, i.e., subsampled trees obtained when only randomly keeping 500, 450, 400, 350, 300, 250, 200, 150, 100, and 50 tip nodes; and the red curve indicate the median value across all simulated trees. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.13984927. IBD, isolation-by-distance; WDC, weighted diffusion coefficient; WLDV, weighted lineage dispersal velocity. https://doi.org/10.1371/journal.pbio.3002914.g002

In addition to the 3 main metrics investigated above, we have also used our simulations to assess the robustness of alternative metrics, such as different ways to define the correlation coefficient measuring the IBD signal (Figure E in S1 Text) or unweighted versions of the lineage dispersal velocity and diffusion coefficient (Figure F in S1 Text). Overall, we reach the same conclusion: while the lineage dispersal velocity estimates are impacted by the sampling intensity, this does not seem to be the case for the diffusion coefficient estimates and all investigated IBD signal metrics.

From a mechanistic perspective, the lack of robustness of the lineage dispersal velocity metric is expected from a Brownian motion process, which has paths with infinite variation on any interval (see e.g., Corollary 2.17 in [30]). This implies that, the more densely sampled the trajectory is, the larger the WLDV becomes, making the estimation of an average speed sampling inconsistent. In contrast, the Brownian motion has finite quadratic variation (see e.g., Proposition 2.16 in [30]), so that the WDC, which uses quadratic distances instead of linear ones, should not be sensitive to the number of samples. To illustrate these theoretical expectations, we consider the case of a strict one-dimensional Brownian diffusion process of position X along a time interval equal to 1, with N being the total number of regular time intervals dt = 1/N, d i the distance travelled during time interval i, X i the position at the end of interval i, and X pa(i) the position at the end of the previous time interval. Under a strict Brownian process, we have that X i −X pa(i) is a Gaussian random variable, so that the distance covered on one interval is distributed as a half normal variable: which allows to determine the expected value for the WLDV as follows:

This indicates that lineage dispersal velocity depends, on average, on the number of time intervals N and thus the duration of these time intervals: the shorter these intervals, the larger the estimate. Because the phylogenetic branch lengths (durations) will on average decrease with the number of tip nodes, this implies, by extension, that the lineage dispersal velocity depends on the number of tip nodes included in the tree, as also illustrated by our simulations (Fig 2). Further, the quadratic distance is in this special case a scaled χ2 random variable with one degree of freedom, so that:

Therefore, WDC is proportional to the squared net lineage displacement per unit of time (σ2) and does not depend on N, so that it can be used to compare the dispersal capacities of different pathogens. Note that this relationship between the WDC and σ2 is only valid in this toy model of a one-dimensional Brownian diffusion where the process is measured at all time steps dt. In a phylogenetic context, the geographic position of internal nodes of the tree is unknown, and the WDC is typically computed with inferred ancestral positions, so that the relationship between the WDC and σ2 is not that simple in general and depends on the ancestral reconstruction method being used. The WDC metric should hence not be used as a replacement for classical estimators of the diffusion coefficient of a Brownian model, either based on the (restricted) maximum likelihood [31] or, in the context of this study, Bayesian inference [7].

In addition to the simulations conducted with a birth-death or coalescent approach, we have also conducted continuous phylogeographic simulations by simulating RRW diffusion processes along the branches of an empirical phylogenetic tree, i.e. the maximum clade credibility (MCC) tree previously obtained from the phylogeographic analysis of the West Nile virus in North America [32] (Figure G in S1 Text; see also the Materials and methods section for further detail). The specific aim of these additional simulations is to assess the robustness of the dispersal metrics to the sampling intensity, as well as to a scenario of sampling bias, but this time also considering the uncertainty associated with phylogenetic inference. As illustrated in Figure H in S1 Text, our results indicate that when explicitly considering the uncertainty associated with the inference of the time-scaled tree topologies, the impact of various scenarios of sampling intensity on the dispersal metrics is less evident. Indeed, the posterior distribution of the dispersal metric estimates appears to be often quite large, sometimes preventing a clear distinction of any impact of sampling intensity on these estimates. Furthermore, in order to take into account the uncertainty related to phylogenetic inference, this additional simulation framework required performing a continuous phylogeographic inference based on each subset of geo-referenced tips sampled from the original simulation for generating various scenarios of sampling intensity (see the Materials and methods section for further detail). Because of this additional inference step, it seems that the random selection of sequences has often more impact on the dispersal metric estimates than the number of samples that were subsampled, especially when the sampling size decreases (Figure H in S1 Text). The later result thus highlights the stochastic impacts of the sampling itself, independently of the impact of the sampling size. Finally, we also note that use of an RRW instead of a BRW diffusion model to conduct these continuous phylogeographic simulations could also contribute to the relatively large variability observed among simulations; a phenomenon that was already observed when comparing the results obtained when considering either a BRW or an RRW model to conduct the simulations under a birth-death process (Fig 1 and Figure C in S1 Text).

As introduced above, we have also used these simulations of RRW diffusion processes along empirical tree branches to estimate the dispersal metrics from simulations based on a scenario of sampling bias where only a restricted area of the spread has been sampled (see the Materials and methods section for further detail). Such a sampling bias scenario has previously been shown to strongly affect continuous phylogeographic reconstructions [33]. Again, our results based on this simulation scenario and explicitly considering phylogenetic uncertainty do not indicate a clear impact of sampling bias on the estimates (Figure H in S1 Text). So with this simulation framework, we do not manage to distinguish a clear difference between the dispersal metrics estimated for the scenarios of various sampling intensity and sampling bias. The impact of such spatial sampling bias on the outcomes of continuous phylogeographic inference has been more thoroughly investigated in previous studies [8,33]. For example, Kalkaukas and colleagues have explored the impact of 3 sampling bias scenarios on the estimation of the squared net lineage displacement per unit of time (σ2, referred to as the “diffusion parameter” in their work [33]), which is directly related to a metric like WDC investigated here (see above). Their analyses reveal that all considered scenarios of sampling bias — a central sampling bias, a diagonal sampling bias, and a one-side sampling bias scenario similar to the one explored in this study — lead to an underestimation of the diffusion parameter. Further confirmed by additional investigations of sampling bias scenarios by Guindon and De Maio [8], these results thus indicate that sampling bias can lead to an underestimation of the dispersal capacity as quantified by a metric like WDC.

In the second part of our study, we use the 2 dispersal metrics identified above as the most robust to the sampling intensity — the WDC and IBD signal — to quantify and compare the dispersal capacity of viruses spreading in animal populations. We here target viruses of public and one health importance and for which comprehensive genomic data sets have at some point been exploited to conduct continuous phylogeographic reconstructions that we have managed to retrieve. The selected data sets encompass viruses associated with a broad range of host species (Table A in S1 Text): moles (Nova virus), rodents (Puumala, Lassa, and Tula viruses), deer (Powassan virus), poultry and waterfowl (avian influenza virus (AIV)), domestic pigs (Getah virus and porcine deltacoronavirus), cattle (lumpy skin disease virus), (migratory) bird species (West Nile virus), as well as raccoons, skunks, bats, and dogs (rabies virus). By definition, the transmission cycles of the 3 considered arboviruses involve transmission events through biting by infected arthropods such as ticks of the Ixodes and Dermacentor genera for the Powassan virus, as well as mosquito species of the Culex genus for the West Nile virus and also Aedes genus in the case of the Getah virus.

Our results highlight substantial differences among the data sets, both in terms of diffusion coefficient and IBD signal (Fig 3). By far, the 2 virus data sets associated with the lowest diffusion coefficient are the Nova virus and Puumala virus genomic sequences collected in Belgian mole (Talpa europaea) and bank vole (Myodes glareolus) populations, respectively [49,50]. The very limited diffusion coefficient estimated for these 2 hantaviruses is coherent with the limited dispersal capacity of their host species. The Lassa virus data set [48] is next in the ranking with a WDC estimated at approximately 40 km2/year (Fig 3 and Table A in S1 Text). Lassa virus is an arenavirus that also circulates in a rodent species: the Natal multimammate mouse (Mastomys natalensis). Interestingly, these 3 first data sets all reveal a relatively high IBD signal reflecting an important phylogeographic structure (Fig 3 and Table A in S1 Text).

PPT PowerPoint slide

PNG larger image

TIFF original image Download: Fig 3. Comparison of dispersal metrics estimated for different genomic data sets of viruses spreading in animal populations. Specifically, we here report posterior estimates obtained for 2 metrics estimated from trees sampled from the posterior distribution of a Bayesian continuous phylogeographic inference: the WDC and the IBD signal estimated by the Pearson correlation coefficient (r P ) between the patristic and log-transformed great-circle geographic distances computed for each pair of virus samples. We report the posterior distribution of both metrics estimated through continuous phylogeographic inference for the following data sets: West Nile virus in North America [32], Lumpy skin disease virus [11], Porcine deltacoronavirus in China [34], Getah virus in China [35], AIV in the Mekong region [36] and H3N1 in Belgium [37], rabies virus (dogs) in Iran [38], rabies virus (bats) in Peru [39], rabies virus (dogs) in northern Africa [40,41], rabies virus (bats) in Argentina [42], rabies virus (skunks) in the USA [43], rabies virus (raccoons) in the USA [44], Tula virus in central Europe [45], rabies virus (bats) in eastern Brazil [46], Powassan virus in the USA [47], Lassa virus in Africa [48], Puumala virus in Belgium [49], and Nova virus in Belgium [50]. (*) Estimates based on the analysis of the wild-type strains (see [11] for further detail); (**) estimates based on the combined analysis of lineages L1 and L3. See also Table A in S1 Text for the related 95% HPD intervals and number of samples associated with each data set. The data underlying this figure can be found in https://doi.org/10.5281/zenodo.13984927. AIV, avian influenza virus; HPD, highest posterior density; IBD, isolation-by-distance; WDC, weighted diffusion coefficient. https://doi.org/10.1371/journal.pbio.3002914.g003

A large fraction of genomic data sets result in diffusion coefficient estimates between 100 and 2,000 km2/year (Table A in S1 Text). They range from the Powassan virus data set [47] from the United States (WDC = ~130 km2/year) to the avian influenza virus (AIV) H3N1 data set [37] corresponding to a specific Belgian outbreak (~1,800 km2/year). In between these 2 diffusion coefficient estimates, we find the diffusion coefficients estimated from a series rabies virus data sets (Fig 3 and Table A in S1 Text): ~250 km2/year for the bat rabies virus data set from eastern Brazil [46], almost ~600 km2/year for the raccoon [44] and skunk [43] rabies virus data sets from the USA, ~700 km2/year for the bat rabies virus data set in Argentina [42], ~1,100 km2/year for the dog rabies data sets, respectively, from the Yunnan province (China) and northern Africa [41], ~1,400 km2/year for the bat rabies dataset from Peru [39], and ~1,600 km2/year for the dog/wild carnivore rabies data set from Iran [38]. Interestingly, although the diffusion coefficient estimated for the bat rabies data set from Peru is comparatively high, rabies viruses circulating in bats do not seem to be associated with a notably higher diffusion coefficient compared to other non-flying host species. Furthermore, it has previously been hypothesised that the relatively higher diffusion coefficient estimated for the dog rabies virus data sets compared to other rabies virus datasets could be the result of occasional rapid (long-distance) dispersal events driven by human-mediated movements [41,51]. Of note, we observe an important heterogeneity among the IBD signals estimated for the different rabies virus data sets, the IBD signal metric being centred on zero for the dog rabies data set from Yunnan or close to zero for the raccoon one from the USA, and higher than ~0.65 for the dog rabies data set from northern Africa and the bat rabies data set from Peru.

We subsequently identify 3 data sets with diffusion coefficient estimates ranging from ~20,000 to ~26,000 km2/year including an AIV H5N1 data set from the Mekong region [36] and 2 genomic data sets of viruses infecting domestic pig populations in China [34,35]: the Getah virus, a re-emerging arbovirus China, and the porcine deltacoronavirus. Similarly to the hypothesis formulated for the dog rabies data sets such as the one from northern Africa, the relatively high diffusion coefficient for those 2 pig virus data sets may potentially be explained by long-distance dispersal through animal trade. It is also interesting to note the very limited or absent IBD signal estimated for the deltacoronavirus and Getah virus data sets, respectively, which is consistent with the notable spatial inter-mixing of lineage dispersal events that would be induced by animal trade across the Chinese territory [34,35].

Finally, with posterior median estimates >50,000 km2/year, the North American West Nile virus [32] and lumpy skin disease virus [11] data sets yield the highest diffusion coefficient estimates. Upon first detection in 1999 in New York City, the West Nile virus spread through the North American continent, carried by numerous (migratory) birds and reaching the West Coast of the United States within a period of only 3 years [52]. Lumpy skin disease virus was first detected in 1929 in northern Rhodesia (now Zambia) and initially circulated within the African continent. During the last decades, it has however spread eastward and to southeastern Europe [53]. Transmitted by a variety of arthropod vectors (ticks, mosquitoes, and biting flies), the main hosts of this virus are cattle and buffalos while the role of other wild animals in its transmission and spread remains uncertain [53]. The continuous phylogeographic reconstruction of the lumpy skin disease virus dissemination considered here is a global one involving long-distance lineage dispersal events among continents. Also in this case, the high diffusion coefficient estimated for this data set is potentially related to the transport of infected cattle to naive regions [11]. It is also worth noting that the IBD signal estimated for the West Nile virus and lumpy skin disease virus data sets are drastically different, being centred on zero for the former and approximately 0.8 for the latter. This result really reflects the phylogeographic patterns reconstructed for both outbreaks, the West Nile virus one displaying an important amount of longitudinal, and to some extent latitudinal, back and forth lineage dispersal events during the endemic phase of the North American outbreak, and the lumpy skin disease virus spread mainly consisting in long-distance dispersal events establishing more local transmission chains.

Furthermore, the relatively large diffusion coefficient estimated for the West Nile virus data set appears to be partially due to the expansion phase of this outbreak: when computing a distinct diffusion coefficient for the expansion and endemic phases — here approximated by considering phylogenetic branches occurring before or after 2004 [32], respectively — we obtain an estimate that is between 2 and 3 times higher for the expansion phase (Table A in S1 Text), illustrating the impact of the distinction of those epidemic phases on the metric. In the case of the West Nile virus spread across the North American continent, the invasion was in part driven by long-distance dispersal events, which can transform the spread of a pathogen from a wave-like to a fast “metastatic” growth pattern [54]. Overall, the comparison of the estimates based on the West Nile virus data set highlights that a metric like the diffusion coefficient can notably be impacted by the epidemic phase of the considered outbreak, an expansion phase being more likely to be associated with higher estimates than an endemic phase.

Finally, while we have previously established that the 2 dispersal metrics estimated on those empirical data sets appear to be robust to the sampling intensity, they remain impacted by sampling bias like any outcome of a phylogeographic inference. As reported in Figure I in S1 Text, we see that the sampling maps of the different empirical data sets display an important variability in terms of sampling patterns, with a varying degree of sampling site clustering. The associated sampling bias is however difficult to evaluate in practice given that the actual spatial distribution of individuals/infectious cases is frequently unknown, especially when studying pathogens spreading in wild animal populations. We can nevertheless clearly speculate that infected areas were likely under or even unsampled for a good number of those data sets. As established in previous works [8,33], we cannot exclude that such sampling bias could lead to a significant underestimation of the dispersal capacity as measured by the WDC metric.

[END]
---
[1] Url: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3002914

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.

via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/