(C) PLOS One

This story was originally published by PLOS One and is unaltered.

Generalism drives abundance: A computational causal discovery approach [1]

Chuliang Song, Department of Biology, Quebec Centre for Biodiversity Science, McGill University, Montreal; Department of Ecology and Evolutionary Biology, University of Toronto, Toronto; Benno I. Simmons

Date: 2022-10

Abstract A ubiquitous pattern in ecological systems is that more abundant species tend to be more generalist; that is, they interact with more species or can occur in a wider range of habitats. However, there is no consensus on whether generalism drives abundance (a selection process) or abundance drives generalism (a drift process). As it is difficult to conduct direct experiments to solve this chicken-and-egg dilemma, previous studies have used a causal discovery method based on formal logic and have found that abundance drives generalism. Here, we refine this method by correcting its bias regarding skewed distributions, and employ two other independent causal discovery methods based on nonparametric regression and on information theory, respectively. Contrary to previous work, all three independent methods strongly indicate that generalism drives abundance when applied to datasets on plant-hummingbird communities and reef fishes. Furthermore, we find that selection processes are more important than drift processes in structuring multispecies systems when the environment is variable. Our results showcase the power of the computational causal discovery approach to aid ecological research.

Author summary Ever since Aristotle, the chicken-or-egg causality dilemma has baffled researchers. Such causality dilemmas are abundant in ecological research, where causal directions are often assumed but not tested. An archetypal example is whether being a generalist causes a species to be more abundant, or whether being more abundant causes a species to be a generalist. Without doubt, the gold standard for establishing causal directions is controlled experiments. However, controlled experiments that can disentangle the direction of causality in this case are challenging because they involve controlling biotic or abiotic niche breadth. These challenges create an opportunity for computational tools to detect the most likely causal direction. Here, by adapting a set of recently developed computational methods, we provide strong evidence that generalism drives abundance, overturning the previously established direction. We hope our work raises awareness of the potential for computational discovery methods to address long-standing questions in ecology, especially as increasingly large datasets become available.

Citation: Song C, Simmons BI, Fortin M-J, Gonzalez A (2022) Generalism drives abundance: A computational causal discovery approach. PLoS Comput Biol 18(9): e1010302. https://doi.org/10.1371/journal.pcbi.1010302
Editor: Mercedes Pascual, University of Chicago, UNITED STATES
Received: February 1, 2022; Accepted: June 14, 2022; Published: September 29, 2022
Copyright: © 2022 Song et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All the datasets analyzed in this study are publicly available. The dataset of plant-hummingbird communities is available from Dryad Digital Repository dx.doi.org/10.5061/dryad.c270ft8. The dataset of coral reef fishes is available from the Reef Life Survey website dx.doi.org/10.1038/s41559-020-01342-7. The source code to produce the results is available on GitHub at https://github.com/clsong/ReproduceChickenEgg.
Funding: B.I.S. was supported by a Royal Commission for the Exhibition of 1851 Research Fellowship. M.-J.F. acknowledges the funding of the CRC in Spatial Ecology. A.G. is supported by the Liber Ero Chair in Biodiversity Conservation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.

Introduction Identifying the causes of species abundance is a central question in ecology with direct implications for conservation management [1–3]. A ubiquitous pattern in ecological communities is the skewed distribution of abundance, with a few abundant species accompanied by many less abundant and rare species [4]. What causes this species abundance distribution is one of the most studied questions in ecological research. Theoretical studies have provided a diverse array of explanations for the emergence of uneven species abundance distributions, such as neutral theory [5, 6], niche partitioning [7–10], emergent neutrality [11, 12], and even statistical artifacts [13–15]. However, while these well-established theoretical explanations operate under different mechanisms, they generate similar and often empirically indistinguishable patterns (e.g., log-normal distributions). Thus, by considering only species abundance distributions, it is difficult to discern the main ecological drivers of this pattern [16–18]. Here, we examine the role of species generalism as a predictor of abundance. Generalism is the (biotic or abiotic) niche breadth of a species [19, 20]; this is an archetypal feature of a species that strongly correlates with its abundance [21]. Specifically, we focus on biotic niche breadth, and we use the number of interacting partners in an interaction network as a measure of breadth. Despite the strong correlation, identifying the causal direction between abundance and generalism is not a trivial problem. Indeed, we have a chicken-and-egg dilemma: both causal directions make intuitive sense, and it is difficult to discern a priori which direction is correct. Specifically, the question is whether being a generalist causes a species to be more abundant, or whether being more abundant causes a species to be more generalized [3, 21–25].
These two causal directions present fundamentally different views on multispecies dynamics in a local community (Fig 1). In one causal direction, via selection processes such as resource limitation (niche-based), generalist species are competitively advantaged by having access to a wider range of resources, which causes higher abundance. In the other causal direction, via drift processes (neutrality-based), abundant species are more likely to occupy more biotic niche space simply by coming into contact with more interaction partners than rare species, resulting in greater generalism. To further add to the complexity, the causal direction between abundance and generalism may not be unidirectional because it is unlikely that only selection or drift processes are occurring [26]. However, we do not know whether and when selection or drift processes predominantly structure multispecies dynamics in a local community [3, 26]. Thus, understanding the relative causal direction between abundance and generalism would increase our understanding of the roles of selection and drift in the structuring of ecological communities.


Fig 1. A chicken-and-egg dilemma of generalism and abundance. Empirical evidence shows that abundant species are also generalists. However, the causal direction is debated. If the community is mainly structured by selection processes, then species are more abundant because generalists have a competitive advantage. In contrast, if the community is mainly structured by drift processes, then species are more generalized because abundant populations have a higher chance of encountering more partners. The clip art of flowers and hummingbirds was made with DALL·E. https://doi.org/10.1371/journal.pcbi.1010302.g001

We employ a computational approach, without assuming a mechanistic model of how species abundances are generated, to directly identify the causal explanations for species abundance and other ecological patterns, such as species generalism. The computational problem of identifying the causal direction is known as causal discovery or structural identification in the field of statistics [27]. Note that causal discovery differs from causal inference: causal discovery aims to find the causal direction, whereas causal inference aims to estimate the causal strength given preassigned causal directions. While causal inference has become a popular tool in quantitative ecology [28–31], causal discovery remains rarely used in ecology. This is partly because causal discovery is a notoriously difficult problem and has only taken off in the past decade [32–34]. Without any assumptions, it is mathematically impossible to correctly distinguish causes and effects [35, 36]. To address this fundamental constraint, researchers in the field of causal discovery have recently developed a set of computational methods that can operate under minimal assumptions about the causal forms (Chapter 4 of [27]). In particular, two methods have a firm theoretical foundation and are widely applicable.
One method is the nonlinear additive noise model based on nonparametric regression [27, 37], and the other method is the geometric-information inference based on information theory [38, 39]. Both methods take advantage of the “asymmetry” that results from a one-way causal direction: the nonlinear additive noise model focuses on the asymmetry of noise, while the geometric-information inference focuses on the asymmetry of information. In parallel, in the field of community ecology, a new method with a different theoretical foundation has been proposed to identify the causal direction between generalism and abundance [24]. A key requirement of Fort et al.’s method is that continuous data be classified into binary categories (e.g., classifying species abundance data into either abundant or rare). However, the method can be sensitive to how the data are binarized, especially given the log-normal nature of species abundance distributions. We take a computational causal discovery approach to the chicken-and-egg dilemma between generalism and abundance, and we apply three methods with independent theoretical foundations: (i) our refinement of Fort et al.’s method based on formal logic [24], (ii) the nonlinear additive noise model based on nonparametric regression [27, 37], and (iii) the geometric-information inference based on information theory [38, 39]. Our computational approach allows us not only to detect the causal directions, but also to use the relative strength of the causal directions as a proxy for the relative roles of selection and drift processes. We evaluate the sensitivity of these three methods to plant–hummingbird data across the Americas [25, 40] and reef fish data from the Reef Life Survey program [41, 42]. All three methods consistently found strong evidence that generalism drives abundance in both the plant-hummingbird and reef fish datasets.
In addition, we found strong evidence that selection processes act more strongly than drift processes when local temperatures are more variable.
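
The noise asymmetry exploited by the additive noise model can be illustrated on synthetic data. The following is a minimal sketch under stated assumptions, not the authors' implementation (their code is in the GitHub repository listed above): it assumes a Nadaraya-Watson kernel smoother as the nonparametric regression, a biased HSIC estimate as the dependence measure, and hypothetical data in which x is the true cause. All function names, the bandwidth, and the simulated relationship are illustrative choices.

```python
import math
import random


def zscore(values):
    """Rescale values to zero mean and unit variance."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def kernel_smooth(x, y, bandwidth):
    """Nadaraya-Watson estimate of E[y | x], evaluated at every sample point."""
    fitted = []
    for xi in x:
        weights = [math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2) for xj in x]
        fitted.append(sum(w * yj for w, yj in zip(weights, y)) / sum(weights))
    return fitted


def centered_gram(values):
    """Doubly centered Gaussian Gram matrix (unit bandwidth, z-scored input)."""
    z = zscore(values)
    n = len(z)
    gram = [[math.exp(-0.5 * (a - b) ** 2) for b in z] for a in z]
    row_mean = [sum(row) / n for row in gram]
    grand_mean = sum(row_mean) / n
    return [[gram[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic(a, b):
    """Biased HSIC estimate: close to zero when a and b are independent."""
    n = len(a)
    ka, kb = centered_gram(a), centered_gram(b)
    return sum(ka[i][j] * kb[i][j] for i in range(n) for j in range(n)) / n ** 2


def anm_dependence(cause, effect, bandwidth=0.2):
    """Fit effect = f(cause) + noise, then score how strongly the
    residuals still depend on the putative cause (lower is better)."""
    c, e = zscore(cause), zscore(effect)
    fitted = kernel_smooth(c, e, bandwidth)
    residuals = [ei - fi for ei, fi in zip(e, fitted)]
    return hsic(c, residuals)


# Synthetic data only (an assumption for illustration): y = x^2 + noise
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
y = [xi ** 2 + rng.gauss(0.0, 0.2) for xi in x]

forward = anm_dependence(x, y)   # residual dependence when assuming x -> y
backward = anm_dependence(y, x)  # residual dependence when assuming y -> x
print("inferred direction:", "x -> y" if forward < backward else "y -> x")
```

The direction whose residuals look more independent of the input is taken as the more plausible causal direction. In practice one would also test whether the residual dependence in the preferred direction is statistically compatible with independence, rather than only comparing the two scores.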

Discussion We have studied whether abundance drives generalism or the other way around via a computational causal discovery approach. We have used three independent methods of causal discovery: a refinement of Fort et al.’s method based on formal logic [24], the nonlinear additive noise model, and the geometric-information inference. We have found strong evidence that generalism causes species abundance in both the plant-hummingbird and reef fish datasets. We have also found that the causal evidence for generalism as a driver of observed patterns is stronger when the community is exposed to greater environmental variation. Our results shed light on the big question of selection versus drift processes in structuring multispecies dynamics [26]. Since Hubbell’s groundbreaking work [55], this question has taken a central place in community ecology. In two-species communities, a fruitful research line has increased our understanding of this problem by rigorously linking experiments and theory [70–72]. Yet, we lack a full understanding of this question in multispecies communities, because it is challenging to carry out experiments that control species interactions in large communities [3]. Our computational causal discovery approach provides an alternative, practical path to tackle this problem in multispecies communities. This approach takes advantage of the fact that the causal direction between abundance and generalism differs depending on whether the community is structured by selection or drift processes. Thus, based on our causal discovery methods, we found strong evidence that selection processes have a critical role in maintaining species persistence in mutualistic systems. Importantly, while previous works have addressed this question [24, 25], we reverse the previously established conclusion that abundance is the cause by fixing a methodological issue.
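
As noted in the Introduction, a method that binarizes continuous data can be sensitive to how skewed data are split. The sketch below illustrates this with hypothetical log-normal abundances. The two thresholds compared here (mean and median) and the log transformation are illustrative assumptions; they are not claimed to be the exact rules used in [24] or in our refinement.

```python
import math
import random
import statistics

rng = random.Random(1)
# Hypothetical abundances drawn from a log-normal distribution
abundance = [rng.lognormvariate(0.0, 1.5) for _ in range(1000)]


def share_above(values, threshold):
    """Fraction of values classified as 'abundant' under a given threshold."""
    return sum(v > threshold for v in values) / len(values)


# On the raw scale, a mean threshold labels far fewer species as abundant
# than a median threshold, because the mean is pulled up by the long tail.
share_mean = share_above(abundance, statistics.fmean(abundance))
share_median = share_above(abundance, statistics.median(abundance))

# After removing the skewness (here with a log transformation), the mean
# threshold splits the data much more evenly.
log_abundance = [math.log(v) for v in abundance]
share_log_mean = share_above(log_abundance, statistics.fmean(log_abundance))

print(f"raw, mean threshold:   {share_mean:.2f}")
print(f"raw, median threshold: {share_median:.2f}")
print(f"log, mean threshold:   {share_log_mean:.2f}")
```

Because the binary labels feed directly into the downstream logic, a threshold that assigns very different shares of species to the two classes can change the inferred direction, which is why the skewness is removed before binarization.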
Causal discovery is not a novel topic, but a computational approach to causal discovery has only taken off in the last decade [27]. Without doubt, the gold standard in causal discovery is controlled experiments [73, 74]. However, controlled experiments can be difficult or even impossible to conduct in many contexts. These contexts create an opportunity for computational tools to detect the correct causal directions. In ecology, the most widely adopted tool of causal discovery is convergent cross mapping [75]. While this method has been applied to many ecological questions [76–80], it only works for time series data. The field of causal discovery has recently developed a line of methods that work with cross-sectional data, such as the additive noise model and the method of information-geometric inference. These methods have already been applied to many disciplines, including genetics [81], earth system science [82], and kinetic systems [83]. Yet, to the best of our knowledge, these methods have not been used in ecology. We have demonstrated that these methods are useful in ecological contexts. Importantly, these methods are flexible and can be easily adapted to different datasets. As a proof of their flexibility, we have applied the same methods to analyze the dataset of plant-hummingbird communities and the dataset of reef fishes. We acknowledge that the causal direction between abundance and generalism is not unidirectional and feedback may occur [3]. As the causal direction represents either selection or drift processes, it is unlikely that only one process is in play. As with many debates in ecology, the reality might lie somewhere along the spectrum of the dichotomy. As the strengths of the causal directions indicate the relative roles of the selection and drift processes, our results suggest that selection processes are stronger when the network structure of the community has a higher level of nestedness.
Our results are consistent with the literature on this topic. At the species level, empirical evidence in reef fishes [42, 84] and corals [85] shows generalism is favored under variable environments. At the community level, empirical evidence on plant-pollinator communities shows that networks are more nested when located in more variable environments [47, 86, 87]. A limitation of our work is that we did not consider potential confounding factors. Our approach can be heuristically justified by the central limit theorem and our removal of the skewness in marginal distributions (i.e., the ordered quantile normalization [59] in the refined method of Fort et al. [24] or post-linear regression [66] in the additive noise model). To explain the emergence of skewed distributions [13], a simple but general argument is based on the central limit theorem: just as the mean of the summation of many independent or weakly dependent processes results in a normal distribution, the mean of the product of these processes results in a log-normal distribution. In this sense, removing the skewness of the distribution transforms the statistical nature of the distribution from the product of multiple processes into the summation of multiple processes (on an appropriate scale). Thus, by removing the skewness of the marginal distribution via the log transformation or the more general semiparametric transformation [59], we can identify the causal direction with more confidence. Of course, this explanation is far from rigorous, as we may still suffer from the bias of unobserved confounding factors [88] and the omitted-variable bias [89], and the central limit theorem requires asymptotic aggregation (although there is empirical evidence that small ecological systems can exhibit asymptotic behaviors [90]). A more satisfactory solution is to adopt rigorous methods capable of detecting nonlinear causal directions in the presence of hidden confounding factors.
Examples include instrumental variables [88] and the DeCAMFounder method [91]. These methods generally require a sufficiently large amount of data to ensure statistical convergence, which is unfortunately beyond the reach of the datasets we used. Future research with larger datasets should explore these methods to control for hidden confounding factors. From a broader perspective, we showcase the power of the computational causal discovery approach in ecological research. In realms where experiments and theory are difficult to apply to estimate the direction of causality, these computational methods show great promise. Meanwhile, the computational identification of causal association can help refine theoretical assumptions and experimental designs for multispecies communities. The association between abundance and generalism we studied here is by no means an exceptional pattern in community ecology. For example, species abundances are also strongly associated with geographic distributions [92]. These flexible computational methods may similarly be applied to study such patterns in contexts where experiments have already been conducted [93]. We hope our work can raise awareness of this causal discovery approach in an era where much ecological data is becoming available [94, 95].
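
The heuristic central limit argument above can be checked numerically. The following sketch assumes hypothetical data in which each value is the product of 30 independent positive factors (the factor distribution and all names are illustrative choices, not taken from our datasets). It compares the sample skewness of the raw products with that of their logarithms.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: the mean cubed z-score."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(2)
# Each hypothetical value is the product of 30 independent positive factors
products = [
    math.prod(rng.uniform(0.5, 1.5) for _ in range(30))
    for _ in range(5000)
]

# The raw products are strongly right-skewed (approximately log-normal)...
skew_raw = skewness(products)
# ...whereas their logarithms are sums of independent terms and are
# therefore close to symmetric, as the central limit theorem suggests.
skew_log = skewness([math.log(p) for p in products])

print(f"skewness of products:      {skew_raw:.2f}")
print(f"skewness of log(products): {skew_log:.2f}")
```

The log scale turns the product into a sum, which is the sense in which removing skewness moves the data onto a scale where additive descriptions are more appropriate. The sketch says nothing about confounding, which remains the limitation discussed above.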

Supporting information S1 Data. Detailed methods and additional validations, and supplementary figures and tables. https://doi.org/10.1371/journal.pcbi.1010302.s001 (PDF)

Acknowledgments We thank Haoran Cai, Lucas P. Medeiros and Serguei Saavedra for insightful discussions.

[END]
---
[1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010302

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
