(C) PLOS One [1]. This unaltered content originally appeared in journals.plosone.org.
Licensed under Creative Commons Attribution (CC BY) license.
url:
https://journals.plos.org/plosone/s/licenses-and-copyright
------------
A cloud-based toolbox for the versatile environmental annotation of biodiversity data
['Richard Li', 'Department Of Ecology', 'Evolutionary Biology', 'Yale University', 'New Haven', 'Connecticut', 'United States Of America', 'Center For Biodiversity', 'Global Change', 'Ajay Ranipeta']
Date: 2021-12
Spatiotemporal biodiversity data are accumulating rapidly, offering an ever larger and more diverse foundation for research. Citizen science observations are driving much of this increase [1,2], complemented in recent years by GPS tracking data and camera trapping data [3,4]. Simultaneously, ecologically relevant environmental data have increased substantially in quantity, in spectral and spatiotemporal resolution, and in availability, thanks to a growing number of Earth-orbiting satellite sensors and powerful models of global environmental conditions [5–7].
The fusion of these 2 data types, that is, the characterization of environmental conditions at, or surrounding, spatiotemporal points or polygons representing biodiversity data, is commonly called “environmental annotation” of biodiversity records (e.g., [8]). Annotations may range from simple space-time intersections between biodiversity and environmental data, to more complex zonal calculations incorporating values from neighboring space-time locations, analogous to feature extraction in purely spatial applications. Environmental annotation of biodiversity records over larger scales has already supported a broad range of applications and insights. These include studies of species distribution [9,10], animal movement [8,11,12], phenology [13,14], habitat use [15,16], landscape ecology [17], disease ecology [18], climate change [19], and others. The continued growth in the volume and variety of biodiversity and environmental data is poised to further increase the scope of research applications.
Matching the respective scales of observations, uncertainty, and process
Despite the large volume and breadth of research relying in some form on the combination of biodiversity and environmental data, best approaches for proper integration have yet to be well specified. Notably, both spatiotemporal biodiversity and environmental data vary widely in their associated spatial and temporal grains and uncertainties [20]. Biodiversity records may refer to split-second observations of single organisms or larger areas sampled over months. They may be measured with extremely high accuracy, as for GPS-based tracking or camera trap records [3,21], or be less precise in their time stamps or locations, as is often the case for citizen science or older museum specimen data. Environmental data layers similarly range from precisely captured overflights with 30-m pixels to monthly 1-km characterizations. Beyond these discrepancies in the spatiotemporal attributes of data, there are well-recognized temporal and spatial scale dependencies in ecological processes [22,23]. Connecting environmental and biodiversity observation data in a way that does not address the respective spatial or temporal scales (i.e., grains) of evidence or process can severely compromise ecological inference and prediction. Currently, many studies relying on environment–biodiversity data annotation do not actively consider these scale issues [12], or do so only on an ad hoc basis [11], thus incurring observation and process mismatches and missed opportunities for more effective and informed integration.
We suggest that a central reason for the relatively limited consideration of these data and process scale differences, despite rapidly growing data, is the lack of performant, readily usable tools. This is the motivation for the development of the versatile biodiversity–environment annotation toolbox that we present here: Spatiotemporal Observation Annotation Tool (STOAT). To further support STOAT’s relevance and specific uses, we explore 3 major concepts in detail (Fig 1): (i) spatiotemporal observational grain (data resolution); (ii) spatiotemporal uncertainty; and (iii) spatiotemporal process scale.
Fig 1. Variation in spatiotemporal characteristics of (A, B) biodiversity data and (C, D) ecological processes underlying biodiversity data. The blue cylinders reflect the spatiotemporal grain of the data itself, with the red and green cylinders reflecting data uncertainty and driver extent, respectively.
https://doi.org/10.1371/journal.pbio.3001460.g001
Spatiotemporal observational grain. The grain of observations and environmental data can vary widely depending on their sources and subsequent processing. A species observation could range from a single point in time and space where an individual was recorded by a GPS collar, to a polygon with an area of thousands of square kilometers representing a national park the species was known to inhabit over the past few decades. Likewise, an environmental characterization of a location could range from an estimate of vegetation status on a specific day for an area of 25 m² to a multidecadal mean averaged over hundreds of km². The observations are essentially the same (species location or vegetation status), but the grain and associated interpretation of those observations are, or should be, very different. The linking of environmental with biodiversity data adds a further set of challenges. The grains of environmental data are predetermined and may or may not align with those of biodiversity data [6]. Likewise, the grains of biodiversity data may differ from each other in analyses that aggregate data from multiple sources. The intersection of spatially fine-grain biodiversity data with coarser gridded datasets is also vulnerable to the modifiable areal unit problem [24]. When considering grain heterogeneity, entire classes of biodiversity data may be unfit for simple intersections with categories of environmental products: temporally fine-grain camera trap observations with temporally coarse climatological products and spatially coarse biological survey data with fine-grain remote sensing products. The 2-way spatiotemporal grain variability between biodiversity observations and environmental data increases the likelihood of mischaracterization following a simple intersection (i.e., identifying the grid cell of environmental data that is closest to a biodiversity observation).
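The difference between a simple intersection and a zonal calculation can be illustrated with a minimal Python sketch. It is not STOAT code: the function names (nearest_cell, buffered_mean), the lower-left grid origin, the 1-km cell size, and all layer values are assumptions made up for this illustration. The sketch only shows how the same observation can receive quite different annotations depending on the spatial grain used.

```python
import math


def nearest_cell(grid, x0, y0, cell_size, x, y):
    """Simple intersection: value of the grid cell that contains (x, y).

    Assumes a regular grid stored as a list of rows, with row 0 starting
    at y0 and column 0 starting at x0 (lower-left origin).
    """
    col = int((x - x0) // cell_size)
    row = int((y - y0) // cell_size)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point falls outside the grid")
    return grid[row][col]


def buffered_mean(grid, x0, y0, cell_size, x, y, radius):
    """Zonal calculation: mean of cells whose centers lie within radius."""
    values = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            cx = x0 + (c + 0.5) * cell_size
            cy = y0 + (r + 0.5) * cell_size
            if math.hypot(cx - x, cy - y) <= radius:
                values.append(value)
    if not values:
        raise ValueError("no cell centers within the buffer")
    return sum(values) / len(values)


# Hypothetical 3 x 3 environmental layer with 1-km cells (made-up values).
layer = [
    [10.0, 12.0, 14.0],
    [11.0, 30.0, 15.0],
    [12.0, 13.0, 16.0],
]

# One observation at the center of the middle cell.
point_value = nearest_cell(layer, 0.0, 0.0, 1.0, 1.5, 1.5)
zonal_value = buffered_mean(layer, 0.0, 0.0, 1.0, 1.5, 1.5, 1.1)
```

Here the simple intersection returns the single, atypical center cell, while the 1.1-km buffer averages it with its four edge neighbors. Which value is appropriate depends on the grain of the observation and of the process under study.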
The impact of mismatches in data grain size can be lessened by coarsening the data or, statistically, by using multiscale models [6,25], but all of these methods presume existing knowledge of biodiversity and environmental data grain as well as the grain of the relevant ecological processes, in addition to access to the appropriate data. Furthermore, coarsening either type of data risks removing fine-grain variability that is critically important for understanding the underlying ecological processes. Data coarsening loses utility if the spatiotemporal scale of an ecological process is finer than the finest grain biodiversity or environmental data available. Demand for data of higher resolution for the study of fine-scale ecological problems (e.g., animal behavior [12] or mechanistic niche models [26]) demonstrates the need for advancement in our biodiversity and Earth observation technologies. Addressing grain mismatch is made more difficult by the general lack of attention to scale dependency [12,27]; spatial and temporal scale metadata of biodiversity data (especially in large aggregated databases) are often unavailable, lost, or discarded [28,29].
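As a rough illustration of data coarsening and the variability it removes, the following Python sketch aggregates a fine grid into non-overlapping block means. The function name coarsen, the aggregation factor, and the grid values are hypothetical. Block averaging is only one of several possible coarsening schemes and is not presented as the method used in the cited studies.

```python
def coarsen(grid, factor):
    """Average non-overlapping factor x factor blocks of a regular grid."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, n_rows, factor):
        coarse_row = []
        for c in range(0, n_cols, factor):
            block = [
                grid[i][j]
                for i in range(r, r + factor)
                for j in range(c, c + factor)
            ]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Hypothetical fine-grain layer (made-up values).
fine = [
    [1.0, 3.0, 10.0, 10.0],
    [5.0, 7.0, 10.0, 10.0],
    [0.0, 0.0, 2.0, 4.0],
    [0.0, 8.0, 6.0, 8.0],
]

coarse = coarsen(fine, 2)
```

In this example the lower-left block contains three cells of 0.0 and one of 8.0, yet it is summarized as a single value of 2.0. A process that responds to the isolated high cell is no longer detectable after coarsening.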
Spatiotemporal uncertainty. Biodiversity observations also vary in their spatiotemporal uncertainty owing to the variety of data collection methods. These can vary from a global positioning system location with spatiotemporal accuracies of seconds and meters to text-based descriptions of location common in museum collections, which may have uncertainties measured in kilometers and days or even years. The interpretation of uncertainty is distinct from that of grain. Consider 2 hypothetical biodiversity observations: the first with a coarse spatial grain but low uncertainty (e.g., a polygon representing the home range of a bird estimated from many GPS observations) and the second with fine grain but high uncertainty (e.g., a record of a museum specimen with imprecise geocoding). For the first, GPS-derived range, taking the mean and variance of the environmental conditions across the range may be an appropriate way to characterize the individual’s environmental niche. The environmental characterization of the museum observation could also use the mean and variance across the area of spatial uncertainty, but these would be more appropriately interpreted as describing a random variable drawn from the possible values across that area. The organism recorded in the second case is not assumed to inhabit all locations within its area of uncertainty, and thus the same mean and variance can be interpreted differently.
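The two interpretations can be made concrete with a small Python sketch. The cell values, the function names (summarize, draw_possible_values), and the use of uniform sampling over cells are all assumptions made for illustration. In particular, treating every cell in the area of uncertainty as equally likely is a simplification, not something stated above.

```python
import random
import statistics


def summarize(values):
    """Mean and population variance of environmental values in an area."""
    return statistics.fmean(values), statistics.pvariance(values)


def draw_possible_values(values, n_draws, seed=0):
    """Sample candidate values for a record with uncertain location.

    Assumes each cell in the area of uncertainty is equally likely.
    """
    rng = random.Random(seed)
    return [rng.choice(values) for _ in range(n_draws)]


# Hypothetical temperatures (made-up values) for the cells covered by
# (a) a GPS-derived home range and (b) a museum record's uncertainty area.
home_range_cells = [18.0, 19.0, 20.0, 21.0, 22.0]
uncertainty_cells = [18.0, 19.0, 20.0, 21.0, 22.0]

# (a) The animal uses the whole range, so the summary describes the
# conditions it experienced.
range_mean, range_var = summarize(home_range_cells)

# (b) The specimen occupied one unknown location, so each draw is one
# plausible value for that single record.
draws = draw_possible_values(uncertainty_cells, n_draws=100)
```

Both areas yield the same mean and variance here, but only in case (a) do those statistics describe conditions the organism actually experienced. In case (b) they describe the spread of candidate values for a single location.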
[END]
[1] Url:
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001460
(C) Plos One. "Accelerating the publication of peer-reviewed science."
Licensed under Creative Commons Attribution (CC BY 4.0)
URL:
https://creativecommons.org/licenses/by/4.0/
via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/plosone/