TITLE: Making abundance matrices
DATE: 2020-10-31
AUTHOR: John L. Godlee
====================================================================

There are lots of R packages that generate species by site abundance
matrices from a long-format dataframe of records. For example,
labdsv::matrify() takes a three-column dataframe like this:

Site   Species          Abundance
------ ---------------- -----------
A      Quercus robur    10
B      Quercus robur    2
B      Betula pendula   30
...    ...              ...
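For comparison, base R can do the same reshaping of already-summarised data with xtabs(). This is a swapped-in alternative, not how labdsv::matrify() works internally, and it returns a table object rather than a dataframe. The column names are taken from the example table above.

```r
# Already-summarised records: one row per site x species combination
dat <- data.frame(
  Site = c("A", "B", "B"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  Abundance = c(10, 2, 30)
)

# Sum Abundance into a site (rows) by species (columns) table;
# combinations with no record are filled with 0
comm <- xtabs(Abundance ~ Site + Species, data = dat)
comm
```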

This relies on the data already being summarised. But what if each
row were an individual record, as it would be with raw tree diameter
at breast height (DBH) measurements rather than a count of abundance:

Site   Species          DBH (cm)
------ ---------------- --------
A      Quercus robur    15.6
A      Quercus robur    5.4
A      Betula pendula   11.0
...    ...              ...

It wouldn't be hard to turn this into a summary table with some
dplyr:

library(dplyr)
count(dat, Site, Species)
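Without dplyr, base R's table() goes straight from one row per record to a site by species count matrix. A minimal sketch using the three example rows above:

```r
# Unsummarised records: one row per measured stem
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11.0)
)

# Count records per site x species combination
comm <- table(Site = dat$Site, Species = dat$Species)
comm
```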

Additionally, what if sampling effort varies among individuals? For
example, stems smaller than 10 cm DBH might only be measured in a
20x10 m subplot within a larger 20x50 m plot, so the FPC column
records the fraction of the plot in which each stem was sampled:

Site   Species          DBH (cm) FPC
------ ---------------- -------- -----
A      Quercus robur    15.6     1
A      Quercus robur    5.4      0.2
A      Betula pendula   11.0     1
...    ...              ...      ...
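The small-stem subplot covers 20x10 = 200 m² of a 20x50 = 1000 m² plot, so its FPC is 200 / 1000 = 0.2, and each small stem stands in for 1 / 0.2 = 5 stems across the whole plot. Weighting each record by 1 / FPC before summing gives plot-level estimates. A base R sketch using the example rows, where the helper column `w` is a hypothetical name introduced here:

```r
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11.0),
  FPC = c(1, 0.2, 1)
)

# Hypothetical helper column: each record represents 1 / FPC
# individuals at the whole-plot scale
dat$w <- 1 / dat$FPC

# Sum the weights per site x species: Quercus robur in A is 1 + 5 = 6
comm <- xtabs(w ~ Site + Species, data = dat)
comm
```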

Or what if the measure of abundance isn't the presence of an
individual, but its canopy cover:

Site   Species          DBH (cm) Cover
------ ---------------- -------- -------
A      Quercus robur    15.6     2.53
A      Quercus robur    5.4      1.01
A      Betula pendula   11.0     2.40
...    ...              ...      ...
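When abundance is a continuous measure such as canopy cover, records are summed rather than counted. One base R way, sketched with tapply() on the example rows. Note that tapply() returns NA for site by species combinations with no records, so these are set to zero:

```r
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11.0),
  Cover = c(2.53, 1.01, 2.40)
)

# Sum canopy cover per site (rows) x species (columns)
comm <- tapply(dat$Cover, list(dat$Site, dat$Species), sum)

# Empty site x species combinations come back as NA; treat them as absent
comm[is.na(comm)] <- 0
comm
```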

In these cases a simple count of records no longer gives the right
abundance, and creating one of these matrices becomes much harder.

Wouldn't it be nice to have a base R function to create species by
site abundance matrices, one that can deal with sampling effort,
alternative measures of abundance, and unsummarised data?

#' Generate a species by site abundance matrix
#'
#' @param x dataframe of individual records
#' @param site_id column name string of site IDs
#' @param species_id column name string of species names
#' @param fpc optional column name string of sampling weights of each
#'   record, greater than 0 and at most 1; abundances are divided by it
#' @param abundance optional column name string with an alternative
#'   abundance measure such as biomass, canopy cover, body length
#'
#' @return dataframe of species abundances (columns) per site (rows)
#'
#' @examples
#' set.seed(1)
#' x <- data.frame(site_id = rep(c("A", "B", "C"), each = 3),
#'   species_id = sample(c("a", "b", "c", "d"), 9, replace = TRUE),
#'   fpc = rep(c(0.5, 0.6, 1), each = 3),
#'   abundance = 1:9)
#' abMat(x, "site_id", "species_id")
#' abMat(x, "site_id", "species_id", "fpc")
#' abMat(x, "site_id", "species_id", "fpc", "abundance")
#'
#' @export
#'
abMat <- function(x, site_id, species_id, fpc = NULL, abundance = NULL) {
  # If no fpc or abundance column is given, treat every record as 1
  if (is.null(fpc)) {
    x$fpc <- 1
  } else {
    x$fpc <- x[[fpc]]
  }
  if (is.null(abundance)) {
    x$abundance <- 1
  } else {
    x$abundance <- x[[abundance]]
  }

  # Get all species and sites, in order of first appearance
  species <- unique(x[[species_id]])
  sites <- unique(x[[site_id]])

  # Create empty site (rows) by species (columns) matrix
  comm <- matrix(0, nrow = length(sites), ncol = length(species))

  # Fill matrix: sum abundance / fpc over the records in each cell
  for (i in seq_along(sites)) {
    for (j in seq_along(species)) {
      abu <- x[x[[site_id]] == sites[i] & x[[species_id]] == species[j], ]
      comm[i, j] <- sum(abu$abundance / abu$fpc, na.rm = TRUE)
    }
  }

  # Make tidy with names
  comm <- data.frame(comm)
  names(comm) <- as.character(species)
  row.names(comm) <- as.character(sites)

  return(comm)
}
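The nested loop in abMat() subsets the dataframe once for every site and species pair, which can get slow for large datasets. Below is a sketch of a hypothetical vectorised variant, abMatVec(), assuming it should take the same arguments and return the same layout; a single tapply() call replaces the loop. It is an illustration, not a drop-in replacement that has been tested against every edge case of abMat().

```r
# Hypothetical vectorised variant of abMat(): same arguments, same
# site (rows) by species (columns) dataframe as output.
abMatVec <- function(x, site_id, species_id, fpc = NULL, abundance = NULL) {
  n <- nrow(x)
  # Default every record to a sampling weight of 1 and an abundance of 1
  w <- if (is.null(fpc)) rep(1, n) else x[[fpc]]
  a <- if (is.null(abundance)) rep(1, n) else x[[abundance]]

  # Factor levels in order of first appearance, matching abMat()'s unique()
  site <- factor(x[[site_id]], levels = unique(x[[site_id]]))
  spp <- factor(x[[species_id]], levels = unique(x[[species_id]]))

  # Sum abundance / fpc within every site x species cell in one call
  comm <- tapply(a / w, list(site, spp), sum, na.rm = TRUE)

  # Cells with no records come back as NA; set them to zero
  comm[is.na(comm)] <- 0
  as.data.frame(comm)
}

# Usage with the FPC example rows
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur", "Betula pendula"),
  DBH = c(15.6, 5.4, 11.0),
  FPC = c(1, 0.2, 1)
)
abMatVec(dat, "Site", "Species")         # plain counts
abMatVec(dat, "Site", "Species", "FPC")  # counts weighted by 1 / FPC
```

Keeping factor levels in order of first appearance means the rows and columns come out in the same order as abMat(), rather than the alphabetical order tapply() would otherwise use.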