TITLE: Making abundance matrices
DATE: 2020-10-31
AUTHOR: John L. Godlee
====================================================================


There are lots of R packages to generate species by site abundance
matrices from a long-format dataframe of records. For example,
labdsv::matrify() takes a dataframe like this:

  Site      Species       Abundance
 ------ ---------------- -----------
   A     Quercus robur       10
   B     Quercus robur        2
   B     Betula pendula      30
  ...         ...            ...
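
For comparison, base R's xtabs() can build the same kind of matrix
from a summarised dataframe. This is only a minimal sketch, and it
assumes a dataframe called dat (a name I've made up) holding the three
rows shown above:

```r
# Hypothetical data matching the rows shown above
dat <- data.frame(
  Site = c("A", "B", "B"),
  Species = c("Quercus robur", "Quercus robur",
    "Betula pendula"),
  Abundance = c(10, 2, 30))

# Sum Abundance for each Site (rows) by Species (columns)
comm <- xtabs(Abundance ~ Site + Species, data = dat)

# Convert the contingency table into a plain dataframe
comm <- as.data.frame.matrix(comm)
comm
```

Site and species combinations with no records come out as zero.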

This method relies on the data already being summarised. But what
if each row is an individual record, as would be the case with raw
tree diameter measurements rather than a count of abundance?

  Site      Species       DBH
 ------ ---------------- ------
   A     Quercus robur    15.6
   A     Quercus robur    5.4
   A     Betula pendula   11.0
  ...         ...         ...

It wouldn't be hard to turn this into a summary table with some
dplyr:

   count(dat, Site, Species)
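
The same tally can be sketched in base R, if you would rather not
load dplyr. This assumes the raw records are in a dataframe called
dat, as in the call above. The rows are the three from the table, and
the column name n is my own choice:

```r
# The three records shown in the table above
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur",
    "Betula pendula"),
  DBH = c(15.6, 5.4, 11.0))

# Give every record a count of 1, then sum within Site and Species
dat$n <- 1
tally <- aggregate(n ~ Site + Species, data = dat, FUN = sum)
tally
```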

Additionally, what if records differ in sampling effort, for
example if stems less than 10 cm DBH were only measured in a
20x10 m box within a larger 20x50 m plot? Here FPC is the fraction
of the plot that was sampled for each record:

  Site      Species       DBH    FPC
 ------ ---------------- ------ -----
   A     Quercus robur    15.6    1
   A     Quercus robur    5.4    0.2
   A     Betula pendula   11.0    1
  ...         ...         ...    ...
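
To make the weighting concrete: the 20x10 m box covers 200 of the
plot's 1000 square metres, so its FPC is 0.2, and each small stem
stands in for 1 / 0.2 = 5 stems at the scale of the whole plot. A
rough sketch of that arithmetic, where all variable names are my own:

```r
subplot_area <- 20 * 10  # m^2, where stems under 10 cm DBH are measured
plot_area <- 20 * 50     # m^2, the whole plot

fpc <- subplot_area / plot_area  # fraction of the plot sampled
weight <- 1 / fpc                # whole-plot stems per recorded stem

# The three records from the table above
dat <- data.frame(
  Species = c("Quercus robur", "Quercus robur",
    "Betula pendula"),
  FPC = c(1, fpc, 1))

# Effort-corrected abundance per species
corrected <- tapply(1 / dat$FPC, dat$Species, sum)
corrected
```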

Or if the measure of abundance isn't individual presence, but the
canopy cover of the individual:

  Site      Species       DBH    Cover
 ------ ---------------- ------ -------
   A     Quercus robur    15.6   2.53
   A     Quercus robur    5.4    1.01
   A     Betula pendula   11.0   2.40
  ...         ...         ...     ...
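
In this case each cell of the matrix would hold summed cover rather
than a count of individuals. A sketch with the three rows above,
again assuming a dataframe called dat:

```r
# The three records shown in the table above
dat <- data.frame(
  Site = c("A", "A", "A"),
  Species = c("Quercus robur", "Quercus robur",
    "Betula pendula"),
  Cover = c(2.53, 1.01, 2.40))

# Sum canopy cover for each Site (rows) by Species (columns).
# Combinations with no records would come out as NA, not 0.
cover <- tapply(dat$Cover, list(dat$Site, dat$Species), sum)
cover
```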

Then it becomes much harder to create one of these matrices.

Wouldn't it be nice to have a base R function to create species by
site abundance matrices, which can deal with sampling effort,
alternative measures of abundance, and unsummarised data?

   #' Generate a species by site abundance matrix
   #'
   #' @param x dataframe of individual records
   #' @param site_id column name string of site IDs
   #' @param species_id column name string of species names
   #' @param fpc optional column name string of sampling weights
   #'     of each record, between 0 and 1
   #' @param abundance optional column name string with an
   #'     alternative abundance measure such as biomass, canopy
   #'     cover, body length
   #'
   #' @return dataframe of species abundances (columns) per site
   #'     (rows)
   #'
   #' @examples
   #' x <- data.frame(site_id = rep(c("A", "B", "C"), each = 3),
   #'   species_id = sample(c("a", "b", "c", "d"), 9,
   #'     replace = TRUE),
   #'   fpc = rep(c(0.5, 0.6, 1), each = 3),
   #'   abundance = 1:9)
   #' abMat(x, "site_id", "species_id")
   #' abMat(x, "site_id", "species_id", "fpc")
   #' abMat(x, "site_id", "species_id", "fpc", "abundance")
   #'
   #' @export
   #'
   abMat <- function(x, site_id, species_id, fpc = NULL,
     abundance = NULL) {
     # If no fpc or abundance, make 1
     if (is.null(fpc)) {
       x$fpc <- 1
     } else {
       x$fpc <- x[[fpc]]
     }
     if (is.null(abundance)) {
       x$abundance <- 1
     } else {
       x$abundance <- x[[abundance]]
     }

     # Get all species and sites
     species <- unique(x[[species_id]])
     sites <- unique(x[[site_id]])

     # Create empty species by site matrix
     comm <- matrix(0, nrow = length(sites),
       ncol = length(species))

     # Fill matrix
     for (i in seq_along(sites)) {
       for (j in seq_along(species)) {
         # Records of species j at site i
         abu <- x[x[[site_id]] == sites[i] &
           x[[species_id]] == species[j], ]
         # Weight each record by the inverse of its fpc
         comm[i, j] <- sum(abu$abundance / abu$fpc, na.rm = TRUE)
       }
     }

     # Make tidy with names
     comm <- data.frame(comm)
     names(comm) <- species
     row.names(comm) <- sites

     return(comm)
   }
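
As a cross-check, the same calculation can be sketched without the
nested loop using xtabs(). The function name abMatX and the helper
column names are hypothetical. It follows the same assumptions as
abMat() (abundance divided by fpc, both defaulting to 1), and is meant
as an alternative to compare against, not a replacement:

```r
abMatX <- function(x, site_id, species_id, fpc = NULL,
  abundance = NULL) {
  # Default weight and abundance of 1 when no column is given
  w <- if (is.null(fpc)) rep(1, nrow(x)) else x[[fpc]]
  a <- if (is.null(abundance)) rep(1, nrow(x)) else x[[abundance]]

  # Effort-corrected abundance of each record
  val <- a / w
  val[is.na(val)] <- 0  # mirrors na.rm = TRUE in the loop version

  # Sum per site (rows) and species (columns)
  d <- data.frame(site = x[[site_id]], species = x[[species_id]],
    val = val)
  as.data.frame.matrix(xtabs(val ~ site + species, data = d))
}

# The FPC example from earlier: one small stem sampled at 0.2
x <- data.frame(
  site_id = c("A", "A", "A"),
  species_id = c("Quercus robur", "Quercus robur",
    "Betula pendula"),
  fpc = c(1, 0.2, 1))
res <- abMatX(x, "site_id", "species_id", "fpc")
res
```

One difference to be aware of: xtabs() orders sites and species
alphabetically, while abMat() keeps them in order of first appearance.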