TITLE: Modelling stem diameter class distribution with Weibull

TITLE: Modelling stem diameter class distribution with Weibull
distributions
DATE: 2021-03-30
AUTHOR: John L. Godlee
====================================================================

The two-parameter Weibull probability distribution is commonly used
to model the distribution of stem diameters in forests. The scale
and shape parameters can be used to compare different forest stands
to see how their distributions vary. I created a function in R
which calculates these parameters per plot, basically a wrapper
around MASS::fitdistr(). It's worth noting that FPC refers to
adjustments by sampling effort that may occur in plots with a
nested sampling design:

#' Fit a two-parameter Weibull distribution of diameter size
classes
#'
#' @param x dataframe of stem measurements
#' @param plot_id column name string of plot IDs
#' @param diam column name string of stem diameters
#' @param fpc column name string of stem FPC values
#' @param return_fit logical, optionally return a list of model
objects instead
#' of a dataframe of parameters
#'
#' @return dataframe with the shape and scale parameters, and
the standard
#' errors of each parameter for each plot-level Weibull
distribution
#'
#' @details The two-parameter Weibull distribution is a
probability density
#' function of the form: \eqn{f(D) = c/d(D/d)^{c-1}
exp(–(D/d)^{c})},
#' where $c$ is the shape parameter, $d$ is the scale
parameter, and $D$
#' is the stem diameter. To deal with variation in sampling
effort, this
#' function duplicates rows based on their FPC, rounding
the number of rows
#' up or down. E.g. a stem with FPC=0.08 would be
duplicated 13 times,
#' because $1/0.08=12.5$. The Weibull distribution is
fitted here using
#' Maximum Likelihood. $c$ is closely related to the mean
stem diameter
#' in most cases. $d<1$ indicates a concave exponential
which becomes more
#' extreme as $d -> 0$, while $d>1$ indicates a hump-shaped
#' relationship which becomes increasingly left-skewed as
#' $d -> inf$.
#'
#' Note that the Weibull distribution is not capable of
representing
#' bimodal or more complex diameter distributions, which
can occur in
#' disturbed woodlands.
#'
#' Any NA diameters will be excluded from the calculation
#'
#' @examples
#'
#' @importFrom MASS fitdistr
#'
#' @export
#'
diamWeibullGen <- function(x, plot_id = "plot_id", diam =
"diam", fpc = "fpc",
return_fit = FALSE) {
x_split <- split(x, x[[plot_id]])

out <- lapply(x_split, function(y) {
y_dup <- y[rep(row.names(y), round(1 / y[[fpc]])),]

y_fil <- y_dup[!is.na(y_dup[[diam]]),]

fit <- tryCatch({
suppressWarnings(MASS::fitdistr(y_fil[[diam]], "weibull"))
},
error = function(cond) {
return(NA)
})

if (return_fit) {
ret <- fit
} else {
if (!is.list(fit)) {
ret <- data.frame(
weibull_shape = NA,
weibull_scale = NA,
weibull_shape_se = NA,
weibull_scale_se = NA,
plot_id = unique(y[[plot_id]])
)
} else {
ret <- data.frame(
weibull_shape = fit[[1]][1],
weibull_scale = fit[[1]][2],
weibull_shape_se = fit[[2]][1],
weibull_scale_se = fit[[2]][2],
plot_id = unique(y[[plot_id]])
)
}
}
ret
})

if (!return_fit) {
out <- do.call(rbind, out)
names(out)[names(out) == "plot_id"] <- plot_id
}

return(out)
}

I created another function which fits a Weibull distribution and
extrapolates it to estimate the number of stems within arbitrary
size classes:

#' Estimate lower size-class abundances in plots with higher
minimum diameter thresholds
#'
#' @param x dataframe of stem measurements
#' @param plot_data dataframe of plot data
#' @param binwidth
#' @param min_diam_thresh column name string of minimum
diameter threshold in \code{plot_data}
#' @param diam column name string of stem diameter in \code{x}
#' @param min_limit lower diameter limit down to which to
estimate stem abundance
#' @param size_classes vector of size classes
#'
#' @return dataframe of stem abundances per size class
#'
#' @examples
#'
#'
#' @export
#'
diamWeibullEst <- function(x, plot_data, bins,
stem_plot_id = "plot_id", plot_plot_id = stem_plot_id,
min_diam_thresh = "min_diam_thresh", diam = "diam", fpc =
"fpc") {

# Get plot IDs in stems and plots
plots_all <- unique(c(x[[stem_plot_id]],
plot_data[[plot_plot_id]]))

# Which plot IDs are shared?
good_plots <- plots_all[plots_all %in% x[[stem_plot_id]] &
plots_all %in% plot_data[[plot_plot_id]]]

# Plots not in stems
plots_not_in_stems <- plots_all[!plots_all %in%
x[[stem_plot_id]]]

# Plots not in plots
plots_not_in_plots <- plots_all[!plots_all %in%
plot_data[[plot_plot_id]]]

# Plots with no min diam thresh
plot_no_thresh <-
plot_data[is.na(plot_data[[min_diam_thresh]]), plot_plot_id]

# Warning for plots with no min diam thresh
seosawr:::warnFormat(plot_no_thresh,
"Some plots have NA diameter thresholds:", "warning", 15)

# Warning for any plots which have been dropped
seosawr:::warnFormat(plots_not_in_plots,
"Some plots not present in plot data, will be NA:",
"warning", 15)

seosawr:::warnFormat(plots_not_in_stems,
"Some plots have no stems, will be 0:", "warning", 15)

# For each plot
out <- do.call(rbind, lapply(plots_all, function(y) {

# Get stems and plots
x_fil <- x[x[[stem_plot_id]] == y & !is.na(x[[diam]]),]

plots_fil <- plot_data[plot_data[[plot_plot_id]] == y,]

# Find min diam thresh
if (nrow(plots_fil) > 0) {
min_diam <- min(plots_fil[[min_diam_thresh]], na.rm =
TRUE)

# Filter to plot diameter threshold
x_fil <- x_fil[x_fil[[diam]] >= min_diam &
!is.na(x_fil[[diam]]),]

# Calculate weibull
if (nrow(x_fil) > 0) {
weib <- diamWeibullGen(x_fil, diam = diam, plot_id =
stem_plot_id,
fpc = "fpc", return_fit = FALSE)

# Get replicated rows by FPC
x_dup <- x_fil[rep(row.names(x_fil), round(1 /
x_fil[[fpc]])),]

# Extract Weibull parameters
weib_shape <- weib$weibull_shape
weib_scale <- weib$weibull_scale

# Find number of stems used to generate the
distribution
nstems_obs <- length(x_dup[[diam]])

# Find predicted proportion of stems this represents
prop_obs <- unname(exp(-(min_diam /
weib_scale)^weib_shape))

# Estimate total number of stems
nstems_total <- nstems_obs / prop_obs

# For each bin, calculate estimated proportion of
stems
n_est <- unlist(lapply(bins, function(z) {
# Estimate proportion of stems in category of
interest
prop_cat <- unname(exp(-(z[[1]] /
weib_scale)^weib_shape) -
exp(-(z[[2]] / weib_scale)^weib_shape))

prop_cat * nstems_total
}))

} else {
n_est <- 0
}
} else {
n_est <- NA_real_
}

# Create pretty bin chars
diam_class <- unlist(lapply(bins, function(z) {
paste(z[[1]], z[[2]], sep = "-")
}))

# Create dataframe of output
ret <- data.frame(y, diam_class, n_est)
names(ret)[1] <- plot_plot_id

# Return
ret
}))

# Return
return(out)
}

The function models stem numbers reasonably consistently among
plots, but tends to under-estimate the lower diameter size classes:

![Estimated diameter size classes vs.
observed](https://johngodlee.xyz/img_full/weibull/p_plot.png)

In the plot each set of points connected by a line is a plot, and
each point is a diameter size class, with the proportion of stems
observed within that size class on the x axis, and the proportion
of stems estimated by diamWeibullEst() on the y axis. As you can
see, large diameter size classes are slightly over-estimated in
their proportional abundance, and the smallest size class (5-10 cm)
is consistently under-estimated. This is probably because the
Weibull distribution will dip down towards zero unless the shape
parameter (k) is less than 1.