TITLE: Writing R package documentation

TITLE: Writing R package documentation
DATE: 2020-05-30
AUTHOR: John L. Godlee
====================================================================

There's already a tonne of stuff on how to write R packages, see
here, here, here and here. Part of the reason for the breadth of
articles is that there are many different workflows for how to
write them. Here I'm only going to share my thoughts on writing
package documentation, because that's the area where I didn't find
one complete resource that answered all of my questions and
provided a workflow I liked, when I was writing my first serious
package.

[here]:
https://tinyheero.github.io/jekyll/update/2015/07/26/making-your-fir
st-R-package.html
[1]: https://r-pkgs.org/
[2]:
https://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratc
h/
[3]: https://kbroman.org/pkg_primer/

To briefly explain the basic structure of my package, I took advice
from Hadley and kept functions in my package inside thematic files,
like biomass.R and taxonomy.R, with each of these files holding
multiple functions. It's somewhere between keeping all functions in
one file and keeping each function in its own file. I think both of
these extremes ignore the natural sorting which can come from
keeping a tidy directory structure. I found it more intuitive to
find a particular function based on its theme when I used these
thematic files.

[advice from Hadley]: https://r-pkgs.org/r.html

I used roxygen2 to store the documentation for each package
function alongside the code for that function in my R/*.R files.
For example, my convenience function for concatenating genus and
species names to one string (picked as an example purely because
it's short):

[roxygen2]:
https://cran.r-project.org/web/packages/roxygen2/vignettes/roxygen2.
html

#' Combine genus and species character vectors to a species name
#'
#' @param x vector of genus names
#' @param y corresponding vector of species names
#'
#' @return vector of genus and species
#'
#' @export
#'
combineSpecies <- function(x, y) {
vec <- speciesFormat(data.frame(genus = x, species = y))
vec <- paste(vec[[1]], vec[[2]])
return(vec)
}

This function has the @export tag, meaning that when my package is
loaded with library() by a user, this function can be accessed
without prefixing with the package name. Functions with @export are
automatically written into the package manual when I compile it
with devtools::document(). I have a tonne of functions in this
package that are not useful to the average user however, mostly
functions which check the contents of a particular column in the
standardised datasets used by this package. These functions are
purposely not written into the package manual with the @noRd tag,
which stops a .Rd file being written for that function and
therefore keeps it out of the manual. These functions also have the
@keywords internal, which means that the function can only be
accessed by the user with package:::function(), but can still be
accessed by other functions in the package with function(). This
means that the user can still use the function if they need to, but
are discouraged from doing so, normally because that function is
better implemented in a higher-level wrapper function which
provides checks or preprocessing. As an example, my function
genus() checked whether genus names are formatted sensibly, but is
only meant to be called from within colValCheck(), which wraps a
bunch of column checking functions in a neater interface:

#' Check validity of stem genus column
#'
#' @param x vector of stem genera
#'
#' @return vector of class "character"
#' @keywords internal
#' @noRd
#'
genus <- function(x, ...) {
x <- fillNA(x)
x <- coerce_catch(x, as.character, ...)
na_catch(x, warn = TRUE, ...)
if (any(!grepl("^[[:alpha:]]+$", x[!is.na(x)]))) {
stop("Non-letter characters found in genus")
}
else if (any(!grepl("^[A-Z]", x[!is.na(x)]))) {
stop("Genera must start with a capital letter [A-Z]")
}
else if (any(grepl("[A-Z]", substring(x[!is.na(x)], 2)))) {
stop("Genera must not have multiple capital letters")
}
structure(x, class = "character")
}

It's nice to have a package level description at the start of a
package manual before launching into the technicalities of the
function definitions. To do this, I added a roxygen entry like the
one below (cut for brevity), which has the object NULL and uses the
key tags: @docType package and @name packagename-package.

#' silvR: Clean and analyse SEOSAW style data
#'
#' The \code{silvr} package facilitates three important
activities:
#' \itemize{
#' \item{Checking and cleaning new data for the SEOSAW
dataset}
#' \item{Manipulating the SEOSAW dataset to provide
informative summary data}
#' \item{Analysing the SEOSAW dataset}
#' }
#'
#' @details The functions in the \code{silvr} package form a
workflow for
#' checking data prior to ingestion into the SEOSAW
database. The package
#' deals with 4 principle data objects:
#'
#' ...
#'
#' The package contains various functions for quickly
creating useful
#' summary data objects such as abundance matrices and
maps, ...
#'
#' @author The \code{silvr} package is a collaborative effort,
bringing code
#' together from various SEOSAW members ...
#'
#' @section Key top-level functions:
#' For ingesting new data into the SEOSAW database, it is
recommended to run
#' these top level functions in this order to catch errors.
#'
#' \itemize{
#' \item{\code{plotTableGen()} - Checks for value and column
errors and
#' return a clean SEOSAW style plot metadata dataframe.}
#' \item{\code{stemTableGen()} - Checks for value and column
errors and
#' return a clean SEOSAW style stem data dataframe.}
#' \item{...}
#' }
#'
#' @docType package
#' @name silvr-package
NULL

This longer description takes advantage of @section and @details
for structuring blocks of text in the 'roxygen2 block.

The manual frontmatter comes mostly from the DESCRIPTION file. Most
important are the package dependencies, which are also specified by
minimum version number. Annoyingly, these package versions don't
get populated directly from the package dependencies in the
roxygen2 function blocks. Instead they have to be written manually
into the DESCRIPTION.

Roxygen2 autopopulates NAMESPACE from the @import and @importFrom
tags in the function blocks. I tend to use @importFrom vegan
diversity rather than @import vegan where I can, to avoid potential
conflicts in function names if I start loading lots of packages,
but I don't think there is any hard rule on this.

To write a vignette, I used RMarkdown rather than Sweave. It seems
to be the modern approach to vignette writing and is much more
straightforward when including figures and code chunks in the
document. To set this up I created a directory in the package root
call vignettes/ and created a packagename.Rmd file in there. Then
in the YAML frontmatter I included this:

output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Cleaning and analysing SEOSAW data}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}

Then in my DESCRIPTION I added:

Suggests:
knitr (>= 1.28),
rmarkdown (>= 2.1)
VignetteBuilder: knitr

Which ensures the tools for building the vignette are present. I
can then build the vignette with: devtools::build_vignettes().

Finally, a short R script I have sitting above my package directory
contains this code to build the package:

setwd("silvr")
devtools::document() # Generate .Rd files
devtools::build_manual() # Generate .pdf manual
devtools::build_vignettes() # Generate .html vignette
setwd("..")
devtools::install("silvr") # Install the package
library(silvr) # Load the package