TITLE: Using R to locate spatial data points inside map polygons
DATE: 2017-09-27
AUTHOR: John L. Godlee
====================================================================


I was looking into a paper called "Determinants of woody cover in
African savannas" by Sankaran et al. (2005). The paper looks at the
large scale environmental factors that affect percentage woodland
cover in African savanna landscapes. One figure in particular that
got me interested was this one:

 ![Mean annual Precipitation vs. woody
cover](https://johngodlee.xyz/img_full/sankaran/map_wood.png)

I haven't fully worked out the implications of this figure yet, but
what stands out to me the most is that many plots in high rainfall
areas with low woody cover are classed as 'arid fertile savanna' by
White's veg. classification. Secondly moist infertile savanna seems
to straddle the saturation point of the MAP limited woody cover.

It shows that savannas with Mean Annual Precipitation values less
than ~650 mm have their upper woody cover potential limited by
precipitation, but above that threshold an increase in MAP doesn't
increase the maximum potential woody cover. It also shows that lots
of sites have woody cover below their MAP limited maximum, pointing
to lots of other environmental factors, like fire, herbivory, soil
characteristics.

Looking into this graph more, I wanted to see whether there was any
biogeographic patterns that could be drawn from the data. Were all
the sites with particularly low actual woody cover from a
particular woodland cover biome, for example.

To do this I compared the data from Sankaran et al. (2005), which
is publicly available as supplementary information, to White's
seminal vegetation classification map of 1983, which I accessed as
as shapefiles from here.

I did this analysis in R, so all the code below is for R.

You can find the code and data that I used by cloning this github
repo.

First, I loaded the packages and the data:

   # Set working directory to the location of the source file ----
   setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

   # Packages ----
   library(ggplot2)
   library(dplyr)
   library(rgeos)
   library(rgdal)

   # Import data ----

   ## sankaran data
   cover <- read.csv("data/sankaran_2005_data.csv")
   str(cover)

   ## white 1983 veg data
   white_veg <- readOGR(dsn="data/whitesveg",
       layer="Whites vegetation")

   ## Country outline
   countries <- readOGR(dsn="data/africa",
       layer="Africa")

First I can have a go at plotting White's map:

   # Plot White's veg map data ----
   ## Fortify country outline for ggplot
   countries@data
   countries_fort <- fortify(countries, region = "COUNTRY")

   ## Exploring whiteveg
   white_veg@data
   white_veg@polygons[[1]]
   white_veg@proj4string
   white_veg@bbox
   white_veg@plotOrder

   ## Fortify white shape file for ggplot2
   white_veg_fort <- fortify(white_veg, region = "DESCRIPTIO")
   names(white_veg_fort)
   length(unique(white_veg_fort$id))

   ## Create colour palette for ggplot2
   palette_veg_type_19 <-
c("#FF4A46","#008941","#006FA6","#A30059","#FFDBE5",
                "#7A4900","#0000A6","#63FFAC","#B79762","#004D43",
                "#8FB0FF","#997D87","#5A0007","#809693","#FEFFE6",
                "#1B4400","#4FC601","#3B5DFF","#4A3B53")

   ## ggplot
   ggplot() +
       geom_polygon(aes(x = long, y = lat, group = group, fill =
id),
               data = white_veg_fort) +
       geom_polygon(aes(x = long, y = lat, group = group, fill =
NA),
               colour = "black",
               data = countries_fort) +
       theme_classic()  +
       scale_fill_manual(values = palette_veg_type_19) +
       labs(fill = "Biome") +
       xlab("Longitude") +
       ylab("Latitude") +
       coord_map()

Then I had to convert the data from Sankaran et al. into a
SpatialPoints object so I can use it in future analyses.

   # Create a data frame with only the latitude and longitude data
   cover_coords <- cover %>%
       select(lon_dec_deg, lat_dec_deg)  # Important for later on
to have lon then lat as columns

   # Convert to SpatialPoints object
   cover_spoints <-
SpatialPoints(cover_coords,proj4string=CRS(proj4string(white_veg)))

The bit of this project that took me some time to work out was how
to compute whether a data point from Sankaran et al. fell into a
polygon of a certain type in White's map. I ended up using over()
from the sp package. Then I can add that data back into the
original Sankaran dataset

   # Add vegetation class column by referencing White map
   cover_veg_class <- cover %>%
       mutate(veg_class = over(cover_spoints,
white_veg)$DESCRIPTIO)

Finally, I can use cover_veg_class to create a ggplot of MAP vs
woody cover, with the points coloured according to which of White's
vegetation classes the point is in:

   # Plot with vegetation classification from White et al. 1983
----
   veg_class_plot <- ggplot(cover_veg_class, aes(x = map_mm,
               y = woody_cover_per,
               colour = veg_class)) +
       geom_point(size = 4) +
       guides(colour=guide_legend(title="Vegetation Class")) +
       xlab("MAP (mm)") +
       ylab("Woody Cover (%)")

 ![MAP vs. woody cover with coloured points by vegetation
type](https://johngodlee.xyz/img_full/sankaran/map_wood_col.png)