Working With Geospatial Data

The `sf` package

We will attempt to take advantage of the bleeding edge tidyverse friendly R package for geospatial data, called sf. Even though our textbook is only a few years old, this R package had not been released in its final form at the time of publication. However, compared to the previous state of the art, it simplifies the interface for interacting with spatial layers substantially and makes it play nicely with other tidyverse packages, most especially dplyr and ggplot. What a time to be alive!

Before you go further, make sure you are able to load the sf package, with library(). If you get an error, it may be because you have installed packages to your user account that have different versions than the system-wide ones. If this comes up let me know and I will show everyone how to get around it.

Opening a dataset

The sf package comes with some example datasets. One of these is about cases of Sudden Infant Death Syndrome in North Carolina. The dataset is described at this link (key excerpt quoted below)

“This data set was presented first in Symons et al. (1983), analysed with reference to the spatial nature of the data in Cressie and Read (1985), expanded in Cressie and Chan (1989), and used in detail in Cressie (1991). It is for the 100 counties of North Carolina, and includes counts of numbers of live births (also non-white live births) and numbers of sudden infant deaths, for the July 1, 1974 to June 30, 1978 and July 1, 1979 to June 30, 1984 periods. In Cressie and Read (1985), a listing of county neighbours based on shared boundaries (contiguity) is given, and in Cressie and Chan (1989), and in Cressie (1991, pp. 386–389), a different listing based on the criterion of distance between county seats, with a cutoff at 30 miles. The county seat location coordinates are given in miles in a local (unknown) coordinate reference system. The data are also used to exemplify a range of functions in the S-PLUS spatial statistics module user’s manual (Kaluzny et al., 1996)”

We can read in the data as follows:

Code:

library(sf)
nc <- system.file("shape/nc.shp", package = "sf") %>%
  st_read()

## Reading layer `nc' from data source 
##   `/usr/local/lib/R/site-library/sf/shape/nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## CRS:           4267

The main advantage of sf over older methods is that we can interact with the nc object as a regular data.frame.

For example, we can view the AREA, NAME and geometry variables with select() as we normally would.

Code:

library(tidyverse)
nc %>% select(AREA, NAME, geometry)

## Simple feature collection with 100 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## CRS:           4267
## First 10 features:
##     AREA        NAME                       geometry
## 1  0.114        Ashe MULTIPOLYGON (((-81.47276 3...
## 2  0.061   Alleghany MULTIPOLYGON (((-81.23989 3...
## 3  0.143       Surry MULTIPOLYGON (((-80.45634 3...
## 4  0.070   Currituck MULTIPOLYGON (((-76.00897 3...
## 5  0.153 Northampton MULTIPOLYGON (((-77.21767 3...
## 6  0.097    Hertford MULTIPOLYGON (((-76.74506 3...
## 7  0.062      Camden MULTIPOLYGON (((-76.00897 3...
## 8  0.091       Gates MULTIPOLYGON (((-76.56251 3...
## 9  0.118      Warren MULTIPOLYGON (((-78.30876 3...
## 10 0.124      Stokes MULTIPOLYGON (((-80.02567 3...

There is also a plot method for this type of object that makes it very easy to see the data in map form.

The following code will produce a map of the counties in North Carolina, colored by their size in land area.

Code:

nc %>% 
  select(AREA, geometry) %>% 
  plot()

If we use the latest version of ggplot2, we can also use the geom_sf() function to create a similar plot.

library(ggplot2)
nc %>% 
  ggplot(aes(fill = AREA)) +
  geom_sf()

Using the NC dataset and your wrangling and visualization skillz, create a plot showing the geographic distribution of the number of births in each of two years: 1974 (in the BIR74 variable) and 1979 (in the BIR79 variable), where year is a faceting variable. (Hint: You will need a pivot for the faceting part)

SOLUTION

Create a similar plot, this time showing the number of SIDS cases in these two years, normalized by the number of births.

SOLUTION

Data with multiple layers

A more complex dataset comes from the macleish package, which has spatial data coupled with weather data recorded from two monitor stations in Massachussetts.

To get the data, first load the macleish package with library().

Code:

library(macleish)

The data consists of multiple layers, each in the form of an sf data frame, and collected into an R list called macleish_layers. See the names of the layers with names(macleish_layers).

Code:

names(macleish_layers)

##  [1] "landmarks"         "forests"           "streams"          
##  [4] "challenge_courses" "buildings"         "wetlands"         
##  [7] "boundary"          "research"          "soil"             
## [10] "trails"            "contours_30ft"     "contours_3m"

To access individual layers in this list, you can use the pluck()function.

For example, the buildings layer:

Code:

library(magrittr)
macleish_layers %>% 
  pluck("buildings")

## Simple feature collection with 27 features and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -72.68251 ymin: 42.44104 xmax: -72.67923 ymax: 42.44919
## CRS:           EPSG:4326
## First 10 features:
##    name                       geometry
## 1     0 POLYGON ((-72.68056 42.4484...
## 2     0 POLYGON ((-72.68051 42.4483...
## 3     0 POLYGON ((-72.68123 42.4464...
## 4     0 POLYGON ((-72.68116 42.4463...
## 5     0 POLYGON ((-72.68134 42.4461...
## 6     0 POLYGON ((-72.68135 42.4451...
## 7     0 POLYGON ((-72.68137 42.4450...
## 8     0 POLYGON ((-72.68102 42.4445...
## 9     0 POLYGON ((-72.68098 42.4444...
## 10    0 POLYGON ((-72.68143 42.4443...

We can get a quick plot of the buildings in this layer by simply piping the layer into plot().

macleish_layers %>% 
  pluck("buildings") %>% 
  plot()

There is a tutorial, or “vignette” in R parlance, that gives you some context, examples, and analysis of this data, available by typing vignette("macleish").

Plot the “streams” layer of the macleish data.

SOLUTION

Interactive plots with `leaflet`

When working with multiple layers, it can be nicer to create an interactive plot so the user can get context about the data by hovering.

We can use yet another plotting library well suited for spatial data, called leaflet. Here is an example, showing a particular building in the context of buildings and forests.

library(leaflet)

First, we create a dataset for the particular building of interest

bechtel <- tibble(lat = 42.449167, lon = -72.679389)

Next we can create an initial plot, which we will add to later:

base_plot <- leaflet() %>%
  addTiles() %>%
  addMarkers(
    lng   = ~lon, 
    lat   = ~lat, 
    data  = bechtel,
    popup = "Bechtel Environmental Classroom")

base_plot %>%
  addPolygons(
    data   = macleish_layers %>% pluck("buildings"), 
    weight = 1, 
    popup  = ~name) %>%
  addPolygons(
    data        = macleish_layers %>% pluck("forests"), 
    weight      = 1, 
    fillOpacity = 0.2, 
    popup       = ~ Sheet1__Na)

Using the above example as a guide, make an interactive visualization of the MacLeish data, including layers for buildings, streams, and trails.

SOLUTION

Upload your plot for Exercise 4 to #lab17 on Slack

STAT 209: Lab 17

Working With Geospatial Data

The sf package

Opening a dataset

SOLUTION

SOLUTION

Data with multiple layers

SOLUTION

Interactive plots with leaflet

SOLUTION

The `sf` package

Interactive plots with `leaflet`