sf
packageWe will attempt to take advantage of the bleeding edge tidyverse friendly R package for geospatial data, called sf
. Even though our textbook is only a few years old, this R package had not been released in its final form at the time of publication. However, compared to the previous state of the art, it simplifies the interface for interacting with spatial layers substantially and makes it play nicely with other tidyverse packages, most especially dplyr
and ggplot
. What a time to be alive!
Before you go further, make sure you are able to load the sf
package, with library()
. If you get an error, it may be because you have installed packages to your user account that have different versions than the system-wide ones. If this comes up let me know and I will show everyone how to get around it.
The sf
package comes with some example datasets. One of these is about cases of Sudden Infant Death Syndrome in North Carolina. The dataset is described at this link (key excerpt quoted below)
“This data set was presented first in Symons et al. (1983), analysed with reference to the spatial nature of the data in Cressie and Read (1985), expanded in Cressie and Chan (1989), and used in detail in Cressie (1991). It is for the 100 counties of North Carolina, and includes counts of numbers of live births (also non-white live births) and numbers of sudden infant deaths, for the July 1, 1974 to June 30, 1978 and July 1, 1979 to June 30, 1984 periods. In Cressie and Read (1985), a listing of county neighbours based on shared boundaries (contiguity) is given, and in Cressie and Chan (1989), and in Cressie (1991, pp. 386–389), a different listing based on the criterion of distance between county seats, with a cutoff at 30 miles. The county seat location coordinates are given in miles in a local (unknown) coordinate reference system. The data are also used to exemplify a range of functions in the S-PLUS spatial statistics module user’s manual (Kaluzny et al., 1996)”
We can read in the data as follows:
Code:
## Reading layer `nc' from data source
## `/usr/local/lib/R/site-library/sf/shape/nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## CRS: 4267
The main advantage of sf
over older methods is that we can interact with the nc
object as a regular data.frame
.
For example, we can view the AREA
, NAME
and geometry
variables with select()
as we normally would.
Code:
## Simple feature collection with 100 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## CRS: 4267
## First 10 features:
## AREA NAME geometry
## 1 0.114 Ashe MULTIPOLYGON (((-81.47276 3...
## 2 0.061 Alleghany MULTIPOLYGON (((-81.23989 3...
## 3 0.143 Surry MULTIPOLYGON (((-80.45634 3...
## 4 0.070 Currituck MULTIPOLYGON (((-76.00897 3...
## 5 0.153 Northampton MULTIPOLYGON (((-77.21767 3...
## 6 0.097 Hertford MULTIPOLYGON (((-76.74506 3...
## 7 0.062 Camden MULTIPOLYGON (((-76.00897 3...
## 8 0.091 Gates MULTIPOLYGON (((-76.56251 3...
## 9 0.118 Warren MULTIPOLYGON (((-78.30876 3...
## 10 0.124 Stokes MULTIPOLYGON (((-80.02567 3...
There is also a plot method for this type of object that makes it very easy to see the data in map form.
The following code will produce a map of the counties in North Carolina, colored by their size in land area.
Code:
If we use the latest version of ggplot2
, we can also use the geom_sf()
function to create a similar plot.
BIR74
variable) and 1979 (in the BIR79
variable), where year is a faceting variable. (Hint: You will need a pivot
for the faceting part)A more complex dataset comes from the macleish
package, which has spatial data coupled with weather data recorded from two monitor stations in Massachussetts.
To get the data, first load the macleish
package with library()
.
Code:
The data consists of multiple layers, each in the form of an sf
data frame, and collected into an R list called macleish_layers
. See the names of the layers with names(macleish_layers)
.
Code:
## [1] "landmarks" "forests" "streams"
## [4] "challenge_courses" "buildings" "wetlands"
## [7] "boundary" "research" "soil"
## [10] "trails" "contours_30ft" "contours_3m"
To access individual layers in this list, you can use the pluck()
function.
For example, the buildings layer:
Code:
## Simple feature collection with 27 features and 1 field
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -72.68251 ymin: 42.44104 xmax: -72.67923 ymax: 42.44919
## CRS: EPSG:4326
## First 10 features:
## name geometry
## 1 0 POLYGON ((-72.68056 42.4484...
## 2 0 POLYGON ((-72.68051 42.4483...
## 3 0 POLYGON ((-72.68123 42.4464...
## 4 0 POLYGON ((-72.68116 42.4463...
## 5 0 POLYGON ((-72.68134 42.4461...
## 6 0 POLYGON ((-72.68135 42.4451...
## 7 0 POLYGON ((-72.68137 42.4450...
## 8 0 POLYGON ((-72.68102 42.4445...
## 9 0 POLYGON ((-72.68098 42.4444...
## 10 0 POLYGON ((-72.68143 42.4443...
We can get a quick plot of the buildings in this layer by simply piping the layer into plot()
.
There is a tutorial, or “vignette” in R parlance, that gives you some context, examples, and analysis of this data, available by typing vignette("macleish")
.
macleish
data.leaflet
When working with multiple layers, it can be nicer to create an interactive plot so the user can get context about the data by hovering.
We can use yet another plotting library well suited for spatial data, called leaflet
. Here is an example, showing a particular building in the context of buildings and forests.
First, we create a dataset for the particular building of interest
Next we can create an initial plot, which we will add to later:
base_plot <- leaflet() %>%
addTiles() %>%
addMarkers(
lng = ~lon,
lat = ~lat,
data = bechtel,
popup = "Bechtel Environmental Classroom")
base_plot %>%
addPolygons(
data = macleish_layers %>% pluck("buildings"),
weight = 1,
popup = ~name) %>%
addPolygons(
data = macleish_layers %>% pluck("forests"),
weight = 1,
fillOpacity = 0.2,
popup = ~ Sheet1__Na)
#lab17
on Slack