sf
packageWe will attempt to take advantage of the bleeding edge tidyverse friendly R package for geospatial data, called sf
. Even though our textbook is less than a year old, this R package had not been released in its final form at the time of publication. However, compared to the previous state of the art, it simplifies the interface for interacting with spatial layers substantially and makes it play nicely with other tidyverse packages, most especially dplyr
and ggplot
. What a time to be alive!
Installing sf
is a bit of a hassle, so I suggest working on the rstudiopro
server for this lab, even if you normally work on your own local installation.
Before you go further, make sure you are able to load the sf
package, with library()
. If you get an error, it may be because you have installed packages to your user account that have different versions than the system-wide ones. If this comes up let me know and I will show everyone how to get around it.
The sf
package comes with some example datasets. One of these is about cases of Sudden Infant Death Syndrome in North Carolina. The dataset is described at this link (key excerpt quoted below)
“This data set was presented first in Symons et al. (1983), analysed with reference to the spatial nature of the data in Cressie and Read (1985), expanded in Cressie and Chan (1989), and used in detail in Cressie (1991). It is for the 100 counties of North Carolina, and includes counts of numbers of live births (also non-white live births) and numbers of sudden infant deaths, for the July 1, 1974 to June 30, 1978 and July 1, 1979 to June 30, 1984 periods. In Cressie and Read (1985), a listing of county neighbours based on shared boundaries (contiguity) is given, and in Cressie and Chan (1989), and in Cressie (1991, pp. 386–389), a different listing based on the criterion of distance between county seats, with a cutoff at 30 miles. The county seat location coordinates are given in miles in a local (unknown) coordinate reference system. The data are also used to exemplify a range of functions in the S-PLUS spatial statistics module user’s manual (Kaluzny et al., 1996)”
Read in the data as follows:
Code:
library(sf)
## Linking to GEOS 3.6.2, GDAL 2.2.4, proj.4 5.0.1
nc <- system.file("shape/nc.shp", package = "sf") %>%
st_read()
## Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.3/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID): 4267
## proj4string: +proj=longlat +datum=NAD27 +no_defs
The main advantage of sf
over older packages is that we can interact with the nc
object as a regular data.frame
.
For example, we can view the AREA
, NAME
and geometry
variables with select()
as we normally would.
Code:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
nc %>% select(AREA, NAME, geometry)
## Simple feature collection with 100 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID): 4267
## proj4string: +proj=longlat +datum=NAD27 +no_defs
## First 10 features:
## AREA NAME geometry
## 1 0.114 Ashe MULTIPOLYGON (((-81.47276 3...
## 2 0.061 Alleghany MULTIPOLYGON (((-81.23989 3...
## 3 0.143 Surry MULTIPOLYGON (((-80.45634 3...
## 4 0.070 Currituck MULTIPOLYGON (((-76.00897 3...
## 5 0.153 Northampton MULTIPOLYGON (((-77.21767 3...
## 6 0.097 Hertford MULTIPOLYGON (((-76.74506 3...
## 7 0.062 Camden MULTIPOLYGON (((-76.00897 3...
## 8 0.091 Gates MULTIPOLYGON (((-76.56251 3...
## 9 0.118 Warren MULTIPOLYGON (((-78.30876 3...
## 10 0.124 Stokes MULTIPOLYGON (((-80.02567 3...
There is also a plot method for this type of object that maket it very easy to see the data in map form.
The following code will produce a map of the counties in North Carolina, colored by their size in land area. Code:
nc %>% select(AREA, geometry) %>% plot()
If we use the latest version of ggplot2
, we can also use the geom_sf()
function to create a similar plot.
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:dplyr':
##
## vars
names(nc)
## [1] "AREA" "PERIMETER" "CNTY_" "CNTY_ID" "NAME"
## [6] "FIPS" "FIPSNO" "CRESS_ID" "BIR74" "SID74"
## [11] "NWBIR74" "BIR79" "SID79" "NWBIR79" "geometry"
head(nc)
## Simple feature collection with 6 features and 14 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
## epsg (SRID): 4267
## proj4string: +proj=longlat +datum=NAD27 +no_defs
## AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74
## 1 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091
## 2 0.061 1.231 1827 1827 Alleghany 37005 37005 3 487
## 3 0.143 1.630 1828 1828 Surry 37171 37171 86 3188
## 4 0.070 2.968 1831 1831 Currituck 37053 37053 27 508
## 5 0.153 2.206 1832 1832 Northampton 37131 37131 66 1421
## 6 0.097 1.670 1833 1833 Hertford 37091 37091 46 1452
## SID74 NWBIR74 BIR79 SID79 NWBIR79 geometry
## 1 1 10 1364 0 19 MULTIPOLYGON (((-81.47276 3...
## 2 0 10 542 3 12 MULTIPOLYGON (((-81.23989 3...
## 3 5 208 3616 6 260 MULTIPOLYGON (((-80.45634 3...
## 4 1 123 830 2 145 MULTIPOLYGON (((-76.00897 3...
## 5 9 1066 1606 3 1197 MULTIPOLYGON (((-77.21767 3...
## 6 7 954 1838 5 1237 MULTIPOLYGON (((-76.74506 3...
nc %>%
ggplot(aes(fill = AREA)) +
geom_sf()
Using the NC dataset and your wrangling and visualization skillz, create a plot showing the geographic distribution of the number of births in each of two years: 1974 (in the BIR74
variable) and 1979 (in the BIR790
variable), where year is a faceting variable. (Hint: You will need a gather()
)
Create a similar plot, this time showing the number of SIDS cases in these two years, normalized by the number of births.
A more complex dataset comes from the macleish
package, which has spatial data coupled with weather data recorded from two monitor stations in Massachussetts.
To get the data first load the macleish
package with library()
.
The data consists of multiple layers, each in the form of an sf
data frame, and collected into an R list called macleish_layers
. See the names of the layers with names(macleish_layers)
.
Code:
library(macleish)
## Loading required package: etl
names(macleish_layers)
## [1] "landmarks" "forests" "streams"
## [4] "challenge_courses" "buildings" "wetlands"
## [7] "boundary" "research" "soil"
## [10] "trails" "contours_30ft" "contours_3m"
To access individual layers in this list, we need to use R’s double square bracket syntax, which you may not have used before. In a nutshell, when working with a proper list object, single square brackets (macleish_layers[3]
or macleish_layers[3:4]
) return a sublist, whereas double brackets actually return the entries in the list. We can refer to the entries by name using quotes.
For example, the buildings layer: Code:
macleish_layers[["buildings"]]
## Simple feature collection with 27 features and 1 field
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -72.68251 ymin: 42.44104 xmax: -72.67923 ymax: 42.44919
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
## name geometry
## 1 0 POLYGON ((-72.68056 42.4484...
## 2 0 POLYGON ((-72.68051 42.4483...
## 3 0 POLYGON ((-72.68123 42.4464...
## 4 0 POLYGON ((-72.68116 42.4463...
## 5 0 POLYGON ((-72.68134 42.4461...
## 6 0 POLYGON ((-72.68135 42.4451...
## 7 0 POLYGON ((-72.68137 42.4450...
## 8 0 POLYGON ((-72.68102 42.4445...
## 9 0 POLYGON ((-72.68098 42.4444...
## 10 0 POLYGON ((-72.68143 42.4443...
We can get a quick plot of the buildings in this layer by simply piping the layer into plot()
.
macleish_layers[["buildings"]] %>% plot()
There is a tutorial, or “vignette” in R parlance, that gives you some context, examples, and analysis of this data, available by typing vignette("macleish")
.
macleish
data.leaflet
When working with multiple layers, it can be nicer to create an interactive plot so the user can get context about the data by hovering.
We can use yet another plotting library well suited for spatial data, called leaflet
. Here is an example, showing a particular building in the context of buildings and forests.
Code:
library(leaflet)
## First create a dataset for the particular building
bechtel <- data.frame(lat = 42.449167, lon = -72.679389)
## Now create a plot, which we will add to
base_plot <- leaflet() %>%
addTiles() %>%
addMarkers(lng = ~lon, lat = ~lat, data = bechtel,
popup = "Bechtel Environmental Classroom")
base_plot %>%
addPolygons(data = macleish_layers[["buildings"]],
weight = 1, popup = ~name) %>%
addPolygons(data = macleish_layers[["forests"]],
weight = 1, fillOpacity = 0.2, popup = ~ Sheet1__Na)
Upload your plot for Exercise 4 to #lab14 on Slack by next Wednesday’s class (we will work on this lab for part of next Monday if needed).