ggplot2
Gain familiarity with the way that color palettes and aesthetic themes are specified in ggplot2, so that you can use both in your own visualizations.
The examples will use the storms dataset, which is included in the dplyr package (both dplyr and ggplot2 are loaded by tidyverse, so it suffices to run library(tidyverse)). The data come from the NOAA Atlantic hurricane database and include the positions and attributes of 198 tropical storms, measured at six-hour intervals during each storm's lifetime.
Load the storms data, and examine the documentation using the ?datasetname syntax to see the definitions of the variables.

```r
library(tidyverse)  # loads dplyr, which contains the storms data
data(storms)
?storms
# The last line generally wouldn't go in a Markdown document,
# since it opens an external help window
```
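Create a boxplot showing the distribution of wind speed (wind) for each storm type (status), and save it as wind_boxplot.

You might notice that in the boxplot you get, the storm types are ordered alphabetically. Here, though, the types have a natural ordering based on their intensity: tropical depressions are the least severe, and hurricanes the most. The plot will read more naturally if it respects this ordering, which we can impose by converting status to a factor whose levels are listed from least to most severe.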
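A minimal sketch of both steps, assuming the three status values (tropical depression, tropical storm, hurricane) found in the version of storms this lab was written against. Newer releases of dplyr add further status values, so this sketch keeps only these three before setting the factor levels; the names storm_types and storms_ordered are illustrative, not from the original lab.

```r
library(tidyverse)

# Storm types from least to most severe (assumed status values)
storm_types <- c("tropical depression", "tropical storm", "hurricane")

# Keep only these types, and store status as a factor whose level order
# follows intensity rather than the alphabet
storms_ordered <- storms %>%
  filter(status %in% storm_types) %>%
  mutate(status = factor(status, levels = storm_types))

# Boxplot of wind speed for each storm type, now in order of severity
wind_boxplot <- storms_ordered %>%
  ggplot(aes(x = status, y = wind)) +
  geom_boxplot()

wind_boxplot
```

ggplot2 draws the categories of a discrete axis in the order of the factor's levels, which is why changing the levels is enough to reorder the boxes.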
We can add some color to the plot by adding a redundant mapping: in addition to mapping status to the x dimension, we'll also map it to the fill feature (the interior color of the boxplots).
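A sketch of the redundant mapping, again assuming the three storm types used above (storm_types and storms_ordered are illustrative names, redefined here so the block runs on its own):

```r
library(tidyverse)

# Same assumed storm types as before, ordered by severity
storm_types <- c("tropical depression", "tropical storm", "hurricane")
storms_ordered <- storms %>%
  filter(status %in% storm_types) %>%
  mutate(status = factor(status, levels = storm_types))

# status is mapped twice: to the x position and to the fill color
wind_boxplot <- storms_ordered %>%
  ggplot(aes(x = status, y = wind, fill = status)) +
  geom_boxplot()

wind_boxplot
```

Because the fill legend only repeats the x-axis labels, you may prefer to hide it by adding theme(legend.position = "none").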
The default color palette looks pretty nice, but just for fun let’s see how we could change it.
One option is to specify the colors directly, using hexadecimal codes for their RGB values. The colors used here come from one of my personal favorite palettes (you might recognize it from my website, for example): the Solarized palette by Ethan Schoonover. Each code is a # followed by six characters: the first two represent the red channel (on a 0-255 scale, written in base 16, so 00 to ff), the next two represent the green channel, and the last two represent the blue channel.
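To see the encoding at work, base R can split a code into its channels. This sketch uses the Solarized red, #dc322f, purely as an illustration:

```r
# Hex code for Solarized red: "dc" = red, "32" = green, "2f" = blue
solarized_red <- "#dc322f"

# Pull out the three two-character pairs and convert each from base 16
pairs <- substring(solarized_red, c(2, 4, 6), c(3, 5, 7))
strtoi(pairs, base = 16L)  # returns 220 50 47

# col2rgb() performs the same conversion and labels the channels
col2rgb(solarized_red)
```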
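A sketch using scale_fill_manual() with three Solarized accent colors. Which accents to use, and which storm type gets which, is an arbitrary choice made here, not necessarily the one from the original lab; the names solarized_fills, storm_types, and storms_ordered are illustrative.

```r
library(tidyverse)

# Same assumed storm types as before, ordered by severity
storm_types <- c("tropical depression", "tropical storm", "hurricane")
storms_ordered <- storms %>%
  filter(status %in% storm_types) %>%
  mutate(status = factor(status, levels = storm_types))

# One Solarized accent per storm type, matched to the levels by name
solarized_fills <- c(
  "tropical depression" = "#268bd2",  # blue
  "tropical storm"      = "#b58900",  # yellow
  "hurricane"           = "#dc322f"   # red
)

wind_boxplot <- storms_ordered %>%
  ggplot(aes(x = status, y = wind, fill = status)) +
  geom_boxplot() +
  scale_fill_manual(values = solarized_fills)

wind_boxplot
```

Naming the vector ties each color to a specific level, so the assignment does not silently shift if the levels are reordered.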
Manually specifying colors allows very precise control, but it is tedious, time-consuming, and brittle: the list of colors has to be revised whenever the number of categories changes. It is generally better and easier to use predefined color palettes instead.
A fairly large and diverse collection of color palettes was developed by Cynthia Brewer and is available in the RColorBrewer package.
After loading the package, we can view the included palettes with display.brewer.all().
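A short sketch of a few ways to browse the palettes. All of these functions and the brewer.pal.info table are part of RColorBrewer; the choice of "Set2" is only an example.

```r
library(RColorBrewer)

# Draw every palette, grouped by type
display.brewer.all()

# Draw only the palettes flagged as safe for color-blind viewers
display.brewer.all(colorblindFriendly = TRUE)

# Table of palette names with their maximum number of colors and category
# (seq = sequential, div = diverging, qual = qualitative)
head(brewer.pal.info)

# Hex codes for three colors from the qualitative "Set2" palette
brewer.pal(3, "Set2")
```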
We can apply whichever palette we choose to the fill feature with the scale_fill_brewer() function:
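A sketch with the "Pastel1" palette, which is an assumption on my part rather than the palette from the original lab; any name listed in RColorBrewer's brewer.pal.info should work. Note that scale_fill_brewer() does not require loading RColorBrewer yourself, since ggplot2 relies on it behind the scenes.

```r
library(tidyverse)

# Same assumed storm types as before, ordered by severity
storm_types <- c("tropical depression", "tropical storm", "hurricane")
storms_ordered <- storms %>%
  filter(status %in% storm_types) %>%
  mutate(status = factor(status, levels = storm_types))

wind_boxplot <- storms_ordered %>%
  ggplot(aes(x = status, y = wind, fill = status)) +
  geom_boxplot() +
  # A qualitative palette; since the storm types are ordered, a
  # sequential palette such as "Reds" would also be a reasonable choice
  scale_fill_brewer(palette = "Pastel1")

wind_boxplot
```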
Makes for a great Easter decoration. Or collection of golf shirts?
Let’s visualize the paths of some of these storms, using color to depict wind speed, so we can see how the intensity changed over the path of each storm.
To make the plot more manageable, we'll create a smaller dataset, storms1995, with only those storms that occurred in 1995.
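One way to build the subset, assuming the year column documented in ?storms:

```r
library(tidyverse)

# Keep only the observations recorded during 1995
storms1995 <- storms %>%
  filter(year == 1995)

# Number of distinct storms left in the subset
n_distinct(storms1995$name)
```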
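Map longitude to the x dimension, latitude to the y dimension, and wind speed to the color dimension, with a separate panel (facet) for each storm.

```r
# storms1995 is the 1995 subset created in the previous step
trajectory_plot <-
  storms1995 %>%
  ggplot(aes(x = long, y = lat, color = wind)) +
  geom_point() +
  facet_wrap(~name)
trajectory_plot
```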
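Wind speed is continuous, so the discrete scale_fill_brewer() approach used for the boxplots does not apply directly. ggplot2's scale_color_distiller() instead interpolates a Brewer palette over a continuous range. In this sketch the "YlOrRd" palette and direction = 1 are my own choices:

```r
library(tidyverse)

storms1995 <- storms %>%
  filter(year == 1995)

trajectory_plot <- storms1995 %>%
  ggplot(aes(x = long, y = lat, color = wind)) +
  geom_point() +
  facet_wrap(~name) +
  # Continuous version of the Brewer "YlOrRd" palette; direction = 1
  # puts yellow at low wind speeds and dark red at high ones
  scale_color_distiller(palette = "YlOrRd", direction = 1)

trajectory_plot
```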
ggthemes
In addition to controlling the features involved in the aesthetic mapping, we can control the overall look of a plot: the fonts, the style of the axes, the style and color of the background, and so on.
There are several predefined themes available in the ggthemes package, including several that mimic the style of popular publications, like the Wall Street Journal, The Economist, FiveThirtyEight, and others. Nate Silver of FiveThirtyEight used natural disasters as a running example of predictive modeling in his book The Signal and the Noise, so let's apply the FiveThirtyEight theme to our hurricane boxplot from above. Note that a theme is usually specified separately from a color palette, although many of the themes come with one or more accompanying palettes.
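A sketch that pairs theme_fivethirtyeight() with the matching scale_fill_fivethirtyeight() palette from ggthemes, assuming the same three storm types as before (storm_types and storms_ordered are illustrative names):

```r
library(tidyverse)
library(ggthemes)

# Same assumed storm types as before, ordered by severity
storm_types <- c("tropical depression", "tropical storm", "hurricane")
storms_ordered <- storms %>%
  filter(status %in% storm_types) %>%
  mutate(status = factor(status, levels = storm_types))

wind_boxplot <- storms_ordered %>%
  ggplot(aes(x = status, y = wind, fill = status)) +
  geom_boxplot() +
  # Overall look (fonts, background, grid lines) in the FiveThirtyEight style
  theme_fivethirtyeight() +
  # Color palette that ships alongside the theme
  scale_fill_fivethirtyeight()

wind_boxplot
```

Dropping either of the last two layers shows how independent they are: the theme changes the frame around the data, while the scale changes only the box colors.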
Choose a dataset to explore. You can see a list of available datasets by typing data() at the console with no arguments, and you can examine the documentation for a given dataset with the ?datasetname syntax. Use ggplot() to create a plot of something interesting from the dataset you choose, applying color and theme to style the plot. In a sentence or two, describe what your plot shows. Post your graphic and your short description to Slack. This time, let's post them to the #lab3 channel so everyone can see each other's plots (since they will presumably all be different)!

This lab was adapted by Colin Dawson for STAT 209: Data Computing and Visualization, at Oberlin College, from a similar lab by Jordan Crouser and Ben Baumer used for SDS 192: Introduction to Data Science, at Smith College.