Using Color and Themes in ggplot2

Goal

Gain familiarity with the way that color palettes and aesthetic themes are specified in ggplot2, so that you can use both in your own visualizations.

Data

The examples will use the storms dataset, which is included in the dplyr package (both dplyr and ggplot2 are loaded by tidyverse, so it suffices to do library(tidyverse)). The data comes from the NOAA Atlantic hurricane database, and includes positions and attributes of 198 tropical storms measured at six-hour intervals during each storm’s lifetime.
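As a quick sketch of the setup described above (assuming the tidyverse package is installed), the following attaches the packages and takes a first look at the data. The calls to glimpse() and dim() are illustrative additions, not part of the lab's instructions.

```r
# Attaches ggplot2 and dplyr, among other packages
library(tidyverse)

# storms ships with dplyr, so it is available once dplyr is attached
glimpse(storms)

# Number of rows (observations) and columns (variables)
dim(storms)
```

The number of storms and the set of variables can differ between dplyr releases, so the output may not match the description above exactly.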

  1. Load the storms data, and examine the documentation using the ?datasetname syntax to see the definitions of the variables.

SOLUTION

  1. Make a box plot depicting the distribution of wind speed grouped by the type of storm (tropical depression, tropical storm, or hurricane) and save it to a variable called wind_boxplot.

Side note: reordering a categorical variable

You might notice that in the box plot you get, the storm types are ordered alphabetically. In this case the types have a natural ordering based on their intensity: tropical depressions are the least severe, and hurricanes the most. The plot will be easier to read if it respects this ordering. We can do this as follows:

Code
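A minimal sketch of the reordering, under a few assumptions: status is converted to a factor whose levels are listed from weakest to strongest, and the data is first restricted to the three storm types named above, because newer dplyr releases record additional status values. The names status_order and storms_ordered are hypothetical; only wind_boxplot comes from the exercise.

```r
library(tidyverse)

# Intensity ordering, weakest to strongest
status_order <- c("tropical depression", "tropical storm", "hurricane")

storms_ordered <- storms %>%
  # Assumption: keep only the three types discussed in this lab
  filter(status %in% status_order) %>%
  # A factor's level order controls the order of the boxes on the x axis
  mutate(status = factor(status, levels = status_order))

wind_boxplot <- ggplot(storms_ordered, aes(x = status, y = wind)) +
  geom_boxplot()

wind_boxplot
```

Setting the levels explicitly keeps the ordering in the data itself, so every later plot built from storms_ordered inherits it.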

Adding color

We can add some color to the plot by adding a redundant mapping: in addition to mapping status to the x dimension, we’ll also map it to the fill feature (the interior color of the boxplots).

Code
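One way this redundant mapping might look, as a sketch: status is mapped to both x and fill inside aes(). The setup lines repeat the reordering step so the block runs on its own; status_order and storms_ordered are hypothetical names.

```r
library(tidyverse)

# Setup: restrict to the three storm types and order them by intensity
status_order <- c("tropical depression", "tropical storm", "hurricane")
storms_ordered <- storms %>%
  filter(status %in% status_order) %>%
  mutate(status = factor(status, levels = status_order))

# status drives both the position on the x axis and the interior color of each box
wind_boxplot <- ggplot(storms_ordered, aes(x = status, y = wind, fill = status)) +
  geom_boxplot()

wind_boxplot
```

Because fill is mapped to a variable, ggplot2 also adds a legend; it repeats what the x axis already says.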

Specifying colors manually

The default color palette looks pretty nice, but just for fun let’s see how we could change it.

One option is to specify the colors directly, using hexadecimal codes for their RGB values. The colors in the following example come from one of my personal favorite palettes (you might recognize it from my website, for example), the Solarized palette by Ethan Schoonover. In each code, after the leading #, the first two characters represent the red channel (on a 0-255 scale, encoded in base 16), the next two represent the green channel, and the last two represent the blue channel.

Code
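A sketch of manual color specification with scale_fill_manual(). The three hex codes are the Solarized blue, yellow, and red accents; which Solarized colors to use, and which storm type gets which, is an illustrative choice here. The names status_order, storms_ordered, and solarized_fill are hypothetical.

```r
library(tidyverse)

# Setup: ordered storm types and a box plot with status mapped to fill
status_order <- c("tropical depression", "tropical storm", "hurricane")
storms_ordered <- storms %>%
  filter(status %in% status_order) %>%
  mutate(status = factor(status, levels = status_order))
wind_boxplot <- ggplot(storms_ordered, aes(x = status, y = wind, fill = status)) +
  geom_boxplot()

# Named vector: each storm type is paired with one hex color
solarized_fill <- c(
  "tropical depression" = "#268bd2",  # Solarized blue
  "tropical storm"      = "#b58900",  # Solarized yellow
  "hurricane"           = "#dc322f"   # Solarized red
)

# Decode a hex code into its channels: red 38, green 139, blue 210
col2rgb("#268bd2")

wind_boxplot + scale_fill_manual(values = solarized_fill)
```

Naming the elements of the vector ties each color to a specific category, so the assignment does not silently change if the factor levels are reordered.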

Using a predefined palette

Manually specifying colors allows very precise control, but it’s tedious, time-consuming, and brittle. It is generally better and easier to use predefined color palettes instead.

A fairly large and diverse collection of color palettes was developed by Cynthia Brewer and is available in the RColorBrewer package.

After loading the package, we can view the included palettes with display.brewer.all():

Code
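For instance, assuming the RColorBrewer package is installed, the palettes can be browsed like this. The type argument and the brewer.pal.info table are extras beyond what the text mentions.

```r
library(RColorBrewer)

# Draw every palette in the package in a single chart
display.brewer.all()

# Restrict the chart to one family: "seq", "div", or "qual"
display.brewer.all(type = "qual")

# Table of palette names with their category and maximum number of colors
head(brewer.pal.info)
```

The palette names shown along the left edge of the chart are the values to pass to the brewer scale functions in ggplot2.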

We can apply whichever palette we choose to the fill feature with the scale_fill_brewer() function:

Code
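A sketch of applying a Brewer palette to the fill scale. The palette name "Pastel1" is an assumption made for illustration; any qualitative palette name shown by display.brewer.all() can be substituted. The names status_order and storms_ordered are hypothetical.

```r
library(tidyverse)

# Setup: ordered storm types and a box plot with status mapped to fill
status_order <- c("tropical depression", "tropical storm", "hurricane")
storms_ordered <- storms %>%
  filter(status %in% status_order) %>%
  mutate(status = factor(status, levels = status_order))
wind_boxplot <- ggplot(storms_ordered, aes(x = status, y = wind, fill = status)) +
  geom_boxplot()

# Swap the default fill colors for a named ColorBrewer palette
pastel_boxplot <- wind_boxplot + scale_fill_brewer(palette = "Pastel1")

pastel_boxplot
```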

Makes for a great Easter decoration. Or a collection of golf shirts?

  1. Test out a few other palettes to find one that looks good.

CODE AND PLOTS


Let’s visualize the paths of some of these storms, using color to depict wind speed, so we can see how the intensity changed over the path of each storm.

To make the plot more manageable, we’ll create a smaller dataset, storms1995, with only those storms that occurred in 1995.

Code
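The subsetting step can be sketched with dplyr's filter(), using the year column of storms:

```r
library(tidyverse)

# Keep only the observations recorded in 1995
storms1995 <- storms %>%
  filter(year == 1995)

# The distinct storm names left in the subset
unique(storms1995$name)
```

Each remaining storm still has many rows, one per six-hour observation, which is what makes it possible to trace a path.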

  1. With the restricted data, use facets to separate individual storms by their name, and map longitude to the x dimension, latitude to the y dimension, and wind speed to the color dimension.

SOLUTION

  1. Recreate the plot above using a different color palette of your choice. Since we are mapping a quantitative variable to color, we should use either a sequential or diverging palette. Which one do you think makes more sense here?

SOLUTION

Applying global aesthetic themes using ggthemes

In addition to controlling the features involved in the aesthetic mapping, we can control the overall look of a plot through things like the font, the style of the axes, and the style and color of the background.

There are several predefined themes made available in the ggthemes package, including several that mimic the style of popular publications, like the Wall Street Journal, The Economist, FiveThirtyEight, and others. Nate Silver of FiveThirtyEight used natural disasters as a running example of predictive modeling in his book The Signal and the Noise, so let’s apply the FiveThirtyEight theme to our hurricane box plot from above. Note that the specification of theme is in many cases separate from the specification of color palette, but many of the themes come with one or more accompanying palettes.

Code
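A sketch of applying the theme, assuming the ggthemes package is installed. The setup lines rebuild the box plot so the block runs on its own; status_order, storms_ordered, and themed_boxplot are hypothetical names, and pairing the theme with scale_fill_fivethirtyeight() is optional.

```r
library(tidyverse)
library(ggthemes)

# Setup: ordered storm types and a box plot with status mapped to fill
status_order <- c("tropical depression", "tropical storm", "hurricane")
storms_ordered <- storms %>%
  filter(status %in% status_order) %>%
  mutate(status = factor(status, levels = status_order))
wind_boxplot <- ggplot(storms_ordered, aes(x = status, y = wind, fill = status)) +
  geom_boxplot()

# The theme changes fonts, background, and grid lines;
# the fill scale is the accompanying palette from the same package
themed_boxplot <- wind_boxplot +
  theme_fivethirtyeight() +
  scale_fill_fivethirtyeight()

themed_boxplot
```

Dropping the scale_fill_fivethirtyeight() line keeps the theme but leaves whatever fill palette the plot already had, which illustrates that theme and palette are specified separately.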

  1. Find an interesting dataset from one of the R packages that’s already installed. You can list them by typing data() at the console, with no arguments, and examine the documentation for a given dataset with the ?datasetname syntax. Use ggplot() to create a plot of something interesting from the dataset you choose, applying color and theme to style the plot. In a sentence or two, describe what your plot shows. Post your graphic and your short description to Slack – this time, let’s post them to the #lab3 channel so everyone can see each other’s plots (since they will presumably all be different)!

CODE AND INTERPRETATION

This lab was adapted by Colin Dawson for STAT 209: Data Computing and Visualization, at Oberlin College, from a similar lab by Jordan Crouser and Ben Baumer used for SDS 192: Introduction to Data Science, at Smith College.