2/9/2018

RMarkdown

This presentation was made in RStudio using RMarkdown!

  • Set output: ioslides_presentation in the YAML header

Course Business

To-Dos

  • Anybody not on Slack/DataCamp yet?
  • Anybody not filled out the background survey?
  • Anybody not signed up to meet with me?
  • Anybody without a working RStudio Server account?

Lab 1 Recap

Structure of R commands

  • R commands are like sentences:

    • Functions are like verbs
    • Some arguments are like nouns (with particular roles)
    • (Other arguments are like prepositional phrases)

Example

draw(picture, recipient = "me", material = "paint")
Component    Role
draw         function
picture      argument value
recipient    argument name
"me"         argument value
material     argument name
"paint"      argument value
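
The same anatomy applies to real R functions. A minimal sketch, using the base R functions round() and seq() (chosen here purely for illustration, they are not part of the lab):

```r
## round() is the function (the "verb"); 3.14159 is an unnamed
## argument value matched by position; digits is an argument name
## and 2 is its value
rounded <- round(3.14159, digits = 2)
rounded

## Named arguments can be given in any order
a <- seq(from = 1, to = 10, by = 3)
b <- seq(by = 3, to = 10, from = 1)
identical(a, b)
```

Unnamed arguments are matched by position, so their order matters; named arguments are matched by name.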

The "Pipe" operator (%>%)

  • Takes the thing on the left (or its output, if it is a function call) and, by default, passes it as the first argument of the function on the right

  • We'll do a lot more with this later

## Two equivalent expressions
arbuthnot %>% mutate(total = boys + girls)
mutate(arbuthnot, total = boys + girls)
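
To check the equivalence directly, here is a small sketch. It assumes the dplyr package is installed, and it uses a hypothetical toy data frame called births in place of arbuthnot so that it runs on its own:

```r
library(dplyr)  # provides %>% and mutate()

## Hypothetical toy data standing in for arbuthnot
births <- data.frame(boys = c(10, 20), girls = c(12, 18))

## The pipe puts births into the first argument slot of mutate()
piped  <- births %>% mutate(total = boys + girls)
nested <- mutate(births, total = boys + girls)
identical(piped, nested)

## The left-hand side can itself be a function call; its output
## is what gets passed along
head(births, 1) %>% mutate(total = boys + girls)
```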

The mutate() function

  • Takes a data frame as the main (first) argument
  • Takes a name = expression pair defining a new variable as the second argument (e.g., total = boys + girls)
  • Output is a new data frame with a new column added
  • If we want to keep the new data frame, we need to assign it to a named "container"
## Creates the column and prints the result, but nothing is saved
mutate(arbuthnot, total = boys + girls)
## Creates the column, and creates a brand new data frame
## that has both new and old variables (now we have two
## data frames, one with and one without the new variable)
new.arbuthnot <- mutate(arbuthnot, total = boys + girls)
## Creates the column, and overwrites the original data
## with a data frame that has new and old columns
arbuthnot <- mutate(arbuthnot, total = boys + girls)
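
One way to see the difference is to inspect the column names before and after assignment. This is a sketch that assumes dplyr is installed; births is a hypothetical toy data frame used in place of arbuthnot:

```r
library(dplyr)

## Hypothetical toy data standing in for arbuthnot
births <- data.frame(boys = c(10, 20), girls = c(12, 18))

## The result is printed but not stored anywhere
mutate(births, total = boys + girls)

## The original data frame is unchanged
before <- names(births)
before

## Assigning the result back overwrites the original
births <- mutate(births, total = boys + girls)
names(births)
```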

Elements of data graphics

  • Visual cues
    • position, size, color, etc.
  • Coordinate system
    • how are data points organized?
  • Scale
    • relationship between variable and distance in space
  • Context
    • what in the world is the data about?
  • Faceting
    • What are the sub-parts (facets) of the graph?

Source: Nathan Yau, Data Points

Source: New York Times

A Perceptual Hierarchy

Cleveland and McGill (1985): people are better at judging

  • position than size
  • length than angle
  • 1D differences than 2D differences
  • 2D differences than 3D differences
  • size than color

Some dos and don'ts

  • Prefer length scale to color scale
  • Never use pie charts
  • Always bring context
  • Above all else, show the data (Edward Tufte)

The Grammar of Graphics

  • The Grammar of Graphics is a book by Leland Wilkinson (1999, 2005) that set out to define the "parts of speech" and "grammar rules" of data visualization
  • An "ontology of graphs"
  • Implemented in R in ggplot2 (Hadley Wickham, 2010)
    • ggplot (or ggplot1) was sort of a beta version; not really used today

Graphical elements in ggplot2

Graphical element                 ggplot2 object(s)
The data                          data= argument
Geometric objects                 geom_*() functions
Mappings of variables to cues     aes() function
Scales                            scale_*() functions
Faceting                          facet_wrap(), facet_grid()

The basic operation is to combine these elements with the + operator. A fully formed combination returns a plot as an R object, which can be assigned to a name, modified later, etc.
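
A short sketch of what treating a plot as an object looks like. It assumes ggplot2 is installed; the names base_plot and faceted_plot are hypothetical, and mtcars is a data set that ships with R:

```r
library(ggplot2)

## Build a plot and store it under a name; storing it does not draw it
base_plot <- ggplot(data = mtcars, aes(x = disp, y = mpg)) +
  geom_point()

## Add another element to the stored object later
faceted_plot <- base_plot + facet_wrap(~am)

## Printing the object is what draws it
print(faceted_plot)
```

Storing the base plot under a name makes it easy to try several variations without retyping the shared parts.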

The minimal template

At a minimum, a plot must have data (data=), a mapping (aes()), and at least one geometry element (geom_*()).

library(tidyverse)
ggplot(data = mtcars, aes(x = disp, y = mpg)) +
  geom_point()

Something more complex

ggplot(data = mtcars, aes(x = disp, y = mpg, color = factor(cyl))) +
  geom_point() + 
  geom_line() + 
  facet_wrap(~am) +
  scale_color_brewer(palette = "Set1")