The structure and spirit of this project are adapted from the SDS 192 “Intro to Data Science” course at Smith College, by Ben Baumer and Jordan Crouser.


The goal of this project is to create informative, accurate, and aesthetically pleasing data graphics.

Required skills

Proficiency with the ggplot2 package

Relevant background and resources


Your group will work together to write a blog post that contains one or more data graphics that tell the reader something interesting about the domain that the data comes from. The following are some examples of the kind of structure I have in mind (though most of these are longer than your post will be).

Conciseness matters here: aim for a post of roughly 200-400 words. A suggested structure is as follows (though you do not have to adhere to it exactly):

Turning in your project

Collaboration on your project should take place via GitHub commits. Your .Rmd source should be part of a GitHub repo from its inception, and changes should be recorded via commits from the account of the person who made the edit. Everyone in the group must make at least one commit.

Your final submission will consist of the .Rmd source, compiled .html, and any other files needed for the .Rmd to compile successfully. For example, if you are reading in the data from a .csv file stored in your RStudio server account, commit this file. If you are reading the dataset directly from an R package or from a URL, this is not necessary.
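As a minimal sketch of the .csv case: reading the file by a relative path keeps the .Rmd compilable for anyone who clones the repo. The file name my_data.csv is hypothetical, and the write.csv() step exists only so the sketch runs on its own; in your project the committed file would already be there.

```r
# Hypothetical file name; in a real project this file is already
# committed next to the .Rmd, so the write.csv() step is not needed.
csv_file <- "my_data.csv"

# Stand-in data so the sketch is self-contained
write.csv(head(mtcars), csv_file, row.names = FALSE)

# When knitting, a relative path is resolved from the .Rmd's own folder,
# so this line works on any machine that has the repo
my_data <- read.csv(csv_file)

nrow(my_data)
```

An absolute path into your own server account would compile for you but not for anyone else, which is why the committed file and the relative path go together.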

Whatever state those files are in at the deadline is what I will grade.


You can use any data source you want. For this first project, you are not expected to do any data wrangling, so you should spend minimal (if any) time manipulating the dataset. You might want to do some filter()ing to select subsets, and/or mutate()ing to create new variables, but that’s the extent of the wrangling you should do (and only do that if it’s appropriate for what you want to show!)
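To give a sense of how little wrangling is meant, here is a minimal sketch. It assumes the dplyr package is installed and uses the built-in mtcars data as a stand-in for your own; the cutoff and the new variable name kpl are illustrative only.

```r
library(dplyr)

# mtcars comes with R (datasets package); it stands in for your dataset
light_cars <- mtcars %>%
  filter(wt < 3) %>%           # subset: cars under 3,000 lbs (wt is in 1,000 lbs)
  mutate(kpl = mpg * 0.4251)   # new variable: kilometers per liter

head(light_cars)
```

If your graphic needs much more than this, it may be a sign that the dataset is not a good fit for this first project.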

Some possible sources for data are:

To see a list of the datasets provided by a given R package, you can type the following at the console (fill in the package name):

data(package = "packagename")

Some of the above packages are not installed, so you will first need to install them with install.packages("packagename").

Grading Rubric

The project is worth 20 points: 17 assigned to the group as a whole and 3 to each individual. The grade is based on the following criteria:

Group grade: Basic (10 pts)

  • +1: the .Rmd compiles successfully
  • +1: a description of the dataset is provided
  • +1: a graphic is included
  • +2: the graphic is generated by the code embedded in the .Rmd (not included from an external file)
  • +1: the visual (aesthetic) mapping is described in the text
  • +1: the graphic includes relevant context (title, axis labels, etc.)
  • +1: the blog post is not too long and not too short (roughly 200-400 words)
  • +2: the graphic is interpreted clearly and concisely, including a “take-home” message in no more than two sentences

Group grade: Finishing touches (7 pts)

  • +1: code, unnecessary messages, and raw R output (other than the plots) are suppressed from the .html output
  • +1: the choices made are effective and allow information to be conveyed clearly and efficiently
  • +2: the visualization choices are described in a “Methods” paragraph
  • +0-3: subjective assessment of the overall quality and polish of your post

Individual grade (3 pts)

  • +1: your individual contributions are clearly documented with commits in GitHub
  • +0-2: you carried your weight in contributing to the project
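Several of the rubric items (a graphic generated by embedded code, relevant context on the graphic, and suppressed code and messages) can be sketched in a few lines. This is one possible setup, not a required one: it assumes the knitr and ggplot2 packages are installed, uses the built-in mtcars data as a stand-in for your own, and the title and labels are placeholders.

```r
library(ggplot2)

# In the first chunk of the .Rmd: hide code, messages, and warnings
# in the compiled .html for every later chunk
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

# Aesthetic mapping: weight on x, fuel economy on y, cylinders as color
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(
    title = "Heavier cars get fewer miles per gallon",
    x = "Weight (1,000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders"
  )

p
```

Setting the options once at the top keeps the individual chunks uncluttered; the same options can also be set per chunk if only some output should be hidden.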