• The final writeup is due by Sat. 5/19 at 7pm, which is the designated “final exam” time for this class.

### Goals

The specs of this project have been revised slightly from what is in the syllabus. You may elect to do a project along the lines of what is stated there, but I want to open it up a bit.

There are three options:

• Option 1: Do a new analysis using a dataset that interests you, along the same lines as project 3 (so, using a large dataset, accessed via SQL)
• Option 2: Do a new analysis involving one or more of the ideas or techniques that haven’t been part of a project yet (e.g., interactive graphics, clustering, dimensionality reduction, geographic data, text data). If you incorporate one or more of these elements, you can use any size dataset, accessed with SQL or not.
• Option 3: “Improve and extend” one of your first three projects. The “improve” part means revising elements of the original project that you weren’t entirely satisfied with the first time. The “extend” part means taking advantage of tools that you learned after you did the original project to look at the data from a different perspective, complement the analysis with additional data, apply some of the machine learning algorithms you will have learned by that point, etc. The degree of extension expected is inversely related to the scope of the original project: a revision of project 1 should be more different from the original than a revision of project 3.

For this project, you can choose your own group of up to 3 people, or work individually if you prefer. If you elect option 3, since revising an existing project is less work than creating a new one from scratch, you must work individually (but it is OK if multiple group members want to revise the same project in their own different directions).

### The final writeup

The writeup should be structured similarly to previous projects, in particular project 3 (which had a slightly more generous length limit).

### Data Visualization Requirements

As always, prioritize quality over quantity. A single really well-thought-out graph that took a fair amount of work is much better than several hasty ones. The graphs must be created within your .Rmd using ggplot2, and you should endeavor to do whatever wrangling is necessary beforehand so that the visualization pipeline is as well structured and concise as possible.

Even if you are doing some of your wrangling in dplyr, keep your wrangling pipeline(s) separate from your visualization pipeline(s), for readability’s sake.
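As a rough illustration of keeping the two pipelines separate, here is a minimal sketch. The `flights_raw` data frame, its column names, and the summary computed are hypothetical placeholders invented for this example; in a real project the data would come from your own source (for example, an SQL query).

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for whatever your project actually uses
flights_raw <- data.frame(
  carrier   = c("AA", "AA", "DL", "DL", "UA", "UA"),
  dep_delay = c(10, NA, 5, 25, 0, 40)
)

# Wrangling pipeline: ends with a small, tidy summary table
delay_summary <- flights_raw %>%
  filter(!is.na(dep_delay)) %>%
  group_by(carrier) %>%
  summarize(mean_delay = mean(dep_delay), .groups = "drop")

# Visualization pipeline: starts from the already-wrangled table
delay_plot <- ggplot(delay_summary, aes(x = carrier, y = mean_delay)) +
  geom_col() +
  labs(x = "Carrier", y = "Mean departure delay (minutes)")

delay_plot
```

Because all of the filtering and summarizing happens before `ggplot()` is called, the plotting code stays short and can be read on its own.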

### Data

You can use any of the data sources from previous projects, with one exception: you may use data from an R package only if you are revising project 1. In that case, I will expect you either to complement it with another dataset or to apply some technique from the last “quarter” of the semester.

### The GitHub workflow

• The GitHub Classroom link to create your project repo is here
• See the Project 2 description for an outline of the recommended GitHub workflow.

### Turning in your project

You should record the history of edits to your project via GitHub commits, even if you are working by yourself. If you are working alone, you may only need to push at the end, though you should probably still push occasionally to “back up” your work on another server, as a safeguard against ill-timed RStudio server unavailability.

As always, your final submission will consist of the .Rmd source, the compiled .html, and any other files needed for the .Rmd to compile successfully. Whatever state those files are in at the deadline is what I will grade.

The syllabus states that each of the four projects is to contribute 15% of the final grade. However, I am a fan of “second chances”, so I will let this final project replace the lowest of the first three project grades, and average it on equal footing with the remaining two. If that results in a higher grade than averaging all four projects, I will use it to calculate final grades.
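The “second chance” rule can be sketched as a small calculation. This is only one reading of the policy, not an official grading script: the function name `project_average`, the 0-100 grade scale, and the example grades are all assumptions made for illustration.

```r
# Hypothetical helper: project average under the "second chance" rule.
# `first_three` holds the grades for projects 1-3; `final` is the final project grade.
project_average <- function(first_three, final) {
  stopifnot(length(first_three) == 3, length(final) == 1)

  # Option A: plain average of all four projects
  avg_all_four <- mean(c(first_three, final))

  # Option B: drop the lowest of the first three, then average the
  # remaining two with the final project on equal footing
  best_two <- sort(first_three, decreasing = TRUE)[1:2]
  avg_with_replacement <- mean(c(best_two, final))

  # Whichever of the two is higher is used
  max(avg_all_four, avg_with_replacement)
}

# Made-up grades: a strong final project replaces the 70
project_average(c(70, 85, 90), final = 95)

# Made-up grades: a weak final project, so averaging all four is better
project_average(c(90, 90, 90), final = 60)
```

In the first call the replacement average (90) beats the four-project average (85); in the second, the four-project average (82.5) beats the replacement average (80), so the final project cannot lower the grade below the plain average.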

#### Rough rubric

Since the specs for this project are more open, the grading rubric is not as concrete this time, but here is a rough rubric:

#### Technical essentials (12 pts)

• +1: The .Rmd compiles successfully with no error messages
• +1: Code, unnecessary messages, and raw R output (other than the plots) are suppressed from the .html output
• +1: The visualization pipeline is kept as “clean” as possible by wrangling your data into a form that is conducive to simple visualization commands
• +1: The GitHub workflow is followed, and informative commit messages (that describe what changes were made in that commit) are included.
• +2: The technical content of the project lines up with one of the options given
• +0-6: Score based on overall ambitiousness and technical quality. I will rate the ambitiousness (A) of the project on a 0-5 scale, and its overall technical correctness (T) on a 0-6 scale. The score for this component will be calculated as (6/20) * min(20, A*T). In other words, to get full credit, the product of your ambitiousness and correctness should be at least 20 (for example, your project could be 4/5 on the ambitiousness scale and 5/6 on the correctness scale, or 5/5 on the ambitiousness scale and 4/6 on the correctness scale). You can get almost full credit by doing a project that is of average ambitiousness (3/5) but which is technically flawless (6/6). I will be impressed if your project is both extremely ambitious and flawless, but your score for this component won’t exceed 6; that is, technical flourish can’t substitute for a good writeup.
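To make the last item concrete, the formula (6/20) * min(20, A*T) can be written as a short function and applied to the examples mentioned above. The function and argument names here are hypothetical; `ambition` plays the role of A and `correctness` the role of T.

```r
# Hypothetical helper implementing (6/20) * min(20, A*T)
# ambition: A, on a 0-5 scale; correctness: T, on a 0-6 scale
component_score <- function(ambition, correctness) {
  stopifnot(ambition >= 0, ambition <= 5,
            correctness >= 0, correctness <= 6)
  (6 / 20) * min(20, ambition * correctness)
}

component_score(4, 5)  # product is 20: full credit
component_score(5, 4)  # product is 20: full credit
component_score(3, 6)  # product is 18: 5.4 out of 6
component_score(5, 6)  # product is 30, capped at 20: still 6
```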

#### Writeup essentials (13 pts)

• +1: The final post is the right length (800-1000ish words)
• +2: At least one visualization is included, as generated by embedded code
• +2: An engaging introduction and discussion are included
• +2: Each graphic is interpreted clearly and concisely in the text, including a “take-home” message in no more than two sentences
• +1: Data-wrangling methodology is included in the appendix, and is clear
• +1: Data-visualization methodology is included in the appendix, and is clear
• +0-4: Overall polish of the writeup
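
One possible way to keep code, messages, and warnings out of the body of the compiled .html, while still being able to show methodology in the appendix, is to set global knitr chunk defaults. This is a sketch of one common setup, not a required configuration; it assumes the lines are placed in a setup chunk near the top of the .Rmd.

```r
# Intended for a setup chunk near the top of the .Rmd
knitr::opts_chunk$set(
  echo    = FALSE,  # hide source code in the compiled .html
  message = FALSE,  # hide package startup and other messages
  warning = FALSE   # hide warnings
)
```

Individual chunks can override these defaults, so an appendix chunk that is meant to display its code can set `echo = TRUE` in its own chunk options.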