STAT 113: Lab 1

Scroll down to the section labeled ## Using RMarkdown

Using RMarkdown

RMarkdown is a file format that allows us to interleave text (with some simple formatting control), code, the output of code, and plots produced by code, in a single file.

This gives us less to keep track of, and it also ensures that our data, our code and the output or plots we’re reporting are consistent with each other.

Since the whole document is plain text, it’s small, fully portable, and we don’t need to use any paid software to open it.

However, even though what you’re looking at now is plain text, we can produce a more elegant looking version of the document by “compiling”, or what’s called “Knitting”, the plain text into one of several formats: HTML, PDF, or Word.

Try this now: Look for a button above that says “Knit”. In the dropdown menu, select HTML. You should get a popup window (your popup blocker might complain initially; just allow it to display the popup).

As you work on a project or assignment, you should periodically Knit your file to verify that your formatting is correct. Do it often enough that if you get an error, you don’t have to hunt through a large swath of text and code to locate the source.

The document “metadata”

At the very top of this file, above the three dashes, is a section that defines some “metadata” for this document. This includes things like the title, author, and formatting instructions for the document as a whole.

Before moving on, fill in the “Author” field in the metadata section with your names.

Section headers

The ## in this context indicates a heading. A single # marks a document-title-level heading; each additional # marks a deeper level of subheading.

Subheadings

The section “Subheadings” is a subsection of “Section headers”.

Text formatting

Adding emphasis

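Text can be decorated with bold (wrap it in double asterisks, **like this**) or italics (wrap it in single asterisks, *like this*, or equivalently in single underscores, _like this_).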

Lists

We can make bulleted lists like this:

* My first item
* My second item

Be sure to put a space after the * when you are creating bullets and a space after # when creating section headers.

Code chunks

Creating and running code chunks

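We can embed “chunks” of R code inside our document by enclosing the code in a “fence”: an opening line of three backticks followed by {r}, and a closing line of three backticks. The contents of a chunk look like this: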

## Code and comments go in here

Add a blank line after the word ‘here’ in the chunk above, and in that line, type 2 + 2.

To see the output of the chunk, click on the “Play” button (the green triangle in the upper right corner of the chunk).

You can insert a new R code chunk either with the Insert > R menu in RStudio, or with a keyboard shortcut (see the Code dropdown menu for the shortcut for “Insert Chunk”). Or you can just type the “fence” directly (though this is easy to mess up at first).

Chunks Producing Graphics

If the code of an R chunk produces a plot, this plot can be displayed in the resulting file.

The chunk below creates a scatterplot using a dataset called Births78. Don’t worry about what the code does for now; we’ll get to that soon.

For now, just run the chunk using the “Run chunk” button (it looks like a ‘play’ button).

gf_point(births ~ date, data = Births78)

Running previous chunks

Sometimes the code in one chunk requires code in previous chunks to have run before it will work. You can run all previous chunks with the second-to-rightmost button in the upper right.

Aside: Chunk settings

There are also some chunk settings you can modify by clicking the “gear”. These include

whether or not we want the code in that chunk to be run when we produce the output file
whether we want the code itself to appear in the output
whether we want the results of the code to appear in the output
how large or small we want plots produced by the code chunk to be
as well as others
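As a rough sketch, the settings listed above correspond to knitr chunk options. The gear menu writes them into the header of a single chunk; the example below instead sets them as defaults for every chunk, and it assumes the knitr package is installed (it is the package RStudio uses to Knit). The particular values are only illustrations.

```r
# Assumption: knitr is installed. These are document-wide defaults;
# the gear menu sets the same options for one chunk at a time.
knitr::opts_chunk$set(
  eval = TRUE,         # run the code when the document is knit
  echo = TRUE,         # show the code itself in the output
  results = "markup",  # show printed results ("hide" suppresses them)
  fig.width = 6,       # plot width, in inches
  fig.height = 4       # plot height, in inches
)
```

A call like this would normally go in a chunk near the top of the document, so that it applies to everything below it.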

R output

Other forms of R output are also displayed as they are produced. Again, don’t worry about the specific command or the meaning of the output for now; we’re just trying to get a feel for how chunks work.

favstats(~ births, data = Births78)

##   min   Q1 median   Q3   max     mean       sd   n missing
##  7135 8554   9218 9705 10711 9132.162 817.8821 365       0

“Knitting” the Markdown document

Once again, to “render” the complete document, we “Knit” the file, either to HTML, PDF, or Word.
Just select the desired output file type and click on Knit HTML, Knit PDF, or Knit Word (there are also keyboard shortcuts to Knit to the default format if you prefer that).

Try this again, to get into the habit: Knit this document into an HTML file, and look over it to see how each section is rendered.

Objects and Variables

R, like other programming languages, stores information (such as data) in objects, which are given labels so that we can refer to them as we are working.

Some kinds of objects are

text strings, which are used for labels, or the values of categorical variables (like “blue”), among other things.
numbers, on which arithmetic can be performed
logical values, TRUE and FALSE (note the all caps).

There are also objects called vectors, which are like lists whose entries can be text strings, numbers, or logical values.
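A quick sketch of these object types, using only base R. The specific values are just illustrations, and `class()` and `c()` are standard R functions that report an object's type and combine values into a vector.

```r
class("blue")   # a text string
class(3.5)      # a number
class(TRUE)     # a logical value (all caps)

# c() combines values into a vector
c("red", "green", "blue")
c(1, 2, 3) * 2            # arithmetic is applied to each entry
c(TRUE, FALSE, TRUE)
```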

When we want to use an object a lot (such as a numeric value, like a mean, from some statistical computation), it is helpful to give it a name so we can refer to it by what it represents, instead of by its value.

Assignment

We can give a name to an object using an expression of the form

name <- value

This process is called assignment, because we are “assigning” the value to a container with the name we’ve chosen. The named thing is called a variable (which means something a bit different than a variable in the statistical sense, although a variable in code can refer to a statistical variable).

For example:

myName <- "Colin Dawson"
myAge <- 38

You can read the <- symbol as “gets”, as in “(the name) myName gets (the value) "Colin Dawson"”. Notice that there is just one hyphen in the arrow. A common error is to add an extra hyphen to the arrow, which R will misinterpret as a minus sign.

It is also legal to use underscores and digits in variable names, but neither can be used at the beginning of a name.
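The short sketch below illustrates the extra-hyphen pitfall and the naming rule. The names `oops` and `age_2020` are made up for this example.

```r
myAge <- 38      # myAge gets 38

# Extra hyphen: R reads this as oops <- (-38), so no error is reported,
# but the stored value is negative.
oops <-- 38
oops

# Underscores and digits are fine after the first character.
age_2020 <- 38
age_2020
```

Because the extra hyphen does not produce an error, it is worth printing a variable after defining it to check that it holds what you expect.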

Assigning the Result of a Command

We can also store the result of a command in a named variable. A simple example is the following:

myResult <- sqrt(25)

Now if I type the name of the new variable at the console, R will print out its contents:

myResult

## [1] 5

The 1 in brackets indicates that the next value shown is the first entry in the variable called myResult. In this case the variable has only one entry, but sometimes we will hold lists of data or other values in a variable. Note that names are case sensitive: if you try to access the variable MyResult, you will get an error, because you defined it with a lower case “m”.
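Here is a small sketch of both points, assuming a fresh session in which nothing called MyResult has been defined. `exists()` is a base R function that checks whether a name is currently defined.

```r
myResult <- sqrt(25)
myResult              # a single entry, labeled [1]

exists("myResult")    # TRUE
exists("MyResult")    # FALSE in a fresh session: names are case sensitive

# A longer vector wraps onto several printed lines; each line begins
# with the bracketed index of its first entry.
seq(10, 300, by = 10)
```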

We can also use variables in place of values in later computations, such as in:

a_squared <- 3^2
b_squared <- 4^2
a_squared_plus_b_squared <- a_squared + b_squared

Notice that if we run the chunk that defines these variables, we will see them appear in the Environment tab in the upper right pane. This shows us everything we’ve defined.
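Continuing the example above, a variable can also be passed to a function as an argument. In this minimal sketch the name `c_length` is hypothetical and not part of the lab.

```r
a_squared <- 3^2
b_squared <- 4^2
a_squared_plus_b_squared <- a_squared + b_squared

# Pass the variable to sqrt() instead of typing the number 25
c_length <- sqrt(a_squared_plus_b_squared)
c_length
```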

Make a new code chunk. In it, define variables with your name, your birth year, and the current year. Run the chunk and verify that the variables you created appear in the Environment tab.
Now, in a new chunk, create a variable that calculates your age by doing arithmetic with the variable for your birth year and the one for the current year.

The Global Environment vs the Knitting Environment

The Environment tab will also contain any variables that we defined in other documents, at the console, or in chunks that we’ve since deleted. This can cause problems, because variables can wind up referring to things that don’t exist in the current document, or to things that should have a different value in the current document.

Fortunately, when we Knit our document, the rendering program ignores the interactive environment and creates its own encapsulated environment that only contains variables we’ve defined in the current document (and similarly, only allows us to use datasets and functionality from packages that have been loaded in our document).

This means that we can only use variables in our document that have been defined prior to the point when we refer to them. If we try to use a variable above the chunk where it’s defined, it may work when we’re running chunks interactively (provided we’ve previously run the chunk where it’s defined), but it won’t work when we try to Knit. This is another reason why Knitting every so often is a good idea, since it helps us catch errors in our document that we might otherwise miss.
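One way to picture this is with a small sketch in plain R. It is only an analogy for what Knit does, not the actual mechanism: `clean_env` is a hypothetical name for a fresh environment that can see base R functions but not variables defined interactively.

```r
# Defined interactively (for example at the console), so it lives in the
# global environment.
theAnswer <- 42

# A fresh environment that sees base R but not our interactive variables.
# (clean_env is an illustrative name; Knit sets up its own environment.)
clean_env <- new.env(parent = baseenv())

# Evaluating here fails; capture the error message instead of stopping.
result <- tryCatch(
  eval(quote(theAnswer * 2), envir = clean_env),
  error = function(e) conditionMessage(e)
)
result

# Once the variable is defined inside that environment, the same code works.
assign("theAnswer", 42, envir = clean_env)
fixed <- eval(quote(theAnswer * 2), envir = clean_env)
fixed
```

The fix mirrors what we do in a document: define the variable in a chunk that comes before the chunk that uses it.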

Try defining a variable called theAnswer at the console (rather than in a code chunk), and assign it the value 42. Then, create a code chunk that refers to theAnswer in an expression that computes twice the answer. What should happen is that the chunk will run fine when you just try to run it by itself, but if you try to Knit you’ll get an error.
Fix the error by adding a code chunk in an appropriate spot that defines theAnswer within the document.

Turning in Files on the Server

Once you’re done, go to the Files tab in the lower right, and find the folder inside stat113 called turnin. Within that there’s a folder called hw1. Do Save As... with your .Rmd file, and put it in the stat113/turnin/hw1 folder as lab1.Rmd. Now, go back to your project folder in the Files tab, and find the Knitted file called lab1.html. In the Files tab, click the “gear” icon that says “More”, and choose “Copy To…”. Save a copy of lab1.html in stat113/turnin/hw1 as well.

Documenting file creation

It’s useful to record some information about how your file was created at the very end of the file. I will typically include the following ‘footer’ in the templates I provide you.

File creation date: 2020-09-02
R version 3.6.0 (2019-04-26)
R version (short form): 3.6.0
mosaic package version: 1.5.0
tidyverse package version: 1.2.1
Additional session information

## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] mosaic_1.5.0      Matrix_1.2-17     mosaicData_0.17.0
##  [4] ggformula_0.9.1   ggstance_0.3.1    lattice_0.20-38  
##  [7] forcats_0.4.0     stringr_1.4.0     dplyr_0.8.3      
## [10] purrr_0.3.2       readr_1.3.1       tidyr_1.0.0      
## [13] tibble_2.1.3      ggplot2_3.2.1     tidyverse_1.2.1  
## 
## loaded via a namespace (and not attached):
##  [1] ggrepel_0.8.1    Rcpp_1.0.2       lubridate_1.7.4  assertthat_0.2.1
##  [5] zeallot_0.1.0    digest_0.6.21    mime_0.7         R6_2.4.0        
##  [9] cellranger_1.1.0 backports_1.1.4  evaluate_0.14    httr_1.4.0      
## [13] pillar_1.4.2     rlang_0.4.0      lazyeval_0.2.2   readxl_1.3.1    
## [17] rstudioapi_0.10  rmarkdown_1.15   labeling_0.3     splines_3.6.0   
## [21] htmlwidgets_1.3  munsell_0.5.0    shiny_1.3.2      broom_0.5.2     
## [25] compiler_3.6.0   httpuv_1.5.1     modelr_0.1.4     xfun_0.9        
## [29] pkgconfig_2.0.3  htmltools_0.3.6  tidyselect_0.2.5 gridExtra_2.3   
## [33] mosaicCore_0.6.0 crayon_1.3.4     withr_2.1.2      later_0.8.0     
## [37] MASS_7.3-51.4    grid_3.6.0       xtable_1.8-4     nlme_3.1-140    
## [41] jsonlite_1.6     gtable_0.3.0     lifecycle_0.1.0  magrittr_1.5    
## [45] scales_1.0.0     cli_1.1.0        stringi_1.4.3    promises_1.0.1  
## [49] leaflet_2.0.2    xml2_1.2.0       ggdendro_0.1-20  generics_0.0.2  
## [53] vctrs_0.2.0      tools_3.6.0      glue_1.3.1       hms_0.4.2       
## [57] crosstalk_1.0.0  yaml_2.2.0       colorspace_1.4-1 rvest_0.3.4     
## [61] knitr_1.25       haven_2.1.0
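The template's own footer code is not shown here. As a minimal sketch, the pieces of a footer like the one above can be produced with standard R calls; the variable names are made up for this example, and the mosaic version is only looked up if that package is installed.

```r
# Pieces of a footer, using base R and the utils package only.
creation_date   <- Sys.Date()                    # today's date
r_version_long  <- R.version.string              # full version string
r_version_short <- as.character(getRversion())   # short form, e.g. "3.6.0"

# Package versions are only available for installed packages, so check first.
if (requireNamespace("mosaic", quietly = TRUE)) {
  mosaic_version <- as.character(utils::packageVersion("mosaic"))
} else {
  mosaic_version <- NA_character_
}

creation_date
r_version_long
r_version_short
mosaic_version

# Full listing of the R version, platform, and loaded packages
utils::sessionInfo()
```

Recording this information makes it easier to work out later why a document that once Knit cleanly behaves differently under a newer version of R or of a package.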