Scroll down to the section labeled `## Using RMarkdown`.
RMarkdown is a file format that allows us to interleave text (with some simple formatting control), code, the output of code, and plots produced by code, in a single file.
This gives us less to keep track of, and it also ensures that our data, our code and the output or plots we’re reporting are consistent with each other.
Since the whole document is plain text, it’s small, fully portable, and we don’t need to use any paid software to open it.
However, even though what you’re looking at now is plain text, we can produce a more elegant looking version of the document by “compiling”, or what’s called “Knitting”, the plain text into one of several formats: HTML, PDF, or Word.
As you work on a project or assignment, you should periodically Knit your file to verify that your formatting is correct. Do it often enough that if you get an error you don’t have to hunt through a large swath of text and code to locate the source.
At the very top of this file, above the three dashes, is a section that defines some “metadata” for this document. This includes things like the title, author, and formatting instructions for the document as a whole.
The ## in this context indicates a heading. One # is a document title level heading. More #s means a subheading.
The section “Subheadings” is a subsection of “Section headers”.
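Text can be decorated with bold (wrap it in double asterisks, as in `**bold**`) or italics (wrap it in single asterisks, as in `*italics*`, or equivalently in underscores, as in `_italics_`).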
We can make bulleted lists by starting each item on its own line with a `*`.
Be sure to put a space after the * when you are creating bullets and a space after # when creating section headers.
We can embed “chunks” of R code inside our document by containing it in a “fence”: an opening line of three backticks followed by `{r}`, then the code, then a closing line of three backticks.
To see the output of the chunk, click on the “Play” button (the green triangle in the upper right corner of the chunk).
You can insert a new R code chunk either with the Insert > R menu in RStudio, or with a keyboard shortcut (see the Code dropdown menu for the shortcut for “Insert Chunk”). Or you can just type the “fence” directly (though this is easy to mess up at first).
If the code of an R chunk produces a plot, this plot can be displayed in the resulting file.
The chunk below creates a scatterplot using a dataset called `Births78`. Don’t worry about what the code does for now; we’ll get to that soon.
Sometimes the code in one chunk requires code in previous chunks to have run before it will work. You can run all previous chunks with the second-to-rightmost button in the upper right.
There are also some chunk settings you can modify by clicking the “gear”. This includes
Other forms of R output are also displayed as they are produced. Again, don’t worry about the specific command or the meaning of the output for now; we’re just trying to get a feel for how chunks work.
##   min   Q1 median   Q3   max     mean       sd   n missing
##  7135 8554   9218 9705 10711 9132.162 817.8821 365       0
Once again, to “render” the complete document, we “Knit” the file, either to HTML, PDF, or Word.
Just select the desired output file type and click on `Knit HTML`, `Knit PDF`, or `Knit Word` (there are also keyboard shortcuts to Knit to the default format if you prefer that).
R, like other programming languages, stores information (such as data) in objects, which are given labels so that we can refer to them as we are working.
Some kinds of objects are numbers, text strings, and logical values, which are either `TRUE` or `FALSE` (note the all caps). There are also objects called vectors, which are like lists whose entries can be text strings, numbers, or logical values.
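As a quick illustration of these kinds of objects, here is a minimal sketch using the base R functions `class()`, `c()`, and `length()`. The specific values are made up for the example.

```r
# class() reports what kind of object a value is
class(42)        # a number
class("hello")   # a text string (note the quotes)
class(TRUE)      # a logical value (all caps, no quotes)

# c() combines several values into a vector
c(1978, 1979, 1980)
c("Mon", "Tue", "Wed")
c(TRUE, FALSE, TRUE)

# length() counts the entries in a vector
length(c(1978, 1979, 1980))
```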
When we want to use an object a lot (such as a numeric value, like a mean, from some statistical computation), it is helpful to give it a name so we can refer to it by what it represents, instead of by its values.
We can give a name to an object using an expression of the form `name <- value`.
This process is called assignment, because we are “assigning” the value to a container with the name we’ve chosen. The named thing is called a variable (which means something a bit different than a variable in the statistical sense, although a variable in code can refer to a statistical variable).
For example:
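A minimal sketch of such an assignment, using the name and value that the next paragraph refers to:

```r
# Assign the text string "Colin Dawson" to the name my.name
my.name <- "Colin Dawson"

# Typing the name by itself prints the stored value
my.name
```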
You can read the `<-` symbol as “gets”, as in “(The name) `my.name` gets (the value) `"Colin Dawson"`”. Notice that there is just one hyphen in the arrow. A common error is to add an extra hyphen to the arrow, which R will misinterpret as a minus sign.
It is also legal to use underscores and digits in variable names, but a name cannot begin with an underscore or a digit.
We can also store the result of a command in a named variable. A simple example is the following:
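As a stand-in for the command, here is a sketch that assumes simple arithmetic. The expression `2 + 3` is an assumption, chosen only because it gives the value 5 that is printed further down.

```r
# Assumed command: any expression that evaluates to 5 would do here
myResult <- 2 + 3
```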
Now if I type the name of the new variable at the console, R will print out its contents:
## [1] 5
The 1 in brackets is there to indicate that the next value shown is the first entry in the variable called `myResult`. (Note that if you try to access the variable `MyResult`, you will get an error, because you defined it with a lower case “m”.) In this case the variable has only one entry, but sometimes we will hold lists of data or other values in a variable.
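Both points can be checked with a short sketch. The value given to `myResult` is assumed to match the output above, and `manyValues` is a hypothetical name introduced only for this illustration.

```r
# Assumed value, chosen to match the printed output above
myResult <- 5

# Names are case-sensitive: exists() checks whether a name is defined
exists("myResult")
exists("MyResult")

# With many entries, each printed row starts with the index of its first value
manyValues <- 101:140
manyValues
```

How many values appear on each row depends on the width of your console, so the bracketed numbers you see may differ from someone else's.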
We can also use variables as the values of arguments, such as in:
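One possible illustration, using the base R function `round()`. The variable names and values here are hypothetical.

```r
# Hypothetical values, chosen only for illustration
myValue <- 3.14159
numberOfDigits <- 2

# The variables take the place of literal values in the arguments of round()
rounded <- round(myValue, digits = numberOfDigits)
rounded
```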
Notice that if we run the chunk that defines these variables, we will see them appear in the Environment tab in the upper right pane. This shows us everything we’ve defined.
Make a new code chunk. In it, define variables with your name, your birth year, and the current year. Run the chunk and verify that the variables you created appear in the Environment tab.
Now, in a new chunk, create a variable that calculates your age by doing arithmetic with the variable for your birth year and the one for the current year.
The Environment tab will also contain any variables that we defined in other documents, at the console, or in chunks that we’ve since deleted. This can cause problems, because variables can wind up referring to things that don’t exist in the current document, or to things that should have a different value in the current document.
Fortunately, when we Knit our document, the rendering program ignores the interactive environment and creates its own encapsulated environment that only contains variables we’ve defined in the current document (and similarly, only allows us to use datasets and functionality from packages that have been loaded in our document).
This means that we can only use variables in our document that have been defined prior to the point when we refer to them. If we try to use a variable above the chunk where it’s defined, it may work when we’re running chunks interactively (provided we’ve previously run the chunk where it’s defined), but it won’t work when we try to Knit. This is another reason why Knitting every so often is a good idea, since it helps us catch errors in our document that we might otherwise miss.
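The order dependence can be imitated within a single chunk. This is a rough sketch of the idea, not of how Knit works internally: the first expression plays the role of a chunk placed above the definition, the variable name `currentTotal` is hypothetical, and `tryCatch()` is used only so that the error is captured and the later lines still run.

```r
# currentTotal has not been defined yet, so using it fails.
# tryCatch() captures the error message instead of stopping.
firstTry <- tryCatch(
  currentTotal * 2,
  error = function(e) conditionMessage(e)
)
firstTry

# Define the variable, then repeat the same expression: now it works
currentTotal <- 21
secondTry <- currentTotal * 2
secondTry
```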
Try defining a variable called `theAnswer` at the console (rather than in a code chunk), and assign it the value 42. Then, create a code chunk that refers to `theAnswer` in an expression that computes twice the answer. What should happen is that the chunk will run fine when you just try to run it by itself, but if you try to Knit you’ll get an error.
Fix the error by adding a code chunk in an appropriate spot that defines `theAnswer` within the document.
In the `Files` tab in the lower right, find the folder inside `stat113` called `turnin`. Within that there’s a folder called `hw1`. Do `Save As...` with your `.Rmd` file, and put it in the `stat113/turnin/hw1` folder as `lab1.Rmd`. Now, go back to your project folder in the Files tab, and find the Knitted file called `lab1.html`. In the Files tab, click the “gear” icon that says “More”, and choose “Copy To…”. Save a copy of `lab1.html` in `stat113/turnin/hw1` as well.

It’s useful to record some information about how your file was created at the very end of the file. I will typically include the following ‘footer’ in the templates I provide you.
`mosaic` package version: 1.5.0
`tidyverse` package version: 1.2.1
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] mosaic_1.5.0 Matrix_1.2-17 mosaicData_0.17.0
## [4] ggformula_0.9.1 ggstance_0.3.1 lattice_0.20-38
## [7] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3
## [10] purrr_0.3.2 readr_1.3.1 tidyr_1.0.0
## [13] tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
## [1] ggrepel_0.8.1 Rcpp_1.0.2 lubridate_1.7.4 assertthat_0.2.1
## [5] zeallot_0.1.0 digest_0.6.21 mime_0.7 R6_2.4.0
## [9] cellranger_1.1.0 backports_1.1.4 evaluate_0.14 httr_1.4.0
## [13] pillar_1.4.2 rlang_0.4.0 lazyeval_0.2.2 readxl_1.3.1
## [17] rstudioapi_0.10 rmarkdown_1.15 labeling_0.3 splines_3.6.0
## [21] htmlwidgets_1.3 munsell_0.5.0 shiny_1.3.2 broom_0.5.2
## [25] compiler_3.6.0 httpuv_1.5.1 modelr_0.1.4 xfun_0.9
## [29] pkgconfig_2.0.3 htmltools_0.3.6 tidyselect_0.2.5 gridExtra_2.3
## [33] mosaicCore_0.6.0 crayon_1.3.4 withr_2.1.2 later_0.8.0
## [37] MASS_7.3-51.4 grid_3.6.0 xtable_1.8-4 nlme_3.1-140
## [41] jsonlite_1.6 gtable_0.3.0 lifecycle_0.1.0 magrittr_1.5
## [45] scales_1.0.0 cli_1.1.0 stringi_1.4.3 promises_1.0.1
## [49] leaflet_2.0.2 xml2_1.2.0 ggdendro_0.1-20 generics_0.0.2
## [53] vctrs_0.2.0 tools_3.6.0 glue_1.3.1 hms_0.4.2
## [57] crosstalk_1.0.0 yaml_2.2.0 colorspace_1.4-1 rvest_0.3.4
## [61] knitr_1.25 haven_2.1.0
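A footer like the one above could be produced by a final chunk along these lines. This is a sketch under assumptions, since the template's actual chunk is not shown. `sessionInfo()` and `packageVersion()` are standard R functions, and the built-in `stats` package stands in for `mosaic` or `tidyverse` so that the code runs without extra installs.

```r
# Version of a single package; replace "stats" with the package of interest
# (for example "mosaic"), which must be installed for the call to succeed
pkgVersion <- as.character(packageVersion("stats"))
pkgVersion

# R version, platform, locale, and attached or loaded packages
info <- sessionInfo()
info
```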