STAT 209: Lab 1b

Scroll down below this first code chunk.

Objects and Variables

R, like other programming languages, stores information (such as data) in objects, which are given labels so that we can refer to them as we are working.

Some kinds of objects are

text strings, which are used for labels, or the values of categorical variables (like “blue”), among other things.
numbers, on which arithmetic can be performed
logical values, TRUE and FALSE (note the all caps).

There are also objects called vectors, which are like lists whose entries can be text strings, numbers, or logical values.
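As a rough illustration of these kinds of objects, the sketch below uses two base R functions that this lab has not introduced: class(), which reports what kind of object it is given, and c(), which combines several values into a vector. The particular values are made up for the example.

```r
# class() reports the kind of object it is given
class("blue")   # a text string
## [1] "character"
class(57)       # a number
## [1] "numeric"
class(TRUE)     # a logical value
## [1] "logical"

# c() combines several values of the same kind into a vector
c(12, 16, 20)
c("red", "green", "blue")
c(TRUE, FALSE, TRUE)
```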

When we want to use an object a lot (such as a numeric value, like a mean, from some statistical computation), it is helpful to give it a name so we can refer to it by what it represents, instead of by its value.

Assignment

We can give a name to an object using an expression of the form

name <- value

This process is called assignment, because we are “assigning” the value to a container with the name we’ve chosen. The named thing is called a variable (which means something a bit different than a variable in the statistical sense, although a variable in code can refer to a statistical variable).

For example:

myName <- "Colin Dawson"
myAge <- 38

You can read the <- symbol as “gets”, as in “(the name) myName gets (the value) "Colin Dawson"”. Notice that there is just one hyphen in the arrow. A common error is to add an extra hyphen to the arrow, which R will misinterpret as a minus sign.
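To see why the extra hyphen matters, here is a small sketch with made-up variable names (score and oops are not part of the lab). R does not report an error in the second line; it quietly stores a negative number instead.

```r
score <- 5    # one hyphen: score gets 5
oops <-- 5    # two hyphens: R reads this as oops <- (-5)

score
## [1] 5
oops
## [1] -5
```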

Variable names can contain letters, digits, periods, and underscores, but a name cannot begin with a digit or an underscore.

variable3 <- 57
## underscores are ok 
variable_3 <- 57
## spaces are not
## my Name <- "Colin Dawson" # gives an error
## can't start a variable name with a digit
## 3rdvariable <- 57 # not a valid variable name
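One way to check a candidate name without triggering an error is the base R function make.names(), which this lab does not otherwise use. It takes a text string and returns the closest valid name. If the result matches what you typed, the name was already valid. This is only a sketch of that idea.

```r
make.names("variable_3")    # unchanged, so this name is valid
## [1] "variable_3"
make.names("my Name")       # the space is replaced, so the original was not valid
## [1] "my.Name"
make.names("3rdvariable")   # an X is added in front, so the original was not valid
## [1] "X3rdvariable"
```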

Assigning the Result of a Command

We can also store the result of a command in a named variable. A simple example is the following:

myResult <- sqrt(25)

Now if I type the name of the new variable at the console, or refer to it by itself in a chunk, R will print out its contents:

myResult

## [1] 5

The 1 in brackets is there to indicate that the next value shown is the first entry in the variable called myResult. (Note that if you try to access the variable MyResult, you will get an error, because you defined it with a lower case “m”.) In this case the variable has only one entry, but sometimes we will hold lists of data or other values in a variable.
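The point about upper and lower case can be checked without causing an error by using the base R function exists(), which reports whether a variable with a given name has been defined. The sketch below assumes that nothing called MyResult (capital “M”) has been defined anywhere in your session.

```r
myResult <- sqrt(25)

exists("myResult")   # lower case "m": the name we defined
## [1] TRUE
exists("MyResult")   # upper case "M": R treats this as a different name
## [1] FALSE
```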

We can also use variables in place of values in later expressions, such as in:

a_squared <- 12^2
b_squared <- 16^2

a_squared_plus_b_squared <- a_squared + b_squared

Notice that if we run the chunk that defines these variables, we will see them appear in the Environment tab in the upper right pane. This shows us everything we’ve defined.
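A variable can also be handed to a function in place of a value. As a sketch, the chunk below passes the sum to sqrt() and stores the result under a made-up name, c_length (not part of the lab). The base R function ls() then lists the names that are currently defined, which is the same information the Environment tab shows.

```r
a_squared <- 12^2
b_squared <- 16^2
a_squared_plus_b_squared <- a_squared + b_squared

# The variable is used as the argument of sqrt()
c_length <- sqrt(a_squared_plus_b_squared)
c_length
## [1] 20

# List the names of the variables defined so far
ls()
```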

Make a new code chunk. In it, define variables with your name, your birth year, and the current year. Run the chunk and verify that the variables you created appear in the Environment tab.

myName <- "Colin Dawson"
birthYear <- 1982
currentYear <- 2020

Now, in a new chunk, create a variable that calculates your age as of December 31st this year by doing arithmetic with the variable for your birth year and the one for the current year.

myAge <- currentYear - birthYear

The Global Environment vs the Knitting Environment

The Environment tab will also contain any variables that we defined in other documents, at the console, or in chunks that we’ve since deleted. This can cause problems, because variables can wind up referring to things that don’t exist in the current document, or to things that should have a different value in the current document.

Fortunately, when we Knit our document, the rendering program ignores the interactive environment and creates its own encapsulated environment that only contains variables we’ve defined in the current document (and similarly, only allows us to use datasets and functionality from packages that have been loaded in our document).

This means that we can only use variables in our document that have been defined prior to the point when we refer to them. If we try to use a variable above the chunk where it’s defined, it may work when we’re running chunks interactively (provided we’ve previously run the chunk where it’s defined), but it won’t work when we try to Knit. This is another reason why Knitting every so often is a good idea, since it helps us catch errors in our document that we might otherwise miss.

Try defining a variable called theAnswer at the console (rather than in a code chunk), and assign it the value 42. Then, create a code chunk that refers to theAnswer in an expression that computes twice the answer. What should happen is that the chunk will run fine when you just try to run it by itself, but if you try to Knit you’ll get an error.

# twiceTheAnswer <- 2 * theAnswer

# twiceTheAnswer

Fix the error by adding a code chunk in an appropriate spot that defines theAnswer within the document.
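One possible fix, sketched here, is to define theAnswer inside the document in a chunk that comes before the one that uses it. With the definition in place, the lines that were commented out above can be run and Knitted.

```r
# Define the variable within the document, above the point where it is used
theAnswer <- 42

twiceTheAnswer <- 2 * theAnswer
twiceTheAnswer
## [1] 84
```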

Functions, arguments and commands

Most of what we do in R consists of applying functions to data objects and specifying some options for the function, which are called arguments. The application of a function, together with its arguments, is called a command.

Many of the commands in R look a lot like functions from math class; that is, invoking R commands means supplying a function with some number of “inputs” (the arguments), which yields some kind of output, much as the sin() function in math takes a number as input and returns another number corresponding to its trigonometric sine.

A useful analogy is that commands are like sentences, where the function is the verb, and the arguments (one of which usually specifies the data object) are the nouns.

There is often a “main” argument that comes first. This is like the direct object of the command.

For example, in the English command, “Draw a picture for me with some paint”, the verb “draw” acts like the function (what is the listener supposed to do?); the noun “picture” is the direct object (draw what?), and “me” and “paint” are extra (in this case, optional) details that we might call the “recipient” and the “instrument”.

In the grammar of R, I could write this sentence like:

## Note: this is not real R code
draw("picture", recipient = "me", material = "paint")

We are applying the function draw() to the object "picture", and adding some additional detail about the recipient and material. Here the function is called draw, and we have a main argument with the value "picture", and additional arguments recipient and material with the values "me" and "paint", respectively.

Technically speaking, "picture" is the value of an argument too; we might have written

## Note: this is not real R code
draw(object = "picture", recipient = "me", material = "paint")

However, in practice, there is often a required first “main” argument whose name is left out of the command.

In R, arguments always go inside parentheses, and are separated by commas when there is more than one. For arguments whose names are explicitly given, the name goes to the left of the =, and the value goes to the right.
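For a real counterpart to the draw() illustration, consider the base R function round(), whose main argument is named x and whose optional argument digits sets the number of decimal places. The sketch below shows that leaving out the name of the main argument, or writing it explicitly, gives the same result. The number being rounded is just an example value.

```r
# Main argument given by position, optional argument given by name
round(3.14159, digits = 2)
## [1] 3.14

# The same command with the main argument's name written out
round(x = 3.14159, digits = 2)
## [1] 3.14

# When every argument is named, the order does not matter
round(digits = 2, x = 3.14159)
## [1] 3.14
```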

The command

log(100, base = 10)

finds the base-10 logarithm of the number 100.

We are applying the log() function to the value 100 and modifying its behavior through the optional argument base, which in this case specifies what kind of logarithm we want.

log(100, base = 2)

## [1] 6.643856

log(100)

## [1] 4.60517
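The last result differs because, when base is left out, log() uses its default, the natural logarithm (base e). As a quick check, the sketch below supplies e explicitly as exp(1) and compares the results with the base R function all.equal(), which tests whether two numbers agree up to rounding error.

```r
log(100, base = 10)
## [1] 2

# exp(1) is the number e, so this matches log(100) with no base given
log(100, base = exp(1))
## [1] 4.60517

all.equal(log(100), log(100, base = exp(1)))
## [1] TRUE
```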

Assigning the results of a function

As we have seen, when we apply a function to some arguments, it produces a result. If we simply call the function, most of the time the result is just printed out. But often we want to refer to or use that result later. In this case we can assign the result of the function call to a “container”; that is, to a named variable.

For example, if I have a variable called Income, I might want to compute and store the log of that income variable:

Income <- 42000
logIncome <- log(Income, base = 10)

Now logIncome is a variable whose value is the log (base 10) of the Income value.
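Because the result is stored, it can be used in later commands. As a sketch, the chunk below undoes the logarithm by raising 10 to the power logIncome and confirms that this recovers the original value. The name recoveredIncome is made up for the example.

```r
Income <- 42000
logIncome <- log(Income, base = 10)

# Raising 10 to the stored log undoes the base-10 logarithm
recoveredIncome <- 10^logIncome

# all.equal() checks agreement up to rounding error
all.equal(recoveredIncome, Income)
## [1] TRUE
```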

Once you’re done, go to the Files tab in the lower right, and find the folder inside stat209 called turnin. Within that there’s a folder called lab1. Do Save As... with your .Rmd file, and put it in the stat209/turnin/lab1 folder as lab1b.Rmd. Now, go back to your project folder in the Files tab, and find the Knitted file called lab1b.html. In the Files tab, click the “gear” icon that says “More”, and choose “Copy To…”. Save a copy of lab1b.html in stat209/turnin/lab1 as well.

Documenting file creation

It’s useful to record some information about how your file was created at the very end of the file. I will typically include the following ‘footer’ in the templates I provide you.

File creation date: 2021-05-27
R version 3.6.0 (2019-04-26)
R version (short form): 3.6.0
mosaic package version: 1.5.0
tidyverse package version: 1.3.1
Additional session information

## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.5     purrr_0.3.4    
## [5] readr_1.4.0     tidyr_1.1.3     tibble_3.1.1    ggplot2_3.3.3  
## [9] tidyverse_1.3.1
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.1 xfun_0.19        haven_2.4.1      colorspace_1.4-1
##  [5] vctrs_0.3.8      generics_0.0.2   htmltools_0.3.6  yaml_2.2.0      
##  [9] utf8_1.1.4       rlang_0.4.11     pillar_1.6.0     glue_1.4.2      
## [13] withr_2.4.2      DBI_1.0.0        dbplyr_2.1.1     modelr_0.1.8    
## [17] readxl_1.3.1     lifecycle_1.0.0  munsell_0.5.0    gtable_0.3.0    
## [21] cellranger_1.1.0 rvest_1.0.0      evaluate_0.14    knitr_1.25      
## [25] curl_3.3         fansi_0.4.0      broom_0.7.6      Rcpp_1.0.2      
## [29] scales_1.0.0     backports_1.1.4  jsonlite_1.7.2   fs_1.3.1        
## [33] hms_1.0.0        digest_0.6.21    stringi_1.4.3    grid_3.6.0      
## [37] cli_2.5.0        tools_3.6.0      magrittr_2.0.1   crayon_1.4.1    
## [41] pkgconfig_2.0.3  ellipsis_0.3.0   xml2_1.3.2       reprex_2.0.0    
## [45] lubridate_1.7.10 assertthat_0.2.1 rmarkdown_2.5    httr_1.4.2      
## [49] rstudioapi_0.13  R6_2.4.0         compiler_3.6.0
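The template code that produces this footer is not shown here. As an assumption about how similar information could be gathered (not necessarily how the template does it), the sketch below uses base R functions only. The package version line is guarded so that it is skipped if tidyverse is not installed.

```r
# File creation date (the date on which the code is run)
Sys.Date()

# R version, long and short forms
R.version.string
as.character(getRversion())

# Version of an installed package; skipped if the package is missing
if (requireNamespace("tidyverse", quietly = TRUE)) {
  as.character(packageVersion("tidyverse"))
}

# Additional session information
sessionInfo()
```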