Objects and Variables

R, like other programming languages, stores information (such as data) in objects, which are given labels so that we can refer to them as we are working.

Some kinds of objects are:

  • text strings, which are used for labels or for the values of categorical variables (like “blue”), among other things
  • numbers, on which arithmetic can be performed
  • logical values, TRUE and FALSE (note the all caps)

There are also objects called vectors, which are like lists whose entries can be text strings, numbers, or logical values.
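As a rough illustration of these kinds of objects, the sketch below uses the built-in functions class(), which reports what kind of object a value is, and c(), which combines several values into a vector. The particular values are made up for the example.

```r
# class() reports the kind of object we give it
class("blue")    # "character" is R's word for a text string
class(74)        # "numeric"
class(TRUE)      # "logical"

# c() combines values into a vector; these scores are invented for illustration
c(88, 92, 79)
length(c(88, 92, 79))   # 3 entries
```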

When we want to use an object a lot (such as a numeric value, like a mean, from some statistical computation), it is helpful to give it a name so we can refer to it by what it represents, instead of by its values.

Assignment

We can give a name to an object using an expression of the form:

name <- value

The above is not real code; it is just intended to illustrate the format of commands that attach names to values.

This process is called assignment, because we are “assigning” the value to a container with the name we’ve chosen. The named thing is called a variable (which means something a bit different than a variable in the statistical sense, although a variable in code can refer to a statistical variable).

For example:

myName <- "Colin Dawson"
myHeight <- 74

You can read the <- symbol as “gets”, as in “(The name) myName gets (the value) "Colin Dawson"”. Notice that there is just one hyphen in the arrow. A common error is to add an extra hyphen to the arrow, which R will misinterpret as a minus sign.
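To see why the extra hyphen is easy to miss, here is a small sketch. The name oopsHeight is made up for this example. With a number on the right-hand side, R does not complain at all; it quietly stores the negative of the value.

```r
# Correct: one hyphen, so myHeight gets the value 74
myHeight <- 74

# Typo: R reads the second hyphen as a minus sign,
# so this is the same as oopsHeight <- -74
oopsHeight <-- 74

myHeight     # 74
oopsHeight   # -74, with no error or warning
```

With a text string on the right-hand side, the same typo produces an error instead, because a minus sign cannot be applied to text.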

It is also legal to use underscores and digits in variable names, but neither of these can be used at the beginning of a name.

variable3 <- 57

Underscores are ok:

variable_3 <- 57

Spaces (in object names) are not: this gives an error.

my Name <- "Colin" # gives an error

We can’t start an object name with a digit, though digits are fine elsewhere in object names:

3rdvariable <- 57 # not a valid variable name
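One way to check a name without triggering an error is sketched below. It relies on the base R function make.names(), which converts any text into a syntactically valid name; if the text comes back unchanged, it was already valid. The variable names candidateNames and isValid are made up for this example, and this is only a quick check, not something you need for the lab.

```r
# Names to test, taken from the examples above plus one starting with an underscore
candidateNames <- c("variable3", "variable_3", "my Name", "3rdvariable", "_variable")

# make.names() repairs invalid names; a valid name is returned unchanged
isValid <- candidateNames == make.names(candidateNames)

isValid   # TRUE TRUE FALSE FALSE FALSE
```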

Assigning the Result of a Command

We can also store the result of a command in a named variable. A simple example is the following:

myResult <- sqrt(25)

Now if I type the name of the new variable at the console, or refer to it by itself in a chunk, R will print out its contents:

myResult
## [1] 5

The 1 in brackets is there to indicate that the next value shown is the first entry in the variable called myResult. In this case the variable has only one entry, but sometimes we will hold lists of data or other values in a variable. Note that names are case-sensitive: if you try to access the variable MyResult, you will get an error, because you defined it with a lower case “m”.
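The sketch below illustrates both points. The built-in function exists() reports whether a name is currently defined, which lets us check the spelling without causing an error (this assumes nothing called MyResult has been defined elsewhere in your session). The variable evenNumbers is made up for the example; when it is printed, each line of output starts with a bracketed number giving the position of the first entry on that line.

```r
myResult <- sqrt(25)

# Names are case-sensitive
exists("myResult")   # TRUE
exists("MyResult")   # FALSE, assuming no object with this name was defined elsewhere

# A variable with many entries: the even numbers from 2 to 100
evenNumbers <- seq(2, 100, by = 2)
length(evenNumbers)  # 50 entries
evenNumbers          # each printed line begins with the index of its first entry
```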

We can also refer to variables on the right hand side of an assignment as part of the value of another variable, as in:

a_squared <- 12^2
b_squared <- 16^2
a_squared_plus_b_squared <- a_squared + b_squared
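Continuing this example, a stored result can itself be passed to a function. In the sketch below, the name c_length is made up for illustration; it holds the length of the hypotenuse of a right triangle with legs 12 and 16.

```r
a_squared <- 12^2
b_squared <- 16^2
a_squared_plus_b_squared <- a_squared + b_squared

# Use the stored sum as the input to sqrt()
c_length <- sqrt(a_squared_plus_b_squared)
c_length   # 20
```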

Notice that if we run the chunk that defines these variables, we will see them appear in the Environment tab in the upper right pane. This shows us everything we’ve defined.

  1. Make a new code chunk. In it, define an object called myName with your name, an object birthYear containing your birth year, and an object currentYear containing the current year. Run the chunk and verify that the variables you created appear in the Environment tab.

YOUR CODE BELOW

  1. Now, in a new chunk, create an object called ageThisYear that calculates your age as of December 31st this year using the birthYear and currentYear objects.

YOUR CODE BELOW

The Global Environment vs the Knitting Environment

The Environment tab will also contain any objects that we defined in other documents, at the console, or in chunks that we’ve since deleted. This can cause problems, because variables can wind up referring to things that don’t exist in the current document, or to things that should have a different value in the current document.

Fortunately, when we Knit our document, the rendering program ignores the interactive environment and creates its own encapsulated environment that only contains variables we’ve defined in the current document (and similarly, only allows us to use datasets and functionality from packages that have been loaded in our document).

This means that we can only use variables in our document that have been defined prior to the point when we refer to them. If we try to use a variable above the chunk where it’s defined, it may work when we’re running chunks interactively (provided we’ve previously run the chunk where it’s defined), but it won’t work when we try to Knit. This is another reason why Knitting every so often is a good idea, since it helps us catch errors in our document that we might otherwise miss.
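The sketch below mimics what happens when code refers to a variable before it is defined. The name taxRate is made up for this example, and the sketch assumes no object with that name already exists in your session. The tryCatch() function is used only to catch the error so that the rest of the code keeps running; you do not need it in your own work.

```r
# Using the variable before it is defined produces an "object not found" error,
# which tryCatch() intercepts and replaces with a short message
resultBefore <- tryCatch(taxRate * 100, error = function(e) "not found")

# Now define the variable, then use it
taxRate <- 0.25
resultAfter <- taxRate * 100

resultBefore   # "not found"
resultAfter    # 25
```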

  1. Try defining a variable called theAnswer at the console (rather than in a code chunk), and assign it the value 42. Then, create a code chunk that refers to theAnswer in an expression that computes twice the answer. What should happen is that the chunk will run fine when you just try to run it by itself, but if you try to Knit you’ll get an error.

YOUR CODE BELOW


  1. Fix the error by adding a code chunk in an appropriate spot that defines theAnswer within the document.

Functions, arguments and commands

Most of what we do in R consists of applying functions to data objects, specifying some options for the function, which are called arguments. The application of a function, together with its arguments, is called a command.

Many of the commands in R look a lot like functions from math class; that is, invoking R commands means supplying a function with some number of “inputs” (the arguments), which yields some kind of output, much as the sin() function in math takes a number as input and returns another number corresponding to its trigonometric sine.

A useful analogy is that commands are like sentences, where the function is the verb, and the arguments (one of which usually specifies the data object) are the nouns.

There is often a “main” argument that comes first. This is like the “direct object” of the command.

For example, in the English command, “Draw a picture for me with some paint”, the verb “draw” acts like the function (what is the listener supposed to do?); the noun “picture” is the direct object (draw what?), and “me” and “paint” are extra (in this case, optional) details, that we might call the “recipient” and the “instrument”.

In the grammar of R, I could write this sentence like:

## Note: this is not real R code
draw("picture", recipient = "me", material = "paint")

We are applying the function draw() to the object "picture", and adding some additional detail about the recipient and material. Here the function is called draw, and we have a main argument with the value "picture", and additional arguments recipient and material with the values "me" and "paint", respectively.

Technically speaking, "picture" is the value of an argument too; we might have written

## Note: this is not real R code
draw(object = "picture", recipient = "me", material = "paint")

However, in practice, there is often a required first “main” argument whose name is left out of the command.

In R, arguments always go inside parentheses, and are separated by commas when there is more than one. For arguments whose names are explicitly given, the name goes to the left of the =, and the value goes to the right.
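For a version of this that is real R code, here is a sketch using the built-in function round(), whose main argument is named x and whose optional argument digits says how many decimal places to keep. The variable names on the left are made up for the example. All three commands produce the same result.

```r
# Main argument given without its name
unnamedMain <- round(3.14159, digits = 2)

# Main argument given with its name
namedMain <- round(x = 3.14159, digits = 2)

# When every argument is named, the order does not matter
reordered <- round(digits = 2, x = 3.14159)

unnamedMain   # 3.14
```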

The command

log(100, base = 10)

finds the base 10 logarithm of the number 100.

We are applying the log() function to the value 100 and modifying the behavior of log() through the optional argument base, which in this case specifies what kind of logarithm we want.

log(100, base = 2)
## [1] 6.643856
log(100)
## [1] 4.60517
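When the base argument is left out, as in log(100), R uses its default, which is the natural logarithm (base e). The sketch below checks this by supplying the base explicitly with exp(1), which is R's way of writing e. The variable names are made up for the example.

```r
# Leaving out base uses the default: the natural log
naturalLog <- log(100)

# Supplying base e explicitly should give the same answer
explicitNaturalLog <- log(100, base = exp(1))

# all.equal() compares two numbers, allowing for tiny rounding differences
all.equal(naturalLog, explicitNaturalLog)   # TRUE

# The base 10 log of 100 is 2, since 10^2 is 100
baseTenLog <- log(100, base = 10)
```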

Assigning the results of a function

As we have seen, when we apply a function to some arguments, it produces a result. If we simply call the function, most of the time the result is just printed out. But often we want to refer to or use that result later. In this case we can assign the result of the function call to a “container”; that is, to a named variable.

For example, if I have a variable called Income, I might want to compute and store the log of that income variable:

Income <- 42000
logIncome <- log(Income, base = 10)

Now logIncome is a variable whose value is the log (base 10) of the Income value.

  1. Look at the last command in the chunk above. For each of the following pieces, identify whether it is an object name, a function name, an argument name, or an argument value.

YOUR ANSWERS
Component Role
logIncome
log
Income
base
10

Turning in Your Work

  1. When finished, Knit your Markdown document to an .html file (the default format). Then, save both this .Rmd file and the Knitted .html file to the ~/stat113/turnin/hw2/ folder.

Environment and Session Information

  • File creation date: 2021-10-14
  • R version 4.1.1 (2021-08-10)
  • R version (short form): 4.1.1
  • mosaic package version: 1.8.3
  • tidyverse package version: 1.3.1
  • Additional session information
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] mosaic_1.8.3      ggridges_0.5.3    mosaicData_0.20.2 ggformula_0.10.1 
##  [5] ggstance_0.3.5    Matrix_1.3-4      lattice_0.20-44   forcats_0.5.1    
##  [9] stringr_1.4.0     dplyr_1.0.7       purrr_0.3.4       readr_2.0.1      
## [13] tidyr_1.1.3       tibble_3.1.4      ggplot2_3.3.5     tidyverse_1.3.1  
## 
## loaded via a namespace (and not attached):
##  [1] ggdendro_0.1.22   httr_1.4.2        sass_0.4.0        jsonlite_1.7.2   
##  [5] splines_4.1.1     modelr_0.1.8      bslib_0.3.0       assertthat_0.2.1 
##  [9] cellranger_1.1.0  ggrepel_0.9.1     yaml_2.2.1        pillar_1.6.2     
## [13] backports_1.2.1   glue_1.4.2        digest_0.6.27     polyclip_1.10-0  
## [17] rvest_1.0.1       colorspace_2.0-2  htmltools_0.5.2   plyr_1.8.6       
## [21] pkgconfig_2.0.3   broom_0.7.9       labelled_2.8.0    haven_2.4.3      
## [25] scales_1.1.1      tweenr_1.0.2      tzdb_0.1.2        ggforce_0.3.3    
## [29] generics_0.1.0    farver_2.1.0      ellipsis_0.3.2    withr_2.4.2      
## [33] cli_3.0.1         magrittr_2.0.1    crayon_1.4.1      readxl_1.3.1     
## [37] evaluate_0.14     fs_1.5.0          fansi_0.5.0       MASS_7.3-54      
## [41] xml2_1.3.2        tools_4.1.1       hms_1.1.0         lifecycle_1.0.0  
## [45] munsell_0.5.0     reprex_2.0.1      compiler_4.1.1    jquerylib_0.1.4  
## [49] rlang_0.4.11      grid_4.1.1        rstudioapi_0.13   htmlwidgets_1.5.4
## [53] crosstalk_1.1.1   mosaicCore_0.9.0  rmarkdown_2.10    gtable_0.3.0     
## [57] DBI_1.1.1         R6_2.5.1          gridExtra_2.3     lubridate_1.7.10 
## [61] knitr_1.34        fastmap_1.1.0     utf8_1.2.2        stringi_1.7.4    
## [65] Rcpp_1.0.7        vctrs_0.3.8       leaflet_2.0.4.1   dbplyr_2.1.1     
## [69] tidyselect_1.1.1  xfun_0.25