R, like other programming languages, stores information (such as data) in objects, which are given labels so that we can refer to them as we are working.
Some objects are single values, such as the logical values TRUE and FALSE (note the all caps). There are also objects called vectors, which are like lists whose entries can be text strings, numbers, or logical values.
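As a small illustration of vectors, the sketch below builds one of each kind with the base R function c(), which combines values into a vector. The object names and the values in them are made up for this example.

```r
# c() combines individual values into a single vector
heights <- c(74, 68, 71)                 # numbers
firstNames <- c("Colin", "Ana", "Sam")   # text strings (in quotes)
isStudent <- c(FALSE, TRUE, TRUE)        # logical values (all caps)

# typing a name by itself prints the contents of that vector
heights
firstNames
isStudent
```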
When we want to use an object a lot (such as a numeric value, like a mean, from some statistical computation), it is helpful to give it a name so we can refer to it by what it represents, instead of by its value.
We can give a name to an object using an expression of the form:
name <- value
The above is not real code, but is just intended to illustrate the format of commands that attach values to objects.
This process is called assignment, because we are “assigning” the value to a container with the name we’ve chosen. The named thing is called a variable (which means something a bit different than a variable in the statistical sense, although a variable in code can refer to a statistical variable).
For example:
myName <- "Colin Dawson"
myHeight <- 74
You can read the <- symbol as “gets”, as in “(the name) myName gets (the value) "Colin Dawson"”. Notice that there is just one hyphen in the arrow. A common error is to add an extra hyphen to the arrow, which R will misinterpret as a minus sign.
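To see why the extra hyphen matters, here is a minimal sketch using two made-up variable names, x and y. R raises no error in the second line, which is what makes the mistake easy to miss.

```r
x <- 5    # one hyphen: x gets 5
y <-- 5   # extra hyphen: R reads this as y <- -5, so y gets negative 5

x
y
```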
It is also legal to use underscores and digits in variable names, but none of these can be used at the beginning of a name.
variable3 <- 57
Underscores are ok:
variable_3 <- 57
Spaces (in object names) are not: this gives an error.
my Name <- "Colin" # gives an error
We can’t start an object name with a digit, though digits are fine elsewhere in object names:
3rdvariable <- 57 # not a valid variable name
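One way to check a name before using it is sketched below. It relies on the base R function make.names(), which is not otherwise used in this lab: it converts any text into a valid name, so a candidate that comes back unchanged was already valid. The vector name candidates is made up for this example.

```r
# the four names tried above, stored as text strings
candidates <- c("variable3", "variable_3", "my Name", "3rdvariable")

# make.names() repairs invalid names, e.g. by replacing the space
# or by adding a letter in front of a leading digit
make.names(candidates)

# TRUE where the name was already valid, FALSE where it had to be changed
candidates == make.names(candidates)
```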
We can also store the result of a command in a named variable. A simple example is the following:
myResult <- sqrt(25)
Now if I type the name of the new variable at the console, or refer to it by itself in a chunk, R will print out its contents:
myResult
## [1] 5
The 1 in brackets is there to indicate that the next value shown is the first entry in the variable called myResult. (Note that if you try to access the variable MyResult, you will get an error, because you defined it with a lower case “m”.) In this case the variable has only one entry, but sometimes we will hold lists of data or other values in a variable.
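The sketch below illustrates both points. It uses the base R function exists() to ask whether a name is defined, and assumes nothing called MyResult has been created elsewhere in your session. The variable roots is made up to show a variable with more than one entry.

```r
myResult <- sqrt(25)

exists("myResult")   # TRUE: this exact name has been defined
exists("MyResult")   # FALSE: the capital "M" makes it a different name

# a variable with several entries
roots <- sqrt(c(1, 4, 9))
roots

# square brackets pick out a single entry, here the second one
roots[2]
```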
We can also refer to variables on the right hand side of an assignment as part of the value of another variable, as in:
a_squared <- 12^2
b_squared <- 16^2
a_squared_plus_b_squared <- a_squared + b_squared
Notice that if we run the chunk that defines these variables, we will see them appear in the Environment tab in the upper right pane. This shows us everything we’ve defined.
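As a short continuation of this example, a stored variable can also be passed to a function. The name hypotenuse is made up here. The base R function ls() returns the names of the objects currently defined, which should match what the Environment tab shows.

```r
a_squared <- 12^2
b_squared <- 16^2
a_squared_plus_b_squared <- a_squared + b_squared

# a stored variable can itself be the input to a function
hypotenuse <- sqrt(a_squared_plus_b_squared)
hypotenuse

# list the names of everything defined so far
ls()
```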
Create a code chunk that defines an object myName with your name, an object birthYear containing your birth year, and an object currentYear containing the current year. Run the chunk and verify that the variables you created appear in the Environment tab.
Create an object ageThisYear that calculates your age as of December 31st this year using the birthYear and currentYear objects.
The Environment tab will also contain any objects that we defined in other documents, at the console, or in chunks that we’ve since deleted. This can cause problems, because variables can wind up referring to things that don’t exist in the current document, or to things that should have a different value in the current document.
Fortunately, when we Knit our document, the rendering program ignores the interactive environment and creates its own encapsulated environment that only contains variables we’ve defined in the current document (and similarly, only allows us to use datasets and functionality from packages that have been loaded in our document).
This means that we can only use variables in our document that have been defined prior to the point when we refer to them. If we try to use a variable above the chunk where it’s defined, it may work when we’re running chunks interactively (provided we’ve previously run the chunk where it’s defined), but it won’t work when we try to Knit. This is another reason why Knitting every so often is a good idea, since it helps us catch errors in our document that we might otherwise miss.
Create a variable called theAnswer at the console (rather than in a code chunk), and assign it the value 42. Then, create a code chunk that refers to theAnswer in an expression that computes twice the answer. What should happen is that the chunk will run fine when you just try to run it by itself, but if you try to Knit you’ll get an error.
Fix the error by defining theAnswer within the document.
Most of what we do in R consists of applying functions to data objects, specifying some options for the function, which are called arguments. The application of a function, together with its arguments, is called a command.
Many of the commands in R look a lot like functions from math class; that is, invoking R commands means supplying a function with some number of “inputs” (the arguments), which yields some kind of output, much as the sin() function in math takes a number as input and returns another number corresponding to its trigonometric sine.
A useful analogy is that commands are like sentences, where the function is the verb, and the arguments (one of which usually specifies the data object) are the nouns.
There is often a “main” argument that comes first. This is like the “direct object” of the command.
For example, in the English command, “Draw a picture for me with some paint”, the verb “draw” acts like the function (what is the listener supposed to do?); the noun “picture” is the direct object (draw what?), and “me” and “paint” are extra (in this case, optional) details, that we might call the “recipient” and the “instrument”.
In the grammar of R, I could write this sentence like:
## Note: this is not real R code
draw("picture", recipient = "me", material = "paint")
We are applying the function draw() to the object "picture", and adding some additional detail about the recipient and material. Here the function is called draw, and we have a main argument with the value "picture", and additional arguments recipient and material with the values "me" and "paint", respectively.
Technically speaking, "picture" is the value of an argument too; we might have written
### Note: this is not real R code
draw(object = "picture", recipient = "me", material = "paint")
However, in practice, there is often a required first “main” argument whose name is left out of the command.
In R, arguments always go inside parentheses, and are separated by commas when there is more than one. For arguments whose names are explicitly given, the name goes to the left of the =, and the value goes to the right.
The command log(100, base = 10) finds the logarithm of the number 100, using base 10.
We are applying the log() function to the value 100 and modifying the behavior of log() through the optional argument base, which in this case specifies what kind of logarithm we want.
log(100, base = 2)
## [1] 6.643856
log(100)
## [1] 4.60517
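The last result reflects the default for the optional argument: when base is left out, log() uses the natural logarithm (base e). The sketch below compares a few equivalent ways of writing these calls. It relies on two details of base R not stated above: the main argument of log() is named x, and exp(1) gives the value of e.

```r
log(100, base = 10)       # base-10 logarithm
log(x = 100, base = 10)   # the same call, with the main argument named
log(100)                  # base omitted: natural logarithm

# leaving out base is the same as asking for base e
all.equal(log(100), log(100, base = exp(1)))
```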
As we have seen, when we apply a function to some arguments, it produces a result. If we simply call the function, most of the time the result is just printed out. But often we want to refer to or use that result later. In this case we can assign the result of the function call to a “container”; that is, to a named variable.
For example, if I have a variable called Income, I might want to compute and store the log of that income variable:
Income <- 42000
logIncome <- log(Income, base = 10)
Now logIncome is a variable whose value is the log (base 10) of the Income value.
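Once the result is stored, it can be reused in later commands without recomputing it. The sketch below is one possible illustration: raising 10 to the stored value undoes the base-10 log, and round() (a base R function, with its digits argument) shortens the value for display.

```r
Income <- 42000
logIncome <- log(Income, base = 10)

# raising 10 to the stored log recovers the original income
10^logIncome

# the stored value can also be passed to other functions
round(logIncome, digits = 2)
```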
Component | Role |
---|---|
logIncome | |
log | |
Income | |
base | |
10 | |
Knit this document to an .html file (the default format). Then, save both this .Rmd file and the Knitted .html file to the ~/stat113/turnin/hw2/ folder.
mosaic package version: 1.8.3
tidyverse package version: 1.3.1
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] mosaic_1.8.3 ggridges_0.5.3 mosaicData_0.20.2 ggformula_0.10.1
## [5] ggstance_0.3.5 Matrix_1.3-4 lattice_0.20-44 forcats_0.5.1
## [9] stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_2.0.1
## [13] tidyr_1.1.3 tibble_3.1.4 ggplot2_3.3.5 tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] ggdendro_0.1.22 httr_1.4.2 sass_0.4.0 jsonlite_1.7.2
## [5] splines_4.1.1 modelr_0.1.8 bslib_0.3.0 assertthat_0.2.1
## [9] cellranger_1.1.0 ggrepel_0.9.1 yaml_2.2.1 pillar_1.6.2
## [13] backports_1.2.1 glue_1.4.2 digest_0.6.27 polyclip_1.10-0
## [17] rvest_1.0.1 colorspace_2.0-2 htmltools_0.5.2 plyr_1.8.6
## [21] pkgconfig_2.0.3 broom_0.7.9 labelled_2.8.0 haven_2.4.3
## [25] scales_1.1.1 tweenr_1.0.2 tzdb_0.1.2 ggforce_0.3.3
## [29] generics_0.1.0 farver_2.1.0 ellipsis_0.3.2 withr_2.4.2
## [33] cli_3.0.1 magrittr_2.0.1 crayon_1.4.1 readxl_1.3.1
## [37] evaluate_0.14 fs_1.5.0 fansi_0.5.0 MASS_7.3-54
## [41] xml2_1.3.2 tools_4.1.1 hms_1.1.0 lifecycle_1.0.0
## [45] munsell_0.5.0 reprex_2.0.1 compiler_4.1.1 jquerylib_0.1.4
## [49] rlang_0.4.11 grid_4.1.1 rstudioapi_0.13 htmlwidgets_1.5.4
## [53] crosstalk_1.1.1 mosaicCore_0.9.0 rmarkdown_2.10 gtable_0.3.0
## [57] DBI_1.1.1 R6_2.5.1 gridExtra_2.3 lubridate_1.7.10
## [61] knitr_1.34 fastmap_1.1.0 utf8_1.2.2 stringi_1.7.4
## [65] Rcpp_1.0.7 vctrs_0.3.8 leaflet_2.0.4.1 dbplyr_2.1.1
## [69] tidyselect_1.1.1 xfun_0.25