Lecture Notes
Robin Evans
robin.evans@stats.ox.ac.uk
Michaelmas 2014
Administration
The course webpage is at
http://www.stats.ox.ac.uk/~evans/teaching.htm
Lectures are at 10am on Mondays and Wednesdays, and practicals at 9am
on Tuesdays and Thursdays; in reality, there will be rather a lot of overlap
between these two formats.
Please bring your own laptop to use during all classes, and ensure that you
have R working (see below). If you don't have access to a laptop, let me
know and we will try to provide one.
I will hold office hours each week during Michaelmas term on Wednesdays
between 12pm and 1pm; my office is on the first floor of 2 SPR, room
204. I'm very happy to help with any difficulties or problems you are having
with R, but please take steps to help yourselves first (see below for a
list of resources).
Software
You should install R on your own computer at the first opportunity. Visit
http://cran.r-project.org/
for details. Ensure you have the latest version (as of the start of Michaelmas
2014, this was version 3.1.1). Try to spend some time getting used to the
basics of the software, including arithmetic operations and functions. There
are many excellent online tutorials for this purpose.
Resources
A strength of R is its help files, which we will discuss. These are accessed
with the ? and ?? commands.
The internet has almost all the answers, and knows much more about R
than I do. If you have a problem, it's extremely likely that someone will
have had the same difficulty already, and posted a question on an internet
forum.
Books are useful, though not required. Here are some of them, with brief
comments.
1. Venables, W.N. and Ripley, B.D. (2002) Modern Applied Statistics with
S. Springer-Verlag. 4th edition.
The classic text.
2. Chambers, J. M. (2010) Software for Data Analysis: Programming with R.
Springer.
One of the few books with information on more advanced programming (S4,
overloading).
3. Wickham, H. (2014) Advanced R. Chapman and Hall.
A great new book on the more advanced features: a good follow up to this
class.
4. Crawley, M. (2007) The R Book. Wiley.
Very thorough.
5. Fox, J. (2002) An R and S-PLUS Companion to Applied Regression. Sage.
Does what it says.
6. Ligges, U. (2009) Programmieren mit R. Third edition. Springer.
In German(!)
7. Rizzo, M. L. (2008) Statistical Computing with R. CRC/Chapman &
Hall.
More computational, with examples different from those in the other books.
8. Braun, W. J. and Murdoch, D. J. (2007) A First Course in Statistical
Programming with R. CUP.
Detailed and well written, but at a rather low level. A bit redundant given
the above.
1 Introduction
1.1 What R is good at
Statistics for relatively advanced users: R has thousands of packages, designed, maintained, and widely used by statisticians.
Statistical graphics: try doing some of our plots in Stata and you won't have
much fun.
Flexible code: R has a rather liberal syntax, and variables don't need to be
declared as they would in (for example) C++, which makes it very easy to
code in. This also has disadvantages in terms of how safe the code is.
Vectorization: R is designed to make it very easy to write functions which
are applied pointwise to every element of a vector. This is extremely useful
in statistics.
R is powerful: if a command doesn't exist already, you can code it yourself.
1.3 General Properties
R makes it extremely easy to code complex mathematical or statistical procedures, though the programs may not run all that quickly. You can interface
R with other languages (C, C++, Fortran) to provide fast implementations
of subroutines, but writing this code (and making it portable) will typically
take longer. Where the advantage falls in this trade-off will depend upon
what you're doing; for most things you will encounter during your degree, R
is sufficiently fast.
R is open source and widely adopted by statisticians, biostatisticians, and
geneticists. There is a huge wealth of existing libraries so you can often
save time by using these, though it is sometimes easier to start from scratch
than to adapt someone else's function to meet your needs. Contributing new
packages to the central repository (CRAN) is easy: even your lecturer has
managed it. As a result, R packages are not always built to very high standards
(but see Bioconductor).
R is portable, and works equally well on Windows, OS X and Linux.
1.4 Interfaces
R has a command line interface, and will accept simple commands to it. This
is marked by a > symbol, called the prompt. If you type a command and
press return, R will evaluate it and print the result for you.
> 6 + 9
[1] 15
> x <- 15
> x - 1
[1] 14
The expression x <- 15 creates a variable called x and gives it the value 15.
This is called assignment; the value on the right is assigned to the variable
on the left. The left-hand side must contain only a single variable:
> x + 4 <- 15   # doesn't work
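These notes use both <- and = for assignment. Here is a short sketch of how the two compare (the variable names are arbitrary). At the console they behave the same way, but inside a function call = names an argument instead of assigning.

```r
x <- 15                    # assignment with the arrow
y = 15                     # '=' also assigns at the top level
identical(x, y)            # TRUE
m <- mean(x = c(1, 2, 3))  # here 'x =' names the argument of mean()
m                          # 2
x                          # still 15: our own x was not changed
```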
2.1 Vectors
The key feature which makes R very useful for statistics is that it is vectorized. This means that many operations can be performed point-wise on a
vector. The function c() is used to create vectors:
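> x <- c(1, -1, 3.5, 2)
> x
[1]  1.0 -1.0  3.5  2.0
> x^2
[1]  1.00  1.00 12.25  4.00
Exercise 2.1. The weights of five participants, before and after (one column
per participant):
Before   78   72   78   79  105
After    67   65   79   70   93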
Read the before and after values into two different vectors called before
and after. Use R to evaluate the amount of weight lost for each participant.
What is the average amount of weight lost?
*Exercise 2.2. How would you write a function equivalent to sum((x - mean(x))^2)
in a language like C or Java?
Some useful vectors can be created quickly with R. The colon operator is
used to generate integer sequences:
> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
> -3:4
[1] -3 -2 -1  0  1  2  3  4
> 9:5
[1] 9 8 7 6 5
More generally, the function seq() can generate any arithmetic progression.
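Here are a few illustrative calls to seq() (the numbers are arbitrary), showing its from, to, by and length.out arguments:

```r
seq(from = 0, to = 1, by = 0.2)   # 0.0 0.2 0.4 0.6 0.8 1.0
seq(0, 1, length.out = 3)         # 0.0 0.5 1.0: fix the length instead of the step
seq(10, 1, by = -3)               # 10 7 4 1: a negative step counts down
```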
Sometimes it's necessary to have repeated values, for which we use rep():
> rep(5,3)
[1] 5 5 5
> rep(2:5,each=3)
[1] 2 2 2 3 3 3 4 4 4 5 5 5
> rep(-1:3, length.out=10)
 [1] -1  0  1  2  3 -1  0  1  2  3
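When arithmetic combines vectors of different lengths, R recycles the shorter one. This is the vector recycling relied on later by cbind() and rbind(). A small sketch with arbitrary values:

```r
(1:10) * c(-1, 1)   # c(-1, 1) is reused five times: -1 2 -3 4 -5 6 -7 8 -9 10
2^(0:10)            # a scalar is recycled too: 1 2 4 8 ... 1024
x <- 1:6
x + c(100, 200)     # 101 202 103 204 105 206
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.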
Exercise 2.3. Create the following vectors in R using seq() and rep().
(i) 1, 1.5, 2, 2.5, . . . , 12
(ii) 1, 8, 27, 64, . . . , 1000.
(iii) $1, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \ldots, -\frac{1}{100}$.
(iv) 1, 0, 3, 0, 5, 0, 7, . . . , 0, 49.
(v) $1, 3, 6, 10, 15, \ldots, \sum_{i=1}^{n} i, \ldots, 210$ [look up ?cumsum].
(vi)
n
X
(1)i+1 xi
i=1
2.2 Subsetting
> x <- c(5, 9, 2, 14, -4)
> x[1:3]
[1] 5 9 2
> x[3:length(x)]
[1]  2 14 -4
There are two other methods for getting subvectors. The first is using a
logical vector (i.e. containing TRUE and FALSE) of the same length:
> x > 4
[1]  TRUE  TRUE FALSE  TRUE FALSE
> x[x > 4]
[1]  5  9 14
The second is using negative indices, which remove the corresponding entries:
> x[-1]
[1]  9  2 14 -4
> x[-c(1,4)]
[1]  9  2 -4
2.3 Logical Operators
As we see above, the comparison operator > returns a logical vector indicating whether or not the left hand side is greater than the right hand side.
Here we demonstrate the other comparison operators:
> x <= 2   # less than or equal to
[1] FALSE FALSE  TRUE FALSE  TRUE
> x == 2   # equal to
[1] FALSE FALSE  TRUE FALSE FALSE
> x != 2   # not equal to
[1]  TRUE  TRUE FALSE  TRUE  TRUE
Note the double equals sign ==, which distinguishes comparison from assignment.
We may also wish to combine logical vectors. If we want the elements of x
within a range, we can combine two comparisons with the & ('and') operator.
The & operator does a pointwise and comparison between the two sides.
Similarly, the vertical bar | does pointwise or, and the unary ! operator
performs negation.
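Here is a short sketch of all three operators. It uses the same x as above and, purely for illustration, the range 0 to 10:

```r
x <- c(5, 9, 2, 14, -4)
(x > 0) & (x < 10)     # TRUE TRUE TRUE FALSE FALSE: strictly between 0 and 10
x[(x > 0) & (x < 10)]  # 5 9 2
(x < 0) | (x > 10)     # FALSE FALSE FALSE TRUE TRUE: outside [0, 10]
!(x > 0)               # FALSE FALSE FALSE FALSE TRUE
```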
> (x == 5) | (x > 10)
[1]  TRUE FALSE FALSE  TRUE FALSE
2.4 Character Vectors
As you might have noticed in the exercise above, vectors don't have to
contain numbers. We can equally create a character vector, in which
each entry is a string of text. Strings in R are contained within double
quotes ":
> x <- c("Hello", "how do you do", "lovely to meet you", 42)
> x
[1] "Hello"              "how do you do"      "lovely to meet you"
[4] "42"
Notice that you cannot mix numbers with strings in a vector: if you try to
do so, the number will be converted into a string. Otherwise character
vectors are much like their numerical counterparts.
> x[2:3]
[1] "how do you do"      "lovely to meet you"
> x[-4]
[1] "Hello"              "how do you do"      "lovely to meet you"
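In c(), mixed types are converted to the most general type present: logical values become numbers, and numbers become strings. A small sketch:

```r
x <- c("Hello", 42)
class(x)               # "character": 42 has become the string "42"
x[2] == "42"           # TRUE
y <- c(1.5, TRUE, FALSE)
y                      # 1.5 1.0 0.0: logicals become numbers
as.numeric("42") + 1   # 43: converting back has to be done explicitly
```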
2.5 Matrices
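Here is a sketch of the usual constructors. matrix() fills a vector into the requested shape, column by column unless byrow = TRUE. diag() builds diagonal matrices (diag(3) gives the 3 x 3 identity).

```r
M <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
M
matrix(1:6, nrow = 2, byrow = TRUE)    # filled row by row instead
matrix(1:3, nrow = 3, ncol = 4)        # 1:3 is recycled into every column
diag(3)                                # 3 x 3 identity matrix
dim(M)                                 # 2 3
```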
> diag(1:3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3
> A <- matrix(c(1:8, 10), nrow=3, ncol=3)
> x <- 1:3
> A %*% x     # matrix multiplication
     [,1]
[1,]   30
[2,]   36
[3,]   45
> A*x         # pointwise multiplication
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    4   10   16
[3,]    9   18   30
> t(A)        # transpose
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8   10
> det(A)      # determinant
[1] -3
> diag(A)     # diagonal
[1]  1  5 10
> solve(A)    # inverse
        [,1]    [,2] [,3]
[1,] -0.6667 -0.6667    1
[2,] -1.3333  3.6667   -2
[3,]  1.0000 -2.0000    1
Exercise 2.7. Construct the matrix
$$B = \begin{pmatrix} -1 & -2 & -3 \\ -4 & -2 & -6 \\ 3 & 1 & 3 \end{pmatrix}.$$
Show that the matrix product BBB is a scalar multiple of the identity matrix,
and find the scalar.
Matrices can be subsetted much the same way as vectors, although of course
they have two indices. Row number comes first:
> A[2,1]
[1] 2
> A[2,2:ncol(A)]
[1] 5 8
> A[,1:2]
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> A[c(),1:2]
[,1] [,2]
Notice that, where appropriate, R automatically reduces a matrix to a vector
or scalar when you subset it. You can override this using the optional drop
argument.
> A[2,2:ncol(A),drop=FALSE]   # returns a matrix
     [,1] [,2]
[1,]    5    8
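You can stitch matrices together using the rbind() and cbind() functions.
These employ vector recycling:
> cbind(A, t(A))
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    4    7    1    2    3
[2,]    2    5    8    4    5    6
[3,]    3    6   10    7    8   10
> rbind(A, 1, 0)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6   10
[4,]    1    1    1
[5,]    0    0    0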
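(a)
$$\begin{pmatrix} 1 & 3 & 5 & 7 \\ 2 & 4 & 6 & 8 \end{pmatrix}$$
(b)
$$\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix} \qquad (\text{dimensions } 15 \times 10).$$
(c)
$$\begin{pmatrix} 1 & 1 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & \cdots & 0 & 0 \\ \vdots & & & & & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & 0 & \cdots & 1 & 1 \end{pmatrix} \qquad (\text{dimensions } 5 \times 15).$$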
1 2 3
2 3 4
3 4 5
..
..
.
.
9 10
10 11
9 10
10 11
..
.
;
..
.
17
17 18
17 18 19
1
2
..
.
9
9
1
..
.
;
..
.
6
6 7
1 6
7 8
2
3
3
4
..
.
4
2.6 Lists
Other than vectors and matrices, the main object for holding data in R is a
list¹. These are a bit like vectors, except that each entry can be any other
R object, even another list.
> x <- list(1:3, TRUE, "Hello", list(1:2, 5))
Here x has 4 elements: a numeric vector, a logical, a string and another list.
We can select an entry of x with double square brackets:
> x[[3]]
[1] "Hello"
To get a sub-list, use single brackets:
> x[c(1,3)]
[[1]]
[1] 1 2 3
[[2]]
[1] "Hello"
Notice the difference between x[[3]] and x[3].
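To make the difference concrete, here is a short sketch using the same list x as above:

```r
x <- list(1:3, TRUE, "Hello", list(1:2, 5))
class(x[[3]])   # "character": the entry itself
class(x[3])     # "list": a list of length one containing the entry
length(x[[4]])  # 2: the inner list has two entries
x[[4]][[2]]     # 5: chaining [[ ]] reaches inside nested lists
```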
We can also name some or all of the entries in our list, by supplying argument names to list():
> x <- list(y=1:3, TRUE, z="Hello")
> x
$y
[1] 1 2 3
[[2]]
[1] TRUE
$z
[1] "Hello"
¹ Technically speaking, lists are also a kind of vector in R, but their entries
need not all have the same type; ordinary logical, numeric or character vectors
are known as atomic vectors.
Notice that the [[1]] has been replaced by $y, which gives us a clue as to
how we can recover the entries by their name. We can still use the numeric
position if we prefer:
> x$y
[1] 1 2 3
> x[[1]]
[1] 1 2 3
The function names() can be used to obtain a character vector of all the
names of objects in a list.
> names(x)
[1] "y" ""  "z"
You've seen most standard R objects now: almost all the more complicated
ones are just lists! We'll see this in the next section.
Data
3.1 Data Frames
              dist climb   time
Bens of Jura    16  7500 204.62
Lairig Ghru     28  2100 192.67
Seven Hills     14  2200  98.42
Two Breweries   18  5200 170.25
Moffat Chase    20  5000 159.83
The truth is that, like almost all complicated objects in R, data frames
are lists with some additional structure. Formally speaking, they are not
matrices, but they do behave similarly in certain circumstances.
Exercise 3.1. How do the results of the following commands differ from
what we would expect if hills were a matrix?
> hills[1,]
> hills[3]
> hills %*% c(1,2,4)
> mean(hills)
3.2
We often want to use functions on the columns of a data frame, and it quickly
becomes inconvenient to repeatedly type (for example) hills$ before every
such event. For example, the command below will give a scatter plot of the
race times against climbs, amongst only those races less than 10 miles long.
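Here is a sketch of such a command, assuming the hills data from the MASS package. with() (discussed under Scope) lets us refer to the columns by name; subset() is an alternative.

```r
library(MASS)   # provides the hills data frame
# with() evaluates its second argument using the columns of hills
with(hills, plot(climb[dist < 10], time[dist < 10],
                 xlab = "climb", ylab = "time"))
# Alternatively, select the rows first with subset(), then use a formula
short <- subset(hills, dist < 10)
plot(time ~ climb, data = short)
```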
3.3
    author year publisher
1   Ripley 1980     Wiley
2      Cox 1979   Chapman
3 Snijders 1999      Sage
4      Cox 2006       CUP
3.4 Factors
There are two main types of data which you will encounter this year: numerical and categorical. We've seen how to create numerical vectors already.
Suppose we have the heights of 100 individuals, the first 50 male and the
rest female.
> set.seed(1442)   # fixes the random numbers
> height = round(rnorm(100, mean=rep(c(170,160),each=50), sd=10))
> sex = rep(c("M", "F"), each=50)
> head(sex)
[1] "M" "M" "M" "M" "M" "M"
[Figure: plot with height on the vertical axis, ranging from 140 to 200]
What happens if you try to plot sex against height instead? The distinction
between categorical and non-categorical data is especially important if we
have numbered groups.
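The factor Sex used below can be created from sex with factor(). Here is a sketch, assuming the default levels. These are sorted alphabetically, so F is level 1 and M is level 2, matching the integer codes below.

```r
set.seed(1442)
height <- round(rnorm(100, mean = rep(c(170, 160), each = 50), sd = 10))
sex <- rep(c("M", "F"), each = 50)
Sex <- factor(sex)   # levels sorted alphabetically: "F" then "M"
levels(Sex)          # "F" "M"
table(Sex)           # 50 of each
plot(Sex, height)    # a factor on the x-axis gives side-by-side box plots
```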
The information in a factor is stored as a vector of integers:
> as.integer(Sex)
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[36] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[71] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Just as a data frame is really a list, a factor is really a vector of integers
(for levels) together with some extra information giving each level a name.
The additional information is contained within a list of attributes. You
can view this list directly.
> attributes(Sex)
$levels
[1] "F" "M"

$class
[1] "factor"
The attributes in this case are its class (you'll see this in many objects)
and a vector of the level names. The class tells R that this object should be
treated as a factor so that, for example, it will be displayed to you in the
right way.
You may find that sometimes data are stored as a factor when you don't
want them to be (see the exercise in the previous section). You can turn a
factor back into a character vector easily enough:
> as.character(Sex)
  [1] "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M"
 [18] "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M"
 [35] "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "F"
 [52] "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F"
 [69] "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F"
 [86] "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F"
Exercise 3.4. (a) Sample the numbers {1, 2, 3} uniformly with replacement 50 times; use this to create a factor with levels Yes, No and Maybe.
(b) Create a subvector by removing the Maybe entries from the factor above.
What levels does the new factor have?
(c) Use the command droplevels() to remove the level Maybe.
Exercise 3.5. Take a look at the birthwt data from the MASS package.
How is race stored in these data? Is this sensible?
Define a factor based on race:
> Race = factor(birthwt$race)
Compare the effect of the commands summary(), plot() and mean() on
each of Race and birthwt$race. Which do you find more useful?
3.5
The labels above and to the left of the values in hills are not part of the
data itself, but can be accessed:
> names(hills)
[1] "dist"  "climb" "time"
> row.names(hills)
 [1] "Greenmantle"      "Carnethy"         "Craig Dunain"
 [4] "Ben Rha"          "Ben Lomond"       "Goatfell"
 [7] "Bens of Jura"     "Cairnpapple"      "Scolty"
[10] "Traprain"         "Lairig Ghru"      "Dollar"
[13] "Lomonds"          "Cairn Table"      "Eildon Two"
[16] "Cairngorm"        "Seven Hills"      "Knock Hill"
[19] "Black Hill"       "Creag Beag"       "Kildcon Hill"
[22] "Meall Ant-Suidhe" "Half Ben Nevis"   "Cow Hill"
[25] "N Berwick Law"    "Creag Dubh"       "Burnswark"
[28] "Largo Law"        "Criffel"          "Acmony"
[31] "Ben Nevis"        "Knockfarrel"      "Two Breweries"
[34] "Cockleroi"        "Moffat Chase"
As we saw above, in a data frame the column names can be used for indexing
with $ (e.g. hills$time); the row names cannot be used in this way, although
both can be used inside square brackets, as in hills["Ben Nevis", ].
This additional information is stored as attributes, which are in a separate
list² attached to the object hills:
> attributes(hills)
We could add an attribute to hills if we wanted:
> attributes(hills) <- c(attributes(hills), list(type="races"))
> attributes(hills)
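Note that neither type(hills) nor hills$type accesses this attribute (there is
no function type(), and $ looks for a column called type). However, the most
important attributes, such as names and class, happen to have functions named
after them, which can be used to extract the relevant information.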
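Individual attributes can be read and set with attr(). Here is a sketch that works on a copy h of hills (the name h is arbitrary), so the original is left alone:

```r
library(MASS)                  # for the hills data
h <- hills
attr(h, "type") <- "races"     # set a single attribute
attr(h, "type")                # "races"
is.null(h$type)                # TRUE: $ looks for a column, not an attribute
names(h)                       # accessor for the names attribute
class(h)                       # accessor for the class attribute
```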
² Actually they're not stored as a list (see ?attributes), but they behave very similarly.
3.6 Reading in Data
  STATE   CIG BLAD  LUNG  KID LEUK
1    AL 18.20 2.90 17.05 1.59 6.15
2    AZ 25.82 3.52 19.80 2.75 6.61
3    AR 18.24 2.99 15.98 2.02 6.94
4    CA 28.60 4.46 22.07 2.66 7.06
5    CT 31.10 5.11 22.83 3.35 7.20
6    DE 33.60 4.78 24.55 3.36 6.45
> class(dat)
[1] "data.frame"
What happens if header=TRUE is omitted?
When you specify the file name, be sure to use the double quotes (") around
it. You also need to give the correct path to the file. R will automatically
look for the file in its working directory. You can check what this is:
> getwd()
[1] "/data/redcrest/evans/Dropbox/Teaching/R Programming/2014"
Then if your file is in a subfolder called files, you need to write (for example)
> dat <- read.table("files/smoking.dat", header=TRUE)
In some systems you can use file.choose() to get the full path to a file.
In particular this works well on R GUI for Windows or OS X. For example:
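Here is a self-contained sketch of reading a file by its path. The file name and its contents (the first two rows of the smoking table above) are written to a temporary directory purely for illustration. file.choose() opens a dialog, so it is wrapped in interactive().

```r
# Write a small whitespace-separated file with a header line
path <- file.path(tempdir(), "smoking.dat")
writeLines(c("STATE CIG BLAD",
             "AL 18.20 2.90",
             "AZ 25.82 3.52"), path)
dat <- read.table(path, header = TRUE)   # first line gives the column names
str(dat)
# In an interactive session, a file dialog can supply the full path instead
if (interactive()) {
  dat2 <- read.table(file.choose(), header = TRUE)
}
```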
Functions
Everything which is done in R is done by functions. A function in a programming language is much like its mathematical equivalent: it has some
inputs called arguments, and an output called the return value. In R a
function can only return a single object. If you type a function's name at
the console, you can see its structure:
> setdiff
function (x, y)
{
x <- as.vector(x)
y <- as.vector(y)
unique(if (length(x) || length(y))
x[match(x, y, 0L) == 0L]
else x)
}
<bytecode: 0x2a49e98>
<environment: namespace:base>
There are two important parts to the function: the signature, which in
this case is function(x, y), and the body, which is the code between the
curly brackets. Broadly speaking, when a function is called, it takes the
information in the arguments, applies the code in the body to them, and
then spits out the final expression in the function. In this case that's the
complex-looking expression unique(...).
4.1 Arguments
Most functions don't require all of their arguments to be specified.
> x <- rnorm(10)
> y <- x + rnorm(10)
> lm(y ~ x)
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
     0.0212       0.7415
> args(lm)
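Here is a sketch of how arguments are matched, using a made-up function f whose power and shift arguments have defaults:

```r
# A hypothetical function with one required argument and two defaults
f <- function(x, power = 2, shift = 0) {
  (x + shift)^power
}
f(3)                 # 9: power and shift take their default values
f(3, 3)              # 27: the second argument is matched by position to power
f(shift = 1, x = 3)  # 16: named arguments can be given in any order
args(f)              # prints the signature, including the defaults
```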
4.2 Writing Functions
To define your own function you just have to construct something in the
same format as above:
> square = function(x) {
+   x^2
+ }
> square(4)
[1] 16
Objects which are created inside a function do not exist outside it:
> mean2 <- function(x) {
+   n <- length(x)
+   sum(x)/n
+ }
> mean2(1:10)
[1] 5.5
> n
Error: object 'n' not found
Clearly an object called n was used inside the function above, but it only
existed inside the function's own environment. Most functions in R do not have side
effects: they return a value, but do not change any of the objects which
you can reach at the console. In order to use a function, you usually have
to assign its output to something.
> x <- mean2(1:10)
> x
[1] 5.5
Exercise 4.1. The logit function is defined as
$$\operatorname{logit}(x) = \log\left(\frac{x}{1-x}\right), \qquad 0 < x < 1.$$
Write an R function in one argument to implement this. How does your
function behave for values of x such as 0, 1, or 2?
Exercise 4.2. Recall that the Taylor expansion of log(1 + x) is
$$\log(1 + x) = \sum_{i=1}^{\infty} \frac{(-1)^{i+1} x^i}{i}.$$
Write a function with arguments x and n, which calculates the Taylor approximation to log(1 + x) using n terms.
How many terms do you need to get within $10^{-6}$ of the correct solution
when x = 0.99?
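Exercise 4.3. Given real vectors x, y of length n, the least squares estimates
$(\hat\alpha, \hat\beta)^T$ of the intercept and slope are given by
$$\hat\beta = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat\alpha = \bar{y} - \hat\beta \bar{x}.$$
Write a function which takes two arguments, x and y, and returns a vector
of length 2 containing $\hat\alpha$ and $\hat\beta$. Verify that your function gives the correct
answer using R's built-in function lm() [the syntax is lm(y~x)].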
4.3 for() Loops
The most common way to execute a block of code multiple times is with a
for() loop. What's going on in the code below?
> factorial2 = function(n) {
+   out = 1
+   for (i in 1:n) {
+     out = out*i
+   }
+   out
+ }
> factorial2(10)
[1] 3628800
You may have seen for() loops in other languages. The syntax in R is for
(i in x) for some vector (or list) x, where i will take each value in x. Most
commonly, x is a vector of the first n natural numbers.
i is a dummy variable, and can be called whatever you like, though it retains
its value outside the loop.
> for (sillyname in 1:4) print(sillyname)
[1] 1
[1] 2
[1] 3
[1] 4
> sillyname
[1] 4
Exercise 4.4. Write a function to perform matrix-vector multiplication. It
should take a matrix A and a vector b as arguments, and return the vector
Ab. Use two loops to do this, rather than %*% or any vectorization.
I generally recommend using seq_len() or seq_along() in for() loops,
because they always behave the way you want (and run quicker than seq()):
> n = 0
> 1:n          # not a sequence of length n=0
[1] 1 0
> seq(n)       # ditto
[1] 1 0
> seq_len(n)   # better!
integer(0)
> for (i in seq_len(n)) print(i)
> for (i in seq(n)) print(i)
[1] 1
[1] 0
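seq_along(x) gives the indices of x, and is empty when x is. A sketch with arbitrary values:

```r
x <- c(10, 20, 30)
for (i in seq_along(x)) {
  cat("element", i, "is", x[i], "\n")
}
empty <- numeric(0)
seq_along(empty)                      # integer(0), so the loop below does nothing
for (i in seq_along(empty)) print(i)
```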
4.4 Conditional Code
³ Actually, any non-zero number will act the same way as TRUE, but it's safer to only
use logicals.
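Here is a minimal sketch of conditional code, using a hypothetical helper abs1(). if () expects a single TRUE or FALSE, so for a whole vector the vectorized ifelse() is used instead. Older versions of R only warned when an if () condition had length greater than one; since R 4.2.0 it is an error.

```r
# A hypothetical helper returning the absolute value of a single number
abs1 <- function(x) {
  if (x >= 0) {
    x
  } else {
    -x
  }
}
abs1(-3)                  # 3
y <- c(1, -3, 2)
ifelse(y >= 0, y, -y)     # 1 3 2: applied element by element
```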
4.5 while() loops
[1] FALSE
> isPrime(37)
[1] TRUE
This illustrates several points. First, we don't need to wait until reaching
the end of a function to return a value; we can use the return keyword
instead.
The other feature is the while() loop. This will keep running until the
expression in the parentheses becomes FALSE.
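Here is a minimal sketch of an isPrime() function in this style, assuming simple trial division. It is not necessarily the version used above.

```r
isPrime <- function(n) {
  if (n < 2) return(FALSE)          # 0 and 1 are not prime
  d <- 2
  while (d * d <= n) {              # divisors above sqrt(n) need not be tried
    if (n %% d == 0) return(FALSE)  # found a divisor: stop straight away
    d <- d + 1
  }
  TRUE
}
isPrime(36)  # FALSE
isPrime(37)  # TRUE
```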
4.6
> system.time(seq_len(1e6)^2)
   user  system elapsed
  0.025   0.001   0.026
We can write a second function to do matrix-vector multiplication (see Exercise 4.4), but this time replacing the inner loop by a vectorized function
to take dot products.
> mult2 = function(A, b) {
+   n1 = nrow(A)
+   n2 = ncol(A)
+   out = numeric(n1)
+   for (i in 1:n1) {
+     out[i] = sum(A[i,]*b)
+   }
+   out
+ }
Now suppose we create a large matrix, and look at the difference in timing.
system elapsed
0.039
1.572
> system.time(mult2(A,b))
   user  system elapsed
  0.013   0.005   0.018
> system.time(colSums(t(A)*b))
   user  system elapsed
  0.012   0.000   0.012
system elapsed
0.000
0.004
The difference is dramatic. The moral of this is that it's usually better to
use a built-in function, and almost always better to vectorize. The reason
%*% is so fast is that R calls underlying Fortran (BLAS) routines which have been
optimized over decades.
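Here is a self-contained version of this comparison, as a sketch. The 1000 x 1000 size and the seed are arbitrary, and the timings will depend on your machine and on the BLAS library R is linked to. drop() turns the one-column result of %*% into a plain vector so that all three answers can be compared.

```r
set.seed(1)
A <- matrix(rnorm(1000 * 1000), 1000, 1000)
b <- rnorm(1000)
mult2 <- function(A, b) {
  out <- numeric(nrow(A))
  for (i in seq_len(nrow(A))) {
    out[i] <- sum(A[i, ] * b)      # dot product of row i with b
  }
  out
}
system.time(r1 <- mult2(A, b))
system.time(r2 <- colSums(t(A) * b))
system.time(r3 <- drop(A %*% b))
all.equal(r1, r2)
all.equal(r1, r3)
```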
4.7 Recursion
Functions can recurse, which means they call themselves; here is a function
which calculates the entry $F_n$ in the Fibonacci sequence with $F_0 = F_1 = 1$,
and $F_k = F_{k-1} + F_{k-2}$ for $k \geq 2$:
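Here is a minimal recursive sketch that follows this definition directly (the name fib is an assumption):

```r
fib <- function(n) {
  if (n <= 1) return(1)      # base cases: F0 = F1 = 1
  fib(n - 1) + fib(n - 2)    # recursive step
}
fib(10)                      # 89
sapply(0:9, fib)             # 1 1 2 3 5 8 13 21 34 55
```

This version recomputes the same values many times, so its running time grows exponentially with n.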
4.8 Scope
When a function is called, the code inside it is run in a separate environment from the code you run directly at the command line. This means it's
possible for a variable inside a function to have the same name as something
at the command line without causing any problems:
> x <- 3
> f = function(y) {
+   x <- 5
+   x + y
+ }
> f(4)
[1] 9
> x
[1] 3
However, if a function fails to find a variable within its own environment,
then it will look in its enclosing environment, which is the environment where
the function was defined: for a function created at the command line, this is
the global environment (i.e. the one you use at the command line).
> x <- 3
> g = function(y) {
+   x + y
+ }
> g(4)
[1] 7
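Note that the enclosing environment is where the function was defined, not where it was called from. A sketch with made-up names g2 and h:

```r
x <- "global"
g2 <- function() x        # defined at the command line
h <- function() {
  x <- "local to h"
  g2()                    # g2 is called from inside h ...
}
h()                       # "global": ... but x is looked up where g2 was defined
```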
Whilst this sort of behaviour can sometimes seem helpful, it is much better
to avoid writing confusing code like this. You are strongly recommended to
write functions which only require the information in their own arguments
to run.
This is the same principle used by the functions with() and subset():
they create an environment for the data frame (or list) you give as their
first argument; if any names supplied don't match columns within the data
frame, R searches in the global environment:
> conv = 1.609
> with(hills, mean(dist/conv))
[1] 4.679
Exercise 4.7. What will happen if I create an object called dist before
running the commands above?
Graphics
                 x numeric       x factor
y (missing)      series plot     bar chart
y numeric        scatter plot    box plots
y factor         spine plot      spine plot
In fact there are many more plotting methods, most of which you will rarely
use.
5.1
For graphical summaries of one dimensional data we have already seen boxplots and (in the practical) a time series for random walks. Among the most
useful is the histogram:
> hist(nlschools$lang, breaks=25, col=2)
[Figure: Histogram of nlschools$lang, with Frequency on the vertical axis]
Note that the optional argument breaks chooses (approximately) how many
bins the histogram should have, and col alters the colour of the bars. Of
course, all plots should have properly labelled axes and a title, which can
be easily added.
> hist(nlschools$lang, breaks=25, col=2, xlab="Score",
+      main="Language test scores of Dutch 8th grade pupils")
Even the simple plot command for a single numeric vector comes with a
large range of options.
> x = cumsum(rnorm(250))
> plot(x, type="l", col=3)
[Figure: line plot of x against Index]
Try this with type="b" or type="h" and see what happens. You can only
find out about a few of the graphics options with the documentation for
plot(). Try looking at ?par to find the real detail.
5.2 Adding to Plots
Consider the following simple scatter plot, augmented with the line y = x.
> x = rnorm(300)
> y = x + rnorm(300)
> plot(x,y, pch=20, col=4, cex=0.5)
> abline(a=0, b=1, lty=4, lwd=1.5)
[Figure: scatter plot of y against x, with a legend distinguishing the lines "y=x" and "line of best fit"]
5.3 Legends
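The lines labelled in the figure above can be produced with abline() and legend(). Here is a sketch; the seed, colours and legend position are assumptions.

```r
set.seed(1)
x <- rnorm(300)
y <- x + rnorm(300)
plot(x, y, pch = 20, col = 4, cex = 0.5)
abline(a = 0, b = 1, lty = 4, lwd = 1.5)   # the line y = x
fit <- lm(y ~ x)
abline(fit, col = 2)                       # abline() also accepts an lm fit
legend("topleft", legend = c("y=x", "line of best fit"),
       lty = c(4, 1), col = c(1, 2))
```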
5.4
  Litter Mother   Wt
1      A      A 61.5
2      A      A 68.2
3      A      A 64.0
4      A      A 65.0
5      A      A 59.7
6      A      B 55.0
[Figure: box plots of Wt broken down by Litter]
The function interprets the formula as requiring that the left-hand side be
summarized in a way which is broken down by the right. Note that Wt and
Litter are contained within genotype, and are not recognized at the console, but the argument data=genotype ensures that the boxplot() function knows where to look for genotype$Wt.
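Here is a sketch of the boxplot() call described above, assuming the genotype data from MASS. With plot = FALSE, boxplot() returns the summary statistics instead of drawing them.

```r
library(MASS)   # for the genotype data
# Wt (left of ~) summarised separately for each level of Litter (right of ~)
boxplot(Wt ~ Litter, data = genotype, xlab = "Litter", ylab = "Wt")
stats <- boxplot(Wt ~ Litter, data = genotype, plot = FALSE)
stats$names     # "A" "B" "I" "J"
```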
5.5 Lattice Graphics
The plots above all come from R's base graphics system, which is to say that
they are all preloaded functions. A very popular and powerful extension
to R's graphics capabilities is made using the package lattice. The range
of plots which can be produced even using lattice's default methods is
staggering, and we will show only a few small examples here.
The basic command is xyplot(), whose first argument is usually a formula.
> library(lattice)
> head(crabs)
  sp sex index   FL  RW   CL   CW  BD
1  B   M     1  8.1 6.7 16.1 19.0 7.0
2  B   M     2  8.8 7.7 18.1 20.8 7.4
3  B   M     3  9.2 7.8 19.0 22.4 7.7
4  B   M     4  9.6 7.9 20.1 23.1 8.2
5  B   M     5  9.8 8.0 20.3 23.0 8.2
6  B   M     6 10.8 9.0 23.0 26.5 9.8
[Figure: scatter plots of log(FL) against log(RW), one panel for each combination of sp (B, O) and sex (F, M)]
The formula in this case has three parts. The left-hand side log(FL)
is to be plotted against the right-hand side log(RW); since both these variables are
continuous, we will obtain a scatter plot. The conditioning bar | indicates
that we wish the information to be broken down by the third term, sp*sex
(i.e. by species and by sex). Hence lattice produces four separate scatter
plots, each with the same axes.
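The formula described above corresponds to a call like the following (a sketch, assuming the crabs data from MASS):

```r
library(MASS)      # for the crabs data
library(lattice)
p <- xyplot(log(FL) ~ log(RW) | sp * sex, data = crabs)
print(p)           # lattice plots are drawn when they are printed
```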
The most common use of the lattice package is to produce these trellis
plots for representing multivariate data. A few more examples you might
find useful:
> histogram(~ height | voice.part, data=singer)
[Figure: histograms of height for each voice part (Bass 1, Bass 2, Tenor 1, Tenor 2, Alto 1, Alto 2, Soprano 1, Soprano 2), with Percent of Total on the vertical axis]
Notice that the command is histogram(), not hist(), and the plotting
options are different.
> library(MASS)
> densityplot(galaxies)
[Figure: density plot of galaxies, with Density on the vertical axis]
5.6 Function Plots
[Figures: plots of the function sin, and of function(x) log(1 + x)]
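plot() accepts a function together with a range, as plot(dnorm, -3, 3, add=TRUE) in the next section does; curve() takes an expression in x instead. Here is a sketch, where the plotting ranges are assumptions:

```r
plot(sin, -pi, pi)                      # plot a function over a range
f <- function(x) log(1 + x)
plot(f, -0.5, 1)
curve(log(1 + x), from = -0.5, to = 1)  # the same curve from an expression in x
```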
Using the lattice package you can also produce surface plots for functions
with two arguments using wireframe plots.
[Figure: wireframe surface plot of the matrix out, with axes labelled row and column]
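Here is a sketch of a wireframe plot. outer() evaluates a function on a grid, and wireframe() draws the resulting matrix, labelling its axes row and column. The bell-shaped function and the grid are assumptions for illustration.

```r
library(lattice)
x <- seq(-2, 2, length.out = 30)
y <- seq(-2, 2, length.out = 30)
# outer() evaluates the function at every pair (x[i], y[j])
out <- outer(x, y, function(x, y) exp(-(x^2 + y^2)))
print(wireframe(out, drape = TRUE))
```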
5.7 Customized Plots
> set.seed(1328)
> x = rnorm(100)
> plot.new()
> plot.window(xlim=c(-3,3), ylim=c(-0.1,0.5))
> axis(side=1, pos=-0.1)
> hist(x, breaks=15, add=TRUE, freq=FALSE, col=2)
> plot(dnorm, -3, 3, add=TRUE)
> points(x, rep(-0.05,100), pch="|")
> title(main="Normal random variables")
plot.window() is used to control the range of the plot, the axis() function
draws on the axes, and title() is used to annotate with text. Try each of
the above commands in turn to see what they do.
5.8 Exporting Plots
You will likely need to use R plots in LaTeX documents for your practicals
and projects. If you are using a GUI such as the default R interface on
OS X or Windows, then select the window and go to File > Save As. I
recommend saving plots in PDF format, as this makes it easiest to integrate
with a LaTeX document. Other interfaces such as RStudio make it similarly
easy to save plots.
You can also save plots from the command line. The way to do this is to tell
R to send your plot commands to a file, instead of to the screen. This means
you won't be able to see your plot whilst you produce it (but presumably
you'll have already checked what it looks like!) Type pdf("yourfilename.pdf")
to start, then run your chosen plot commands. Then finish with dev.off()
to go back to the default state. For example:
> pdf("plotfile.pdf")
> plot(hills)
> dev.off()
Much coding involves the repeated application of the same function to several different pieces of data in a vector or list. For this reason, R has a series
of functions for performing such tasks, which results in much simpler and
easier to understand code.
6.1 apply()
It will also work for matrix-like objects, such as data frames (although see
also sapply() below).
> library(MASS)
> apply(hills, 2, mean)
    dist    climb     time
   7.529 1815.314   57.876
Exercise 6.1. Using apply(), write a function which, given an $(I \times J)$ matrix $X = (x_{ij})$, computes the magnitude of each row, that is
$$\sqrt{x_{i1}^2 + x_{i2}^2 + \cdots + x_{iJ}^2}, \qquad \text{for each } i = 1, \ldots, I,$$
and returns the results as a vector.
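$$\frac{\bar{x}_{i+}}{s_i},$$
where $\bar{x}_{i+}$ and $s_i$ are respectively the sample mean and sample standard
deviation of entries in the ith row of X. [The mean divided by the standard
deviation is the reciprocal of the coefficient of variation, which is usually
defined as the standard deviation divided by the mean.]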
What happens if you use apply() with a function like range(), which returns more than one value?
Exercise 6.3. Take a look at the data set EuStockMarkets (this is in the
datasets package, which should be already loaded). Find the mean absolute
change in returns from one day to the next for each stock (that is, the average
of $|x_{i+1} - x_i|$ over all days i). [Hint: recall the diff() function.]
Bonus*: Think of a more sensible measure of the volatility than this and
implement it [hint: one that doesn't depend upon the scale].
Note that apply() does not run substantially faster than writing a loop to
do the same thing; it is simply easier to code up and to read.
For the particular task of sums or means of rows or columns in a matrix, R
contains special functions rowSums(), colSums(), rowMeans(), colMeans().
These are all much faster than the equivalent apply() commands.
> # 2000 x 2000 random matrix
> x = matrix(rnorm(4e6), 2000, 2000)
> system.time(apply(x,1,sum))
   user  system elapsed
  0.121   0.004   0.127
> system.time(rowSums(x))
   user  system elapsed
   0.01    0.00    0.01
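The specialised functions give the same answers as the corresponding apply() calls. Here is a quick check on a smaller random matrix (the size is arbitrary):

```r
x <- matrix(rnorm(200 * 50), 200, 50)
r1 <- apply(x, 1, sum)                      # row sums via apply()
r2 <- rowSums(x)                            # the specialised function
all.equal(r1, r2)                           # TRUE, up to rounding error
all.equal(apply(x, 2, mean), colMeans(x))   # TRUE
```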
6.2
6.3 replicate()
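replicate() evaluates an expression repeatedly and collects the results, which is handy for simulation. Here is a sketch with an arbitrary seed and sample sizes:

```r
set.seed(1)
# Repeat an expression 1000 times and collect the results in a vector
means <- replicate(1000, mean(rnorm(10)))
length(means)    # 1000
sd(means)        # close to 1/sqrt(10), about 0.32
# If each repetition returns a vector, the results form the columns of a matrix
dim(replicate(5, range(rnorm(10))))   # 2 5
```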
6.4 tapply()
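tapply() applies a function to subsets of a vector defined by a grouping factor. Here is a sketch with the genotype data from MASS, grouping by Litter:

```r
library(MASS)   # for the genotype data
# Mean weight of the rats for each level of Litter
tapply(genotype$Wt, genotype$Litter, mean)
# Any function can be used, e.g. length() counts the rats in each group
tapply(genotype$Wt, genotype$Litter, length)
```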
  Litter Mother   Wt
1      A      A 61.5
2      A      A 68.2
3      A      A 64.0
4      A      A 65.0
5      A      A 59.7
6      A      B 55.0
Median
58.2
Max.
68.2
$B
Min. 1st Qu.
42.0
55.2
Median
59.8
Max.
69.8
Median
54.2
Max.
61.8
Median
50.0
Max.
61.0
$I
$J
It is also possible to provide more than one grouping in the form of a list or
data frame, in which case the data are broken down by both:
> tapply(genotype$Wt, genotype[,1:2], mean)
      Mother
Litter     A     B     I     J
     A 63.68 52.40 54.12 48.96
     B 52.33 60.64 53.92 45.90
     I 47.10 64.37 51.60 49.43
     J 54.35 56.10 54.53 49.06
Exercise 6.7. Find the heaviest rats born to each mother in the genotype
data.
6.5
mapply()
Sometimes it may be useful to apply a function of several arguments repeatedly, where more than one argument can change.
> mapply(seq, from=c(1,4,-3), to=c(2,9,0), by=0.5)
[[1]]
[1] 1.0 1.5 2.0
[[2]]
[1] 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0
[[3]]
[1] -3.0 -2.5 -2.0 -1.5 -1.0 -0.5  0.0
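mapply() calls the function on the first element of each argument, then on the second elements, and so on. The following minimal sketch uses a hypothetical helper, add3. When each call returns a single value, the results are simplified to a vector. When the results differ in length, they come back as a list, as in the seq() example.

```r
# Hypothetical helper taking three arguments
add3 <- function(x, y, z) x + y + z

# Calls add3(1, 4, 7), add3(2, 5, 8) and add3(3, 6, 9)
sums <- mapply(add3, 1:3, 4:6, 7:9)
sums                      # 12 15 18

# rep(1, times = 3), rep(2, times = 2), rep(3, times = 1):
# the results differ in length, so a list is returned
reps <- mapply(rep, x = 1:3, times = 3:1)
reps
```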
The data sets we have seen so far were best expressed as data frames, with
a single row corresponding to each observation. For discrete data sets, this
is not always the most useful or efficient way to represent the information,
and it may be more useful to use a contingency table.
Consider the data set occupationalStatus in the datasets package. This
contains information on the occupational status of 3,498 father and son
pairs, grouped from 1 (highest status) to 8 (lowest). One way to present
these data would be a data frame with 2 columns and 3,498 rows; however,
since there are only 64 possible distinct entries in the data frame, we can
represent the data rather more compactly as a matrix:
> occupationalStatus
      destination
origin   1   2   3   4   5   6   7   8
     1  50  19  26   8   7  11   6   2
     2  16  40  34  18  11  20   8   3
     3  12  35  65  66  35  88  23  21
     4  11  20  58 110  40 183  64  32
     5   2   8  12  23  25  46  28  12
     6  12  28 102 162  90 554 230 177
     7   0   6  19  40  21 158 143  71
     8   0   3  14  32  15 126  91 106
Here each row represents the occupational status of the father, and each
column that of the son.
Exercise 7.1. (a) What is the probability of a son having the same occupational status as his father? [Hint: investigate what diag(x) does if x
is a matrix.]
(b) Renormalize the data so that each row sums to 1. In the new data set the
$i$th row represents the conditional distribution of a son's occupational
status given that his father has occupational status $i$.
(c) What is the probability that a son has occupational status between 1
and 3, given that his father has status 1?
What if the father has occupational status 8?
(d) Calculate the vector $y$ where $y_i$ is the probability that the son has occupational status between 1 and 3, given that his father has occupational
status $i$.
7.1 Higher Dimensions
Of course, if we have a data set consisting of more than two pieces of categorical information about each subject, then a matrix is not sufficient. The
generalization of matrices to higher dimensions is the array. Arrays are
defined much like matrices, with a call to the array() command. Here is a
2 × 3 × 3 array:
> arr = array(1:18, dim=c(2,3,3))
> arr
, , 1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
, , 2
     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
, , 3
     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18
Each 2-dimensional slice defined by the last co-ordinate of the array is shown
as a 2 × 3 matrix. Note that we no longer specify the number of rows and
columns separately, but use a single vector dim whose length is the number
of dimensions. You can recover this vector with the dim() function.
> dim(arr)
[1] 2 3 3
        Cont
Sat      Low High
  Low    262  305
  Medium 178  268
  High   273  395
This is equivalent to
> apply(house, 1:2, sum)
        Infl
Sat      Low Medium High
  Low    282    206   79
  Medium 170    189   87
  High   175    264  229
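The house array itself is not reproduced here. The following minimal sketch applies the same idea (sum over the margins you do not keep) to the array arr defined earlier in this section, using only base R. The rowSums() call with its dims argument is shown as a faster alternative for the first case.

```r
arr <- array(1:18, dim = c(2, 3, 3))

# Keep margins 1 and 2, summing over the third dimension:
# the result is a 2 x 3 matrix (the three slices added together)
by_cell <- apply(arr, c(1, 2), sum)
by_cell

# Same numbers via rowSums(): dims = 2 treats the first two
# dimensions as "rows" and sums over the remaining one
rowSums(arr, dims = 2)

# Keep only margin 3: one total per 2 x 3 slice
by_slice <- apply(arr, 3, sum)
by_slice                  # 21 57 93
```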
Exercise 7.2. Create a new data set in which the levels Atrium and Terrace
are combined into a single level called Other.
7.2 Tables
Given a large data set of discrete items, a useful summary is just to count
the number of items in each category; this can be done using the table()
command.
> library(MASS)
> head(cabbages)
> table(cabbages$Date)
d16 d20 d21
 20  20  20
> table(cabbages[,1:2])
     Date
Cult  d16 d20 d21
  c39  10  10  10
  c52  10  10  10
For univariate data this gives a vector of counts, and for multi-way data a
contingency table in the form of an array. Note that the vector is named,
which makes it look slightly more complex than it actually is, but it behaves
just like a normal vector.
> tab = table(cabbages$Date)
> names(tab)
[1] "d16" "d20" "d21"
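The following minimal sketch uses two short hypothetical vectors, fruit and colour. It shows a one-way table, which behaves like a named vector of counts, and a two-way contingency table.

```r
fruit  <- c("apple", "pear", "apple", "plum", "apple", "pear")
colour <- c("red", "green", "red", "red", "green", "green")

# One-way table: counts per category, with the categories as names
tab <- table(fruit)
tab
names(tab)               # "apple" "pear" "plum"
tab[["apple"]]           # 3; indexing by name works as for a named vector

# Two-way table: rows are fruit, columns are colour
two_way <- table(fruit, colour)
two_way
two_way["apple", "red"]  # 2
```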
7.3 Binning Data
> head(Nile)
[1] 1120 1160 ...
> bins = c(0, 700, 800, 900, 1000, 1100, 1200, 1300, Inf)
> disNile = cut(Nile, bins)
> table(disNile)
disNile
          (0,700]         (700,800]         (800,900]       (900,1e+03]
                6                20                25                19
  (1e+03,1.1e+03] (1.1e+03,1.2e+03] (1.2e+03,1.3e+03]     (1.3e+03,Inf]
               12                11                 6                 1
The cut() command turns the data into a factor defined by the edges of
the bins you provide. The default labels are rather unwieldy, but they can
be changed:
> disNile = cut(Nile, bins, labels=bins[-9])
> table(disNile)
disNile
   0  700  800  900 1000 1100 1200 1300
   6   20   25   19   12   11    6    1
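The following minimal sketch applies cut() to a short hypothetical vector x with edges 0, 10, 20 and 30. It shows the default interval labels, the resulting counts, and custom labels. As with bins[-9] above, there is one fewer label than there are edges.

```r
x <- c(3, 8, 12, 15, 27)     # hypothetical measurements
edges <- c(0, 10, 20, 30)

# Default labels describe the intervals (0,10], (10,20], (20,30]
binned <- cut(x, edges)
levels(binned)
table(binned)                # 2, 2 and 1 values per bin

# Custom labels: one per bin
cut(x, edges, labels = c("low", "mid", "high"))
```

By default the intervals are closed on the right. A value exactly equal to the lowest edge (here 0) would therefore become NA unless include.lowest = TRUE is given.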
Strings
8.1
It's often useful to be able to stick strings together, for which purpose we
have the function paste().
> paste("Hello", "there")
[1] "Hello there"
If given a vector, paste() uses vector recycling, which can be rather useful:
"x2"  "x3"  "x4"  "x5"  "x6"  "x7"  "x8"  "x9"  "x10"
You can also make paste() concatenate all its output into a single string
with the optional collapse= argument:
> paste(LETTERS[1:10])
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
> paste(LETTERS[1:10], collapse=" ")
[1] "A B C D E F G H I J"
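The command behind the recycled strings "x2" to "x10" above is not shown. Assuming the aim was labels from x1 to x10, the following minimal sketch gives one way to produce them, along with a second recycling example. The variable name xnames is hypothetical.

```r
# The length-1 argument "x" is recycled to match 1:10;
# sep = "" removes the default single-space separator
xnames <- paste("x", 1:10, sep = "")
xnames

# paste0() is a shorthand for paste(..., sep = "")
identical(xnames, paste0("x", 1:10))   # TRUE

# Recycling also works for longer vectors: c("a", "b") is reused
paste(c("a", "b"), 1:4, sep = "_")     # "a_1" "b_2" "a_3" "b_4"
```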
Exercise 8.3. Write a function which, given a positive integer n, produces
a string of the form 1 < 2 < ... < n. So, for example,
> f(5)
[1] "1 < 2 < 3 < 4 < 5"
[Bonus: do it with a function using only one line of code.]
Exercise 8.4. Write a function which, given a vector x of positive integers,
returns a list of the same length as x, and the ith entry of the list is a
character vector of length x[i]. The entries in the 1st element of the list
should be "a1", "a2", and so on, and in the 2nd should be "b1", "b2" for
i = 2, etc.
For example:
> listfunc(c(1,4,2))
[[1]]
[1] "a1"
[[2]]
[1] "b1" "b2" "b3" "b4"
[[3]]
[1] "c1" "c2"
[You can do this in one line of code using mapply()!]
8.2 Other Manipulation
You may by now have noticed that certain functions, such as plot() or
summary(), appear to behave very differently when applied to different types
of object.
> x <- rnorm(100)
> y <- x + rnorm(100)
> mod1 <- lm(y ~ x)
> summary(x)
 ... Median ...  Max.
     0.0079      2.5900
> summary(mod1)
Call:
lm(formula = y ~ x)
Residuals:
    Min      1Q  Median      3Q     Max
-2.2756 -0.7760  0.0506  0.7570  2.1564
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.00477    0.09791    0.05     0.96
x            1.22301    0.09248   13.23   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.977 on 98 degrees of freedom
Multiple R-squared:  0.641,  Adjusted R-squared:  0.637
F-statistic:  175 on 1 and 98 DF,  p-value: <2e-16
You might be forgiven for wondering whether these functions have endless
lines of code telling them what to do in dozens of different circumstances.
In fact these are generic functions.
> summary
function (object, ...)
UseMethod("summary")
<bytecode: 0x7fbdc5b5bce0>
<environment: namespace:base>
When a generic function is called, it looks for a suitable method to use on
the arguments passed to it. You can look at the different methods available
using methods():
> methods(summary)
Which method is chosen depends upon the object's class; in other words,
what type of object it is.
> class(mod1)
[1] "lm"
When you call summary() with an object of class "lm", it looks for a method
with the name summary.lm() (falling back to summary.default() if no such method exists). So if you call
> summary.lm(mod1)
the result is the same as above.
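The same mechanism works for generics you write yourself. In the following minimal sketch, the generic describe(), its two methods and the class "temperature" are all hypothetical. The body of the generic only calls UseMethod(). R then picks describe.temperature() or describe.default() according to the class of the argument.

```r
# Hypothetical generic: like summary(), its body just calls UseMethod()
describe <- function(x, ...) UseMethod("describe")

# Fallback method, used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("an object of length", length(x))
}

# Method for objects of the hypothetical class "temperature"
describe.temperature <- function(x, ...) {
  paste(length(x), "temperature readings from", min(x), "to", max(x))
}

# A numeric vector given a class attribute
temps <- structure(c(12.5, 15, 9.8), class = "temperature")

describe(temps)   # dispatches to describe.temperature()
describe(1:4)     # no method for integers, so describe.default() is used
```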
9.1
$names
[1] "dist"  "climb" "time"
$class
[1] "data.frame"
$row.names
 [1] "Greenmantle"      "Carnethy"         "Craig Dunain"
 [4] "Ben Rha"          "Ben Lomond"       "Goatfell"
 [7] "Bens of Jura"     "Cairnpapple"      "Scolty"
[10] "Traprain"         "Lairig Ghru"      "Dollar"
[13] "Lomonds"          "Cairn Table"      "Eildon Two"
[16] "Cairngorm"        "Seven Hills"      "Knock Hill"
[19] "Black Hill"       "Creag Beag"       "Kildcon Hill"
[22] "Meall Ant-Suidhe" "Half Ben Nevis"   "Cow Hill"
[25] "N Berwick Law"    "Creag Dubh"       "Burnswark"
[28] "Largo Law"        "Criffel"          "Acmony"
[31] "Ben Nevis"        "Knockfarrel"      "Two Breweries"
[34] "Cockleroi"        "Moffat Chase"
$type
[1] "races"
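Data frames are not special in having attributes. The following minimal sketch shows that a plain vector has none, and that giving it dimensions adds exactly one attribute, "dim", which is what makes it a matrix. The attribute name "units" is hypothetical.

```r
x <- 1:6
attributes(x)              # NULL: a plain vector has no attributes

# Setting the dimensions adds a "dim" attribute, making x a 2 x 3 matrix
dim(x) <- c(2, 3)
attributes(x)              # a list with one element, $dim
is.matrix(x)               # TRUE

# structure() sets attributes while creating an object; "units"
# is a hypothetical, user-chosen attribute name
y <- structure(1:3, units = "cm")
attributes(y)
```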
You can also access and set specific attributes by name using attr():
> attr(hills, "class")
[1] "data.frame"
> attr(hills, "row.names")[4] = "Big Hill"
> row.names(hills)
 [1] "Greenmantle"      "Carnethy"         "Craig Dunain"
 [4] "Big Hill"         "Ben Lomond"       "Goatfell"
 [7] "Bens of Jura"     "Cairnpapple"      "Scolty"
[10] "Traprain"         "Lairig Ghru"      "Dollar"
[13] "Lomonds"          "Cairn Table"      "Eildon Two"
[16] "Cairngorm"        "Seven Hills"      "Knock Hill"
[19] "Black Hill"       "Creag Beag"       "Kildcon Hill"
[22] "Meall Ant-Suidhe" "Half Ben Nevis"   "Cow Hill"
[25] "N Berwick Law"    "Creag Dubh"       "Burnswark"
[28] "Largo Law"        "Criffel"          "Acmony"
[31] "Ben Nevis"        "Knockfarrel"      "Two Breweries"
[34] "Cockleroi"        "Moffat Chase"
9.2
10 Debugging
On other occasions, code may run but fail to give the correct answer. This
is more dangerous, since we may not realise that anything is wrong, and it also makes
the source of the problem harder to find. Getting warning messages from R is a sign
that something is not working as you intended: do not ignore these!
> 1:3 + 1:2
[1] 2 4 4
Warning message:
In 1:3 + 1:2 :
  longer object length is not a multiple of shorter object length
In any of these cases, you will need to discover what is wrong, and figure
out how to fix it.
10.1 Prevention
There are a few steps you can take in your programming style to reduce the
likelihood of making mistakes in the first place. These are:
Make your code modular, and don't repeat yourself. Try to
construct your code in the spirit of R: you should use self-contained
functions to perform specific tasks, and call those functions when necessary. If your program needs to perform the same task more than
once, then have it call the same code each time: this way it's easier to
detect and cure any mistakes.
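The following minimal sketch illustrates not repeating yourself. The same standardisation is first typed out twice, then written once as a function and reused with lapply(). The function standardise() and the data frame df are hypothetical.

```r
# Hypothetical data
df <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 50))

# Repetitive style: the same calculation typed out for each column
a_std <- (df$a - mean(df$a)) / sd(df$a)
b_std <- (df$b - mean(df$b)) / sd(df$b)

# Modular style: write the task once as a small function...
standardise <- function(v) {
  (v - mean(v)) / sd(v)
}

# ...then reuse it for every column; a mistake need only be fixed once
std <- lapply(df, standardise)
```

Base R's scale() performs the same centring and scaling on the columns of a matrix.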
Write simple code when possible. R has lots of useful built-in
features which allow complicated code to be written very succinctly
and clearly, such as by using the apply() family. Try to make use
of such features when possible. In the examination for this course,
you will lose marks for failing to use a function like lapply() in a
straightforward situation.
[1] 11 13 15
> row_sums(x, 2)
Error:
10.2 Modularity
distinct functions which perform small tasks, and that you can test (and
debug) individually.
Most fundamentally, if you have a complex instruction which isn't working,
try breaking it down into pieces to see whether each piece is doing what you
intend.
> if(any(x > 3) && y != 2) {
+   print("Hello")
+ }
Error:
g = function(y) {
  if (y < 0) warning("Some warning")
  return(y)
}
h = function(z) {
  stop("Some error message")
  z
}
f = function(x) {
  # this is a function which calls some other functions
  out = g(x) + h(x)
}
It is also easy to trace errors down to the particular function in which they
occur, but harder to search within that code for problems. If an error
occurs, immediately calling the traceback() function will show the series
of functions which were called in the run up to the problem (the call stack).
> f(2)
Error in h(x) : Some error message
> traceback()
3: stop("Some error message") at #3
2: h(x) at #3
1: f(2)
This shows that the error occurred in the function h(). We can fix the code
and call the function again.
> h = function(z) {
+   z
+ }
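> f(-2)
Warning message:
In g(x) : Some warning
> traceback()
No traceback available
Warnings do not produce a traceback. Setting options(warn=2) turns every
warning into an error, so that traceback() can then locate it:
> options(warn=2)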
> f(-2)
Error in g(x) : (converted from warning) Some warning
> traceback()
7: doWithOneRestart(return(expr), restart)
6: withOneRestart(expr, restarts[[1L]])
5: withRestarts({
       .Internal(.signalCondition(simpleWarning(msg, call), msg,
           call))
       .Internal(.dfltWarn(msg, call))
   }, muffleWarning = function() NULL)
4: .signalSimpleWarning("Some warning", quote(g(x)))
3: warning("Some warning") at #2
2: g(x) at #3
1: f(-2)
It is recommended that you add in your own stop() commands to deal with
problems which you think are likely to arise (such as bad user input), and
to provide an informative error message. This is much better than having
the code fail at a lower level in some internal R function, leaving the reason
for failure as a potential mystery.
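The following minimal sketch shows this advice applied to a hypothetical function, geo_mean(). Its input is checked at the start, so a bad argument fails immediately with a message the user can act on. Without the checks, a negative value would only trigger a warning from log() and a NaN result.

```r
# Hypothetical function: geometric mean of strictly positive numbers
geo_mean <- function(x) {
  # Check the input early and stop with an informative message
  if (!is.numeric(x)) stop("'x' must be a numeric vector")
  if (any(x <= 0)) stop("all values of 'x' must be strictly positive")
  exp(mean(log(x)))
}

geo_mean(c(1, 10, 100))   # 10, up to rounding

# Capture the error message instead of halting, to inspect it
msg <- tryCatch(geo_mean(c(1, -1)),
                error = function(e) conditionMessage(e))
msg
```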
10.3 Error Recovery
If you have an object called Q you'll need to use print(Q) to look at it.
> options(error=NULL)
10.4 Other Functions
The function debug() allows you to inspect what a function does line by
line: this is particularly useful if your code is not doing what you expect,
but does not actually result in an error. Try calling:
> debug(g)
> f(2)
R will bring up the browser as soon as g() is called (in this case from within
f()). The browser will be inside g()'s environment, and you can inspect
the objects within it on that basis. In addition, pressing Enter without
typing any commands will step through one line of g()'s code at a time,
allowing you to check what is happening as the function progresses. The
special command Q can, again, be used to break out of this.
To turn off this feature use the function undebug(). If you only want the
browser to appear on the next call of the function, you can use debugonce()
instead of debug().
You can also temporarily edit a function so as to see what it is doing in
a more hands-off fashion. Use the function trace() with the edit=TRUE
option.
> trace(g, edit=TRUE)
This brings up an editor containing a version of your function. You can
do whatever you like to this and it will not permanently alter your code;
inserting print() statements is a good way to use this. You can revert to
the original version of your function using untrace().
11 Arithmetic Subtleties*
Computers store most numbers in the form $\ell \times 2^k$, for some number $1 \le \ell < 2$
(held to 52 binary places) and an integer $k \in \{-1022, -1021, \ldots, 1023\}$. This can lead to rounding errors, as
the following illustrates.⁶
> 0.3 - 0.1 - 0.2
[1] -2.776e-17
This can cause problems when making exact comparisons; the function all.equal()
tests for equality up to a small numerical tolerance instead.
> x = 0.3 - 0.1 - 0.2
> x == 0
[1] FALSE
> all.equal(x, 0)
[1] TRUE
If its arguments differ, all.equal() returns a character string describing the
difference rather than FALSE, so use isTRUE(all.equal(...)) in a conditional statement:
> if (isTRUE(all.equal(x, 0))) print("We've got nothing!")
[1] "We've got nothing!"
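The following short sketch shows how all.equal() behaves with its default tolerance, which R documents as about 1.5e-8 for doubles. Nearly equal values pass. Genuinely different values produce a description rather than FALSE. The tolerance argument can be used to make the check stricter.

```r
x <- 0.3 - 0.1 - 0.2
x == 0                        # FALSE: exact comparison fails
isTRUE(all.equal(x, 0))       # TRUE: the difference is within tolerance

# When values really differ, all.equal() returns a description
# (a character string), not FALSE
res <- all.equal(1, 1.1)
is.character(res)             # TRUE
isTRUE(res)                   # FALSE, so safe to use inside if()

# The tolerance can be adjusted
isTRUE(all.equal(1, 1 + 1e-10))                     # TRUE
isTRUE(all.equal(1, 1 + 1e-10, tolerance = 1e-12))  # FALSE
```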
Very small or large numbers may not respond as you expect in ordinary
arithmetic.
> 2^-1074
[1] 4.941e-324
> 2^-1075
[1] 0
⁶ See http://www.smbc-comics.com/index.php?db=comics&id=2999
> (2^-1074)/1.5
[1] 4.941e-324
> 2^2000
[1] Inf
Large numbers may be replaced with Inf, which behaves as you might expect
infinity to.
> Inf/2
[1] Inf
> (-1)*Inf
[1] -Inf
> 0*Inf
[1] NaN
> Inf - Inf
[1] NaN
> exp(-Inf)
[1] 0
11.1
> i1 = 0L
> i2 = 0
> (i1 == i2)
[1] TRUE
> identical(i1, i2)
[1] FALSE
> i3 = 1e40 + 1
> i3-1e40
[1] 0
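The last result is the same rounding effect at a very large scale. A double carries 53 significant binary digits, so whole numbers are represented exactly only up to 2^53. The following minimal sketch (not from the original notes) shows where that limit takes effect. It also shows the separate integer type created with the L suffix, which is why identical(i1, i2) is FALSE above even though i1 == i2 is TRUE.

```r
# Doubles carry 53 significant bits, so integers are exact up to 2^53
2^52 + 1 == 2^52        # FALSE: still exact at this size
2^53 + 1 == 2^53        # TRUE: 2^53 + 1 rounds back to 2^53

# Integer-type values (suffix L) are a different storage type
typeof(0L)              # "integer"
typeof(0)               # "double"
.Machine$integer.max    # largest integer-type value, 2^31 - 1
```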
12
# libraries
library(), require(), install.packages()
# basic objects
`=`, `<-`, `<<-`, is(), as()
c(), list(), unlist(), matrix(), array(), data.frame(), `~`
cbind(), rbind()
head(), tail(), rev()
# attributes
attributes(), attr()
length(), dim(), ncol(), nrow()
names(), dimnames(), rownames(), colnames()
cat(), print(), print.default(), paste(),
# logic
`!`, `&`, `|`, `&&`, `||`, `==`
any(), all(), ifelse()
apply(), lapply(), sapply(), replicate(), tapply(), mapply()
seq(), seq_along(), seq_len(), `:`
rep(), rep.int()
unique(), duplicated()
match(), which(), `%in%`, which.max(), which.min()
sort(), order(), rank()
factor(), is.factor(), as.factor(),
levels(), nlevels(), droplevels()
table(), cut()
t(), `%*%`, det(), solve(), diag()
sum(), prod(), max(), min()
pmax(), pmin()
cumsum(), cumprod(), diff()
colSums(), rowSums(), colMeans(), rowMeans()
# accessing objects
setwd(), getwd(), ls(), rm()
with(), attach(), detach(),
dput(), dget()
scan(), read.table(), read.csv()
# loops
if(), for(), while(), next, break, switch(),
# functions
args(), missing(), Recall(), return()
warning(), stop()
# performance and debugging
system.time(),
debug(), debugonce(), undebug(), trace(), untrace()
# statistics
mean(), sd(), var(), cor(), cov()
runif(), rnorm(), rgamma(), rpois(), rbinom(), rexp(),
sample(), set.seed()
# graphics
plot.new(), plot.window(), axis(), legend()
points(), lines(), abline(),
plot(), hist(), boxplot(), pairs()
# arithmetic
abs(), ceiling(), floor(), round(), signif(),
exp(), log(),
`+`, `-`, `*`, `/`, `%%`,
# numerical methods
integrate(), nlm(), optim()
eigen(), svd(), qr()