R Programming Tutorial
R is a software language for carrying out complicated (and simple) statistical analyses. It includes
routines for data summary and exploration, graphical presentation and data modelling. The aim of this
document is to provide you with a basic fluency in the language. It is suggested that you work
through this document at the computer, having
started an R session. Type in all of the commands that are printed, and check that you understand how
they operate. Then try the simple exercises at the end of each section.
Basic operations
> 4+6
The result should be
[1] 10
Objects can be removed from the current workspace with the rm function:
> rm(x,y)
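The rm call above assumes that objects named x and y already exist in the workspace. As a minimal sketch (the values assigned here are purely illustrative), objects are created with <-, listed with ls(), and removed with rm():

```r
# Illustrative objects; the values are assumptions, not taken from the text
x <- 3
y <- x * 2                      # y is 6
ls()                            # lists the objects in the current workspace
rm(x, y)                        # remove both objects
exists("x", inherits = FALSE)   # FALSE: x is no longer in the workspace
```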
> z<-c(5,9,1,0)
More general sequences can be generated using the seq command. For example:
> seq(1,9,by=2)
[1] 1 3 5 7 9
Another useful function for building vectors is the rep command for repeating things. For
example:
> rep(1:3,6)
[1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
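Both seq and rep accept further arguments that are often handy. The sketch below shows a few of them; the variable names are illustrative only:

```r
s  <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00 (fixed number of points)
r1 <- rep(1:3, each = 2)          # 1 1 2 2 3 3 (repeat each element in turn)
r2 <- rep(c(0, 1), times = 3)     # 0 1 0 1 0 1 (repeat the whole vector)
```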
R uses componentwise arithmetic on vectors.
> x<-c(6,8,9)
> x+2
[1] 8 10 11
> x<-c(6,8,9)
> y<-c(1,2,4)
> x+y
[1] 7 10 13
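length() calculates the length of a vector and sum() the sum of its elements; mean(), var() and
summary() are used in the same way. In the output below, x is a longer data vector than the
three-element x defined above.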
> mean(x)
[1] 7.216667
> var(x)
[1] 11.00879
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.200   6.050   7.250   7.217   8.475  14.500
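As a sketch of how these summary functions relate to one another, the mean and variance can be rebuilt from length and sum. The small vector c(6,8,9) from earlier is reused here, so the numbers differ from the output above:

```r
x <- c(6, 8, 9)                 # the small vector used earlier
n <- length(x)                  # 3
total <- sum(x)                 # 23
total / n                       # same value as mean(x)
sum((x - mean(x))^2) / (n - 1)  # same value as var(x), the sample variance
```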
The command rbind builds matrices by gluing rows together (cbind does the same with columns).
For a 3-by-2 matrix z:
> rbind(z,z)
     [,1] [,2]
[1,]    5    6
[2,]    7    3
[3,]    9    4
[4,]    5    6
[5,]    7    3
[6,]    9    4
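The same gluing works on plain vectors. A minimal sketch, with illustrative vector names a and b:

```r
a <- c(5, 7, 9)
b <- c(6, 3, 4)
m_cols <- cbind(a, b)   # a and b become the columns
m_rows <- rbind(a, b)   # a and b become the rows
dim(m_cols)             # 3 2
dim(m_rows)             # 2 3
```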
Matrices can also be built by explicit construction via the function matrix.
> z<-matrix(c(5,7,9,6,3,4),nrow=3)
Alternatively, we could have specified the number of columns with the argument ncol=2. Note that
the matrix is 'filled up' column-wise. If instead you wish to fill up row-wise, add the option
byrow=TRUE.
> z<-matrix(c(5,7,9,6,3,4),nrow=3,byrow=TRUE)
> z
     [,1] [,2]
[1,]    5    7
[2,]    9    6
[3,]    3    4
Other useful functions on matrices are t to calculate a matrix transpose and solve to
calculate inverses:
> t(z)
     [,1] [,2] [,3]
[1,]    5    9    3
[2,]    7    6    4
and, for a square invertible matrix x:
> solve(x)
           [,1]       [,2]
[1,] 0.23076923 -0.1538462
[2,] 0.07692308  0.1153846
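Note that solve only inverts square, non-singular matrices, so it cannot be applied to the 3 x 2 matrix z. A minimal sketch, assuming a 2 x 2 matrix m whose entries were chosen so that its inverse agrees with the output printed above (the name m and its entries are not given in the text):

```r
# Assumed 2 x 2 matrix; the name m and its entries are illustrative
m <- matrix(c(3, -2, 4, 6), nrow = 2)   # columns are (3, -2) and (4, 6)
m_inv <- solve(m)                       # matrix inverse
m_inv %*% m                             # identity matrix, up to rounding
solve(m, c(1, 0))                       # solves m %*% v = c(1, 0) for v
```

Multiplying the inverse by the original matrix is a quick check that the inversion worked.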
> z[1,1]
[1] 5
> z[,2]
[1] 7 6 4
> z[1:2,]
     [,1] [,2]
[1,]    5    7
[2,]    9    6
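Square brackets select elements as z[row, column], and leaving an index blank selects everything along that dimension. The sketch below shows a few further indexing forms, assuming z is the matrix built row-wise earlier:

```r
z <- matrix(c(5, 7, 9, 6, 3, 4), nrow = 3, byrow = TRUE)
no_first <- z[-1, ]               # negative index: drop the first row
big_rows <- z[z[, 1] > 4, ]       # logical index: rows whose first entry exceeds 4
col2     <- z[, 2, drop = FALSE]  # keep column 2 as a 3 x 1 matrix, not a vector
```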
Attaching to objects
R includes a number of datasets that it is convenient to use for examples. You can get a
description of what's available by typing
> data()
To access any of these datasets, you then type data(dataset) where dataset is the name of
the dataset you wish to access. For example,
> data(trees)
In order to work easily with the columns of a dataset, we can attach it. R then remembers
the column names and we can work with them directly. For example:
> trees[1:2,]
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
> attach(trees)
> mean(Height)
[1] 76
> mean(trees[,2])
[1] 76
Alternatively, we can use the $ syntax to do the same without attaching the dataset:
> trees$Height
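A few equivalent ways of reaching a column without attach are sketched below. Which one to use is largely a matter of taste; with() is convenient when an expression mentions several columns:

```r
mean(trees$Height)           # column selected by name with $
mean(trees[["Height"]])      # equivalent list-style access
with(trees, mean(Height))    # evaluate an expression inside the data frame
```

All three calls return the same mean as mean(Height) after attach(trees).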
A common situation is where we want to apply the same function to every row or column of
a matrix.
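One way to do this is with the apply function: its second argument is 1 to work over rows and 2 to work over columns. A small sketch, assuming z is the 3 x 2 matrix built row-wise earlier:

```r
z <- matrix(c(5, 7, 9, 6, 3, 4), nrow = 3, byrow = TRUE)
row_totals <- apply(z, 1, sum)    # over rows:    12 15 7
col_means  <- apply(z, 2, mean)   # over columns: 5.666667 5.666667
```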
> x<-seq(-5,10,by=.1)
> dnorm(x,3,2)
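Here dnorm(x,3,2) evaluates the density of the normal distribution with mean 3 and standard
deviation 2 at each value of x. Analogous functions dt, pt and qt (density, distribution function
and quantile function) exist for the t-distribution, though in this case it is necessary to give
the degrees of freedom rather than the mean and standard deviation.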
Other distributions available include the binomial, exponential, Poisson and gamma, though
care is needed interpreting the functions for discrete variables.
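The naming pattern is a prefix (d, p or q) followed by the distribution name. The sketch below illustrates it, including the discrete case, where the d function returns a probability rather than a density; the particular arguments are arbitrary illustrative choices:

```r
dnorm(3, mean = 3, sd = 2)        # density at the mean: 1 / (2 * sqrt(2 * pi))
pnorm(3, mean = 3, sd = 2)        # P(X <= 3) = 0.5
qnorm(0.975)                      # about 1.96 for the standard normal
qt(0.975, df = 10)                # t quantile: needs degrees of freedom
dbinom(2, size = 5, prob = 0.5)   # P(X = 2) = 0.3125, a probability
ppois(2, lambda = 1)              # P(X <= 2), which includes the value 2 itself
```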
R enables simulation from a wide range of distributions, using a syntax similar to the above.
For example, to simulate 100 observations from the N(3, 4) distribution (mean 3, variance 4,
hence standard deviation 2) we write
> rnorm(100,3,2)
Similarly, rt and rpois simulate from the t and Poisson distributions, and so on.
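Simulated values change from run to run unless the random seed is fixed. A minimal sketch; the seed value and the parameters of the Poisson and t draws are arbitrary assumptions:

```r
set.seed(1)                              # arbitrary seed, only for reproducibility
draws <- rnorm(100, mean = 3, sd = 2)    # 100 draws from N(3, 4)
length(draws)                            # 100
mean(draws); var(draws)                  # near 3 and 4, but not exactly equal
counts  <- rpois(5, lambda = 2)          # five Poisson(2) counts
t_draws <- rt(5, df = 10)                # five draws from t with 10 degrees of freedom
```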
Graphics
R has many facilities for producing high quality graphics. A useful facility is to divide a page into
smaller pieces so that more than one figure can be displayed. For example:
> par(mfrow=c(2,2))
creates a window of graphics with 2 rows and 2 columns. With this choice the windows are filled up
row-wise. Use mfcol instead of mfrow to fill up column-wise. The function par is a general function
for setting graphical parameters. There are many options: see help(par).
> par(mfrow=c(2,2))
> hist(Height)
> boxplot(Height)
> hist(Volume)
> boxplot(Volume)
> par(mfrow=c(1,1))
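Rather than resetting the layout by hand, the previous settings can be saved and restored, since par returns the old values of whatever it changes. A sketch that uses trees$Height directly, so it does not depend on attach:

```r
old_par <- par(mfrow = c(1, 2))   # par returns the previous settings
hist(trees$Height)
boxplot(trees$Height)
par(old_par)                      # restore the layout that was in place before
```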
We can also plot one variable against another using the function plot:
> plot(Height,Volume)
To join the points with lines, use type='l'; type='b' draws both points and lines. For a vector
temp of yearly values for 1912 to 1971:
> plot(1912:1971,temp,type='l')
> plot(1912:1971,temp,type='b')
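The vector temp is not defined in this section. As an assumption for illustration only, the built-in nhtemp series (yearly temperatures for 1912 to 1971) has the right length and can stand in for it:

```r
temp  <- as.numeric(nhtemp)      # assumption: temp holds the nhtemp values
years <- 1912:1971               # 60 years, matching length(temp)
plot(years, temp, type = "l")    # lines only
plot(years, temp, type = "b")    # points joined by lines
```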
R can also produce a scatterplot matrix (a matrix of scatterplots for each pair of variables)
using the function pairs:
> pairs(trees)
Writing functions
An important feature of R is the facility to extend the language by writing your own
functions.
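The following defines a function named several.plots. It draws histograms of the first two
columns of x and a scatterplot of one against the other, then returns a summary of each column.
several.plots<-function(x){
  par(mfrow=c(3,1))     # three plots stacked in one column
  hist(x[,1])
  hist(x[,2])
  plot(x[,1],x[,2])
  par(mfrow=c(1,1))     # back to a single-plot layout
  apply(x,2,summary)    # value returned: summary of each column
}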
> several.plots(faithful)
Defining a Function
Let’s start by defining a function fahrenheit_to_celsius that converts temperatures
from Fahrenheit to Celsius:
fahrenheit_to_celsius <- function(temp_F) {
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}
> fahrenheit_to_celsius(37)
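Because R arithmetic is componentwise, a function written this way also accepts a whole vector of temperatures. The sketch below restates the function in a shorter form, relying on the fact that a function returns the value of its last expression; the input values are illustrative:

```r
fahrenheit_to_celsius <- function(temp_F) {
  (temp_F - 32) * 5 / 9    # the value of the last expression is returned
}
celsius <- fahrenheit_to_celsius(c(32, 98.6, 212))   # 0 37 100
```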
Other things
There are many other facilities in R. These include:
1. Functions for fitting statistical models such as linear and generalized linear models.
2. Functions for fitting curves to smooth data.
3. Functions for optimisation and equation solving.
4. Facilities to program using loops and conditional statements such as if and while.
5. Plotting routines to view 3-dimensional data.
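As a brief taste of the first and fourth items, the sketch below fits a straight line to the trees data with lm and then uses a for loop with an if statement. The choice of variables and the threshold of 80 are arbitrary assumptions made for illustration:

```r
fit <- lm(Volume ~ Girth, data = trees)   # simple linear regression
coef(fit)                                  # intercept and slope

# Loop with a conditional: count the trees with Height above 80
tall <- 0
for (h in trees$Height) {
  if (h > 80) tall <- tall + 1
}
tall
```

In practice the loop could be replaced by sum(trees$Height > 80), which gives the same count.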
There is also the facility to 'bolt-on' additional libraries of functions that have a specific utility.
Typing
> library()
will give a list and short description of the libraries available. Typing
> library(libraryname)
where libraryname is the name of the required library will give you access to the functions in that
library.
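For instance, the splines library provides functions for building spline bases. It is used here only as an assumption-laden example, picked because it ships with a standard R installation:

```r
library(splines)                    # load the library
basis <- bs(trees$Girth, df = 4)    # cubic B-spline basis from that library
dim(basis)                          # 31 rows, 4 basis columns
```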
Task 1: The data y<-c(33,44,29,16,25,45,33,19,54,22,21,49,11,24,56) contain sales of milk in litres for
5 days in three different shops (the first 3 values are for shops 1, 2 and 3 on Monday, etc.). Produce a
statistical summary of the sales for each day of the week and also for each shop.
x = | 3  2 |
    | 1  1 |
y = | 1  4   0 |
    | 0  1  -1 |
Task 3: Attach the dataset mtcars and find the mean weight and mean fuel consumption for
vehicles in the dataset (type help(mtcars) for a description of the variables available).
Task 4: Write a function that takes as its arguments two vectors, x and y, produces a scatterplot, and
calculates the correlation coefficient (using cor(x,y)).
Task 5: Write a function that takes a vector (x_1, ..., x_n) and calculates both ∑ x_i and ∑ x_i^2.
(Remember the use of the function sum.)