R Programming 2
§ Introduction
What is R?
§ R was created by Ross Ihaka and Robert Gentleman at the University of Auckland,
New Zealand, and is currently developed by the R Development Core Team.
§ The language was named R partly after the first letter of the first names of its two
authors (Robert Gentleman and Ross Ihaka) and partly as a play on the name of the
Bell Labs language S.
The R environment
R is an integrated suite of software facilities for data manipulation, calculation and graphical
display. It includes
Starting R
§ The beauty of R is that it is free, open-source software (released under the GNU GPL),
so it is available online to anyone with no license fee.
§ Once you have installed R, there will typically be an icon on your desktop. Double-click
it and R will start up, and a window should appear on your screen.
§ The “>” is a prompt symbol displayed by R. This is R’s way of telling you that it’s ready
for you to type a command.
§ If a command is not complete at the end of a line, R displays a different prompt, by
default “+”, on the second and subsequent lines and continues to read input until the
command is syntactically complete.
§ To see the list of available datasets, call the data function with no arguments:
>data()
§ To get more information on any specific named function, e.g. sqrt or mean, the
command is:
>help(sqrt) or
>?sqrt
§ For a feature specified by special characters, the argument must be enclosed in single or
double quotes, e.g. "[[".
>help("[[")
§ This is also necessary for a few words with syntactic meaning, including if, for, while and
function.
>help("for")
Technically R is an expression language with a very simple syntax. Users are expected to type
inputs (commands) into R in the console window.
Commands
§ Expressions and commands in R are case-sensitive, e.g. X and x do not refer to the
same variable.
§ A new line ends a command, so no terminating character such as the semicolon in SAS
is needed.
§ You can use the arrow keys on the keyboard to scroll back to previous commands.
§ A variable name consists of letters, digits, the dot (.) and the underscore (_), e.g.
Wt.male, Var_name, Var, …
§ A variable name must start with a letter or a dot, and a leading dot must not be followed
by a digit. E.g. 2var_name, 1var, .1x, 1.x … are not valid.
§ Avoid names already used by the system, e.g. c, q, t, C, F, I, T, diff, df, pt.
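The naming rules above can be tried out directly in the console. This is a minimal sketch: the values assigned are made up for illustration, and make.names() is used only to show how base R would repair an invalid name.

```r
# Names that differ only by case are two separate variables
x <- 1
X <- 2
x == X                       # FALSE: x and X hold different values

# Valid names taken from the examples above (values are hypothetical)
Wt.male  <- 70.5
Var_name <- "a"

# make.names() returns a syntactically valid version of each name,
# so an unchanged result means the name was already valid
make.names(c("Wt.male", "2var_name", ".1x"))
```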
Data types in R
1. Numeric: The most common data type in R is numeric. A variable or a series is
stored as numeric data if its values are whole or decimal numbers, e.g. 3.7, 5, 6, 1.231.
2. Integer: Integer data are a special case of numeric data, e.g. the number of
children in a family: 1L, 3L, 6L (the L suffix tells R to store the value as an integer).
3. Complex: Complex numbers with real and imaginary parts, e.g. 1+4i.
4. Logical: A logical variable is a variable with only two values: TRUE or FALSE.
5. Character: The character data type is used for storing text, known as strings in R. The
simplest way to store data as character is to put quotation marks around the piece of
text, e.g. char = "male".
· If you want to force any kind of data to be stored as character, you can do it by
using the command: as.character()
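As a short sketch of the five data types listed above, class() reports how R has stored a value. The variable names here are chosen only for illustration.

```r
num <- 3.7        # numeric
int <- 3L         # integer (note the L suffix)
cpx <- 1 + 4i     # complex
lgl <- TRUE       # logical
chr <- "male"     # character

class(num)        # "numeric"
class(int)        # "integer"
class(cpx)        # "complex"
class(lgl)        # "logical"
class(chr)        # "character"

# Force a value to be stored as character
as.character(int) # "3"
```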
Data structures in R
1. Vector: A vector is the most common and basic data structure in R and is pretty much
the workhorse of R, e.g. c(3,4,5) or c("male", "female").
2. List: A list is an R object which can contain many different types of elements, such as
vectors, functions and even other lists, e.g. list(c(1,2,3), 21.3, sin, c(2,4,1)).
4. Data Frames: Data frames are tabular data objects. Unlike a matrix, each column of a
data frame can contain a different mode of data. The first column could be numeric while
the second column could be character and the third column could be logical. A data
frame is a list of vectors of equal length. Data frames are created using the data.frame()
function.
5. Array: While matrices are confined to two dimensions, arrays can have any number of
dimensions. The array() function takes a dim attribute which creates the required number
of dimensions.
6. Factor: Factors are data objects used to categorize data and store it as
levels. They can store both strings and integers. They are useful in data analysis for
statistical modeling. They are created using the factor() function, which takes a vector as
input.
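The structures described above can be created as follows. This is an illustrative sketch: the column names and values in the data frame are made up, and the list reuses the example given earlier.

```r
v  <- c(3, 4, 5)                              # vector
l  <- list(c(1, 2, 3), 21.3, sin, c(2, 4, 1)) # list mixing vectors, a number and a function
df <- data.frame(id     = c(1, 2, 3),         # data frame: one mode per column
                 sex    = c("male", "female", "male"),
                 smoker = c(TRUE, FALSE, FALSE))
a  <- array(1:24, dim = c(2, 3, 4))           # three-dimensional array
f  <- factor(c("male", "female", "male"))     # factor built from a character vector

str(df)      # shows the mode of each column
dim(a)       # 2 3 4
levels(f)    # "female" "male"
```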
Variable Assignment
A variable can be assigned a value using the leftward (<-), rightward (->) or equals (=)
operator. The value of a variable can be printed using print() or by simply typing the
variable name.
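A minimal sketch of the three assignment forms; the variable names a, b and c1 are arbitrary.

```r
a <- 5      # leftward assignment
10 -> b     # rightward assignment
c1 = 15     # equals assignment

print(a)    # explicit printing
b           # typing the name at the prompt prints the value too
```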
R Operators
Objects
§ Objects may be variables, arrays of numbers, character strings, functions or more general
structures built from such components.
§ To list the objects you have created in a session, use either of the following commands:
>objects() or ls()
§ To remove specific objects use: >rm(x,y). Only the objects x and y will be
removed.
§ To quit the R program, use the close (X) button of the window or the
command q().
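The listing and removal commands can be sketched as below, assuming a session in which three throwaway objects x, y and z have just been created.

```r
x <- 1
y <- 2
z <- 3

ls()           # lists the objects in the workspace, including x, y and z
rm(x, y)       # removes only x and y

exists("x", inherits = FALSE)   # FALSE: x is gone
exists("z", inherits = FALSE)   # TRUE: z is still there
```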
Vectors
§ Vectors are the simplest type of object in R. A vector is simply a sequence of items that
are all of the same type.
1. Numeric vectors
2. Character vectors
3. Logical vectors
E.g. to set up a numeric vector X consisting of the 5 numbers 10, 6, 3, 6, 22, we can use a
command such as:
>assign("X",c(10, 6, 3, 6, 22))
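As a sketch, the same numeric vector can also be built with the assignment operators; the names X2 and X3 are introduced here only to compare the results.

```r
assign("X", c(10, 6, 3, 6, 22))   # the form shown above
X2 <- c(10, 6, 3, 6, 22)          # leftward assignment
c(10, 6, 3, 6, 22) -> X3          # rightward assignment

identical(X, X2)   # TRUE: all forms create the same vector
length(X)          # 5
X[2]               # 6, the second element
```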
2. Character Vectors: Character values, or strings, are used for storing text. A string is
surrounded by either single or double quotation marks, but is printed
using double quotes (or sometimes without quotes). "hello" is the same as 'hello'. For
example, we create a vector variable called fruits:
>fruits = c("banana","apple","orange")
3. Logical Vectors: A logical vector is a vector whose elements are TRUE, FALSE, or NA.
TRUE and FALSE are often abbreviated as T and F respectively. Logical vectors are usually
produced by the comparison operators <, >, <=, >=, ==, !=.
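For instance, comparing a numeric vector with a number gives a logical vector, which can then be used to select elements. This sketch reuses the vector X from the numeric example; the threshold 5 is arbitrary.

```r
X <- c(10, 6, 3, 6, 22)

big <- X > 5   # TRUE TRUE FALSE TRUE TRUE
X[big]         # 10 6 6 22: elements where the condition holds
sum(big)       # 4: TRUE counts as 1, FALSE as 0
X == 6         # FALSE TRUE FALSE TRUE FALSE
```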
Logical Operators
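§ &: Element-wise logical AND operator. It returns TRUE where both elements are TRUE.
§ |: Element-wise logical OR operator. It returns TRUE where at least one of the elements is TRUE.
§ &&: Logical AND operator. It returns TRUE if both statements are TRUE.
§ ||: Logical OR operator. It returns TRUE if at least one of the statements is TRUE.
NOTE: The logical operators && and || return a single value. In older versions of R they
considered only the first element of each vector; from R 4.3.0 onwards, giving them a vector
with more than one element is an error.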
a) x&y
b) x|y
c) x&&y
d) x||y
>x=c(3,4,TRUE)
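The difference between the element-wise and single-value operators can be sketched as follows. The vectors p and q are hypothetical values chosen for this illustration, and && and || are applied to single elements so that the code runs on any R version.

```r
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)

p & q          # TRUE FALSE FALSE  (element-wise AND)
p | q          # TRUE TRUE TRUE    (element-wise OR)

p[1] && q[1]   # TRUE:  both single values are TRUE
p[2] || q[3]   # FALSE: both single values are FALSE
```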
Generating Sequence
[1] 4 5 6 7 8 9 10 11 12
>10:1
[1] 10 9 8 7 6 5 4 3 2 1
>3.8:11.4
Note: The colon operator has high priority within an expression, e.g. 2*1:10 is equivalent to
2*(1:10).
>seq(1,10,by=2)
[1] 1 3 5 7 9
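The sequence commands above can be collected into one short sketch. The last line uses the length.out argument of seq(), which does not appear in the examples above.

```r
10:1                        # counts down from 10 to 1
3.8:11.4                    # steps of 1 from 3.8: 3.8 4.8 ... 10.8
2 * 1:10                    # colon is evaluated first: 2 4 6 ... 20
seq(1, 10, by = 2)          # 1 3 5 7 9
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
```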
Matrices
§ In R, matrices are an extension of numeric or character vectors with two dimensions: the
number of rows and the number of columns.
§ As with vectors, all the elements of a matrix must be of the same data type.
Example:
>X=matrix(c(3,4,5,6,7,8,9,10,11,12,13,14),nrow=4,byrow=TRUE)
>X
>Y=matrix(c(3:14),4,3,byrow=T)
>Y
When we execute the above code, it produces the following result for both X and Y:
     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    6    7    8
[3,]    9   10   11
[4,]   12   13   14
§ By default the matrix is filled by column. To fill the matrix by row, specify byrow = TRUE as
an argument to the matrix function.
§ Matrix operations (multiplication, transpose, etc.) can easily be performed using a few
simple functions like:
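As an illustrative sketch, base R provides t() for the transpose and %*% for matrix multiplication. The matrices A and B below are hypothetical examples that also show the difference between filling by column and by row.

```r
A <- matrix(1:6, nrow = 2)                 # filled by column (the default)
B <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled by row

A            # rows: 1 3 5 / 2 4 6
B            # rows: 1 2 3 / 4 5 6

dim(A)       # 2 3
t(A)         # transpose, a 3 x 2 matrix
A %*% t(A)   # matrix product, a 2 x 2 matrix
A * B        # element-wise product, not matrix multiplication
```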