R Programming Basics
Objectives:
Students will be able to
Understand the data structures available in R programming.
Understand how data structures are used when writing programs in R.
Learn how R is used for data visualization.
Implement data structures in R, step by step.
Understand how to use built-in datasets and perform operations on them.
Prerequisites:
The vector is the basic data structure in R and plays an essential role in R programming. Vectors
in R are similar to arrays in the C language: they hold multiple data values of the same type.
One key difference is that vector indexing in R starts from 1 and not from
0. We can create numeric vectors as well as character vectors.
R vectors come in two kinds: atomic vectors and lists.
- These data structures differ in the type of their elements: all elements of an atomic
vector must be of the same type, whereas the elements of a list can have different
types.
- They have three common properties: type (returned by the typeof() function), length
(returned by length()), and attributes (returned by attributes()).
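As a minimal sketch of these three properties, the built-in functions typeof(), length(), and attributes() can be applied to an atomic vector and to a list. The variable names v and l and their values are illustrative only.

```r
# An atomic vector: every element has the same type (names are optional)
v <- c(a = 1.5, b = 2.5, c = 3.5)
# A list: elements may have different types
l <- list(1L, "two", TRUE)

print(typeof(v))      # type of the atomic vector
print(typeof(l))      # type of the list
print(length(v))      # number of elements in v
print(length(l))      # number of elements in l
print(attributes(v))  # v carries only a names attribute
```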
Types of Atomic Vector
1. Numeric Data Type - Decimal values are referred to as numeric data types in R.
Output
[1] "integer"
3. Character Data Type - Character values are strings of text, written in single or double quotes.
# R program to create Character Vectors
Output
4. Logical Data Type - A logical value is either TRUE or FALSE, typically the result of
testing a condition. NA represents a missing value.
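The four atomic types above can be checked with class(). This is a small sketch with illustrative values, not the original example from these notes; the L suffix is what makes a literal an integer rather than a numeric (double).

```r
num <- 3.14      # numeric: a decimal value
int <- 5L        # integer: the L suffix marks an integer literal
chr <- "hello"   # character: a string in quotes
lgl <- 3 > 5     # logical: the result of a comparison

print(class(num))
print(class(int))
print(class(chr))
print(class(lgl))
print(lgl)
```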
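A vector can be created in several ways:
1. Using the c() function: We can create a vector by combining individual values with the
c() function, for example c(1, 2, 3, 4).
Output:
using c function 1 2 3 4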
2. Using the ‘:’ operator: We can use the colon operator to
create a vector of consecutive values.
V2 <- 1:5
cat('using colon', V2)
Output:
using colon 1 2 3 4 5
3. Using the seq() function - In R, we can create a vector with the help of the seq()
function, which creates a sequence of elements as a vector. There are
two ways to use it: the first is to set the step size (by), and the second is to
set the length of the vector (length.out).
Seq_V <- seq(1, 4, length.out = 6)
cat('using seq() function', Seq_V, '\n')
Output:
using seq() function 1 1.6 2.2 2.8 3.4 4
Seq_V2 <- seq(1, 4, by = 0.5)
cat('using seq() function by', Seq_V2, '\n')
Output:
using seq() function by 1 1.5 2 2.5 3 3.5 4
With the help of vector indexing, we can access the elements of a vector. An index denotes the
position at which a value is stored in the vector. Indexing can be performed with
integer, character, or logical index vectors.
# R program to access elements of a Vector
# accessing elements with an index number.
X <- c(1, 5, 10, 1, 12)
cat('Using Subscript operator', X[2], '\n')
Output:
Using Subscript operator 5
# by passing a range of values inside the vector index.
Y <- c(14, 18, 12, 11, 17)
cat('Using combine() function', Y[c(4, 1)], '\n')
Output:
Using combine() function 11 14
# using logical expressions
Z <- c(5, 2, 1, 4, 4, 3)
cat('Using Logical indexing', Z[Z>4])
Output:
Using Logical indexing 5
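The examples above cover integer and logical indexing. Character indexing works on a vector whose elements have names. The sketch below uses a hypothetical named vector called marks; the names and values are made up for illustration.

```r
# A named numeric vector (names and values are illustrative)
marks <- c(maths = 70, physics = 65, chemistry = 80)

# Select one element by its name
cat('Using character indexing', marks["physics"], '\n')

# Select several elements by passing a character vector of names
cat('Two names at once', marks[c("chemistry", "maths")], '\n')
```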
Vector Operations
In R, various operations can be performed on vectors. We can add, subtract,
multiply or divide two or more vectors. R plays an important
role in data science, where such operations are required for data manipulation. The following
types of operation can be performed on vectors.
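1. Combining vectors in R: The c() function is used not only to create a vector but
also to combine vectors. Combining two or more vectors forms a new
vector which contains all the elements of each vector. Because an atomic vector holds a
single type, combining a numeric vector with a character vector converts the numbers to
character strings. The following example shows how c() combines vectors.
num = c(1, 2, 3, 4)
str = c("one", "two", "three", "Four")
c(num,str)
Output:
[1] "1" "2" "3" "4" "one" "two" "three" "Four"
2. Arithmetic operations: We can perform arithmetic operations on vectors of the same
length. The operations are applied element by element.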
a<-c(1,3,5,7)
b<-c(2,4,6,8)
a+b #addition
Output
[1] 3 7 11 15
a-b #subtraction
Output
[1] -1 -1 -1 -1
a/b #division; a %/% b gives integer division
Output
[1] 0.5000000 0.7500000 0.8333333 0.8750000
a%%b #Remainder operation
Output
[1] 1 3 5 7
3. Logical Index Vector: With the help of a logical index vector in R, we can form a new
vector from a given vector. The logical vector has the same length as the original vector. The
new vector contains only the elements at positions where the logical vector is TRUE.
names<-c("Ram","Aryan","Nisha","Siya","Radha","Gunjan")
L<-c(TRUE,FALSE,TRUE,TRUE,FALSE,FALSE)
names[L]
Output
[1] "Ram" "Nisha" "Siya"
4. Numeric Index: In R, we specify the index between square brackets [ ] to select an
element by position. If the index is negative, R returns all the values except the one at
the specified position. For example, [-3] drops the third element and returns
the rest. An index beyond the length of the vector returns NA.
names<-c("Ram","Aryan","Nisha","Siya","Radha","Gunjan")
names[2]
Output
[1] "Aryan"
names[-4]
Output
[1] "Ram" "Aryan" "Nisha" "Radha" "Gunjan"
names[15]
Output
[1] NA
5. Duplicate Index: The index vector allows duplicate values. Hence, the following
retrieves a member twice in one operation
names<-c("Ram","Aryan","Nisha","Siya","Radha","Gunjan")
names[c(2,4,4,3)]
Output
[1] "Aryan" "Siya" "Siya" "Nisha"
6. Range Index: A range index is used to slice a vector to form a new vector. For slicing,
we use the colon (:) operator. Range indexes are very helpful in situations involving
large vectors.
names<-c("Ram","Aryan","Nisha","Siya","Radha","Gunjan")
names[1:3]
Output
[1] "Ram" "Aryan" "Nisha"
7. Out of Order Index: In R, the index vector can be out of order. Below is an example
of a vector slice in which the order of the first and second values is reversed.
names<-c("Ram","Aryan","Nisha","Siya","Radha","Gunjan")
names[c(2,1,3,4,5,6)]
Output
[1] "Aryan" "Ram" "Nisha" "Siya" "Radha" "Gunjan"
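The output that follows appears to come from sorting a vector, but the code itself is missing from these notes. A minimal sketch with the sort() function is given here; the input vector v is an assumption, chosen only so that its sorted values match that output.

```r
# Assumed input: the original vector is not shown in the notes
v <- c(8, 2, 7, 1, 11, 2)

asc  <- sort(v)                      # ascending order is the default
desc <- sort(v, decreasing = TRUE)   # descending order

cat('ascending order', asc, '\n')
cat('descending order', desc, '\n')
```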
Output
ascending order 1 2 2 7 8 11
descending order 11 8 7 2 2 1
# Create a vector of characters, then name the first vector member
# "Start" and the second member "End":
lib <- c("TensorFlow", "PyTorch")
names(lib)=c("Start","End")
lib
Output
Start End
"TensorFlow" "PyTorch"
Applications of vectors
1. In machine learning, vectors are used in principal component analysis. They are
extended to eigenvalues and eigenvectors, which are then used to perform decomposition
in vector spaces.
What is R List?
In R, lists are the second type of vector. A list is an R object that can contain elements of
different types, such as numbers, vectors, strings, and even another list. It can also contain a
function or a matrix as an element. A list is a data structure with components of mixed
data types. In other words, a list is a generic vector that contains other objects.
How to create List?
Creating a list is similar to creating a vector. In R, a vector is created with the help
of the c() function; a list is created with the list() function. A list avoids the main
restriction of an atomic vector, namely that all elements must share one data type: we can
add elements of different data types to a list.
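A minimal sketch of list() is shown here. The element values are taken from the printed output that follows; the variable name list_1 is an assumption, since the original code is not preserved.

```r
# A list holding character, numeric-vector, logical and numeric elements
list_1 <- list("Ram", "Sham", c(1, 2, 3), TRUE, 1.03)
print(list_1)
```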
Output
[[1]]
[1] "Ram"
[[2]]
[1] "Sham"
[[3]]
[1] 1 2 3
[[4]]
[1] TRUE
[[5]]
[1] 1.03
R provides an easy way to access elements: by giving a name to each element of a
list. By assigning names to the elements, we can access them easily. There are two
ways to access the elements of a list:
1. The first is the indexing method, performed in the same way as for a vector.
2. In the second, we access the elements of a list with the help of names. This is
possible only with a named list; we cannot access the elements of a list using
names if the list has no names.
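The statements that follow use a list called list_data whose definition is missing from these notes. The sketch below reconstructs it from the printed output, so the exact construction is an assumption, and then shows both access methods.

```r
# Reconstructed from the printed output in these notes (an assumption)
list_data <- list(
  Student = c("Ram", "Sham", "Raj"),
  Marks   = matrix(c(40, 80, 60, 70, 90, 80), nrow = 2),
  Course  = list("BCA", "MCA")
)

print(list_data[[1]])      # access by position
print(list_data$Student)   # access by name, gives the same element
```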
print(list_data$Marks)
Output
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
print(list_data)
Output
$Student
[1] "Ram" "Sham" "Raj"
$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
How to manipulate List elements in R?
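List elements can be added, updated, and removed by assignment. The code for this section is missing from the notes, so the following is only a sketch: the list is a cut-down version of list_data, and the replacement value "BBA" is hypothetical.

```r
list_data <- list(
  Student = c("Ram", "Sham", "Raj"),
  Course  = list("BCA", "MCA")
)

list_data$Course[[3]] <- "BSc"   # add a third course
list_data$Course[[1]] <- "BBA"   # update an element (hypothetical value)
list_data$Student <- NULL        # remove an element by assigning NULL

print(list_data)
```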
$Course[[3]]
[1] "BSc"
Converting list to vector: Lists have a drawback: we cannot perform
arithmetic operations directly on list elements. To remove this drawback, R provides the unlist()
function, which converts a list into a vector. In some cases, we need to convert a list into a
vector so that we can use the elements of the vector for further manipulation. The unlist()
function takes the list as a parameter and returns a vector.
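A minimal sketch of unlist() is shown here. The two lists are inferred from the partial output that follows; the first element of list1 is cut off in these notes, so the value 2 is an assumption.

```r
list1 <- list(2, 4, 6, 8, 10)   # first value assumed
list2 <- list(1, 3, 5, 7, 9)

# list1 + list2 would raise an error: arithmetic is not defined for lists
v1 <- unlist(list1)   # now a numeric vector
v2 <- unlist(list2)

result <- v1 + v2     # element-wise addition works on vectors
print(result)
```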
[[1]][[2]]
[1] 4
[[1]][[3]]
[1] 6
[[1]][[4]]
[1] 8
[[1]][[5]]
[1] 10
[[2]]
[[2]][[1]]
[1] 1
[[2]][[2]]
[1] 3
[[2]][[3]]
[1] 5
[[2]][[4]]
[1] 7
[[2]][[5]]
[1] 9
In R, arrays are data objects that can store data in more than two dimensions. An
array is created with the help of the array() function, which takes a vector
as input and uses the values in the dim parameter to set the dimensions of the array.
For example, if we create an array of dimension (2, 3, 4), it will contain 4 rectangular
matrices, each with 2 rows and 3 columns.
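The (2, 3, 4) case described above can be sketched as follows. The fill values 1:24 are illustrative only.

```r
# 2 rows, 3 columns, 4 matrices; values are filled column by column
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))       # the three dimensions
print(arr[, , 1])     # the first 2 x 3 matrix
print(arr[2, 3, 4])   # row 2, column 3 of the fourth matrix
```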
Output
,,1
,,2
We can give names to the rows, columns and matrices in the array by using
the dimnames parameter.
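A sketch of the dimnames parameter is given here. The input vectors are inferred from the Matrix1 output that follows, so treat them as an assumption; the nine values are recycled to fill both 3 x 3 matrices.

```r
vec1 <- c(1, 2, 3)
vec2 <- c(10, 11, 12, 13, 14, 15)

arr <- array(
  c(vec1, vec2),
  dim = c(3, 3, 2),
  dimnames = list(
    c("ROW1", "ROW2", "ROW3"),   # row names
    c("COL1", "COL2", "COL3"),   # column names
    c("Matrix1", "Matrix2")      # matrix names
  )
)

print(arr)
print(arr["ROW2", "COL3", "Matrix1"])   # elements can be selected by name
```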
Output
, , Matrix1
COL1 COL2 COL3
ROW1 1 10 13
ROW2 2 11 14
ROW3 3 12 15
, , Matrix2
Matrix in R
In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created by
passing a vector to the matrix() function. On R matrices, we can perform addition,
subtraction, multiplication, and division.
In an R matrix, elements are arranged in a fixed number of rows and columns. The
elements are usually numbers, although other types are allowed. By default, the matrix()
function fills the matrix column by column. All the elements of a matrix must share a common
basic type.
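A short sketch of matrix() illustrates the fill order and the common-type rule. The names m_col, m_row, and mixed are illustrative.

```r
m_col <- matrix(1:6, nrow = 2)                 # filled column by column (default)
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row

print(m_col)
print(m_row)

# Mixing types forces every element to a common type (here, character)
mixed <- matrix(c(1, "a", TRUE), nrow = 1)
print(typeof(mixed))
```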
Matrix Computations: Various mathematical operations can be performed on matrices using
the R operators. The operations are applied element by element, and the result is also a
matrix. The dimensions (number of rows
and columns) must be the same for the matrices involved in the operation.
> # Create two 2x3 matrices.
> matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
> print(matrix1)
Output
[,1] [,2] [,3]
[1,] 3 -1 2
[2,] 9 4 6
> matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
> print(matrix2)
Output
[,1] [,2] [,3]
[1,] 5 0 3
[2,] 2 9 4
> # Add the matrices.
> result <- matrix1 + matrix2
> cat("Result of addition","\n")
Result of addition
> print(result)
Output
[,1] [,2] [,3]
[1,] 8 -1 5
[2,] 11 13 10
> # Subtract the matrices
> result <- matrix1 - matrix2
> cat("Result of subtraction","\n")
Result of subtraction
> print(result)
Output
[,1] [,2] [,3]
[1,] -2 -1 -1
[2,] 7 -5 2
> # Multiply the matrices.
> result <- matrix1 * matrix2
> cat("Result of multiplication","\n")
Result of multiplication
> print(result)
Output
[,1] [,2] [,3]
[1,] 15 0 6
[2,] 18 36 24
> # Divide the matrices.
> result <- matrix1 / matrix2
> cat("Result of division","\n")
Result of division
> print(result)
Output
[,1] [,2] [,3]
[1,] 0.6 -Inf 0.6666667
[2,] 4.5 0.4444444 1.5000000
Data Frames in R
In R, data frames are created with the help of the data.frame() function. This function
takes vectors of any type, such as numeric, character, or integer. In the example below, we
create a data frame that contains employee id (integer vector), employee name (character
vector), salary (numeric vector), and starting date (Date vector).
# Creating the data frame.
emp.data<- data.frame(
employee_id = c (101:105),
employee_name = c("Ram","Sham","Neha","Siya","Sumit"),
sal = c(40000, 35000, 20000, 25000, 30000),
To manipulate the data in a data frame, we first need to
extract it. We can extract the data in three ways, which are as
follows:
1. We can extract the specific columns from a data frame using the column name.
Output
emp.data.employee_id emp.data.sal
1 101 40000
2 102 35000
3 103 20000
4 104 25000
5 105 30000
Output
employee_id starting_date
2 102 2019-09-01
3 103 2021-01-01
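The extraction steps that produced the outputs above are missing from these notes. The following is a self-contained sketch, not the original code: only the starting dates of rows 2 and 3 appear in the notes, so the other three dates are placeholders, and the exact indexing calls are assumptions.

```r
emp.data <- data.frame(
  employee_id   = 101:105,
  employee_name = c("Ram", "Sham", "Neha", "Siya", "Sumit"),
  sal           = c(40000, 35000, 20000, 25000, 30000),
  # Dates for rows 1, 4 and 5 are placeholders, not from the notes
  starting_date = as.Date(c("2019-01-01", "2019-09-01", "2021-01-01",
                            "2021-05-01", "2022-01-01"))
)

# Extract specific columns by name
cols <- data.frame(emp.data$employee_id, emp.data$sal)
print(cols)

# Extract specific rows together with specific columns
rows <- emp.data[2:3, c("employee_id", "starting_date")]
print(rows)
```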
What is a Factor in R?
A factor is a data structure used for fields that take only a predefined, finite
number of values.
- Factors are variables that take a limited number of different values.
- They are data objects used to store categorical data as levels.
- They can store both integer and string values, and are useful for columns that have a
limited number of unique values.
- By default, R sorts the levels in alphabetical order.
Attributes of Factor
The factor() function takes the following arguments:
- x: the input vector to be transformed into a factor.
- levels: an optional vector of the unique values that x can take.
- labels: an optional character vector of labels for the levels, in the same order as levels.
- exclude: the values to be excluded when forming the levels.
- ordered: a logical value that determines whether the levels are ordered.
- nmax: an upper bound on the maximum number of levels.
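A minimal sketch of factor() with only the x argument is shown here, using the stud_name vector that appears later in these notes.

```r
# Create a vector of names; "Ram" occurs twice
stud_name <- c("Ram", "Siya", "Raj", "Sham", "Ram")

# Convert the vector into a factor
f <- factor(stud_name)
print(f)
print(levels(f))   # unique values, sorted alphabetically
```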
Notice that the output has four levels: "Raj", "Ram", "Sham" and "Siya". The duplicate
"Ram" that occurs twice in the input vector appears only once among the levels.
To specify the levels explicitly, including their order, we use the levels argument.
To provide abbreviations or labels for the levels, we use the labels
argument as follows:
# Create vector of names
stud_name<-c("Ram","Siya","Raj","Sham","Ram")
#Convert vector into factor
factor(stud_name, levels=c("Ram","Siya","Raj","Sham"),labels = c("R1","S1","R2","S2"))
Output
[1] R1 S1 R2 S2 R1
Levels: R1 S1 R2 S2
If you want to exclude any level from your factor, you can use the exclude
argument.
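A sketch of the exclude argument, reusing the stud_name vector from the earlier example. The choice of "Raj" as the excluded value is arbitrary; excluded values become NA in the resulting factor.

```r
stud_name <- c("Ram", "Siya", "Raj", "Sham", "Ram")

# Drop "Raj" from the set of levels
f_ex <- factor(stud_name, exclude = "Raj")
print(f_ex)
print(levels(f_ex))
```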
Output
[1] North
Levels: East North West
Output
[1] West East
Levels: East North West
The levels() function returns the levels of a factor as a character vector. Indexing that
character vector with the factor itself returns one string for each element of the factor, so
the result has the same length as the factor.
Syntax:
levels(fac_vec)[fac_vec]
Since these values are all returned as strings, they belong to the same data type.
Hence, they can be combined into a single atomic vector using the c() function.
Syntax:
c(vec1, vec2)
where vec1 and vec2 are vectors belonging to the same data type
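The two syntax forms above can be combined as in this sketch. The factor names dir1 and dir2 and the value "South" are hypothetical.

```r
dir1 <- factor(c("East", "West", "North"))
dir2 <- factor(c("South", "East"))

# Convert each factor to a character vector through its levels
vec1 <- levels(dir1)[dir1]
vec2 <- levels(dir2)[dir2]

# Both are character vectors, so c() combines them without coercion
combined <- c(vec1, vec2)
print(combined)
```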
Factor Functions in R
Some of these functions are is.factor(), as.factor(), is.ordered(), and as.ordered().
is.factor() checks whether the input is a factor and returns a Boolean
value (TRUE or FALSE).
as.factor() takes the input (usually a vector) and converts it into a factor.
is.ordered() checks whether the factor is ordered and returns TRUE or FALSE.
The as.ordered() function takes an unordered factor and returns an
ordered factor.
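The four functions can be tried out as in this sketch. The sizes vector is a hypothetical example.

```r
sizes <- c("small", "large", "medium", "small")   # a plain character vector

print(is.factor(sizes))   # not a factor yet
f <- as.factor(sizes)     # convert to a factor
print(is.factor(f))
print(is.ordered(f))      # factors are unordered by default

o <- as.ordered(f)        # levels keep their alphabetical order
print(is.ordered(o))
print(levels(o))
```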
Working with Dataset
1. Dataset built into R
The women dataset is built into R. To get information on a dataset, type ? before the
name of the dataset. The information will appear on the 4th panel under the Help tab.
?women
If you want to see only a portion of the dataset, the function head( ) or tail( ) will do
the job.
The function head(data_frame, nL) shows the first n rows of the dataset, or the first 6
rows if n is omitted. (nL: n is the number of rows you want, followed by L to mark it as an integer.)
head(women)
The function tail(data_frame) will show the last 6 rows of the dataset.
tail(women)
Suppose we want to order the women data in ascending order of weight of the
women
data=women
sorted_data=data[order(data$weight),]
sorted_data
To order the data frame in descending order by the weight of the women, put a negative
sign in front of the target vector.
data=women
sorted_data=data[order(-data$weight),]
sorted_data
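The steps above can be sketched together on the built-in women dataset. The variable names first3, last3, and by_height_desc are illustrative, and sorting by height is shown only as a variation on the weight example.

```r
data <- women

first3 <- head(data, 3L)   # first 3 rows instead of the default 6
last3  <- tail(data, 3L)   # last 3 rows
print(first3)
print(last3)

# Descending order by height, using a negative sign on the target vector
by_height_desc <- data[order(-data$height), ]
print(head(by_height_desc, 3L))
```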
R Data Visualization
In R, we can create visually appealing data visualizations by writing a few lines of code, using
R's diverse graphics functions. Data visualization is an efficient technique for
working with large datasets and obtaining
key insights from them.
Standard Graphics
R standard graphics are available through the graphics package, which includes several functions that
provide statistical plots, such as:
o Scatterplots
Scatterplots show many points plotted in the Cartesian plane. Each point represents the values
of two variables. One variable is chosen in the horizontal axis and another in the vertical axis.
The simple scatterplot is created using the plot() function.
Syntax
The basic syntax for creating scatterplot in R is −
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
Following is the description of the parameters used −
o x is the data set whose values are the horizontal coordinates.
o y is the data set whose values are the vertical coordinates.
o main is the title of the graph.
o xlab is the label in the horizontal axis.
o ylab is the label in the vertical axis.
o xlim is the limits of the values of x used for plotting.
o ylim is the limits of the values of y used for plotting.
o axes indicates whether both axes should be drawn on the plot.
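# Assumption: the data comes from the built-in mtcars dataset (columns wt and mpg).
input <- mtcars[, c("wt", "mpg")]
# Plot the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = input$wt,y = input$mpg,
xlab = "Weight",
ylab = "Mileage",
xlim = c(2.5,5),
ylim = c(15,30),
main = "Weight vs Mileage"
)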
Pie charts are created with the help of the pie() function, which takes a vector of positive
numbers as input. Additional parameters are used to control labels, colors, titles, etc.
The basic syntax is pie(x, labels, radius, main, col, clockwise), where
1. x is a vector that contains the numeric values used in the pie chart.
2. labels gives the description of the slices.
3. radius describes the radius of the pie chart.
4. main describes the title of the chart.
5. col defines the color palette.
6. clockwise is a logical value that indicates whether the slices are drawn in the clockwise or
anti-clockwise direction.
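A minimal sketch of pie() using the parameters listed above. The slice values and labels are hypothetical, chosen only for illustration.

```r
# Hypothetical slice values and labels
x      <- c(10, 20, 30, 40)
labels <- c("A", "B", "C", "D")

# Add the percentage share of each slice to its label
pct <- round(100 * x / sum(x))
slice_labels <- paste0(labels, " ", pct, "%")

pie(x,
    labels = slice_labels,
    radius = 1,
    main = "Pie chart sketch",
    col = rainbow(length(x)),
    clockwise = TRUE)
```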
By default, barplot() draws a vertical bar graph with black borders and gray filling. The
bars are drawn in the same order as the data frame entries.
To add a title, axis labels, and color to the bar graph, we use the following
arguments.
barplot(height = IB$Users,
main = "2018 Internet Browser Users (in million)",
xlab = "Internet Browser",
ylab = "Users",
names.arg = IB$Browser,
border = "dark blue",
col = "pink")
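The IB data frame used above is not defined in these notes. As a runnable alternative, the sketch below applies the same arguments to the built-in mtcars dataset, counting cars by number of cylinders; it is a stand-in for the browser data, not a reconstruction of it.

```r
# Count the cars in mtcars for each number of cylinders
counts <- table(mtcars$cyl)

barplot(height = counts,
        main = "Cars by number of cylinders (mtcars)",
        xlab = "Cylinders",
        ylab = "Number of cars",
        names.arg = names(counts),
        border = "darkblue",
        col = "pink")
```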