1 - Introduction To Programming With R

This document provides an introduction to data types and structures in R, including vectors, matrices, and lists. It discusses the six basic data types in R: character, numeric, integer, logical, complex, and raw. Vectors are the most basic data structure and can be atomic (containing only one data type) or mixed. Matrices are atomic vectors with dimensions arranged in a two-dimensional layout. Lists allow elements of different types and can contain other lists. The document also demonstrates how to create, examine, subset and perform operations on various data structures in R.

Uploaded by

paseg78960

Introduction to Programming with R

Data Types and Structures


Understanding Basic Data Types and Data Structures in R
Data structures are very important to understand because these are the objects you will
manipulate on a day-to-day basis in R. Dealing with object conversions is one of the
most common sources of frustration for beginners.
Everything in R is an object.
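R has 6 basic data types.
• character: letters, digits, punctuation marks, special characters, e.g. "a", "R 4.0"
• numeric (real or decimal): e.g. 2, 15.5
• integer: e.g. 2L (the L suffix tells R to store the value as an integer)
• logical: TRUE, FALSE (the result of comparisons such as <, >, ==, !=)
• complex: x + yi, e.g. 1+4i
• raw: raw bytes (rarely needed)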
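As a quick check of the six types above, the sketch below passes one example value of each type to typeof(). The example values are arbitrary choices for illustration. Note that typeof() reports numeric values as "double".

```r
# One example value per basic type; typeof() reports how R stores each value.
types <- c(
  character = typeof("birla"),
  numeric   = typeof(15.5),
  integer   = typeof(2L),
  logical   = typeof(TRUE),
  complex   = typeof(1+4i),
  raw       = typeof(charToRaw("A"))
)
print(types)
```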
Elements of these data types may be combined to form data structures.
R has many data structures. These include,
• Atomic vector, Vector
• List
• Matrix
• Data frame
• Factors
Atomic Vector
A vector that holds data of only a single data type is called an atomic vector.
Examples of atomic vectors are character vectors, numeric vectors, integer vectors, etc.
• character: all alphabets, digits, special characters…etc
• numeric: integers, floats
• logical: TRUE, FALSE
• complex: 2 + 3i (complex numbers with real and imaginary parts)
R provides many functions to examine features of vectors and other objects.
Function       Returns
class()        class (high-level type) of the object
typeof()       internal storage type of the object
length()       number of elements in the object
attributes()   metadata attached to the object, if any
# Example
x <- "birla"
typeof(x)
[1] "character"
attributes(x)
NULL
y <- 1:10
y
[1] 1 2 3 4 5 6 7 8 9 10
typeof(y)
[1] "integer"
length(y)
[1] 10
z <- as.numeric(y)
z
[1] 1 2 3 4 5 6 7 8 9 10
typeof(z)
[1] "double"
Vectors
A vector is the most common and basic data structure in R. It is a collection of
elements, most commonly of mode character, logical, integer or numeric.
You can create an empty vector with vector(). (By default, the mode is logical.)
It is more common to use direct constructors such as character(), numeric(), etc.
vector() # an empty 'logical' (the default) vector
logical(0)
vector("character", length = 5) # a vector of mode 'character' with 5 elements
[1] "" "" "" "" ""
character(5) # the same thing, but using the constructor directly
[1] "" "" "" "" ""
numeric(5) # a numeric vector with 5 elements
[1] 0 0 0 0 0
logical(5) # a logical vector with 5 elements
[1] FALSE FALSE FALSE FALSE FALSE
You can also create vectors by directly specifying their content.
R automatically guesses the appropriate mode of storage for the vector. For instance:
x <- c(1, 2, 3)
will create a vector x of mode numeric.
These are the most common kind, and are treated as double precision real numbers. If
you want to explicitly create integers, you need to add an L suffix to each element
(or coerce to the integer type using as.integer()).
x1 <- c(1L, 2L, 3L)
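The difference between the two storage modes can be checked with typeof(). This is a minimal sketch; the variable x2 is introduced here only for illustration.

```r
x  <- c(1, 2, 3)       # plain numbers are stored as double
x1 <- c(1L, 2L, 3L)    # the L suffix requests integer storage
x2 <- as.integer(x)    # explicit coercion from double to integer

typeof(x)          # "double"
typeof(x1)         # "integer"
identical(x1, x2)  # TRUE: both are integer vectors with the same values
is.numeric(x1)     # TRUE: integers still count as numeric
```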
Using TRUE and FALSE will create a vector of mode logical:
y <- c(TRUE, TRUE, FALSE, FALSE)
While using quoted text will create a vector of mode character:
z <- c("Sarah", "Tracy", "Jon")
Examining Vectors
The functions typeof(), length(), class() and str() provide useful information about your
vectors and R objects in general.
typeof(z)
[1] "character"
length(z)
[1] 3
class(z)
[1] "character"
str(z)
chr [1:3] "Sarah" "Tracy" "Jon"
Adding Elements
The function c() (for combine) can also be used to add elements to a vector.
z <- c(z, "Annette")
z
[1] "Sarah" "Tracy" "Jon" "Annette"
z <- c("Greg", z)
z
[1] "Greg" "Sarah" "Tracy" "Jon" "Annette"
Vectors from a Sequence of Numbers
You can create vectors as a sequence of numbers.
series <- 1:10
seq(10)
[1] 1 2 3 4 5 6 7 8 9 10
seq(from = 1, to = 10, by = 0.1)
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4
[16] 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9
[31] 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4
[46] 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9
[61] 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3 8.4
[76] 8.5 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9
[91] 10.0
Missing Data
R supports missing data in vectors. They are represented as NA (Not Available) and can
be used for all the vector types covered in this lesson:
x <- c(0.5, NA, 0.7)
x <- c(TRUE, FALSE, NA)
x <- c("a", NA, "c", "d", "e")
x <- c(1+5i, 2-3i, NA)
The function is.na() indicates the elements of the vectors that represent missing data,
and the function anyNA() returns TRUE if the vector contains any missing values:
x <- c("a", NA, "c", "d", NA)
y <- c("a", "b", "c", "d", "e")
is.na(x)
[1] FALSE TRUE FALSE FALSE TRUE
is.na(y)
[1] FALSE FALSE FALSE FALSE FALSE
anyNA(x)
[1] TRUE
anyNA(y)
[1] FALSE
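Missing values propagate through most computations, so it often helps to count or drop them first. The sketch below assumes a small numeric vector and uses the na.rm argument of mean(); the variable names are illustrative.

```r
x <- c(0.5, NA, 0.7)

n_missing  <- sum(is.na(x))           # number of missing values: 1
mean_all   <- mean(x)                 # NA, because one element is missing
mean_obs   <- mean(x, na.rm = TRUE)   # 0.6, computed from the observed values only
x_complete <- x[!is.na(x)]            # keep only the non-missing elements
```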
Other Special Values
Inf is infinity. You can have either positive or negative infinity.
1/0
[1] Inf
NaN means Not a Number. It’s an undefined value.
0/0
[1] NaN
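The special values can be told apart with is.finite(), is.nan() and is.na(). A short sketch follows; note that is.na() is TRUE for NaN as well, while is.nan() is FALSE for NA.

```r
neg_inf <- -1/0                  # negative infinity
neg_inf                          # -Inf

finite_flags <- is.finite(c(1, Inf, NaN, NA))  # TRUE FALSE FALSE FALSE
nan_flags    <- is.nan(c(1, NaN, NA))          # FALSE TRUE FALSE
na_flags     <- is.na(c(1, NaN, NA))           # FALSE TRUE TRUE
```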
Vectors with Mixed Types
R will create a resulting vector with a mode that can most easily accommodate all the
elements it contains. This conversion between modes of storage is called “coercion”.
When R converts the mode of storage based on its content, it is referred to as “implicit
coercion”.
Exercise: guess the type of each result (without running the code first).
xx <- c(1.7, "a") # character
xx <- c(TRUE, 2) # numeric (TRUE is coerced to 1)
xx <- c("a", TRUE) # character
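The guesses can be verified with class(). In this sketch the three vectors are given separate names (xx1, xx2, xx3, chosen here for illustration) so that all results stay available.

```r
xx1 <- c(1.7, "a")    # the number is converted to the string "1.7"
xx2 <- c(TRUE, 2)     # TRUE is converted to the number 1
xx3 <- c("a", TRUE)   # TRUE is converted to the string "TRUE"

class(xx1)  # "character"
class(xx2)  # "numeric"
class(xx3)  # "character"
```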
You can also control how vectors are coerced explicitly using the
as.<class_name>() functions:
as.numeric("1")
[1] 1
as.character(1:2)
[1] "1" "2"
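Explicit coercion does not always succeed. As a sketch, values that cannot be converted become NA (as.numeric() also issues a warning, suppressed here), and as.integer() drops the fractional part rather than rounding.

```r
# "three" cannot be read as a number, so it becomes NA (with a warning).
nums  <- suppressWarnings(as.numeric(c("1", "2", "three")))  # 1 2 NA

# Only a few spellings such as "TRUE" or "false" are recognised as logical.
flags <- as.logical(c("TRUE", "false", "yes"))               # TRUE FALSE NA

# Conversion to integer truncates toward zero.
ints  <- as.integer(c(1.9, -1.9))                            # 1 -1
```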
Object Attributes
Objects can have attributes, which are metadata stored as part of the object. These include:
• names
• dimnames
• dim
• class
• attributes (any other metadata)
Other properties that can be queried:
1. length (works on vectors and lists), or
2. number of characters, using nchar() (for character strings).
length(1:10)
[1] 10
nchar("Software Carpentry")
[1] 18
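Attributes can be inspected with attributes() and set with functions such as names(), dim() or attr(). The following is a minimal sketch; the "unit" attribute is a made-up example of custom metadata.

```r
v <- c(a = 1, b = 2, c = 3)   # a named numeric vector
names(v)                      # "a" "b" "c"
attributes(v)                 # a list holding the names attribute

m <- 1:6
dim(m) <- c(2, 3)             # adding a dim attribute turns the vector into a matrix
attributes(m)$dim             # 2 3

attr(v, "unit") <- "cm"       # hypothetical custom attribute
attr(v, "unit")               # "cm"
```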
R – Matrix
In R, matrices are an extension of the numeric or character vectors. They are not a
separate type of object but simply an atomic vector with
• dimensions, where elements are arranged in a two-dimensional rectangular
layout
• rows and columns
• elements of the same data type.
Matrices containing numeric elements are the ones used in mathematical calculations.
A Matrix is created using the matrix() function. The basic syntax for creating a matrix
in R is –
matrix(nrow, ncol)
matrix(data, nrow, ncol, byrow, dimnames)
m <- matrix(nrow = 2, ncol = 2)
m
[,1] [,2]
[1,] NA NA
[2,] NA NA
dim(m)
[1] 2 2
# class() and typeof() to find type and class attributes of Matrices
m <- matrix(c(1:3))
class(m)
[1] "matrix" "array"
typeof(m)
[1] "integer"
While class() shows that m is a matrix, typeof() shows that fundamentally the matrix is
an integer vector.
# Data types of matrix elements
Consider the following matrix:
M1 <- matrix(c(4, 4, 4, 4), nrow = 2, ncol = 2)
# Matrices in R are filled column-wise.
m <- matrix(1:6, nrow = 2, ncol = 3)
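To see what column-wise filling means, the sketch below prints the matrix and compares it with the same data filled by row. The name m_byrow is introduced only for this comparison.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # filled down the columns
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

m_byrow <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled along the rows
m_byrow[1, ]   # 1 2 3
```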
# Other ways to construct a matrix
m <- 1:10
dim(m) <- c(2, 5)
# This takes a vector and transforms it into a matrix with 2 rows and 5 columns.
# rbind() and cbind()
# Another way is to bind rows or columns using rbind() and cbind() (“row bind”
# and “column bind”, respectively).
x <- 1:3
y <- 10:12
cbind(x, y)
x y
[1,] 1 10
[2,] 2 11
[3,] 3 12
rbind(x, y)
[,1] [,2] [,3]
x 1 2 3
y 10 11 12
# use byrow argument to specify how the matrix is filled.
mdat <- matrix(c(1, 2, 3, 11, 12, 13),
nrow = 2,
ncol = 3,
byrow = TRUE)
mdat
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 11 12 13
# Elements of a matrix can be referenced by specifying the index along each
# dimension (e.g. “row” and “column”) in single square brackets.
mdat[2, 3]
[1] 13
#Create a matrix taking a vector of numbers as input.
# Elements are arranged sequentially by row.
M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)

# Elements are arranged sequentially by column.


N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)
# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
print(P)
# Accessing Elements of a Matrix
Elements of a matrix can be accessed by using the column and row index of the element.
We consider the matrix P above to find the specific elements below.
# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")
# Create the matrix.
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
# Access the element at 3rd column and 1st row.
print(P[1,3])
# Access the element at 2nd column and 4th row.
print(P[4,2])
# Access only the 2nd row.
print(P[2,])
# Access only the 3rd column.
print(P[,3])
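Because P has row and column names, its elements can also be accessed by name. The sketch below rebuilds P so that it runs on its own; drop = FALSE is a standard argument of [ that keeps the result as a matrix instead of simplifying it to a vector.

```r
P <- matrix(3:14, nrow = 4, byrow = TRUE,
            dimnames = list(c("row1", "row2", "row3", "row4"),
                            c("col1", "col2", "col3")))

P["row1", "col3"]                   # 5, same element as P[1, 3]
P["row2", ]                         # named vector: 6 7 8
col3 <- P[, "col3", drop = FALSE]   # keep the 4 x 1 matrix shape
dim(col3)                           # 4 1
```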
# Matrix Operations: addition and subtraction
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
# Add the matrices.
result <- matrix1 + matrix2
cat("Result of addition","\n")
print(result)
# Subtract the matrices
result <- matrix1 - matrix2
cat("Result of subtraction","\n")
print(result)
When we execute the above code, it produces the following result −
[,1] [,2] [,3]
[1,] 3 -1 2
[2,] 9 4 6
[,1] [,2] [,3]
[1,] 5 0 3
[2,] 2 9 4
Result of addition
[,1] [,2] [,3]
[1,] 8 -1 5
[2,] 11 13 10
Result of subtraction
[,1] [,2] [,3]
[1,] -2 -1 -1
[2,] 7 -5 2
# Matrix Multiplication & Division
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
# Multiply the matrices element by element (use %*% for the matrix product).
result <- matrix1 * matrix2
cat("Result of multiplication","\n")
print(result)
# Divide the matrices
result <- matrix1 / matrix2
cat("Result of division","\n")
print(result)
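The * and / operators work element by element. For the algebraic matrix product R provides the %*% operator, which requires conformable dimensions. The sketch below contrasts the two using the same matrices; transposing matrix2 with t() is a choice made here only so that the dimensions match.

```r
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)

prod_elem <- matrix1 * matrix2        # element-wise product, still 2 x 3
quot      <- matrix1 / matrix2        # element-wise division; -1/0 gives -Inf
prod_mat  <- matrix1 %*% t(matrix2)   # matrix product: (2 x 3) %*% (3 x 2) = 2 x 2

prod_elem[1, ]   # 15 0 6
quot[1, 2]       # -Inf
prod_mat
#      [,1] [,2]
# [1,]   21    5
# [2,]   63   78
```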
Working with Lists in R
In R, lists
• act as containers whose contents are not restricted to a single mode and can hold
mixed types of data,
• are sometimes called generic vectors, because the elements of a list can be any
type of R object,
• can contain other lists as elements, which makes them fundamentally different from
atomic vectors.
Features of Lists
1. Lists can be extremely useful inside functions. Because a function in R returns only
a single object, you can combine different kinds of results into a single list that the
function returns.
2. A list does not print to the console like a vector. Instead, each element of the list
starts on a new line.
3. Elements are indexed by double brackets. Single brackets still return (another) list.
4. If the elements of a list are named, they can be referenced with the $ notation
(e.g. xlist$a, xlist$data).
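As an illustration of these features, the sketch below defines a small function that returns several results bundled in one list. The function name describe and its element names are hypothetical, chosen only for this example.

```r
# Hypothetical helper: returns three different kinds of results in one list.
describe <- function(v) {
  list(n = length(v), mean = mean(v), range = range(v))
}

res <- describe(c(2, 4, 9))
res$mean          # 5, accessed with the $ notation
res[["range"]]    # 2 9, double brackets return the element itself
class(res["n"])   # "list", single brackets return another list
```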
# Create lists using list() or coerce other objects using as.list().
An empty list of the required length can be created using vector(). The content of
elements of a list can be retrieved by using double square brackets, e.g. x[[k]] gives the
kth element of the list.
x <- list(1, "a", TRUE, 1+4i)
x
[[1]]
[1] 1
[[2]]
[1] "a"
[[3]]
[1] TRUE
[[4]]
[1] 1+4i
x <- vector("list", length = 5) # empty list
length(x)
[1] 5
To coerce the vectors to lists:
x <- 1:10
x <- as.list(x)
length(x)
[1] 10
Questions
1. What is the class of x[1]?
2. What is the class of x[[1]]?
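The two questions can be checked directly, as in this short sketch: single brackets return a list of length one, while double brackets return the element stored inside it.

```r
x <- as.list(1:10)

class(x[1])     # "list": a sub-list containing the first element
class(x[[1]])   # "integer": the first element itself
```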
# Elements of a list can be named (i.e. lists can have the names attribute)
xlist <- list(a = "Karthik Ram", b = 1:10, data = head(mtcars))
xlist
$a
[1] "Karthik Ram"
$b
[1] 1 2 3 4 5 6 7 8 9 10
$data
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
names(xlist)
[1] "a" "b" "data"
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Show the list.
list_data
# Access the first element of the list.
list_data[1]
# Access the third element. As it is also a list, all its elements will be printed.
list_data[3]
# Access the list element using the name of the element.
list_data$A_Matrix
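Accessors can be chained to reach values inside an element. The sketch below rebuilds list_data so that it runs on its own. Names containing spaces need double brackets with a quoted name, or backticks after $.

```r
list_data <- list(c("Jan", "Feb", "Mar"), matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

list_data[["1st Quarter"]][2]       # "Feb": second value of the first element
list_data$A_Matrix[2, 3]            # 8: row 2, column 3 of the matrix
list_data[["A Inner list"]][[1]]    # "green": first element of the inner list
list_data$`A Inner list`[[2]]       # 12.3: backticks allow $ with such names
```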
# Manipulating List Elements
In a list
• we can add and delete elements; the example below does both at the end of the list,
• and we can update any element of the list.
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])
# Remove the last element.
list_data[4] <- NULL
# Print the 4th Element.
print(list_data[4])
# Update the 3rd Element.
list_data[3] <- "updated element"
print(list_data[3])
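Elements do not have to be added or removed at the end. As a sketch under that assumption, the example below deletes a middle element by assigning NULL and inserts a new one at a chosen position with append(). The list lst and its names are illustrative.

```r
lst <- list(a = 1, b = "two", c = TRUE)

lst[["b"]] <- NULL                           # remove the middle element
names(lst)                                   # "a" "c"

lst[["d"]] <- 4                              # add a new named element at the end
lst <- append(lst, list(z = 0), after = 1)   # insert after the first element
names(lst)                                   # "a" "z" "c" "d"
```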
# Merging Lists
You can merge many lists into one list by passing all of them to a single c() call.
Wrapping them in list() instead would produce a nested list.
# Create two lists.
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")
# Merge the two lists.
merged.list <- c(list1,list2)
# Print the merged list.
print(merged.list)
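The difference between c() and list() shows up in the length of the result. A minimal sketch, where the name nested.list is introduced for comparison:

```r
list1 <- list(1, 2, 3)
list2 <- list("Sun", "Mon", "Tue")

merged.list <- c(list1, list2)      # one flat list with six elements
nested.list <- list(list1, list2)   # a list holding the two original lists

length(merged.list)   # 6
length(nested.list)   # 2
```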
Converting List to Vector
A list can be converted to a vector so that the elements of the vector can be used for
further manipulation. All the arithmetic operations on vectors can be applied after the
list is converted into vectors. To do this conversion, we use the unlist() function. It
takes the list as input and produces a vector.
# Create lists.
list1 <- list(1:5)
print(list1)
list2 <-list(10:14)
print(list2)
# Convert the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)
print(v1)
print(v2)
# Now add the vectors
result <- v1+v2
print(result)
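Since the result of unlist() is an atomic vector, a list with mixed types is coerced to a single type, just as with c(). The sketch below shows this, along with the fact that element names are carried over; the variable names are illustrative.

```r
mixed <- list(1, "a", TRUE)
v_mixed <- unlist(mixed)             # everything becomes character
v_mixed                              # "1" "a" "TRUE"

v_named <- unlist(list(a = 1, b = 2))
names(v_named)                       # "a" "b"
```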
Factors in R
• Factors are the data objects which are used to categorize the data and store it as levels.
• They can store both strings and integers.
• They are useful for columns that have a limited number of unique values, such as
Male/Female or True/False.
• They are useful in data analysis for statistical modeling.
• Factors are created using the factor() function by taking a vector as input.
# Create a vector as input.
data <- c("East","West","East","North","North","East","West","West","West", "East",
"North")
print(data)
# Check whether the vector is already a factor.
print(is.factor(data))
# Apply the factor function.
factor_data <- factor(data)
print(factor_data)
print(is.factor(factor_data))
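Once a factor exists, its levels and their frequencies can be examined. The following sketch uses the same input vector; by default factor() sorts the levels alphabetically, and internally each value is stored as the integer code of its level.

```r
data <- c("East", "West", "East", "North", "North", "East", "West", "West", "West",
          "East", "North")
factor_data <- factor(data)

levels(factor_data)             # "East" "North" "West"
nlevels(factor_data)            # 3
counts <- table(factor_data)    # East 4, North 3, West 4
codes  <- as.integer(factor_data)
codes[1:4]                      # 1 3 1 2
```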
# Working with Data Frames in R
# Factors in Data Frame
Before R 4.0.0, creating a data frame with a column of text data made R treat the text
column as categorical data and create a factor from it. Since R 4.0.0, text columns stay
character by default, so pass stringsAsFactors = TRUE to get a factor.
# Create the vectors for data frame.
height <- c(132,151,162,139,166,147,122)
weight <- c(48,49,66,53,67,52,40)
gender <- c("male","male","female","female","male","female","male")
# Create the data frame, converting text columns to factors.
input_data <- data.frame(height, weight, gender, stringsAsFactors = TRUE)
print(input_data)
# Test if the gender column is a factor.
print(is.factor(input_data$gender))
# Print the gender column to see the levels.
print(input_data$gender)

Changing the Order of Levels


The order of the levels in a factor can be changed by applying the factor function again
with the new order of the levels.
data <- c("East","West","East","North","North","East","West","West","West", "East",
"North")
# Create the factors
factor_data <- factor(data)
print(factor_data)

# Apply the factor function with required order of the level.


new_order_data <- factor(factor_data,levels = c("East","West","North"))
print(new_order_data)
Generating Factor Levels
We can generate factor levels by using the gl() function. It takes two integers as input:
the number of levels and the number of times each level is repeated.
Syntax: gl(n, k, labels)
where
n is an integer giving the number of levels.
k is an integer giving the number of replications.
labels is a vector of labels for the resulting factor levels.
v <- gl(3, 4, labels = c("Tampa", "Seattle","Boston"))
print(v)
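The result of the gl() call above can be checked as follows. This sketch only inspects the factor: it has 3 x 4 = 12 elements, and each label appears in a block of four.

```r
v <- gl(3, 4, labels = c("Tampa", "Seattle", "Boston"))

length(v)                        # 12
level_counts <- table(v)         # 4 of each label
as.character(v[c(1, 5, 9)])      # "Tampa" "Seattle" "Boston"
```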
R – Data Frame
A data frame is a data structure in R used for tabular data that we use for statistics.
A data frame is a table or a two-dimensional array-like structure in which
• each column contains values of one variable and
• each row contains one set of values from each column.
Characteristics of a data frame:
1. The column names should be non-empty.
2. The row names should be unique.
3. The data stored in a data frame can be of numeric, factor or character type.
4. Each column should contain the same number of data items.
A data frame is a special type of list where every element of the list has the same length,
i.e. a data frame is a “tabular / rectangular” list.
# Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE)
# Print the data frame.
print(emp.data)
# Getting the Structure of the Data Frame
The structure of the data frame can be seen by using the str() function.
# Get the structure of the data frame.
str(emp.data)
# Summary of Data in Data Frame
The statistical summary and nature of the data can be obtained by applying the summary()
function.
# Print the summary.
print(summary(emp.data))
# Extracting Data (rows, columns, ...) from a Data Frame
# Extract specific columns from a data frame using the column names.
result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)
# Extract first two rows.
result <- emp.data[1:2,]
print(result)
# Extract 3rd and 5th row with 2nd and 4th column.
result <- emp.data[c(3,5),c(2,4)]
print(result)
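Rows can also be selected with a logical condition instead of fixed positions. The sketch below uses a reduced copy of emp.data (only three of its columns) so that it runs on its own; the threshold of 700 is an arbitrary example value.

```r
emp.data <- data.frame(
  emp_id   = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary   = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE)

# Keep the rows where salary exceeds 700, and only two of the columns.
high <- emp.data[emp.data$salary > 700, c("emp_name", "salary")]
high$emp_name              # "Ryan" "Gary"

emp.data[["salary"]][2]    # 515.2: list-style access to a column, then an element
```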
# Expanding Data Frame
• A data frame can be expanded by adding columns and rows.
• Add Column: Just add the column vector using a new column name.
# Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)
# Add Row
To add more rows permanently to an existing data frame, we need to bring in the new
rows in the same structure as the existing data frame and use the rbind() function.
# Create a data frame with the new rows and merge it with the existing data frame to
# create the final data frame.
# Create the first data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
dept = c("IT","Operations","IT","HR","Finance"),
stringsAsFactors = FALSE
)
# Create the second data frame
emp.newdata <- data.frame(emp_id = c (6:8),
emp_name = c("Rasmi","Pranab","Tusar"),
salary = c(578.0,722.5,632.8),
start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
dept = c("IT","Operations","Finance"),
stringsAsFactors = FALSE)
# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)
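"The same structure" means in particular that the column names must match. The sketch below uses small made-up data frames (a, b and bad are illustrative names) and catches the error that rbind() raises when the names differ.

```r
a   <- data.frame(id = 1:2, dept = c("IT", "HR"), stringsAsFactors = FALSE)
b   <- data.frame(id = 3L, dept = "Finance", stringsAsFactors = FALSE)
bad <- data.frame(id = 4L, department = "IT", stringsAsFactors = FALSE)

ab <- rbind(a, b)   # works: same column names and compatible types
nrow(ab)            # 3

# rbind() stops with an error when column names do not match; record whether it worked.
bind_ok <- tryCatch({ rbind(a, bad); TRUE }, error = function(e) FALSE)
bind_ok             # FALSE
```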
# Data frames can have additional attributes such as rownames(), which can be useful
for annotating data, like subject_id or sample_id. But most of the time they are not used.
Some additional information on data frames:
• Usually created by read.csv() and read.table(), i.e. when importing the data into
R.
• If all columns in a data frame are of the same type, the data frame can be
converted to a matrix with data.matrix() (preferred) or as.matrix(). Otherwise type
coercion will be enforced and the results may not always be what you expect.
• You can also create a new data frame with the data.frame() function.
• Find the number of rows and columns with nrow(dat) and ncol(dat), respectively.
• Rownames are often automatically generated and look like 1, 2, …, n.
Consistency in numbering of rownames may not be honored when rows are
reshuffled or subset.
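The effect of type coercion during conversion to a matrix can be seen in a small sketch. The data frames here are made up for illustration: with a factor column present, as.matrix() falls back to a character matrix, while data.matrix() replaces the factor by its integer codes.

```r
same  <- data.frame(x = 1:3, y = 4:6)
mixed <- data.frame(id = factor(c("a", "b", "c")), x = 1:3)

m_same <- as.matrix(same)      # all columns integer, so an integer matrix
m_chr  <- as.matrix(mixed)     # mixed types, so everything becomes character
m_num  <- data.matrix(mixed)   # factor column replaced by its integer codes

typeof(m_same)   # "integer"
typeof(m_chr)    # "character"
m_num[, "id"]    # codes 1 2 3
```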
# Creating Data Frames by Hand
dat <- data.frame(id = letters[1:10], x = 1:10, y = 11:20)
dat
id x y
1 a 1 11
2 b 2 12
3 c 3 13
4 d 4 14
5 e 5 15
6 f 6 16
7 g 7 17
8 h 8 18
9 i 9 19
10 j 10 20
# Useful Data Frame Functions
• head() - shows first 6 rows
• tail() - shows last 6 rows
• dim() - returns the dimensions of data frame (i.e. number of rows and number of
columns)
• nrow() - number of rows
• ncol() - number of columns
• str() - structure of data frame - name, type and preview of data in each column
• names() or colnames() - both show the names attribute for a data frame
• sapply(dataframe, class) - shows the class of each column in the data frame
is.list(dat)
[1] TRUE
class(dat)
[1] "data.frame"
Note: Because data frames are rectangular, elements of data frame can be referenced by
specifying the row and the column index in single square brackets (similar to matrix).
dat[1, 3]
[1] 11
As data frames are also lists, it is possible to refer to columns (which are elements of
such list) using the list notation, i.e. either double square brackets or a $.
dat[["y"]]
[1] 11 12 13 14 15 16 17 18 19 20
dat$y
[1] 11 12 13 14 15 16 17 18 19 20
The following table summarizes the one-dimensional and two-dimensional data
structures in R in relation to diversity of data types they can contain.
Dimensions   Homogeneous     Heterogeneous
1-D          atomic vector   list
2-D          matrix          data frame

Lists can contain elements that are themselves multi-dimensional (e.g. a list can contain
data frames or other types of objects). Lists can also contain elements of any length,
therefore lists do not necessarily have to be “rectangular”. However, in order for a list
to qualify as a data frame, the length of each element has to be the same.
Column Types in Data Frames
Knowing that data frames are lists, can columns be of different type? – Yes
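This can be confirmed by building a data frame whose columns have different types and asking for the type of each column. The sketch uses made-up column names; stringsAsFactors = FALSE is set explicitly so that the text column stays character in every R version.

```r
df <- data.frame(n = 1:3,
                 s = c("a", "b", "c"),
                 l = c(TRUE, FALSE, NA),
                 d = c(0.5, 1.5, 2.5),
                 stringsAsFactors = FALSE)

col_types <- sapply(df, typeof)   # apply typeof() to every column
col_types
#           n           s           l           d
#   "integer" "character"   "logical"    "double"
```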
