This document provides an introduction to data types and structures in R, including vectors, matrices, lists, factors and data frames. It discusses the six basic data types in R: character, numeric, integer, logical, complex, and raw. Vectors are the most basic data structure; an atomic vector holds elements of a single data type, and combining different types in one vector coerces them to a common type. Matrices are atomic vectors with dimensions, arranged in a two-dimensional layout. Lists allow elements of different types and can contain other lists. The document also demonstrates how to create, examine, subset and perform operations on various data structures in R.
Understanding Basic Data Types and Data Structures in R

Data structures are important to understand because they are the objects you will manipulate on a day-to-day basis in R. Dealing with object conversions is one of the most common sources of frustration for beginners. Everything in R is an object.

R has 6 basic data types.
• character: letters, digits, punctuation marks and special characters, written in quotes, e.g. "a", "swc"
• numeric (real or decimal): e.g. 2, 15.5
• integer: e.g. 2L (the L suffix tells R to store the value as an integer)
• logical: TRUE, FALSE (typically the result of comparisons such as <, >, == and !=)
• complex: x + iy, e.g. 1+4i
• Other – raw

Elements of these data types may be combined to form data structures. R has many data structures. These include:
• Atomic vector
• List
• Matrix
• Data frame
• Factor

Atomic Vector
A vector that holds data of only a single data type is called an atomic vector. Examples of atomic vectors are character vectors, numeric vectors, integer vectors, etc.
• character: letters, digits, special characters, etc.
• numeric: integers and floating-point (double) numbers
• logical: TRUE, FALSE
• complex: 2 + 3i (complex numbers with real and imaginary parts)

R provides many functions to examine features of vectors and other objects.

Function       Gives
class()        Class (high-level type) of the object
typeof()       Data type (internal storage type) of the object
length()       Number of elements in the object
attributes()   Metadata held by the object

# Example
x <- "birla"
typeof(x)
[1] "character"
attributes(x)
NULL
y <- 1:10
y
 [1]  1  2  3  4  5  6  7  8  9 10
typeof(y)
[1] "integer"
length(y)
[1] 10
z <- as.numeric(y)
z
 [1]  1  2  3  4  5  6  7  8  9 10
typeof(z)
[1] "double"

Vectors
A vector is the most common and basic data structure in R. A vector is a collection of elements, most commonly of mode character, logical, integer or numeric. You can create an empty vector with vector(). (By default, the mode is logical.) It is more common to use direct constructors such as character(), numeric(), etc.
vector()  # an empty 'logical' (the default) vector
logical(0)
vector("character", length = 5)  # a vector of mode 'character' with 5 elements
[1] "" "" "" "" ""
character(5)  # the same thing, but using the constructor directly
[1] "" "" "" "" ""
numeric(5)  # a numeric vector with 5 elements
[1] 0 0 0 0 0
logical(5)  # a logical vector with 5 elements
[1] FALSE FALSE FALSE FALSE FALSE

You can also create vectors by directly specifying their content. R automatically guesses the appropriate mode of storage for the vector. For instance:
x <- c(1, 2, 3)
will create a vector x of mode numeric. These are the most common kind, and are treated as double precision real numbers. If you want to explicitly create integers, you need to add an L to each element (or coerce to the integer type using as.integer()).
x1 <- c(1L, 2L, 3L)
Using TRUE and FALSE will create a vector of mode logical:
y <- c(TRUE, TRUE, FALSE, FALSE)
While using quoted text will create a vector of mode character:
z <- c("Sarah", "Tracy", "Jon")

Examining Vectors
The functions typeof(), length(), class() and str() provide useful information about your vectors and R objects in general.
typeof(z)
[1] "character"
length(z)
[1] 3
class(z)
[1] "character"
str(z)
 chr [1:3] "Sarah" "Tracy" "Jon"

Adding Elements
The function c() (for combine) can also be used to add elements to a vector.
z <- c(z, "Annette")
z
[1] "Sarah"   "Tracy"   "Jon"     "Annette"
z <- c("Greg", z)
z
[1] "Greg"    "Sarah"   "Tracy"   "Jon"     "Annette"

Vectors from a Sequence of Numbers
You can create vectors as a sequence of numbers.
series <- 1:10
seq(10)
 [1]  1  2  3  4  5  6  7  8  9 10
seq(from = 1, to = 10, by = 0.1)
 [1]  1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9  2.0  2.1  2.2  2.3  2.4
[16]  2.5  2.6  2.7  2.8  2.9  3.0  3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9
[31]  4.0  4.1  4.2  4.3  4.4  4.5  4.6  4.7  4.8  4.9  5.0  5.1  5.2  5.3  5.4
[46]  5.5  5.6  5.7  5.8  5.9  6.0  6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9
[61]  7.0  7.1  7.2  7.3  7.4  7.5  7.6  7.7  7.8  7.9  8.0  8.1  8.2  8.3  8.4
[76]  8.5  8.6  8.7  8.8  8.9  9.0  9.1  9.2  9.3  9.4  9.5  9.6  9.7  9.8  9.9
[91] 10.0

Missing Data
R supports missing data in vectors. Missing values are represented as NA (Not Available) and can be used for all the vector types covered in this lesson:
x <- c(0.5, NA, 0.7)
x <- c(TRUE, FALSE, NA)
x <- c("a", NA, "c", "d", "e")
x <- c(1+5i, 2-3i, NA)
The function is.na() indicates the elements of a vector that represent missing data, and the function anyNA() returns TRUE if the vector contains any missing values:
x <- c("a", NA, "c", "d", NA)
y <- c("a", "b", "c", "d", "e")
is.na(x)
[1] FALSE  TRUE FALSE FALSE  TRUE
is.na(y)
[1] FALSE FALSE FALSE FALSE FALSE
anyNA(x)
[1] TRUE
anyNA(y)
[1] FALSE

Other Special Values
Inf is infinity. You can have either positive or negative infinity.
1/0
[1] Inf
NaN means Not a Number. It’s an undefined value.
0/0
[1] NaN

Vectors with Mixed Types
R will create a resulting vector with a mode that can most easily accommodate all the elements it contains. This conversion between modes of storage is called “coercion”. When R converts the mode of storage based on its content, it is referred to as “implicit coercion”.
Ex: Guess what the following do (without running them first).
xx <- c(1.7, "a")   – character (1.7 becomes "1.7")
xx <- c(TRUE, 2)    – numeric (TRUE becomes 1)
xx <- c("a", TRUE)  – character (TRUE becomes "TRUE")
You can also control how vectors are coerced explicitly using the as.<class_name>() functions:
as.numeric("1")
[1] 1
as.character(1:2)
[1] "1" "2"

Object Attributes
Objects can have attributes. Attributes are part of the object.
These include:
• names
• dimnames
• dim
• class
• attributes (contain metadata)
Other attribute-like information can be obtained with
1. length() (works on vectors and lists), or
2. nchar(), the number of characters (for character strings).
length(1:10)
[1] 10
nchar("Software Carpentry")
[1] 18

R – Matrix
In R, matrices are an extension of the numeric or character vectors. They are not a separate type of object but simply an atomic vector with
• dimensions, where elements are arranged in a two-dimensional rectangular layout
• rows and columns
• elements of the same data type.
Matrices containing numeric elements are the ones used in mathematical calculations. A matrix is created using the matrix() function. The basic syntax for creating a matrix in R is:
matrix(data, nrow, ncol, byrow, dimnames)
If data is omitted, the matrix is filled with NA:
m <- matrix(nrow = 2, ncol = 2)
m
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
dim(m)
[1] 2 2

# class() and typeof() to find type and class attributes of matrices
m <- matrix(c(1:3))
class(m)
[1] "matrix" "array"
typeof(m)
[1] "integer"
While class() shows that m is a matrix, typeof() shows that fundamentally the matrix is an integer vector.

# Data types of matrix elements
Consider the following matrix:
M1 <- matrix(c(4, 4, 4, 4), nrow = 2, ncol = 2)

# Matrices in R are filled column-wise.
m <- matrix(1:6, nrow = 2, ncol = 3)

# Other ways to construct a matrix
m <- 1:10
dim(m) <- c(2, 5)
# This takes a vector and transforms it into a matrix with 2 rows and 5 columns.

# rbind() and cbind()
# Another way is to bind columns or rows using rbind() and cbind() (“row bind” and “column bind”, respectively).
x <- 1:3
y <- 10:12
cbind(x, y)
     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12
rbind(x, y)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12

# Use the byrow argument to specify how the matrix is filled.
mdat <- matrix(c(1, 2, 3, 11, 12, 13), nrow = 2, ncol = 3, byrow = TRUE)
mdat
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]   11   12   13

# Elements of a matrix can be referenced by specifying the index along each dimension (e.g. “row” and “column”) in single square brackets.
mdat[2, 3]
[1] 13

# Create a matrix taking a vector of numbers as input.
# Elements are arranged sequentially by row.
M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)
# Elements are arranged sequentially by column.
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)

# Define the column and row names.
rownames <- c("row1", "row2", "row3", "row4")
colnames <- c("col1", "col2", "col3")
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
print(P)

# Accessing Elements of a Matrix
Elements of a matrix can be accessed by using the row and column index of the element. We use the matrix P defined above to find the specific elements below.
# Access the element at 1st row and 3rd column.
print(P[1, 3])
# Access the element at 4th row and 2nd column.
print(P[4, 2])
# Access only the 2nd row.
print(P[2, ])
# Access only the 3rd column.
print(P[, 3])

# Matrix Operations: addition and subtraction
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
# Add the matrices.
result <- matrix1 + matrix2
cat("Result of addition", "\n")
print(result)
# Subtract the matrices.
result <- matrix1 - matrix2
cat("Result of subtraction", "\n")
print(result)

When we execute the above code, it produces the following result:
     [,1] [,2] [,3]
[1,]    3   -1    2
[2,]    9    4    6
     [,1] [,2] [,3]
[1,]    5    0    3
[2,]    2    9    4
Result of addition
     [,1] [,2] [,3]
[1,]    8   -1    5
[2,]   11   13   10
Result of subtraction
     [,1] [,2] [,3]
[1,]   -2   -1   -1
[2,]    7   -5    2

# Matrix Multiplication & Division
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
# Multiply the matrices (element by element).
result <- matrix1 * matrix2
cat("Result of multiplication", "\n")
print(result)
# Divide the matrices (element by element).
result <- matrix1 / matrix2
cat("Result of division", "\n")
print(result)
Note that * and / work element by element on matrices of the same dimensions; the matrix product in the linear-algebra sense uses the %*% operator instead.

Working with Lists in R
In R, lists
• act as containers: the contents of a list are not restricted to a single mode and can be a mix of data types,
• are sometimes called generic vectors, because the elements of a list can be any type of R object,
• can contain lists as elements, which makes them fundamentally different from atomic vectors.

Features of List
1. Lists can be extremely useful inside functions.
2. Because a function in R returns only a single object, you can “combine” different kinds of results into a single list that the function can return.
3. A list does not print to the console like a vector. Instead, each element of the list starts on a new line.
4. Elements are indexed by double brackets.
5. Single brackets will still return (another) list.
6. If the elements of a list are named, they can be referenced by the $ notation (i.e. xlist$a, …, xlist$data).

# Create lists using list() or coerce other objects using as.list().
An empty list of the required length can be created using vector(). The content of elements of a list can be retrieved by using double square brackets, e.g. x[[k]] gives the kth element of the list.
x <- list(1, "a", TRUE, 1+4i)
x
[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE

[[4]]
[1] 1+4i

x <- vector("list", length = 5)  # empty list
length(x)
[1] 5
To coerce a vector to a list:
x <- 1:10
x <- as.list(x)
length(x)
[1] 10
Questions
1. What is the class of x[1]?
2. What is the class of x[[1]]?

# Elements of a list can be named (i.e. lists can have the names attribute)
xlist <- list(a = "Karthik Ram", b = 1:10, data = head(mtcars))
xlist
$a
[1] "Karthik Ram"

$b
 [1]  1  2  3  4  5  6  7  8  9 10

$data
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

names(xlist)
[1] "a"    "b"    "data"

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan", "Feb", "Mar"), matrix(c(3, 9, 5, 1, -2, 8), nrow = 2), list("green", 12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Show the list.
list_data
# Access the first element of the list.
list_data[1]
# Access the third element. As it is also a list, all its elements will be printed.
list_data[3]
# Access the list element using the name of the element.
list_data$A_Matrix

# Manipulating List Elements
In a list
• elements can be added at the end and removed by assigning NULL, as shown below,
• any element of the list can be updated.
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan", "Feb", "Mar"), matrix(c(3, 9, 5, 1, -2, 8), nrow = 2), list("green", 12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Add an element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])
# Remove the last element.
list_data[4] <- NULL
# Print the 4th element (it no longer exists, so the result is a list holding NULL).
print(list_data[4])
# Update the 3rd element.
list_data[3] <- "updated element"
print(list_data[3])

# Merging Lists
You can merge many lists into one list by passing all of them to a single c() call. (Wrapping them in list() would instead create a list of lists.)
# Create two lists.
list1 <- list(1, 2, 3)
list2 <- list("Sun", "Mon", "Tue")
# Merge the two lists.
merged.list <- c(list1, list2)
# Print the merged list.
print(merged.list)

Converting List to Vector
A list can be converted to a vector so that its elements can be used for further manipulation. All the arithmetic operations on vectors can be applied once the list has been converted. To do this conversion, we use the unlist() function. It takes the list as input and produces a vector.
# Create lists.
list1 <- list(1:5)
print(list1)
list2 <- list(10:14)
print(list2)
# Convert the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)
print(v1)
print(v2)
# Now add the vectors.
result <- v1 + v2
print(result)

Factors in R
• Factors are data objects used to categorize data and store it as levels.
• They can store both strings and integers.
• They are useful for columns that have a limited number of unique values, such as Male/Female or True/False.
• They are useful in data analysis for statistical modeling.
• Factors are created using the factor() function by taking a vector as input.
# Create a vector as input.
data <- c("East", "West", "East", "North", "North", "East", "West", "West", "West", "East", "North")
print(data)
# Check whether it is already a factor (it is not).
print(is.factor(data))
# Apply the factor function.
factor_data <- factor(data)
print(factor_data)
print(is.factor(factor_data))

# Working with Data Frames in R
# Factors in Data Frame
In versions of R before 4.0.0, creating a data frame with a column of text data made R treat the text column as categorical data and turn it into a factor by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, so text columns stay character unless you pass stringsAsFactors = TRUE to data.frame() or call factor() on the column.
# Create the vectors for data frame.
height <- c(132, 151, 162, 139, 166, 147, 122)
weight <- c(48, 49, 66, 53, 67, 52, 40)
gender <- c("male", "male", "female", "female", "male", "female", "male")
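The vectors above can be combined into a data frame to check how the text column is stored. What follows is only a minimal sketch: the name input_data is a hypothetical choice made for illustration, and stringsAsFactors = TRUE is passed explicitly because the default differs between R versions.

```r
# Vectors repeated from the text so that the block runs on its own.
height <- c(132, 151, 162, 139, 166, 147, 122)
weight <- c(48, 49, 66, 53, 67, 52, 40)
gender <- c("male", "male", "female", "female", "male", "female", "male")

# 'input_data' is a hypothetical name used only for this illustration.
# stringsAsFactors = TRUE is set explicitly: it was the default before
# R 4.0.0 and is FALSE by default from R 4.0.0 onwards.
input_data <- data.frame(height, weight, gender, stringsAsFactors = TRUE)

# The text column is now stored as a factor with two levels.
print(is.factor(input_data$gender))
print(levels(input_data$gender))

# Number of rows per level.
print(table(input_data$gender))
```

With stringsAsFactors = FALSE the same call would leave gender as a character column, and factor(input_data$gender) could be applied afterwards where a categorical column is needed.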
The order of the levels in a factor can be changed by applying the factor function again with the new order of the levels.
data <- c("East", "West", "East", "North", "North", "East", "West", "West", "West", "East", "North")
# Create the factors
factor_data <- factor(data)
print(factor_data)
# Apply the factor function with required order of the level.
new_order_data <- factor(factor_data, levels = c("East", "West", "North"))
print(new_order_data)

Generating Factor Levels
We can generate factor levels by using the gl() function. It takes two integers as input: the number of levels and the number of times each level is repeated.
Syntax: gl(n, k, labels)
where
n is an integer giving the number of levels,
k is an integer giving the number of replications,
labels is a vector of labels for the resulting factor levels.
v <- gl(3, 4, labels = c("Tampa", "Seattle", "Boston"))
print(v)

R – Data Frame
A data frame is a data structure in R used for tabular data of the kind used in statistics. A data frame is a table or a two-dimensional array-like structure in which
• each column contains values of one variable and
• each row contains one set of values from each column.
Characteristics of a data frame:
1. The column names should be non-empty.
2. The row names should be unique.
3. The data stored in a data frame can be of numeric, factor or character type.
4. Each column should contain the same number of data items.
A data frame is a special type of list where every element of the list has the same length, i.e. a data frame is a “tabular / rectangular” list.
# Create the data frame.
emp.data <- data.frame(
  emp_id = c(1:5),
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)

# Getting the Structure of the Data Frame
The structure of the data frame can be seen by using the str() function.
# Get the structure of the data frame.
str(emp.data)

# Summary of Data in Data Frame
The statistical summary and nature of the data can be obtained by applying the summary() function.
# Print the summary.
print(summary(emp.data))

# Extracting Data (rows, columns, ...) from Data Frame
# Specific columns can be extracted from a data frame using the column names.
# Extract specific columns.
result <- data.frame(emp.data$emp_name, emp.data$salary)
print(result)
# Extract first two rows.
result <- emp.data[1:2, ]
print(result)
# Extract 3rd and 5th row with 2nd and 4th column.
result <- emp.data[c(3, 5), c(2, 4)]
print(result)

# Expanding Data Frame
• A data frame can be expanded by adding columns and rows.
• Add Column: Just add the column vector using a new column name.
# Add the "dept" column.
emp.data$dept <- c("IT", "Operations", "IT", "HR", "Finance")
v <- emp.data
print(v)

# Add Row
To add more rows permanently to an existing data frame, we need to bring in the new rows in the same structure as the existing data frame and use the rbind() function.
# Create a data frame with the new rows and merge it with the existing data frame to create the final data frame.
# Create the first data frame.
emp.data <- data.frame(
  emp_id = c(1:5),
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
  dept = c("IT", "Operations", "IT", "HR", "Finance"),
  stringsAsFactors = FALSE
)
# Create the second data frame.
emp.newdata <- data.frame(
  emp_id = c(6:8),
  emp_name = c("Rasmi", "Pranab", "Tusar"),
  salary = c(578.0, 722.5, 632.8),
  start_date = as.Date(c("2013-05-21", "2013-07-30", "2014-06-17")),
  dept = c("IT", "Operations", "Finance"),
  stringsAsFactors = FALSE
)
# Bind the two data frames.
emp.finaldata <- rbind(emp.data, emp.newdata)
print(emp.finaldata)

# Data frames can have additional attributes such as rownames(), which can be useful for annotating data, like subject_id or sample_id. But most of the time they are not used.

Some additional information on data frames:
• Usually created by read.csv() and read.table(), i.e. when importing the data into R.
• Assuming all columns in a data frame are of the same type, a data frame can be converted to a matrix with data.matrix() (preferred) or as.matrix().
Otherwise type coercion will be enforced and the results may not always be what you expect.
• A new data frame can also be created with the data.frame() function.
• Find the number of rows and columns with nrow(dat) and ncol(dat), respectively.
• Rownames are often automatically generated and look like 1, 2, …, n. Consistency in the numbering of rownames may not be preserved when rows are reshuffled or subset.

# Creating Data Frames by Hand
dat <- data.frame(id = letters[1:10], x = 1:10, y = 11:20)
dat
   id  x  y
1   a  1 11
2   b  2 12
3   c  3 13
4   d  4 14
5   e  5 15
6   f  6 16
7   g  7 17
8   h  8 18
9   i  9 19
10  j 10 20

# Useful Data Frame Functions
• head() - shows first 6 rows
• tail() - shows last 6 rows
• dim() - returns the dimensions of the data frame (i.e. number of rows and number of columns)
• nrow() - number of rows
• ncol() - number of columns
• str() - structure of the data frame: name, type and preview of data in each column
• names() or colnames() - both show the names attribute for a data frame
• sapply(dataframe, class) - shows the class of each column in the data frame
is.list(dat)
[1] TRUE
class(dat)
[1] "data.frame"
Note: Because data frames are rectangular, elements of a data frame can be referenced by specifying the row and the column index in single square brackets (similar to a matrix).
dat[1, 3]
[1] 11
As data frames are also lists, it is possible to refer to columns (which are elements of such a list) using the list notation, i.e. either double square brackets or a $.
dat[["y"]]
 [1] 11 12 13 14 15 16 17 18 19 20
dat$y
 [1] 11 12 13 14 15 16 17 18 19 20

The following table summarizes the one-dimensional and two-dimensional data structures in R in relation to the diversity of data types they can contain.

Dimensions   Homogeneous     Heterogeneous
1-D          atomic vector   list
2-D          matrix          data frame
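
The four cells of the table can be checked directly in R. The following is a minimal sketch; the object names v, m, l and d are hypothetical and chosen only for this illustration.

```r
# Homogeneous structures: every element shares one type.
v <- c(1.5, 2, 3)                            # 1-D atomic vector
m <- matrix(1:6, nrow = 2)                   # 2-D matrix

# Heterogeneous structures: elements (or columns) may differ in type.
l <- list(1L, "a", TRUE)                     # 1-D list
d <- data.frame(id = c("a", "b"), x = 1:2)   # 2-D data frame

# Mixing types in an atomic vector forces coercion to a common type ...
print(typeof(c(1.5, "a")))
# ... whereas a list keeps each element's own type.
print(sapply(l, typeof))

# Vectors and matrices are atomic; lists and data frames are lists.
print(c(is.atomic(v), is.atomic(m)))
print(c(is.list(l), is.list(d)))

# Only the 2-D structures carry a dim attribute.
print(dim(m))
print(dim(d))
print(dim(v))
```

The last call prints NULL, since a plain vector has no dimensions even though it has a length.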
Lists can contain elements that are themselves multi-dimensional (e.g. a list can contain data frames or other types of objects). Lists can also contain elements of any length, so a list does not necessarily have to be “rectangular”. However, for a list to qualify as a data frame, the length of each element has to be the same.

Column Types in Data Frames
Knowing that data frames are lists, can columns be of different types? Yes.
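
Both points can be seen in a short example: columns of different types sit side by side, while columns of incompatible lengths are rejected. This is a minimal sketch; the data frame df and its column names are hypothetical, made up for this illustration.

```r
# A small data frame whose columns deliberately differ in type.
# stringsAsFactors = FALSE keeps the text column as character in every R version.
df <- data.frame(
  id     = letters[1:3],
  count  = 1:3,
  score  = c(0.5, 1.5, 2.5),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Class of each column: one entry per column, each of a different class.
print(sapply(df, class))

# Column lengths must match: 3 and 2 cannot be reconciled, so data.frame() fails.
# try() captures the error instead of stopping the script.
res <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
print(inherits(res, "try-error"))
```

A plain list(a = 1:3, b = 1:2) with the same contents would be accepted, which is the difference between a list and a data frame described above.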