EM622 Data Analysis and Visualization Techniques For Decision-Making

This document provides an introduction to data analysis and visualization techniques in R. It covers importing and manipulating data, basic operations in R like installing packages and exporting data, and different data structures like vectors, arrays, and data frames. The document contains code examples for importing data from files, the web, and other sources. It also demonstrates accessing and subsetting elements within different data structures.

Uploaded by

Ridhi B

EM622 Data Analysis and Visualization

Techniques for Decision-Making

Introduction to R and Data Manipulation

1 / 47
Getting Started
[Figure: RStudio window with its panes labeled: Options (Import dataset), File Viewer (Data & Code), Console (for typing commands), and Plots]

2 / 47
Your first graph
Copy and paste:
data(iris)
plot(Sepal.Width ~ Sepal.Length, data=iris,
col=c("red","orange","blue")[iris$Species],pch=16,
xlab="Sepal Length", ylab="Sepal Width")
legend("topright", legend=levels(iris$Species),
col=c("red","orange","blue"), bty="n",pch=16)

3 / 47
Agenda

1. Basic operations
2. Data structures
3. Data Manipulation
4. Your First Graph

4 / 47
Basic Operation - Import data
1. Import data from drop down menu in R Studio:

2. Import data from SAS, SPSS, etc.: http://www.statmethods.net/input/importingdata.html
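The same import can also be done from the console. The following is a minimal sketch using base R's read.csv(); the file contents and the names csv_path and scores are hypothetical and only serve to make the example self-contained.

```r
# Hypothetical example data written to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("team,first,second", "A,92,70", "B,89,73"), csv_path)

# read.csv() reads a comma-separated file into a data frame
scores <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# Inspect the column types of the imported data
str(scores)
```

For a real file, replace csv_path with the path to the file, given either as an absolute path or relative to the working directory.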

5 / 47
Intermediate - Import data

## install.packages(c("tseries","lubridate"))
library(tseries)
library(lubridate)
amazon <- as.data.frame(get.hist.quote("amzn",
start="2013-1-1", end="2018-9-15", quote=c("Cl")))

## time series starts 2013-01-02

## time series ends 2018-09-14

amazon$Date<-ymd(row.names(amazon))
tail(amazon)

## Close Date
## 2018-09-07 1952.07 2018-09-07
## 2018-09-10 1939.01 2018-09-10
## 2018-09-11 1987.15 2018-09-11
## 2018-09-12 1990.00 2018-09-12
## 2018-09-13 1989.87 2018-09-13
## 2018-09-14 1970.19 2018-09-14

6 / 47
Advanced - Import data
# list of addresses for raw data.
addressList <- list(
  drives_address = "http://stats.nba.com/js/data/sportvu/drivesData.js",
  defense_address = "http://stats.nba.com/js/data/sportvu/defenseData.js",
  catchshoot_address = "http://stats.nba.com/js/data/sportvu/catchShootData.js")

# function that grabs the data from the website and converts it to an R data frame
readIt <- function(address) {
  web_page <- readLines(address)

  # regex to strip javascript bits and convert raw to csv format
  x1 <- gsub("[\\{\\}\\]]", "", web_page, perl = TRUE)
  x2 <- gsub("[\\[]", "\n", x1, perl = TRUE)
  x3 <- gsub("\"rowSet\":\n", "", x2, perl = TRUE)
  x4 <- gsub(";", ",", x3, perl = TRUE)

  # read the resulting csv with read.table()
  nba <- read.table(textConnection(x4), header = TRUE,
                    sep = ",", skip = 2, stringsAsFactors = FALSE)
  return(nba)
}

# download the data
df_list <- lapply(addressList, readIt)

7 / 47
Advanced (Cont.) - Import data

# check the data

catchshoot<-df_list$catchshoot_address
#str(catchshoot) # Get information about structure
head(catchshoot)

## PLAYER_ID PLAYER FIRST_NAME LAST_NAME TEAM_ABBREVIATION GP MIN

## 1 202691 Klay Thompson Klay Thompson GSW 78 34.0
## 2 1717 Dirk Nowitzki Dirk Nowitzki DAL 53 26.3
## 3 2594 Kyle Korver Kyle Korver CLE 35 24.6
## 4 201586 Serge Ibaka Serge Ibaka TOR 23 30.9
## 5 201567 Kevin Love Kevin Love CLE 60 31.4
## 6 202331 Paul George Paul George IND 74 35.8
## PTS FGM FGA FG_PCT FG3M FG3A FG3_PCT EFG_PCT PTS_TOT X
## 1 11.5 4.2 9.3 0.454 3.1 7.1 0.438 0.621 899 NA
## 2 8.1 3.4 7.5 0.446 1.3 3.5 0.388 0.535 427 NA
## 3 7.6 2.7 5.7 0.470 2.2 4.7 0.470 0.662 265 NA
## 4 7.5 2.9 6.9 0.424 1.7 4.3 0.394 0.547 173 NA
## 5 7.5 2.6 6.6 0.388 2.3 5.8 0.395 0.561 448 NA
## 6 7.4 2.7 6.1 0.437 2.0 4.8 0.420 0.603 546 NA

8 / 47
Advanced: scraping the web using R

#install.packages("rvest")
library(rvest)
# Store web url
lego_movie <- read_html("http://www.imdb.com/title/tt1490017/")
#Scrape the website for the movie rating
rating <- lego_movie %>%
html_nodes("strong span") %>%
html_text() %>%
as.numeric()
#rating
# Scrape the website for the cast
cast <- lego_movie %>%
html_nodes("#titleCast .itemprop span") %>%
html_text()
#cast

https://stat4701.github.io/edav/2015/04/02/rvest_tutorial/

9 / 47
Advanced (Cont.): scraping the web using R

#Scrape the website for the movie rating

rating

## [1] 7.8

# Scrape the website for the cast

cast

## character(0)

https://stat4701.github.io/edav/2015/04/02/rvest_tutorial/

10 / 47
Basic Operation - Export data

- The easiest way to export a data frame to a spreadsheet is to use
write.csv().
- By default, write.csv() includes row names, but these are usually
unnecessary and may cause confusion.
- The exported file is stored in the working directory.
# export 'mydf' as a .csv file:
write.csv(mydf,"test.csv")

- How do you find your working directory?

# returns an absolute filepath representing the current working directory o
getwd()
## [1] "/Users/annieyu/Dropbox/622 visualization/lectures/Lecture 3_intro_t

- To write data in other file formats, see:

http://www.cookbook-r.com/Data_input_and_output/Writing_data_to_a_file/
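Since row names are usually unnecessary, they can be left out with the row.names argument of write.csv(). The sketch below uses a small hypothetical data frame in place of mydf and writes to a temporary directory rather than the working directory.

```r
# Hypothetical data frame standing in for 'mydf'
mydf <- data.frame(team = c("A", "B"), first = c(92, 89))

# row.names = FALSE drops the 1, 2, ... row labels from the file
out_path <- file.path(tempdir(), "test.csv")
write.csv(mydf, out_path, row.names = FALSE)

# Read the file back to confirm that only the two data columns were written
check <- read.csv(out_path)
names(check)
```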

11 / 47
Basic Operation - Install packages
Two ways to install a package:
1. From drop down menu in R Studio:

2. Using command:
# Download and install packages from CRAN-like repositories or from local f
install.packages(c("ggplot2","tidyr","dplyr"))
# Always load a package before calling its functions:
library(ggplot2)
12 / 47
Basic Operation - Update packages
1. To update all your installed packages to the latest versions available:

update.packages()

2. To store your R code, always create an R script:

3. Export your images to pdf/png format:
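Images can also be exported from code instead of the RStudio menu. This is a minimal sketch using the base graphics devices png() and pdf(); the file names are hypothetical, and the files go to a temporary directory.

```r
# Hypothetical output paths in a temporary directory
png_path <- file.path(tempdir(), "iris_plot.png")
pdf_path <- file.path(tempdir(), "iris_plot.pdf")

# Open a PNG device, draw the plot, then close the device to write the file
png(png_path, width = 800, height = 600)   # size in pixels
plot(Sepal.Width ~ Sepal.Length, data = iris, pch = 16)
dev.off()

# The same pattern works for PDF (width and height are in inches)
pdf(pdf_path, width = 7, height = 5)
plot(Sepal.Width ~ Sepal.Length, data = iris, pch = 16)
dev.off()
```

The file is only complete once dev.off() has been called, so every png() or pdf() call needs a matching dev.off().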

13 / 47
Getting Started
R programming style

- R is case sensitive: a and A are two different objects.

- The assignment symbol is <-. Alternatively, the classical = symbol
can be used.
- The symbol # starts a comment that runs to the end of the line:

# This is a comment
# The following two statements are equivalent;
# both assign the value 1 to object a:
a <- 1
a = 1

14 / 47
Data Structure
1. Vector
2. Matrix
3. Array
4. Data Frame
5. List

http://venus.ifca.unican.es/Rintro/dataStruct.html

15 / 47
Data Structure - Variable
Like most other languages, R lets you assign values to variables and refer
to them by name:
x <- 1
# x gets 1
y <- 2
# c(...): a generic function which combines values into a vector
z <- c(x,y)
# evaluate z to see what's stored as z
z

## [1] 1 2

Notice that the substitution is done at the time that the value is assigned
to z, not the time that z is evaluated:
y <- 5
z

## [1] 1 2

16 / 47
Data Structure - Vector
Fetch element(s) by location in a vector:

a <- c(1,2,3,4,5,6,7,8)
a

## [1] 1 2 3 4 5 6 7 8

# fetch the 5th item in vector a:

a[5]

## [1] 5

# fetch item 1 through 6:

a[1:6]

## [1] 1 2 3 4 5 6

# fetch item 1, 3, 7:
a[c(1,3,7)]

## [1] 1 3 7
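Besides positions, a vector can be indexed with negative numbers or with a logical condition. The following sketch reuses the vector a from above; the result names are only for illustration.

```r
a <- c(1, 2, 3, 4, 5, 6, 7, 8)

# A negative index drops elements: everything except the 1st item
without_first <- a[-1]

# A logical condition keeps only the elements for which it is TRUE
large <- a[a > 5]

# Conditions can be combined: even numbers greater than 2
even_large <- a[a %% 2 == 0 & a > 2]

without_first
large
even_large
```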

17 / 47
Data Structure - Array
- In R, you can construct more complicated data structures than just
vectors.
- An array object is just a vector that’s associated with a dimension
attribute.

# Define an array
a <- array(c(1, 2, 3, 4, 5, 6, 7, 8), dim=c(2, 4))
a

##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8

# fetch one cell in array a:

a[2,3]

## [1] 6

# fetch 1st row only

a[1,]

## [1] 1 3 5 7
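To illustrate that an array is a vector plus a dimension attribute, the sketch below sets dim on a plain vector and then builds a three-dimensional array. The names v and b are only for illustration.

```r
# Start with a plain vector and attach a dimension attribute
v <- 1:8
dim(v) <- c(2, 4)      # v is now a 2 x 4 matrix, filled column by column
is.matrix(v)
v[2, 3]

# Arrays can have more than two dimensions
b <- array(1:24, dim = c(2, 3, 4))
dim(b)
b[2, 3, 4]             # the last element of the underlying vector
```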

18 / 47
Data Structure - Data frame
- A data frame is a list that contains multiple named vectors of
the same length.
- Like a spreadsheet or a database table, it is particularly good for
representing experimental data.
# data.frame() is a function that creates data frames
team <-c("A","B","C","D","E")
first <- c(92, 89, 94, 72, 59)
second <- c(70, 73, 77, 90, 102)
mydf <- data.frame(team, first, second)
mydf

## team first second

## 1 A 92 70
## 2 B 89 73
## 3 C 94 77
## 4 D 72 90
## 5 E 59 102

# refer to the components of a data frame by name:

mydf$team

## [1] A B C D E
## Levels: A B C D E
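The Levels line in the output above shows that team was stored as a factor, which was the default for strings before R 4.0.0; since R 4.0.0, data.frame() keeps strings as character vectors unless stringsAsFactors = TRUE is set. The sketch below makes the choice explicit and shows three equivalent ways to extract a column; the names df_chr and df_fct are hypothetical.

```r
team <- c("A", "B", "C", "D", "E")
first <- c(92, 89, 94, 72, 59)

# Make the choice explicit so the result does not depend on the R version
df_chr <- data.frame(team, first, stringsAsFactors = FALSE)
df_fct <- data.frame(team, first, stringsAsFactors = TRUE)

class(df_chr$team)
class(df_fct$team)

# Three equivalent ways to pull out one column as a vector
df_chr$first
df_chr[["first"]]
df_chr[, "first"]
```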
19 / 47
Data Structure - List
- R has a built-in data type for mixing objects of different types, called
lists.

# Use the list() function to construct R lists.

# Example: a list containing two character vectors and a data frame
e <- list(thing=c("hat","shoes"), size=c("8.25","5"), myData=mydf)
e

## $thing
## [1] "hat" "shoes"
##
## $size
## [1] "8.25" "5"
##
## $myData
## team first second
## 1 A 92 70
## 2 B 89 73
## 3 C 94 77
## 4 D 72 90
## 5 E 59 102

20 / 47
Data Structure - List Cont

# fetch the 1st item in the list:

e$thing

## [1] "hat" "shoes"

e[1]

## $thing
## [1] "hat" "shoes"

# fetch the 1st row in the data frame

# which is the third component in the list:
e$myData[1,]

## team first second

## 1 A 92 70
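The two outputs for e$thing and e[1] differ because single brackets return a sub-list, while $ and double brackets return the component itself. A minimal sketch, using a shortened version of the list e without the data frame:

```r
e <- list(thing = c("hat", "shoes"), size = c("8.25", "5"))

# Single brackets return a shorter list that still wraps the component
class(e[1])

# Double brackets (or $) return the component itself
class(e[[1]])
identical(e[[1]], e$thing)

# The extracted component can then be indexed like any vector
second_thing <- e[[1]][2]
second_thing
```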

21 / 47
Data Structure - Get Info about structure
# Some sample variables for the examples:
n <- 1:4
let <- LETTERS[1:4]
let

## [1] "A" "B" "C" "D"

df <- data.frame(n, let)
df

## n let
## 1 1 A
## 2 2 B
## 3 3 C
## 4 4 D

# Get information about structure

str(df)

## 'data.frame': 4 obs. of 2 variables:

## $ n : int 1 2 3 4
## $ let: Factor w/ 4 levels "A","B","C","D": 1 2 3 4

22 / 47
Data Structure - Get Info about structure

# Get the length of a vector

length(n)

## [1] 4

# Number of rows
nrow(df)

## [1] 4

# Number of columns
ncol(df)

## [1] 2

# Get num of rows and columns

dim(df)

## [1] 4 2

23 / 47
Data Exploration [1]
“Happy families are all alike; every unhappy family is unhappy in its own
way.” Leo Tolstoy

“Tidy datasets are all alike, but every messy dataset is messy in its own
way.” Hadley Wickham

[1] Hadley Wickham. http://r4ds.had.co.nz/tidy-data.html

24 / 47
Working with NA and NaN
R has some special values:
- NA: Not Available (i.e., missing values)

- NaN: Not a Number

- Inf: Infinity

- -Inf: Minus Infinity

# For instance:
0/0

## [1] NaN

1/0

## [1] Inf

# Here's how to test whether a variable has one of these values:

y <- NA
# Is y NA?
is.na(y)

## [1] TRUE
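The other special values have their own test functions in base R. Note that is.na() is TRUE for NaN as well as NA. The vector x below is a hypothetical example.

```r
x <- c(1, NA, NaN, Inf, -Inf)

na_flags       <- is.na(x)        # TRUE for both NA and NaN
nan_flags      <- is.nan(x)       # TRUE only for NaN
infinite_flags <- is.infinite(x)  # TRUE for Inf and -Inf
finite_flags   <- is.finite(x)    # TRUE only for ordinary numbers

# Count the missing values in a vector
n_missing <- sum(is.na(x))
n_missing
```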

25 / 47
Working with NA and NaN
Ignoring "bad" values in vector summary functions:
- If you run functions like mean() or sum() on a vector containing NA or
NaN, they return NA or NaN (a "bad" value).
- Many of these functions take the flag na.rm, which tells them to
ignore these values:
df1 <- c(1, 2, 3, NA, 5)
mean(df1)

## [1] NA

mean(df1, na.rm=TRUE)

## [1] 2.75

df2 <- c(1, 2, 3, NaN, 5)

sum(df2)

## [1] NaN

sum(df2, na.rm=TRUE)

## [1] 11
26 / 47
Example: Import Data
library(readr)
HW <- read_csv("dataSets/Student_List_HW.csv")
HW<-as.data.frame(HW)
summary(HW)

## Last_Name First_Name Status

## Length:20 Length:20 Length:20
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Home Homework_1 Homework_2 Homework_3
## Length:20 Min. :58.00 Min. :77.00 Min. : 80.00
## Class :character 1st Qu.:70.50 1st Qu.:80.00 1st Qu.: 85.50
## Mode :character Median :74.50 Median :88.00 Median : 90.50
## Mean :77.39 Mean :87.35 Mean : 90.90
## 3rd Qu.:84.25 3rd Qu.:93.00 3rd Qu.: 98.25
## Max. :99.00 Max. :99.00 Max. :100.00
## NA's :2

27 / 47
Example: Replace Missing Values
HW$Homework_1[is.na(HW$Homework_1)]<-0
HW$Home[which(HW$Last_Name=="Garcia")]<-"NJ"
HW$Home[is.na(HW$Home)]<-"Unknown"
HW<-HW[complete.cases(HW),]
summary(HW)

## Last_Name First_Name Status

## Length:18 Length:18 Length:18
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Home Homework_1 Homework_2 Homework_3
## Length:18 Min. : 0.00 Min. :77.00 Min. : 80.00
## Class :character 1st Qu.:66.75 1st Qu.:80.00 1st Qu.: 86.25
## Mode :character Median :74.50 Median :86.00 Median : 90.50
## Mean :70.28 Mean :86.39 Mean : 91.33
## 3rd Qu.:84.25 3rd Qu.:91.75 3rd Qu.: 98.75
## Max. :99.00 Max. :98.00 Max. :100.00
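Because the course data file is not available here, the same pattern is sketched below on a small hypothetical data frame (scores is not the course data set): missing values are replaced column by column, and complete.cases() then drops any row that still contains an NA.

```r
# Hypothetical scores with missing entries
scores <- data.frame(
  name = c("Ann", "Bob", "Cy", "Di"),
  home = c("NJ", NA, "PA", "NY"),
  hw1  = c(90, NA, 75, 80),
  hw2  = c(85, 70, NA, 95),
  stringsAsFactors = FALSE
)

# Replace missing values column by column
scores$hw1[is.na(scores$hw1)] <- 0
scores$home[is.na(scores$home)] <- "Unknown"

# Rows that still contain an NA (here: Cy, whose hw2 is missing) are dropped
clean <- scores[complete.cases(scores), ]
nrow(clean)
```

Replacing a missing score with 0 changes the mean and minimum of that column, as the Homework_1 summary above shows, so it is a modeling decision rather than a neutral cleanup step.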

28 / 47
Subset Observations (Rows) [2]

[2] https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
29 / 47
Subset Observations (Rows) Cont.

#load dplyr
library(dplyr)
Subset_HW_1 <- filter(HW,Status == "Master")
head(Subset_HW_1)

## Last_Name First_Name Status Home Homework_1 Homework_2 Homework_3

## 1 Brown Susan Master NJ 74 88 98
## 2 Wilson Karen Master NJ 0 93 84
## 3 Moore Nancy Master PA 74 91 89
## 4 Taylor Betty Master GA 93 92 88
## 5 Anderson Anthony Master CA 96 98 100
## 6 Thomas Donald Master NJ 82 77 96

30 / 47
Subset Variables (Columns)

There are many options to choose columns
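A few of those options, sketched with dplyr's select() and its helper functions. The data frame hw is a hypothetical stand-in with some of the same column names as HW.

```r
library(dplyr)

# Hypothetical data frame with some of the same column names as HW
hw <- data.frame(
  Last_Name = c("Smith", "Brown"), First_Name = c("Patricia", "Susan"),
  Status = c("Undergraduate", "Master"),
  Homework_1 = c(82, 74), Homework_2 = c(97, 88),
  stringsAsFactors = FALSE
)

by_name   <- select(hw, Last_Name, Homework_1)     # explicit column names
by_range  <- select(hw, Homework_1:Homework_2)     # a range of adjacent columns
by_prefix <- select(hw, starts_with("Homework"))   # names starting with a prefix
by_suffix <- select(hw, ends_with("Name"))         # names ending with a suffix
dropped   <- select(hw, -Status)                   # everything except Status

names(by_suffix)
```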

31 / 47
Subset Variables (Columns) Cont.

Subset_HW_2 <- select(HW,contains("Name"),contains("Homework"))

head(Subset_HW_2)

## Last_Name First_Name Homework_1 Homework_2 Homework_3

## 1 Smith Patricia 82 97 82
## 2 Johnson Jennifer 0 77 99
## 3 Williams Robert 99 80 80
## 4 Jones Michael 75 82 86
## 5 Brown Susan 74 88 98
## 7 Miller Richard 85 78 82

32 / 47
Subset Observations (Rows) and Variables (Columns)

Subset_HW_3 <- subset(HW,Status == "Master" ,

select=c("Last_Name","First_Name",
"Homework_1","Homework_2","Homework_3"))
head(Subset_HW_3)

## Last_Name First_Name Homework_1 Homework_2 Homework_3

## 5 Brown Susan 74 88 98
## 8 Wilson Karen 0 93 84
## 9 Moore Nancy 74 91 89
## 10 Taylor Betty 93 92 88
## 11 Anderson Anthony 96 98 100
## 12 Thomas Donald 82 77 96

33 / 47
Pipe Operator

Piping makes code more readable and lets us chain several actions, such as
sorting, filtering, or creating a variable, in a single statement.
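To see what the pipe does, compare a nested call with its piped equivalent. This is a small sketch on a hypothetical numeric vector; %>% passes the result on its left as the first argument of the call on its right.

```r
library(dplyr)   # loading dplyr also makes the %>% operator available

x <- c(4, 9, 16)

# Nested calls are read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped calls are read from left to right, one step at a time
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)
```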

34 / 47
Pipe Operator Cont.

HW %>%
filter(Status == "Master") %>%
select(contains("Name"),contains("Homework"))%>%
arrange(desc(Homework_1))%>%
head()

## Last_Name First_Name Homework_1 Homework_2 Homework_3

## 1 Anderson Anthony 96 98 100
## 2 Taylor Betty 93 92 88
## 3 Garcia Linda 93 91 100
## 4 Thomas Donald 82 77 96
## 5 Brown Susan 74 88 98
## 6 Moore Nancy 74 91 89

35 / 47
Create New Columns and Re-order
The mutate() function will add new columns to the data frame.
Arrange or re-order rows using arrange().

HW_update<-HW %>%
filter(Status != "Unknown") %>%
mutate(Homework_Average = 0.2*Homework_1+0.3*Homework_2+0.5*Homework_3)%>%
arrange(desc(Homework_Average))
head(HW_update)

## Last_Name First_Name Status Home Homework_1 Homework_2

## 1 Anderson Anthony Master CA 96 98
## 2 Garcia Linda Master NJ 93 91
## 3 Wang Thomas PhD CHINA 72 98
## 4 Martin Morgan Undergraduate NJ 72 88
## 5 Brown Susan Master NJ 74 88
## 6 Taylor Betty Master GA 93 92
## Homework_3 Homework_Average
## 1 100 98.6
## 2 100 95.9
## 3 95 91.3
## 4 99 90.3
## 5 98 90.2
## 6 88 90.2

36 / 47
Split-Apply-Combine
Idea: split up a big problem into manageable pieces, apply a function to
each piece and then combine all the pieces together.

[Figure: Split, Apply, Combine. The input table (X, Y) with rows A 2, A 4, B 0, B 5, C 5, C 10 is split by X into three groups; the average of Y is computed for each group (A 3, B 2.5, C 7.5); the results are combined into one table.]
37 / 47
Group Data
Implement group operations in the “split-apply-combine” concept:

38 / 47
Group Data

Group_Summarise_HW<- HW %>%
filter(Status != "Unknown") %>%
mutate(Homework_Average = 0.2*Homework_1+0.3*Homework_2+0.5*Homework_3)%>%
group_by(Status) %>%
summarise(Homework_Average=mean(Homework_Average),
Number_of_Student=length(Status))%>%
arrange(desc(Homework_Average))
head(Group_Summarise_HW)

## # A tibble: 3 x 3
## Status Homework_Average Number_of_Student
## <chr> <dbl> <int>
## 1 Master 87.4 8
## 2 PhD 86.4 2
## 3 Undergraduate 83.7 8

39 / 47
Reshape Data [3]
Let's change the layout of a data set. Our tools from the tidyr library are:

[3] https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
40 / 47
Reshape Data Cont.

- gather() makes "wide" data longer

- unite() combines two variables into one variable

#load tidyr
library(tidyr)
tidyr_HW<- HW %>% unite(Name, First_Name, Last_Name, sep = " ")%>%
select(-c(Status,Home)) %>%
gather(Homework, Score, Homework_1:Homework_3)
head(tidyr_HW)

## Name Homework Score

## 1 Patricia Smith Homework_1 82
## 2 Jennifer Johnson Homework_1 0
## 3 Robert Williams Homework_1 99
## 4 Michael Jones Homework_1 75
## 5 Susan Brown Homework_1 74
## 6 Richard Miller Homework_1 85

41 / 47
Merge Data
Exam<- read_csv("dataSets/Student_List_Exam.csv")
Exam<-as.data.frame(Exam)
head(Exam,3)

## Last_Name First_Name Exam Project

## 1 Smith Patricia 77 65
## 2 Johnson Jennifer 100 96
## 3 Williams Robert 92 53

HW_update<-mutate(HW,Homework_Average =
0.2*Homework_1+0.3*Homework_2+0.5*Homework_3)
Merged_df<-inner_join(HW_update, Exam,by=c("Last_Name","First_Name"))
head(Merged_df,3)

## Last_Name First_Name Status Home Homework_1 Homework_2 Homework_3

## 1 Smith Patricia Undergraduate MD 82 97 82
## 2 Johnson Jennifer Undergraduate NY 0 77 99
## 3 Williams Robert Undergraduate NY 99 80 80
## Homework_Average Exam Project
## 1 86.5 77 65
## 2 72.6 100 96
## 3 83.8 92 53
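An inner join keeps only the rows whose key appears in both tables. The sketch below contrasts it with a left and a full join using base R's merge() on two hypothetical tables; dplyr's left_join() and full_join() are the counterparts of inner_join().

```r
# Hypothetical tables: Lee has no exam record, Diaz has no homework record
hw   <- data.frame(Last_Name = c("Smith", "Jones", "Lee"),
                   Homework_Average = c(86.5, 81.2, 90.0))
exam <- data.frame(Last_Name = c("Smith", "Jones", "Diaz"),
                   Exam = c(77, 88, 95))

# Inner join: only keys present in both tables
inner <- merge(hw, exam, by = "Last_Name")

# Left join: keep every row of hw; unmatched exam values become NA
left <- merge(hw, exam, by = "Last_Name", all.x = TRUE)

# Full join: keep every row of both tables
full <- merge(hw, exam, by = "Last_Name", all = TRUE)

c(inner = nrow(inner), left = nrow(left), full = nrow(full))
```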

42 / 47
ggplot2

- ggplot2 is an R package designed for creating high quality plots.

- ggplot2 is based on the layered grammar of graphics, which means
that plots can be constructed layer by layer.

#you need to install the package just once

install.packages('ggplot2')

43 / 47
Composition of plots in ggplot2
Plots have two main components: 1) data to use and 2) type of plot.

ggplot(data=economics) + geom_point(aes(x=date, y=unemploy))

- ggplot(): basic function for plotting
- data=economics: specify the dataset (data to use)
- geom_point(): we want points (type of plot)
- aes(): aesthetics
- x=date: specify what goes on the X axis
- y=unemploy: specify what goes on the Y axis
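Because plots are built layer by layer, further layers can be added to the same call with +. The sketch below draws a line layer and a point layer on the economics data set that ships with ggplot2; the colour, point size, and axis labels are illustrative choices.

```r
library(ggplot2)

# Aesthetics set in ggplot() are shared by all layers
p <- ggplot(data = economics, aes(x = date, y = unemploy)) +
  geom_line(colour = "grey40") +                   # layer 1: a line
  geom_point(size = 0.5) +                         # layer 2: points on top
  labs(x = "Date", y = "Unemployed (thousands)")   # axis labels

p   # printing the object draws the plot
```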

44 / 47
Our first official graph
library(ggplot2)
ggplot(data=iris)+
geom_point(aes(x=Sepal.Width,y=Sepal.Length,colour=Species))

[Figure: scatter plot of Sepal.Length (y axis) against Sepal.Width (x axis), with points colored by Species: setosa, versicolor, virginica]

45 / 47
Resources

1. Rob Kabacoff, “R in Action”: https://www.amazon.com/Action-Data-Analysis-Graphics/dp/1617291382/ref=pd_sbs_14_t_0?_encoding=UTF8&psc=1&refRID=EEBN1DRHWQ6J09Z6TTBY

2. Michael J Crawley, “The R Book”: http://users.humboldt.edu/ygkim/CrawleyMJ_TheRBook.pdf

3. Joseph Adler, “R in a Nutshell”: http://www.amazon.com/R-Nutshell-Joseph-Adler/dp/144931208X

4. Quick-R tutorial: http://www.statmethods.net/input/datatypes.html

5. Cookbook for R, Data input and output: http://www.cookbook-r.com/Data_input_and_output/Writing_data_to_a_file/

46 / 47
What have we learned?

1. Data structures such as vector, array, list, and data frame.

2. Basic operations such as installing packages and importing/exporting datasets.
3. Common data manipulation operations such as filtering for rows,
selecting specific columns, re-ordering rows, adding new columns,
summarizing data, and performing the "split-apply-combine" task.
4. Drawing a graph.

47 / 47
