Week3 Slides

This document outlines the Week 3 tutorial for the DSA2101 course on Essential Data Analytics Tools, focusing on importing data into R. It covers various file formats, including CSV, Excel, and JSON, and provides instructions on how to read these files into R, along with best practices for managing data and memory. Additionally, it emphasizes the importance of data checks and introduces the readr and readxl packages for efficient data handling.

Uploaded by

Tùng Nguyễn

DSA2101

Essential Data Analytics Tools: Data Visualization

Yuting Huang

AY24/25

Week 3: Importing Data I

1 / 36
The teaching team

Instructor:
▶ Dr. Huang Yuting (yhuang@[Link])
▶ Office: S16 04-01
▶ Office hour: In-person and by appointment

Teaching assistants (TAs): In-person/online and by appointment

▶ Yeo Jaye Lin (e1249197@[Link])
▶ Quek Chui Qing (e1157262@[Link])
▶ Agrawal Naman (naman.a@[Link])
▶ Loo Wen Wen (e0970566@[Link])
▶ Zhang Mingyuan (e0970135@[Link])

2 / 36
Tutorials in Week 3

Tutorials begin this week.

Due to the CNY public holidays, the sessions will be rescheduled and held online.
▶ Your TA will be in touch with you to share the time and meeting link.
▶ All sessions will be recorded and available on Canvas by the end of this week.

3 / 36
Importing data into R

Week 3:
1. CSV files
2. Flat files
3. Excel files

Week 4:
4. R data files
5. JSON files
6. Files from the web
7. APIs

4 / 36
Recap

An important prerequisite to loading data into R is that we are able to point to the location at which the data files are stored.

1. Where am I?
2. Where are my data?

5 / 36
Working directory

The first question addresses the notion of our current working directory.
▶ Typically, it is the location of our current R script.
▶ The function getwd() returns the absolute path of our current working directory.

getwd()
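As a quick illustration of the "Where am I?" question, the sketch below prints the working directory and lists what R can see there. The variable name wd is chosen only for this example.

```r
# Where am I? getwd() returns the absolute path as a character string.
wd <- getwd()
print(wd)

# What is stored here? list.files() shows the files and folders at that location.
print(list.files(wd))
```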

6 / 36
File path

The second question implies that data are not necessarily stored at the location of our current working directory.
▶ Relative path: the address of a file relative to our current working directory.
  ▶ Files in the current working directory can be accessed directly by name.
  ▶ Use two dots .. to denote "one level up in the directory hierarchy".

Use relative paths in all the code you write.

7 / 36
File path (Important!)

We will strictly adhere to the following practice:

▶ Store all course materials in a folder named DSA2101.
▶ Within DSA2101, create a sub-folder named src to store all R
scripts and Rmd files.
▶ Within DSA2101, create another sub-folder called data to store
all data sets.
▶ The src and data folders should be positioned at the same
hierarchical level within DSA2101.
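With this layout, a script in src reaches a data file by going one level up and then into data. A minimal sketch, assuming the working directory is DSA2101/src; the file name example.csv is hypothetical and used only for illustration.

```r
# Assumption: the working directory is DSA2101/src, as in the layout above.
# "example.csv" is a hypothetical file name used only for illustration.
data_path <- file.path("..", "data", "example.csv")
print(data_path)

# Check whether the file is really there before trying to read it.
print(file.exists(data_path))
```

file.path() joins the pieces with a forward slash, so the same relative path works on Windows, macOS, and Linux.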

8 / 36
Memory requirements for R objects

Remember that R stores all its objects in physical memory (RAM).

▶ It is important to be aware of how much memory is being used in your workspace.
▶ This matters especially when we are reading in or creating a new (large) data set in R.

Other programs running on our computer take up RAM, and so do the other R objects that already exist in the workspace.

9 / 36
Memory requirements for R objects

If you do not have enough RAM, your computer (or at least your R session) will freeze up.
▶ Usually an unpleasant experience that requires you to kill the R session (the best scenario), or
▶ ... reboot your computer.

So make sure you understand the memory requirements before reading in or creating large data sets!
Read more about this on Posit.
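One rough way to judge memory requirements before reading a file is a back-of-envelope calculation: a numeric (double) value takes 8 bytes, so rows times columns times 8 gives an approximate size. The sketch below shows this estimate and then measures an existing object with object.size(). The row and column counts are made-up numbers for illustration.

```r
# Back-of-envelope estimate: each numeric (double) value takes 8 bytes.
n_rows <- 1e6   # hypothetical number of rows
n_cols <- 10    # hypothetical number of numeric columns
est_mb <- n_rows * n_cols * 8 / 2^20
cat("Estimated size:", round(est_mb, 1), "MB\n")

# Measure an object that already exists in the workspace.
x <- numeric(1e6)
print(object.size(x), units = "MB")
```

The estimate ignores character columns and R's own overhead, so treat it as a lower bound rather than an exact figure.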

10 / 36
Comma separated values

We first consider the simplest file format: comma-separated values (CSV).

Alice, 98, 92, 94
Brown, 85, 89, 91
Carly, 81, 96, 97

These files are in fact just text files, with
▶ An optional header row, listing the column names.
▶ One observation per row, with the values separated by commas.

11 / 36
What does a CSV file look like?
A .csv file, opened in a text editor.
▶ This is the raw form of the data.

12 / 36
What does a CSV file look like?
Here is the same file opened in Microsoft Excel.
▶ Excel treats the file as a spreadsheet and puts each element in its own cell.

13 / 36
Read a CSV file into R

The base R command to read a CSV file is read.csv()

The main arguments to this function are:
▶ file: The file name.
▶ header: Absence/presence of a header row. The default is TRUE.
▶ col.names: The names to identify columns in the table.
▶ stringsAsFactors: Whether to convert character vectors to factors.
▶ na.strings: Specify strings to be interpreted as NA values.
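To see these arguments working together, the sketch below reads a small CSV supplied as text rather than from a file. The data, the column names, and the use of "-" as a missing-value marker are all made up for illustration.

```r
# A tiny CSV without a header row; "-" and "NA" mark missing scores (made-up data).
csv_text <- "Alice,98,92,NA\nBrown,85,-,91\nCarly,81,96,97"

scores <- read.csv(
  text = csv_text,                 # read from a string instead of a file
  header = FALSE,                  # the first line is data, not column names
  col.names = c("name", "test1", "test2", "test3"),
  stringsAsFactors = FALSE,        # keep names as character, not factor
  na.strings = c("NA", "-")        # treat both markers as missing
)

str(scores)
```

Because "-" is declared as a missing-value marker, test2 is still read as a number column instead of falling back to character.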

14 / 36
Example: Education, Height, and Income

The file [Link] contains information on 1192 individuals.

▶ Contains 6 columns. There's also a column header.
▶ Hence, we read in the data in the following way:

heights <- read.csv("../data/[Link]", header = TRUE)
dim(heights)

## [1] 1192    6

▶ The function dim() (stands for dimensions) tells us that the data frame has 1192 rows and 6 columns.

15 / 36
Data checks

1. What type has each column been read in as?

str(heights)

## 'data.frame': 1192 obs. of 6 variables:

## $ earn : num 50000 60000 30000 50000 51000 9000 29000 32000 2000 2
## $ height: num 74.4 65.5 63.6 63.1 63.4 ...
## $ sex : chr "male" "female" "female" "female" ...
## $ ed : int 16 16 16 16 17 15 12 17 15 12 ...
## $ age : int 45 58 29 91 39 26 49 46 21 26 ...
## $ race : chr "white" "white" "white" "other" ...

▶ The function str() (stands for structure) reveals information about the columns, giving the names of the columns and a peek into the contents of each.

16 / 36
Data checks
2. race is a categorical variable.
What are the different races that have been read in?

heights$race <- factor(heights$race)

levels(heights$race)

## [1] "black" "hispanic" "other" "white"

▶ A contingency table of the counts of each factor level:

table(heights$race)

##
##    black hispanic    other    white
##      112       66       25      989
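The same check can be tried on a small made-up vector. The sketch below converts a character vector to a factor, lists its levels, and tabulates counts and proportions; the values are invented for illustration and are not taken from the heights data.

```r
# Made-up categorical values for illustration.
race <- c("white", "black", "white", "other", "hispanic", "white")

# Convert to a factor; levels are sorted alphabetically by default.
race_f <- factor(race)
print(levels(race_f))

# Counts per level, then the same counts as proportions.
counts <- table(race_f)
print(counts)
print(prop.table(counts))
```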

17 / 36
Data checks

3. Are there any missing values in the data?

sum(is.na(heights))

## [1] 0

▶ Use is.na() to check missing entries in the entire data set.
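When the total is not zero, it helps to know which columns are affected. A minimal sketch on a made-up data frame (the values below are invented so that some entries are missing):

```r
# Made-up data with two missing entries.
df <- data.frame(
  earn = c(50000, NA, 30000),
  sex  = c("male", "female", NA),
  age  = c(45L, 58L, 29L)
)

# Total number of missing entries in the whole data frame.
total_missing <- sum(is.na(df))
print(total_missing)

# Missing entries per column.
missing_by_col <- colSums(is.na(df))
print(missing_by_col)

# Number of rows with no missing entry at all.
complete_rows <- sum(complete.cases(df))
print(complete_rows)
```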

18 / 36
Summary statistics
▶ We can compute summary statistics for earn:

summary(heights$earn)

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## 200 10000 20000 23155 30000 200000

▶ Summary statistics by group with aggregate():

aggregate(earn ~ sex, data = heights, FUN = median)

## sex earn
## 1 female 15000
## 2 male 25000
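The formula earn ~ sex reads as "earn, grouped by sex". The sketch below applies the same call to a small made-up data frame and shows tapply() as an alternative that returns a named vector instead of a data frame. The numbers are invented for illustration.

```r
# Made-up earnings for illustration.
toy <- data.frame(
  earn = c(10000, 20000, 30000, 15000, 25000, 50000),
  sex  = c("female", "female", "female", "male", "male", "male")
)

# Median earnings by group, returned as a data frame.
med <- aggregate(earn ~ sex, data = toy, FUN = median)
print(med)

# The same summary as a named vector.
med_vec <- tapply(toy$earn, toy$sex, median)
print(med_vec)
```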

19 / 36
Histogram

Let us use a histogram to visualize the distribution of income.

▶ A histogram, hist(), divides the range of numeric values into
bins, then counts the number of observations that fall into each
bin.
▶ By default, the height of each bar represents the frequency (count) in that bin.
▶ freq = FALSE rescales the histogram so that the height of each bar represents probability density (that is, the bars have a total area of one).

20 / 36
hist(heights$earn, freq = FALSE, col = "maroon",
main = "Histogram of Earnings", xlab = "Earnings")

[Figure: Histogram of Earnings. x-axis: Earnings, from 0 to 200000; y-axis: Density.]

▶ The distribution of income is right-skewed, as expected.

21 / 36
Histogram (revised code)
Our presentation of the histogram can be improved:

1. The bins correspond to intervals of width 20,000. We would like bins of width 10,000 instead.
2. Transform the x-axis to display earnings in thousands of dollars for better readability.

hist(heights$earn/1000, freq = FALSE, col = "maroon",
     breaks = seq(0, 200, by = 10),
     main = "Histogram of Earnings",
     xlab = "Earnings (in thousands)")

▶ heights$earn/1000 divides earnings by a thousand. The earnings values now range from 0.2 to 200.
▶ breaks = seq(0, 200, by = 10) places the bin boundaries at 0, 10, 20, ..., 200, so the x-axis runs from 0 to 200 in bins of width 10.
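To check what breaks does without drawing anything, hist() can be called with plot = FALSE, which returns the bin boundaries, counts, and densities. A minimal sketch on made-up earnings (the vector below is invented and is not the heights data):

```r
# Made-up earnings in dollars, for illustration only.
earn <- c(200, 9000, 15000, 22000, 29000, 51000, 105000, 200000)

# Same breaks as above, but return the computed bins instead of plotting.
h <- hist(earn / 1000, breaks = seq(0, 200, by = 10), plot = FALSE)

print(h$breaks)   # bin boundaries: 0, 10, ..., 200
print(h$counts)   # number of observations in each bin

# With densities, bar height times bin width sums to one.
total_area <- sum(h$density * diff(h$breaks))
print(total_area)
```

Note that every observation must fall inside the range covered by breaks; otherwise hist() stops with an error.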

22 / 36
Histogram (revised code)

[Figure: Histogram of Earnings with bins of width 10. x-axis: Earnings (in thousands), from 0 to 200; y-axis: Density.]

23 / 36
The income distribution
Who are the high-earning individuals, those who earn more than 100,000 a year?

# install.packages("tidyverse")
library(tidyverse)
filter(heights, earn > 100000)

## earn height sex ed age race

## 1 125000 74.34062 male 18 45 white
## 2 170000 71.01003 male 18 45 white
## 3 175000 70.58955 male 16 48 white
## 4 148000 66.74020 male 18 38 white
## 5 110000 65.96504 male 18 37 white
## 6 105000 74.58005 male 12 49 white
## 7 123000 61.42908 female 14 58 white
## 8 200000 69.66276 male 18 34 white
## 9 110000 66.31203 female 18 48 other

24 / 36
The income distribution

library(tidyverse)
filter(heights, earn > 100000)

The code uses the dplyr syntax.

▶ It is a great tool for data cleaning and manipulation.
▶ We shall learn about it soon.
▶ For now, we only need to understand that it filters out irrelevant rows from the heights data frame, keeping only those who earned more than 100,000 per year.
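For comparison, the same row selection can be written in base R without loading any package. This is an alternative shown for illustration, not the approach used in the slides, and the data frame below is made up.

```r
# Made-up data for illustration.
toy <- data.frame(
  earn = c(50000, 125000, 30000, 110000),
  sex  = c("male", "male", "female", "female")
)

# Logical indexing: keep the rows where the condition is TRUE, and all columns.
high <- toy[toy$earn > 100000, ]
print(high)

# subset() expresses the same condition without repeating the data frame name.
high2 <- subset(toy, earn > 100000)
print(high2)
```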

25 / 36
Recap

Remember that you should inspect your data before and after you
read them in.
▶ Try to think of as many ways in which it could have gone wrong
and check.

As we covered here, you should at least consider the following:

▶ Correct number of rows and columns.
▶ Column variables read in with the correct class type.
▶ Missing values.

26 / 36
Flat file

The readr package is developed to deal with reading in large flat files quickly.
▶ Faster than base R analogues.
▶ The function for CSV files is read_csv().

# install.packages("readr")
library(readr)
heights <- read_csv("../data/[Link]")

▶ We can also use this function to read data directly from a URL
(more on this later).
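read_csv() guesses the column types, and the guess can be replaced with an explicit specification through col_types. The sketch below writes a small made-up CSV to a temporary file and reads it back. The file contents and column names are invented, and the code falls back to read.csv() if readr is not installed.

```r
# Write a tiny made-up CSV to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,test1,test2", "Alice,98,92", "Brown,85,89"), tmp)

if (requireNamespace("readr", quietly = TRUE)) {
  # State the column types explicitly instead of relying on guessing.
  scores <- readr::read_csv(
    tmp,
    col_types = readr::cols(
      name  = readr::col_character(),
      test1 = readr::col_double(),
      test2 = readr::col_double()
    )
  )
} else {
  # Fallback so the example still runs without readr.
  scores <- read.csv(tmp)
}

print(scores)
```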

27 / 36
Other file types

readr provides other functions to read in data:

▶ read_csv2() reads semicolon-separated files.
▶ read_tsv() reads tab-delimited files.
▶ read_delim() reads in files with any delimiter, attempting to
automatically guess the delimiter if you do not specify it.
▶ ...

Useful documentation and cheatsheet on data import.
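As an illustration of read_delim(), the sketch below reads a file that uses a vertical bar as the delimiter. The file contents are made up, the delimiter is stated explicitly rather than guessed, and the code falls back to base R's read.delim() if readr is not installed.

```r
# Write a tiny made-up file that uses "|" as the delimiter.
tmp <- tempfile(fileext = ".txt")
writeLines(c("name|score", "Alice|98", "Brown|85"), tmp)

if (requireNamespace("readr", quietly = TRUE)) {
  # "cd" is the compact type string: character, then double.
  scores <- readr::read_delim(tmp, delim = "|", col_types = "cd")
} else {
  # Fallback so the example still runs without readr.
  scores <- read.delim(tmp, sep = "|")
}

print(scores)
```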

28 / 36
Excel spreadsheets

To read data from xls and xlsx spreadsheets, we need the readxl
package.

# install.packages("readxl")
library(readxl)

▶ The read_excel() function automatically detects the rectangular region that contains non-empty cells in the Excel spreadsheet.
▶ Nonetheless, ensure that you open up your file in Excel first, to see what it contains and how you can provide further contextual information for the function to use.

29 / 36
Excel example

read_excel("../data/read_excel_01.xlsx")

## # A tibble: 7 x 5
## `Table 1` ...2 ...3 ...4 ...5
## <lgl> <lgl> <chr> <dbl> <chr>
## 1 NA NA <NA> NA <NA>
## 2 NA NA <NA> NA <NA>
## 3 NA NA <NA> NA <NA>
## 4 NA NA <NA> NA <NA>
## 5 NA NA a 1 m
## 6 NA NA b 2 m
## 7 NA NA c 3 m

▶ In this example, read_excel() needs a little help as the data seems to be "floating" in the center of the worksheet.

30 / 36
Excel example

read_excel("../data/read_excel_01.xlsx", skip = 5)

## # A tibble: 2 x 3
## a `1` m
## <chr> <dbl> <chr>
## 1 b 2 m
## 2 c 3 m

▶ The skip argument tells R to skip a certain number of rows.

▶ By default, the function reads the first row as the header. We
can disable it with col_names = FALSE.
▶ Notice that read_excel() uses a col_names argument, instead
of header.

31 / 36
Excel example
Another way is to specify the data range precisely.
▶ We can also supply a set of column names in col_names.

read_excel("../data/read_excel_01.xlsx",
range = "C6:E8", col_names = c("var1", "var2", "var3"))

## # A tibble: 3 x 3
## var1 var2 var3
## <chr> <dbl> <chr>
## 1 a 1 m
## 2 b 2 m
## 3 c 3 m

▶ In case you were wondering, a tibble is an improved version of a data frame. We shall learn more about it soon.
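The range argument can be tried without any course file, because readxl ships a few example workbooks. The sketch below uses datasets.xlsx from the package (this workbook is an assumption made for the example and is not part of the course materials) and reads only cells A1 to C4 of its first sheet.

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  # Path to an example workbook bundled with the readxl package.
  path <- readxl::readxl_example("datasets.xlsx")

  # List the sheets before reading, as a first look at the file.
  print(readxl::excel_sheets(path))

  # Read one header row plus three data rows from columns A to C.
  part <- readxl::read_excel(path, range = "A1:C4")
  print(part)
}
```

Because the range includes the header row, the result has three rows of data and three columns.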

32 / 36
Example: Workplace injuries
The Excel file Workplace_injuries.xlsx contains data on selected workplace injuries from 2019 to 2022.
▶ Originally from the Ministry of Manpower (MOM).

injuries <- read_excel("../data/Workplace_injuries.xlsx")

injuries

## # A tibble: 6 x 5
## Type `2019` `2020` `2
## <chr> <dbl> <dbl> <
## 1 Crushing, fractures and dislocations 3107 2577
## 2 Cuts and Bruises 4500 3895
## 3 Sprains & Strains 1982 1791
## 4 Others 2418 1675
## 5 <NA> NA NA
## 6 Notes: Workplace injury numbers include injuries ~ NA NA

33 / 36
To read in the correct range of data, we should specify an appropriate range.

injuries <- read_excel("../data/Workplace_injuries.xlsx",
                       range = "A1:E5")
injuries

## # A tibble: 4 x 5
##   Type                                 `2019` `2020` `2021` `2022`
##   <chr>                                 <dbl>  <dbl>  <dbl>  <dbl>
## 1 Crushing, fractures and dislocations   3107   2577   2950   2759
## 2 Cuts and Bruises                       4500   3895   4263   4333
## 3 Sprains & Strains                      1982   1791   1829   1778
## 4 Others                                 2418   1675   2100   2022

34 / 36
Common errors

When we first start importing data into R, it’s common to see some
frustrating error messages.
▶ The most common error is:

Error in file(file, "rt") : cannot open the connection

In addition: Warning message:
In file(file, "rt") :
cannot open file 'some_file.csv': No such file or directory

▶ This indicates that R cannot find the file you are trying to import.
▶ Check your file path! Perhaps also the spelling of the filename.
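One way to make this error easier to diagnose is to check the path before reading and report the working directory alongside it. The helper below, read_csv_checked(), is a hypothetical name introduced only for this sketch and is not a function from base R or any package.

```r
# Hypothetical helper: fail early with a message that shows where R is looking.
read_csv_checked <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path,
         "\nCurrent working directory: ", getwd())
  }
  read.csv(path)
}

# Try it on a path that does not exist and capture the error message.
missing_path <- file.path(tempdir(), "no_such_file.csv")
result <- tryCatch(
  read_csv_checked(missing_path),
  error = function(e) conditionMessage(e)
)
cat(result, "\n")
```

Seeing the working directory next to the path usually makes it clear whether the problem is the location or the spelling of the file name.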

35 / 36
Summary

We learned about importing data from different formats and sources:

1. CSV file using read.csv()
2. Flat file using functions from the readr package.
3. Excel file with read_excel() from the readxl package.

We also covered a few ways to check, summarize, and visualize data.

36 / 36
