Data Visualization in Data Science Using R

Aswan introduces himself as a senior data analyst and instructor for data visualization and R. He then shares contact information. The document discusses the grammar of graphics and its key elements like data, mappings, scales and geometries. It provides an example of visualizing COVID-19 case data in Jakarta using R, calculating a 7-day amplification factor and plotting the results.

>_dataviz with #rstats

from table into canvas


DTS X DQLab | 26 March 2021
Hi, I'm Aswan!
Senior data analyst @ Jabar Digital Service
Sensometrics specialist @ Sensolution.ID
Initiator of Komunitas R Indonesia
R instructor @ DQLab, R Academy Telkom University, and some universities
Freelance data analyst & visualization designer
Passionate about: data carpentry and data visualization

Twitter: @aswansyahputra_
Telegram: @aswansyahputra
Email: muhammadaswansyahputra@gmail.com

Grammar of graphics

"In order to build, first we need to deconstruct."

Elements of graphics

1. Data
2. Mapping
3. Statistic
4. Scales
5. Geometries
6. Facets
7. Coordinates
8. Theme
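
As a rough illustration of how these eight elements can show up in ggplot2 code, the sketch below builds one plot from R's built-in mtcars data. The dataset, the linear smoother, the axis label and the object name elements_plot are choices made for this example only; none of them come from the talk.

```r
library(ggplot2)

# Each numbered comment points at one element from the list above.
elements_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +           # 1. data, 2. mapping
  geom_point(colour = "#136F63") +                                # 5. geometry
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +       # 3. statistic
  scale_x_continuous(name = "Weight (1000 lbs)") +                # 4. scale
  facet_wrap(vars(cyl)) +                                         # 6. facets
  coord_cartesian(ylim = c(10, 35)) +                             # 7. coordinates
  theme_minimal()                                                 # 8. theme

elements_plot
```

Removing or swapping one line at a time is a quick way to see what each element contributes to the final picture.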

The devil is in the detail
COVID-19 in Jakarta

library(tidyverse)
library(lubridate)
library(slider)
library(scales)

load("data/cov_jakarta.rda")

cov_jakarta

## # A tibble: 306 x 4
##    date       newcase death recovered
##    <date>       <int> <int>     <int>
##  1 2020-03-01       2     0         0
##  2 2020-03-02       2     0         0
##  3 2020-03-03       2     0         0
##  4 2020-03-04       2     0         0
##  5 2020-03-05       0     1         0
##  6 2020-03-06       0     0         0
##  7 2020-03-07       0     2         0
##  8 2020-03-08       0     0         0
##  9 2020-03-09       0     1         0
## 10 2020-03-10       0     0         0
## # … with 296 more rows

7th-day amplification factor preparation

cov_jakarta_a7 <- cov_jakarta %>%
  transmute(
    date,
    # cumulative number of confirmed cases
    infected = cumsum(newcase),
    # cumulative number of confirmed cases seven days earlier
    infected_lastweek = dplyr::lag(infected, 7),
    # raw A7: today's cumulative total divided by the total a week ago
    a7 = infected / infected_lastweek,
    # 3-day centred moving average (the slide cuts this line off; the closing parenthesis is assumed)
    a7 = slide_dbl(a7, mean, .before = 1, .after = 1)
  )

cov_jakarta_a7

## # A tibble: 306 x 4
##    date       infected infected_lastweek    a7
##    <date>        <int>             <int> <dbl>
##  1 2020-03-01        2                NA NA
##  2 2020-03-02        4                NA NA
##  3 2020-03-03        6                NA NA
##  4 2020-03-04        8                NA NA
##  5 2020-03-05        8                NA NA
##  6 2020-03-06        8                NA NA
##  7 2020-03-07        8                NA NA
##  8 2020-03-08        8                 2 NA
##  9 2020-03-09        8                 4  2.44
## 10 2020-03-10        8                 6  1.44
## # … with 296 more rows

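
To make the arithmetic behind A7 easier to follow, here is a small base R sketch of the same steps without dplyr or slider. The first ten newcase values are the ones printed on the slide; the eleventh value (0) is an assumption added so that the moving average for 10 March has a right-hand neighbour. The helper logic is a hand-written stand-in for lag() and slide_dbl(), not the speaker's code.

```r
# First ten values from the slide; the eleventh (0) is assumed for illustration.
newcase <- c(2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0)

# Cumulative confirmed cases
infected <- cumsum(newcase)
n <- length(infected)

# Same series shifted by seven days (NA where no earlier value exists)
infected_lastweek <- c(rep(NA, 7), infected[seq_len(n - 7)])

# Raw amplification factor: total today divided by total a week ago
a7_raw <- infected / infected_lastweek

# Centred 3-day mean; windows at the edges are shorter, and any NA in a
# window makes the result NA because mean() is called without na.rm
a7 <- sapply(seq_len(n), function(i) {
  mean(a7_raw[max(1, i - 1):min(n, i + 1)])
})

round(a7, 2)
```

Under these assumptions the ninth and tenth values come out as 2.44 and 1.44, matching the printed tibble, and the first eight stay NA because each of their windows contains at least one missing ratio.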
7th-day amplification factor visualization

cov_jakarta_a7 %>%
  ggplot(aes(date, a7)) +
  geom_col(fill = "#136F63", alpha = 0.8, colour = "
  geom_hline(yintercept = 1, colour = "#D72638", typ
  scale_x_date(
    breaks = "1 month",
    labels = label_date(format = "%b"),
    expand = c(0.005, 0.005)
  ) +
  labs(
    x = NULL,
    y = "A7",
    title = "Silent transmission in Jakarta",
    subtitle = "Red line segment indicates 7-day per
    caption = "Data source: covid.19.go.id"
  ) +
  # theme_xaringan() comes from the xaringanthemer package
  theme_xaringan() +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text.x = element_text(size = rel(0.5))
  )

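
The annotation slides start from an object called cov_jakarta_a7_plot, which the code shown so far never assigns; presumably the finished plot was stored under that name. The sketch below shows that pattern on made-up data: toy, toy_plot and toy_annotated are hypothetical names, and the values are invented so the example runs without the original dataset.

```r
library(ggplot2)

# Hypothetical stand-in for cov_jakarta_a7: two weeks of invented A7 values
toy <- data.frame(
  date = as.Date("2020-04-01") + 0:13,
  a7 = seq(3, 1.05, length.out = 14)
)

# Store the base plot in a variable so later layers can be added to it
toy_plot <- ggplot(toy, aes(date, a7)) +
  geom_col(fill = "#136F63", alpha = 0.8) +
  geom_hline(yintercept = 1, colour = "#D72638")

# Adding to the stored object returns a new plot; toy_plot itself is unchanged
toy_annotated <- toy_plot +
  annotate(
    geom = "label",
    x = min(toy$date),
    y = 1,
    label = "A7 = 1.0",
    colour = "#D72638"
  )

toy_annotated
```

Keeping the base plot in its own variable makes it easy to try several annotation variants without rebuilding the whole chart each time.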
Adding annotation (1)

cov_jakarta_a7_plot +
  annotate(
    geom = "label",
    x = min(cov_jakarta$date),
    y = 1,
    label = "A7 = 1.0",
    fontface = "italic",
    size = 4,
    colour = "#D72638",
    alpha = 0.8
  ) +
  geom_curve(
    x = as.Date("2020-04-16"),
    xend = as.Date("2020-04-9"),
    y = 5,
    yend = subset(cov_jakarta_a7, date == "2020-04-9
    arrow = arrow(length = unit(0.07, "inch")),
    size = 0.025,
    color = "gray40",
    curvature = 0.3
  ) +
  annotate(
    geom = "label",
    x = as.Date("2020-04-16"),
    y = 5,
    label = "First PSBB\n(9 April)",
    size = 5,
    colour = "gray30",
    alpha = 1
  )

Adding annotation (2)

cov_jakarta_a7_plot +
  annotate(
    geom = "text",
    x = as.Date("2020-08-31"),
    y = 12,
    label = str_wrap("A7 is the ratio of total confi
    fontface = "italic",
    size = 3.5,
    colour = "gray30"
  ) +
  geom_segment(
    x = as.Date("2020-04-9"),
    xend = as.Date("2020-04-9") + 7,
    y = 2,
    yend = 2,
    arrow = arrow(length = unit(0.05, "inch"), type
    size = 0.025,
    colour = "#D72638"
  ) +
  coord_cartesian(clip = "off")

Some practical tips
Learn from re-viz, demo time!
Thank you!

Contact me!
muhammadaswansyahputra@gmail.com
