Module 2 Iris Data Set

The iris dataset contains measurements of 150 iris flowers of 3 species. It includes 4 numeric attributes and 1 categorical species attribute. The dataset structure and descriptive statistics are explored. Visualizations include scatter plots of attributes, histograms, and box plots to examine relationships and distributions.

Uploaded by

Rachell Ann Uson

Iris Dataset

Allan Lao
2023-09-26
##ctrl-alt-i for code blocks

Iris Dataset in R
The iris dataset is a built-in dataset in R that contains measurements of 4 different attributes (in centimeters) for 150 flowers: 50 from each of 3 different
species.

To explore the dataset, we can describe it statistically or visualize using charts.

Load the Iris Dataset

Since the iris dataset is built in, we simply need to load it before using it.

data(iris)

Explore the Structure of the dataset

The first step is to examine the data structure to determine its size, the number of columns, and other attributes. The order in which you look at these is
up to the analyst.

Structure
The structure of the dataset

str(iris)

## 'data.frame': 150 obs. of 5 variables:

## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

str() shows the structure of the dataset: the number of observations (records), the number of variables, and the data type of each variable. There are 150 rows of records in
the iris dataset with 5 columns. Note that the Species variable has a data type of Factor.
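
Because Species is a factor, R stores it as integer codes together with a fixed set of labels (levels). The following is a minimal sketch for inspecting it; the variable names are chosen here only for illustration.

```r
data(iris)

# A factor stores integer codes plus a fixed set of labels (levels)
species_levels <- levels(iris$Species)            # the distinct species names
species_count  <- nlevels(iris$Species)           # how many levels there are
species_codes  <- head(as.integer(iris$Species))  # underlying codes of the first rows

print(species_levels)
print(species_count)
print(species_codes)
```

The first rows all belong to the first level, which is why str() prints a run of 1s for Species.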

The dimension

dim(iris)

## [1] 150 5

The names of the columns

names(iris)

## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
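
If only one of these facts is needed, the row count, column count, and per-column type can also be obtained separately. This is a small sketch using base R; the variable names are illustrative assumptions.

```r
data(iris)

n_rows <- nrow(iris)               # number of observations
n_cols <- ncol(iris)               # number of variables
col_classes <- sapply(iris, class) # class of each column, by column name

print(c(rows = n_rows, columns = n_cols))
print(col_classes)
```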

To take a glimpse at the first 4 rows:

head(iris,4)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa

Optionally, you may also check the last 6 records (6 is the default number of rows for tail()).

tail(iris)

##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 145          6.7         3.3          5.7         2.5 virginica
## 146          6.7         3.0          5.2         2.3 virginica
## 147          6.3         2.5          5.0         1.9 virginica
## 148          6.5         3.0          5.2         2.0 virginica
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica

Describe the Iris Dataset using Statistical tools

Now, let's use some statistics to describe the dataset.

The descriptive statistics summary

summary(iris)

##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50

For each of the numeric variables we can see the following information:

Min: The minimum value.

1st Qu: The value of the first quartile (25th percentile).
Median: The median value.
Mean: The mean value.
3rd Qu: The value of the third quartile (75th percentile).
Max: The maximum value.

For the only categorical variable in the dataset (Species) we see a frequency count of each value:

setosa: This species occurs 50 times.

versicolor: This species occurs 50 times.
virginica: This species occurs 50 times.
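
The individual figures reported by summary() can be reproduced with dedicated functions. The sketch below does this for Sepal.Length and Species, and adds a per-species mean as one possible next step; the variable names are illustrative, and the per-species breakdown is an addition not shown in the summary output above.

```r
data(iris)

# Minimum, quartiles, and maximum of one numeric column
sl_quartiles <- quantile(iris$Sepal.Length, probs = c(0, 0.25, 0.5, 0.75, 1))
sl_mean <- mean(iris$Sepal.Length)

# Frequency count of the categorical column
species_counts <- table(iris$Species)

# Mean of every numeric column, computed separately for each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)

print(sl_quartiles)
print(sl_mean)
print(species_counts)
print(species_means)
```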

Visualize the Iris Dataset

The plot() function is the generic function for plotting R objects.

plot(iris)

Plotting the entire dataset produces a matrix of scatter plots that gives a glimpse of the relationships between its variables. For example, the panel directly below Sepal.Length shows Sepal.Width on the y-axis
and Sepal.Length on the x-axis.
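
The relationships visible in the scatter plots can also be expressed as numbers. As a minimal sketch, the Pearson correlation matrix of the four numeric columns is computed below; selecting columns 1 to 4 assumes the column order shown by names(iris).

```r
data(iris)

# Keep only the four numeric measurement columns
numeric_cols <- iris[, 1:4]

# Pairwise Pearson correlations between the measurements
cor_matrix <- cor(numeric_cols)

print(round(cor_matrix, 2))
```

Values near 1 or -1 correspond to panels where the points fall close to a straight line, while values near 0 correspond to panels with no clear linear trend.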

Plot quantitative variables

plot(iris$Sepal.Length) #Quantitative

Plot 2 quantitative variables

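# Scatter plot of two quantitative variables, colored by species
plot(iris$Sepal.Width, iris$Sepal.Length,
     col = as.integer(iris$Species),
     main = 'Sepal Length vs Width',
     xlab = 'Sepal Width',
     ylab = 'Sepal Length',
     pch = 19)

# One legend entry per species, using the same colors and symbol as the points
legend(x = "topleft",
       legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)),
       pch = 19,
       text.font = 4,
       text.col = "blue")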

Plotting a Factor variable

The plot() function automatically detects the type of variable and chooses an appropriate chart by default. For a factor, it draws a bar chart of the counts of each level.

plot(iris$Species)
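
The bar heights in that chart are simply the counts of each factor level. A small sketch of how to obtain the same counts as numbers (the variable name is illustrative):

```r
data(iris)

# Count how many rows belong to each species; these are the bar heights
species_counts <- table(iris$Species)
print(species_counts)

# barplot(species_counts) would draw the same bars from this table
```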

Next, we use a histogram to see how the data is spread across a range of values, in this case the distribution of sepal length.

hist(iris$Sepal.Length,
col='steelblue',
main='Histogram',
xlab='Length',
ylab='Frequency')
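
To read the bins off as numbers rather than from the chart, hist() can be called with plot = FALSE, which returns the bin boundaries and counts without drawing anything. This is a sketch using the default binning; the data frame bin_table is an illustrative helper, not part of the original analysis.

```r
data(iris)

# Compute the histogram without drawing it
h <- hist(iris$Sepal.Length, plot = FALSE)

# Pair each bin's lower and upper boundary with its frequency
bin_table <- data.frame(
  lower = head(h$breaks, -1),
  upper = tail(h$breaks, -1),
  count = h$counts
)

print(bin_table)
```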

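A box plot shows the five-number summary of the data: the minimum, the 25th percentile, the median, the 75th percentile, and the maximum. It is thus
useful for visualizing the spread of the data and drawing inferences accordingly. Note that by default R's boxplot() draws each whisker only as far as the most extreme point within 1.5 times the interquartile range of the box and plots any points beyond that individually.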

Using a boxplot() we can determine the distribution of sepal length across species.

boxplot(Sepal.Length~Species,
data=iris,
main='Sepal Length by Species',
xlab='Species',
ylab='Sepal Length',
col='steelblue',
border='black')
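
The numbers behind each box can be retrieved by calling boxplot() with plot = FALSE. The sketch below labels the returned statistics for readability; the row labels are added here for illustration and follow the order documented for the stats component.

```r
data(iris)

# Compute the box plot statistics without drawing the chart
bp <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

# One column per species, five rows per box
box_stats <- bp$stats
colnames(box_stats) <- bp$names
rownames(box_stats) <- c("lower whisker", "lower hinge", "median",
                         "upper hinge", "upper whisker")

print(box_stats)

# Points lying beyond the whiskers, if any
print(bp$out)
```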
