Machine Learning for AQI Prediction

Uploaded by

walkevarad66

Air Quality Prediction using Machine Learning

Guruprasad Velikadu Krishnamoorthy

Air Pollution

Air pollution is a very common environmental health hazard. It refers to the release into the air of pollutants that are dangerous to human health and to the entire planet.

Some of the common causes of air pollution are:

• Emissions from factories
• Vehicle exhausts
• Agricultural processes
• Natural causes such as wildfires, volcanoes, etc.

Air Quality Index

The Air Quality Index (AQI) is an index that quantifies the extent of air pollution in an area. Its value ranges from 0 to 500: values from 0 to 50 are considered safe, while values from 301 to 500 are considered extremely dangerous.
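
As an illustration of how an AQI value maps to a risk band, here is a minimal sketch. The slide names only the two extreme ranges; the intermediate bands and their labels below are the US EPA category names as commonly published, and the function name `aqi_category` is hypothetical.

```python
from bisect import bisect_left

# Upper bound of each AQI band and its label.
# Assumption: US EPA category names; the slide itself only names the
# 0-50 ("safe") and top ("extremely dangerous") ranges.
_AQI_BANDS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def aqi_category(aqi: float) -> str:
    """Return the category label for an AQI value between 0 and 500."""
    if not 0 <= aqi <= 500:
        raise ValueError(f"AQI must be between 0 and 500, got {aqi}")
    uppers = [upper for upper, _ in _AQI_BANDS]
    # bisect_left finds the first band whose upper bound is >= aqi.
    return _AQI_BANDS[bisect_left(uppers, aqi)][1]


print(aqi_category(42))
print(aqi_category(350))
```

A lookup like this could also turn a regression output into a class label if the problem is framed as classification instead.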
The Problem Statement

To predict the Air Quality Index in real time based on the pollutant concentration and environmental factors.

How Machine Learning Can Be Used?

• Regression or classification models can be built to predict the Air Quality Index based on the pollutant concentration levels.
• The effects of interactions between the variables, and the linear and non-linear relationships between them, can be studied.
• Meteorological parameters such as temperature, wind, pressure, etc. can be used as model inputs to predict the pollutant concentration and AQI levels.
• Data can be processed in real time to alert the authorities and, in extreme cases, to issue warnings to the public to avoid a certain area due to poor air quality.
Steps Involved in this Project

01 Data Collection: In the first step, the source of the data will be identified.
02 Data Exploration: The data is then explored to identify the patterns and relationships between the variables.
03 Data Cleansing & Preparation: In this step, data cleansing such as duplicate checks, null handling, and feature selection is done. The datasets are also merged.
04 Model Building in R and Python: In this step, the dataset will be split into train and test sets and the models will be trained using R and Python.
05 Analyzing the Model Results: The model results, along with the chosen metrics, will be analyzed and the best model will be chosen.
06 Outcome: The final results will then be published.
Source of the Data

The data files are taken from the Environmental Protection Agency (EPA) website. Various datasets, each containing the concentration of pollutants, gases, and meteorological parameters for the year 2022, are used in the study:

• Ozone Concentration Dataset
• Carbon Monoxide Concentration Dataset
• NO2 and SO2 Concentration Datasets
• Particulate Matter PM2.5 and PM10 Concentrations
• Temperature and Pressure Datasets
• Wind Speed and Humidity Datasets

Data Exploration

• The histogram of AQI indicates that most AQI values lie in the 20 to 50 range. There is a long tail, indicating the presence of outliers above an AQI of 100.
• The scatter plot of ozone concentration versus AQI on the ozone dataset suggests a strong positive relationship between the two variables.

Data Exploration (continued)

• The bar plot indicates that the median NO2 values of most U.S. states, such as Georgia and Arizona, are above the national average, while Kansas, Iowa, and Maine are the lowest.
• The bar plot of the county analysis in California suggests that San Bernardino is the most polluted county, followed by Los Angeles County.

Data Exploration (continued)

• The tree map built on the dataset suggests that California and Texas had the greatest number of observations. This is similar to the observations in the individual datasets.
• The facet scatter plot of AQI versus NO2 levels, based on the ozone levels in Southwest states, suggests that the relationship is similar throughout the region.
The Process of Data Preparation

• Data format conversion by handling the factors and strings.
• Renaming columns, null handling, and duplicate checks.
• Merging the datasets based on common fields such as the date of observation, city, county, and state.
• Feature selection using a correlation matrix and feature reduction using Principal Component Analysis.
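
The duplicate check, null handling, and merge steps listed above can be sketched as follows. This is a dependency-free illustration, not the author's code (a data-frame library would normally do this work). The field names, the helper names `clean` and `merge`, and the sample values are all made up for the example.

```python
# Common fields used to align observations (names are assumptions).
KEYS = ("date", "city", "county", "state")


def clean(rows, value_field):
    """Drop rows with a missing measurement and keep one row per key."""
    seen = {}
    for row in rows:
        if row.get(value_field) is None:
            continue  # null handling: skip missing measurements
        key = tuple(row[k] for k in KEYS)
        seen.setdefault(key, row)  # duplicate check: keep first occurrence
    return seen


def merge(datasets):
    """Inner-join datasets on KEYS; `datasets` maps a field name to its rows."""
    cleaned = {field: clean(rows, field) for field, rows in datasets.items()}
    common = set.intersection(*(set(c) for c in cleaned.values()))
    merged = []
    for key in sorted(common):
        record = dict(zip(KEYS, key))
        for field, rows in cleaned.items():
            record[field] = rows[key][field]
        merged.append(record)
    return merged


# Made-up placeholder observations, for illustration only.
site = {"city": "Fresno", "county": "Fresno", "state": "California"}
ozone = [
    {"date": "2022-01-01", **site, "ozone": 0.031},
    {"date": "2022-01-01", **site, "ozone": 0.031},  # exact duplicate
    {"date": "2022-01-02", **site, "ozone": None},   # missing value
]
no2 = [
    {"date": "2022-01-01", **site, "no2": 12.4},
    {"date": "2022-01-02", **site, "no2": 9.8},
]

merged = merge({"ozone": ozone, "no2": no2})
print(merged)
```

An inner join is used here so that every merged record has a value for each pollutant. Whether the original study kept or imputed partially missing records is not stated.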
Model Building in R

• A linear regression model was built in R on the merged dataset.
• The features were scaled using standard scaling. The dataset was split into test and train sets in a 30:70 ratio.
• The model results indicate that R-squared and adjusted R-squared are very close, suggesting that the model is a good representation of the data and that its fit is not inflated by uninformative predictors.
• Root Mean Square Error (RMSE), which measures the typical difference between the values predicted by the model and the actual values, was used as the performance metric. The RMSE was around 4.7 for both the train and test datasets, which suggests that the model is not overfitting.
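
The split, scaling, and adjusted R-squared ideas on this slide can be sketched as below. The sketch is in Python rather than R and is not the author's code. The helper names and the fixed random seed are assumptions, and the scaler is fitted on the training rows only, which is a common practice rather than something the slide states.

```python
import random
from statistics import fmean, pstdev


def train_test_split(rows, test_fraction=0.30, seed=0):
    """Shuffle the rows and split them into (train, test), 70:30 by default."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_rows):
    """Compute per-column mean and standard deviation from the training rows."""
    columns = list(zip(*train_rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return means, stds


def transform(rows, means, stds):
    """Standard scaling: subtract the mean, divide by the standard deviation."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R-squared penalises R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Toy feature rows, for illustration only.
rows = [[float(i), float(2 * i + 1)] for i in range(10)]
train, test = train_test_split(rows)
means, stds = fit_scaler(train)
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)
print(len(train), len(test))
```

With many observations and few predictors, the penalty factor (n - 1) / (n - p - 1) is close to 1. This is why R-squared and adjusted R-squared being nearly equal indicates that the predictors are not merely inflating the fit.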
Model Building in Python

• Multiple regression models were built in Python using algorithms such as:
  • Linear Regression
  • Decision Tree Regression
  • Random Forest Regression
  • K Neighbors Regression
  • Gradient Boost Regression
• The merged dataset from all sources was split into train and test sets, and the trained model was used to make predictions on the test set.
• Feature selection and feature reduction methods, such as Pearson correlation-based feature selection and Principal Component Analysis, were applied to the datasets before the models were built.
• Metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R2) were used to evaluate the model performance.
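
The Pearson correlation-based feature selection mentioned above can be sketched as follows. The threshold of 0.3, the function names, and the toy values are assumptions made for the illustration. The slides do not state the cut-off that was used, and the PCA step is not shown.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def select_features(features, target, threshold=0.3):
    """Keep features whose absolute correlation with the target is at
    least `threshold` (a hypothetical cut-off)."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    return selected, scores


# Toy values, for illustration only.
features = {
    "no2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [1.0, -1.0, 0.0, -1.0, 1.0],
}
aqi = [2.0, 4.0, 6.0, 8.0, 10.0]

selected, scores = select_features(features, aqi)
print(selected, scores)
```

Pearson correlation only captures linear association, so a feature with a strong non-linear relationship to AQI could be dropped by a filter like this.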
Model Building in Python (Contd)

• The regression results were better with the Random Forest Regressor and Gradient Boost algorithms in terms of the R2 statistic.
• These algorithms also had a lower Mean Squared Error and predicted AQI more accurately than the rest.
• The models returned similar results when the PCA feature reduction method was used.
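
For reference, the four metrics used to compare the models can be computed as in the sketch below. The function name and the sample values are made up for the illustration and are not results from the study.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R-squared for paired actual/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = mse * n                                     # residual sum of squares
    return {"MSE": mse, "RMSE": sqrt(mse), "MAE": mae, "R2": 1 - ss_res / ss_tot}


# Made-up actual and predicted AQI values.
actual = [30.0, 40.0, 50.0, 60.0]
predicted = [32.0, 38.0, 50.0, 64.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Lower MSE, RMSE, and MAE and a higher R2 all indicate a better fit, which is the sense in which the slide ranks Random Forest and Gradient Boost above the other models.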
Model Outcome and Conclusion

• The model results from R and Python suggest that NO2 and CO are better predictors of AQI than the other pollutants.
• Though ozone was not a strong predictor of AQI according to the model, its effects should not be underestimated, as it can have detrimental effects on air quality. It was therefore included during model building.
• Additional models such as clustering can be built to identify clusters of regions with higher AQI. Clustering by period of the year can also be used to identify when the air in those regions is most polluted.
Ethical Concerns

• Though gas-powered vehicle emissions and industrial smoke have played a significant role in air pollution, replacing them entirely with sustainable solutions could lead to many job losses, affecting the many families employed by the manufacturing industries. Care must be taken while publishing the results, keeping in mind the impact they can have on families.
• The acceptable levels of greenhouse gases and pollutants for humans should be determined carefully, as levels that are acceptable for humans may cause significant damage to other ecosystems and species.
• The acceptable levels should also be carefully assessed with international considerations in mind, as the gas emissions and pollutants from developed countries are no longer a local issue. Their impacts are already seen on the other side of the world, with extreme floods and drought conditions that were not seen in the past.
References

• https://www.nps.gov/subjects/air/sources.htm
• https://www.airnow.gov/education/students/what-is-the-aqi/
• https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8586941/
• https://www.amacad.org/publication/ethical-dimensions-global-environmental-issues
Thank you

