Machine Learning for AQI Prediction

Air Quality Prediction using Machine Learning

Guruprasad Velikadu Krishnamoorthy


Air Pollution
Air pollution is a very common environmental health hazard. It refers to the release into the air of pollutants that are dangerous to human health and to the entire planet.

Some of the common causes of Air Pollution are:

• Emissions from Factories

• Vehicle exhausts

• Agricultural processes

• Natural causes such as Wildfires, Volcanoes, etc.

Air Quality Index


The Air Quality Index (AQI) measures and quantifies the extent of air pollution in an area. Its value ranges from 0 to 500: values from 0 to 50 are considered safe, while values from 300 to 500 are considered extremely dangerous.
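
As a rough illustration of how an AQI value maps to a safety level, the sketch below uses the six category bands published by the U.S. EPA. The function name `aqi_category` is a hypothetical helper introduced here and is not part of the original slides.

```python
# AQI category bands as published by the U.S. EPA: (upper bound, label).
AQI_CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def aqi_category(aqi: float) -> str:
    """Return the category label for an AQI value between 0 and 500."""
    if not 0 <= aqi <= 500:
        raise ValueError(f"AQI must be between 0 and 500, got {aqi}")
    for upper, label in AQI_CATEGORIES:
        if aqi <= upper:
            return label
    raise AssertionError("unreachable: every valid AQI falls in a band")


print(aqi_category(42))   # Good
print(aqi_category(320))  # Hazardous
```

A lookup like this is also one way to turn the regression target into classes if a classification model is preferred.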
The Problem Statement
To predict the Air Quality Index in real time based on pollutant concentrations and environmental factors.

How Can Machine Learning Be Used?
• Regression or classification models can be built to predict the Air Quality Index from pollutant concentration levels.

• Interactions between the variables, as well as linear and non-linear relationships between them, can be studied.

• Meteorological parameters such as temperature, wind, and pressure can be used as model inputs to predict pollutant concentrations and AQI levels.

• Data can be processed in real time to alert the authorities and, in extreme cases, to warn the public to avoid a certain area due to poor air quality.
Steps Involved in this Project

• 01 Data Collection: In the first step, the source of the data will be identified.

• 02 Data Exploration: The data is then explored to identify the patterns and relationships between the variables.

• 03 Data Cleansing & Preparation: In this step, data cleansing such as duplicate checks, null handling, and feature selection is done. The datasets are also merged.

• 04 Model Building in R and Python: In this step, the dataset will be split into train and test sets and the models will be trained using R and Python.

• 05 Analyzing the Model Outcome: The model results along with the chosen metrics will be analyzed and the best model will be chosen.

• 06 Results: The final results will then be published.
Source of the Data

The data files are taken from the Environmental Protection Agency (EPA) website. Various datasets, each containing the concentration of pollutants, gases, and meteorological parameters for the year 2022, are used in the study.

• Ozone Concentration Dataset

• Carbon Monoxide Concentration Dataset

• NO2 and SO2 Concentration Datasets

• Particulate Matter PM2.5 and PM10 Concentrations

• Temperature and Pressure Datasets

• Wind Speed and Humidity Datasets


Data Exploration

The histogram of AQI indicates that most AQI values lie in the 20 to 50 range. There is a long tail indicating the presence of outliers above an AQI of 100.

The scatter plot of ozone concentration versus AQI on the Ozone dataset suggests a strong positive relationship between the two variables.
Data Exploration (continued)

The bar plot indicates that the median NO2 values of most U.S. states, such as Georgia and Arizona, are above the national average, while Kansas, Iowa, and Maine have the lowest values.

The bar plot of the county analysis in California suggests that San Bernardino is the most polluted county, followed by Los Angeles County.
Data Exploration (continued)

The tree map built on the dataset suggests that California and Texas had the greatest number of observations. This is similar to the observations in the individual datasets.

The facet scatter plot of AQI vs. NO2 levels, based on the ozone levels in Southwest states, suggests that the relationship is similar throughout the region.
The Process of Data Preparation

• Data format conversion by handling the factors and strings.

• Renaming columns, null handling, and duplicate checks.

• Merging the datasets based on common fields such as the date of observation, city, county, and state.

• Feature selection using a correlation matrix and feature reduction using Principal Component Analysis.
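
The preparation steps above can be sketched with the Python standard library alone. This is a minimal illustration and not the code used in the study: the records, field names (`date`, `county`, `state`, `ozone`, `no2`, `aqi`), and helper functions are all made up for the example, and a real workflow would more likely use a data-frame library.

```python
import math


def drop_nulls_and_duplicates(rows):
    """Drop records with a missing (None) value, then drop exact duplicates."""
    seen, clean = set(), []
    for row in rows:
        if any(value is None for value in row.values()):
            continue
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            clean.append(row)
    return clean


def merge_on_keys(left, right, keys=("date", "county", "state")):
    """Inner-join two lists of dict records on the shared key fields."""
    index = {tuple(row[k] for k in keys): row for row in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            merged.append({**row, **match})
    return merged


def pearson(xs, ys):
    """Pearson correlation coefficient (undefined for constant input)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up records for illustration only; these are not EPA measurements.
loc = {"county": "Los Angeles", "state": "California"}
ozone = [
    {"date": "2022-07-01", **loc, "ozone": 0.051, "aqi": 47},
    {"date": "2022-07-02", **loc, "ozone": 0.064, "aqi": 84},
    {"date": "2022-07-02", **loc, "ozone": 0.064, "aqi": 84},  # duplicate
    {"date": "2022-07-03", **loc, "ozone": None, "aqi": 61},   # null value
    {"date": "2022-07-04", **loc, "ozone": 0.058, "aqi": 64},
]
no2 = [
    {"date": "2022-07-01", **loc, "no2": 21.0},
    {"date": "2022-07-02", **loc, "no2": 35.0},
    {"date": "2022-07-03", **loc, "no2": 30.0},
    {"date": "2022-07-04", **loc, "no2": 27.0},
]

merged = merge_on_keys(drop_nulls_and_duplicates(ozone),
                       drop_nulls_and_duplicates(no2))
print(len(merged))  # 3

# Correlation of each candidate feature with the target, as a simple screen.
target = [row["aqi"] for row in merged]
for feature in ("ozone", "no2"):
    r = pearson([row[feature] for row in merged], target)
    print(f"{feature}: r = {r:+.2f}")
```

Features whose correlation with the target is close to zero, or which are nearly collinear with another feature, are the usual candidates for removal before modelling.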
Model Building in R
• A Linear Regression model was built in R after merging all datasets.

• The features were scaled using standard scaling. The dataset was split into train and test sets in a 70:30 ratio.

• The model results indicate that R-squared and Adjusted R-squared are very close, which suggests that the fit is not inflated by uninformative predictors and that the model is a good representation of the data.

• Root Mean Square Error (RMSE), which measures the typical magnitude of the difference between the values predicted by the model and the actual values, was used as the performance metric. The RMSE was around 4.7 for both the train and test datasets.
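
The split, scaling, and Adjusted R-squared comparison described above can be sketched as follows. The study did this step in R; the sketch uses Python for consistency with the other examples, and the function names, the seed, and the sample numbers are assumptions made for illustration.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Return the mean and population standard deviation of the values."""
    return statistics.fmean(values), statistics.pstdev(values)


def scale(values, mean, std):
    """Standard scaling: subtract the mean, divide by the standard deviation."""
    return [(v - mean) / std for v in values]


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R-squared, which penalizes R-squared for extra predictors."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Illustrative values only.
values = [float(v) for v in range(10)]
train, test = train_test_split(values)       # 7 train rows, 3 test rows

# Fit the scaler on the training set only, then reuse it for the test set,
# so that no information from the test set leaks into training.
mean, std = fit_scaler(train)
train_scaled = scale(train, mean, std)
test_scaled = scale(test, mean, std)

# With many samples and few predictors, the adjustment is small.
print(adjusted_r2(0.90, n_samples=1000, n_features=8))
```

The closer Adjusted R-squared is to R-squared, the less the fit depends on simply having many predictors, which is the reading given in the slide above.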
Model Building in Python
• Multiple regression models were built in Python using algorithms such as:
  • Linear Regression
  • Decision Tree Regression
  • Random Forest Regression
  • K Neighbors Regression
  • Gradient Boost Regression

• The merged dataset from all sources was split into train and test sets, and the trained model was used to make predictions on the test set.

• Feature selection and feature reduction methods, such as Pearson's correlation and Principal Component Analysis, were applied to the datasets before the models were built.

• Metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R2) were used to evaluate the model performance.
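
The four evaluation metrics listed above follow directly from their definitions. The sketch below computes them in plain Python; the function name `regression_metrics` and the sample AQI values are hypothetical and are not results from the study.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MSE, RMSE, MAE, and R-squared for paired value lists."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1 - ss_res / ss_tot,
    }


# Made-up AQI values for illustration.
actual = [40, 55, 62, 48]
predicted = [42, 50, 65, 47]
metrics = regression_metrics(actual, predicted)
print(metrics["mse"], metrics["mae"])  # 9.75 2.75
```

RMSE and MAE are in the same units as the AQI itself, which makes them easier to interpret than MSE; R2 is unitless and compares the model against always predicting the mean.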
Model Building in Python (Contd)

• The Random Forest Regressor and Gradient Boost algorithms gave the best regression results in terms of the R2 statistic.

• These algorithms also had lower Mean Squared Error and higher accuracy than the rest.

• The models returned similar results when the PCA feature reduction method was used.
Model Outcome and Conclusion

• The model results from R and Python suggest that NO2 and CO are better predictors of AQI than the other pollutants.

• Though ozone was not a strong predictor of AQI as per the model, its effects cannot be underestimated, as it can have detrimental effects on air quality, so it was included during model building.

• Additional models such as clustering can be built to identify clusters of regions with higher AQI. Clustering by period of the year can also be used to identify when the air in those regions is most polluted.
Ethical Concerns
• Though gas-powered vehicle emissions and industrial smoke have played a significant role in air pollution, they cannot be replaced entirely by sustainable solutions without risking many job losses that would affect families employed by the manufacturing industries. Care must be taken while publishing the results, keeping in mind the impact they can have on families.

• When determining the acceptable levels of greenhouse gases and pollutants for humans, the values should be assessed carefully, as levels that are acceptable for humans may cause significant damage to other ecosystems and species.

• The acceptable levels should also be carefully assessed with international considerations in mind, as gas emissions and pollutants from developed countries are no longer a local issue. Their impacts are already seen on the other side of the world, with extreme floods and drought conditions that were not seen in the past.
Thank you
