Report MINI PROJECT
Belagavi-590014, Karnataka
In partial fulfillment of the requirement for the award of the degree of
Bachelor of Engineering
In
Artificial Intelligence and Machine Learning
A PROJECT REPORT ON
SUBMITTED BY:
KAILASH BASWARAJ GUDME (3GN22AI014)
REETU (3GN22AI039)
SYED SOHAIL (3GN22AI058)
MAILOOR ROAD, BIDAR, KARNATAKA-585403
GURU NANAK DEV ENGINEERING COLLEGE
BIDAR-585401, KARNATAKA
This is to certify that the Project Report entitled “REAL-TIME Water Quality Monitoring”
is a bonafide work carried out by KAILASH BASWARAJ GUDME (3GN22AI014), REETU
(3GN22AI039), and SYED SOHAIL (3GN22AI058) in partial fulfillment of the requirements for
the award of the degree of Bachelor of Engineering (6th semester) in Artificial Intelligence and
Machine Learning of Visvesvaraya Technological University, Belagavi, during the year 2024-25. It is
certified that all the corrections/suggestions indicated for Internal Assessment have been
incorporated in the report deposited in the department library. The project has been approved
as it satisfies the academic requirements in respect of major project work prescribed for the
said degree.
ACKNOWLEDGEMENT
The project report on “REAL-TIME Water Quality Monitoring” is the outcome of the guidance,
moral support, and devotion bestowed on us throughout our work. For this we acknowledge and
express our profound gratitude and thanks to everyone who has been a source of
inspiration during the project work.
First and foremost, we offer our sincere thanks, with humility, to our Principal, Dr.
Dhananjay M, for their support and encouragement. We feel deeply indebted to our HOD (AIML), Dr.
Dayanand J, for the timely help provided from the inception of the project to date. We would also
like to take this opportunity to thank our Professor, Dr. Harish Joshi, AIML Dept., who not only stood
by us as a source of inspiration but also dedicated his time to enable us to present the project on
time.
Table of Contents
S. No.  Section Title
1.      Abstract
2.      Introduction
3.      Problem Statement
4.      Objective
5.      Literature Survey
6.      Methodology
7.      Implementation
8.      Output
9.      Conclusion
10.     Future Scope
11.     References
ABSTRACT
Water is one of the most essential resources for sustaining life, and its quality directly affects human
health, agriculture, industry, and the overall ecosystem. With increasing levels of industrialization,
urbanization, and agricultural activities, water bodies are increasingly being polluted by harmful
substances, leading to severe ecological and health consequences. Traditional methods of water
quality monitoring involve manual sampling and laboratory-based analysis, which are not only
time-consuming but also limited in scope and frequency. The resulting delay hinders timely
detection of, and response to, water contamination, potentially putting communities and
environments at risk.
This project presents a smart, real-time water quality monitoring system powered by hybrid machine
learning models. By combining supervised learning (Random Forest) with unsupervised learning
(Isolation Forest), the system not only classifies the water as safe or unsafe based on known
conditions but also detects previously unseen anomalies that may indicate new or unexpected types
of pollution. The hybrid approach ensures greater reliability and accuracy in detecting both expected
and unexpected variations in water quality.
The project is implemented as a modern web application using cutting-edge front-end technologies
such as React, Vite, Tailwind CSS, and TypeScript. It offers an interactive dashboard that visualizes
real-time water quality parameters like pH, dissolved oxygen (DO), temperature, turbidity, and total
dissolved solids (TDS). Simulated sensor data is used to mimic real-time input, demonstrating the
project’s potential integration with actual IoT-based water sensors.
In addition to data visualization, the system highlights anomalies and provides insights into the
probable causes of water quality degradation. It allows stakeholders—such as environmental
agencies, water management authorities, and the general public—to gain timely insights and act
promptly.
By integrating real-time data processing with machine learning and dynamic visualization, this
project addresses the major limitations of existing water quality monitoring systems. It demonstrates
how digital transformation can be applied to environmental monitoring to make it smarter, faster,
and more scalable. The results show promise for future integration into larger IoT frameworks,
enabling robust, real-time environmental surveillance across wide geographical regions.
INTRODUCTION
The quality of water is a fundamental factor influencing the health of ecosystems, human
populations, and economic activities. As concerns over pollution and resource sustainability grow, it
becomes imperative to have systems that can provide accurate, timely, and continuous monitoring of
water quality. Traditionally, water quality monitoring is conducted through periodic manual
sampling followed by laboratory analysis. While effective in controlled environments, this method is
slow, costly, and lacks the ability to respond in real time to pollution events. Consequently, harmful
contaminants may go undetected for prolonged periods, endangering ecosystems and human health.
Technological advancements in the fields of the Internet of Things (IoT), data science, and machine
learning have paved the way for innovative approaches to environmental monitoring. Machine
learning models, especially hybrid models that combine supervised and unsupervised techniques,
offer the ability to not only predict known patterns but also detect novel, anomalous behaviors in
data streams. When integrated into a real-time framework, these technologies can drastically
improve the responsiveness and scalability of water monitoring systems.
This project proposes a real-time water quality monitoring system using hybrid machine learning
algorithms and a web-based visualization interface. The system is built using a combination of
Random Forest (for supervised classification) and Isolation Forest (for unsupervised anomaly
detection). This dual approach allows the system not only to classify water as “safe” or “unsafe”
based on labelled training data, but also flag unusual readings that may signify emerging threats or
undetected contaminants.
The primary goal of this project is to provide a foundation for a smarter, more efficient water quality
monitoring system that leverages modern machine learning techniques and intuitive data
visualization. It aims to enable environmental scientists, municipal authorities, and members of the
general public to make data-driven decisions quickly and accurately. The real-time anomaly detection
capability adds a critical layer of insight, potentially enabling early detection of harmful
contamination events and reducing the time to intervention.
Ultimately, this system represents a scalable, flexible solution that could be adapted to monitor
various environmental parameters beyond water, supporting global efforts toward a cleaner and safer
planet.
1. PROBLEM STATEMENT:
1.1 Background
Water pollution is a growing global issue due to urbanization, industrial waste, and agricultural
runoff. Contaminated water poses serious health risks and environmental damage. Traditional water
quality monitoring relies on manual sampling and lab tests, which are time-consuming, labor-
intensive, and lack real-time responsiveness. As a result, sudden contamination events may go
undetected, causing harm before action can be taken.
1.2 Challenges
There is a lack of a scalable and intelligent water quality monitoring system that can:
o Automatically assess water safety.
o Detect unusual patterns or contamination events in real time.
o Provide clear, actionable insights through an interactive dashboard.
1.3 Proposed Solution
We propose a real-time, web-based monitoring system using hybrid machine learning models:
o Random Forest for classifying water quality based on labeled data.
o Isolation Forest for detecting anomalies in unlabeled data.
o Simulated real-time data to represent live sensor input.
o React-based dashboard for visualizing water metrics and alerts.
This system is intended to offer faster and more scalable water quality monitoring, capable of early
detection and continuous tracking across regions.
2. OBJECTIVE:
The main goal of this project is to develop a real-time water quality monitoring system that leverages
machine learning to assess and visualize water safety effectively. The system aims to provide timely
and accurate insights using a hybrid model approach, ensuring both known and unknown water
quality issues can be detected and addressed proactively.
3. LITERATURE SURVEY:
3.1 Introduction
Recent advancements in machine learning and IoT have opened new possibilities for automating
water quality monitoring. Traditional manual methods are often slow and limited, while modern
systems aim to detect issues in real time using smart algorithms. This section highlights key research
areas relevant to our project.
3.2 Supervised Learning for Water Quality Classification
Supervised learning uses labeled datasets to train models that can classify water quality. Among
these, Random Forest is widely used due to its high accuracy and ability to handle multiple features
like pH, turbidity, and temperature. It has been applied in various studies to distinguish between safe
and unsafe water based on historical data. However, supervised models can only detect patterns
they have been trained on and may miss new types of contamination.
3.3 Unsupervised Learning for Anomaly Detection
Unsupervised models like Isolation Forest are designed to detect unusual data points without
needing labeled inputs. This is especially useful in water monitoring, where sudden contamination
events might not follow known patterns. Isolation Forest works by identifying outliers in the dataset
and is efficient even with large or noisy data. It is often used to complement supervised systems by
catching unexpected anomalies.
3.4 Hybrid Models
To overcome the limitations of using a single approach, hybrid models combine both supervised and
unsupervised learning. For instance, a Random Forest can classify known water conditions, while an
Isolation Forest runs in parallel to catch anomalies. This blend improves reliability and helps the
system respond to both expected and unforeseen events.
3.5 Real-Time Monitoring with IoT
With the rise of IoT, real-time monitoring systems are becoming more common. These use sensors
to continuously collect data, which is then analyzed using machine learning models. Real-time
systems allow faster detection of pollution and better resource management. However, they still face
challenges like sensor reliability and data quality, which researchers are actively working to
improve.
3.6 Summary
Machine learning improves water quality monitoring by enabling accurate classification (using
supervised models like Random Forest) and detecting unknown anomalies (using unsupervised
models like Isolation Forest). Combining both in hybrid systems offers better reliability. Real-time
monitoring with IoT further enhances responsiveness and efficiency.
4. METHODOLOGY:
The development of this real-time water quality monitoring system followed a structured
methodology combining data-driven machine learning techniques with a simulated real-time
environment and a user-friendly interface. This section explains each stage of the process in detail.
4.1 Data Collection
The project relies on a public dataset containing key water quality parameters such as pH,
turbidity, temperature, dissolved oxygen (DO), and total dissolved solids (TDS). These attributes
are commonly used to evaluate the safety of water for consumption and environmental use. Each
data sample is labeled as either “safe” or “unsafe,” making it suitable for supervised classification
tasks.
4.2 Data Preprocessing
Raw data often contains inconsistencies such as missing values, outliers, or incorrect formats. Before
training the models, the dataset underwent preprocessing steps, including:
o Handling missing values using imputation techniques.
o Scaling numerical features to normalize the data range.
o Encoding categorical labels (e.g., "safe" or "unsafe") into numerical form for model
compatibility.
This ensures that the models receive clean, standardized input and can perform accurately.
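The three preprocessing steps listed above can be sketched in plain Python as follows. This is a minimal illustration rather than the project's actual code: the sample rows, the two chosen columns (pH and turbidity), the use of mean imputation and min-max scaling, and the 0/1 label encoding are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw samples: (pH, turbidity, label); None marks a missing value.
raw = [
    (7.0, 2.0, "safe"),
    (None, 8.0, "unsafe"),
    (6.0, None, "safe"),
    (9.0, 5.0, "unsafe"),
]

def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def min_max_scale(column):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def encode_labels(labels):
    """Map 'safe' to 0 and 'unsafe' to 1 (assumed encoding)."""
    mapping = {"safe": 0, "unsafe": 1}
    return [mapping[label] for label in labels]

ph = min_max_scale(impute_mean([row[0] for row in raw]))
turbidity = min_max_scale(impute_mean([row[1] for row in raw]))
y = encode_labels([row[2] for row in raw])
X = list(zip(ph, turbidity))  # cleaned feature rows, ready for a model
```

In a scikit-learn pipeline the same steps would typically be handled by its imputation, scaling, and label-encoding utilities; the hand-written versions here only make the arithmetic visible.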
4.3 Train-Test Split
To evaluate model performance reliably, the dataset was split into training and testing sets
(commonly 80% training and 20% testing). This allows the system to learn from one portion of the
data and then be evaluated on unseen data, simulating real-world usage.
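A shuffled 80/20 split of this kind might look like the sketch below. The function name `split_train_test`, the seed, and the toy samples are hypothetical; the report does not state how the split was actually performed.

```python
import random

def split_train_test(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and hold out test_fraction of them."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]

# Hypothetical labelled samples: (sample id, label).
samples = [(i, "safe" if i % 2 == 0 else "unsafe") for i in range(10)]
train, test = split_train_test(samples)
```

Shuffling before splitting keeps the held-out set from simply being the last rows of the file. scikit-learn's `train_test_split` provides the same behaviour through its `test_size` and `random_state` parameters.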
4.4 Supervised Learning – Random Forest
A Random Forest Classifier was chosen for supervised learning due to its robustness and
interpretability. It operates by creating an ensemble of decision trees and making predictions based
on majority voting. The model was trained to classify whether water samples are safe or unsafe,
based on their physicochemical properties.
Key advantages of Random Forest:
o Handles both linear and nonlinear relationships in the data.
o Provides high accuracy with reduced risk of overfitting.
o Offers feature importance metrics to understand key influencers in water quality.
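As a rough illustration of the classifier and its feature importances, the sketch below trains scikit-learn's `RandomForestClassifier` on synthetic data. The data, the two features, the labelling rule (safe when pH is between 6.5 and 8.5 and turbidity is below 5), and the hyperparameters are assumptions for the demo, not the project's dataset or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): columns are pH and turbidity.
n = 200
ph = rng.uniform(4.0, 10.0, n)
turbidity = rng.uniform(0.0, 10.0, n)
X = np.column_stack([ph, turbidity])

# Assumed labelling rule for the demo only: 0 = safe, 1 = unsafe.
y = np.where((ph >= 6.5) & (ph <= 8.5) & (turbidity < 5.0), 0, 1)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# scikit-learn averages the trees' class probabilities (a soft form of
# voting) and predict() returns the most probable class.
sample = np.array([[7.2, 1.0], [4.5, 9.0]])
predictions = model.predict(sample)

# Impurity-based importances; they sum to 1 across the features.
importances = dict(zip(["pH", "turbidity"], model.feature_importances_))
```

Inspecting `importances` shows which parameter the ensemble relied on most for this synthetic rule; on real data the ranking would depend on the dataset.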
4.5 Unsupervised Learning – Isolation Forest
To detect unexpected or subtle changes in water quality, the system uses Isolation Forest, an
unsupervised anomaly detection algorithm. Unlike classification, this model identifies data points
that differ significantly from the norm, flagging them as potential anomalies.
Isolation Forest was chosen because:
o It is efficient with high-dimensional data.
o It does not require labelled training data.
o It is suitable for detecting unknown or rare events, such as pollution spikes.
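The sketch below shows how such a detector could be fitted with scikit-learn's `IsolationForest`. The synthetic "normal" readings, the `contamination` value, and the spike reading are assumptions chosen for illustration; the report does not give the settings that were used.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic "normal" readings (assumption): pH near 7.2, turbidity near 2.
normal = np.column_stack([
    rng.normal(7.2, 0.2, 300),
    rng.normal(2.0, 0.3, 300),
])

# contamination is the assumed share of anomalies in the training data.
detector = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
detector.fit(normal)  # no labels are needed

readings = np.array([
    [7.2, 2.0],  # typical reading
    [3.0, 9.5],  # hypothetical pollution spike
])
flags = detector.predict(readings)             # +1 = inlier, -1 = anomaly
scores = detector.decision_function(readings)  # lower means more anomalous
```

The `decision_function` scores can be shown alongside the binary flag when a dashboard needs a graded measure of how unusual a reading is.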
4.7 Visualization
The project includes an interactive web interface built using React, Tailwind CSS, and TypeScript.
Key features include:
o Real-time charts for parameters like pH and turbidity.
o Status indicators for water safety.
o Anomaly alerts when unusual data is detected.
o Dynamic updates as new data is streamed.
This visualization helps users easily interpret the model's outputs and monitor water quality over
time.
5. IMPLEMENTATION:
The implementation phase brought together data science, machine learning, and web development to
create a working prototype of a real-time water quality monitoring system. This section outlines the
key tools, datasets, workflow, and integration steps used to bring the project to life.
5.1 Tools and Technologies
Several modern tools and libraries were used across different components of the project:
o Python – The core language for data processing, machine learning model development, and
simulations.
o Scikit-learn – For building and training the Random Forest and Isolation Forest models.
o Pandas & NumPy – For data manipulation and numerical computations.
o Matplotlib & Seaborn – For generating visualizations during the analysis and debugging
phases.
o React.js – A JavaScript library used for building the interactive user interface.
o Tailwind CSS – A utility-first CSS framework used to style the front end cleanly and
efficiently.
o TypeScript – A typed superset of JavaScript used in the frontend for safer and more
manageable code.
o [Link] (optional backend layer) – Can be added to serve data or communicate with
physical devices in future extensions.
5.2 Dataset
The system uses a pre-existing public dataset containing water quality parameters:
o Attributes: pH, Temperature, Turbidity, Dissolved Oxygen (DO), Total Dissolved Solids
(TDS), etc.
o Label: Each data point is marked as “safe” or “unsafe” for consumption.
The dataset reflects realistic environmental conditions, making it suitable for training both
classification and anomaly detection models.
5.3 Workflow Summary
1. Data Preprocessing
o Cleaning and standardizing the dataset.
o Handling missing values, scaling, and label encoding.
2. Model Training
o Random Forest was trained using the labeled data to classify water samples.
o Isolation Forest was trained on the same dataset (without labels) to detect anomalies.
3. Model Evaluation
o The models were validated using the test data.
o Metrics such as accuracy, precision, recall, and F1-score were used to assess
performance.
5. Frontend Development
o The React-based interface was designed to display real-time water quality
information.
o It includes charts, status indicators, and alerts.
6. Integration
o The ML models run in the backend (or offline) and feed data to the frontend.
o The system architecture is modular, allowing easy replacement of simulated
input with actual IoT sensor data.
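The evaluation metrics named in the Model Evaluation step of the workflow above can be computed from the confusion counts, as in this small sketch. The label vectors are made up for illustration and "unsafe" is assumed to be the positive class; no results from the project are reproduced here.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("at least one prediction is required")
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels: 1 = unsafe (positive class), 0 = safe.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

For a safety application, recall on the "unsafe" class is usually the figure to watch, since a missed unsafe sample is costlier than a false alarm.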
7. OUTPUT:
The output of this project is a fully functional, real-time simulation-based water quality monitoring
system that leverages machine learning to evaluate water safety and detect anomalies. The system
combines the predictive capabilities of a Random Forest classifier with the anomaly detection
strengths of Isolation Forest, all integrated within a dynamic, web-based user interface.
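One way the two model outputs could be merged into a single dashboard status is sketched below. The function name, the status strings, and the label encoding are hypothetical; the report does not specify how the outputs are combined.

```python
def combine_outputs(classifier_label: int, anomaly_flag: int) -> str:
    """Merge the two model outputs into one dashboard status.

    classifier_label: 0 = safe, 1 = unsafe (assumed encoding).
    anomaly_flag: +1 = normal, -1 = anomaly (the convention used by
    scikit-learn's IsolationForest.predict).
    """
    if classifier_label == 1 and anomaly_flag == -1:
        return "unsafe (anomalous reading)"
    if classifier_label == 1:
        return "unsafe"
    if anomaly_flag == -1:
        return "safe, but anomalous: review"
    return "safe"

# All four combinations of classifier label and anomaly flag.
cases = [(0, 1), (0, -1), (1, 1), (1, -1)]
statuses = [combine_outputs(label, flag) for label, flag in cases]
```

Keeping the anomaly flag separate from the safe/unsafe label means a reading the classifier considers safe can still be surfaced for review when it looks unlike the training data.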
7.4 Output Summary Table
Component             Description
Random Forest         Predicts water status (Safe / Unsafe) with high accuracy
Isolation Forest      Flags anomalies even in unlabeled or unknown patterns
Real-Time Simulation  Streams one data point at a time to mimic live sensor input
React Dashboard       Visual interface with charts, status badges, and anomaly alerts
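The real-time simulation row above, streaming one data point at a time, can be pictured as a Python generator. The parameter ranges, field names, and the `simulate_sensor_stream` function are assumptions for this sketch, not the project's simulator.

```python
import random
import time
from typing import Iterator

def simulate_sensor_stream(n_readings: int, interval_s: float = 0.0,
                           seed: int = 0) -> Iterator[dict]:
    """Yield one simulated reading at a time, pausing interval_s between them."""
    rng = random.Random(seed)  # seeded so the demo is reproducible
    for step in range(n_readings):
        yield {
            "step": step,
            "ph": round(rng.gauss(7.2, 0.3), 2),
            "turbidity": round(max(0.0, rng.gauss(2.0, 0.5)), 2),
            "temperature": round(rng.gauss(25.0, 1.5), 2),
        }
        if interval_s > 0:
            time.sleep(interval_s)  # mimic the delay between sensor samples

# Consume the stream; a dashboard backend would forward each reading instead.
readings = list(simulate_sensor_stream(5))
```

Because the consumer only sees an iterator of readings, the generator could later be replaced by a function that reads from real IoT sensors without changing the rest of the pipeline.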
8. CONCLUSION:
This project successfully demonstrates the integration of machine learning techniques into a practical
system for real-time water quality monitoring. Water pollution and contamination continue to pose
serious risks to human health and the environment, and traditional methods of water testing are often
delayed, manual, and resource-intensive. In response, this project proposes and implements an
intelligent solution capable of performing real-time analysis, detecting anomalies, and presenting
results through an interactive web interface.
The use of Random Forest, a supervised learning algorithm, enabled accurate classification of
water quality based on historical labeled data. Meanwhile, the inclusion of Isolation Forest, an
unsupervised algorithm, provided the system with the ability to detect unexpected or abnormal
readings, ensuring robustness even in the face of previously unseen data patterns. This dual-model
approach offers a hybrid mechanism that enhances both the reliability and flexibility of the
monitoring system.
To simulate real-world application, a real-time data stream was emulated, and the results were
visualized on a web dashboard. This feature demonstrates how users can monitor water quality
remotely, with instant alerts for unsafe conditions. This is particularly beneficial in contexts like
municipal water management, rural areas, or industries dependent on water quality regulation.
The system also excels in terms of adaptability. It is modular and scalable, meaning it can be
enhanced further by connecting to IoT sensors, expanded across geographical regions, or upgraded
with cloud infrastructure. Additionally, the use of open-source tools ensures that the solution is both
cost-effective and customizable.
In conclusion, the project not only provides a working prototype of an intelligent water monitoring
system but also opens avenues for broader applications in smart cities, environmental monitoring,
and public health. By harnessing the power of data science and automation, it paves the way for a
future where water quality can be monitored efficiently, accurately, and continuously, with minimal
human intervention.
9. FUTURE SCOPE:
9.1 Integration with IoT Sensors
Currently, the system uses simulated data to mimic real-time input. In the future, it can be integrated
with actual IoT-based water quality sensors that measure pH, turbidity, temperature, dissolved
oxygen, and TDS in real time. This would enable continuous, autonomous monitoring of water
sources such as rivers, lakes, or supply pipelines.
10. REFERENCES:
1. Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32.
   This seminal paper introduces the Random Forest algorithm used in the supervised
   learning portion of the project.
2. Liu, F. T., Ting, K. M., & Zhou, Z.-H. (2008). Isolation Forest. In 2008 Eighth IEEE
   International Conference on Data Mining.
   This paper outlines the Isolation Forest algorithm, a key method for anomaly detection
   in the system.
3. Scikit-learn Documentation. [Link]
   Official documentation for the machine learning library used for model
   development, training, and evaluation.
4. Pandas Documentation. [Link]
   Used extensively for data cleaning, transformation, and preprocessing operations.
5. Matplotlib & Seaborn. [Link] and [Link]
   Used for data visualization, helping in exploratory analysis and plotting parameter trends.
6. ReactJS Official Documentation. [Link]
   React was used to build the dynamic and responsive frontend dashboard for visualizing
   real-time water quality status.
7. Tailwind CSS Documentation. [Link]
   Tailwind CSS was used for styling the user interface with a clean and modular design.