A Project Report
on
STOCK MARKET PREDICTION MODEL
carried out as part of the Minor Project Submitted
by
Anshika Gupta
229302283
in partial fulfilment for the award of the degree of
Bachelor of Technology
in
CSE(IOT&IS)
Under the Guidance of
Dr. Suman Bhakar
Department of IOT and Intelligent Systems
MANIPAL UNIVERSITY JAIPUR
RAJASTHAN, INDIA
APRIL 2025
CERTIFICATE
Date: 20-04-2025
This is to certify that the minor project titled "Stock Market Prediction Model" is a record of the
bonafide work done by Anshika Gupta (229302283), submitted in partial fulfilment of the requirements
for the award of the Degree of Bachelor of Technology in Information Technology of Manipal
University Jaipur, during the academic year 2024-25.
Dr. Suman Bhakar
Department of IOT and Intelligent Systems
Manipal University Jaipur
ABSTRACT
This project presents a comprehensive machine learning-based approach to stock market prediction,
aiming to forecast short-term price movements and generate actionable trading signals. Leveraging a
combination of historical stock data, technical indicators, and sentiment analysis, the system
integrates multiple models to predict market trends with improved accuracy.
The project utilizes data fetched via Yahoo Finance (yfinance) and processes it with pandas and
NumPy. A wide range of technical indicators—including Moving Averages, MACD, RSI, Bollinger
Bands, and ATR—are calculated using the TA-Lib and ta libraries. These features are further refined
through feature selection techniques such as Recursive Feature Elimination (RFE) with Random
Forests.
To handle class imbalances in prediction classes (e.g., up, down, neutral), the SMOTE algorithm is
employed. The models used include Random Forest, Support Vector Machine (SVM), and
LightGBM, with hyperparameter tuning conducted via GridSearchCV and Optuna for performance
optimization.
In addition to technical patterns, the system integrates sentiment analysis using VADER and TextBlob,
extracting insights from financial news through the NewsAPI. This hybrid feature set enhances the
predictive capability of the models, particularly during high-volatility periods.
The backend is built using the Django framework, enabling a structured and scalable deployment for
real-time inference and user interaction. Visualizations for trading signals, volatility impact, and
cumulative returns are generated using Matplotlib and Seaborn to facilitate strategic decision-making.
Overall, the project demonstrates a robust and extensible pipeline for stock prediction, combining
quantitative analysis with machine learning to deliver valuable insights for traders and researchers alike.
LIST OF FIGURES
Figure No  Figure Title
1. Flow-chart
2. Webpage
3. Performance Demonstration
4. Confusion matrix of LightGBM
5. Confusion matrix of Random Forest
6. Confusion matrix of SVM
7. Prediction result of Apple
8. Prediction result of WMT
Table of Contents
Chapter 1 INTRODUCTION
1.1 Problem Statement
1.2 Objectives of the Project
1.3 Scope of Report
Chapter 2 BACKGROUND DETAILS
Chapter 3 SYSTEM DESIGN AND METHODOLOGY
3.1 Flowchart
3.2 Data Collection
3.3 Model Architecture
3.4 Training and Deployment Pipeline
Chapter 4 RESULTS AND DISCUSSION
4.1 Model Evaluation
4.2 Trading Strategy Analysis
Chapter 5 CONCLUSION AND FUTURE WORK
5.1 Conclusion
5.2 Future Work
REFERENCES
1. Introduction
1.1 Problem statement:
Stock market prediction has long been a challenging and sought-after problem in the world of finance.
Traditional methods, such as linear regression or time series models like ARIMA, often fail to capture the
complex and nonlinear nature of the stock market. These models also struggle to incorporate unstructured
data such as news articles or sentiment from social media platforms. Additionally, the dynamic nature of
the market makes it essential to develop adaptive models that can adjust to rapid changes and real-time
signals.
With the increasing availability of financial data and advances in machine learning and natural language
processing, there is now an opportunity to design systems that combine both quantitative and qualitative
data. Technical indicators can reveal trends and momentum, while sentiment analysis can capture the
emotional tone of market participants. Despite these tools, integrating them into a robust, real-time system
that provides actionable insights remains an open problem.
This project aims to address this gap by developing a hybrid machine learning system for short-term stock
market prediction. By combining historical data, technical indicators, and sentiment analysis from
financial news, the system seeks to produce accurate predictions and generate meaningful trading signals
that outperform naive investment strategies.
1.2 Objective:
The primary objective of this project is to develop a machine learning-based system for predicting short-
term stock price movements and generating actionable trading signals. The key goals include:
o Collect and preprocess historical stock data using the Yahoo Finance API.
o Engineer technical indicators such as Moving Averages, RSI, MACD, and Bollinger Bands.
o Perform sentiment analysis on financial news articles using VADER and TextBlob.
o Apply feature selection techniques to identify the most impactful features.
o Train and evaluate machine learning models such as Random Forest, SVM, and LightGBM.
o Optimize model performance using hyperparameter tuning methods like GridSearchCV and Optuna.
o Analyze trading strategy performance using metrics like cumulative return, drawdown, and win rate.
o Deploy the predictive system using the Django web framework for real-time use.
1.3 Scope:
This report encapsulates the design, development, and evaluation of a stock market prediction system.
The focus is on short-term movement prediction using supervised machine learning methods and hybrid
features derived from both numerical data and natural language text. The system does not aim to replace
human financial advisors but to augment their decision-making with data-driven insights. The report
covers:
o Data acquisition from financial APIs.
o Detailed feature engineering and selection.
o Model development and performance evaluation.
o Trading signal generation and strategy backtesting.
o Deployment architecture using Django.
2. Background Details
The stock market, often described as a complex, dynamic, and highly volatile environment, has
historically been influenced by a wide array of factors including economic indicators, corporate
performance, investor sentiment, political events, and global crises. Forecasting stock movements has
always been a key interest in financial research and investment practice, dating back to early statistical
models like moving averages, exponential smoothing, and autoregressive models such as ARIMA.
However, these classical time-series models exhibit limitations in capturing nonlinear relationships and
handling unstructured data like news headlines or social media posts. With the advent of machine
learning (ML) and artificial intelligence (AI), particularly since the 2010s, there has been a significant
shift toward leveraging data-driven techniques that adaptively learn patterns from large datasets.
Technical analysis plays a pivotal role in this domain. Indicators like Relative Strength Index (RSI),
Moving Average Convergence Divergence (MACD), and Bollinger Bands are frequently used by traders
to assess market momentum, volatility, and trend reversals. These indicators, though historically
interpreted manually by experts, can now be fed into ML models to automate and enhance prediction.
The increasing availability of financial news and sentiment-rich content from APIs like NewsAPI,
Twitter, and Reddit has opened new frontiers in predictive modeling. Tools like VADER and TextBlob
can quantify sentiment from this unstructured text, offering new insights into market psychology that are
not visible through numerical indicators alone.
Recent studies have also highlighted the effectiveness of deep learning models like LSTM (Long Short-
Term Memory networks), which are designed for sequence modeling and time-series forecasting. These
models excel in capturing temporal dependencies and have been successfully applied to stock return
prediction.
By combining these methodologies—technical analysis, sentiment analysis, and machine learning—this
project aims to provide a hybrid system capable of generating more accurate and interpretable trading
signals.
3. System Design and Methodology:
3.1 Flowchart
The following flowchart represents the overall workflow of the Stock Market Prediction project. It
provides a clear, step-by-step visualization of the major components involved in the process:
1. Stock Market Prediction
This is the central goal of the project—using historical stock data and other financial indicators to
forecast future trends in the stock market.
2. Data Collection
In this step, historical stock data is gathered from sources such as Yahoo Finance using APIs like
yfinance. Additional financial indicators or sentiment data may also be collected from relevant
platforms.
3. Data Preprocessing
Collected data often contains noise, missing values, and inconsistencies. Preprocessing includes
cleaning the data, handling missing values, normalization, feature engineering (e.g., technical
indicators using TA-Lib), and preparing it for training.
4. Model Training
Here, machine learning models such as Random Forest, SVM, and LightGBM are trained on the
processed data. Training includes splitting the dataset, hyperparameter tuning, and fitting the
model on the training data.
5. Prediction
After training, the model is used to make predictions on unseen data. This involves forecasting
stock prices or trends for a given period.
6. Evaluation
The model’s performance is evaluated using classification metrics such as accuracy, precision,
recall, F1 score, and ROC-AUC. This helps assess how well the model is performing and guides any
necessary improvements.
This flowchart provides a structured view of the project pipeline, ensuring clarity and aiding in
systematic development and analysis of the stock prediction system.
Figure 1. Flowchart
3.2 Data collection
Stock data is retrieved using the yfinance API, which provides access to open, high, low,
close, adjusted close, and volume data on a daily basis. News headlines are fetched from the
NewsAPI based on relevant keywords and ticker symbols. Each news headline is then
timestamped and matched to corresponding trading days to enable sentiment feature
extraction.
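The matching of headlines to trading days described above could be sketched as follows. This is a minimal illustration, not the project's actual code: the function name assign_trading_day is hypothetical, timestamps are assumed to be timezone-naive, and the rule used here (a headline is assigned to the first trading day on or after its calendar date, so weekend news rolls forward to Monday) is an assumption. In practice the trading-day list would come from the index of the price frame returned by yfinance.

```python
import pandas as pd

def assign_trading_day(news_times, trading_days):
    """Map each headline timestamp to the first trading day on or after its date."""
    days = pd.DatetimeIndex(trading_days).sort_values()
    # Drop the time of day so a headline is matched by calendar date only.
    dates = pd.to_datetime(pd.Series(list(news_times))).dt.normalize()
    # side="left" gives the position of the first trading day >= the headline date.
    positions = days.searchsorted(dates.to_numpy(), side="left")
    # Headlines dated after the last known trading day stay unmatched (NaT).
    matched = [days[p] if p < len(days) else pd.NaT for p in positions]
    return pd.Series(matched, index=dates.index)

# Toy example: Friday, Monday and Tuesday are trading days.
trading_days = pd.to_datetime(["2025-01-03", "2025-01-06", "2025-01-07"])
headline_times = ["2025-01-03 10:00", "2025-01-04 09:30", "2025-01-08 08:00"]
print(assign_trading_day(headline_times, trading_days))
```

Headlines that cannot be matched are left as missing values so that they can be dropped before sentiment features are aggregated.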
3.3 MODEL ARCHITECTURE
Technical Features:
o Moving Averages (MA5, MA20, MA50)
o Price Change and Returns
o RSI, MACD, MACD Histogram
o Bollinger Bands (Upper, Lower, Middle, Width)
o ATR and Volatility Metrics
o Volume Ratios and Trends
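To make the technical features listed above concrete, the sketch below computes several of them with pandas alone. The project itself uses the TA-Lib and ta libraries, so this is only an illustration of the underlying formulas: the function name, the column names (High, Low, Close) and the window lengths (14 for RSI and ATR, 12/26/9 for MACD, 20 days and 2 standard deviations for Bollinger Bands) are common defaults assumed here, and the warm-up values may differ from those produced by the libraries.

```python
import numpy as np
import pandas as pd

def add_technical_features(df):
    """Return a copy of a High/Low/Close frame with a few of the listed indicators."""
    out = df.copy()
    close = out["Close"]

    # Simple moving averages over 5, 20 and 50 trading days.
    for window in (5, 20, 50):
        out[f"MA{window}"] = close.rolling(window).mean()
    out["Return"] = close.pct_change()

    # RSI(14) with Wilder smoothing (exponential average with alpha = 1/14).
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False).mean()
    loss = (-delta).clip(lower=0).ewm(alpha=1 / 14, adjust=False).mean()
    out["RSI"] = 100 - 100 / (1 + gain / loss)

    # MACD(12, 26) and its histogram against a 9-period signal line.
    macd = close.ewm(span=12, adjust=False).mean() - close.ewm(span=26, adjust=False).mean()
    out["MACD"] = macd
    out["MACD_Hist"] = macd - macd.ewm(span=9, adjust=False).mean()

    # Bollinger Bands: 20-day mean plus/minus 2 population standard deviations.
    std = close.rolling(20).std(ddof=0)
    out["BB_Upper"] = out["MA20"] + 2 * std
    out["BB_Lower"] = out["MA20"] - 2 * std
    out["BB_Width"] = (out["BB_Upper"] - out["BB_Lower"]) / out["MA20"]

    # ATR(14): Wilder average of the true range.
    prev_close = close.shift(1)
    true_range = pd.concat(
        [out["High"] - out["Low"], (out["High"] - prev_close).abs(), (out["Low"] - prev_close).abs()],
        axis=1,
    ).max(axis=1)
    out["ATR"] = true_range.ewm(alpha=1 / 14, adjust=False).mean()
    return out
```

The first rows of each rolling feature are missing until the window is full, so they would normally be dropped before training.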
Sentiment Features:
o Polarity and Subjectivity Scores from TextBlob
o Compound Sentiment Scores from VADER
o News Volume per day as a proxy for market interest
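A possible way to turn per-headline scores into the daily sentiment features listed above is sketched below. The column names (trading_day, headline) and both function names are assumptions made for illustration. The VADER scorer uses the SentimentIntensityAnalyzer class from the vaderSentiment package and is imported lazily, so the aggregation step can be tried with any scoring function; TextBlob polarity and subjectivity could be plugged in the same way.

```python
import pandas as pd

def make_vader_scorer():
    """Return a function mapping text to the VADER compound score in [-1, 1]."""
    # Imported here so the rest of the sketch runs without the package installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]

def daily_sentiment(news, scorer):
    """Aggregate headline scores into one row per trading day.

    news: DataFrame with 'trading_day' and 'headline' columns (assumed layout).
    scorer: any function that maps a headline string to a numeric score.
    """
    scored = news.assign(score=news["headline"].map(lambda text: scorer(text)))
    daily = scored.groupby("trading_day")["score"].agg(["mean", "count"])
    # The headline count serves as the "news volume" proxy for market interest.
    return daily.rename(columns={"mean": "sentiment_mean", "count": "news_volume"})
```

With the package installed, daily_sentiment(news, make_vader_scorer()) would yield one mean compound score and one headline count per trading day.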
These features are combined into a single dataset and aligned with target labels, which represent
future price movements (up, down, or neutral).
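The three-class target could be constructed along the following lines. The report does not state the prediction horizon or the cut-off that separates "neutral" from "up" and "down", so the one-day horizon and the 0.5% threshold below are purely illustrative assumptions, as is the function name.

```python
import numpy as np
import pandas as pd

def make_labels(close, horizon=1, threshold=0.005):
    """Label each day by the return over the next `horizon` trading days."""
    future_return = close.shift(-horizon) / close - 1
    labels = pd.Series("neutral", index=close.index, dtype=object)
    labels[future_return > threshold] = "up"
    labels[future_return < -threshold] = "down"
    # The last `horizon` rows have no known future price and stay unlabeled.
    labels[future_return.isna()] = np.nan
    return labels

prices = pd.Series([100.0, 101.0, 101.1, 100.0])
print(make_labels(prices).tolist())
```

Because the label looks ahead in time, the unlabeled rows at the end must be removed before training, and features for a given day must only use information available on that day.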
3.4 Training and Deployment Pipeline
After preprocessing and feature selection, the models are trained using labeled data. The
SMOTE technique is applied to address class imbalance. GridSearchCV is used for
exhaustive hyperparameter tuning, while Optuna provides a more automated and efficient
optimization.
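The idea behind SMOTE can be illustrated with a short NumPy sketch: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The report does not say which implementation was used, so the function below (name, parameters and defaults included) is a simplified stand-in rather than the project's code. Oversampling of this kind should be applied to the training split only, so that synthetic points never leak into the evaluation data.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise squared distances within the minority class.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                     # starting samples
    pick = neighbours[base, rng.integers(0, k, size=n_new)]   # one neighbour each
    gap = rng.random((n_new, 1))                              # position on the segment
    return X_min[base] + gap * (X_min[pick] - X_min[base])

# Four minority samples at the corners of the unit square.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(smote_oversample(corners, n_new=3, k=2).shape)
```

The synthetic rows would then be appended to the training features together with the minority-class label before the classifiers are fitted.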
The trained models are evaluated using standard classification metrics and are also assessed
based on simulated trading performance. The final system is deployed through a Django web
application that allows users to input a stock symbol and retrieve predictions and
visualizations.
4. RESULTS AND DISCUSSION
4.1 Model Evaluation
Performance is assessed using metrics such as:
o Accuracy: Correct predictions over total predictions
o Precision and Recall: Especially important for evaluating false signals
o F1 Score: Balance between precision and recall
o ROC-AUC: Area under the Receiver Operating Characteristic curve
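For a three-class problem (up, down, neutral), the first three metrics can be computed per class and then averaged. The sketch below does this in plain Python on a small made-up example; the function name and the choice of macro averaging are assumptions, since the report does not state how the per-class scores were combined. ROC-AUC is omitted because it needs predicted probabilities rather than class labels.

```python
def classification_summary(y_true, y_pred, labels):
    """Accuracy plus per-class precision, recall and F1, and the macro-averaged F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # correct signals
        fp = sum(t != label and p == label for t, p in pairs)  # false signals
        fn = sum(t == label and p != label for t, p in pairs)  # missed moves
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)
    return accuracy, per_class, macro_f1

# Made-up labels, for illustration only.
y_true = ["up", "up", "down", "neutral", "down", "up"]
y_pred = ["up", "down", "down", "neutral", "up", "up"]
accuracy, per_class, macro_f1 = classification_summary(y_true, y_pred, ["up", "down", "neutral"])
```

In this toy example four of six predictions are correct, and the "up" class has precision and recall of 2/3 each.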
Example Results:
Random Forest: Accuracy = 0.78, F1 Score = 0.76
LightGBM: Accuracy = 0.81, ROC-AUC = 0.84
Figure 2. Webpage
Figure 3. Performance demonstration
Figure 4. Confusion matrix of LightGBM
Figure 5. Confusion matrix of Random Forest
Figure 6. Confusion matrix of SVM
Figure 7. Prediction result of Apple
Figure 8. Prediction result of WMT
4.2 Trading Strategy Analysis
Trading signals are derived from the model’s class predictions. Buy signals are generated when a
significant price increase is predicted, and sell signals when a drop is forecast.
Performance of these strategies is benchmarked against a traditional Buy & Hold approach. Key metrics
include:
o Cumulative Returns
o Win Rate (Profitable trades / Total trades)
o Maximum Drawdown (Peak-to-trough loss)
o Profit Factor (Total gain / Total loss)
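The four metrics above can be computed from a sequence of per-trade returns as in the following sketch. It assumes that returns are expressed as fractions (0.10 for +10%), that capital is fully reinvested from one trade to the next, and that the profit factor is taken as the sum of winning returns over the sum of losing returns; the function name and the example figures are made up for illustration. The same function applied to the daily returns of the stock itself would give the Buy & Hold benchmark.

```python
import numpy as np

def strategy_metrics(trade_returns):
    """Cumulative return, win rate, maximum drawdown and profit factor."""
    r = np.asarray(trade_returns, dtype=float)
    equity = np.cumprod(1 + r)  # growth of one unit of starting capital
    # Highest equity reached so far, counting the starting capital of 1.0.
    running_peak = np.maximum(np.maximum.accumulate(equity), 1.0)
    gains = r[r > 0].sum()
    losses = -r[r < 0].sum()
    return {
        "cumulative_return": float(equity[-1] - 1),
        "win_rate": float(np.mean(r > 0)),
        "max_drawdown": float(np.max(1 - equity / running_peak)),
        "profit_factor": float(gains / losses) if losses > 0 else float("inf"),
    }

# Made-up trade returns: +10%, -5%, +2%, -10%.
print(strategy_metrics([0.10, -0.05, 0.02, -0.10]))
```

A profit factor below 1, as in this made-up example, means the losing trades outweigh the winning ones even when the win rate is 50%.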
Graphs and heatmaps are used to visualize signal timing, trading volume, and indicator correlations.
5. CONCLUSION AND FUTURE WORK
5.1 Conclusion
This project successfully demonstrates how the integration of machine learning, technical indicators, and
sentiment analysis can be harnessed to build a stock market prediction system. Using data from Yahoo
Finance and financial news sources, we constructed a pipeline that extracts meaningful features, balances
imbalanced datasets, selects the most influential variables, and applies multiple classification models
including Random Forest, Support Vector Machines, and LightGBM.
Our results show that such hybrid models can reach classification accuracy of about 0.78 (Random
Forest) to 0.81 (LightGBM) in the example runs reported in Section 4.1, while also providing trading
signals that indicate when to buy or sell a particular stock. We also analyzed model performance under
varying market volatility conditions, helping to better understand when and why the model performs best.
By incorporating a Django-based web interface, we also established a user-friendly way for investors or
analysts to interact with the model, visualize predictions, and monitor trading performance metrics like
cumulative returns, win rate, and drawdowns.
5.2 Future work
Despite its achievements, this project opens several avenues for future improvement:
o Deep Learning Integration: Implementing LSTM, GRU, or Transformer-based architectures could
significantly enhance time-series learning and prediction accuracy.
o Multi-Horizon Forecasting: Extending the model to support predictions over multiple future time
periods (e.g., 3 days, 7 days, 30 days) could provide more flexibility for different trading styles.
o Reinforcement Learning: Applying RL to optimize trading strategies and simulate a real-time agent
adapting to market conditions could bring the system closer to autonomous trading.
o Real-Time Data Pipeline: Currently based on historical data, the system could be extended to ingest
real-time stock data and sentiment streams to deliver up-to-date predictions.
o Broader Market Coverage: Expanding the system to include ETFs, commodities, or
cryptocurrencies would enhance its utility and applicability across financial markets.
6. REFERENCES
1. Mean Foong, Oi, Tang Jung Low, and Wai Wan La. 2009. "V2S: Voice to Sign Language
Translation System for Malaysian Deaf People." In Visual Informatics: Bridging Research and
Practice, Lecture Notes in Computer Science, vol. 5857, pp. 868–876. Springer.
2. Munde, Mansi, Ganesh Jadhav, Sushma Gunjal, Kamlesh Mahale, and Aditya Kale. 2024. "A
Real-Time Sign Language to Text Conversion System for Enhanced Communication
Accessibility." Quantitative Research, 2(1):7–13. doi:10.15157/QR.2024.2.1.7-13.
3. Natarajan, B., E. Rajalakshmi, R. Elakkiya, Ketan Kotecha, Ajith Abraham, Lubna Abdelkareim
Gabralla, and V. Subramaniyaswamy. 2022. "Development of an End-to-End Deep Learning
Framework for Sign Language Recognition, Translation, and Video Generation." IEEE Access,
10:104358–104374. doi:10.1109/ACCESS.2022.3210543.
4. Sultana, Shaheena, M. A. H. Akhand, Prodip Kumer Das, and M. M. Hafizur Rahman. 2012.
"Bangla Speech-to-Text Conversion Using SAPI." In 2012 International Conference on Computer
and Communication Engineering (ICCCE), pp. 385–390.
5. Fischer, T., and C. Krauss. 2018. "Deep Learning with Long Short-Term Memory Networks for
Financial Market Predictions." European Journal of Operational Research, 270(2):654–669.
doi:10.1016/j.ejor.2017.11.054.
6. Zhang, Y., and L. Wu. 2009. "Stock Market Prediction of S&P 500 via Combination of Improved
BCO Approach and BP Neural Network." Expert Systems with Applications, 36(5):8849–8854.
doi:10.1016/j.eswa.2008.11.028.
7. Chen, K., Y. Zhou, and F. Dai. 2015. "A LSTM-Based Method for Stock Returns Prediction: A
Case Study of China Stock Market." In 2015 IEEE International Conference on Big Data (Big
Data), pp. 2823–2824. doi:10.1109/BigData.2015.7364089.
8. Nassirtoussi, A. K., S. Aghabozorgi, T. Y. Wah, and D. C. L. Ngo. 2014. "Text Mining for Market
Prediction: A Systematic Review." Expert Systems with Applications, 41(16):7653–7670.
doi:10.1016/j.eswa.2014.06.009.