Final RSR Word Report
Submitted by
SAI SUSHANTH C [RA2111028020054]
RIZWAN AHMED JAWED [RA2111028020049]
RAHUL S [RA2111028020052]
Dr. M. Prabu
(Assistant Professor, Department of Computer Science and
Engineering)
in fulfillment for the award of the degree
BACHELOR OF TECHNOLOGY
in
BONAFIDE CERTIFICATE
Certified that this project report titled “Optimizing Stock Prediction Using
Hybrid Neural Network: Unified Evaluation Method Approach” is
the bonafide work of SAI SUSHANTH C [RA2111028020054], RIZWAN AHMED
JAWED [RA2111028020049], and RAHUL S [RA2111028020052], who carried out the
project work under my supervision. Certified further that, to the best of my knowledge, the
work reported herein does not form part of any other project report or dissertation on the basis
of which a degree or award was conferred on an earlier occasion on this or any other candidate.
SIGNATURE SIGNATURE
Dr. M. Prabu,
Assistant Professor, Professor and Head,
Computer Science and Engineering, Computer Science and Engineering,
SRM Institute of Science and Technology, SRM Institute of Science and Technology,
Ramapuram, Chennai. Ramapuram, Chennai.
DECLARATION
We hereby declare that the entire work contained in this project report titled “Optimizing
Stock Prediction Using Hybrid Neural Network: Unified Evaluation Method
Approach” has been carried out by SAI SUSHANTH C [RA2111028020054],
RIZWAN AHMED JAWED [RA2111028020049], RAHUL S [RA2111028020052] at
SRM Institute of Science and Technology, Ramapuram Campus, Chennai-600089, under the
guidance of [Link], Assistant Professor, Department of Computer Science and
Engineering.
Place: Chennai
Date:
SAI SUSHANTH C
RIZWAN AHMED JAWED
RAHUL S
ABSTRACT
Page No.
ABSTRACT vi
LIST OF FIGURES ix
1 INTRODUCTION 1
1.1 Introduction 1
1.2 Problem Statement 3
1.3 Aim of the project 4
1.4 Project Domain 4
1.5 Scope of the Project 4
1.6 Methodology 5
2 LITERATURE REVIEW 6
3 PROJECT DESCRIPTION 11
3.1 Existing System 11
3.2 Proposed System 11
3.2.1 Advantages 12
3.3 Feasibility Study 12
3.3.1 Economic Feasibility 12
3.3.2 Technical Feasibility 13
3.3.3 Social Feasibility 13
3.4 System Specification 13
3.4.1 Hardware Specification 13
3.4.2 Software Specification 14
4 MODULE DESCRIPTION 15
4.1 General Architecture 15
4.2 Design Phase 16
4.2.1 Data Flow Diagram 16
4.2.2 UML Diagram 17
4.2.3 Activity Diagram 18
4.2.4 Sequence Diagram 19
4.3 Module Description 20
4.3.1 Data Visualization 20
4.3.2 Feature Selection 20
4.3.3 Train the Model 21
4.3.4 Testing the model 21
4.3.5 Implementing the model 22
8 SOURCE CODE 43
8.1 Code 43
REFERENCES 48
PLAGIARISM REPORT 50
CHAPTER 1
INTRODUCTION
1.1 Introduction
Understanding and predicting stock prices is a critical area of study, both in
academic research and practical application, because it reflects the functioning of our
economic and social structures. The stock market's pivotal role in the global economy
underscores the importance of comprehending its dynamics. Despite the inherent complexity
and unpredictability of economic activity, substantial effort has been devoted to elucidating
these dynamics. Scholars often conceptualize stock markets as intricate, nonlinear, and
evolutionary systems, acknowledging their dynamic nature. In recent years, machine learning
has emerged as a valuable tool for stock price prediction, owing to its capacity to discern and
exploit nonlinear relationships within data automatically. Among the many machine learning
strategies, the Trader Company (TC) approach has garnered interest as a promising technique.
The TC approach, modeled on a real financial institution, comprises two core components: the
Trader, responsible for prediction, and the Company, tasked with aggregating predictions.
This approach is designed to accommodate the dynamic nature of the stock market, offering
both high predictive accuracy and interpretability. By simulating the roles of traders within a
financial institution, the TC method not only captures the intricacies of market dynamics but
also offers insight into the factors influencing stock price movements. Its ability to fuse
advanced machine learning techniques with interpretability makes it a strong tool for traders
and analysts seeking to navigate the complexities of the stock market.
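The Trader-Company idea described above can be sketched in a few lines: each "trader" emits its own prediction, and the "company" aggregates them, here by weighting each trader inversely to its recent error. The weighting rule and all numbers are illustrative assumptions, not the published TC algorithm.

```python
import numpy as np

def company_aggregate(predictions, past_errors):
    """Combine trader predictions, weighting each trader inversely
    to its recent mean absolute error (more accurate = more weight)."""
    weights = 1.0 / (np.asarray(past_errors) + 1e-8)
    weights = weights / weights.sum()          # normalise to sum to one
    return float(np.dot(weights, predictions))

traders = [101.2, 99.8, 100.5]   # individual next-day price predictions
errors = [0.5, 2.0, 1.0]         # each trader's recent mean absolute error
combined = company_aggregate(traders, errors)
```

The aggregate always lies between the lowest and highest trader prediction, and leans toward the historically most accurate trader.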
Many modern stock forecasting methods, including the TC method, focus mainly
on point estimates. However, the lack of statistical uncertainty accompanying these point
forecasts raises reliability and safety concerns, because the level of forecast uncertainty
directly affects investment decisions. In a market where a large share of trading is automated
and machine-learning-based algorithmic strategies prevail, visibility into forecast uncertainty
is critical. Algorithmic trading strategies, which operate across large investor universes and
often consume high-frequency data (such as intraday or tick-by-tick data), require reliable
uncertainty estimates for effective decision making. Consequently, the main challenge
addressed in this work is building a forecasting method that not only achieves high predictive
accuracy but also provides robust estimates of uncertainty. Forecasting price movements in
financial markets is inherently challenging: under the efficient market hypothesis, stock prices
behave like a random walk and exhibit unpredictable changes. This challenge is further
compounded in the case of Bitcoin, whose price fluctuates widely and is driven by complex
factors. Traders have traditionally relied on two main tools, technical and fundamental
analysis, to formulate their trading strategies. Based on price-trend and trading-volume
analysis, technical analysis provides insight into possible trading signals. In contrast,
fundamental analysis delves into the economic and financial dimensions of a security and
examines its sensitivity to them.
Both humans and computers analyze information, but whereas humans can make
decisions based on experience, they struggle to manage large amounts of data subject to varied
influences such as inflation. Algorithmic trading has emerged as a solution to these challenges:
it uses preprogrammed computers that follow specific mathematical rules. Two main research
directions exist in financial markets: price prediction and algorithmic trading. Price
forecasting focuses on building models that accurately predict future prices, while algorithmic
trading goes beyond forecasting to participate actively in the market, for example by choosing
positions and trade volumes that maximize profit. Notably, an accurate forecast does not
always yield the greatest benefit: the total loss a trader incurs from poor execution may exceed
the loss from imperfect prediction. Stock price data exhibit time-series characteristics, and the
auto-regressive integrated moving average (ARIMA) method is often used for time-series
forecasting.
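To make the autoregressive idea behind ARIMA concrete, the minimal sketch below fits only the AR(1) component, x_t = c + phi * x_{t-1}, by least squares on a synthetic series; a full ARIMA(p,d,q) fit would normally use a statistics library such as statsmodels. The series parameters are toy values chosen for illustration.

```python
import numpy as np

# Generate a synthetic AR(1) "price" series: x_t = 10 + 0.8 * x_{t-1} + noise.
rng = np.random.default_rng(0)
x = [50.0]
for _ in range(199):
    x.append(10.0 + 0.8 * x[-1] + rng.normal(0, 0.5))
x = np.array(x)

# Fit x_t = c + phi * x_{t-1} by ordinary least squares.
X = np.column_stack([np.ones(len(x) - 1), x[:-1]])  # design matrix [1, x_{t-1}]
c, phi = np.linalg.lstsq(X, x[1:], rcond=None)[0]

next_pred = c + phi * x[-1]   # one-step-ahead forecast
```

On this synthetic data the fitted coefficient phi recovers the true value 0.8 closely, which is exactly the behavior an ARIMA fit exploits on real, differenced price series.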
Long Short-Term Memory (LSTM) networks, by contrast, can capture long-term
relationships within time-series data, potentially leading to more accurate predictions. A key
distinction between LSTM and conventional recurrent neural networks (RNNs) lies in their
handling of temporal information. While RNNs rely on short-term memory, recycling past
data for immediate use, LSTMs excel at capturing long-term dependencies, thereby improving
their predictive capability. To compare the forecasting performance of these models, empirical
analyses are conducted using data from prominent tech companies such as Google, Apple,
Netflix, and Amazon. By examining the forecasting results derived from both ARIMA and
LSTM models, insights into their relative efficacy in predicting stock prices across different
temporal horizons can be gleaned.
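The gating mechanism that gives LSTMs their long-term memory can be shown with a single cell step in plain NumPy. The weights below are fixed toy scalars, not trained parameters; a real model would use a deep learning library such as TensorFlow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step with scalar state, showing the three gates."""
    f = sigmoid(W[0] * x + U[0] * h_prev + b[0])   # forget gate
    i = sigmoid(W[1] * x + U[1] * h_prev + b[1])   # input gate
    g = np.tanh(W[2] * x + U[2] * h_prev + b[2])   # candidate state
    o = sigmoid(W[3] * x + U[3] * h_prev + b[3])   # output gate
    c = f * c_prev + i * g                          # long-term cell memory
    h = o * np.tanh(c)                              # short-term hidden output
    return h, c

h, c = 0.0, 0.0
for price_change in [0.1, -0.2, 0.05]:             # toy input sequence
    h, c = lstm_step(price_change, h, c,
                     W=[0.5] * 4, U=[0.4] * 4, b=[0.0] * 4)
```

The forget gate f decides how much of the old cell state survives each step, which is what lets the cell carry information across long horizons where a plain RNN would overwrite it.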
Economics and finance have long been fertile grounds for research, drawing interest
from commercial, governmental, and academic sectors alike. With their complex
dependencies on numerous tangible and intangible factors, these fields present a challenging
yet attractive arena for analysis and prediction. In particular, the volatility of stock markets
adds a further layer of complexity to prediction endeavors. Nevertheless, the potential for high
rewards serves as a strong motivator for the extensive study of these systems. Over the years,
a plethora of works have delved into stock-price prediction, employing various statistical
models and time-series analyses. Given the non-linear dependence of stock prices on multiple
variables, conventional techniques frequently fall short. To address this challenge, many have
turned to data-driven machine learning strategies. While earlier attempts using techniques
such as random forests, support vector regression, and shallow neural networks demonstrated
proof-of-concept applicability, recent advances in deep learning, including Long Short-Term
Memory (LSTM) networks and encoder-decoder architectures, show more promise,
particularly because of their capacity to handle the time-series nature of market data. In
addition to prediction, the design of optimized portfolios has been a focus of research in
quantitative and statistical finance. The purpose of an optimal portfolio is to allocate weights
to a set of capital assets in a manner that maximizes return while managing risk. Markowitz's
mean-variance optimization approach, based on the mean and covariance matrix of asset
returns, laid a foundational framework. However, this theory has notable limitations, in
particular concerning estimation errors in the predicted returns and covariance matrix.
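For the special case of the minimum-variance portfolio (no short-sale constraint), the Markowitz framework mentioned above has a closed form: the weights are proportional to the inverse covariance matrix times the ones vector, normalised to sum to one. The covariance numbers below are illustrative, not real market data.

```python
import numpy as np

# Illustrative covariance matrix of returns for three assets.
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])

# Closed-form minimum-variance weights: w = C^{-1} 1 / (1' C^{-1} 1).
inv = np.linalg.inv(cov)
ones = np.ones(3)
w = inv @ ones / (ones @ inv @ ones)

port_var = float(w @ cov @ w)   # resulting portfolio variance
```

Because holding any single asset is itself a feasible portfolio, the resulting variance is never worse than the least volatile individual asset, which here has variance 0.04. The estimation-error caveat in the text applies directly: small errors in `cov` can shift these weights substantially.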
1.2 Problem Statement
Fundamental analysis involves assessing the intrinsic value of a security by
analyzing the underlying factors affecting a company's present operations and future
potential. This approach delves into diverse aspects including financial statements, industry
conditions, management quality, and economic indicators to determine whether a security is
overvalued, undervalued, or fairly priced. Technical analysis, on the other hand, focuses on
reading statistical patterns derived from trading data, such as price movements and trading
volumes. Its goal is to identify trends and patterns in market behavior to forecast future price
moves and pinpoint potential entry or exit points for trades. Fundamental and technical
analysis offer distinct approaches to evaluating investment opportunities and are frequently
chosen based on factors such as market conditions, investment horizon, and individual
preference.
This project aims for a more accurate stock prediction system by using a time-sensitive
approach to analyze historical data and a Hybrid Neural Network (HNN) that leverages
financial theories. The HNN extracts patterns from stock prices for reliable, short-term
predictions.
The project focuses on creating a more unified and objective approach to predicting
stock trends using a specialized neural network. The key aspects are as follows.
Unifying Evaluation Objective Methods: This highlights the project's aim to move beyond
subjective or fragmented evaluation methods for stock prediction. It seeks to establish a single,
standardized approach for assessing the effectiveness of prediction models.
Stock Prediction: The core objective lies in developing a system capable of forecasting future
stock price movements.
Hybrid Neural Network Algorithm: The project utilizes a unique neural network architecture
(HNN). This HNN combines the strengths of traditional neural networks, known for their
ability to learn complex patterns from data, with insights gleaned from established financial
theories. The goal is to create a model that not only identifies patterns in historical stock
prices but also incorporates financial knowledge to improve prediction accuracy.
In essence, this project seeks to bridge the gap between machine learning techniques and
financial theory to create a more robust and reliable system for predicting stock market trends.
The project focuses on creating a more comprehensive and accurate system for
predicting stock market trends. Its scope covers the following.
Time-Varying Data Analysis: The project moves beyond static analysis of historical stock
data. It acknowledges that recent data holds greater significance for predicting future trends
than distant historical information. The scope encompasses developing techniques to assign
weights to data points based on their timeliness, allowing the model to prioritize the most
relevant information for improved prediction accuracy.
Hybrid Neural Network (HNN) Development: The project centers on creating a specialized
neural network architecture called the HNN. This network combines the strengths of
traditional neural networks, known for their ability to learn complex patterns from data, with
insights gleaned from established financial theories.
While the specific details of the evaluation method are not spelled out in the scope
description, the project title suggests a focus on creating a unified approach for assessing the
effectiveness of the HNN model. This likely involves establishing clear metrics and
benchmarks to gauge the model's prediction accuracy and potentially comparing its
performance against other existing stock prediction methods.
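The time-varying weighting described in the scope can be sketched as exponentially decaying sample weights, so that recent observations count more during training. The half-life value is an assumption for illustration; the report does not specify the weighting scheme.

```python
import numpy as np

def time_decay_weights(n_samples, half_life=30):
    """Return per-sample weights: the newest sample gets weight 1.0,
    and weights halve every `half_life` observations into the past."""
    ages = np.arange(n_samples)[::-1]        # age 0 = most recent sample
    return 0.5 ** (ages / half_life)

# 90 daily observations, weights decaying with a 30-day half-life.
w = time_decay_weights(90, half_life=30)
```

Such a vector can be passed as per-sample weights to most model-training routines, letting recent market regimes dominate the fitted parameters without discarding older data entirely.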
1.6 Methodology
The project "Unifying Evaluation Objective Methods for Stock Prediction using
Hybrid Neural Network Algorithm" likely involves collecting historical stock price data,
preprocessing it, and developing a specialized Hybrid Neural Network (HNN) architecture.
The methodology includes incorporating time-varying importance into the analysis, training
the HNN on preprocessed data, and evaluating its performance using standardized metrics.
Model refinement based on evaluation results is also anticipated. These educated guesses
outline a framework for the project's approach to stock prediction using a time-sensitive
methodology and specialized HNN.
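The "standardized metrics" mentioned in the methodology might, for instance, include RMSE, MAE, and MAPE, sketched below. The example arrays are illustrative numbers, not output of the HNN.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more heavily."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the errors."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs((y_true - np.asarray(y_pred)) / y_true)) * 100)

actual = [100.0, 102.0, 101.0, 105.0]      # illustrative true prices
predicted = [99.0, 103.0, 100.0, 104.0]    # illustrative model output
```

Reporting all three side by side gives a rounded picture: RMSE flags occasional large misses, MAE the typical miss, and MAPE a scale-free figure comparable across stocks.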
CHAPTER 2
LITERATURE REVIEW
Many papers have been published on unifying evaluation methods for stock
prediction using hybrid neural network algorithms, each using its own techniques. A few of
those papers are discussed below.
The paper “How to Handle Data Imbalance and Feature Selection Problems in CNN-
Based Stock Price Forecasting”, 2022, by Zinnet Duygu Aksehir and Erdal Kilic: Forecasting
stock market movements is a hard task due to the inherent uncertainty and the multitude of
influencing factors. Traditional time-series strategies often struggle to achieve accurate
predictions in this complex environment. In recent literature, Convolutional Neural Networks
(CNNs) have emerged as a promising method for stock market forecasting, demonstrating
notable success. However, issues such as data imbalance stemming from labeling
discrepancies and challenges in feature selection have been observed when using these
models. To address these shortcomings, the study introduces a novel rule-based labeling
algorithm and an innovative feature selection technique. The proposed labeling algorithm
aims to mitigate data imbalances by providing an improved framework for assigning labels to
stock market data. Simultaneously, the novel feature selection method seeks to enhance model
performance by identifying the most relevant input variables. Leveraging these
improvements, a CNN-based model is built to predict the next day's price movement for
stocks in the Dow30 index. Multiple sets of image-based input variables are generated,
incorporating technical indicators and gold and oil price data, to feed into the CNN model.
Comparative evaluation of prediction performance is conducted against existing research in
the literature. The experimental findings demonstrate that the CNN prediction model,
leveraging the proposed feature selection and labeling approaches, achieves a remarkable
improvement in accuracy, ranging from 3% to 22%, compared to previous CNN-based
models. Moreover, the effectiveness of the proposed labeling technique surpasses that of
conventional data-weighting methods, as shown by comparisons with Chen and Huang's
approach. Overall, these findings underscore the significance of innovative strategies in
addressing data imbalance and feature selection challenges in stock market forecasting, thus
advancing the efficacy of CNN-based models in this domain.
The paper “Novel Stock Crisis Prediction Technique-A Study on Indian Stock
Market”, 2021, by Nagaraj Naik and Biju R. Mohan: Predicting stock prices has become a
focus of research, and traditional techniques frequently rely on statistical and econometric
models. Yet these models face difficulties in handling non-stationary time-series data
efficiently. With the internet's rapid evolution and the surge in social media usage, online
news and comments serve as indicators of investor sentiment and attitudes toward stocks,
providing valuable insights for stock price prediction. This paper introduces a novel technique
leveraging deep learning, combining conventional financial index variables with social media
text features to improve prediction accuracy.
The paper “A stock price prediction method based on deep learning technology”, 2021,
by Xuan Ji, Jiachen Wang and Zhijun Yan: An innovative approach to personnel recruitment
selection involves leveraging a probabilistic automated recommendation technique. This
technique incorporates several key components aimed at facilitating efficient matching
between candidates and job requirements. By harnessing the power of automation, it seeks to
streamline the recruitment process and optimize personnel selection. Notably, the absence of
automated systems in medical establishments and hospitals has spurred the development of
automation in the healthcare sector. This shift underscores the pressing need for technological
solutions to address staffing challenges in critical sectors. One such technology, Natural
Language Processing (NLP), has emerged as a transformative tool benefiting society at large.
NLP's capacity to handle text-based data alleviates the manual burden associated with
processing enormous amounts of textual data. Consequently, its application in recruitment
processes offers great potential for enhancing efficiency and accuracy while driving positive
outcomes for both employers and applicants. Automation in resume classification is a game-
changer for several reasons. First and foremost, it saves time and resources.
The paper “Automated Resume Screening System”, 2020, by Frank Färber:
Utilizing a Vector Space Model, each CV can be efficiently paired with its corresponding job
description. This approach entails using a vectorization model coupled with cosine similarity
to gauge the relevance between candidate profiles and job requirements. By computing
ranking scores through this technique, the most suitable candidates for a given job role can be
identified. An Automated Resume Screening System acts like a digital gatekeeper for job
applications. Its main job is to sift through the mountains of resumes that flood in when a job
is posted. The system uses algorithms and predefined criteria set by recruiters or hiring
managers to quickly categorize and filter resumes, typically analyzing them based on
keywords, skills, education, work experience, and other relevant factors. It helps streamline
the hiring process by narrowing the pool of applicants to those who closely match the job
requirements, saving recruiters time and effort by automating the initial screening process.
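The cosine-similarity ranking described above can be sketched with simple word-count vectors; a real system would use TF-IDF vectors, but the ranking idea is identical. The job and CV texts are toy examples.

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Cosine similarity between two texts using raw word-count vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)           # shared-term products
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb)

job = "python machine learning engineer"
cv_good = "experienced python machine learning engineer"
cv_poor = "sales manager retail"
```

Ranking candidates by `cosine(job, cv)` puts the matching CV first; a CV sharing no vocabulary with the description scores zero.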
The paper “System for screening candidates for recruitment”, 2020, by Momin
Adnan: To find the resumes closest to the specified job description, the model used cosine
similarity, a CNN, and content-based recommendation. Resume
classification systems are designed to streamline the recruitment process by automatically
sorting and categorizing resumes based on predefined criteria. These systems often use a
combination of natural language processing (NLP) and machine learning algorithms to analyze
the content of resumes and extract relevant information. The system scans resumes for specific
keywords and phrases relevant to the job requirements. This helps filter out resumes that don't
contain the essential skills or qualifications. NLP algorithms are employed to identify and
extract key skills, qualifications, and experiences mentioned in the resumes. This allows the
system to assess whether candidates possess the necessary background for the position.
The paper “Stock Trend Prediction Using Candlestick Charting and Ensemble Machine
Learning Techniques With a Novelty Feature Engineering Scheme”, 2021, by Yaohu Lin,
Shancun Liu, Haijun Yang and Harris Wu: This system sorts all resumes according to
the company's requirements and sends them to HR for further consideration. The required
resume is chosen from a pool of applicants, and the others are discarded. Resume sorting is a
crucial step in the resume classification process. It involves the use of automated systems to
analyze and categorize resumes based on predefined criteria. This process helps recruiters and
hiring managers efficiently manage large volumes of resumes and identify the most relevant
candidates for a particular job. Automated systems extract relevant information from resumes,
such as education history, work experience, skills, and contact details. The system compares
the extracted information with predefined keywords or criteria set by the employer. For
example, if a job requires specific skills or qualifications, the system looks for those keywords
in the resume.
The paper “Integrated Long-Term Stock Selection Models Based on Feature Selection
and Machine Learning Algorithms for China Stock Market”, 2020, by Xianghui Yuan,
Jin Yuan, Tianzhao Jiang and Qurat Ul Ai: This helps recruiters select resumes
based on the job description in a short duration of time. It enables an easy and efficient hiring
process by extracting the requirements automatically. Proficiency in the formulation of such
systems demonstrates the ability to create and implement efficient systems, methodologies, or
frameworks for solving problems or handling tasks within a professional or technical context,
which is directly relevant to resume classification.
The paper “Global Stock Market Prediction Based on Stock Chart Images Using Deep
Q-Network”, 2019, by Jinho Lee, Raehyun Kim, Yookyung Koh and Jaewoo Kang:
This research shows that country-specific stock charts have the potential to generate returns
not only in that country but also in global markets. The findings suggest that a model trained
on the U.S. market alone showed strong performance that equaled or exceeded results in many
other markets over a 12-year test period. This means that machine learning and artificial
intelligence approaches to price forecasting, which are typically applied only in single-country
studies, can be successfully applied globally, provided the modeling framework, inputs, and
training methods are designed with care.
The paper “Stock Volatility Prediction by Hybrid Neural Network”, 2019, by
Yujie Wang, Hui Liu, Qiang Guo, Shenxiang Xie and Xiaofeng Zhang: This paper offers a
novel technique that integrates sophisticated textual features with fundamental stock
data. Unlike traditional strategies that focus solely on volatility trends, the approach offers
a more complete extraction of stock features. As a result, the proposed HTPNN demonstrates
superior overall performance by effectively balancing prediction accuracy and computational
efficiency in forecasting stock volatility.
The paper “A Dual-Attention-Based Stock Price Trend Prediction Model With Dual
Features”, 2019, by Yingxuan Chen, Weiwei Lin and James Z. Wang: To overcome the
limitations of traditional methods of extracting relevant factors for analyzing financial time-
series data, the authors propose a new method called the Two-Phase trend prediction Model
(TPM), which uses dual features. In the data preprocessing step, the PLR method and a CNN
are used to extract the dual features, capturing long-term trends and short-term market
movements from historical data. In the time-series modeling phase, a new encoder-decoder
system is applied, feeding short-term features to the encoder and long-term features to the
decoder. Attention mechanisms are combined in the encoder and decoder components,
enabling adaptive selection and combination of the most appropriate feature dimensions
across time. TPM exhibits high accuracy in predicting both the slope and duration of trends.
The experimental results confirm the effectiveness of the method, showing a notable
reduction of 13% in RMSE.
CHAPTER 3
PROJECT DESCRIPTION
3.1 Existing System
This study uses machine learning algorithms to examine the relationship between
the Korea Composite Stock Price Index (KOSPI), a national statistical index administered by
the Korean government, and the sales of Korean listed companies. The analysis examines data
from the 1,470 companies listed on KOSPI and KOSDAQ over a 20-year history ranging
from 2000 to 2021, together with physical indices. It uses various machine learning
algorithms such as random forest, gradient boosting, extreme gradient boosting, adaptive
boosting, and categorical boosting on the spanned [Link]. The findings show that various
changes in national accounting indicators affect corporate sales in different sectors. For
example, industrial accidents greatly affect manufacturing, finance, and insurance, while other
sectors are affected by factors such as the price of gold, the number of automobiles produced,
and foreign exchange reserves. Consequently, the study suggests the use of national statistical
indicators to develop management strategies based on machine learning techniques. It
attempts to understand the impact of these indices on company sales and determine the best
machine learning algorithms for the analysis. By scrutinizing various indices and company
sales data, the study identifies key variables affecting sales performance and measures
algorithm performance. Notably, gradient boosting appears as the best algorithm overall, with
particular tasks favoring different algorithms. The study emphasizes the industry-specific
variation in the impact of macroeconomic indicators on enterprise income. In particular, it
finds the industrial accident rate to be a significant predictor of sales performance across
various sectors, offering empirical insights into the limitations of preceding studies.
3.2.1 Advantages
• Achieve a well-balanced tradeoff among various parameters.
3.3 Feasibility Study
The project "Unifying Evaluation Objective Methods for Stock Prediction using
Hybrid Neural Network Algorithm" presents both promising aspects and challenges to
feasibility. Promising elements include acknowledging the dynamic nature of market data
through time-varying importance, leveraging neural networks and financial theories for a more
robust model, and establishing a unified evaluation method for objective assessment.
However, challenges such as the complexity of stock market dynamics, neural network
design, and data availability and quality must be addressed for successful implementation.
Overall feasibility relies on effectively navigating these challenges while capitalizing on the
project's promising features to enhance stock prediction accuracy.
3.3.1 Economic Feasibility
The economic feasibility of the project involves weighing costs against potential
benefits. Costs include data acquisition expenses, computational resources for training the
HNN, and ongoing development and maintenance efforts. Benefits could include improved
investment decisions, reduced risk through trend insights, and potential for trading
automation. However, economic viability depends on the HNN's ability to generate returns
that outweigh costs. Factors like prediction accuracy, market efficiency, and transaction costs
must be carefully considered. While promising, the project may be most beneficial for
institutional investors or niche applications where inefficiencies can be exploited.
3.3.2 Technical Feasibility
The project is technically feasible based on several factors. Feasible techniques:
established methods like time-varying importance weighting and hybrid neural networks are
well suited for stock prediction tasks, and abundant neural network libraries and readily
available historical stock data facilitate HNN development. Challenges: data quality, HNN
architecture design, and market unpredictability are key challenges that need to be addressed.
3.3.3 Social Feasibility
Social feasibility examines the project's impact on stakeholders and society at
large. It considers factors including acceptance by end users, potential job creation or
displacement, and the broader societal implications of adopting automated prediction systems.
Engaging with stakeholders and addressing any concerns regarding privacy, equity, and
accessibility are crucial aspects of ensuring social acceptance and support for the undertaking.
3.4 System Specification
3.4.1 Hardware Specification
• Memory (RAM): Minimum 8 GB; Recommended 32 GB or above
3.4.2 Software Specification
• Python
• Anaconda
• Jupyter Notebook
• TensorFlow
CHAPTER 4
MODULE DESCRIPTION
4.1 General Architecture
Figure 4.1 represents the architecture diagram of the project. The diagram depicts a
typical machine learning process for stock price prediction using a neural network: data is
collected, preprocessed, and then split into training and testing sets. The training set is used to
train the HNN model, and the testing set is used to evaluate the model's performance.
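The train/test split in the architecture can be sketched as below. For time-series data the split should be chronological, with no shuffling, so that future prices never leak into training; the 80/20 ratio is an assumption for illustration.

```python
import numpy as np

def chronological_split(series, train_frac=0.8):
    """Split a time-ordered series into train and test without shuffling,
    so the test set contains only observations later than the train set."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

prices = np.arange(100, 200, dtype=float)   # placeholder price series
train, test = chronological_split(prices)
```

Every test observation comes strictly after every training observation, which is the property that makes the evaluation an honest simulation of forecasting.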
4.2 Design Phase
4.2.1 Data Flow Diagram
Figure 4.2 depicts the project's data flow, illustrating the journey of stock price data
within the system. It shows the stages of data processing, ranging from data collection and
cleaning to feature extraction, model training, evaluation, and finally prediction generation.
The core of the diagram is the Hybrid Neural Network (HNN) ensemble technique: it shows
how stock price data is fed into the ensemble model, how predictions are generated, and how
they are ultimately used to reduce the human workload in stock price prediction tasks.
Overall, Figure 4.2 serves as a visual representation of the data pipeline and workflow for
stock price prediction, emphasizing the key components and interactions involved in the
system.
4.2.2 UML Diagram
Class Diagram
Figure 4.3 represents the class diagram of the project. It depicts the main classes
and their relationships within the system, outlining the key entities and their attributes
alongside the methods they employ to achieve the project's goals. Given the nature of the
project, classes might consist of components such as data preprocessing modules, various
predictive models (including neural networks), optimization algorithms, and evaluation
metrics. Relationships between these components illustrate how they interact and collaborate
to predict stock prices successfully. The diagram may employ inheritance, aggregation, and
association to represent the hierarchical structure and dependencies between different
components of the system, such as feature extraction modules, model training components,
and result evaluation methods. Overall, the class diagram serves as a blueprint for developers,
providing a visual representation of the project's architecture and assisting in the
implementation of the proposed methodology for enhancing stock price prediction accuracy.
4.2.3 Activity Diagram
Figure 4.4 represents the Activity Diagram of the project. The activity diagram is employed to visualize the sequence of actions and processes involved in stock prediction. It details the steps taken by the deep learning ensemble model, from data preprocessing through model training and assessment, illustrating data collection, feature engineering, model selection, and performance evaluation. By depicting these activities, the diagram helps stakeholders understand the flow of operations within the prediction system, facilitating analysis and potential optimizations of the method.
4.2.4 Sequence Diagram
Figure 4.5 represents the Sequence Diagram of the project. The sequence diagram depicts the flow of interactions and messages between the distinct components involved in the stock price prediction process. It illustrates how data is collected from various sources such as financial markets or news feeds, how it is preprocessed to extract relevant features, how these features are utilized by the predictive models, and how predictions are generated. It also shows interactions with external systems or databases for retrieving historical stock data or validating predictions. The diagram serves as a visual representation of the system's behavior, aiding stakeholders in understanding the process from a high-level perspective.
4.3 Module Description
4.3.1 Module 1: DATA VISUALIZATION
Introduction to Data Visualization:
Data visualization involves presenting raw data through graphical representations, enabling a
more intuitive understanding of complex datasets. Its primary purpose is to explore the data
and uncover deep insights that may not be immediately apparent from the raw data alone.
Using visualization, we can conduct exploratory data analysis to identify patterns, trends,
correlations, and anomalies within the dataset. This process helps in gaining a deeper
understanding of the data and its underlying structures.
Data visualization aids in identifying areas of the dataset that require attention and
improvement. It helps in detecting data types, missing or duplicated values, and outliers,
which are crucial for refining the dataset and ensuring its quality.
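As a concrete illustration of these checks, the sketch below runs a few basic quality checks on a tiny, made-up price series; the column name, values, and outlier threshold are illustrative, not taken from the project's dataset:

```python
import numpy as np
import pandas as pd

# Illustrative price series with a deliberate gap, a duplicate, and an outlier
prices = pd.DataFrame({
    "Close": [101.0, 102.5, np.nan, 103.0, 500.0, 104.2, 104.2],
})

# Detect missing values and duplicated rows
missing = prices["Close"].isna().sum()
duplicates = prices.duplicated().sum()

# Flag outliers as points more than 2 standard deviations from the mean
z_scores = (prices["Close"] - prices["Close"].mean()) / prices["Close"].std()
outliers = prices[z_scores.abs() > 2]

print("Missing values:", missing)
print("Duplicate rows:", duplicates)
print("Outlier prices:", outliers["Close"].tolist())
```

In practice these checks would be run before any visualization or modeling, so that gaps and anomalies are handled deliberately rather than silently absorbed by the model.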
Creating a division between training and testing sets within your dataset serves as a crucial method to rapidly assess the efficacy of an algorithm on your specific task. The training set is used to construct and refine the model, essentially serving as a simulation ground for the algorithm. Conversely, the test set is treated as novel data, holding back the actual output values from the algorithm. By running the trained model on the test set inputs and comparing its predictions against the withheld outputs, we derive a performance metric, gauging the model's effectiveness on unseen data. This process yields an approximation of the algorithm's capability when making predictions on unfamiliar instances. The final iteration of the machine learning model represents the model deemed suitable for predicting outcomes on new data. To facilitate model training, access to the dataset, along with several utility functions, is essential. The training phase involves multiple iterations or passes through the dataset, during which the model's parameters are initialized randomly and gradually refined.
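The hold-out evaluation described above can be sketched as follows; the features, target, and linear model here are synthetic stand-ins for illustration, not the project's HNN or dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 samples, 3 features, a noisy linear target
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.01, size=200)

# Hold back 20% of the data as the withheld test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit on the training set only, then score on the unseen test set
model = LinearRegression().fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print("Held-out MAE:", round(mae, 4))
```

The MAE on the withheld set approximates how the model would perform on genuinely new data, which is exactly the estimate the train/test split is designed to provide.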
CHAPTER 5
IMPLEMENTATION AND TESTING
In both the existing and proposed systems, the process of implementation and
testing involves similar foundational steps, yet varies in specific details based on chosen
algorithms and data characteristics. The existing system necessitates substantial historical data
acquisition, including annual sales data sourced from financial databases or the Korea Listed
Companies Association, alongside national statistical indicators from sources like Statistics
Korea e-Nara Index and Bank of Korea Economic Statistical System. Data preprocessing
involves standardizing formats, handling missing values, and ensuring consistency across
companies and timeframes. Outputs entail feature importance analysis using machine learning
algorithms to discern national statistical indicators' influence on sales and performance
evaluation through metrics like MAE, MSE, and RMSE. Conversely, the proposed system
mandates historical stock price data acquisition, covering daily or weekly closing prices for
several years, and preprocessing akin to the existing system. Model training involves defining
the architecture and hyperparameters of a Hybrid Neural Network model and feeding
preprocessed data for training. Outputs encompass predicted stock prices and returns for
specified time horizons, alongside model evaluation using metrics like Sharpe Ratio and
Sortino Ratio, comparing predictions with actual market outcomes. This comprehensive
analysis of inputs and outputs is pivotal for effective model implementation and testing.
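The Sharpe and Sortino ratios mentioned above can be computed directly from a series of returns. The sketch below uses one common formulation, assuming a zero risk-free rate and illustrative daily returns rather than the project's actual results:

```python
import numpy as np

def sharpe_ratio(returns, risk_free=0.0):
    # Mean excess return divided by the standard deviation of excess returns
    excess = np.asarray(returns) - risk_free
    return excess.mean() / excess.std(ddof=1)

def sortino_ratio(returns, risk_free=0.0):
    # Like Sharpe, but penalizes only downside deviation
    excess = np.asarray(returns) - risk_free
    downside = excess[excess < 0]
    downside_dev = np.sqrt((downside ** 2).mean())
    return excess.mean() / downside_dev

daily_returns = [0.01, -0.005, 0.012, 0.003, -0.002, 0.008]
print("Sharpe:", round(sharpe_ratio(daily_returns), 3))
print("Sortino:", round(sortino_ratio(daily_returns), 3))
```

Because the Sortino ratio ignores upside volatility, it is typically higher than the Sharpe ratio for a strategy whose variability is mostly on the positive side.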
5.1.1 Stock Price Prediction
Figure 5.1 shows a comparison between the price prediction model's output and the actual market price. One line represents the predicted prices over time, while the other shows the real prices. Ideally, the predicted price line closely follows the actual price line, with minimal deviations. Significant and consistent gaps between the lines would indicate that the model is not accurately capturing the price movements. This analysis helps assess the model's effectiveness in forecasting prices.
5.1.2 View of Final Prediction
Figure 5.2 shows a comparison between the Bitcoin price prediction model's output and the actual market price. One line represents the predicted prices over time, while the other shows the real Bitcoin prices. Ideally, the predicted price line closely follows the actual price line, with minimal deviations. Significant and consistent gaps between the lines would indicate that the model is not accurately capturing the price movements. This analysis helps assess the model's effectiveness in forecasting Bitcoin prices.
5.2 Testing
Testing involves systematically validating the functionality and performance of the
predictive models developed for stock price prediction. It ensures that the models produce accurate
forecasts and behave as expected under various conditions. For the stock price prediction project
described, testing is a critical phase to ensure the accuracy and reliability of the predictive models.
5.2.1 Types of testing
Unit testing
Validation Testing
Integration testing
Performance Testing
Regression Testing
5.2.2 Unit Testing
INPUT:
import pytest
Test result:
Testing data preprocessing functions to ensure they handle missing data,
outliers, and transformations accurately.
Testing model training functions to verify that the models are trained with the
correct hyperparameters and architecture.
Testing evaluation metrics functions to ensure they calculate performance
metrics such as Mean Absolute Error and Sharpe Ratio accurately.
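A minimal pytest-style sketch of such unit tests is shown below; the `fill_missing_close` helper is a hypothetical stand-in for the project's preprocessing functions, used only to show the testing pattern:

```python
import numpy as np
import pandas as pd

def fill_missing_close(series: pd.Series) -> pd.Series:
    """Forward-fill missing closing prices (illustrative helper)."""
    return series.ffill()

def test_fill_missing_close_handles_gaps():
    s = pd.Series([100.0, np.nan, 102.0])
    filled = fill_missing_close(s)
    assert filled.isna().sum() == 0
    assert filled.iloc[1] == 100.0  # gap filled with the previous close

def test_fill_missing_close_preserves_values():
    s = pd.Series([100.0, 101.0])
    assert fill_missing_close(s).tolist() == [100.0, 101.0]

# pytest would discover and run the test_* functions automatically;
# here they are invoked directly for illustration
test_fill_missing_close_handles_gaps()
test_fill_missing_close_preserves_values()
print("all unit tests passed")
```

Each preprocessing, training, and metric function gets its own small, isolated tests of this shape, so that a failure pinpoints the exact component at fault.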
5.2.3 Integration Testing
Integration testing ensures that the individual components of a system work together as intended, detecting any interface issues. Here it focuses on verifying that the modules responsible for data preprocessing, model training, and evaluation interact correctly.
INPUT:
import pytest
Test result:
It involves assessing the collaboration of different components to ensure seamless functionality.
The results indicated a successful integration, as the modules responsible for data preprocessing, feature extraction, and prediction worked cohesively.
Overall, the integration testing phase instilled confidence in the project's ability to handle the intricacies of stock data processing.
INPUT:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
# Load historical stock data
stock_data = pd.read_csv('historical_stock_data.csv')
# Extract features
features = stock_data[['Volume', 'Price_Trend', 'Moving_Average',
                       'Technical_Indicator_1', 'Technical_Indicator_2']]
# Normalize feature data
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(features)
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    scaled_features, stock_data['Next_Day_Price'], test_size=0.2, random_state=42)
Test result:
The preprocessing pipeline integrated cleanly with the downstream components: feature extraction, scaling, and the train/test split produced outputs in the shapes expected by the model training module.
No interface mismatches were observed between the data loading, normalization, and splitting steps.
5.2.6 Regression Testing
Regression testing is a type of software testing that confirms a recent program or code change has not adversely affected existing features. It re-executes a full or partial selection of already executed test cases to ensure that existing functionality still works correctly.
INPUT:
from sklearn.metrics import mean_squared_error
# Retrain the model with the same data and hyperparameters
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)
# Compare with previous MSE
mse_regression = mean_squared_error(y_test, predictions)
print("Regression Testing MSE:", mse_regression)
Test result:
The retrained model reproduced its previous behavior on the held-out test set.
The regression-testing MSE matched the previously recorded value, confirming that recent code changes did not degrade the existing prediction functionality.
Figure 5.3 shows a test plot of the stock price prediction model's output against the actual market price. One line represents the predicted prices over time, while the other shows the real prices. Ideally, the predicted price line closely follows the actual price line, with minimal deviations. Significant and consistent gaps between the lines would indicate that the model is not accurately capturing the price movements. This analysis helps assess the model's effectiveness in forecasting prices.
CHAPTER 6
RESULTS AND DISCUSSIONS
Improved Accuracy: Hybrid neural networks can combine the strengths of different neural network architectures, potentially leading to more accurate predictions compared to individual models. This translates to making better use of computational resources.
Reduced Training Time: Certain hybrid approaches might achieve similar accuracy with less training data or shorter training times compared to standalone models.
Efficiency Considerations:
Computational Complexity: Hybrid models can be computationally expensive to train, especially if they involve deep learning architectures. This could impact efficiency on machines with limited resources.
Feature Engineering: Selecting and engineering relevant features for the model can be a time-consuming process. The efficiency of the system depends on how well this is addressed.
Overall, the efficiency of the system hinges on the specific design choices and implementation.
The existing system analyzes how national statistical indicators affect company sales
across different industries. It uses machine learning to identify the most relevant indicators for
each industry (e.g., industrial accident rate for manufacturing). The proposed system focuses on
predicting stock prices and portfolio returns using a hybrid neural network (HNN) model. It
considers the time-series nature of stock data and uses historical price data to predict future
trends. The proposed system also incorporates techniques to optimize the HNN model for better
accuracy.
6.3 Sample Code
(The sample code is included as figure listings: LMS, LSTM, Epoch Printing Callback, LSTM Algorithm, Get Predictions From Model, and Output.)
CHAPTER 7
CONCLUSION AND FUTURE ENHANCEMENTS
7.1 Conclusion
To create a more practical system, future work can integrate the prediction model with trading
strategies. This would involve developing algorithms that translate predictions into buy/sell
decisions, taking into account factors like risk management and portfolio allocation.
Combining these elements would lead to a complete algorithmic trading system that
leverages the project's prediction capabilities for real-world application. These enhancements
hold promise for building a more robust and practical stock prediction system.
CHAPTER 8
SOURCE CODE
8.1 Code
# In[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
from sklearn.naive_bayes import MultinomialNB
from sklearn.multiclass import OneVsRestClassifier
from sklearn import metrics
from sklearn.metrics import accuracy_score
from pandas.plotting import scatter_matrix
from sklearn.neighbors import KNeighborsClassifier
resumeDataSet = pd.read_csv('[Link]')
resumeDataSet['cleaned_resume'] = ''
resumeDataSet.head()
# In[2]:
# In[3]:
print("Displaying the distinct categories of resume and the number of records belonging to each category -")
print(resumeDataSet['Category'].value_counts())
# In[4]:
import seaborn as sns
sns.countplot(y="Category", data=resumeDataSet)
# In[5]:
from matplotlib.gridspec import GridSpec
targetCounts = resumeDataSet['Category'].value_counts()
targetLabels = resumeDataSet['Category'].unique()
# Make square figures and axes
plt.figure(1, figsize=(25, 25))
the_grid = GridSpec(2, 2)
cmap = plt.get_cmap('coolwarm')
plt.show()
# In[6]:
import re
def cleanResume(resumeText):
    # Minimal cleaning step (illustrative): strip URLs from the resume text
    resumeText = re.sub(r'http\S+\s*', ' ', resumeText)
    return resumeText
# In[7]:
print(resumeDataSet['cleaned_resume'][31])
# In[8]:
fig.tight_layout()
plt.figure(figsize=(12, 9))
plt.tight_layout()
closing_df = pdr.get_data_yahoo(tech_list, start=start, end=end)['Adj Close']
sns.jointplot(x='GOOG', y='GOOG', data=tech_rets, kind='scatter', color='seagreen')
sns.jointplot(x='GOOG', y='MSFT', data=tech_rets, kind='scatter')
sns.pairplot(tech_rets, kind='reg')
# In[9]:
plt.figure(figsize=(12, 10))
plt.subplot(2, 2, 2)
df
plt.figure(figsize=(16, 6))
plt.plot(df['Close'])
plt.xlabel('Date', fontsize=18)
plt.show()
data = df.filter(['Close'])
dataset = data.values
# Assumed split ratio: use 95% of the rows for training
training_data_len = int(np.ceil(len(dataset) * 0.95))
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)
scaled_data
train_data = scaled_data[0:int(training_data_len), :]
x_train = []
y_train = []
# In[10]:
# 60-day lookback window (inferred from the slicing below)
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    if i <= 61:
        print(x_train)
        print(y_train)
        print()
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
def create_model():
    model = Sequential()
    model.add(LSTM(64, return_sequences=False))
    model.add(Dense(25))
    model.add(Dense(1))
    return model
model = create_model()
model.compile(optimizer='adam', loss='mean_squared_error')
# In[11]:
x_test.append(test_data[i-60:i, 0])
x_test = [Link](x_test)
predictions = scaler.inverse_transform(predictions)
rmse
# In[12]:
train = data[:training_data_len]
valid = data[training_data_len:]
valid['Predictions'] = predictions
plt.figure(figsize=(16, 6))
plt.title('Model')
plt.xlabel('Date', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])
plt.show()
REFERENCES
[1] A. L. P. Selçuk, E. Yigit and Z. Ersoy, Prediction of bist price indices: A comparative study
between traditional and deep learning methods, Sigma J. Eng. Natural Sci., vol. 38, no. 4, pp.
1693-1704, (2020).
[2] A. Oueslati and Y. Hammami, Forecasting stock returns in Saudi Arabia and Malaysia, Rev.
Accounting Finance, vol. 17, no. 2, pp. 259-279, May 2018.
[3] B. Unal and C. Aladag, Stock exchange prediction via long short-term memory networks,
Proceedings Book, pp. 246, (2019).
[4] Gwangsu Lee, Exploring Predictive Variables Affecting the Sales of Companies Listed With
Korean Stock Indices Through Machine Learning Analysis, IEEE Access, (2023).
[5] K. Tissaoui and J. Azibi, International implied volatility risk indexes and Saudi stock return-
volatility predictabilities, North Amer. J. Econ. Finance, vol. 47, pp. 65-84, Jan. (2019).
[6] M. Vijh, D. Chandola, V. A. Tikkiwal and A. Kumar, Stock closing price prediction using
machine learning techniques, Proc. Computer. Sci., vol. 167, pp. 599-606, (Jan. 2020).
[7] Nagaraj Naik, Biju R. Mohan, Novel Stock Crisis Prediction Technique - A Study on Indian
Stock Market, IEEE Access, (2021).
[8] N. T. Hung, Stock market volatility and exchange rate movements in the Gulf Arab countries:
A Markov-state switching model, J. Islamic Accounting Bus. Res., vol. 11, no. 9, pp. 1969-
1987, Aug. (2020).
[10] Saud S. Alotaibi, Ensemble Technique With Optimal Feature Selection for Saudi Stock
Market Prediction: A Novel Hybrid Red Deer-Grey Algorithm, IEEE Access, (2021).
[11] S. M. Idrees, M. A. Alam and P. Agarwal, A prediction approach for stock market volatility
based on time series data, IEEE Access, vol. 7, pp. 17287-17298, (2019).
[12] S. Tekin and E. Canakoglu, Analysis of price models in Istanbul stock exchange, Proc. 27th
Signal Process. Commun. Appl. Conf. (SIU), pp. 1-4, Apr. (2019).
[13] Xuan Ji, Jiachen Wang, Zhijun Yan, A stock price prediction method based on deep learning
technology, International Journal of Crowd Science, (2020).
[14] Y. Trichili, M. B. Abbes and A. Masmoudi, Predicting the effect of Googling investor sentiment on Islamic stock market returns: A five-state hidden Markov model, Int. J. Islamic Middle Eastern Finance Manage., vol. 13, no. 2, pp. 165-193, Feb. (2020).
[15] Yaohu Lin, Shancun Liu, Haijun Yang and Harris Wu, Stock Trend Prediction Using Candlestick Charting and Ensemble Machine Learning Techniques With a Novelty Feature Engineering Scheme, IEEE Access, (2021).
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Deemed to be University u / s 3 of UGC Act, 1956)
Chennai 600091
Chennai 600122
2 Address of Candidate Chennai 600089
Mail ID : juslinsj@[Link]
Mobile Number : 9597694549
10 Name and address of the Co-Supervisor /
Guide
13 Plagiarism Details : (to attach the final report from the software)
NA NA NA
Appendices
I / We declare that the above information has been verified and found true to the best of my / our knowledge.
Dr. K. Raja