Enhanced FP-Growth Framework and Apriori Algorithm Utilizing TDA for Big Data Analysis
Abstract: Mining big data requires advanced computational techniques and algorithms to analyze huge datasets efficiently. Apriori and FP-Growth are two of the most well-known algorithms in data mining. They help businesses make decisions based on customer trends and behaviors by finding patterns and correlations, and machine learning has made these algorithms even more accurate and efficient. The association rule approach does have some problems, though: it needs a lot of memory, it has to scan the whole dataset to find the frequency of an itemset, and it sometimes produces rules that are not optimal. This study conducts a comparative analysis of the FP-Growth, Apriori, and TDA algorithms, demonstrating notable performance differences. The FP-Growth algorithm handled large datasets much better than the Apriori method, which had scalability problems and took longer to process larger datasets, even though it was easier to implement. This study suggests changes to the FP-Growth algorithm to fix these problems: it uses the TDA matrix to build a very compact FP-tree. This method aims to cut down on mining time and the number of generated itemsets, which makes memory use more efficient and speeds up processing for large datasets. In short, the proposed method is a promising way to make data mining processes more efficient and scalable, especially for big data analytics.
How to Cite: Abdulkader Mohammed Abdulla Al-Badani (2025) Enhanced FP-Growth Framework and Apriori Algorithm Utilizing
TDA for Big Data Analysis. International Journal of Innovative Science and Research Technology, 10(12), 919-928.
[Link]
I. INTRODUCTION

In today's world of information technology, the rapid increase in data creation means that we need better ways to collect and store data. This change has greatly improved the ability to collect and store huge amounts of data [1], which makes it easier to analyze all of that data and lets businesses get useful information from it. Being able to handle and understand large datasets is now an important part of making decisions in many fields. Association rule mining is one of the many methods that have been created to deal with these problems, and it is a key way to find patterns in large datasets.

Association rule mining, especially the Apriori algorithm, has been very helpful in finding frequent itemsets and making Boolean association rules [2]. Since it was first created, the Apriori algorithm has been improved many times, making association analysis faster and more accurate. Even with these improvements, the algorithm's dependence on candidate generation and multiple database scans makes it hard to work with large datasets quickly. To deal with these problems, newer methods such as the FP-Growth algorithm have emerged. The FP-Growth algorithm uses a tree structure to build itemsets, which cuts down on the number of times the database needs to be scanned for association analysis.

Even though the FP-Growth algorithm has come a long way, we still need to improve data mining methods so that we can get more information from large datasets. Current research is still looking into new ways to make data mining processes more efficient, with the goal of getting around current problems and making them work better. This research aims to augment the existing discourse by exploring innovative methodologies that improve the efficiency and efficacy of association rule mining [3]. It seeks to enhance the field of data mining by filling existing knowledge gaps and offering valuable insights for businesses aiming to utilize large datasets for strategic decision-making.

The field of data mining has become more and more important for getting useful information from large datasets [4], which helps people make decisions in many different areas. Data mining involves a number of important steps, including preparing the data, choosing the right methods, putting them into action, looking at the results, and figuring out what they mean. Every step is important for making sure that the results are correct and useful.
In [22], a better FP-Growth method for mining description-based rules is introduced. The authors made a unique change to how gene groups are described using the Gene Ontology (GO) FP-Growth algorithm, and the results show that the new method generates rules faster. Reference [23] introduces a new way to mine association rules using FP-linked lists: a frequent pattern mining approach, based on the FP-Growth idea, that uses a linked list structure and a bit matrix to find patterns. This method makes mining more efficient by using less memory and speeding up the process of finding patterns.

In [24], efficient methods for finding frequent itemsets with data mining are presented. These methods are based on the frequent pattern growth approach and are meant to preserve privacy, usefulness, and speed in mining frequent itemsets. In [25], a better way to mine frequent itemsets is proposed: a more effective, non-recursive FPNR-growth method that boosts performance in terms of both space and time complexity. This new method closes the gap between theoretical research and real-world use by lowering the computing overhead and making sure that the patterns mined are useful and relevant to real-world situations. By focusing on these improvements, the method makes frequent itemset mining much more scalable, which is useful for the large datasets that are common in fields like banking and retail. As a result, practitioners can get useful information faster, which leads to better decision-making in the long run.

The literature on association rule mining has changed a lot over time. FP-Growth trees have become a popular method because they work well with large datasets. A significant study [26] presents a novel method that eliminates the necessity for building conditional FP-trees, thereby improving the efficiency of frequent itemset mining. This improvement solves a major problem with traditional FP-Growth methods, which often require a lot of extra computing power because conditional tree construction is recursive. By removing this step, the suggested method not only makes mining easier but also makes it useful in more areas that need real-time data analysis.

The study further clarifies the benefits and limitations of FP-Growth trees in association rule mining. It shows how well the method can handle large datasets, which is very important in areas like market basket analysis, bioinformatics, and network traffic analysis. The decrease in computational overhead is especially important because it makes this method a good choice for real-time applications where speed and efficiency matter most.

A major part of this research is the creation of a new algorithm that uses the TDA (Two-Dimensional Array) structure. This algorithm solves difficult optimization problems and makes FP-tree-based algorithms work better, achieving better accuracy and speed than earlier methods. The fact that it performs consistently well across different datasets shows how robust it is and how it could change the way such optimization works. The algorithm's capacity to provide expedited and accurate solutions to complex issues indicates favorable prospects for its utilization in optimization contexts beyond those initially investigated in the research.

III. APRIORI ALGORITHM

The method described in this study successfully identifies subsets that are shared by only a few item sets, showing that it can be used in many different fields. The method consistently produces accurate and meaningful results by using frequent pattern mining techniques based on support and confidence metrics. The research results show that the method can successfully find patterns in data that are both common and important. This ability to surface important patterns makes it easier to make decisions in many situations, such as recommendation systems and market basket analysis, and it shows how useful and flexible the approach is in practice.

In other words, the difference in execution time and magnitude of results between the two databases points out how much the number of attributes and the general operation complexity affect the performance of the algorithm. The underlying message is that, while constructing the algorithm, the database architecture should be understood first, because it determines how efficiently and effectively the data can be processed.
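To make the support-based mechanics above concrete, the following is a minimal, textbook-style Python sketch of Apriori's level-wise candidate generation and counting. It is an illustration rather than the authors' implementation: the function name apriori_frequent and the toy transactions are our own, and minsup is given as an absolute count.

def apriori_frequent(transactions, minsup):
    # Classic Apriori: level-wise candidate generation followed by a
    # full database scan per level to count candidate supports.
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items in one scan of the database.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= minsup}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: merge (k-1)-itemsets that differ in one item.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Count step: one more full scan of the database per level --
        # the repeated-scan cost that FP-Growth is designed to avoid.
        level = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in level.items() if n >= minsup}
        result.update(frequent)
        k += 1
    return result

# Toy run on the nine transactions used later in Table 3 (minsup = 2).
db = [["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"], ["I1", "I2", "I4"],
      ["I1", "I3"], ["I2", "I3"], ["I1", "I3"], ["I1", "I2", "I3", "I5"],
      ["I1", "I2", "I3"]]
print(apriori_frequent(db, minsup=2))

Each pass over the transaction list in the count step is one full database scan, which is exactly the cost that motivates the FP-Growth algorithm discussed next.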
IV. FP-GROWTH ALGORITHM

The FP-Growth algorithm is an important tool for data mining, especially for quickly finding frequent itemsets without having to create candidates. The algorithm has two main steps: building the FP-tree and recursing over the FP-tree to find frequent itemsets. To start, the FP-tree is built by scanning the dataset to find the feature items. These items are then put in the first column of the header table in order of their support, from highest to lowest. This structure makes sure that the most important items come first, which makes traversal easier. At the same time, the second column holds a chain table that links nodes of the same item in the FP-tree. This keeps the structure consistent, which is important for recursive mining. During the mining phase, the algorithm goes through the FP-tree in a systematic way to find frequent itemsets, using conditional FP-trees to narrow down and isolate patterns. This two-phase method not only makes computations more efficient, but it also avoids much of the complexity of traditional candidate generation. Because of this, the FP-Growth algorithm is a great way to find patterns in large datasets: it is both fast and accurate for frequent itemset mining.

Below is the pseudo-code for the FP-Growth algorithm in a transaction database [27]:

Input: Dataset D; support threshold min_sup
Output: an FP-tree

1. Traverse the dataset once and count the support of each feature item. Sort the items in descending order of support and use min_sup to filter out the infrequent ones; the result is the frequent 1-itemset L1.
2. Create the root node of the FP-tree T and set its content to "null." Build a table of the frequent items and leave their node links empty.
3. Traverse the dataset a second time. For each transaction in D:
   o Filter the items in the transaction based on L1, sort them according to the feature item order in L1, and record the result as P.
   o Insert P into the tree T.
   o Update the relevant node links in the table of frequent items.

There is a header table that the FP-tree is linked to. The header table sorts single items and their counts by how often they appear, from most to least. Table 1 is a transactional dataset, and Figure 1 shows the FP-tree that the FP-Growth algorithm built from this data. Every node in the FP-tree shows an item and how many times it appears. The tree structure makes it easy to mine frequent itemsets without generating candidate sets.
Fig 1: The FP-Tree with its Header Table, Built from the Transactional Dataset in Table 1.
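To make the two passes above concrete, here is a short, self-contained Python sketch of the FP-tree construction phase. It is a standard reconstruction written for illustration, not the paper's code; the Node class, the dictionary-based header table, and all names are our own choices.

from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 1
        self.children = {}   # item -> child Node

def build_fp_tree(transactions, min_sup):
    # Pass 1: count supports and build the frequent 1-itemset L1,
    # sorted in descending order of support.
    support = Counter(item for t in transactions for item in t)
    L1 = [i for i, c in support.most_common() if c >= min_sup]
    order = {item: rank for rank, item in enumerate(L1)}
    root = Node(None, None)              # root labelled "null"
    header = {item: [] for item in L1}   # header table: item -> node links
    # Pass 2: filter and sort each transaction by L1 order, then
    # insert the resulting list P into the tree.
    for t in transactions:
        P = sorted((i for i in t if i in order), key=order.get)
        node = root
        for item in P:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)   # update the node link
            else:
                child.count += 1
            node = child
    return root, header

On the nine transactions shown later in Table 3, the shared prefix I2 -> I1 lets four transactions reuse a single branch, and the header table's node links give the recursive mining phase direct access to every occurrence of an item.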
V. THE PROPOSED ALGORITHM

The Two-Dimensional Array (TDA) is an important tool for summarizing transactional databases because it organizes all of the frequent itemsets in a way that makes sense. The TDA is organized in a way that makes it easier to find support. Its dimensions are N×M, where N is the number of transactions and M is the maximum number of frequently ordered items. Each cell in this array holds the support value for the itemsets that go with it. This makes it easy and quick to get important information about frequently used itemsets from the database. This systematic arrangement not only makes it easier to get to the data, but it also makes it easier to analyze transactional patterns.

Table 3 shows how each Ordered Frequent Itemsets List (OFIL) is processed iteratively to build the TDA. At first, the TDA matrix is filled with "0" values, which sets a baseline for adding more data later. During each iteration, items from the OFIL lists are taken out and added to the matrix, making the representation of transactional data more accurate. After this process is done, Table 3 gives a full picture of the finished TDA, showing how well it works to summarize and present transactional insights. This methodical approach shows how useful the TDA is for making data analysis in transactional databases more accurate and efficient.

Table 3. The TDA.
T1   I2   I1   I5   0
T2   I2   I4   0    0
T3   I2   I3   0    0
T4   I2   I1   I4   0
T5   I1   I3   0    0
T6   I2   I3   0    0
T7   I1   I3   0    0
T8   I2   I1   I3   I5
T9   I2   I1   I3   0
The FP-Growth algorithm works well with small datasets, but it has big problems when used with large ones, because it needs a lot of memory and takes a long time to build an FP-tree and find frequent itemsets. The FP-tree may grow too big for the main memory, making the method unusable for analyzing large amounts of data. The One-Itemset-at-a-Time Mining (TDA) method, along with a minimum support (minsup) threshold, is a good way to deal with these problems. This method uses a two-dimensional array that is updated in real time with new itemsets, which solves the memory problems that come with the traditional FP-Growth method. The TDA method improves memory management and scalability by focusing on one itemset at a time. This makes it possible to process large datasets efficiently without running out of memory. The TDA approach is a big step forward in the field of data mining because it gives you a strong way to work with large amounts of data.

The suggested method starts scanning the dataset from the last column and uses the TDA to figure out how much support each item in each column gets. It then skips groups of items that don't get enough support. The system improves efficiency by cutting down the search space through the removal of infrequent itemsets. This makes it easier to quickly find frequent itemsets in big datasets by focusing on the most important elements and getting rid of those with little support. So, this method not only simplifies the computation, but it also makes sure that the analysis is accurate and meaningful when dealing with large amounts of data. The last column will show the frequencies of the previous itemsets and those of similar records, which will help find strong correlations and trends. Such insights are useful for making smart choices because they are more likely to lead to real-world results. The method lets you get useful information by focusing on high-frequency itemsets, which are then taken out of the TDA. This speeds up the process of finding patterns in the data, which cuts down on the time and effort needed to make frequent itemsets and makes data analysis easier and more efficient overall.

By getting rid of rare itemsets early on, the TDA algorithm becomes more efficient because it focuses on the most important and relevant patterns, which leads to better data mining results. This method works much better than the FP-Growth method. The TDA algorithm speeds up the mining process by quickly getting rid of non-frequent itemsets, making it easier and more accurate to find patterns in the data. So, the algorithm not only speeds up the mining process but also makes the results more accurate, which means that large datasets can give researchers and practitioners more useful information. This enhancement aids strategic planning and decision-making by diminishing processing time and computational burden while producing more frequent itemsets. The TDA algorithm works better than FP-Growth, which means it can analyze data faster and handle bigger datasets without losing quality. This lets businesses make quick, smart choices that lead to more efficiency and new ideas in many fields. The algorithm's streamlined approach also makes it easier to work with large datasets, and its scalability makes it even better for frequent itemset mining tasks. The next sections give a full explanation of the proposed algorithm, including its inputs, outputs, and steps:

Input
• Transaction database (DB)
• Minimum support threshold (minsup)

Output
• Identified frequent (recurring) itemsets

Step 1: Database Scanning and Frequent Item Preparation
1. Scan the entire database.
2. Identify:
   o The set F of all items.
   o The supporters (transactions) for each item in F.
   o The frequent items (those meeting minsup).
3. Sort F in descending order of frequency to form the OFIL (Ordered Frequent Item List).
4. Remove all infrequent items.

This step ensures that only the most relevant, high-support items remain, improving both computational efficiency and the overall accuracy of later itemset mining processes.

Step 2: Construction of the TDA
1. For every transaction row that corresponds to the OFIL structure:
   o Insert each frequently occurring item into the appropriate column, following the sorted order in OFIL.
2. The resulting matrix:
   o Highlights usage/consumption patterns.
   o Provides a clear view of item availability and inventory status.
   o Helps determine which items may need restocking.

Maintaining the OFIL ensures a clean, organized data structure for pattern mining.
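As a concrete reading of Steps 1 and 2, the sketch below derives the OFIL and fills the TDA rows for the nine-transaction example of Table 3. It is a simplified illustration under assumed naming (build_ofil_and_tda is hypothetical), with minsup again given as an absolute count.

from collections import Counter

def build_ofil_and_tda(db, minsup):
    # Step 1: scan the database once, count each item's supporters,
    # drop infrequent items, and sort the rest by descending
    # frequency to form the OFIL.
    support = Counter(i for t in db for i in t)
    ofil = [i for i, c in support.most_common() if c >= minsup]
    rank = {i: r for r, i in enumerate(ofil)}
    # Step 2: one TDA row per transaction; frequent items are placed
    # in OFIL order and the remaining cells are padded with 0.
    width = max(sum(1 for i in t if i in rank) for t in db)
    tda = []
    for t in db:
        row = sorted((i for i in t if i in rank), key=rank.get)
        tda.append(row + [0] * (width - len(row)))
    return ofil, tda

db = [["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"], ["I1", "I2", "I4"],
      ["I1", "I3"], ["I2", "I3"], ["I1", "I3"], ["I1", "I2", "I3", "I5"],
      ["I1", "I2", "I3"]]
ofil, tda = build_ofil_and_tda(db, minsup=2)
# ofil begins with I2, I1, I3 (the most frequent items) and the rows
# of tda reproduce Table 3, e.g. tda[0] == ["I2", "I1", "I5", 0].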
Step 3: Frequent Itemset Generation
Let c be the number of columns in the TDA.

Initialize
• Set c = M, where M is the total number of columns in the TDA.

Process Columns in Reverse Order
For each column from c = M down to 1:

Case A: When c = 1
1. Compare frequent items in the current column with those in previous columns.
2. Compile their supporters.
3. Represent results as [r, f : n | OFIL], where:
   o r = parent frequent item from earlier columns.
   o f = frequent item in the current column.
   o n = support count.
4. Retrieve the rows of item f from the supporters of column 1.
5. Retrieve the corresponding rows of item r.
6. Remove these extracted rows from the TDA to eliminate duplicates and maintain accuracy.

This step clarifies the relationships between items and helps identify patterns in early-column frequent items.

Case B: When c > 1
1. Move to column c and compare its frequent items with those in earlier columns.
2. Compile supporters for each frequent item.
3. Represent results again as [r, f : n | OFIL].
4. Extract rows linked to the repeating parent items.
5. Process the repeating item f according to its order.
6. Remove the extracted rows from the TDA.
7. Verify that the remaining TDA structure is consistent and matches the master file.

This makes sure that the generation of frequent itemsets happens in an orderly way and that the relationships between items are correctly represented.
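Step 3 is the least conventional part of the procedure, so the following Python sketch should be read as a simplified interpretation rather than a faithful transcription of Cases A and B: it walks the TDA columns from the last down to the second (single-item counts are already known from the OFIL), groups identical row prefixes, records those that meet minsup, and removes their supporter rows from the TDA. The name mine_tda and the prefix-only grouping are our assumptions.

from collections import Counter

def mine_tda(tda, minsup):
    # Keep only the non-zero items of each TDA row.
    rows = [tuple(i for i in row if i != 0) for row in tda]
    frequent = {}
    width = max(len(r) for r in rows)
    # Process columns in reverse order, from c = M down to c = 2.
    for c in range(width, 1, -1):
        counts = Counter(r[:c] for r in rows if len(r) >= c)
        for prefix, n in counts.items():
            if n >= minsup:
                frequent[prefix] = n  # an [r, f : n] style result
                # Remove the supporter rows so later, shorter
                # prefixes are not counted twice.
                rows = [r for r in rows if r[:c] != prefix]
    return frequent

tda = [["I2", "I1", "I5", 0], ["I2", "I4", 0, 0], ["I2", "I3", 0, 0],
       ["I2", "I1", "I4", 0], ["I1", "I3", 0, 0], ["I2", "I3", 0, 0],
       ["I1", "I3", 0, 0], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3", 0]]
print(mine_tda(tda, minsup=2))
# {('I2','I1','I3'): 2, ('I2','I1'): 2, ('I2','I3'): 2, ('I1','I3'): 2}

This prefix-only simplification recovers most, but not all, of the itemsets reported in Table 4; {I2,I1,I5:2}, for instance, requires combining non-adjacent columns, which the full Case A/Case B bookkeeping handles.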
Table 4 shows a complete list of all the frequent item sets that were found in the data analysis. The sets are arranged by their support values, with the highest-frequency ones at the top. This systematic arrangement makes it easier to find patterns and trends in the dataset. Researchers can find important links between products that might affect how people buy things by looking closely at these common item groups. These insights are very helpful for creating targeted marketing plans and improving inventory management to better match what customers want. So, looking at frequent item sets not only helps us understand how consumers behave, but it also gives us useful information for making strategic business decisions.

Table 4. The Created Frequent Item Sets.
Frequent itemsets: {I2,I3:2}, {I1,I3:2}, {I2,I1,I5:2}, {I2,I1,I3:2}

VI. RESULTS AND DISCUSSIONS

The UCI Machine Learning Repository is an important resource for the data mining and knowledge discovery communities. It has a wide range of benchmark and real-world datasets that are necessary for testing the effectiveness of new methods [28]. Researchers can rigorously test their algorithms across a wide range of fields, such as the social sciences, biology, and economics, by using these datasets. This variety not only makes the algorithms stronger, but it also encourages new uses in many different areas. Researchers use these databases to find patterns, back up their results, and learn more about the things they are studying. This study compares the suggested method with the well-known FP-Growth algorithm by looking at how many frequent itemsets each finds and how long it takes to get these itemsets from the datasets. The results show that the suggested method makes data mining much more efficient by finding more frequent itemsets and greatly cutting computation time. This enhancement facilitates the application of these methodologies to larger and more intricate datasets, potentially transforming the extraction of data-driven insights. The experiments were done on a laptop with a 64-bit Windows 10 operating system, Python, 32GB of RAM, and an Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz. Table 5 shows a detailed statistical analysis of the datasets used in this comparative study. These datasets range in size and complexity from small to large, and these differences make it easier to fully evaluate the analytical techniques, showing how useful and relevant they are in many different situations.

Table 5: Characteristics of the Test Datasets
Datasets                                    Size     #Transactions
Poker Hand                                  23.9MB   268325
Sepsis Survival Minimal Clinical Records    1.31MB   110205

In algorithmic performance evaluation, employing varied datasets is essential for assessing the effectiveness of computational methodologies. This study utilized two separate datasets to methodically evaluate the performance of the algorithms under diverse conditions. The results showed a big difference: one algorithm was much faster, while the other was more accurate. This difference shows how important it is to consider the situation when choosing the best algorithmic strategy for a job. A detailed comparison of the FP-Growth algorithm and the suggested algorithm reveals their strengths and weaknesses, which helps us understand these results better. This analysis ultimately emphasizes the significance of considering both speed and accuracy, among other factors, when selecting an algorithm for practical application.
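For readers who want to rerun this kind of comparison, a minimal timing harness in Python might look like the sketch below. It uses the publicly available mlxtend implementations of Apriori and FP-Growth as stand-ins, since the proposed TDA miner is not included here; the toy transaction list is a placeholder for the real datasets, and mlxtend expects minsup as a fraction.

import time
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth

def benchmark(miner, onehot, minsup):
    # Time a single mining run and report the itemset count.
    start = time.perf_counter()
    itemsets = miner(onehot, min_support=minsup, use_colnames=True)
    return time.perf_counter() - start, len(itemsets)

# One-hot encode the transactions (replace with the Poker Hand or
# Sepsis records loaded from the UCI repository).
transactions = [["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"]]
encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit_transform(transactions),
                      columns=encoder.columns_)

for minsup in (0.30, 0.45, 0.50, 0.60):
    for name, miner in (("Apriori", apriori), ("FP-Growth", fpgrowth)):
        secs, count = benchmark(miner, onehot, minsup)
        print(f"{name:9s} minsup={minsup:.0%}: {secs:.3f}s, {count} itemsets")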
Experiment One
The experiment used the Poker Hand dataset, which has records of hands of five playing cards taken from a standard deck of 52 cards. There are ten predictive features for analysis because each card has two properties: suit and rank. This dataset has a lot of information that makes it easy to look at different poker hands and their chances of winning. Using machine learning, it is possible to make models that can guess how strong a hand is or find the best ways to play. We did a lot of tests with different minimum support (minsup) values to see how well the proposed approach worked compared to the original FP-Growth algorithm. The Poker Hand dataset offered a practical and realistic environment for evaluating the algorithm's efficacy. The results showed that the machine learning models not only made predictions more accurate, but they also gave us a better understanding of how to play complex games. This progress opens up new possibilities for looking into how AI can make it easier to make decisions in poker and other games that require strategy. Researchers showed that the proposed method worked better than the original FP-Growth algorithm in a number of different situations by changing the minsup parameters. Table 6 shows the results, including how long it took to find the frequency of itemsets and the number of frequent itemsets at minsup values of 30%, 45%, 50%, and 60%. These results show how AI could help improve strategic gameplay and decision-making.
Table 6: Comparison Results for the Poker Hand Dataset with Various minsup Thresholds.

No.  minsup   Execution time (s)                       # Discovered frequent itemsets
              Apriori   FP-Growth   New algorithm      Apriori   FP-Growth   New algorithm
1    30%      84.745    77.288      7.932              1061      864         266
2    45%      82.815    75.267      7.544              837       602         245
3    50%      81.897    74.875      7.165              831       535         94
4    60%      79.197    72.586      7.154              827       411         53
The minimum support (minsup) threshold has a big effect on how well data mining algorithms work. It changes both the number of frequent itemsets created and the time it takes to run (for example, at minsup = 30% an itemset must appear in at least 30% of all transactions to be kept). When the minsup values go up, the number of frequent itemsets and the execution times of both the proposed method and the FP-Growth algorithm go down. The proposed method, on the other hand, always runs faster than the FP-Growth algorithm, no matter what minsup threshold is used. This shows that the suggested method is more scalable and efficient, especially when working with large datasets where the cost of computing is very important. The faster execution speed not only improves overall performance, but it also makes it easier to get insights faster when analyzing data. Figure 2 shows how the three algorithms compare in terms of performance at four different minsup thresholds. It shows that the proposed method is more efficient than the FP-Growth algorithm. The results show a big increase in execution speed, which shows that the proposed method can handle large amounts of data well. This progress also makes data mining easier and opens the door for more research into how to improve algorithmic performance across a range of applications. The FP-Growth method, on the other hand, needs a lot of memory and time to build multiple conditional sub-trees before it can find frequent itemsets, which can make it less efficient in large data environments.
Table 7: Comparison Results for the Sepsis Survival Minimal Clinical Records Dataset with Various minsup Thresholds.

No.  minsup   Execution time (s)                       # Discovered frequent itemsets
              Apriori   FP-Growth   New algorithm      Apriori   FP-Growth   New algorithm
1    10%      3.146     0.686       0.501              123       102         88
2    20%      2.015     0.641       0.488              102       93          42
3    30%      1.722     0.614       0.321              96        83          31
4    50%      1.430     0.561       0.315              81        61          17
Fig 2: Comparing the Results of the Execution time and the minsup Thresholds for the Poker Hand Dataset.
Experiment Two
The study used the Sepsis Survival Minimal Clinical Records dataset, which is a complete set of 110,204 hospital admissions in Norway between 2011 and 2012. These admissions involved 84,811 patients who had infections, septic shock, sepsis caused by pathogenic microorganisms, or systemic inflammatory response syndrome. The wealth of information in this dataset makes it possible to do a detailed analysis of patient outcomes, which makes it easier to assess the effectiveness of different treatment plans. Researchers are looking for patterns in demographics, clinical interventions, and survival rates that could help improve how sepsis is treated and how patients are cared for in the future. The main prediction task is to use the patient's medical records to figure out if they will live for about nine days after being admitted. Table 7 shows the execution time, the number of frequent item sets found using the FP-Growth algorithm, and the best method for each of the four minimum support thresholds: 10%, 20%, 30%, and 50%. This analysis not only shows how the dataset could help doctors make better decisions, but it also shows how carefully the treatment outcomes were measured.

Figure 3 shows that the four different minimum support (minsup) thresholds have very different algorithm execution times, which shows that the performance is very different. The data shows that the chosen minsup threshold has a big effect on execution times, which shows how important it is to choose the right threshold for algorithm efficiency. Algorithm A consistently outperformed the other algorithms tested, even when the minsup levels were lower. This means that Algorithm A is especially good at handling large datasets, making it a strong choice for applications that need a lot of data. The results show that the algorithm can improve data processing and optimize computer resources, making it the best choice for jobs that need to be done quickly and on a large scale.
Fig 3: Comparing the Results of the Execution Time and the minsup Thresholds for the Sepsis Survival Minimal Clinical Records Dataset.