Week 1 Homework ITS 632 UC

Week 1 Homework
LAXMI RAVULA
Department of Information Technology, University of the Cumberlands
ITS-632-B02: Intro to Data Mining (Second Bi-term)
Prof. Amit Karmaker
July 9, 2023
Week 1 Homework
Answer 1
Knowledge Discovery in Databases (KDD) is a systematic and iterative process for
extracting valuable knowledge from large datasets (Tan et al., 2020). It involves several
interconnected steps, starting with selecting relevant data from various sources. The selected data
is then preprocessed to ensure its quality and prepared for analysis. Next, the data is transformed
into a suitable format for applying data mining techniques, such as classification, clustering, and
association rule mining. These techniques help uncover patterns, relationships, and models
within the data. The discovered knowledge is then interpreted and evaluated with the help of
domain experts and evaluation metrics to assess its significance and usefulness. Finally, the
knowledge derived from the process is presented meaningfully to facilitate decision-making and
problem-solving (Saghari et al., 2023). KDD is a powerful tool for gaining actionable insights,
improving processes, and making informed decisions across various domains, including
business, healthcare, finance, and more.
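The KDD steps described above can be illustrated with a minimal sketch in plain Python. The records, field names, and function names below are hypothetical and chosen only for illustration, and the "mining" step is a toy split around the mean rather than a real clustering algorithm.

```python
from statistics import mean

# Hypothetical raw records; the field names are invented for this sketch.
RAW = [
    {"customer": "a", "age": 25, "spend": 120.0, "region": "east"},
    {"customer": "b", "age": 41, "spend": None, "region": "west"},
    {"customer": "c", "age": 37, "spend": 940.0, "region": "east"},
    {"customer": "d", "age": 52, "spend": 880.0, "region": "west"},
    {"customer": "e", "age": 29, "spend": 150.0, "region": "east"},
]

def select(records, fields):
    """Selection: keep only the attributes relevant to the question."""
    return [{f: r[f] for f in fields} for r in records]

def preprocess(records):
    """Preprocessing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records, field):
    """Transformation: min-max scale one numeric field to [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]

def mine(records, field):
    """Data mining: split records around the mean (toy stand-in for clustering)."""
    cutoff = mean(r[field] for r in records)
    return {
        "low": [r["customer"] for r in records if r[field] <= cutoff],
        "high": [r["customer"] for r in records if r[field] > cutoff],
    }

def evaluate(groups):
    """Interpretation/evaluation: summarize group sizes for expert review."""
    return {name: len(members) for name, members in groups.items()}

def kdd_pipeline(records):
    """Run the steps in order; in practice the process is iterative."""
    selected = select(records, ["customer", "spend"])
    cleaned = preprocess(selected)
    scaled = transform(cleaned, "spend")
    groups = mine(scaled, "spend")
    return groups, evaluate(groups)
```

Calling `kdd_pipeline(RAW)` drops the record with the missing `spend` value and places the remaining customers in a low-spend and a high-spend group. A real project would revisit earlier steps whenever the evaluation step shows the results are not useful.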
Answer 2
Traditional data analysis techniques have struggled to meet the challenges posed by big
data applications. The sheer volume, velocity, and variety of big data present practical
difficulties for conventional methods due to their limited scalability and adaptability. Addressing
these challenges requires ongoing research and the development of innovative algorithms, techniques,
and methodologies (Piatetsky-Shapiro et al., 2006). It also calls for collaboration between data
scientists, domain experts, and stakeholders to effectively utilize data mining results in real-world
applications. This proactive approach combines advances in scalability, the handling of high
dimensionality, data ownership and distribution, and non-traditional analysis to tackle the
complexities of big data analysis.
Heterogeneous and complex data refers to datasets that exhibit diverse attributes and
intricate structures, posing challenges for traditional data analysis methods designed for
homogeneous datasets. As data mining becomes increasingly prevalent in fields like business,
science, and medicine, there is a growing demand for techniques capable of effectively handling
these complexities (Tan et al., 2020). Such data includes web and social media data containing
text, hyperlinks, images, audio, and videos; DNA data with sequential and three-dimensional
structures; and climate data with varying measurements across time and locations. Analyzing and
extracting valuable insights from these data types necessitates specialized techniques that
account for relationships, such as graph connectivity and parent-child connections. Ongoing
research and development in this area drive advancements in data mining, enabling us to unlock
the hidden knowledge within heterogeneous and complex datasets and enhance decision-making
across various domains (Piatetsky-Shapiro et al., 2006).
Answer 3
Data mining has transitioned from an intermediate process in the KDD framework to an
independent academic field. Originating from workshops in the late 1980s, it has grown into
conferences attended by researchers and industry professionals, fueling its development (Tan et
al., 2020). Data mining encompasses data preprocessing, mining, and postprocessing, drawing
upon methodologies from various disciplines such as statistics, AI, pattern recognition, and
machine learning. It also incorporates ideas from optimization, information theory, and other
areas to address the challenges posed by big data. Supporting areas like database systems,
high-performance computing, and distributed techniques play crucial roles. The relationship of data
mining to other fields like statistics, AI, machine learning, and pattern recognition
showcases its interdisciplinary connections and its ability to handle knowledge extraction from
large and complex datasets.
Data mining integrates statistics, artificial intelligence (AI), machine learning (ML), and
pattern recognition to extract valuable insights from data. It incorporates statistical techniques for
data analysis, using methods like hypothesis testing and regression analysis to evaluate the
significance of discovered patterns (Tan et al., 2020). Data mining also draws on AI
techniques such as knowledge representation and reasoning to develop intelligent algorithms.
ML algorithms are crucial in data mining, facilitating pattern identification and prediction
through classification, regression, clustering, and anomaly detection. Moreover, pattern
recognition techniques like neural networks and decision trees are employed to uncover
meaningful patterns. Combining these components makes data mining a powerful tool for
extracting knowledge and making informed decisions across various domains.
Answer 4
Data mining tasks can be broadly categorized into two main types: predictive tasks and
descriptive tasks. Predictive tasks aim to predict the value of a specific target variable based on
other independent variables (Mukhopadhyay et al., 2014). These tasks involve building models
that can forecast or classify future instances. Classification tasks are used when the target
variable is discrete, while regression tasks are employed for continuous target variables.
Examples of predictive tasks include predicting customer behavior, forecasting stock prices, or
diagnosing diseases based on medical test results. On the other hand, descriptive tasks focus on
uncovering patterns, relationships, clusters, anomalies, and trends within the data. They provide
insights into the underlying characteristics and summarize the relationships present in the
dataset. Descriptive tasks are often exploratory and may employ clustering, association rule
mining, or anomaly detection techniques. These tasks typically require postprocessing techniques
to validate and explain the discovered patterns. Predictive tasks are centered around accurate
predictions, while descriptive tasks aim to summarize and understand the data's intrinsic
properties and relationships (Mukhopadhyay et al., 2014).
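The difference between the two task types can be made concrete with a small sketch in plain Python. The function names and sample data are hypothetical, and each function is a deliberately simple stand-in (a one-nearest-neighbor classifier, a least-squares line, and pair support counts) for the broader families of techniques named above.

```python
from collections import Counter
from itertools import combinations

def nearest_neighbor_label(train, x):
    """Predictive (classification): return the label of the closest training point."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]

def fit_line(points):
    """Predictive (regression): ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pair_support(transactions):
    """Descriptive (association analysis): share of transactions containing each item pair."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(set(items)), 2))
    return {pair: c / len(transactions) for pair, c in counts.items()}

# Hypothetical data for illustration only.
labeled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["eggs", "milk"], ["bread"]]

predicted_class = nearest_neighbor_label(labeled, 7.5)   # discrete target
slope, intercept = fit_line([(0, 1), (1, 3), (2, 5)])    # continuous target
supports = pair_support(baskets)                         # no target variable at all
```

The first two functions need a known target (a label or a numeric value) to learn from, while `pair_support` has no target and simply summarizes which items co-occur, which is why its output usually needs interpretation before it is useful.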
Predictive tasks in data mining, such as forecasting and classification, are essential for
businesses. They enable accurate predictions and classifications, aiding sales forecasting,
customer segmentation, fraud detection, and risk assessment. The insights gained from predictive
tasks support decision-making, strategic planning, and proactive actions, allowing organizations
to optimize resources, prevent issues, and make informed choices based on anticipated outcomes
(Tan et al., 2020). Descriptive tasks in data mining are essential for gaining insights into patterns,
trends, and relationships within the dataset, facilitating understanding, exploration,
summarization, and validation/explanation. They aid in understanding the data's characteristics,
uncovering hidden patterns, summarizing key findings, and validating the results obtained from
data mining techniques. Descriptive tasks are vital in enhancing the understanding and
interpretation of data, facilitating effective communication, and increasing confidence in the
derived insights (Mukhopadhyay et al., 2014).
In conclusion, predictive and descriptive tasks play crucial roles in data mining, offering
distinct yet complementary contributions. Predictive tasks allow organizations to forecast
future outcomes, make accurate classifications, support decision-making processes, and take
proactive actions. On the other hand, descriptive tasks help analysts comprehensively understand
the data, uncover hidden patterns, summarize key findings, and validate the results obtained.
Together, these tasks provide a comprehensive approach to extracting valuable insights, enabling
informed decision-making, and unlocking the potential of data assets across various domains and
industries. By leveraging the power of both predictive and descriptive tasks, organizations can
harness the full potential of their data and gain a competitive edge in today's data-driven world.
References
Mukhopadhyay, A., Maulik, U., Bandyopadhyay, S., & Coello, C. A. (2014). A survey of
multiobjective evolutionary algorithms for data mining: Part I. IEEE Transactions on
Evolutionary Computation, 18(1), 4–19. https://doi.org/10.1109/tevc.2013.2290086
Piatetsky-Shapiro, G., Djeraba, C., Getoor, L., Grossman, R., Feldman, R., & Zaki, M. (2006).
What are the grand challenges for data mining? ACM SIGKDD Explorations Newsletter,
8(2), 70–77. https://doi.org/10.1145/1233321.1233330
Saghari, A., Budinská, I., Hosseinimehr, M., & Rahmani, S. (2023). A robust-reliable decision-
making methodology based on a combination of stakeholders’ preferences simulation and
KDD techniques for Selecting Automotive Platform Benchmark. Symmetry, 15(3), 750.
https://doi.org/10.3390/sym15030750
Tan, P.-N., Steinbach, M., & Kumar, V. (2020). Introduction to data mining. Pearson Education.
