
Week 1 Homework

LAXMI RAVULA

Department of Information Technology, University of the Cumberlands

ITS-632-B02: Intro to Data Mining (Second Bi-term)

Prof. Amit Karmaker

July 9, 2023

Week 1 Homework

Answer 1

Knowledge Discovery in Databases (KDD) is a systematic and iterative process for

extracting valuable knowledge from large datasets (Tan et al., 2020). It involves several

interconnected steps, starting with selecting relevant data from various sources. The selected data

is then preprocessed to ensure its quality and prepared for analysis. Next, the data is transformed

into a suitable format for applying data mining techniques, such as classification, clustering, and

association rule mining. These techniques help uncover patterns, relationships, and models

within the data. The discovered knowledge is then interpreted and evaluated with the help of

domain experts and evaluation metrics to assess its significance and usefulness. Finally, the

knowledge derived from the process is presented meaningfully to facilitate decision-making and

problem-solving (Saghari et al., 2023). KDD is a powerful tool for gaining actionable insights,

improving processes, and making informed decisions across various domains, including

business, healthcare, finance, and more.
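The steps described above can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions: the toy transaction records, the function names, and the support threshold are all hypothetical and do not come from the cited sources. Each function stands in for one KDD step (selection, preprocessing, transformation, mining, and evaluation), with frequent item pairs used as a simple stand-in for association rule mining.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transaction records (toy data, not from the cited sources).
RAW = [
    {"store": "A", "items": ["Bread", "milk ", "eggs"]},
    {"store": "A", "items": ["bread", "Milk"]},
    {"store": "B", "items": ["tea"]},
    {"store": "A", "items": ["milk", "eggs", None]},
    {"store": "A", "items": ["bread", "milk", "eggs"]},
]


def select(records, store):
    """Selection: keep only the records relevant to the question."""
    return [r for r in records if r["store"] == store]


def preprocess(records):
    """Preprocessing: drop missing values and normalize item names."""
    return [[i.strip().lower() for i in r["items"] if i] for r in records]


def transform(baskets):
    """Transformation: represent each basket as a set, ready for mining."""
    return [frozenset(b) for b in baskets]


def mine(baskets, min_support):
    """Data mining: keep item pairs whose support meets the threshold."""
    counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def evaluate(patterns):
    """Evaluation: rank patterns by support so an expert can review them."""
    return sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))


def kdd(records, store="A", min_support=0.5):
    """Run the five steps in order on the toy records."""
    selected = select(records, store)
    cleaned = preprocess(selected)
    baskets = transform(cleaned)
    return evaluate(mine(baskets, min_support))


if __name__ == "__main__":
    for pair, support in kdd(RAW):
        print(pair, support)
```

In practice the process is iterative: an analyst who finds the ranked patterns unhelpful would return to an earlier step, for example by changing the selection or the support threshold, and run the pipeline again.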

Answer 2

Traditional data analysis techniques have struggled to meet the challenges posed by big

data applications. The sheer volume, velocity, and variety of big data present practical

difficulties for conventional methods due to their limited scalability and adaptability. Addressing

these challenges requires ongoing research and the development of innovative algorithms, techniques,

and methodologies (Piatetsky-Shapiro et al., 2006). It also calls for collaboration between data

scientists, domain experts, and stakeholders to effectively utilize data mining results in real-world

applications. This proactive approach combines advances in handling scalability, high

dimensionality, distributed data ownership, and non-traditional analysis to tackle the

complexities of big data analysis.

Heterogeneous and complex data refers to datasets that exhibit diverse attributes and

intricate structures, posing challenges for traditional data analysis methods designed for

homogeneous datasets. As data mining becomes increasingly prevalent in fields like business,

science, and medicine, there is a growing demand for techniques capable of effectively handling

these complexities (Tan et al., 2020). Such data includes web and social media data containing

text, hyperlinks, images, audio, and videos; DNA data with sequential and three-dimensional

structures; and climate data with varying measurements across time and locations. Analyzing and

extracting valuable insights from these data types necessitates specialized techniques that

account for relationships, such as graph connectivity and parent-child connections. Ongoing

research and development in this area drive advancements in data mining, enabling us to unlock

the hidden knowledge within heterogeneous and complex datasets and enhance decision-making

across various domains (Piatetsky-Shapiro et al., 2006).
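As a small illustration of what accounting for graph connectivity can mean, the sketch below treats hyperlinks between web pages as a directed graph and finds every page reachable from a starting page. The page names and the link structure are hypothetical, and breadth-first search is only one of many possible techniques, chosen here for brevity.

```python
from collections import deque

# Hypothetical hyperlink graph: each page maps to the pages it links to.
LINKS = {
    "home": ["news", "about"],
    "news": ["story"],
    "about": [],
    "story": ["home"],
    "orphan": [],
}


def reachable(graph, start):
    """Breadth-first search: return every page reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    print(sorted(reachable(LINKS, "home")))
```

A record-by-record analysis would treat each page in isolation; the traversal instead uses the links between pages, which is the kind of relationship a flat table of independent rows cannot express.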

Answer 3

Data mining has transitioned from an intermediate process in the KDD framework to an

independent academic field. Originating from workshops in the late 1980s, it has grown into

conferences attended by researchers and industry professionals, fueling its development (Tan et

al., 2020). Data mining encompasses data preprocessing, mining, and postprocessing, drawing

upon methodologies from various disciplines such as statistics, AI, pattern recognition, and

machine learning. It also incorporates ideas from optimization, information theory, and other

areas to address the challenges posed by big data. Supporting areas like database systems, high-

performance computing, and distributed techniques play crucial roles. The relationship of data

mining to other fields such as statistics, AI, machine learning, and pattern recognition

showcases its interdisciplinary connections and its ability to handle knowledge extraction from

large and complex datasets.

Data mining integrates statistics, artificial intelligence (AI), machine learning (ML), and

pattern recognition to extract valuable insights from data. It incorporates statistical techniques for

data analysis, using methods like hypothesis testing and regression analysis to evaluate the

significance of discovered patterns (Tan et al., 2020). Drawing on AI, data mining also utilizes

techniques such as knowledge representation and reasoning to develop intelligent algorithms.

ML algorithms are crucial in data mining, facilitating pattern identification and prediction

through classification, regression, clustering, and anomaly detection. Moreover, pattern

recognition techniques like neural networks and decision trees are employed to uncover

meaningful patterns. Combining these components makes data mining a powerful tool for

extracting knowledge and making informed decisions across various domains.

Answer 4

Data mining tasks can be broadly categorized into two main types: predictive tasks and

descriptive tasks. Predictive tasks aim to predict the value of a specific target variable based on

the values of other, independent variables (Mukhopadhyay et al., 2014). These tasks involve building models

that can forecast or classify future instances. Classification tasks are used when the target

variable is discrete, while regression tasks are employed for continuous target variables.

Examples of predictive tasks include predicting customer behavior, forecasting stock prices, or

diagnosing diseases based on medical test results. On the other hand, descriptive tasks focus on

uncovering patterns, relationships, clusters, anomalies, and trends within the data. They provide

insights into the underlying characteristics and summarize the relationships present in the

dataset. Descriptive tasks are often exploratory and may employ clustering, association rule

mining, or anomaly detection techniques. These tasks typically require postprocessing techniques

to validate and explain the discovered patterns. Predictive tasks are centered around accurate

predictions, while descriptive tasks aim to summarize and understand the data's intrinsic

properties and relationships (Mukhopadhyay et al., 2014).
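The distinction above can be made concrete with a minimal sketch. Everything here is an illustrative assumption rather than a method taken from the cited sources: the toy data, the function names, and the specific techniques (a one-nearest-neighbor classifier for a discrete target, a least-squares line for a continuous target, and a basic two-group k-means loop for the descriptive case).

```python
from statistics import mean


def classify_1nn(train, x):
    """Predictive, discrete target: return the label of the nearest example."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]


def fit_line(points):
    """Predictive, continuous target: least-squares slope and intercept."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in points) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def cluster_two_groups(values, iters=10):
    """Descriptive: split unlabeled values into two groups (basic k-means).

    Assumes the values are not all identical, so both groups stay non-empty.
    """
    low, high = min(values), max(values)  # initial cluster centers
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for v in values:
            if abs(v - high) < abs(v - low):
                groups[1].append(v)
            else:
                groups[0].append(v)
        low, high = mean(groups[0]), mean(groups[1])
    return sorted(groups[0]), sorted(groups[1])


if __name__ == "__main__":
    # Classification: hypothetical (usage, risk label) training pairs.
    train = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
    print(classify_1nn(train, 7.2))

    # Regression: hypothetical (x, y) observations.
    print(fit_line([(1, 2), (2, 4), (3, 6)]))

    # Clustering: hypothetical customer spend values with no labels.
    print(cluster_two_groups([12, 15, 14, 80, 85, 90]))
```

The first two functions need a known target value for each training example, which is what makes them predictive. The clustering function receives no target at all and only summarizes structure already present in the values.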

Predictive tasks in data mining, such as forecasting and classification, are essential for

businesses. They enable accurate predictions and classifications, aiding sales forecasting,

customer segmentation, fraud detection, and risk assessment. The insights gained from predictive

tasks support decision-making, strategic planning, and proactive actions, allowing organizations

to optimize resources, prevent issues, and make informed choices based on anticipated outcomes

(Tan et al., 2020). Descriptive tasks in data mining are essential for gaining insights into patterns,

trends, and relationships within the dataset, facilitating understanding, exploration,

summarization, and validation/explanation. They aid in understanding the data's characteristics,

uncovering hidden patterns, summarizing key findings, and validating the results obtained from

data mining techniques. Descriptive tasks are vital in enhancing the understanding and

interpretation of data, facilitating effective communication, and increasing confidence in the

derived insights (Mukhopadhyay et al., 2014).

In conclusion, predictive and descriptive tasks play crucial roles in data mining, offering

distinct yet complementary contributions. Predictive tasks allow organizations to forecast

future outcomes, make accurate classifications, support decision-making processes, and take

proactive actions. On the other hand, descriptive tasks help analysts comprehensively understand

the data, uncover hidden patterns, summarize key findings, and validate the results obtained.

Together, these tasks provide a comprehensive approach to extracting valuable insights, enabling

informed decision-making, and unlocking the potential of data assets across various domains and

industries. By leveraging the power of both predictive and descriptive tasks, organizations can

harness the full potential of their data and gain a competitive edge in today's data-driven world.

References

Mukhopadhyay, A., Maulik, U., Bandyopadhyay, S., & Coello, C. A. (2014). A survey of

multiobjective evolutionary algorithms for data mining: Part I. IEEE Transactions on

Evolutionary Computation, 18(1), 4–19. https://doi.org/10.1109/tevc.2013.2290086

Piatetsky-Shapiro, G., Djeraba, C., Getoor, L., Grossman, R., Feldman, R., & Zaki, M. (2006).

What are the grand challenges for data mining? ACM SIGKDD Explorations Newsletter,

8(2), 70–77. https://doi.org/10.1145/1233321.1233330

Saghari, A., Budinská, I., Hosseinimehr, M., & Rahmani, S. (2023). A robust-reliable decision-

making methodology based on a combination of stakeholders’ preferences simulation and

KDD techniques for selecting automotive platform benchmark. Symmetry, 15(3), 750.

https://doi.org/10.3390/sym15030750

Tan, P.-N., Steinbach, M., & Kumar, V. (2020). Introduction to data mining. Pearson Education.
