DATA MINING LECTURE NO 1

Lecture Notes: Data Mining — On What Kinds of Data?

Data mining can be applied to various types of data, depending on the domain, the structure of
the data, and the specific needs of the application. These data types include traditional
database data as well as more advanced and complex data sources. Below is a breakdown of
the different kinds of data that are commonly used in data mining.

1. Database-Oriented Data Sets and Applications

Data mining is often used with traditional database systems. These include:

 Relational Databases (RDBMS):

o Stores data in structured formats with rows and columns.

o Example: A customer database where each row represents a customer, and
columns represent attributes (e.g., name, age, purchase history).

 Data Warehouses:

o A large, centralized repository of data, often integrated from multiple sources,
optimized for query and analysis.

o Example: An enterprise-wide data warehouse that consolidates sales,
marketing, and customer support data for decision-making.

 Transactional Databases:

o Data associated with transactions, typically recording events like sales or
orders.

o Example: A retail system that logs each sale or transaction made in a store.
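
As a rough illustration of the difference between relational rows and transaction records, the sketch below models both as plain Python structures. All names and values (customers, transactions, purchases_per_customer, item_counts) are made up for this example and do not come from the lecture.

```python
from collections import Counter

# Relational view: each row is a customer, each key is an attribute (column).
# Hypothetical sample data.
customers = [
    {"id": 1, "name": "Ana", "age": 34},
    {"id": 2, "name": "Bilal", "age": 27},
]

# Transactional view: each record is one sale and holds a set of items.
transactions = [
    {"tid": 100, "customer_id": 1, "items": {"bread", "milk"}},
    {"tid": 101, "customer_id": 2, "items": {"milk"}},
    {"tid": 102, "customer_id": 1, "items": {"bread", "eggs"}},
]


def purchases_per_customer(rows):
    """Count how many transactions each customer made."""
    return Counter(row["customer_id"] for row in rows)


def item_counts(rows):
    """Count in how many transactions each item appears."""
    counts = Counter()
    for row in rows:
        counts.update(row["items"])
    return counts


per_customer = purchases_per_customer(transactions)
items = item_counts(transactions)
```

The customer_id field links each transaction back to a row in the customer table, which is the kind of join a mining task would typically rely on.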

2. Advanced Data Sets and Applications

Modern applications require data mining on more complex and non-traditional types of data.
These include:

 Data Streams and Sensor Data:

o Data Streams: Continuous flows of data (e.g., real-time stock prices, social
media feeds).

o Sensor Data: Data collected from sensors in environments like healthcare, IoT
devices, or environmental monitoring.

o Example: Real-time traffic data collected from sensors placed on roads for
intelligent transportation systems.

 Time-Series Data:

o Data points indexed in time order, often used for forecasting or trend analysis.

o Example: Stock prices, sales figures recorded over time.


 Temporal Data:

o Data that includes time-related information (e.g., timestamped events).

o Example: Patient health records that track changes over time.

 Sequence Data:

o Data where the sequence or order of the elements is important.

o Bio-sequences: DNA or protein sequences where the order of nucleotides or
amino acids matters.

o Example: Web clickstreams, where the sequence of user actions on a website is
analyzed.
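
Two of the data types above can be sketched in a few lines of Python: a simple moving average for time-series data, and counts of consecutive page pairs for clickstream sequences. The function names and sample values are assumptions chosen for illustration, not part of the lecture.

```python
from collections import Counter


def moving_average(values, window):
    """Simple moving average over a time-ordered list of numbers."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def transition_counts(sessions):
    """Count consecutive page pairs across clickstream sessions."""
    counts = Counter()
    for pages in sessions:
        # zip pairs each page with the page visited right after it.
        counts.update(zip(pages, pages[1:]))
    return counts


# Hypothetical daily sales figures, oldest first.
daily_sales = [10, 12, 14, 16, 18]
smoothed = moving_average(daily_sales, window=3)

# Hypothetical clickstream sessions; order within a session matters.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart"],
    ["home", "search", "product"],
]
transitions = transition_counts(sessions)
```

Both functions depend on the order of their input, which is what separates time-series and sequence data from an unordered table of records.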

3. Structured Data

 Graphs and Social Networks:

o Graph Data: Data represented as nodes and edges (e.g., road networks, citation
networks).

o Social Networks: Data representing social interactions and relationships (e.g.,
Facebook friends, Twitter followers).

o Example: Analyzing a social network to discover influencer nodes or predict new
connections between users.

 Multi-Linked Data:

o Data with complex interrelations between entities (e.g., a graph with multiple
types of nodes and links).

o Example: A company’s organizational chart, where employees and departments
are interconnected.
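
A minimal sketch of the social-network example, assuming an undirected friendship graph stored as an adjacency map. Node degree stands in for "influence" and shared contacts stand in for link prediction; both are deliberately simple proxies, and every name here is hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def most_connected(graph):
    """Return the node with the highest degree (ties broken alphabetically)."""
    return max(sorted(graph), key=lambda node: len(graph[node]))


def common_neighbors(graph, a, b):
    """Shared contacts of a and b: a simple signal for predicting a new link."""
    return graph[a] & graph[b]


# Hypothetical friendship edges.
edges = [
    ("ali", "sara"),
    ("ali", "omar"),
    ("ali", "zoe"),
    ("sara", "omar"),
    ("zoe", "lena"),
]
graph = build_graph(edges)
influencer = most_connected(graph)
shared = common_neighbors(graph, "sara", "zoe")
```

Real analyses usually rely on richer measures than raw degree, but the data structure (nodes plus edges) is the same.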

4. Advanced Database Systems

 Object-Relational Databases:

o A hybrid database that supports both object-oriented and relational features,
allowing storage of complex data types.

o Example: A medical database where patient information is stored as objects
with attributes like medical history, prescriptions, and diagnoses.

 Heterogeneous Databases:

o Databases that combine data from different sources and formats, which may
not follow a uniform structure.

o Example: A company using different databases for finance, human resources,
and sales, with varying structures that need to be integrated.

 Legacy Databases:

o Older databases that are still in use, often requiring specialized techniques to
mine due to outdated formats.

o Example: Mining data from an outdated customer database that still stores
information in flat files or legacy systems.

5. Spatial and Spatiotemporal Data

 Spatial Data:

o Data representing the location, shape, and relationships of geographic features.

o Example: Geographic Information Systems (GIS) used to analyze land use, urban
planning, or environmental conditions.

 Spatiotemporal Data:

o Combines spatial and temporal information, analyzing changes in data over time
and space.

o Example: Tracking and predicting hurricane paths over time using satellite and
weather data.

6. Multimedia Databases

 Multimedia Data:

o Data in the form of images, videos, audio, and other multimedia formats.

o Example: A video database that stores movie clips for streaming services,
requiring mining for recommendations or trend analysis.

7. Text Databases

 Text Databases:

o Unstructured or semi-structured data stored as text.

o Example: Mining customer reviews from an e-commerce site to understand
sentiment and identify common themes.
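
One very simple way to surface "common themes" in review text is to count frequent words after removing stopwords. The sketch below assumes a tiny hand-written stopword list and made-up reviews; it is a term-frequency illustration only and does not perform sentiment analysis.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "and", "is", "it", "was", "but", "very"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def common_terms(reviews, top_n):
    """Return the top_n most frequent non-stopword terms with their counts."""
    counts = Counter(
        word
        for review in reviews
        for word in tokenize(review)
        if word not in STOPWORDS
    )
    return counts.most_common(top_n)


# Hypothetical customer reviews.
reviews = [
    "The battery is great and the screen is great",
    "Battery died fast, but the screen was great",
    "Great price",
]
top_terms = common_terms(reviews, top_n=3)
```

Because text is unstructured, most of the work happens in turning it into countable units (tokenizing, filtering) before any pattern can be mined.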

8. The World-Wide Web

 Web Data:

o The vast amount of data available on the internet, including web pages, social
media, and web traffic logs.

o Example: Web mining for analyzing search engine logs, user behavior on
websites, or web content analysis (e.g., topic extraction, sentiment analysis).

Summary

Data mining can be applied to a wide variety of data sources, from traditional relational
databases to advanced and complex data such as multimedia, spatiotemporal data, and web
data. Each type of data presents unique challenges and requires specialized techniques to
extract valuable insights. As data continues to evolve in both scale and complexity, data mining
methodologies continue to adapt to handle these diverse data sets effectively.

Chapter 1: Introduction to Data Mining – Summary for Exam Review

1. Why Data Mining?

 Data Explosion: The rapid growth from terabytes to petabytes of data due to digital
transactions, sensors, the web, and social media.

 Need for Knowledge: Although we have an abundance of data, there's a lack of
valuable insights, creating a need for data mining.

 Applications: Used in business (e-commerce, stocks), science (bioinformatics, remote
sensing), and society (news, videos).

2. What is Data Mining?

 Definition: Data mining is the process of discovering interesting, non-trivial, previously
unknown, and useful patterns from large datasets.

 Synonyms: Also called Knowledge Discovery in Databases (KDD), data/pattern
analysis, information harvesting, and business intelligence.

 Not Data Mining: Simple search and query processing, and deductive expert
systems, are not considered data mining.

3. Knowledge Discovery Process (KDD)


 Steps:

1. Data Cleaning: Remove noise and inconsistencies.

2. Data Integration: Combine data from multiple sources.

3. Data Selection: Extract relevant data for analysis.

4. Data Mining: Apply algorithms to find patterns.

5. Pattern Evaluation: Identify truly interesting patterns.

6. Knowledge Presentation: Visualize and report findings.
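
The six steps can be sketched as a chain of small functions. This is a toy pipeline under stated assumptions: the record fields (customer, amount), the thresholds, and the "pattern" mined (purchase counts per customer) are all hypothetical stand-ins for whatever a real project would use.

```python
from collections import Counter


def clean(records):
    """Step 1: drop records with a missing customer or amount."""
    return [r for r in records if r.get("customer") and r.get("amount") is not None]


def integrate(*sources):
    """Step 2: combine records from several sources into one list."""
    return [r for source in sources for r in source]


def select(records, min_amount):
    """Step 3: keep only the records relevant to the analysis."""
    return [r for r in records if r["amount"] >= min_amount]


def mine(records):
    """Step 4: a trivial pattern, the number of purchases per customer."""
    return Counter(r["customer"] for r in records)


def evaluate(patterns, min_count):
    """Step 5: keep patterns that pass an interestingness threshold."""
    return {k: v for k, v in patterns.items() if v >= min_count}


def present(patterns):
    """Step 6: format findings as report lines, most frequent first."""
    ordered = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{name}: {count} purchases" for name, count in ordered]


# Hypothetical records from two sources.
store = [
    {"customer": "ana", "amount": 30},
    {"customer": None, "amount": 12},
    {"customer": "bo", "amount": 5},
]
web = [
    {"customer": "ana", "amount": 45},
    {"customer": "bo", "amount": 50},
    {"customer": "ana", "amount": None},
]

combined = integrate(clean(store), clean(web))
relevant = select(combined, min_amount=10)
report = present(evaluate(mine(relevant), min_count=2))
```

Each step takes the previous step's output as its input, which mirrors the order of the list above; in practice the process is often iterative rather than strictly linear.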

4. Data Mining in Business Intelligence

 Flow: Data sources → Data preprocessing → Data mining → Information discovery →
Decision-making support.

 Roles: From end users to business analysts and data analysts, data mining supports
decision-making processes.

5. Data Mining: On What Kinds of Data?

 Traditional Data:

o Relational databases: Structured, tabular data.

o Data warehouses: Integrated, large data repositories.

o Transactional databases: Records of transactions.

 Advanced Data:

o Data streams: Real-time, continuous data.

o Sensor data: Collected from environmental sensors.

o Time-series: Sequential data over time.

o Spatial and spatiotemporal data: Data with spatial and temporal dimensions
(e.g., GIS).

o Text and web data: Unstructured or semi-structured text and web content.

o Graphs and social networks: Data with nodes and links (e.g., social media).

o Multimedia: Images, audio, and video content.

6. Types of Patterns Mined

 Characterization: Summarize data.

 Discrimination: Contrast different data groups.


 Association: Discover relationships between variables (e.g., market basket analysis).

 Classification: Assign data into predefined categories.

 Clustering: Group data based on similarity.

 Outlier Analysis: Detect unusual data points.

 Trend and Deviation: Analyze changes over time.
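
For the association pattern, two standard measures are support (how often an itemset occurs) and confidence (how often the consequent occurs when the antecedent does). The sketch below computes both over a handful of made-up market baskets; the function names and data are assumptions for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

bread_milk_support = support(baskets, {"bread", "milk"})
bread_to_milk = confidence(baskets, {"bread"}, {"milk"})
```

A rule such as "bread implies milk" is usually reported only when both values exceed user-chosen minimum thresholds.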

7. Techniques Used

 Machine Learning: Automated learning from data.

 Statistics: Analyzing and summarizing data.

 OLAP: Online Analytical Processing for multidimensional analysis.

 Pattern Recognition: Identifying regularities in data.

 High-Performance Computing: For handling large-scale data.

8. Applications of Data Mining

 Retail: Customer behavior, personalized marketing.

 Telecommunications: Network optimization, churn prediction.

 Banking: Fraud detection, risk management.

 Healthcare: Medical diagnosis, patient data analysis.

 Web and Text Mining: Analyzing online user behavior, search engine optimization.

9. Major Issues in Data Mining

 Scalability: Algorithms must handle large datasets.

 Data Diversity: Mining different types of data (structured, unstructured, multimedia).

 Privacy: Protecting sensitive data while mining.

 Noise and Incomplete Data: Managing uncertainty in data.

 Interactive Mining: Involving users to refine mining results.

10. History and Society of Data Mining

 Key Events:

o 1989: IJCAI Workshop on Knowledge Discovery in Databases.


o 1995-1998: First International Conferences on Knowledge Discovery and Data
Mining (KDD).

o KDD Conferences: Annual ACM SIGKDD conferences focused on the latest
advancements in data mining research.

This summary outlines the core concepts, applications, and challenges in data mining as
presented in Chapter 1 of the text. These notes are concise but comprehensive enough to
provide a full understanding of the foundational topics in data mining for exam preparation.
