Data Mining Syllabus Overview

The document covers various chapters on data mining, including numerical problems and previous year questions related to data preprocessing, mining frequent patterns, classification, clustering, and web mining. Each chapter provides practical exercises and theoretical questions to enhance understanding of data mining concepts and techniques. Key topics include data normalization, classification metrics, clustering algorithms, and web usage mining.

Uploaded by story.legandery

Chapter 1: Introduction to Data Mining

Numerical Problems:

1. Suppose a dataset contains 10,000 records, and each record has 20 attributes. Estimate
the storage required if each attribute takes 4 bytes.

2. If a data mining system can analyze 500 records per second, estimate how long it will
take to process 2 million records.

3. A company uses data mining to classify customer transactions. Given that 10% of the
transactions are fraudulent, estimate the number of fraudulent transactions from a
dataset of 50,000 transactions.
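
The three estimates above are single multiplications or divisions. A minimal Python sketch of the arithmetic follows; the function names are illustrative only, and the storage figure assumes no per-record overhead (indexes, headers, padding), which the problem does not mention.

```python
def storage_bytes(records, attributes, bytes_per_attribute):
    """Raw storage: every record stores every attribute at a fixed width."""
    return records * attributes * bytes_per_attribute


def processing_seconds(records, records_per_second):
    """Time to scan all records at a constant throughput."""
    return records / records_per_second


def expected_count(total, fraction):
    """Expected number of items in a category given its share of the total."""
    return round(total * fraction)


storage = storage_bytes(10_000, 20, 4)          # 800,000 bytes
seconds = processing_seconds(2_000_000, 500)    # 4,000 seconds
fraud = expected_count(50_000, 0.10)            # 5,000 transactions

print(f"Storage: {storage} bytes (about {storage / 1000:.0f} kB)")
print(f"Processing time: {seconds:.0f} s (about {seconds / 60:.1f} min)")
print(f"Expected fraudulent transactions: {fraud}")
```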

Previous Year Questions:

1. Define data mining and explain its different functionalities.

2. Discuss the major issues in data mining with examples.

3. Explain the Knowledge Discovery in Databases (KDD) process.

4. How does a data warehouse support data mining?

Chapter 2: Data Preprocessing

Numerical Problems:

1. Given the dataset: (10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

o Normalize the dataset using Min-Max Normalization between [0,1].

o Perform Z-score normalization using mean and standard deviation.

2. Suppose a dataset contains missing values. If 20% of a 1,000-record dataset has missing
values, how many records need imputation?

3. Given a dataset with 5 categorical attributes, each having 4 possible values, calculate the
number of unique possible records.
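
The problems above can be checked with a short Python sketch using only the standard library. One assumption is made here: the Z-score uses the population standard deviation (`statistics.pstdev`); if the course expects the sample standard deviation, swap in `statistics.stdev` and the scores shrink slightly.

```python
import statistics

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def z_scores(values):
    """Standardize values: subtract the mean, divide by the population std deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population, not sample, deviation
    return [(v - mean) / std for v in values]


scaled = min_max(data)   # 10 -> 0.0, 100 -> 1.0
standardized = z_scores(data)  # mean 55, population std = sqrt(825), about 28.72

records_to_impute = 1_000 * 20 // 100   # 20% of 1,000 records = 200
unique_records = 4 ** 5                 # 5 attributes, 4 values each = 1,024

print([round(v, 3) for v in scaled])
print([round(v, 3) for v in standardized])
print(records_to_impute, unique_records)
```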


Previous Year Questions:

1. What are the different techniques for handling missing values in a dataset?

2. Explain data discretization and concept hierarchy generation with examples.

3. How is dimensionality reduction helpful in data mining? Explain CUR decomposition.


4. Describe feature selection and feature transformation.

Chapter 3: Mining Frequent Patterns, Associations, and Correlations

Numerical Problems:

1. Given the transactions:

T1: {Milk, Bread, Butter}

T2: {Milk, Bread}

T3: {Milk, Butter}

T4: {Bread, Butter}

o Calculate the support and confidence for the rule {Milk} → {Bread}.

o Find frequent itemsets using the Apriori algorithm with a minimum support of 50%.

2. Suppose you have the following association rules and confidence values:

{A, B} → {C} (Confidence = 70%)

{A} → {C} (Confidence = 50%)

o Compute the Lift Ratio and analyze the association rule strength.
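
A minimal Python sketch of problem 1 follows, assuming the usual definitions: support is the fraction of transactions containing an itemset, confidence of X → Y is support(X ∪ Y) / support(X), and lift is confidence / support(Y). The `apriori` function is a simplified level-wise search written for this example, not a library API. Note that lift needs the support of the consequent, which problem 2 does not list, so the sketch demonstrates lift on the transactions from problem 1 instead.

```python
from fractions import Fraction
from itertools import combinations

transactions = [
    {"Milk", "Bread", "Butter"},  # T1
    {"Milk", "Bread"},            # T2
    {"Milk", "Butter"},           # T3
    {"Bread", "Butter"},          # T4
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent):
    """support(X union Y) / support(X) for the rule X -> Y."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """confidence(X -> Y) / support(Y); values below 1 indicate negative association."""
    return confidence(antecedent, consequent) / support(consequent)


def apriori(min_support):
    """Level-wise search: build size-k candidates from frequent (k-1)-itemsets."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if support([i]) >= min_support]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: keep a candidate only if all its (k-1)-subsets are frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= min_support
        ]
    return frequent


print("support({Milk, Bread}) =", support({"Milk", "Bread"}))          # 1/2
print("confidence(Milk -> Bread) =", confidence({"Milk"}, {"Bread"}))  # 2/3
print("lift(Milk -> Bread) =", lift({"Milk"}, {"Bread"}))              # 8/9
for itemset, sup in apriori(Fraction(1, 2)).items():
    print(sorted(itemset), sup)
```

With a 50% threshold, all three single items and all three pairs are frequent; the triple {Milk, Bread, Butter} appears only in T1 (support 1/4) and is dropped.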

Previous Year Questions:

1. What are the major steps in the Apriori algorithm? Explain with an example.

2. Discuss how association rule mining differs from correlation analysis.

3. Explain different types of association rule techniques and their applications.

4. What are the methods for measuring the quality of association rules?

Chapter 4: Classification and Prediction


Numerical Problems:

1. A dataset has the following classification results for a binary classifier:

True Positives (TP) = 40, False Positives (FP) = 10

True Negatives (TN) = 30, False Negatives (FN) = 20

o Calculate Accuracy, Precision, Recall, and F1-score.

2. Given the following dataset:

Age: (25, 30, 45, 50, 60)

Salary: (40k, 50k, 80k, 90k, 100k)

o Fit a linear regression model to predict salary based on age.

o Estimate the salary for an employee aged 40.

3. A company uses a decision tree-based classifier. The training dataset has 100 records
with 4 attributes. If each attribute can take 3 different values, calculate the number of
possible attribute-value combinations.
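
The three problems above can be worked through with a short Python sketch. It assumes the standard definitions of the metrics and an ordinary least-squares fit of salary (in thousands) on age; `fit_line` is a helper written for this example rather than a library function.

```python
# Problem 1: confusion-matrix counts.
TP, FP, TN, FN = 40, 10, 30, 20

accuracy = (TP + TN) / (TP + FP + TN + FN)          # 70 / 100 = 0.70
precision = TP / (TP + FP)                          # 40 / 50  = 0.80
recall = TP / (TP + FN)                             # 40 / 60, about 0.667
f1 = 2 * precision * recall / (precision + recall)  # 8 / 11, about 0.727

# Problem 2: simple linear regression, salary in thousands.
ages = [25, 30, 45, 50, 60]
salaries_k = [40, 50, 80, 90, 100]


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ages, salaries_k)   # slope = 1480 / 830, about 1.783
salary_at_40 = intercept + slope * 40           # about 68.43 (thousand)

# Problem 3: 4 attributes with 3 possible values each.
value_combinations = 3 ** 4                     # 81

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
print(f"salary = {intercept:.3f} + {slope:.3f} * age; at age 40: {salary_at_40:.2f}k")
print("attribute-value combinations:", value_combinations)
```

The count in problem 3 depends only on the attributes and their value ranges; the 100 training records do not enter the calculation.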

Previous Year Questions:

1. Explain the Decision Tree algorithm with an example.

2. Differentiate between classification and prediction.

3. What are the different performance evaluation techniques used for classification?

4. Explain the role of logistic regression in prediction problems.

Chapter 5: Cluster Analysis

Numerical Problems:

1. Apply the K-Means clustering algorithm to the dataset:

(2,2), (3,3), (8,8), (9,9)

o Assume K=2 and initial centroids as (2,2) and (8,8).

o Perform one iteration of K-Means and update the centroids.

2. A dataset has 200 points and is divided into 5 clusters. Compute the intra-cluster
distance if the average distance within each cluster is 2.5.

3. A hierarchical clustering algorithm merges two clusters with distances D(A, B) = 5 and
D(B, C) = 7. Compute the new distance using:

o Single linkage method

o Complete linkage method
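
A minimal Python sketch of problems 1 and 3 follows. The K-Means step assumes Euclidean distance. For the linkage part, the problem does not say which clusters are merged, so the sketch simply treats the two listed values as the pairwise distances to combine; that reading is an assumption. The function names are illustrative.

```python
import math

points = [(2, 2), (3, 3), (8, 8), (9, 9)]
initial_centroids = [(2, 2), (8, 8)]


def kmeans_step(points, centroids):
    """One K-Means iteration: assign each point to its nearest centroid, then re-average."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    new_centroids = []
    for members, old in zip(clusters, centroids):
        if members:
            # Mean of each coordinate across the cluster's members.
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)  # keep an empty cluster's centroid unchanged
    return clusters, new_centroids


clusters, new_centroids = kmeans_step(points, initial_centroids)
print(clusters)       # [[(2, 2), (3, 3)], [(8, 8), (9, 9)]]
print(new_centroids)  # [(2.5, 2.5), (8.5, 8.5)]


def linkage_distance(distances, method):
    """Combine pairwise distances: single linkage takes the min, complete the max."""
    if method == "single":
        return min(distances)
    if method == "complete":
        return max(distances)
    raise ValueError(f"unknown method: {method}")


given = [5, 7]  # assumption: the two distances listed in problem 3
print(linkage_distance(given, "single"))    # 5
print(linkage_distance(given, "complete"))  # 7
```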

Previous Year Questions:

1. Compare K-Means and Hierarchical clustering techniques.

2. What are the major issues in clustering high-dimensional data?

3. How does outlier detection affect clustering performance?

4. Explain agglomerative and divisive hierarchical clustering techniques.

Chapter 6: Web Mining and Other Data Mining Techniques

Numerical Problems:

1. Given a web log file with 500 entries, where 100 belong to a single user session,
calculate the session duration if the average time spent per entry is 2 minutes.

2. A website has the following clickstream data:

Page A → Page B (50 clicks)

Page B → Page C (30 clicks)

Page C → Page A (20 clicks)

o Compute the transition probability matrix for web usage mining.


3. Given a dataset of multimedia files, where images occupy 40% of the total storage,
videos 50%, and audio 10%, compute the space occupied by each media type if the total
dataset size is 1 TB.
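
The problems above can be sketched in Python as follows. The transition matrix is row-normalized, meaning each entry is the probability of the next page given the current page; this is the usual Markov-chain convention but is an assumption about what the question intends. The storage split assumes 1 TB = 1,000 GB. Names such as `transition_matrix` are illustrative.

```python
from collections import defaultdict

# Problem 1: one session of 100 entries at 2 minutes each.
session_minutes = 100 * 2   # 200 minutes

# Problem 2: observed page-to-page click counts.
clicks = {("A", "B"): 50, ("B", "C"): 30, ("C", "A"): 20}
pages = sorted({page for pair in clicks for page in pair})


def transition_matrix(clicks, pages):
    """Row-normalize click counts: entry [i][j] = P(next = pages[j] | current = pages[i])."""
    row_totals = defaultdict(int)
    for (src, _dst), count in clicks.items():
        row_totals[src] += count
    return [
        [clicks.get((src, dst), 0) / row_totals[src] if row_totals[src] else 0.0
         for dst in pages]
        for src in pages
    ]


matrix = transition_matrix(clicks, pages)
for src, row in zip(pages, matrix):
    print(src, row)

# Problem 3: split 1 TB by media share (assumption: 1 TB = 1,000 GB).
TOTAL_GB = 1_000
shares_percent = {"images": 40, "videos": 50, "audio": 10}
space_gb = {kind: TOTAL_GB * pct // 100 for kind, pct in shares_percent.items()}
print(space_gb)   # {'images': 400, 'videos': 500, 'audio': 100}
```

Because each page has only one observed outgoing link, every row of the row-normalized matrix contains a single 1. Dividing each count by the 100 total clicks instead gives the joint frequencies 0.5, 0.3, and 0.2.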

Previous Year Questions:

1. What is web usage mining? Explain its applications.

2. Discuss various types of web mining and their importance.

3. Explain how spatial and temporal data mining differ from traditional data mining
techniques.

4. What are the challenges in multimedia mining?
