Assignment 7

This document contains solutions to 10 questions about data analytics concepts like frequent itemset mining, association rule mining, and locality sensitive hashing. The questions cover calculating confidence and lift of association rules, identifying frequent itemsets given a transactional database, determining candidate and frequent itemsets of different sizes, counting association rules for a given frequent itemset, and properties of locality sensitive hashing functions.

Uploaded by

SivaMaroju

Assignment 7 (Sol.)
Introduction to Data Analytics
Prof. Nandan Sudarsanam & Prof. B. Ravindran

1. Let X, Y be two itemsets, and let supp(X) denote the support of itemset X. Then the
confidence of the rule X → Y, denoted by conf(X → Y), is

(a) $\frac{\text{supp}(X)}{\text{supp}(Y)}$
(b) $\frac{\text{supp}(Y)}{\text{supp}(X)}$
(c) $\frac{\text{supp}(X \cup Y)}{\text{supp}(X)}$
(d) $\frac{\text{supp}(X \cup Y)}{\text{supp}(Y)}$
(e) $\frac{\text{supp}(X \cap Y)}{\text{supp}(X)}$

Sol. (c)
Confidence measures the probability of seeing items in the consequent (RHS) of the rule given
that we have observed items in the antecedent (LHS) of the rule in a transaction.
2. In identifying frequent itemsets in a transactional database, we find the following to be the
frequent 3-itemsets: {B, D, E}, {C, E, F}, {B, C, D}, {A, B, E}, {D, E, F}, {A, C, F}, {A,
C, E}, {A, B, C}, {A, C, D}, {C, D, E}, {C, D, F}, {A, D, E}. Which among the following
4-itemsets can possibly be frequent?

(a) {A, B, C, D}
(b) {A, B, D, E}
(c) {A, C, E, F}
(d) {C, D, E, F}

Sol. (d)
By the apriori property, only itemset {C, D, E, F} can possibly be frequent since all of its
subsets of size 3 are listed as frequent. The other 4-itemsets cannot be frequent since not all
of their subsets of size 3 are frequent. For example, for the first option, the itemset {A, B, D}
is not frequent.
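
The subset check used above can be sketched in a few lines of Python. This is only an illustration of the apriori pruning test on the itemsets listed in the question; the helper name missing_subsets is an assumption made for this sketch.

```python
from itertools import combinations

# Frequent 3-itemsets exactly as listed in the question.
frequent_3 = {frozenset(s) for s in [
    "BDE", "CEF", "BCD", "ABE", "DEF", "ACF",
    "ACE", "ABC", "ACD", "CDE", "CDF", "ADE",
]}

def missing_subsets(candidate, frequent_k):
    """Return the subsets of size len(candidate) - 1 that are not frequent."""
    k = len(candidate) - 1
    return [set(sub) for sub in combinations(sorted(candidate), k)
            if frozenset(sub) not in frequent_k]

options = {"a": "ABCD", "b": "ABDE", "c": "ACEF", "d": "CDEF"}

# A 4-itemset can be frequent only if none of its 3-subsets is missing.
possibly_frequent = [label for label, items in options.items()
                     if not missing_subsets(items, frequent_3)]

for label, items in options.items():
    print(label, items, "missing:", missing_subsets(items, frequent_3))
print(possibly_frequent)
```

Passing this check is necessary but not sufficient: the actual support of {C, D, E, F} would still have to be counted against the database.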
3. Let X, Y be two itemsets, let supp(X) denote the support of itemset X, and let conf(X → Y)
denote the confidence of the rule X → Y. Then the lift of the rule, denoted by lift(X → Y), is

(a) $\frac{\text{supp}(X)}{\text{supp}(Y)}$
(b) $\frac{\text{supp}(X) \times \text{supp}(Y)}{\text{supp}(Y)}$
(c) $\frac{\text{supp}(X \cup Y)}{\text{supp}(X)}$
(d) $\frac{\text{supp}(X \cup Y)}{\text{supp}(X) \times \text{supp}(Y)}$
(e) $\frac{\text{supp}(X \cap Y)}{\text{supp}(X) \times \text{supp}(Y)}$

Sol. (d)
The lift of a rule can be thought of as the ratio of the observed support to the support that
would be expected if the antecedent and consequent were independent.
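
The relation between lift and confidence can be written out as a short worked equation. It assumes that supports are relative frequencies (fractions of transactions), which the question does not state explicitly.

```latex
\[
\operatorname{lift}(X \to Y)
  = \frac{\operatorname{conf}(X \to Y)}{\operatorname{supp}(Y)}
  = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}
\]
% Under independence, supp(X \cup Y) = supp(X) supp(Y), so the lift equals 1.
% A lift above 1 means X and Y co-occur more often than independence predicts.
```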
4. Consider the following transactional data.

Transaction ID    Items
1                 A, B, E
2                 B, D
3                 B, C
4                 A, B, D
5                 A, C
6                 B, C
7                 A, C
8                 A, B, C, E
9                 A, B, C

Assuming that the minimum support is 2, what is the number of frequent 2-itemsets (i.e.,
frequent itemsets of size 2)?

(a) 2
(b) 4
(c) 6
(d) 8

Sol. (c)
Candidate 1-itemsets:
itemset support
{A} 6
{B} 7
{C} 6
{D} 2
{E} 2

Frequent 1-itemsets:

itemset support
{A} 6
{B} 7
{C} 6
{D} 2
{E} 2

Candidate 2-itemsets:

itemset support
{A, B} 4
{A, C} 4
{A, D} 1
{A, E} 2
{B, C} 4
{B, D} 2
{B, E} 2
{C, D} 0
{C, E} 1
{D, E} 0

Frequent 2-itemsets:

itemset support
{A, B} 4
{A, C} 4
{A, E} 2
{B, C} 4
{B, D} 2
{B, E} 2
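
The counts in the tables above can be reproduced with a short Python sketch. It counts every pair directly instead of generating candidates first; this gives the same result here because all five single items are frequent. The variable names are chosen for this illustration.

```python
from collections import Counter
from itertools import combinations

# The nine transactions from the question.
transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]
MIN_SUPPORT = 2  # absolute support count, as in the question

# Count how many transactions contain each pair of items.
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

# Keep only the pairs that meet the minimum support.
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= MIN_SUPPORT}

for pair, count in sorted(frequent_pairs.items()):
    print(pair, count)
print(len(frequent_pairs))
```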

5. For the same data as above, what are the numbers of candidate 3-itemsets and frequent
3-itemsets, respectively?

(a) 1, 1
(b) 2, 2
(c) 2, 1
(d) 3, 2

Sol. (b)
Candidate 3-itemsets:

itemset support
{A, B, C} 2
{A, B, E} 2

Frequent 3-itemsets:

itemset support
{A, B, C} 2
{A, B, E} 2
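
The candidate generation behind this answer can be sketched as a join step followed by a prune step. The join below is deliberately loose (any two frequent 2-itemsets whose union has three items); after pruning it yields the same candidates as the usual apriori join. Names such as joined and candidates are assumptions of this sketch.

```python
from itertools import combinations

transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]
MIN_SUPPORT = 2

# Frequent 2-itemsets found in the previous question.
frequent_2 = {frozenset(p) for p in ["AB", "AC", "AE", "BC", "BD", "BE"]}

def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Join: combine two frequent 2-itemsets whenever their union has 3 items.
joined = {a | b for a, b in combinations(frequent_2, 2) if len(a | b) == 3}

# Prune: drop any 3-itemset that has an infrequent 2-subset.
candidates = {c for c in joined
              if all(frozenset(s) in frequent_2 for s in combinations(c, 2))}

# Count support only for the surviving candidates.
frequent_3 = {c for c in candidates if support(c) >= MIN_SUPPORT}

print(sorted("".join(sorted(c)) for c in candidates))
print(sorted("".join(sorted(c)) for c in frequent_3))
```

For example, {A, C, E} is produced by the join but pruned because {C, E} has support 1.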

6. Continuing with the same data, how many association rules can be derived from the frequent
itemset {A, B, E}? (Note: for a frequent itemset X, consider only rules of the form S → (X − S),
where S is a non-empty proper subset of X.)

(a) 3
(b) 6
(c) 7
(d) 8

Sol. (b)
{A} → {B, E}
{B} → {A, E}
{E} → {A, B}
{A, B} → {E}
{A, E} → {B}
{B, E} → {A}
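
The six rules listed above can also be enumerated programmatically. The sketch below assumes, as the listed rules do, that both sides of a rule must be non-empty, so an itemset of size k gives 2^k − 2 rules; the function name rules_from is hypothetical.

```python
from itertools import combinations

def rules_from(itemset):
    """List all rules S -> (X - S) with S a non-empty proper subset of X."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):  # sizes 1 .. k-1 keep both sides non-empty
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules

rules = rules_from({"A", "B", "E"})
for lhs, rhs in rules:
    print(set(lhs), "->", set(rhs))
print(len(rules))
```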
7. For the same frequent itemset as mentioned above, which among the following rules have a
minimum confidence of 60%?

(a) A ∧ B =⇒ E
(b) A ∧ E =⇒ B
(c) E =⇒ A ∧ B
(d) A =⇒ B ∧ E

Sol. (b), (c)

The confidence values for the above four rules are respectively, 2/4, 2/2, 2/2, and 2/6. Hence,
only rules in (b) and (c) have the minimum required confidence.
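
These confidence values can be checked against the transaction data with a small Python sketch. It uses exact fractions to avoid rounding; the helper names support and confidence are assumptions of this illustration.

```python
from fractions import Fraction

transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]

def support(items):
    """Number of transactions containing every item in `items`."""
    return sum(1 for t in transactions if set(items) <= t)

def confidence(lhs, rhs):
    """conf(lhs -> rhs) = supp(lhs ∪ rhs) / supp(lhs)."""
    return Fraction(support(lhs + rhs), support(lhs))

# The four rules from the question, written as (antecedent, consequent).
rules = {"a": ("AB", "E"), "b": ("AE", "B"), "c": ("E", "AB"), "d": ("A", "BE")}
conf = {label: confidence(lhs, rhs) for label, (lhs, rhs) in rules.items()}

MIN_CONF = Fraction(60, 100)
passing = [label for label, c in conf.items() if c >= MIN_CONF]

print(conf)
print(passing)
```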
8. Suppose we are given a large text document and the aim is to count the words of different
lengths, i.e., our output will be of the form - x words of length 1, y words of length 2, and
so on. Assuming a map-reduce approach to solving this problem, which among the following
key-value outputs would you prefer for the map phase? (Hint: consider the solution for the
reduce part asked in the next question as well to ensure a complete algorithm to solve the
problem.)

(a) key - word, value - length (of corresponding word)

(b) key - word, value - 1
(c) key - length (of corresponding word), value - word
(d) key - 1, value - word

Sol. (c)
Given a word, in the map phase we create a key-value pair, where the key is the length of the
word and the value is the word itself.
9. For the above question, what would be the appropriate processing action in the reduce phase?

(a) for each key which is a word, compute the sum of the values corresponding to this key
(b) for each key which is a number, compute the lengths of the words in the corresponding
list of values

(c) for each key which is a number, count the number of words in the corresponding list of
values

Sol. (c)
In the reduce phase, all words of the same size will be available in the same reduce node. Thus,
in each reduce node, counting the number of words in the list of values will give us the number
of words of a particular length observed in the original document.
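
The map and reduce steps from questions 8 and 9 can be simulated on a single machine as follows. This is a minimal sketch, not a distributed implementation: the shuffle function stands in for the grouping that a map-reduce framework would perform, and the sample text is made up for illustration.

```python
from collections import defaultdict

def map_phase(text):
    """Emit (key, value) = (length of word, word) for every word."""
    for word in text.split():
        yield len(word), word

def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """For each length, count the words in the corresponding list of values."""
    return {length: len(words) for length, words in groups.items()}

text = "a bb cc ddd a"  # hypothetical input document
counts = reduce_phase(shuffle(map_phase(text)))

for length in sorted(counts):
    print(counts[length], "words of length", length)
```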
10. Let d1 and d2 be two distances according to some distance measure d. A function f is said to
be (d1, d2, p1, p2)-sensitive if

(a) if d(x, y) ≤ d1, then the probability that f(x) = f(y) is at least p1
(b) if d(x, y) ≥ d2, then the probability that f(x) = f(y) is at most p2

where d(·, ·) is a distance function. Given such a (d1, d2, p1, p2)-sensitive function, a better
function (for use in locality sensitive hashing) would be one with

(a) an increased value of p1
(b) a decreased value of p1
(c) an increased value of p2
(d) a decreased value of p2

Sol. (a), (d)

Compared to the original function, if we can increase the value of p1, we can ensure that if two
points are close enough (d(x, y) ≤ d1), then the probability of a collision is higher. This is a
desirable property when performing locality sensitive hashing. Similarly, if p2 can be reduced
it would indicate that given some separation between two points (d(x, y) ≥ d2), the probability
of still observing a collision (which is undesirable) is reduced.
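
As a concrete illustration of the definition, consider bit sampling under Hamming distance, which is not part of the original question and is used here only as an assumed example. The hash function returns the bit at a uniformly random coordinate, so for binary vectors of length n the collision probability is 1 − d(x, y)/n, and the family is (d1, d2, 1 − d1/n, 1 − d2/n)-sensitive. The vectors below are made up.

```python
from fractions import Fraction

def hamming(x, y):
    """Number of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def collision_probability(x, y):
    """Exact Pr[f(x) == f(y)] when f returns a uniformly random coordinate."""
    n = len(x)
    agree = sum(1 for i in range(n) if x[i] == y[i])
    return Fraction(agree, n)

# Hypothetical 8-bit vectors: `near` is close to x, `far` is distant from x.
x    = (0, 1, 1, 0, 1, 0, 0, 1)
near = (0, 1, 1, 0, 1, 0, 0, 0)
far  = (1, 0, 0, 1, 0, 1, 0, 1)

print(hamming(x, near), collision_probability(x, near))
print(hamming(x, far), collision_probability(x, far))
```

The close pair collides with high probability and the distant pair with low probability, which is the gap between p1 and p2 that the answer asks to widen.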
