Data Mining Methods Basics

The document discusses various concepts in data mining, including:
- Data mining involves extracting useful unknown information from data to make knowledge-driven business decisions.
- Common data mining techniques include predictive modeling, decision trees, regression analysis, and clustering.
- Data preprocessing steps like detecting missing values and outliers are important for data analysis.
- Unsupervised learning, such as clustering, infers structure from unlabeled data.

Uploaded by

Lynch George
Copyright
© All Rights Reserved
Data mining methods basics

The process of extracting valid, useful, unknown information from data and using it
to make proactive, knowledge-driven business decisions is called -- data mining

Which of the following is not applicable to data mining? -- Involves working with
known information

What is the other name for the Data Preparation stage of the Knowledge Discovery
Process? -- ETL

Which of the following modelling types should be used for labelled data? --
Predictive Modelling

Which of the following activities is performed as part of data pre-processing? --
detect missing values
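
As a rough illustration of what "detect missing values" can mean in practice, the sketch below counts missing entries per field. The records, field names, and the choice to treat both None and NaN as missing are assumptions made for this example only.

```python
import math

# Hypothetical sample records; None and NaN both stand for "missing" here.
records = [
    {"age": 34, "income": 52000.0},
    {"age": None, "income": 61000.0},
    {"age": 29, "income": None},
    {"age": 41, "income": float("nan")},
]


def is_missing(value):
    """Treat None and floating-point NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_counts(rows):
    """Count the missing entries in each field across all rows."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            counts[field] = counts.get(field, 0) + (1 if is_missing(value) else 0)
    return counts


counts = missing_counts(records)
print(counts)
```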

Which of the following roles is responsible for performing validation on analysis
datasets? -- Statisticians

Noisy values are the values that are valid for the dataset, but are incorrectly
recorded -- true

What is the type of learning where a function is inferred to describe hidden
structure from unlabeled data? -- unsupervised

The probability of theft in an area is 0.03, with an expected loss of 20% or 30% of
belongings with probabilities 0.55 and 0.45. An insurance policy from A costs $150
per annum with 100% repayment. A policy from B costs $100 per annum, and the first
$500 of any loss has to be paid by the owner. Which data mining technique can be
used to choose the policy? -- decision tree
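
The question does not state the total value of the belongings, so the tree cannot be evaluated as given. As a minimal sketch, assuming a hypothetical value of $20,000 and adding a "no insurance" branch for comparison, the decision tree can be rolled back by expected annual cost:

```python
# Expected-cost rollback of a small decision tree (illustrative sketch).
# ASSUMPTION: total_value is hypothetical; the question does not state it.
# ASSUMPTION: the "none" branch (no insurance) is added only for comparison.

P_THEFT = 0.03
LOSS_FRACTIONS = [(0.55, 0.20), (0.45, 0.30)]  # (probability, fraction lost)


def expected_cost(premium, owner_share, total_value):
    """Premium plus the owner's expected out-of-pocket loss for one year."""
    expected_out_of_pocket = sum(
        p * owner_share(fraction * total_value) for p, fraction in LOSS_FRACTIONS
    )
    return premium + P_THEFT * expected_out_of_pocket


def compare_policies(total_value):
    return {
        "A": expected_cost(150, lambda loss: 0.0, total_value),  # full repayment
        "B": expected_cost(100, lambda loss: min(loss, 500.0), total_value),
        "none": expected_cost(0, lambda loss: loss, total_value),
    }


costs = compare_policies(total_value=20_000)
best = min(costs, key=costs.get)
print(costs, best)
```

Under the assumed value, policy B has the lowest expected cost. A different value of belongings, or a risk-averse owner who weighs large losses more heavily, could change the choice.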

The statistical technique used for investigating and modelling the relationship
between two or more variables is -- regression analysis
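
To make the idea concrete, here is a minimal sketch of simple linear regression by ordinary least squares, written with the standard library only. The data points are invented for illustration and lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```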

Which statistical technique deals with finding a structure in a collection of
unlabeled data? -- clustering
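
One common clustering method is k-means. The sketch below is a simplified one-dimensional version with fixed starting centroids and a fixed number of iterations; the points and starting centroids are assumptions chosen for illustration, not part of the quiz.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Simplified 1-D k-means: assign points to the nearest centroid, then update."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabeled, made-up points forming two visible groups.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```

No labels are supplied; the grouping comes only from the distances between points.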

If time is used as an independent variable in a simple linear regression analysis,
which of the following assumptions could be violated? -- Successive observations of
the dependent variable are uncorrelated
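
One standard way to check for correlated successive observations is the Durbin-Watson statistic computed on the regression residuals: values near 2 suggest no first-order autocorrelation, while values well below 2 suggest positive autocorrelation. The sketch below uses invented residual series, not data from the quiz.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals: a slow drift (positively autocorrelated) versus sign flipping.
trending = [1.0, 0.8, 0.6, 0.2, -0.3, -0.7, -1.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(dw_trending, dw_alternating)
```

The drifting series gives a value far below 2, which is the pattern that time-ordered data often produces.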

---------

Simulations are carried out to develop a mathematical model of the process -- false

_________ are the values that mark the boundaries of the confidence interval. --
Confidence limits
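
As a hedged illustration, the sketch below computes the lower and upper confidence limits for a sample mean using a normal approximation. The sample values are invented, and for a sample this small a t-distribution would usually be preferred; the normal quantile is used here only to keep the example within the standard library.

```python
import math
import statistics


def confidence_limits(sample, level=0.95):
    """Lower and upper limits of a normal-approximation interval for the mean."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * standard_error, mean + z * standard_error


# Made-up measurements for illustration.
sample = [9.8, 10.1, 10.0, 9.9, 10.3, 10.2, 9.7, 10.0]
lower, upper = confidence_limits(sample)
print(lower, upper)
```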

Which of the following activities are performed as part of data pre-processing? --
detecting outliers, Data Cleansing * "all options" and "Data Cleansing" are wrong
answers

Which of the following are multi-class classification problems? -- Will Indian
Cricket team win the next World Cup? Will it be a Rainy day or Sunny day
tomorrow? * "should we gift" and "Will Indian Cricket team win the next World Cup?"
are wrong answers

Which data mining method groups together objects that are similar to each other and
dissimilar to the other objects? -- clustering

Regression is typically carried out to develop a mathematical model of the process
-- true

Which is the statistical technique used for investigating and modelling the
relationship between two or more variables? -- regression

The machine learning task of inferring a function from labelled training data is
known as -- supervised learning

Association rule mining is also known as _____________ -- affinity analysis

If time is used as an independent variable in a simple linear regression, which of
the following assumptions could be violated? -- Residual variation is the same for
all fitted values of the dependent variable * is a wrong answer
