Unstructured Data Classification
me/fresco_milestone ( @fresco_milestone )
Answer : image
Answer : Binary
Which pre-processing technique is used to remove the most commonly used words?
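The technique this question points at is commonly called stop-word removal. A minimal sketch in plain Python follows; the `STOP_WORDS` set and the `remove_stop_words` helper are hypothetical names chosen for illustration, and the word list is far shorter than what a real library would ship.

```python
# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# real stop-word lists contain far more entries.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "it"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


filtered = remove_stop_words("The movie is a triumph of style and wit")
print(filtered)
```

Splitting on whitespace is a simplification; punctuation handling and a fuller word list would be needed for real text.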
The cross-validation technique is used to evaluate a classifier by dividing the data set into a training
set to train the classifier and a testing set to test the same.
Answer : True
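One common form of this train/test evaluation is k-fold cross-validation, where every sample is used for testing exactly once. The sketch below only produces the index splits and assumes contiguous, unshuffled folds; `k_fold_indices` is a hypothetical helper, not a library function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=6, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

A classifier would be trained on each `train` split and scored on the matching `test` split, with the scores averaged at the end.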
True Positive is when the predicted instance and the actual instance are not negative.
Answer : True
True Negative is when the predicted instance and the actual instance are positive.
Answer : False
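To make the true positive and true negative definitions concrete, the following sketch counts all four outcomes for a binary problem. The labels are made-up illustration data and `confusion_counts` is a hypothetical helper; label 1 is assumed to be the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN and FN for binary labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Made-up labels for illustration only.
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)
```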
An algorithm that counts how many times a word appears in a document is __________
Answer : Kernel tricks are used by nonlinear classifiers to achieve maximum-margin hyperplanes
(Incorrect)
Answer : False
Answer : [1 0]
The most widely used package for machine learning in Python is _________
Answer : sklearn
Answer : Known
Choose the correct sequence for classifier building from the following.
Which of the given hyperparameters, when increased, may cause the random forest to overfit the
data?
Answer : False
Supervised learning differs from unsupervised learning as supervised learning requires __________
Set2:
To view the first 3 rows of the dataset, which of the following commands is used?
Answer : sentiment_analysis_data.head(3)
Answer : True
Answer : Yes
In document classification, each document has to be converted from full text to a document vector.
Answer : True
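One simple way to turn full text into a document vector is a bag-of-words count, sketched below in plain Python. The two example documents are invented, and `build_vocabulary` and `to_count_vector` are hypothetical helper names; a library vectorizer would normally handle tokenization more carefully.

```python
from collections import Counter


def build_vocabulary(documents):
    """Return the sorted list of distinct lowercase tokens across all documents."""
    return sorted({tok for doc in documents for tok in doc.lower().split()})


def to_count_vector(document, vocabulary):
    """Map a document to a list of word counts, one entry per vocabulary word."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


# Invented example documents.
docs = ["good movie good plot", "bad movie"]
vocab = build_vocabulary(docs)
vectors = [to_count_vector(d, vocab) for d in docs]
print(vocab)
print(vectors)
```

Each position in a vector corresponds to one vocabulary word, so every document ends up with the same length regardless of how long its text is.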
A technique used to depict the performance in a tabular form that has 2 dimensions namely actual
and predicted sets of data is ___________
Which NLP technique uses a lexical knowledge base to obtain the correct base form of the words?
Answer : Lemmatization
Which numerical statistic is used to identify the importance of a rare word in a document?
Answer : TF-IDF
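The sketch below shows why TF-IDF favors rare words, using the textbook form tf * log(N / df). This is an assumption for illustration: library implementations often use smoothed or normalized variants, so their numbers will differ. The corpus is invented and `tf_idf` is a hypothetical helper.

```python
import math


def tf_idf(term, document, corpus):
    """Textbook TF-IDF: term frequency times log(N / document frequency)."""
    tokens = document.lower().split()
    if not tokens:
        return 0.0
    tf = tokens.count(term) / len(tokens)
    # Number of documents in the corpus that contain the term at least once.
    df = sum(1 for doc in corpus if term in doc.lower().split())
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Invented three-document corpus.
corpus = ["the plot was dull", "the cast was superb", "the ending was dull"]
rare = tf_idf("superb", corpus[1], corpus)    # appears in one document
common = tf_idf("the", corpus[1], corpus)     # appears in every document
print(rare, common)
```

A word that occurs in every document gets an inverse document frequency of log(1) = 0, so only the rarer word contributes weight.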
Answer : K-Fold
Answer : False
Answer : Yes
SVM is a _____________
Imagine you have just finished training a decision tree for spam classification, and it is showing
abnormally bad performance on both your training and test sets. Assume that your implementation
has no bugs. What could be the reason for this problem?
Answer : Data Analysis -> Pre-Processing -> Model Building -> Predict
Answer : False
_______ directly achieves multi-class classification (without the support of binary classifiers).
A classifier that can compute using numeric as well as categorical values is __________
Answer : True
The following are pre-processing methods used for unstructured data classification, except
_________
Answer : Confusion_matrix
Answer : True
The higher value of which of the following hyperparameters is better for the decision tree
algorithm?
What kind of classification is the given case study (Sentiment Analysis dataset)?
Which of the following commands is used to view the dataset SIZE, and what is the value returned?