Unstructured Data Classification

The document discusses unstructured data classification and natural language processing techniques. It provides examples of classification algorithms like decision trees and random forests. It also discusses preprocessing steps like stopword removal, bag-of-words, and techniques like TF-IDF for feature extraction from text data. Multiple choice questions are provided about classification, preprocessing, algorithms and their applications to sentiment analysis and spam detection problems.

Uploaded by

Yees BoojPai

Unstructured Data Classification

Identify the unstructured data from the following.

Answer : image

What kind of classification is our case study 'Spam Detection'?

Answer : Binary

Which pre-processing technique is used to remove the most commonly used words?

Answer : Stopword removal
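Stopword removal can be sketched in a few lines. The stop-word set below is a small illustrative assumption, not a standard list, and remove_stopwords is a hypothetical helper name.

```python
# Illustrative stop-word set (an assumption); real lists are much longer.
STOP_WORDS = {"the", "it", "is", "a", "an", "and", "of", "to"}

def remove_stopwords(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

tokens = remove_stopwords("The movie is great and it deserves an award")
print(tokens)  # ['movie', 'great', 'deserves', 'award']
```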

The cross-validation technique is used to evaluate a classifier by dividing the data set into a training
set to train the classifier and a testing set to test the same.

Answer : True
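As a rough illustration of the splitting described above, the following pure-Python sketch generates k train/test index splits. The function name k_fold_indices is hypothetical, and no shuffling is done, which a real evaluation would usually add.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first few folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=6, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample is used for testing exactly once and for training in the remaining folds.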

True Positive is when the predicted instance and the actual instance are not negative.

Answer : True

True Negative is when the predicted instance and the actual instance are positive.

Answer : False
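The four outcomes can be counted directly. This is a minimal sketch with made-up labels, assuming 1 is the positive class; confusion_counts is a hypothetical helper.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        elif a == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

actual = [1, 0, 1, 1, 0, 0]      # made-up ground truth
predicted = [1, 0, 0, 1, 1, 0]   # made-up predictions
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
```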

An algorithm that counts how many times a word appears in a document is __________

Answer : Bag-of-Words (BOW)
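A bag-of-words count can be sketched with the standard library alone. The whitespace tokenization and the bag_of_words name are simplifying assumptions.

```python
from collections import Counter

def bag_of_words(document):
    """Count how many times each lowercase word occurs in the document."""
    return Counter(document.lower().split())

bow = bag_of_words("spam spam offer click offer spam")
print(bow["spam"], bow["offer"], bow["click"])  # 3 2 1
```

Word order is discarded; only the counts survive.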

Pruning is a technique associated with __________

Answer : Decision tree

Select the correct statement about Nonlinear classification.

Answer : Kernel tricks are used by nonlinear classifiers to achieve maximum-margin hyperplanes (marked incorrect)

Stemming and lemmatization give the same result.

Answer : False
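The difference can be seen with a toy comparison. Both functions below are hypothetical simplifications: toy_stem strips a few suffixes (far cruder than a real stemmer), and TOY_LEMMAS is a three-entry stand-in for a lexical knowledge base.

```python
def toy_stem(word):
    """Crude suffix stripping; the result need not be a real word."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

# Hypothetical miniature lexicon mapping inflected forms to base forms.
TOY_LEMMAS = {"studies": "study", "better": "good", "running": "run"}

def toy_lemmatize(word):
    """Look the word up in the lexicon; fall back to the word itself."""
    return TOY_LEMMAS.get(word, word)

for word in ("studies", "running", "better"):
    print(word, toy_stem(word), toy_lemmatize(word))
```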

Question Type: Single-Select

a) Download the dataset from https://hrcdn.net/s3_pub/istreet-assets/H4_TQkbOj39HUNoBukluIQ/training.txt and load it to the variable 'sentiment_analysis_data'.

b) Give the column names as 'label' and 'message'.

c) Try out the code snippets and answer the questions.
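Steps (a) and (b) might look like the sketch below. It assumes the file is tab-separated with no header row, which the source does not state, and it reads three made-up rows from memory instead of the real download so that it runs offline.

```python
import io
import pandas as pd

# Made-up sample in the assumed format: label<TAB>message, no header row.
sample = "1\tI love this movie\n0\tThis film was awful\n1\tBrilliant acting\n"

# For the real file, pass its local path instead of io.StringIO(sample).
sentiment_analysis_data = pd.read_csv(
    io.StringIO(sample), sep="\t", header=None, names=["label", "message"]
)

print(sentiment_analysis_data.shape)                    # (rows, columns)
print(sentiment_analysis_data["label"].unique())        # distinct labels
print(sentiment_analysis_data["label"].value_counts())  # rows per label
print(sentiment_analysis_data.head(3))                  # first 3 rows
```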

What is the output of the following command: print(sentiment_analysis_data['label'].unique())

Answer : [1 0]

The most widely used package for machine learning in Python is _________

Answer : sklearn

In Supervised learning, class labels of the training samples are ____________

Answer : Known

Select the pre-processing technique(s) from the following.

Answer : All the options

Model Tuning helps to increase accuracy.

Answer : Cannot say ("True" was marked incorrect)

What command should be given to tokenize a sentence into words?

Answer : from nltk.tokenize import word_tokenize; Word_tokens = word_tokenize(sentence)
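For readers without NLTK and its tokenizer data installed, a rough regex stand-in shows the idea. This is an assumption-laden approximation and will not match word_tokenize on contractions and other edge cases; simple_tokenize is a hypothetical name.

```python
import re

def simple_tokenize(sentence):
    """Split into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

word_tokens = simple_tokenize("Hello world, this is NLP.")
print(word_tokens)  # ['Hello', 'world', ',', 'this', 'is', 'NLP', '.']
```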

Identify the stop word(s) from the following.

Answer : Both "the" and "it"

The following are performance evaluation measures, except __________

Answer : Decision Tree

Images and documents are examples of ___________

Answer : Unstructured data

Choose the correct sequence for classifier building from the following.

Answer : Initialize -> Train -> Predict -> Evaluate
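The four steps can be walked through with a deliberately trivial classifier. MajorityClassifier is a hypothetical baseline written for this sketch (it only mimics the fit/predict convention), and the messages and labels are made up.

```python
from collections import Counter

class MajorityClassifier:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

X_train = ["win cash now", "meeting at noon", "lunch tomorrow?"]
y_train = ["spam", "ham", "ham"]
X_test = ["free prize", "see you later"]
y_test = ["spam", "ham"]

clf = MajorityClassifier()        # 1. Initialize
clf.fit(X_train, y_train)         # 2. Train
y_pred = clf.predict(X_test)      # 3. Predict
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)  # 4. Evaluate
print(y_pred, accuracy)  # ['ham', 'ham'] 0.5
```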

Which of the given hyperparameters, when increased, may cause the random forest to overfit the
data?

Answer : Depth of Tree

The fit(X, y) method is used to __________

Answer : Train the classifier

What does the command sentiment_analysis_data['label'].value_counts() return?

Answer : The number of occurrences of each unique value in the 'label' column

What is the purpose of lemmatization?

Answer : To convert words into a proper base form

Clustering is supervised classification.

Answer : False

Supervised learning differs from unsupervised learning as supervised learning requires __________

Answer : Labeled data

Set2:

To view the first 3 rows of the dataset, which of the following commands is used?

Answer : sentiment_analysis_data.head(3)

Inverse Document frequency is used in the term-document matrix.

Answer : True

Can we consider sentiment classification as a text classification problem?

Answer : Yes

In document classification, each document has to be converted from full text to a document vector.

Answer : True

A technique used to depict the performance in a tabular form that has 2 dimensions namely actual
and predicted sets of data is ___________

Answer : Confusion Matrix

Which NLP technique uses a lexical knowledge base to obtain the correct base form of the words?

Answer : Lemmatization

Which numerical statistic is used to identify the importance of a rare word in a document?

Answer : TF-IDF
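One common TF-IDF variant is term frequency times log(N / document frequency). The sketch below assumes that variant; libraries often add smoothing or normalization, so their numbers will differ. The corpus is made up.

```python
import math

def tf_idf(term, document, corpus):
    """TF-IDF for one term in one tokenized document (unsmoothed variant)."""
    tf = document.count(term) / len(document)
    df = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

corpus = [
    ["free", "prize", "call", "now"],
    ["call", "me", "now"],
    ["call", "the", "office"],
]
rare = tf_idf("prize", corpus[0], corpus)    # occurs in 1 of 3 documents
common = tf_idf("call", corpus[0], corpus)   # occurs in all 3 documents
print(rare > common)  # True
```

A word that appears in every document gets an IDF of zero, so only the rarer word carries weight.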

Which type of cross-validation is used for an imbalanced dataset?

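Answer : K-Fold (preferably a stratified variant such as Stratified K-Fold, which preserves the class proportions in each fold)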
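For imbalanced labels, folds are commonly built so that each one keeps roughly the overall class proportions. The following is a minimal pure-Python sketch of that idea with made-up labels; stratified_folds is a hypothetical helper, not a library function.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign sample indices to k folds, round-robin within each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]   # imbalanced: six 0s, three 1s
folds = stratified_folds(labels, k=3)
for fold in folds:
    print(fold, [labels[i] for i in fold])
```

Here every fold ends up with two samples of class 0 and one of class 1, matching the 2:1 ratio of the full set.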

Cross-validation causes over-fitting.

Answer : False

a) Download the dataset from https://inclass.kaggle.com/c/si650winter11/download/training.txt and load it to the variable 'sentiment_analysis_data'.

b) Give the column names as 'label' and 'message'.

c) Try out the code snippets and answer the questions.

Is there a class imbalance problem in the given data set?

Answer : Yes

SVM is a _____________

Answer : Supervised learning algorithm

In a Term Document Matrix (TDM), each row represents ____________

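Answer : TF-IDF value (the weight stored in each cell; by convention, each row of a TDM corresponds to a term and each column to a document)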

Imagine you have just finished training a decision tree for spam classification, and it is showing
abnormally bad performance on both your training and test sets. Assume that your implementation
has no bugs. What could be the reason for this problem?

Answer : All the options

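In a Document Term Matrix (DTM), each row represents ____________

Answer : TF-IDF value (the weight stored in each cell; by convention, each row of a DTM corresponds to a document and each column to a term)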
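The usual layout of the two matrices can be sketched with raw counts (TF-IDF weights would sit in the same cells). The documents are made up, and whitespace tokenization is a simplifying assumption.

```python
docs = ["free prize now", "call me now", "free call"]

# Columns of the DTM: the sorted vocabulary across all documents.
vocabulary = sorted({word for doc in docs for word in doc.split()})

# Document-term matrix: one row per document, one column per term.
dtm = [[doc.split().count(term) for term in vocabulary] for doc in docs]

# Term-document matrix: the transpose, one row per term.
tdm = [list(column) for column in zip(*dtm)]

print(vocabulary)  # ['call', 'free', 'me', 'now', 'prize']
print(dtm)         # [[0, 1, 0, 1, 1], [1, 0, 1, 1, 0], [1, 1, 0, 0, 0]]
```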

Email spam data is an example of __________

Answer : Unstructured data

Choose the correct sequence from the following.

Answer : Data Analysis -> Pre-Processing -> Model Building -> Predict

High classification accuracy always indicates a good classifier.

Answer : False
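A small made-up example shows why: on an imbalanced set, a classifier that never predicts the minority class can still score high accuracy while catching none of the cases that matter.

```python
actual = [0] * 95 + [1] * 5   # made-up data: 5% of messages are spam (1)
predicted = [0] * 100         # a classifier that never predicts spam

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
true_positives = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
recall = true_positives / sum(a == 1 for a in actual)

print(accuracy, recall)  # 0.95 0.0
```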

_______ directly achieves multi-class classification (without the support of binary classifiers).

Answer : K Nearest Neighbor
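K Nearest Neighbor handles any number of classes with the same majority vote, as this minimal sketch suggests. The points and the class names "a", "b", "c" are made up, and knn_predict is a hypothetical helper using Euclidean distance.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (10, 0), (10, 1), (9, 0)]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

print(knn_predict(points, labels, (5.2, 5.1)))  # b
print(knn_predict(points, labels, (9.5, 0.5)))  # c
```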

A classifier that can compute using numeric as well as categorical values is __________

Answer : Random Forest Classifier

Lemmatization offers better precision than stemming.

Answer : True

The following are pre-processing methods used for unstructured data classification, except
_________

Answer : Confusion_matrix

TF-IDF is a feature extraction technique.

Answer : True

The higher value of which of the following hyperparameters is better for the decision tree
algorithm?

Answer : Cannot say

What kind of classification is the given case study (Sentiment Analysis dataset)?

Answer : Binary classification

Which of the following commands is used to view the dataset SIZE, and what is the value returned?

Answer : sentiment_analysis_data.shape, (6918, 2)
