DSBDAL Lab Manual
LABORATORY MANUAL
2023-24
TE-COMPUTER ENGINEERING
SEMESTER-II
Subject Code:310256
-: Name of Faculty:-
Prof. A.M.Karanjkar
Prof. S.S.Peerzade
DSBDAL T.E.C.E (Sem-II) [2023-24]
INDEX
GROUP A: DATA SCIENCE
Sr.No. Title
1 Data Wrangling I
Perform the following operations using Python on any open source dataset (e.g., data.csv)
1. Import all the required Python Libraries.
2. Locate an open source dataset on the web (e.g. https://www.kaggle.com).
Provide a clear description of the data and its source (i.e., URL of the web site).
3. Load the Dataset into pandas data frame.
4. Data Preprocessing: check for missing values in the data using the pandas isnull()
function, and use the describe() function to get some initial statistics. Provide
variable descriptions, types of variables, etc. Check the dimensions of the data frame.
5. Data Formatting and Data Normalization: Summarize the types of variables by
checking the data types (i.e., character, numeric, integer, factor, and logical) of
the variables in the data set. If variables are not in the correct data type, apply
proper type conversions.
6. Turn categorical variables into quantitative variables in Python.
In addition to the codes and outputs, explain every operation that you do in the above steps
and explain everything that you do to import/read/scrape the data set.
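Steps 4 to 6 above can be sketched as follows. This is only an illustrative outline: the small in-memory data frame and its column names (age, income, gender) are hypothetical stand-ins for whatever pd.read_csv("data.csv") would return, and min-max scaling is just one possible choice of normalization.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv("data.csv"); columns are invented.
df = pd.DataFrame({
    "age": ["23", "35", "41", "29"],              # numbers stored as text
    "income": [32000.0, np.nan, 58000.0, 41000.0],
    "gender": ["F", "M", "M", "F"],
})

# Step 4: missing values, initial statistics, dimensions
missing_per_column = df.isnull().sum()   # count of missing values per column
summary = df.describe()                  # count, mean, std, quartiles (numeric columns)
n_rows, n_cols = df.shape

# Step 5: check data types and convert where the type is wrong
original_types = df.dtypes
df["age"] = df["age"].astype(int)

# Step 5 (normalization): min-max scaling to the range [0, 1];
# missing values stay missing
income = df["income"]
df["income_scaled"] = (income - income.min()) / (income.max() - income.min())

# Step 6: turn a categorical variable into integer codes
df["gender_code"] = df["gender"].astype("category").cat.codes

print(missing_per_column)
print(df)
```

pd.get_dummies(df["gender"]) is an alternative for step 6 when one indicator column per category is preferred over a single integer code.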
2 Data Wrangling II
Create an “Academic performance” dataset of students and perform the following
operations using Python.
1. Scan all variables for missing values and inconsistencies. If there are missing
values and/or inconsistencies, use any of the suitable techniques to deal with them.
2. Scan all numeric variables for outliers. If there are outliers, use any of the suitable
techniques to deal with them.
3. Apply data transformations on at least one of the variables. The purpose of this
transformation should be one of the following reasons: to change the scale for
better understanding of the variable, to convert a non-linear relation into a linear
one, or to decrease the skewness and convert the distribution into a normal
distribution. Reason and document your approach properly.
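One possible way to approach the three operations above is sketched below. The column names and values are invented for illustration, and the specific choices (median imputation, the 1.5 x IQR rule with clipping, and a log transform) are assumptions; the assignment allows any suitable technique.

```python
import numpy as np
import pandas as pd

# Hypothetical "Academic performance" data; values are invented.
df = pd.DataFrame({
    "math_score": [72.0, 65.0, np.nan, 80.0, 58.0, 77.0, 300.0, 69.0],
    "study_hours": [2.0, 1.5, 3.0, 40.0, 1.0, 2.5, 3.5, 2.0],
})

# 1. Missing values: fill with the column median (robust to outliers)
df["math_score"] = df["math_score"].fillna(df["math_score"].median())

# 2. Outliers: flag values outside 1.5 * IQR, then clip them to the bounds
def iqr_bounds(series):
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

low, high = iqr_bounds(df["math_score"])
outlier_mask = (df["math_score"] < low) | (df["math_score"] > high)
df["math_score"] = df["math_score"].clip(lower=low, upper=high)

# 3. Transformation: log(1 + x) to reduce right skew in study_hours
skew_before = df["study_hours"].skew()
df["study_hours_log"] = np.log1p(df["study_hours"])
skew_after = df["study_hours_log"].skew()

print("outliers found:", int(outlier_mask.sum()))
print("skewness before/after:", skew_before, skew_after)
```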
3 Descriptive Statistics - Measures of Central Tendency and variability
Perform the following operations on any open source dataset (e.g., data.csv)
1. Provide summary statistics (mean, median, minimum, maximum, standard
deviation) for a dataset (age, income etc.) with numeric variables grouped by one
of the qualitative (categorical) variables. For example, if your categorical variable
is age groups and quantitative variable is income, then provide summary statistics
of income grouped by the age groups. Create a list that contains a numeric value
for each response to the categorical variable.
2. Write a Python program to display some basic statistical details like percentile,
mean, standard deviation etc. of the species of ‘Iris-setosa’, ‘Iris-versicolor’ and
‘Iris-virginica’ of iris.csv dataset. Provide the codes with outputs and explain
everything that you do in this step.
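A minimal sketch of the grouped summary described in item 1 is given below. The age_group and income values are invented for illustration; the same groupby pattern could be applied to the species column of iris.csv for item 2, assuming the file has such a column.

```python
import pandas as pd

# Hypothetical data: one categorical and one quantitative variable.
df = pd.DataFrame({
    "age_group": ["18-25", "18-25", "26-40", "26-40", "41-60", "41-60"],
    "income": [20000, 24000, 40000, 50000, 60000, 70000],
})

grouped_income = df.groupby("age_group")["income"]

# Summary statistics of income for each age group
summary = grouped_income.agg(["mean", "median", "min", "max", "std"])

# describe() adds the count and the 25%, 50% and 75% percentiles per group
percentiles = grouped_income.describe()

# A list of the numeric values for each response of the categorical variable
income_by_group = grouped_income.apply(list).to_dict()

print(summary)
print(percentiles)
print(income_by_group)
```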
4 Data Analytics I
Create a Linear Regression Model using Python/R to predict home prices using Boston
Housing Dataset (https://www.kaggle.com/c/boston-housing). The Boston Housing
dataset contains information about various houses in Boston through different parameters.
There are 506 samples and 14 feature variables in this dataset.
The objective is to predict house prices using the given features.
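The fitting and evaluation steps of a linear regression model can be sketched as below. Note the assumptions: the data here is synthetic (two invented features with known coefficients), not the Boston Housing dataset, and the model is fitted with ordinary least squares through numpy.linalg.lstsq rather than through any particular machine learning library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the housing data: price = 5 + 3*x1 - 2*x2 + noise
n_samples = 200
X = rng.uniform(0, 10, size=(n_samples, 2))
y = 5.0 + X @ np.array([3.0, -2.0]) + rng.normal(0, 0.5, size=n_samples)

# 80/20 train-test split on shuffled row indices
indices = rng.permutation(n_samples)
train, test = indices[:160], indices[160:]

def add_intercept(features):
    """Prepend a column of ones so the model learns an intercept term."""
    return np.column_stack([np.ones(len(features)), features])

# Ordinary least squares fit on the training rows
weights, *_ = np.linalg.lstsq(add_intercept(X[train]), y[train], rcond=None)

# Evaluate on the held-out rows
y_pred = add_intercept(X[test]) @ weights
rmse = float(np.sqrt(np.mean((y[test] - y_pred) ** 2)))
ss_res = np.sum((y[test] - y_pred) ** 2)
ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
r2 = float(1 - ss_res / ss_tot)

print("intercept and coefficients:", weights)
print("RMSE:", rmse, "R^2:", r2)
```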
5 Data Analytics II
1. Implement logistic regression using Python/R to perform
classification on the Social_Network_Ads.csv dataset.
2. Compute Confusion matrix to find TP, FP, TN, FN, Accuracy, Error rate,
Precision, Recall on the given dataset.
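The metrics in item 2 follow directly from the four confusion matrix counts. The sketch below uses invented label lists in place of real classifier output on Social_Network_Ads.csv, and treats class 1 as the positive class; both are assumptions made for illustration.

```python
# Hypothetical true labels and predictions (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 1)
tn = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 0)
fp = sum(1 for actual, predicted in pairs if actual == 0 and predicted == 1)
fn = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 0)

total = tp + tn + fp + fn
accuracy = (tp + tn) / total      # share of correct predictions
error_rate = (fp + fn) / total    # share of wrong predictions
precision = tp / (tp + fp)        # predicted positives that are correct
recall = tp / (tp + fn)           # actual positives that were found

print("TP, FP, TN, FN:", tp, fp, tn, fn)
print("accuracy:", accuracy, "error rate:", error_rate)
print("precision:", precision, "recall:", recall)
```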
6 Data Analytics III
1. Implement Simple Naïve Bayes classification algorithm using Python/R on
the iris.csv dataset.
2. Compute Confusion matrix to find TP, FP, TN, FN, Accuracy, Error rate,
Precision, Recall on the given dataset.
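To show what a Naïve Bayes classifier computes, here is a small from-scratch Gaussian variant. It is only a sketch under stated assumptions: the data is synthetic two-class data rather than iris.csv, the helper names fit_gaussian_nb and predict_gaussian_nb are made up for this example, and accuracy is measured on the training data for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic two-class data standing in for the iris.csv features
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),   # class 0
    rng.normal(4.0, 1.0, size=(50, 2)),   # class 1
])
y = np.array([0] * 50 + [1] * 50)

def fit_gaussian_nb(features, labels):
    """Store per-class feature means, variances and the class prior."""
    params = {}
    for label in np.unique(labels):
        rows = features[labels == label]
        params[label] = (
            rows.mean(axis=0),
            rows.var(axis=0) + 1e-9,       # small constant avoids division by zero
            len(rows) / len(features),
        )
    return params

def predict_gaussian_nb(params, features):
    """Pick the class with the highest log prior plus log likelihood."""
    class_labels = list(params)
    scores = []
    for label in class_labels:
        mean, var, prior = params[label]
        # Features are treated as independent, so log likelihoods add up
        log_likelihood = -0.5 * np.sum(
            np.log(2 * np.pi * var) + (features - mean) ** 2 / var, axis=1
        )
        scores.append(np.log(prior) + log_likelihood)
    return np.array(class_labels)[np.argmax(np.vstack(scores), axis=0)]

model = fit_gaussian_nb(X, y)
y_pred = predict_gaussian_nb(model, X)
accuracy = float(np.mean(y_pred == y))   # training accuracy only
print("training accuracy:", accuracy)
```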
7 Text Analytics
1. Extract Sample document and apply following document preprocessing