Ethnotech - Data Science With Python
CERTIFICATE FORMAT
ETHNOTECH ACADEMY
BENEFITS OF THE PROGRAM
COURSE OUTLINE
L1 Introduction
L2 Foundation-Panda
L3 Foundation-Numpy
L4 Foundation-Descriptive Analysis
L5 Regression
L6 Classification
L7 Clustering
L8 Text Analytics
ETHNOTECH ACADEMY
EXIT PROFILE
• Financial Analyst
• Data Scientist
• Data Journalist
• Business Analyst
• Big Data Analyst
ETHNOTECH ACADEMY
SESSION 1
Introduction
• Introduction to Python
• Introduction to AI-ML
ETHNOTECH ACADEMY
Introduction of data science
Introduction Contd…
Application of Data Science:
• To find the best-suited time to deliver goods.
ETHNOTECH ACADEMY
Introduction Contd…
What is Data?
• Data is a collection of information.
• One purpose of Data Science is to structure data, making it
interpretable and easy to work with.
ETHNOTECH ACADEMY
Introduction Contd…
Structured Data
• Structured data is organized and easier to work with.
• We can use an array or a database table to
structure or present data.
Example of an array:
[80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
ETHNOTECH ACADEMY
Introduction Contd…
Unstructured Data
• Unstructured data is not organized. We must organize the
data for analysis purposes.
ETHNOTECH ACADEMY
Introduction Contd…
Database Table
• A database table is a table with structured data.
ETHNOTECH ACADEMY
Introduction Contd…
Database Table Structure
Column 1 Column 2 Column 3 Column 4 Column 5 Column 6
ETHNOTECH ACADEMY
Introduction Contd…
Variables
• A variable is defined as something that can be measured or
counted.
• Examples can be characters, numbers or time.
• There are 4 columns, meaning 4 variables (Duration, Average_Pulse,
Max_Pulse, Calorie_Burnage).
Duration  Average_Pulse  Max_Pulse  Calorie_Burnage
30        85             120        250
45        90             130        260
45        95             130        270
60        110            145
ETHNOTECH ACADEMY
Introduction to AI-ML
What is Artificial Intelligence?
• AI is one of the most fascinating and widely applicable fields of computer
science, with great scope in the future. AI aims to make a machine
work the way a human does.
ETHNOTECH ACADEMY
Introduction to AI-ML Contd…
decisions”.
ETHNOTECH ACADEMY
Why Artificial Intelligence?
1. With the help of AI, you can create software or devices that
solve real-world problems easily and accurately, such as
health issues, marketing, traffic issues, etc.
2. With the help of AI, you can create your personal virtual assistant, such
as Cortana, Google Assistant, Siri, etc.
3. With the help of AI, you can build robots that can work in
environments where human survival would be at risk.
ETHNOTECH ACADEMY
Pros And Cons of Artificial Intelligence
ETHNOTECH ACADEMY
What is Machine Learning
ETHNOTECH ACADEMY
Machine Learning Contd…
ETHNOTECH ACADEMY
Machine Learning Contd…
ETHNOTECH ACADEMY
Classification of Machine Learning
ETHNOTECH ACADEMY
Supervised Learning
1.Classification
2.Regression
ETHNOTECH ACADEMY
Supervised Learning Contd…
ETHNOTECH ACADEMY
Supervised Learning Contd…
ETHNOTECH ACADEMY
Supervised Learning Contd…
ETHNOTECH ACADEMY
Unsupervised Learning
ETHNOTECH ACADEMY
Unsupervised Learning Contd…
ETHNOTECH ACADEMY
Unsupervised Learning Contd…
ETHNOTECH ACADEMY
Reinforcement Learning
ETHNOTECH ACADEMY
Reinforcement Learning
ETHNOTECH ACADEMY
Reinforcement Learning
ETHNOTECH ACADEMY
Difference Between Supervised and
Unsupervised Learning
ETHNOTECH ACADEMY
Difference b/w Supervised, Unsupervised
& Semi Supervised Learning
ETHNOTECH ACADEMY
Introduction to Python
Why Programming?
ETHNOTECH ACADEMY
ETHNOTECH ACADEMY
Introduction Contd…
Applications
ETHNOTECH ACADEMY
Web application in Python
web applications
ETHNOTECH ACADEMY
Data Analysis in Python
• Python is the language of choice for many data
scientists.
• It has grown in popularity due to excellent libraries like:
NumPy
Pandas
Matplotlib
• A data scientist is a professional
responsible for collecting, analyzing
and interpreting extremely large
amounts of data.
ETHNOTECH ACADEMY
Machine Learning in Python
• It is mainly used in
Face Recognition,
Music recommendation,
ETHNOTECH ACADEMY
Raspberry Pi in Python
ETHNOTECH ACADEMY
Game Development in Python
• We can write whole games in Python using Pygame
• Popular games that use Python (mainly for scripting)
are:
Bridge Commander
Civilization IV
Battlefield 2
Eve Online
Freedom Force
ETHNOTECH ACADEMY
Introduction Contd…
ETHNOTECH ACADEMY
Who created Python?
ETHNOTECH ACADEMY
Features of Python
ETHNOTECH ACADEMY
Features of Python
Python uses both a compiler and an interpreter: the source is first
compiled to bytecode, which the interpreter then runs.
• Numbers
• String
• Bool
• List
• Tuple
• Dictionary
• Set
ETHNOTECH ACADEMY
Bool
• The Boolean data type stores one of two values: True or False.
• All comparison operators evaluate to True or False.
• The three Boolean operators (and, or, and not) are used to combine
Boolean values.
• Like comparison operators, they evaluate these expressions down to a
Boolean value.
• After any math and comparison operators evaluate, Python evaluates the
not operators first, then the and operators, and then the or operators.
>>> 42 == 42
True
>>> 2 != 3
True
>>> 2 != 2
False
>>> 42 == 99
False
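The evaluation order described above (comparisons, then not, then and, then or) can be checked with a short sketch. The variable names a to d are illustrative only.

```python
# Comparison operators evaluate first, then not, then and, then or.
a = 2 + 2 == 4 and not 2 + 2 == 5 and 2 * 2 == 2 + 2

# "and" binds tighter than "or": read as True or (False and False).
b = True or False and False

# Parentheses change the grouping.
c = (True or False) and False

# "not" binds tighter than "or": read as (not True) or True.
d = not True or True

print(a, b, c, d)
```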
ETHNOTECH ACADEMY
Numbers
Int (signed integers): They are positive or negative whole numbers with no
decimal point. In Python 3, integers have unlimited size.
Long (long integers): A separate type in Python 2 only, written like integers
and followed by an uppercase or lowercase L. Python 3 has no long type and
rejects the L suffix; int covers this case.
Float (floating point real values): They represent real numbers and are written
with a decimal point. Floats may also be in scientific notation, with E or e
indicating a power of 10.
Example: 2.5e2 = 2.5 x 10^2 = 2.5 x 100 = 250
Complex (complex numbers): They are written in the form a+bj. The real part of
the number is a and the imaginary part is b.
The letter j should appear only as a suffix, not as a prefix.
Example: 3+5j
ETHNOTECH ACADEMY
String
• Strings are a collection of characters. A string can group
any type of known characters, i.e. letters, numbers and
special characters. They are enclosed in single quotes,
double quotes or triple quotes; a raw string adds an r prefix
before the opening quote.
Example: 'Hi', "hello", '1234'
Example:
S1 = 'Mango'
S2 = "Hello"
S3 = "Hey, 'Good' Morning"
S4 = 'Hey, "Good" Morning'
S5 = "Hey, \"Good\" Morning"
print(S3)
print(S4)
print(S5)
ETHNOTECH ACADEMY
List
List is a container that holds many objects under a single
name.
List can be written as a list of comma-separated values (items)
Lists can be nested just like arrays, i.e., you can have a list of
lists.
Lists are mutable.
Syntax:
List_name = [item1 , item2 , item3]
List_name = []
List_name[index]
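A minimal sketch of the list syntax above. The values and variable names are made up for illustration.

```python
marks = [80, 85, 90]        # create a list of comma-separated items
empty = []                  # an empty list
first = marks[0]            # indexing starts at 0

marks[1] = 88               # lists are mutable: replace an item in place
marks.append(95)            # grow the list

nested = [[1, 2], [3, 4]]   # a list of lists
inner = nested[1][0]        # second inner list, first item

print(marks, first, inner)
```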
ETHNOTECH ACADEMY
Tuple
ETHNOTECH ACADEMY
Dictionary
Dictionaries are enclosed by curly braces ( { } ) and values
can be assigned and accessed using square brackets ( [] ).
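As a small sketch of that syntax, using a hypothetical student record (the keys and values are assumptions, not course data):

```python
student = {"name": "Asha", "age": 21}   # curly braces create the dictionary

student["city"] = "Mysuru"     # square brackets assign a new key
age = student["age"]           # square brackets read a value
student["age"] = age + 1       # existing keys can be updated

# .get() returns None instead of raising KeyError for a missing key.
email = student.get("email")

print(student, email)
```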
ETHNOTECH ACADEMY
Dictionary Contd…
Example:
ETHNOTECH ACADEMY
Sets
• Curly braces or the built-in set() function can be used to create sets.
ETHNOTECH ACADEMY
Sets Contd…
• A set is mutable, but may not contain mutable items like a
list, set, or even a dictionary.
• A set may contain values of different types.
Examples:
x = {12,3,4,45}
y = {2,4,6,78}
x.union(y)
{2, 3, 4, 6, 12, 45, 78}
ETHNOTECH ACADEMY
Sets Contd…
x.intersection(y)
{4}
x.difference(y) # elements present in x but not in y
{3, 12, 45}
y.difference(x) # elements present in y but not in x
{2, 6, 78}
x.symmetric_difference(y) # elements present in exactly one of the two sets
{2, 3, 6, 12, 45, 78}
ETHNOTECH ACADEMY
Q&A
• Data Science
Q&A
• False
Q&A
ETHNOTECH ACADEMY
• The process of executing a program in a high-level language by
translating it one line at a time is called _______
a. Interpretation
b. Compilation
c. Recursion
d. Member function
• Interpretation
SUMMARY
• Fundamentals of Data science and AI-ML
• Basics of Python Programming
• Usage of Various datatypes in python
Session 2
Foundation-Panda
• Header
• Panda read csv
• datatype and statistics
• Panda column operations
• Panda operations
• Merge and concat
• Graphs
ETHNOTECH ACADEMY
Foundation Panda
Introduction to Pandas
• Pandas is an open source Python library for highly specialized
data analysis.
• It is currently the reference library that professionals using
Python study for statistical analysis and decision making.
• The library was designed and developed primarily by Wes
McKinney starting in 2008. In 2012, Chang She, one of his
colleagues, joined the development.
• Together they set up one of the most used libraries in the Python
community.
ETHNOTECH ACADEMY
Foundation Panda Contd…
• Pandas arises from the need to have a specific library to
analyze data that provides, in the simplest possible way, all
the instruments for data processing, data extraction, and
data manipulation
ETHNOTECH ACADEMY
Feature Of Pandas
ETHNOTECH ACADEMY
Header
ETHNOTECH ACADEMY
Creating a data frame from CSV file
and creating row header
• While reading the data and storing it in a data frame, or
creating a fresh data frame, column names can be
specified with the names parameter of the pandas read_csv()
function.
• https://shorturl.at/vyEJL
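The original code snippet is not available in the text, so the following is only a sketch of the idea: reading header-less CSV data and supplying column names through names. The CSV content and column names are assumptions, and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Three rows of data with no header line (made-up values).
csv_text = "30,85,120\n45,90,130\n60,110,145\n"

# names= supplies the column headers that the file itself lacks.
df = pd.read_csv(
    io.StringIO(csv_text),
    names=["Duration", "Average_Pulse", "Max_Pulse"],
)

print(df)
```

With a real file, the first argument would be the file path instead of the StringIO object.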
ETHNOTECH ACADEMY
Code Snippet
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Creating a data frame and creating
row header in Python itself
• We can create a data frame of specific number of rows and
columns by first creating a multi -dimensional array and then
converting it into a data frame by
the pandas.DataFrame() method.
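A minimal sketch of this approach, assuming a 3 x 3 NumPy array and illustrative column names:

```python
import numpy as np
import pandas as pd

# A two-dimensional array: 3 rows, 3 columns (made-up values).
data = np.array([[30, 85, 120],
                 [45, 90, 130],
                 [60, 110, 145]])

# columns= provides the header row for the new data frame.
df = pd.DataFrame(data, columns=["Duration", "Average_Pulse", "Max_Pulse"])

print(df)
```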
ETHNOTECH ACADEMY
Code Snippet
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Panda column operations
1.Sort by a Column with Pandas
• Sorting your Pandas dataframe df by one or more columns
can be done either ascending or descending.
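One way to sketch this is with DataFrame.sort_values. The data frame below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "duration": [45, 30, 60, 45],
    "pulse": [90, 85, 110, 95],
})

# Ascending is the default.
by_duration = df.sort_values("duration")

# Descending order on the same column.
by_duration_desc = df.sort_values("duration", ascending=False)

# Several columns: duration ascending, ties broken by pulse descending.
by_two = df.sort_values(["duration", "pulse"], ascending=[True, False])

print(by_two)
```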
ETHNOTECH ACADEMY
Panda column operations Contd…
ETHNOTECH ACADEMY
Panda column operations Contd…
2.Rename Columns in Pandas
Syntax
• df.rename(columns={'original_col_name':'new_col_name'})
• df.rename(columns={'original_col1_name':
'new_col1_name', 'original_col2_name': 'new_col2_name'})
ETHNOTECH ACADEMY
Panda column operations Contd…
• df.drop('column_name', axis=1)
ETHNOTECH ACADEMY
Panda column operations Contd…
• df.groupby('col_1').count()
ETHNOTECH ACADEMY
Panda column operations Contd…
ETHNOTECH ACADEMY
Panda column operations Contd…
• df['col'].apply(function)
ETHNOTECH ACADEMY
Panda column operations Contd…
• df['col'].apply(lambda x: x**2 + 5)
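The drop, groupby and apply calls shown on these slides can be tried together on a small, made-up data frame. The column names col_1, col and extra are placeholders.

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": ["x", "y", "x", "y", "x"],
    "col": [1, 2, 3, 4, 5],
    "extra": [0, 0, 0, 0, 0],
})

# Drop a column; axis=1 means "columns". Returns a new data frame.
trimmed = df.drop("extra", axis=1)

# Count the non-missing values in each group of col_1.
counts = df.groupby("col_1").count()

# Apply a function to every value of one column.
squared_plus_5 = df["col"].apply(lambda x: x**2 + 5)

print(counts)
print(squared_plus_5.tolist())
```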
ETHNOTECH ACADEMY
Panda operations Contd…
Types Of Operation
• Creating a data frame with pandas
ETHNOTECH ACADEMY
Panda operations Contd…
1. Creating a data frame with pandas:
Merge and Concat
Dataframe
A dataframe is a two-dimensional data structure with
multiple rows and columns, in which the data is
aligned by row and column labels. A dataframe
supports arithmetic as well as conditional operations, and its
size is mutable.
ETHNOTECH ACADEMY
Merge and Concat Contd…
Example:
ETHNOTECH ACADEMY
Merge and Concat Contd…
Output:
ETHNOTECH ACADEMY
Merge and Concat Contd…
DataFrames Merge:
ETHNOTECH ACADEMY
Join Operations
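The slide examples for merging are missing from the text. As a sketch under that caveat, pandas.merge with different how values gives the usual join operations; the two data frames are invented for illustration.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [80, 90, 70]})

# Inner join: only ids present in both frames.
inner = pd.merge(left, right, on="id", how="inner")

# Left join: every row of left, NaN where right has no match.
left_join = pd.merge(left, right, on="id", how="left")

# Outer join: the union of ids from both frames.
outer = pd.merge(left, right, on="id", how="outer")

print(inner)
print(left_join)
print(outer)
```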
ETHNOTECH ACADEMY
Merge and Concat Contd…
Example:
ETHNOTECH ACADEMY
Merge and Concat Contd…
Output:
ETHNOTECH ACADEMY
Merge and Concat Contd…
DataFrames Concat:
concat() function does all of the heavy lifting of performing
concatenation operations along an axis while performing
optional set logic (union or intersection) of the indexes (if any)
on the other axes.
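A short sketch of concat along each axis, including the union (join="outer", the default) and intersection (join="inner") logic on the other axis. The frames a and b are hypothetical.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [5, 6], "z": [7, 8]})

# Stack rows; columns are the union of both frames, gaps become NaN.
rows_union = pd.concat([a, b], ignore_index=True)

# Stack rows; keep only the columns common to both frames.
rows_inter = pd.concat([a, b], join="inner", ignore_index=True)

# Place the frames side by side, aligned on the row index.
side_by_side = pd.concat([a, b], axis=1)

print(rows_union)
print(rows_inter)
print(side_by_side)
```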
ETHNOTECH ACADEMY
Merge and Concat Contd…
ETHNOTECH ACADEMY
Merge and Concat Contd…
ETHNOTECH ACADEMY
Merge and Concat Contd…
Output:
ETHNOTECH ACADEMY
DataFrames
A Pandas DataFrame is a 2 dimensional data structure, like a
2 dimensional array, or a table with rows and columns.
Example
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Named Indexes
• With the index argument, we can name our own indexes.
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Load Files Into a DataFrame
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Graphs
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Plot of different data
Using more than one list of data in a plot.
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Plot on given axis
We can explicitly define the name of axis and plot the data on
the basis of this axis.
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Bar plot using matplotlib:
Find different types of bar plot to clearly understand the
behaviour of given data.
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Scatter plot:
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Documentation – Overview of Data Science
Q&A
• What will the following code produce when it is executed?
import pandas as pd
s = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(s['f'])
A. KeyError
B. IndexError
C. ValueError
D. Semantic error
• KeyError
SUMMARY
• Fundamentals of Pandas and its usage
• Columnar operations of Pandas
• Usage of Pandas Library in Data Science
• Applications of Graphs in Pandas
Session 3
Foundation-Numpy
• One Dimension
• Two Dimension
ETHNOTECH ACADEMY
Session 3
Foundation-Descriptive Analysis
• Data Dictionary
Method 1:
ETHNOTECH ACADEMY
One-dimensional NumPy array
Output
ETHNOTECH ACADEMY
Method 2
fromiter()
ETHNOTECH ACADEMY
Method 2 Contd…
Output
ETHNOTECH ACADEMY
Method 3
arange()
• It returns evenly spaced values within a given interval.
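The code for these methods is missing from the text. The sketch below assumes Method 1 is np.array() on a Python list, and shows fromiter() and arange() with made-up inputs.

```python
import numpy as np

# Assumed Method 1: build a one-dimensional array from a list.
from_list = np.array([1, 2, 3, 4])

# fromiter(): build an array from any iterable; dtype is required.
from_iter = np.fromiter((i * i for i in range(5)), dtype=int)

# arange(start, stop, step): evenly spaced values, stop excluded.
evenly = np.arange(0, 10, 2)

print(from_list, from_iter, evenly)
```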
Output
ETHNOTECH ACADEMY
Method 4
ETHNOTECH ACADEMY
Numpy — Stacking Arrays
ETHNOTECH ACADEMY
Stack
• Joins arrays along a new axis, element by element.
• Both input arrays must have the same shape.
• The axis parameter in stack refers to a dimension of the result rather
than to a horizontal/vertical direction.
• If axis is 0, it joins along the first dimension.
• If axis is 1, then it joins along the second dimension.
• For input arrays of dimension n, the result has n + 1 dimensions, so
the largest axis we can pass is n.
• If axis is greater than n, an "axis is out of bounds for array of
dimension n + 1" error is raised.
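A small sketch of np.stack on two one-dimensional arrays (so n = 1 and the valid axis values are 0 and 1). The arrays are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Join along a new first dimension: result shape (2, 3).
s0 = np.stack((a, b), axis=0)

# Join along a new second dimension: result shape (3, 2).
s1 = np.stack((a, b), axis=1)

# axis=2 does not exist in a 2-D result, so NumPy raises an AxisError,
# which is a subclass of ValueError.
try:
    np.stack((a, b), axis=2)
    axis_error = False
except ValueError:
    axis_error = True

print(s0)
print(s1)
print(axis_error)
```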
ETHNOTECH ACADEMY
Stack Contd…
ETHNOTECH ACADEMY
Stack by First Dimension
ETHNOTECH ACADEMY
Stack by Second Dimension
ETHNOTECH ACADEMY
Stack by First Dimension
ETHNOTECH ACADEMY
Stack by Second Dimension
ETHNOTECH ACADEMY
HStack
Stacks horizontally
• This function does not take an axis argument. It extends the first array
by the second array horizontally.
• As it extends horizontally, both arrays must have the same
number of rows, else a ValueError is raised.
ETHNOTECH ACADEMY
HStack for 1D Arrays
ETHNOTECH ACADEMY
HStack for 2D Arrays
ETHNOTECH ACADEMY
Vstack
• This function does not work with axis. It extends first array
by second array Vertically
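hstack and vstack can be compared side by side in a short sketch; the arrays below are made up.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# hstack: columns of b are appended to the right of a -> shape (2, 4).
h = np.hstack((a, b))

# vstack: rows of b are appended below a -> shape (4, 2).
v = np.vstack((a, b))

# For 1-D inputs, hstack simply joins them end to end.
h1 = np.hstack((np.array([1, 2]), np.array([3, 4, 5])))

print(h)
print(v)
print(h1)
```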
ETHNOTECH ACADEMY
VStack by 2D Arrays
ETHNOTECH ACADEMY
Foundation-Descriptive Analysis
Introduction
ETHNOTECH ACADEMY
Foundation-Descriptive Analysis Contd…
ETHNOTECH ACADEMY
Centrality measures
1. Mean
2. Median
3. Mode
ETHNOTECH ACADEMY
Mean
To compute mean, sum all the values and divide the sum by
the number of values.
ETHNOTECH ACADEMY
Mean Contd…
Name       Subject 1  Subject 2  Subject 3  Subject 4  Total Marks
Student 6  15         18         7          12         52
ETHNOTECH ACADEMY
Mean with python
The Syntax is
ETHNOTECH ACADEMY
Parameters
• arr : [array_like] input array.
• axis : [int or tuple of ints] axis along which we want to calculate
the arithmetic mean. If omitted, arr is treated as flattened (the mean
is taken over all elements). axis = 0 computes the mean down each column
and axis = 1 computes it across each row.
• out : [ndarray, optional] alternative array in which we want to
place the result. The array must have the same shape as the
expected output.
• dtype : [data-type, optional] type to use while computing the
mean.
• Returns : arithmetic mean of the array (a scalar value if axis is
None) or an array with mean values along the specified axis.
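A sketch of numpy.mean with and without axis. The first row reuses the Student 6 marks from the table above; the second row is a hypothetical extra student added so that axis has something to act on.

```python
import numpy as np

marks = np.array([
    [15, 18, 7, 12],    # Student 6 (from the table)
    [10, 12, 14, 16],   # hypothetical second student
])

overall = np.mean(marks)              # all 8 values flattened
per_subject = np.mean(marks, axis=0)  # down each column
per_student = np.mean(marks, axis=1)  # across each row

print(overall)
print(per_subject)
print(per_student)
```

For Student 6 alone this is 52 / 4 = 13, matching the rule "sum all the values and divide by the number of values".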
ETHNOTECH ACADEMY
Example 1
Output:
ETHNOTECH ACADEMY
Example 2
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Median
• It is the value where the upper half of the data lies above it
and the lower half lies below it. In other words, it is the middle
value of a data set.
• To calculate the median, arrange the data points in
increasing order; the middle value is the median.
• It is easy to find the middle value if there is an odd
number of data points. Say we want to find the median of the
marks of all students for Subject 1.
• When the marks are arranged in increasing order, we get
{7,9.5,10,11,14,15,19}. Clearly, the middle value is 11;
therefore, the median is 11.
ETHNOTECH ACADEMY
Median Contd…
• If Student 7 did not write the exam, we will have marks as
{7,9.5,10,11,14,15}. This time there is no clear middle value.
• Then, take the mean of the third and fourth values, which is
(10+11)/2=10.5, so the median in this case is 10.5.
ETHNOTECH ACADEMY
Median in Python
• numpy.median(arr, axis = None) : Compute the median of
the given data (array elements) along the specified axis.
ETHNOTECH ACADEMY
Parameters
• arr : [array_like] input array.
• axis : [int or tuple of ints] axis along which we want to calculate
the median. If omitted, arr is treated as flattened (the median is
taken over all elements). axis = 0 computes the median down each column
and axis = 1 computes it across each row.
• out : [ndarray, optional] alternative array in which we want to
place the result. The array must have the same shape as the
expected output.
• Returns : median of the array (a scalar value if axis is None) or
an array with median values along the specified axis.
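The two worked cases from the Median slides (seven marks, then six marks after dropping the highest) can be reproduced with numpy.median. The 2-D array at the end is a made-up illustration of axis.

```python
import numpy as np

subject_1 = np.array([7, 9.5, 10, 11, 14, 15, 19])

# Odd number of values: the middle one.
median_odd = np.median(subject_1)

# Even number of values: the mean of the two middle ones, (10 + 11) / 2.
median_even = np.median(subject_1[:-1])

# axis=0 gives one median per column, axis=1 one per row.
table = np.array([[1, 9, 5],
                  [3, 2, 7]])
per_column = np.median(table, axis=0)
per_row = np.median(table, axis=1)

print(median_odd, median_even, per_column, per_row)
```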
ETHNOTECH ACADEMY
Example 1
Output:
ETHNOTECH ACADEMY
Example 2
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Mode
• It is the value that occurs most often in our data
set.
• Suppose there are 15 students appearing for an exam and
following is the result:
Output:
ETHNOTECH ACADEMY
Example 2
Output:
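The original mode examples and their data are not present in the text. As a stand-in sketch, the mode can be found with the standard library; the marks below are invented.

```python
from collections import Counter
from statistics import multimode

marks = [10, 12, 12, 15, 12, 10, 18]   # hypothetical exam marks

# Counter tallies how often each value occurs.
counts = Counter(marks)
mode_value, mode_count = counts.most_common(1)[0]

# multimode returns every value tied for the highest frequency.
all_modes = multimode([1, 1, 2, 2, 3])

print(mode_value, mode_count, all_modes)
```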
Q&A
• NumPy package is capable to do fast operations on arrays.
A. True
B. False
• True
Q&A
NumPy is often used along with packages like?
a. Node.js
b. Matplotlib
c. SciPy
d. Both B and C
• Both B and C
ETHNOTECH ACADEMY
Q&A
The most important object defined in NumPy is an N-dimensional array type
called?
a. ndarray
b. narray
c. nd_array
d. darray
• ndarray
ETHNOTECH ACADEMY
Q&A
How to convert numpy array to list?
a. array.list()
b. array.list
c. list.array()
d. list(array)
• list(array)
ETHNOTECH ACADEMY
Q&A
What of the following syntax is used to install numpy in the system containing
python3?
a. pip numpy install python3
b. pip3 install numpy
c. pip install numpy
d. python3 pip3 numpy install
• pip3 install numpy
ETHNOTECH ACADEMY
Q&A
What does size attribute in numpy use to find?
a. shape
b. date & time
c. objects
d. number of items
• number of items
ETHNOTECH ACADEMY
Q&A
Is the following syntax correct for importing the numpy module?
fetch numpy as np
np.array(list)
A. Yes
B. No
• No
ETHNOTECH ACADEMY
Q&A
What are the attributes of numpy array?
a. shape, dtype, ndim
b. objects, type, list
c. objects, non vectorization
d. Unicode and shape
• shape, dtype, ndim
ETHNOTECH ACADEMY
Q&A
What is the output of following code?
import numpy as np
ary = np.array([1,2,3,5,8])
ary = ary + 1
print (ary[1])
SUMMARY
• Fundamentals of Numpy library
• Usage of Numpy library in Data Science
• Various operations of Numpy
• Implementation of Merge and Concatenation operations in Numpy
Session 4
Regression
• Introduction and Preprocessing
• Feature Selection Regularisation
• Residual Analysis
• Data Read
• Normality test and BoxCox transformation
• Linear Regression structure
• CatBoost
• CatBoost Hyperparameter Tuning
ETHNOTECH ACADEMY
Preprocessing
ETHNOTECH ACADEMY
Preprocessing Contd…
ETHNOTECH ACADEMY
Preprocessing Contd…
ETHNOTECH ACADEMY
Tools and methods for preprocessing
data
ETHNOTECH ACADEMY
Tools and methods for
preprocessing data Contd…
ETHNOTECH ACADEMY
Why is data preprocessing important?
Real-world data is messy and is often created, processed and
stored by a variety of humans, business processes and
applications.
ETHNOTECH ACADEMY
Regression
• Regression searches for relationships among variables.
ETHNOTECH ACADEMY
Need of Regression
ETHNOTECH ACADEMY
Need of Regression Contd…
ETHNOTECH ACADEMY
Linear Regression Contd…
ETHNOTECH ACADEMY
Steps involved in implementing linear
regression:
• Import the packages and classes that you need.
• Provide data to work with and, if needed, apply appropriate
transformations.
• Create a regression model and fit it with existing data.
• Check the results of model fitting to know whether the
model is satisfactory.
• Apply the model for predictions.
ETHNOTECH ACADEMY
Step 1: Import packages and classes
import numpy as np
from sklearn.linear_model import LinearRegression
ETHNOTECH ACADEMY
Step2: Provide data
ETHNOTECH ACADEMY
Step 3: Create a model and fit it
model = LinearRegression()
model.fit(x, y)
LinearRegression()
ETHNOTECH ACADEMY
Step 3 Contd…
model = LinearRegression().fit(x, y)
ETHNOTECH ACADEMY
Step 4: Get results
• Once you have your model fitted, you can get the results to
check whether the model works satisfactorily and to
interpret it.
r_sq = model.score(x, y)
y_pred = model.predict(x)
print(f"predicted response:\n{y_pred}")
predicted response:
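The data for Step 2 is missing from the text, so the following end-to-end sketch uses assumed data that lies exactly on y = 2x + 1. It follows the same steps: import, provide data, fit, check results, predict.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed data (not from the slides): y = 2x + 1 exactly.
# x must be two-dimensional: one column per feature.
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([3, 5, 7, 9, 11])

model = LinearRegression().fit(x, y)

r_sq = model.score(x, y)       # coefficient of determination, R^2
y_pred = model.predict(x)      # predictions for the training inputs
new_pred = model.predict(np.array([[6]]))   # prediction for unseen x

print(f"intercept: {model.intercept_}, slope: {model.coef_[0]}")
print(f"R^2: {r_sq}")
print(f"predicted response:\n{y_pred}")
```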
ETHNOTECH ACADEMY
Box-Cox Transformation
ETHNOTECH ACADEMY
Implementation
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Catboost
ETHNOTECH ACADEMY
Implementation
ETHNOTECH ACADEMY
Data Preparation
• Initially we’re simply going to drop any rows that contain
NaN for the “survived” column which is our target as this
doesn’t help our model.
df.dropna(subset=['survived'],inplace=True)
• we are only going to make use of 4 features; pclass, sex, age
and fare. Let’s split our data into X and y to get our feature
and target dataframes.
X = df[['pclass','sex', 'age', 'fare']]
y = df['survived']
ETHNOTECH ACADEMY
Data Preparation Contd…
• Now we still need to treat some of the features. We need to
convert the “pclass” column to a string data type as
although it appears numeric, the values are discrete so it’s
actually a categorical variable in this context. In addition, the
“fare” and “age” columns contain some NaNs so we’ll
replace these with zeros.
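X = X.copy()  # work on an independent copy rather than a slice of df
X['pclass'] = X['pclass'].astype('str')
X['fare'] = X['fare'].fillna(0)  # assign back instead of chained inplace
X['age'] = X['age'].fillna(0)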
ETHNOTECH ACADEMY
Preparing Categorical Features
ETHNOTECH ACADEMY
Preparing Categorical Features
Contd…
ETHNOTECH ACADEMY
Preparing Categorical Features Contd…
ETHNOTECH ACADEMY
Preparing Categorical Features Contd…
ETHNOTECH ACADEMY
Training
ETHNOTECH ACADEMY
Training Contd…
• To train the model we are going to use CatBoost's built-in grid
search method. If you have used scikit-learn's GridSearchCV,
this works in the same way. First we declare a dictionary of
the hyperparameters that we want to tune and lists of values to
test. We have decided to tune just a few of the most influential
parameters: learning rate, tree depth, L2 leaf regularisation and
also the number of iterations we will train the model for.
ETHNOTECH ACADEMY
Training Contd…
• Now we can fit the model using the grid search method by
passing the grid dictionary we declared above along with
the training data pool. By default grid search splits the
training data into an 80/20 split for training and testing with
a three fold cross validation strategy.
model.grid_search(grid,train_dataset)
ETHNOTECH ACADEMY
Training Contd…
• The model has now been trained and you can print out the
optimum parameters that have been found using grid
search if you’re interested.
model.get_params()
ETHNOTECH ACADEMY
Evaluation
ETHNOTECH ACADEMY
ETHNOTECH ACADEMY
Project
ETHNOTECH ACADEMY
ETHNOTECH ACADEMY
SUMMARY
• Fundamentals of Linear equation
• Need of Linear Regression
• Implementation of Box Cox Transformation
• Implementation of Cat boost over categorical data
Q&A
• Cat boost
ETHNOTECH ACADEMY
• State whether the following statement is true or false.
• "Seaborn is a Python data visualization library based on matplotlib."
a. True
b. False
• True
Session 5
Classification
• Classification Introduction
• Classification:Logistic Regression
• Classification: Logistic Regression code
ETHNOTECH ACADEMY
Classification
Introduction
• Data Classification in data science refers to the process that
tags and categorizes any kind of data so that it can be better
understood and analyzed. The latter is what we'll be
focusing on.
• But also, a well-planned Data Classification
system makes essential data easy to find
and retrieve.
ETHNOTECH ACADEMY
Types of Data Classification
ETHNOTECH ACADEMY
Types of Data Classification Contd…
• Text Classification
• Document Classification
• Image Classification
Text Classification
• Text classification uses NLP to put to work the
unstructured data we all sit on top of. In the
words of our users, it feels like wizardry when you create
your first classifier and see hundreds of survey responses
categorized in seconds.
ETHNOTECH ACADEMY
Data Classification applications
Contd…
Document Classification
• Document Classification focuses on processes that mainly
apply content-specific classification - e.g. classifying
incoming email attachments by type. It differs from text
classification, as instead of specific phrases or paragraphs
being classified, the whole document is taken into
consideration.
ETHNOTECH ACADEMY
Data Classification applications
Contd…
ETHNOTECH ACADEMY
Data Classification applications
Contd…
Image Classification
• Image Classification categorizes any incoming image file by
predetermined labels. It is often combined with object
detection. These days you can create your own image
classifier and teach the model to make subjective decisions
based on your logic: whether an incoming ad creative is
good or not; whether the image fits into the product
portfolio; whether an image you snapped on your holidays
is appropriate to show to your grandparents.
ETHNOTECH ACADEMY
Data Classification applications
Contd…
ETHNOTECH ACADEMY
Classification: Random Forest
ETHNOTECH ACADEMY
Random Forest Contd…
ETHNOTECH ACADEMY
Advantages
ETHNOTECH ACADEMY
Advantages Contd…
ETHNOTECH ACADEMY
Disadvantages:
ETHNOTECH ACADEMY
Implementing Random Forest
Classification Using IRIS Dataset
ETHNOTECH ACADEMY
Implementing Random Forest
Classification on a Real-World Data Set
1. IMPORTING PYTHON LIBRARIES AND LOADING OUR DATA SET
INTO A DATA FRAME
ETHNOTECH ACADEMY
2. SPLITTING OUR DATA SET INTO
TRAINING SET AND TEST SET
ETHNOTECH ACADEMY
3. CREATING A RANDOM FOREST CLASSIFICATION
MODEL AND FITTING IT TO THE TRAINING DATA
ETHNOTECH ACADEMY
4. PREDICTING THE TEST SET RESULTS AND
MAKING THE CONFUSION MATRIX
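The code for these four steps is not in the text. A compact sketch of the same workflow on the Iris data set bundled with scikit-learn might look like this; the split ratio, number of trees and random seeds are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# 1. Load the data set (features X, class labels y).
X, y = load_iris(return_X_y=True)

# 2. Split into training and test sets (assumed 75/25 split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 3. Create the random forest classifier and fit it to the training data.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# 4. Predict the test set and build the confusion matrix.
y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)   # rows: true class, columns: predicted
acc = accuracy_score(y_test, y_pred)

print(cm)
print(f"accuracy: {acc:.3f}")
```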
ETHNOTECH ACADEMY
Catboost Classifier
ETHNOTECH ACADEMY
key features of cat boost algorithm:
ETHNOTECH ACADEMY
Catboost Classifier
ETHNOTECH ACADEMY
Steps involved in Catboost
implementation
• Define Dataset
• Apply Model
• Predict
ETHNOTECH ACADEMY
• Applying CatBoost’s regressor to the regression dataset. The
dataset contains the price information of houses in Dushanbe
city. The input variables are the number of rooms, floors, area,
and location
ETHNOTECH ACADEMY
Step1: Installations and Imports
ETHNOTECH ACADEMY
Step2: Define Dataset
ETHNOTECH ACADEMY
Step3: Apply Model
ETHNOTECH ACADEMY
Step4:Predict
ETHNOTECH ACADEMY
Classification: One class SVM
ETHNOTECH ACADEMY
Classification
ETHNOTECH ACADEMY
Classification Contd…
ETHNOTECH ACADEMY
Logistic Regression
ETHNOTECH ACADEMY
Logistic Regression Contd…
ETHNOTECH ACADEMY
Implementation of Logistic Regression
Scenario
• User Database – This dataset contains information about
users from a company’s database. It contains information
about UserID, Gender, Age, EstimatedSalary, and
Purchased. We are using this dataset for predicting whether
a user will purchase the company’s newly launched product
or not.
• Do refer to the below table from where data is being
fetched from the dataset.
ETHNOTECH ACADEMY
Implementation of Logistic Regression
Contd…
• Let us make the Logistic Regression model, predicting
whether a user will purchase the product or not
ETHNOTECH ACADEMY
Import Libraries
ETHNOTECH ACADEMY
Import Libraries Contd…
ETHNOTECH ACADEMY
Splitting The Dataset: Train and
Test dataset
ETHNOTECH ACADEMY
Splitting The Dataset: Train and
Test dataset Contd…
Now, it is very important to perform feature scaling here
because Age and Estimated Salary values lie in different
ranges. If we don't scale the features, the much larger Estimated Salary
values will dominate the Age values, which slows the solver's convergence
and distorts the effect of regularization on the two coefficients.
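The user database itself is not included in the text, so this sketch generates synthetic Age and EstimatedSalary values with an invented purchase rule. It only illustrates the order of operations: split, fit the scaler on the training set, transform both sets, then fit the logistic regression.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the user database (assumption, not course data).
rng = np.random.default_rng(0)
age = rng.integers(18, 60, size=200)
salary = rng.integers(15000, 150000, size=200)
purchased = ((age > 40) | (salary > 90000)).astype(int)  # invented rule

X = np.column_stack([age, salary]).astype(float)
X_train, X_test, y_train, y_test = train_test_split(
    X, purchased, test_size=0.25, random_state=0
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

clf = LogisticRegression().fit(X_train_s, y_train)
accuracy = clf.score(X_test_s, y_test)

print(f"test accuracy: {accuracy:.3f}")
```

After scaling, both training columns have mean 0 and standard deviation 1, so neither feature dominates because of its units.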
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Evaluation Metrics
Output:
ETHNOTECH ACADEMY
Visualizing the performance of our
model
ETHNOTECH ACADEMY
Visualizing the performance of our
model Contd…
ETHNOTECH ACADEMY
Visualizing the performance of our
model Contd…
ETHNOTECH ACADEMY
SUMMARY
Clustering
• Clustering: Introduction
• Clustering: KMeans
• Clustering: Agglomerative
• Clustering: KNN
• Clustering:KNN using Iris
ETHNOTECH ACADEMY
Clustering
• Machine learning is a subset of Artificial Intelligence that
allows a machine to learn automatically from past data
without being explicitly programmed.
• Classical machine learning is often categorized by how an
algorithm learns to become more accurate in its
predictions.
• Clustering is the task of dividing the population or data
points into several groups such that data points in the same
groups are similar to other data points in that group and
dissimilar to the data points in other groups.
ETHNOTECH ACADEMY
Clustering Contd…
ETHNOTECH ACADEMY
THE IMPORTANCE OF CLUSTERING
ETHNOTECH ACADEMY
K Means clustering Contd…
ETHNOTECH ACADEMY
Working of K-means
ETHNOTECH ACADEMY
Working of K-means Contd…
ETHNOTECH ACADEMY
Working of K-means Contd…
The cluster centers are then updated to be the "centers" of all
the points assigned to them in that pass. This is done by
recalculating each cluster center as the average of the points in
its cluster.
The algorithm repeats until the cluster centers barely change
from the last iteration.
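The assign-then-update loop described above can be sketched in plain NumPy. This is an illustrative implementation, not the course's code: the function name kmeans, the tolerance, and the choice of the first k points as initial centers are all assumptions made for reproducibility.

```python
import numpy as np

def kmeans(points, k, n_iter=100, tol=1e-6):
    """Toy K-means: returns (centers, labels)."""
    # Assumption: use the first k points as the initial centers.
    centers = points[:k].copy()
    for _ in range(n_iter):
        # Distance of every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        # Assign each point to its nearest center.
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:   # centers barely moved: stop
            break
    return centers, labels

# Two well-separated groups of made-up points.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])
centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```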
ETHNOTECH ACADEMY
Limitations of Kmeans
• If the clusters have more complex geometric shapes, the
algorithm does a poor job of clustering the data.
• The algorithm does not allow data points distant from one
another to share the same cluster, regardless of whether
they belong in the cluster.
• K-means does not itself learn the number of clusters from the data;
rather, that information must be pre-defined.
• When there is overlap between or among clusters, K-means
cannot determine how to assign data points where
the overlap occurs.
ETHNOTECH ACADEMY
Implementation of the K-means
algorithm
Importing Libraries
ETHNOTECH ACADEMY
Working with Dataset
ETHNOTECH ACADEMY
Visualize the data points
ETHNOTECH ACADEMY
Find the K value using the Elbow
method
ETHNOTECH ACADEMY
Find the K value using the Elbow
method Contd…
Centroid points
• array([[88.2 , 17.11428571],
[55.2962963 , 49.51851852],
[86.53846154, 82.12820513],
[25.72727273, 79.36363636],
[26.30434783, 20.91304348]])
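A hedged sketch of the elbow method with scikit-learn follows. The dataset from the slides is not reproduced here, so the example generates synthetic blobs placed roughly where the centroids above lie; the blob spread, the range k = 1 to 10, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: five blobs (not the dataset used on the slides).
rng = np.random.default_rng(0)
blob_centers = np.array([[88, 17], [55, 50], [87, 82], [26, 79], [26, 21]])
X = np.vstack([c + rng.normal(0, 5, size=(40, 2)) for c in blob_centers])

# Within-cluster sum of squares (inertia) for k = 1..10.
# The "elbow" is the k after which the curve stops dropping sharply.
wcss = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(model.inertia_)

for k, value in zip(range(1, 11), wcss):
    print(k, round(value, 1))

# Fit the chosen k and inspect the centroid points.
final = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)
print(final.cluster_centers_)
```

Plotting `wcss` against k (for example with matplotlib) gives the usual elbow curve; the printed table carries the same information.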
ETHNOTECH ACADEMY
Visualize the clusters formed
ETHNOTECH ACADEMY
Agglomerative Clustering
ETHNOTECH ACADEMY
Agglomerative Clustering Contd…
• In this clustering approach, we start from the leaves (the
individual data points) and merge upward until the root (a
single cluster containing every point) is obtained.
ETHNOTECH ACADEMY
Agglomerative Clustering Contd…
ETHNOTECH ACADEMY
Working of Agglomerative Hierarchical
Clustering
• Step-1: Treat each data point as its own cluster. If
there are N data points, there are initially N clusters.
ETHNOTECH ACADEMY
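Working of Agglomerative Hierarchical Clustering
• Step-2: Take the two closest data points or clusters and merge
them to form one cluster. There will now be N-1 clusters.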
ETHNOTECH ACADEMY
Working of Agglomerative Hierarchical Clustering
• Step-3: Again, take the two closest clusters and merge them
together to form one cluster. There will be N-2 clusters.
ETHNOTECH ACADEMY
Working of Agglomerative Hierarchical Clustering
• Step-4: Repeat Step 3 until only one cluster is left.
ETHNOTECH ACADEMY
• Step-5: Once all the clusters are combined into
one big cluster, build the dendrogram and cut it at the
level that gives the number of clusters the problem requires.
ETHNOTECH ACADEMY
Working of Dendrogram in Hierarchical
clustering
• The dendrogram is a tree-like structure that records each
merge step the hierarchical clustering (HC) algorithm performs.
In the dendrogram plot, the y-axis shows the Euclidean
distance at which clusters are merged, and the x-axis shows
the data points of the given dataset.
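As an illustration of what a dendrogram stores, the sketch below builds the merge table with SciPy. The six points are made up, and single linkage is an assumption chosen so that merge heights equal plain Euclidean distances between the closest members.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Six made-up 2-D points forming three obvious pairs.
points = np.array([[0.0, 0.0], [0.0, 1.0],
                   [5.0, 5.0], [5.0, 6.2],
                   [10.0, 0.0], [10.0, 1.5]])

# Each row of Z records one merge: (cluster a, cluster b, distance, new size).
Z = linkage(points, method="single", metric="euclidean")
print(Z)

# no_plot=True returns only the layout; remove it (with matplotlib
# installed) to draw the tree.
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])  # leaf labels in x-axis order
```

The third column of `Z` is the height at which each merge appears on the y-axis of the plot.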
ETHNOTECH ACADEMY
Agglomerative Clustering Contd…
In this flowchart, we assume a dataset with N = 6 elements.
The steps involved in the clustering are:
Step 1: Initially, treat each data point as an independent cluster,
i.e. 6 clusters.
Step 2: Merge the two closest data points into a single cluster.
This leaves 5 clusters.
Step 3: Again, merge the two closest clusters into a single cluster.
This leaves 4 clusters.
Step 4: Repeat Step 3 until a single cluster containing all data
points is obtained.
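The four steps above can be traced in a few lines of Python. This is a minimal sketch under stated assumptions: the helper name `agglomerate`, the six sample points, and the use of single linkage (distance between the closest members of two clusters) are choices made for the example, not taken from the slides.

```python
import numpy as np

def agglomerate(points):
    """Merge the two closest clusters repeatedly; return every intermediate state."""
    clusters = [[i] for i in range(len(points))]   # Step 1: each point is a cluster
    history = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]    # Steps 2-4: merge the closest pair
        del clusters[b]
        history.append([list(c) for c in clusters])
    return history

points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0],
                   [5.0, 6.2], [10.0, 0.0], [10.0, 1.5]])
history = agglomerate(points)
for step in history:
    print(len(step), "clusters:", step)
```

The printed cluster counts run from 6 down to 1, one merge per pass.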
ETHNOTECH ACADEMY
Agglomerative Clustering Contd…
ETHNOTECH ACADEMY
Implementation of Agglomerative
clustering
• Agglomerative Clustering: Agglomerative Clustering is one
of the most common hierarchical clustering techniques.
Dataset – Credit Card Dataset.
ETHNOTECH ACADEMY
Step 2: Loading and Cleaning the
data
ETHNOTECH ACADEMY
Step 3: Preprocessing the data
ETHNOTECH ACADEMY
Step 4: Reducing the dimensionality
of the Data
ETHNOTECH ACADEMY
Step 5: Visualizing the working of
the Dendrograms
ETHNOTECH ACADEMY
Step 5 Contd…
ETHNOTECH ACADEMY
Step 6
ETHNOTECH ACADEMY
Step 6 Contd…
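The step headings above (loading and cleaning, preprocessing, dimensionality reduction, clustering) can be sketched end to end with scikit-learn. The credit card file itself is not available here, so the example uses a synthetic numeric table as a stand-in; the mean imputation, the scaler, the two PCA components, and `n_clusters=2` are all assumptions rather than the slides' exact choices.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

# Synthetic stand-in for the credit card table: 200 rows, 8 numeric columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
X[:100] += 3.0                             # shift half the rows so two groups exist
X[rng.integers(0, 200, 5), 0] = np.nan     # sprinkle in a few missing values

# Loading and cleaning: fill missing values with the column mean.
col_means = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_means, X)

# Preprocessing: standardize the columns, then scale each row to unit length.
X_scaled = normalize(StandardScaler().fit_transform(X))

# Dimensionality reduction: keep two principal components.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Clustering: bottom-up merging, stopped at two clusters.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X_2d)
print(np.bincount(labels))
```

In practice the number of clusters would be read off the dendrogram rather than fixed in advance.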
ETHNOTECH ACADEMY
K-Nearest Neighbors (KNN)
ETHNOTECH ACADEMY
Suppose we have a new data point that needs to be assigned
to one of the existing categories.
ETHNOTECH ACADEMY
First, we choose the number of neighbors; here we
choose k = 5.
ETHNOTECH ACADEMY
Calculating the Euclidean distances gives the five nearest neighbors:
three are in category A and two are in category B, so the new
point is assigned to category A.
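The neighbor vote can be sketched in plain Python. The training points, the new point, and the helper name `knn_predict` are made up so that, as in the description above, three of the five nearest neighbors fall in category A and two in category B.

```python
import math
from collections import Counter

def knn_predict(train, new_point, k=5):
    """train is a list of (features, label); return the majority label and the votes."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], new_point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0], votes

# Made-up points: category A sits lower-left, category B upper-right.
train = [((1, 2), "A"), ((2, 3), "A"), ((3, 3), "A"), ((2, 1), "A"),
         ((5, 4), "B"), ((4, 5), "B"), ((6, 6), "B"), ((7, 5), "B")]
label, votes = knn_predict(train, new_point=(3.5, 3.6), k=5)
print(label, dict(votes))
```

An odd k is a common choice for two categories because it rules out tied votes.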
ETHNOTECH ACADEMY
Steps involved in KNN algorithm:
ETHNOTECH ACADEMY
Implementation of KNN algorithm using IRIS
Dataset
ETHNOTECH ACADEMY
Steps involved in KNN algorithm
Contd…
ETHNOTECH ACADEMY
Steps involved in KNN algorithm Contd…
ETHNOTECH ACADEMY
Implementation using Iris Dataset
in Python
• K-nearest neighbors (KNN) is a simple and efficient method
for classification problems. It is a statistical learning
method that has been studied in pattern recognition, data
science, and machine learning.[1], [2] The technique assigns
an unseen point to the dominant class among its k nearest
neighbors in the training set.[3]
ETHNOTECH ACADEMY
Implementation using Iris Dataset
in Python Contd…
• Half of the Iris dataset (75 rows) is used as training data and
the other half (75 rows) as testing data. The dataset has four
measurements that are used as features for KNN: sepal
length, sepal width, petal length, and petal width. The
species (class) attribute is the prediction target, with each
row labeled as Iris-setosa, Iris-versicolor, or Iris-virginica.
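A quick way to reproduce the 75/75 split is scikit-learn's bundled copy of Iris. The slides go on to build their own KNN class, so `KNeighborsClassifier`, the stratified split, `random_state=0`, and k = 5 below are stand-in assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target   # 150 rows, 4 measurements, 3 species

# 50/50 split: 75 rows for training and 75 rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
train_acc = knn.score(X_train, y_train)
test_acc = knn.score(X_test, y_test)
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)
```

Stratifying keeps 25 rows of each species in both halves, so neither half is dominated by one class.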
ETHNOTECH ACADEMY
Implementation using Iris Dataset
in Python Contd…
Import libraries:
ETHNOTECH ACADEMY
Implementation using Iris Dataset
in Python Contd…
• Record the start time to measure the computation time:
• Loading Dataset:
ETHNOTECH ACADEMY
Implementation using Iris Dataset
in Python Contd…
• Make a KNN Class
• Function Initialization
Parameter Description:
k(int): The nearest k instances
ETHNOTECH ACADEMY
Implementation using Iris Dataset
in Python Contd…
• Function for Loading Training Data
Parameter Description:
TrainingPath(string): File path of the training dataset
ColoumnName(string): Column name of the given dataset
ETHNOTECH ACADEMY
Implementation using Iris Dataset
in Python Contd…
• Function for Getting Testing Data
Parameter Description:
TestingPath(string): File path of the testing dataset
ColoumnName(string): Column name of the given dataset
ETHNOTECH ACADEMY
Implementation using Iris Dataset
in Python Contd…
• Function for Predicting the Label of Each Testing Point
Parameter Description:
TestPoint (numpy.ndarray): Feature values of the
testing data
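The class outlined on the preceding slides could look roughly like the sketch below. The original code is not shown, so every method name here is hypothetical, and reading the column-name parameter as "the column that holds the label" is an assumption. The example uses only the standard library and writes two tiny made-up CSV files so that it runs end to end.

```python
import csv
import math
import os
import tempfile
from collections import Counter

class KNN:
    def __init__(self, k):
        self.k = k                      # number of nearest instances to consult
        self.train_X, self.train_y = [], []

    @staticmethod
    def _read(path, label_column):
        """Return (features, labels) from a CSV file with a header row."""
        features, labels = [], []
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                labels.append(row.pop(label_column))
                features.append([float(value) for value in row.values()])
        return features, labels

    def load_training_data(self, training_path, label_column):
        self.train_X, self.train_y = self._read(training_path, label_column)

    def get_testing_data(self, testing_path, label_column):
        return self._read(testing_path, label_column)

    def predict(self, test_point):
        # Sort training rows by Euclidean distance and vote among the k closest.
        order = sorted(range(len(self.train_X)),
                       key=lambda i: math.dist(self.train_X[i], test_point))
        votes = Counter(self.train_y[i] for i in order[:self.k])
        return votes.most_common(1)[0][0]

# Tiny made-up CSV files (not the Iris data).
train_rows = ("length,width,species\n1.0,1.1,a\n1.2,0.9,a\n0.9,1.0,a\n"
              "5.0,5.2,b\n5.1,4.9,b\n4.8,5.0,b\n")
test_rows = "length,width,species\n1.1,1.0,a\n5.0,5.0,b\n"
with tempfile.TemporaryDirectory() as folder:
    train_path = os.path.join(folder, "train.csv")
    test_path = os.path.join(folder, "test.csv")
    with open(train_path, "w") as f:
        f.write(train_rows)
    with open(test_path, "w") as f:
        f.write(test_rows)
    model = KNN(k=3)
    model.load_training_data(train_path, "species")
    test_X, test_y = model.get_testing_data(test_path, "species")
    predictions = [model.predict(point) for point in test_X]
print(predictions, test_y)
```

Comparing `predictions` with `test_y` for several values of k is one way to obtain the accuracy figures that such a class is usually evaluated on.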
ETHNOTECH ACADEMY
Implementation using Iris Dataset
in Python Contd…
• Graph of Training & Testing Accuracy with k = 1 to 7
ETHNOTECH ACADEMY
RESULT AND DISCUSSION
Explanation of Training and Testing Result
ETHNOTECH ACADEMY
SUMMARY
• Fundamentals of Clustering and Cluster Analysis
• Application of Clustering
• Implementation of K-means Clustering
• Implementation of Agglomerative Clustering
• Implementation of KNN algorithm on IRIS Dataset
ETHNOTECH ACADEMY
Session 6
Text Analytics
• Text Analytics: Introduction
ETHNOTECH ACADEMY
Text Analytics
ETHNOTECH ACADEMY
Text Analytics Contd…
Text communication is one of the most popular forms of
day-to-day conversation. We chat, message, tweet, share
statuses, email, write blogs, and share opinions and feedback
in our daily routine.
All these activities generate large amounts of text, which is
unstructured in nature. In online marketplaces and social
media, it is extremely important to analyze large quantities
of this data to understand people’s opinions.
NLP enables the computer to interact with humans in a
natural manner.
ETHNOTECH ACADEMY
Text Analytics Contd…
ETHNOTECH ACADEMY
NLTK
• Natural Language Toolkit (NLTK) library contains various
utilities that allow you to effectively manipulate and analyze
linguistic data. Among its advanced features are text
classifiers that you can use for many kinds of classification,
including sentiment analysis.
• Sentiment analysis is the practice of using algorithms to
classify various samples of related text into overall positive
and negative categories. With NLTK, you can employ these
algorithms through powerful built-in machine learning
operations to obtain insights from linguistic data.
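As a toy illustration of an NLTK text classifier used for sentiment, the sketch below trains `NaiveBayesClassifier` on four made-up sentences. The helper `features` and the training sentences are assumptions for the example; a usable model needs far more labeled text.

```python
from nltk.classify import NaiveBayesClassifier

def features(sentence):
    """Bag-of-words features: every lowercase word present maps to True."""
    return {word: True for word in sentence.lower().split()}

# Four made-up training sentences, two per sentiment.
training = [
    (features("great movie"), "pos"),
    (features("love it"), "pos"),
    (features("terrible movie"), "neg"),
    (features("hate it"), "neg"),
]
classifier = NaiveBayesClassifier.train(training)

first = classifier.classify(features("a great movie"))
second = classifier.classify(features("I hate it"))
print(first, second)
```

This classifier needs no downloaded corpora, which keeps the example independent of the NLTK data installation.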
ETHNOTECH ACADEMY
Installing NLTK Data
• NLTK comes with many corpora, toy grammars, trained
models, etc. A complete list is posted
at: https://www.nltk.org/nltk_data/
ETHNOTECH ACADEMY
Installing NLTK Data Contd…
Step 2: Move the cursor to the Download button & then click
on the latest python version
ETHNOTECH ACADEMY
Installing NLTK Data Contd…
Step 3: Open the downloaded file. Click on the checkbox &
Click on Customize installation.
ETHNOTECH ACADEMY
Tokenization
ETHNOTECH ACADEMY
Example of sentence tokenization
ETHNOTECH ACADEMY
NLTK Word Tokenize
• You can easily tokenize the sentences and words of the text
with the tokenize module of NLTK.
ETHNOTECH ACADEMY
NLTK Word Tokenize Contd…
ETHNOTECH ACADEMY
Word and Sentence tokenizer
ETHNOTECH ACADEMY
Punctuation-based tokenizer
ETHNOTECH ACADEMY
Tweet tokenizer
ETHNOTECH ACADEMY
MWET tokenizer
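The punctuation-based, tweet, and multi-word expression tokenizers named on these slides can be tried as follows. The sample sentences are made up, and the sketch deliberately sticks to tokenizers in `nltk.tokenize` that work without downloading extra NLTK data.

```python
from nltk.tokenize import wordpunct_tokenize, TweetTokenizer, MWETokenizer

# Punctuation-based: every run of punctuation becomes its own token.
punct_tokens = wordpunct_tokenize("Hello, world! It's 9.30.")
print(punct_tokens)

# Tweet tokenizer: keeps handles, hashtags and emoticons as single tokens.
tweet_tokens = TweetTokenizer().tokenize("@user thanks for the #python tips :)")
print(tweet_tokens)

# Multi-word expression tokenizer: re-joins listed word sequences.
mwe = MWETokenizer([("New", "York")], separator="_")
mwe_tokens = mwe.tokenize("I live in New York".split())
print(mwe_tokens)
```

Note how the punctuation-based tokenizer breaks "It's" and "9.30" apart, which is why tweet-aware or expression-aware tokenizers are preferred for such text.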
ETHNOTECH ACADEMY
TextBlob Word Tokenize
ETHNOTECH ACADEMY
TextBlob Word Tokenize Contd…
• In the code below, we perform word tokenization using
TextBlob library:
ETHNOTECH ACADEMY
Named entity Recognition Contd…
ETHNOTECH ACADEMY
Stemming and Lemmatization
ETHNOTECH ACADEMY
Creating a Stemmer with
PorterStemmer Contd…
ETHNOTECH ACADEMY
Creating a Stemmer with Snowball
Stemmer
• The Snowball stemmer is also known as the Porter2 stemming algorithm
because it fixes a few shortcomings of the Porter stemmer. Let’s see how to use it.
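A minimal side-by-side sketch of the two stemmers in NLTK is shown below; the word list is an arbitrary choice for illustration. Neither stemmer needs downloaded NLTK data.

```python
from nltk.stem import PorterStemmer
from nltk.stem.snowball import SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

words = ["running", "generously", "fairly"]
results = {word: (porter.stem(word), snowball.stem(word)) for word in words}
for word, (porter_stem, snowball_stem) in results.items():
    print(word, porter_stem, snowball_stem)
```

Both stemmers agree on simple inflections such as "running", while adverbs ending in "-ly" are a typical place where their outputs differ.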
ETHNOTECH ACADEMY
Creating a Stemmer with Snowball
Stemmer Contd…
ETHNOTECH ACADEMY
Creating a Lemmatizer with Python
Spacy
Note: python -m spacy download en_core_web_sm
The above command must be run to download the language
model required for lemmatization.
ETHNOTECH ACADEMY
Output
ETHNOTECH ACADEMY
Creating a Lemmatizer with Python
NLTK Contd…
ETHNOTECH ACADEMY
Output
Apples and orange are similar . Boots and hippo are n't .
ETHNOTECH ACADEMY
WordCloud
ETHNOTECH ACADEMY
Preparatory exam link
ETHNOTECH ACADEMY
Program Feedback Link
ETHNOTECH ACADEMY
Q&A
ETHNOTECH ACADEMY
SUMMARY
• Fundamentals of Text Analytics
• Usage of Tokenization, Stemming and Lemmatization
• Significance of WordCloud and its applications in the real world
ETHNOTECH ACADEMY