1.4 Data Analysis with Python- Data Wrangling 2

Uploaded by

nhut.an41004
Data Wrangling

Objectives
• Describe data normalization
• Demonstrate the use of binning
• Demonstrate the use of categorical variables

Data Normalization in Python
• Normalization brings features with different value ranges onto a
uniform scale.
• In the used car data set:
  ◦ The feature length ranges from 150 to 250.
  ◦ The features width and height range from 50 to 100.
• We may want to normalize these variables so that
the range of the values is consistent.
• This normalization can make some statistical
analyses easier down the road.

Data Normalization in Python
• Why is data normalization important?
• Discussion: students give their answers.

Data Normalization in Python
• Methods of normalizing data
1. Simple feature scaling: divide each value by the maximum
value for that feature, x_new = x_old / x_max. For non-negative
features, the new values range between zero and one.
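
A minimal sketch of simple feature scaling with pandas. The sample values and the column name length_scaled are assumptions for illustration; the slides only give the approximate range of length.

```python
import pandas as pd

# Hypothetical sample values; the slides only state that length spans roughly 150-250.
df = pd.DataFrame({"length": [150.0, 175.0, 200.0, 250.0]})

# Simple feature scaling: x_new = x_old / x_max
df["length_scaled"] = df["length"] / df["length"].max()
```

The largest value maps to exactly 1, and every other value becomes a fraction of it.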

Data Normalization in Python
• Methods of normalizing data
2. Min-max: subtract the minimum value of the feature from each
value x_old, then divide by the range of that feature,
x_new = (x_old - x_min) / (x_max - x_min). The new values range
between zero and one.

Data Normalization in Python
• Methods of normalizing data
3. Z-score: from each value, subtract the mean mu of the feature,
then divide by the standard deviation sigma,
x_new = (x_old - mu) / sigma. The resulting values hover
around zero, with a mean of 0 and a standard deviation of 1.
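
A minimal sketch of z-score standardization with pandas. The sample values and the column name height_z are assumptions for illustration. Note that Series.std() uses the sample standard deviation (ddof=1) by default; pass ddof=0 if the population standard deviation is wanted.

```python
import pandas as pd

# Hypothetical sample values for the height feature.
df = pd.DataFrame({"height": [50.0, 60.0, 70.0, 80.0, 90.0]})

col = df["height"]
# Z-score: x_new = (x_old - mu) / sigma
# col.std() is the sample standard deviation (ddof=1) by default.
df["height_z"] = (col - col.mean()) / col.std()
```

After this step the new column has mean 0 and, measured with the same ddof, standard deviation 1.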

Binning in Python
• What is binning?
  ◦ Grouping values into "bins". For example, you can bin "age"
into [0 to 5], [6 to 10], [11 to 15].
  ◦ Grouping a set of numerical values into a smaller number of
bins gives a better understanding of the data distribution.
For example, we can categorize the price into three bins: low
price, medium price, and high price.
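
One way to build the three price bins is pandas.cut with equal-width edges from numpy.linspace. This is a sketch under assumptions: the sample prices, the equal-width choice, and the column name price_binned are illustrative and not taken from the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical sample prices.
df = pd.DataFrame({"price": [5000, 9000, 13000, 21000, 30000, 45000]})

# Four equally spaced edges define three equal-width bins.
bins = np.linspace(df["price"].min(), df["price"].max(), 4)
labels = ["low", "medium", "high"]

# include_lowest=True keeps the minimum price inside the first bin.
df["price_binned"] = pd.cut(
    df["price"], bins=bins, labels=labels, include_lowest=True
)
```

Calling df["price_binned"].value_counts() then shows how many cars fall into each bin, which gives a quick view of the price distribution.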

Turning categorical variables into quantitative variables
• Problem
  ◦ Most statistical models cannot take objects/strings
as input.
  ◦ How do we turn categorical variables (strings) into
quantitative variables (numeric)?
  ◦ Example: the fuel type feature is a categorical variable with
two values, gas or diesel, which are in string format.

Turning categorical variables into quantitative variables
• Categorical → Numeric: the "one-hot encoding"
technique
  ◦ Add a dummy variable for each unique category.
  ◦ Assign 1 in the dummy variable that matches the row's category
and 0 in the others.

Turning categorical variables into quantitative variables
• pandas.get_dummies(): converts categorical
variables to dummy variables (0 or 1)
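
A minimal sketch of one-hot encoding the fuel type feature with pandas.get_dummies. The sample rows and the column name fuel_type are assumptions for illustration. dtype=int is passed explicitly because recent pandas versions return boolean columns by default rather than 0/1 integers.

```python
import pandas as pd

# Hypothetical sample rows; the slides only name the two categories.
df = pd.DataFrame({"fuel_type": ["gas", "diesel", "gas", "gas"]})

# One dummy column per unique category, filled with 0 or 1.
dummies = pd.get_dummies(df["fuel_type"], dtype=int)

# Attach the dummy columns next to the original feature.
df = pd.concat([df, dummies], axis=1)
```

Each row has a 1 in exactly one of the new columns (diesel or gas), so the string feature is now usable as numeric input.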

Summary
• Describe data normalization
• Demonstrate the use of binning
• Demonstrate the use of categorical variables

Q&A