1.4 Data Analysis with Python - Data Wrangling
Objectives
Describe data normalization
Demonstrate the use of binning
Demonstrate the use of categorical variables
Data Wrangling 2
Data Normalization in Python
Normalization makes the values of features with different ranges uniform.
The used car data set:
The feature length ranges from 150-250,
100.
We may want to normalize these variables so that
the range of the values is consistent.
This normalization can make some statistical
analyses easier down the road.
Data Normalization in Python
Why is data normalization important?
Data Normalization in Python
Methods of normalizing data
1. Simple feature scaling: divide each value X_old by the maximum
value for that feature (X_new = X_old / X_max). For non-negative
data, the new values range between zero and one.
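Simple feature scaling can be sketched with pandas as follows. This is a minimal illustration, not the course's own code: the column name "length" and its three values are made up for the example.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({"length": [150.0, 200.0, 250.0]})

# Simple feature scaling: divide every value by the column maximum.
df["length_scaled"] = df["length"] / df["length"].max()

print(df)
```

The largest value always maps to one, and every other non-negative value lands somewhere between zero and one.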
Data Normalization in Python
Methods of normalizing data
2. Min-max: from each value X_old, subtract the minimum
value of that feature, then divide by the range (maximum minus
minimum) of that feature. The new values range between zero and one.
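A minimal sketch of min-max scaling in pandas, under the same assumptions as any toy example: the column name "length" and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({"length": [150.0, 200.0, 250.0]})

col = df["length"]

# Min-max: subtract the minimum, then divide by the range (max - min).
df["length_minmax"] = (col - col.min()) / (col.max() - col.min())

print(df)
```

Here the smallest value maps to zero and the largest to one. Note that the division fails to give useful results if every value in the column is identical, because the range is then zero.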
Data Normalization in Python
Methods of normalizing data
3. Z-score: from each value, subtract the feature's mean (mu), then
divide by its standard deviation (sigma). The resulting values hover
around zero, with a standard deviation of one.
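The z-score can be sketched the same way. The column name and values below are hypothetical. One detail worth flagging: pandas' std() computes the sample standard deviation (ddof=1) by default, which is an implementation choice in this sketch rather than something the slides specify.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({"length": [150.0, 200.0, 250.0]})

col = df["length"]

# Z-score: subtract the mean, then divide by the standard deviation.
# Series.std() uses the sample standard deviation (ddof=1) by default;
# pass ddof=0 for the population standard deviation instead.
df["length_z"] = (col - col.mean()) / col.std()

print(df)
```

After the transformation, the column has mean zero, so values below the original mean become negative and values above it become positive.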
Binning in Python
What is binning?
Binning groups values into “bins”. For example, you can bin “age”
into [0 to 5], [6 to 10], [11 to 15].
Grouping a set of numerical values into a smaller number of
bins can give a better understanding of the data distribution.
For example, we can categorize the price into three bins: low
price, medium price, and high price.
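One way to produce three price bins is pandas' cut function with equally spaced bin edges from numpy's linspace. This is a sketch under assumptions: the prices are made up, the bins are equal-width (the slides do not say how the edges are chosen), and the labels "low", "medium", and "high" are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical prices; the values are invented for illustration.
df = pd.DataFrame({"price": [5000, 9000, 16000, 22000, 31000, 45000]})

# Four equally spaced edges define three equal-width bins.
bins = np.linspace(df["price"].min(), df["price"].max(), 4)
labels = ["low", "medium", "high"]

# include_lowest=True keeps the minimum price inside the first bin.
df["price_binned"] = pd.cut(
    df["price"], bins=bins, labels=labels, include_lowest=True
)

print(df)
print(df["price_binned"].value_counts())
```

The value counts give a quick view of how the prices are distributed across the three bins.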
Turning categorical variables into quantitative variables
Problem
Most statistical models cannot take objects or strings
as input.
How can we turn categorical variables (strings) into
quantitative (numeric) variables?
The fuel type feature is a categorical variable with two
values, gas or diesel, which are in string format.
Turning categorical variables into quantitative variables
Categorical to numeric: the “one-hot encoding” technique
Add a dummy variable for each unique category
For each row, assign 1 to the dummy variable of its category and 0 to the others
Turning categorical variables into quantitative variables
pandas.get_dummies(): converts categorical
variables to dummy variables (0 or 1)
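A minimal sketch of get_dummies applied to the fuel type example. The column name "fuel" and the three rows are assumptions made for illustration. The dtype=int argument is passed because recent pandas versions return boolean columns by default, while the slides describe 0 or 1.

```python
import pandas as pd

# Hypothetical data; the column name "fuel" and rows are illustrative.
df = pd.DataFrame({"fuel": ["gas", "diesel", "gas"]})

# One dummy column per unique category, filled with 0 or 1.
dummies = pd.get_dummies(df["fuel"], dtype=int)

# Attach the dummy columns next to the original feature.
df = pd.concat([df, dummies], axis=1)

print(df)
```

Each row has exactly one dummy column set to 1, the one matching its fuel type.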
Summary
Describe data normalization
Demonstrate the use of binning
Demonstrate the use of categorical variables
Q&A