Customer Segmentation in Python, Chapter 3
Karolis Urbonas
Head of Data Science, Amazon
DataCamp Customer Segmentation in Python
Skewed variables
Left-skewed
Right-skewed
Skew removed with logarithmic transformation
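The effect can also be checked numerically. The following is a minimal sketch, not the course's own code: the data is synthetic (drawn from a log-normal distribution purely for illustration), and `pandas.Series.skew` is assumed as the skewness measure.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed data (illustrative only, not the course dataset)
rng = np.random.default_rng(0)
values = pd.Series(rng.lognormal(mean=3.0, sigma=1.0, size=5000))

# Log transformation; it only works for strictly positive values
values_log = np.log(values)

# Sample skewness before and after the transformation
skew_before = values.skew()
skew_after = values_log.skew()
print(f"skewness before: {skew_before:.2f}")
print(f"skewness after:  {skew_after:.2f}")
```

A skewness close to zero after the transformation suggests the distribution is roughly symmetric.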
Identifying skewness
Visual analysis of the distribution
If the distribution has a long tail on one side, it is skewed
import seaborn as sns
import matplotlib.pyplot as plt
sns.distplot(datamart['Recency'])
plt.show()
import numpy as np
frequency_log = np.log(datamart['Frequency'])
sns.distplot(frequency_log)
plt.show()
Identifying an issue
datamart_rfm.describe()
Analyze key statistics of the dataset
Compare mean and standard deviation
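As a minimal sketch of that comparison, the example below builds a synthetic stand-in for `datamart_rfm` and pulls only the mean and standard deviation rows out of `describe()`. The values are made up for illustration, and the `MonetaryValue` column name is an assumption that does not appear in these slides.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for datamart_rfm (illustrative values only)
rng = np.random.default_rng(1)
datamart_rfm = pd.DataFrame({
    "Recency": rng.integers(1, 366, size=1000),
    "Frequency": rng.integers(1, 50, size=1000),
    "MonetaryValue": rng.lognormal(mean=6.0, sigma=1.0, size=1000),  # assumed column
})

# Keep only the two rows of interest from the describe() output
summary = datamart_rfm.describe().loc[["mean", "std"]]
print(summary.round(1))
```

Columns whose means and standard deviations differ widely are the ones that need centering and scaling before clustering.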
Sequence of structuring pre-processing steps
Sequence
1. Unskew the data - log transformation
2. Standardize to the same average values
3. Scale to the same standard deviation
4. Store as a separate array to be used for clustering
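import numpy as np
from sklearn.preprocessing import StandardScaler
datamart_log = np.log(datamart_rfm)
scaler = StandardScaler()
scaler.fit(datamart_log)
datamart_normalized = scaler.transform(datamart_log)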
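The whole sequence can be run end to end and checked. This is a sketch under stated assumptions: the `datamart_rfm` frame below is synthetic, and the `MonetaryValue` column name is hypothetical. After scaling, each column should have a mean close to 0 and a standard deviation close to 1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic, strictly positive stand-in for datamart_rfm (illustrative only)
rng = np.random.default_rng(42)
datamart_rfm = pd.DataFrame({
    "Recency": rng.integers(1, 366, size=1000),
    "Frequency": rng.integers(1, 50, size=1000),
    "MonetaryValue": rng.lognormal(mean=6.0, sigma=1.0, size=1000),  # assumed column
})

# Step 1: unskew the data with a log transformation
datamart_log = np.log(datamart_rfm)

# Steps 2-3: center each column to mean 0 and scale it to standard deviation 1
scaler = StandardScaler()
scaler.fit(datamart_log)

# Step 4: transform() returns a separate NumPy array to pass to clustering
datamart_normalized = scaler.transform(datamart_log)

# Verify the result column by column
means = datamart_normalized.mean(axis=0)
stds = datamart_normalized.std(axis=0)
print("means:", means.round(2))
print("stds: ", stds.round(2))
```

`StandardScaler` uses the population standard deviation, which matches NumPy's default `std`, so the check above compares like with like.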