1/17/25, 7:39 PM Masterclass Data Analysis.ipynb - Colab
1+1
2+3
Importing the Libraries
import pandas as pd
import numpy as np
df=pd.read_csv("/content/supermarket_sales - Sheet1 (2) (1).csv")
df
     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity  Tax 5%   Total      Date       Time   Payment      cogs    gross margin percentage
0    750-67-8428  A       Yangon     Member         Female  Health and beauty       74.69       7         26.1415  548.9715   1/5/2019   13:08  Ewallet      522.83  4.761905
1    226-31-3081  C       Naypyitaw  Normal         Female  Electronic accessories  15.28       5         3.8200   80.2200    3/8/2019   10:29  Cash         76.40   4.761905
2    631-41-3108  A       Yangon     Normal         Male    Home and lifestyle      46.33       7         16.2155  340.5255   3/3/2019   13:23  Credit card  324.31  4.761905
3    123-19-1176  A       Yangon     Member         Male    Health and beauty       58.22       8         23.2880  489.0480   1/27/2019  20:33  Ewallet      465.76  4.761905
4    373-73-7910  A       Yangon     Normal         Male    Sports and travel       86.31       7         30.2085  634.3785   2/8/2019   10:37  Ewallet      604.17  4.761905
...  ...          ...     ...        ...            ...     ...                     ...         ...       ...      ...        ...        ...    ...          ...     ...
995  233-67-5758  C       Naypyitaw  Normal         Male    Health and beauty       40.35       1         2.0175   42.3675    1/29/2019  13:46  Ewallet      40.35   4.761905
996  303-96-2227  B       Mandalay   Normal         Female  Home and lifestyle      97.38       10        48.6900  1022.4900  3/2/2019   17:16  Ewallet      973.80  4.761905
997  727-02-1313  A       Yangon     Member         Male    Food and beverages      31.84       1         1.5920   33.4320    2/9/2019   13:22  Cash         31.84   4.761905
998  347-56-2442  A       Yangon     Normal         Male    Home and lifestyle      65.82       1         3.2910   69.1110    2/22/2019  15:33  Cash         65.82   4.761905
999  849-09-3807  A       Yangon     Member         Female  Fashion accessories     88.34       7         30.9190  649.2990   2/18/2019  13:28  Cash         618.38  4.761905
1000 rows × 17 columns
df.head()
     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity  Tax 5%   Total      Date       Time   Payment      cogs    gross margin percentage
0    750-67-8428  A       Yangon     Member         Female  Health and beauty       74.69       7         26.1415  548.9715   1/5/2019   13:08  Ewallet      522.83  4.761905
1    226-31-3081  C       Naypyitaw  Normal         Female  Electronic accessories  15.28       5         3.8200   80.2200    3/8/2019   10:29  Cash         76.40   4.761905
2    631-41-3108  A       Yangon     Normal         Male    Home and lifestyle      46.33       7         16.2155  340.5255   3/3/2019   13:23  Credit card  324.31  4.761905
3    123-19-1176  A       Yangon     Member         Male    Health and beauty       58.22       8         23.2880  489.0480   1/27/2019  20:33  Ewallet      465.76  4.761905
4    373-73-7910  A       Yangon     Normal         Male    Sports and travel       86.31       7         30.2085  634.3785   2/8/2019   10:37  Ewallet      604.17  4.761905
df.tail()
     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity  Tax 5%   Total      Date       Time   Payment      cogs    gross margin percentage
995  233-67-5758  C       Naypyitaw  Normal         Male    Health and beauty       40.35       1         2.0175   42.3675    1/29/2019  13:46  Ewallet      40.35   4.761905
996  303-96-2227  B       Mandalay   Normal         Female  Home and lifestyle      97.38       10        48.6900  1022.4900  3/2/2019   17:16  Ewallet      973.80  4.761905
997  727-02-1313  A       Yangon     Member         Male    Food and beverages      31.84       1         1.5920   33.4320    2/9/2019   13:22  Cash         31.84   4.761905
998  347-56-2442  A       Yangon     Normal         Male    Home and lifestyle      65.82       1         3.2910   69.1110    2/22/2019  15:33  Cash         65.82   4.761905
999  849-09-3807  A       Yangon     Member         Female  Fashion accessories     88.34       7         30.9190  649.2990   2/18/2019  13:28  Cash         618.38  4.761905
df.shape
(1000, 17)
df.describe()
       Unit price   Quantity     Tax 5%       Total        cogs        gross margin percentage  gross income  Rating
count  1000.000000  1000.000000  1000.000000  1000.000000  1000.00000  1.000000e+03             1000.000000   1000.00000
mean   55.672130    5.510000     15.379369    322.966749   307.58738   4.761905e+00             15.379369     6.97270
std    26.494628    2.923431     11.708825    245.885335   234.17651   6.131498e-14             11.708825     1.71858
min    10.080000    1.000000     0.508500     10.678500    10.17000    4.761905e+00             0.508500      4.00000
25%    32.875000    3.000000     5.924875     124.422375   118.49750   4.761905e+00             5.924875      5.50000
50%    55.230000    5.000000     12.088000    253.848000   241.76000   4.761905e+00             12.088000     7.00000
75%    77.935000    8.000000     22.445250    471.350250   448.90500   4.761905e+00             22.445250     8.50000
max    99.960000    10.000000    49.650000    1042.650000  993.00000   4.761905e+00             49.650000     10.00000
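In the describe() output, every row of gross margin percentage has the same value, 4.761905. Its std of about 6e-14 is floating-point rounding noise, not real variation.

The constant follows from how the columns relate in the rows shown above:

- cogs = Unit price × Quantity
- Tax 5% = 0.05 × cogs
- Total = cogs + Tax 5%

If gross income equals Tax 5%, the margin is 0.05 / 1.05 ≈ 4.761905%. That equality is an assumption here, suggested by the two columns having identical describe() statistics. The minimal sketch below checks these relationships on the first five rows copied from the output above.

```python
import numpy as np
import pandas as pd

# First five rows of the relevant columns, copied from the df.head() output above
sample = pd.DataFrame({
    "Unit price": [74.69, 15.28, 46.33, 58.22, 86.31],
    "Quantity": [7, 5, 7, 8, 7],
    "Tax 5%": [26.1415, 3.8200, 16.2155, 23.2880, 30.2085],
    "Total": [548.9715, 80.2200, 340.5255, 489.0480, 634.3785],
    "cogs": [522.83, 76.40, 324.31, 465.76, 604.17],
})

# cogs is the pre-tax amount: unit price times quantity
assert np.allclose(sample["cogs"], sample["Unit price"] * sample["Quantity"])
# Tax is 5% of cogs, and Total is cogs plus tax
assert np.allclose(sample["Tax 5%"], 0.05 * sample["cogs"])
assert np.allclose(sample["Total"], sample["cogs"] + sample["Tax 5%"])

# Assumption: gross income == Tax 5%, so the margin is tax / total * 100
margin_pct = sample["Tax 5%"] / sample["Total"] * 100
print(margin_pct.round(6).unique())   # a single distinct value
print(round(0.05 / 1.05 * 100, 6))    # 4.761905
```

Because the margin is fixed by the 5% tax rate, this column carries no information for comparing branches or products.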
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Invoice ID 1000 non-null object
1 Branch 1000 non-null object
2 City 1000 non-null object
3 Customer type 1000 non-null object
4 Gender 1000 non-null object
5 Product line 1000 non-null object
6 Unit price 1000 non-null float64
7 Quantity 1000 non-null int64
8 Tax 5% 1000 non-null float64
9 Total 1000 non-null float64
10 Date 1000 non-null object
11 Time 1000 non-null object
12 Payment 1000 non-null object
13 cogs 1000 non-null float64
14 gross margin percentage 1000 non-null float64
15 gross income 1000 non-null float64
16 Rating 1000 non-null float64
dtypes: float64(7), int64(1), object(9)
memory usage: 132.9+ KB
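df.info() reports Date and Time as object (text) columns. Pandas therefore cannot yet sort, group, or do arithmetic on them as dates.

One option is to combine the two into a single datetime column. This sketch assumes Date always uses month/day/year and Time uses 24-hour HH:MM, as in every row shown above. The column name `DateTime` is hypothetical, and the sample rows are copied from the output above.

```python
import pandas as pd

# Three rows copied from the output above; Date and Time are plain strings, as in df.info()
sample = pd.DataFrame({
    "Date": ["1/5/2019", "3/8/2019", "1/27/2019"],
    "Time": ["13:08", "10:29", "20:33"],
})

# Hypothetical new column: join the two strings and parse them with an explicit format
sample["DateTime"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"], format="%m/%d/%Y %H:%M"
)

print(sample.dtypes)
print(sample["DateTime"].dt.hour.tolist())  # [13, 10, 20]
```

With a combined column, hour-of-day questions (for example, the busiest time of day) become a one-line `.dt.hour` grouping.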
Exploratory Data Analysis
Which is the Busiest Branch?
df['Branch'].value_counts()
        count
Branch
A         340
B         332
C         328
The busiest branch is Branch A with 340 transactions, followed by Branch B (332) and then Branch C (328).
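The busiest branch can also be picked programmatically with idxmax() instead of being read off the table. value_counts(normalize=True) gives each branch's share of all rows.

The sketch below rebuilds a stand-in Branch column with the same counts shown above (340, 332, 328), so it runs without the CSV. In the notebook, `df['Branch']` would be used directly.

```python
import pandas as pd

# Stand-in for df['Branch'] with the same counts as the output above
branch = pd.Series(["A"] * 340 + ["B"] * 332 + ["C"] * 328, name="Branch")

counts = branch.value_counts()
busiest = counts.idxmax()                      # label with the highest count
share = branch.value_counts(normalize=True)    # fraction of all rows per branch

print(busiest, counts[busiest])   # A 340
print(share.round(3))
```

The three branches are close: Branch A holds 34.0% of transactions against 32.8% for Branch C.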
df['Gender'].value_counts()
        count
Gender
Female    501
Male      499
df['Product line'].unique()
array(['Health and beauty', 'Electronic accessories',
'Home and lifestyle', 'Sports and travel', 'Food and beverages',
'Fashion accessories'], dtype=object)
import seaborn as sns
sns.countplot(data=df, x='Product line')
<Axes: xlabel='Product line', ylabel='count'>
print("Number of product lines:", df['Product line'].nunique())
Number of product lines: 6
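Two relationships underlie these cells:

- The bar heights in the count plot are the number of rows per product line.
- nunique() is the length of the unique() array.

The small sketch below shows both on a hypothetical five-row sample, using product-line labels from the unique() output above. With the real data, `df['Product line']` would replace the sample.

```python
import pandas as pd

# Hypothetical sample using product-line labels from the unique() output above
product_line = pd.Series([
    "Health and beauty", "Electronic accessories", "Home and lifestyle",
    "Health and beauty", "Sports and travel",
], name="Product line")

# The counts a countplot draws as bar heights, largest first
bar_heights = product_line.value_counts()
print(bar_heights)

# nunique() counts distinct labels, i.e. the length of unique()
assert product_line.nunique() == len(product_line.unique()) == 4
```

To draw the bars in this descending order, seaborn's countplot accepts an `order` argument, for example `order=df['Product line'].value_counts().index`.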
Which month had the highest gross income?
import datetime
df['Date'][0]
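# Convert Date from text to datetime64
df['Date'] = pd.to_datetime(df['Date'])
df['Date'][0]
Timestamp('2019-01-05 00:00:00')
# Extract the month number (1 = January) into a new column
df['Month'] = df['Date'].dt.month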
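pd.to_datetime turns the month/day/year strings into timestamps. After that, the .dt accessor exposes components such as the month.

Passing `format='%m/%d/%Y'` explicitly makes the month/day order unambiguous, so pandas does not have to guess whether "3/8/2019" means 8 March or 3 August. A minimal sketch using the dates of the first five rows shown above:

```python
import pandas as pd

# Dates of rows 0-4 as they appear in the raw CSV (month/day/year strings)
dates = pd.Series(["1/5/2019", "3/8/2019", "3/3/2019", "1/27/2019", "2/8/2019"])

parsed = pd.to_datetime(dates, format="%m/%d/%Y")
months = parsed.dt.month    # 1 = January, 2 = February, ...

print(parsed.iloc[0])       # 2019-01-05 00:00:00
print(months.tolist())      # [1, 3, 3, 1, 2]
```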
df.head()
     Invoice ID   Branch  City       Customer type  Gender  Product line            Unit price  Quantity  Tax 5%   Total      Date        Time   Payment      cogs    gross margin percentage
0    750-67-8428  A       Yangon     Member         Female  Health and beauty       74.69       7         26.1415  548.9715   2019-01-05  13:08  Ewallet      522.83  4.761905
1    226-31-3081  C       Naypyitaw  Normal         Female  Electronic accessories  15.28       5         3.8200   80.2200    2019-03-08  10:29  Cash         76.40   4.761905
2    631-41-3108  A       Yangon     Normal         Male    Home and lifestyle      46.33       7         16.2155  340.5255   2019-03-03  13:23  Credit card  324.31  4.761905
3    123-19-1176  A       Yangon     Member         Male    Health and beauty       58.22       8         23.2880  489.0480   2019-01-27  20:33  Ewallet      465.76  4.761905
4    373-73-7910  A       Yangon     Normal         Male    Sports and travel       86.31       7         30.2085  634.3785   2019-02-08  10:37  Ewallet      604.17  4.761905
df.groupby('Month')['gross income'].sum()
       gross income
Month
1          5537.708
2          4629.494
3          5212.167
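From the totals above, January (month 1) had the highest gross income at 5537.708. March (5212.167) and February (4629.494) follow.

The sketch below picks the winner with idxmax() and labels it with the standard-library calendar module. The Series is rebuilt from the printed totals so it runs without the CSV. In the notebook, the equivalent would be `df.groupby('Month')['gross income'].sum().idxmax()`.

```python
import calendar
import pandas as pd

# Monthly gross income totals, copied from the groupby output above
monthly = pd.Series({1: 5537.708, 2: 4629.494, 3: 5212.167}, name="gross income")
monthly.index.name = "Month"

best_month = monthly.idxmax()
print(calendar.month_name[best_month], monthly[best_month])   # January 5537.708

# Month names ranked from highest to lowest gross income
ranking = [calendar.month_name[m] for m in monthly.sort_values(ascending=False).index]
print(ranking)   # ['January', 'March', 'February']
```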