
Ex1_Plotting and Visualization using Numpy and Pandas

Uploaded by prathi1443


PART I – NumPy and Pandas Data Structure

Introduction

Data visualization is one of the most important steps in the life cycle of data science, data analytics and data engineering. Presenting a study or analysis with colors and graphics makes it more engaging and easier to understand. Visual elements such as graphs, charts and maps make it easier for clients to see the underlying structure, trends, patterns and relationships among the variables in a dataset. Explaining a data summary and analysis with plain numbers alone is difficult for audiences from both technical and non-technical backgrounds. Data visualization gives a clear picture of what the data conveys and makes its insights easier to understand.
Data visualization involves processing large amounts of data and converting them into meaningful, informative visuals using various tools. For this we need software tools that can handle structured and unstructured data from different sources such as files, web APIs and databases, so we must choose a visualization tool that meets all our requirements. The tool should support interactive plot generation, connecting to and combining data sources, automatic data refresh, secure access to data sources, and exporting widgets. These features help us build the best visuals of our data while saving time.
Data Visualization with Pandas:
The Pandas library in Python is mainly used for data analysis. It is not a data visualization library, but it can create basic plots, which makes it highly practical for exploratory data analysis. For such tasks we do not need to import another visualization library in addition to Pandas (Matplotlib must still be installed, because Pandas uses it behind the scenes).
As Python's popular data analysis library, Pandas provides several ways to visualize data through the .plot() method. Another advantage of using Pandas for visualization is that data analysis functions and plotting functions can be chained into a single pipeline, which simplifies the task.
Pandas is an essential data analysis toolkit for Python. It is a Python package providing fast, flexible, and expressive data structures designed to make working with relational or labeled data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
The Pandas plot() Method
Pandas provides plotting functionality for DataFrame and Series objects that uses the Matplotlib library under the hood, which means every plot created by Pandas is a Matplotlib plot: the plotting call returns a Matplotlib Axes object (or an array of them when subplots=True) that can be customized further.
Technically, the Pandas plot() method provides a set of plot styles through the kind keyword argument. The default value of kind is 'line'. In total, eleven string values can be assigned to kind ('line', 'bar', 'barh', 'hist', 'box', 'kde', 'density', 'area', 'pie', 'scatter' and 'hexbin'), and the value determines what kind of plot is created.
.plot is also an attribute of Pandas DataFrame and Series objects that offers a subset of the plots available in Matplotlib, either as df.plot(kind='bar') or as df.plot.bar(). Pandas reduces plotting to a single line of code by automating much of the visualization procedure for us.
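To make the kind keyword and the returned Axes concrete, here is a minimal sketch on a small hypothetical Series (not the exercise dataset). It draws the default line plot and a bar plot side by side and checks that each call returns a Matplotlib Axes; the KINDS tuple simply lists the eleven accepted strings named above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical toy data, not the exercise dataset
s = pd.Series(np.arange(1, 6) ** 2, index=list("abcde"), name="squares")

# The eleven string values accepted by the kind keyword
KINDS = ("line", "bar", "barh", "hist", "box", "kde",
         "density", "area", "pie", "scatter", "hexbin")

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))
ax1 = s.plot(ax=ax_left, title="kind='line' (default)")
ax2 = s.plot(kind="bar", ax=ax_right, title="kind='bar'")  # same chart as s.plot.bar(ax=ax_right)

# Each call draws on, and returns, a Matplotlib Axes object
print(isinstance(ax1, Axes), ax1 is ax_left)  # True True
```

Because the result is an ordinary Axes, any Matplotlib method (set_xlabel, grid, annotate, and so on) can be applied after the Pandas call.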

Data Visualization with NumPy:

NumPy is a library for scientific computing in Python and the foundation on which Pandas is built. It provides a high-performance multidimensional array object and tools for working with these arrays. A NumPy array is similar to a Python list, but its size is fixed when it is created and all of its elements have the same type. After importing NumPy, a list can be cast to an array with np.array(), and the array's dtype attribute gives the data type of its elements.
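A minimal sketch of the points above, using hypothetical values: a list is cast to an array, dtype reports the shared element type, and mixed integer and float input is upcast to a single type.

```python
import numpy as np

values = [1, 2, 3, 4]          # a plain Python list
arr = np.array(values)         # cast the list to a NumPy array
print(arr.dtype)               # an integer dtype, e.g. int64 on most platforms

mixed = np.array([1, 2.5, 3])  # mixed int/float input is upcast to one common type
print(mixed.dtype)             # float64

# Arrays support vectorized arithmetic, unlike lists
print(arr * 2)                 # [2 4 6 8]
print(arr.shape, arr.size)     # (4,) 4
```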
Data Visualization with Matplotlib:
Matplotlib is one of the most widely used, if not the most popular, data visualization libraries in Python. It produces publication-quality figures in a variety of hard-copy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the IPython shell, Jupyter notebooks, web application servers, and GUI toolkits.
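As a sketch of how NumPy and Matplotlib fit together, the example below plots a sine curve computed from a NumPy array and exports it as a PNG. The in-memory buffer stands in for a file path; that choice, and the sine data, are illustrative assumptions.

```python
import io

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)  # 100 evenly spaced NumPy values
y = np.sin(x)                       # vectorized: no Python loop needed

fig, ax = plt.subplots(figsize=(6.4, 4.8))
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x (radians)")
ax.set_ylabel("sin(x)")
ax.legend()

# Export a hard copy; format="pdf" or format="svg" works the same way
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)
```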
EX. NO.1 PLOTTING AND VISUALIZATION USING NUMPY AND PANDAS DATA STRUCTURE

AIM:

To analyze, plot, and visualize the given dataset using NumPy and Pandas data structures.

DESCRIPTION:

Data visualization is a powerful way to capture trends and share the insights gained
from data, and it is one of the important steps of data analysis. Many data
visualization tools with outstanding features are available. In this exercise, we
learn plotting and visualization with the Pandas, NumPy and Matplotlib packages.
NumPy and Pandas are two of Python's most important libraries for data
preprocessing and data cleaning. Pandas also provides plotting methods built on
Matplotlib, and NumPy arrays can be passed directly to Matplotlib, which makes
arrays, Series and DataFrames easy to visualize.

SOFTWARE TOOLS REQUIRED:

Software Required : Google Colaboratory

Operating System : WINDOWS XP / 10 / 11

Minimum Computer Requirements : Intel i3 or Intel i5 with 4 GB RAM and 40 GB hard disk

ALGORITHM:

1. Open a new notebook in Google Colaboratory.

2. Import the necessary packages.

3. Load the weekly closing prices of the Facebook, Microsoft, and Apple stocks over the
previous months from a CSV file using the read_csv function.

4. Plot and visualize the loaded dataset with the following plots: scatter, bar, line,
histogram, area, box, hexagonal bin, pie, density and scatter matrix.

5. Use Pandas plot() method to visualize Series and DataFrames.
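Step 4 lists a scatter matrix plot, which the procedure does not demonstrate. A minimal sketch, assuming a small synthetic DataFrame with hypothetical columns A, B and C: pandas.plotting.scatter_matrix draws a scatter plot for every pair of columns and a histogram of each column on the diagonal.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
a = rng.normal(size=200)
# Hypothetical columns: B is strongly related to A, C is unrelated noise
df_demo = pd.DataFrame({
    "A": a,
    "B": 2 * a + rng.normal(scale=0.5, size=200),
    "C": rng.normal(size=200),
})

# One panel per pair of columns; histograms on the diagonal
axes = scatter_matrix(df_demo, alpha=0.5, figsize=(8, 8), diagonal="hist")
print(axes.shape)  # (3, 3)
```

With the stock data, the same call would be scatter_matrix(df, ...) on the MSFT, FB and AAPL columns.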

PROCEDURE:

1. The plot() method makes drawing plots much easier. Start by importing the necessary
libraries.
2. Load the dataset required for visualization and display the first rows of the
DataFrame. The %matplotlib inline magic command is also added to the code so that
the plotted figures appear correctly in the notebook cells:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Weekly closing prices of MSFT, FB and AAPL
dataset_url = ('https://raw.githubusercontent.com/m-mehdi/pandas_tutorials/main/weekly_stocks.csv')
df = pd.read_csv(dataset_url, parse_dates=['Date'], index_col='Date')
pd.set_option('display.max_columns', None)  # show every column when printing
print(df.head())
Output:

                  MSFT          FB        AAPL
Date
2021-05-24  249.679993  328.730011  124.610001
2021-05-31  250.789993  330.350006  125.889999
2021-06-07  257.890015  331.260010  127.349998
2021-06-14  259.429993  329.660004  130.460007
2021-06-21  265.019989  341.369995  133.110001
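If the dataset URL is unreachable, a small offline stand-in can be built from the five rows printed above. This sketch assumes the CSV header is Date,MSFT,FB,AAPL, matching the printed output; it contains only these five rows, not the full dataset.

```python
import io

import pandas as pd

# The five rows printed above, as an offline stand-in for weekly_stocks.csv
csv_text = """Date,MSFT,FB,AAPL
2021-05-24,249.679993,328.730011,124.610001
2021-05-31,250.789993,330.350006,125.889999
2021-06-07,257.890015,331.260010,127.349998
2021-06-14,259.429993,329.660004,130.460007
2021-06-21,265.019989,341.369995,133.110001
"""

df_sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
print(df_sample.dtypes)        # three float64 columns
print(type(df_sample.index))   # a DatetimeIndex, so resampling and date axes work
```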

Next, explore and visualize the data with Pandas.

Line Plot
3. The default plot is the line plot, which plots the index on the x-axis and the other numeric
columns of the DataFrame on the y-axis. Draw a line plot to see how Microsoft performed over
the previous 12 months:
df.plot(y='MSFT', figsize=(9,6))
NOTE
The figsize argument takes a tuple of two values, the width and height in inches,
and changes the size of the output figure. The default width and height are 6.4
and 4.8 inches, respectively.

4. We can plot multiple lines by passing a list of column names to the y argument. For
example, let's see how the three companies performed over the previous year:
df.plot.line(y=['FB', 'AAPL', 'MSFT'], figsize=(10,6))

5. Use the other parameters provided by the plot() method to add more details to a plot,
like this:

df.plot(y='FB', figsize=(10,6), title='Facebook Stock', ylabel='USD')

Here, the title argument adds a title to the plot, and the ylabel argument sets a
label for the y-axis. The legend is displayed by default; set the legend argument
to False to hide it.
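The line-plot options above can be checked on the returned Axes. This sketch uses only the five sample rows printed in step 2 as a hypothetical offline stand-in for the full dataset; it confirms the title, y-label and hidden legend, and that a list passed to y yields one line per column.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Offline stand-in: the five sample rows printed in step 2 (not the full dataset)
df = pd.DataFrame(
    {
        "MSFT": [249.679993, 250.789993, 257.890015, 259.429993, 265.019989],
        "FB": [328.730011, 330.350006, 331.260010, 329.660004, 341.369995],
        "AAPL": [124.610001, 125.889999, 127.349998, 130.460007, 133.110001],
    },
    index=pd.DatetimeIndex(
        ["2021-05-24", "2021-05-31", "2021-06-07", "2021-06-14", "2021-06-21"], name="Date"
    ),
)

# One column, with a title, a y-axis label and no legend
ax_fb = df.plot(y="FB", figsize=(10, 6), title="Facebook Stock", ylabel="USD", legend=False)

# Several columns: one line per column name in the list
ax_all = df.plot.line(y=["FB", "AAPL", "MSFT"], figsize=(10, 6))

print(ax_fb.get_title(), ax_fb.get_ylabel(), ax_fb.get_legend())  # Facebook Stock USD None
print(len(ax_all.get_lines()))                                     # 3
```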
Bar Plot
6. A bar chart is a basic visualization for comparing values between data groups and
representing categorical data with rectangular bars. This plot may include the count of a
specific category or any defined value, and the lengths of the bars correspond to the values
they represent.
In the following example, create a bar chart based on the average monthly stock
price to compare the average stock price of each company to others in a
particular month. To do so, first, we need to resample data by month-end and
then use the mean() method to calculate the average stock price in each month.
We also select the last three months of data, like this:

df_3Months = df.resample(rule='M').mean().iloc[-3:]  # in pandas 2.2 and later, use rule='ME' ('M' is deprecated)

print(df_3Months)

                  MSFT          FB        AAPL
Date
2022-03-31  298.400002  212.692505  166.934998
2022-04-30  282.087494  204.272499  163.704994
2022-05-31  262.803335  198.643331  147.326665

7. Create a bar chart based on the aggregated data by assigning the bar string value to the
kind argument:
df_3Months.plot(kind='bar', figsize=(10,6), ylabel='Price')

8. Create horizontal bar charts by assigning the barh string value to the kind argument.

df_3Months.plot(kind='barh', figsize=(9,6))

9. One can also plot the data on the stacked vertical or horizontal bar charts, which
represent different groups on top of each other. The height of the resulting bar shows the
combined result of the groups. To create a stacked bar chart we need to assign True to the
stacked argument, like this:
df_3Months.plot(kind='bar', stacked=True, figsize=(9,6))
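The bar-plot steps combine two ideas: month-end resampling and stacking. This sketch shows both on hypothetical weekly data for two made-up tickers X and Y. It uses pd.offsets.MonthEnd() as the resampling rule, which is an assumption chosen because it behaves like rule='M' without depending on the 'M' versus 'ME' alias in different pandas versions. It then confirms that each stacked bar's height equals the row total.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical weekly prices for two tickers, eight weeks across March and April 2022
weeks = pd.date_range("2022-03-06", periods=8, freq="7D", name="Date")
weekly = pd.DataFrame(
    {"X": np.arange(1.0, 9.0), "Y": np.arange(10.0, 90.0, 10.0)}, index=weeks
)

# Month-end means; MonthEnd() behaves like rule='M' (or 'ME' in pandas >= 2.2)
monthly = weekly.resample(pd.offsets.MonthEnd()).mean()
# March: X = 2.5, Y = 25.0    April: X = 6.5, Y = 65.0

ax = monthly.plot(kind="bar", stacked=True, figsize=(9, 6))

# Bars are drawn column by column, so the last len(monthly) patches form the top (Y) layer
top_layer = ax.patches[-len(monthly):]
tops = [float(p.get_y() + p.get_height()) for p in top_layer]
print(tops)                          # [27.5, 71.5]
print(monthly.sum(axis=1).tolist())  # [27.5, 71.5]: stack height equals the row total
```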

Histogram
10. A histogram is a type of bar chart that represents the distribution of numerical data
where the x-axis represents the bin ranges while the y-axis represents the data frequency
within a certain interval. The bins argument specifies the number of bin intervals, and the alpha
argument specifies the degree of transparency.

df[['MSFT', 'FB']].plot(kind='hist', bins=25, alpha=0.6, figsize=(9,6))

11. A histogram can also be stacked.

df[['MSFT', 'FB']].plot(kind='hist', bins=25, alpha=0.6, stacked=True, figsize=(9,6))
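To see what bins=25 means numerically, this sketch bins hypothetical, normally distributed "prices" with np.histogram and compares the counts to the bar heights of the Pandas histogram. The synthetic data and seed are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
data = pd.Series(rng.normal(loc=300, scale=20, size=500), name="price")  # hypothetical prices

# 25 equal-width bins need 26 edges; every value falls into exactly one bin
counts, edges = np.histogram(data, bins=25)
print(len(counts), len(edges))  # 25 26
print(counts.sum())             # 500

fig, ax = plt.subplots(figsize=(9, 6))
data.plot(kind="hist", bins=25, alpha=0.6, ax=ax)
bar_heights = [p.get_height() for p in ax.patches]  # one rectangle per bin
```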

Box Plot

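12. A box plot summarizes a data group with five indicators: the minimum, first quartile,
median, third quartile, and maximum. The box spans the first to the third quartile with a line at
the median, and two whiskers extend from the box. A box plot conveys useful information, such as
the interquartile range (IQR), the median, and the outliers of each data group. By default,
Matplotlib extends each whisker to the most extreme data point within 1.5 × IQR of the box and
draws points beyond that range individually as outliers, so the whiskers reach the minimum and
maximum only when there are no outliers.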

df.plot(kind='box', figsize=(9,6))
13. Create horizontal box plots, like horizontal bar charts, by assigning False to the vert
argument as shown below:

df.plot(kind='box', vert=False, figsize=(9,6))
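The quantities a box plot draws can be computed directly. This sketch uses a hypothetical Series with one extreme value and applies the 1.5 × IQR rule by hand, then cross-checks the upper whisker with Matplotlib's own matplotlib.cbook.boxplot_stats helper.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import cbook

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], name="demo")  # hypothetical data

q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = s[(s < lower_fence) | (s > upper_fence)]

print(q1, median, q3, iqr)  # 3.25 5.5 7.75 4.5
print(outliers.tolist())    # [100]

# Matplotlib's statistics: the upper whisker stops at 9, the largest value inside the fence
stats = cbook.boxplot_stats(s.to_numpy())[0]
print(stats["whishi"], stats["fliers"])  # 9 [100]

fig, ax = plt.subplots(figsize=(9, 6))
s.plot(kind="box", ax=ax)  # 100 is drawn as a separate outlier point
```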

Area Plot

14. An area plot is an extension of a line chart that fills the region between the line chart
and the x-axis with a color. If more than one area chart displays in the same plot, different
colors distinguish different area charts.
df.plot(kind='area', figsize=(9,6))
15. The Pandas plot() method creates a stacked area plot by default. To unstack the area chart,
assign False to the stacked argument:

df.plot(kind='area', stacked=False, figsize=(9,6))
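In a stacked area plot, the upper edge of each layer is the running total across columns. This sketch, on a hypothetical two-column DataFrame, computes those boundaries with cumsum(axis=1) and draws the stacked and unstacked versions side by side. The data is non-negative on purpose, because Pandas rejects stacked plots whose columns mix positive and negative values.

```python
import matplotlib.pyplot as plt
import pandas as pd

demo = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})  # hypothetical, non-negative

# The upper edge of each stacked layer is the cumulative sum across columns
boundaries = demo.cumsum(axis=1)
print(boundaries["B"].tolist())  # [5, 7, 9]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
demo.plot(kind="area", ax=ax1, title="stacked=True (default)")
demo.plot(kind="area", stacked=False, ax=ax2, title="stacked=False")
```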

Pie Plot
16. A pie plot shows each value in a column as a proportion of the column total. The following
example shows how the average Apple stock price is distributed over the previous three
months:
df_3Months.index=['March', 'April', 'May']
df_3Months.plot(kind='pie', y='AAPL', legend=False, autopct='%.f')
17. A legend will display on pie plots by default, so assign False to the legend keyword to
hide the legend. The new keyword argument in the code above is autopct, which shows the
percent value on the pie chart slices.
If we want to represent the data of all the columns in multiple pie charts as
subplots, assign True to the subplots argument as given below:

df_3Months.plot(kind='pie', legend=False, autopct='%.f', subplots=True, figsize=(14,8))

The call returns a NumPy array of Matplotlib Axes objects, one per pie chart.
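To see what autopct='%.f' writes on each slice, this sketch takes the three average AAPL prices from the df_3Months output in step 6, computes each month's share of the total in percent, and formats it with zero decimal places, as the pie plot does.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Average AAPL prices for March to May 2022, from the df_3Months output in step 6
aapl = pd.Series([166.934998, 163.704994, 147.326665],
                 index=["March", "April", "May"], name="AAPL")

# autopct='%.f' formats each slice's share (in percent) with zero decimal places
shares = aapl / aapl.sum() * 100
labels = ["%.f" % pct for pct in shares]
print(labels)  # ['35', '34', '31']

fig, ax = plt.subplots(figsize=(6, 6))
aapl.plot(kind="pie", autopct="%.f", legend=False, ax=ax)
```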

Scatter Plot
18. Scatter plots plot data points on the x and y axes to show the relationship (correlation)
between two variables. The scatter plot below shows the relationship between Microsoft and
Apple stock prices.
df.plot(kind='scatter', x='MSFT', y='AAPL', figsize=(9,6), color='Green')
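A scatter plot shows correlation visually; Series.corr quantifies it as the Pearson coefficient between -1 and 1. This sketch uses only the MSFT and AAPL values from the five sample rows printed in step 2, so the coefficient describes those five weeks, not the full year.

```python
import matplotlib.pyplot as plt
import pandas as pd

# MSFT and AAPL closes from the five sample rows printed in step 2
sample = pd.DataFrame({
    "MSFT": [249.679993, 250.789993, 257.890015, 259.429993, 265.019989],
    "AAPL": [124.610001, 125.889999, 127.349998, 130.460007, 133.110001],
})

r = sample["MSFT"].corr(sample["AAPL"])  # Pearson correlation coefficient
print(round(r, 2))                       # close to 1: both prices rose together

ax = sample.plot(kind="scatter", x="MSFT", y="AAPL", figsize=(9, 6), color="green")
```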

Hexbin Plot
19. When the data is very dense, a hexagon bin plot, also known as a hexbin plot, can be an
alternative to a scatter plot. In other words, when the number of data points is enormous, and
each data point can't be plotted separately, it's better to use this kind of plot that represents
data in the form of a honeycomb. Also, the color of each hexbin defines the density of data
points in that range.
The gridsize argument specifies the number of hexagons in the x-direction. A
larger grid size means more and smaller bins. The default value of the gridsize
argument is 100.
df.plot(kind='hexbin', x='MSFT', y='AAPL', gridsize=10, figsize=(10,6))
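To illustrate gridsize, this sketch draws hexbin plots of hypothetical dense, correlated data at two grid sizes. Each hexagon's color value is the number of points inside it, so the counts add up to the number of points, and the finer grid uses more hexagons. Reading the counts from ax.collections[0] assumes the hexbin collection is the first collection on the Axes, which holds when nothing else has been drawn there.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=n)
dense = pd.DataFrame({"x": x, "y": x + rng.normal(scale=0.5, size=n)})  # hypothetical data

fig, (ax_coarse, ax_fine) = plt.subplots(1, 2, figsize=(12, 5))
dense.plot(kind="hexbin", x="x", y="y", gridsize=10, ax=ax_coarse, title="gridsize=10")
dense.plot(kind="hexbin", x="x", y="y", gridsize=30, ax=ax_fine, title="gridsize=30")

# Each hexagon's color value is the count of points that fall inside it
coarse_counts = ax_coarse.collections[0].get_array()
fine_counts = ax_fine.collections[0].get_array()
print(len(coarse_counts), len(fine_counts))    # the finer grid has more hexagons
print(coarse_counts.sum(), fine_counts.sum())  # both add up to n
```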

KDE Plot
20. A Kernel Density Estimate (KDE) plot visualizes the probability density of a continuous
variable using a non-parametric estimate. Internally, it uses Gaussian kernels to estimate the
probability density function (PDF); Pandas relies on SciPy for this, so SciPy must be installed.
df.plot(kind='kde')

21. You can also specify the bandwidth, which controls the smoothness of the KDE plot, like this:

df.plot(kind='kde', bw_method=0.1)
22. Selecting a small bandwidth leads to under-smoothing, which means the density plot appears
as a combination of individual peaks. On the contrary, a large bandwidth leads to over-smoothing,
which can blur distinct peaks together so that the density plot looks like a unimodal
distribution:

df.plot(kind='kde', bw_method=1)
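The bandwidth effect can be reproduced without plotting. Pandas hands the KDE to SciPy's gaussian_kde, where a scalar bw_method scales the kernel width relative to the sample standard deviation. The sketch below re-implements that convention in plain NumPy (a simplified stand-in, not SciPy's code) on a hypothetical bimodal sample and counts the peaks for bw_method=0.1 and bw_method=1.

```python
import numpy as np

def gaussian_kde_1d(data, grid, bw_factor):
    """Gaussian KDE whose kernel std is bw_factor * sample std (SciPy's scalar bw_method convention)."""
    h = bw_factor * np.std(data, ddof=1)
    z = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def count_peaks(density):
    """Count strict local maxima on the evaluation grid."""
    mid = density[1:-1]
    return int(np.sum((mid > density[:-2]) & (mid > density[2:])))

rng = np.random.default_rng(0)
# Hypothetical bimodal sample: two clusters centred at -2 and +2
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])
grid = np.linspace(-12, 12, 1201)
step = grid[1] - grid[0]

peaks, areas = {}, {}
for bw in (0.1, 1.0):
    density = gaussian_kde_1d(data, grid, bw)
    peaks[bw] = count_peaks(density)
    areas[bw] = density.sum() * step  # Riemann sum; should be close to 1
    print(f"bw_method={bw}: {peaks[bw]} peak(s), area ~ {areas[bw]:.3f}")
```

With bw_method=0.1 the estimate shows at least the two clusters (often with extra noise peaks), while bw_method=1 merges them into a single peak, matching the under- and over-smoothing described in step 22.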

Conclusion

Thus, plotting and visualization using NumPy and Pandas data structures was
executed and the output was verified.
