Unit - V Introduction To Pandas in Python


Atmiya University

Faculty of Science,
Department of Computer Science & I.T.
Subject Name: 21UFSDE309 Data Science Using Python

By: Dr. Hiren Kavathiya


Introduction to Pandas in Python
Pandas is an open-source library built mainly for working
with relational or labeled data easily and intuitively.
It provides data structures and operations for
manipulating numerical tables and time series.
The library is built on top of NumPy.
Pandas is fast and offers high performance and
productivity for its users.

History: Pandas was initially developed by Wes
McKinney in 2008 while he was working at AQR Capital
Management.
He convinced AQR to allow him to open-source the
library.
Another AQR employee, Chang She, joined as the
second major contributor to the library in 2012.
Many versions of pandas have been released since then.
At the time these slides were prepared, the latest
version of pandas was 1.4.1.
Advantages
Fast and efficient for manipulating and analyzing data.
Data can be loaded from many different file formats.
Easy handling of missing data (represented as NaN) in floating-point as well as non-floating-point data.
Size mutability: columns can be inserted into and deleted from a DataFrame and higher-dimensional objects.
Data set merging and joining.
Flexible reshaping and pivoting of data sets.
Time-series functionality.
Powerful group-by functionality for performing split-apply-combine operations on data sets.
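
Two of the advantages above, missing-data handling and split-apply-combine, can be sketched in a few lines. The sales figures and the column names region and amount are made up for illustration; this is a minimal sketch, not the only way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data; one amount is missing (NaN).
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "amount": [100.0, np.nan, 50.0, 30.0],
})

# Missing data: replace the NaN in "amount" with 0 before aggregating.
filled = sales.fillna({"amount": 0.0})

# Split-apply-combine: split by region, sum each group, combine the results.
totals = filled.groupby("region")["amount"].sum()
print(totals)
```

The result is a Series indexed by region, with one total per group.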
What is Matplotlib?
Matplotlib is a low-level graph plotting library in Python
that serves as a visualization utility.
Matplotlib was created by John D. Hunter.
Matplotlib is open source and free to use.
Matplotlib is mostly written in Python; a few segments
are written in C, Objective-C and JavaScript for platform
compatibility.

Installation of Matplotlib
If you have Python and PIP already installed on a system,
then installation of Matplotlib is very easy.
Install it using this command:
C:\Users\Your Name>pip install matplotlib

Import Matplotlib
Once Matplotlib is installed, import it in your applications by
adding the import module statement:
import matplotlib

Checking Matplotlib Version


The version string is stored in the __version__ attribute.
Example
import matplotlib
print(matplotlib.__version__)
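
Once the import works, a first plot might look like the sketch below. The data points are invented, and the Agg backend plus the in-memory buffer are assumptions chosen so the script runs without opening a window or writing a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical sample data.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")

# Render the figure to an in-memory PNG instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
print(len(png_bytes) > 0)
```

In an interactive session you would normally skip the backend line and call plt.show() instead of saving to a buffer.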
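What are Data Structures in Pandas?
Pandas data structures are classified by the number of
dimensions they hold. Historically there were three:
Series
DataFrame
Panel
Panel was deprecated in pandas 0.20 and removed in pandas
0.25, so current versions provide only Series and DataFrame.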

Data Structure   Dimensions
Series           1D
DataFrame        2D
Panel            3D
Series and DataFrame are the most widely used data
structures in data science. In spreadsheet terms, a Series
is a single column of an Excel sheet, whereas a DataFrame
has rows and columns and is the sheet itself.
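
The dimensionality of each structure can be checked directly with the ndim and shape attributes. The values and the column names a and b below are arbitrary placeholders.

```python
import pandas as pd

# A Series is one-dimensional, like a single spreadsheet column.
column = pd.Series([10, 20, 30])

# A DataFrame is two-dimensional, like a whole sheet.
sheet = pd.DataFrame({"a": [10, 20, 30], "b": [1.5, 2.5, 3.5]})

print(column.ndim, column.shape)  # 1 (3,)
print(sheet.ndim, sheet.shape)    # 2 (3, 2)
```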
What is a Series in Pandas?
A pandas Series is a one-dimensional labeled data structure
that can hold values of any type, such as integers, floats
and strings. We use a Series when we want to work with a
one-dimensional array. Note that a Series cannot have
multiple columns; it holds only one column, like a single
column in an Excel sheet. A Series has an index that serves
as its axis labels, and you can supply your own index
labels instead of the defaults.

This is a Series:

Name
Dhyey
Krishna
Kishan
Radha
Shyam
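
The column of names above could be built as a Series roughly like this. Passing name="Name" is an assumption about how the column heading maps onto the Series.

```python
import pandas as pd

# The five names from the slide, stored as a single labeled column.
names = pd.Series(["Dhyey", "Krishna", "Kishan", "Radha", "Shyam"], name="Name")
print(names)
```

Printing it shows the default integer index (0 to 4) on the left and the names on the right.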



Installing Pandas on Windows
You can install pandas on Windows by opening the
Command Prompt and typing:
pip install pandas

Creating a Series in Pandas
A pandas Series can be created in different ways: from a
MySQL table, from an Excel worksheet or CSV file, or from
an array, dictionary, list, etc. First import pandas into
the Python file or notebook you are working in, then
create a Series from a list:
import pandas as pd
ps = pd.Series([1,2,3,4,5])
print(ps)
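
Arrays are mentioned above as another source. A minimal sketch with a NumPy array follows; the values are arbitrary.

```python
import numpy as np
import pandas as pd

# Build a Series from a NumPy array; the dtype is carried over.
values = np.array([1.5, 2.5, 3.5])
from_array = pd.Series(values)

print(from_array)
print(from_array.dtype)  # float64
```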
Changing the index of Series in Pandas
By default, the index values of a Series are integers
starting from 0. To use your own labels, pass a list of
index values through the index argument when creating
the Series.
ps = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(ps)
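
With custom labels in place, values can be read by label as well as by position. This sketch reuses the same Series.

```python
import pandas as pd

ps = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

print(ps["c"])          # single value by label
print(ps.loc["b":"d"])  # label slices include both endpoints
print(ps.iloc[0])       # position-based access still works
```

Note that a label slice such as "b":"d" includes the end label, unlike ordinary Python slicing.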

Creating a Series from a Dictionary
A Series can also be created from a dictionary, whose key-value pairs
map naturally onto index labels and values. If no index is specified
when creating the Series, the dictionary keys become the index. If an
index is passed, the value for each index label is looked up by key;
labels with no matching key get NaN.
import pandas as pd

dict_pd = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
series_dict = pd.Series(dict_pd)  # keys become the index
print(series_dict)
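
What happens when an index is passed together with a dictionary can be seen in a small sketch. The label "z" is a made-up label that is deliberately absent from the dictionary.

```python
import pandas as pd

dict_pd = {"a": 1, "b": 2, "c": 3}

# "z" is not a key in the dictionary, so its value is NaN;
# "a" is a key but is not requested, so it is left out.
selected = pd.Series(dict_pd, index=["c", "b", "z"])
print(selected)
```

Because NaN is a floating-point value, the resulting Series has a float dtype even though the dictionary held integers.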
Data Frames
A pandas DataFrame is a two-dimensional data structure, like
a two-dimensional array or a table with rows and columns.
import pandas as pd

data = {"calories": [420, 380, 390],"duration": [50, 40, 45]}

# Load data into a DataFrame object:
df = pd.DataFrame(data)

print(df)
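
After creating a DataFrame, it often helps to inspect its shape and columns. This sketch uses the same data dictionary as above.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data)

print(df.shape)                 # (rows, columns)
print(list(df.columns))         # column labels
calories = df["calories"]       # a single column is a Series
print(type(calories).__name__)
```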
Locate Row
As the output of the previous example shows, a DataFrame
is like a table with rows and columns.
Pandas uses the loc attribute to return one or more
specified rows.
Example
Return row 0:
#refer to the row index:
print(df.loc[0])
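
The "one or more rows" part can be sketched as follows: a single label gives back a Series, while a list of labels gives back a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

row = df.loc[0]        # one label -> a Series
rows = df.loc[[0, 1]]  # a list of labels -> a DataFrame
print(row)
print(rows)
```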
Named Indexes
With the index argument, you can name your own indexes.
Example
Add a list of names to give each row a name:
import pandas as pd

data = {"calories": [420, 380, 390],"duration": [50, 40, 45]}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)
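
Named indexes can then be used with loc in place of row numbers. A short sketch with the same data:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

day2 = df.loc["day2"]                   # the whole row, as a Series
duration = df.loc["day2", "duration"]   # one cell: row label, column label
print(day2)
print(duration)
```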
Load Files Into a DataFrame
If your data sets are stored in a file, Pandas can load them
into a DataFrame.
Example
Load a comma separated file (CSV file) into a DataFrame:
import pandas as pd

df = pd.read_csv('data.csv')

print(df)
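
To try read_csv without a file on disk, the CSV text can be supplied through an in-memory buffer. The CSV contents below are an assumption standing in for data.csv, whose real contents are not shown in the slides.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the contents of data.csv.
csv_text = "calories,duration\n420,50\n380,40\n390,45\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())   # first rows of the table
print(df.dtypes)   # column types inferred from the text
```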