Pandas python


Pandas is a Python library.

Pandas is used to analyze data.

What is Pandas?
Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis".
The library was created by Wes McKinney in 2008.

Why Use Pandas?


Pandas lets us analyze large data sets and draw conclusions based on
statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

What Can Pandas Do?


Pandas gives you answers about the data, such as:
 Is there a correlation between two or more columns?
 What is the average value?
 What is the maximum value?
 What is the minimum value?
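These questions can be sketched in a few lines. The column names and numbers below are invented for this illustration and are not part of the original tutorial.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0],
    "score": [52.0, 61.0, 70.0, 79.0],
})

# Correlation between every pair of numeric columns
correlations = df.corr()

# Average, maximum and minimum of one column
average_score = df["score"].mean()
highest_score = df["score"].max()
lowest_score = df["score"].min()

print(correlations)
print(average_score, highest_score, lowest_score)
```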

Pandas can also delete rows that are not relevant or that contain wrong
values, such as empty or NULL values. This is called cleaning the data.
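As a minimal sketch of removing rows with empty values, assuming a small made-up table in which one age is missing, dropna() can be used:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value (NaN) in the "age" column
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cara"],
    "age": [31.0, np.nan, 45.0],
})

# dropna() returns a new DataFrame without the rows that contain missing values
cleaned = df.dropna()

print(cleaned)
```

The original DataFrame is left unchanged; only the returned copy has the row removed.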

Key Features of Pandas


 Fast and efficient DataFrame object with default and
customized indexing.
 Tools for loading data into in-memory data objects from
different file formats.
 Data alignment and integrated handling of missing data.
 Reshaping and pivoting of data sets.
 Label-based slicing, indexing and subsetting of large data sets.
 Columns from a data structure can be deleted or inserted.
 Group by data for aggregation and transformations.
 High performance merging and joining of data.
 Time Series functionality.

Installation of Pandas
If you have Python and PIP already installed on a system, then installation of
Pandas is very easy.

Install it using this command:

C:\Users\Your Name>pip install pandas

If this command fails, use a Python distribution that already has Pandas
installed, such as Anaconda or Spyder.

Import Pandas
Once Pandas is installed, import it in your applications by adding the import keyword:

import pandas

Pandas deals with the following three data structures −

 Series
 DataFrame
 Panel

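These data structures are built on top of NumPy arrays, which means
they are fast. Note that Panel was deprecated in pandas 0.20 and removed
in pandas 0.25, so current releases provide only Series and DataFrame.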

Dimension & Description


The best way to think of these data structures is that the higher
dimensional data structure is a container of its lower dimensional
data structure. For example, DataFrame is a container of Series,
Panel is a container of DataFrame.

Data Structure   Dimensions   Description
Series           1            1D labeled homogeneous array, size-immutable.
DataFrame        2            General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Panel            3            General 3D labeled, size-mutable array.
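The container relationship between DataFrame and Series can be seen by selecting a single column. This is a small sketch that reuses two names and ages from the sales-team table later in this document.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Steve", "Lia"], "Age": [32, 28]})

# Selecting one column of the 2D DataFrame yields a 1D Series
ages = df["Age"]

print(type(ages).__name__)
print(df.ndim, ages.ndim)
```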


Building and handling arrays of two or more dimensions is tedious,
because the user must consider the orientation of the data set when
writing functions. Pandas data structures reduce this mental effort.

For example, with tabular data (DataFrame) it is more semantically
helpful to think of the index (the rows) and the columns rather
than axis 0 and axis 1.
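A short sketch of the same idea, using made-up quarterly figures: the axis argument accepts the names "index" and "columns" as well as the numbers 0 and 1.

```python
import pandas as pd

# Hypothetical figures, invented for this illustration
df = pd.DataFrame(
    {"q1": [10, 20], "q2": [30, 40]},
    index=["north", "south"],
)

# axis="index" (axis 0) collapses the rows: one total per column
per_column = df.sum(axis="index")

# axis="columns" (axis 1) collapses the columns: one total per row
per_row = df.sum(axis="columns")

print(per_column)
print(per_row)
```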

Mutability

All Pandas data structures are value mutable (their contents can be
changed), and all except Series are size mutable. Series is size immutable.
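Both kinds of change can be sketched briefly: overwriting a value in a Series (value mutation) and inserting a column into a DataFrame (size mutation). The values are taken from examples elsewhere in this document.

```python
import pandas as pd

s = pd.Series([10, 23, 56])
# Value mutation: overwrite an existing element in place
s.iloc[0] = 99

df = pd.DataFrame({"Name": ["Steve", "Lia"]})
# Size mutation: a new column is inserted into the existing DataFrame
df["Age"] = [32, 28]

print(s)
print(df)
```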

Note − DataFrame is widely used and is one of the most important
data structures. Panel is used much less.

Series
Series is a one-dimensional array-like structure with homogeneous
data. For example, the following series is a collection of integers 10,
23, 56, …

10 23 56 17 52 61 73 90 26 72

Key Points

 Homogeneous data
 Size Immutable
 Values of Data Mutable
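The homogeneous-data point can be checked through the dtype attribute. The sketch below builds the integer series shown above and then a second, invented series that mixes integers with a float.

```python
import pandas as pd

s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])
print(s.dtype)  # one dtype for the whole Series

# Mixing integers with a float still yields a single dtype,
# because the values are upcast to a common type
mixed = pd.Series([10, 23, 56.5])
print(mixed.dtype)
```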

DataFrame
DataFrame is a two-dimensional array with heterogeneous data. For
example,

Name Age Gender Rating

Steve 32 Male 3.45

Lia 28 Female 4.6

Vin 45 Male 3.9

Katie 38 Female 2.78

The table represents the data of a sales team of an organization
with their overall performance rating. The data is represented in
rows and columns. Each column represents an attribute and each
row represents a person.

Data Type of Columns


The data types of the four columns are as follows −

Column Type

Name String

Age Integer

Gender String

Rating Float

Key Points

 Heterogeneous data
 Size Mutable
 Data Mutable
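The sales-team table above can be rebuilt as a DataFrame to inspect the type of each column. This is a sketch; the exact name pandas reports for the string columns depends on the installed version.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Steve", "Lia", "Vin", "Katie"],
    "Age": [32, 28, 45, 38],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Rating": [3.45, 4.6, 3.9, 2.78],
})

# Each column carries its own dtype; string columns are typically
# reported as "object" (or a dedicated string dtype in newer releases)
print(df.dtypes)
```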

Panel
Panel is a three-dimensional data structure with heterogeneous
data. It is hard to represent a panel graphically, but it can be
illustrated as a container of DataFrames.

Key Points

 Heterogeneous data
 Size Mutable
 Data Mutable
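Because Panel is not available in current pandas releases, the "container of DataFrames" idea is sketched here with a different technique: pd.concat applied to a dict of DataFrames, which adds an outer index level. The table contents and keys are invented for this illustration.

```python
import pandas as pd

# Two hypothetical 2D tables, one per year
y2020 = pd.DataFrame({"sales": [10, 20]}, index=["north", "south"])
y2021 = pd.DataFrame({"sales": [15, 25]}, index=["north", "south"])

# Stack the tables; the dict keys become an extra outer index level,
# which plays the role of the third dimension
stacked = pd.concat({"2020": y2020, "2021": y2021})

# Pull one "slice" back out by its key
print(stacked.loc["2021"])
```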

pandas.Series

A pandas Series can be created using the following constructor −

pandas.Series( data, index, dtype, copy)


The parameters of the constructor are as follows −

Sr.No   Parameter   Description
1       data        Takes various forms such as ndarray, list, or constants.
2       index       Index values must be hashable and the same length as data; duplicate values are allowed. Defaults to np.arange(n) if no index is passed.
3       dtype       Data type. If None, the data type is inferred.
4       copy        Copy data. Default False.
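A minimal sketch of a call that sets the data, index, and dtype parameters explicitly; the values and labels are made up for this illustration.

```python
import pandas as pd

s = pd.Series(
    data=[1, 2, 3],          # list input
    index=["a", "b", "c"],   # one label per value
    dtype="float64",         # force the data type instead of inferring it
)
print(s)
```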

A series can be created using various inputs like −

 Array
 Dict
 Scalar value or constant
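The scalar case can be sketched as follows. When the data is a single value, an index is supplied and the value is repeated to match its length.

```python
import pandas as pd

# A scalar is repeated once for every label in the index
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
```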

Create an Empty Series

The most basic series that can be created is an empty Series.

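Example
# import the pandas library, aliased as pd
import pandas as pd
# dtype is passed explicitly because the default dtype of an empty
# Series differs between pandas versions
s = pd.Series(dtype="float64")
print(s)
Its output is as follows −
Series([], dtype: float64)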

Create a Series from ndarray


If data is an ndarray, then any index passed must be of the same length.
If no index is passed, then by default the index will
be range(n), where n is the array length, i.e.,
[0, 1, 2, ..., len(array)-1].
Example 1
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print (s)
Its output is as follows −
0 a
1 b
2 c
3 d
dtype: object

Example 2
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print (s)
Its output is as follows −
100 a
101 b
102 c
103 d
dtype: object

We passed the index values here. Now we can see the customized
indexed values in the output.

Create a Series from dict


A dict can be passed as input. If no index is specified, the
dictionary keys are used to construct the index (in insertion order in
current pandas releases; older releases sorted the keys).
If an index is passed, the values in data corresponding to the labels in
the index are pulled out.
Example 1
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print (s)
Its output is as follows −
a 0.0
b 1.0
c 2.0
dtype: float64
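The case where an index is passed together with a dict can be sketched with the same data. The label 'd' is deliberately absent from the dict, so pandas fills its value with NaN (Not a Number).

```python
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}

# The index decides both the order and which labels appear
s = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(s)
```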
Python Pandas - DataFrame

A DataFrame is a two-dimensional data structure, i.e., data is
aligned in a tabular fashion in rows and columns.

Features of DataFrame
 Columns are potentially of different types
 Size – Mutable
 Labeled axes (rows and columns)
 Can perform arithmetic operations on rows and columns
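Arithmetic on columns and on rows can be sketched as below. The subject names and marks are invented for this illustration.

```python
import pandas as pd

# Hypothetical marks for two students
df = pd.DataFrame({"math": [70, 80], "physics": [60, 90]},
                  index=["Alex", "Bob"])

# Column-wise arithmetic: add two columns element by element
df["total"] = df["math"] + df["physics"]

# Row-wise arithmetic: difference between two rows, aligned on column labels
difference = df.loc["Bob"] - df.loc["Alex"]

print(df)
print(difference)
```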
Structure

Let us assume that we are creating a data frame with student’s
data.

You can think of it as an SQL table or a spreadsheet data
representation.

pandas.DataFrame
A pandas DataFrame can be created using the following constructor −

pandas.DataFrame( data, index, columns, dtype, copy)

The parameters of the constructor are as follows −

Sr.No   Parameter   Description
1       data        Takes various forms such as ndarray, Series, map, lists, dict, constants, and also another DataFrame.
2       index       Row labels to use for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
3       columns     Column labels to use for the resulting frame. Optional; defaults to np.arange(n) if no column labels are passed.
4       dtype       Data type to force for the columns. Only a single dtype is allowed; if None, it is inferred.
5       copy        Whether to copy the data from the input. The default is False in older pandas releases; newer releases use None, which picks the behavior based on the input type.
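A minimal sketch of a constructor call that sets data, index, columns, and dtype together; the labels and numbers are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2], [3, 4]],     # list of rows
    index=["row1", "row2"],    # row labels
    columns=["a", "b"],        # column labels
    dtype="float64",           # single dtype applied to all columns
)
print(df)
```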
Create DataFrame

A pandas DataFrame can be created using various inputs like −

 Lists
 dict
 Series
 Numpy ndarrays
 Another DataFrame

In the subsequent sections of this chapter, we will see how to create
a DataFrame using these inputs.
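Three of the inputs listed above (a NumPy ndarray, Series, and another DataFrame) can be sketched briefly. The variable names and values are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# From a 2D NumPy ndarray: each inner array becomes a row
from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

# From a dict of Series: rows are aligned on the Series index labels
from_series = pd.DataFrame({
    "one": pd.Series([1.0, 2.0], index=["x", "y"]),
    "two": pd.Series([3.0, 4.0], index=["x", "y"]),
})

# From another DataFrame
from_frame = pd.DataFrame(from_array)

print(from_array)
print(from_series)
```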

Create an Empty DataFrame

The most basic DataFrame that can be created is an empty DataFrame.

Example
#import the pandas library and aliasing as pd
import pandas as pd
df = pd.DataFrame()
print (df)
Its output is as follows −
Empty DataFrame
Columns: []
Index: []
Create a DataFrame from Lists
The DataFrame can be created using a single list or a list of lists.

Example 1
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
Its output is as follows −
0
0 1
1 2
2 3
3 4
4 5

Example 2
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
Its output is as follows −
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
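Example 3
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
# Passing dtype=float to the constructor would also try to convert the
# Name strings, which fails in recent pandas versions, so only the Age
# column is cast here.
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})
print(df)
Its output is as follows −
Name Age
0 Alex 10.0
1 Bob 12.0
2 Clarke 13.0
Note − Observe that casting the Age column to float changes its type
to floating point.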
Create a DataFrame from Dict of ndarrays / Lists
All the ndarrays must be of the same length. If an index is passed, then
its length should equal the length of the arrays.
If no index is passed, then by default the index will be range(n),
where n is the array length.
Example 1
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
Its output is as follows −
Name Age
0 Tom 28
1 Jack 34
2 Steve 29
3 Ricky 42
Note − Observe the values 0,1,2,3. They are the default index
assigned to each row using the function range(n).
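The case where an index is passed can be sketched with the same data. The labels rank1 to rank4 are invented for this illustration.

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}

# One label per row, so the index length matches the list length
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)
```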

Pandas DataFrame to Excel


You can save or write a DataFrame to an Excel file, or to a specific
sheet in an Excel file, using the pandas.DataFrame.to_excel() method
of the DataFrame class.
In this tutorial, we shall learn how to write a Pandas DataFrame to
an Excel file, with the help of well-detailed example Python
programs.
Prerequisite
The prerequisite for working with Excel file functions in pandas is
the openpyxl module. To install openpyxl using pip,
run the following command.
pip install openpyxl

Example

import pandas as pd
# read an existing Excel workbook into a DataFrame
df = pd.read_excel('D:\\rr.xlsx')
print(df)
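The example above reads a workbook. A minimal sketch of the writing direction with to_excel() follows. The file name people.xlsx, the sheet name People, and the temporary folder are assumptions made for this illustration, and the code skips the write if openpyxl is not installed.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Jack"], "Age": [28, 34]})

# Hypothetical output location inside a temporary folder
out_path = os.path.join(tempfile.mkdtemp(), "people.xlsx")

try:
    # index=False leaves the row labels out of the sheet
    df.to_excel(out_path, sheet_name="People", index=False)
    written = True
except ImportError:
    # Raised when no Excel engine such as openpyxl is installed
    written = False

if written:
    # Read the sheet back to confirm what was stored
    restored = pd.read_excel(out_path, sheet_name="People")
    print(restored)
```

Passing sheet_name writes to a named sheet; leaving it out uses the default sheet name.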
