Pandas python
What is Pandas?
Pandas is a Python library used for working with data sets.
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis".
The library was created by Wes McKinney in 2008.
Pandas can clean messy data sets and make them readable and relevant.
Pandas can also delete rows that are not relevant or that contain wrong
values, such as empty or NULL values. This is called cleaning the data.
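As a small illustration of removing rows with empty values, the sketch below uses dropna(). The table contents are made up for the example and are not part of the original text.

```python
import pandas as pd

# Hypothetical messy data: the second row has no Age value.
raw = pd.DataFrame({
    "Name": ["Alex", "Bob", "Clarke"],
    "Age": [10, None, 13],
})

# dropna() returns a new DataFrame without the rows that contain missing values.
cleaned = raw.dropna()
print(cleaned)
```

dropna() is only one of several cleaning tools; it is used here because it matches the "delete rows with empty values" case described above.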
Installation of Pandas
If you have Python and PIP already installed on a system, then installation of
Pandas is very easy. Install it with this command:
pip install pandas
If this command fails, then use a Python distribution that already has Pandas
installed, such as Anaconda or Spyder.
Import Pandas
Once Pandas is installed, import it in your applications with the import statement:
import pandas
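The examples later in this document import the library under the alias pd. A minimal sketch of that form, with a version print as a quick check that the installation worked:

```python
# "pd" is a widely used convention, not a requirement.
import pandas as pd

# Printing the version confirms that the import succeeded.
print(pd.__version__)
```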
Pandas deals with the following three data structures:
Series
DataFrame
Panel
These data structures are built on top of NumPy arrays, which means
they are fast.
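The link to NumPy can be seen directly. The sketch below, using made-up values, shows that the values of a Series can be exposed as a NumPy array and that arithmetic is applied to all elements at once.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 23, 56])

# to_numpy() exposes the values as a NumPy array.
values = s.to_numpy()
print(type(values))

# Arithmetic is applied to every element without writing a Python loop.
doubled = s * 2
print(doubled.tolist())
```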
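Data Structure   Dimensions   Description
Series           1            One-dimensional array-like structure with homogeneous data; size immutable.
DataFrame        2            Two-dimensional structure with heterogeneous data; size mutable.
Panel            3            Three-dimensional structure with heterogeneous data; size mutable.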
Mutability
All Pandas data structures are value mutable (their contents can be
changed). All except Series are also size mutable; Series is size immutable.
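A short sketch of what value mutability and size mutability look like in practice. The data is hypothetical, and only two cases are shown: changing a value in a Series and adding a column to a DataFrame.

```python
import pandas as pd

# Value mutability: an existing element is overwritten in place.
s = pd.Series([10, 23, 56])
s.iloc[0] = 99
print(s.tolist())

# Size mutability: a DataFrame gains a new column after creation.
df = pd.DataFrame({"Name": ["Alex", "Bob"], "Age": [10, 12]})
df["Rating"] = [4.2, 3.9]
print(df.shape)
```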
Series
Series is a one-dimensional, array-like structure with homogeneous
data. For example, the following series is a collection of integers 10,
23, 56, …
10 23 56 17 52 61 73 90 26 72
Key Points
Homogeneous data
Size Immutable
Values of Data Mutable
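The integer series shown above can be built as follows. This is a minimal sketch; the checks on ndim and dtype simply illustrate "one-dimensional" and "homogeneous".

```python
import pandas as pd

s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

# One dimension, and a single dtype shared by all values.
print(s.ndim)
print(s.dtype)
```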
DataFrame
DataFrame is a two-dimensional array with heterogeneous data. For
example, a table can have columns of the following types:
Column    Type
Name      String
Age       Integer
Gender    String
Rating    Float
Key Points
Heterogeneous data
Size Mutable
Data Mutable
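A DataFrame with the four columns listed above might be built like this. The row values are invented for the illustration; only the column names and types come from the table.

```python
import pandas as pd

# Hypothetical rows; one column per entry in the table above.
df = pd.DataFrame({
    "Name": ["Alex", "Bob"],
    "Age": [10, 12],
    "Gender": ["M", "M"],
    "Rating": [4.2, 3.9],
})

# Each column keeps its own data type.
print(df.dtypes)
```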
Panel
Panel is a three-dimensional data structure with heterogeneous
data. It is hard to represent a panel graphically, but it can be
illustrated as a container of DataFrames. Note that Panel was
deprecated in pandas 0.20 and removed in pandas 0.25, so it is not
available in current releases.
Key Points
Heterogeneous data
Size Mutable
Data Mutable
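The "container of DataFrames" idea can be sketched without the Panel class itself. The example below is an assumption about how one might model it, not the Panel API: a plain dict holds the DataFrames, and pd.concat turns the dict keys into an outer index level. The frame names and values are hypothetical.

```python
import pandas as pd

# Two hypothetical DataFrames with the same columns.
q1 = pd.DataFrame({"Name": ["Alex", "Bob"], "Rating": [4.2, 3.9]})
q2 = pd.DataFrame({"Name": ["Alex", "Bob"], "Rating": [4.5, 4.0]})

# A dict of DataFrames acts as a simple labeled container.
frames = {"Q1": q1, "Q2": q2}

# concat uses the dict keys as an outer index level, which adds a
# third "axis" on top of rows and columns.
stacked = pd.concat(frames)
print(stacked.loc["Q2"])
```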
pandas.Series
A Series can be created with the constructor pandas.Series(data, index, dtype, copy). Its parameters are:
Sr.No  Parameter & Description
1      data: takes various forms such as ndarray, list, or constants.
2      index: index values must be hashable and the same length as data. Defaults to np.arange(n) if no index is passed.
3      dtype: the data type. If None, the data type is inferred.
4      copy: whether to copy the data. Default False.
A Series can be created from inputs such as:
Array
Dict
Scalar value or constant
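A minimal sketch of the three kinds of input, using made-up values. With a dict, the keys become the index; with a scalar, an index is passed so that pandas knows how many times to repeat the value.

```python
import numpy as np
import pandas as pd

# From an array: the default index 0, 1, 2 is assigned.
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 0.0, "b": 1.0, "c": 2.0})

# From a scalar: the value is repeated for every index label.
from_scalar = pd.Series(5, index=[0, 1, 2, 3])

print(from_array)
print(from_dict)
print(from_scalar)
```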
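Example
# import the pandas library under the alias pd
import pandas as pd
# dtype is given explicitly because the default dtype of an empty
# Series differs between pandas versions
s = pd.Series(dtype='float64')
print(s)
Its output is as follows −
Series([], dtype: float64)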
Example 2
# import the pandas library under the alias pd
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data, index=[100, 101, 102, 103])
print(s)
Its output is as follows −
100 a
101 b
102 c
103 d
dtype: object
We passed explicit index values here, so the output shows the custom
index labels 100 to 103 instead of the default 0 to 3.
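With a custom index, values can be looked up either by label or by position. The sketch below reuses the series from Example 2; loc and iloc are the standard pandas accessors for the two kinds of lookup.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data, index=[100, 101, 102, 103])

# loc looks up by index label, iloc by integer position.
print(s.loc[101])
print(s.iloc[0])
```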
Features of DataFrame
Columns can be of different types
Size mutable
Labeled axes (rows and columns)
Arithmetic operations can be performed on rows and columns
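The labeled axes and the arithmetic on rows and columns can be sketched as follows. The names and scores are hypothetical.

```python
import pandas as pd

# Hypothetical scores with labeled rows (index) and columns.
df = pd.DataFrame(
    {"Math": [80, 90], "Physics": [70, 60]},
    index=["Alex", "Bob"],
)

# Column arithmetic: add two columns element by element.
df["Total"] = df["Math"] + df["Physics"]

# Row arithmetic: subtract one labeled row from another.
diff = df.loc["Alex"] - df.loc["Bob"]

print(df)
print(diff)
```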
pandas.DataFrame
A DataFrame can be created with the constructor pandas.DataFrame(data, index, columns, dtype, copy). Its parameters are:
Sr.No  Parameter & Description
1      data: takes various forms such as ndarray, Series, map, lists, dict, constants, and also another DataFrame.
2      index: the row labels to use for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
3      columns: the column labels to use for the resulting frame. Optional; defaults to np.arange(n) if no column labels are passed.
4      dtype: data type of each column.
5      copy: whether to copy the input data. Default False in older releases; recent releases default to None, which decides based on the input type.
Create DataFrame
A DataFrame can be created from various inputs, such as:
Lists
dict
Series
NumPy ndarrays
Another DataFrame
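Lists and dicts are covered in the sections that follow. The remaining inputs can be sketched as below, with made-up values: a dict of Series, a two-dimensional NumPy array, and an existing DataFrame.

```python
import numpy as np
import pandas as pd

# From a dict of Series: the row index is the union of the Series
# indexes, and missing entries become NaN.
d = {
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
}
from_series = pd.DataFrame(d)

# From a 2-D NumPy array, with column labels supplied.
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# From another DataFrame, keeping only the selected column.
from_frame = pd.DataFrame(from_array, columns=["x"])

print(from_series)
print(from_array)
print(from_frame)
```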
Example
#import the pandas library and aliasing as pd
import pandas as pd
df = pd.DataFrame()
print (df)
Its output is as follows −
Empty DataFrame
Columns: []
Index: []
Create a DataFrame from Lists
The DataFrame can be created using a single list or a list of lists.
Example 1
import pandas as pd
data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(df)
Its output is as follows −
   0
0  1
1  2
2  3
3  4
4  5
Example 2
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
Its output is as follows −
     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13
Example 3
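import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
# Convert only the Age column. Passing dtype=float to the constructor
# raises an error in recent pandas versions because Name holds strings.
df = df.astype({'Age': float})
print(df)
Its output is as follows −
     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0
Note − Observe that the conversion changes the type of the Age
column to floating point.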
Create a DataFrame from Dict of ndarrays / Lists
All the ndarrays must be of the same length. If an index is passed,
its length must equal the length of the arrays.
If no index is passed, the index defaults to range(n),
where n is the array length.
Example 1
import pandas as pd
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)
print(df)
Its output is as follows −
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
Note − Observe the values 0, 1, 2, 3. They are the default index
assigned to the rows, equivalent to range(n).
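The rule about index length can be sketched as follows. The labels rank1 to rank4 are hypothetical; the second call deliberately passes an index of the wrong length to show that pandas rejects it with a ValueError.

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}

# An index with one label per row replaces the default 0, 1, 2, 3.
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)

# An index whose length differs from the arrays is rejected.
mismatch_rejected = False
try:
    pd.DataFrame(data, index=['rank1', 'rank2'])
except ValueError as exc:
    mismatch_rejected = True
    print("ValueError:", exc)
```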
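Example
A DataFrame can also be read from an Excel file with read_excel:
import pandas as pd
# the path is an example; reading .xlsx files requires the openpyxl package
df = pd.read_excel('D:\\rr.xlsx')
print(df)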