Python Data Frame New

Python Data Frame
PREPARED BY
R.AKILA.AP(SG)/CSE
BSACIST
Pandas
At the very basic level, Pandas objects can be thought
of as enhanced versions of NumPy structured arrays.
The rows and columns are identified with labels
rather than simple integer indices.
Pandas provides a host of useful tools, methods, and
functionality on top of these data structures.
Three fundamental Pandas data structures:
Series, DataFrame, and Index.
Pandas Series
A pandas Series is a one-dimensional array of
indexed data. It can be created from a list or array.
The series has both a sequence of values and a
sequence of indices, which we can access with
the values and index attributes. The values are
simply a familiar NumPy array:
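A minimal sketch of the points above, using made-up values: a Series is built from a list, and its values and index attributes are inspected.

```python
import numpy as np
import pandas as pd

# Build a Series from a plain Python list (illustrative values)
data = pd.Series([0.25, 0.5, 0.75, 1.0])

# The values attribute holds the data as a NumPy array
print(data.values)

# The index attribute holds the labels (a default integer index here)
print(data.index)

# An element is looked up by its index label
print(data[1])
```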
The essential difference is the presence of the index:
while the NumPy array has an implicitly
defined integer index used to access the values, the
Pandas Series has an explicitly defined index
associated with the values.
This explicit index definition gives the Series object
additional capabilities. For example, the index need
not be an integer, but can consist of values of any
desired type. If we wish, we can use
strings as an index:
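As an illustration, the sketch below assigns string labels to a Series; the labels and values are arbitrary examples.

```python
import pandas as pd

# Pass an explicit index of strings alongside the values
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=["a", "b", "c", "d"])

print(data)

# Items are now accessed by their string label
print(data["b"])
```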
Series as Specialized Dictionary
A dictionary is a structure which maps arbitrary keys
to a set of arbitrary values, and a Series is a structure
which maps typed keys to a set of typed values.
This typing is important: just as the type-specific
compiled code behind a NumPy array makes it more
efficient than a Python list for certain operations, the
type information of a Pandas Series makes it much
more efficient than Python dictionaries for certain
operations.
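The dictionary analogy can be sketched by building a Series directly from a Python dictionary. The subject names and marks below are hypothetical.

```python
import pandas as pd

# Hypothetical data: the dictionary keys become the Series index
marks_dict = {"Maths": 90, "Physics": 85, "Chemistry": 78}
marks = pd.Series(marks_dict)

# Dictionary-style lookup by key
print(marks["Physics"])

# Unlike a dictionary, a Series also supports array-style slicing by label
# (both endpoints are included when slicing by label)
print(marks["Maths":"Physics"])
```

The values share one dtype, which is what lets pandas use compiled, type-specific code for operations on the whole Series.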
Pandas DataFrame
A Pandas DataFrame is a two-dimensional, size-mutable,
potentially heterogeneous tabular data
structure with labeled axes (rows and columns).
A DataFrame is a two-dimensional data structure,
i.e., data is aligned in a tabular fashion in rows and
columns.
A Pandas DataFrame consists of three principal
components: the data, the rows, and the columns.

Basic operations on a Pandas DataFrame
Creating a DataFrame
Dealing with Rows and Columns
Indexing and Selecting Data
Working with Missing Data
Iterating over rows and columns
In the real world, a Pandas DataFrame is usually
created by loading a dataset from existing storage,
such as a SQL database, a CSV file, or an Excel file.
A Pandas DataFrame can also be created from lists,
a dictionary, a list of dictionaries, etc.
Creating a DataFrame using a list
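A minimal sketch of building a DataFrame from a single list. The list contents and the column name "Word" are illustrative choices, not taken from the slides.

```python
import pandas as pd

# Illustrative list of strings
words = ["Pandas", "makes", "tabular", "data", "easy"]

# Each list element becomes one row; the column name is supplied explicitly
df = pd.DataFrame(words, columns=["Word"])

print(df)
```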

Creating DataFrame from dict of ndarray/lists
To create a DataFrame from a dict of ndarrays/lists, all
the ndarrays must be of the same length.
If an index is passed, then the length of the index
should be equal to the length of the arrays.
If no index is passed, then by default the index will be
range(n), where n is the array length.
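The rules above can be sketched as follows. The names, ages, and index labels are hypothetical.

```python
import pandas as pd

# Hypothetical data: every list has the same length (4)
data = {"Name": ["Tom", "Nick", "Krish", "Jack"],
        "Age": [20, 21, 19, 18]}

# No index passed: the index defaults to range(4)
df_default = pd.DataFrame(data)
print(df_default)

# Explicit index: its length must match the length of the lists
df_labeled = pd.DataFrame(data, index=["rank1", "rank2", "rank3", "rank4"])
print(df_labeled)

# Lists of unequal length are rejected with a ValueError
try:
    pd.DataFrame({"a": [1, 2], "b": [1, 2, 3]})
    unequal_ok = True
except ValueError as err:
    unequal_ok = False
    print("Error:", err)
```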
Dealing with Rows and Columns
A DataFrame is a two-dimensional data structure,
i.e., data is aligned in a tabular fashion in rows and
columns.
We can perform basic operations on rows/columns
like selecting, deleting, adding, and renaming.
Column Selection: In order to select a column in a
Pandas DataFrame, we can access the column
by its column name.
# Adding a new column to an existing DataFrame
import pandas as pd

# Define a dictionary containing the data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)

# Declare a list that is to be converted into a column
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']

# Use 'Address' as the column name and assign the list to it
df['Address'] = address

# Observe the result
print(df)
(Figure: DataFrame after adding the new column)
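The other column operations mentioned earlier (selecting, renaming, and deleting) can be sketched on the same data. The new column name "Ht" is an arbitrary choice for illustration.

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)

# Selecting one column by name returns a Series
names = df['Name']

# Selecting several columns with a list of names returns a DataFrame
subset = df[['Name', 'Qualification']]

# Renaming a column returns a new DataFrame; df itself is unchanged
renamed = df.rename(columns={'Height': 'Ht'})

# Deleting a column also returns a new DataFrame
dropped = df.drop(columns=['Height'])

print(subset)
print(renamed.columns.tolist())
print(dropped.columns.tolist())
```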
Row Selection: Pandas provides label-based and
position-based ways to retrieve rows from a DataFrame.
The DataFrame.loc[] indexer is used to retrieve rows
from a Pandas DataFrame by label.
Rows can also be selected by passing an integer location
to the iloc[] indexer.
Dealing with rows
# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving rows by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
print(first, "\n\n\n", second)

Selecting a single row
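Because the CSV file used above is not included here, the sketch below uses a small in-memory DataFrame as a stand-in. The names, teams, and ages are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for a CSV file indexed by "Name"
players = pd.DataFrame(
    {"Name": ["Asha", "Ravi", "Meena"],
     "Team": ["Red", "Blue", "Red"],
     "Age": [25, 22, 27]}
).set_index("Name")

# A single label returns one row as a Series
first = players.loc["Asha"]
print(first)

# A list of labels returns several rows as a DataFrame
both = players.loc[["Asha", "Meena"]]
print(both)

# A row label and a column label together return one value
age = players.loc["Ravi", "Age"]
print(age)
```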
Indexing a DataFrame using .iloc[]:
This indexer allows us to retrieve rows and columns
by position.
In order to do that, we need to specify the positions
of the rows that we want, and the positions of the
columns that we want as well.
The df.iloc indexer is very similar to df.loc but only
uses integer locations to make its selections.
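A short sketch of position-based selection, reusing the small DataFrame from the column example:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height': [5.1, 6.2, 5.1, 5.2],
                   'Qualification': ['Msc', 'MA', 'Msc', 'Msc']})

# First row (position 0), returned as a Series
first_row = df.iloc[0]

# Rows at positions 0 and 1; the end of a position slice is excluded
first_two = df.iloc[0:2]

# Single cell: row position 1, column position 0
cell = df.iloc[1, 0]

# Rows 0 and 2 with columns 0 and 1
block = df.iloc[[0, 2], [0, 1]]

print(first_row)
print(first_two)
print(cell)
print(block)
```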
Working with Missing Data
Missing data can occur when no information is provided for
one or more items or for a whole unit.
Missing data is a very big problem in real-life scenarios.
Missing data is also referred to as NA (Not Available) values in
pandas.
Checking for missing values
using isnull() and notnull():
In order to check for missing values in a Pandas DataFrame, we use
the functions isnull() and notnull().
Both functions help in checking whether a value is NaN or not.
These functions can also be used on a Pandas Series in order to
find null values in the Series.
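The checks described above can be sketched as follows. The column names and scores are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with some missing entries
scores = pd.DataFrame({"First": [100, 90, np.nan, 95],
                       "Second": [30, 45, 56, np.nan],
                       "Third": [np.nan, 40, 80, 98]})

# Boolean DataFrame: True wherever a value is missing
mask = scores.isnull()
print(mask)

# Number of missing values in each column
missing_per_column = scores.isnull().sum()
print(missing_per_column)

# notnull() on a Series: keep only rows where "First" is present
rows_with_first = scores[scores["First"].notnull()]
print(rows_with_first)
```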
Filling missing values
using fillna(), replace() and interpolate():
In order to fill null values in a dataset, we
use the fillna(), replace() and interpolate() functions. These
functions replace NaN values with some other value.
All these functions help in filling null values in the data of
a DataFrame.
The interpolate() function is also used to fill NA values in
the DataFrame, but it uses various interpolation techniques to
fill the missing values rather than hard-coding the value.
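A minimal sketch of the three functions on hypothetical scores. The fill values 0 and -1 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with some missing entries
scores = pd.DataFrame({"First": [100, 90, np.nan, 95],
                       "Second": [30, 45, 56, np.nan],
                       "Third": [np.nan, 40, 80, 98]})

# fillna(): replace every NaN with a fixed value
filled = scores.fillna(0)

# replace(): substitute NaN with another chosen value
replaced = scores.replace(to_replace=np.nan, value=-1)

# interpolate(): estimate a missing value from its neighbours in the column
interpolated = scores.interpolate(method="linear")

print(filled)
print(replaced)
print(interpolated)
```

With linear interpolation, the missing entry in "First" is estimated from the values directly above and below it (90 and 95).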
Dropping missing values
Dropping missing values using dropna():
In order to drop null values from a DataFrame, we
use the dropna() function. This function drops
rows/columns of the dataset that contain null values, in
different ways.
For example, we can drop rows with at least one NaN
(null) value.
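The different ways of dropping can be sketched as follows, again on hypothetical scores:

```python
import numpy as np
import pandas as pd

# Hypothetical scores; only the "Fourth" column is complete
scores = pd.DataFrame({"First": [100, 90, np.nan, 95],
                       "Second": [30, 45, 56, np.nan],
                       "Third": [np.nan, 40, 80, 98],
                       "Fourth": [52, 40, 80, 98]})

# Drop rows that contain at least one NaN (the default behaviour)
rows_dropped = scores.dropna()

# Drop columns that contain at least one NaN
cols_dropped = scores.dropna(axis=1)

# Drop only rows in which every value is NaN
all_nan_rows_dropped = scores.dropna(how="all")

print(rows_dropped)
print(cols_dropped)
print(all_nan_rows_dropped)
```

dropna() returns a new DataFrame, so the original scores table is left unchanged.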
