Python Data Frame New

Pandas DataFrame is a two-dimensional data structure that allows labeling of rows and columns for analysis of tabular data. It consists of series objects containing data, rows, and columns. Basic operations on a DataFrame include creating, selecting, adding, and deleting rows and columns. Missing data is represented by NaN values and can be checked, filled, or dropped from the DataFrame.

Uploaded by

Ben Ten
Copyright
© All Rights Reserved
Python Data Frame

PREPARED BY
R.AKILA.AP(SG)/CSE
BSACIST
Pandas

At the most basic level, Pandas objects can be thought of as enhanced versions of NumPy structured arrays.
The rows and columns are identified with labels rather than simple integer indices.
Pandas provides a host of useful tools, methods, and functionality on top of these data structures.
The three fundamental Pandas data structures are the Series, the DataFrame, and the Index.
Pandas Series
A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or an array.
The Series has both a sequence of values and a sequence of indices, which we can access with the values and index attributes. The values are simply a familiar NumPy array.
The essential difference is the presence of the index: while a NumPy array has an implicitly defined integer index used to access the values, a Pandas Series has an explicitly defined index associated with the values.
This explicit index definition gives the Series object additional capabilities. For example, the index need not be an integer; it can consist of values of any desired type. If we wish, we can use strings as an index.
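The points above can be sketched as follows. This is a minimal illustration; the numeric values and the labels 'a' to 'd' are made up for the example.

```python
import pandas as pd

# Create a Series from a list; pandas assigns the default integer index 0..n-1
data = pd.Series([0.25, 0.5, 0.75, 1.0])
print(data.values)  # the underlying NumPy array
print(data.index)   # the implicit integer index

# The same values with an explicit string index (illustrative labels)
labeled = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
print(labeled['b'])  # access a value by its label
```

Here data.values returns a NumPy array, while labeled['b'] looks up a value by its string label rather than by position.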
Series as Specialized Dictionary

A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a Series is a structure that maps typed keys to a set of typed values.
This typing is important: just as the type-specific compiled code behind a NumPy array makes it more efficient than a Python list for certain operations, the type information of a Pandas Series makes it much more efficient than a Python dictionary for certain operations.
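As a small sketch of the dictionary analogy, a Series can be built directly from a Python dict. The names and marks below are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data: student name -> marks
marks_dict = {'Asha': 82, 'Ravi': 74, 'Meena': 91}

# The dict keys become the index and the dict values become the data
marks = pd.Series(marks_dict)

print(marks['Ravi'])          # dictionary-style lookup by key
print(marks['Asha':'Ravi'])   # unlike a dict, a Series also supports label slicing
print(marks.dtype)            # all values share one type
```

Lookup works as it does with a dictionary, but the Series also supports array-style operations such as slicing, and it stores its values with a single dtype.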
Pandas DataFrame

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).
In other words, the data is aligned in a tabular fashion in rows and columns.
A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.
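The three components can be inspected directly on a DataFrame. The snippet below is a minimal sketch; the row labels 'r1' and 'r2' are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Jai', 'Princi'], 'Height': [5.1, 6.2]},
    index=['r1', 'r2'],  # illustrative row labels
)

print(df.values)   # the data, as a two-dimensional NumPy array
print(df.index)    # the row labels
print(df.columns)  # the column labels
```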
 
Basic operation on Pandas DataFrame

Creating a DataFrame
Dealing with Rows and Columns
Indexing and Selecting Data
Working with Missing Data
Iterating over rows and columns
In the real world, a Pandas DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file.
A Pandas DataFrame can also be created from lists, from a dictionary, from a list of dictionaries, and so on.
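As a rough sketch of loading from storage, the example below reads CSV data with pd.read_csv. To keep it self-contained, the CSV content is held in an in-memory string (an assumption for illustration) instead of a real file on disk.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file (illustrative data)
csv_text = "Name,Height\nJai,5.1\nPrinci,6.2\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

With a real file, the call would take the file path in place of the StringIO object.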
Creating a DataFrame using a List
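A minimal sketch of building a DataFrame from a plain Python list follows. The list contents and the column name 'Word' are illustrative.

```python
import pandas as pd

# A simple list of strings (illustrative values)
words = ['Pandas', 'makes', 'tabular', 'data', 'easy']

# Each list element becomes one row; columns= names the single column
df = pd.DataFrame(words, columns=['Word'])
print(df)
```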
 
Creating DataFrame from dict of ndarray/lists

To create a DataFrame from a dict of ndarrays or lists, all the arrays must be of the same length.
If an index is passed, its length must equal the length of the arrays.
If no index is passed, the index defaults to range(n), where n is the array length.
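The rules above can be checked with a short example. The names, ages, and index labels are hypothetical sample data.

```python
import pandas as pd

data = {'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
        'Age': [20, 21, 19, 18]}

# No index passed: the index defaults to 0, 1, 2, 3
df_default = pd.DataFrame(data)

# Explicit index: its length must match the length of the lists
df_labeled = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])

# Lists of unequal length are rejected with a ValueError
length_error_raised = False
try:
    pd.DataFrame({'Name': ['Tom', 'Nick'], 'Age': [20]})
except ValueError as err:
    length_error_raised = True
    print('ValueError:', err)

print(df_default)
print(df_labeled)
```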
Dealing with Rows and Columns

Because a DataFrame aligns data in rows and columns, we can perform basic operations on rows and columns, such as selecting, deleting, adding, and renaming.
Column Selection: To select a column in a Pandas DataFrame, we access it by its column name.
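A short sketch of column selection, together with the renaming and deleting operations mentioned above. The new column name 'Degree' is chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height': [5.1, 6.2, 5.1, 5.2],
                   'Qualification': ['Msc', 'MA', 'Msc', 'Msc']})

# A single column name returns a Series
names = df['Name']

# A list of column names returns a DataFrame
subset = df[['Name', 'Qualification']]

# rename() and drop() return new DataFrames and leave df unchanged
renamed = df.rename(columns={'Qualification': 'Degree'})
without_height = df.drop(columns=['Height'])

print(names)
print(subset)
```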
# Column addition: assign a list to a new column name
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)

# Declare a list that is to be converted into a column
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']

# Use 'Address' as the column name and assign the list to it
df['Address'] = address

# Observe the result
print(df)
[Figure: the DataFrame after adding the new column]
Row Selection: Pandas provides dedicated indexers to retrieve rows from a DataFrame.
The DataFrame.loc[] indexer retrieves rows by their index label.
Rows can also be selected by passing an integer position to the DataFrame.iloc[] indexer.
Dealing with rows

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving rows by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]

print(first, "\n\n\n", second)


Selecting a single row

Indexing a DataFrame using .iloc[]:
This indexer allows us to retrieve rows and columns by position.
To do that, we specify the positions of the rows that we want, and optionally the positions of the columns as well.
The df.iloc indexer is very similar to df.loc, but it uses only integer locations to make its selections.
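The difference between the two indexers can be sketched on a small in-memory DataFrame. The data is illustrative, with the names used as row labels.

```python
import pandas as pd

df = pd.DataFrame(
    {'Height': [5.1, 6.2, 5.1, 5.2],
     'Qualification': ['Msc', 'MA', 'Msc', 'Msc']},
    index=['Jai', 'Princi', 'Gaurav', 'Anuj'],
)

# loc selects by label, iloc selects by integer position
row_by_label = df.loc['Princi']
row_by_pos = df.iloc[1]

# iloc can take row positions and column positions together
block = df.iloc[0:2, 0:1]   # first two rows, first column
cell = df.iloc[2, 1]        # third row, second column

print(row_by_label)
print(block)
print(cell)
```

Both row_by_label and row_by_pos refer to the same row, because 'Princi' is the label at position 1.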
Working with Missing Data

Missing data occurs when no information is provided for one or more items, or for a whole unit.
Missing data is a common problem in real-life datasets.
In pandas, missing data is also referred to as NA (Not Available) values.
Checking for missing values using isnull() and notnull():
To check for missing values in a Pandas DataFrame, we use the isnull() and notnull() functions.
Both functions help in checking whether a value is NaN or not.
These functions can also be used on a Pandas Series to find null values in the series.
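A minimal sketch of checking for missing values follows. The column names and scores are made-up sample data.

```python
import numpy as np
import pandas as pd

# Illustrative data with one NaN in each column
scores = pd.DataFrame({'First': [100, 90, np.nan, 95],
                       'Second': [30, 45, 56, np.nan],
                       'Third': [np.nan, 40, 80, 98]})

missing_mask = scores.isnull()    # True where a value is NaN
present_mask = scores.notnull()   # True where a value is present

# Summing the boolean mask counts the missing values in each column
missing_per_column = scores.isnull().sum()

print(missing_mask)
print(missing_per_column)

# The same functions work on a single Series (column)
print(scores['First'].isnull())
```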
Filling missing values using fillna(), replace() and interpolate():
To fill null values in a dataset, we use the fillna(), replace() and interpolate() functions. These functions replace NaN values with some other value.
All of these functions help in filling null values in a DataFrame.
The interpolate() function also fills NA values in the DataFrame, but it uses an interpolation technique to estimate the missing values rather than hard-coding a value.
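The three approaches can be compared side by side. This is a sketch on made-up data; the fill values 0 and -1 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with one interior NaN in each column
scores = pd.DataFrame({'First': [100.0, np.nan, 80.0, 70.0],
                       'Second': [30.0, 40.0, np.nan, 60.0]})

# fillna(): replace every NaN with a fixed value
filled_zero = scores.fillna(0)

# replace(): substitute one value for another, here NaN -> -1
replaced = scores.replace(np.nan, -1)

# interpolate(): estimate each NaN from its neighbours (linear by default)
interpolated = scores.interpolate()

print(filled_zero)
print(replaced)
print(interpolated)
```

With linear interpolation, the missing value in 'First' becomes the midpoint of 100 and 80, and the one in 'Second' becomes the midpoint of 40 and 60.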
Dropping missing values using dropna():
To drop null values from a DataFrame, we use the dropna() function. This function can drop rows or columns that contain null values in different ways.
By default, dropna() drops every row that has at least one NaN (null) value.
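A brief sketch of the main dropna() variants, using made-up sample data:

```python
import numpy as np
import pandas as pd

# Illustrative data: 'First' and 'Second' each contain one NaN
scores = pd.DataFrame({'First': [100, 90, np.nan, 95],
                       'Second': [30, 45, 56, np.nan],
                       'Third': [52, 40, 80, 98]})

# Default: drop rows that contain at least one NaN
rows_dropped = scores.dropna()

# axis=1: drop columns that contain at least one NaN
cols_dropped = scores.dropna(axis=1)

# how='all': drop only rows in which every value is NaN
all_nan_rows_dropped = scores.dropna(how='all')

print(rows_dropped)
print(cols_dropped)
print(all_nan_rows_dropped)
```

dropna() returns a new DataFrame, so the original scores object is left unchanged.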
