
Introduction To Pandas - Ipynb - Colaboratory

This document introduces the Pandas library in Python. Pandas allows users to store and manipulate data in data frames, which are similar to spreadsheets. Data frames allow columns to be labeled and indexed. Users can access data frames by selecting columns using their names or by choosing rows using indexes. Pandas also makes it easy to create, modify, and restructure data frames.

Uploaded by

Vincent Giang

8/5/2020 Introduction to Pandas.ipynb - Colaboratory

Overview
In this module we will look at a Python library called Pandas.

Pandas is a library built on top of NumPy.

Pandas offers a data structure called pandas.DataFrame, which is similar to a NumPy array but with
added functionality.

The added functionality includes operations that we often use in data science, such as omitting
missing values and replacing values.
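
As a minimal sketch of the two operations mentioned above, the snippet below omits rows with missing values using dropna() and replaces a value using replace(). The readings dataframe, its column names, and the choice of -1.0 as a placeholder value are assumptions made only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; np.nan marks a missing value.
readings = pd.DataFrame({'sensor': ['a', 'b', 'c'],
                         'value': [1.5, np.nan, -1.0]})

# dropna() returns a new dataframe without the rows that contain missing values.
complete = readings.dropna()

# replace() returns a new dataframe with matching values swapped out;
# here -1.0 is assumed to be a placeholder that should become 0.0.
cleaned = readings.replace(-1.0, 0.0)

print(complete)
print(cleaned)
```

Both calls leave the original readings dataframe unchanged and return a new dataframe.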

Pandas Basics
Similar to NumPy, we can import Pandas by calling import pandas as pd .

Because of as pd , we can use the library functions by calling pd.foo() , where foo stands for any
function name. If you omit as pd and import the library by calling import pandas , you will have to
call functions as pandas.foo() .

import pandas as pd

Pandas DataFrame
pandas.DataFrame is a widely used tabular data structure, similar to a spreadsheet, which we can
use to manage data within our Python code.

Unlike in NumPy, we can give names to columns, which makes it easy to manipulate and find our
data within a huge dataset.

data = {'state': ['OH', 'OH', 'OH', 'NV', 'NV'],
        'year': [2000, 2001, 2002, 2000, 2002],
        'pop': [1.5, 1.4, 3.6, 2.4, 2.0]}

populationData = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])

print(populationData)

  state  year  pop
A    OH  2000  1.5
B    OH  2001  1.4
C    OH  2002  3.6
D    NV  2000  2.4
E    NV  2002  2.0


We can see that dataframes allow us to give names to columns, which is helpful for finding data
across multiple columns.

Also note that we can have a custom index for the rows (other than the regular 0, 1, 2, ... index in arrays).
Here in the population dataset we have custom indices, which are 'A', 'B', 'C', 'D', 'E'.

Accessing Columns
We can access columns by using column names. For example, we can access state names in our
population data by calling populationData['state']

populationData['state']

When we need to access multiple columns, we can use nested [] and pass a list of columns we
need to view.

populationData[['state', 'pop']]

Also, we can view the columns that are in the data frame by calling populationData.columns

populationData.columns
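
The following sketch rebuilds the population dataframe from above and shows what each kind of column access returns. The variable names one_column and two_columns are introduced here only for illustration.

```python
import pandas as pd

data = {'state': ['OH', 'OH', 'OH', 'NV', 'NV'],
        'year': [2000, 2001, 2002, 2000, 2002],
        'pop': [1.5, 1.4, 3.6, 2.4, 2.0]}
populationData = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])

# A single column name gives back a one-dimensional pandas.Series.
one_column = populationData['state']

# A list of column names gives back a (smaller) pandas.DataFrame.
two_columns = populationData[['state', 'pop']]

print(type(one_column).__name__)
print(type(two_columns).__name__)

# The column labels can be turned into a plain Python list.
print(list(populationData.columns))
```

Note that the row index ('A' to 'E') is carried along in both results.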

Activity

You have been given a sales dataframe which includes data on the effect of multiple factors on the sales
price of a commodity.

Your task is to first determine which columns are in the dataset and then extract 3
columns from the dataset: Date, price and a factor of your choice.

salesDataframe = pd.read_csv('https://www.cs.odu.edu/~sampath/courses/f19/cs620/files/data/va

salesDataframe.index = [1000, 1001, 1002, 1003, 1004, 1005, 1006]

# Continue with your code

salesDataframe

Accessing Rows
There are multiple ways to retrieve rows from a dataframe.

Since pandas allows us to set a custom index, we can use either the custom index or the regular index
(i.e. 0, 1, 2, ...) to access data.

To view the index used by a dataframe, we can call dataframe.index

populationData.index

When we haven't specified a custom index for our dataframe, its index is the same as the regular
index (0, 1, 2, ...).
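
A small sketch of that default case follows. The dataframe no_custom_index is a made-up example built without an index argument.

```python
import pandas as pd

# No index argument is given, so pandas numbers the rows 0, 1, 2, ...
no_custom_index = pd.DataFrame({'pop': [1.5, 1.4, 3.6]})

print(no_custom_index.index)
print(list(no_custom_index.index))
```

The default index is a pandas RangeIndex, which behaves like the regular 0, 1, 2, ... positions.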


head
Through head(n) we can get the first n rows of the dataframe.

populationData.head(3)
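
As a brief sketch, the snippet below rebuilds the population dataframe and calls head() with and without an argument. When n is left out, pandas uses a default of 5 rows.

```python
import pandas as pd

data = {'state': ['OH', 'OH', 'OH', 'NV', 'NV'],
        'year': [2000, 2001, 2002, 2000, 2002],
        'pop': [1.5, 1.4, 3.6, 2.4, 2.0]}
populationData = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])

first_three = populationData.head(3)   # rows A, B and C
default_head = populationData.head()   # n defaults to 5, so all 5 rows here

print(first_three)
print(len(default_head))
```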

loc operation
We can use the custom index we have on the dataframe (i.e. 'A', 'B', 'C', 'D', 'E' in our
population dataframe) to retrieve rows. We can do that by using loc on our dataframe.

populationData.loc['A']

To select multiple rows, we can pass multiple indices similar to the way we accessed multiple
columns.

populationData.loc[['A', 'C']]
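
The sketch below rebuilds the population dataframe and shows what loc returns for a single label and for a list of labels. It also shows a label slice, which is not used elsewhere in this notebook: with loc, a slice such as 'A':'C' includes both end labels.

```python
import pandas as pd

data = {'state': ['OH', 'OH', 'OH', 'NV', 'NV'],
        'year': [2000, 2001, 2002, 2000, 2002],
        'pop': [1.5, 1.4, 3.6, 2.4, 2.0]}
populationData = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])

# A single label returns one row as a pandas.Series.
row_a = populationData.loc['A']

# A list of labels returns a dataframe with just those rows.
rows_a_c = populationData.loc[['A', 'C']]

# A label slice includes BOTH end points: rows A, B and C.
rows_a_to_c = populationData.loc['A':'C']

print(row_a)
print(rows_a_c)
print(rows_a_to_c)
```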

iloc operation
We can also use the regular index (i.e. 0, 1, 2, ...) to retrieve data. For that we have to use the iloc
operation, similar to the loc operation we used previously.

populationData.iloc[1]


populationData.iloc[[1, 3]]
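
To make the position-based lookup concrete, the sketch below rebuilds the population dataframe and maps positions back to the custom labels. The position slice is an addition for illustration: unlike a label slice with loc, an iloc slice such as 1:3 excludes the end position, as in regular Python slicing.

```python
import pandas as pd

data = {'state': ['OH', 'OH', 'OH', 'NV', 'NV'],
        'year': [2000, 2001, 2002, 2000, 2002],
        'pop': [1.5, 1.4, 3.6, 2.4, 2.0]}
populationData = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])

# Position 1 is the second row, which carries the label 'B'.
second_row = populationData.iloc[1]

# Positions 1 and 3 are the rows labelled 'B' and 'D'.
picked = populationData.iloc[[1, 3]]

# A position slice excludes the end: positions 1 and 2 only.
sliced = populationData.iloc[1:3]

print(second_row)
print(picked)
print(sliced)
```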

Activity
Consider the sales dataset we have. Retrieve a set of rows of your choice using the head() , iloc and
loc operations.

Accessing Rows and Columns


We can combine column access methods and row access methods to get the required set of data
from the dataframe. For that, we first select the rows and then chain the column selection onto the result.

populationData.loc[["A", "C"]][["state", "pop"]]
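
As an alternative sketch, loc also accepts the row labels and the column names together in one call, separated by a comma. The snippet below rebuilds the population dataframe and checks that both forms select the same data. The variable names chained and single_call are introduced only for this illustration.

```python
import pandas as pd

data = {'state': ['OH', 'OH', 'OH', 'NV', 'NV'],
        'year': [2000, 2001, 2002, 2000, 2002],
        'pop': [1.5, 1.4, 3.6, 2.4, 2.0]}
populationData = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])

# Chained form: select rows first, then columns from the result.
chained = populationData.loc[['A', 'C']][['state', 'pop']]

# Single-call form: rows and columns in one loc lookup.
single_call = populationData.loc[['A', 'C'], ['state', 'pop']]

print(single_call)
print(chained.equals(single_call))
```

For reading data the two forms are interchangeable. When assigning new values, the single-call form is generally the safer choice, because the chained form may end up modifying a temporary copy instead of the original dataframe.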

Creating Dataframes
To create a dataframe we can call the pd.DataFrame() function with the data for the
dataframe.

We pass a dictionary to pd.DataFrame() whose keys are the column names and whose values are the
lists of data for each column (one entry per row).

Let's create the temperature dataset from the NumPy exercise. Here we have 2 arrays for inside and
outside temperature readings.

We will be using inside and outside as column names.

data = {'inside': [166, 108, 229, 194, 266, 102, 235, 188, 183, 129],
        'outside': [251, 238, 236, 161, 108, 291, 121, 183, 137, 133]}

temperatureDataframe = pd.DataFrame(data)

temperatureDataframe
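
The sketch below repeats the construction as a self-contained snippet and checks the result with shape and columns. It also illustrates one constraint of the dictionary form: every column list must have the same length, otherwise pandas raises a ValueError. The flag mismatch_rejected is a name made up for this illustration.

```python
import pandas as pd

data = {'inside': [166, 108, 229, 194, 266, 102, 235, 188, 183, 129],
        'outside': [251, 238, 236, 161, 108, 291, 121, 183, 137, 133]}
temperatureDataframe = pd.DataFrame(data)

# shape is (number of rows, number of columns).
print(temperatureDataframe.shape)
print(list(temperatureDataframe.columns))

# Column lists of different lengths cannot form a table.
mismatch_rejected = False
try:
    pd.DataFrame({'inside': [166, 108], 'outside': [251]})
except ValueError as error:
    mismatch_rejected = True
    print('Could not build dataframe:', error)
```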

Activity
Your task is to create a dataframe that has both acceleration data and temperature data. You may
choose column names of your choice.

accx = [0.03463151, 0.6746004 , 0.75813463, 0.14376458, 0.17252515,
        0.4135009 , 0.80347004, 0.81023186, 0.66539218, 0.54754633]

accy = [0.48593401, 0.88983019, 0.87322111, 0.95533169, 0.35901729,
        0.86243141, 0.36083334, 0.18515889, 0.20486895, 0.18408961]

accz = [0.29648785, 0.38779023, 0.05209736, 0.75532094, 0.27063359,
        0.53516819, 0.79639674, 0.64252951, 0.18353906, 0.30367977]
