Introduction To Pandas

Pandas is a Python library designed for data manipulation and analysis, allowing users to clean, explore, and analyze datasets effectively. It provides data structures like Series and DataFrames for handling one-dimensional and multi-dimensional data, respectively, and supports operations such as loading data from files, handling missing values, and performing statistical analyses. The library is essential for data science, enabling users to derive insights from large datasets through various functions and methods.

Uploaded by korircaren4

PANDAS

The name "Pandas" is derived from "panel data" (it is also commonly read as "Python data analysis").

It is a Python library used for working with datasets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

Pandas allows us to analyze big data and draw conclusions based on statistical theory.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

Data science is a branch of computer science that studies how to store, use, and analyze data in order to derive information from it.

Pandas gives you answers about the data, such as:

- Is there a correlation between two or more columns?
- What is the average value?
- What is the maximum value?
- What is the minimum value?
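The sketch below shows how these questions map onto standard DataFrame and Series methods. It reuses the calories/duration values that appear later in this document. `corr()` computes the Pearson correlation by default, and `mean()`, `max()` and `min()` work on a single column.

```python
import pandas as pd

# Sample data taken from the DataFrame examples later in this document
df = pd.DataFrame({
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
})

# Is there a correlation between two columns? (Pearson by default)
correlation = df["calories"].corr(df["duration"])

# Average, maximum and minimum value of a column
average = df["calories"].mean()
maximum = df["calories"].max()
minimum = df["calories"].min()

print(correlation, average, maximum, minimum)
```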

Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
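Here is a minimal sketch of that cleaning step. The data is hypothetical, with one empty value. A `None` in a numeric column is stored by Pandas as `NaN`, and `dropna()` returns a new DataFrame without the rows that contain such values.

```python
import pandas as pd

# Hypothetical data: the second row has an empty "calories" value
df = pd.DataFrame({
    "calories": [420, None, 390],
    "duration": [50, 40, 45],
})

# dropna() returns a new DataFrame; the original df is left unchanged
cleaned = df.dropna()
print(cleaned)
```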

Once Pandas is installed (for example with pip install pandas), import it in your application with the import keyword:

import pandas
Example

import pandas

mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}

# pandas.DataFrame builds a table from the dictionary; keys become column names
myvar = pandas.DataFrame(mydataset)

print(myvar)

What is a Series?

A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

Example

Create a simple Pandas Series from a list:

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)
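If no labels are given, Pandas labels the values with their position, starting at 0. The short sketch below shows those default labels and how to use one of them to read a single value.

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a)

# Without an index argument, the labels are the positions 0, 1, 2
print(list(myvar.index))

# Access an item by its (default) label
print(myvar[0])
```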

Create Labels

With the index argument, you can name your own labels.

Example

Create your own labels:

import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)

When you have created labels, you can access an item by referring to the label.

Example

Return the value of "y":

print(myvar["y"])

Key/Value Objects as Series

You can also use a key/value object, like a dictionary, when creating a Series.

Example

Create a simple Pandas Series from a dictionary:

import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories)

print(myvar)

Note: The keys of the dictionary become the labels.

To select only some of the items in the dictionary, use the index argument and
specify only the items you want to include in the Series.

Example

Create a Series using only data from "day1" and "day2":

import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories, index = ["day1", "day2"])

print(myvar)
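One related detail: if the index argument names a label that is not a key of the dictionary, Pandas keeps the label and stores a missing value (NaN) for it. The label "day4" below is hypothetical and chosen only to show this behaviour. Because NaN is a float, the whole Series becomes floating point.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is a hypothetical label that is not in the dictionary
myvar = pd.Series(calories, index = ["day1", "day4"])
print(myvar)
```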
DataFrames

Data sets in Pandas are usually multi-dimensional tables, called DataFrames.

A Series is like a column; a DataFrame is the whole table.

Example

Create a DataFrame from a dictionary of two lists (each list becomes one column):

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

myvar = pd.DataFrame(data)

print(myvar)
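To build the same table from two actual Series objects, a dictionary of Series can be passed to `pd.DataFrame`. The sketch below does this. The variable names `calories` and `duration` are only illustrative.

```python
import pandas as pd

# Two separate Series, one per column
calories = pd.Series([420, 380, 390])
duration = pd.Series([50, 40, 45])

# The dictionary keys become the column names
df = pd.DataFrame({"calories": calories, "duration": duration})
print(df)
```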

A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.

Example

Create a simple Pandas DataFrame:

import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

# load data into a DataFrame object:
df = pd.DataFrame(data)

print(df)

As you can see from the result above, the DataFrame is like a table with rows and
columns.

Pandas uses the loc attribute to return one or more specified rows.

Example

Return row 0:

#refer to the row index:

print(df.loc[0])

Note: This example returns a Pandas Series.

Example

Return row 0 and 1:

#use a list of indexes:

print(df.loc[[0, 1]])

Note: When you pass a list of labels inside [], as in df.loc[[0, 1]], the result is a Pandas DataFrame.
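The difference between the two notes above can be checked directly. The sketch below reuses the calories/duration data. A single label returns a Series, and a list of labels returns a DataFrame, even when that list has only one element.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data)

row = df.loc[0]          # a single label -> Series
rows = df.loc[[0, 1]]    # a list of labels -> DataFrame

print(type(row).__name__)
print(type(rows).__name__)
```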

Named Indexes

With the index argument, you can name your own indexes.

Example

Add a list of names to give each row a name:

import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)

Locate Named Indexes

Use the named index in the loc attribute to return the specified row(s).

Example
Return "day2":

#refer to the named index:

print(df.loc["day2"])
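Named indexes work with the same loc forms shown earlier. The sketch below passes a list of names to get several rows. It then gives a row label and a column label to get a single value.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

# Several named rows at once: pass a list of labels
print(df.loc[["day1", "day3"]])

# A single value: row label first, then column label
print(df.loc["day2", "duration"])
```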

Load Files Into a DataFrame

If your data sets are stored in a file, Pandas can load them into a DataFrame.

Example

Load a comma separated file (CSV file) into a DataFrame:

import pandas as pd

df = pd.read_csv('[Link]')

print(df)
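The CSV file itself is not part of this document. The sketch below therefore uses `io.StringIO` as a stand-in for a file on disk. The CSV text is assumed for illustration and takes its first three rows from the Duration/Pulse/Maxpulse/Calories example later in this document. `read_csv` accepts a file path or any file-like object.

```python
import io
import pandas as pd

# Stand-in for a CSV file; values copied from the JSON example in this document
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409
60,117,145,479
60,103,135,340
"""

# The first line of the file becomes the column headers
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```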

Read CSV Files

A simple way to store big data sets is to use CSV (comma-separated values) files.

CSV files contain plain text and are a well-known format that can be read by almost any tool, including Pandas.

In our examples we will be using a CSV file called '[Link]'.

Download or open [Link] to follow the examples.

Load the CSV into a DataFrame:

import pandas as pd

df = pd.read_csv('[Link]')

print(df.to_string())

Tip: use to_string() to print the entire DataFrame.

If you have a large DataFrame with many rows, print(df) shows only the first 5 rows and the last 5 rows:

Example

Print the DataFrame without the to_string() method:

import pandas as pd

df = pd.read_csv('[Link]')

print(df)

The number of rows returned is defined in Pandas option settings.

You can check your system's maximum number of rows with the pd.options.display.max_rows statement.

Example

Check the number of maximum returned rows:

import pandas as pd

print(pd.options.display.max_rows)

On my system the number is 60, which means that if the DataFrame contains more than 60 rows, the print(df) statement will return only the headers and the first and last 5 rows.

You can change the maximum number of rows by assigning a new value to the same option.
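The sketch below raises the limit by assigning to `pd.options.display.max_rows`. The value 9999 is an arbitrary example. It then restores the default with `pd.reset_option`, so later output is not affected.

```python
import pandas as pd

# Raise the limit so that print(df) shows up to 9999 rows before truncating
pd.options.display.max_rows = 9999
limit = pd.options.display.max_rows
print(limit)

# Put the option back to its default value
pd.reset_option("display.max_rows")
print(pd.options.display.max_rows)
```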

Read JSON

Big data sets are often stored in, or extracted as, JSON.

JSON is plain text, but has the format of an object, and is well known in the world
of programming, including Pandas.

In our examples we will be using a JSON file called '[Link]'.

Open [Link] to follow the examples.

Example

Load the JSON file into a DataFrame:

import pandas as pd
df = pd.read_json('[Link]')

print(df.to_string())

Tip: use to_string() to print the entire DataFrame.

JSON = Python Dictionary

JSON objects have almost the same format as Python dictionaries (JSON writes true, false and null where Python uses True, False and None).

If your JSON data is not in a file but in a Python dictionary, you can load it into a DataFrame directly:

Example

Load a Python Dictionary into a DataFrame:

import pandas as pd

data = {
"Duration":{
"0":60,
"1":60,
"2":60,
"3":45,
"4":45,
"5":60
},
"Pulse":{
"0":110,
"1":117,
"2":103,
"3":109,
"4":117,
"5":102
},
"Maxpulse":{
"0":130,
"1":145,
"2":135,
"3":175,
"4":148,
"5":127
},
"Calories":{
"0":409,
"1":479,
"2":340,
"3":282,
"4":406,
"5":300
}
}

df = pd.DataFrame(data)

print(df)
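If the JSON arrives as text rather than as a dictionary literal, the standard `json` module can turn it into a dictionary first. The JSON string below is a shortened version of the data above, trimmed for illustration. Because the inner keys "0", "1", "2" are strings, the row labels are strings too. Rows must therefore be selected with df.loc["1"], not df.loc[1].

```python
import json
import pandas as pd

# A JSON string: two columns and three rows of the data above
json_text = '{"Duration": {"0": 60, "1": 60, "2": 60}, "Pulse": {"0": 110, "1": 117, "2": 103}}'

# json.loads turns the JSON text into an ordinary Python dictionary
data = json.loads(json_text)

# pd.DataFrame accepts it exactly like the dictionary literal above
df = pd.DataFrame(data)
print(df)

# The row labels are the strings "0", "1", "2"
print(df.loc["1"])
```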

Pandas - Analyzing DataFrames

Viewing the Data

One of the most commonly used methods for getting a quick overview of a DataFrame is the head() method.
The head() method returns the headers and a specified number of rows, starting from the top.

Example

Get a quick overview by printing the first 10 rows of the DataFrame:

import pandas as pd

df = pd.read_csv('[Link]')

print(df.head(10))

Note: if the number of rows is not specified, the head() method will return the top 5
rows.

Example

Print the first 5 rows of the DataFrame:

import pandas as pd

df = pd.read_csv('[Link]')

print(df.head())

There is also a tail() method for viewing the last rows of the DataFrame.
The tail() method returns the headers and a specified number of rows, starting from
the bottom.

Example

Print the last 5 rows of the DataFrame:

print(df.tail())
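Without the CSV file, head() and tail() can be tried on a small DataFrame. The sketch below builds one from the six rows of the Duration/Pulse/Maxpulse/Calories dictionary shown earlier. Both methods accept an optional number of rows and default to 5.

```python
import pandas as pd

# Six rows taken from the Duration/Pulse/Maxpulse/Calories example above
df = pd.DataFrame({
    "Duration": [60, 60, 60, 45, 45, 60],
    "Pulse": [110, 117, 103, 109, 117, 102],
    "Maxpulse": [130, 145, 135, 175, 148, 127],
    "Calories": [409, 479, 340, 282, 406, 300],
})

print(df.head(3))  # first 3 rows (index 0, 1, 2)
print(df.tail(2))  # last 2 rows (index 4, 5)
print(df.head())   # no argument: first 5 rows
```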

Info About the Data

The DataFrame object has a method called info() that gives you more information about the data set.

Example

Print information about the data:

# info() prints its summary itself (and returns None), so it is not wrapped in print()
df.info()

Null Values

The info() method also tells us how many non-null values are present in each column. In our data set, 164 of 169 values in the "Calories" column are non-null.

This means that 5 rows have no value at all in the "Calories" column, for whatever reason.
Empty values, or Null values, can be bad when analyzing data, and you should
consider removing rows with empty values. This is a step towards what is called
cleaning data, and you will learn more about that in the next chapters.
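The same counts can be obtained as Series instead of printed text. The sketch below uses hypothetical data with one empty "Calories" value. `count()` gives the non-null values per column, matching the Non-Null column of info(). `isnull().sum()` gives the missing values per column.

```python
import pandas as pd

# Hypothetical data: one empty value in "Calories"
df = pd.DataFrame({
    "Duration": [60, 60, 60],
    "Calories": [409, None, 340],
})

df.info()                 # prints the Non-Null count per column

print(df.count())         # non-null values per column
print(df.isnull().sum())  # missing values per column
```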
