NumPy and Pandas Tutorial

Uploaded by

omvati343

NumPy and Pandas for Data Analysis AI ML Training

NumPy Tutorial
Introduction

NumPy (Numerical Python) is a library for the Python programming language, adding support
for large, multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays.

Installation

To install NumPy, use the following command:

pip install numpy

Basic Operations

Importing NumPy

import numpy as np

Creating Arrays

# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
print(array_1d)

# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(array_2d)

# Create an array with zeros

zeros_array = np.zeros((3, 4))
print(zeros_array)

# Create an array with ones

ones_array = np.ones((2, 3))
print(ones_array)

# Create an identity matrix

identity_matrix = np.eye(3)
print(identity_matrix)

# Create an array with a range of values

range_array = np.arange(10, 20, 2)
print(range_array)

# Create an array with evenly spaced values

linspace_array = np.linspace(0, 1, 5)
print(linspace_array)
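As a quick check on the constructors above, the following sketch inspects the shape, element type, and values they produce. It reuses the same calls and adds only the attribute lookups `shape` and `dtype`.

```python
import numpy as np

zeros_array = np.zeros((3, 4))
range_array = np.arange(10, 20, 2)      # start 10, stop 20 (excluded), step 2
linspace_array = np.linspace(0, 1, 5)   # 5 points, both endpoints included

print(zeros_array.shape)   # (3, 4)
print(zeros_array.dtype)   # float64 is the default for np.zeros
print(range_array)         # [10 12 14 16 18]
print(linspace_array)      # [0.   0.25 0.5  0.75 1.  ]
```

Note the difference between the two: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a number of points and includes it.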

Array Operations

# Arithmetic operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

LinkedIn: www.linkedin.com/in/nidhi-grover-raheja-904211138 1 |Pa ge

print(a + b) # Addition
print(a - b) # Subtraction
print(a * b) # Element-wise multiplication
print(a / b) # Element-wise division

# Matrix multiplication
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
print(np.dot(matrix_a, matrix_b))

# Broadcasting
array_broadcast = np.array([1, 2, 3])
print(array_broadcast + 1) # Adds 1 to each element

# Statistical operations
print(np.mean(a)) # Mean
print(np.median(a)) # Median
print(np.std(a)) # Standard deviation (population form, ddof=0 by default)
print(np.sum(a)) # Sum
print(np.min(a)) # Minimum
print(np.max(a)) # Maximum
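Broadcasting also works between arrays of different shapes, not only between an array and a scalar. The sketch below is a minimal illustration; the names `matrix`, `row`, and `column` are chosen for this example only. It also shows the `@` operator, which gives the same result as `np.dot` for 2D arrays.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,)
column = np.array([[100], [200]])          # shape (2, 1)

row_added = matrix + row        # row is added to every row of matrix
column_added = matrix + column  # column is added to every column of matrix
print(row_added)
print(column_added)

# Matrix multiplication with the @ operator
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
product = matrix_a @ matrix_b
print(product)
```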

Indexing and Slicing

array = np.array([1, 2, 3, 4, 5, 6])

# Indexing
print(array[0]) # First element
print(array[-1]) # Last element

# Slicing
print(array[1:4]) # Elements from index 1 to 3
print(array[:3]) # First three elements
print(array[3:]) # Elements from index 3 to end
print(array[::2]) # Every second element
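The same ideas extend to 2D arrays, where each axis gets its own index or slice. The following sketch uses a small example array named `grid` (an illustrative name) and also shows selection with a boolean mask.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

element = grid[1, 2]            # row 1, column 2
first_column = grid[:, 0]       # every row, column 0
top_left = grid[:2, :2]         # first two rows and first two columns
large_values = grid[grid > 5]   # boolean mask keeps elements greater than 5

print(element)
print(first_column)
print(top_left)
print(large_values)
```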

Reshaping Arrays

array = np.arange(1, 10)

reshaped_array = array.reshape((3, 3))
print(reshaped_array)

# Flattening arrays
flattened_array = reshaped_array.flatten()
print(flattened_array)
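Two details are worth a short sketch: passing -1 to `reshape` lets NumPy infer that dimension, and `flatten` always returns a copy, whereas `ravel` returns a view when the memory layout allows it (as it does in this small example).

```python
import numpy as np

array = np.arange(1, 7)
reshaped = array.reshape((2, -1))  # -1 is inferred as 3, giving shape (2, 3)
print(reshaped.shape)

flat_copy = reshaped.flatten()     # independent copy
flat_copy[0] = 99
print(reshaped[0, 0])              # unchanged: 1

flat_view = reshaped.ravel()       # view onto the same data in this case
flat_view[0] = 99
print(reshaped[0, 0])              # changed: 99
```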

Pandas Tutorial
Introduction

Pandas is a library providing high-performance, easy-to-use data structures and data analysis
tools for the Python programming language.

Installation

To install Pandas, use the following command:

pip install pandas

Basic Operations

Importing Pandas

import pandas as pd

Creating DataFrames

# Create a DataFrame from a dictionary

data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 24, 35, 32],
'City': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
print(df)

# Create a DataFrame from a CSV file

df_from_csv = pd.read_csv('path_to_csv_file.csv')
print(df_from_csv)

Viewing Data

# Display the first few rows

print(df.head())

# Display the last few rows

print(df.tail())

# Display the data types of columns

print(df.dtypes)

# Display the shape of the DataFrame

print(df.shape)

# Display summary statistics

print(df.describe())

Selecting Data

# Select a single column

print(df['Name'])

# Select multiple columns

print(df[['Name', 'City']])

# Select rows by index

print(df.iloc[0]) # First row
print(df.iloc[0:2]) # First two rows

# Select rows by label

print(df.loc[0]) # First row
print(df.loc[0:2]) # First three rows (inclusive)

# Conditional selection
print(df[df['Age'] > 30])
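With the default index, positions and labels coincide, which can hide the difference between `iloc` and `loc`. The sketch below assumes a hypothetical index of 10, 20, 30, 40 to make the difference visible, and also shows how to combine two conditions.

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['John', 'Anna', 'Peter', 'Linda'], 'Age': [28, 24, 35, 32]},
    index=[10, 20, 30, 40],  # hypothetical non-default labels
)

by_position = people.iloc[0:2]  # positions 0 and 1, end excluded
by_label = people.loc[10:30]    # labels 10, 20 and 30, end included

# Combine conditions with & (and) or | (or); each condition needs parentheses
in_range = people[(people['Age'] > 25) & (people['Age'] < 35)]

print(by_position)
print(by_label)
print(in_range)
```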

Adding and Dropping Columns

# Add a new column

df['Country'] = ['USA', 'France', 'Germany', 'UK']
print(df)

# Drop a column
df = df.drop('Country', axis=1)
print(df)

Handling Missing Data

# Create a DataFrame with missing values

data_with_nan = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, None, 35, 32],
'City': ['New York', 'Paris', None, 'London']
}
df_nan = pd.DataFrame(data_with_nan)
print(df_nan)

# Drop rows with missing values

df_dropped_nan = df_nan.dropna()
print(df_dropped_nan)

# Fill missing values

df_filled_nan = df_nan.fillna({'Age': df_nan['Age'].mean(), 'City':
'Unknown'})
print(df_filled_nan)
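Before dropping or filling, it often helps to count how many values are missing. A minimal sketch using the same data follows; it also shows `dropna` restricted to one column through the `subset` parameter.

```python
import pandas as pd

df_nan = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, None, 35, 32],
    'City': ['New York', 'Paris', None, 'London'],
})

# Count of missing values in each column
missing_per_column = df_nan.isna().sum()
print(missing_per_column)

# Drop a row only when 'Age' is missing; a missing 'City' is tolerated
age_known = df_nan.dropna(subset=['Age'])
print(age_known)
```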

Grouping and Aggregating Data

# Group by a column and calculate mean

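# numeric_only=True skips text columns such as 'Name';
# without it, pandas 2.0 and later raise a TypeError here
print(df.groupby('City').mean(numeric_only=True))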

# Group by multiple columns and calculate sum

print(df.groupby(['City', 'Name']).sum())
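In the DataFrame above every city appears once, so each group holds a single row. The sketch below uses a small made-up `sales` table with repeated cities (all names and values are illustrative) to show several aggregations computed in one call with `agg`.

```python
import pandas as pd

# Illustrative data: each city appears twice
sales = pd.DataFrame({
    'City': ['Paris', 'Paris', 'Berlin', 'Berlin'],
    'Amount': [10, 20, 5, 15],
})

# Select the numeric column, then compute several statistics per group
summary = sales.groupby('City')['Amount'].agg(['mean', 'sum', 'count'])
print(summary)
```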

Merging DataFrames

# Create two DataFrames

df1 = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})
df2 = pd.DataFrame({'Name': ['Peter', 'Linda'], 'City': ['Berlin',
'London']})

# Concatenate DataFrames
df_concat = pd.concat([df1, df2], ignore_index=True)
print(df_concat)

# Merge DataFrames
# (df1 and df2 share no names, so this inner join returns an empty DataFrame)
df_merge = pd.merge(df1, df2, on='Name', how='inner')
print(df_merge)
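Since df1 and df2 above have no names in common, a sketch with overlapping keys may show the behavior of `merge` more clearly. The `people` and `cities` tables are made up for this example; `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]})
cities = pd.DataFrame({'Name': ['Anna', 'Peter', 'Linda'],
                       'City': ['Paris', 'Berlin', 'London']})

# Inner join keeps only names present in both tables
inner = pd.merge(people, cities, on='Name', how='inner')
print(inner)

# Outer join keeps every name; _merge shows left_only, right_only or both
outer = pd.merge(people, cities, on='Name', how='outer', indicator=True)
print(outer)
```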

Exporting Data

# Export DataFrame to CSV

df.to_csv('output.csv', index=False)

# Export DataFrame to Excel (requires an Excel writer package such as openpyxl)

df.to_excel('output.xlsx', index=False)

Advanced Pandas Tutorial

Handling Time Series Data

Pandas provides robust support for time series data. Here's how to work with it.
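As a first taste, the sketch below converts date strings with `pd.to_datetime` and selects a whole month with a partial date string. The values and the name `series` are illustrative only.

```python
import pandas as pd

# Convert strings to timestamps and use them as the index
dates = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-02-01'])
series = pd.Series([1.0, 2.0, 3.0], index=dates)

# Partial-string indexing: every entry in January 2023
january = series.loc['2023-01']
print(january)

# Date components are available directly on the index
print(series.index.month)
```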

Creating Time Series Data

# Create a date range

date_range = pd.date_range(start='2023-01-01', periods=10, freq='D')
print(date_range)

# Create a DataFrame with time series data

time_series_data = {
'Date': date_range,
'Value': np.random.randn(10)
}
df_time_series = pd.DataFrame(time_series_data)
df_time_series.set_index('Date', inplace=True)
print(df_time_series)

Resampling Time Series Data

# Resample to weekly frequency and calculate the mean

df_resampled = df_time_series.resample('W').mean()
print(df_resampled)

# Resample to monthly frequency and calculate the sum
# (pandas 2.2 and later deprecate 'M' in favor of 'ME' for month-end)

df_resampled_monthly = df_time_series.resample('M').sum()
print(df_resampled_monthly)
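Because the example above uses random values, its output changes on every run. The following sketch uses the fixed values 1 to 10 so that the weekly bins can be checked by hand. By default 'W' closes each week on Sunday, and 2023-01-01 is a Sunday, so the first bin holds that single day.

```python
import pandas as pd

dates = pd.date_range(start='2023-01-01', periods=10, freq='D')
values = pd.Series(range(1, 11), index=dates)

# Weeks end on Sunday: Jan 1 | Jan 2 to Jan 8 | Jan 9 to Jan 10
weekly_sum = values.resample('W').sum()
print(weekly_sum)
```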

Working with Categorical Data

# Create a DataFrame with categorical data
data = {
'Name': ['John', 'Anna', 'Peter', 'Linda'],
'City': ['New York', 'Paris', 'Berlin', 'London'],
'Gender': ['Male', 'Female', 'Male', 'Female']
}
df_categorical = pd.DataFrame(data)

# Convert a column to categorical type

df_categorical['Gender'] = df_categorical['Gender'].astype('category')
print(df_categorical)

# Get the categories and codes

print(df_categorical['Gender'].cat.categories)

print(df_categorical['Gender'].cat.codes)

Pivot Tables
# Create a DataFrame
data = {
'Name': ['John', 'Anna', 'John', 'Anna', 'John', 'Anna'],
'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
'Sales': [150, 200, 130, 210, 170, 220]
}
df_sales = pd.DataFrame(data)

# Create a pivot table

pivot_table = df_sales.pivot_table(values='Sales', index='Name',
columns='Month', aggfunc='sum')
print(pivot_table)
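A pivot table can also carry row and column totals. The sketch below repeats the sales data and adds `margins=True`, which appends an `All` row and column computed with the same aggregation function.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'Name': ['John', 'Anna', 'John', 'Anna', 'John', 'Anna'],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
    'Sales': [150, 200, 130, 210, 170, 220],
})

# margins=True adds totals labelled 'All'
pivot_with_totals = df_sales.pivot_table(
    values='Sales', index='Name', columns='Month',
    aggfunc='sum', margins=True,
)
print(pivot_with_totals)
```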

Handling Large Datasets

# Read a large CSV file in chunks
chunk_size = 1000
chunks = pd.read_csv('large_dataset.csv', chunksize=chunk_size)

# Process each chunk

for chunk in chunks:
    # Perform operations on the chunk
    print(chunk.shape)
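A common pattern is to accumulate a result across chunks so the full file never has to be in memory at once. The sketch below stands in for a large file with a tiny in-memory CSV built through `io.StringIO`; the column name `value` and the chunk size of 4 are assumptions made for the example.

```python
import io
import pandas as pd

# Tiny stand-in for a large file: one column holding the numbers 0 to 9
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

total = 0
row_count = 0
chunk_lengths = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += int(chunk['value'].sum())  # running sum across chunks
    row_count += len(chunk)
    chunk_lengths.append(len(chunk))

print(total, row_count, chunk_lengths)
```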

Applying Functions

Using apply()

# Create a DataFrame
data = {
'A': [1, 2, 3],
'B': [4, 5, 6]
}
df = pd.DataFrame(data)

# Define a function
def add_one(x):
    return x + 1

# Apply the function to each element
# (applymap is deprecated since pandas 2.1; on newer versions use df.map(add_one))

print(df.applymap(add_one))

# Apply the function to each column

print(df.apply(lambda x: x + 1))

# Apply the function to each row

print(df.apply(lambda x: x + 1, axis=1))
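The difference between the two axes becomes clearer when the function reduces its input to a single value. In the sketch below (variable names are illustrative), the default axis passes each column to the function, while `axis=1` passes each row.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Default axis=0: the function receives one column at a time
column_ranges = df.apply(lambda column: column.max() - column.min())
print(column_ranges)

# axis=1: the function receives one row at a time
row_products = df.apply(lambda row: row['A'] * row['B'], axis=1)
print(row_products)

# The same row-wise result written as a vectorized expression
vectorized_products = df['A'] * df['B']
print(vectorized_products)
```

When a vectorized expression exists, as in the last line, it is usually the faster choice.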

Joining DataFrames
# Create two DataFrames
df1 = pd.DataFrame({
'key': ['A', 'B', 'C', 'D'],
'value': [1, 2, 3, 4]
})
df2 = pd.DataFrame({
'key': ['B', 'D', 'E', 'F'],
'value': [5, 6, 7, 8]
})

# Inner join
inner_joined = pd.merge(df1, df2, on='key', how='inner')
print(inner_joined)

# Left join
left_joined = pd.merge(df1, df2, on='key', how='left')
print(left_joined)

# Right join
right_joined = pd.merge(df1, df2, on='key', how='right')
print(right_joined)

# Outer join
outer_joined = pd.merge(df1, df2, on='key', how='outer')
print(outer_joined)

Window Functions
# Create a DataFrame with time series data
data = {
'Date': pd.date_range(start='2023-01-01', periods=10, freq='D'),
'Value': np.random.randn(10)
}
df = pd.DataFrame(data)
df.set_index('Date', inplace=True)

# Calculate rolling mean

rolling_mean = df['Value'].rolling(window=3).mean()
print(rolling_mean)

# Calculate expanding sum

expanding_sum = df['Value'].expanding().sum()
print(expanding_sum)

# Calculate exponentially weighted mean

ewm_mean = df['Value'].ewm(span=3).mean()
print(ewm_mean)
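With random input the window results are hard to verify by eye. The sketch below uses the fixed values 1 to 5 instead, and shows `min_periods`, which controls how many observations a window needs before it produces a value.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

# The first two windows hold fewer than 3 values, so they yield NaN
rolling_mean = values.rolling(window=3).mean()
print(rolling_mean)

# min_periods=1 produces a value as soon as one observation is available
rolling_mean_partial = values.rolling(window=3, min_periods=1).mean()
print(rolling_mean_partial)

# Expanding sum is a running total
expanding_sum = values.expanding().sum()
print(expanding_sum)
```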

Handling JSON Data

# Create a JSON string
json_str = '''
[
{"Name": "John", "Age": 28, "City": "New York"},
{"Name": "Anna", "Age": 24, "City": "Paris"},
{"Name": "Peter", "Age": 35, "City": "Berlin"}
]
'''

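# Read JSON string into DataFrame
# (wrapped in StringIO because passing a literal JSON string is deprecated since pandas 2.1)

from io import StringIO
df_json = pd.read_json(StringIO(json_str))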
print(df_json)

# Export DataFrame to JSON

df_json.to_json('output.json', orient='records', lines=True)
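With `orient='records'` and `lines=True`, the output holds one JSON object per line rather than a single JSON array. The sketch below checks this with a round trip through an in-memory string. Calling `to_json` without a path returns the text instead of writing a file, and the small DataFrame is illustrative.

```python
from io import StringIO
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# No path given, so the JSON Lines text is returned as a string
json_lines = df.to_json(orient='records', lines=True)
print(json_lines)

# Reading it back needs lines=True as well
restored = pd.read_json(StringIO(json_lines), lines=True)
print(restored)
```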

Advanced Indexing with MultiIndex

# Create a MultiIndex DataFrame
arrays = [
['A', 'A', 'B', 'B'],
['one', 'two', 'one', 'two']
]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df_multi = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)
print(df_multi)

# Accessing data in MultiIndex DataFrame

print(df_multi.loc['A'])
print(df_multi.loc[('A', 'one')])
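Selecting on the inner level needs a different tool than `loc` with an outer label. The sketch below rebuilds the same DataFrame and shows two options: `xs` for a cross-section on a named level, and `unstack` to move a level into the columns.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']],
    names=('first', 'second'),
)
df_multi = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)

# All rows whose 'second' level equals 'one'
ones = df_multi.xs('one', level='second')
print(ones)

# Move the 'second' level into the columns to get a wide table
wide = df_multi['value'].unstack('second')
print(wide)
```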

Combining DataFrames with concat and append

# Create DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})

# Concatenate DataFrames
concatenated = pd.concat([df1, df2], ignore_index=True)
print(concatenated)

# Append DataFrames
# (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement)
appended = pd.concat([df1, df2], ignore_index=True)
print(appended)

Performance Tips
# Use vectorized operations instead of loops
data = pd.DataFrame({
'A': range(1000000),
'B': range(1000000)
})

# Inefficient way: Using loops

data['C'] = [x + y for x, y in zip(data['A'], data['B'])]

# Efficient way: Using vectorized operations

data['C'] = data['A'] + data['B']
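To see the difference on a given machine, the two approaches can be timed. The sketch below uses `time.perf_counter` and a smaller table of 100,000 rows (an arbitrary size chosen to keep the run short). Actual timings vary by machine and library version, so none are quoted here.

```python
import time
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': np.arange(100_000), 'B': np.arange(100_000)})

# Python-level loop over pairs of elements
start = time.perf_counter()
loop_result = [x + y for x, y in zip(data['A'], data['B'])]
loop_seconds = time.perf_counter() - start

# Vectorized addition of whole columns
start = time.perf_counter()
vector_result = data['A'] + data['B']
vector_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s")
```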
