NumPy and Pandas Tutorial
NumPy Tutorial
Introduction
NumPy (Numerical Python) is a library for the Python programming language, adding support
for large, multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays.
Installation
NumPy can be installed from PyPI with pip:
pip install numpy
Basic Operations
Importing NumPy
import numpy as np
Creating Arrays
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
print(array_1d)
# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(array_2d)
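Every array carries its shape, number of dimensions and element type as attributes. The sketch below inspects them and shows a few other common constructors (`np.zeros`, `np.arange`, `np.linspace`). The variable names are illustrative.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Inspect the array's structure
print(array_2d.shape)  # (2, 3): 2 rows, 3 columns
print(array_2d.ndim)   # 2 dimensions
print(array_2d.dtype)  # an integer dtype such as int64 (platform-dependent)

# Other common ways to create arrays
zeros = np.zeros((2, 3))       # 2x3 array filled with 0.0
count = np.arange(0, 10, 2)    # [0 2 4 6 8], like range() with a step
points = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1 inclusive
print(zeros)
print(count)
print(points)
```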
Array Operations
# Arithmetic operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # Addition
print(a - b) # Subtraction
print(a * b) # Element-wise multiplication
print(a / b) # Element-wise division
# Matrix multiplication
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
print(np.dot(matrix_a, matrix_b))
# Broadcasting
array_broadcast = np.array([1, 2, 3])
print(array_broadcast + 1) # Adds 1 to each element
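Broadcasting also works between two arrays of different shapes: NumPy stretches dimensions of size 1 (or missing leading dimensions) so the shapes match, without copying the data. In this sketch, with illustrative values, a 1D row is added to every row of a 2D array and a column vector is added to every column.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])     # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,), treated as (1, 3)
column = np.array([[100], [200]])  # shape (2, 1)

# The row is added to each of the 2 rows
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# The column is added to each of the 3 columns
print(matrix + column)
# [[101 102 103]
#  [204 205 206]]
```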
# Statistical operations
print(np.mean(a)) # Mean
print(np.median(a)) # Median
print(np.std(a)) # Standard deviation
print(np.sum(a)) # Sum
print(np.min(a)) # Minimum
print(np.max(a)) # Maximum
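On multi-dimensional arrays these functions reduce over all elements by default. The `axis` argument restricts the reduction to one dimension: `axis=0` works down the columns, `axis=1` across the rows. A short sketch with an example 2x3 array:

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(np.sum(scores))           # 21: all elements
print(np.sum(scores, axis=0))   # [5 7 9]: sum down each column
print(np.sum(scores, axis=1))   # [ 6 15]: sum across each row
print(np.mean(scores, axis=1))  # [2. 5.]: mean of each row
print(scores.max(axis=0))       # [4 5 6]: array methods accept axis too
```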
# Indexing (using array_1d from above)
print(array_1d[0]) # First element
print(array_1d[-1]) # Last element
# Slicing
print(array_1d[1:4]) # Elements from index 1 to 3
print(array_1d[:3]) # First three elements
print(array_1d[3:]) # Elements from index 3 to end
print(array_1d[::2]) # Every second element
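Indexing extends to more dimensions with one index (or slice) per axis, and a boolean array can act as a mask that keeps only the elements where a condition holds. A sketch with illustrative values:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid[1, 2])   # 6: row 1, column 2
print(grid[:, 0])   # [1 4]: first column of every row
print(grid[0, 1:])  # [2 3]: row 0, columns 1 onward

values = np.array([1, 2, 3, 4, 5])
mask = values > 2                # [False False  True  True  True]
print(values[mask])              # [3 4 5]
print(values[values % 2 == 0])   # [2 4]: even elements only
```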
Reshaping Arrays
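# Reshaping arrays: 6 elements rearranged into 2 rows and 3 columns
reshaped_array = np.arange(1, 7).reshape(2, 3)
print(reshaped_array)
# Flattening arrays (back to 1D)
flattened_array = reshaped_array.flatten()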
print(flattened_array)
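`reshape()` requires the new shape to hold the same number of elements, and one dimension may be given as `-1` so NumPy infers it. `flatten()` always returns a copy, whereas `ravel()` returns a view when it can, so writing to a raveled array may change the original. A sketch with example values:

```python
import numpy as np

numbers = np.arange(12)        # [0 1 ... 11]

grid = numbers.reshape(3, -1)  # -1 is inferred as 4, giving shape (3, 4)
print(grid.shape)              # (3, 4)

flat_copy = grid.flatten()     # independent copy of the data
flat_view = grid.ravel()       # a view here, because grid is contiguous in memory

flat_copy[0] = 100             # does not touch grid
flat_view[1] = 200             # writes through to grid
print(grid[0, :2])             # [  0 200]
```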
Pandas Tutorial
Introduction
Pandas is a library providing high-performance, easy-to-use data structures and data analysis
tools for the Python programming language.
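Installation
Pandas can be installed from PyPI with pip (NumPy is installed automatically as a dependency):
pip install pandas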
Basic Operations
Importing Pandas
import pandas as pd
Creating DataFrames
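The examples that follow use a DataFrame named `df` with `Name`, `Age` and `Country` columns. A minimal sketch of building one from a dictionary of columns; the names and values are hypothetical example data.

```python
import pandas as pd

# Hypothetical example data: each key becomes a column, each list holds its values
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'Country': ['USA', 'UK', 'Germany', 'Australia'],
}
df = pd.DataFrame(data)
print(df)

# A DataFrame can also be built from a list of row dictionaries
rows = [{'Name': 'John', 'Age': 28}, {'Name': 'Anna', 'Age': 24}]
df_rows = pd.DataFrame(rows)
print(df_rows)
```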
Viewing Data
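A few methods give a quick overview of a DataFrame: `head()` and `tail()` show the first and last rows, `shape` and `dtypes` describe its structure, and `describe()` summarizes the numeric columns. A sketch using the same hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'Country': ['USA', 'UK', 'Germany', 'Australia'],
})

print(df.head(2))     # first 2 rows (the default is 5)
print(df.tail(1))     # last row
print(df.shape)       # (4, 3): rows, columns
print(df.dtypes)      # the dtype of each column
print(df.describe())  # count, mean, std, min, quartiles and max of Age
```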
Selecting Data
# Conditional selection
print(df[df['Age'] > 30])
# Drop a column
df = df.drop('Country', axis=1)
print(df)
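Beyond boolean filters, columns can be selected by name, and rows by label with `.loc` or by integer position with `.iloc`. Conditions can be combined with `&` (and) and `|` (or), each wrapped in parentheses. A sketch with the same hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'Country': ['USA', 'UK', 'Germany', 'Australia'],
})

print(df['Name'])           # one column gives a Series
print(df[['Name', 'Age']])  # a list of columns gives a DataFrame
print(df.loc[0, 'Name'])    # 'John': row label 0, column 'Name'
print(df.iloc[1:3])         # rows at positions 1 and 2

# Parentheses are required because & binds more tightly than > and !=
over_25_outside_usa = df.loc[(df['Age'] > 25) & (df['Country'] != 'USA'), ['Name', 'Country']]
print(over_25_outside_usa)  # Peter and Linda
```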
Merging DataFrames
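# Create two DataFrames that share a 'Name' column (example data)
df1 = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})
df2 = pd.DataFrame({'Name': ['Anna', 'Peter'], 'Country': ['UK', 'Germany']})
# Concatenate DataFrames (stack rows; a column missing from one frame is filled with NaN)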
df_concat = pd.concat([df1, df2], ignore_index=True)
print(df_concat)
# Merge DataFrames
df_merge = pd.merge(df1, df2, on='Name', how='inner')
print(df_merge)
Exporting Data
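A minimal sketch of exporting a DataFrame and reading it back, with hypothetical data. When `to_csv()` is called without a path, as here, it returns the CSV text as a string, which keeps the example free of files; passing a file path writes the file instead. Other writers such as `to_json()` follow the same pattern.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# With no path, to_csv() returns the CSV text; index=False leaves out the row index
csv_text = df.to_csv(index=False)
print(csv_text)
# Name,Age
# John,28
# Anna,24

# Writing to a file instead would be, e.g.: df.to_csv('people.csv', index=False)
# ('people.csv' is a hypothetical file name)

# Read the CSV text back into a DataFrame
df_loaded = pd.read_csv(io.StringIO(csv_text))
print(df_loaded)
```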
Time Series Data
Pandas provides robust support for time series data, including date ranges, datetime indexes and resampling.
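A minimal sketch of common time series operations on hypothetical daily data: `pd.date_range()` builds a datetime index, rows can then be selected by date, and `resample()` regroups the rows into coarser periods (weekly here).

```python
import pandas as pd

# Ten consecutive days (hypothetical values 0..9), indexed by date
dates = pd.date_range(start='2023-01-01', periods=10, freq='D')
ts = pd.Series(range(10), index=dates)

# Select a single day, or a range of days with a date-string slice
# (label slices include both endpoints)
print(ts.loc[pd.Timestamp('2023-01-03')])  # 2
print(ts.loc['2023-01-02':'2023-01-04'])   # values 1, 2, 3

# Resample to weekly totals; 'W' weeks end on Sunday, and 2023-01-01 is a Sunday
weekly = ts.resample('W').sum()
print(weekly)  # 0 (week ending Jan 1), 28 (Jan 2 to 8), 17 (Jan 9 and 10)
```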
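Categorical Data
df_categorical = pd.DataFrame({'Gender': pd.Categorical(['Male', 'Female', 'Female', 'Male'])})
print(df_categorical['Gender'].cat.codes)  # integer code per value; categories are sorted, so Female=0, Male=1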
Pivot Tables
# Create a DataFrame
data = {
'Name': ['John', 'Anna', 'John', 'Anna', 'John', 'Anna'],
'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
'Sales': [150, 200, 130, 210, 170, 220]
}
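df_sales = pd.DataFrame(data)
# Pivot: one row per Name, one column per Month, total Sales in each cell
sales_pivot = df_sales.pivot_table(values='Sales', index='Name', columns='Month', aggfunc='sum')
print(sales_pivot)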
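`pivot_table()` also accepts `margins=True`, which adds an `All` row and column of totals computed with the same `aggfunc`. Because the month columns come out in alphabetical order by default, the sketch below also reorders them. It reuses the same sales data.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'Name': ['John', 'Anna', 'John', 'Anna', 'John', 'Anna'],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
    'Sales': [150, 200, 130, 210, 170, 220],
})

pivot = df_sales.pivot_table(values='Sales', index='Name', columns='Month',
                             aggfunc='sum', margins=True)
# Put the months in calendar order, keeping the 'All' total column last
pivot = pivot[['Jan', 'Feb', 'Mar', 'All']]
print(pivot)
```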
Applying Functions
Using apply()
# Create a DataFrame
data = {
'A': [1, 2, 3],
'B': [4, 5, 6]
}
df = pd.DataFrame(data)
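# Define a function
def add_one(x):
    return x + 1
# Apply it column by column; x is a whole column (Series), so x + 1 adds 1 to every value
df_applied = df.apply(add_one)
print(df_applied)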
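By default `apply()` passes each column to the function (`axis=0`); with `axis=1` it passes each row instead, which is useful when a result depends on several columns. For simple arithmetic like this, a vectorized expression on the columns gives the same result and is usually faster. A sketch with the same example data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# axis=1: the function receives one row at a time (a Series indexed by column name)
df['Total'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
print(df)

# The vectorized equivalent of the row-wise sum above
print(df['A'] + df['B'])
```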
Joining DataFrames
# Create two DataFrames
df1 = pd.DataFrame({
'key': ['A', 'B', 'C', 'D'],
'value': [1, 2, 3, 4]
})
df2 = pd.DataFrame({
'key': ['B', 'D', 'E', 'F'],
'value': [5, 6, 7, 8]
})
# Inner join
inner_joined = pd.merge(df1, df2, on='key', how='inner')
print(inner_joined)
# Left join
left_joined = pd.merge(df1, df2, on='key', how='left')
print(left_joined)
# Right join
right_joined = pd.merge(df1, df2, on='key', how='right')
print(right_joined)
# Outer join
outer_joined = pd.merge(df1, df2, on='key', how='outer')
print(outer_joined)
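Because both example DataFrames also have a `value` column, `merge()` renames the overlapping columns with the default suffixes `_x` and `_y`. The `suffixes` parameter gives clearer names, and `indicator=True` adds a `_merge` column recording whether each row came from the left input, the right input or both. A sketch using the same data:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

outer = pd.merge(df1, df2, on='key', how='outer',
                 suffixes=('_left', '_right'), indicator=True)
print(outer)

# Keys present only in df1 (an "anti-join")
only_left = outer[outer['_merge'] == 'left_only']
print(only_left['key'].tolist())  # ['A', 'C']
```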
Window Functions
# Create a DataFrame with time series data
data = {
'Date': pd.date_range(start='2023-01-01', periods=10, freq='D'),
'Value': np.random.randn(10)
}
df = pd.DataFrame(data)
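df.set_index('Date', inplace=True)
# Rolling (moving-window) mean over the current row and the 2 rows before it;
# the first 2 rows are NaN because their window is not yet full
df['Rolling_Mean'] = df['Value'].rolling(window=3).mean()
print(df)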
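Two variations on the rolling window, sketched with hypothetical values: `min_periods` lets a window produce a result before it is full, and `expanding()` grows the window from the first row onward, giving a cumulative statistic.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
                   index=pd.date_range(start='2023-01-01', periods=5, freq='D'))

full_windows = values.rolling(window=3).mean()                    # NaN, NaN, 2.0, 3.0, 4.0
partial_windows = values.rolling(window=3, min_periods=1).mean()  # 1.0, 1.5, 2.0, 3.0, 4.0
cumulative = values.expanding().mean()                            # 1.0, 1.5, 2.0, 2.5, 3.0

print(pd.DataFrame({'full': full_windows,
                    'partial': partial_windows,
                    'cumulative': cumulative}))
```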
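Concatenating DataFrames
# Concatenate DataFrames (stack the rows of df1 and df2)
concatenated = pd.concat([df1, df2], ignore_index=True)
print(concatenated)
# Append DataFrames: DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0,
# so pd.concat() as above is the way to append rows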
Performance Tips
# Use vectorized operations instead of loops
data = pd.DataFrame({
'A': range(1000000),
'B': range(1000000)
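})
# Vectorized: one operation on entire columns, executed in optimized compiled code
data['C'] = data['A'] + data['B']
# Avoid the equivalent row-by-row loop, which is far slower:
# data['C'] = [row['A'] + row['B'] for _, row in data.iterrows()]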
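To see the difference on your own machine, a rough timing sketch with `time.perf_counter()` can compare a Python-level loop with the vectorized expression. It uses a smaller frame (100,000 rows, an arbitrary choice so the loop finishes quickly). Timings vary with hardware and pandas version, so none are quoted here.

```python
import time
import pandas as pd

data = pd.DataFrame({'A': range(100_000), 'B': range(100_000)})

start = time.perf_counter()
loop_result = [a + b for a, b in zip(data['A'], data['B'])]  # pure-Python loop
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vector_result = data['A'] + data['B']                        # vectorized
vector_seconds = time.perf_counter() - start

# Both approaches compute the same values
print(vector_result.tolist() == loop_result)  # True
print(f"loop: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s")
```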