Python_for_DataScience
Introduction – Setting working directory – Creating and saving, File execution, clearing
console, removing variables from environment, clearing environment – variable creation –
Operators – Data types and its associated operations – sequence data types – conditions and
branching – Functions-Virtual Environments
Introduction to Python
Python is a versatile and popular programming language known for its simplicity and
readability. It is widely used in various fields, including web development, data analysis,
artificial intelligence, and scientific computing.
The working directory is the folder where Python scripts are executed and where files are
read from or written to. To set the working directory:
Coding
import os
# To get the current working directory
print(os.getcwd())
# To change the working directory
os.chdir('path_to_directory')
Creating and saving files in Python can be done using the open() function:
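As a minimal sketch of the sentence above, the snippet below creates a text file with open() in write mode and then reads it back. The file name notes.txt and the helper names save_text and load_text are illustrative assumptions, not part of the original notes; a temporary folder is used so nothing is left on disk.

```python
import os
import tempfile

def save_text(path, text):
    """Create (or overwrite) the file at `path` and save `text` in it."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

def load_text(path):
    """Return the full contents of the text file at `path`."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

# Work inside a temporary folder that is removed automatically afterwards
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")  # hypothetical file name
    save_text(path, "Hello, World!")
    content = load_text(path)

print(content)
```

The with statement closes the file automatically, even if an error occurs while writing.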
File Execution
python filename.py
Clearing Console
Clearing the console can be done using a system call:
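import os
os.system('cls' if os.name == 'nt' else 'clear')  # 'cls' on Windows, 'clear' on Linux/macOS
Removing Variables from Environment
# Create a variable
x = 10
del x  # remove the variable from the current namespace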
Clearing Environment
Clearing the entire environment is not a built-in feature of Python, but you can delete all
variables in the global scope:
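One possible way to do this is sketched below. The helper name clear_namespace is an assumption made for illustration; it deletes every name that does not start with an underscore from the dictionary it is given. It is demonstrated on an ordinary dictionary standing in for globals(), so the example does not wipe the session that runs it.

```python
def clear_namespace(namespace):
    """Delete every name that does not start with an underscore."""
    # Copy the keys first: the dictionary changes size while we loop
    for name in list(namespace):
        if not name.startswith("_"):
            del namespace[name]

# A plain dict standing in for globals() (illustrative values)
demo = {"x": 10, "name": "John", "__builtins__": None}
clear_namespace(demo)
print(demo)
```

In an interactive session, calling clear_namespace(globals()) would remove user-defined variables, functions, and imported modules alike, including clear_namespace itself.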
Variable Creation
x = 5 # Integer
y = 3.14 # Float
name = "John" # String
Operators
Lists
my_list = [1, 2, 3, 4, 5]
Tuples
my_tuple = (1, 2, 3, 4, 5)
Strings
Ranges
Conditions and Branching
x = 10
if x > 0:
    print("Positive")
elif x < 0:
    print("Negative")
else:
    print("Zero")
Functions
def greet(name):
    return f"Hello, {name}!"
print(greet("Alice"))
Virtual Environments
Virtual environments allow you to create isolated Python environments for different projects:
# Create a virtual environment
python -m venv myenv
List – Tuples- Set – Dictionary – Its associated functions - File handling - Modes– Reading
and writing files - Introduction to Pandas – Series – Data frame – Indexing and loading –
Data manipulation – Merging – Group by – Scales – Pivot table – Date and time.
Lists
Creating Lists:
my_list = [1, 2, 3, 4, 5]
Tuples
Creating Tuples:
my_tuple = (1, 2, 3, 4, 5)
Sets
Creating Sets:
my_set = {1, 2, 3, 4, 5}
Functions and Methods:
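A short sketch of common set functions and methods follows. The values are illustrative and the variable names are assumptions, not taken from the original notes.

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6}

a.add(6)         # add one element
a.discard(1)     # remove an element (no error if it is absent)

union = a | b    # elements in either set
common = a & b   # elements in both sets
only_a = a - b   # elements in a but not in b

print(len(a), 3 in a)
print(union, common, only_a)
```

Sets store each value once and are unordered, so they suit membership tests and removing duplicates.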
Dictionaries
Creating Dictionaries:
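A minimal sketch of creating a dictionary and using some of its methods is given below. The keys and values are made up for illustration.

```python
# Create a dictionary of key-value pairs (illustrative data)
student = {"name": "John", "age": 25}

student["city"] = "London"              # add a new key
age = student.get("age")                # read a value
missing = student.get("grade", "n/a")   # default when the key is absent
removed = student.pop("city")           # remove a key and return its value

keys = list(student.keys())
pairs = list(student.items())
print(keys, pairs)
```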
File Handling
Modes:
# Writing to a file
with open('example.txt', 'w') as file:
    file.write("Hello, World!")
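To illustrate the modes mentioned above, the sketch below uses three of them: 'w' (create or overwrite), 'a' (append to the end), and 'r' (read). It writes into a temporary folder, which is an assumption made so the example leaves no file behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")

    # 'w': create the file, or erase it if it already exists
    with open(path, "w") as file:
        file.write("Hello, World!\n")

    # 'a': keep the existing content and add to the end
    with open(path, "a") as file:
        file.write("Second line\n")

    # 'r': read only (this is the default mode)
    with open(path, "r") as file:
        lines = file.readlines()

print(lines)
```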
Introduction to Pandas
Series
Creating a Series:
import pandas as pd
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
DataFrame
Creating a DataFrame:
data = {
    'Column1': [1, 2, 3],
    'Column2': [4, 5, 6]
}
df = pd.DataFrame(data)
Indexing and Loading
Indexing:
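A small sketch of the usual ways to index a DataFrame is shown below: by column name, by label with loc, by position with iloc, and with a boolean condition. The row labels a, b, c are an assumption added for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Column1": [1, 2, 3], "Column2": [4, 5, 6]},
    index=["a", "b", "c"],  # illustrative row labels
)

col = df["Column1"]                 # select one column (a Series)
by_label = df.loc["b", "Column2"]   # row label 'b', column 'Column2'
by_pos = df.iloc[0, 1]              # first row, second column
subset = df.loc[df["Column1"] > 1]  # rows where the condition holds

print(by_label, by_pos)
print(subset)
```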
Loading Data:
Basic Operations:
Merging
Combining DataFrames:
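As a sketch of combining two DataFrames, the example below joins them on a shared column with pd.merge. The column names and values are illustrative assumptions.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "name": ["John", "Emma", "Sam"]})
right = pd.DataFrame({"key": [2, 3, 4], "salary": [60000, 55000, 70000]})

# Inner join (default): keep only keys present in both frames
inner = pd.merge(left, right, on="key")

# Outer join: keep every key; missing values become NaN
outer = pd.merge(left, right, on="key", how="outer")

print(inner)
print(outer)
```

The how parameter also accepts 'left' and 'right' to keep all keys from one side only.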
Group By
Grouping data:
grouped = df.groupby('Column1')
summary = grouped['Column2'].sum()
Scales
Example:
Pivot Table
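A pivot table summarises one column by the values of two others. The sketch below is an assumption about what this section intended; the sales data and column names are invented for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Product": ["A", "B", "A", "B"],
    "Amount": [10, 20, 30, 40],
})

# Rows: Region, columns: Product, cells: sum of Amount
pivot = pd.pivot_table(
    sales, values="Amount", index="Region", columns="Product", aggfunc="sum"
)
print(pivot)
```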
Date and Time
# Assumes df has a 'Date' column holding date strings
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
NumPy
NumPy is a powerful numerical computing library in Python, providing support for large
multi-dimensional arrays and matrices along with a large collection of high-level
mathematical functions.
Creating 1D Arrays:
import numpy as np
arr_1d = np.array([1, 2, 3, 4, 5])
Creating 2D Arrays:
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
Associated Operations
Basic Operations:
# Element-wise addition
result = arr_1d + 2
# Element-wise subtraction
result = arr_1d - 2
# Element-wise multiplication
result = arr_1d * 2
# Element-wise division
result = arr_1d / 2
Aggregations:
# Sum of elements
np.sum(arr_1d)
# Mean of elements
np.mean(arr_1d)
# Standard deviation
np.std(arr_1d)
Broadcasting
Example:
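Broadcasting lets NumPy combine arrays of different shapes by stretching the smaller one along the missing or length-1 axis. The arrays below are illustrative values chosen for this sketch.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[100], [200]])      # shape (2, 1)

# The row is added to every row of the matrix
added_row = matrix + row

# The column is added to every column of the matrix
added_col = matrix + col

print(added_row)
print(added_col)
```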
Dot Product:
a = np.array([1, 2])
b = np.array([3, 4])
dot_product = np.dot(a, b) # 11
Matrix Multiplication:
# A and B must be 2D arrays with compatible shapes (columns of A == rows of B)
result = np.matmul(A, B)
Inverse of a Matrix:
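A minimal sketch using np.linalg.inv follows. The matrix A is an illustrative choice with a non-zero determinant; a singular matrix would raise numpy.linalg.LinAlgError.

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])      # determinant = 4*6 - 7*2 = 10

A_inv = np.linalg.inv(A)

# Multiplying a matrix by its inverse gives the identity matrix
# (up to floating-point rounding)
identity = np.matmul(A, A_inv)

print(A_inv)
print(np.round(identity, 10))
```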
Indexing:
arr = np.array([1, 2, 3, 4, 5])
# Accessing elements
element = arr[0] # 1
# Slicing
subarray = arr[1:3] # [2, 3]
Reshaping:
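Reshaping changes the shape of an array without changing its data. The sketch below uses illustrative values; passing -1 asks NumPy to work out that dimension from the array size.

```python
import numpy as np

arr = np.arange(1, 7)        # [1 2 3 4 5 6]

grid = arr.reshape(2, 3)     # 2 rows, 3 columns
tall = arr.reshape(-1, 2)    # as many rows as needed, 2 columns
flat = grid.flatten()        # back to one dimension

print(grid)
print(tall.shape)
print(flat)
```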
Matplotlib
Matplotlib is a plotting library for creating static, interactive, and animated visualizations in
Python.
Scatter Plot
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])
plt.scatter(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')
plt.show()
Line Plot
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot')
plt.show()
Bar Plot
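categories = ['A', 'B', 'C']  # illustrative category names
values = [10, 20, 15]  # illustrative values
plt.bar(categories, values)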
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Plot')
plt.show()
Histogram
Creating a Histogram:
data = np.random.randn(1000)
plt.hist(data, bins=30)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()
Box Plot
df = pd.DataFrame({
'A': np.random.randn(100),
'B': np.random.randn(100),
'C': np.random.randn(100),
'D': np.random.randn(100)
})
df.boxplot()  # one box per column
plt.show()
Pair Plot
# Requires seaborn: import seaborn as sns
sns.pairplot(df)
plt.show()
Regression
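from sklearn.linear_model import LinearRegression
# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 2, 3, 4, 5])
# Create and fit the model
model = LinearRegression()
model.fit(X, y)
# Make predictions
predictions = model.predict(X)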
# Plot results
plt.scatter(X, y, color='blue')
plt.plot(X, predictions, color='red')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression')
plt.show()
Classification
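from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
# Load data
iris = load_iris()
X = iris.data
y = iris.target
# Create and fit the model (max_iter raised so the solver converges)
model = LogisticRegression(max_iter=200)
model.fit(X, y)
# Make predictions
predictions = model.predict(X)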
# Plot results
plt.scatter(X[:, 0], X[:, 1], c=predictions, cmap='viridis')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Logistic Regression Classification')
plt.show()
print(i)
Print n odd numbers
for i in range(1, 10, 2):
    print(i)
Output: 1 3 5 7 9
for i in range(1, 5, 1):
    print(i*i)
Output: 1 4 9 16
for i in range(1, 5, 1):
    print(i*i*i)
Output: 1 8 27 64
ALGORITHM
Step1: Start
Step3: Print the basic characteristics and operations of array
Step4: Stop
PROGRAM
import numpy as np
# Creating array object (the first row is missing from the source; [1, 2, 3] is an assumed placeholder)
arr = np.array([[1, 2, 3],
                [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
OUTPUT
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
[[1 2 3]
[3 4 5]
[4 5 6]]
[4 5 6]]
ALGORITHM
Step1: Start
Step5: Stop
PROGRAM
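import numpy as np
import pandas as pd
data = np.array([['', 'Col1', 'Col2'], ['Row1', 1, 2], ['Row2', 3, 4]])
print(pd.DataFrame(data=data[1:, 1:], index=data[1:, 0], columns=data[0, 1:]))
# Take a 2D array as input to your DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))
# Take a dictionary as input to your DataFrame
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(pd.DataFrame(my_dict))
# Take a Series as input to your DataFrame (values reconstructed from the printed output)
my_series = pd.Series({"United Kingdom": "London", "India": "New Delhi", "United States": "Washington", "Belgium": "Brussels"})
print(pd.DataFrame(my_series))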
Output:
     Col1 Col2
Row1    1    2
Row2    3    4
   0  1  2
0  1  2  3
1  4  5  6
   1  2  3
0  1  1  2
1  3  2  4
   A
0  4
1  5
2  6
3  7
                         0
United Kingdom      London
India            New Delhi
United States   Washington
Belgium           Brussels
(2, 3)
2
BASIC PLOTS USING MATPLOTLIB
ALGORITHM
Step1: Start
Step3: Create basic plots using Matplotlib
Step4: Print the output
Step5: Stop
Program:3a
# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')
Output:
Program:3b
plt.plot(a)
# get command over the individual
# boundary line of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
plt.xticks(list(range(-3, 10)))
# set the intervals by which y-axis
# set the marks
plt.yticks(list(range(-3, 20, 3)))
Program:
a = [1, 2, 3, 4, 5]
# use fig whenever you want the output in a new window;
# also specify the window size you want the answer to be displayed in
sub1 = plt.subplot(2, 2, 1)
sub2 = plt.subplot(2, 2, 2)
sub3 = plt.subplot(2, 2, 3)
# sets how the subplot's x-axis values advance by 1 within the specified range
sub1.set_xticks(list(range(0, 10, 1)))
sub1.set_title('1st Rep')
sub2.plot(b, 'or')
# sets how the subplot's x-axis values advance by 2 within the
sub4.plot(c, 'Dm')
# similarly we can set the ticks for the y-axis:
# range(start(inclusive), end(exclusive), step)
plt.show()
Output:
Normal Curve
ALGORITHM
Program:
sb.set_style('whitegrid')
plt.ylabel('Probability Density')
Output:
CORRELATION AND SCATTER PLOTS
ALGORITHM
Step 4: Plot the scatter plot
Step 5: Print the result
Step 6: Stop the process
Program:
y2 = -5 * x
y3 = np.random.randn(100)
# Plot
plt.show()
Output
SIMPLE LINEAR REGRESSION
ALGORITHM
Step 2: Import numpy and matplotlib package
Step 3: Define coefficient function
PROGRAM:
import numpy as np
m_y = np.mean(y)
plt.ylabel('y')
def main():
# observations / data
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print("Estimated coefficients:\nb_0 = {} \
Output :
Estimated coefficients:
b_0 = -0.0586206896552
b_1 = 1.45747126437
Graph:
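Since only fragments of the program survive, the following is a hedged sketch of a coefficient function for simple linear regression using the standard least-squares formulas. The function name estimate_coef and the noise-free sample data are assumptions made for illustration; they are not the data that produced the output listed above.

```python
import numpy as np

def estimate_coef(x, y):
    """Return (b_0, b_1) for the least-squares line y = b_0 + b_1 * x."""
    m_x = np.mean(x)
    m_y = np.mean(y)
    # Sum of cross-deviations and sum of squared deviations about the means
    ss_xy = np.sum((x - m_x) * (y - m_y))
    ss_xx = np.sum((x - m_x) ** 2)
    b_1 = ss_xy / ss_xx        # slope
    b_0 = m_y - b_1 * m_x      # intercept
    return b_0, b_1

# Hypothetical observations lying exactly on y = 1 + 2x
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = 2 * x + 1

b_0, b_1 = estimate_coef(x, y)
print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b_0, b_1))
```

With real, noisy observations the fitted line will not pass through every point; the regression line can then be drawn with plt.plot(x, b_0 + b_1 * x).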
MATPLOTLIB
Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and
finally to position (8, 10):
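import matplotlib.pyplot as plt
xpoints, ypoints = [1, 2, 6, 8], [3, 8, 1, 10]  # x and y coordinates of the four points
plt.plot(xpoints, ypoints)
plt.show()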
To draw a straight line from (1, 3) to (8, 10), pass two arrays, [1, 8] and
[3, 10], to the plot function.
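The two-point line described above can be sketched as follows. The Agg backend line is an assumption added so the example runs without a display; leave it out in an interactive session and call plt.show() to see the figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])    # x-coordinates of the two points
ypoints = np.array([3, 10])   # y-coordinates of the two points

plt.figure()
(line,) = plt.plot(xpoints, ypoints)   # plot() returns a list of Line2D objects

print(line.get_xdata(), line.get_ydata())
```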
Markers
To draw the line from (1, 3) to (8, 10) with a marker at each point, pass the
same two arrays, [1, 8] and [3, 10], to the plot function together with the
marker keyword argument.
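A minimal sketch of a line with markers follows. The marker style 'o' (a circle) is an illustrative choice, and the Agg backend line is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 8])
ypoints = np.array([3, 10])

plt.figure()
# marker='o' draws a circle at each data point on the line
(marked_line,) = plt.plot(xpoints, ypoints, marker="o")

print(marked_line.get_marker())
```

Other marker styles include '*' (star), 's' (square), and 'D' (diamond).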
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.show()
Set Font Properties for Title and Labels
You can use the fontdict parameter in xlabel(), ylabel(), and title() to set font
properties for the title and labels.
Example
import numpy as np
import matplotlib.pyplot as plt
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
font1 = {'family':'serif','color':'blue','size':20}
font2 = {'family':'serif','color':'darkred','size':15}
plt.title("Sports Watch Data", fontdict = font1)
plt.xlabel("Average Pulse", fontdict = font2)
plt.ylabel("Calorie Burnage", fontdict = font2)
plt.plot(x, y)
plt.show()
Matplotlib Scatter
With Pyplot, you can use the scatter() function to draw a scatter plot.
The scatter() function plots one dot for each observation. It needs two arrays
of the same length, one for the values of the x-axis, and one for values on
the y-axis:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)
plt.show()
ColorMap
Creating Bars
With Pyplot, you can use the bar() function to draw bar graphs:
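A short sketch of bar() follows. The category names and values are invented for illustration, and the Agg backend line is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

categories = np.array(["A", "B", "C", "D"])   # illustrative labels
values = np.array([3, 8, 1, 10])              # illustrative bar heights

plt.figure()
bars = plt.bar(categories, values)   # one rectangle per category

heights = [bar.get_height() for bar in bars]
print(heights)
```

For horizontal bars, plt.barh(categories, values) can be used instead.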
Histogram
A histogram is a graph showing frequency distributions.
The hist() function uses an array of numbers to create a histogram; the
array is passed to the function as an argument.
import matplotlib.pyplot as plt
import numpy as np
x = np.random.normal(170, 10, 250)
plt.hist(x)
plt.show()
Explode
The explode parameter, if specified, and not None, must be an array with one
value for each wedge.
Each value represents how far from the center each wedge is displayed:
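As a sketch of the explode parameter, the pie chart below pulls the first wedge out by 0.2 of the radius and leaves the others in place. The sizes and labels are invented for illustration, and the Agg backend line is an assumption so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt

sizes = [35, 25, 25, 15]                             # illustrative wedge sizes
labels = ["Apples", "Bananas", "Cherries", "Dates"]  # illustrative labels
explode = [0.2, 0, 0, 0]                             # one value per wedge

plt.figure()
plt.pie(sizes, labels=labels, explode=explode)

# The wedges are stored as patches on the current axes
wedges = plt.gca().patches
print(len(wedges), wedges[0].center)
```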
Legend
To add a list of explanations for the wedges, use the legend() function:
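A minimal sketch of a pie chart with a legend follows. The data and the legend title are illustrative assumptions, and the Agg backend line is there only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt

sizes = [35, 25, 25, 15]                             # illustrative wedge sizes
labels = ["Apples", "Bananas", "Cherries", "Dates"]  # illustrative labels

plt.figure()
plt.pie(sizes, labels=labels)

# legend() collects the wedge labels; title adds a header to the legend box
legend = plt.legend(title="Fruits")
entries = [text.get_text() for text in legend.get_texts()]
print(entries)
```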
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['John', 'Emma', 'Sam', 'Lisa', 'Tom'],
    'Age': [25, 30, 28, 32, 27],
    'Country': ['USA', 'Canada', 'Australia', 'UK', 'Germany'],
    'Salary': [50000, 60000, 55000, 70000, 52000]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Selecting columns
name_age = df[['Name', 'Age']]
print("\nName and Age columns:")
print(name_age)
# Filtering rows
filtered_df = df[df['Country'] == 'USA']
print("\nFiltered DataFrame (Country = 'USA'):")
print(filtered_df)
# Sorting by a column
sorted_df = df.sort_values('Salary', ascending=False)
print("\nSorted DataFrame (by Salary in descending order):")
print(sorted_df)
# Aggregating data
average_salary = df['Salary'].mean()
print("\nAverage Salary:", average_salary)
# Adding a new column
df['Experience'] = [3, 6, 4, 8, 5]
print("\nDataFrame with added Experience column:")
print(df)
# Updating values
df.loc[df['Name'] == 'Emma', 'Salary'] = 65000
print("\nDataFrame after updating Emma's Salary:")
print(df)
# Deleting a column
df = df.drop('Experience', axis=1)
print("\nDataFrame after deleting Experience column:")
print(df)