Good Python Practices

Efficient Python Tricks and Tools for Data Scientists - By Khuyen Tran
This section covers some best practices for writing Python code.

Write Meaningful Names
It is bad practice to use vague names such as x, y, and z in your Python
code because they tell the reader nothing about the roles these variables
play.
x = 10
y = 5
z = x + y
Use descriptive variable names instead. You can also add type hints to
make the types of these variables explicit.
num_members: int = 10
num_guests: int = 5
sum_: int = num_members + num_guests
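Descriptive names and type hints work the same way in function signatures. The sketch below uses a hypothetical function, count_attendees, that is not part of the original text. Note that Python does not enforce type hints when the code runs; they are stored as annotations and read by tools such as type checkers.

```python
def count_attendees(num_members: int, num_guests: int) -> int:
    """Return the total number of people attending (hypothetical example)."""
    return num_members + num_guests


total_attendees = count_attendees(num_members=10, num_guests=5)
print(total_attendees)  # 15

# The hints are kept as annotations but are not checked at runtime
print(count_attendees.__annotations__)
```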
Assign Names to Values
A bare number in your code can leave others guessing about what it
represents.
circle_area = 3.14 * 5**2
It is better to assign such values to descriptive names so that others can
read the code.
PI = 3.14
RADIUS = 5
circle_area = PI * RADIUS**2
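For this particular constant you do not need to define your own name: the standard library's math module already provides math.pi, which is more precise than the hand-typed 3.14. A minimal sketch:

```python
import math

RADIUS = 5
# math.pi is a named, full-precision constant from the standard library
circle_area = math.pi * RADIUS**2
print(round(circle_area, 2))  # 78.54
```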
Avoid Duplication in Your Code
While writing code, we should avoid duplication because it is redundant
and error-prone: if we change one copy of the code, we need to remember
to make the same change to every other copy. Otherwise, we introduce
bugs into our code.
In the code below, the filter X['date'] > date(2021, 2, 8) appears
twice. To avoid this duplication, assign the filter to a variable, then
reuse that variable to filter both X and y.
import pandas as pd
from datetime import date
df = pd.DataFrame({'date': [date(2021, 2, 8), date(2021, 2, 9), date(2021, 2, 10)],
                   'val1': [1, 2, 3], 'val2': [0, 1, 0]})
X, y = df.iloc[:, :1], df.iloc[:, 2]
# Instead of this
subset_X = X[X['date'] > date(2021, 2, 8)]
subset_y = y[X['date'] > date(2021, 2, 8)]
# Do this
filt = df['date'] > date(2021, 2, 8)
subset_X = X[filt]
subset_y = y[filt]
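If the same filter is needed with several different cutoff dates, you can go one step further and keep the filter logic in a small function. This is a sketch; the helper name after_date is hypothetical and not from the original text.

```python
from datetime import date

import pandas as pd

df = pd.DataFrame({'date': [date(2021, 2, 8), date(2021, 2, 9), date(2021, 2, 10)],
                   'val1': [1, 2, 3], 'val2': [0, 1, 0]})
X, y = df.iloc[:, :1], df.iloc[:, 2]


def after_date(dates: pd.Series, cutoff: date) -> pd.Series:
    """Return a boolean mask that is True where the date is after cutoff."""
    return dates > cutoff


# The comparison is written once; every caller reuses it
filt = after_date(df['date'], date(2021, 2, 8))
subset_X = X[filt]
subset_y = y[filt]
print(subset_y.tolist())  # [1, 0]
```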
Underscore (_): Ignore Values That Will Not Be Used
When unpacking the values returned by a function, you may not need all
of them later in your code. Assign the values you want to ignore to an
underscore (_).
def return_two():
    return 1, 2
_, var = return_two()
var
2
Underscore (_): Ignore the Index in Python For Loops
If you want to repeat a loop a specific number of times but don’t care
about the index, you can use _.
for _ in range(5):
    print('Hello')
Hello
Hello
Hello
Hello
Hello
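The underscore works the same way inside a list comprehension. A small sketch: building several independent empty lists, where the loop variable is never used. (Writing [[]] * 3 instead would create three references to one shared list.)

```python
# Build three independent empty lists; the loop variable is never used
buckets = [[] for _ in range(3)]
buckets[0].append('a')
print(buckets)  # [['a'], [], []]
```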
Python Pass Statement
If you know what a piece of code should do but don’t yet know how to
write it, put it in a function whose body is just pass. Once you have
finished writing the code at a high level, go back to those functions and
replace each pass with the real implementation. This keeps your train of
thought from being interrupted.
def say_hello():
    pass
def ask_to_sign_in():
    pass
def main(is_user: bool):
    if is_user:
        say_hello()
    else:
        ask_to_sign_in()
main(is_user=True)
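One variation, not from the original text: a function whose body is only pass silently returns None, so an unfinished function can go unnoticed. Raising NotImplementedError in the placeholder makes any call to it fail loudly until you write the real code. A minimal sketch:

```python
def say_hello():
    # Placeholder that fails loudly if it is called before being written
    raise NotImplementedError("say_hello is not written yet")


def ask_to_sign_in():
    pass  # placeholder that silently returns None


print(ask_to_sign_in())  # None

try:
    say_hello()
except NotImplementedError as err:
    message = str(err)
print(message)  # say_hello is not written yet
```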
Stop Using the = Operator to Create a Copy of a Python List. Use the copy Method Instead
Assigning a list with the = operator does not create a copy. Both names
point to the same list object, so a change made through the new name
also shows up in the old one.
l1 = [1, 2, 3]
l2 = l1
l2.append(4)
l2
[1, 2, 3, 4]
l1
[1, 2, 3, 4]
Instead of using the = operator, use the copy() method. Now the original
list does not change when you modify the new one.
l1 = [1, 2, 3]
l2 = l1.copy()
l2.append(4)
l2
[1, 2, 3, 4]
l1
[1, 2, 3]
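copy() is not the only way to get a new list. Slicing the whole list with [:] and passing it to the list() constructor also create shallow copies, as this sketch shows:

```python
l1 = [1, 2, 3]
l2 = l1[:]     # slicing the whole list makes a shallow copy
l3 = list(l1)  # so does the list() constructor
l2.append(4)
l3.append(5)
print(l1)  # [1, 2, 3]
print(l2 is l1, l3 is l1)  # False False
```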
deepcopy: Copy a Nested Object
If you want to copy a nested object, use deepcopy. copy creates a
shallow copy: a new outer object whose nested children are still shared
with the original. deepcopy creates a deep copy: it recursively copies the
nested children as well. This means that if you change a nested child of a
shallow copy, the original object changes too, whereas changing a nested
child of a deep copy leaves the original untouched.
from copy import deepcopy
l1 = [1, 2, [3, 4]]
l2 = l1.copy()  # Create a shallow copy
l2[0] = 6
l2[2].append(5)
l2
[6, 2, [3, 4, 5]]
# l1[0] is still 1, but the shared nested list [3, 4] became [3, 4, 5]
l1
[1, 2, [3, 4, 5]]
l1 = [1, 2, [3, 4]]
l3 = deepcopy(l1)  # Create a deep copy
l3[2].append(5)
l3
[1, 2, [3, 4, 5]]
# l1 stays the same
l1
[1, 2, [3, 4]]
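The same distinction applies to other containers such as dictionaries. The sketch below uses copy.copy, the generic shallow-copy function from the standard copy module, alongside deepcopy on a dictionary of lists (the data is made up for illustration):

```python
from copy import copy, deepcopy

scores = {'alice': [90, 85], 'bob': [70]}

shallow = copy(scores)   # new dict, but the inner lists are shared
deep = deepcopy(scores)  # new dict and new inner lists

shallow['alice'].append(100)
print(scores['alice'])  # [90, 85, 100] (changed through the shallow copy)

deep['bob'].append(60)
print(scores['bob'])    # [70] (unchanged by the deep copy)
```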

Avoid Side Effects When Using a List in a Function
When you pass a Python list to a function, the function can
inadvertently modify it.
For example, in the code below, calling the append method inside the
function also changes the original list a.
def append_four(nums: list):
    nums.append(4)
    return nums
a = [1, 2, 3]
b = append_four(a)
a
[1, 2, 3, 4]
To avoid this side effect, work on a copy inside the function: use copy
for a flat list and deepcopy for a nested list.
def append_four(nums: list):
    nums1 = nums.copy()  # work on a copy so the caller's list is untouched
    nums1.append(4)
    return nums1
a = [1, 2, 3]
b = append_four(a)
a
[1, 2, 3]
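For a nested list, copy() is not enough, because the inner lists would still be shared with the caller. A sketch using deepcopy; the function append_zero_to_each is hypothetical:

```python
from copy import deepcopy


def append_zero_to_each(rows: list) -> list:
    # deepcopy: rows.copy() would still share the inner lists with the caller
    rows_copy = deepcopy(rows)
    for row in rows_copy:
        row.append(0)
    return rows_copy


a = [[1, 2], [3, 4]]
b = append_zero_to_each(a)
print(a)  # [[1, 2], [3, 4]]
print(b)  # [[1, 2, 0], [3, 4, 0]]
```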
Enumerate: Get Counter and Value While Looping
Are you using for i in range(len(arr)) to access both the index
and the value of each element? If so, use enumerate instead. It produces
the same result and is much cleaner.
arr = ['a', 'b', 'c', 'd', 'e']
# Instead of this
for i in range(len(arr)):
    print(i, arr[i])
0 a
1 b
2 c
3 d
4 e
# Use this
for i, val in enumerate(arr):
    print(i, val)
0 a
1 b
2 c
3 d
4 e
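enumerate also accepts a start argument, which is handy when you want the counter to begin at 1 (for example, when numbering items for display) instead of 0:

```python
arr = ['a', 'b', 'c']
# start=1 makes the counter begin at 1 instead of 0
for i, val in enumerate(arr, start=1):
    print(i, val)

pairs = list(enumerate(arr, start=1))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```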
Don't Use Multiple OR Operators. Use in Instead
Chaining multiple or operators is verbose. You can shorten your
conditional statement by using the in operator instead.
a = 1
if a == 1 or a == 2 or a == 3:
    print("Found one!")
Found one!
if a in [1, 2, 3]:
    print("Found one!")
Found one!
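in works with any container, not only lists. When the collection of accepted values is large, a set is a common choice, since membership checks on a set take roughly constant time on average; for three values a list or tuple works just as well. A sketch with a hypothetical constant VALID_CODES:

```python
VALID_CODES = {1, 2, 3}  # hypothetical set of accepted values

a = 1
found = a in VALID_CODES
if found:
    print("Found one!")
```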
A Function Should Only Do One Task
A function should do only one task. The function process_data below
does several: it copies the input, adds a new feature, adds one to a
column, and takes the sum of all columns. Comments help explain each
block of code, but it takes a lot of work to keep them up to date. It is
also difficult to test each block on its own when all of them live inside
one function.
import pandas as pd
data = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
def process_data(df: pd.DataFrame):
    # Create a copy
    data = df.copy()
    # Add new features
    data["c"] = [1, 1, 1]
    # Add 1
    data["a"] = data["a"] + 1
    # Sum all columns
    data["sum"] = data.sum(axis=1)
    return data
process_data(data)
a b c sum
0 2 4 1 7
1 3 5 1 9
2 4 6 1 11
A better practice is to split process_data into smaller functions that
each do only one thing. In the code below, I split process_data into 4
functions and apply them to a pandas DataFrame in order using pipe.
def create_a_copy(df: pd.DataFrame):
    return df.copy()
def add_new_features(df: pd.DataFrame):
    df["c"] = [1, 1, 1]
    return df
def add_one(df: pd.DataFrame):
    df["a"] = df["a"] + 1
    return df
def sum_all_columns(df: pd.DataFrame):
    df["sum"] = df.sum(axis=1)
    return df
(data
.pipe(create_a_copy)
.pipe(add_new_features)
.pipe(add_one)
.pipe(sum_all_columns)
)
a b c sum
0 2 4 1 7
1 3 5 1 9
2 4 6 1 11
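Because each step is now its own function, it can be tested in isolation. A minimal sketch of a test for add_one, written as a plain assert-based function of the kind pytest can collect (the test name is hypothetical). The function is repeated so the snippet runs on its own:

```python
import pandas as pd


def add_one(df: pd.DataFrame):
    df["a"] = df["a"] + 1
    return df


def test_add_one():
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    result = add_one(df)
    # Only column "a" should change
    assert result["a"].tolist() == [2, 3, 4]
    assert result["b"].tolist() == [4, 5, 6]


test_add_one()
```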
Avoid Using Flags as a Function's Parameters
A function should do only one thing. If a flag is used as a function's
parameter to choose between different code paths, the function is doing
more than one thing.
def get_data(is_csv: bool, name: str):
    if is_csv:
        df = pd.read_csv(name + '.csv')
    else:
        df = pd.read_pickle(name + '.pkl')
    return df
When you find yourself using flags as a way to run different code,
consider splitting your function into different functions.
def get_csv_data(name: str):
    return pd.read_csv(name + '.csv')
def get_pickle_data(name: str):
    return pd.read_pickle(name + '.pkl')
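A usage sketch: each call site now says which format it reads, with no boolean to look up. The example writes a small DataFrame to a temporary directory so it runs on its own; the file name example is hypothetical.

```python
import os
import tempfile

import pandas as pd


def get_csv_data(name: str):
    return pd.read_csv(name + '.csv')


def get_pickle_data(name: str):
    return pd.read_pickle(name + '.pkl')


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
with tempfile.TemporaryDirectory() as tmp_dir:
    name = os.path.join(tmp_dir, "example")  # hypothetical file name
    # Write the same data in both formats
    df.to_csv(name + '.csv', index=False)
    df.to_pickle(name + '.pkl')

    # The function name states the format; no flag is needed
    from_csv = get_csv_data(name)
    from_pickle = get_pickle_data(name)

print(from_csv["a"].tolist(), from_pickle["a"].tolist())  # [1, 2] [1, 2]
```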
