Good Python Practices

The document provides best practices for writing clean and readable Python code. It recommends assigning meaningful names to variables, avoiding duplication, using underscores to ignore unused values, splitting functions that do multiple tasks into smaller single-purpose functions, and other tips for writing efficient and well-structured Python code.

Uploaded by

rhenancfdn
Efficient Python Tricks and Tools for Data Scientists - By Khuyen Tran

Good Python Practices



This section covers some best practices for writing Python code.


Write Meaningful Names
It is a bad practice to use vague names such as x, y, z in your Python
code since they don't give you any information about their roles in the
code.

x = 10
y = 5
z = x + y

Write descriptive variable names instead. You can also add type hints to
make the types of these variables more obvious.

num_members: int = 10
num_guests: int = 5
sum_: int = num_members + num_guests
Assign Names to Values
It can be confusing for others to understand the roles of some values in
your code.

circle_area = 3.14 * 5**2

Thus, it is good practice to assign names to such values to make your
code readable to others.

PI = 3.14
RADIUS = 5

circle_area = PI * RADIUS**2
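
As a possible variation on the example above, the constant can come from the standard library rather than a hand-typed approximation. This is only a sketch, assuming the extra precision of math.pi is acceptable for your use case; RADIUS keeps the same illustrative value.

```python
import math

RADIUS = 5

# math.pi is the standard library's double-precision value of pi
circle_area = math.pi * RADIUS**2
print(round(circle_area, 2))  # 78.54
```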
Avoid Duplication in Your Code
While writing code, we should avoid duplication because:

- It is redundant.
- If we make a change to one piece of code, we need to remember to
  make the same change to another piece of code. Otherwise, we will
  introduce bugs into our code.

In the code below, the filter X['date'] > date(2021, 2, 8) is written
twice. To avoid the duplication, we can assign the filter to a variable,
then use that variable to filter both X and y.

import pandas as pd
from datetime import date

df = pd.DataFrame({'date': [date(2021, 2, 8), date(2021, 2, 9), date(2021, 2, 10)],
                   'val1': [1, 2, 3], 'val2': [0, 1, 0]})
X, y = df.iloc[:, :1], df.iloc[:, 2]

# Instead of this
subset_X = X[X['date'] > date(2021, 2, 8)]
subset_y = y[X['date'] > date(2021, 2, 8)]

# Do this
filt = X['date'] > date(2021, 2, 8)
subset_X = X[filt]
subset_y = y[filt]
Underscore(_): Ignore Values That Will Not Be Used
When assigning the values returned from a function, you might want to
ignore some values that are not used in future code. If so, assign those
values to underscores _.

def return_two():
    return 1, 2

_, var = return_two()
var

2
Underscore “_”: Ignore The Index in Python For Loops
If you want to repeat a loop a specific number of times but don’t care
about the index, you can use _.

for _ in range(5):
    print('Hello')

Hello
Hello
Hello
Hello
Hello
Python Pass Statement
If you know what a piece of code should do but don’t know how to write
it yet, declare a function for it and use pass as its body.

Once you have finished writing the code at a high level, go back to the
functions and replace pass with the actual code for each function. This
keeps your train of thought from being interrupted.

def say_hello():
    pass

def ask_to_sign_in():
    pass

def main(is_user: bool):
    if is_user:
        say_hello()
    else:
        ask_to_sign_in()

main(is_user=True)
Stop using the = operator to create a copy of a Python list. Use the copy method instead
When you try to create a copy of a Python list using the = operator, a
change to the new list also changes the old list. This is because =
does not copy anything: both names point to the same list object.

l1 = [1, 2, 3]
l2 = l1
l2.append(4)

l2

[1, 2, 3, 4]

l1

[1, 2, 3, 4]
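
One way to confirm that both names refer to a single object is to compare them with is, or to compare their id() values. The snippet below is a minimal sketch that reuses the names l1 and l2 from the example above.

```python
l1 = [1, 2, 3]
l2 = l1  # plain assignment binds a second name to the same list

print(l2 is l1)          # True: one object, two names
print(id(l1) == id(l2))  # True: identical object identity
```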
Instead of using the = operator, use the copy() method. Now your old
list will not change when you change your new list.

l1 = [1, 2, 3]
l2 = l1.copy()
l2.append(4)

l2

[1, 2, 3, 4]

l1

[1, 2, 3]
deepcopy: Copy a Nested Object
If you want to create a copy of a nested object, use deepcopy. While
copy creates a shallow copy of the original object, deepcopy creates a
deep copy of the original object. This means that if you change the
nested children of a shallow copy, the original object will also change.
However, if you change the nested children of a deep copy, the original
object will not change.

from copy import deepcopy

l1 = [1, 2, [3, 4]]
l2 = l1.copy() # Create a shallow copy

l2[0] = 6
l2[2].append(5)
l2

[6, 2, [3, 4, 5]]

# [3, 4] becomes [3, 4, 5]
l1

[1, 2, [3, 4, 5]]

l1 = [1, 2, [3, 4]]
l3 = deepcopy(l1) # Create a deep copy

l3[2].append(5)
l3

[1, 2, [3, 4, 5]]

# l1 stays the same
l1

[1, 2, [3, 4]]


Avoid Side Effects When Using a List in a Function
When using a Python list as an argument in a function, you might
inadvertently change its value.

For example, in the code below, using the append method ends up
changing the values of the original list.

def append_four(nums: list):
    nums.append(4)
    return nums

a = [1, 2, 3]
b = append_four(a)
a

[1, 2, 3, 4]
If you want to avoid this side effect, use copy with a list or deepcopy
with a nested list in a function.

def append_four(nums: list):
    nums1 = nums.copy()
    nums1.append(4)
    return nums1

a = [1, 2, 3]
b = append_four(a)
a

[1, 2, 3]
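
The example above covers the flat-list case. For the nested-list case mentioned earlier, a sketch along the following lines may help. The helper append_five_to_last is a hypothetical name introduced only for illustration; it uses deepcopy so that the inner list of the caller's argument is left untouched.

```python
from copy import deepcopy

def append_five_to_last(nums: list) -> list:
    """Hypothetical helper: append 5 to the nested list stored last in nums."""
    nums_copy = deepcopy(nums)  # copies the outer list and every nested list
    nums_copy[-1].append(5)
    return nums_copy

a = [1, 2, [3, 4]]
b = append_five_to_last(a)

print(a)  # [1, 2, [3, 4]]
print(b)  # [1, 2, [3, 4, 5]]
```

With nums.copy() in place of deepcopy(nums), the inner list would be shared between a and b, and a would change as well.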
Enumerate: Get Counter and Value While Looping
Are you using for i in range(len(array)) to access both the index
and the value of the array? If so, use enumerate instead. It produces the
same result but it is much cleaner.

arr = ['a', 'b', 'c', 'd', 'e']

# Instead of this
for i in range(len(arr)):
    print(i, arr[i])

0 a
1 b
2 c
3 d
4 e
# Use this
for i, val in enumerate(arr):
    print(i, val)

0 a
1 b
2 c
3 d
4 e
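
If the counter should begin at a value other than 0, enumerate also accepts a start argument. The short sketch below uses a smaller illustrative list and the names numbered and position, which are not from the original text.

```python
arr = ['a', 'b', 'c']

# start=1 makes the counter begin at 1; the values are unchanged
numbered = list(enumerate(arr, start=1))
for position, val in numbered:
    print(position, val)
```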
Don't Use Multiple OR Operators. Use in Instead
Chaining multiple or operators makes a condition long and repetitive.
You can shorten your conditional statement by using in instead.

a = 1

if a == 1 or a == 2 or a == 3:
    print("Found one!")

Found one!

if a in [1, 2, 3]:
    print("Found one!")

Found one!
A Function Should Only Do One Task
A function should do only one task, not multiple tasks. The function
process_data below does several things: it copies the data, adds a new
feature, adds one to a column, and takes the sum of all columns. Using
comments helps explain each block of code, but it takes a lot of work to
keep the comments up-to-date. It is also difficult to test each unit of
code inside such a function.

import pandas as pd

data = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

def process_data(df: pd.DataFrame):
    # Create a copy
    data = df.copy()

    # Add new features
    data["c"] = [1, 1, 1]

    # Add 1
    data["a"] = data["a"] + 1

    # Sum all columns
    data["sum"] = data.sum(axis=1)
    return data

process_data(data)

   a  b  c  sum
0  2  4  1    7
1  3  5  1    9
2  4  6  1   11

A better practice is to split the function process_data into smaller
functions that each do only one thing. In the code below, I split the
function process_data into 4 different functions and apply these
functions to a pandas DataFrame in order using pipe.

def create_a_copy(df: pd.DataFrame):
    return df.copy()

def add_new_features(df: pd.DataFrame):
    df["c"] = [1, 1, 1]
    return df

def add_one(df: pd.DataFrame):
    df["a"] = df["a"] + 1
    return df

def sum_all_columns(df: pd.DataFrame):
    df["sum"] = df.sum(axis=1)
    return df

(data
    .pipe(create_a_copy)
    .pipe(add_new_features)
    .pipe(add_one)
    .pipe(sum_all_columns)
)

   a  b  c  sum
0  2  4  1    7
1  3  5  1    9
2  4  6  1   11
Avoid Using Flags as a Function's Parameters
A function should only do one thing. If flags are used as a function's
parameters, the function is doing more than one thing.

def get_data(is_csv: bool, name: str):
    if is_csv:
        df = pd.read_csv(name + '.csv')
    else:
        df = pd.read_pickle(name + '.pkl')
    return df

When you find yourself using flags as a way to run different code,
consider splitting your function into different functions.

def get_csv_data(name: str):
    return pd.read_csv(name + '.csv')

def get_pickle_data(name: str):
    return pd.read_pickle(name + '.pkl')
