Python Deep Dive 2

Copyright © MathByteAcademy
What this course is about

the Python language
    → canonical CPython 3.6+ implementation
and the standard library

this is NOT an introductory course

coding videos
Jupyter notebooks
projects and solutions
github repository for all code
https://github.com/fbaptiste/python-deepdive
Sequence Types

what are sequences?
slicing → ranges
shallow vs deep copy
list comprehensions → closures
sorting → sort key functions
Iterables and Iterators

more general than sequence types
differences between iterables and iterators
lazy vs eager iterables
the iterable protocol
the iterator protocol
writing our own custom iterables and iterators
Generators

what are generators?
generator functions
generator expressions
the yield statement
the yield from statement
how generators are related to iterators
Iteration Tools

Many useful tools for a functional approach to iteration
    → built-in
    → itertools module
    → functools module

Aggregators
Slicing iterables
Selection and filtering
Infinite iterators
Mapping and reducing
Grouping
Combinatorics
Context Managers

what are context managers?
the context manager protocol
why are they so useful?
creating custom context managers using the context manager protocol
creating custom context managers using generator functions
Projects

project after each section
should attempt these yourself first – practice makes perfect!
solution videos and notebooks provided
    → my approach
    → more than one approach possible
Extras

will keep growing over time
important new features of Python 3.6 and later
best practices
random collection of interesting stuff
additional resources
send me your suggestions!
Python 3: Deep Dive (Part 2) - Prerequisites

This course assumes that you have in-depth knowledge of the following:

functions and function arguments     def my_func(p1, p2, *args, k1=None, **kwargs)
packing and unpacking iterables      my_func(*my_list)
                                     f, *_, l = (1, 2, 3, 4, 5)
closures
decorators                           @my_decorator    @my_decorator(p1, p2)
Boolean truth values                 bool(obj)
named tuples                         namedtuple('Data', 'field_1 field_2')
== vs is                             id(obj)
Python 3: Deep Dive (Part 2) - Prerequisites

This course assumes that you have in-depth knowledge of the following:

zip        zip(list1, list2, list3)
map        map(lambda x: x**2, my_list)
reduce     reduce(lambda x, y: x * y, my_list, 10)
filter     filter(lambda p: p.age > 18, persons)
sorted     sorted(persons, key=lambda p: p.name.lower())
imports    import math
           from math import sqrt, sin
You should have a basic understanding of creating and using classes in Python

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, age):
        if age <= 0:
            raise ValueError('Age must be greater than 0')
        else:
            self._age = age
Python 3: Deep Dive (Part 2) - Prerequisites

You should understand how special functionality is implemented in Python using special methods

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'

    def __gt__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        else:
            return self.x ** 2 + self.y ** 2 > other.x ** 2 + other.y ** 2
for loops, while loops     break    continue    else
branching                  if … elif … else …
exception handling         try:
                               my_func()
                           finally:
                               cleanup()
This course is all about the Python language and the standard library

https://github.com/fbaptiste/python-deepdive

To follow along you will therefore need:

CPython 3.6 or higher
Jupyter Notebook
Your favorite Python editor: VSCode, PyCharm, command line + VIM/Nano/…
indexing starting at 0
slices include lower bound index, but exclude upper bound index
slicing
slice objects
modifying mutable sequences
copying sequences – shallow and deep
sorting
list comprehensions
What is a sequence?

In Math: S = x1, x2, x3, x4, …    (countable sequence)

Note the sequence of indices: 1, 2, 3, 4, …

We can refer to any item in the sequence by using its index number: x2 or S[2]

So we have a concept of the first element, the second element, and so on… → positional ordering

Python lists have a concept of positional order, but sets do not

A list is a sequence type
A set is not
In Python, we start index numbers at 0, not 1    (we'll see why later)

S = x0, x1, x2, x3, …

mutable      lists    bytearrays
immutable    strings    tuples    range    bytes

range → more limited than lists, strings and tuples
in reality a tuple is more than just a sequence type

Additional Standard Types:
collections package    namedtuple    deque
Homogeneous vs Heterogeneous Sequences

Strings are homogeneous sequences
    each element is of the same type (a character)    'python'

Lists are heterogeneous sequences
    elements may be of different types

Homogeneous sequence types are usually more efficient (storage wise at least)

e.g. prefer using a string of characters, rather than a list or tuple of characters
Iterable Type vs Sequence Type

What does it mean for an object to be iterable?
    it is a container type of object and we can list out the elements in that object one by one

So any sequence type is iterable:
    l = [1, 2, 3]
    for e in l    works
    l[0]          works

But an iterable is not necessarily a sequence type → iterables are more general
    s = {1, 2, 3}
    for e in s    works
    s[0]          does not work
Standard Sequence Methods

Built-in sequence types, both mutable and immutable, support the following methods:

x in s        x not in s
s1 + s2             concatenation
s * n (or n * s)    repetition    (n an integer)
len(s)
min(s)    max(s)    (if an ordering between elements of s is defined)

    This is not the same as the ordering (position) of elements inside the container;
    this is the ability to compare pairwise elements using an order comparison (e.g. <, <=, etc.)

s.index(x)          index of first occurrence of x in s
s.index(x, i)       index of first occurrence of x in s at or after index i
s.index(x, i, j)    index of first occurrence of x in s at or after index i and before index j
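These operations can all be verified in a quick session (the variable names below are just illustrative):

```python
s = [10, 20, 30, 20]

# membership
assert 20 in s
assert 99 not in s

# concatenation and repetition build new sequences
assert [1, 2] + [3] == [1, 2, 3]
assert [0] * 3 == [0, 0, 0]

# length, min, max (elements must be pairwise comparable for min/max)
assert len(s) == 4
assert min(s) == 10
assert max(s) == 30

# index of first occurrence, optionally restricted to [i, j)
assert s.index(20) == 1
assert s.index(20, 2) == 3
```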
Standard Sequence Methods

s[i]        the element at index i
s[i:j]      slice from index i, to (but not including) j
s[i:j:k]    extended slice from index i, to (but not including) j, in steps of k

Note that slices will return in the same container type

We will come back to slicing in a lot more detail in an upcoming video

range objects are more restrictive:
    no concatenation / repetition
    min, max, in, not in not as efficient
Hashing

Immutable sequence types may support hashing    hash(s)
    but not if they contain mutable types!

We'll see this in more detail when we look at Mapping Types
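A quick check of both cases:

```python
# a tuple of immutable elements is hashable
t1 = (1, 2, 3)
assert isinstance(hash(t1), int)

# a tuple containing a mutable element (a list) is not
t2 = (1, [2, 3])
try:
    hash(t2)
    raised = False
except TypeError:
    raised = True
assert raised
```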
Review: Beware of Concatenations

x = [1, 2]        a = x + x    a → [1, 2, 1, 2]

x = 'python'      a = x + x    a → 'pythonpython'

x = [ [0, 0] ]    a = x + x    a → [ [0, 0], [0, 0] ]

    a[0] is x[0]    a[1] is x[0]
    id(x[0]) == id(a[0]) == id(a[1])

a[0][0] = 100    a → [ [100, 0], [100, 0] ]
Review: Beware of Repetitions

a = [1, 2] * 2        a → [1, 2, 1, 2]

a = 'python' * 2      a → 'pythonpython'

a = [ [0, 0] ] * 2    a → [ [0, 0], [0, 0] ]
a[0][0] = 100         a → [ [100, 0], [100, 0] ]

Same happens here as with concatenation, but because strings are immutable it's quite safe
names = ['Eric', 'John']
names = names + ['Michael']    → ['Eric', 'John', 'Michael']
    creates a new list object (names now points to a different memory address)

names = ['Eric', 'John']
names.append('Michael')        → ['Eric', 'John', 'Michael']
    mutates the same list object (names still points to the same memory address)
Mutating Using []

s[i] = x       element at index i is replaced with x
s[i:j] = s2    slice is replaced by the contents of the iterable s2
del s[i]       removes element at index i
del s[i:j]     removes entire slice

We can even assign to extended slices:    s[i:j:k] = s2

We will come back to mutating using slicing in a lot more detail in an
upcoming video
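Each of these mutations can be seen in a short sketch:

```python
s = [1, 2, 3, 4, 5]

s[0] = 100                 # replace the element at index 0
assert s == [100, 2, 3, 4, 5]

s[1:3] = ('a', 'b', 'c')   # slice replaced by the contents of the iterable
assert s == [100, 'a', 'b', 'c', 4, 5]

del s[0]                   # remove the element at index 0
assert s == ['a', 'b', 'c', 4, 5]

del s[1:3]                 # remove an entire slice
assert s == ['a', 4, 5]

s[::2] = (10, 20)          # extended slice assignment (lengths must match)
assert s == [10, 4, 20]
```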
Some methods supported by mutable sequence types such as lists:

s.append(x)           appends x to the end of s
s.insert(i, x)        inserts x at index i
s.extend(iterable)    appends contents of iterable to the end of s
s.pop(i)              removes and returns element at index i
s.remove(x)           removes the first occurrence of x in s
s.reverse()           does an in-place reversal of elements of s
s.copy()              returns a shallow copy
and more…
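A short sketch exercising these methods in sequence:

```python
s = [1, 2, 3]

s.append(4)
assert s == [1, 2, 3, 4]

s.insert(0, 0)
assert s == [0, 1, 2, 3, 4]

s.extend((5, 6))
assert s == [0, 1, 2, 3, 4, 5, 6]

assert s.pop(0) == 0    # removes and returns the element at index 0
s.remove(6)             # removes the first occurrence of the value 6
s.reverse()             # in-place reversal
assert s == [5, 4, 3, 2, 1]

cp = s.copy()           # shallow copy: a new list, same element references
assert cp == s and cp is not s
```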
Valid Questions

We want to determine how we should handle sequences of consecutive integers
    → represent positions of elements in a sequence

            ['a', 'b', 'c', 'd']
1 based:      1    2    3    4
0 based:      0    1    2    3
Slice Bounds

For the sequence 1, 2, …, 15 we could describe the bounds as:
a) 1 <= n <= 15
b) 0 < n <= 15
c) 1 <= n < 16
d) 0 < n < 16

(b) and (d) can become odd at times.

Suppose we want to describe the unsigned integers 0, 1, 2, …, 10

Using (b) or (d) we would need to use a signed integer for the lower bound:
b) -1 < n <= 10
d) -1 < n < 11
Now consider this sequence: 2, 3, …, 16
a) 2 <= n <= 16
c) 2 <= n < 17

How many elements are in this sequence? 15

Calculating number of elements from bounds in (a) and (c):
a) 15 = 16 – 2 + 1    # = upper – lower + 1
c) 15 = 17 – 2        # = upper – lower

So, (c) seems simpler for that calculation

We'll get to a second reason in a bit, but for now we'll use convention (c)
Starting Indexing at 0 instead of 1

2, 3, 4, …, 16    sequence length: 15

index n (1 based)    1, 2, 3, …, 15    1 <= n < 16    upper bound = length + 1
index n (0 based)    0, 1, 2, …, 14    0 <= n < 15    upper bound = length

For any sequence s, the index range is given by:
0 based:    0 <= n < len(s)
1 based:    1 <= n < len(s) + 1

So, 0 based appears simpler
Another reason for choosing 0 based indexing

s = [a, b, c, d, …, z]    0 based indices: 0, 1, 2, 3, …, 25

How many elements come before d?    3 elements

1 based:    index(d) → 4    4 - 1 elements precede it
0 based:    index(d) → 3    3 elements precede it

So, using 0 based indexing, the number of elements that precede an element
at some index → is the index itself
Summarizing so far…

describing ranges of indices using range(l, u) → l <= n < u

we have the following results:

the indices of any sequence s are given by:
    range(0, len(s))    [0 <= n < len(s)]
    first index: 0      last index: len(s) - 1

the length of a range(l, u) is given by:    u - l

s = [a, b, c, …, z]
len(s) → 26
indices → range(0, 26)
n elements precede s[n]
Slices

Because of the conventions on starting indexing at 0 and defining ranges using
[lower, upper)    (lower inclusive, upper exclusive)
we can think of slicing in these terms:

Each item in a sequence is like a box, with the indices between the boxes:

    | a | b | c | d | e | f |
    0   1   2   3   4   5   6        (6 is the length of the sequence)

s[2:4] → [c, d]

First 2 elements:    s[0:2]    s[:2]
Everything else:     s[2:6]

In general, to split with k elements in the first subsequence:    s[:k]    s[k:]
Why copy sequences?

Sometimes you want to make sure that whatever sequence you are working with cannot
be modified, either inadvertently by yourself, or by 3rd party functions

We saw an example of this earlier with list concatenations and repetitions.

def reverse(s):
    s.reverse()
    return s

s = [10, 20, 30]
new_list = reverse(s)

new_list → [30, 20, 10]
s → [30, 20, 10]        the original sequence was mutated!

Soapbox
Generally we write functions that do not modify the contents of their arguments.

But sometimes we really want to do so, and that's perfectly fine → in-place methods

However, to clearly indicate to the caller that something is happening in-place, we should not
return the object we modified

If we don't return s in the above example, the caller will probably wonder why not?

So, in this case, the following would be a better approach:

def reverse(s):
    s.reverse()

and if we do not do in-place reversal, then we return the reversed sequence:

def reverse(s):
    s2 = <copy of s>
    s2.reverse()
    return s2
How to copy a sequence

Simple Loop           cp = []
                      for e in s:
                          cp.append(e)        definitely non-Pythonic!

List Comprehension    cp = [e for e in s]

The copy method       cp = s.copy()    (not implemented in immutable types, such as tuples or strings)

Slicing               cp = s[0:len(s)]    or, more simply    cp = s[:]

The copy module

list()                list_2 = list(list_1)

Note: tuple_2 = tuple(tuple_1) and t[:] do not create a new tuple!
Watch out when copying entire immutable sequences

l1 = [1, 2, 3]
l2 = list(l1)     l2 → [1, 2, 3]    id(l1) ≠ id(l2)

t1 = (1, 2, 3)
t2 = tuple(t1)    t2 → (1, 2, 3)    id(t1) = id(t2)    same object!

t1 = (1, 2, 3)
t2 = t1[:]        t2 → (1, 2, 3)    id(t1) = id(t2)    same object!

Same thing with strings, also an immutable sequence type

Since the sequence is immutable, it is actually OK to return the same sequence
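A quick check of this CPython behavior (an implementation detail of CPython, not a language guarantee):

```python
l1 = [1, 2, 3]
t1 = (1, 2, 3)
s1 = 'python'

# lists: every copy technique produces a new object
assert list(l1) is not l1
assert l1[:] is not l1

# tuples and strings: CPython simply returns the same object
assert tuple(t1) is t1
assert t1[:] is t1
assert s1[:] is s1
```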
Shallow Copies

Using any of the techniques above, we have obtained a copy of the original sequence

s = [10, 20, 30]
cp = s.copy()
cp[0] = 100        cp → [100, 20, 30]    s → [10, 20, 30]

Great, so now our sequence s will always be safe from unintended modifications?

Not quite…

s = [ [10, 20], [30, 40] ]
cp = s.copy()
cp[0] = 'python'
cp[1][0] = 100

cp → ['python', [100, 40] ]    s → [ [10, 20], [100, 40] ]
Shallow Copies

What happened?

When we use any of the copy methods we saw a few slides ago, the copy essentially copies
all the object references from one sequence to another

s = [a, b]       id(s) → 1000     id(s[0]) → 2000     id(s[1]) → 3000
cp = s.copy()    id(cp) → 5000    id(cp[0]) → 2000    id(cp[1]) → 3000

When we made a copy of s, the sequence was copied, but its elements point to the
same memory address as the original sequence elements

The sequence was copied, but its elements were not

This is called a shallow copy
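The shared references are easy to demonstrate:

```python
s = [[10, 20], [30, 40]]
cp = s.copy()

# the list itself was copied...
assert cp is not s

# ...but its elements were not: both lists reference the same inner objects
assert cp[0] is s[0]
assert cp[1] is s[1]

# mutating a shared inner element is visible through both lists
cp[1][0] = 100
assert s == [[10, 20], [100, 40]]
```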
Shallow Copies

s = [1, 2]
cp = s.copy()    s and cp are distinct list objects, but s[0] and cp[0]
                 (and s[1] and cp[1]) are references to the same objects

cp.append(3)     only cp is affected
cp[1] = 3        only cp is affected

If the elements of s are immutable, such as integers in this example,
then this is not really important
Shallow Copies

s = [ [0, 0], [0, 0] ]
cp = s.copy()    s[0] and cp[0] are the same object, s[1] and cp[1] are the same object

cp[0][0] = 100

s → [ [100, 0], [0, 0] ]
Deep Copies

So, if collections contain mutable elements, shallow copies are not sufficient to ensure the copy
can never be used to modify the original!

s = [ [0, 0], [0, 0] ]
cp = [e.copy() for e in s]

In this case:
    cp is a copy of s
    but also, every element of cp is a copy of the corresponding element in s
    (each element copy is itself a shallow copy)
Deep Copies

But what happens if the mutable elements of s themselves contain mutable elements?

s = [ [ [0, 1], [2, 3] ], [ [4, 5], [6, 7] ] ]

We would need to make copies at least 3 levels deep to ensure a true deep copy

Deep copies, in general, tend to need a recursive approach
Deep Copies

Deep copies are not easy to do. You might even have to deal with circular references

a = [10, 20]
b = [a, 30]
a.append(b)    a → [10, 20, [a, 30]]    (a now contains a reference back to itself, via b)

If you wrote your own deep copy algorithm, you would need to handle this circular reference!
Deep Copies

In general, objects know how to make shallow copies of themselves

built-in objects like lists, sets, and dictionaries do - they have a copy() method

The standard library copy module has generic copy and deepcopy operations

Custom classes can implement the __copy__ and __deepcopy__ methods to allow you to
override how shallow and deep copies are made for your custom objects

We'll revisit this advanced topic of overriding deep copies of custom
classes in the OOP series of this course.
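The copy module in action, including the circular reference from the earlier slide (deepcopy handles it via an internal memo dictionary):

```python
from copy import copy, deepcopy

s = [[0, 0], [0, 0]]

shallow = copy(s)
assert shallow[0] is s[0]       # inner lists are shared

deep = deepcopy(s)
assert deep == s
assert deep[0] is not s[0]      # inner lists were copied too

# deepcopy also handles circular references
a = [10, 20]
b = [a, 30]
a.append(b)
a2 = deepcopy(a)
assert a2[2][0] is a2           # the circular structure is preserved in the copy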
Deep Copies

class MyClass:
    def __init__(self, a):
        self.a = a

x = [10, 20]
obj = MyClass(x)           x is obj.a → True

cp_shallow = copy(obj)     cp_shallow.a is obj.a → True     (cp_shallow.a is the same list as x)

cp_deep = deepcopy(obj)    cp_deep.a is obj.a → False       (cp_deep.a is a deep copy of x)
Deep Copies

class MyClass:
    def __init__(self, a):
        self.a = a

x = MyClass(500)
y = MyClass(x)        y.a is x → True

lst = [x, y]
cp = deepcopy(lst)

cp[0] is x → False
cp[1] is y → False
cp[1].a is x → False

but there is a relationship: cp[1].a is cp[0] → True
(deepcopy preserves the relationships between the copied objects)
Slicing relies on indexing → only works with sequence types

Mutable Sequence Types      Immutable Sequence Types
    extract data                extract data
    assign data

Example:
l = [1, 2, 3, 4, 5]    l[0:2] → [1, 2]
l[0:2] = ('a', 'b', 'c')    l → ['a', 'b', 'c', 3, 4, 5]

Slices can also be defined using slice objects:
s = slice(0, 2)        s.start → 0    s.stop → 2

l = [1, 2, 3, 4, 5]
l[s] → [1, 2]

This can be useful because we can name slices and use symbols
instead of a literal subsequently
[i:j]    start at i (including i), stop at j (excluding j)

l = [1, 2, 3, 4, 5, 6]
l[1:4] → [2, 3, 4]
Effective Start and Stop Bounds

Interestingly the following works:    l = ['a', 'b', 'c', 'd', 'e', 'f']

l[3:100] → ['d', 'e', 'f']    No error!

we can specify slices that are "out of bounds"

In fact, negative indices work too:    l[-1] → 'f'

      a    b    c    d    e    f
      0    1    2    3    4    5
     -6   -5   -4   -3   -2   -1
Step Value

Extended slicing includes a step value:    [i:j:k]    slice(i, j, k)
When not specified, the step value defaults to 1

l = ['a', 'b', 'c', 'd', 'e', 'f']
      0    1    2    3    4    5
     -6   -5   -4   -3   -2   -1

l[0:6:2]       indices 0, 2, 4       → ['a', 'c', 'e']
l[1:6:3]       indices 1, 4          → ['b', 'e']
l[1:15:3]      indices 1, 4          → ['b', 'e']
l[-1:-4:-1]    indices -1, -2, -3    → ['f', 'e', 'd']
Range Equivalence

Any slice essentially defines a sequence of indices that is used to select elements from another
sequence

In fact, any indices defined by a slice can also be defined using a range

The difference is that slices are defined independently of the sequence being sliced

The equivalent range is only calculated once the length of the sequence being sliced is known

Example:    [0:100] on a sequence of length 10 → range(0, 10)
Transformations [i:j]

The effective indices "generated" by a slice are actually dependent on the length of the
sequence being sliced

Python does this by reducing the slice using the following rules:

seq[i:j]
l = ['a', 'b', 'c', 'd', 'e', 'f']        length = 6

if i > len(seq)      → len(seq)                 [0:100]  → range(0, 6)
if j > len(seq)      → len(seq)
if i < 0             → max(0, len(seq) + i)     [-10:3]  → range(0, 3)
if j < 0             → max(0, len(seq) + j)     [-5:3]   → range(1, 3)
i omitted or None    → 0                        [:100]   → range(0, 6)
Transformations [i:j:k], k > 0

[i:j:k] = {x = i + n * k | 0 <= n < (j - i) / k}

k > 0    the indices are: i, i+k, i+2k, i+3k, …, < j
         stopping when j is reached or exceeded, but never including j itself

l = ['a', 'b', 'c', 'd', 'e', 'f']        length = 6

if i, j > len(seq)    → len(seq)                  [0:100:2]    → range(0, 6, 2)
if i, j < 0           → max(0, len(seq) + i/j)    [-10:100:2]  → range(0, 6, 2)
                                                  [-5:100:2]   → range(1, 6, 2)
i omitted or None     → 0                         [:6:2]       → range(0, 6, 2)
j omitted or None     → len(seq)                  [1::2]       → range(1, 6, 2)
                                                  [::2]        → range(0, 6, 2)

so same rules as [i:j] – makes sense, since that would be the same as [i:j:1]
Transformations [i:j:k], k < 0

k < 0    the indices are: i, i+k, i+2k, i+3k, …, > j

l = ['a', 'b', 'c', 'd', 'e', 'f']        length = 6

if i, j > len(seq)    → len(seq) - 1               [5:2:-1]   → range(5, 2, -1)
if i, j < 0           → max(-1, len(seq) + i/j)    [5:-2:-1]  → range(5, 4, -1)
i omitted or None     → len(seq) - 1               [:-2:-1]   → range(5, 4, -1)
j omitted or None     → -1
Summary of the reduction rules:

                     k > 0                    k < 0
i, j > len(seq)      len(seq)                 len(seq) - 1
i < 0                max(0, len(seq) + i)     max(-1, len(seq) + i)
j < 0                max(0, len(seq) + j)     max(-1, len(seq) + j)
i omitted / None     0                        len(seq) - 1
j omitted / None     len(seq)                 -1
Examples

l = ['a', 'b', 'c', 'd', 'e', 'f']        length = 6

[-10:10:1]     -10 → 0
               10 → 6
               → range(0, 6)

[10:-10:-1]    10 → 5
               -10 → max(-1, 6 - 10) → max(-1, -4) → -1
               → range(5, -1, -1)

We can of course easily define empty slices!

[3:-1:-1]      3 → 3
               -1 → max(-1, 6 - 1) → 5
               → range(3, 5, -1)        (empty: a negative step cannot go from 3 up to 5)
Example

seq = 'python'

seq[::-1] → 'nohtyp'
If you get confused…

The slice object has a method, indices, that returns the equivalent range start/stop/step
for any slice given the length of the sequence being sliced:

slice(start, stop, step).indices(length) → (start, stop, step)

the values in this tuple can be used to generate a list of indices using the range function

slice(10, -5, -1) with a sequence of length 6 → indices 5, 4, 3, 2

slice(10, -5, -1).indices(6) → (5, 1, -1)
list(range(*slice(10, -5, -1).indices(6))) → [5, 4, 3, 2]
Creating our own Sequence types

We will cover Abstract Base Classes later in this course, so we'll revisit this topic again

At its most basic, an immutable sequence type should support two things:
    returning the length of the sequence
    given an index, returning the element at that index

If an object provides this functionality, then we should in theory be able to:
    retrieve elements by index using square brackets []
    iterate through the elements using Python's native looping mechanisms
Remember that sequence types are iterables, but not all iterables are sequence types

Sequence types, at a minimum, implement the following methods:

__len__    __getitem__

At its most basic, the __getitem__ method takes in a single integer argument – the index

However, it may also choose to handle a slice type argument

So how does this help when iterating over the elements of a sequence?
The __getitem__ method

The __getitem__ method should return an element of the sequence based on the specified index

Python's list object implements the __getitem__ method:

my_list = ['a', 'b', 'c', 'd', 'e', 'f']

my_list.__getitem__(0) → 'a'
my_list.__getitem__(1) → 'b'
my_list.__getitem__(-1) → 'f'
my_list.__getitem__(slice(None, None, -1)) → ['f', 'e', 'd', 'c', 'b', 'a']
my_list.__getitem__(-100) → IndexError
All we really need from this __getitem__ method is the ability to:
    return an element for a valid index
    raise an IndexError exception for an invalid index

Also remember that sequence indices start at 0
    i.e. we always know the index of the first element of the sequence
Implementing a for loop

__getitem__(i) will return the element at index i
__getitem__(i) will raise an IndexError exception when i is out of bounds

my_list = [0, 1, 2, 3, 4, 5]

for item in my_list:
    print(item ** 2)

is essentially equivalent to:

index = 0
while True:
    try:
        item = my_list.__getitem__(index)
    except IndexError:
        break
    print(item ** 2)
    index += 1
In general sequence types support the Python built-in function len()

To support this all we need to do is implement the __len__ method in our custom sequence type

my_list = [0, 1, 2, 3, 4, 5]

len(my_list) → 6
my_list.__len__() → 6
Writing our own Custom Sequence Type

to implement our own custom sequence type we should then implement:
    __len__
    __getitem__

At the very least __getitem__ should:
    return an element for a valid index [0, length-1]
    raise an IndexError exception if index is out of bounds

Additionally we can choose to support:
    negative indices    i < 0 → i = length + i
    slicing             handle slice objects as argument to __getitem__
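Putting these requirements together, here is a minimal sketch of a custom sequence (the Squares class is an illustrative example, not from the course code):

```python
class Squares:
    """A minimal immutable sequence of the first n squares (a sketch)."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if isinstance(i, slice):
            # delegate bound reduction to slice.indices, then build a list
            return [self[k] for k in range(*i.indices(self.n))]
        if i < 0:                       # support negative indices
            i = self.n + i
        if i < 0 or i >= self.n:
            raise IndexError('index out of bounds')
        return i ** 2

sq = Squares(5)
assert len(sq) == 5
assert sq[2] == 4
assert sq[-1] == 16
assert list(sq) == [0, 1, 4, 9, 16]    # iteration works via __getitem__
assert sq[1:4] == [1, 4, 9]
```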
Concatenation +

Let's use Python's list as an example

We can concatenate two lists together by using the + operator

This will create a new list combining the elements of both lists

l1 = [1, 2, 3]    id(l1) = 0xFFF100
l2 = [4, 5, 6]    id(l2) = 0xFFF200

l1 = l1 + l2    → [1, 2, 3, 4, 5, 6]    id(l1) = 0xFFF300    (a new object)
In-Place Concatenation +=

We might expect l1 += l2 to behave the same as l1 = l1 + l2

it's true for numbers, strings, tuples → in general, true for immutable types

but not lists!

l1 = [1, 2, 3]    id(l1) = 0xFFF100
l2 = [4, 5, 6]    id(l2) = 0xFFF200

l1 += l2    → [1, 2, 3, 4, 5, 6]    id(l1) = 0xFFF100    (same object – mutated in place)
For immutable sequences such as tuples:

t += t1 has the same effect as t = t + t1

Since t is immutable, += does NOT perform in-place concatenation

Instead it creates a new tuple that concatenates the two tuples and returns the new object

t1 = (1, 2, 3)    id(t1) = 0xFFF100
t2 = (4, 5, 6)    id(t2) = 0xFFF200

t1 += t2    → (1, 2, 3, 4, 5, 6)    id(t1) = 0xFFF300    (a new object)
In-Place Repetition *=

l1 = [1, 2, 3]    id(l1) = 0xFFF100
l1 = l1 * 2    → [1, 2, 3, 1, 2, 3]    id(l1) = 0xFFF200    (a new object)

But the in-place repetition operator works this way:

l1 = [1, 2, 3]    id(l1) = 0xFFF100
l1 *= 2    → [1, 2, 3, 1, 2, 3]    id(l1) = 0xFFF100

the list was mutated
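The identity behavior can be verified with id() (the literal addresses above are just illustrative):

```python
l1 = [1, 2, 3]
orig_id = id(l1)
l1 = l1 * 2                  # repetition: builds a new list
assert l1 == [1, 2, 3, 1, 2, 3]
assert id(l1) != orig_id

l1 = [1, 2, 3]
orig_id = id(l1)
l1 *= 2                      # in-place repetition: mutates the same list
assert l1 == [1, 2, 3, 1, 2, 3]
assert id(l1) == orig_id

t1 = (1, 2, 3)
orig_id = id(t1)
t1 *= 2                      # tuples are immutable: a new tuple is created
assert t1 == (1, 2, 3, 1, 2, 3)
assert id(t1) != orig_id
```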
Assigning Values via Indexes, Slices and Extended Slices

We have seen how we can extract elements from a sequence by using indexing, slicing, and
extended slicing

[i]
[i:j]      slice(i, j)
[i:j:k]    slice(i, j, k)    k ≠ 1

Mutable sequences support assignment via a specific index
and they also support assignment via slices

The value being assigned via slicing and extended slicing must be an iterable
(any iterable, not just a sequence type)
Replacing a Slice

l = [1, 2, 3, 4, 5]      l[1:3] → [2, 3]
l[1:3] = (10, 20, 30)    l → [1, 10, 20, 30, 4, 5]

l = [1, 2, 3, 4, 5]      l[0:4:2] → [1, 3]
l[0:4:2] = [10, 30]      l → [10, 2, 30, 4, 5]

The list l was mutated
Deleting a Slice

We can delete a slice by replacing it with an empty iterable:

l = [1, 2, 3, 4, 5]    l[1:3] → [2, 3]
l[1:3] = []            l → [1, 4, 5]

The list l was mutated

(this does not work with extended slices, since extended slicing replacement needs same length)
Insertions using Slices

The trick here is that the slice must be empty
otherwise it would just replace the elements in the slice

l = [1, 2, 3, 4, 5]    l[1:1] → []
l[1:1] = 'abc'         l → [1, 'a', 'b', 'c', 2, 3, 4, 5]

The list l was mutated

Obviously this will also not work with extended slices:
    extended slice assignment requires both lengths to be the same
    but for insertion we need the slice to be empty,
    and the iterable to have some values
Concatenation and In-Place Concatenation

When dealing with the + and += operators in the context of sequences
we usually expect them to mean concatenation

But essentially, it is just an overloaded definition of these operators

We can overload the definition of these operators in our custom classes by using the methods:
__add__    __iadd__

In general (but not necessarily), we expect:
obj1 + obj2     → obj1 and obj2 are of the same type
obj1 += obj2    → obj2 is any iterable
Repetition works the same way, using the methods:
__mul__    __imul__

In general (but not necessarily), we expect:
obj1 * n     → n is a non-negative integer
obj1 *= n    → n is a non-negative integer
We saw in an earlier lecture how we can implement accessing elements in a custom sequence type

__getitem__    → seq[n]
               → seq[i:j]
               → seq[i:j:k]

We can handle assignments in a very similar way, by implementing
__setitem__

There are a few restrictions with assigning to slices that we have already seen (at least with lists):
    For any slice we could only assign an iterable
    For extended slices only, both the slice and the iterable must have the same length

Of course, since we are implementing __setitem__ ourselves, we
could technically make it do whatever we want!
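A minimal sketch of __setitem__ dispatching on index vs slice (the MyList class and its names are illustrative, not from the course code):

```python
class MyList:
    """A thin list wrapper sketching __getitem__/__setitem__ dispatch."""
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        return self._items[i]           # int or slice: delegate to the inner list

    def __setitem__(self, i, value):
        if isinstance(i, slice):
            # mimic list semantics: assign the contents of any iterable
            self._items[i] = list(value)
        else:
            self._items[i] = value

ml = MyList([1, 2, 3, 4, 5])
ml[0] = 100
assert ml[0] == 100
ml[1:3] = (10, 20, 30)                  # plain slice: lengths may differ
assert list(ml) == [100, 10, 20, 30, 4, 5]
```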
Additional Sequence Functions and Operators

__contains__    in
__delitem__     del
__rmul__        n * seq

The way Python works is that when it encounters an expression such as:
a + b    a * b
it first tries:
a.__add__(b)    a.__mul__(b)

if a does not support the operation (TypeError), it then tries:
b.__radd__(a)    b.__rmul__(a)
Implementing append, extend, pop

Actually there's nothing special going on here.

If we want to, we can just implement methods of the same name (not special methods)

and they can just behave the same way as we have seen for lists for example
Sorting and Sort Keys

But we do have to consider the direction of the sort:    ascending    descending

The sorted() function has an optional keyword-only argument
called reverse which defaults to False

If we set it to True, then the sort will sort in descending order
But one really important thing we need to think about: ordering

'a', 'b', 'c'
    although: is 'a' < 'A' or 'a' > 'A' or 'a' == 'A'?

'hello', 'python', 'bird', 'parrot'

rectangle_1, rectangle_2, rectangle_3

When items are pairwise comparable (< or >)
we can use that ordering to sort
For characters we can use their ASCII numerical values:

ord('x') → 120    ord('a') → 97

We now associate the ASCII numerical value with each character, and sort based on that value

items    'b'   'x'   'a'          'a'   'b'   'x'
keys      98   120    97    →      97    98   120

'1'  '?'  'A'  'B'  'X'  'a'  'b'  'x'
 49   63   65   66   88   97   98  120

You'll note that the sort keys have a natural sort order
Sorting and Sort Keys

Let's say we want to sort a list of Person objects based on their age
(assumes the Person class has an age property)

p1.age → 30    p2.age → 15    p3.age → 5    p4.age → 32

items    p1   p2   p3   p4          p3   p2   p1   p4
keys     30   15    5   32    →      5   15   30   32

We could also generate the key value, for any given person, using a function:

def key(p):
    return p.age

sort [p1, p2, p3, p4]
using sort keys generated by the function    key = lambda p: p.age
Sorting and Sort Keys

The sort keys need not be numerical → they just need to have a natural sort order (< or >)

items    'hello'   'python'   'parrot'   'bird'
keys       'o'        'n'        't'       'd'     ← last character of each string

→    'bird'    'python'    'hello'    'parrot'
       'd'       'n'          'o'        't'

key = lambda s: s[-1]
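The examples above, run directly:

```python
words = ['hello', 'python', 'parrot', 'bird']

# natural ordering (lexicographic for strings)
assert sorted(words) == ['bird', 'hello', 'parrot', 'python']

# sort key: last character of each string
assert sorted(words, key=lambda s: s[-1]) == ['bird', 'python', 'hello', 'parrot']

# the reverse flag sorts in descending key order
assert sorted([5, 3, 10, 2], reverse=True) == [10, 5, 3, 2]
```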
Python's sorted function

Optional keyword-only argument called key

if provided, key must be a function that returns the sort key for any given element
in the sequence being sorted

If key is not provided, then Python will sort based on the natural ordering of the elements
i.e. they must be pairwise comparable (<, >)

If the elements are not pairwise comparable, you will get an exception
Python's sorted function

sorted(iterable, key=None)        key is keyword-only

The sorted function:
• returns a new (sorted) list
• uses the TimSort algorithm (Python 2.3, 2002)
• a stable sort        https://en.wikipedia.org/wiki/Timsort

Side note: for the "natural" sort of elements, we can always think of the keys as the elements
themselves

sorted(iterable) ↔ sorted(iterable, key=lambda x: x)
Stable Sorts

A stable sort is one that maintains the relative order of items that have equal keys
(or values if using natural ordering)

p1.age → 30    p2.age → 15    p3.age → 5    p4.age → 32    p5.age → 15

sorted((p1, p2, p3, p4, p5), key=lambda p: p.age)
→ [p3, p2, p5, p1, p4]
        ↑   ↑
        keys equal – relative order of p2 and p5 is maintained
Python's list objects support in-place sorting
But that will depend on the particular type you are dealing with

The list class has a sort() instance method that does in-place sorting

l = [10, 5, 3, 2]       id(l) → 0xFF42
l.sort()
l → [2, 3, 5, 10]       id(l) → 0xFF42

Compared to sorted():
• same TimSort algorithm
• same keyword-only arg: key
• same keyword-only arg: reverse (default is False)
• in-place sorting, does not copy the data
• only works on lists (it's a method in the list class)
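The difference between the two is easy to demonstrate:

```python
l = [10, 5, 3, 2]
original_id = id(l)

result = sorted(l)           # returns a NEW list; l is unchanged
print(result, l)             # [2, 3, 5, 10] [10, 5, 3, 2]

l.sort()                     # sorts in place, returns None
print(l, id(l) == original_id)  # [2, 3, 5, 10] True
```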
Quick Recap

goal → generate a list by transforming, and optionally filtering, another iterable

• start with some iterable                    other_list = ['this', 'is', 'a', 'parrot']
• create an empty new list                    new_list = []
• iterate over the original iterable          for item in other_list:
• skip over certain values (filter)               if len(item) > 2:
• transform value and append to new list              new_list.append(item[::-1])

List comprehension:
new_list = [item[::-1] for item in other_list if len(item) > 2]
If the comprehension expression gets too long, it can be split over multiple lines

For example, let's say we want to create a list of squares of all the integers
between 1 and 100 that are not divisible by 2, 3 or 5

sq = [i**2 for i in range(1, 101) if i%2 and i%3 and i%5]

We could write this over multiple lines:

sq = [i**2
      for i in range(1, 101)
      if i%2 and i%3 and i%5]
Comprehension Internals
Comprehensions have their own local scope – just like a function

We should think of a list comprehension as being wrapped in a function that is created by
Python and that will return the new list when executed

sq = [i**2 for i in range(10)]
          └── RHS

When the RHS is compiled, Python creates a temporary function
that will be used to evaluate the comprehension:

def temp():
    new_list = []
    for i in range(10):
        new_list.append(i**2)
    return new_list

When the line is executed: executes temp()

We'll disassemble some Python code in the coding video to actually see this
Comprehension Scopes

They have their own local scope:    [item ** 2 for item in range(100)]
                                     └── item is a local symbol

But they can access global variables:
# module1.py
num = 100                                       ← global symbol
sq = [item**2 for item in range(num)]           ← item is a local symbol

As well as nonlocal variables:
def my_func(num):
    sq = [item**2 for item in range(num)]       ← num is a nonlocal symbol

Closures!!
Nested Comprehensions

And since they are functions, a nested comprehension can access (nonlocal) variables from the
enclosing comprehension!

[ [i * j for j in range(5)] for i in range(5) ]

nested comprehension → local variable: j, free variable: i   (a closure!)
outer comprehension  → local variable: i
Nested Loops in Comprehensions

l = []
for i in range(5):
    for j in range(5):
        for k in range(5):
            l.append((i, j, k))

l = [(i, j, k) for i in range(5) for j in range(5) for k in range(5)]

Note that the order in which the for loops are specified in the comprehension
corresponds to the order of the nested loops
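The equivalence of the two forms can be checked directly (shrunk to range(3) to keep the output small):

```python
# explicit nested loops
l = []
for i in range(3):
    for j in range(3):
        for k in range(3):
            l.append((i, j, k))

# comprehension: for clauses in the same order as the nested loops
comp = [(i, j, k) for i in range(3) for j in range(3) for k in range(3)]

print(comp == l)   # True
print(len(comp))   # 27
```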
Nested Loops in Comprehensions
We can combine nested loops and if statements

This works:                          This won't work:
l = []                               l = []
for i in range(5):                   for i in range(5):
    for j in range(5):                   if i==j:
        if i==j:                             for j in range(5):
            l.append((i, j))                     l.append((i, j))

j is created before it is used       j is referenced before it has been created

l = [(i, j) for i in range(5) for j in range(5) if i == j]      ← works
l = [(i, j) for i in range(5) if i == j for j in range(5)]      ← won't work!
Nested Loops in Comprehensions

l = []
for i in range(1, 6):
    if i%2 == 0:
        for j in range(1, 6):
            if j%3 == 0:
                l.append((i, j))

[(i, j)
 for i in range(1, 6) if i%2==0
 for j in range(1, 6) if j%3==0]

l = []
for i in range(1, 6):
    for j in range(1, 6):
        if i%2==0:
            if j%3 == 0:
                l.append((i, j))

[(i, j)
 for i in range(1, 6)
 for j in range(1, 6)
 if i%2==0
 if j%3==0]

l = []
for i in range(1, 6):
    for j in range(1, 6):
        if i%2==0 and j%3==0:
            l.append((i, j))

[(i, j)
 for i in range(1, 6)
 for j in range(1, 6)
 if i%2==0 and j%3==0]
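All three forms above produce the same list, which can be verified:

```python
# the explicit nested-loop version
l = []
for i in range(1, 6):
    for j in range(1, 6):
        if i % 2 == 0 and j % 3 == 0:
            l.append((i, j))

# the comprehension version with a combined filter
comp = [(i, j) for i in range(1, 6) for j in range(1, 6) if i % 2 == 0 and j % 3 == 0]

print(comp)        # [(2, 3), (4, 3)]
print(comp == l)   # True
```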
Background Information
A regular strictly convex polygon is a polygon that has the following characteristics:

• all interior angles are less than 180°
• all sides have equal length

(diagram: circumradius, vertex, edge (side), apothem, interior angle)
Background Information

For a regular strictly convex polygon with:
• n edges ( = n vertices)
• R circumradius

interior angle = (n - 2) × 180/n     (degrees)
edge length    s = 2 R sin(π/n)
apothem        a = R cos(π/n)
area           = ½ n s a
perimeter      = n s
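The formulas above can be sketched as a small helper (this is not the project's Polygon class, just a check of the math):

```python
import math

def polygon_metrics(n, R):
    """Metrics of a regular convex polygon with n edges and circumradius R."""
    interior_angle = (n - 2) * 180 / n      # degrees
    s = 2 * R * math.sin(math.pi / n)       # edge length
    a = R * math.cos(math.pi / n)           # apothem
    area = n * s * a / 2
    perimeter = n * s
    return interior_angle, s, a, area, perimeter

# a square (n=4) with circumradius 1 has area 2 and interior angle 90
print(polygon_metrics(4, 1))
```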
Goal 1

Initializer
• number of edges/vertices
• circumradius

Properties
• # edges
• # vertices
• interior angle
• edge length
• apothem
• area
• perimeter

Functionality
• a proper representation (__repr__)
• implements equality (==) based on # vertices and circumradius (__eq__)
• implements > based on number of vertices only (__gt__)
Goal 2

Initializer
• number of vertices for largest polygon in the sequence
• common circumradius for all polygons

Properties
• max efficiency polygon: returns the Polygon with the highest area : perimeter ratio

Functionality
• functions as a sequence type (__getitem__)
• supports the len() function (__len__)
• has a proper representation (__repr__)
What is an iterable? Something fit for iterating over
→ we'll see a more formal definition for Python's iterable protocol

Already seen: Sequences and iteration

More general concept of iteration
Iterators → get next item, no indexes needed → consumables
Iterables

Consuming iterators manually
Relationship between sequence types and iterators
Infinite Iterables
Lazy Evaluation
Iterator Delegation
Iterating Sequences
→ iteration: __getitem__(0), __getitem__(1), etc

But iteration can be more general than based on sequential indexing

get next item
→ no concept of ordering needed
→ just a way to get items out of the container one by one

a specific order in which this happens is not required – but can be
Example: Sets
Sets are unordered collections of items        s = {'x', 'y', 'b', 'c', 'a'}

Sets are not indexable        s[0]
→ TypeError – 'set' object does not support indexing

But sets are iterable

for item in s:    → x    note that we have no guarantee of the order in
    print(item)     c    which the elements are returned in the iteration
                    b
                    a
                    y
The concept of next

For general iteration, all we really need is the concept of "get the next item" in the collection

If a collection object implements a get_next_item method
we can get elements out of the collection, one after     get_next_item()
the other, this way:                                     get_next_item()
                                                         get_next_item()

and we could iterate over the collection as follows:

for _ in range(10):
    item = coll.get_next_item()
    print(item)

But how do we know when to stop asking for the next item?
i.e. when all the elements of the collection have been returned
by calling get_next_item()?
→ StopIteration built-in Exception
Attempting to build an Iterable ourselves
Let's try building our own class, which will be a collection of squares of integers

We could make this a sequence, but we want to avoid the concept of indexing

In order to implement a next method, we need to know what we've already "handed out"
so we can hand out the "next" item without repeating ourselves

class Squares:
    def __init__(self):
        self.i = 0

    def next_(self):
        result = self.i ** 2
        self.i += 1
        return result
Iterating over Squares

sq = Squares()

for _ in range(5):        → 0
    item = sq.next_()       1
    print(item)             4
                            9
                            16

There are a few issues:
→ the collection is essentially infinite
→ cannot use a for loop, comprehension, etc
→ we cannot restart the iteration "from the beginning"
Refining the Squares Class

• we specify the size of the collection when we create the instance
• we raise a StopIteration exception if next_ has been called too many times

Before:

class Squares:
    def __init__(self):
        self.i = 0

    def next_(self):
        result = self.i ** 2
        self.i += 1
        return result

After:

class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length

    def next_(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result
Iterating over Squares instances

sq = Squares(5)        create a collection of length 5

while True:
    try:
        item = sq.next_()        try getting the next item
        print(item)
    except StopIteration:        catch the StopIteration exception → nothing left to iterate
        break                    break out of the infinite while loop – we're done iterating

Output: 0
        1
        4
        9
        16
Python's next() function

Remember how we support the len() function for our custom type by
implementing the special method: __len__

Similarly, we can support Python's next() function for our custom type by
implementing the special method: __next__

class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result
Iterating over Squares instances

sq = Squares(5)

while True:                   Output: 0
    try:                              1
        item = next(sq)               4
        print(item)                   9
    except StopIteration:             16
        break

We still have some issues:
• cannot iterate using for loops, comprehensions, etc
• once the iteration starts we have no way of re-starting it
• and once all the items have been iterated (using next) the
  object becomes useless for iteration → exhausted
Where we're at so far…

→ once we start using next there's no going back
→ once we have reached StopIteration we're basically done with the object

Let's tackle the loop issue first

We saw how to iterate using __next__, StopIteration, and a while loop
This is actually how Python handles for loops in general

Somehow, we need to tell Python that our class has that __next__
method and that it will behave in a way consistent with using a
while loop to iterate
A protocol is simply a fancy way of saying that our class is going to implement certain
functionality that Python can count on

To let Python know our class can be iterated over using __next__ we implement the iterator protocol

The iterator protocol is quite simple – the class needs to implement two methods:

→ __iter__    this method should just return the object (class instance) itself
              sounds weird, but we'll understand why later

→ __next__    this method is responsible for handing back the next
              element from the collection and raising the StopIteration
              exception when all the elements have been handed out
__iter__ → just returns the object itself
__next__ → returns the next item from the container, or raises StopIteration

If an object is an iterator, we can use it with for loops, comprehensions, etc

Python will know how to loop (iterate) over such an object
(basically using the same while loop technique we used)
Example

class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result

    def __iter__(self):
        return self

sq = Squares(5)

for item in sq:    → 0
    print(item)      1
                     4
                     9
                     16

Still one issue though!
The iterator cannot be "restarted"
Once we have looped through all the items the iterator has been exhausted

To loop a second time through the collection we have to create a new
instance and loop through that
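Putting the pieces together, the full class runs as-is, and the exhaustion is easy to observe:

```python
# Squares implements the iterator protocol (__iter__ and __next__),
# so for loops, list(), comprehensions, etc all work directly
class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        result = self.i ** 2
        self.i += 1
        return result

sq = Squares(5)
print(list(sq))   # [0, 1, 4, 9, 16]
print(list(sq))   # [] -- exhausted: a new instance is needed to iterate again
```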
Iterators

Iterators are exhausted after one full iteration → become throw away objects

But our class is handling two separate concerns:
maintaining the collection of items (the container) (e.g. creating, mutating (if mutable), etc)
iterating over the collection

Why should we have to re-create the collection of items just to
iterate over them?
Separating the Collection from the Iterator

Maintaining the collection data should be one object
Iterating over the data should be a separate object → iterator

That object is throw-away → but we don't throw away the collection

The collection is iterable
but the iterator is responsible for iterating over the collection

The iterable is created once
The iterator is created every time we need to start a fresh iteration
Example

class Cities:
    def __init__(self):
        self._cities = ['Paris', 'Berlin', 'Rome', 'London']
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            item = self._cities[self._index]
            self._index += 1
            return item

Cities instances are iterators

Every time we want to run a new loop, we have to create a new
instance of Cities
This is wasteful, because we should not have to re-create the _cities
list every time
Example So, let's separate the object that maintains the cities, from the iterator itself

class Cities:
    def __init__(self):
        self._cities = ['New York', 'New Delhi', 'Newcastle']

    def __len__(self):
        return len(self._cities)

class CityIterator:
    def __init__(self, cities):
        self._cities = cities
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            etc…
Example
To use the Cities and CityIterator together here's how we would proceed:

cities = Cities()        create an instance of the container object

city_iterator = CityIterator(cities)        create a new iterator – but see how we pass in the
                                            existing cities instance

for city in city_iterator:
    print(city)

At this point, the city_iterator is exhausted
If we want to re-iterate over the collection, we need to create a new one

city_iterator = CityIterator(cities)
for city in city_iterator:
    print(city)

But this time, we did not have to re-create the collection – we just
passed in the existing one!
So far…

We have:
a container that maintains the collection items
a separate object, the iterator, used to iterate over the collection

It would be nice if we did not have to do that manually every time
and if we could just iterate over the Cities object instead of CityIterator

This is where the formal definition of a Python iterable comes in…
Iterables

An iterable implements a single method:

__iter__    returns a new instance of the iterator object
            used to iterate over the iterable

class Cities:
    def __init__(self):
        self._cities = ['New York', 'New Delhi', 'Newcastle']

    def __len__(self):
        return len(self._cities)

    def __iter__(self):
        return CityIterator(self)
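Assembled into one runnable piece, the iterable/iterator pair behaves like this (the iterator reaches into the container's private list here purely to keep the sketch short):

```python
class CityIterator:
    def __init__(self, cities):
        self._cities = cities      # the Cities container instance
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        item = self._cities._cities[self._index]  # reach into the underlying list
        self._index += 1
        return item

class Cities:
    def __init__(self):
        self._cities = ['New York', 'New Delhi', 'Newcastle']

    def __len__(self):
        return len(self._cities)

    def __iter__(self):
        return CityIterator(self)  # a FRESH iterator on every call

cities = Cities()
print(list(cities))   # ['New York', 'New Delhi', 'Newcastle']
print(list(cities))   # same again -- the iterable never becomes exhausted
```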
Iterable vs Iterator

An iterable is an object that implements
__iter__  → returns an iterator    (in general, a new instance)

An iterator is an object that implements
__iter__  → returns itself (an iterator)    (not a new instance)
__next__  → returns the next element

So iterators are themselves iterables
but they are iterables that become exhausted

Iterables on the other hand never become exhausted
because they always return a new iterator that is then used to iterate
Iterating over an iterable

(we'll actually come back to this for sequences!)

The first thing Python does when we try to iterate over an object
is call iter() to obtain an iterator
then it starts iterating (next, next, …, StopIteration)
using the iterator returned by iter()
Lazy Evaluation

A technique often used for class properties:
the value of a property only becomes known when the property is requested – deferred

Example
class Actor:
    def __init__(self, actor_id):
        self.actor_id = actor_id
        self.bio = lookup_actor_in_db(actor_id)
        self._movies = None

    @property
    def movies(self):
        # deferred: only look up the movies the first time the property is requested
        …
Example

iterable → Factorial(n)
will return factorials of consecutive integers from 0 to n-1

do not pre-compute all the factorials
wait until next requests one, then calculate it

This is a form of lazy evaluation
Application to Iterables

Another application of this might be retrieving a list of forum posts

Posts might be an iterable
each call to next returns a list of 5 posts (or some page size)
but uses lazy loading
→ every time next is called, go back to the database and get the next 5 posts
Application to Iterables → Infinite Iterables

Using that lazy evaluation technique means that we can actually have infinite iterables

Since items are not computed until they are requested
we can have an infinite number of items in the collection

Don't try to use a for loop over such an iterable
unless you have some type of exit condition in your loop
→ otherwise infinite loop!

Lazy evaluation of iterables is something that is used a lot in Python!
We'll examine that in detail in the next section on generators
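A minimal sketch of an infinite, lazy iterator in the spirit of the discussion above (values are computed only when requested, and we supply our own exit condition):

```python
# Infinite lazy iterator: nothing is precomputed
class SquaresIterator:
    def __init__(self):
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        result = self.i ** 2   # computed on demand, one item at a time
        self.i += 1
        return result

sq = SquaresIterator()
first_five = [next(sq) for _ in range(5)]   # explicit exit condition: 5 items
print(first_five)   # [0, 1, 4, 9, 16]
```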
iter()
What happens when Python performs an iteration over an iterable?

The very first thing Python does is call the iter() function on the object we want to iterate

If the object implements the __iter__ method, that method is called
and Python uses the returned iterator

What happens if the object does not implement the __iter__ method?

Is an exception raised immediately?
Sequence Types

So how does iterating over a sequence type – that maybe only implemented __getitem__ – work?

I just said that Python always calls iter() first

You'll notice I did not say Python always calls the __iter__ method

I said it calls the iter() function!!

In fact, if obj is an object that only implements __getitem__

iter(obj) → returns an iterator type object!

Some form of magic at work?
Not really!

Let's think about sequence types and how we can iterate over them

Suppose seq is some sequence type that implements __getitem__ (but not __iter__)

Remember what happens when we request an index that is out of bounds from the
__getitem__ method? → IndexError

index = 0
while True:
    try:
        print(seq[index])
        index += 1
    except IndexError:
        break
Making an Iterator to iterate over any Sequence

class SeqIterator:
    def __init__(self, seq):
        self.seq = seq
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            item = self.seq[self.index]
            self.index += 1
            return item
        except IndexError:
            raise StopIteration()
Calling iter()

So when iter(obj) is called:

Python first looks for an __iter__ method
→ if it's there, use it
→ if it's not
    look for a __getitem__ method
    → if it's there, create an iterator object and return that
    → if it's not there, raise a TypeError exception (not iterable)
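The __getitem__ fallback is easy to see in action with a class that implements nothing but __getitem__:

```python
# A "sequence" that only implements __getitem__ -- no __iter__ at all
class Seq:
    def __init__(self, data):
        self._data = data

    def __getitem__(self, index):
        return self._data[index]   # raises IndexError when out of bounds

s = Seq([1, 2, 3])
it = iter(s)           # Python builds an iterator for us via __getitem__
print(list(it))        # [1, 2, 3]

for item in Seq(['a', 'b']):   # for loops work too
    print(item)
```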
Testing if an object is iterable

Sometimes (very rarely!) we need to test if an object is iterable

We could check whether the object implements __getitem__ or __iter__
and that __iter__ returns an iterator

Easier approach:
try:
    iter(obj)
except TypeError:
    # not iterable
    <code>
else:
    # is iterable
    <code>
Python provides many functions that return iterables or iterators

Additionally, the iterators perform lazy evaluation

You should always be aware of whether you are dealing with an iterable or an iterator

why?
if an object is an iterable (but not an iterator) you can iterate over it many times
if an object is an iterator you can iterate over it only once
range(10)            → iterable
zip(l1, l2)          → iterator
enumerate(l1)        → iterator
open('cars.csv')     → iterator
dictionary .keys()   → iterable
dictionary .values() → iterable
dictionary .items()  → iterable
Example: a callable with a sentinel value

Suppose countdown() is a callable that returns a decreasing value on every call:

countdown() → 4
countdown() → 3
countdown() → 2
countdown() → 1
countdown() → 0
countdown() → -1
...

We now want to run a loop that will call countdown() until 0 is reached

We could certainly do that using a loop and testing the
value to break out of the loop once 0 has been reached

while True:
    val = countdown()
    if val == 0:
        break
    else:
        print(val)
An iterator approach
We could take a different approach, using iterators, and we can also make it quite generic

Make an iterator that knows two things:
the callable that needs to be called
a value (the sentinel) that will result in a StopIteration if the callable returns that value

The iterator would then be implemented as follows:
call the callable and get the result
if the result is equal to the sentinel → StopIteration
otherwise return the result
We just studied the first form of the iter() function: iter(iterable)

iter() creates an iterator for us (leveraging the sequence protocol if necessary)

Notice that the iter() function was able to generate an iterator for us automatically

The second form of the iter() function:
iter(callable, sentinel)

This will return an iterator that will:
call the callable when next() is called
raise StopIteration if the result is equal to the sentinel value
otherwise return the result of the call
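The two-argument form can be sketched with a countdown callable built from a closure (the helper name is illustrative):

```python
# iter(callable, sentinel): calls the callable until it returns the sentinel
def countdown_from(start):
    n = start + 1
    def step():
        nonlocal n
        n -= 1
        return n
    return step

counter = countdown_from(5)
result = list(iter(counter, 0))   # stops when step() returns 0; 0 itself is NOT yielded
print(result)                     # [5, 4, 3, 2, 1]
```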
Iterating a sequence in reverse order

If we have a sequence type, then iterating over the sequence in reverse order is quite simple:

for item in seq[::-1]:        This works, but is wasteful because it makes a copy of
    print(item)               the sequence

for i in range(len(seq)):
    print(seq[len(seq) - i - 1])

for i in range(len(seq)-1, -1, -1):        This is more efficient, but the syntax is messy
    print(seq[i])

for item in reversed(seq):        This creates an iterator that iterates in reverse
    print(item)                   over the sequence – it does not copy the
                                  data like the first example

Both __getitem__ and __len__ must be implemented
Unfortunately, reversed() will not work with custom iterables without a little bit of extra work

When we call reversed() on a custom iterable, Python will look for and call
the __reversed__ function

That function should return an iterator that will be used to perform the reversed iteration

So basically we have to implement a reverse iterator ourselves

Just like the iter() function, when we call reversed() on an object, Python:
looks for and calls the __reversed__ method
if it's not there, uses __getitem__ and __len__
to create an iterator for us
raises an exception otherwise
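Both paths can be demonstrated in a few lines:

```python
# Path 1: reversed() falls back on __getitem__ and __len__ for sequence types
class Seq:
    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

print(list(reversed(Seq([1, 2, 3]))))   # [3, 2, 1]

# Path 2: a custom iterable supplies its own __reversed__
class Countdown:
    def __iter__(self):
        return iter([3, 2, 1])

    def __reversed__(self):
        return iter([1, 2, 3])   # we implement the reverse iterator ourselves

print(list(reversed(Countdown())))      # [1, 2, 3]
```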
Card Deck Example

In the code exercises I am going to build an iterable containing a deck of 52 sorted cards

2 Spades … Ace Spades, 2 Hearts … Ace Hearts, 2 Diamonds … Ace Diamonds, 2 Clubs … Ace Clubs

But I don't want to create a list containing all the pre-created cards → Lazy evaluation

So I want my iterator to figure out the suit and card name for a given index in the sorted deck

SUITS = ['Spades', 'Hearts', 'Diamonds', 'Clubs']
RANKS = [2, 3, …, 10, 'J', 'Q', 'K', 'A']

We assume the deck is sorted as follows:
iterate over SUITS
for each suit iterate over RANKS
2S … AS   2H … AH   2D … AD   2C … AC

There are len(SUITS) suits → 4        There are len(RANKS) ranks → 13

The deck has a length of: len(SUITS) * len(RANKS) → 52

Each card in this deck has a positional index: a number from 0 to len(deck) - 1 → 0 - 51

To find the suit index of a card at index i:     i // len(RANKS)
To find the rank index of a card at index i:     i % len(RANKS)

Examples
5th card (6S) → index 4
suit index → 4 // 13 → 0        rank index → 4 % 13 → 4
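The index arithmetic above can be checked directly (RANKS is written out in full here, since the slide elides the middle values):

```python
SUITS = ['Spades', 'Hearts', 'Diamonds', 'Clubs']
RANKS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 'J', 'Q', 'K', 'A']

def card_at(i):
    # suit index = i // len(RANKS), rank index = i % len(RANKS)
    return RANKS[i % len(RANKS)], SUITS[i // len(RANKS)]

print(card_at(4))    # (6, 'Spades')  -- the 5th card, 6 of Spades
print(card_at(51))   # ('A', 'Clubs') -- the last card in the deck
```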
Generators

generator functions → generator factories
→ they return a generator when called
→ they are not a generator themselves

generator expressions
→ uses comprehension syntax

performance considerations
Iterators review

import math

class FactIter:
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        else:
            result = math.factorial(self.i)
            self.i += 1
            return result
What if we could write a factorials function that could pause execution after producing
each value, wait for resume, and finally return 'done!':

facts = factorials(4)
get_next(facts) → 0!
get_next(facts) → 1!
get_next(facts) → 2!
get_next(facts) → 3!
get_next(facts) → done!

Of course, getting 0!, 1!, 2!, 3! followed by a string is odd
And what happens if we call get_next again?
Maybe we should consider raising an exception… StopIteration?
And instead of calling get_next, why not just use next?
The yield keyword does exactly what we want:

it emits a value
the function is effectively suspended (but it retains its current state)
calling next on the function resumes running the function right after the yield statement
if the function returns something instead of yielding (finishes running) → StopIteration exception

def song():
    print('line 1')
    yield "I'm a lumberjack and I'm OK"
    print('line 2')
    yield 'I sleep all night and I work all day'

lines = song()        → no output!
line = next(lines)    → 'line 1' is printed in console
line → "I'm a lumberjack and I'm OK"
line = next(lines)    → 'line 2' is printed in console
line → "I sleep all night and I work all day"
line = next(lines)    → StopIteration
Generators
A function that uses the yield statement is called a generator function

The generator is created by Python when the function is called → gen = my_func()

The resulting generator is executed by calling next() → next(gen)

the function body will execute until it encounters a yield statement
it yields the value (as the return value of next()) then it suspends itself
until next is called again → the suspended function resumes execution
if it encounters a return before a yield → StopIteration exception
def my_func():
    yield 1
    yield 2
    yield 3

gen = my_func()    → gen is a generator

next(gen) → 1
next(gen) → 2
next(gen) → 3
next(gen) → StopIteration
Generators

next … StopIteration … This should remind you of iterators!

In fact, generators are iterators
→ they implement the iterator protocol    __iter__    __next__

def my_func():
    yield 1
    yield 2
    yield 3

→ they are exhausted when the function returns a value
→ StopIteration exception
→ the return value is the exception message

gen = my_func()
gen.__iter__() → iter(gen) → returns gen itself
gen.__next__() → next(gen)
Example

The iterator class:

class FactIter:
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        else:
            result = math.factorial(self.i)
            self.i += 1
            return result

fact_iter = FactIter(5)

The equivalent generator function:

def factorials(n):
    for i in range(n):
        yield math.factorial(i)

fact_iter = factorials(5)
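The generator version runs as-is and produces the same sequence the iterator class does:

```python
import math

# generator function equivalent to the FactIter class above
def factorials(n):
    for i in range(n):
        yield math.factorial(i)

fact_iter = factorials(5)
print(list(fact_iter))   # [1, 1, 2, 6, 24]
```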
Generators

Generator functions are functions which contain at least one yield statement

When a generator function is called, Python creates a generator object

Generators implement the iterator protocol

Generators are inherently lazy iterators (and can be infinite)

Generators are iterators, and can be used in the same way (for loops, comprehensions, etc)

Generators become exhausted once the function returns a value
Generators become exhausted

Generators are iterators
→ they can become exhausted (consumed)
→ they cannot be "restarted"

This can lead to bugs if you try to iterate twice over a generator
Example

def squares(n):
    for i in range(n):
        yield i ** 2

sq = squares(5)    → sq is a new generator (iterator)

l = list(sq)
l → [0, 1, 4, 9, 16]    and sq has been exhausted

l = list(sq)
l → []
Example
This of course can lead to unexpected behavior sometimes…

def squares(n):
    for i in range(n):
        yield i ** 2

sq = squares(5)

enum1 = enumerate(sq)    enumerate is lazy → hasn't iterated through sq yet

next(sq) → 0
next(sq) → 1

list(enum1) → [(0, 4), (1, 9), (2, 16)]
              notice how enumerate started at i=2
We can make an iterable whose __iter__ returns a new instance of the generator:

def squares(n):
    for i in range(n):
        yield i ** 2

class Squares:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return squares(self.n)

sq = Squares(5)

l1 = list(sq)    l1 → [0, 1, 4, 9, 16]
l2 = list(sq)    l2 → [0, 1, 4, 9, 16]
Comprehension Syntax

We already covered comprehension syntax when we studied list comprehensions

l = [i ** 2 for i in range(5)]

As well as more complicated syntax:
• if statements
• multiple nested loops
• nested comprehensions

[(i, j)
 for i in range(1, 6) if i%2==0
 for j in range(1, 6) if j%3==0]
List comprehension vs generator expression:

[i ** 2 for i in range(5)]        (i ** 2 for i in range(5))

a list is returned                a generator is returned
evaluation is eager               evaluation is lazy
has local scope                   has local scope
can access nonlocal               can access nonlocal
and global scopes                 and global scopes
iterable                          iterator
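The difference between the two columns shows up immediately when you try to iterate twice:

```python
l = [i ** 2 for i in range(5)]   # list comprehension: eager, an iterable
g = (i ** 2 for i in range(5))   # generator expression: lazy, an iterator

print(l, l)        # the list can be iterated as many times as we like
print(list(g))     # [0, 1, 4, 9, 16]
print(list(g))     # [] -- the generator is exhausted after one pass
```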
Resource Utilization

List comprehension:
all objects are created right away
→ takes longer to create/return the list
→ entire collection is loaded into memory

Generator expression:
object creation is delayed until requested by next()
→ generator is created/returned immediately
→ iteration is slower (objects need to be created)
→ only a single item is loaded at a time

if you iterate through all the elements → time performance is about the same
if you do not iterate through all the elements → generator more efficient

in general, generators tend to have less memory overhead
Delegating to another iterator

Suppose we yield the lines of a file, one by one:

with open(file) as f:
    for line in f:
        yield line

The inner loop is basically just using the file iterator and yielding values directly

Essentially we are delegating yielding to the file iterator
Simpler Syntax
We can replace this inner loop by using a simpler syntax: yield from

def read_all_data():
    for file in ('file1.csv', 'file2.csv', 'file3.csv'):
        with open(file) as f:
            for line in f:
                yield line

def read_all_data():
    for file in ('file1.csv', 'file2.csv', 'file3.csv'):
        with open(file) as f:
            yield from f

We'll come back to yield from, as there is a lot more to it!
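The delegation works with any iterable, not just files. A sketch using in-memory lists in place of the CSV files, so it runs standalone:

```python
# yield from delegates the yielding to each inner iterator in turn
def read_all_data(sources):
    for source in sources:
        yield from source   # equivalent to: for line in source: yield line

files = [['row1', 'row2'], ['row3'], ['row4', 'row5']]
print(list(read_all_data(files)))   # ['row1', 'row2', 'row3', 'row4', 'row5']
```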
Background Info

Along with this project is a data file: nyc_parking_tickets_extract.csv

Here are the first few lines of data:

Summons Number,Plate ID,Registration State,Plate Type,Issue Date,Violation Code,Vehicle Body Type,Vehicle Make,Violation Description
4006478550,VAD7274,VA,PAS,10/5/2016,5,4D,BMW,BUS LANE VIOLATION
4006462396,22834JK,NY,COM,9/30/2016,5,VAN,CHEVR,BUS LANE VIOLATION
4007117810,21791MG,NY,COM,4/10/2017,5,VAN,DODGE,BUS LANE VIOLATION

→ fields separated by commas
→ first row contains the field names
→ data rows are a mix of data types: string, date, int
Your first goal is to create a lazy iterator that will produce a named tuple for each row of data

The contents of each tuple should be an appropriate data type (e.g. date, int, string)

You can use the split method for strings to split on the comma

You will need to use the strip method to remove the end-of-line character (\n)

Remember, the goal is to produce a lazy iterator
→ you should not be reading the entire file in memory and then processing it
→ the goal is to keep the required memory overhead to a minimum
Your second goal is to calculate the number of violations by car make

You can choose otherwise, but I would store the make and violation counts as a dictionary
→ key = car make
→ value = # violations
Python has many tools for working with iterables

built-in:
iter  reversed  next  len  slicing
zip  filter  sorted  enumerate
all  any  map

functools module:
reduce
The itertools module
Slicing à islice
Selecting and Filtering à dropwhile, takewhile, compress, filterfalse
Accumulation à accumulate
Infinite Iterators à count, cycle, repeat
Zipping à zip_longest
Combinatorics à product, permutations, combinations, combinations_with_replacement
Aggregators
Functions that iterate through an iterable and return a single value that (usually) takes into account every element of the iterable

min(iterable) à minimum value in the iterable
max(iterable) à maximum value in the iterable
sum(iterable) à sum of all the values in the iterable
Associated Truth Values
Every object has a True truth value, except:

• None
• False
• 0 in any numeric type (e.g. 0, 0.0, 0+0j, …)
• empty sequences (e.g. list, tuple, string, …)
• empty mapping types (e.g. dictionary, set, …)
• custom classes that implement a __bool__ or __len__ method that returns False or 0
any(iterable) à returns True if any (one or more) element in iterable is truthy
              à False otherwise

all(iterable) à returns True if all the elements in iterable are truthy
              à False otherwise
Leveraging the any and all functions
Often, we are not particularly interested in the direct truth value of the elements in our iterables
à we want to know if any, or all, satisfy some condition à i.e. whether the condition is True

A function that takes a single argument and returns True or False is called a predicate

We can make any and all more useful by first applying a predicate to each element of the iterable
Example
Suppose we have the iterable [1, 2, 3, 4, 100] and we want to know if every element is less than 10

First define a suitable predicate: pred = lambda x: x < 10

Apply this predicate to every element of the iterable:

results = [pred(1), pred(2), pred(3), pred(4), pred(100)]
à [True, True, True, True, False]

Then we use all on these results: all(results) à False
C
How do we apply that predicate? We could use a simple loop:

new_list = []
for item in iterable:
    new_list.append(fn(item))
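A lazier way to apply the predicate is map (or a generator expression); any and all then short-circuit as soon as the answer is known:

```python
pred = lambda x: x < 10

# map applies the predicate lazily, one element at a time
assert all(map(pred, [1, 2, 3, 4])) is True
assert all(map(pred, [1, 2, 3, 4, 100])) is False

# any stops at the first truthy result
assert any(map(pred, [100, 200, 5])) is True
```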
itertools.islice
We can also slice general iterables (including iterators of course)
à islice(iterable, start, stop, step)

from itertools import islice

l = [1, 2, 3, 4]
result = islice(l, 0, 3)

list(result) à [1, 2, 3]

à islice returns a lazy iterator
The filter function à filter(predicate, iterable)

à returns all elements of iterable where predicate(element) is True

predicate can be None – in which case it is the identity function: f(x) à x
à in other words, only truthy elements will be retained

à filter returns a lazy iterator

We can achieve the same result using generator expressions:

(item for item in iterable if pred(item))       à predicate is not None

(item for item in iterable if item)             à predicate is None
or (item for item in iterable if bool(item))
Example
filter(lambda x: x < 4, [1, 10, 2, 10, 3, 10]) à 1, 2, 3

filter(None, [0, '', 'hello', 100, False]) à 'hello', 100

à remember that filter returns a (lazy) iterator
itertools.filterfalse
filterfalse(predicate, iterable) à works like filter, but retains the elements where the predicate is falsy

Example

filterfalse(lambda x: x < 4, [1, 10, 2, 10, 3, 10]) à 10, 10, 10

filterfalse(None, [0, '', 'hello', 100, False]) à 0, '', False

à filterfalse returns a (lazy) iterator
itertools.compress
compress(data, selectors)

It is basically a way of filtering one iterable, using the truthiness of items in another iterable

data = ['a', 'b', 'c', 'd', 'e']
selectors = [True, False, 1, 0]

compress(data, selectors) à a, c

à iteration stops at the shorter of the two iterables ('e' has no selector, so it is dropped)
à compress returns a (lazy) iterator
itertools.takewhile
takewhile(pred, iterable)

The takewhile function returns an iterator that will yield items while pred(item) is truthy
à at that point the iterator is exhausted
   even if there are more items in the iterable whose predicate would be truthy

takewhile(lambda x: x < 5, [1, 3, 5, 2, 1]) à 1, 3

à takewhile returns a (lazy) iterator
itertools.dropwhile
dropwhile(pred, iterable)

The dropwhile function returns an iterator that will start iterating (and yield all remaining elements) once pred(item) becomes falsy

dropwhile(lambda x: x < 5, [1, 3, 5, 2, 1]) à 5, 2, 1

à dropwhile returns a (lazy) iterator
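The selection and filtering functions above can be sketched together in one runnable snippet:

```python
from itertools import takewhile, dropwhile, compress, filterfalse

data = [1, 3, 5, 2, 1]
# takewhile stops at the first falsy predicate, even though 2 and 1 would pass
assert list(takewhile(lambda x: x < 5, data)) == [1, 3]
# dropwhile yields everything from the first falsy predicate onwards
assert list(dropwhile(lambda x: x < 5, data)) == [5, 2, 1]
# filterfalse keeps elements where the predicate is falsy
assert list(filterfalse(lambda x: x < 4, [1, 10, 2, 10, 3, 10])) == [10, 10, 10]
# compress filters data by the truthiness of the selectors
assert list(compress('abcde', [True, False, 1, 0])) == ['a', 'c']
```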
itertools.count à lazy iterator

count is similar to range à it takes start and step values, but no stop à it is infinite
à start and step can be any numeric type: int, float, complex, Decimal, even bool (False à 0, True à 1)

Example

count(10, 2) à 10, 12, 14, …

count(10.5, 0.1) à 10.5, 10.6, 10.7, …

takewhile(lambda x: x < 10.8, count(10.5, 0.1)) à 10.5, 10.6, 10.7
itertools.cycle à lazy iterator
The cycle function allows us to loop over a finite iterable indefinitely

Example

cycle(['a', 'b', 'c']) à 'a', 'b', 'c', 'a', 'b', 'c', …

Important

If the argument of cycle is itself an iterator à the iterator becomes exhausted after one full pass
but cycle will still produce an infinite sequence
à it does not stop after the iterator becomes exhausted
itertools.repeat à lazy iterator
The repeat function simply yields the same value indefinitely

repeat('spam') à 'spam', 'spam', 'spam', 'spam', …

Optionally, you can specify a count to make the iterator finite

repeat('spam', 3) à 'spam', 'spam', 'spam'

Caveat

The items yielded by repeat are the same object
à they each reference the same object in memory
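This caveat bites when the repeated value is mutable — a quick sketch:

```python
from itertools import repeat

# every yielded item is the *same* list object
matrix = list(repeat([0, 0], 3))
matrix[0][0] = 100
# mutating one "row" mutates them all
assert matrix == [[100, 0], [100, 0], [100, 0]]
```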
Chaining Iterables    itertools.chain(*args) à lazy iterator

We can manually chain the iterables iter1, iter2, iter3 this way:

for it in (iter1, iter2, iter3):
    yield from it

Or, we can use chain as follows:

for item in chain(iter1, iter2, iter3):
    print(item)
What happens if we want to chain from iterables contained inside another, single, iterable?

l = [iter1, iter2, iter3]

chain(l) à yields iter1, iter2, iter3 themselves – not their elements

What we really want is to chain iter1, iter2 and iter3

We can try this using unpacking: chain(*l)
à produces chained elements from iter1, iter2 and iter3

BUT unpacking is eager – not lazy!
If l was a lazy iterator, we essentially iterated through l (not the sub-iterators), just to unpack!
We can iterate lazily ourselves:

for sub_it in it:
    yield from sub_it

Or we can use chain.from_iterable:

chain.from_iterable(it)

This achieves the same result
à iterates lazily over it
à in turn, iterates lazily over each iterable in it
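A small sketch contrasting the two (gen and its arguments are illustrative):

```python
from itertools import chain

def gen(start):
    yield start
    yield start + 1

# a lazy iterable of iterators
its = (gen(n) for n in (1, 10, 100))

# from_iterable consumes `its` lazily, one sub-iterator at a time
result = list(chain.from_iterable(its))
assert result == [1, 2, 10, 11, 100, 101]
```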
"Copying" Iterators itertools.tee(iterable, n)
Sometimes we need to iterate through the same iterator multiple times, or even in parallel

We could create the iterator multiple times manually:

iters = []
for _ in range(10):
    iters.append(create_iterator())

Or we can use tee:

tee(iterable, 10) à a tuple of 10 iterators – all different objects
Teeing Iterables

The iterators returned by tee are lazy iterators à always!
à even if the original argument was not

l = [1, 2, 3, 4]
tee(l, 3) à (iter1, iter2, iter3)

à all lazy iterators – not lists!
Mapping and Accumulation
Mapping à applying a callable to each element of an iterable    map(fn, iterable)

Accumulation à reducing an iterable down to a single value

à sum(iterable)    calculates the sum of every element in an iterable
à min(iterable)    returns the minimal element of the iterable
à max(iterable)    returns the maximal element of the iterable
à reduce(fn, iterable, [initializer])

Of course, we can easily do the same thing using a generator expression too:

maps = (fn(item) for item in iterable)
reduce
Suppose we want to find the sum of all elements in an iterable:

l = [1, 2, 3, 4]

sum(l) à 1 + 2 + 3 + 4 = 10

reduce(lambda x, y: x + y, l) à 1 + 2 = 3
                              à 3 + 3 = 6
                              à 6 + 4 = 10

To find the product of all elements:

reduce(lambda x, y: x * y, l) à 1 * 2 = 2
                              à 2 * 3 = 6
                              à 6 * 4 = 24
We can specify a different "start" value in the reduction: reduce(fn, iterable, initializer)

itertools.starmap
à useful for mapping a multi-argument function on an iterable of iterables

l = [ [1, 2], [3, 4] ]

map(lambda item: item[0] * item[1], l) à 2, 12
starmap(operator.mul, l) à 2, 12

we could also just use a generator expression to do the same thing:
(operator.mul(*item) for item in l)

We can of course use iterables that contain more than just two values:

l = [ [1, 2, 3], [10, 20, 30], [100, 200, 300] ]
a th
© M
Note the argument order is not the same! reduce(fn, iterable)
h t accumulate(iterable, fn)
yr i g
à in accumulate, fn is optional
o p
C
à defaults to addition
Example

l = [1, 2, 3, 4]

functools.reduce(operator.mul, l) à 24
Ø 1 * 2 = 2
Ø 2 * 3 = 6
Ø 6 * 4 = 24

itertools.accumulate(l, operator.mul) à 1, 2, 6, 24
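The reduce / accumulate relationship in runnable form:

```python
from functools import reduce
from itertools import accumulate
import operator

l = [1, 2, 3, 4]
# reduce collapses to the final value only
assert reduce(operator.mul, l) == 24
# accumulate yields every intermediate result
assert list(accumulate(l, operator.mul)) == [1, 2, 6, 24]
# fn defaults to addition
assert list(accumulate(l)) == [1, 3, 6, 10]
```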
The zip Function à lazy iterator
It takes a variable number of positional arguments – each of which are iterables

It returns an iterator that produces tuples containing the elements of the iterables, iterated one at a time

It stops immediately once one of the iterables has been completely iterated over
à zips based on the shortest iterable

zip([1, 2, 3], [10, 20], ['a', 'b', 'c', 'd'])
à (1, 10, 'a'), (2, 20, 'b')
itertools.zip_longest(*args, [fillvalue=None])
zip([1, 2, 3], [10, 20], ['a', 'b', 'c', 'd'])
à (1, 10, 'a'), (2, 20, 'b')

zip_longest([1, 2, 3], [10, 20], ['a', 'b', 'c', 'd'])
à (1, 10, 'a'), (2, 20, 'b'), (3, None, 'c'), (None, None, 'd')

zip_longest([1, 2, 3], [10, 20], ['a', 'b', 'c', 'd'], fillvalue=-1)
à (1, 10, 'a'), (2, 20, 'b'), (3, -1, 'c'), (-1, -1, 'd')
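A quick check of both behaviors (zip stops at the shortest iterable; zip_longest pads with fillvalue):

```python
from itertools import zip_longest

# zip stops at the shortest iterable
assert list(zip([1, 2, 3], [10, 20])) == [(1, 10), (2, 20)]

# zip_longest pads missing values with fillvalue
result = list(zip_longest([1, 2, 3], [10, 20], fillvalue=0))
assert result == [(1, 10), (2, 20), (3, 0)]
```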
Grouping
Sometimes we want to loop over an iterable of elements, but we want to group those elements as we iterate through them

Suppose we have an iterable containing tuples, and we want to group based on the first element of each tuple:

(1, 10, 100)
(1, 11, 101)
(2, 20, 200)
(2, 21, 201)
(3, 30, 300)
(3, 31, 301)
(3, 32, 302)

We would like to iterate using this kind of approach:

for key, group in groups:
    print(key)
    for item in group:
        print(item)

key à 1 à (1, 10, 100), (1, 11, 101)
key à 2 à (2, 20, 200), (2, 21, 201)
key à 3 à (3, 30, 300), (3, 31, 301), (3, 32, 302)
itertools.groupby(data, [keyfunc]) à lazy iterator

Here we want to group based on the 1st element of each tuple
à grouping key: lambda x: x[0]

iterable:
(1, 10, 100)
(1, 11, 101)
(1, 12, 102)
(2, 20, 200)
(2, 21, 201)
(3, 30, 300)
(3, 31, 301)
(3, 32, 302)

groupby(iterable, lambda x: x[0]) à iterator of (key, sub_iterator) pairs:
1, sub_iterator à (1, 10, 100), (1, 11, 101), (1, 12, 102)
2, sub_iterator à (2, 20, 200), (2, 21, 201)
3, sub_iterator à (3, 30, 300), (3, 31, 301), (3, 32, 302)
groups = groupby(iterable, lambda x: x[0])

next(groups) à 1, sub_iterator à (1, 10, 100), (1, 11, 101), (1, 12, 102)
next(groups) à 2, sub_iterator à (2, 20, 200), (2, 21, 201)
next(groups) à 3, sub_iterator à (3, 30, 300), (3, 31, 301), (3, 32, 302)

à groupby advances the underlying iterable only as far as it needs to

next(groups) actually iterates through all the elements of the current "sub-iterator" before proceeding to the next group
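A minimal sketch — note that groupby only groups consecutive elements, so the data should already be sorted by the grouping key:

```python
from itertools import groupby

data = [(1, 10), (1, 11), (2, 20), (3, 30), (3, 31)]
# materialize each lazy sub-iterator before moving to the next group
result = {key: list(group) for key, group in groupby(data, lambda x: x[0])}
assert result == {1: [(1, 10), (1, 11)], 2: [(2, 20)], 3: [(3, 30), (3, 31)]}
```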
The itertools module contains a few functions for generating permutations and combinations

It also has a function to generate the Cartesian product of multiple iterables

All these functions return lazy iterators
Cartesian Product

{1, 2, 3} × {a, b, c} à (1, a), (2, a), (3, a), (1, b), (2, b), (3, b), (1, c), (2, c), (3, c)

2-dimensional:    A × B = {(a, b) | a ∈ A, b ∈ B}

n-dimensional:    A₁ × ⋯ × Aₙ = {(a₁, a₂, …, aₙ) | a₁ ∈ A₁, …, aₙ ∈ Aₙ}
Cartesian Product
l1 = [1, 2, 3]    l2 = ['a', 'b', 'c', 'd']    à notice not the same length

def cartesian_product(l1, l2):
    for x in l1:
        for y in l2:
            yield (x, y)

cartesian_product(l1, l2)
à (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), …, (3, 'd')
itertools.product(*args) à lazy iterator
l1 = [1, 2, 3]    l2 = ['a', 'b', 'c', 'd']

product(l1, l2) à (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), …, (3, 'd')

l3 = [100, 200]

product(l1, l2, l3) à (1, 'a', 100), (1, 'a', 200),
                      (1, 'b', 100), (1, 'b', 200),
                      (1, 'c', 100), (1, 'c', 200),
                      …
Permutations
This function will produce all the possible permutations of a given iterable

In addition, we can specify the length of each permutation
à maxes out at the length of the iterable

itertools.permutations(iterable, r=None)
à r is the size of the permutation
à r = None means the length of each permutation is the length of the iterable

Elements of the iterable are considered unique based on their position, not their value
à if the iterable produces repeat values, then the permutations will have repeat values too
Combinations
Unlike permutations, the order of the elements in a combination does not matter
à OK to always sort the elements of a combination

Combinations of length r can be picked from a set:

• without replacement à once an element has been picked from the set it cannot be picked again
• with replacement à once an element has been picked from the set it can be picked again
itertools.combinations(iterable, r)
itertools.combinations_with_replacement(iterable, r)

Just like for permutations:
the elements of an iterable are unique based on their position, not their value

The different combinations produced by these functions are sorted based on the original ordering in the iterable
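A quick sketch of all four combinatoric functions side by side:

```python
from itertools import product, permutations, combinations, combinations_with_replacement

# Cartesian product of two iterables
assert list(product([1, 2], 'ab')) == [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

# permutations: order matters
assert list(permutations('abc', 2)) == [
    ('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]

# combinations: order does not matter, no replacement
assert list(combinations('abc', 2)) == [('a', 'b'), ('a', 'c'), ('b', 'c')]

# combinations with replacement: an element can be picked again
assert list(combinations_with_replacement('ab', 2)) == [
    ('a', 'a'), ('a', 'b'), ('b', 'b')]
```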
Data Files
personal_info.csv
vehicles.csv
employment.csv
update_status.csv

Each file contains a common key that uniquely identifies each row – SSN
à appears only once in every file

You are guaranteed that every SSN number:
à is present in all 4 files
à the order of SSN in each file is the same

To make the approach easier, I am going to break it down into multiple smaller goals
Goal 1

Create a lazy iterator for each of the four files:
à returns named tuples
à data types are appropriate (string, date, int, etc)
à the 4 iterators are independent of each other (for now)

You will want to make use of the standard library module csv for this
Reading CSV Files
CSV files are files that contain multiple lines of data à strings

in addition, individual fields may be wrapped in further delimiters à quotes are common
à this allows the field value to contain what may otherwise be interpreted as a delimiter

Example

1,hello,world à 3 values: 1    hello    world
1,"hello,world" à 2 values: 1    hello,world
Reading CSV Files
1,hello,world
1,"hello, world"

Simply splitting on the comma is not going to work in the second example!

Example

Mueller-Rath,Human Resources,05-8069298,123-88-3381
"Schumm, Schumm and Reichert",Engineering,73-3839744,125-07-9434

def read_file(file_name):
    with open(file_name) as f:
        reader = csv.reader(f, delimiter=',', quotechar='"')
        yield from reader

à yields lists of strings containing each field value
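A small sketch, using an in-memory file, of how csv.reader handles quoted delimiters where a naive split would fail:

```python
import csv
import io

data = io.StringIO('1,"hello, world"\n2,plain\n')
reader = csv.reader(data, delimiter=',', quotechar='"')

# the quoted comma is preserved inside the field
assert next(reader) == ['1', 'hello, world']
assert next(reader) == ['2', 'plain']
```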
Goal 2
Create a single iterable that combines all the data from all four files

à try to re-use the iterators you created in Goal 1
à by combining I mean one row per SSN containing data from all four files in a single named tuple

Once again, make sure returned data is a single named tuple containing all fields

When you "combine" the data, make sure the SSN's match!

Remember that all the files are already sorted by SSN, and that each SSN appears once, and only once, in every file
à viewing files side by side, all the row SSN's will align correctly

Don't repeat the SSN 4 times in the named tuple – once is enough!
Goal 3
Make sure your iterator remains lazy!
Goal 4
For non-stale records, generate lists of number of car makes by gender

If you do this correctly, the largest groups for each gender are:

Female à Ford and Chevrolet (both have 42 persons in those groups)
Oxford dictionary:
The circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood.

In Python: the state surrounding a section of code

# module.py
f = open('test.txt', 'r')    à global scope: f is a file object
print(f.readlines())
f.close()

when print(f.readlines()) runs, it has a context in which it runs à the global scope
Managing the context of a block of code

We need to better "manage" the context that perform_work(f) needs:

f = open('test.txt', 'r')
try:
    perform_work(f)
finally:
    f.close()

this works à but writing try/finally every time can get cumbersome
à too easy to forget to close the file
Context Managers
à create a context (a minimal amount of state needed for a block of code)
à automatically clean up the context when we are done with it

à enter context à e.g. open file
à exit context à e.g. close file
Example
with open('test.txt', 'r') as f:    à create the context à open file
    # work with the file            à exit the context à close file

Context managers manage data in our scope
à on entry
à on exit

Very useful for anything that needs to provide Enter / Exit, Start / Stop, Set / Reset behavior

à open / close file
à start db transaction / commit or abort transaction
à set decimal precision to 3 / reset back to original precision
try…finally…

try:
    …
except:
    …
finally:
    …    à the finally block always runs, even if an exception occurs in the except block

Works even if inside a function and a return is in the try or except blocks

Very useful for writing code that should execute no matter what happens

But this can get cumbersome!
Context Managers à PEP 343

with open(file_name) as f:    à f is the object returned from the context (optional)
    # do work with the file
# file is now closed          à the with block exited the context

The context management protocol is equivalent to:

mgr = open(file_name)
obj = mgr.__enter__()
try:
    # do something
finally:
    # done with context
    mgr.__exit__()    à __exit__ also takes care of exception handling
Use Cases
Very common usage is for opening a file (creating the resource) and closing the file (releasing the resource)

Context managers can be used for much more than creating and releasing resources

Common Patterns
• Open – Close
• Lock – Release
• Change – Reset
• Start – Stop
• Enter – Exit

Examples
• file context managers
• Decimal contexts
How the Context Manager Protocol Works

works in conjunction with a with statement

class MyClass:
    def __init__(self):
        # init class
        …

    def __enter__(self):
        return obj

    def __exit__(self, exc_type, exc_value, exc_tb):
        # clean up obj
        …

my_instance = MyClass()    à works as a regular class
                           à __enter__, __exit__ were not called

with my_instance as obj:
à calls my_instance.__enter__()
à the return value from __enter__ is assigned to obj

after the with block, or if an exception occurs inside the with block:
à my_instance.__exit__ is called
Scope of the with block

The with statement does not create a new scope

with open(fname) as f:    à f is in the enclosing (here, global) scope
    row = next(f)         à row is also in the global scope

print(f)
print(row)    à row is available and has a value
The __enter__ Method
def __enter__(self):
    …

This method should perform whatever setup it needs to

It can optionally return an object à as returned_obj

That's all there is to this method
The __exit__ Method
More complicated…

Remember the finally in a try statement? à always runs, even if an exception occurs

__exit__ is similar
à runs even if an exception occurs in the with block

But should it handle things differently if an exception occurred? à maybe
à so it needs to know about any exceptions that occurred
à it also needs to tell Python whether to silence the exception, or let it propagate
The __exit__ Method
with MyContext() as obj:
    raise ValueError
print('done')

Scenario 1
__exit__ receives the error, performs some clean up and silences the error
à the print statement runs à no exception is seen

Scenario 2
__exit__ receives the error, performs some clean up and lets the error propagate
à the print statement does not run
Needs three arguments:
à the exception type that occurred (if any, None otherwise)
à the exception value that occurred (if any, None otherwise)
à the traceback object if an exception occurred (if any, None otherwise)

Returns True or False:
à True = silence any raised exception
à False = do not silence a raised exception

def __exit__(self, exc_type, exc_value, exc_trace):
    # do clean up work here
    return True  # or False

If False is returned, the exception propagates:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-39a69b57f322> in <module>()
      1 with MyContext() as obj:
----> 2     raise ValueError
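A minimal sketch of such a context manager class — the class name and returned object are illustrative, not the course's solution:

```python
class MyContext:
    def __enter__(self):
        # set up the context and return an object for the `as` clause
        self.obj = 'the return object'
        return self.obj

    def __exit__(self, exc_type, exc_value, exc_tb):
        # clean up; returning True silences any exception (Scenario 1)
        return True

with MyContext() as obj:
    raise ValueError('silenced')
print('done')   # still runs, because __exit__ returned True
```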
Pattern: Open - Close
Open File
operate on open file
Close File

Open socket
operate on socket
Close socket
Pattern: Start - Stop
Start timer
perform operations
Stop timer
Pattern: Lock - Release
acquire thread lock
perform operations
release thread lock
Pattern: Change - Reset
change Decimal context precision
perform some operations using the new precision
reset Decimal context precision back to original value

redirect stdout to a file
perform some operations that write to stdout
reset stdout to original value
Pattern: Wacky Stuff!
with tag('p'):
    print('some text', end='')
à <p>some text</p>

with tag('p'):
    print('some ', end='')
    with tag('b'):
        print('bold ', end='')
    print('text', end='')
à <p>some <b>bold </b>text</p>
Pattern: Wacky Stuff!
with ListMaker(title='Items', prefix='- ', indent=3, stdout='myfile.txt') as lm:
    lm.print('Item 1')
    with lm:
        lm.print('item 1a')
        lm.print('item 1b')
    lm.print('Item 2')
    with lm:
        lm.print('item 2a')
        lm.print('item 2b')

>> myfile.txt
Items
- Item 1
   - item 1a
   - item 1b
- Item 2
   - item 2a
   - item 2b
Context Manager Pattern
create context manager
enter context (and, optionally, receive an object)
do some work
exit context

with open(file_name) as f:
    data = f.readlines()
Mimic Pattern using a Generator
def open_file(fname, mode):
    f = open(fname, mode)
    try:
        yield f      à f = next(ctx) opens the file and yields it
    finally:
        f.close()    à calling next(ctx) again closes the file à StopIteration exception

ctx = open_file('file.txt', 'r')
f = next(ctx)
try:
    # do work with file
finally:
    try:
        next(ctx)
    except StopIteration:
        pass
This works in general

def gen(args):
    # do set up work here
    try:
        yield object
    finally:
        # clean up object here

ctx = gen(args)
obj = next(ctx)
try:
    # do work with obj
finally:
    try:
        next(ctx)
    except StopIteration:
        pass

This is quite clunky still
Creating a Context Manager from a Generator Function
def open_file(fname, mode):
    f = open(fname, mode)
    try:
        yield f
    finally:
        f.close()

gen = open_file('test.txt', 'w')    à generator object
f = next(gen)
# do work with f
next(gen)    à closes f

class GenContext:
    def __init__(self, gen):
        self.gen = gen

    def __enter__(self):
        obj = next(self.gen)
        return obj

    def __exit__(self, exc_type, exc_value, exc_tb):
        try:
            next(self.gen)    # runs the generator's finally block
        except StopIteration:
            pass
        return False

gen = open_file('test.txt', 'w')
with GenContext(gen) as f:
    # do work
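A runnable sketch of GenContext — the managed_list generator here is illustrative (it uses a plain list as the "resource" so the cleanup is easy to observe):

```python
class GenContext:
    def __init__(self, gen):
        self.gen = gen

    def __enter__(self):
        return next(self.gen)

    def __exit__(self, exc_type, exc_value, exc_tb):
        try:
            next(self.gen)          # run the generator's cleanup (finally block)
        except StopIteration:
            pass
        return False                # let exceptions propagate

def managed_list():
    lst = []                        # "resource" set up
    try:
        yield lst
    finally:
        lst.append('cleaned up')    # cleanup runs on exit

gen = managed_list()
with GenContext(gen) as lst:
    lst.append('working')
assert lst == ['working', 'cleaned up']
```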
So far…
we saw how to create a context manager using a class and a generator function

def gen_function(args):
    …
    try:
        yield obj       # single yield à the return value of __enter__
    finally:
        # cleanup phase à __exit__
        …

class GenContextManager:
    def __init__(self, gen_func):
        self.gen = gen_func()

    def __enter__(self):
        return next(self.gen)
with GenContextManager(gen_func):
    …

We can tweak this a bit to also allow passing arguments to gen_func:

class GenContextManager:
    def __init__(self, gen_obj):
        self.gen = gen_obj

    def __enter__(self):
        return next(self.gen)

    def __exit__(self, …):
        next(self.gen)

gen = gen_func(args)
with GenContextManager(gen):
    …

This works, but we have to create the generator object first, and use the GenContextManager class
We can automate those two steps with a decorator:

def contextmanager_dec(gen_fn):
    def helper(*args, **kwargs):
        gen = gen_fn(*args, **kwargs)
        return GenContextManager(gen)
    return helper
Usage Example

@contextmanager_dec
def open_file(f_name):
    f = open(f_name)
    try:
        yield f
    finally:
        f.close()

à open_file = contextmanager_dec(open_file)
à open_file is now actually the helper closure

calling open_file(f_name)
à calls helper(f_name)    [free variable gen_fn = open_file]
à creates the generator object
à returns a GenContextManager instance

à so we can write: with open_file(f_name) as f: …
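Putting the decorator and the class together in one runnable sketch — the tracked generator is illustrative, standing in for a real resource:

```python
class GenContextManager:
    def __init__(self, gen):
        self.gen = gen

    def __enter__(self):
        return next(self.gen)

    def __exit__(self, exc_type, exc_value, exc_tb):
        try:
            next(self.gen)   # run the generator's cleanup
        except StopIteration:
            pass
        return False

def contextmanager_dec(gen_fn):
    def helper(*args, **kwargs):
        return GenContextManager(gen_fn(*args, **kwargs))
    return helper

@contextmanager_dec
def tracked(name):
    log = [f'enter {name}']
    try:
        yield log
    finally:
        log.append(f'exit {name}')

with tracked('ctx') as log:
    log.append('working')
assert log == ['enter ctx', 'working', 'exit ctx']
```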
The contextlib Module
This technique is basically what we came up with, but:
à with more complex exception handling
à if an exception occurs in the with block, it needs to be propagated back to the generator function
   __exit__(self, exc_type, exc_value, exc_tb)
à uses enhanced generators as coroutines à later

This is implemented for us in the standard library:
contextlib.contextmanager
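The standard-library version in use — the open_items generator is illustrative:

```python
from contextlib import contextmanager

@contextmanager
def open_items():
    items = []                    # set up phase (__enter__)
    try:
        yield items               # single yield: the object bound by `as`
    finally:
        items.append('closed')    # cleanup phase (__exit__)

with open_items() as items:
    items.append('in use')
assert items == ['in use', 'closed']
```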
This project uses CSV data files (such as personal_info.csv)
à first row contains the field names

The basic goal will be to create a context manager that only requires the file name,
and provides us an iterator we can use to iterate over the data in those files

The iterator should yield named tuples with field names based on the header row in the CSV file

For simplicity, we assume all fields are just strings
Goal 1
For this goal implement the context manager using a context manager class

i.e. a class that implements the context manager protocol: __enter__ and __exit__

Make sure your iterator uses lazy evaluation

If you can, try to create a single class that implements both the context manager protocol and the iterator protocol
Goal 2
For this goal, re-implement what you did in Goal 1, but using a generator function instead

You'll have to use the @contextmanager decorator from the contextlib module
Information you may find useful
You have seen how to read a file row by row:

with open(f_name) as f:
    for row in f:
        print(row)

But file objects also support just reading data using the read function
à we specify how much of the file to read (that can span multiple rows)
à when we do this, a "read head" is maintained à we can reposition this read head à seek()

with open(f_name) as f:
    print(f.read(100))    à reads the first 100 characters à read head is now at 100
    print(f.read(100))    à reads the next 100 characters à read head is now at 200
But CSV files can be written in different "styles" à dialects

john,cleese,42      john;cleese;42      john|cleese|42      john\tcleese\t42
"john","cleese","42"      'john';'cleese';'42'

The csv module has a Sniffer class we can use to auto-determine the specific dialect
à we need to provide it a sample of the csv file

with open(f_name) as f:
    sample = f.read(2000)
    dialect = csv.Sniffer().sniff(sample)

with open(f_name) as f:
    reader = csv.reader(f, dialect)
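A small sketch of the Sniffer on an in-memory sample (the sample data is illustrative):

```python
import csv
import io

sample = 'john;cleese;42\neric;idle;43\n'
dialect = csv.Sniffer().sniff(sample)
assert dialect.delimiter == ';'

# use the detected dialect to parse the data
reader = csv.reader(io.StringIO(sample), dialect)
assert next(reader) == ['john', 'cleese', '42']
```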
Good Luck!
Concurrency vs Parallelism
concurrency à Task 1 and Task 2 take turns, interleaving their execution
parallelism à Task 1 and Task 2 execute at the same time
Cooperative vs Preemptive Multitasking
cooperative à tasks voluntarily yield control to each other (e.g. at a yield point)
à completely controlled by the developer
à Python: coroutines

preemptive à tasks are interrupted – not voluntarily
à not controlled by the developer à some sort of scheduler is involved
à Python: threading
Coroutines
Cooperative multitasking

à Python programs execute on a "single thread"
à concurrent, not parallel à Global Interpreter Lock à GIL

Two ways to create coroutines in Python:
à generators à uses an extended form of yield
à native coroutines à uses async / await à recent addition: asyncio
This section is not about
asyncio
native coroutines
threading
multiprocessing à parallelism

This section is about

learning the basics of generator-based coroutines
some practical applications of these coroutines
What is a coroutine?
coroutines à cooperative routines

subroutines à the standard calling convention:

def averager():
    total = 0
    count = 0
    def inner(value):
        nonlocal total
        nonlocal count
        total += value
        count += 1
        return total / count
    return inner

def running_averages(iterable):
    avg = averager()
    for value in iterable:
        running_average = avg(value)
        print(running_average)

à running_averages is in control
à when the subroutine is called, a stack frame is created (inner) and inner is now in control
à when inner returns, its stack frame is destroyed, and running_averages is back in control
With coroutines, control flows back and forth instead:

def running_averager():
    total = 0
    count = 0
    running_average = None
    while True:
        value = yield running_average    à yield new average, receive next value
        total += value
        count += 1
        running_average = total / count

def running_averages(iterable):
    averager = running_averager()    à create instance of running_averager (stack frame created)
    next(averager)                   à start the coroutine
    for value in iterable:
        running_average = averager.send(value)    à send value to running_averager, receive value back
        print(running_average)
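A runnable sketch of such a running averager, using send (covered in detail shortly):

```python
def running_averager():
    # coroutine: receives values, yields the running average
    total = 0
    count = 0
    running_average = None
    while True:
        value = yield running_average
        total += value
        count += 1
        running_average = total / count

avg = running_averager()
next(avg)                      # prime the coroutine (run to the first yield)
assert avg.send(10) == 10.0
assert avg.send(20) == 15.0
assert avg.send(30) == 20.0
```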
A queue is a data structure that supports first-in first-out (FIFO) addition/removal of items
à add elements to the back of the queue, remove elements from the front of the queue

A stack is a data structure that supports last-in first-out (LIFO) addition/removal of items
à push elements on top of the stack à the last pushed element is removed first (popped)
stack:
lst.append(item) à appends item to the end of the list
lst.pop()        à removes and returns the last element of the list

queue:
lst.insert(0, item) à inserts item at the front of the list
lst.pop()           à removes and returns the last element of the list

So a list can be used for both a stack and a queue

But inserting elements at the front of a list is quite inefficient!
à numbers coming up in a bit…
The deque data structure
à very efficient at adding / removing items from both the front and the end of a collection

from collections import deque

dq = deque()
dq = deque(iterable)
dq = deque(maxlen=n)

dq.append(item)      dq.appendleft(item)
dq.pop()             dq.popleft()
dq.clear()           len(dq)
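A quick sketch of the deque API, including the maxlen behavior (overflowing a bounded deque silently drops items from the opposite end):

```python
from collections import deque

dq = deque([1, 2, 3], maxlen=3)
dq.append(4)            # overflows maxlen: 1 is dropped from the left
assert list(dq) == [2, 3, 4]

dq.appendleft(0)        # 4 is dropped from the right
assert list(dq) == [0, 2, 3]

assert dq.pop() == 3
assert dq.popleft() == 0
assert list(dq) == [2]
```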
Timings    # items = 10_000, # tests = 1_000    (times in seconds)

                   list      deque
append (right)     0.87      0.87      --
pop (right)        0.002     0.0005    x4
insert (left)      20.80     0.84      x25
pop (left)         0.012     0.0005    x24
Another use case…

producer à queue à consumer

producer à adds data to the queue
consumer à grabs data from the queue, performs work
Implementing a Producer/Consumer using Subroutines
à run producer to insert all elements into deque
à run consumer to remove and process all elements in deque

def produce_elements(dq):
    for i in range(1, 100_000):
        dq.appendleft(i)

def consume_elements(dq):
    while len(dq) > 0:
        item = dq.pop()

def coordinator():
    dq = deque()
    produce_elements(dq)
    consume_elements(dq)
Implementing a Producer/Consumer using Generators
à create a limited size deque
à coordinator creates instance of producer generator
à coordinator creates instance of consumer generator
à producer runs until deque is filled à yields control back to caller
à consumer runs until deque is empty à yields control back to caller
repeat until producer is "done", or the controller decides to stop
Implementing a Producer/Consumer using Generators

def produce_elements(dq, n):
    for i in range(1, n):
        dq.appendleft(i)
        if len(dq) == dq.maxlen:
            yield

def consume_elements(dq):
    while True:
        while len(dq) > 0:
            item = dq.pop()
            # process item
        yield

def coordinator():
    dq = deque(maxlen=10)
    producer = produce_elements(dq, 100_000)
    consumer = consume_elements(dq)
    while True:
        try:
            next(producer)
        except StopIteration:
            break
        finally:
            next(consumer)

Notice how yield is not used here to yield values, but to yield control back to the controller
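A runnable version of the coordinator above — the `results` list is added here (not on the slide) so the consumed items can be observed:

```python
from collections import deque

def produce_elements(dq, n):
    # producer: fills the deque, yielding control whenever it is full
    for i in range(1, n):
        dq.appendleft(i)
        if len(dq) == dq.maxlen:
            yield

def consume_elements(dq, results):
    # consumer: drains the deque, then yields control back
    while True:
        while len(dq) > 0:
            results.append(dq.pop())
        yield

def coordinator(n):
    dq = deque(maxlen=10)
    results = []
    producer = produce_elements(dq, n)
    consumer = consume_elements(dq, results)
    while True:
        try:
            next(producer)   # run producer until the deque is full (or exhausted)
        except StopIteration:
            break
        finally:
            next(consumer)   # drain whatever the producer added
    return results

items = coordinator(100)     # all of 1..99, in order
```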
Generators can be in different states

def my_gen(f_name):
    f = open(f_name)
    try:
        for row in f:
            yield row.split(',')
    finally:
        f.close()

create the generator:
    rows = my_gen(f_name)    → CREATED

run the generator:
    next(rows)               → RUNNING
use inspect.getgeneratorstate to see the current state of a generator

from inspect import getgeneratorstate

g = my_gen(f_name)
getgeneratorstate(g)    → GEN_CREATED

row = next(g)
getgeneratorstate(g)    → GEN_SUSPENDED

lst = list(g)
getgeneratorstate(g)    → GEN_CLOSED

(inside the generator code while it is running)
getgeneratorstate(g)    → GEN_RUNNING
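The state transitions can be observed with any generator; this sketch uses a list of strings in place of the slide's file so it is self-contained:

```python
from inspect import getgeneratorstate

def my_gen(data):
    # stands in for the file-reading generator on the slide
    for row in data:
        yield row.split(',')

g = my_gen(['a,b', 'c,d'])
state_created = getgeneratorstate(g)    # 'GEN_CREATED'

row = next(g)                           # ['a', 'b']
state_suspended = getgeneratorstate(g)  # 'GEN_SUSPENDED'

rest = list(g)                          # exhausts the generator
state_closed = getgeneratorstate(g)     # 'GEN_CLOSED'
```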
So far…

After a value is yielded, the generator is suspended

How about sending data to the generator upon resumption?

Enhancement to generators introduced in Python 2.5    → PEP 342
Sending data to a generator

yield is not just for producing values – it can also receive values

it is used just like an expression would be:    received = yield

we can combine both (yield a value and receive one):    received = yield value
→ works, but confusing!
→ use sparingly
What's happening?

def gen_echo():
    while True:
        received = yield
        print('You said:', received)

echo = gen_echo()    → CREATED
    has not started running yet – not in a suspended state

next(echo)           → SUSPENDED    Python has just yielded (None)
    generator is suspended at the yield

we can resume and send data to the generator at the same time using send():
    echo.send('hello')

generator resumes running exactly at the yield
the yield expression evaluates to the just received data

received = yield 'python'

generator is suspended here:
    'python' is yielded and control is returned to caller

caller sends data to generator:    g.send('hello')

generator resumes:
    'hello' is the result of the yield expression
    'hello' is assigned to received
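A variant of the echo example that makes the two-way traffic observable — it yields a running message count (the slide's version yields None), so each send() returns a value:

```python
def gen_echo():
    count = 0
    while True:
        received = yield count   # produce the current count, receive next message
        count += 1
        print('You said:', received)

echo = gen_echo()
primed = next(echo)          # runs to the yield; 0 is produced
first = echo.send('hello')   # 'hello' becomes the value of the yield expression
second = echo.send('world')  # generator loops back to the yield each time
```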
Notice that we can only send data if the generator is suspended at a yield

So we cannot send data to a generator that is in a CREATED state – it must be in a SUSPENDED state

def gen_echo():
    while True:
        received = yield
        print('You said:', received)

echo = gen_echo()    → CREATED      echo.send('hello')    → TypeError
next(echo)           → SUSPENDED    echo.send('hello')    → works

→ yes, a value has been yielded – and we can choose to just ignore it
→ in this example, None has been yielded
Priming the generator

→ always use next() to prime

Later we'll see how we can "automatically" prime the generator using a decorator
Using yield…

→ used for producing data    → yield 'Python'

→ used for receiving data    → a = yield    (technically this also produces None)

Be careful mixing the two usages in your code
→ difficult to understand
→ sometimes useful
→ often not needed
Example

def running_averager():
    total = 0
    count = 0
    running_average = None
    while True:
        value = yield running_average
        total += value
        count += 1
        running_average = total / count

averager = running_averager()

next(averager)    → primed
                  → None has been yielded

averager.send(10)
    → value 10 is received
    → yields running_average → 10
    → suspended and waiting
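The averager above runs as-is; each send() returns the updated running average:

```python
def running_averager():
    total = 0
    count = 0
    running_average = None
    while True:
        value = yield running_average  # produce current average, receive next value
        total += value
        count += 1
        running_average = total / count

averager = running_averager()
primed = next(averager)     # primes the coroutine; None has been yielded
avg1 = averager.send(10)    # 10 / 1 → 10.0
avg2 = averager.send(30)    # 40 / 2 → 20.0
avg3 = averager.send(50)    # 90 / 3 → 30.0
```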
def read_file(f_name):
    f = open(f_name)
    try:
        for row in f:
            yield row        # could also use: yield from f
    finally:
        f.close()

Suppose the file has 100 rows:

rows = read_file('test.txt')
for _ in range(10):
    next(rows)

→ read 10 rows
→ file is still open
→ how do we now close the file without iterating through the entire file?
Closing a generator

We can close a generator by calling its close() method

def read_file(f_name):
    f = open(f_name)
    try:
        for row in f:
            yield row
    finally:
        f.close()

rows = read_file('test.txt')
for _ in range(10):
    next(rows)

rows.close()    → finally block runs, and file is closed
def gen():
    try:
        yield 1
        yield 2
    except GeneratorExit:
        print('Generator close called')
    finally:
        print('Cleanup here…')

g = gen()
next(g)
g.close()    → Generator close called
             → Cleanup here…
Python's expectations when close() is called

• the generator exits cleanly (returns)
    → the exception is silenced by Python
    → to the caller, everything works "normally"

• some other exception is raised from inside the generator
    → exception is seen by caller

• the generator "ignores" the GeneratorExit exception and yields another value
    → Python raises a RuntimeError: generator ignored GeneratorExit

in other words, don't try to catch and ignore a GeneratorExit exception

it's perfectly OK not to catch it, and simply let it bubble up

def gen():            g = gen()
    yield 1           next(g)
    yield 2           g.close()
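The close() behavior can be verified directly; an `events` list stands in for the slide's print calls so the order of execution is recorded:

```python
events = []

def gen():
    try:
        yield 1
        yield 2
    except GeneratorExit:
        events.append('close called')   # runs when close() throws GeneratorExit
    finally:
        events.append('cleanup')        # finally always runs

g = gen()
first = next(g)   # 1 — generator suspended at `yield 1`
g.close()         # GeneratorExit raised inside the generator; the generator
                  # handles it and returns, so the caller sees nothing
```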
Use in coroutines

→ coroutine opens a transaction when it is primed (next)
→ coroutine receives data to write to the database
→ coroutine commits the transaction when close() is called (GeneratorExit)
→ coroutine aborts (rolls back) the transaction if some other exception occurs
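A minimal sketch of that transactional pattern — the `pending`/`committed` lists are stand-ins for a real database transaction, not part of any actual DB API:

```python
committed = []

def db_writer():
    # hypothetical transactional writer: `pending` plays the role of an
    # open transaction, created when the coroutine is primed
    pending = []
    try:
        while True:
            record = yield
            pending.append(record)
    except GeneratorExit:
        committed.extend(pending)   # "commit" on close()
        raise                       # let GeneratorExit bubble up
    except Exception:
        pending.clear()             # "roll back" on any other exception
        raise

writer = db_writer()
next(writer)               # prime: opens the "transaction"
writer.send('row 1')
writer.send('row 2')
writer.close()             # commits: both rows land in `committed`
```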
Sending things to coroutines

close()    → sends (throws) a GeneratorExit exception to the coroutine

we can also "send" any exception to the coroutine:    .throw(exception)

→ the exception is raised at the point where the coroutine is suspended
How throw() is handled

→ generator does not catch the exception (does nothing)
    → exception propagates to caller

→ generator catches the exception, and does something:
    → yields a value
    → exits (returns)
    → raises a different exception
Catch and yield

→ generator catches the exception
→ handles and silences the exception
→ yields a value    → generator is now SUSPENDED

→ the yielded value is the return value of the .throw() method

def gen():
    while True:
        try:
            received = yield    # None has been yielded
            print(received)
        except ValueError:
            print('silencing ValueError')
Catch and exit

→ caller receives a StopIteration exception    → generator is now CLOSED

this is the same as calling next() or send() on a generator that returns instead of yielding

can think of throw() as the same thing as send(), but causing an exception to be sent
instead of plain data

def gen():
    while True:
        try:
            received = yield
            print(received)
        except ValueError:
            print('silencing ValueError')
            return None

→ StopIteration is seen by caller
Catch and raise a different exception

→ new exception propagates to caller    → generator is now CLOSED

def gen():
    while True:
        try:
            received = yield
            print(received)
        except ValueError:
            print('silencing ValueError')
            raise CustomException()

→ CustomException is seen by caller
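Both behaviors can be demonstrated in a few lines — the `caught` counter is added here so catch-and-yield is observable via throw()'s return value:

```python
def silencer():
    caught = 0
    while True:
        try:
            received = yield caught   # yields how many exceptions were silenced
        except ValueError:
            caught += 1               # catch and yield: generator stays SUSPENDED

g = silencer()
next(g)                               # prime
r1 = g.throw(ValueError('boom'))      # throw() returns the next yielded value
r2 = g.throw(ValueError('again'))

def escalator():
    try:
        yield
    except ValueError:
        raise KeyError('translated')  # catch and raise a different exception

e = escalator()
next(e)
try:
    e.throw(ValueError('boom'))
except KeyError as ex:
    seen = ex.args[0]                 # the new exception is seen by the caller
```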
close() vs throw()

can we use throw() with a GeneratorExit exception to achieve the same result as close()?

yes, but…

with close(), Python expects the GeneratorExit or StopIteration exception to propagate,
and silences it for the caller

if we use throw() instead, the GeneratorExit exception is raised inside the caller's context
(if the generator lets it propagate) – so the caller has to handle it:

try:
    gen.throw(GeneratorExit())
except GeneratorExit:
    pass
We always have to prime a coroutine before using it

→ very repetitive

next(g)    or    g.send(None)    → primes the coroutine

This is a perfect example of using a decorator to do this work for us!
Creating a function to auto-prime coroutines

def prime(gen_fn):
    g = gen_fn()    # creates the generator
    next(g)         # primes the generator
    return g        # returns the primed generator

def echo():
    while True:
        received = yield
        print(received)

echo_gen = prime(echo)
echo_gen.send('hello')    → 'hello'
A decorator approach

We still have to remember to call the prime function for our echo coroutine before we can use it

Since echo is a coroutine, we know we always have to prime it first

So let's write a decorator that will replace our generator function with another function
that will automatically prime the generator when we create an instance of it

def coroutine(gen_fn):
    def prime():
        g = gen_fn()
        next(g)
        return g
    return prime

@coroutine
def echo():
    while True:
        received = yield
        print(received)
Understanding how the decorator works

def coroutine(gen_fn):          def echo():
    def prime():                    while True:
        g = gen_fn()                    received = yield
        next(g)                         print(received)
        return g
    return prime

echo = coroutine(echo)    [same effect as using @coroutine]

→ the name echo now actually refers to the prime function
→ prime is a closure
    → its free variable gen_fn is the original echo generator function

calling echo() → calls prime() with gen_fn = echo:
    g = echo()
    next(g)
    return g
Expanding the decorator

def coroutine(gen_fn):
    def prime():
        g = gen_fn()
        next(g)
        return g
    return prime

→ cannot pass arguments to the generator function

def coroutine(gen_fn):
    def prime(*args, **kwargs):
        g = gen_fn(*args, **kwargs)
        next(g)
        return g
    return prime
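The expanded decorator in action — `functools.wraps` is added here (not on the slide) so the decorated function keeps its original name; the `prefix` argument is an illustrative addition:

```python
from functools import wraps

def coroutine(gen_fn):
    # decorator that primes the coroutine as soon as it is created
    @wraps(gen_fn)
    def prime(*args, **kwargs):
        g = gen_fn(*args, **kwargs)
        next(g)
        return g
    return prime

@coroutine
def echo(prefix):
    # arguments now work, thanks to *args / **kwargs
    result = None
    while True:
        received = yield result
        result = prefix + received

e = echo('>> ')            # already primed — ready to receive
out = e.send('hello')      # '>> hello'
```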
Recall…

def subgen():
    for i in range(10):
        yield i

We could consume the data from subgen in another generator this way:

def delegator():
    for value in subgen():
        yield value

Instead of using that loop, we saw we could just write:

def delegator():
    yield from subgen()

With either definition we can call it this way:

d = delegator()
next(d)
etc…
What is going on exactly?

caller             delegator                 subgen
next(d)    →       yield from subgen    →    next
           ←       yield value          ←    yield value

yield from establishes 2-way communications between the caller and the subgenerator

Can we send(), close() and throw() also?    Yes!
How does the delegator behave when the subgenerator returns?

it continues running normally

def delegator():                def subgen():
    yield from subgen()             yield 1
    yield 'subgen closed'           yield 2

d = delegator()
next(d)    → 1
next(d)    → 2
next(d)    → 'subgen closed'
next(d)    → StopIteration
Inspecting the subgenerator

from inspect import getgeneratorlocals, getgeneratorstate

def delegator():            def subgen():
    a = 100                     yield 1
    s = subgen()                yield 2
    yield from s
    yield 'subgen closed'

d = delegator()
getgeneratorstate(d)     → GEN_CREATED
getgeneratorlocals(d)    → {}

next(d)    → 1
getgeneratorstate(d)     → GEN_SUSPENDED
getgeneratorlocals(d)    → {'a': 100, 's': <gen object>}

s = getgeneratorlocals(d)['s']
getgeneratorstate(s)     → GEN_SUSPENDED

next(d)    → 2                  d → SUSPENDED    s → SUSPENDED
next(d)    → 'subgen closed'    d → SUSPENDED    s → CLOSED
next(d)    → StopIteration      d → CLOSED       s → CLOSED
yield from and send()

we can send data to the subgenerator via a delegator    → yield from

caller          delegator          subgenerator
next      →     next         →
          ←     yield        ←     yield
send      →     send         →
Priming the subgenerator coroutine

How does this work with yield from?

def delegator():            def coro():
    yield from coro()           while True:
                                    received = yield
                                    print(received)

d = delegator()

before we can send to d we have to prime it:    next(d)

What about coro()?
→ yield from will automatically prime the coroutine when necessary
Sending data to the subgenerator

Once the delegator has been primed:

def delegator():            def coro():
    yield from coro()           while True:
                                    received = yield
                                    print(received)

d = delegator()
next(d)

d.send('python')    → python is printed by the coroutine
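A runnable check — a list stands in for the slide's print so the values passing through the delegator can be recorded:

```python
received_log = []

def coro():
    # stands in for the slide's print(received)
    while True:
        received = yield
        received_log.append(received)

def delegator():
    yield from coro()

d = delegator()
next(d)              # primes the delegator; yield from primes coro too
d.send('python')     # send() passes straight through to the subgenerator
d.send('rocks')
```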
Control Flow

caller          delegator                       subgenerator
next      →     yield from coro()         →     yield
                print('next line of code')

the delegator is "stuck" at the yield from until the subgenerator closes

then it resumes running the rest of its code
Multiple Delegators → pipeline

def gen1():                 def gen2():                 def coro():
    …                           yield from coro()           …
    yield from gen2()                                       yield
    …                                                       …

d = gen1()

caller  ↔  gen1  ↔  gen2  ↔  coro

this can even be recursive
Closing the subgenerator

def delegator():            def subgen():
    …                           …
    yield from subgen()         yield
    …                           …

next(d)

the delegator code is effectively "paused" at the yield from as long as subgen is not closed

when subgen closes, the delegator resumes running exactly where it was paused
Closing the delegator

def delegator():            def subgen():
    …                           …
    yield from subgen()         yield
    …                           …

d = delegator()
next(d)

d.close()    → closes the subgenerator
             → immediately closes the delegator as well
Returning from a generator

A generator can return a value    → StopIteration

The returned value is embedded in the StopIteration exception

→ we can extract that value:

try:
    next(g)
except StopIteration as ex:
    print(ex.value)

→ so can Python!
Returning from a subgenerator

yield from is an expression

It evaluates to the returned value of the subgenerator

result = yield from subgen()

def subgen():
    …
    yield
    …
    return result
Returning from a subgenerator

def delegator():                        def subgen():
    …                                       …
    result = yield from subgen()            yield
    …                                       …
                                            return result

next    → delegator receives the return value and continues running normally

yield from           → establishes conduit
subgenerator returns → conduit is closed
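A concrete instance of the return-value conduit — the subgenerator yields its items and returns their total, which `yield from` hands to the delegator:

```python
def subgen(data):
    total = 0
    for value in data:
        yield value
        total += value
    return total                        # travels to the delegator via StopIteration

def delegator(data):
    result = yield from subgen(data)    # evaluates to subgen's return value
    yield 'total: {}'.format(result)    # delegator continues running normally

d = delegator([1, 2, 3])
seen = list(d)                          # [1, 2, 3, 'total: 6']
```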
We can throw exceptions into a generator using the throw() method

→ works with delegation as well

def delegator():                def subgen():
    yield from subgen()             …
                                    yield
                                    …

d = delegator()
d.throw(Exc)

The delegator does not intercept the exception → it just forwards it to the subgenerator
caller

delegator    → may handle: silence, or propagate up (same or different exception),
               or throw something else

subgen       → may handle: silence, or propagate up (same or different exception)

exception (thrown into the subgenerator)
Data pipelines (Pulling)

(sink) consumer  ←pull←  filter  ←pull←  transform  ←pull←  data source (producer)

→ use yield and iteration to pull data through the pipeline

consumer:       iterates filter_data()    → writes data to file
filter_data:    iterates parse_data()     → yields selected rows only
parse_data:     iterates read_data()      → transforms data, yields row
read_data:      yields rows
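A pull pipeline along those lines — the hard-coded rows and the odd-id filter are illustrative stand-ins for the file source and selection criteria:

```python
def read_data():
    # stands in for reading rows from a file
    yield from ['1,alice', '2,bob', '3,carol']

def parse_data(rows):
    # transform: split each row into (id, name)
    for row in rows:
        id_, name = row.split(',')
        yield int(id_), name

def filter_data(records):
    # selection: keep odd ids only
    for id_, name in records:
        if id_ % 2 == 1:
            yield id_, name

# the sink pulls data through the whole pipeline simply by iterating
result = list(filter_data(parse_data(read_data())))
```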
Data pipelines (Pushing)

data source  →push→  …  →push→  consumer (sink)

Example:
generate integers  →push→  square number  →push→  filter odds only  →push→  log results
Can get crazier…    broadcasting

                                        →  filter       → …
source  →  transformer  →  broadcaster  →  transformer  → …
                                        →  filter       → …

pushes data through the pipeline
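A sketch of a push pipeline with broadcasting, built from the coroutines developed in this section; the specific stages (squaring, odd-filtering, list collectors) are illustrative choices:

```python
def coroutine(gen_fn):
    # auto-priming decorator, as developed earlier in the section
    def prime(*args, **kwargs):
        g = gen_fn(*args, **kwargs)
        next(g)
        return g
    return prime

@coroutine
def collector(lst):
    # sink: appends everything it receives to lst
    while True:
        lst.append((yield))

@coroutine
def squarer(target):
    while True:
        target.send((yield) ** 2)

@coroutine
def filter_odds(target):
    while True:
        item = yield
        if item % 2 == 1:
            target.send(item)

@coroutine
def broadcaster(targets):
    # pushes each received item to every downstream target
    while True:
        item = yield
        for target in targets:
            target.send(item)

# linear pipeline: generate integers → square → keep odds → log
logged = []
pipeline = squarer(filter_odds(collector(logged)))
for i in range(5):
    pipeline.send(i)
# squares 0, 1, 4, 9, 16 → odd ones logged: [1, 9]

# broadcasting: one source feeding two branches
odds, everything = [], []
fan = broadcaster([filter_odds(collector(odds)), collector(everything)])
for i in range(4):
    fan.send(i)
# odds == [1, 3], everything == [0, 1, 2, 3]
```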