Python: Deep Dive (Part 2)

© MathByteAcademy
What this course is about
the Python language
    → the canonical CPython 3.6+ implementation

the standard library

becoming an expert Python developer

idiomatic Python

obtaining a deeper understanding of the Python language and the standard library

this is NOT an introductory course
    → refer to the prerequisites video or the course description


Included Course Materials
lecture videos

coding videos

Jupyter notebooks

projects and solutions

GitHub repository for all code
    → https://github.com/fbaptiste/python-deepdive
Sequence Types
what are sequences?

slicing → ranges

shallow vs deep copy

the sequence protocol

implementing our own sequence types

list comprehensions → closures

sorting → sort key functions
Iterables and Iterators
more general than sequence types

differences between iterables and iterators

lazy vs eager iterables

the iterable protocol

the iterator protocol

writing our own custom iterables and iterators
Generators
what are generators?

generator functions

generator expressions

the yield statement

the yield from statement

how generators are related to iterators
Iteration Tools
Many useful tools for a functional approach to iteration
    → built-in
    → the itertools module
    → the functools module

Aggregators
Slicing iterables
Selection and filtering
Infinite iterators
Mapping and reducing
Grouping
Combinatorics
Context Managers
what are context managers?

the context manager protocol

why are they so useful?

creating custom context managers using the context manager protocol

creating custom context managers using generator functions
Projects
a project after each section

you should attempt these yourself first – practice makes perfect!

solution videos and notebooks provided
    → my approach
    → more than one approach is possible
Extras
will keep growing over time

important new features of Python 3.6 and later

best practices

a random collection of interesting stuff

additional resources

send me your suggestions!
Python 3: Deep Dive (Part 2) - Prerequisites

This course assumes that you have in-depth knowledge of the following:
functions and function arguments    def my_func(p1, p2, *args, k1=None, **kwargs)

lambdas                             lambda x, y: x + y

packing and unpacking iterables     my_func(*my_list)
                                    f, *_, l = (1, 2, 3, 4, 5)

closures                            nested scopes, free variables

decorators                          @my_decorator    @my_decorator(p1, p2)

Boolean truth values                bool(obj)

named tuples                        namedtuple('Data', 'field_1 field_2')

== vs is                            id(obj)
Python 3: Deep Dive (Part 2) - Prerequisites

This course assumes that you have in-depth knowledge of the following:
zip        zip(list1, list2, list3)

map        map(lambda x: x**2, my_list)

reduce     reduce(lambda x, y: x * y, my_list, 10)

filter     filter(lambda p: p.age > 18, persons)

sorted     sorted(persons, key=lambda p: p.name.lower())

imports    import math
           from math import sqrt, sin
           from math import sqrt as sq
           from math import *


Python 3: Deep Dive (Part 2) - Prerequisites

You should have a basic understanding of creating and using classes in Python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, value):
        if value <= 0:
            raise ValueError('Age must be greater than 0')
        else:
            self._age = value
Python 3: Deep Dive (Part 2) - Prerequisites

You should understand how special functionality is implemented in Python using special methods

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'

    def __eq__(self, other):
        if not isinstance(other, Point):
            return False
        else:
            return self.x == other.x and self.y == other.y

    def __gt__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        else:
            return self.x ** 2 + self.y ** 2 > other.x ** 2 + other.y ** 2

    def __add__(self, other):
        ...



Python 3: Deep Dive (Part 2) - Prerequisites

You should also have a basic understanding of:

for loops, while loops    break    continue    else

branching                 if … elif … else …

exception handling        try:
                              my_func()
                          except ValueError as ex:
                              handle_value_error()
                          finally:
                              cleanup()
This course is all about the Python language and the standard library

I do not make use of any 3rd party library


m y
d e
a
EXCEPT Jupyter Notebooks

à I can provide you fully annotated code


Ac
à all notebooks are downloadable
y te
th B
à but you should really use github

a
https://github.com/fbaptiste/python-deepdive
M
t ©
To follow along you will therefore need:

i g h
yr
CPython 3.6 or higher

p
Jupyter Notebook

o
CYou favorite Python editor: VSCode, PyCharm, command line + VIM/Nano/…

I use Anaconda's Python installation: https://conda.io/docs/index.html


what are sequences?
    indexing starting at 0
    slices include the lower bound index, but exclude the upper bound index

slicing
    slice objects

modifying mutable sequences

copying sequences – shallow and deep

implementing custom sequence types

sorting

list comprehensions
What is a sequence?

In math: S = x₁, x₂, x₃, x₄, …    (a countable sequence)

Note the sequence of indices: 1, 2, 3, 4, …

We can refer to any item in the sequence by using its index number: x₂ or S[2]

So we have a concept of the first element, the second element, and so on… → positional ordering

Python lists have a concept of positional order, but sets do not
    a list is a sequence type
    a set is not

In Python, we start index numbers at 0, not 1    (we'll see why later)

S = x₀, x₁, x₂, x₃, …    →    S[2] is the third element


Built-In Sequence Types

mutable      lists, bytearrays

immutable    strings, tuples, range, bytes
                 range is more limited than lists, strings and tuples
                 in reality a tuple is more than just a sequence type

Additional Standard Types:
    collections package    namedtuple    deque
    array module           array
Homogeneous vs Heterogeneous Sequences

Strings are homogeneous sequences
    each element is of the same type (a character)    'python'

Lists are heterogeneous sequences
    each element may be a different type    [1, 10.5, 'python']

Homogeneous sequence types are usually more efficient (storage-wise at least)
    e.g. prefer using a string of characters, rather than a list or tuple of characters
Iterable Type vs Sequence Type

What does it mean for an object to be iterable?
    it is a container type of object, and we can list out the elements in that object one by one

So any sequence type is iterable:
    l = [1, 2, 3]
    for e in l    ✓        l[0]    ✓

But an iterable is not necessarily a sequence type → iterables are more general:
    s = {1, 2, 3}
    for e in s    ✓        s[0]    ✗  (TypeError)
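The distinction above is easy to verify directly. A minimal sketch: both a list and a set can be iterated, but only the sequence supports indexing:

```python
l = [1, 2, 3]   # a list is a sequence: iterable AND indexable
s = {1, 2, 3}   # a set is iterable, but NOT a sequence

# both support iteration
assert [e for e in l] == [1, 2, 3]
assert sum(e for e in s) == 6

# only the sequence supports indexing
assert l[0] == 1
try:
    s[0]
except TypeError as ex:
    print('sets are not indexable:', ex)
```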
Standard Sequence Methods
Built-in sequence types, both mutable and immutable, support the following methods

x in s             s1 + s2             concatenation
x not in s         s * n (or n * s)    repetition (n an integer)
len(s)
min(s), max(s)     (if an ordering between elements of s is defined)

    This is not the same as the ordering (position) of elements inside the container;
    this is the ability to compare pairwise elements using an order comparison (e.g. <, <=, etc.)

s.index(x)          index of first occurrence of x in s
s.index(x, i)       index of first occurrence of x in s at or after index i
s.index(x, i, j)    index of first occurrence of x in s at or after index i and before index j
Standard Sequence Methods

s[i]        the element at index i

s[i:j]      the slice from index i, to (but not including) j

s[i:j:k]    extended slice from index i, to (but not including) j, in steps of k

Note that slices will return the same container type

We will come back to slicing in a lot more detail in an upcoming video

range objects are more restrictive:
    no concatenation / repetition
    min, max, in, not in are not as efficient
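A quick check of the methods in the tables above, on a small list:

```python
s = [10, 20, 30, 10]

assert 20 in s
assert 99 not in s
assert s + [40] == [10, 20, 30, 10, 40]   # concatenation → new list
assert [0] * 3 == [0, 0, 0]               # repetition
assert len(s) == 4
assert (min(s), max(s)) == (10, 30)

assert s.index(10) == 0        # first occurrence
assert s.index(10, 1) == 3     # first occurrence at or after index 1
```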
Hashing

Immutable sequence types may support hashing    hash(s)
    but not if they contain mutable types!

We'll see this in more detail when we look at Mapping Types
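This can be sketched in a few lines: a tuple of immutable elements hashes fine, but a tuple containing a list does not:

```python
# immutable sequences are hashable...
print(hash((1, 2, 3)))    # works: a tuple of immutable elements
print(hash('python'))     # works: strings are immutable

# ...but not if they contain mutable elements
try:
    hash((1, [2, 3]))     # the nested list is mutable
except TypeError as ex:
    print('unhashable:', ex)
```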
Review: Beware of Concatenations

x = [1, 2]        a = x + x    a → [1, 2, 1, 2]

x = 'python'      a = x + x    a → 'pythonpython'

x = [ [0, 0] ]    a = x + x    a → [ [0, 0], [0, 0] ]

    a[0] is x[0]      a[1] is x[0]
    id(a[0]) == id(a[1])

    a[0][0] = 100     a → [ [100, 0], [100, 0] ]
Review: Beware of Repetitions

a = [1, 2] * 2        a → [1, 2, 1, 2]

a = 'python' * 2      a → 'pythonpython'

a = [ [0, 0] ] * 2    a → [ [0, 0], [0, 0] ]

    id(a[0]) == id(a[1])

    a[0][0] = 100     a → [ [100, 0], [100, 0] ]

The same happens here, but because strings are immutable it's quite safe:

a = ['python'] * 2    a → ['python', 'python']
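The repetition trap above, plus the usual workaround (a comprehension builds a fresh inner list per row, so the rows are independent objects):

```python
# the repetition trap: both rows are the SAME inner list object
grid = [[0, 0]] * 2
grid[0][0] = 100
assert grid == [[100, 0], [100, 0]]     # both "rows" changed!
assert grid[0] is grid[1]

# safe alternative: build a new inner list per row
grid = [[0, 0] for _ in range(2)]
grid[0][0] = 100
assert grid == [[100, 0], [0, 0]]       # only the first row changed
assert grid[0] is not grid[1]
```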


Mutating Objects

names = ['Eric', 'John']           names → 0xFF255  ['Eric', 'John']

names = names + ['Michael']        names → 0xAA2345 ['Eric', 'John', 'Michael']

    This is NOT mutation!

Mutating an object means changing the object's state without creating a new object

names = ['Eric', 'John']           names → 0xFF255  ['Eric', 'John']
names.append('Michael')            names → 0xFF255  ['Eric', 'John', 'Michael']
Mutating Using []

s[i] = x       element at index i is replaced with x
s[i:j] = s2    slice is replaced by the contents of the iterable s2
del s[i]       removes element at index i
del s[i:j]     removes entire slice

We can even assign to extended slices:    s[i:j:k] = s2

We will come back to mutating using slicing in a lot more detail in an upcoming video
Some methods supported by mutable sequence types such as lists

s.clear()             removes all items from s
s.append(x)           appends x to the end of s
s.insert(i, x)        inserts x at index i
s.extend(iterable)    appends contents of iterable to the end of s
s.pop(i)              removes and returns the element at index i
s.remove(x)           removes the first occurrence of x in s
s.reverse()           does an in-place reversal of the elements of s
s.copy()              returns a shallow copy
and more…
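The methods above all mutate the same list object; a short sketch that also checks the id never changes:

```python
s = [10, 20, 30]
original_id = id(s)

s.append(40)            # s → [10, 20, 30, 40]
s.insert(0, 5)          # s → [5, 10, 20, 30, 40]
s.extend([50, 60])      # s → [5, 10, 20, 30, 40, 50, 60]
assert s.pop(0) == 5    # removes AND returns the element at index 0
s.remove(50)            # removes the first occurrence of the VALUE 50
s.reverse()             # in-place reversal

assert s == [60, 40, 30, 20, 10]
assert id(s) == original_id     # every step mutated the same list object
```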
Valid Questions

Why does sequence indexing start at 0, and not 1?

Why does a sequence slice s[i:j] include s[i], but exclude s[j]?

    this is not just an arbitrary choice
    → there are rational and practical reasons behind doing so

We want to determine how we should handle sequences of consecutive integers
    → they represent positions of elements in a sequence

    ['a', 'b', 'c', 'd']
    1-based indices:    1  2  3  4
    0-based indices:    0  1  2  3
Slice Bounds

Consider the following sequence of integers: 1, 2, 3, …, 15

How can we describe this range of numbers without using an ellipsis (…)?

a) 1 <= n <= 15
b) 0 <  n <= 15
c) 1 <= n <  16
d) 0 <  n <  16

(b) and (d) can become odd at times.

Suppose we want to describe the unsigned integers 0, 1, 2, …, 10

Using (b) or (d) we would need to use a signed integer for the lower bound:

b) -1 < n <= 10
d) -1 < n <  11
Now consider this sequence: 2, 3, …, 16

a) 2 <= n <= 16
c) 2 <= n <  17

How many elements are in this sequence?    15

Calculating the number of elements from the bounds in (a) and (c):

a) 15 = 16 – 2 + 1    # = upper – lower + 1
c) 15 = 17 – 2        # = upper – lower

So, (c) seems simpler for that calculation

We'll get to a second reason in a bit, but for now we'll use convention (c)
Starting Indexing at 0 instead of 1

When we count elements we naturally start counting at 1, so why start indexing at 0?

Consider the following sequence:    2, 3, 4, …, 16    sequence length: 15

index n (1-based)    1, 2, 3, …, 15    1 <= n < 16    upper bound = length + 1
index n (0-based)    0, 1, 2, …, 14    0 <= n < 15    upper bound = length

For any sequence s, the index range is given by:

0-based:    0 <= n < len(s)
1-based:    1 <= n < len(s) + 1

So, 0-based appears simpler
Another reason for choosing 0 based indexing

Consider this sequence:    a, b, c, d, …, z

1-based    1, 2, 3, 4, …, 26
0-based    0, 1, 2, 3, …, 25

How many elements come before d?    3 elements

1-based:    index(d) → 4    4 - 1 elements precede it
0-based:    index(d) → 3    3 elements precede it

So, using 0-based indexing, the number of elements that precede an element at some index
→ is the index itself
Summarizing so far…

choosing 0-based indexing for sequences

describing ranges of indices using range(l, u) → l <= n < u

we have the following results:

the indices of any sequence s are given by:    range(0, len(s))    [0 <= n < len(s)]
    first index: 0    last index: len(s) - 1
    number of indices before index n:    n

the length of a range(l, u) is given by:    u - l

Example:    s = [a, b, c, …, z]
    len(s) → 26
    indices → range(0, 26)    i.e. 0, 1, 2, …, 25
    n elements precede s[n]
Slices

Because of the conventions of starting indexing at 0 and defining ranges using [lower, upper)
                                                                      inclusive  exclusive
we can think of slicing in these terms:

Each item in a sequence is like a box, with the indices between the boxes:

      a   b   c   d   e   f
    0   1   2   3   4   5   6        (6 is the length of the sequence)

s[2:4] → [c, d]

First 2 elements:    s[0:2]    s[:2]
Everything else:     s[2:6]    s[2:]

In general we can split a sequence into two,
with k elements in the first subsequence:    s[:k]    s[k:]
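The split idea can be sketched directly; for any k, s[:k] and s[k:] partition the sequence without loss:

```python
s = ['a', 'b', 'c', 'd', 'e', 'f']

assert s[2:4] == ['c', 'd']        # includes index 2, excludes index 4

# splitting a sequence in two, with k elements in the first part
k = 2
assert s[:k] == ['a', 'b']
assert s[k:] == ['c', 'd', 'e', 'f']
assert s[:k] + s[k:] == s          # the split is lossless for any k
```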
Why copy sequences?

Mutable sequences can be modified.

Sometimes you want to make sure that whatever sequence you are working with cannot
be modified, either inadvertently by yourself, or by 3rd party functions

We saw an example of this earlier with list concatenations and repetitions.

Also consider this example:

def reverse(s):
    s.reverse()
    return s

s = [10, 20, 30]
new_list = reverse(s)

new_list → [30, 20, 10]
s        → [30, 20, 10]

We should have passed it a copy of our list if we
did not intend for our original list to be modified
Soapbox

def reverse(s):
    s.reverse()
    return s

Generally we write functions that do not modify the contents of their arguments.

But sometimes we really want to do so, and that's perfectly fine → in-place methods

However, to clearly indicate to the caller that something is happening in-place, we should not
return the object we modified

If we don't return s in the above example, the caller will probably wonder why not?

So, in this case, the following would be a better approach:

def reverse(s):
    s.reverse()

and if we do not do an in-place reversal, then we return the reversed sequence:

def reverse(s):
    s2 = <copy of s>
    s2.reverse()
    return s2
How to copy a sequence

We can copy a sequence using a variety of methods:    s = [10, 20, 30]

Simple loop             cp = []
                        for e in s:          definitely non-Pythonic!
                            cp.append(e)

List comprehension      cp = [e for e in s]

The copy method         cp = s.copy()        (not implemented in immutable types,
                                              such as tuples or strings)

Slicing                 cp = s[0:len(s)]     or, more simply    cp = s[:]

The copy module

list()                  list_2 = list(list_1)

Note: tuple_2 = tuple(tuple_1) and t[:] do not create a new tuple!
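The copy techniques above can be checked side by side; note the tuple behavior described in the note holds in CPython, where copying an immutable sequence simply returns the same object:

```python
s = [10, 20, 30]

# all of these produce a NEW list with the same elements
copies = [
    [e for e in s],     # comprehension
    s.copy(),           # copy method
    s[:],               # slicing
    list(s),            # list() constructor
]
for cp in copies:
    assert cp == s and cp is not s

# but for tuples, these techniques just return the SAME object (CPython)
t = (10, 20, 30)
assert tuple(t) is t
assert t[:] is t
```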
Watch out when copying entire immutable sequences

l1 = [1, 2, 3]
l2 = list(l1)     l2 → [1, 2, 3]    id(l1) ≠ id(l2)

t1 = (1, 2, 3)
t2 = tuple(t1)    t2 → (1, 2, 3)    id(t1) = id(t2)    same object!

t1 = (1, 2, 3)
t2 = t1[:]        t2 → (1, 2, 3)    id(t1) = id(t2)    same object!

The same thing happens with strings, also an immutable sequence type

Since the sequence is immutable, it is actually OK to return the same sequence
Shallow Copies

Using any of the techniques above, we have obtained a copy of the original sequence

s = [10, 20, 30]
cp = s.copy()
cp[0] = 100         cp → [100, 20, 30]    s → [10, 20, 30]

Great, so now our sequence s will always be safe from unintended modifications?

Not quite…

s = [ [10, 20], [30, 40] ]
cp = s.copy()
cp[0] = 'python'    cp → ['python', [30, 40] ]     s → [ [10, 20], [30, 40] ]
cp[1][0] = 100      cp → ['python', [100, 40] ]    s → [ [10, 20], [100, 40] ]
Shallow Copies

What happened?

When we use any of the copy methods we saw a few slides ago, the copy essentially copies
all the object references from one sequence to another

s = [a, b]        id(s) → 1000     id(s[0]) → 2000     id(s[1]) → 3000
cp = s.copy()     id(cp) → 5000    id(cp[0]) → 2000    id(cp[1]) → 3000

When we made a copy of s, the sequence was copied, but its elements point to the
same memory addresses as the original sequence elements

The sequence was copied, but its elements were not

This is called a shallow copy
Shallow Copies

s = [ 1, 2 ]
cp = s.copy()

    s  (0xF100) → [ 1 (0xA100), 2 (0xA200) ]
    cp (0xF200) → [ 1 (0xA100), 2 (0xA200) ]

cp.append(3)    appends 3 (0xA300) to cp only
cp[1] = 3       replaces cp's reference at index 1 only

If the elements of s are immutable, such as the integers in this example,
then this is not really important
Shallow Copies

But, if the elements of s are mutable, then it can be important

s = [ [0, 0], [0, 0] ]
cp = s.copy()

    s[0] and cp[0] point to the same list (0xA100)
    s[1] and cp[1] point to the same list (0xA200)

cp[0][0] = 100

cp → [ [100, 0], [0, 0] ]
s  → [ [100, 0], [0, 0] ]
Deep Copies

So, if collections contain mutable elements, shallow copies are not sufficient to ensure the copy
can never be used to modify the original!

Instead, we have to do something called a deep copy.

For the previous example we might try this:

s = [ [0, 0], [0, 0] ]
cp = [e.copy() for e in s]        (a shallow copy of each element)

In this case:
    cp is a copy of s
    but also, every element of cp is a copy of the corresponding element in s
Deep Copies

But what happens if the mutable elements of s themselves contain mutable elements?

s = [ [ [0, 1], [2, 3] ], [ [4, 5], [6, 7] ] ]

We would need to make copies at least 3 levels deep to ensure a true deep copy

Deep copies, in general, tend to need a recursive approach
Deep Copies

Deep copies are not easy to do. You might even have to deal with circular references:

a = [10, 20]
b = [a, 30]
a.append(b)        a → [10, 20, b]    b → [a, 30]

If you wrote your own deep copy algorithm, you would need to handle this circular reference!
Deep Copies

In general, objects know how to make shallow copies of themselves
    built-in objects like lists, sets, and dictionaries do – they have a copy() method

The standard library copy module has generic copy and deepcopy operations
    the copy function will create a shallow copy
    the deepcopy function will create a deep copy, handling nested objects and circular
    references properly

Custom classes can implement the __copy__ and __deepcopy__ methods to allow you to
override how shallow and deep copies are made for your custom objects

We'll revisit this advanced topic of overriding deep copies of custom
classes in the OOP series of this course.
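The circular-reference case from two slides back can be handed straight to copy.deepcopy, which reproduces the cycle inside the copy rather than recursing forever:

```python
from copy import copy, deepcopy

# the circular structure from the earlier slide
a = [10, 20]
b = [a, 30]
a.append(b)                 # a references b, and b references a

cp = deepcopy(a)            # handles the circular reference without infinite recursion
assert cp is not a
assert cp[2][0] is cp       # the cycle is reproduced inside the copy...
assert cp[2][0] is not a    # ...and points at the copy, not the original

# a shallow copy, by contrast, still shares the nested objects
sh = copy(a)
assert sh[2] is b
```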
Deep Copies

Suppose we have a custom class as follows:

class MyClass:
    def __init__(self, a):
        self.a = a

from copy import copy, deepcopy

x = [10, 20]
obj = MyClass(x)          x is obj.a            → True

cp_shallow = copy(obj)    cp_shallow.a is obj.a → True

cp_deep = deepcopy(obj)   cp_deep.a is obj.a    → False    (obj.a was deep-copied)
Deep Copies

class MyClass:
    def __init__(self, a):
        self.a = a

x = MyClass(500)
y = MyClass(x)        y.a is x → True

    this is not a circular reference,
    but there is a relationship between y.a and x

lst = [x, y]
cp = deepcopy(lst)

cp[0] is x       → False
cp[1] is y       → False
cp[1].a is x     → False
cp[1].a is cp[0] → True    the relationship between cp[1].a and cp[0] is maintained!


We've used slicing in this course before, but now it's time to dive deeper into slicing

Slicing relies on indexing → it only works with sequence types

Mutable sequence types        Immutable sequence types
    extract data                  extract data
    assign data

Example:

l = [1, 2, 3, 4, 5]
l[0:2]                    → [1, 2]
l[0:2] = ('a', 'b', 'c')
l                         → ['a', 'b', 'c', 3, 4, 5]


The Slice Type

Although we usually slice sequences using the more conventional notation:    my_list[i:j]

slice definitions are actually objects → of type slice

s = slice(0, 2)        type(s) → slice
                       s.start → 0
                       s.stop  → 2

l = [1, 2, 3, 4, 5]
l[s] → [1, 2]

This can be useful because we can name slices and then use the symbol
instead of a literal subsequently

    Similar to how you can name ranges in Excel…
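Named slices in action. A sketch with a made-up fixed-width record layout (the field names and column positions are illustrative, not from the course):

```python
# naming slices, like naming ranges in Excel
record = 'john  smith 1985'      # hypothetical fixed-width record
FIRST = slice(0, 6)
LAST = slice(6, 12)
YEAR = slice(12, 16)

assert record[FIRST].strip() == 'john'
assert record[LAST].strip() == 'smith'
assert record[YEAR] == '1985'

s = slice(0, 2)
assert (s.start, s.stop) == (0, 2)    # note: the attribute is .stop, not .end
```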


Slice Start and Stop Bounds

[i:j]    start at i (including i), stop at j (excluding j)

         all integers k where i <= k < j

         also remember that indexing is zero-based

It can be convenient to think of slice bounds this way:

      a   b   c   d   e   f
    0   1   2   3   4   5   6

    [1:4] → ['b', 'c', 'd']
Effective Start and Stop Bounds

Interestingly the following works:    l = ['a', 'b', 'c', 'd', 'e', 'f']

l[3:100] → ['d', 'e', 'f']    No error!

we can specify slices that are "out of bounds"

In fact, negative indices work too:

l[-1]      → 'f'
l[-3:-1]   → ['d', 'e']

      a    b    c    d    e    f
      0    1    2    3    4    5
     -6   -5   -4   -3   -2   -1
Step Value

Slices also support a third argument – the step value (a.k.a. stride):    [i:j:k]
                                                                          slice(i, j, k)
When not specified, the step value defaults to 1

l = ['a', 'b', 'c', 'd', 'e', 'f']
      0    1    2    3    4    5
     -6   -5   -4   -3   -2   -1

l[0:6:2]       indices 0, 2, 4       → ['a', 'c', 'e']
l[1:6:3]       indices 1, 4          → ['b', 'e']
l[1:15:3]      indices 1, 4          → ['b', 'e']
l[-1:-4:-1]    indices -1, -2, -3    → ['f', 'e', 'd']
Range Equivalence

Any slice essentially defines a sequence of indices that is used to select elements of another
sequence

In fact, any indices defined by a slice can also be defined using a range

The difference is that slices are defined independently of the sequence being sliced

The equivalent range is only calculated once the length of the sequence being sliced is known

Example:    [0:100]    l a sequence of length 10 → range(0, 10)
                       l a sequence of length 6  → range(0, 6)


Transformations [i:j]

The effective indices "generated" by a slice are actually dependent on the length of the
sequence being sliced

Python does this by reducing the slice using the following rules:    seq[i:j]

l = ['a', 'b', 'c', 'd', 'e', 'f']    length = 6
      0    1    2    3    4    5
     -6   -5   -4   -3   -2   -1

if i > len(seq)      → len(seq)                 [0:100]  → range(0, 6)
if j > len(seq)      → len(seq)
if i < 0             → max(0, len(seq) + i)     [-10:3]  → range(0, 3)
if j < 0             → max(0, len(seq) + j)     [-5:3]   → range(1, 3)
i omitted or None    → 0                        [:100]   → range(0, 6)
j omitted or None    → len(seq)                 [3:]     → range(3, 6)
                                                [:]      → range(0, 6)
Transformations [i:j:k], k > 0

With extended slicing things change depending on whether k is negative or positive

[i:j:k] = {x = i + n*k | 0 <= n < (j-i)/k}

k > 0: the indices are i, i+k, i+2k, i+3k, …, < j
       stopping when j is reached or exceeded, but never including j itself

l = ['a', 'b', 'c', 'd', 'e', 'f']    length = 6
      0    1    2    3    4    5
     -6   -5   -4   -3   -2   -1

if i, j > len(seq)    → len(seq)                  [0:100:2]    → range(0, 6, 2)
if i, j < 0           → max(0, len(seq) + i/j)    [-10:100:2]  → range(0, 6, 2)
                                                  [-5:100:2]   → range(1, 6, 2)
i omitted or None     → 0                         [:6:2]       → range(0, 6, 2)
j omitted or None     → len(seq)                  [1::2]       → range(1, 6, 2)
                                                  [::2]        → range(0, 6, 2)

so the same rules as [i:j] – which makes sense, since that would be the same as [i:j:1]
Transformations [i:j:k], k < 0

[i:j:k] = {x = i + n*k | 0 <= n < (j-i)/k}

k < 0: the indices are i, i+k, i+2k, i+3k, …, > j

l = ['a', 'b', 'c', 'd', 'e', 'f']    length = 6
      0    1    2    3    4    5
     -6   -5   -4   -3   -2   -1

if i, j > len(seq)    → len(seq) - 1               [5:2:-1]    → range(5, 2, -1)
                                                   [10:2:-1]   → range(5, 2, -1)
if i, j < 0           → max(-1, len(seq) + i/j)    [5:-2:-1]   → range(5, 4, -1)
                                                   [-2:-5:-1]  → range(4, 1, -1)
                                                   [-2:-10:-1] → range(4, -1, -1)
i omitted or None     → len(seq) - 1               [:-2:-1]    → range(5, 4, -1)
j omitted or None     → -1                         [5::-1]     → range(5, -1, -1)
                                                   [::-1]      → range(5, -1, -1)
Summary

                     [i:j] and [i:j:k], k > 0    [i:j:k], k < 0

i > len(seq)         len(seq)                    len(seq) - 1
j > len(seq)         len(seq)                    len(seq) - 1
i < 0                max(0, len(seq) + i)        max(-1, len(seq) + i)
j < 0                max(0, len(seq) + j)        max(-1, len(seq) + j)
i omitted / None     0                           len(seq) - 1
j omitted / None     len(seq)                    -1
Examples

l = ['a', 'b', 'c', 'd', 'e', 'f']    length = 6
      0    1    2    3    4    5
     -6   -5   -4   -3   -2   -1

[-10:10:1]    -10 → 0
               10 → 6
              → range(0, 6)

[10:-10:-1]    10 → 5
              -10 → max(-1, 6 - 10) → max(-1, -4) → -1
              → range(5, -1, -1)

We can of course easily define empty slices!

[3:-1:-1]      3 → 3
              -1 → max(-1, 6 - 1) → 5
              → range(3, 5, -1)    (an empty range: start < stop with a negative step)
Example

seq = a sequence of length 6

seq[::-1]    i is omitted → len(seq) - 1 → 5
             j is omitted → -1

             → range(5, -1, -1) → 5, 4, 3, 2, 1, 0

seq = 'python'
seq[::-1] → 'nohtyp'
If you get confused…

The slice object has a method, indices, that returns the equivalent range start/stop/step
for any slice, given the length of the sequence being sliced:

slice(start, stop, step).indices(length) → (start, stop, step)

the values in this tuple can be used to generate a list of indices using the range function

slice(10, -5, -1) with a sequence of length 6:

    i = 10 > 6    → 6 - 1 → 5
    j = -5 < 0    → max(-1, 6 + -5) → max(-1, 1) → 1

    → range(5, 1, -1) → 5, 4, 3, 2

slice(10, -5, -1).indices(6) → (5, 1, -1)

list(range(*slice(10, -5, -1).indices(6))) → [5, 4, 3, 2]
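The worked reduction above can be confirmed directly with slice.indices, and cross-checked against what slicing actually returns:

```python
# reducing a slice to an equivalent range, given a sequence length
sl = slice(10, -5, -1)

assert sl.indices(6) == (5, 1, -1)
assert list(range(*sl.indices(6))) == [5, 4, 3, 2]

# sanity check against actual slicing behavior
l = ['a', 'b', 'c', 'd', 'e', 'f']
assert l[10:-5:-1] == [l[i] for i in range(*sl.indices(6))]
assert l[10:-5:-1] == ['f', 'e', 'd', 'c']
```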
Creating our own Sequence types

We will cover Abstract Base Classes later in this course, so we'll revisit this topic again

At its most basic, an immutable sequence type should support two things:

    returning the length of the sequence    (technically, we don't even really need that!)

    given an index, returning the element at that index

If an object provides this functionality, then we should in theory be able to:

    retrieve elements by index using square brackets []

    iterate through the elements using Python's native looping mechanisms
        e.g. for loops, comprehensions


How Python does it

Remember that sequence types are iterables, but not all iterables are sequence types

Sequence types, at a minimum, implement the following methods:

    __len__        __getitem__

At its most basic, the __getitem__ method takes in a single integer argument – the index

However, it may also choose to handle a slice type argument

So how does this help when iterating over the elements of a sequence?
The __getitem__ method

The __getitem__ method should return an element of the sequence based on the specified index,
or raise an IndexError exception if the index is out of bounds

(it may, but does not have to, support negative indices and slicing)

Python's list object implements the __getitem__ method:

my_list = ['a', 'b', 'c', 'd', 'e', 'f']

my_list.__getitem__(0)     → 'a'
my_list.__getitem__(1)     → 'b'
my_list.__getitem__(-1)    → 'f'

my_list.__getitem__(slice(None, None, -1))    → ['f', 'e', 'd', 'c', 'b', 'a']


The __getitem__ method

But if we specify an index that is out of bounds:

my_list.__getitem__(100)     → IndexError
my_list.__getitem__(-100)    → IndexError

All we really need from this __getitem__ method is the ability to:

    return an element for a valid index
    raise an IndexError exception for an invalid index

Also remember that sequence indices start at 0
    i.e. we always know the index of the first element of the sequence
Implementing a for loop

So now we know:

    sequence indexing starts at 0
    __getitem__(i) will return the element at index i
    __getitem__(i) will raise an IndexError exception when i is out of bounds

my_list = [0, 1, 2, 3, 4, 5]

for item in my_list:
    print(item ** 2)

is essentially equivalent to:

index = 0
while True:
    try:
        item = my_list.__getitem__(index)
    except IndexError:
        break
    print(item ** 2)
    index += 1

The point is that if the object implements __getitem__,
we can iterate through it using a for loop, or even a comprehension
The __len__ Method

In general, sequence types support the Python built-in function len()

To support this, all we need to do is implement the __len__ method in our custom sequence type

my_list = [0, 1, 2, 3, 4, 5]

len(my_list)         → 6
my_list.__len__()    → 6
Writing our own Custom Sequence Type

to implement our own custom sequence type we should then implement:

    __len__
    __getitem__

At the very least, __getitem__ should:

    return an element for a valid index [0, length-1]
    raise an IndexError exception if the index is out of bounds

Additionally we can choose to support:

    negative indices    i < 0 → i = length + i
    slicing             handle slice objects as an argument to __getitem__
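Putting the protocol together, a minimal sketch of a custom sequence (the class name Squares is illustrative, not from the course). It implements only __len__ and __getitem__, plus optional negative-index support, and iteration works automatically:

```python
# a minimal custom sequence: squares of the first n integers
class Squares:
    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, i):
        if i < 0:                       # optional: support negative indices
            i = self._n + i
        if i < 0 or i >= self._n:
            raise IndexError('index out of bounds')
        return i ** 2

sq = Squares(5)
assert len(sq) == 5
assert sq[2] == 4
assert sq[-1] == 16
assert [x for x in sq] == [0, 1, 4, 9, 16]   # iteration works via __getitem__
```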
Concatenation +
Let's use Python's list as an example

We can concatenate two lists together by using the + operator

This will create a new list combining the elements of both lists

l1 = [1, 2, 3]    id(l1) = 0xFFF100
l2 = [4, 5, 6]    id(l2) = 0xFFF200

l1 = l1 + l2    → [1, 2, 3, 4, 5, 6]    id(l1) = 0xFFF300
In-Place Concatenation +=

Recall that for numbers I have said many times that
a = a + 10 and a += 10 meant the same thing?

That's true for numbers… but not in general!

    it's true for numbers, strings, tuples → in general, true for immutable types
    but not for lists!

l1 = [1, 2, 3]    id(l1) = 0xFFF100
l2 = [4, 5, 6]    id(l2) = 0xFFF200

l1 += l2    → [1, 2, 3, 4, 5, 6]    id(l1) = 0xFFF100    the list was mutated


In-Place Concatenation +=

For immutable types, such as numbers, strings, and tuples, the behavior is different

t += t1 has the same effect as t = t + t1

Since t is immutable, += does NOT perform in-place concatenation

Instead it creates a new tuple that concatenates the two tuples and returns the new object

t1 = (1, 2, 3)    id(t1) → 0xFFF100
t2 = (4, 5, 6)    id(t2) → 0xFFF200

t1 += t2 → (1, 2, 3, 4, 5, 6)    id(t1) → 0xFFF300
In-Place Repetition *=

Similar results hold for the * and *= operators

l1 = [1, 2, 3]    id(l1) → 0xFFF100

l1 = l1 * 2 → [1, 2, 3, 1, 2, 3]    id(l1) → 0xFFF200

But the in-place repetition operator works this way:

l1 = [1, 2, 3]    id(l1) → 0xFFF100

l1 *= 2 → [1, 2, 3, 1, 2, 3]    id(l1) → 0xFFF100

the list was mutated
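The difference between rebinding and mutation is easy to verify yourself with id(). A quick sketch:

```python
l1 = [1, 2, 3]
original_id = id(l1)

l1 = l1 + [4, 5, 6]        # concatenation: builds a NEW list
assert id(l1) != original_id

l1 = [1, 2, 3]
original_id = id(l1)

l1 += [4, 5, 6]            # in-place concatenation: mutates the SAME list
assert id(l1) == original_id

l1 *= 2                    # in-place repetition: also mutates in place
assert id(l1) == original_id
assert l1 == [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]

t1 = (1, 2, 3)
original_id = id(t1)
t1 += (4, 5, 6)            # tuples are immutable: += creates a new object
assert id(t1) != original_id
```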
Assigning Values via Indexes, Slices and Extended Slices

We have seen how we can extract elements from a sequence by using indexing, slicing, and
extended slicing

[i]
[i:j]      slice(i, j)
[i:j:k]    slice(i, j, k)    k ≠ 1    (if k = 1 then it's just a standard slice)

Mutable sequences support assignment via a specific index

and they also support assignment via slices

The value being assigned via slicing and extended slicing must be an iterable
(any iterable, not just a sequence type)
Replacing a Slice

A slice can be replaced with another iterable

For regular slices (non-extended), the slice and the iterable need not be the same length

l = [1, 2, 3, 4, 5]    l[1:3] → [2, 3]

l[1:3] = (10, 20, 30)    l → [1, 10, 20, 30, 4, 5]

The list l was mutated → id(l) did not change

With extended slicing, the extended slice and the iterable must have the same length

l = [1, 2, 3, 4, 5]    l[0:4:2] → [1, 3]

l[0:4:2] = [10, 30]    l → [10, 2, 30, 4, 5]

The list l was mutated
Deleting a Slice

Deletion is really just a special case of replacement

We simply assign an empty iterable → works for standard slicing only
(extended slicing replacement needs the same length)

l = [1, 2, 3, 4, 5]    l[1:3] → [2, 3]

l[1:3] = []    l → [1, 4, 5]

The list l was mutated
Insertions using Slices

We can also insert elements using slice assignment

The trick here is that the slice must be empty
otherwise it would just replace the elements in the slice

l = [1, 2, 3, 4, 5]    l[1:1] → []

l[1:1] = 'abc'    l → [1, 'a', 'b', 'c', 2, 3, 4, 5]

The list l was mutated

Obviously this will also not work with extended slices

extended slice assignment requires both lengths to be the same
but for insertion we need the slice to be empty,
and the iterable to have some values
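The four slice-assignment cases from the last few slides can be checked in one short runnable sketch:

```python
l = [1, 2, 3, 4, 5]
l[1:3] = (10, 20, 30)          # replace a slice (lengths may differ)
assert l == [1, 10, 20, 30, 4, 5]

l = [1, 2, 3, 4, 5]
l[0:4:2] = [10, 30]            # extended slice: lengths must match
assert l == [10, 2, 30, 4, 5]

l = [1, 2, 3, 4, 5]
l[1:3] = []                    # delete a slice by assigning an empty iterable
assert l == [1, 4, 5]

l = [1, 2, 3, 4, 5]
l[1:1] = 'abc'                 # insert: assign any iterable to an empty slice
assert l == [1, 'a', 'b', 'c', 2, 3, 4, 5]
```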
Concatenation and In-Place Concatenation

When dealing with the + and += operators in the context of sequences
we usually expect them to mean concatenation

But essentially, it is just an overloaded definition of these operators

We can overload the definition of these operators in our custom classes by using the methods:

__add__    __iadd__

In general (but not necessarily), we expect:

obj1 + obj2     → obj1 and obj2 are of the same type
                → result is a new object also of the same type

obj1 += obj2    → obj2 is any iterable
                → result is the original obj1 memory reference
                  (i.e. obj1 was mutated)
Repetition and In-Place Repetition

When dealing with the * and *= operators in the context of sequences
we usually expect them to mean repetition

But essentially, it is just an overloaded definition of these operators

We can overload the definition of these operators in our custom classes by using the methods:

__mul__    __imul__

In general (but not necessarily), we expect:

obj1 * n     → n is a non-negative integer
             → result is a new object of the same type as obj1

obj1 *= n    → n is a non-negative integer
             → result is the original obj1 memory reference
               (i.e. obj1 was mutated)
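A minimal sketch of a custom class following these conventions (the `MyList` name and its internal `_items` list are just for illustration; `__add__` here assumes `other` is also a `MyList`):

```python
class MyList:
    def __init__(self, items=None):
        self._items = list(items) if items is not None else []

    def __add__(self, other):
        # new object of the same type, combining both collections
        return MyList(self._items + other._items)

    def __iadd__(self, other):
        # mutate self with ANY iterable, and return the same object
        self._items.extend(other)
        return self

    def __mul__(self, n):
        return MyList(self._items * n)

    def __imul__(self, n):
        self._items *= n
        return self

l1 = MyList([1, 2])
l2 = MyList([3, 4])
l3 = l1 + l2        # a new MyList([1, 2, 3, 4])
l1 += [5, 6]        # mutates l1 in place -- any iterable works
```

Note that `__iadd__` and `__imul__` must `return self` — Python rebinds the name on the left of `+=` to whatever the method returns.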
Assignment

We saw in an earlier lecture how we can implement accessing elements in a custom sequence type

__getitem__    → seq[n]
               → seq[i:j]
               → seq[i:j:k]

We can handle assignments in a very similar way, by implementing

__setitem__

There are a few restrictions with assigning to slices that we have already seen (at least with lists):

For any slice we can only assign an iterable

For extended slices only, both the slice and the iterable must have the same length

Of course, since we are implementing __setitem__ ourselves, we
could technically make it do whatever we want!
Additional Sequence Functions and Operators

There are other operators and functions we can support:

__contains__    in
__delitem__     del
__rmul__        n * seq

The way Python works is that when it encounters an expression such as:

a + b    a * b

it first tries:

a.__add__(b)    a.__mul__(b)

if a does not support the operation (TypeError), it then tries:

b.__radd__(a)    b.__rmul__(a)
Implementing append, extend, pop

Actually there's nothing special going on here.

If we want to, we can just implement methods of the same name (not special methods)

and they can just behave the same way as we have seen for lists, for example
Sorting and Sort Keys

Sorting a sequence of numbers is something easily understood

But we do have to consider the direction of the sort: ascending or descending

Python provides a sorted() function that will sort a given iterable

The default sort direction is ascending

The sorted() function has an optional keyword-only argument
called reverse which defaults to False

If we set it to True, then the sort will be in descending order

But one really important thing we need to think about: ordering

→ obvious when sorting real numbers
Sorting and Sort Keys

What about non-numerical values?

'a', 'b', 'c'
'A', 'a', 'B', 'b', 'C', 'c'

strings are comparable so this is still OK
although is 'a' < 'A' or 'a' > 'A' or 'a' == 'A'?

'hello', 'python', 'bird', 'parrot'

(0, 0) (1, 1) (2, 2)
(0, 0) (0, 1) (1, 0)

tuples are comparable too: (0, 0) < (1, 1) → True

rectangle_1, rectangle_2, rectangle_3

When items are pairwise comparable (< or >)
we can use that ordering to sort

but what happens when they are not? → Sort Keys
Sorting and Sort Keys

'b', 'x', 'a'    ASCII character codes:    'a' → 97
                                           'b' → 98
                                           'x' → 120    ord('a') → 97

We now associate the ASCII numerical value with each character, and sort based on that value

items:  'b'  'x'  'a'        →    'a'  'b'  'x'
keys:    98  120   97              97   98  120

items:  'B'  'b'  'A'  'a'  'X'  'x'  '1'  '?'
keys:    66   98   65   97   88  120   49   63

     →  '1'  '?'  'A'  'B'  'X'  'a'  'b'  'x'
         49   63   65   66   88   97   98  120

You'll note that the sort keys have a natural sort order
Sorting and Sort Keys

Let's say we want to sort a list of Person objects based on their age
(assumes the Person class has an age property)

p1.age → 30
p2.age → 15
p3.age → 5
p4.age → 32

items:  p1  p2  p3  p4        →    p3  p2  p1  p4
keys:   30  15   5  32              5  15  30  32

We could also generate the key value, for any given person, using a function

def key(p):
    return p.age

key = lambda p: p.age

sort [p1, p2, p3, p4] using sort keys generated by the function key = lambda p: p.age
Sorting and Sort Keys

The sort keys need not be numerical → they just need to have a natural sort order (< or >)

items:  'hello'  'python'  'parrot'  'bird'
keys:     'o'      'n'       't'      'd'      ← last character of each string

     →  'bird'  'python'  'hello'  'parrot'
          'd'     'n'       'o'      't'

key = lambda s: s[-1]
Python's sorted function

That's exactly what Python's sorted function allows us to do

Optional keyword-only argument called key

if provided, key must be a function that, for any given element in the sequence being sorted,
returns the sort key

The sort key does not have to be numerical → it just needs to be a value that is itself
pairwise comparable (using < or >)

If key is not provided, then Python will sort based on the natural ordering of the elements
i.e. they must be pairwise comparable (<, >)

If the elements are not pairwise comparable, you will get an exception
Python's sorted function

sorted(iterable, key=None, reverse=False)    (key and reverse are keyword-only)

The sorted function:

• makes a copy of the iterable
• returns the sorted elements in a list
• uses a sort algorithm called TimSort → named after Tim Peters
  (Python 2.3, 2002)    https://en.wikipedia.org/wiki/Timsort
• is a stable sort

Side note: for the "natural" sort of elements, we can always think of the keys as the elements
themselves

sorted(iterable) ↔ sorted(iterable, key=lambda x: x)
Stable Sorts

A stable sort is one that maintains the relative order of items that have equal keys
(or values if using natural ordering)

p1.age → 30
p2.age → 15
p3.age → 5
p4.age → 32
p5.age → 15

sorted((p1, p2, p3, p4, p5), key=lambda p: p.age)

→ [p3, p2, p5, p1, p4]

p2 and p5 have equal keys

p2 preceded p5 in the original tuple
→ p2 precedes p5 in the sorted list


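The key argument and stability can both be seen in a short runnable sketch (the `Person` namedtuple stands in for the Person class assumed above):

```python
from collections import namedtuple

Person = namedtuple('Person', 'name age')

people = [Person('p1', 30), Person('p2', 15), Person('p3', 5),
          Person('p4', 32), Person('p5', 15)]

# sort by age; TimSort is stable, so p2 stays ahead of p5 (equal keys)
by_age = sorted(people, key=lambda p: p.age)
assert [p.name for p in by_age] == ['p3', 'p2', 'p5', 'p1', 'p4']

# descending order via the reverse keyword-only argument
by_age_desc = sorted(people, key=lambda p: p.age, reverse=True)
assert by_age_desc[0].age == 32

# non-numerical keys work too: sort strings by their last character
words = ['hello', 'python', 'parrot', 'bird']
assert sorted(words, key=lambda s: s[-1]) == ['bird', 'python', 'hello', 'parrot']
```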
In-Place Sorting

If the iterable is mutable, in-place sorting is possible

But that will depend on the particular type you are dealing with

Python's list objects support in-place sorting

The list class has a sort() instance method that does in-place sorting

l = [10, 5, 3, 2]    id(l) → 0xFF42

l.sort()

l → [2, 3, 5, 10]    id(l) → 0xFF42

Compared to sorted():

• same TimSort algorithm
• same keyword-only arg: key
• same keyword-only arg: reverse (default is False)
• in-place sorting, does not copy the data
• only works on lists (it's a method in the list class)
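The difference between the two is easy to demonstrate — note also that list.sort() returns None, a common source of bugs:

```python
l = [10, 5, 3, 2]
original_id = id(l)

result = l.sort()              # in-place: mutates l, returns None
assert result is None
assert l == [2, 3, 5, 10]
assert id(l) == original_id    # same list object

l = [10, 5, 3, 2]
new_list = sorted(l)           # copy: l is untouched, a new list is returned
assert l == [10, 5, 3, 2]
assert new_list == [2, 3, 5, 10]
assert new_list is not l
```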
Quick Recap

You should already know what list comprehensions are, but let's quickly recap their syntax
and how they work:

goal → generate a list by transforming, and optionally filtering, another iterable

• start with some iterable                       other_list = ['this', 'is', 'a', 'parrot']

• create an empty new list                       new_list = []

• iterate over the original iterable             for item in other_list:

• skip over certain values (filter)                  if len(item) > 2:

• transform the value and append to the new list         new_list.append(item[::-1])

List comprehension:

new_list = [item[::-1] for item in other_list if len(item) > 2]
           transformation       iteration          filter
Formatting the Comprehension Expression

If the comprehension expression gets too long, it can be split over multiple lines

For example, let's say we want to create a list of the squares of all the integers
between 1 and 100 that are not divisible by 2, 3 or 5

sq = [i**2 for i in range(1, 101) if i%2 and i%3 and i%5]

We could write this over multiple lines:

sq = [i**2
      for i in range(1, 101)
      if i%2 and i%3 and i%5]
Comprehension Internals

Comprehensions have their own local scope – just like a function

We should think of a list comprehension as being wrapped in a function, created by
Python, that will return the new list when executed

sq = [i**2 for i in range(10)]    (the RHS is the comprehension expression)

When the RHS is compiled, Python creates a temporary function
that will be used to evaluate the comprehension:

def temp():
    new_list = []
    for i in range(10):
        new_list.append(i**2)
    return new_list

When the line is executed:

• executes temp()
• stores the returned object (the list) in memory
• points sq to that object

We'll disassemble some Python code in the coding video to actually see this
Comprehension Scopes

So comprehensions are basically functions

They have their own local scope:    [item ** 2 for item in range(100)]    (item is a local symbol)

But they can access global variables:

# module1.py
num = 100                                  (num is a global symbol)
sq = [item**2 for item in range(num)]      (item is a local symbol)

As well as nonlocal variables:

def my_func(num):                          (num is a nonlocal symbol)
    sq = [item**2 for item in range(num)]

Closures!!
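Since a comprehension is compiled as an implicit function, a comprehension that references an enclosing variable really is a closure, and its loop variable stays local. A small sketch (the `make_squares` name is just for illustration):

```python
def make_squares(num):
    # `num` is a free variable inside the comprehension's implicit function
    return [item ** 2 for item in range(num)]

assert make_squares(5) == [0, 1, 4, 9, 16]

# the comprehension's loop variable does NOT leak into the enclosing scope
item = 'untouched'
sq = [item ** 2 for item in range(3)]
assert item == 'untouched'
```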
Nested Comprehensions

Comprehensions can be nested within each other

And since they are functions, a nested comprehension can access (nonlocal) variables from the
enclosing comprehension!

[[i * j for j in range(5)] for i in range(5)]

outer comprehension → local variable: i

nested comprehension → local variable: j, free variable: i → a closure!
Nested Loops in Comprehensions

We can have nested loops (as many levels as we want) in comprehensions.

This is not the same as nested comprehensions

l = []
for i in range(5):
    for j in range(5):
        for k in range(5):
            l.append((i, j, k))

l = [(i, j, k) for i in range(5) for j in range(5) for k in range(5)]

Note that the order in which the for loops are specified in the comprehension
corresponds to the order of the nested loops
Nested Loops in Comprehensions

Nested loops in comprehensions can also contain if statements

Again, the order of the for and if statements does matter, just like a normal set of for
loops and if statements

This works:

l = []
for i in range(5):
    for j in range(5):
        if i == j:
            l.append((i, j))

(j is created before it is referenced)

l = [(i, j) for i in range(5) for j in range(5) if i == j]

But this won't work:

l = []
for i in range(5):
    if i == j:
        for j in range(5):
            l.append((i, j))

(j is referenced before it has been created)

l = [(i, j) for i in range(5) if i == j for j in range(5)]    ← won't work!
Nested Loops in Comprehensions

l = []
for i in range(1, 6):
    if i%2 == 0:
        for j in range(1, 6):
            if j%3 == 0:
                l.append((i, j))

[(i, j)
 for i in range(1, 6) if i%2==0
 for j in range(1, 6) if j%3==0]

l = []
for i in range(1, 6):
    for j in range(1, 6):
        if i%2==0:
            if j%3 == 0:
                l.append((i, j))

[(i, j)
 for i in range(1, 6)
 for j in range(1, 6)
 if i%2==0
 if j%3==0]

l = []
for i in range(1, 6):
    for j in range(1, 6):
        if i%2==0 and j%3==0:
            l.append((i, j))

[(i, j)
 for i in range(1, 6)
 for j in range(1, 6)
 if i%2==0 and j%3==0]
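All three forms above produce the same list, which is easy to verify by running the loop version and the comprehensions side by side:

```python
# loop version
l_loop = []
for i in range(1, 6):
    if i % 2 == 0:
        for j in range(1, 6):
            if j % 3 == 0:
                l_loop.append((i, j))

# equivalent comprehension: for/if clauses appear in the same order as the loops
l_comp = [(i, j)
          for i in range(1, 6) if i % 2 == 0
          for j in range(1, 6) if j % 3 == 0]

assert l_loop == l_comp == [(2, 3), (4, 3)]

# combining the conditions with `and` gives the same result
l_and = [(i, j)
         for i in range(1, 6)
         for j in range(1, 6)
         if i % 2 == 0 and j % 3 == 0]
assert l_and == l_comp
```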
Background Information

A regular strictly convex polygon is a polygon that has the following characteristics:

• all interior angles are less than 180°
• all sides have equal length

(diagram: vertex, edge (side), circumradius, apothem, interior angle)

For a regular strictly convex polygon with:

• n edges (= n vertices)
• R circumradius

interior angle = (n − 2) × 180/n    (degrees)

edge length    s = 2 R sin(π/n)

apothem        a = R cos(π/n)

area           = ½ n s a

perimeter      = n s
Goal 1

Create a Polygon class:

Initializer
• number of edges/vertices
• circumradius

Properties
• # edges
• # vertices
• interior angle
• edge length
• apothem
• area
• perimeter

Functionality
• a proper representation (__repr__)
• implements equality (==) based on # vertices and circumradius (__eq__)
• implements > based on number of vertices only (__gt__)
Goal 2

Implement a Polygons sequence type:

Initializer
• number of vertices for largest polygon in the sequence
• common circumradius for all polygons

Properties
• max efficiency polygon: returns the Polygon with the highest area : perimeter ratio

Functionality
• functions as a sequence type (__getitem__)
• supports the len() function (__len__)
• has a proper representation (__repr__)
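As a starting point for Goal 1, here is a minimal sketch that applies the formulas from the background slides (property names are one possible choice, not the required solution):

```python
import math

class Polygon:
    """Regular strictly convex polygon with n vertices and circumradius R."""
    def __init__(self, n, R):
        if n < 3:
            raise ValueError('Polygon must have at least 3 vertices')
        self.n = n
        self.R = R

    @property
    def interior_angle(self):
        return (self.n - 2) * 180 / self.n

    @property
    def side_length(self):
        return 2 * self.R * math.sin(math.pi / self.n)

    @property
    def apothem(self):
        return self.R * math.cos(math.pi / self.n)

    @property
    def area(self):
        return self.n / 2 * self.side_length * self.apothem

    @property
    def perimeter(self):
        return self.n * self.side_length

    def __repr__(self):
        return f'Polygon(n={self.n}, R={self.R})'

    def __eq__(self, other):
        return (self.n, self.R) == (other.n, other.R)

    def __gt__(self, other):
        return self.n > other.n

# sanity check: a square with circumradius 1 has side sqrt(2) and area 2
sq = Polygon(4, 1)
```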
What is an iterable?    Something fit for iterating over
→ we'll see a more formal definition for Python's iterable protocol

Already seen: Sequences and iteration

More general concept of iteration

Iterators → get next item, no indexes needed
          → consumables

Iterables

Consuming iterators manually

Relationship between sequence types and iterators

Infinite Iterables

Lazy Evaluation

Iterator Delegation
Iterating Sequences

We saw that in the last section → __getitem__

→ assumed indexing started at 0
→ iteration: __getitem__(0), __getitem__(1), etc

But iteration can be more general than based on sequential indexing

All we need is:

a bucket of items → collection, container

get next item → no concept of ordering needed
→ just a way to get items out of the container one by one

(a specific order in which this happens is not required – but can be)
Example: Sets

Sets are unordered collections of items    s = {'x', 'y', 'b', 'c', 'a'}

Sets are not indexable:    s[0]

→ TypeError – 'set' object does not support indexing

But sets are iterable:

for item in s:        → x y c b a    (in some arbitrary order)
    print(item)

Note that we have no idea of the order in which the elements are returned in the iteration
The concept of next

For general iteration, all we really need is the concept of "get the next item" in the collection

If a collection object implements a get_next_item method

we can get elements out of the collection, one after the other, this way:

get_next_item()
get_next_item()
get_next_item()

and we could iterate over the collection as follows:

for _ in range(10):
    item = coll.get_next_item()
    print(item)

But how do we know when to stop asking for the next item?
i.e. when all the elements of the collection have been returned by calling get_next_item()?

→ StopIteration built-in Exception
Attempting to build an Iterable ourselves

Let's try building our own class, which will be a collection of squares of integers

We could make this a sequence, but we want to avoid the concept of indexing

In order to implement a next method, we need to know what we've already "handed out"
so we can hand out the "next" item without repeating ourselves

class Squares:
    def __init__(self):
        self.i = 0

    def next_(self):
        result = self.i ** 2
        self.i += 1
        return result
Iterating over Squares

class Squares:
    def __init__(self):
        self.i = 0

    def next_(self):
        result = self.i ** 2
        self.i += 1
        return result

sq = Squares()

for _ in range(5):         0
    item = sq.next_()      1
    print(item)       →    4
                           9
                           16

There are a few issues:

→ the collection is essentially infinite
→ cannot use a for loop, comprehension, etc
→ we cannot restart the iteration "from the beginning"
Refining the Squares Class

We first tackle the idea of making the collection finite:

• we specify the size of the collection when we create the instance
• we raise a StopIteration exception if next_ has been called too many times

Before:

class Squares:
    def __init__(self):
        self.i = 0

    def next_(self):
        result = self.i ** 2
        self.i += 1
        return result

After:

class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length

    def next_(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result
Iterating over Squares instances

class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length

    def next_(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result

sq = Squares(5)                create a collection of length 5

while True:                    start an infinite loop
    try:
        item = sq.next_()      try getting the next item
        print(item)
    except StopIteration:      catch the StopIteration exception → nothing left to iterate
        break                  break out of the infinite while loop – we're done iterating

Output: 0
        1
        4
        9
        16
Python's next() function

Remember Python's len() function? We could implement that function for our custom type by
implementing the special method: __len__

Python has a built-in function: next()

We can implement that function for our custom type by
implementing the special method: __next__

class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result
Iterating over Squares instances

sq = Squares(5)
while True:
    try:
        item = next(sq)
        print(item)
    except StopIteration:
        break

Output: 0
        1
        4
        9
        16

We still have some issues:

• cannot iterate using for loops, comprehensions, etc
• once the iteration starts we have no way of re-starting it
• and once all the items have been iterated (using next) the
  object becomes useless for iteration → exhausted
Where we're at so far…

We created a custom container type object with a __next__ method

But it had several drawbacks:

→ cannot use a for loop
→ once we start using next there's no going back
→ once we have reached StopIteration we're basically done with the object

Let's tackle the loop issue first

We saw how to iterate using __next__, StopIteration, and a while loop

This is actually how Python handles for loops in general

Somehow, we need to tell Python that our class has that __next__
method and that it will behave in a way consistent with using a
while loop to iterate

Python knows we have __next__, but how does it know we implement
StopIteration?
The iterator Protocol

A protocol is simply a fancy way of saying that our class is going to implement certain
functionality that Python can count on

To let Python know our class can be iterated over using __next__ we implement the iterator protocol

The iterator protocol is quite simple – the class needs to implement two methods:

→ __iter__    this method just returns the object (class instance) itself
              sounds weird, but we'll understand why later

→ __next__    this method is responsible for handing back the next
              element from the collection and raising the
              StopIteration exception when all elements have been handed out

An object that implements these two methods is called an iterator
Iterators

An iterator is therefore an object that implements:

__iter__    → just returns the object itself
__next__    → returns the next item from the container, or raises StopIteration

If an object is an iterator, we can use it with for loops, comprehensions, etc

Python will know how to loop (iterate) over such an object
(basically using the same while loop technique we used)
Example

Let's go back to our Squares example, and make it into an iterator:

class Squares:
    def __init__(self, length):
        self.i = 0
        self.length = length

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result

    def __iter__(self):
        return self

sq = Squares(5)          0
                         1
for item in sq:     →    4
    print(item)          9
                         16

Still one issue though!

The iterator cannot be "restarted"
Once we have looped through all the items
the iterator has been exhausted

To loop a second time through the collection we have to create a new
instance and loop through that
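Assembling the slide into one runnable sketch also makes the exhaustion problem visible:

```python
class Squares:
    """An iterator over the first `length` perfect squares."""
    def __init__(self, length):
        self.i = 0
        self.length = length

    def __iter__(self):
        return self          # iterators return themselves

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        result = self.i ** 2
        self.i += 1
        return result

sq = Squares(5)
assert list(sq) == [0, 1, 4, 9, 16]   # for loops / list() now work

# but the iterator is exhausted -- a second pass yields nothing
assert list(sq) == []
```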
Iterators

We saw that an iterator is an object that implements

__iter__ → returns the object itself
__next__ → returns the next element

The drawback is that iterators get exhausted

→ they become useless for iterating again
→ they become throw-away objects

But there are two distinct things going on:

maintaining the collection of items (the container)    (e.g. creating, mutating (if mutable), etc)

iterating over the collection

Why should we have to re-create the collection of items just to
iterate over them?
Separating the Collection from the Iterator

Instead, we would prefer to separate these two

Maintaining the data of the collection should be one object

Iterating over the data should be a separate object → iterator

That object is throw-away → but we don't throw away the collection

The collection is iterable

but the iterator is responsible for iterating over the collection

The iterable is created once

The iterator is created every time we need to start a fresh iteration
Example

class Cities:
    def __init__(self):
        self._cities = ['Paris', 'Berlin', 'Rome', 'London']
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            item = self._cities[self._index]
            self._index += 1
            return item

Cities instances are iterators

Every time we want to run a new loop, we have to create a new
instance of Cities

This is wasteful, because we should not have to re-create the _cities
list every time
Example

So, let's separate the object that maintains the cities from the iterator itself

class Cities:
    def __init__(self):
        self._cities = ['New York', 'New Delhi', 'Newcastle']

    def __len__(self):
        return len(self._cities)

class CityIterator:
    def __init__(self, cities):
        self._cities = cities
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            etc…
Example

To use Cities and CityIterator together, here's how we would proceed:

cities = Cities()                         create an instance of the container object

city_iterator = CityIterator(cities)      create a new iterator – but see how we pass in the
                                          existing cities instance

for city in city_iterator:                we can now use the iterator to iterate
    print(city)

At this point, the city_iterator is exhausted

If we want to re-iterate over the collection, we need to create a new one

city_iterator = CityIterator(cities)
for city in city_iterator:
    print(city)

But this time, we did not have to re-create the collection – we just
passed in the existing one!
So far…

At this point we have:

a container that maintains the collection of items

a separate object, the iterator, used to iterate over the collection

So we can iterate over the collection as many times as we want

we just have to remember to create a new iterator every time

It would be nice if we did not have to do that manually every time

and if we could just iterate over the Cities object instead of the CityIterator

This is where the formal definition of a Python iterable comes in…
Iterables

An iterable is a Python object that implements the iterable protocol

The iterable protocol requires that the object implement a single method:

__iter__    returns a new instance of the iterator object
            used to iterate over the iterable

class Cities:
    def __init__(self):
        self._cities = ['New York', 'New Delhi', 'Newcastle']

    def __len__(self):
        return len(self._cities)

    def __iter__(self):
        return CityIterator(self)
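Assembling the two classes from the last few slides into one runnable sketch (as in the slides, the iterator reaches into the container's `_cities` list, and the container hands out a fresh iterator on every pass):

```python
class CityIterator:
    """Iterator over a Cities collection -- a throw-away object."""
    def __init__(self, cities):
        self._cities = cities
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        item = self._cities._cities[self._index]  # reach into the container's list
        self._index += 1
        return item

class Cities:
    """Iterable container -- creates a fresh iterator on every __iter__ call."""
    def __init__(self):
        self._cities = ['New York', 'New Delhi', 'Newcastle']

    def __len__(self):
        return len(self._cities)

    def __iter__(self):
        return CityIterator(self)

cities = Cities()
assert list(cities) == ['New York', 'New Delhi', 'Newcastle']
assert list(cities) == ['New York', 'New Delhi', 'Newcastle']  # re-iterable!
```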
Iterable vs Iterator

An iterable is an object that implements

__iter__ → returns an iterator (in general, a new instance)

An iterator is an object that implements

__iter__ → returns itself (an iterator) (not a new instance)

__next__ → returns the next element

So iterators are themselves iterables

but they are iterables that become exhausted

Iterables, on the other hand, never become exhausted
because they always return a new iterator that is then used to iterate
Iterating over an iterable

Python has a built-in function iter()

It calls the __iter__ method
(we'll actually come back to this for sequences!)

The first thing Python does when we try to iterate over an object:

it calls iter() to obtain an iterator

then it starts iterating (using next, StopIteration, etc)
using the iterator returned by iter()
Lazy Evaluation

This is often used in class properties

properties of classes may not always be populated when the object is created

the value of a property only becomes known when the property is requested – deferred

Example

class Actor:
    def __init__(self, actor_id):
        self.actor_id = actor_id
        self.bio = lookup_actor_in_db(actor_id)
        self._movies = None

    @property
    def movies(self):
        if self._movies is None:
            self._movies = lookup_movies_in_db(self.actor_id)
        return self._movies
Application to Iterables

We can apply the same concept to certain iterables

We do not calculate the next item in an iterable until it is actually requested

Example

iterable → Factorial(n)

will return factorials of consecutive integers from 0 to n-1

do not pre-compute all the factorials

wait until next requests one, then calculate it

This is a form of lazy evaluation
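A minimal sketch of such a lazy iterator (the `FactIter` name is just for illustration): each factorial is computed only at the moment `next` asks for it.

```python
import math

class FactIter:
    """Lazily yields 0!, 1!, ..., (n-1)! -- nothing is pre-computed."""
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        result = math.factorial(self.i)   # computed only when requested
        self.i += 1
        return result

assert list(FactIter(5)) == [1, 1, 2, 6, 24]
```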
Application to Iterables

Another application of this might be retrieving a list of forum posts

Posts might be an iterable

each call to next returns a list of 5 posts (or some page size)

but uses lazy loading

→ every time next is called, go back to the database and get the next 5 posts
Application to Iterables → Infinite Iterables

Using that lazy evaluation technique means that we can actually have infinite iterables

Since items are not computed until they are requested
we can have an infinite number of items in the collection

Don't try to use a for loop over such an iterable
unless you have some type of exit condition in your loop
→ otherwise infinite loop!

Lazy evaluation of iterables is something that is used a lot in Python!

We'll examine that in detail in the next section on generators

iter()
What happens when Python performs an iteration over an iterable?

The very first thing Python does is call the iter() function on the object we want to iterate

If the object implements the __iter__ method, that method is called
and Python uses the returned iterator

What happens if the object does not implement the __iter__ method?

Is an exception raised immediately?
Sequence Types

So how does iterating over a sequence type – one that maybe only implemented __getitem__ – work?

I just said that Python always calls iter() first

You'll notice I did not say Python always calls the __iter__ method

I said it calls the iter() function!!

In fact, if obj is an object that only implements __getitem__

iter(obj) → returns an iterator type object!
Some form of magic at work?

Not really!

Let's think about sequence types and how we can iterate over them

Suppose seq is some sequence type that implements __getitem__ (but not __iter__)

Remember what happens when we request an index that is out of bounds from the
__getitem__ method? → IndexError

index = 0
while True:
    try:
        print(seq[index])
        index += 1
    except IndexError:
        break
Making an Iterator to iterate over any Sequence

This is basically what we just did!

m y
d e
class SeqIterator:
    def __init__(self, seq):
        self.seq = seq
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            item = self.seq[self.index]
            self.index += 1
            return item
        except IndexError:
            raise StopIteration()
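As a quick sanity check, here is that iterator applied to a made-up sequence type that implements only __getitem__ (the Squares class is hypothetical, purely for illustration):

```python
class Squares:
    # hypothetical sequence type: implements only __getitem__, not __iter__
    def __init__(self, n):
        self.n = n

    def __getitem__(self, i):
        if i >= self.n:
            raise IndexError
        return i ** 2

class SeqIterator:
    def __init__(self, seq):
        self.seq = seq
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        try:
            item = self.seq[self.index]
            self.index += 1
            return item
        except IndexError:
            raise StopIteration()

print(list(SeqIterator(Squares(5))))  # [0, 1, 4, 9, 16]
# iter() builds an equivalent iterator for us automatically:
print(list(iter(Squares(5))))         # [0, 1, 4, 9, 16]
```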
Calling iter()

So when iter(obj) is called:

Python first looks for an __iter__ method

à if it's there, use it

à if it's not

look for a __getitem__ method

à if it's there create an iterator object and return that

à if it's not there, raise a TypeError exception (not iterable)
Testing if an object is iterable

Sometimes (very rarely!)

you may want to know if an object is iterable or not

But now you would have to check if they implement __getitem__ or __iter__

and that __iter__ returns an iterator
Easier approach:

try:
    iter(obj)
except TypeError:
    # not iterable
    <code>
else:
    # is iterable
    <code>
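Wrapped up as a small runnable helper (the name is_iterable is my own, not a standard library function):

```python
def is_iterable(obj):
    # iter() raises TypeError for objects that implement
    # neither __iter__ nor __getitem__
    try:
        iter(obj)
    except TypeError:
        return False
    else:
        return True

print(is_iterable([1, 2, 3]))  # True
print(is_iterable('hello'))    # True - strings are iterable
print(is_iterable(100))        # False
```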
Python provides many functions that return iterables or iterators

Additionally, the iterators perform lazy evaluation

You should always be aware of whether you are dealing with an iterable or an iterator

why?
if an object is an iterable (but not an iterator) you can iterate over it many times
if an object is an iterator you can iterate over it only once
range(10) à iterable

zip(l1, l2) à iterator

enumerate(l1) à iterator

open('cars.csv') à iterator

dictionary.keys() à iterable
dictionary.values() à iterable
dictionary.items() à iterable

and many more…
Iterating over the return values of a callable

Consider a callable that provides a countdown from some start value:


countdown() à 5
countdown() à 4
countdown() à 3
countdown() à 2
countdown() à 1
countdown() à 0
countdown() à -1
...

We now want to run a loop that will call countdown() until 0 is reached

We could certainly do that using a loop and testing the value to break out
of the loop once 0 has been reached

while True:
    val = countdown()
    if val == 0:
        break
    else:
        print(val)
An iterator approach

We could take a different approach, using iterators, and we can also make it quite generic

Make an iterator that knows two things:
the callable that needs to be called
a value (the sentinel) that will result in a StopIteration if the callable returns that value

The iterator would then be implemented as follows:

when next() is called:

call the callable and get the result

if the result is equal to the sentinel à StopIteration
and "exhaust" the iterator

otherwise return the result

We can then simply iterate over the iterator until it is exhausted


The first form of the iter() function

We just studied the first form of the iter() function:

iter(iterable) à iterator for iterable

if the iterable did not implement the iterator protocol, but implemented the sequence protocol

iter() creates an iterator for us (leveraging the sequence protocol)

Notice that the iter() function was able to generate an iterator for us automatically
The second form of the iter() function

iter(callable, sentinel)
This will return an iterator that will:

call the callable when next() is called

and either raise StopIteration if the result is equal to the sentinel value
or return the result otherwise
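A runnable sketch of this second form, using a home-made countdown callable (make_countdown is hypothetical, only here so there is something to call):

```python
def make_countdown(start):
    # hypothetical helper: each call to the returned callable
    # counts down by 1
    current = start + 1
    def countdown():
        nonlocal current
        current -= 1
        return current
    return countdown

countdown = make_countdown(5)

# iter(callable, sentinel): calls countdown() on every next(),
# and raises StopIteration as soon as it returns the sentinel (0)
print(list(iter(countdown, 0)))  # [5, 4, 3, 2, 1]
```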

Iterating a sequence in reverse order

If we have a sequence type, then iterating over the sequence in reverse order is quite simple:

for item in seq[::-1]:
    print(item)

This works, but is wasteful because it makes a copy of the sequence

for i in range(len(seq)):
    print(seq[len(seq) - i - 1])

for i in range(len(seq)-1, -1, -1):
    print(seq[i])

This is more efficient, but the syntax is messy

for item in reversed(seq):
    print(item)

This is cleaner and just as efficient, because it creates an iterator that will iterate
backwards over the sequence – it does not copy the data like the first example

Both __getitem__ and __len__ must be implemented

We can override how reversed works by implementing the __reversed__ special method
Iterating an iterable in reverse

Unfortunately, reversed() will not work with custom iterables without a little bit of extra work

When we call reversed() on a custom iterable, Python will look for and call
the __reversed__ function

That function should return an iterator that will be used to perform the reversed iteration

So basically we have to implement a reverse iterator ourselves

Just like the iter() function, when we call reversed() on an object:

looks for and calls __reversed__ method

if it's not there, uses __getitem__ and __len__ to create an iterator for us

exception otherwise
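A minimal sketch of a custom iterable supporting both directions (the Countdown class is made up for illustration):

```python
class Countdown:
    # made-up iterable: n, n-1, ..., 1
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n, 0, -1))

    def __reversed__(self):
        # called by reversed(); must return an iterator
        return iter(range(1, self.n + 1))

c = Countdown(3)
print(list(c))            # [3, 2, 1]
print(list(reversed(c)))  # [1, 2, 3]
```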
Card Deck Example

In the code exercises I am going to build an iterable containing a deck of 52 sorted cards

2 Spades … Ace Spades, 2 Hearts … Ace Hearts, 2 Diamonds … Ace Diamonds, 2 Clubs … Ace Clubs

But I don't want to create a list containing all the pre-created cards à Lazy evaluation

So I want my iterator to figure out the suit and card name for a given index in the sorted deck

SUITS = ['Spades', 'Hearts', 'Diamonds', 'Clubs']
RANKS = [2, 3, …, 10, 'J', 'Q', 'K', 'A']

We assume the deck is sorted as follows:

iterate over SUITS
for each suit iterate over RANKS

card = combination of suit and rank
Card Deck Example

SUITS = ['Spades', 'Hearts', 'Diamonds', 'Clubs']
RANKS = [2, 3, …, 10, 'J', 'Q', 'K', 'A']

2S … AS, 2H … AH, 2D … AD, 2C … AC

There are len(SUITS) suits à 4
There are len(RANKS) ranks à 13
The deck has a length of: len(SUITS) * len(RANKS) à 52

Each card in this deck has a positional index: a number from 0 to len(deck) - 1 à 0 - 51

To find the suit index of a card at index i:  i // len(RANKS)
To find the rank index of a card at index i:  i % len(RANKS)

Examples
5th card (6S) à index 4     à 4 // 13 à 0     à 4 % 13 à 4
16th card (4H) à index 15   à 15 // 13 à 1    à 15 % 13 à 2
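The arithmetic above can be sketched directly as a function (RANKS is written out in full here, since the slide elides it with …):

```python
SUITS = ['Spades', 'Hearts', 'Diamonds', 'Clubs']
RANKS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 'J', 'Q', 'K', 'A']

def card_at(i):
    # the suit changes every len(RANKS) cards,
    # while the rank cycles within each suit
    suit = SUITS[i // len(RANKS)]
    rank = RANKS[i % len(RANKS)]
    return rank, suit

print(card_at(4))   # (6, 'Spades')  - the 5th card
print(card_at(15))  # (4, 'Hearts')  - the 16th card
print(card_at(51))  # ('A', 'Clubs') - the last card
```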
generators à a type of iterator

generator functions à generator factories

à they return a generator when called

à they are not a generator themselves

generator expressions

à uses comprehension syntax

à a more concise way of creating generators

à like list comprehensions, useful for simple situations
performance considerations
Iterators review

Let's recall how we would write a simple iterator for factorials

import math

class FactIter:
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        else:
            result = math.factorial(self.i)
            self.i += 1
            return result

Now that's quite a bit of work for a simple iterator!


There has to be a better way…

What if we could do something like this instead:

def factorials(n):
    for i in range(n):
        emit factorial(i)
        pause execution here
        wait for resume
    return 'done!'

and in our code we would want to do something like this maybe:

facts = factorials(4)
get_next(facts) à 0!
get_next(facts) à 1!
get_next(facts) à 2!
get_next(facts) à 3!
get_next(facts) à done!

Of course, getting 0!, 1!, 2!, 3! followed by a string is odd

And what happens if we call get_next again?

Maybe we should consider raising an exception… StopIteration?
And instead of calling get_next, why not just use next?

But what about that emit, pause, resume? à yield


Yield to the rescue…

The yield keyword does exactly what we want:
it emits a value

the function is effectively suspended (but it retains its current state)

calling next on the function resumes running the function right after the yield statement

if function returns something instead of yielding (finishes running) à StopIteration exception

def song():
    print('line 1')
    yield "I'm a lumberjack and I'm OK"
    print('line 2')
    yield 'I sleep all night and I work all day'

lines = song()       à no output!
line = next(lines)   à 'line 1' is printed in console
line                 à "I'm a lumberjack and I'm OK"
line = next(lines)   à 'line 2' is printed in console
line                 à "I sleep all night and I work all day"
line = next(lines)   à StopIteration
Generators
A function that uses the yield statement is called a generator function

def my_func():
    yield 1
    yield 2
    yield 3

my_func is just a regular function
calling my_func() returns a generator object
We can think of functions that contain the yield statement as generator factories

The generator is created by Python when the function is called à gen = my_func()

The resulting generator is executed by calling next() à next(gen)

the function body will execute until it encounters a yield statement

it yields the value (as return value of next()) then it suspends itself

until next is called again à suspended function resumes execution

if it encounters a return before a yield à StopIteration exception occurs

(Remember that if a function terminates without an explicit return, Python
essentially returns a None value for us)
Generators

def my_func():
    yield 1
    yield 2
    yield 3

gen = my_func()   à gen is a generator

next(gen) à 1
next(gen) à 2
next(gen) à 3
next(gen) à StopIteration
Generators

next à StopIteration

This should remind you of iterators!

In fact, generators are iterators

à they implement the iterator protocol   __iter__   __next__
à they are exhausted when function returns a value
à StopIteration exception
à return value is the exception message

def my_func():
    yield 1
    yield 2
    yield 3

gen = my_func()

gen.__iter__() à iter(gen) à returns gen itself
gen.__next__() à next(gen)
Example

class FactIter:
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        else:
            result = math.factorial(self.i)
            self.i += 1
            return result

fact_iter = FactIter(5)

def factorials(n):
    for i in range(n):
        yield math.factorial(i)

fact_iter = factorials(5)
Generators

Generator functions are functions which contain at least one yield statement

When a generator function is called, Python creates a generator object

Generators implement the iterator protocol

Generators are inherently lazy iterators (and can be infinite)

Generators are iterators, and can be used in the same way (for loops, comprehensions, etc)

Generators become exhausted once the function returns a value
Generators become exhausted

Generator functions are functions that use yield


A generator function is a generator factory

à they return a (new) generator when called

Generators are iterators

à they can become exhausted (consumed)

à they cannot be "restarted"

This can lead to bugs if you try to iterate twice over a generator
Example

def squares(n):
    for i in range(n):
        yield i ** 2

sq = squares(5)   à sq is a new generator (iterator)

l = list(sq)      l à [0, 1, 4, 9, 16]

and sq has been exhausted

l = list(sq)      l à []
Example
This of course can lead to unexpected behavior sometimes…

def squares(n):
    for i in range(n):
        yield i ** 2

sq = squares(5)
enum1 = enumerate(sq)   enumerate is lazy à hasn't iterated through sq yet

next(sq) à 0
next(sq) à 1

list(enum1) à [(0, 4), (1, 9), (2, 16)]

notice how enumerate started at i=2
and the index value returned by enumerate is 0, not 2


Making an Iterable

This behavior is no different than with any other iterator


As we saw before, the solution is to create an iterable that returns a new iterator every time

def squares(n):
    for i in range(n):
        yield i ** 2

class Squares:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # returns a new instance of the generator every time
        return squares(self.n)

sq = Squares(5)

l1 = list(sq)   l1 à [0, 1, 4, 9, 16]
l2 = list(sq)   l2 à [0, 1, 4, 9, 16]
Comprehension Syntax

We already covered comprehension syntax when we studied list comprehensions

l = [i ** 2 for i in range(5)]

As well as more complicated syntax:
• if statements
• multiple nested loops
• nested comprehensions

[(i, j)
 for i in range(1, 6) if i%2==0
 for j in range(1, 6) if j%3==0]

[[i * j for j in range(5)] for i in range(5)]


Generator Expressions

Generator expressions use the same comprehension syntax


à including nesting, if

but instead of using [] we use ()

[i ** 2 for i in range(5)]               (i ** 2 for i in range(5))

a list is returned                       a generator is returned
evaluation is eager                      evaluation is lazy
has local scope                          has local scope
can access nonlocal and global scopes    can access nonlocal and global scopes
iterable                                 iterator
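A small runnable contrast of the two — the squares in the generator expression are only computed as it is consumed:

```python
# eager: all five squares exist as soon as the line runs
lst = [i ** 2 for i in range(5)]
print(lst)        # [0, 1, 4, 9, 16]

# lazy: nothing is computed yet at this point
gen = (i ** 2 for i in range(5))
print(next(gen))  # 0 - computed on demand
print(next(gen))  # 1
print(list(gen))  # [4, 9, 16] - the rest; gen is now exhausted
print(list(gen))  # []
```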
Resource Utilization

List comprehensions are eager                Generators are lazy

all objects are created right away           object creation is delayed until requested by next()
à takes longer to create/return the list     à generator is created/returned immediately
à iteration is faster (objects already       à iteration is slower (objects need to be
  created)                                     created)

if you iterate through all the elements à time performance is about the same
if you do not iterate through all the elements à generator more efficient

à entire collection is loaded into memory    à only a single item is loaded at a time

in general, generators tend to have less memory overhead
Delegating to another iterator

Often we may need to delegate yielding elements to another iterator


file1.csv   file2.csv   file3.csv

def read_all_data():
    for file in ('file1.csv', 'file2.csv', 'file3.csv'):
        with open(file) as f:
            for line in f:
                yield line

The inner loop is basically just using the file iterator and yielding values directly

Essentially we are delegating yielding to the file iterator
Simpler Syntax

We can replace this inner loop by using a simpler syntax: yield from
def read_all_data():
    for file in ('file1.csv', 'file2.csv', 'file3.csv'):
        with open(file) as f:
            for line in f:
                yield line

def read_all_data():
    for file in ('file1.csv', 'file2.csv', 'file3.csv'):
        with open(file) as f:
            yield from f
We'll come back to yield from, as there is a lot more to it!
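The delegation works for any iterables, not just file objects — a minimal runnable sketch:

```python
def chain_lists(*iterables):
    # yield from delegates to each sub-iterable's own iterator
    for it in iterables:
        yield from it

print(list(chain_lists([1, 2], [3, 4], [5])))  # [1, 2, 3, 4, 5]
```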
Background Info

Along with this project is a data file: nyc_parking_tickets_extract.csv

Here are the first few lines of data

Summons Number,Plate ID,Registration State,Plate Type,Issue Date,Violation Code,Vehicle Body Type,Vehicle Make,Violation Description
4006478550,VAD7274,VA,PAS,10/5/2016,5,4D,BMW,BUS LANE VIOLATION
4006462396,22834JK,NY,COM,9/30/2016,5,VAN,CHEVR,BUS LANE VIOLATION
4007117810,21791MG,NY,COM,4/10/2017,5,VAN,DODGE,BUS LANE VIOLATION

à fields separated by commas

à first row contains the field names
à data rows are a mix of data types: string, date, int

Note that an end-of-line character is not visible, but it's there!


Goal 1

Your first goal is to create a lazy iterator that will produce a named tuple for each row of data

The contents of each tuple should be an appropriate data type (e.g. date, int, string)

You can use the split method for string to split on the comma

You will need to use the strip method to remove the end-of-line character (\n)

Remember, the goal is to produce a lazy iterator

à you should not be reading the entire file in memory and then processing it

Please stick to Python's built-ins and the standard library only!


Goal 2

Calculate the number of violations by car make.


Use the lazy iterator you created in Goal 1

Use lazy evaluation whenever possible

You can choose otherwise, but I would store the make and violation counts as a dictionary

à key = car make
à value = # violations
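The counting step can be sketched like this (the makes iterator below is a stand-in for the make field pulled from the Goal 1 iterator):

```python
from collections import defaultdict

# stand-in for the vehicle makes pulled from the Goal 1 lazy iterator
makes = iter(['BMW', 'CHEVR', 'BMW', 'DODGE', 'BMW'])

violation_counts = defaultdict(int)
for make in makes:            # single lazy pass over the data
    violation_counts[make] += 1

print(dict(violation_counts))  # {'BMW': 3, 'CHEVR': 1, 'DODGE': 1}
```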

Python has many tools for working with iterables

You should already know almost all of these:

built-in:          iter  reversed  next  len  slicing
                   zip  filter  sorted  enumerate
                   min  max  sum  all  any  map

functools module:  reduce
The itertools module

Slicing:                  islice
Selecting and Filtering:  dropwhile  takewhile  compress  filterfalse
Chaining and Teeing:      chain  tee
Mapping and Reducing:     starmap  accumulate
Infinite Iterators:       count  cycle  repeat
Zipping:                  zip_longest
Combinatorics:            product  permutations  combinations  combinations_with_replacement
Aggregators

Functions that iterate through an iterable and return a single value that (usually) takes into
account every element of the iterable

min(iterable) à minimum value in the iterable

max(iterable) à maximum value in the iterable

sum(iterable) à sum of all the values in the iterable
Associated Truth Values

You should already know this, but let's review briefly:


Every object in Python has an associated truth value   bool(obj) à True / False

Every object has a True truth value, except:
• None
• False
• 0 in any numeric type (e.g. 0, 0.0, 0+0j, …)
• empty sequences (e.g. list, tuple, string, …)
• empty mapping types (e.g. dictionary, set, …)
• custom classes that implement a __bool__ or __len__ method that returns False or 0

which have a False truth value


The any and all functions

any(iterable)
à returns True if any (one or more) element in iterable is truthy
à False otherwise

all(iterable)
à returns True if all the elements in iterable are truthy
à False otherwise
Leveraging the any and all functions

Often, we are not particularly interested in the direct truth value of the elements in our iterables

à want to know if any, or all, satisfy some condition
à if the condition is True

A function that takes a single argument and returns True or False is called a predicate

We can make any and all more useful by first applying a predicate to each element of the iterable
Example

Suppose we have some iterable l = [1, 2, 3, 4, 100]

and we want to know if: every element is less than 10

First define a suitable predicate: pred = lambda x: x < 10

Apply this predicate to every element of the iterable:

results = [pred(1), pred(2), pred(3), pred(4), pred(100)]

à [True, True, True, True, False]

Then we use all on these results: all(results) à False
How do we apply that predicate?

The map function map(fn, iterable)


à applies fn to every element of iterable

A comprehension:

(fn(item) for item in iterable)

Or even:

new_list = []
for item in iterable:
    new_list.append(fn(item))
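Putting the pieces together — the predicate applied lazily with map, then handed to all/any:

```python
l = [1, 2, 3, 4, 100]
pred = lambda x: x < 10

print(all(map(pred, l)))  # False - 100 fails the predicate
print(any(map(pred, l)))  # True  - at least one element passes

# map is lazy and all/any short-circuit, so a generator
# expression works just as well:
print(all(x < 10 for x in l))  # False
```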

itertools.islice

We know that we can slice sequence types seq[i:j:k]


seq[slice(i, j, k)]

We can also slice general iterables (including iterators of course)

à islice(iterable, start, stop, step)

from itertools import islice

l = [1, 2, 3, 4]

result = islice(l, 0, 3)

list(result) à [1, 2, 3]

à islice returns a lazy iterator

list(result) à [] even though l was a list!


The filter function

You should already be familiar with the filter function


à filter(predicate, iterable)

à returns all elements of iterable where predicate(element) is True

predicate can be None – in which case it is the identity function  f(x) à x

à in other words, truthy elements only will be retained

à filter returns a lazy iterator

We can achieve the same result using generator expressions:

(item for item in iterable if pred(item))        predicate is not None

(item for item in iterable if item)              predicate is None
or (item for item in iterable if bool(item))
Example

filter(lambda x: x < 4, [1, 10, 2, 10, 3, 10]) à 1, 2, 3

filter(None, [0, '', 'hello', 100, False]) à 'hello', 100

à remember that filter returns a (lazy) iterator
itertools.filterfalse

This works the same way as the filter function


but instead of retaining elements where the predicate evaluates to True

it retains elements where the predicate evaluates to False

Example

filterfalse(lambda x: x < 4, [1, 10, 2, 10, 3, 10]) à 10, 10, 10

filterfalse(None, [0, '', 'hello', 100, False]) à 0, '', False
à filterfalse returns a (lazy) iterator
itertools.compress

No, this is not a compressor in the sense of say a zip archive!

It is basically a way of filtering one iterable, using the truthiness of items in another iterable

data = ['a', 'b', 'c', 'd', 'e']
selectors = [True, False, 1, 0, None]

compress(data, selectors) à a, c

à compress returns a (lazy) iterator
itertools.takewhile

takewhile(pred, iterable)

The takewhile function returns an iterator that will yield items while pred(item) is truthy

à at that point the iterator is exhausted

even if there are more items in the iterable whose predicate would be truthy

takewhile(lambda x: x < 5, [1, 3, 5, 2, 1]) à 1, 3

à takewhile returns a (lazy) iterator
itertools.dropwhile

dropwhile(pred, iterable)

The dropwhile function returns an iterator that will start iterating (and yield all remaining elements)
once pred(item) becomes falsy

dropwhile(lambda x: x < 5, [1, 3, 5, 2, 1]) à 5, 2, 1

à dropwhile returns a (lazy) iterator
itertools.count à lazy iterator

The count function is an infinite iterator


similar to range à start, step
different from range à no stop à infinite

à start and step can be any numeric type: float, complex, Decimal,
  bool (False à 0, True à 1)

Example

count(10, 2) à 10, 12, 14, …

count(10.5, 0.1) à 10.5, 10.6, 10.7, …

takewhile(lambda x: x < 10.8, count(10.5, 0.1)) à 10.5, 10.6, 10.7
itertools.cycle à lazy iterator

The cycle function allows us to loop over a finite iterable indefinitely

Example

cycle(['a', 'b', 'c']) à 'a', 'b', 'c', 'a', 'b', 'c', …

Important

If the argument of cycle is itself an iterator à the iterator becomes exhausted

cycle will still produce an infinite sequence

à does not stop after the iterator becomes exhausted
itertools.repeat à lazy iterator

The repeat function simply yields the same value indefinitely

repeat('spam') à 'spam', 'spam', 'spam', 'spam', …

Optionally, you can specify a count to make the iterator finite

repeat('spam', 3) à 'spam', 'spam', 'spam'

Caveat

The items yielded by repeat are the same object
à they each reference the same object in memory
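This caveat is easy to demonstrate — mutating one of the repeated items mutates "all" of them, because they are one and the same list:

```python
from itertools import repeat

matrix = list(repeat([0, 0], 3))  # three references to the SAME list
matrix[0][0] = 100
print(matrix)  # [[100, 0], [100, 0], [100, 0]] - every "row" changed

# a comprehension creates a new list on each iteration instead
matrix = [[0, 0] for _ in range(3)]
matrix[0][0] = 100
print(matrix)  # [[100, 0], [0, 0], [0, 0]]
```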
Chaining Iterables itertools.chain(*args) à lazy iterator

This is analogous to sequence concatenation


but not the same!
à dealing with iterables (including iterators)
à chaining is itself a lazy iterator

We can manually chain iterables this way:   iter1  iter2  iter3

for it in (iter1, iter2, iter3):
    yield from it

Or, we can use chain as follows:

for item in chain(iter1, iter2, iter3):
    print(item)

Variable number of positional arguments – each argument must be an iterable


Chaining Iterables

What happens if we want to chain from iterables contained inside another, single, iterable?

l = [iter1, iter2, iter3]

chain(l) à l

What we really want is to chain iter1, iter2 and iter3

We can try this using unpacking:  chain(*l)

à produces chained elements from iter1, iter2 and iter3

BUT unpacking is eager – not lazy!

If l was a lazy iterator, we essentially iterated through l (not the sub iterators), just to unpack!

This could be a problem if we really wanted the entire chaining process to be lazy
Chaining Iterables itertools.chain.from_iterable(it) à lazy iterator

We could try this approach:

def chain_lazy(it):
    for sub_it in it:
        yield from sub_it

Or we can use chain.from_iterable

chain.from_iterable(it)

This achieves the same result
à iterates lazily over it
à in turn, iterates lazily over each iterable in it
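A quick sketch showing that the whole pipeline stays lazy — only as much of the outer iterator is consumed as the chained output demands:

```python
from itertools import chain, islice

def gen(label, n):
    for i in range(n):
        yield f'{label}{i}'

def gens():
    # a lazy outer iterable producing lazy inner iterables
    for label in 'abc':
        yield gen(label, 2)

result = chain.from_iterable(gens())
print(list(islice(result, 3)))  # ['a0', 'a1', 'b0'] - only what was requested
```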
"Copying" Iterators itertools.tee(iterable, n)

Sometimes we need to iterate through the same iterator multiple times, or even in parallel

We could create the iterator multiple times manually

iters = []
for _ in range(10):
    iters.append(create_iterator())

Or we can use tee in itertools

à returns independent iterators in a tuple

tee(iterable, 10) à (iter1, iter2, …, iter10)

all different objects
Teeing Iterables

One important thing to note


The elements of the returned tuple are lazy iterators

à always!
à even if the original argument was not

l = [1, 2, 3, 4]

tee(l, 3) à (iter1, iter2, iter3)

all lazy iterators – not lists!
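A runnable sketch — the teed iterators advance independently (though once an iterator has been teed, you should no longer use the original):

```python
from itertools import tee

gen = (i ** 2 for i in range(5))
it1, it2 = tee(gen, 2)

print(next(it1), next(it1))  # 0 1
print(next(it2))             # 0 - it2 starts from the beginning
print(list(it1))             # [4, 9, 16]
print(list(it2))             # [1, 4, 9, 16]
```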

Mapping and Accumulation

Mapping à applying a callable to each element of an iterable


à map(fn, iterable)

Accumulation à reducing an iterable down to a single value

à sum(iterable)  calculates the sum of every element in an iterable
à min(iterable)  returns the minimal element of the iterable
à max(iterable)  returns the maximal element of the iterable

à reduce(fn, iterable, [initializer])
  à fn is a function of two arguments
  à applies fn cumulatively to elements of iterable


map

You should already be familiar with map à quick review


map(fn, iterable) applies fn to every element of iterable, and returns an iterator (lazy)

à fn must be a callable that requires a single argument

map(lambda x: x**2, [1, 2, 3, 4]) à 1, 4, 9, 16 à lazy iterator

Of course, we can easily do the same thing using a generator expression too

maps = (fn(item) for item in iterable)
reduce

You should already be familiar with reduce à quick review

Suppose we want to find the sum of all elements in an iterable:

l = [1, 2, 3, 4]

sum(l) à 1 + 2 + 3 + 4 = 10

reduce(lambda x, y: x + y, l) à 1
                              à 1 + 2 = 3
                              à 3 + 3 = 6
                              à 6 + 4 = 10

To find the product of all elements:

reduce(lambda x, y: x * y, l) à 1
                              à 1 * 2 = 2
                              à 2 * 3 = 6
                              à 6 * 4 = 24
We can specify a different "start" value in the reduction

reduce(lambda x, y: x + y, l, 100) à 110


itertools.starmap

starmap is very similar to map


à it unpacks every sub element of the iterable argument, and passes that to the map function

à useful for mapping a multi-argument function on an iterable of iterables

l = [ [1, 2], [3, 4] ]

map(lambda item: item[0] * item[1], l) à 2, 12

We can use starmap:

starmap(operator.mul, l) à 2, 12

we could also just use a generator expression to do the same thing:

(operator.mul(*item) for item in l)

We can of course use iterables that contain more than just two values:

l = [ [1, 2, 3], [10, 20, 30], [100, 200, 300] ]

starmap(lambda x, y, z: x + y + z, l) à 6, 60, 600


itertools.accumulate(iterable, fn) à lazy iterator

The accumulate function is very similar to the reduce function


But it returns a (lazy) iterator producing all the intermediate results

à reduce only returns the final result

Unlike reduce, it does not accept an initializer

Note the argument order is not the same!   reduce(fn, iterable)
                                           accumulate(iterable, fn)

à in accumulate, fn is optional
à defaults to addition
Example

l = [1, 2, 3, 4]

functools.reduce(operator.mul, l) à 24
    Ø 1
    Ø 1 * 2 = 2
    Ø 2 * 3 = 6
    Ø 6 * 4 = 24

itertools.accumulate(l, operator.mul) à 1, 2, 6, 24
The zip Function à lazy iterator

We have already seen the zip function

It takes a variable number of positional arguments – each of which are iterables

It returns an iterator that produces tuples containing the elements of the iterables,
iterated one at a time

It stops immediately once one of the iterables has been completely iterated over

à zips based on the shortest iterable

zip([1, 2, 3], [10, 20], ['a', 'b', 'c', 'd'])

à (1, 10, 'a'), (2, 20, 'b')
itertools.zip_longest(*args, [fillvalue=None])

Sometimes we want to zip, but based on the longest iterable


à need to provide a default value for the "holes" à fillvalue

zip([1, 2, 3], [10, 20], ['a', 'b', 'c', 'd'])

à (1, 10, 'a'), (2, 20, 'b')

zip_longest([1, 2, 3], [10, 20], ['a', 'b', 'c', 'd'])

à (1, 10, 'a'), (2, 20, 'b'), (3, None, 'c'), (None, None, 'd')

zip_longest([1, 2, 3], [10, 20], ['a', 'b', 'c', 'd'], fillvalue=-1)

à (1, 10, 'a'), (2, 20, 'b'), (3, -1, 'c'), (-1, -1, 'd')
Grouping
Sometimes we want to loop over an iterable of elements
but we want to group those elements as we iterate through them

Suppose we have an iterable containing tuples, and we want to group based
on the first element of each tuple

(1, 10, 100)  \
(1, 11, 101)   group 1
(1, 12, 102)  /

(2, 20, 200)  \  group 2
(2, 21, 201)  /

(3, 30, 300)  \
(3, 31, 301)   group 3
(3, 32, 302)  /

We would like to iterate using this kind of approach:

for key, group in groups:
    print(key)
    for item in group:
        print(item)

key à 1
    (1, 10, 100)
    (1, 11, 101)
    (1, 12, 102)
key à 2
    (2, 20, 200)
    (2, 21, 201)
key à 3
    (3, 30, 300)
    (3, 31, 301)
    (3, 32, 302)
itertools.groupby(data, [keyfunc]) à lazy iterator

The groupby function allows us to do precisely that


à normally specify keyfunc which calculates the key we want to use for grouping

iterable:
(1, 10, 100)
(1, 11, 101)
(1, 12, 102)
(2, 20, 200)
(2, 21, 201)
(3, 30, 300)
(3, 31, 301)
(3, 32, 302)

Here we want to group based on the 1st element of each tuple
à grouping key  lambda x: x[0]

groupby(iterable, lambda x: x[0])

à iterator of tuples (key, sub_iterator)

1, sub_iterator à (1, 10, 100), (1, 11, 101), (1, 12, 102)
2, sub_iterator à (2, 20, 200), (2, 21, 201)
3, sub_iterator à (3, 30, 300), (3, 31, 301), (3, 32, 302)

note how the sequence is sorted by the grouping key!


Important Note
The sequence of elements produced from the "sub-iterators" are all produced
from the same underlying iterator

groups = groupby(iterable, lambda x: x[0])

next(groups) à 1, sub_iterator à (1, 10, 100), (1, 11, 101), (1, 12, 102)
               (each element is fetched with next(iterable))

next(groups) à 2, sub_iterator à (2, 20, 200), (2, 21, 201)

next(groups) à 3, sub_iterator à (3, 30, 300), (3, 31, 301), (3, 32, 302)

next(groups) actually iterates through all the elements of the current "sub-iterator"
before proceeding to the next group
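Because all groups share one underlying iterator, advancing to the next group exhausts the current sub-iterator; a minimal demonstration:

```python
from itertools import groupby

data = [(1, 10), (1, 11), (2, 20), (2, 21)]
groups = groupby(data, key=lambda x: x[0])

key1, sub1 = next(groups)
key2, sub2 = next(groups)   # advances past group 1 behind the scenes

group1 = list(sub1)   # [] -- group 1 was consumed when we asked for group 2
group2 = list(sub2)   # [(2, 20), (2, 21)]
print(group1, group2)
```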
The itertools module contains a few functions for generating

    permutations        combinations

It also has a function to generate the Cartesian product of multiple iterables

All these functions return lazy iterators
Cartesian Product

{1, 2, 3} x {a, b, c}

(1, a)  (2, a)  (3, a)
(1, b)  (2, b)  (3, b)
(1, c)  (2, c)  (3, c)

2-dimensional:

    A × B = {(a, b) | a ∈ A, b ∈ B}

n-dimensional:

    A₁ × ⋯ × Aₙ = {(a₁, a₂, …, aₙ) | a₁ ∈ A₁, …, aₙ ∈ Aₙ}
Cartesian Product

Let's say we wanted to generate the Cartesian product of two lists:

l1 = [1, 2, 3]    l2 = ['a', 'b', 'c', 'd']    à notice not same length

def cartesian_product(l1, l2):
    for x in l1:
        for y in l2:
            yield (x, y)

cartesian_product(l1, l2)

à (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), …, (3, 'd')
itertools.product(*args) à lazy iterator

l1 = [1, 2, 3]    l2 = ['a', 'b', 'c', 'd']

product(l1, l2) à (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), …, (3, 'd')

l3 = [100, 200]

product(l1, l2, l3) à (1, 'a', 100), (1, 'a', 200),
                      (1, 'b', 100), (1, 'b', 200),
                      (1, 'c', 100), (1, 'c', 200),
                      …
                      (3, 'd', 100), (3, 'd', 200)
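The product calls above, verified with the same small lists:

```python
from itertools import product

l1 = [1, 2, 3]
l2 = ['a', 'b', 'c', 'd']
l3 = [100, 200]

pairs = list(product(l1, l2))
triples = list(product(l1, l2, l3))

print(pairs[0], pairs[-1])      # (1, 'a') (3, 'd')
print(len(triples))             # 3 * 4 * 2 = 24
print(triples[0], triples[-1])  # (1, 'a', 100) (3, 'd', 200)
```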
Permutations

This function will produce all the possible permutations of a given iterable

In addition, we can specify the length of each permutation

à maxes out at the length of the iterable

itertools.permutations(iterable, r=None)

à r is the size of the permutation

à r = None means length of each permutation is the length of the iterable

Elements of the iterable are considered unique based on their position, not their value

à if iterable produces repeat values
   then permutations will have repeat values too
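A short check of both points — the r argument and positional uniqueness:

```python
from itertools import permutations

# r = 2 permutations of three distinct elements
distinct = list(permutations('abc', 2))
print(distinct)

# elements are unique by position, not value:
# the repeated 'a' produces repeated permutations in the output
repeats = list(permutations('aab', 2))
print(repeats)
```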
Combinations

Unlike permutations, the order of elements in a combination is not considered

à OK to always sort the elements of a combination

Combinations of length r can be picked from a set

• without replacement
  à once an element has been picked from the set it cannot be picked again

• with replacement
  à once an element has been picked from the set it can be picked again
itertools.combinations(iterable, r)

itertools.combinations_with_replacement(iterable, r)

Just like for permutations:

    the elements of an iterable are unique based on their position, not their value

The different combinations produced by these functions are sorted
based on the original ordering in the iterable
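Both variants side by side, on small inputs:

```python
from itertools import combinations, combinations_with_replacement

# without replacement: an element can be picked at most once
no_repl = list(combinations([1, 2, 3, 4], 2))
print(no_repl)
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

# with replacement: the same element may be picked again
with_repl = list(combinations_with_replacement([1, 2, 3], 2))
print(with_repl)
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
```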
Data Files

You are given four data files:    personal_info.csv
                                  vehicles.csv
                                  employment.csv
                                  update_status.csv

Each file contains a common key that uniquely identifies each row – SSN

You are guaranteed that every SSN number:

à appears only once in every file

à is present in all 4 files

à the order of SSN in each file is the same
To make the approach easier, I am going to break it down into
multiple smaller goals
Goal 1

Create (lazy) iterators for each of the four files

à returns named tuples

à data types are appropriate (string, date, int, etc)

à the 4 iterators are independent of each other (for now)

You will want to make use of the standard library module csv for this
Reading CSV Files

CSV files are files that contain multiple lines of data à strings

The individual data fields in a row are:

    delimited by some separating character à comma, tab are common

in addition, individual fields may be wrapped in further delimiters à quotes are common

à this allows the field value to contain what may be otherwise interpreted as a delimiter

Example

1,hello,world à 3 values: 1    hello    world

1,"hello,world" à 2 values: 1    hello,world
Reading CSV Files

1,hello,world
1,"hello, world"

Simply splitting on the comma is not going to work in the second example! à 1    "hello    world"

csv.reader is exactly what we need

à lazy iterator

à we can tell it what the delimiter is

à we can tell it what the quote character is

Example

Mueller-Rath,Human Resources,05-8069298,123-88-3381
"Schumm, Schumm and Reichert",Engineering,73-3839744,125-07-9434

def read_file(file_name):
    with open(file_name) as f:
        reader = csv.reader(f, delimiter=',', quotechar='"')
        yield from reader

à yields lists of strings containing each field value
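csv.reader accepts any iterable of strings, not just an open file, so the quoting behavior can be checked without touching the file system:

```python
import csv

# two rows: one plain, one with a quoted field containing a comma
rows = list(csv.reader(['1,hello,world', '1,"hello,world"'],
                       delimiter=',', quotechar='"'))
print(rows)  # [['1', 'hello', 'world'], ['1', 'hello,world']]
```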
Goal 2

Create a single iterable that combines all the data from all four files

à try to re-use the iterators you created in Goal 1

à by combining I mean one row per SSN containing data from all four files in a single named tuple

Once again, make sure returned data is a single named tuple containing all fields

When you "combine" the data, make sure the SSN's match!

Remember that all the files are already sorted by SSN, and that each SSN appears once, and
only once, in every file

à viewing files side by side, all the row SSN's will align correctly

Don't repeat the SSN 4 times in the named tuple – once is enough!
Goal 3

Some records are considered stale (not updated recently enough)

A record is considered stale if the last update date < 3/1/2017

The update date is located in the update_status.csv file

Modify your iterator from Goal 2 to filter out stale records

Make sure your iterator remains lazy!
Goal 4

For non-stale records, generate lists of number of car makes by gender

If you do this correctly, the largest groups for each gender are:

Female à Ford and Chevrolet (both have 42 persons in those groups)

Male à Ford (40 persons in the group)
Good luck!
What is a context?

Oxford dictionary:
The circumstances that form the setting for an event, statement, or
idea, and in terms of which it can be fully understood.

In Python: the state surrounding a section of code

# module.py
f = open('test.txt', 'r')       global scope à f is a file object
print(f.readlines())
f.close()

when print(f.readlines()) runs, it has a context in which it runs

à global scope
Managing the context of a block of code

Consider the open file example:

# module.py
f = open('test.txt', 'r')
perform_work(f)
f.close()

There could be an exception before we close the file à file remains open!

Need to better "manage" the context that perform_work(f) needs

f = open('test.txt', 'r')
try:
    perform_work(f)
finally:
    f.close()

this works à writing try/finally every time can get cumbersome
           à too easy to forget to close the file
Context Managers

à create a context (a minimal amount of state needed for a block of code)

à execute some code that uses variables from the context

à automatically clean up the context when we are done with it

à enter context        à open file

à work within context  à read the file

à exit context         à close the file
Example

with open('test.txt', 'r') as f:    create the context à open file
    print(f.readlines())            work inside the context

                                    exit the context à close file

Context managers manage data in our scope    à on entry
                                             à on exit

Very useful for anything that needs to provide    Enter / Exit    Start / Stop    Set / Reset

à open / close file
à start db transaction / commit or abort transaction
à set decimal precision to 3 / reset back to original precision
try…finally…

The finally section of a try always executes

try:
    …
except:
    …
finally:
    …    always executes, even if an exception occurs in the except block

Works even if inside a function and a return is in the try or except blocks

Very useful for writing code that should execute no matter what happens

But this can get cumbersome!

There has to be a better way!

Pattern

create some object

do some work with that object

clean up the object after we're done using it

We want to make this easy

à automatic cleanup after we are done using the object
Context Managers    PEP 343

with context as obj_name:           à obj_name: object returned from context (optional)
    # with block (can use obj_name)

# after the with block, context is cleaned up automatically

Example

with open(file_name) as f:          enter the context (optional) à an object is returned
    # file is now open

# file is now closed                exit the context
The context management protocol

Classes implement the context management protocol by implementing two methods:

__enter__    setup, and optionally return some object

__exit__     tear down / cleanup

with CtxManager() as obj:       over simplified (exception handling omitted):
    # do something
# done with context                 mgr = CtxManager()
                                    obj = mgr.__enter__()
                                    try:
                                        # do something
                                    finally:
                                        mgr.__exit__()
                                    # done with context
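A minimal sketch of the protocol — the Resource class here is hypothetical, not part of the course code:

```python
class Resource:
    """Minimal (hypothetical) context manager illustrating the protocol."""
    def __init__(self, name):
        self.name = name
        self.opened = False

    def __enter__(self):
        self.opened = True          # setup
        return self                 # object bound by `as`

    def __exit__(self, exc_type, exc_value, exc_tb):
        self.opened = False         # teardown -- runs even on exception
        return False                # do not silence exceptions

with Resource('demo') as r:
    print(r.opened)   # True inside the with block
print(r.opened)       # False after -- __exit__ ran
```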
Use Cases

Very common usage is for opening a file (creating resource) and closing the file (releasing resource)

Context managers can be used for much more than creating and releasing resources

Common Patterns

• Open – Close
• Lock – Release
• Change – Reset
• Start – Stop
• Enter – Exit

Examples

• file context managers
• Decimal contexts
How Context Protocol Works

works in conjunction with a with statement

class MyClass:
    def __init__(self):
        # init class

    def __enter__(self):
        return obj

    def __exit__(self, …):
        # clean up obj

my_obj = MyClass()    works as a regular class
                      __enter__, __exit__ were not called

with MyClass() as obj:

à creates an instance of MyClass à no associated symbol, but an instance exists

à calls my_instance.__enter__()

à return value from __enter__ is assigned to obj
  (not the instance of MyClass that was created)

after the with block, or if an exception occurs inside the with block:

à my_instance.__exit__ is called
Scope of with block

The with block is not like a function or a comprehension

The scope of anything in the with block (including the object returned from __enter__)
is in the same scope as the with statement itself

# module.py

with open(fname) as f:      f is a symbol in global scope
    row = next(f)           row is also in the global scope

print(f)                    f is closed, but the symbol exists

print(row)                  row is available and has a value
The __enter__ Method

def __enter__(self):

This method should perform whatever setup it needs to

It can optionally return an object à as returned_obj

That's all there is to this method
The __exit__ Method

More complicated…

Remember the finally in a try statement? à always runs even if an exception occurs

__exit__ is similar à runs even if an exception occurs in the with block

But should it handle things differently if an exception occurred? à maybe

à so it needs to know about any exceptions that occurred

à it also needs to tell Python whether to silence the exception, or let it propagate
The __exit__ Method

with MyContext() as obj:
    raise ValueError
print('done')

Scenario 1

__exit__ receives the error, performs some clean up and silences the error

print statement runs

no exception is seen

Scenario 2

__exit__ receives the error, performs some clean up and lets the error propagate

print statement does not run

the ValueError is seen
The __exit__ Method

Needs three arguments:    à the exception type that occurred (if any, None otherwise)

                          à the exception value that occurred (if any, None otherwise)

                          à the traceback object if an exception occurred (if any, None otherwise)

Returns True or False:    à True = silence any raised exception

                          à False = do not silence a raised exception

def __exit__(self, exc_type, exc_value, exc_trace):
    # do clean up work here
    return True  # or False

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-39a69b57f322> in <module>()
      1 with MyContext() as obj:
----> 2     raise ValueError
Pattern: Open - Close

Open File
operate on open file
Close File

Open socket
operate on socket
Close socket
Pattern: Start - Stop

Start database transaction
perform database operations
Commit or rollback transaction

Start timer
perform operations
Stop timer
Pattern: Lock - Release

acquire thread lock
perform some operations
release thread lock
Pattern: Change - Reset

change Decimal context precision
perform some operations using the new precision
reset Decimal context precision back to original value

redirect stdout to a file
perform some operations that write to stdout
reset stdout to original value
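The Decimal change/reset pattern is already available in the standard library as decimal.localcontext, which restores the precision automatically on exit:

```python
import decimal

original_prec = decimal.getcontext().prec   # 28 by default

with decimal.localcontext() as ctx:
    ctx.prec = 3                                     # change
    third = decimal.Decimal(1) / decimal.Decimal(3)
    print(third)                                     # 0.333

# precision was reset automatically when the with block exited
print(decimal.getcontext().prec == original_prec)    # True
```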
Pattern: Wacky Stuff!

with tag('p'):
    print('some text', end='')

à <p>some text</p>

with tag('p'):
    print('some ', end='')
    with tag('b'):
        print('bold ', end='')
    print('text', end='')

à <p>some <b>bold </b>text</p>
Pattern: Wacky Stuff!

with ListMaker(title='Items', prefix='- ',
               indent=3, stdout='myfile.txt') as lm:
    lm.print('Item 1')
    with lm:
        lm.print('item 1a')
        lm.print('item 1b')
    lm.print('Item 2')
    with lm:
        lm.print('item 2a')
        lm.print('item 2b')

>> myfile.txt

Items
- Item 1
   - item 1a
   - item 1b
- Item 2
   - item 2a
   - item 2b
Context Manager Pattern

create context manager

enter context (and, optionally, receive an object)

do some work

exit context

with open(file_name) as f:
    data = f.readlines()
Mimic Pattern using a Generator

def open_file(fname, mode):
    f = open(fname, mode)
    try:
        yield f
    finally:
        f.close()

ctx = open_file('file.txt', 'r')
f = next(ctx)       opens file, and yields it
next(ctx)           closes file à StopIteration exception

ctx = open_file('file.txt', 'r')
f = next(ctx)
try:
    # do work with file
finally:
    try:
        next(ctx)
    except StopIteration:
        pass
This works in general

def gen(args):
    # do set up work here
    try:
        yield object
    finally:
        # clean up object here

ctx = gen(…)
obj = next(ctx)
try:
    # do work with obj
finally:
    try:
        next(ctx)
    except StopIteration:
        pass

This is quite clunky still, but you should see that we can almost
create a context manager pattern using a generator function!
Creating a Context Manager from a Generator Function

def open_file(fname, mode):         generator function
    f = open(fname, mode)
    try:
        yield f
    finally:
        f.close()

gen = open_file('test.txt', 'w')    generator object
f = next(gen)
# do work with f
next(gen) à closes f

class GenContext:
    def __init__(self, gen):
        self.gen = gen

    def __enter__(self):
        obj = next(self.gen)
        return obj

    def __exit__(self, exc_type, exc_value, exc_tb):
        next(self.gen)
        return False

gen = open_file('test.txt', 'w')
with GenContext(gen) as f:
    # do work
So far…

we saw how to create a context manager using a class and a generator function

def gen_function(args):
    try:
        yield obj        single yield à the return value of __enter__
    finally:
        cleanup phase    à __exit__

class GenContextManager:
    def __init__(self, gen_func):
        self.gen = gen_func()

    def __enter__(self):
        return next(self.gen)       returns what was yielded

    def __exit__(self, …):
        next(self.gen)              runs the finally block
Usage

with GenContextManager(gen_func):
    …

We can tweak this a bit to also allow passing in arguments to gen_func

class GenContextManager:
    def __init__(self, gen_obj):
        self.gen = gen_obj

    def __enter__(self):
        return next(self.gen)

    def __exit__(self, …):
        next(self.gen)

And usage now becomes:

gen = gen_func(args)
with GenContextManager(gen):
    …

This works, but we have to create the generator object first,
and use the GenContextManager class

à lose clarity of what the context manager is

Using a decorator to encapsulate these steps

gen = gen_func(args)
with GenContextManager(gen):
    …

class GenContextManager:
    def __init__(self, gen_obj):
        self.gen = gen_obj

    def __enter__(self):
        return next(self.gen)

    def __exit__(self, …):
        next(self.gen)

def contextmanager_dec(gen_fn):
    def helper(*args, **kwargs):
        gen = gen_fn(*args, **kwargs)
        return GenContextManager(gen)
    return helper
Usage Example

@contextmanager_dec
def open_file(f_name):
    f = open(f_name)
    try:
        yield f
    finally:
        f.close()

à open_file = contextmanager_dec(open_file)

à open_file is now actually the helper closure

calling open_file(f_name)

à calls helper(f_name)    [free variable gen_fn = open_file]

à creates the generator object

à returns GenContextManager instance

à with open_file(f_name)
The contextlib Module

One of the goals when context managers were introduced to Python (PEP 343)
was to ensure generator functions could be used to easily create them

Technique is basically what we came up with

à more complex à exception handling

à if an exception occurs in the with block, it needs to be propagated
  back to the generator function    __exit__(self, exc_type, exc_value, exc_tb)

à enhanced generators as coroutines à later

This is implemented for us in the standard library:

contextlib.contextmanager

à decorator which turns a generator function into a context manager
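A minimal sketch using contextlib.contextmanager — the change_title manager here is hypothetical, following the change/reset pattern:

```python
from contextlib import contextmanager

@contextmanager
def change_title(d, new_title):
    old = d['title']         # setup: save original state
    d['title'] = new_title
    try:
        yield d              # value bound by `as`
    finally:
        d['title'] = old     # cleanup runs even if the with block raises

doc = {'title': 'draft'}
with change_title(doc, 'final') as d:
    print(d['title'])   # final
print(doc['title'])     # draft
```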


Project Setup

In this project you are provided two CSV files:

cars.csv
personal_info.csv

à first row contains the field names

The basic goal will be to create a context manager that only requires the file name
and provides us an iterator we can use to iterate over the data in those files

The iterator should yield named tuples with field names based on the header row in the CSV file

For simplicity, we assume all fields are just strings
Goal 1

For this goal implement the context manager using a context manager class

i.e. a class that implements the context manager protocol    __enter__    __exit__

Make sure your iterator uses lazy evaluation

If you can, try to create a single class that implements both
the context manager protocol and the iterator protocol
Goal 2

For this goal, re-implement what you did in Goal 1, but using a generator function instead

You'll have to use the @contextmanager decorator from the contextlib module
Information you may find useful

File objects implement the iterator protocol:

with open(f_name) as f:
    for row in f:
        print(row)

But file objects also support just reading data using the read function

we specify how much of the file to read (that can span multiple rows)

when we do this a "read head" is maintained à we can reposition this read head à seek()

with open(f_name) as f:
    print(f.read(100))   à reads the first 100 characters à read head is now at 100
    print(f.read(100))   à reads the next 100 characters à read head is now at 200
    f.seek(0)            à moves read head back to beginning of file
Information you may find useful

CSV files can be read using csv.reader

But CSV files can be written in different "styles" à dialects

john,cleese,42    john;cleese;42    john|cleese|42    john\tcleese\t42

"john","cleese","42"    'john';'cleese';'42'

The csv module has a Sniffer class we can use to auto-determine the specific dialect

à need to provide it a sample of the csv file

with open(f_name) as f:
    sample = f.read(2000)
    dialect = csv.Sniffer().sniff(sample)

with open(f_name) as f:
    reader = csv.reader(f, dialect)
Good Luck!
Concurrency vs Parallelism

concurrency                          parallelism

Task 1    Task 2                     Task 1    Task 2
(tasks interleave on one worker)     (tasks run at the same time)
Cooperative vs Preemptive Multitasking

cooperative                              preemptive

Task 1 ── yield ──à Task 2               Task 1 ──à Task 2
Task 2 ── yield ──à Task 1               (the switch is not voluntary!)
(control is yielded voluntarily)

completely controlled by developer       not controlled by developer
à Python coroutines                      à some sort of scheduler involved
                                         à threading
Coroutines

Cooperative multitasking

à Python programs execute on a "single thread"

Concurrent, not parallel    Global Interpreter Lock à GIL

Two ways to create coroutines in Python

à generators    à uses extended form of yield    à recent addition: asyncio

à native coroutines    à uses async / await
This section is not about

asyncio
native coroutines
threading
multiprocessing à parallelism

This section is about

learning the basics of generator-based coroutines

some practical applications of these coroutines
What is a coroutine?

cooperative routines

subroutines

def running_averages(iterable):
    avg = averager()
    for value in iterable:
        running_average = avg(value)
        print(running_average)

def averager():
    total = 0
    count = 0
    def inner(value):
        nonlocal total
        nonlocal count
        total += value
        count += 1
        return total / count
    return inner

running_averages is in control à subroutine (inner) is called à stack frame created (inner)
à inner is now in control à subroutine terminates (may, or may not, return a value)
à stack frame destroyed (inner) à running_averages is back in control
coroutine

def running_averages(iterable):
    create instance of running_averager    à stack frame created (running_averager)
    start coroutine
    for value in iterable:
        send value to running_averager
        receive value back
        print(received value)

def running_averager():
    total = 0
    count = 0
    running_average = None
    while True:
        wait for value
        receive new value
        calculate new average
        yield new average

à coroutine is still active

à waiting for next value to be sent

We'll come back to this example in another lecture
Abstract Data Structures

What are queues and stacks?

A queue is a data structure that supports first-in first-out (FIFO) addition/removal of items

FIFO    add elements to back of queue    remove elements from front of queue

A stack is a data structure that supports last-in first-out (LIFO) addition/removal of items

LIFO    push elements on top of stack    last pushed element is removed first (popped)

why abstract?    many different ways of creating concrete implementations
Using lists

stack    lst.append(item)     à appends item to end of list
         lst.pop()            à removes and returns last element of list

queue    lst.insert(0, item)  à inserts item to front of list
         lst.pop()            à removes and returns last element of list

So a list can be used for both a stack and a queue

But, inserting elements in a list is quite inefficient!

numbers coming up in a bit…
The deque data structure

Python's collections module implements a data structure called deque

This is a double-ended queue

à very efficient at adding / removing items from both front and end of a collection

from collections import deque

dq = deque()
dq = deque(iterable)        dq = deque(maxlen=n)

dq.append(item)             dq.appendleft(item)
dq.pop()                    dq.popleft()
dq.clear()                  len(dq)
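A quick tour of these operations, including the maxlen behavior:

```python
from collections import deque

dq = deque([1, 2, 3])
dq.append(4)         # add to the right
dq.appendleft(0)     # add to the left
print(dq)            # deque([0, 1, 2, 3, 4])
print(dq.pop())      # 4 (removed from the right)
print(dq.popleft())  # 0 (removed from the left)

# with maxlen set, appending to a full deque discards from the opposite end
limited = deque([1, 2, 3], maxlen=3)
limited.append(4)
print(limited)       # deque([2, 3, 4], maxlen=3)
```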
Timings    # items = 10_000    # tests = 1_000    (times in seconds)

                  list      deque
append (right)    0.87      0.87      --
pop (right)       0.002     0.0005    x4
insert (left)     20.80     0.84      x25
pop (left)        0.012     0.0005    x24
Another use case…

producer à adds data to queue à [ queue ] à consumer grabs data from queue and performs work
Implementing a Producer/Consumer using Subroutines

à create an "unlimited" deque

à run producer to insert all elements into deque

à run consumer to remove and process all elements in deque

def produce_elements(dq):
    for i in range(1, 100_000):
        dq.appendleft(i)

def consume_elements(dq):
    while len(dq) > 0:
        item = dq.pop()
        print('processing item', item)

def coordinator():
    dq = deque()
    produce_elements(dq)
    consume_elements(dq)
Implementing a Producer/Consumer using Generators

à create a limited size deque

à coordinator creates instance of producer generator

à coordinator creates instance of consumer generator

à producer runs until deque is filled
  à yields control back to caller

à consumer runs until deque is empty        repeat until producer is "done"
  à yields control back to caller           or controller decides to stop
Implementing a Producer/Consumer using Generators

def produce_elements(dq, n):
    for i in range(1, n):
        dq.appendleft(i)
        if len(dq) == dq.maxlen:
            yield

def consume_elements(dq):
    while True:
        while len(dq) > 0:
            item = dq.pop()
            # process item
        yield

def coordinator():
    dq = deque(maxlen=10)
    producer = produce_elements(dq, 100_000)
    consumer = consume_elements(dq)
    while True:
        try:
            next(producer)
        except StopIteration:
            break
        finally:
            next(consumer)

Notice how yield is not used to yield values,
but to yield control back to the controller
Generators can be in different states

def my_gen(f_name):
    f = open(f_name)
    try:
        for row in f:
            yield row.split(',')
    finally:
        f.close()

create the generator
rows = my_gen(f_name)       à CREATED

run the generator
next(rows)                  à RUNNING

until yield                 à SUSPENDED

until generator return      à CLOSED
Inspecting a generator's state

use inspect.getgeneratorstate to see the current state of a generator

from inspect import getgeneratorstate

g = my_gen(f_name)
getgeneratorstate(g)    à GEN_CREATED

row = next(g)
getgeneratorstate(g)    à GEN_SUSPENDED

list(g)
getgeneratorstate(g)    à GEN_CLOSED

(inside the generator code while it is running)
getgeneratorstate(g)    à GEN_RUNNING
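The same progression, using a small self-contained generator instead of the file-based one:

```python
from inspect import getgeneratorstate

def squares(n):
    for i in range(n):
        yield i ** 2

g = squares(3)
print(getgeneratorstate(g))  # GEN_CREATED

next(g)
print(getgeneratorstate(g))  # GEN_SUSPENDED

list(g)                      # exhausts the generator
print(getgeneratorstate(g))  # GEN_CLOSED
```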
So far…

We saw how yield can produce values

à use iteration to get the produced values à next()

After a value is yielded, the generator is suspended

How about sending data to the generator upon resumption?

Enhancement to generators introduced in Python 2.5    PEP 342
Sending data to a generator

yield is actually an expression

it can yield a value (like we have seen before)    yield 'hello'

it can also receive values

it is used just like an expression would           received = yield

we can combine both                                received = yield 'hello'

à works, but confusing!    à use sparingly
What's happening?

def gen_echo():
    while True:
        received = yield
        print('You said:', received)

echo = gen_echo()    à CREATED

has not started running yet – not in a suspended state

next(echo)           à SUSPENDED    Python has just yielded (None)

generator is suspended at the yield

we can resume and send data to the generator at the same time using send()

echo.send('hello')

generator resumes running exactly at the yield

the yield expression evaluates to the just received data

then the assignment to received is made
What's happening?

received = yield 'python'

generator is suspended here

'python' is yielded and control is returned to caller

caller sends data to generator:    g.send('hello')

generator resumes

'hello' is the result of the yield expression

'hello' is assigned to received

generator continues running until the next yield or return
Priming the generator

received = yield 'python'

Notice that we can only send data if the generator is suspended at a yield

So we cannot send data to a generator that is in a CREATED state – it must be in a SUSPENDED state

def gen_echo():
    while True:
        received = yield
        print('You said:', received)

echo = gen_echo()    à CREATED      echo.send('hello')    à not allowed yet

next(echo)           à SUSPENDED    echo.send('hello')    à OK

à yes, a value has been yielded – and we can choose to just ignore it
à in this example, None has been yielded
Priming the generator

Don't forget to prime a generator before sending values to it!

à generator must be SUSPENDED to receive data

à always use next() to prime

Later we'll see how we can "automatically" prime the generator using a decorator
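One way such a priming decorator could look — a sketch, not the course's own implementation; the name `coroutine` is an assumption:

```python
from functools import wraps

def coroutine(gen_fn):
    """Decorator (hypothetical) that primes a generator on creation."""
    @wraps(gen_fn)
    def primed(*args, **kwargs):
        g = gen_fn(*args, **kwargs)
        next(g)              # prime: advance to the first yield
        return g
    return primed

@coroutine
def echo():
    while True:
        received = yield
        print('You said:', received)

e = echo()               # already primed -- SUSPENDED at the yield
e.send('hello')          # prints: You said: hello
```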
Using yield…

à used for producing data    à yield 'Python'

à used for receiving data    à a = yield    (technically this produces None)

Be careful mixing the two usages in your code

à difficult to understand

à sometimes useful

à often not needed
Example

def running_averager():
    total = 0
    count = 0
    running_average = None
    while True:
        value = yield running_average
        total += value
        count += 1
        running_average = total / count

averager = running_averager()

next(averager)       à primed    à None has been yielded

averager.send(10)    à value received: 10
                     à continues running until next yield
                     à yields running_average à 10
                     à suspended and waiting

averager.send(30)    à value received: 30 à eventually yields 20
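This averager is runnable exactly as shown:

```python
def running_averager():
    total = 0
    count = 0
    running_average = None
    while True:
        value = yield running_average
        total += value
        count += 1
        running_average = total / count

averager = running_averager()
print(next(averager))      # None -- priming; the first yield produces None
print(averager.send(10))   # 10.0
print(averager.send(30))   # 20.0
```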


Consider this generator function…

def read_file(f_name):
    f = open(f_name)
    try:
        for row in f:
            yield row        à or: yield from f
    finally:
        f.close()

Suppose the file has 100 rows

rows = read_file('test.txt')
for _ in range(10):
    next(rows)

à read 10 rows

à file is still open

à how do we now close the file without iterating through the entire file?
Closing a generator

We have seen the possible generator states

→ created, running, suspended, closed

We can close a generator by calling its close() method

def read_file(f_name):
    f = open(f_name)
    try:
        for row in f:
            yield row
    finally:
        f.close()

rows = read_file('test.txt')
for _ in range(10):
    next(rows)

rows.close()    → the finally block runs, and the file is closed

why did it jump to finally? Did an exception occur?
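To see the jump to finally without a real file, here is a minimal sketch that substitutes an in-memory list for the file (cleanup_log is just an illustrative stand-in for f.close()):

```python
cleanup_log = []

def gen_rows():
    # stand-in for a file-reading generator; finally is the cleanup step
    try:
        for row in ['row1', 'row2', 'row3']:
            yield row
    finally:
        cleanup_log.append('closed')

rows = gen_rows()
first = next(rows)    # 'row1'
rows.close()          # exception raised inside gen_rows → finally runs
```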


Behind the scenes…

When .close() is called, an exception is triggered inside the generator

The exception is a GeneratorExit exception

def gen():
    try:
        yield 1
        yield 2
    except GeneratorExit:
        print('Generator close called')
    finally:
        print('Cleanup here…')

g = gen()
next(g)
g.close()    → prints "Generator close called", then "Cleanup here…"
Python's expectations when close() is called

• a GeneratorExit exception bubbles up
    → the exception is silenced by Python

• the generator exits cleanly (returns)
    → to the caller, everything works "normally"

• some other exception is raised from inside the generator
    → the exception is seen by the caller

• if the generator "ignores" the GeneratorExit exception and yields another value
    → Python raises a RuntimeError: generator ignored GeneratorExit

in other words, don't try to catch and ignore a GeneratorExit exception

it's perfectly OK not to catch it, and simply let it bubble up

def gen():
    yield 1
    yield 2

g = gen()
next(g)
g.close()    → no exception is seen by the caller
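A short sketch of the forbidden case — a generator that swallows GeneratorExit and yields again:

```python
def stubborn():
    # ignores GeneratorExit and yields again -- Python will not allow this
    while True:
        try:
            yield 'data'
        except GeneratorExit:
            pass  # swallow the exception and keep going

g = stubborn()
next(g)
try:
    g.close()
except RuntimeError as ex:
    error_message = str(ex)   # 'generator ignored GeneratorExit'
```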
Use in coroutines

Since coroutines are generator functions, it is OK to close a coroutine also

For example, you may have a coroutine that receives data to write to a database

→ coroutine opens a transaction when it is primed (next)

→ coroutine receives data to write to the database

→ coroutine commits the transaction when close() is called (GeneratorExit)

→ coroutine aborts (rolls back) the transaction if some other exception occurs
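A sketch of such a database-writing coroutine; the db object with begin/write/commit/rollback methods is a hypothetical stand-in, not a real database API:

```python
def db_writer(db):
    # db is a hypothetical object with begin/write/commit/rollback methods
    db.begin()                      # transaction opens when primed
    try:
        while True:
            record = yield
            db.write(record)
    except GeneratorExit:
        db.commit()                 # close() → commit
        raise
    except Exception:
        db.rollback()               # any other exception → roll back
        raise

class FakeDB:
    # illustrative in-memory stand-in so the sketch can run
    def __init__(self):
        self.log = []
    def begin(self): self.log.append('begin')
    def write(self, r): self.log.append(('write', r))
    def commit(self): self.log.append('commit')
    def rollback(self): self.log.append('rollback')

db = FakeDB()
w = db_writer(db)
next(w)            # prime → begin
w.send('row-1')
w.close()          # → commit
```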
Sending things to coroutines

.send(data)         → sends data to the coroutine

.close()            → sends (throws) a GeneratorExit exception to the coroutine

we can also "send" any exception to the coroutine

.throw(exception)   → throws an exception to the coroutine

→ the exception is raised at the point where the coroutine is suspended
How throw() is handled

→ generator does not catch the exception (does nothing)
    → exception propagates to the caller

→ generator catches the exception, and does something
    → yields a value
    → exits (returns)
    → raises a different exception
Catch and yield

→ generator catches the exception
→ handles and silences the exception
→ yields a value     → generator is now SUSPENDED

→ the yielded value is the return value of the .throw() method

def gen():
    while True:
        try:
            received = yield     → after handling, the loop re-enters here and
                                   None is yielded (this is throw()'s return value)
            print(received)
        except ValueError:
            print('silencing ValueError')
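Throwing a ValueError into the generator above shows the yielded value coming back as throw()'s return value (None here, since the re-entered yield produces None):

```python
def gen():
    # catches ValueError, silences it, and keeps yielding
    while True:
        try:
            received = yield
            print(received)
        except ValueError:
            print('silencing ValueError')

g = gen()
next(g)                          # prime
result = g.throw(ValueError())   # prints: silencing ValueError
print(result)                    # None -- the value yielded after handling
```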
Catch and exit

→ generator catches the exception
→ generator exits (returns)
→ caller receives a StopIteration exception     → generator is now CLOSED

this is the same as calling next() or send() on a generator that returns instead of yielding

can think of throw() as the same thing as send(), but it causes an exception to be sent
instead of plain data

def gen():
    while True:
        try:
            received = yield
            print(received)
        except ValueError:
            print('silencing ValueError')
            return None          → StopIteration is seen by the caller
Catch and raise a different exception

→ generator catches the exception
→ generator handles the exception and raises another exception
→ the new exception propagates to the caller     → generator is now CLOSED

def gen():
    while True:
        try:
            received = yield
            print(received)
        except ValueError:
            print('silencing ValueError')
            raise CustomException()      → CustomException is seen by the caller
close() vs throw()

close()    → a GeneratorExit exception is raised inside the generator

can we just call gen.throw(GeneratorExit()) instead?

yes, but…

with close(), Python expects a GeneratorExit (or StopIteration) exception to propagate,
and silences it for the caller

if we use throw() instead, the GeneratorExit exception is raised inside the caller's context
(if the generator lets it propagate):

try:
    gen.throw(GeneratorExit())
except GeneratorExit:
    pass
We always have to prime a coroutine before using it

→ very repetitive
→ the pattern is the same every time

g = gen()       → creates the coroutine instance

next(g)         → primes the coroutine
                  (or g.send(None))

This is a perfect example of using a decorator to do this work for us!
Creating a function to auto-prime coroutines

def prime(gen_fn):
    g = gen_fn()    → creates the generator
    next(g)         → primes the generator
    return g        → returns the primed generator

def echo():
    while True:
        received = yield
        print(received)

echo_gen = prime(echo)

echo_gen.send('hello')    → 'hello'
A decorator approach

We still have to remember to call the prime function for our echo coroutine before we can use it

Since echo is a coroutine, we know we always have to prime it first

So let's write a decorator that will replace our generator function with another function
that will automatically prime the generator when we create an instance of it

def coroutine(gen_fn):
    def prime():
        g = gen_fn()
        next(g)
        return g
    return prime

@coroutine
def echo():
    while True:
        received = yield
        print(received)
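The decorator in action — a runnable sketch that collects the received values in a list instead of printing them:

```python
def coroutine(gen_fn):
    # replaces the generator function with one that returns a primed generator
    def prime():
        g = gen_fn()
        next(g)
        return g
    return prime

received_log = []

@coroutine
def echo():
    while True:
        received = yield
        received_log.append(received)

e = echo()             # already primed -- no next() needed
e.send('hello')
e.send('world')
```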
Understanding how the decorator works

def coroutine(gen_fn):
    def prime():
        g = gen_fn()
        next(g)
        return g
    return prime

def echo():
    while True:
        received = yield
        print(received)

echo = coroutine(echo)    [same effect as using @coroutine]

→ the name echo now actually refers to the prime function
→ prime is a closure
→ the free variable gen_fn is the original echo

calling echo()

→ calls prime(), which runs:
      g = gen_fn()    (gen_fn is the original echo)
      next(g)
      return g
Expanding the decorator

def coroutine(gen_fn):
    def prime():
        g = gen_fn()
        next(g)
        return g
    return prime

→ cannot pass arguments to the generator function

def coroutine(gen_fn):
    def prime(*args, **kwargs):
        g = gen_fn(*args, **kwargs)
        next(g)
        return g
    return prime
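With *args and **kwargs forwarded, decorated coroutines can accept arguments; power_up here is just an illustrative example, not from the course:

```python
def coroutine(gen_fn):
    # forwards any arguments through to the generator function
    def prime(*args, **kwargs):
        g = gen_fn(*args, **kwargs)
        next(g)
        return g
    return prime

@coroutine
def power_up(p):
    # yields each received value raised to the power p (illustrative name)
    result = None
    while True:
        received = yield result
        result = received ** p

squares = power_up(2)     # the argument passes through the decorator
print(squares.send(3))    # 9
print(squares.send(4))    # 16
```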
Recall…

def subgen():
    for i in range(10):
        yield i

We could consume the data from subgen in another generator this way:

def delegator():
    for value in subgen():
        yield value

Instead of using that loop, we saw we could just write:

def delegator():
    yield from subgen()

With either definition we can call it this way:

d = delegator()
next(d)
etc…
What is going on exactly?

caller         delegator                 subgen

next(d)   →    yield from subgen()   →   next   →   yield value
          ←    yield value           ←   yield value

→ 2-way communications

Can we also send(), close() and throw()?  Yes!
How does the delegator behave when the subgenerator returns?

it continues running normally

def subgen():
    yield 1
    yield 2

def delegator():
    yield from subgen()
    yield 'subgen closed'

d = delegator()

next(d)    → 1
next(d)    → 2
next(d)    → 'subgen closed'
next(d)    → StopIteration
Inspecting the subgenerator

from inspect import getgeneratorlocals, getgeneratorstate

def subgen():
    yield 1
    yield 2

def delegator():
    a = 100
    s = subgen()
    yield from s
    yield 'subgen closed'

d = delegator()
getgeneratorstate(d)    → GEN_CREATED
getgeneratorlocals(d)   → {}

next(d)  → 1
getgeneratorstate(d)    → GEN_SUSPENDED
getgeneratorlocals(d)   → {'a': 100, 's': <generator object>}

s = getgeneratorlocals(d)['s']
getgeneratorstate(s)    → GEN_SUSPENDED

next(d)  → 2                   d → SUSPENDED    s → SUSPENDED
next(d)  → 'subgen closed'     d → SUSPENDED    s → CLOSED
next(d)  → StopIteration       d → CLOSED       s → CLOSED
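The inspection above can be reproduced verbatim; a runnable sketch that records the states along the way:

```python
from inspect import getgeneratorlocals, getgeneratorstate

def subgen():
    yield 1
    yield 2

def delegator():
    a = 100
    s = subgen()
    yield from s
    yield 'subgen closed'

d = delegator()
states = [getgeneratorstate(d)]        # GEN_CREATED

next(d)                                 # → 1
states.append(getgeneratorstate(d))     # GEN_SUSPENDED
s = getgeneratorlocals(d)['s']          # grab the subgenerator object
states.append(getgeneratorstate(s))     # GEN_SUSPENDED

next(d)                                 # → 2
next(d)                                 # → 'subgen closed' (subgen now done)
states.append(getgeneratorstate(s))     # GEN_CLOSED
```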
yield from and send()

yield from establishes a 2-way communication channel

→ between the caller and the subgenerator

→ via a delegator → yield from

caller  ⇄  delegator  ⇄  subgenerator

    next   →   next
    yield  ←   yield
    send   →   send
Priming the subgenerator coroutine

We know that before we can send() to a coroutine, we have to prime it first

→ next(coroutine)

How does this work with yield from?

def delegator():
    yield from coro()

def coro():
    while True:
        received = yield
        print(received)

d = delegator()

before we can send to d we have to prime it    → next(d)

What about coro()?

→ yield from will automatically prime the coroutine when necessary
Sending data to the subgenerator

Once the delegator has been primed, data can be sent to it using send()

def delegator():
    yield from coro()

def coro():
    while True:
        received = yield
        print(received)

d = delegator()
next(d)

d.send('python')    → python is printed by the coroutine
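The same flow, made verifiable by recording what the subgenerator receives:

```python
received_values = []

def coro():
    # subgenerator coroutine: records everything sent to it
    while True:
        received = yield
        received_values.append(received)

def delegator():
    yield from coro()

d = delegator()
next(d)              # primes the delegator AND (via yield from) the coroutine
d.send('python')     # travels caller → delegator → coro
d.send('rocks')
```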
Control Flow

def delegator():
    yield from coro()            ← delegator is "stuck" here until the
    print('next line of code')     subgenerator closes

→ when the subgenerator closes, the delegator resumes running
  the rest of its code
Multiple Delegators → pipeline

def coro():
    while True:
        received = yield
        …

def gen2():
    yield from coro()

def gen1():
    yield from gen2()

d = gen1()

caller → gen1 → gen2 → coro

this can even be recursive
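A runnable sketch of the two-level delegation chain:

```python
sink = []

def coro():
    # final coroutine in the chain: collects the values it receives
    while True:
        received = yield
        sink.append(received)

def gen2():
    yield from coro()

def gen1():
    yield from gen2()

d = gen1()
next(d)            # primes the whole chain: gen1 → gen2 → coro
d.send(10)         # value travels all the way down to coro
d.send(20)
```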
Closing the subgenerator

def delegator():
    …
    yield from subgen()
    …

def subgen():
    …
    yield
    …

the delegator code is effectively "paused" at the yield from as long as subgen is not closed

when subgen closes, the delegator resumes running exactly where it was paused
Closing the delegator

def delegator():
    …
    yield from subgen()
    …

def subgen():
    …
    yield
    …

d = delegator()
next(d)

d.close()    → closes the subgenerator
             → immediately closes the delegator as well
Returning from a generator

A generator can return a value    → StopIteration

The returned value is embedded in the StopIteration exception

→ we can extract that value:

try:
    next(g)
except StopIteration as ex:
    print(ex.value)

→ so can Python!
Returning from a subgenerator

yield from is an expression

It evaluates to the returned value of the subgenerator

def subgen():
    yield
    …
    return result

result = yield from subgen()
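A concrete sketch showing yield from evaluating to the subgenerator's return value (the sentinel-terminated averager here is illustrative, not from the slides):

```python
def averager():
    # subgenerator: receives numbers, returns the average on a None sentinel
    total = 0
    count = 0
    while True:
        value = yield
        if value is None:        # sentinel: stop and return the result
            break
        total += value
        count += 1
    return total / count

results = []

def delegator():
    result = yield from averager()   # evaluates to averager's return value
    results.append(result)

d = delegator()
next(d)
d.send(10)
d.send(20)
try:
    d.send(None)      # averager returns → delegator captures 15.0, then ends
except StopIteration:
    pass
```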
Returning from a subgenerator

def subgen():
    yield
    …
    return result

def delegator():
    result = yield from subgen()
    …

→ the delegator receives the return value and continues running normally

yield from              → establishes the conduit
subgenerator returns    → conduit is closed
                        → yield from evaluates to the returned value
                        → delegator resumes running
Throwing Exceptions

We can throw exceptions into a generator using the throw() method

→ this works with delegation as well

def delegator():
    yield from subgen()

def subgen():
    …
    yield
    …

d = delegator()
d.throw(Exc)

The delegator does not intercept the exception → it just forwards it to the subgenerator

The subgenerator can then handle the exception (or not)
Exception Propagation

an exception is thrown into subgen

subgen      → may handle it: silence it, or propagate it up
              (the same exception, or throw something else)

delegator   → may handle it: silence it, or propagate it up
              (the same or a different exception)

caller      → sees whatever exception (if any) propagates all the way up
Data pipelines (Pulling)

consumer (sink)  ←pull←  filter  ←pull←  transform  ←pull←  data source (producer)

We've seen this before

→ use yield and iteration to pull data through the pipeline

consumer      → iterates filter_data(), writes data to file
filter_data   → iterates parse_data(), yields selected rows only
parse_data    → iterates read_data(), transforms the data, yields rows
read_data     → yields rows from the source
Data pipelines (Pushing)

With coroutines, we can also push (send) data through a pipeline

data source (producer)  →push→  transformer  →push→  filter  →push→  consumer (sink)

Example

generate integers  →push→  square number  →push→  filter odds only  →push→  log results
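The example pipeline can be sketched with coroutines, reusing the auto-priming decorator from earlier (the coroutine names here are illustrative):

```python
def coroutine(gen_fn):
    # auto-priming decorator from earlier in the section
    def prime(*args, **kwargs):
        g = gen_fn(*args, **kwargs)
        next(g)
        return g
    return prime

@coroutine
def logger(log):
    # sink: records whatever reaches the end of the pipeline
    while True:
        value = yield
        log.append(value)

@coroutine
def filter_odds(target):
    # pushes only odd values downstream
    while True:
        value = yield
        if value % 2 == 1:
            target.send(value)

@coroutine
def squarer(target):
    # pushes the square of each value downstream
    while True:
        value = yield
        target.send(value ** 2)

log = []
pipeline = squarer(filter_odds(logger(log)))
for i in range(5):            # data source: generate integers 0..4
    pipeline.send(i)
# log is now [1, 9] -- the odd squares of 0..4
```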
Can get crazier… broadcasting

source → transformer → broadcaster → filter → …
                                   → transformer → …
                                   → filter → …

→ the broadcaster pushes data through several downstream pipelines at once
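A minimal broadcaster sketch (all names here are illustrative, built on the same auto-priming decorator):

```python
def coroutine(gen_fn):
    def prime(*args, **kwargs):
        g = gen_fn(*args, **kwargs)
        next(g)
        return g
    return prime

@coroutine
def collector(log):
    # sink: appends each received value to log
    while True:
        log.append((yield))

@coroutine
def even_filter(target):
    # forwards only even values downstream
    while True:
        value = yield
        if value % 2 == 0:
            target.send(value)

@coroutine
def broadcaster(targets):
    # forwards every received value to ALL downstream coroutines
    while True:
        value = yield
        for target in targets:
            target.send(value)

evens, all_values = [], []
b = broadcaster([even_filter(collector(evens)), collector(all_values)])
for i in range(4):
    b.send(i)
# evens → [0, 2], all_values → [0, 1, 2, 3]
```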
