
Python+Deep+Dive+3

This document outlines a Python course focused on advanced topics, particularly associative arrays and hash maps, aimed at developing expertise in Python programming. It includes prerequisites for participants, course materials, and various data structures such as dictionaries and sets, along with their operations and implementations. The course emphasizes practical exercises and theoretical understanding of Python's standard library and data structures.

Uploaded by

ancaneo21
Copyright © MathByte Academy

What this course is about

the Python language
    → canonical CPython 3.6+ implementation
the standard library

becoming an expert Python developer
    idiomatic Python
    obtaining a deeper understanding of the Python language and the standard library

this is NOT an introductory course
    → refer to prerequisites video or course description

Included Course Materials

lecture videos
coding videos
Jupyter notebooks
projects, exercises and solutions
GitHub repository for all code
    https://github.com/fbaptiste/python-deepdive

Associative Arrays

what are associative arrays?
one concrete implementation → hash maps or hash tables
how do hash maps work?
what are hash functions?
Python 3.5+ specific hash map implementation

[Diagram: where associative arrays show up: modules, namespaces, classes, instances, dictionaries, sets, multi-sets, JSON, YAML, relational databases]

Dictionaries

creating
manipulating
updating, merging and copying
keys, values and items views
custom classes and hashing → use instances as keys

Sets

hash maps
set operations
copying, merging and updating sets
FrozenSets
Dictionary views → keys, items

Serializing and Deserializing

pickling
JSON serialization and deserialization
use and customize Python's JSONEncoder and JSONDecoder classes
the need for JSON schemas
3rd party libraries → JSONSchema, Marshmallow, PyYaml, Serpy

Specialized Hash Maps

defaultdict
OrderedDict
Counter → multi-set
ChainMap

Custom Dictionary Types

using class inheritance to create customized dictionary types
    inheriting from dict
    inheriting from UserDict

Exercises

exercises after each section
you should attempt these yourself first – practice makes perfect!
solution videos and notebooks provided
    → my approach
    → more than one approach possible

Python 3: Deep Dive (Part 3) - Prerequisites

This course assumes that you have in-depth knowledge of functional programming in Python:

functions and function arguments
lambdas
packing and unpacking iterables
    my_func(*my_list)
    f, *_, l = (1, 2, 3, 4, 5)
nested scopes
free variables
closures
decorators
Boolean truth values
named tuples
== vs is
zip  map  sorted  any  all  chain
@lru_cache  @singledispatch  @wraps

Python 3: Deep Dive (Part 3) - Prerequisites

This course assumes that you have in-depth knowledge of:

sequences, iterables, iterators
list comprehensions
generators and generator expressions
context managers
importing modules and symbols

Python 3: Deep Dive (Part 3) - Prerequisites

You should have a basic understanding of creating and using classes in Python

    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age

        @property
        def age(self):
            return self._age

        @age.setter
        def age(self, age):
            if age <= 0:
                raise ValueError('Age must be greater than 0')
            else:
                self._age = age

Python 3: Deep Dive (Part 3) - Prerequisites

You should understand how special functionality is implemented in Python using special methods

    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

        def __repr__(self):
            return f'Point(x={self.x}, y={self.y})'

        def __eq__(self, other):
            if not isinstance(other, Point):
                return False
            else:
                return self.x == other.x and self.y == other.y

        def __gt__(self, other):
            if not isinstance(other, Point):
                return NotImplemented
            else:
                return self.x ** 2 + self.y ** 2 > other.x ** 2 + other.y ** 2

        def __add__(self, other):
            ...



Python 3: Deep Dive (Part 3) - Prerequisites

You should also have a good grasp of:

exception handling
    try…except…else…finally…

various basic types
    int, float, Decimal, Fraction, complex

creating and using simple dictionaries
    d = {'a': 1, 'b': 2}
    d['a']
    d['a'] = 1

strings and string formatting
    f'result: {result}'
    'result: {result}'.format(result=result)

and other basic Python language items: loops, conditionals, etc.


Python 3: Deep Dive (Part 3) - Prerequisites

I will use a limited number of 3rd party libraries in this course

You will need to know how to install 3rd party Python libraries
    pip install marshmallow

Most code examples are provided using Jupyter Notebooks
    freely available: https://jupyter.org/

GitHub and git
    → recommended but not required
    https://github.com/fbaptiste/python-deepdive

This section is going to be primarily theoretical in nature

what are dictionaries (aka associative arrays)
    → abstract data structure

there are many ways to implement dictionaries
    specifically we'll look at how they can be implemented using hash tables (aka hash maps)

This is not a data structures course, so we're not going to look at all the intricacies
    just enough to get a rough understanding

Why bother?

Dictionaries are everywhere in Python
    modules
    classes
    objects (class instances)
    scopes
    sets
    your own dictionaries

The dictionary is arguably one of the most important data structures in Python

If you're really not into theory…
or you already understand associative arrays, hash functions, hash maps, etc…

Skip this section!

    → maybe just check out the videos on Python's (3.6+) implementation of hash maps
        → key-sharing dictionaries (Mark Shannon, PEP 412)
        → compact dictionaries (Raymond Hettinger)
          https://mail.python.org/pipermail/python-dev/2012-December/123028.html

Really, the main points that come out of this section:

    → dictionary keys must be hashable
    → dictionary key order is maintained (in order of insertion)

What is an associative array?

persons = [John, Eric, Michael, Graham]      (a list of Person objects)
index:       0     1      2        3

We can think of the indices as keys for the items in the list
    0 → John
    1 → Eric
    2 → Michael
    3 → Graham

So when we want to get hold of the Michael object, we just need to remember the key
    persons[2] → Michael

But remembering a number while we write our code???
there has to be a better way…

persons = [('john', John),
           ('eric', Eric),
           ('michael', Michael),
           ('graham', Graham)]

we have associated a string with an object: (key, object)

to get the Michael object:
    → look up the key 'michael' and return the associated value
        scan the persons list until we find a tuple whose first element equals the key
        return the second element of that tuple

At least we don't have to remember a number anymore!

But there really has to be a better way…

Consider our associative array:
    persons = [('john', John), ('eric', Eric),
               ('michael', Michael), ('graham', Graham)]

And let's break it up:
    keys = ['john', 'eric', 'michael', 'graham']
    persons = [John, Eric, Michael, Graham]

Notice how the index of 'john' matches up with the index of John, and so on

What if we could define a function h that would return these results - always:
    h('john') → 0    h('eric') → 1    h('michael') → 2    h('graham') → 3

To get Michael, we would first call h('michael') → 2, then persons[2]
    persons[h('michael')] → Michael

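The two lookups described so far can be sketched as follows. This is only an illustration: strings stand in for the Person objects, and `lookup_by_scan` and the hard-coded `h` are hypothetical names that do not come from the course code.

```python
# Strings stand in for the Person objects used on the slides.
persons = [('john', 'John'), ('eric', 'Eric'),
           ('michael', 'Michael'), ('graham', 'Graham')]


def lookup_by_scan(pairs, key):
    """Scan the (key, value) pairs until the key matches."""
    for k, value in pairs:
        if k == key:
            return value
    raise KeyError(key)


# Split the pairs into parallel lists: keys[i] belongs to values[i].
keys = [k for k, _ in persons]
values = [v for _, v in persons]


def h(key):
    """Hard-coded index function: only works for keys known in advance."""
    if key == 'john':
        return 0
    if key == 'eric':
        return 1
    if key == 'michael':
        return 2
    if key == 'graham':
        return 3
    raise KeyError(key)


print(lookup_by_scan(persons, 'michael'))   # scan: cost grows with the list
print(values[h('michael')])                 # index: one computation, one access
```

The scan has to walk the list, while the second form jumps straight to the right position, which is the idea hash maps build on.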
Associative Arrays

An associative array is an abstract data structure that associates keys (keys are unique) to values
    abstractly we can think of it as a collection of (key, value) pairs

Sometimes also called: maps, dictionaries

Can be implemented in different concrete ways

They support:
    → adding/removing elements
    → modifying an associated value
    → looking up a value via its key

Hash Maps (aka Hash Tables)

One common concrete implementation of an associative array (aka dictionary) is a hash map

Suppose we have an array of 7 slots (indices 0 to 6), initially containing nothing

Now suppose we want to store these mappings:
    'john' → John
    'eric' → Eric
    'michael' → Michael
    'graham' → Graham

We'll define a function that returns an integer value for each of these strings ('john', 'eric', etc)
    → that will be unique for each of these strings
    → is between 0 and 6
    → always returns the same integer for the same string (deterministic)

Hash Tables

h(s):
    'john' → 2
    'eric' → 4
    'michael' → 0
    'graham' → 5

slot   value
0      Michael
1
2      John
3
4      Eric
5      Graham
6

Storing a key/value pair:
    → calculate h(key) → idx
    → store value in slot idx

Looking up a value by key:
    → calculate h(key) → idx
    → return value in slot idx

Hash Functions

Creating the function h(key) when we know all the possible keys ahead of time is easy

Reality check: most of the time we don't know all the possible keys ahead of time

In reality, creating such a function is hard

Bounding the returned index value is not difficult
    → modulo
    x % 7 → 0, 1, 2, 3, 4, 5, 6

Ensuring uniqueness is hard
    how to ensure that h(k1) != h(k2) if k1 != k2

maybe we don't need to…

Hash Functions

A hash function is a function (in the mathematical sense)
    $x = y \Rightarrow h(x) = h(y)$    (deterministic)
that maps from a set (domain) of arbitrary size (possibly infinite)
to another (smaller) set of fixed size (range)
    $h: D \to R$ where $|R| < |D|$

For our hash tables, we'll also want:
    → the range to be a defined subset of the non-negative integers 0, 1, 2, 3, …
    → the generated indices $h(k)$ for expected input values to be uniformly distributed (as much as possible)

Note that we do allow getting the same output for different keys
    i.e. $h(k_1) = h(k_2) \nRightarrow k_1 = k_2$


Example

    def h(key, num_slots):
        return len(key) % num_slots

    key            h(key, 11)    h(key, 5)
    'alexander'    9             4
    'john'         4             4
    'eric'         4             4
    'michael'      7             2
    'graham'       6             1

collisions: 'john' and 'eric' with 11 slots; 'alexander', 'john' and 'eric' with 5 slots

Example

    ord('A') → 65    ord('B') → 66    …    ord('Z') → 90
    ord('a') → 97    ord('b') → 98    …    ord('z') → 122

    def h(key, num_slots):
        total = sum(ord(c) for c in key)
        return total % num_slots

    h('alexander', 11) → 948 % 11 = 2
    h('john', 11) → 431 % 11 = 2
    h('eric', 11) → 419 % 11 = 1
    h('michael', 11) → 723 % 11 = 8
    h('graham', 11) → 624 % 11 = 8

    h('alexander', 5) → 948 % 5 = 3
    h('john', 5) → 431 % 5 = 1
    h('eric', 5) → 419 % 5 = 4
    h('michael', 5) → 723 % 5 = 3
    h('graham', 5) → 624 % 5 = 4

All these hash functions have collisions…

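To see where the collisions come from, the slot assignments can be grouped by index. The sketch below reuses the slide's `h`; the `collisions` helper is a hypothetical name introduced only for this illustration.

```python
from collections import defaultdict


def h(key, num_slots):
    """Sum of character code points, bounded by the number of slots."""
    total = sum(ord(c) for c in key)
    return total % num_slots


names = ['alexander', 'john', 'eric', 'michael', 'graham']


def collisions(keys, num_slots):
    """Group keys by slot and keep only slots holding more than one key."""
    slots = defaultdict(list)
    for key in keys:
        slots[h(key, num_slots)].append(key)
    return {idx: ks for idx, ks in slots.items() if len(ks) > 1}


print(collisions(names, 11))   # slots shared when the table has 11 slots
print(collisions(names, 5))    # slots shared when the table has 5 slots
```

With either table size at least two names land in the same slot, so the table needs a strategy for handling collisions.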
Dealing with Collisions

chaining

    h('alexander', 5) → 948 % 5 = 3
    h('john', 5) → 431 % 5 = 1
    h('eric', 5) → 419 % 5 = 4
    h('michael', 5) → 723 % 5 = 3
    h('graham', 5) → 624 % 5 = 4

slot   chain
0
1      ['john', John]
2
3      ['alexander', Alexander] → ['michael', Michael]
4      ['eric', Eric] → ['graham', Graham]

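A minimal sketch of chaining, assuming the character-sum hash from the earlier example and title-cased strings in place of the Person objects. `ChainedTable` is a hypothetical teaching class, not how CPython's dict is implemented.

```python
def h(key, num_slots):
    return sum(ord(c) for c in key) % num_slots


class ChainedTable:
    """Toy hash table: each slot holds a list (chain) of [key, value] pairs."""

    def __init__(self, num_slots=5):
        self.slots = [[] for _ in range(num_slots)]

    def put(self, key, value):
        chain = self.slots[h(key, len(self.slots))]
        for pair in chain:
            if pair[0] == key:        # key already stored: overwrite its value
                pair[1] = value
                return
        chain.append([key, value])    # new key: append to the chain

    def get(self, key):
        chain = self.slots[h(key, len(self.slots))]
        for k, value in chain:        # scan only the chain in this one slot
            if k == key:
                return value
        raise KeyError(key)


table = ChainedTable(5)
for name in ['alexander', 'john', 'eric', 'michael', 'graham']:
    table.put(name, name.title())

print(table.slots[3])   # the chain shared by 'alexander' and 'michael'
```

Lookups stay correct under collisions, but each colliding key makes a chain longer and so makes the scan inside that slot slower.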
Dealing with Collisions

probing (linear)
                                        probe sequence
    h('alexander', 5) → 948 % 5 = 3     3 → 4 → 0 → 1 → 2
    h('john', 5) → 431 % 5 = 1          1 → 2 → 3 → 4 → 0
    h('eric', 5) → 419 % 5 = 4          4 → 0 → 1 → 2 → 3
    h('michael', 5) → 723 % 5 = 3       3 → 4 → 0 → 1 → 2
    h('graham', 5) → 624 % 5 = 4        4 → 0 → 1 → 2 → 3

slot   item
0      ['michael', Michael]
1      ['john', John]
2      ['graham', Graham]
3      ['alexander', Alexander]
4      ['eric', Eric]

other types of probing
    → must generate the same sequence of valid indices for any given key

Fetching Elements

slot   item
0      ['michael', Michael]
1      ['john', John]
2      ['graham', Graham]
3      ['alexander', Alexander]
4      ['eric', Eric]

                                        probe sequence
    h('alexander', 5) → 948 % 5 = 3     3 → 4 → 0 → 1 → 2
    h('john', 5) → 431 % 5 = 1          1 → 2 → 3 → 4 → 0
    h('eric', 5) → 419 % 5 = 4          4 → 0 → 1 → 2 → 3
    h('michael', 5) → 723 % 5 = 3       3 → 4 → 0 → 1 → 2
    h('graham', 5) → 624 % 5 = 4        4 → 0 → 1 → 2 → 3

find 'alexander' → hash = 3 → probe sequence: 3 → 4 → 0 → 1 → 2
    is 'alexander' at index 3? → yes → return item

find 'michael' → hash = 3 → probe sequence: 3 → 4 → 0 → 1 → 2
    is 'michael' at index 3? → no
    is 'michael' at index 4? → no
    is 'michael' at index 0? → yes → return item

→ this is why the hash of a key should not change over its lifetime
→ in reality more complex than this, but this is the basic idea

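The insert and fetch steps for linear probing can be sketched as below, again using the character-sum hash and title-cased strings as stand-in values. The function names are hypothetical, and deletions and resizing are deliberately left out.

```python
def h(key, num_slots):
    return sum(ord(c) for c in key) % num_slots


def probe_sequence(key, num_slots):
    """Linear probing: start at h(key) and wrap around the table once."""
    start = h(key, num_slots)
    return [(start + i) % num_slots for i in range(num_slots)]


def insert(slots, key, value):
    """Store (key, value) in the first free slot of the key's probe sequence."""
    for idx in probe_sequence(key, len(slots)):
        if slots[idx] is None or slots[idx][0] == key:
            slots[idx] = (key, value)
            return idx
    raise RuntimeError('table is full')


def find(slots, key):
    """Follow the same probe sequence until the key or an empty slot is hit."""
    for idx in probe_sequence(key, len(slots)):
        if slots[idx] is None:        # empty slot: key was never stored
            raise KeyError(key)
        if slots[idx][0] == key:
            return slots[idx][1]
    raise KeyError(key)


slots = [None] * 5
for name in ['alexander', 'john', 'eric', 'michael', 'graham']:
    insert(slots, name, name.title())

print([item[0] for item in slots])
print(find(slots, 'michael'))
```

Inserting the five names in this order reproduces the slot layout shown in the table above, and `find` retraces the same probe sequence that `insert` used.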
Sizing Issues

When we create a hash table, how big should it be?

We don't know how big it will become
    we can't make it arbitrarily large → memory constraints
    → start small, and grow it over time as needed

    → resizing is expensive
        → recompute hashes
        → move data around

    → over-allocate (create more slots than necessary)
    → algorithms exist to optimize the cost of doing this


Other Issues

what happens when items are deleted
    → this can affect the probing algorithm
    → compacting the table when items are deleted

choice of hash function

gets complicated
    beyond the needs of this course
    → https://en.wikipedia.org/wiki/Hash_table

Python Dictionaries are ubiquitous

Python dictionaries are everywhere you look!
    namespaces
    classes
    modules
    functions
    sets
    and of course, your own dicts

Dictionaries are such an important part of Python that a lot of time and effort was put into making them as efficient as possible
    key sharing
    compact dictionaries

Key Sharing (PEP 412)

    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age

    john = Person('John', 78)          john    → ['name', 'John'], ['age', 78]
    eric = Person('Eric', 75)          eric    → ['name', 'Eric'], ['age', 75]
    michael = Person('Michael', 75)    michael → ['name', 'Michael'], ['age', 75]

→ multiple instances of the same class → instance attribute names are the same

              john, eric, michael
    'name' →  ['John', 'Eric', 'Michael']
    'age'  →  [78, 75, 75]

→ split-table dictionary

Compact Dictionaries

    {'alex': Alex, 'john': John, 'eric': Eric}

    hash('alex') → 3
    hash('john') → 1        (simplified – in reality we may have collisions!)
    hash('eric') → 6

Sparse table, one [hash, key, value] row per slot (empty rows are wasted space):

    slot
    0     ['--', '--', '--']
    1     [-6350376054362344353, 'john', John]
    2     ['--', '--', '--']
    3     [4939205761874899982, 'alex', Alex]
    4     ['--', '--', '--']
    5     ['--', '--', '--']
    6     [6629767757277097963, 'eric', Eric]

    → key order different from insertion order

Compact layout:

    values = [[4939205761874899982, 'alex', Alex],
              [-6350376054362344353, 'john', John],
              [6629767757277097963, 'eric', Eric]]

    → key order same as insertion order

    indices = [None, 1, None, 0, None, None, 2]

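The compact layout can be imitated with two lists. This is a rough sketch only: the slot numbers are copied from the simplified slide example through a hypothetical `slot_of` lookup, the stored hash values are omitted, and collisions are ignored, so it illustrates the layout rather than CPython's actual implementation.

```python
# Slot numbers taken from the simplified slide example (no collisions).
slot_of = {'alex': 3, 'john': 1, 'eric': 6}

entries = []            # dense list of (key, value), in insertion order
indices = [None] * 7    # sparse table: slot -> position in entries


def insert(key, value):
    """Append the entry and record its position in the sparse index table."""
    indices[slot_of[key]] = len(entries)
    entries.append((key, value))


def lookup(key):
    """Go from slot to position in entries, then read the value."""
    pos = indices[slot_of[key]]
    if pos is None:
        raise KeyError(key)
    return entries[pos][1]


for name in ['alex', 'john', 'eric']:
    insert(name, name.title())

print(indices)
print([key for key, _ in entries])   # iteration follows insertion order
```

Only the small `indices` list is sparse; the entries themselves are stored densely, which is also why iteration naturally follows insertion order.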
hash()

Python hash()

built-in function: hash() → always returns an int
    → if a == b is True, then hash(a) == hash(b) is also True
    → Python truncates hashes to some fixed size (sys.hash_info.width)
        → on my machine: 64 bits

    map(hash, (1, 2, 3, 4)) → 1, 2, 3, 4

    map(hash, (1.5, 2.5, 3.5, 4.5)) → 1152921504606846977, 1152921504606846978,
                                      1152921504606846979, 1152921504606846980

    map(hash, ('hello', 'python', '!')) → 2558804294780988881, 1235297897608439440,
                                          -8029463035455593707

    hash((1, 'a', 10.5)) → -5053599863580733767

Python hash()

mutable:
    hash([1, 2]) → TypeError: unhashable type
    hash({'a', 'b'}) → TypeError: unhashable type

immutable:
    hash((1, 2)) → 3713081631934410656
    hash(frozenset({'a', 'b'})) → 4261914069630221614

    hash((1, 2, [3, 4])) → TypeError: unhashable type

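One way to check hashability at run time is simply to call hash() and catch the TypeError. The helper name `is_hashable` below is an assumption made for this sketch, not a built-in.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


checks = {
    'tuple of ints': is_hashable((1, 2)),
    'frozenset': is_hashable(frozenset({'a', 'b'})),
    'list': is_hashable([1, 2]),
    'set': is_hashable({'a', 'b'}),
    'tuple containing a list': is_hashable((1, 2, [3, 4])),
}

for label, result in checks.items():
    print(f'{label}: {result}')
```

The last case shows that immutability of the container alone is not enough: a tuple is hashable only if every element it contains is hashable.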
Why?

hash values → used for hash tables (dictionaries) → position index

    a = (1, 2, 3)
    d = {a: 'this key is a tuple – immutable'}

    hash(a) never changes since a is immutable
    d[a] → looks for a at same index

    a = [1, 2, 3]
    d = {a: 'this key is a list – mutable'}     (hypothetical: Python actually raises TypeError here)
    a.append(4)
        → same object
        → hash has changed
        → looking for a at wrong index!! → d[a] ???

Caveat

built-in function: hash() → always returns an int
    → if a == b is True, then hash(a) == hash(b) is also True
    → Python truncates hashes to some fixed size

    # mod1.py
    print(hash('python'))
    print(hash('python'))

    run 1:   1235297897608439440
             1235297897608439440

    run 2:   -5750637952798290655
             -5750637952798290655

hash values for objects that compare equal remain equal during a program run
but they can change from run to run → strings, bytes and datetime
    → never rely on a hash value being the same from one program run to another
    → although it may be ok sometimes, e.g. integers

creating dictionaries → literals, dict(), comprehensions, and more…

common operations
    → membership tests, retrieving, adding, removing elements…

updating
    → update, packing/unpacking, copy, deepcopy

dictionary views
    → keys, items, values and iteration

custom classes as keys → default hash, custom hashing

Dictionary Elements

basic structure of dictionary elements: key : value

value → any Python object
    integer, custom class or instance, function, module, any Python object…

key → any hashable object
    not all objects are hashable
        strings are hashable
        lists are never hashable

→ hash tables require the hash of an object to be constant (for the life of the program)

roughly: immutable objects are hashable
         mutable objects are not hashable
         more subtle than that…

Hashable Objects

Python function: hash(obj) → some integer (truncated based on Python build: 32-bit, 64-bit; see sys.hash_info.width)
                           → exception TypeError: unhashable type

→ int, float, complex, binary, Decimal, Fraction, … → immutable → hashable
→ strings → immutable collection → hashable
→ frozenset → immutable collection → elements are required to be hashable → hashable
→ tuples → immutable collection → hashable only if all elements are also hashable
→ set, dictionary → mutable collections → not hashable
→ list → mutable collection → not hashable
→ functions → immutable → hashable
→ custom classes and objects → maybe


Requirements

If an object is hashable:
    → the hash of the object must be an integer value
    → if two objects compare equal (==), the hashes must also be equal

Important:
    two objects that do not compare equal may still have the same hash (hash collision)
    → more hash collisions → slower dictionaries

later → creating our own custom hashes
    → we will also need to conform to these rules

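As a preview of what conforming to these rules looks like, here is a small sketch of a class that defines both `__eq__` and `__hash__` over the same values. The class and its attribute names are hypothetical and are not taken from the course materials.

```python
class Point:
    """Hypothetical example class: equality and hash use the same fields."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # Hash exactly the values that __eq__ compares, so that
        # equal points are guaranteed to have equal hashes.
        return hash((self._x, self._y))


p1 = Point(0, 0)
p2 = Point(0, 0)
d = {p1: 'origin'}

print(p1 == p2, hash(p1) == hash(p2))
print(d[p2])   # a different but equal object finds the same entry
```

Because the hash is derived from the same tuple that equality compares, two equal points can never end up with different hashes.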
Creating Dictionaries: Literals

This is a very common way of creating dictionaries

    { key1: value1,
      key2: value2,
      key3: value3 }

    keys → any hashable object
    values → any object

    {'john': ['John', 'Cleese', 78],
     (0, 0): 'origin',
     'repr': lambda x: x ** 2,
     'eric': {'name': 'Eric Idle',
              'age': 75}
    }

Creating Dictionaries: Constructor

    dict(key1=value1, key2=value2, key3=value3)

    argument names → must be valid identifier names (think variable, function, class name, etc)
                   → the dictionary key will then be a string of that name
    argument values → any object

literal:
    {'john': ['John', 'Cleese', 78],
     (0, 0): 'origin',
     'repr': lambda x: x ** 2,
     'eric': {'name': 'Eric Idle', 'age': 75},
    }

constructor:
    dict(john=['John', 'Cleese', 78],
         repr=lambda x: x ** 2,
         eric={'name': 'Eric Idle',
               'age': 75},
         twin=dict(name='Eric Idle', age=75)
    )

Creating Dictionaries: Comprehensions

Just like we can build lists using list comprehensions
or generators using generator expressions (comprehension syntax)
    → build dictionaries using dictionary comprehensions
    → same basic syntax → enclosed in {}
    → elements must be specified as key: value (if not, you'll be creating a set!)

    {str(i): i ** 2 for i in range(1, 5)}    → {'1': 1, '2': 4, '3': 9, '4': 16}

    {str(i): i ** 2
     for i in range(1, 5)
     if i % 2 == 0}                          → {'2': 4, '4': 16}

Soapbox!

    d = {i: i ** 2 for i in range(1, n)}

vs

    d = {}
    for i in range(1, n):
        d[i] = i ** 2

But when things get more complex…

    d = {}
    url = 'http://localhost/user/{id}'
    for i in range(n):
        response = requests.get(url.format(id=i))
        user_json = response.json()
        user_age = int(user_json['age'])
        if user_age >= 18:
            user_name = user_json['fullName'].strip()
            user_ssn = user_json['ssn']
            d[user_ssn] = user_name

Creating Dictionaries: fromkeys()

→ class method on dict
→ creates a dictionary with specified keys all assigned the same value

    d = dict.fromkeys(iterable, value)

    iterable → any iterable; contains the keys (hashable elements)
    value → all keys are set to this same value; optional → None if not provided

    d = dict.fromkeys(['a', (0,0), 100], 'N/A')
        → {'a': 'N/A', (0,0): 'N/A', 100: 'N/A'}

    d = dict.fromkeys((i**2 for i in range(1, 5)), False)
        → {1: False, 4: False, 9: False, 16: False}

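Since every key is set to the same value, it is worth seeing what that means for a mutable value. The sketch below uses made-up keys; the point about shared objects follows directly from how fromkeys() assigns the one value object to every key.

```python
d = dict.fromkeys('abc', 0)
print(d)

# Every key refers to the *same* value object, which matters for mutables.
shared = dict.fromkeys(['x', 'y'], [])
shared['x'].append(1)
print(shared)        # both keys show the appended element

# A comprehension evaluates [] once per key, so each key gets its own list.
separate = {k: [] for k in ['x', 'y']}
separate['x'].append(1)
print(separate)
```

For immutable values such as 0, False or 'N/A' the sharing is harmless; for lists or dictionaries a comprehension is usually the safer choice.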
Basic Operations

d[key] = value
    → creates key if it does not exist already
    → assigns value to key

d[key]
    → as an expression returns the value for specified key
    → exception KeyError if key is not found

sometimes we want to avoid this KeyError exception, and return a default value if key is not found

d.get(key)             → returns value if key is found, None if key is not found
d.get(key, default)    → returns value if key is found, default if key is not found

Basic Operations

membership testing → test if a key is present in the dictionary or not
    key in d        → True if key is in d, False if it is not
    key not in d    → True if key is not in d, False if it is

number of items in dictionary
    len(d)

clearing out all items
    d.clear()    → d is now empty

Removing Elements from a Dictionary

del d[key]
    → removes element with that key from d
    → exception KeyError if key is not in d

d.pop(key)
    → removes element with that key from d
    → and returns the corresponding value
    → exception KeyError if key is not in d

sometimes we want to avoid this KeyError exception

d.pop(key, default)
    → removes element with that key from d
    → and returns the corresponding value
    → returns default if key was not found


Another way to remove items…

Python 3.6+ → dictionary remains ordered in order of insertion

d.popitem()
    → removes an item from d
    → returns tuple (key, value)
    → KeyError if dictionary is empty

    prior to Python 3.6 → removes some item – no guarantee which one
    Python 3.6+ → removes last item – guaranteed
                → last item is the last item inserted

last inserted → popped first
    Last In First Out → LIFO
    → works like a stack

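The stack-like behavior is easy to observe by popping until the dictionary is empty. This sketch assumes a Python version with ordered dictionaries, as described above.

```python
d = {'a': 1, 'b': 2, 'c': 3}

popped = []
while d:
    popped.append(d.popitem())   # removes the most recently inserted item

print(popped)

# popitem() on an empty dictionary raises KeyError
try:
    d.popitem()
    raised = False
except KeyError:
    raised = True

print(raised)
```

Items come back in the reverse of their insertion order, which is exactly the LIFO behavior of a stack.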
Inserting keys with defaults

sometimes we want to insert a key with a default value only if key does not exist

    d = {'a': 1, 'b': 2}
    if 'c' not in d:
        d['c'] = 0

→ combine this with returning the newly inserted (default) value, or existing value if already there

    def insert_if_not_present(d, key, value):
        if key not in d:
            d[key] = value
            return value
        else:
            return d[key]

instead…

    result = d.setdefault(key, value)

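A common use of setdefault is grouping values under a key in a single pass. The word list below is made up for illustration.

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

groups = {}
for word in words:
    # Insert an empty list the first time a letter is seen, then append.
    groups.setdefault(word[0], []).append(word)

print(groups)

d = {'a': 1, 'b': 2}
existing = d.setdefault('a', 0)   # key present: stored value is returned unchanged
inserted = d.setdefault('c', 0)   # key absent: default is inserted and returned
print(existing, inserted, d)
```

Because setdefault returns the stored value in both cases, the result can be used immediately, as the chained `.append(word)` does.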
Dictionary Views (PEP 3106)

Three ways we may want to view the data in a dictionary
    → keys only                          d.keys()
    → values only                        d.values()
    → key/value pairs → (key, value)     d.items()
all are iterables

    d = {'a': 1, 'b': 2, 'c': 3}
    list(d.keys())      → ['a', 'b', 'c']
    list(d.values())    → [1, 2, 3]
    list(d.items())     → [('a', 1), ('b', 2), ('c', 3)]

Important: the order of keys, values and items is the same
    → the position of an item in one view corresponds to the same position in the other views
    → Python 3.6+: in addition, this order is the same as the dictionary (insertion) order

They're dynamic…

more to it than just an iterable

these views are dynamic → views reflect any changes in the dictionary
                        → but views are not updatable

    d = {'a': 1, 'b': 2}
    keys = d.keys()
    values = d.values()
    items = d.items()
        keys → 'a', 'b'
        values → 1, 2
        items → ('a', 1), ('b', 2)

    d['a'] = 10
        keys → 'a', 'b'
        values → 10, 2
        items → ('a', 10), ('b', 2)

    del d['b']
    d['c'] = 3
        keys → 'a', 'c'
        values → 10, 3
        items → ('a', 10), ('c', 3)
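The difference between a live view and a one-time copy can be checked directly. The variable names below are only for this sketch.

```python
d = {'a': 1, 'b': 2}
keys, values, items = d.keys(), d.values(), d.items()

snapshot = list(keys)      # a list is a static copy, not a view

d['a'] = 10
del d['b']
d['c'] = 3

live_keys = list(keys)     # the views were created before the changes,
live_values = list(values) # yet they reflect the dictionary's current state
live_items = list(items)

print(snapshot)
print(live_keys, live_values, live_items)
```

The list made before the changes still shows the old keys, while the view objects track the dictionary as it is now.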


More than just iterables…

The keys() view is more than an iterable → behaves like a set
    → makes sense: keys are unique and hashable → required for sets
    → union, intersection, difference of these key views – just like sets

The values() view does not behave like a set
    → in general values are not unique
    → in general values are not hashable

The items() view may behave like a set
    → elements of items() are guaranteed unique (since keys are unique)
    → if all values are hashable → behaves like a set
    → if one or more values are unhashable → does not behave like a set


Set operations

We'll come back to sets and dictionary views in a later section

    s1 = {'a', 'b', 'c'}
    s2 = {'b', 'c', 'd'}

    union           s1 | s2    → {'a', 'b', 'c', 'd'}
    intersection    s1 & s2    → {'b', 'c'}
    difference      s1 - s2    → {'a'}

Can manipulate keys() the same way
Same for items() if dictionary values are all hashable

Set Operations on Views

→ dictionaries are now considered ordered (insertion order)
→ sets are not ordered

d1.keys() and d2.keys() are ordered
but d1.keys() | d2.keys() is a set
    → ordering of result is not guaranteed

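The following sketch applies set operators to the views of two made-up dictionaries, including the case that fails for values().

```python
d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'b': 2, 'c': 30, 'd': 4}

union = d1.keys() | d2.keys()
common = d1.keys() & d2.keys()
only_d1 = d1.keys() - d2.keys()

# items() supports set operations here because all values are hashable
same_items = d1.items() & d2.items()

# values() is not set-like, so the & operator is not supported
try:
    d1.values() & d2.values()
    values_set_like = True
except TypeError:
    values_set_like = False

print(type(union).__name__, common, only_d1, same_items, values_set_like)
```

The results are plain sets, so they carry no ordering even though both source dictionaries are ordered.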
The update method

updates one dictionary based on items in something else
    → three forms

d1.update(d2)

d1.update(iterable)
    → iterable must contain iterables with 2 elements each: (key, value)

d1.update(keyword-args)
    → argument name will become key
    → argument value will become value
    (similar to dict(a=10, b=20))

d1.update(d2)

d1 and d2 are two dictionaries
    → for every (k, v) in d2
        → if k not in d1, inserts (k, v) in d1
        → if k in d1, updates value for k in d1

    d1 = {'a': 1, 'b': 2}
    d2 = {'b': 20, 'c': 30}
    d1.update(d2)
    d1 → {'a': 1, 'b': 20, 'c': 30}
        'b' was updated
        'c' was inserted

→ insertion order is maintained (3.6+)

d1.update(keyword-args)

similar to how keyword arguments are used to create a dictionary
    → argument names must be valid identifiers

    d1 = {'a': 1, 'b': 2}
    d1.update(b=20, c=30)
    d1 → {'a': 1, 'b': 20, 'c': 30}
        'b' was updated
        'c' was inserted

→ order of keyword arguments is preserved (3.6+)
→ insertion order is maintained (3.6+)

d1.update(iterable)

→ must be an iterable of iterables containing two elements → key, value

    it can be any of:
        (('b', 20), ('c', 30))
        (('b', 20), ['c', 30])
        [('b', 20), ['c', 30]]

    d1 = {'a': 1, 'b': 2}
    d1.update(it)
    d1 → {'a': 1, 'b': 20, 'c': 30}

→ but also more complex iterables
→ even generators

    ((k, ord(k)) for k in 'bcd')    → 'b': 98, 'c': 99, 'd': 100

    d1 = {'a': 1, 'b': 2}
    d1.update(((k, ord(k)) for k in 'bcd'))
    d1 → {'a': 1, 'b': 98, 'c': 99, 'd': 100}

→ insertion order is preserved (3.6+)
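The three forms can be combined on one dictionary to see how updates and inserts interact with key order. The keys and values here are arbitrary illustration data.

```python
d1 = {'a': 1, 'b': 2}

d1.update({'b': 20, 'c': 30})           # from another dictionary
d1.update([('d', 40), ['e', 50]])       # from an iterable of 2-element iterables
d1.update(f=60, a=100)                  # from keyword arguments
d1.update((k, ord(k)) for k in 'gh')    # from a generator of pairs

print(d1)
print(list(d1))
```

Updating an existing key such as 'a' changes its value but keeps its original position; only new keys are appended at the end.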


Unpacking dictionaries

works similarly to unpacking a dictionary into keyword arguments in function calls

    def func(**kwargs):
        print(kwargs)

    d = {'a': 1, 'b': 2}
    func(**d)
        kwargs → {'a': 1, 'b': 2}    (argument order preserved 3.6+)

→ for function arguments, keys must be valid identifiers
→ not for unpacking dictionaries in general

    d1 = {'a': 1, 'b': 2}
    d2 = {'a': 10, (0,0): 'origin'}
    d3 = {'b': 20, 'c': 30, 'a': 100}

    d = {**d1, **d2, **d3}
    d → {'a': 100, 'b': 20, (0,0): 'origin', 'c': 30}

→ last "update" wins
→ insertion order is preserved (3.6+)
Copying Dictionaries

shallow copies
    the container object is a new object
    the copied container's keys/values are shared references with the original object

    d_copy = d.copy()
    d_copy = {**d}
    d_copy = dict(d)
    d_copy = {k: v for k, v in d.items()}    (slower, don't use for a simple copy)

→ all these methods result in shallow copies
→ the dictionaries are independent dictionaries (inserts, deletes are independent)
→ but the keys and values are shared references

Deep Copies

If a shallow copy is not sufficient, we can create deep copies of dictionaries
    → no shared references
    → even with nested dictionaries

can do it ourselves
    → sometimes requires recursion, have to be careful with circular references
    this might be needed if we don't want a true deep copy, but only a partial deep copy

simpler to use copy.deepcopy

    from copy import deepcopy
    d1 = deepcopy(d)
        → works for custom objects, iterables, dictionaries, etc

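The difference between the two kinds of copy shows up as soon as a nested mutable value changes. The dictionary contents below are made up for the sketch.

```python
from copy import deepcopy

d = {'name': 'John', 'scores': [90, 85]}

shallow = d.copy()       # new dictionary, same value objects
deep = deepcopy(d)       # new dictionary, recursively copied values

d['scores'].append(70)   # mutate a value shared with the shallow copy
d['age'] = 78            # insert into the original only

print(shallow)           # sees the appended score, but not the new key
print(deep)              # unaffected by either change
```

The shallow copy is an independent container, so the new key does not appear in it, but it still shares the list object with the original.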
Quick Review

how Python inserts a key/value item in a dictionary (simplified)

hash(key) → mod dictionary size (allocated)
          → start index in hash table (sequence of slots)
          → generate probe sequence (sequence of valid indices)

→ iterate over probe sequence → index
    → is the slot at that index empty?
        yes → store the new item there (hash, key, value)
        no  → hash collision
            → continue iteration to look for an empty slot

    (continue loop until an empty slot is found to store the item)

more hash collisions → more inefficient

Quick Review

how Python finds a key in a dictionary (simplified)


hash(key) → mod dictionary size (allocated)
→ start index in hash table (sequence of slots)

→ generate probe sequence (sequence of valid indices)

→ loop over probe sequence (a little more complex because of deletions)
→ is slot empty?
   yes → key does not exist in dictionary
   no → are hashes equal and are keys equal (==)?
      yes → found the key
      no (caused by hash collision upon insertion/resizing) → continue iteration to find key or empty slot

(loop until found or empty slot)

more hash collisions → more inefficient

predictable hashes → subject to DoS attacks


Quick Review

In order for this algorithm to work:


→ hash(key) when inserting item must equal hash(key) when retrieving item

otherwise we're starting our search in the wrong place!

→ probe sequence remains the same
→ Python controls that, not us

so hash of key cannot change after storing in dict
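
To make the last point concrete, here is a sketch of what can go wrong when a key's hash changes after insertion. The Point class is hypothetical and deliberately badly designed (its hash depends on mutable state).

```python
class Point:
    """Hypothetical key type whose hash depends on mutable attributes."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))

p = Point(0, 0)
d = {p: 'origin'}
found_before = Point(0, 0) in d     # hash and equality both match

p.x = 10                            # the stored key now hashes differently

found_old = Point(0, 0) in d        # same stored hash, but == now fails
found_new = Point(10, 0) in d       # equal to the key, but not the hash it was stored under
print(found_before, found_old, found_new, len(d))
```

The item is still in the dictionary (its length is 1), but neither lookup can reach it.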

What is a set?

mathematics

A set is a gathering together into a whole of definite, distinct objects of our perception or of our thought -- which are called elements of the set.

- Georg Cantor

→ a collection of distinct objects
→ notice ordering is not mentioned!

set membership
size of set (cardinality)
union
intersection
complement
and more…
What is a set?

A set is an unordered collection of distinct objects


→ there is no particular ordering in a set

→ {1, 3, 5}   {5, 1, 3}   {3, 5, 1} … are all the same set (equal)

→ objects are distinct

→ {1, 1, 3} → not possible: element 1 is repeated

→ Python data type: set

→ elements must be hashable

→ elements are distinct: no two elements compare equal (==)
Membership

If x is an element contained in some set S we write $x \in S$
and say x is an element of S

If x is not an element contained in some set S we write $x \notin S$
and say x is not an element of S

→ note that these are statements, not questions

→ in Python the in operator is a question that returns True or False

x in S → True → $x \in S$
x in S → False → $x \notin S$

→ similarly with the not in operator

x not in S → False → $x \in S$
x not in S → True → $x \notin S$
Unions and Intersections

sets → $s_1$, $s_2$

The union of two sets is a set that combines the items from these two sets, keeping only a single instance of any repeating elements

$s_1 \cup s_2 = \{ x \mid x \in s_1 \text{ or } x \in s_2 \}$

→ Notice the or

→ s1 | s2 | … → returns a set
→ s1.union(s2, …)

The intersection of two sets is a set that only contains the elements common to both

$s_1 \cap s_2 = \{ x \mid x \in s_1 \text{ and } x \in s_2 \}$

→ Notice the and

→ s1 & s2 & … → returns a set

→ s1.intersection(s2, …)
Differences of two sets

The difference of two sets is all the elements of one set without the elements of the other set

$s_1 - s_2 = \{ x \mid x \in s_1 \text{ and } x \notin s_2 \}$

→ s1 - s2 - …

→ s1.difference(s2, …)

s1 = {1, 2, 3}
s2 = {3, 4, 5}

s1 - s2 → {1, 2}
s2 - s1 → {4, 5}

in general: s1 - s2 ≠ s2 - s1
Symmetric Difference of two Sets

The symmetric difference of two sets is the union of both sets without the intersection of both sets

$s_1 \,\Delta\, s_2 = (s_1 \cup s_2) - (s_1 \cap s_2)$

→ s1 ^ s2
→ s1.symmetric_difference(s2)

s1 = {1, 2, 3, 4, 5}
s2 = {4, 5, 6, 7, 8}

s1 ^ s2 → {1, 2, 3, 6, 7, 8}
s2 ^ s1 → {1, 2, 3, 6, 7, 8}

in general: s1 ^ s2 = s2 ^ s1
Empty Set, Cardinality, Disjoint Sets

For finite sets, the cardinality of a set is the number of elements in the set

→ len(s)

An empty set is a set that contains no elements → cardinality is 0

→ set()
→ cannot use {} to create an empty set → this would create an empty dictionary

Two sets are said to be disjoint if their intersection is the empty set

→ len(s1 & s2) → 0

→ s1.isdisjoint(s2) → True (Boolean)
Subsets and Supersets

A set s1 is a subset of s2 if all the elements of s1 are in s2 → $s_1 \subseteq s_2$

→ s1 <= s2            {1, 2, 3} <= {1, 2, 3, 4} → True
→ s1.issubset(s2)     {1, 2, 3} <= {1, 2, 3} → True

A set s1 is a proper subset of s2 if s1 is a subset of s2 and s1 is not equal to s2 → $s_1 \subset s_2$

→ i.e. s1 is a subset of s2 and s2 contains some additional elements

→ s1 < s2     {1, 2, 3} < {1, 2, 3, 4} → True
              {1, 2, 3} < {1, 2, 3} → False

A set s1 is a superset of s2 if s2 is a subset of s1

→ s1 >= s2

→ s1.issuperset(s2)

A set s1 is a proper superset of s2 if s2 is a proper subset of s1
→ s1 > s2
Python Sets

Python has an implementation of sets that supports many set operations:


cardinality → len(s)
membership testing → in, not in
unions → s1 | s2, s1.union(s2)
intersections → s1 & s2, s1.intersection(s2)
differences → s1 - s2, s1.difference(s2)
symmetric differences → s1 ^ s2, s1.symmetric_difference(s2)
subsets → s1 <= s2, s1.issubset(s2)
        → s1 < s2
supersets → s1 >= s2, s1.issuperset(s2)
          → s1 > s2
disjointness → s1.isdisjoint(s2)
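
The operations listed above can be tried out with a short sketch like this one. The sample values are arbitrary.

```python
s1 = {1, 2, 3}
s2 = {3, 4, 5}

union = s1 | s2                    # elements in either set
intersection = s1 & s2             # elements in both sets
difference = s1 - s2               # elements of s1 not in s2
symmetric = s1 ^ s2                # elements in exactly one of the sets

# Methods accept any iterable; operators need sets on both sides.
same_union = s1.union([3, 4, 5])

is_subset = {1, 2} <= s1           # subset (possibly equal)
is_proper = s1 < s1                # a set is never a proper subset of itself
is_disjoint = s1.isdisjoint({7, 8})

print(len(s1), 2 in s1, 9 not in s1)
print(union, intersection, difference, symmetric)
```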
Python Sets

The type is set


Set literals → {'a', 10, 10.5}
Empty set → set()

Notice how the literal notation for sets uses the same braces {} as dictionaries

In fact sets are hash tables as well

Unlike dictionary hash tables, sets only contain the "keys", not the values

→ set(iterable)
Python Sets

elements of a set
→ must be unique (distinct)
→ must be hashable
→ have no guaranteed order

a set is a mutable collection
→ add and remove elements

→ a set is therefore not hashable
→ cannot be used as a dictionary key
→ cannot be used as an element in another set
→ no set of sets
Frozen Sets

Frozen Sets are the immutable equivalent of sets à frozenset


→ think of tuples and lists
→ frozenset(iterable)

elements of a frozenset
→ must be unique (distinct)
→ must be hashable
→ have no guaranteed order

Can convert any set to a frozenset
→ frozenset({1, 2, 3})
→ no literal for a frozenset

A frozenset is hashable
→ can be used as a dictionary key
→ can be used as an element of a set (or frozenset)
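
A brief sketch of what hashability buys us with frozensets. The variable names and data are illustrative only.

```python
# A frozenset can be a dictionary key; element order is irrelevant for lookup.
scores = {frozenset({'alice', 'bob'}): 10}
lookup = scores[frozenset({'bob', 'alice'})]

# A set of frozensets works, and equal frozensets collapse to one element.
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}

# A set of (mutable) sets does not work.
try:
    bad = {{1, 2}}
    nested_error = None
except TypeError as ex:
    nested_error = ex

print(lookup, len(groups), nested_error)
```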


Membership Testing

Testing membership of an element in a set is extremely efficient


(hash table lookup)

→ in, not in

In fact, instead of writing code like this:

if a in [10, 20, 30]:

or even

if a in (10, 20, 30):

prefer using (as long as elements are hashable):

if a in {10, 20, 30}:     → higher storage cost
Some Timings
from timeit import timeit

n = 10_000_000
s = set(range(n))
l = list(range(n))
t = tuple(range(n))
value = 100     # the timings are repeated with value = 9_999_999

def test_set(s, value):
    return value in s

def test_list(l, value):
    return value in l

def test_tuple(t, value):
    return value in t

timeit('test_set(s, value)', globals=globals(), number=10_000)
timeit('test_list(l, value)', globals=globals(), number=10_000)
timeit('test_tuple(t, value)', globals=globals(), number=10_000)

         value = 100    value = 9_999_999
tuple    0.0186         1692
list     0.0191         1659
set      0.0016         0.0021

→ list/tuple lookup → scan until found

→ set/dictionary → hash table → direct lookup (+ hash collisions)
Creating Sets

→ {'a', 10, 3.14159}

→ set(iterable)     elements must be hashable

empty set
→ cannot use a literal → {} is an empty dict
→ set()

→ set comprehensions

{c for c in 'python'} → elements must be hashable

in this case simpler to just do this: set('python')


Creating Sets

→ unpacking
   unpack iterables → *my_list
   unpack dictionaries → **my_dict

sets are iterable → can be unpacked too → *my_set

→ order in which elements are unpacked is essentially unknown

s1 = {'a', 10, 3.14}
s2 = set('abc')

{*s1, *s2} → {'a', 'b', 'c', 10, 3.14}

[*s1, *s2] → ['a', 'a', 'b', 'c', 10, 3.14]

my_func(*s1, *s2) → works, but what's the order of the arguments???


Cardinality and Membership

len(s) → number of elements in s (cardinality of s)

in, not in → x in s → tests if x is in the set s

→ like dictionary keys, uses equality (==), not identity (is)

membership testing in sets is fast
→ hash table lookup

membership testing in a list or tuple is slow (in comparison) → linear scan

→ but sets have more memory overhead than lists or tuples

→ tradeoff: speed vs memory

→ we'll look at this in code


Adding Elements

lists have ordering → append
                    → insert

sets have no ordering → add

s.add('python')

→ mutates the set
Removing Elements
lists have ordering → can remove element by position

l = [10, 20, 30]
del l[1] → [10, 30]
→ can remove specific element
l.remove(30) → [10]

sets have no ordering → cannot use position
→ can remove specific element

s = {'a', 'b', 'c'}

s.remove('b') → {'a', 'c'}     → mutates the set
s.remove('z') → KeyError exception

to avoid the KeyError

s.discard('a')     s → {'c'}
s.discard('z')     s → {'c'}

s.pop() → removes and returns an arbitrary element
        → KeyError if set is empty

s.clear() → removes all elements from set
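
The removal methods above, as a short runnable sketch:

```python
s = {'a', 'b', 'c'}

s.remove('b')                 # element exists, so it is removed
try:
    s.remove('z')             # element missing: remove raises KeyError
    remove_failed = False
except KeyError:
    remove_failed = True

s.discard('z')                # element missing: discard silently does nothing
after_discard = set(s)

popped = s.pop()              # arbitrary element; which one is not defined
remaining = set(s)

s.clear()
try:
    s.pop()                   # popping from an empty set raises KeyError
    pop_failed = False
except KeyError:
    pop_failed = True
```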


Set Operations

→ union
→ intersection → related: testing if two sets are disjoint
→ difference
→ symmetric difference
→ containment → strict and non-strict

in general, we have two ways of doing these operations

→ methods → s1.intersection(s2)
   s2 could be an iterable (of hashable elements)

→ operators → s1 & s2
   s1 and s2 must both be sets


Intersections

{1, 2, 3} & {2, 4} → {2}

{1, 2, 3}.intersection({2, 4}) → {2}
{1, 2, 3}.intersection([2, 4]) → {2}

s1 & s2 & s3 & s4

s1.intersection(s2, s3, s4)

→ returns a new set
Unions

Similar to how intersections work

{1, 2, 3} | {2, 4} → {1, 2, 3, 4}

{1, 2, 3}.union({2, 4}) → {1, 2, 3, 4}

{1, 2, 3}.union([2, 4]) → {1, 2, 3, 4}

s1 | s2 | s3 | s4

s1.union(s2, s3, s4)

→ returns a new set
Disjointedness

two sets are disjoint if their intersection is empty


len(s1 & s2) → 0

→ empty sets are falsy

if s1 & s2:
    print('sets are not disjoint')

if not (s1 & s2):
    print('sets are disjoint')

if not s1 & s2:
    print('sets are disjoint')

→ s1.isdisjoint(s2)
Differences

{1, 2, 3, 4} - {2, 3} → {1, 4}

{1, 2, 3, 4}.difference({2, 3}) → {1, 4}
{1, 2, 3, 4}.difference([2, 3]) → {1, 4}

s1 - s2 - s3
s1.difference(s2, s3)
→ returns a new set

Beware!! s1 - (s2 - s3) is not the same as (s1 - s2) - s3

{1, 2, 3} - ({2, 4} - {2, 4}) → {1, 2, 3} - set() → {1, 2, 3}

({1, 2, 3} - {2, 4}) - {2, 4} → {1, 3} - {2, 4} → {1, 3}

→ left-associative: s1 - s2 - s3 → (s1 - s2) - s3

→ s1.difference(s2, s3) → (s1 - s2) - s3


Symmetric Differences

s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}

s1 ^ s2 → {1, 2, 5, 6}     union - intersection → (s1 | s2) - (s1 & s2)

s1.symmetric_difference(s2)

s1.symmetric_difference([3, 4, 5, 6])

→ returns a new set
Containment

Remember: strict → not equal

s1 < s2 → is s1 strictly contained in s2

s1 <= s2 → is s1 contained in (possibly equal to) s2
         → s1.issubset(s2)

s1 > s2 → does s1 strictly contain s2
        → is s2 strictly contained in s1

s1 >= s2 → does s1 contain (possibly equal to) s2
         → is s2 contained in (possibly equal to) s1
         → s1.issuperset(s2)

Updating Sets

sets have no ordering → no indexing

lists → l[2] = 100     cannot do that with sets

we can add and remove elements
→ mutates the set

we can create unions, intersections, differences and symmetric differences

→ but these operations create new sets

list analogy

l1 = [1, 2, 3]
l2 = [4, 5, 6]

l1 + l2 → new list [1, 2, 3, 4, 5, 6]

but l1 += l2 → mutates l1     l1 → [1, 2, 3, 4, 5, 6]
Analogous Set Mutating Updates

|=   &=   -=   ^=

→ all these operations mutate the left hand side

lists: l1 += l2 → appends l2 to l1 → mutates l1
→ id of l1 has not changed

→ method equivalents (can use iterables too)

s1 |= s2     s1.update(s2)
s1 &= s2     s1.intersection_update(s2)
s1 -= s2     s1.difference_update(s2)
s1 ^= s2     s1.symmetric_difference_update(s2)
Analogous Set Mutating Updates

s1.update(s2, s3)                    s1 |= s2 | s3

s1.intersection_update(s2, s3)       s1 &= s2 & s3

RHS is evaluated first

BEWARE!!

s1.difference_update(s2, s3) is not the same as s1 -= s2 - s3

s1.difference_update(s2, s3):   s1 ← (s1 - s2) - s3
s1 -= s2 - s3:                  s1 ← s1 - (s2 - s3)

s1 = {1, 2, 3, 4}
s2 = {2, 3}
s3 = {3, 4}

(s1 - s2) - s3:   s1 - s2 → {1, 4}     {1, 4} - s3 → {1}
s1 - (s2 - s3):   s2 - s3 → {2}        {1, 2, 3, 4} - {2} → {1, 3, 4}

→ set differences are not associative

→ in general s1 - (s2 - s3) ≠ (s1 - s2) - s3
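
The same example as runnable code, as a sketch that also checks that the in-place forms keep the identity of the left-hand set:

```python
s2 = {2, 3}
s3 = {3, 4}

a = {1, 2, 3, 4}
a_id = id(a)
a.difference_update(s2, s3)     # a ← (a - s2) - s3
a_same_object = id(a) == a_id

b = {1, 2, 3, 4}
b_id = id(b)
b -= s2 - s3                    # RHS first: b ← b - (s2 - s3)
b_same_object = id(b) == b_id

print(a, b)
```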


Shallow vs Deep Copies

As with other types such as dictionaries, lists, etc we have two types of copies
→ shallow
s2 = s1.copy()
s2 = set(s1)
s2 = {*s1}     (unpacking)

→ deep
from copy import deepcopy
s2 = deepcopy(s1)
Frozen Sets

→ immutable sets
→ same properties and behavior as sets
→ except they cannot be mutated

Their elements can be mutable objects, as long as those objects are hashable

Since all elements of a frozen set must be hashable, the frozen set is itself hashable

→ can be used as a key in a dictionary
→ can be an element of another set

frozenset() → no literal expressions to create frozen sets
Copying Frozen Sets

Think back to tuples and lists


l1 = [1, 2, 3]     l2 = list(l1)     l1 is l2 → False

t1 = (1, 2, 3)     t2 = tuple(t1)    t1 is t2 → True

→ safe for Python to not make a copy of the tuple, since it is immutable

Same thing with sets and frozen sets

s1 = frozenset({1, 2, 3})     s2 = frozenset(s1)     s1 is s2 → True
                              s2 = s1.copy()         s1 is s2 → True

Deep copies do not behave that way
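
A quick check of the identity behavior described above. This reflects what CPython does; it is an optimization rather than something to rely on in program logic.

```python
from copy import deepcopy

l1 = [1, 2, 3]
t1 = (1, 2, 3)
f1 = frozenset({1, 2, 3})

list_is_copy = list(l1) is not l1          # mutable: a new list is created
tuple_is_same = tuple(t1) is t1            # immutable: the same object is returned
frozen_is_same = frozenset(f1) is f1
frozen_copy_is_same = f1.copy() is f1

deep = deepcopy(f1)                        # equal contents, whatever the identity
print(list_is_copy, tuple_is_same, frozen_is_same, frozen_copy_is_same, deep == f1)
```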
Set Operations

non-mutating set operations → & | - ^

these operations can be performed on mixed sets and frozensets

resulting type? → the type of the first operand

s1 & s2
→ set if s1 is a set, even if s2 is a frozenset
→ frozenset if s1 is a frozenset, even if s2 is a set
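
A short sketch of the "type of the first operand" rule, using arbitrary sample values:

```python
s = {1, 2, 3}
f = frozenset({2, 3, 4})

set_first = s & f          # left operand is a set
frozen_first = f & s       # left operand is a frozenset

print(type(set_first).__name__, type(frozen_first).__name__)
print(set_first == frozen_first)   # same elements, so they compare equal
```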
Equality and Identity

Numbers     1.0 is 1 → False      1 + 0j is 1 → False      True is 1 → False

            1.0 == 1 → True       1 + 0j == 1 → True       True == 1 → True

Same thing with sets and frozen sets

s1 = {1, 2, 3}
s2 = frozenset([1, 2, 3])

s1 is s2 → False
s1 == s2 → True
Application: Memoization

In Part 1 of this series I covered memoization using decorators.

→ https://en.wikipedia.org/wiki/Memoization

Python has a decorator available for memoization:

functools.lru_cache

But that decorator (and the one we wrote ourselves) has one drawback

@lru_cache
def my_func(*, a, b):
    ...

my_func(a=1, b=2) → result is computed and cached

my_func(a=1, b=2) → result is returned directly from cache

my_func(b=2, a=1) → result is computed again, and cached
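
One possible way around this drawback is to build the cache key from a frozenset of the keyword arguments, so that keyword order no longer matters. This is a minimal sketch, not the course's implementation: the decorator name memoize_kwargs is hypothetical, and it assumes keyword-only calls with hashable argument values.

```python
from functools import wraps

def memoize_kwargs(fn):
    cache = {}

    @wraps(fn)
    def inner(**kwargs):
        # frozenset of (name, value) pairs: hashable and order-independent
        key = frozenset(kwargs.items())
        if key not in cache:
            cache[key] = fn(**kwargs)
        return cache[key]

    return inner

calls = []   # records every real computation

@memoize_kwargs
def my_func(*, a, b):
    calls.append((a, b))
    return a + b

r1 = my_func(a=1, b=2)   # computed and cached
r2 = my_func(b=2, a=1)   # same key, served from the cache
print(r1, r2, len(calls))
```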


A long long time ago…

how would we iterate over all the keys, values or items of a dictionary?
d.keys()     d.values()     d.items()

→ created and returned a list of these things

→ list is static
d = {'a': 1, 'b': 2}
keys = d.keys() → keys = ['a', 'b']
d['c'] = 3 → keys = ['a', 'b']

→ list duplicates data, which is not good for large dictionaries and can be slow

→ inefficient for membership testing

d = {'a': 1, 'b': 2}
values = d.values()
2 in values → linear search


A long long time ago…

To help with iteration:

d.iterkeys()     d.itervalues()     d.iteritems()     were introduced

→ iterators better than a new list… did not duplicate data → more lightweight

still does not help with membership testing

also not easy to answer questions such as, given d1 and d2

what keys are common to both?                 (set questions)
what keys are in one but not the other?

after all, keys have to be unique → keys form a mathematical set
Key View

instead of keys() returning a list, and iterkeys() just being an iterator…

what if keys() was a lightweight object that

maintained a reference to the dictionary     (does not "own" any data)

and implemented methods such as:

__iter__ → iterable protocol                          (behaves like an iterable)
__contains__ → membership testing

__and__ → intersection of two views                   (behaves like a set)
__or__ → union of two views
__eq__ → same keys in both views
__lt__ → is one set of keys a proper subset of the other
etc
Dictionary Views PEP 3106
Three ways we may want to view the data in a dictionary

→ keys only → d.keys()
→ values only → d.values()
→ key/value pairs → (key, value) → d.items()

all are iterables
some may have set properties

d = {'a': 1, 'b': 2, 'c': 3}

list(d.keys()) → ['a', 'b', 'c']
list(d.values()) → [1, 2, 3]
list(d.items()) → [('a', 1), ('b', 2), ('c', 3)]

Important: the order of keys, values and items is the same

→ the position of an item in one view corresponds to the same position in the other views
(as long as the dictionary keys were not modified in between)

→ Python 3.6+: in addition, this order is the same as the dictionary (insertion) order


Set Behavior

The keys() view always behaves like a (frozen) set


→ since elements are unique (==) and hashable

The items() view may behave like a (frozen) set
→ if the values are hashable
→ uniqueness of the tuples is guaranteed since keys are unique

The values() view never behaves like a set
→ values not guaranteed unique
→ values not guaranteed hashable


And also…

lightweight → views do not maintain their own copy of the underlying data
→ simply implement methods that use the underlying dictionary → proxy

dynamic → views reflect any changes in the dictionary

immutable → but views are not updatable

d = {'a': 1, 'b': 2}
keys = d.keys()          keys → 'a', 'b'
values = d.values()      values → 1, 2
items = d.items()        items → ('a', 1), ('b', 2)

d['a'] = 10
keys → 'a', 'b'
values → 10, 2
items → ('a', 10), ('b', 2)

del d['b']
d['c'] = 3
keys → 'a', 'c'
values → 10, 3
items → ('a', 10), ('c', 3)
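
A sketch of the set-like and dynamic behavior of views, answering the "set questions" raised earlier. The dictionaries are made-up sample data.

```python
d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'b': 20, 'c': 30, 'd': 40}

keys = d1.keys()                        # a view, not a copy

common = d1.keys() & d2.keys()          # keys in both dictionaries
only_d1 = d1.keys() - d2.keys()         # keys in d1 but not in d2
either = d1.keys() ^ d2.keys()          # keys in exactly one dictionary

d1['z'] = 26                            # the existing view sees the new key
sees_new_key = 'z' in keys

# items() views support set operations too when the values are hashable
shared_items = d1.items() & {'a': 1, 'b': 0}.items()

print(common, only_d1, either, sees_new_key, shared_items)
```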
Modifying the dictionary while iterating over a view

be careful doing this → modifying values usually not a problem
→ modifying keys can lead to exceptions or worse disasters!

This is SAFE:
for key in d.keys():
    d[key] += 1

This leads to an EXCEPTION:
for v in d.values():
    del d['a']
→ Python does not allow modifying the size of the underlying dictionary while iterating over a view

You technically can modify the keys as long as you do not change the size of the dictionary
→ don't do it!

Python docs:
Iterating views while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.

→ no guarantee it will work the way you think it should
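
A sketch of the safe and unsafe cases above. The last loop shows one common workaround (iterating over a list snapshot of the keys); that workaround is an addition here, not something the slides prescribe.

```python
d = {'a': 1, 'b': 2, 'c': 3}

# Safe: values change, the set of keys does not.
for key in d.keys():
    d[key] += 1

# Unsafe: the dictionary changes size during iteration.
error = None
try:
    for v in d.values():
        del d['a']
except RuntimeError as ex:
    error = ex

# Workaround: iterate over a snapshot of the keys instead of the live view.
for key in list(d.keys()):
    if d[key] > 3:
        del d[key]

print(error, d)
```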


Serializing and Deserializing Objects

à useful for persistence and/or transmission


sometimes we have objects (data)

→ want to "save" them somewhere so we can retrieve ("load") them later
→ even after program that generated the data has terminated

→ transmit them to someone or something else outside our app

generically applies to any object

→ create a persistent representation of the object → serializing

→ reconstruct the object from the serialized data → deserializing

C
Pickling and Unpickling

Python specific
→ built-in mechanism to serialize and deserialize many objects using a binary representation

Databases
→ relational databases → e.g. objects like record sets, lists of tuples, etc
→ NoSQL databases → e.g. graphs, documents, etc

JSON
→ ubiquitous standard
→ Web / JavaScript
→ REST APIs
→ MongoDB
→ text representation
→ more limited data types
→ human readable

and more…
Focus on Serializing Dictionaries

→ Pickle will apply to more than just dictionary objects

→ focus on dictionaries because of JSON
→ easy to serialize dictionaries to JSON
→ easy to deserialize JSON to dictionaries
→ loss of some data types

→ many alternatives
→ beyond scope of this series: marshmallow
https://marshmallow.readthedocs.io

The pickle Module

Python specific
A way to represent an object in a persistent way
→ disk, transmission

Create an object’s representation → serializing     (marshalling)
Reload object from representation → deserializing

obj → serialize → 101100011001… → deserialize → obj

Pickle is a binary serialization (by default)

Focus on dictionaries → can be used for other object types
Danger Zone!

Unpickling (deserializing) can execute code


→ not safe!

Only unpickle data you trust
Usage

import pickle
dump → pickle to file

load → unpickle from file

dumps → returns a pickled representation (a bytes object) → store in a variable

loads → unpickle from supplied argument
Equality and Identity

equality à == identity à is
dict1 (id=100) → pickle → 0011 0000 1111 01 → unpickle → dict2 (id=200)

dict1 == dict2 → True     → custom objects will need to implement __eq__

dict1 is dict2 → False
Equality and Identity

While pickling, Python will not re-serialize an object it has already serialized
→ recursive objects can be pickled
→ shared objects are deserialized as shared objects as well

obj1 (prop1, prop2) → pickle / unpickle → obj1 (prop1, prop2)
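
A small sketch tying the pickle slides together: dumps / loads, equality versus identity, and shared references surviving a round trip. The data is invented for illustration, and only data created in this same program is unpickled.

```python
import pickle

d = {'a': 1, (0, 0): 'origin', 'nested': {'x': [1, 2]}}

data = pickle.dumps(d)            # a bytes object
restored = pickle.loads(data)

same_value = restored == d        # equal contents
same_object = restored is d       # but a different object

# Two keys referring to the same list are still shared after unpickling.
shared = [1, 2]
obj = {'prop1': shared, 'prop2': shared}
obj_restored = pickle.loads(pickle.dumps(obj))
still_shared = obj_restored['prop1'] is obj_restored['prop2']

print(type(data).__name__, same_value, same_object, still_shared)
```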
JSON

JavaScript Object Notation


→ text-based object serialization
→ open standard
→ human-readable

Very common format for web APIs and general data interchange between systems

Unlike pickling, it is considered safe
→ may vary based on the JSON deserializer you use
Limited Data Types

strings → "python" → delimited by double quotes, unicode

numbers → 100   3.14   3.14e-05   3.14E+5 → all floats

booleans → true, false

arrays (lists) → [1, 3.1, "python"] → delimited by square brackets, ordered

dictionaries → { "a": 1, "b": "python"} → key-value pairs, unordered
   keys → strings
   values → any supported data type

empty value → null

Often, non-standard, additional types are supported:

integers → 100
floats → 3.14   100.0   NaN   Infinity   -Infinity


Example

{
    "title": "Fluent Python",
    "author": {
        "firstName": "Luciano",
        "lastName": "Ramalho"
    },
    "publisher": "O'Reilly",
    "isbn": "978-1-491-9-46008",
    "firstReleased": 2015,
    "listPrice": [
        {
            "currency": "USD",
            "price": 49.99
        },
        {
            "currency": "CAD",
            "price": "57.99"
        }
    ]
}

reminds you of Python??
Serialization and Deserialization

JSON is a natural fit for serializing and deserializing Python dictionaries

Of course, Python dictionaries are objects

JSON is essentially a string

import json → dump, dumps
            → load, loads

dict → serialize (dump, dumps) → {…} (file, string) → deserialize (load, loads) → dict
Problems…

JSON keys must be strings à but Python dictionary keys just need to be hashable

→ how to serialize?

JSON value types are limited
→ Python dictionary values can be any data type
→ how to serialize?

even if we can serialize a complex data type, such as a custom class
→ how do we deserialize back to original data type?

→ Customize serialization and deserialization

we'll come back to this later…
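
These problems are easy to reproduce with the standard json module. A minimal sketch with arbitrary sample data:

```python
import json
from datetime import datetime

# Basic non-string keys (such as int) are converted to strings,
# so the round trip does not give back the original dictionary.
encoded = json.dumps({'a': 1, 10: 'ten'})
round_trip = json.loads(encoded)

def try_dumps(obj):
    """Return the exception raised by json.dumps, or None on success."""
    try:
        json.dumps(obj)
        return None
    except TypeError as ex:
        return ex

tuple_key_error = try_dumps({(0, 0): 'origin'})                # unsupported key type
datetime_error = try_dumps({'when': datetime(2020, 12, 31)})   # unsupported value type

print(encoded, round_trip)
print(tuple_key_error, datetime_error, sep='\n')
```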


Custom Encodings

As we saw in the previous video, any object can be serialized to JSON


cumbersome → remember to call the JSON serializer for every class

how do we do it for nested dictionaries and lists?

dump / dumps
→ can provide custom callable
→ uses a default instance of JSONEncoder
→ we can completely override JSONEncoder
Specifying a Custom Encoding Function

One of the arguments of the dump / dumps function is default

→ when provided, Python will call default if it encounters a type it cannot serialize
→ argument must be a callable
→ callable must have a single argument
→ that argument will receive the object Python cannot serialize

→ can include logic in our callable to differentiate between different types
→ or we can use a single dispatch generic function
[ using the @singledispatch decorator from the functools module]
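
A sketch of the single dispatch approach mentioned above. The function name json_format and the choice of types (datetime and set) are assumptions made for illustration.

```python
import json
from datetime import datetime
from functools import singledispatch

@singledispatch
def json_format(arg):
    # Fallback for types we do not handle: same error json itself would raise
    raise TypeError(f'Object of type {type(arg).__name__} is not JSON serializable')

@json_format.register(datetime)
def _(arg):
    return arg.isoformat()

@json_format.register(set)
def _(arg):
    return sorted(arg)      # sets have no order, so pick a stable one

data = {
    'when': datetime(2020, 12, 31, 23, 59, 59),
    'tags': {'b', 'a'},
    'nested': [{'also': datetime(2021, 1, 1)}],   # nested values are handled too
}

encoded = json.dumps(data, default=json_format)
print(encoded)
```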
dump

In the previous videos we looked at the dump and dumps functions

Beyond the default argument, dump has many other arguments that allow us to control serialization

skipkeys → bool, default is False
→ if dictionary keys are not basic types (string, int, etc) and skipkeys is set to False we will get a TypeError, otherwise it just skips the key

indent → int, default is None → useful for human readability

separators → tuple, defaults to (', ', ': ') → customizes how the JSON is rendered

sort_keys → bool, default is False → if True, dictionary keys will be sorted

and more… cls
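
A few of these arguments in action, as a sketch with arbitrary sample data:

```python
import json

d = {'b': 2, 'a': 1}

pretty = json.dumps(d, indent=2, sort_keys=True)      # readable, keys sorted
compact = json.dumps(d, separators=(',', ':'))        # no extra whitespace
skipped = json.dumps({'a': 1, (0, 0): 'origin'}, skipkeys=True)  # tuple key dropped

print(pretty)
print(compact)
print(skipped)
```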
The JSONEncoder Class

Python uses an instance of the JSONEncoder class in the json module to serialize data
The JSONEncoder class shares many arguments with the dump / dumps functions

default   skipkeys   sort_keys   indent   separators   …

The dump / dumps functions have a cls argument
→ allows us to specify our own version of JSONEncoder
Why use JSONEncoder at all?

If dump has all the same arguments as JSONEncoder, why use it at all?

To remain consistent in our app, every time we call dump we need to use the same argument values

Easy to make a mistake, or forget to specify an argument

→ instead use a custom JSONEncoder

and just remember to specify it via the cls argument of dump / dumps
How to create a custom JSONEncoder

→ subclass JSONEncoder
→ custom initialize parent class if we want to
→ override the default method
→ handle what we want to handle ourselves
→ otherwise delegate back to the parent class

from datetime import datetime
from json import JSONEncoder

class CustomEncoder(JSONEncoder):                 # inherit from JSONEncoder
    def __init__(self, *args, **kwargs):          # accept (and ignore) what dump / dumps passes in
        # custom init parent
        super().__init__(skipkeys=True, allow_nan=False,
                         indent='---', separators=('', ' = '))

    def default(self, arg):                       # override default method
        if isinstance(arg, datetime):             # handle what we want to handle
            return arg.isoformat()                # (return the string serialization of arg)
        else:
            return super().default(arg)           # otherwise delegate back to parent
Loading JSON

We have seen how to serialize Python objects to JSON

Now we need to look at deserializing JSON to Python objects → load, loads

import json

j = '{"a": 1, "b": {"sub1": [2, 3]}}'

d = json.loads(j)

d → {
        "a": 1,
        "b": {
            "sub1": [2, 3]
        }
    }

works out-of-the-box with the standard JSON data types → numbers, booleans, strings, lists, dictionaries (key:value pairs)

does not work with other types

j = '{"createdAt": "2020-12-31T23:59:59"}'     → interpreted as a string
One Approach

Use some custom encoding scheme to define both the value and the type of some entry in the JSON file
For example, when encoding a timestamp, we could do it as follows:

j = '''
{ "createdAt":
    {
        "objecttype": "isodatetime",
        "value": "2020-12-31T23:59:59"
    }
}
'''

→ load the JSON string into a Python dictionary

→ iterate through dictionary (recursively), to find objects with an objecttype == isodatetime

→ replace createdAt with the converted timestamp

→ tedious → load JSON, iterate recursively through dictionary, and convert as needed
A Slight Improvement

load and loads have an argument named object_hook

→ loads(j_string, object_hook=my_func)

→ my_func is called for every object in the JSON data

→ loads first parses JSON into a dictionary
→ object_hook is called for every dictionary (object) in the dictionary

For example:

j = '''
{
    "a": 1,
    "b": {
        "sub1": [1, 2, 3],
        "sub2": {
            "x": 100,
            "y": 200
        }
    }
}
'''

my_func is called for:
→ the sub2 dictionary
→ the b dictionary
→ the root dictionary (called last)

→ dictionary is replaced by return value of my_func

→ handles recursive aspect for us
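
A sketch of an object_hook that applies the objecttype convention from the previous slide. The hook name custom_decoder and the calls list (used only to record the order of calls) are hypothetical.

```python
import json
from datetime import datetime

j = '''
{
    "createdAt": {
        "objecttype": "isodatetime",
        "value": "2020-12-31T23:59:59"
    },
    "tags": ["a", "b"]
}
'''

calls = []   # records the keys of each dictionary the hook receives

def custom_decoder(obj):
    calls.append(sorted(obj))
    if obj.get('objecttype') == 'isodatetime':
        # replace the whole dictionary with a datetime
        return datetime.strptime(obj['value'], '%Y-%m-%dT%H:%M:%S')
    return obj          # leave every other dictionary unchanged

d = json.loads(j, object_hook=custom_decoder)
print(d)
print(calls)            # inner dictionary first, root dictionary last
```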
Schemas

Deserializing custom JSON types and objects is difficult

→ in general we need to know the structure of the JSON data in order to custom deserialize

→ this is referred to as the schema

→ a pre-defined agreement on how the JSON is going to be structured or serialized

→ If JSON has a pre-determined schema, then we can handle custom deserialization

→ schema might be for the entire JSON, or for sub-components only

{ "createdAt":
    {
        "objecttype": "isodatetime",
        "value": "2020-12-31T23:59:59"
    }
}

if we see this, replace the dict with the custom object/value
Overriding Basic Type Serializations

Notice that object_hook only allows us to customize deserialization of objects

What about numbers?
→ by default floats for real numbers, and ints for whole numbers

What if we want Decimal instead of float, or binary representations for integers?

→ can override the way these data types are handled by using some extra arguments in load/loads

→ parse_float
→ parse_int
→ parse_constant

→ provide a custom callable
→ callable has a single argument
→ argument value will be the original string in the JSON
→ return parsed value

→ No overrides for strings
Example

from decimal import Decimal

def make_decimal(arg):
    return Decimal(arg)

→ loads(j, parse_float=make_decimal)

If load / loads encounters this in the JSON data: "a": 100.5

→ calls make_decimal("100.5")

→ deserialized JSON will now have Decimal("100.5") instead of float 100.5

Another argument – object_pairs_hook

→ related to object_hook

→ cannot use both at the same time
   (if both are specified, then object_hook is ignored)

object_hook passes the deserialized dictionary to the callable

→ before Python 3.7 there is no language guarantee of the order of elements in the dictionary

What if order of elements in JSON is important?    → lists preserve order

→ instead of callable receiving a dictionary it receives a list of the key/value pairs

→ key/value pairs are provided as a tuple with two elements

object_hook             object_pairs_hook

{"a": 1, "b": 2}        [("a", 1), ("b", 2)]
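
A short sketch of what an object_pairs_hook callable receives, and of what happens when both hooks are supplied. The names pairs_hook and seen are illustrative only:

```python
import json

seen = []


def pairs_hook(pairs):
    # pairs is a list of (key, value) tuples in the order they appear in the JSON
    seen.append(pairs)
    return dict(pairs)


j = '{"a": 1, "b": 2}'
data = json.loads(j, object_pairs_hook=pairs_hook)

# When both hooks are given, only object_pairs_hook is called.
winner = json.loads(
    j,
    object_hook=lambda d: "object_hook",
    object_pairs_hook=lambda p: "object_pairs_hook",
)
```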


Mixing Basic Type Overrides and Object Hooks

→ can specify both parse_... and object_hook

Remember that object_hook (and object_pairs_hook) callables receive a parsed object

This means parse_... (if specified) is used first, before we receive the parsed object in the hooks
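
To see that ordering in practice, here is a small sketch in which the hook simply reports the type of each value it receives. The function name hook is an assumption for illustration:

```python
import json
from decimal import Decimal


def hook(obj):
    # By the time the hook runs, parse_float has already been applied,
    # so the float value arrives as a Decimal.
    return {key: type(value).__name__ for key, value in obj.items()}


result = json.loads('{"a": 100.5, "b": 1}', parse_float=Decimal, object_hook=hook)
```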

Recall…

when working with serialization, we could use dump or a custom JSONEncoder

Similarly, we can create a custom JSONDecoder class and specify it with the cls argument

→ json.loads(j, cls=CustomDecoder)

Just a different way of doing it → might help make sure we use our custom decoder consistently

→ works a bit differently than JSONEncoder

→ inherit from JSONDecoder

→ override the decode function

→ decode function receives entire JSON string

→ we have to fully parse and return whatever object we want
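
One possible shape for such a decoder, as a sketch only. It assumes the "isodatetime" schema shown earlier, lets the parent class do the basic parsing, and then post-processes the result; the _convert helper is a hypothetical name, not part of the json API:

```python
import json
from datetime import datetime


class CustomDecoder(json.JSONDecoder):
    def decode(self, s):
        # s is the entire JSON string; reuse the standard parser first
        obj = super().decode(s)
        return self._convert(obj)

    def _convert(self, obj):
        # Hypothetical helper: walk the parsed structure and replace
        # dicts that match the schema with datetime objects.
        if isinstance(obj, dict):
            if obj.get("objecttype") == "isodatetime":
                return datetime.fromisoformat(obj["value"])
            return {key: self._convert(value) for key, value in obj.items()}
        if isinstance(obj, list):
            return [self._convert(item) for item in obj]
        return obj


j = '{"createdAt": {"objecttype": "isodatetime", "value": "2020-12-31T23:59:59"}, "tags": ["x"]}'
data = json.loads(j, cls=CustomDecoder)
```

Nothing forces decode to call the parent parser; it could parse the string in any way it likes, as long as it returns the final object.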

Specialized Dictionaries

In this section we are going to look at other specialized types of dictionaries that the Python standard library provides

defaultdict     automatic default values for "missing" keys

OrderedDict     guaranteed key ordering (based on insertion order), plus some extras

Counter         specialized tools for dealing with counters

ChainMap        efficient way of "combining" multiple dictionaries

UserDict        alternative to subclassing dict for creating custom dictionary types

Missing Keys

standard dictionary (dict)

d = {}

d['a'] → KeyError

→ can use the get method to handle default values for non-existent keys

d.get('a', 100)
→ we get 100 if 'a' is not present, but 'a' is still not in dictionary

d['a'] = d.get('a', 0) + 1
→ general pattern for counters

in general: d.get(key, value)

→ value could be returned from calling a callable

→ works, but possibly have to remember to always use the same default in multiple places for same dict

→ easier would be to define the default once (per dict)

defaultdict

→ collections module

→ subclass of dict type (defaultdict instance IS-A dict instance)

→ so has all the functionality of a standard dict

defaultdict(callable, […])

    callable is called to calculate a default
    → callable must have zero arguments
    → referred to as a factory method
    → default is None, in which case a missing key raises KeyError, just like a regular dict

    remaining arguments are simply passed to dict constructor

d = defaultdict(lambda: 'python')        d → {}

d['a'] → 'python'

d → {'a': 'python'} → entry has been created
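
The behavior above can be checked with a short sketch. Variable names such as before, after and no_factory are illustrative:

```python
from collections import defaultdict

d = defaultdict(lambda: 'python')
before = dict(d)       # nothing stored yet
value = d['a']         # missing key: the factory is called and the entry is created
after = dict(d)

# Counting with a factory of int, which returns 0
counts = defaultdict(int)
for ch in 'banana':
    counts[ch] += 1

# With no factory (default_factory is None) a missing key raises KeyError
no_factory = defaultdict()
try:
    no_factory['missing']
    raised = False
except KeyError:
    raised = True
```

Note that d.get('b') would still return None without creating an entry: the factory is only used by the d[key] lookup.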


Other Factory Functions

Often we want to initialize values to 0, an empty string, an empty list, etc.

int() → 0        defaultdict(lambda: 0)
                 defaultdict(int)          has the same effect

list() → []      defaultdict(lambda: [])
                 defaultdict(list)         has the same effect

→ factory must simply be a callable that can take zero arguments and returns the desired default value

→ can even be a function that calls a database and returns some value

→ factory is invoked every time a default value is needed
→ function does not have to be deterministic
→ can return different values every time it is called
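
A brief sketch of two such factories: list for grouping, and a factory that returns a different value each time it is called. The sample data and the names groups, next_id and ids are made up for illustration:

```python
from collections import defaultdict
from itertools import count

# Grouping words by first letter: each missing key starts as an empty list
groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    groups[word[0]].append(word)

# A non-deterministic factory: every new key gets the next running number
next_id = count(1)
ids = defaultdict(lambda: next(next_id))
first = ids['alice']    # new key, factory called
second = ids['bob']     # new key, factory called again
repeat = ids['alice']   # existing key, factory not called
```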
OrderedDict

OrderedDict vs dict

→ prior to Python 3.6 no guarantee of key insertion order being maintained
   (an implementation detail of CPython 3.6, a language guarantee from Python 3.7)

→ if you must have an ordered dictionary and be backward compatible, must use OrderedDict (collections module)

→ if not, OrderedDict still has a few tricks up its sleeve!

    → supports reverse iteration
         makes sense since there is an ordering (plain dicts support reversed() only from Python 3.8)

    → pop first or last item in dictionary
    → move item to beginning or end of dictionary
         functionality not built-in to standard dict
         → have to "work" to get that behavior

Equality comparison (==) does not behave the same way

dict vs dict comparison → order of keys does not matter
OrderedDict vs OrderedDict comparison → order of keys matters
dict vs OrderedDict comparison → order of keys does not matter
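
The extras listed above map onto move_to_end, popitem(last=...) and reversed. A small sketch, with made-up keys and values:

```python
from collections import OrderedDict

od = OrderedDict(a=1, b=2, c=3)
od.move_to_end('a')               # order is now b, c, a
od.move_to_end('c', last=False)   # order is now c, b, a
order_after_moves = list(od)

first = od.popitem(last=False)    # pops the first item
last = od.popitem()               # pops the last item
remaining = list(od)

reversed_keys = list(reversed(OrderedDict(x=1, y=2)))

# Equality: key order only matters when both sides are OrderedDicts
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
plain_equal = d1 == d2
ordered_equal = OrderedDict(d1) == OrderedDict(d2)
mixed_equal = d1 == OrderedDict(d2)
```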

Using Dictionaries to Maintain Counters

We've already seen how we can use regular dict and defaultdict for counters

d = {}                                d = defaultdict(int)
d[key] = d.get(key, 0) + 1            d[key] += 1

certain operations can be tedious:

count the frequency of characters in a string (or items in an iterable in general)

update one counter dictionary with another counter dictionary (adding or subtracting)

from multiple counter dictionaries, find the max/min counter value for each key

The collections.Counter class

The Counter class is a specialized dictionary that makes certain operations easier

→ acts like a defaultdict with a default of 0 (but reading a missing key does not create an entry)

→ supports same constructor options as regular dicts

→ additional functionality to auto calculate a frequency table based on any iterable

→ iterate through every key, repeating each key as many times as the corresponding counter value

→ find the n most common items (by count)

→ increment/decrement counters based on another Counter or dict or iterable

→ fromkeys is not supported

→ update works differently than a regular dict
     in-place addition of counts
     iterable is just a sequence of elements, not tuples
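
A compact sketch of the operations listed above, using a made-up string as input:

```python
from collections import Counter

c = Counter('banana')                # frequency table from any iterable
missing = c['z']                     # 0, and no entry is created for 'z'
top_two = c.most_common(2)           # the two most common items with their counts
expanded = sorted(c.elements())      # each key repeated count times

c.update('ab')                       # in-place addition of counts
c.subtract({'n': 1})                 # in-place subtraction of counts

combined = Counter(a=1, b=5) + Counter(a=2)        # add counts
highest = Counter(a=1, b=5) | Counter(a=2, b=1)    # max count per key
lowest = Counter(a=1, b=5) & Counter(a=2, b=1)     # min count per key
```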

Remember chain from itertools?

from itertools import chain

l1 = […]
l2 = (…)
l3 = generator_func()

for e in chain(l1, l2, l3):

→ made it look like we had a single iterable – but really just chained them one after the other

collections.ChainMap serves a similar purpose – chaining dictionaries (or mapping types in general)

d1, d2, d3 → dictionaries

from collections import ChainMap

d = ChainMap(d1, d2, d3)
    → no extra storage (nothing is copied)
    → mutating elements in chain may affect underlying dicts
    → sees changes in underlying dicts
    → behaves more like a dictionary view (but is updatable!)

very different from

d = {**d1, **d2, **d3}
    → extra storage
    → essentially a shallow copy/merge
    → does not see changes in original dicts
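
The difference between the two approaches shows up as soon as an underlying dict changes. A minimal sketch with made-up dictionaries:

```python
from collections import ChainMap

d1 = {'a': 1}
d2 = {'b': 2}

chained = ChainMap(d1, d2)   # keeps references to d1 and d2, copies nothing
merged = {**d1, **d2}        # builds a new, independent dict

d2['c'] = 3                  # change an underlying dict afterwards

in_chained = 'c' in chained
in_merged = 'c' in merged
same_object = chained.maps[0] is d1
```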


Reading Keys from a Chain

There's an added complexity chaining maps that we do not have with iterables

The resulting chain should itself be a map → no repeated keys!

d1 = {'a': 10, 'b': 20}        d2 = {'a': 100, 'c': 30}

d3 = ChainMap(d1, d2)

d3['b'] → 20        d3['c'] → 30

d3['a'] ??
→ uses the first instance of the key it encounters in the chain
→ unlike {**d1, **d2} where the last instance takes effect

d3['a'] → 10

→ iteration works the same way
→ first instance of any key is returned – others are ignored

Be Careful! → unlike a dict, do not rely on key order when iterating a ChainMap
              (current CPython scans the maps from last to first, so keys do not follow lookup order)
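
The lookup rule can be verified directly with the dictionaries from this slide. The names merged, chain_value and merged_value are added for illustration:

```python
from collections import ChainMap

d1 = {'a': 10, 'b': 20}
d2 = {'a': 100, 'c': 30}

d3 = ChainMap(d1, d2)
merged = {**d1, **d2}

chain_value = d3['a']       # first instance in the chain wins
merged_value = merged['a']  # last instance wins when unpacking
key_count = len(d3)         # repeated key 'a' is counted once
as_dict = dict(d3)          # each key paired with its first value in the chain
```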

Think of it as Parent-Child Relationships

ChainMap(d1, d2, d3)

    d3    parent
    d2    parent   (overrides d3)
    d1    child    (overrides its parents)

In fact, there are attributes to deal with this explicitly

d.parents
→ a ChainMap containing the parent elements only

d.new_child(d4)
→ returns a new ChainMap with d4 added to the front of the chain (or bottom of the hierarchy)

    d3
    d2    parents
    d1
    d4    new child

→ same as ChainMap(d4, d1, d2, d3)
→ lookups behave like ChainMap(d4, ChainMap(d1, d2, d3))
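
A short sketch of parents and new_child, using made-up dictionaries. Note that new_child leaves the original chain untouched:

```python
from collections import ChainMap

d1 = {'a': 1}
d2 = {'a': 2, 'b': 2}
d3 = {'c': 3}
d4 = {'a': 4}

d = ChainMap(d1, d2, d3)

parents = d.parents        # chain of d2, d3 only
child = d.new_child(d4)    # new chain: d4, d1, d2, d3

parent_value = parents['a']   # d1 is skipped, so d2 provides 'a'
child_value = child['a']      # d4 is now first in the chain
```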


Additional ways to update the Chain

The .maps attribute is a (mutable) list of all the maps in the chain

The order of the list is the same as the child → parents hierarchy

i.e. first element is the child, other elements are the parents in the same order

This list is mutable

→ can modify the chain by removing, inserting and appending other maps

d = ChainMap(d1, d2, d3)

d.maps → [d1, d2, d3]

d.maps.append(d4) → ChainMap(d1, d2, d3, d4)

d.maps.insert(0, d5) → ChainMap(d5, d1, d2, d3, d4)

del d.maps[1] → ChainMap(d5, d2, d3, d4)
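
The same sequence as runnable code. The contents of d1 to d5 are made up so that each map can be told apart:

```python
from collections import ChainMap

d1, d2, d3, d4, d5 = ({'k': n} for n in range(1, 6))

d = ChainMap(d1, d2, d3)
d.maps.append(d4)       # d4 becomes the last parent
d.maps.insert(0, d5)    # d5 becomes the child
del d.maps[1]           # removes d1 from the chain

final_maps = d.maps
lookup = d['k']         # resolved from the child, d5
```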


Mutating Maps via the ChainMap

The ChainMap is mutable → we already saw we could add and remove maps from the chain

We can also mutate the key/value pairs in the map itself

d = ChainMap(d1, d2) → d[key] = value

BUT these mutations affect the child (first) map only

d1 = {'a': 1, 'b': 2}        d2 = {'a': 20, 'c': 3}        d = ChainMap(d1, d2)

d['a'] = 100        d1 = {'a': 100, 'b': 2}
                    d2 = {'a': 20, 'c': 3}

d['c'] = 200        d1 = {'a': 100, 'b': 2, 'c': 200}
                    d2 = {'a': 20, 'c': 3}

del d['a']          d1 = {'b': 2, 'c': 200}
                    d2 = {'a': 20, 'c': 3}

d['a'] → 20

del d['a']          → KeyError exception
                    even though 'a' is in the chain
                    → not in the child (first) map
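
The walkthrough above as a runnable sketch. The names value_after_delete and second_delete_failed are added for illustration:

```python
from collections import ChainMap

d1 = {'a': 1, 'b': 2}
d2 = {'a': 20, 'c': 3}
d = ChainMap(d1, d2)

d['a'] = 100    # updates d1 only
d['c'] = 200    # adds 'c' to d1, d2['c'] is untouched
del d['a']      # removes 'a' from d1 only

value_after_delete = d['a']   # now resolved from d2

try:
    del d['a']                # 'a' is no longer in the child map
    second_delete_failed = False
except KeyError:
    second_delete_failed = True
```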

Custom Dictionaries

In the previous videos of this section, we looked at a variety of specialized dictionaries.

They extend the plain dict type with additional capabilities geared towards some specialized goal

Sometimes, we want to do the same.

→ a dictionary that only allows certain types of keys (string keys only → JSON)

→ a dictionary that only allows keys from some finite set of pre-defined keys

→ a dictionary that only allows numerical values

We could just create a custom class that uses a plain dict as a backing structure and write custom __getitem__ and __setitem__ methods

→ often good enough, but we don't inherit all the functionality that dicts have

Subclassing dict

We can create a custom class that inherits from dict

We can override various methods to customize the dictionary behavior

But there is a caveat here

If we have a parent class that implements a method and we override that method in a subclass
- calling that method on a subclass instance will invoke the overridden method

dicts have __getitem__ and __setitem__ methods

d['a']             d.__getitem__('a')

d['a'] = 10        d.__setitem__('a', 10)

We would expect the dictionary class to use these __xxx__ methods internally for .get() .update() and so on

These built-in types however, often use direct access to data (in C)
They do not guarantee they actually use these "special" methods
Even len(string) does not actually use __len__
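
A sketch of the caveat in CPython. IntDict is a hypothetical class that tries to allow integer values only; its override is honored for d[key] = value but bypassed by update and by the constructor:

```python
class IntDict(dict):
    def __setitem__(self, key, value):
        # Intended rule: only integer values are allowed
        if not isinstance(value, int):
            raise ValueError('Value must be an integer.')
        super().__setitem__(key, value)


d = IntDict()
d['a'] = 10                        # goes through the override

try:
    d['b'] = 'not an int'          # rejected by the override
    rejected = False
except ValueError:
    rejected = True

d.update({'c': 'not an int'})      # update does not call our __setitem__
d2 = IntDict(x='also not an int')  # neither does the constructor
```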

Alternative

If subclassing a dict is causing issues because of the special methods
we can use a predefined Python class: collections.UserDict

It is not technically a subclass of dict

it uses a regular dictionary as a backing data structure

and implements key functionality we have in dictionaries

it is not a dict, but it is a mapping type

→ views: items(), keys(), values()

→ __setitem__ and __getitem__ and uses those internally as appropriate

→ plus everything else we would expect from a dictionary

So it is essentially a head-start on recreating a Python dictionary from scratch that offers different subclassing possibilities

→ Code section will help illustrate all this in more detail
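
A minimal sketch of the same integer-only idea built on UserDict. IntDict is again a hypothetical class name; here update and the constructor do go through the overridden __setitem__:

```python
from collections import UserDict
from collections.abc import MutableMapping


class IntDict(UserDict):
    def __setitem__(self, key, value):
        # Only integer values are allowed
        if not isinstance(value, int):
            raise ValueError('Value must be an integer.')
        super().__setitem__(key, value)


d = IntDict(a=1)
d['b'] = 2

try:
    d.update({'c': 'not an int'})   # routed through __setitem__
    update_rejected = False
except ValueError:
    update_rejected = True

try:
    IntDict(z='not an int')         # the constructor is routed through it too
    init_rejected = False
except ValueError:
    init_rejected = True

is_dict = isinstance(d, dict)
is_mapping = isinstance(d, MutableMapping)
backing = d.data                    # the regular dict used as backing structure
```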
