Python Deep Dive 3
Copyright © MathByte Academy
What this course is about
the Python language
→ canonical CPython 3.6+ implementation
and the standard library
this is NOT an introductory course
coding videos
Jupyter notebooks
projects, exercises and solutions
GitHub repository for all code
https://github.com/fbaptiste/python-deepdive
Associative Arrays
what are associative arrays?
one concrete implementation → hash maps or hash tables
how do hash maps work?
what are hash functions?
Python 3.5+ specific hash map implementation
Diagram (dictionaries and related concepts): modules, namespaces, classes, instances, sets, multi-sets, JSON, YAML, relational databases
Dictionaries
creating
manipulating
updating, merging and copying
keys, values and items views
custom classes and hashing → use instances as keys
Sets
hash maps
set operations
copying, merging and updating sets
frozensets
dictionary views → keys, items
Serializing and Deserializing
pickling
JSON serialization and deserialization
use and customize Python's JSONEncoder and JSONDecoder classes
the need for JSON schemas
3rd party libraries → JSONSchema, Marshmallow, PyYaml, Serpy
Specialized Hash Maps
defaultdict
OrderedDict
Counter → multi-set
ChainMap
Custom Dictionary Types
using class inheritance to create customized dictionary types
inheriting from dict
inheriting from UserDict
Exercises
exercises after each section
you should attempt these yourself first – practice makes perfect!
solution videos and notebooks provided
→ my approach
→ more than one approach possible
Python 3: Deep Dive (Part 3) - Prerequisites
This course assumes that you have in-depth knowledge of functional programming in Python:
functions and function arguments
lambdas
packing and unpacking iterables    my_func(*my_list)    f, *_, l = (1, 2, 3, 4, 5)
closures
zip    map    sorted    any    all    chain
@lru_cache    @singledispatch    @wraps
Python 3: Deep Dive (Part 3) - Prerequisites
list comprehensions
generators and generator expressions
context managers
You should have a basic understanding of creating and using classes in Python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age  # goes through the property setter below

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, age):
        if age <= 0:
            raise ValueError('Age must be greater than 0')
        else:
            self._age = age
Python 3: Deep Dive (Part 3) - Prerequisites
You should understand how special functionality is implemented in Python using special methods
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'

    def __gt__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        else:
            return self.x ** 2 + self.y ** 2 > other.x ** 2 + other.y ** 2
creating and using simple dictionaries
d = {'a': 1, 'b': 2}
d['a']
d['a'] = 1
strings and string formatting
f'result: {result}'
'result: {result}'.format(result=result)
I will use a limited number of 3rd party libraries in this course
You will need to know how to install 3rd party Python libraries
pip install marshmallow
Most code examples are provided using Jupyter Notebooks
Freely available
https://jupyter.org/
GitHub and git
→ recommended but not required
https://github.com/fbaptiste/python-deepdive
This section is going to be primarily theoretical in nature
what are dictionaries (aka associative arrays)
→ abstract data structure
there are many ways to implement dictionaries
specifically we'll look at how they can be implemented using hash tables (aka hash maps)
This is not a data structure course, so we're not going to look at all the intricacies
just enough to get a rough understanding
Why bother?
Dictionaries are everywhere in Python
modules
classes
objects (class instances)
scopes
sets
your own dictionaries
It is arguably one of the most important data structures in Python
If you're really not into theory…
or you already understand associative arrays, hash functions, hash maps, etc…
Skip this section!
→ maybe just check out the videos on Python's (3.6+) implementation of hash maps
→ key-sharing dictionaries    Mark Shannon    PEP 412
→ compact dictionaries    Raymond Hettinger
https://mail.python.org/pipermail/python-dev/2012-December/123028.html
really the main points that come out of this section:
→ dictionary keys must be hashable
→ dictionary key order is maintained (in order of insertion)
What is an associative array?
Person objects
persons = [John, Eric, Michael, Graham]
index:      0     1      2        3
We can think of the indices as a key for the items in the list
0 → John
1 → Eric
2 → Michael
3 → Graham
So when we want to get hold of the Michael object, we just need to remember the key
persons[2] → Michael
persons = [('john', John),
           ('eric', Eric),
           ('michael', Michael),
           ('graham', Graham)]
we have associated a string with an object
(key, object)
to get the Michael object:
→ look up the key 'michael' and return the associated value
scan the persons list until we find a tuple with first element = key
return the second element of the tuple
At least we don't have to remember a number anymore!
But there really has to be a better way…
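The scan described above can be sketched in a few lines. This is only an illustration: plain strings stand in for the Person objects, and the helper name lookup is made up for this example.

```python
# Strings stand in for the Person objects (an assumption for this sketch).
persons = [
    ('john', 'John'),
    ('eric', 'Eric'),
    ('michael', 'Michael'),
    ('graham', 'Graham'),
]


def lookup(pairs, key):
    """Scan the list until a tuple whose first element equals key is found."""
    for k, value in pairs:
        if k == key:
            return value  # second element of the matching tuple
    raise KeyError(key)


print(lookup(persons, 'michael'))
```

The cost of this lookup grows with the length of the list, which is what motivates the search for a better way.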
And let's break it up:
What if we could define a function h that would return these results - always:
h('john') → 0
h('eric') → 1
h('michael') → 2
h('graham') → 3
To get Michael, we would first call h('michael') → 2, then persons[2]
persons[h('michael')] → Michael
Associative Arrays
abstractly we can think of it as a collection of (key, value) pairs
Sometimes also called: maps, dictionaries
Can be implemented in different concrete ways
They support:
→ adding/removing elements
→ modifying an associated value
Hash Maps (aka Hash Table)
One common concrete implementation of an associative array (aka dictionary) is a hash map
Suppose we have an array of 7 slots (0 to 6), initially containing nothing
Now suppose we are going to want to store these maps
'john' → John
'eric' → Eric
'michael' → Michael
'graham' → Graham
We'll define a function that will return an integer value for all these strings ('john', 'eric', etc)
→ that will be unique for each of these strings
→ is between 0 and 6
→ always returns the same integer for the same string (deterministic)
Hash Tables
h(s):
'john' → 2
'eric' → 4
'michael' → 0
'graham' → 5

slot  value
0     Michael
1
2     John
3
4     Eric
5     Graham
6

Looking up a key:
→ calculate h(key) → idx
→ return value in slot idx
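A minimal sketch of the two lookup steps, assuming the slot numbers shown above. The function h is hard-coded for the four known keys, and store and fetch are hypothetical helper names; strings stand in for the Person objects.

```python
def h(key):
    """Toy hash function: only valid for the four keys known ahead of time."""
    if key == 'john':
        return 2
    if key == 'eric':
        return 4
    if key == 'michael':
        return 0
    if key == 'graham':
        return 5
    raise ValueError(f'no slot defined for {key!r}')


slots = [None] * 7  # array of 7 slots, initially containing nothing


def store(key, value):
    slots[h(key)] = value


def fetch(key):
    idx = h(key)       # calculate h(key) -> idx
    return slots[idx]  # return value in slot idx


for name in ('john', 'eric', 'michael', 'graham'):
    store(name, name.capitalize())

print(slots)
print(fetch('michael'))
```

No scanning is involved: the cost of a lookup is one call to h plus one list index.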
Hash Functions
Creating the function h(key) when we know all the possible keys ahead of time is easy
Reality check: most of the time we don't know all the possible keys ahead of time
In reality, creating such a function is hard
Bounding the returned index value is not difficult → modulo
x % 7 → 0, 1, 2, 3, 4, 5, 6
Ensuring uniqueness is hard
how to ensure that h(k1) != h(k2) if k1 != k2
maybe we don't need to…
Hash Functions
A hash function is a function (in the mathematical sense)
$x = y \Rightarrow h(x) = h(y)$ (deterministic)
that maps from a set (domain) of arbitrary size (possibly infinite) to a smaller set (range)
$h: D \to R$ where $|R| < |D|$
For our hash tables, we'll also want:
→ the range to be a defined subset of the non-negative integers 0, 1, 2, 3, …
→ the generated indices for expected input values to be uniformly distributed (as much as possible)
Note that we do allow getting the same output for different keys
key           h(key, 11)   h(key, 5)
'alexander'   9            4
'john'        4            4
'eric'        4            4
'michael'     7            2
'graham'      6            1
collisions: 'john' and 'eric' with 11 slots; 'alexander', 'john' and 'eric' with 5 slots
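The definition of h used for the values above is not shown in these notes. As an assumption, a hash based on the length of the key reproduces every value listed, so the following sketch uses that:

```python
def h(key, num_slots):
    # Assumed definition (not shown in the notes): length of the key, bounded by modulo.
    return len(key) % num_slots


names = ['alexander', 'john', 'eric', 'michael', 'graham']

for num_slots in (11, 5):
    print(num_slots, {name: h(name, num_slots) for name in names})
```

Keys of equal length always collide with this function, which is why 'john' and 'eric' land in the same slot for both table sizes.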
Example
def h(key, num_slots):
    total = sum(ord(c) for c in key)
    return total % num_slots

ord('A') → 65
ord('B') → 66
…
ord('Z') → 90
…
ord('a') → 97
ord('b') → 98
…

h('alexander', 11) → 948 % 11 = 2
h('john', 11) → 431 % 11 = 2
h('eric', 11) → 419 % 11 = 1

h('alexander', 5) → 948 % 5 = 3
h('john', 5) → 431 % 5 = 1
h('eric', 5) → 419 % 5 = 4
h('michael', 5) → 723 % 5 = 3
h('graham', 5) → 624 % 5 = 4

All these hash functions have collisions…
Dealing with Collisions
3   ['alexander', Alexander]  ['michael', Michael]
4   ['eric', Eric]  ['graham', Graham]
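One way to read the picture above: each slot holds a list of [key, value] pairs, and colliding keys share a slot. This approach is commonly called chaining; the class name ChainedMap and its methods are made up for this sketch, and strings stand in for the Person objects.

```python
def h(key, num_slots):
    return sum(ord(c) for c in key) % num_slots


class ChainedMap:
    """Toy hash map where every slot is a list (chain) of [key, value] pairs."""

    def __init__(self, num_slots=5):
        self.slots = [[] for _ in range(num_slots)]

    def set(self, key, value):
        chain = self.slots[h(key, len(self.slots))]
        for pair in chain:
            if pair[0] == key:
                pair[1] = value  # key already present: update in place
                return
        chain.append([key, value])

    def get(self, key):
        for k, v in self.slots[h(key, len(self.slots))]:
            if k == key:
                return v
        raise KeyError(key)


m = ChainedMap()
for name in ('alexander', 'john', 'eric', 'michael', 'graham'):
    m.set(name, name.capitalize())

print(m.slots[3])
print(m.slots[4])
print(m.get('graham'))
```

Lookups still start with a hash, but a short linear scan is needed inside the slot whenever keys collide.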
Dealing with Collisions
Probe Sequence
0   ['michael', Michael]
1   ['john', John]
2   ['graham', Graham]
3   ['alexander', Alexander]
4   ['eric', Eric]
other types of probing
→ must generate the same sequence of valid indices for any given key
Probe Sequence
Fetching Elements
0   ['michael', Michael]
1   ['john', John]
2   ['graham', Graham]
3   ['alexander', Alexander]
4   ['eric', Eric]

h('alexander', 5) → 948 % 5 = 3    3 → 4 → 0 → 1 → 2
h('john', 5) → 431 % 5 = 1         1 → 2 → 3 → 4 → 0
h('eric', 5) → 419 % 5 = 4         4 → 0 → 1 → 2 → 3
h('michael', 5) → 723 % 5 = 3      3 → 4 → 0 → 1 → 2
h('graham', 5) → 624 % 5 = 4       4 → 0 → 1 → 2 → 3

find 'alexander' → hash = 3 → probe sequence: 3 → 4 → 0 → 1 → 2
is 'alexander' at index 3? → yes → return item

find 'michael' → hash = 3 → probe sequence: 3 → 4 → 0 → 1 → 2
is 'michael' at index 3? → no
is 'michael' at index 4? → no
is 'michael' at index 0? → yes → return item

→ this is why the hash of a key should not change over its lifetime
→ in reality more complex than this, but this is the basic idea
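The walk-through above can be sketched with linear probing. This is a simplified illustration that ignores deletions and resizing; the helper names probe_sequence, insert and find are made up for this example, and strings stand in for the Person objects.

```python
def h(key, num_slots):
    return sum(ord(c) for c in key) % num_slots


def probe_sequence(key, num_slots):
    """Start at h(key) and wrap around, visiting every slot exactly once."""
    start = h(key, num_slots)
    return [(start + i) % num_slots for i in range(num_slots)]


def insert(table, key, value):
    for idx in probe_sequence(key, len(table)):
        if table[idx] is None or table[idx][0] == key:
            table[idx] = [key, value]
            return idx
    raise RuntimeError('table is full')


def find(table, key):
    for idx in probe_sequence(key, len(table)):
        if table[idx] is None:
            break  # empty slot: the key was never stored
        if table[idx][0] == key:
            return table[idx][1]
    raise KeyError(key)


table = [None] * 5
for name in ('alexander', 'john', 'eric', 'michael', 'graham'):
    insert(table, name, name.capitalize())

print(table)
print(find(table, 'michael'))
```

Inserting in this order gives the same layout as the table above. If a key's hash changed after insertion, find would start from a different slot and follow a different probe sequence.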
Sizing Issues
→ start small, and grow it over time as needed
→ resizing is expensive
    → recompute hashes
    → move data around
→ over allocate (create more slots than necessary)
what happens when items are deleted
→ this can affect probing algorithm
gets complicated
beyond the needs of this course
→ https://en.wikipedia.org/wiki/Hash_table
Python Dictionaries are ubiquitous
namespaces
classes
modules
functions
sets
and of course, your own dicts
Dictionaries are such an important part of Python that a lot of time and effort was put into making them as efficient as possible
key sharing    compact dictionaries
Key Sharing    PEP 412
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

john = Person('John', 78)
john → ['name', 'John'], ['age', 78]
eric = Person('Eric', 75)
eric → ['name', 'Eric'], ['age', 75]
→ multiple instances of the same class → instance attribute names are the same
Sparse table → wasted space, key order different from insertion order
0
1   ['john', John]
2
3   ['alex', Alex]
4
5
6   ['eric', Eric]

[ ['--', '--', '--'],
  [-6350376054362344353, 'john', John],
  ['--', '--', '--'],
  [4939205761874899982, 'alex', Alex],
  ['--', '--', '--'],
  ['--', '--', '--'],
  [6629767757277097963, 'eric', Eric]
]

Compact layout → key order same as insertion order
values = [[4939205761874899982, 'alex', Alex],
          [-6350376054362344353, 'john', John],
          [6629767757277097963, 'eric', Eric]]
indices = [None, 1, None, 0, None, None, 2]
hash()
Python hash()
→ if a == b is True, then hash(a) == hash(b) is also True
→ Python truncates hashes to some fixed size (sys.hash_info.width)
→ on my machine: 64 bits
map(hash, (1, 2, 3, 4)) → 1, 2, 3, 4
map(hash, (1.5, 2.5, 3.5, 4.5)) → 1152921504606846977, 1152921504606846978, 1152921504606846979, 1152921504606846980
map(hash, ('hello', 'python', '!')) → 2558804294780988881, 1235297897608439440, -8029463035455593707
hash((1, 'a', 10.5)) → -5053599863580733767
Python hash()
hash([1, 2]) → TypeError: unhashable type    (mutable)
hash({'a', 'b'}) → TypeError: unhashable type    (mutable)
hash((1, 2)) → 3713081631934410656    (immutable)
hash(frozenset({'a', 'b'})) → 4261914069630221614    (immutable)
hash((1, 2, [3, 4])) → TypeError: unhashable type
Why?
a = [1, 2, 3]
d = {a: 'this key is a list – mutable'}
a.append(4)
→ same object
Caveat
→ if a == b is True, then hash(a) == hash(b) is also True
→ Python truncates hashes to some fixed size
# mod1.py
print(hash('python'))
print(hash('python'))
run 1:
1235297897608439440
1235297897608439440
run 2:
-5750637952798290655
-5750637952798290655
hash values for objects that compare equal remain equal during program run
but they can change from run to run → strings, bytes and datetime
→ never rely on a hash value being the same from one program run to another
→ although may be ok sometimes, ex: integers
creating dictionaries → literals, dict(), comprehensions, and more…
common operations
→ membership tests, retrieving, adding, removing elements…
updating
→ update, packing/unpacking, copy, deepcopy
dictionary views
→ keys, items, values and iteration
custom classes as keys → default hash, custom hashing
Dictionary Elements
basic structure of dictionary elements: key : value
value → any Python object: integer, custom class or instance, function, module, any Python object…
→ hash tables require the hash of an object to be constant (for the life of the program)
roughly: immutable objects are hashable
mutable objects are not hashable
more subtle than that…
Hashable Objects
Python function: hash(obj) → some integer (truncated based on Python build: 32-bit, 64-bit)    sys.hash_info.width
                           → Exception    TypeError: unhashable type
→ int, float, complex, binary, Decimal, Fraction, … → immutable → hashable
→ tuples → immutable collection → hashable only if all elements are also hashable
→ set, dictionary → mutable collections → not hashable
→ list → mutable collection → not hashable
→ functions → immutable → hashable
If an object is hashable:
→ the hash of the object must be an integer value
→ if two objects compare equal (==), the hashes must also be equal
Important:
two objects that do not compare equal may still have the same hash (hash collision)
→ more hash collisions → slower dictionaries
later → creating our own custom hashes
→ we will also need to conform to these rules
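Both rules can be checked directly in the interpreter. The second half of this sketch relies on a CPython implementation detail (hash(-1) is -2, the same as hash(-2)); other implementations may behave differently.

```python
# Rule: objects that compare equal must have equal hashes.
print(1 == 1.0 == True)
print(hash(1) == hash(1.0) == hash(True))

# The converse does not hold. In CPython, -1 and -2 have the same hash
# even though they are not equal: a hash collision.
print(hash(-1), hash(-2))
print(-1 == -2)

# A collision is not a problem for correctness: both keys are still stored.
d = {-1: 'minus one', -2: 'minus two'}
print(d)
```

Equality is what finally decides whether two keys are the same; the hash only decides where the search starts.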
Creating Dictionaries: Literals
{'john': ['John', 'Cleese', 78],
 (0, 0): 'origin',
 'repr': lambda x: x ** 2,
 'eric': {'name': 'Eric Idle',
          'age': 75}
}
Creating Dictionaries: Constructor
keyword name → must be a valid identifier name (think variable, function, class name, etc)
value → any object
dict(john=['John', 'Cleese', 78],
     repr=lambda x: x ** 2,
     eric={'name': 'Eric Idle',
           'age': 75},
     twin=dict(name='Eric Idle', age=75)
)
Creating Dictionaries: Comprehensions
→ elements must be specified as key: value
{str(i): i ** 2 for i in range(1, 5)} → {'1': 1, '2': 4, '3': 9, '4': 16}
{str(i): i ** 2
 for i in range(1, 5)
 if i % 2 == 0} → {'2': 4, '4': 16}
Soapbox!
d = {i: i ** 2 for i in range(1, n)}
vs
d = {}
for i in range(1, n):
    d[i] = i ** 2

But when things get more complex…
d = {}
url = 'http://localhost/user/{id}'
for i in range(n):
    response = requests.get(url.format(id=i))
    user_json = response.json()
    user_age = int(user_json['age'])
    if user_age >= 18:
        user_name = user_json['fullName'].strip()
        user_ssn = user_json['ssn']
        d[user_ssn] = user_name
Creating Dictionaries: fromkeys()
→ creates a dictionary with specified keys all assigned the same value
d = dict.fromkeys(iterable, value)
iterable → any iterable, contains the keys, hashable elements
value → all keys are set to the same value, optional → None if not provided
d = dict.fromkeys(['a', (0,0), 100], 'N/A')
→ {'a': 'N/A', (0, 0): 'N/A', 100: 'N/A'}
d = dict.fromkeys((i**2 for i in range(1, 5)), False)
→ {1: False, 4: False, 9: False, 16: False}
Basic Operations
d[key] = value → creates key if it does not exist already
               → assigns value to key
d[key] → as an expression returns the value for specified key
       → exception KeyError if key is not found
sometimes want to avoid this KeyError exception, and return a default value if key is not found
d.get(key) → returns value if key is found, None if key is not found
d.get(key, default) → returns value if key is found, default if key is not found
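A small usage sketch for get with a default, counting words in a made-up sentence. The variable names and the text are only for illustration.

```python
text = 'the quick brown fox jumps over the lazy dog the end'

counts = {}
for word in text.split():
    # get returns 0 the first time a word is seen, so no KeyError is raised
    counts[word] = counts.get(word, 0) + 1

print(counts)
print(counts.get('cat'))      # key not found, no default given
print(counts.get('cat', 0))   # key not found, default returned
```

Note that get never inserts the key: 'cat' is still absent from counts after both calls.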
Basic Operations
number of items in dictionary
len(d)
clearing out all items
d.clear() → d is now empty
Removing Elements from a Dictionary
del d[key] → removes element with that key from d
           → exception KeyError if key is not in d
d.pop(key) → removes element with that key from d
           → and returns the corresponding value
           → exception KeyError if key is not in d
sometimes we want to avoid this KeyError exception
d.pop(key, default) → removes element with that key from d
                    → and returns the corresponding value, or default if key is not in d
d.popitem() → removes last item – guaranteed (>= Python 3.6)
last inserted → popped first
Last In First Out → LIFO
→ works like a stack
Inserting keys with defaults
sometimes want to insert a key with a default value only if key does not exist
d = {'a': 1, 'b': 2}
if 'c' not in d:
    d['c'] = 0
→ combine this with returning the newly inserted (default) value, or existing value if already there
def insert_if_not_present(d, key, value):
    if key not in d:
        d[key] = value
        return value
    else:
        return d[key]
instead…
result = d.setdefault(key, value)
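A common use of setdefault is grouping items. The word list below is made up for this sketch.

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

groups = {}
for word in words:
    # Inserts an empty list the first time a letter is seen,
    # and returns the existing list every other time.
    groups.setdefault(word[0], []).append(word)

print(groups)

d = {'a': 1, 'b': 2}
print(d.setdefault('c', 0))   # key missing: inserted with the default
print(d.setdefault('a', 0))   # key present: existing value returned, d unchanged
print(d)
```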
Dictionary Views PEP 3106
Three ways we may want to view the data in a dictionary
→ keys only → d.keys()
→ values only → d.values()
→ key/value pairs → (key, value) → d.items()
all are iterables
d = {'a': 1, 'b': 2, 'c': 3}
list(d.keys()) → ['a', 'b', 'c']
list(d.values()) → [1, 2, 3]
list(d.items()) → [('a', 1), ('b', 2), ('c', 3)]
Important: the order of keys, values and items is the same
→ the position of an item in one view corresponds to the same position in other views
→ Python 3.6+: in addition, this order is same as dictionary (insertion) order
They’re dynamic…
more to it than just an iterable
these views are dynamic → views reflect any changes in the dictionary
→ but views are not updatable
d = {'a': 1, 'b': 2}
keys = d.keys()
values = d.values()
items = d.items()
keys → 'a', 'b'
values → 1, 2
items → ('a', 1), ('b', 2)
d['a'] = 10
keys → 'a', 'b'
values → 10, 2
items → ('a', 10), ('b', 2)
del d['b']
d['c'] = 3
keys → 'a', 'c'
values → 10, 3
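The sequence above can be run as is. The views are created once and never reassigned, yet they reflect every later change to the dictionary.

```python
d = {'a': 1, 'b': 2}
keys = d.keys()
values = d.values()
items = d.items()

d['a'] = 10   # update an existing value
del d['b']    # remove a key
d['c'] = 3    # insert a new key

# The same view objects now show the current state of d.
print(list(keys))
print(list(values))
print(list(items))
```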
→ union, intersection, difference of these key views – just like sets
The values() view does not behave like a set
→ in general values are not unique
→ in general values are not hashable
The items() view may behave like a set
→ elements of items() are guaranteed unique (since keys are unique)
→ if all values are hashable → behaves like a set
union          s1 | s2 → {'a', 'b', 'c', 'd'}
intersection   s1 & s2 → {'b', 'c'}
difference     s1 - s2 → {'a'}
Can manipulate keys() the same way
Same for items() if dictionary values are all hashable
Set Operations on Views
→ dictionaries are now considered ordered (insertion order)
→ sets are not ordered
d1.keys() and d2.keys() are ordered
but d1.keys() | d2.keys() is a set
→ ordering of result is not guaranteed
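A short sketch of set operations on views, using two made-up dictionaries. The results are plain sets, so no ordering should be assumed.

```python
d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'b': 2, 'c': 30, 'd': 40}

union = d1.keys() | d2.keys()
common = d1.keys() & d2.keys()
only_in_d1 = d1.keys() - d2.keys()

print(type(union), union)
print(common)
print(only_in_d1)

# items() also supports set operations here because all values are hashable.
same_pairs = d1.items() & d2.items()
print(same_pairs)

# Rebuild a dictionary from the common keys, taking values from d1.
print({k: d1[k] for k in common})
```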
The update method
→ three forms
d1.update(d2)
d1.update(iterable)
→ iterable must contain iterables with 2 elements each: (key, value)
d1.update(keyword-args) → argument name will become key
(similar to dict(a=10, b=20))
d1.update(d2)
→ for every (k, v) in d2
    → if k not in d1, inserts (k, v) in d1
    → if k in d1, updates value for k in d1
d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 30}
d1.update(d2)
d1 → {'a': 1, 'b': 20, 'c': 30}    ('b' was updated)
→ insertion order is maintained (3.6+)
d1.update(keyword-args)
d1 = {'a': 1, 'b': 2}
d1.update(b=20, c=30)
d1 → {'a': 1, 'b': 20, 'c': 30}    ('b' was updated)
→ order of keyword arguments is preserved (3.6+)
→ insertion order is maintained (3.6+)
d1.update(iterable)
it can be any of:    (('b', 20), ('c', 30))    (('b', 20), ['c', 30])    [('b', 20), ['c', 30]]
d1 = {'a': 1, 'b': 2}
d1.update(it)
d1 → {'a': 1, 'b': 20, 'c': 30}
d1 = {'a': 1, 'b': 2}
d1.update(((k, ord(k)) for k in 'bcd'))
d1 → {'a': 1, 'b': 98, 'c': 99, 'd': 100}
(… preserved 3.6+)
→ for function arguments, keys must be valid identifiers
→ not for unpacking dictionaries in general
d1 = {'a': 1, 'b': 2}
d2 = {'a': 10, (0,0): 'origin'}
d3 = {'b': 20, 'c': 30, 'a': 100}
d = {**d1, **d2, **d3}
d → {'a': 100, 'b': 20, (0,0): 'origin', 'c': 30}
shallow copies → the container object is a new object
→ the copied container's keys/values are shared references with the original object
d_copy = d.copy()
d_copy = {**d}
d_copy = dict(d)
d_copy = {k: v for k, v in d.items()}    (slower, don't use for a simple copy)
→ all these methods result in shallow copies
→ the dictionaries are independent dictionaries (inserts, deletes are independent)
→ but the keys and values are shared references
Deep Copies
If a shallow copy is not sufficient, we can create deep copies of dictionaries
→ no shared references
→ even with nested dictionaries
can do it ourselves
→ sometimes requires recursion, have to be careful with circular references
this might be needed if we don't want a true deep copy, but only a partial deep copy
simpler to use copy.deepcopy
from copy import deepcopy → works for custom objects, iterables, dictionaries, etc
d1 = deepcopy(d)
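The difference between the two kinds of copies shows up as soon as a nested mutable value changes. The dictionary below is made up for this sketch.

```python
from copy import deepcopy

d = {'a': [1, 2], 'b': {'x': 10}}

shallow = d.copy()     # new dict, same value objects
deep = deepcopy(d)     # new dict, new (recursively copied) value objects

d['a'].append(3)       # mutate a shared value
d['c'] = 'new'         # insert into the original only

print(shallow)         # sees the appended 3, but not the new key
print(deep)            # sees neither change
print(shallow['a'] is d['a'], deep['a'] is d['a'])
```

Note that deepcopy is a function in the copy module, not a method on dict.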
Quick Review
→ generate probe sequence (sequence of valid indices)
→ iterate over probe sequence → index
→ is the slot at that index empty?
    yes → store the item
    no → continue iteration to look for an empty slot
more hash collisions → more inefficient
Quick Review
→ generate probe sequence (sequence of valid indices)
→ loop over probe sequence (loop until found or empty slot)
→ is slot empty?
    yes → key does not exist in dictionary
    no → are hashes equal and are keys equal (==)?
        yes → found the key
        no (caused by hash collision upon insertion/resizing) → keep looping
(a little more complex because of deletions)
more hash collisions → more inefficient
otherwise we're starting our search in the wrong place!
→ probe sequence remains the same
→ Python controls that, not us
so hash of key cannot change after storing in dict
What is a set?
mathematics
A set is a gathering together into a whole of definite, distinct objects of our perception or of our thought -- which are called elements of the set.
- Georg Cantor
→ a collection of distinct objects
→ notice ordering is not mentioned!
set membership
size of set (cardinality)
union
intersection
complement
and more…
What is a set?
{3, 5, 1} … are all the same set (equal)
→ objects are distinct
→ {1, 1, 3} → not possible – element 1 is repeated
→ Python data type: set
→ elements must be hashable
→ elements are distinct – they do not compare equal (==)
Membership
If x is an element contained in some set S we write $x \in S$
and say x is an element of S
If x is not an element contained in some set S we write $x \notin S$
and say x is not an element of S
→ note that these are statements, not questions
→ in Python the in operator is a question that returns True or False
x in S → True → $x \in S$
x in S → False → $x \notin S$
→ similarly with the not in operator
x not in S → True → $x \notin S$
x not in S → False → $x \in S$
Unions and Intersections    sets → $s_1, s_2$
The union of two sets is a set that combines the items from these two sets, keeping only a single instance of any repeating elements
$s_1 \cup s_2 = \{x \mid x \in s_1 \text{ or } x \in s_2\}$
→ Notice the or
→ s1 | s2 | … → returns a set
→ s1.union(s2, …)
The intersection of two sets is a set that only contains the elements common to both
$s_1 \cap s_2 = \{x \mid x \in s_1 \text{ and } x \in s_2\}$
→ Notice the and
→ s1 & s2 & … → returns a set
→ s1.intersection(s2, …)
Differences of two sets
The difference of two sets is all the elements of one set without the elements of the other set
$s_1 - s_2 = \{x \mid x \in s_1 \text{ and } x \notin s_2\}$
→ s1 - s2 - …
→ s1.difference(s2, …)
s1 = {1, 2, 3}
s2 = {3, 4, 5}
s1 - s2 → {1, 2}
s2 - s1 → {4, 5}
in general: s1 - s2 ≠ s2 - s1
Symmetric Difference of two Sets
The symmetric difference of two sets is the union of both sets without the intersection of both sets
$s_1 \,\Delta\, s_2 = (s_1 \cup s_2) - (s_1 \cap s_2)$
→ s1 ^ s2
→ s1.symmetric_difference(s2)
s1 = {1, 2, 3, 4, 5}
s2 = {4, 5, 6, 7, 8}
s1 ^ s2 → {1, 2, 3, 6, 7, 8}
s2 ^ s1 → {1, 2, 3, 6, 7, 8}
in general: s1 ^ s2 = s2 ^ s1
Empty Set, Cardinality, Disjoint Sets
For finite sets, the cardinality of a set is the number of elements in the set
→ len(s)
An empty set is a set that contains no elements → cardinality is 0
→ set()    cannot use {} to create an empty set
→ this would create an empty dictionary
Two sets are said to be disjoint if their intersection is the empty set
→ len(s1 & s2) → 0
→ s1.isdisjoint(s2) → True (Boolean)
Subsets and Supersets
→ s1 <= s2           {1, 2, 3} <= {1, 2, 3, 4} → True
→ s1.issubset(s2)    {1, 2, 3} <= {1, 2, 3} → True
A set s1 is a proper subset of s2 if s1 is a subset of s2 and s1 is not equal to s2
$s_1 \subset s_2$
→ i.e. s1 is a subset of s2 and s2 contains some additional elements
→ s1 < s2
{1, 2, 3} < {1, 2, 3, 4} → True
{1, 2, 3} < {1, 2, 3} → False
A set s1 is a superset of s2 if s2 is a subset of s1
→ s1 >= s2
→ s1.issuperset(s2)
A set s1 is a proper superset of s2 if s2 is a proper subset of s1
→ s1 > s2
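The four comparisons can be tried on the sets used above. The only addition in this sketch is passing a list to issubset, which works because the method form accepts any iterable.

```python
s1 = {1, 2, 3}
s2 = {1, 2, 3, 4}

print(s1 <= s2, s1 < s2)    # subset, proper subset
print(s1 <= s1, s1 < s1)    # a set is a subset of itself, but not a proper one
print(s2 >= s1, s2 > s1)    # superset, proper superset

# The method form accepts any iterable, the operator form requires sets.
print(s1.issubset([1, 2, 3, 4]))

# Two sets can be unrelated: neither is a subset of the other.
print({1, 2} <= {2, 3}, {2, 3} <= {1, 2})
```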
Python Sets
differences → s1 - s2, s1.difference(s2)
symmetric differences → s1 ^ s2, s1.symmetric_difference(s2)
subsets → s1 <= s2, s1.issubset(s2)
        → s1 < s2
supersets → s1 >= s2, s1.issuperset(s2)
          → s1 > s2
disjointness → s1.isdisjoint(s2)
Python Sets
In fact sets are hash tables as well
Unlike dictionary hash tables, sets only contain the "keys", not the values
→ set(iterable)
Python Sets
elements of a set
→ must be unique (distinct)
→ must be hashable
→ have no guaranteed order
a set is a mutable collection
→ add and remove elements
→ a set is therefore not hashable
→ cannot be used as a dictionary key
→ cannot be used as an element in another set
→ no set of sets
Frozen Sets
→ must be unique (distinct)
→ must be hashable
→ have no guaranteed order
Can convert any set to a frozenset
→ frozenset({1, 2, 3})
A frozenset is hashable
→ can be used as a dictionary key
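Because a frozenset is hashable, it can serve as a dictionary key or as an element of another set. The data below is hypothetical and only meant to show the mechanics.

```python
# Hypothetical data: a value attached to an unordered pair of labels.
distances = {
    frozenset({'A', 'B'}): 10,
    frozenset({'B', 'C'}): 25,
}

# The same key is found regardless of the order the elements are written in.
print(distances[frozenset({'B', 'A'})])

# A "set of sets" needs frozensets as the inner elements.
groups = {frozenset({1, 2}), frozenset({3})}
print(frozenset({2, 1}) in groups)

try:
    {{1, 2}, {3}}   # plain sets are mutable, hence unhashable
except TypeError as ex:
    print('TypeError:', ex)
```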
In fact, instead of writing code like this:
if a in [10, 20, 30]:
or even
if a in (10, 20, 30):
prefer using (as long as elements are hashable):
if a in {10, 20, 30}:    → higher storage cost
Some Timings
from timeit import timeit

n = 10_000_000
s = set(range(n))
l = list(range(n))
t = tuple(range(n))

def test_set(s, value):
    return value in s

def test_list(l, value):
    return value in l

def test_tuple(t, value):
    return value in t

timeit('test_set(s, value)', globals=globals(), number=10_000)
timeit('test_list(l, value)', globals=globals(), number=10_000)
timeit('test_tuple(t, value)', globals=globals(), number=10_000)

Timings for value = 100 and value = 9_999_999
set: 0.0016 and 0.0021
tuple: 0.0186 for value = 100
1692 and 1659: presumably the list and tuple scans with value = 9_999_999 (row labels lost)
→ {'a', 10, 3.14159}    elements must be hashable
→ set(iterable)
empty set
→ cannot use a literal → {} is an empty dict
→ set()
→ set comprehensions
{c for c in 'python'} → elements must be hashable
→ order in which elements are unpacked is essentially unknown
s1 = {'a', 10, 3.14}
s2 = set('abc')
{*s1, *s2} → {'a', 'b', 'c', 10, 3.14}
[*s1, *s2] → ['a', 'a', 'b', 'c', 10, 3.14]
len(s) → number of elements in s (cardinality of s)
in, not in → x in s → tests if x is in the set s
→ like dictionary keys, use equality (==) not identity (is)
membership testing in sets is fast
→ hash table lookup
membership testing in a list or tuple is slow (in comparison) → linear scan
→ but sets have more memory overhead than lists or tuples
→ tradeoff – speed vs memory
lists have ordering → append
                    → insert
sets have no ordering → add
s.add('python')
→ mutates the set
Removing Elements
l = [10, 20, 30]
s = {'a', 'b', 'c'}
s.remove('b') → {'a', 'c'}    → mutates the set
s.remove('z') → KeyError exception
to avoid KeyError
s.discard('a')    s → {'c'}
s.discard('z')    s → {'c'}
s.pop() → removes and returns an arbitrary element
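The three removal methods side by side, following the same steps as above. The flag raised is only there to make the KeyError visible in this sketch.

```python
s = {'a', 'b', 'c'}

s.remove('b')          # element present: removed
try:
    s.remove('z')      # element missing: KeyError
    raised = False
except KeyError:
    raised = True

s.discard('z')         # element missing: silently ignored
s.discard('a')         # element present: removed

last = s.pop()         # removes and returns an arbitrary element
print(raised, last, s)

try:
    s.pop()            # popping from an empty set also raises KeyError
except KeyError as ex:
    print('KeyError:', ex)
```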
→ union
→ intersection → related: testing if two sets are disjoint
→ difference
→ symmetric difference
→ containment → strict and non-strict
in general, we have two ways of doing these operations
→ methods → s1.intersection(s2)
→ operators → s1 & s2
s1.intersection(s2, s3, s4)
→ returns a new set
Unions
{1, 2, 3} | {2, 4} → {1, 2, 3, 4}
{1, 2, 3}.union({2, 4}) → {1, 2, 3, 4}
{1, 2, 3}.union([2, 4]) → {1, 2, 3, 4}
s1 | s2 | s3 | s4
s1.union(s2, s3, s4)
→ returns a new set
Disjointedness
→ empty sets are falsy
if s1 & s2:
    print('sets are not disjoint')
if not(s1 & s2):
    print('sets are disjoint')
if not s1 & s2:
→ s1.isdisjoint(s2)
Differences
{1, 2, 3, 4}.difference({2, 3}) → {1, 4}
{1, 2, 3, 4}.difference([2, 3]) → {1, 4}
s1 - s2 - s3
s1.difference(s2, s3)
→ returns a new set
{1, 2, 3} - ({2, 4} - {2, 4}) → {1, 2, 3} - set() → {1, 2, 3}
({1, 2, 3} - {2, 4}) - {2, 4} → {1, 3} - {2, 4} → {1, 3}
→ left-associative: s1 - s2 - s3 → (s1 - s2) - s3
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}
union - intersection
s1 ^ s2 → {1, 2, 5, 6}
s1.symmetric_difference(s2)
s1.symmetric_difference([3, 4, 5, 6])
→ returns a new set
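A short sketch of the operator and method forms side by side, using the same s1 and s2 as above. The flag name `operator_needs_sets` is just an illustrative variable.

```python
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}

union = s1 | s2            # same as s1.union(s2)
inter = s1 & s2            # same as s1.intersection(s2)
diff = s1 - s2             # same as s1.difference(s2)
sym = s1 ^ s2              # same as s1.symmetric_difference(s2)

# The symmetric difference is the union minus the intersection.
same_thing = sym == union - inter

# Methods accept any iterable...
sym_from_list = s1.symmetric_difference([3, 4, 5, 6])

# ...but operators require sets on both sides.
try:
    s1 ^ [3, 4, 5, 6]
except TypeError:
    operator_needs_sets = True
```

None of these calls mutate s1 or s2; each returns a new set.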
Containment
s1 < s2 → is s1 strictly contained in s2
s1 <= s2 → is s1 contained in (possibly equal to) s2 → s1.issubset(s2)
s1 > s2 → does s1 strictly contain s2
→ is s2 strictly contained in s1
s1 >= s2 → does s1 contain (possibly equal to) s2 → s1.issuperset(s2)
→ is s2 contained in (possibly equal to) s1
Updating Sets
we can create unions, intersections, differences and symmetric differences
→ but these operations create new sets
list analogy
l1 = [1, 2, 3]
l2 = [4, 5, 6]
l1 + l2 → new list [1, 2, 3, 4, 5, 6]
but l1 += l2 → mutates l1 → l1 is now [1, 2, 3, 4, 5, 6]
Analogous Set Mutating Updates
|= &= -= ^=
→ all these operations mutate the left hand side
lists: l1 += l2 → extends l1 with the elements of l2 → mutates l1
→ id of l1 has not changed
→ method equivalents (can use iterables too)
s1 |= s2    s1.update(s2)
s1 &= s2    s1.intersection_update(s2)
s1 -= s2    s1.difference_update(s2)
s1 ^= s2    s1.symmetric_difference_update(s2)
Analogous Set Mutating Updates
s1.update(s2, s3)    s1 |= s2 | s3
BEWARE!!
s1.difference_update(s2, s3) is not the same as s1 -= s2 - s3
s1.difference_update(s2, s3) → s1 ← (s1 - s2) - s3
s1 -= s2 - s3 → s1 ← s1 - (s2 - s3)
s1 = {1, 2, 3, 4}
s2 = {2, 3}
s3 = {3, 4}
s1 - s2 → {1, 4}    {1, 4} - s3 → {1}
s2 - s3 → {2}    {1, 2, 3, 4} - {2} → {1, 3, 4}
→ set differences are not associative
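The difference between the two forms can be checked directly. The sketch below works on copies (named `a` and `b` here purely for illustration) so that both forms start from the same s1.

```python
s1 = {1, 2, 3, 4}
s2 = {2, 3}
s3 = {3, 4}

a = s1.copy()
a.difference_update(s2, s3)   # (s1 - s2) - s3

b = s1.copy()
b_id = id(b)
b -= s2 - s3                  # s1 - (s2 - s3)
same_object = id(b) == b_id   # the in-place operator mutates, it does not rebind
```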
As with other types such as dictionaries, lists, etc., we have two types of copies
→ shallow
s2 = s1.copy()
s2 = set(s1)
unpacking: s2 = {*s1}
→ deep
from copy import deepcopy
s2 = deepcopy(s1)
Frozen Sets
→ immutable sets
→ same properties and behavior as sets
→ except they cannot be mutated
Their elements can be mutable (as long as they are hashable)
Since all elements in a frozen set must be hashable, the frozen set is also hashable
→ can be used as a key in a dictionary
→ can be an element of another set
frozenset() → no literal expressions to create frozen sets
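A minimal sketch of what hashability buys us: frozen sets can be dictionary keys and elements of other sets, whereas plain sets cannot. The flag names are illustrative only.

```python
fs = frozenset({1, 2, 3})

# Frozen sets have no mutating methods.
try:
    fs.add(4)
except AttributeError:
    cannot_mutate = True

# Usable as a dictionary key; lookup uses equality, so element order is irrelevant.
d = {fs: 'value'}
found = d[frozenset([3, 2, 1])]

# Usable as an element of another set.
nested = {fs, frozenset({4})}

# A plain (mutable) set is not hashable, so it cannot be nested in a set.
try:
    {set([1, 2])}
except TypeError:
    plain_set_unhashable = True
```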
Copying Frozen Sets
Same thing with sets and frozen sets
s1 = frozenset({1, 2, 3})
s2 = frozenset(s1)    s1 is s2 → True
s2 = s1.copy()    s1 is s2 → True
Deep copies do not behave that way
Set Operations
these operations can be performed on mixed sets and frozensets
→ the result has the type of the first operand: frozenset if s1 is a frozenset, even if s2 is a set
Equality and Identity
1.0 == 1 → True    1 + 0j == 1 → True
True == 1 → True
Same thing with sets and frozen sets
s1 = {1, 2, 3}
s2 = frozenset([1, 2, 3])
s1 is s2 → False
s1 == s2 → True
Application: Memoization
But that decorator (and the one we wrote ourselves) has one drawback
@lru_cache
def my_func(*, a, b):
    …
my_func(a=1, b=2) → result is computed and cached
my_func(a=1, b=2) → result is returned directly from cache
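One way frozen sets could be applied to memoization is to build the cache key from the keyword arguments as a frozen set, so that the order in which they are passed does not matter. The sketch below is an illustration under that assumption, not necessarily the approach taken in the course; `memoize_kwargs` and `calls` are hypothetical names, and it only works when all argument values are hashable.

```python
from functools import wraps

def memoize_kwargs(fn):
    """Cache results keyed on a frozenset of the keyword arguments."""
    cache = {}

    @wraps(fn)
    def inner(**kwargs):
        # (name, value) pairs in a frozenset: hashable and order-independent
        key = frozenset(kwargs.items())
        if key not in cache:
            cache[key] = fn(**kwargs)
        return cache[key]

    return inner

calls = []  # records every time the underlying function really runs

@memoize_kwargs
def my_func(*, a, b):
    calls.append((a, b))
    return a + b

r1 = my_func(a=1, b=2)   # computed and cached
r2 = my_func(b=2, a=1)   # same arguments in a different order: served from the cache
```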
how would we iterate over all the keys, values or items of a dictionary?
d.keys() d.values() d.items()
→ (in Python 2) created and returned a list of these things
→ list is static
d = {'a': 1, 'b': 2}
keys = d.keys() → keys = ['a', 'b']
d['c'] = 3 → keys = ['a', 'b']
→ list duplicates data – not good for large dictionaries – can be slow
→ inefficient for membership testing
d = {'a': 1, 'b': 2}
values = d.values()
d.iterkeys() d.itervalues() d.iteritems() were introduced
→ iterators better than a new list… did not duplicate data → more lightweight
still does not help with membership testing
also not easy to answer questions such as, given d1 and d2
what keys are common to both?
what keys are in one but not the other?
→ set questions
after all, keys have to be unique → keys form a mathematical set
Key View
what if keys() was a lightweight object that
maintained a reference to the dictionary
and implemented methods such as:
behaves like an iterable
__iter__ → iterable protocol
__contains__ → membership testing
behaves like a set
__and__ → intersection of two views
__or__ → union of two views
__eq__ → same keys in both views
__lt__ → is one set of keys a subset of the other
etc
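The "set questions" about two dictionaries can be answered directly on the key views. A small sketch, with d1 and d2 chosen arbitrarily for illustration:

```python
d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'b': 20, 'c': 30, 'd': 40}

common = d1.keys() & d2.keys()      # keys in both dictionaries
only_d1 = d1.keys() - d2.keys()     # keys in d1 but not in d2
either = d1.keys() | d2.keys()      # keys in at least one of them

has_a = 'a' in d1.keys()            # membership test, no list is built
is_proper_subset = d1.keys() < either
```

The results of `&`, `-` and `|` are plain sets, not views.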
Dictionary Views (PEP 3106)
Three ways we may want to view the data in a dictionary
→ keys only d.keys()
→ values only d.values()
→ key/value pairs → (key, value) d.items()
all are iterables
some may have set properties
list(d.items())
→ [('a', 1), ('b', 2), ('c', 3)]
Important: the order of keys, values and items is the same
→ the position of an item in one view corresponds to the same position in other views
The items() view may behave like a (frozen) set
→ if the values are hashable
→ uniqueness of tuples is guaranteed since keys are unique
The values() view never behaves like a set
→ values not guaranteed unique
lightweight → views do not maintain their own copy of the underlying data
→ simply implement methods that use the underlying dictionary → proxy
d['a'] = 10
del d['b']
d['c'] = 3
keys → 'a', 'c'
values → 10, 3
items → ('a', 10), ('c', 3)
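Because views are proxies onto the dictionary, they reflect later changes. The sketch below assumes a starting dictionary of `{'a': 1, 'b': 2}` (the starting value is not shown above) and creates the views before mutating.

```python
d = {'a': 1, 'b': 2}   # assumed starting point

# Views are created once, before any mutation.
keys = d.keys()
values = d.values()
items = d.items()

d['a'] = 10
del d['b']
d['c'] = 3

# The same view objects now show the updated contents.
current_keys = list(keys)
current_values = list(values)
current_items = list(items)
```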
Modifying the dictionary while iterating over a view
This is SAFE:
for key in d.keys():
    d[key] += 1
This leads to an EXCEPTION:
for v in d.values():
    del d['a']
You technically can modify the keys as long as you do not change the size of the dictionary
→ don't do it!
Python docs:
Iterating views while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries.
→ even after program that generated the data has terminated
→ transmit them to someone or something else outside our app
→ reconstruct the object from the serialized data → deserializing
Pickling and Unpickling
Python specific
→ built-in mechanism to serialize and deserialize many objects using binary representation
Databases
→ relational databases → e.g. objects like record sets, lists of tuples, etc
→ NoSQL databases → e.g. graphs, documents, etc
JSON
→ ubiquitous standard
→ Web / Javascript
→ REST APIs
→ MongoDB
→ text representation
→ Pickle will apply to more than just dictionary objects
→ focus on dictionaries because of JSON
→ easy to serialize dictionaries to JSON
→ easy to deserialize JSON to dictionaries
→ loss of some data types
→ many alternatives
→ beyond scope of this series: marshmallow https://marshmallow.readthedocs.io
The pickle Module
Python specific
A way to represent an object in a persistent way
→ disk, transmission
Create an object’s representation → serializing (marshalling)
Reload object from representation → deserializing
obj → serialize → 101100011001… → deserialize → obj
Pickle is a binary serialization (by default)
Focus on dictionaries → Can be used for other object types
Danger Zone!
import pickle
dump → pickle to file
load → unpickle from file
dumps → returns a pickled representation (a bytes object) → store in a variable
loads → unpickle from supplied argument
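A minimal round trip through the four functions. To keep the sketch self-contained it uses an in-memory `io.BytesIO` buffer in place of a real file, which is an assumption made for illustration; a file opened in binary mode works the same way. Only unpickle data you trust, since unpickling can execute arbitrary code.

```python
import io
import pickle

d1 = {'a': 1, 'b': [1, 2, 3]}

# dumps / loads: work with a bytes object held in a variable
data = pickle.dumps(d1)
d2 = pickle.loads(data)

# dump / load: work with a binary file-like object
buffer = io.BytesIO()
pickle.dump(d1, buffer)
buffer.seek(0)              # rewind before reading back
d3 = pickle.load(buffer)

equal = d1 == d2 == d3      # same contents
identical = d1 is d2        # but a brand new object
```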
Equality and Identity
equality → ==    identity → is
dict1 (id=100) → pickle → binary data → unpickle → dict2 (id=200)
dict1 == dict2 → True → Custom objects will need to implement __eq__
dict1 is dict2 → False
Equality and Identity
While pickling, Python will not re-serialize an object it has already serialized
→ recursive objects can be pickled
[diagram: obj1 with prop1 and prop2 → pickle / unpickle → object with prop1 and prop2]
JSON
Unlike pickling, it is considered safe
→ may vary based on the JSON deserializer you use
Limited Data Types
numbers    100  3.14  3.14e-05  3.14E+5 → all floats
booleans    true, false
arrays (lists)    [1, 3.1, "python"] → delimited by square brackets, ordered
unordered
Often, non-standard, additional types are supported:
integers    100
floats
{
    "title": "Fluent Python",
    "author": {
        "firstName": "Luciano",
        "lastName": "Ramalho"
    },
    "publisher": "O'Reilly",
    "isbn": "978-1-491-9-46008",
    "firstReleased": 2015,
    "listPrice": [
        {
            "currency": "USD",
            "price": 49.99
        },
        {
            "currency": "CAD",
            "price": "57.99"
        }
    ]
}
reminds you of Python??
Serialization and Deserialization
Of course, Python dictionaries are objects
dict → serialize (dump, dumps) → file or string {…} → deserialize (load, loads) → dict
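A short sketch of the round trip with `dumps` and `loads`, including two places where the limited JSON types show through. The sample data is made up for illustration.

```python
import json

d = {'title': 'Fluent Python', 'firstReleased': 2015,
     'tags': ['python', 3.14, True, None]}

j = json.dumps(d)          # dict -> JSON string
d2 = json.loads(j)         # JSON string -> dict
round_trip_ok = d == d2

# Non-string keys of basic types are converted to strings...
int_key = json.loads(json.dumps({1: 'a'}))

# ...and tuples come back as lists.
tuple_value = json.loads(json.dumps({'t': (1, 2)}))
```

The last two results are examples of the "loss of some data types" that comes with JSON.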
Problems…
JSON keys must be strings → but Python dictionary keys just need to be hashable
→ how to serialize?
JSON value types are limited
→ Python dictionary values can be any data type
→ how to serialize?
even if we can serialize a complex data type, such as a custom class
→ how do we deserialize back to original data type?
→ Customize serialization and deserialization
dump dumps
→ can provide custom callable
→ uses a default instance of JSONEncoder
Specifying a Custom Encoding Function
→ when provided, Python will call default if it encounters a type it cannot serialize
→ argument must be a callable
→ callable must have a single argument
→ that argument will receive the object Python cannot serialize
→ can include logic in our callable to differentiate between different types
→ or we can use a single dispatch generic function
[ using the @singledispatch decorator from the functools module ]
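The single dispatch variant could look like the sketch below. The function name `json_format` and the choice to serialize `Decimal` as a string are assumptions made for illustration; any callable taking one argument can be passed as `default`.

```python
import json
from datetime import datetime
from decimal import Decimal
from functools import singledispatch

@singledispatch
def json_format(arg):
    # Fallback for types we do not know how to handle.
    raise TypeError(f'Cannot serialize {type(arg).__name__}')

@json_format.register(datetime)
def _(arg):
    return arg.isoformat()

@json_format.register(Decimal)
def _(arg):
    return str(arg)

data = {
    'createdAt': datetime(2020, 12, 31, 23, 59, 59),
    'price': Decimal('49.99'),
}

# default is only called for objects json cannot serialize on its own.
result = json.dumps(data, default=json_format)
```

Adding support for another type then only requires registering one more function, rather than growing an if/elif chain.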
dump
Beyond the default argument, dump has many other arguments that allow us to control serialization
skipkeys    bool    default is False → if dictionary keys are not basic types (string, int, etc) and skipkeys is set to True, those keys are skipped (otherwise a TypeError is raised)
indent    int    default is None → useful for human readability
separators    tuple    defaults to (', ', ': ') → customizes how the JSON is rendered
sort_keys    boolean    default is False → if True, dictionary keys will be sorted
and more…    cls
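A brief sketch of these arguments in use, shown with `dumps` (which accepts the same ones as `dump`). The sample dictionaries are made up for illustration.

```python
import json

d = {'b': 1, 'a': [1, 2]}

# Compact rendering with sorted keys and no whitespace in the separators.
compact = json.dumps(d, sort_keys=True, separators=(',', ':'))

# Indented rendering for human readability.
pretty = json.dumps(d, sort_keys=True, indent=2)

# A tuple is not a basic key type: it raises TypeError unless skipkeys=True.
bad_keys = {'a': 1, (1, 2): 'tuple key'}
try:
    json.dumps(bad_keys)
except TypeError:
    raised_without_skipkeys = True

skipped = json.dumps(bad_keys, skipkeys=True)
```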
The JSONEncoder Class
Python uses an instance of the JSONEncoder class in the json module to serialize data
The JSONEncoder class shares many arguments with the dump / dumps functions
default skipkeys sort_keys indent separators …
The dump / dumps functions have a cls argument
→ allows us to specify our own version of JSONEncoder
Why use JSONEncoder at all?
If dump has all the same arguments as JSONEncoder, why use it at all?
To remain consistent in our app, every time we call dump we need to use the same argument values
Easy to make a mistake, or forget to specify an argument
→ instead use a custom JSONEncoder
and just remember to specify it via the cls argument of dump / dumps
How to create a custom JSONEncoder
→ subclass JSONEncoder
→ custom initialize parent class if we want to
→ override the default method
→ handle what we want to handle ourselves
→ otherwise delegate back to the parent class

from datetime import datetime
from json import JSONEncoder

# inherit from JSONEncoder
class CustomEncoder(JSONEncoder):
    # custom init of the parent; dump / dumps pass keyword arguments
    # when they instantiate cls, so we accept (and ignore) them here
    def __init__(self, *args, **kwargs):
        super().__init__(skipkeys=True, allow_nan=False,
                         indent='---', separators=('', ' = '))

    # override the default method
    def default(self, arg):
        if isinstance(arg, datetime):
            # handle what we want to handle:
            # return the string serialization of arg
            return arg.isoformat()
        else:
            # otherwise delegate back to the parent
            return super().default(arg)
Loading JSON
Now we need to look at deserializing JSON to Python objects → load loads
import json
j = '{"a": 1, "b": {"sub1": [2, 3]}}'
d = json.loads(j)
d → {
    "a": 1,
    "b": {
        "sub1": [2, 3]
    }
}
works out-of-the-box with the standard JSON data types: numbers, booleans, strings, lists, …
does not work with other types
j = '{"createdAt": "2020-12-31T23:59:59"}'
→ interpreted as a string
One Approach
Use some custom encoding scheme to define both the value and the type of some entry in the JSON file
For example, when encoding a timestamp, we could do it as follows:
j = '''
{ "createdAt":
    {
        "objecttype": "isodatetime",
        "value": "2020-12-31T23:59:59"
    }
}
'''
→ load the JSON string into a Python dictionary
→ iterate through dictionary (recursively), to find objects with an objecttype == isodatetime
→ replace createdAt with the converted timestamp
→ tedious → load JSON, iterate recursively through dictionary, and convert as needed
A Slight Improvement
→ loads(j_string, object_hook=my_func)
→ my_func is called for every object in the JSON data
→ loads parses each JSON object into a dictionary
→ object_hook is called for every dictionary (object) in the data, innermost first
For example:
j = '''
{
    "a": 1,
    "b": {
        "sub1": [1, 2, 3],
        "sub2": {
            "x": 100,
            "y": 200
        }
    }
}
'''
my_func is called for:
→ b dictionary
→ sub2 dictionary
→ root dictionary (called last)
→ dictionary is replaced by return value of my_func
→ handles recursive aspect for us
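Combined with the objecttype encoding scheme from the earlier approach, an object hook could be sketched as follows. The function name `custom_decoder` and the sample JSON are illustrative assumptions; `datetime.fromisoformat` requires Python 3.7 or later.

```python
import json
from datetime import datetime

def custom_decoder(arg):
    # arg is one dictionary parsed from the JSON data
    if arg.get('objecttype') == 'isodatetime':
        # replace the whole dictionary with a datetime
        return datetime.fromisoformat(arg['value'])
    # anything else is returned unchanged
    return arg

j = '''
{
    "createdAt": {"objecttype": "isodatetime", "value": "2020-12-31T23:59:59"},
    "nested": {
        "updatedAt": {"objecttype": "isodatetime", "value": "2021-01-01T00:00:00"}
    }
}
'''

d = json.loads(j, object_hook=custom_decoder)
```

Because the hook is applied to every nested dictionary, no manual recursion is needed to reach `updatedAt`.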
Schemas
→ in general we need to know the structure of the JSON data in order to custom deserialize
→ this is referred to as the schema
→ a pre-defined agreement on how the JSON is going to be structured or serialized
→ If JSON has a pre-determined schema, then we can handle custom deserialization
→ schema might be for the entire JSON, or for sub-components only
{ "createdAt":
    {
        "objecttype": "isodatetime",
        "value": "2020-12-31T23:59:59"
    }
}
→ if we see this, replace the dict with the custom object/value
Overriding Basic Type Serializations
What about numbers?
→ by default floats for real numbers, and ints for whole numbers
What if we want Decimal instead of float, or binary representations for integers?
→ can override the way these data types are handled by using some extra arguments in load/loads
→ parse_float
→ parse_int
→ parse_constant
For each of these:
→ provide a custom callable
→ callable has a single argument
→ argument value will be the original string in the JSON
→ return parsed value
→ No overrides for strings
Example
from decimal import Decimal
def make_decimal(arg):
    return Decimal(arg)
→ loads(j, parse_float=make_decimal)
If load / loads encounters this in the JSON data: "a": 100.5
→ calls make_decimal("100.5")
→ deserialized JSON will now have Decimal("100.5") instead of float 100.5
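A runnable version of this idea, as a sketch. `Decimal` is passed directly as the callable, since it already accepts a string; the `parse_int` lambda producing a binary string is a made-up illustration of a custom integer representation.

```python
import json
from decimal import Decimal

j = '{"a": 100.5, "b": 3, "c": 0.1}'

# Every float literal in the JSON is handed to Decimal as a string.
d = json.loads(j, parse_float=Decimal)

# Every integer literal is handed to the callable as a string too.
d_bin = json.loads(j, parse_int=lambda s: bin(int(s)))
```

Parsing from the original string means `Decimal('0.1')` is exact, with no intermediate float rounding.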
Another argument – object_pairs_hook
→ related to object_hook
→ cannot use both at the same time (if both are specified, then object_hook is ignored)
object_hook passes the deserialized dictionary to the callable
→ there is no guarantee of the order of elements in the dictionary
What if order of elements in JSON is important? → lists preserve order
→ instead of callable receiving a dictionary it receives a list of the key/value pairs
→ key/value pairs are provided as a tuple with two elements
object_hook object_pairs_hook
This means parse_... (if specified) is used first, before we receive the parsed object in the hooks
Recall…
Similarly, we can create a custom JSONDecoder class and specify it with the cls argument
→ json.loads(j, cls=CustomDecoder)
Just a different way of doing it → might help making sure we use our custom decoder consistently
→ works a bit differently than JSONEncoder
→ inherit from JSONDecoder
→ override the decode function
→ decode function receives entire JSON string
→ we have to fully parse and return whatever object we want
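One possible shape for such a decoder is sketched below. It is an illustration under two assumptions: that we let the parent class do the actual parsing via `super().decode(s)`, and that timestamps use the objecttype scheme shown earlier. The helper `_convert` is a hypothetical name.

```python
import json
from datetime import datetime

class CustomDecoder(json.JSONDecoder):
    def decode(self, s):
        # s is the entire JSON string; reuse the standard parser first
        obj = super().decode(s)
        return self._convert(obj)

    def _convert(self, obj):
        # Walk the parsed structure and replace tagged dictionaries.
        if isinstance(obj, dict):
            if obj.get('objecttype') == 'isodatetime':
                return datetime.fromisoformat(obj['value'])
            return {k: self._convert(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [self._convert(v) for v in obj]
        return obj

j = '''
{
    "createdAt": {"objecttype": "isodatetime", "value": "2020-12-31T23:59:59"},
    "tags": ["a", "b"]
}
'''

d = json.loads(j, cls=CustomDecoder)
```

Unlike an object hook, the recursion here is our responsibility, which is part of what makes this route more work.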
Specialized Dictionaries
defaultdict → automatic default values for "missing" keys
OrderedDict → guaranteed key ordering (based on insertion order), plus some extras
Counter → specialized tools for dealing with counters
ChainMap → efficient way of "combining" multiple dictionaries
UserDict
Missing Keys
d['a'] → KeyError
→ can use the get method to handle default values for non-existent keys
d.get('a', 100)
→ we get 100 if 'a' is not present, but 'a' is still not in dictionary
d['a'] = d.get('a', 0) + 1
in general: d.get(key, value)
→ value could be returned from calling a callable
→ works, but possibly have to remember to always use the same default in multiple places for same dict
→ easier would be to define the default once (per dict)
defaultdict
→ collections module
→ subclass of dict type (defaultdict instance IS-A dict instance)
→ so has all the functionality of a standard dict
defaultdict(callable, […])
callable is called to calculate a default
→ callable must have zero arguments
→ referred to as a factory method
→ default is None, in which case a missing key raises a KeyError, as with a standard dict
[…] remaining arguments are simply passed to dict constructor
d = defaultdict(lambda: 'python')    d → {}
d['a'] → 'python'
int() → 0    defaultdict(lambda: 0)    defaultdict(int)    have the same effect
list() → []    defaultdict(lambda: [])    defaultdict(list)    have the same effect
→ factory must simply be a callable that can take zero arguments and returns the desired default value
→ can even be a function that calls a database and returns some value
→ factory is invoked every time a default value is needed
→ function does not have to be deterministic
→ can return different values every time it is called
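A few typical uses, as a minimal sketch with made-up sample data:

```python
from collections import defaultdict

# Counting: int() returns 0 for a missing key.
counts = defaultdict(int)
for ch in 'mississippi':
    counts[ch] += 1

# Grouping: list() returns a new empty list for a missing key.
groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    groups[word[0]].append(word)

# Any zero-argument callable works as the factory.
d = defaultdict(lambda: 'python')
via_get = d.get('x')          # get() does not invoke the factory
present_after_get = 'x' in d
via_index = d['x']            # indexing does, and stores the default
present_after_index = 'x' in d

# With no factory, a missing key raises KeyError like a plain dict.
no_factory = defaultdict()
try:
    no_factory['x']
except KeyError:
    raises_key_error = True
```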
OrderedDict
OrderedDict vs dict
(collections module)
→ if not, OrderedDict still has a few tricks up its sleeve!
→ supports reverse iteration
→ pop first or last item in dictionary
→ move item to beginning or end of dictionary
functionality not built-in to standard dict → have to "work" to get that behavior
Equality comparison (==) does not behave the same way
dict vs dict comparison → order of keys does not matter
OrderedDict vs OrderedDict comparison → order of keys matters
dict vs OrderedDict comparison → order of keys does not matter
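The extras and the equality rules above can be checked with a short sketch (sample keys and values are arbitrary):

```python
from collections import OrderedDict

od = OrderedDict(a=1, b=2, c=3)

od.move_to_end('a')              # order is now b, c, a
od.move_to_end('c', last=False)  # order is now c, b, a
order = list(od)

first = od.popitem(last=False)   # pops the first item
last = od.popitem()              # pops the last item

reversed_keys = list(reversed(OrderedDict(a=1, b=2)))

# Equality: key order only matters when both sides are OrderedDicts.
od_vs_od = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
dict_vs_dict = dict(a=1, b=2) == dict(b=2, a=1)
od_vs_dict = OrderedDict(a=1, b=2) == dict(b=2, a=1)
```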
Using Dictionaries to Maintain Counters
We've already seen how we can use regular dict and defaultdict for counters
d = {}    d[key] = d.get(key, 0) + 1
d = defaultdict(int)    d[key] += 1
certain operations can be tedious:
count the frequency of characters in a string (or items in an iterable in general)
update one counter dictionary with another counter dictionary (adding or subtracting)
from multiple counter dictionaries, find the max/min counter value for each key
The collections.Counter class
The Counter class is a specialized dictionary that makes certain operations easier
→ acts like a defaultdict with a default of 0
→ supports same constructor options as regular dicts
→ additional functionality to auto calculate a frequency table based on any iterable
→ iterate through every key, repeating each key as many times as the corresponding counter value
→ find the n most common items (by count)
→ increment/decrement counters based on another Counter or dict or iterable
→ fromkeys is not supported
→ update works differently than a regular dict
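A compact sketch of these features, using an arbitrary sample string:

```python
from collections import Counter

c = Counter('abracadabra')           # frequency table from any iterable
top = c.most_common(1)               # n most common items
missing = c['z']                     # missing keys count as 0...
missing_stored = 'z' in c            # ...without being added

c.update('aaa')                      # update ADDS counts, it does not replace
c.subtract({'b': 1})                 # subtract decrements counts
a_count, b_count = c['a'], c['b']

# elements() repeats each key as many times as its count
repeated = sorted(Counter(a=2, b=1).elements())

c1 = Counter(a=3, b=1)
c2 = Counter(a=1, b=2)
added = c1 + c2                      # sum of counts per key
minimum = c1 & c2                    # min count per key
maximum = c1 | c2                    # max count per key
```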
from itertools import chain
l1 = […]
l2 = (…)
l3 = generator_func()
for e in chain(l1, l2, l3):
    …
→ made it look like we had a single iterable – but really just chained them one after the other
collections.ChainMap serves a similar purpose – chaining dictionaries (or mapping types in general)
d1, d2, d3 → dictionaries
from collections import ChainMap
d = ChainMap(d1, d2, d3)
→ no extra storage (nothing is copied)
→ mutating elements in chain may affect underlying dicts
→ sees changes in underlying dicts
very different from merging the dictionaries into a new dictionary
→ extra storage
→ essentially a shallow copy/merge
→ does not see changes in original dicts
There's an added complexity chaining maps that we do not have with iterables
The resulting chain should itself be a map → no repeated keys!
d1 = {'a': 10, 'b': 20}    d2 = {'a': 100, 'c': 30}
d3 = ChainMap(d1, d2)
d3['b'] → 20    d3['c'] → 30
d3['a'] ??
→ uses the first instance of the key it encounters in the chain
→ unlike {**d1, **d2} where the last instance takes effect
d3['a'] → 10
→ iteration works the same way
→ first instance of any key is returned – others are ignored
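The lookup rule, and the contrast with an unpacked merge, in a short sketch. The names `chained` and `merged` are illustrative.

```python
from collections import ChainMap

d1 = {'a': 10, 'b': 20}
d2 = {'a': 100, 'c': 30}

chained = ChainMap(d1, d2)
merged = {**d1, **d2}

first_wins = chained['a']     # first map containing 'a' is d1
last_wins = merged['a']       # later unpacking overwrites earlier keys

keys = sorted(chained)        # each key appears once when iterating
size = len(chained)

# The chain is a live view; the merged dict is a separate copy.
d2['d'] = 40
seen_by_chain = chained['d']
seen_by_merge = 'd' in merged
```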
d1 (child) → overrides d2 → overrides d3 (d2 and d3 are the parents)
In fact, there are attributes to deal with this explicitly
d.parents → a ChainMap containing the parent elements only
d.new_child(d4) → returns a new ChainMap with d4 added to the front of the chain (or bottom of the hierarchy)
→ same as ChainMap(d4, d1, d2, d3)
d4 (new child) → d1, d2, d3 (parents)
The .maps property returns a (mutable) list of all the maps in the chain
The order of the list is the same as the child → parents hierarchy
i.e. first element is the child, other elements are the parents in the same order
This list is mutable
→ can modify the chain by deleting, inserting and appending other maps
d = ChainMap(d1, d2, d3)
d.maps → [d1, d2, d3]
d.maps.append(d4) → ChainMap(d1, d2, d3, d4)
d.maps.insert(0, d5) → ChainMap(d5, d1, d2, d3, d4)
del d.maps[1] → ChainMap(d5, d2, d3, d4)
The ChainMap is mutable → we already saw we could add and remove maps from the chain
We can also mutate the key/value pairs in the map itself
d = ChainMap(d1, d2) → d[key] = value
BUT these mutations affect the child (first) map only
d1 = {'a': 1, 'b': 2}    d2 = {'a': 20, 'c': 3}    d = ChainMap(d1, d2)
d['a'] = 100 → d1 = {'a': 100, 'b': 2}    d2 = {'a': 20, 'c': 3}
d['c'] = 200 → d1 = {'a': 100, 'b': 2, 'c': 200}    d2 = {'a': 20, 'c': 3}
del d['a'] → d1 = {'b': 2, 'c': 200}    d2 = {'a': 20, 'c': 3}
d['a'] → 20
del d['a'] → KeyError exception
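The same sequence as runnable code, followed by a quick look at `new_child` and `parents`. The flag `second_delete_failed` and the extra map `{'x': 0}` are illustrative assumptions.

```python
from collections import ChainMap

d1 = {'a': 1, 'b': 2}
d2 = {'a': 20, 'c': 3}
d = ChainMap(d1, d2)

d['a'] = 100     # written to the child map d1
d['c'] = 200     # also written to d1, even though 'c' exists in d2
del d['a']       # removes 'a' from d1 only

visible_a = d['a']   # now falls through to d2

try:
    del d['a']       # 'a' is no longer in the child map
except KeyError:
    second_delete_failed = True

# new_child returns a new ChainMap; d itself is left unchanged.
child = d.new_child({'x': 0})
parents = child.parents
```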
In the previous videos of this section, we looked at a variety of specialized dictionaries.
They extend the plain dict type with additional capabilities geared towards some specialized goal
→ a dictionary that only allows certain types of keys (string keys only → JSON)
→ a dictionary that only allows keys from some finite set of pre-defined keys
→ a dictionary that only allows numerical values
We could just create a custom class that uses a plain dict as a backing structure and write custom __getitem__ and __setitem__ methods
→ often good enough, but we don't inherit all the functionality that dicts have
Subclassing dict
We can create a custom class that inherits from dict
We can override various methods to customize the dictionary behavior
But there is a caveat here
If we have a parent class that implements a method and we override that method in a subclass
- calling that method on a subclass instance will invoke the overridden method
d['a'] = 10 → d.__setitem__('a', 10)
We would expect the dictionary class to use these __xxx__ methods internally for .get() .update() and so on
These built-in types however, often use direct access to data (in C)
They do not guarantee they actually use these "special" methods
Even len(string) does not actually use __len__
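The caveat can be observed with a small experiment. `IntDict` is a hypothetical class written for this illustration, and the behavior shown is that of CPython, where `dict.update` and the `dict` constructor do not go through an overridden `__setitem__`.

```python
class IntDict(dict):
    """A dict subclass that tries to allow integer values only."""

    def __setitem__(self, key, value):
        if not isinstance(value, int):
            raise ValueError('value must be an integer')
        super().__setitem__(key, value)

d = IntDict()

try:
    d['a'] = 'text'          # goes through our __setitem__
except ValueError:
    rejected = True

d.update({'b': 'text'})      # bypasses our __setitem__ in CPython
bypassed_by_update = d['b']

d2 = IntDict(c='text')       # the constructor bypasses it as well
bypassed_by_init = d2['c']
```

To make the restriction hold, `update`, `__init__`, `setdefault` and so on would each need to be overridden too.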
Alternative
If subclassing a dict is causing issues because of the special methods
→ can use UserDict (collections module) instead
and implements key functionality we have in dictionaries
it is not a dict, but it is a mapping type
→ views: items(), keys(), values()
→ __setitem__ and __getitem__ and uses those internally as appropriate
→ plus everything else we would expect from a dictionary
So it is essentially a head-start on recreating a Python dictionary from scratch that offers different subclassing possibilities
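A sketch of the same integer-only restriction built on `collections.UserDict`. `IntDict` is again a hypothetical class for illustration; the flag names only make the outcomes visible.

```python
from collections import UserDict
from collections.abc import MutableMapping

class IntDict(UserDict):
    """A mapping that allows integer values only."""

    def __setitem__(self, key, value):
        if not isinstance(value, int):
            raise ValueError('value must be an integer')
        super().__setitem__(key, value)

d = IntDict()
d['a'] = 1

try:
    d.update({'b': 'text'})   # update goes through __setitem__
except ValueError:
    update_rejected = True

try:
    IntDict(c='text')         # so does the constructor
except ValueError:
    init_rejected = True

is_dict = isinstance(d, dict)
is_mapping = isinstance(d, MutableMapping)
backing = d.data              # the plain dict used for storage
```

Overriding `__setitem__` once is enough here, because the other mutating methods are implemented in terms of it.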