Clean Architectures in Python
Leonardo Giordani
This book is for sale at http://leanpub.com/clean-architectures-in-python
This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing
process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and
many iterations to get reader feedback, pivot until you have the right book and build traction once
you do.
Introduction
What is a software architecture?
Why is it called “clean” architecture?
Why Python?
Acknowledgments

Part 1 - Tools

Chapter 1 - Introduction to TDD
Introduction
A real-life example
A simple TDD project
Setup the project
Requirements
Step 1 - Adding two numbers
Step 2 - Adding three numbers
Step 3 - Adding multiple numbers
Step 4 - Subtraction
Step 5 - Multiplication
Step 6 - Refactoring
Step 7 - Division
Step 8 - Testing exceptions
Chapter 3 - Mocks
Basic concepts
First steps
Simple return values
Complex return values
Asserting calls
A simple example
Patching
The patching decorator
Multiple patches
Patching immutable objects
Mocks and proper TDD
A warning
Recap

Changelog
Introduction
Learn about the Force, Luke.
- Star Wars (1977)
This book is about a software design methodology. A methodology is a set of guidelines that help
you reach your goal effectively, thus saving time, implementing far-sighted solutions, and avoiding
the need to reinvent the wheel time and again.
As professionals around the world face problems and try to solve them, some of them, having
discovered a good way to solve a problem, decide to share their experience, usually in the form
of a “best practices” post on a blog or a talk at a conference. We also speak of patterns¹, which are
formalised best practices, and of anti-patterns, when it comes to advice about what not to do and why
it is better to avoid a certain solution.
Often, when best practices encompass a wide scope, they are designated a methodology. A
methodology, by definition, conveys a method more than a specific solution to a problem.
The very nature of methodologies means they are not connected to any specific case, in favour of a
wider and more generic approach to the subject matter. This also means that applying a methodology
without thinking shows that one didn’t grasp its nature, which is to help find
a solution and not to provide it.
This is why the main advice I have to give is: be reasonable; try to understand why a methodology
leads to a solution and adopt it if it fits your needs. I’m saying this at the very beginning of this book
because this is how I’d like you to approach this work of mine.
The clean architecture, for example, pushes abstraction to its limits. One of the main concepts is
that you should isolate parts of your system as much as possible, so you can replace them without
affecting the rest. This requires a lot of abstraction layers, which might affect the performance
of the system, and which definitely require a greater initial development effort. You might consider
these shortcomings unacceptable, or perhaps be forced to sacrifice cleanliness in favour of execution
speed, as you cannot afford to waste resources.
In these cases, break the rules. You are always free to keep the parts you consider useful and discard
the rest, but if you have understood the reason behind the methodology, you will also know why
you do something different. My advice is to keep track of such reasons, either in design documents
or simply in code comments, as a future reference for you or for any other programmer who might
be surprised by a “wrong” solution and be tempted to fix it.
¹from the seminal book “Design Patterns: Elements of Reusable Object-Oriented Software” by Gamma, Vlissides, Johnson, and Helm.
I will try as much as possible to give reasons for the proposed solutions, so you can judge whether
those reasons are valid in your case. In general let’s say this book contains possible contributions to
your job, it’s not an attempt to dictate THE best way to work.
Spoiler alert: there is no such thing.
structure of the shop is unknown, we don’t even know what it sells. We can however already devise
a simple performance analysis, for example comparing the amount of money that goes out (to pay
the wholesaler) and the amount of money that comes in (from the customers). If the former is higher
than the latter the business is not profitable.
Even in the case of a shop that has positive results we might want to improve its performance, and
to do this chances are that we need to understand its internal structure and what we can change to
increase its productivity. This may reveal, for example, that the shop has too many workers, who are
underemployed waiting for clients because we overestimated the size of the business. Or it might
show that the time taken to serve clients is too long and many walk away without buying anything.
Or maybe there are not enough shelves to display goods and the staff carries stock around all day
searching for display space so the shop is in chaos and clients cannot find what they need.
At this level, however, workers are pure entities, and still we don’t know much about the shop. To
better understand the reasons behind a problem we might need to increase the zoom level and look
at the workers for what they are, human beings, and start understanding what their needs are and
how to help them work better.
This example can easily be translated into the software realm. Our shop is, for example, a processing
unit in the cloud, the input and output being the money we pay and the number of requests
the system serves per second, which is probably connected with the income of the business. The
internal processes are revealed by a deeper analysis of the resources we allocated (storage, processors,
memory), which breaks the abstraction of the “processing unit” and reveals details like the hardware
architecture or the operating system. We might go deeper, discussing the framework or the library
we used to implement a certain service, the programming language we used, or the specific hardware
on which the whole system runs.
Remember that an architecture tries to detail how a process is implemented at a certain granularity,
given certain assumptions or requirements. The quality of an architecture can then be judged on
the basis of parameters such as its cost, the quality of the outputs, its simplicity or “elegance”, the
amount of effort required to change it, and so on.
system collapsing. The main point of the clean architecture is to make clear “what is where and
why”, and this should be your first concern while you design and implement a software system,
whatever architecture or development methodology you want to follow.
The clean architecture is not the perfect architecture and cannot be applied unthinkingly. Like any
other solution, it addresses a set of problems and tries to solve them, but there is no panacea that
will solve all issues. As already stated, it’s better to understand how the clean architecture solves
some problems and decide if the solution suits your need.
Why Python?
I have been working with Python for 20 years, along with other languages, but I came to love its
simplicity and power and so I ended up using it on many projects. When I was first introduced to the
clean architecture I was working on a Python application that was meant to glue together the steps
of a processing chain for satellite imagery, so my journey with the concepts I will explain started
with this language.
I will therefore speak of Python in this book, but the main concepts are valid for any other language,
especially object-oriented ones. I will not introduce Python here, so a minimal knowledge of the
language syntax is needed to understand the examples and the projects I will discuss.
The clean architecture concepts are independent of the language, but the implementation
obviously leverages what a specific language allows you to do, so this book is about the clean
architecture and an implementation of it that I devised using Python. I really look forward to seeing
more books about the clean architecture that explore other implementations in Python and in other
languages.
Acknowledgments
• Eleanor de Veras: proofreading of the introduction.
• Roberto Ciatti: introducing me to clean architectures.
• Łukasz Dziedzic: “Lato” cover font (Latofonts³).
The cover photograph is by pxhere⁴. A detail of the Sagrada Familia in Barcelona, one of the world’s
best contemporary artworks, a bright example of architecture in which every single element has a
meaning and a purpose. Praise to Antoni Gaudí, brilliant architect and saint, who will always inspire
me with his works and his life.
³http://www.latofonts.com
⁴https://pxhere.com/en/photo/760437
About the book
We’ll put the band back together, do a few gigs, we get some bread. Bang! Five thousand
bucks.
- The Blues Brothers (1980)
Typographic conventions
This book uses Python, so the majority of the code samples will be in this language, either inline
or in a specific code block
def example():
    print("This is a code block")
Code blocks don’t include line numbers, as the parts of the code being discussed are usually
repeated in the text. This also makes it possible to copy the code from the PDF directly.
This aside provides the link to the repository tag that contains the code that was presented
the fact that explaining something forces me to better understand that topic, and writing requires
even more study to get things clear in my mind, before attempting to introduce other people to the
subject.
Much of what I know comes from personal investigations, but without the work of people who
shared their knowledge for free I would not have been able to make much progress. The Free
Software Movement didn’t start with the Internet, and I got a taste of it during the 80s and 90s, but the
World Wide Web undeniably gave an impressive boost to the speed and quality of this knowledge
sharing.
So this book is a way to say thanks to everybody who gave their time to write blog posts, free books,
and software, and to organise conferences, groups, and meetups. This is why I teach people at conferences,
this is why I write a technical blog, this is the reason for this book.
That said, if you want to acknowledge the effort with money, feel free. Anyone who publishes a
book or travels to conferences incurs expenses, and any help is welcome. However, the best thing
you can do is to become part of this process of shared knowledge; experiment, learn and share what
you learn.
Virtual environments
One of the first things you have to learn as a Python programmer is how to create, manage, and
use your virtual environments. A virtual environment is just a directory (with many subdirectories)
that mirrors a Python installation like the one that you can find in your operating system. This is a
good way to isolate a specific version of Python and the packages that are not part of the standard
library.
This is handy for many reasons. First of all, the Python version installed system-wide (your Linux
distribution, your version of Mac OS, Windows, or other operating system) shouldn’t be tampered
with. That Python installation and its modules are managed by the maintainer of the operating
system, and in general it’s not a good idea to make changes there unless you are certain of what
you are doing. Having a single personal installation of Python, however, is usually not enough, as
different projects may have different requirements. For example, the newest version of a package
might break the API compatibility and unless we are ready to move the whole project to the new
API, we want to keep the version of that package fixed and avoid any update. At the same time
another project may require the bleeding edge or even a fork of that package: for example when you
have to patch a security issue, or if you need a new feature and can’t wait for the usual release cycle
that can take weeks.
Ultimately, the idea is that it is cheaper and simpler (at least in 2018) to copy the whole Python
installation and to customise it than to try to manage a single installation that satisfies all the
requirements. It’s the same advantage we have when using virtual machines, but on a smaller scale.
The starting point to become familiar with virtual environments is the official documentation¹¹, but
if you experience issues with a specific version of your operating system you will find plenty of
resources on the Internet that may clarify the matter.
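From the command line a virtual environment is usually created with python -m venv <directory>. The same operation can also be performed programmatically through the standard library venv module; the sketch below is just an illustration (the temporary directory name is arbitrary, and in a real project you would pick a stable path such as ./venv):

```python
import pathlib
import tempfile
import venv

# Create a throwaway virtual environment in a temporary directory.
target = pathlib.Path(tempfile.mkdtemp()) / "venv"
venv.create(target, with_pip=False)  # with_pip=True also bootstraps pip

# Every virtual environment contains a pyvenv.cfg file describing
# the Python installation it mirrors.
print((target / "pyvenv.cfg").exists())
```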
In general, my advice is to have a different virtual environment for each Python project. You may
prefer to keep them inside or outside the project’s directory; in the latter case the name of the
virtual environment should in some way reflect the associated project. There are packages to manage
¹¹https://docs.python.org/3/tutorial/venv.html
the virtual environments and simplify your interaction with them, the most famous one being
virtualenvwrapper¹².
I used to create my virtual environments inside the directory of my Python projects. Since I started
using Cookiecutter (see next section) to create new projects, however, I switched to a different
setup. Keeping the virtual environment outside the project allows me to install Cookiecutter in the
virtualenv, instead of being forced to install it system-wide, which sometimes prevents me from
using the latest version.
If you create the virtual environment in the project directory you have to configure your version
control and other tools to ignore it. In particular, add it to .gitignore¹³ if you use Git and to
pytest.ini¹⁴ if you use the pytest testing framework (as I do in the rest of this book).
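As an illustration, assuming the virtual environment lives in a directory called venv inside the project (the directory name is an assumption, adapt it to your setup), the relevant entries could look like this. Note that setting norecursedirs replaces pytest’s default list of ignored directories, so you may want to add venv to that list rather than only naming it:

```
# .gitignore
venv/

# pytest.ini
[pytest]
norecursedirs = venv
```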
• Create a virtual environment for the project, using one of the methods discussed in the previous section, and activate it
• Install Cookiecutter with pip install cookiecutter
• Run Cookiecutter with your template of choice, cookiecutter <template_URL>, answering the questions
• Install the requirements following the instructions of the template itself, pip install -r <requirements_file>
Refer to the README of the Cookiecutter template to better understand the questions that the program
will ask you and remember that if you make a mistake you can always delete the project and run
Cookiecutter again.
If you are using my project template the questions you will be asked are
full_name: Your full name
email: Your contact email
github_username: Your GitHub username
project_name: The name of the project
project_slug: The slug for the project
project_short_description: A description for the project
pypi_username: Your PyPI username, if you want to publish the package
version [0.1.0]: The current version of the package
use_pytest [n]: Whether to use pytest to test the package (in this book always “y”)
use_pypi_deployment_with_travis [y]: Whether to publish on PyPI when tests pass (you usually don’t want this feature turned on when you are testing or in the initial stages of development)
command_line_interface: Whether to create a command line interface using click¹⁸ (we are not going to use this feature in the projects presented in the book)
create_author_file: Whether to create a file that lists the authors of the package
open_source_license: If you are unsure, select the MIT license
¹⁸https://github.com/pallets/click
Part 1 - Tools
Chapter 1 - Introduction to TDD
Why worry? Each one of us is wearing an unlicensed nuclear accelerator on his back.
- Ghostbusters (1984)
Introduction
“Test-Driven Development” (TDD) is fortunately one of the names that I can spot most frequently
when people talk about methodologies. Unfortunately, many programmers still do not follow it,
fearing that it will impose a further burden on the already difficult life of the developer.
In this chapter I will try to outline the basic concept of TDD and to show you how your job as
a programmer can greatly benefit from it. I will develop a very simple project to show how to
practically write software following this methodology.
TDD is a methodology, something that can help you to create better code. But it is not going to solve
all your problems. As with all methodologies you have to pay attention not to commit blindly to it.
Try to understand the reasons why certain practices are suggested by the methodology and you will
also understand when and why you can or have to be flexible.
Keep in mind also that testing is a broader concept that doesn’t end with TDD. The latter focuses
a lot on unit testing, which is a specific type of test that helps you to develop the API of your
library/package. There are other types of tests, like integration or functional ones, that are not
specifically part of the TDD methodology, strictly speaking, even though the TDD approach can
be extended to any testing activity.
A real-life example
Let’s start with a simple example taken from a programmer’s everyday life.
The programmer is in the office with other colleagues, trying to nail down an issue in some part of
the software. Suddenly the boss storms into the office, and addresses the programmer:
Boss: I just met with the rest of the board. Our clients are not happy, we didn’t fix enough bugs in
the last two months.
Programmer: I see. How many bugs did we fix?
Boss: Well, not enough!
Where is the issue? The concept expressed by the word “heap” is nebulous, it is not defined clearly
enough to allow the process to find a stable point, or a solution.
When you write software you face that same challenge. You cannot conceive a function and just
expect it “to work”, because this is not clearly defined. How do you test if the function that you
wrote “works”? What do you mean by “works”? TDD forces you to clearly state your goal before
you write the code. Actually the TDD mantra is “Test first, code later”, and we will shortly see a
practical example of this.
For the time being, consider that this is a valid practice also outside the realm of software creation.
Whoever runs a business knows that you need to be able to extract some numbers (KPIs) from the
activity of your company, because it is by comparing those numbers with some predefined thresholds
that you can easily tell if the business is healthy or not. KPIs are a form of test, and you have to
define them in advance, according to the expectations or needs that you have.
Pay attention. Nothing prevents you from changing the thresholds as a reaction to external events.
You may consider that, given the incredible heat wave that hit your country, the amount of coats
that your company sold could not reach the goal. So, because of a specific event, you can justify a
change in the test (KPI). If you didn’t have the test you would have just generically recorded that
you earned less money.
Going back to software and TDD, following this methodology you are forced to state clear goals like
¹⁹https://en.wikipedia.org/wiki/Sorites_paradox
sum(4, 5) == 9
Let me read this test for you: there will be a sum function available in the system that accepts two
integers. If the two integers are 4 and 5 the function will return 9.
As you can see there are many things that are tested by this statement.
Pay attention that at this stage there is no code that implements the sum function, so the tests will
certainly fail.
As we will see with a practical example in the next chapter, what I explained in this section will
become a set of rules of the methodology.
$ py.test -svv
tests/test_calc.py::test_content PASSED
If you use a different template or create the project manually you may need to install pytest explicitly
and to properly format the project structure. I strongly recommend using the template if you are a
beginner, as the proper setup can be tricky to achieve.
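For reference, a minimal layout consistent with the paths that appear in the error messages later in this chapter (tests/test_calc.py, calc/calc.py, pytest.ini) might look like this; the top-level directory name is arbitrary:

```
project/
├── calc/
│   ├── __init__.py
│   └── calc.py
├── tests/
│   └── test_calc.py
└── pytest.ini
```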
Requirements
The goal of the project is to write a class Calc that performs calculations: addition, subtraction,
multiplication, and division. Addition and multiplication shall accept multiple arguments. Division
shall return a float value, and division by zero shall return the string "inf". Multiplication by zero
must raise a ValueError exception. The class will also provide a function to compute the average of
an iterable like a list. This function gets two optional upper and lower thresholds and should remove
from the computation the values that fall outside these boundaries.
As you can see the requirements are pretty simple, and a couple of them are definitely not “good”
requirements, like the behaviour of division and multiplication. I added those requirements for the
sake of example, to show how to deal with exceptions when developing in TDD.
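Just to make the requirements concrete, here is a hypothetical sketch of a class that would satisfy them. This is emphatically not how we will proceed — TDD builds the class one test at a time — and details such as the exact boundary behaviour of the thresholds are my assumption here, not part of the requirements.

```python
class CalcSketch:
    """A hypothetical reading of the requirements, for illustration only."""

    def add(self, *args):
        return sum(args)

    def sub(self, a, b):
        return a - b

    def mul(self, *args):
        # Requirement: multiplication by zero must raise ValueError.
        if 0 in args:
            raise ValueError("multiplication by zero")
        result = 1
        for value in args:
            result *= value
        return result

    def div(self, a, b):
        # Requirement: division by zero returns the string "inf".
        if b == 0:
            return "inf"
        return a / b

    def avg(self, iterable, lower=None, upper=None):
        # Values outside the optional thresholds are removed from the
        # computation (keeping boundary values is an assumption).
        values = [v for v in iterable
                  if (lower is None or v >= lower)
                  and (upper is None or v <= upper)]
        return sum(values) / len(values)
```

For example, CalcSketch().add(4, 5, 6) returns 15 and CalcSketch().div(5, 0) returns "inf".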
from calc.calc import Calc


def test_add_two_numbers():
    c = Calc()
    res = c.add(4, 5)
    assert res == 9
As you can see the first thing we do is to import the Calc class that we are supposed to write. This
class doesn’t exist yet, don’t worry, you didn’t skip any passage.
The test is a standard function (this is how pytest works). The function name shall begin with test_
so that pytest can automatically discover all the tests. I tend to give my tests a descriptive name, so
it is easier later to come back and understand what the test is about with a quick glance. You are free
to follow the style you prefer but in general remember that naming components in a proper way is
one of the most difficult things in programming. So better to get a handle on it as soon as possible.
The body of the test function is pretty simple. The Calc class is instantiated, and the add method of
the instance is called with two numbers, 4 and 5. The result is stored in the res variable, which is
later the subject of the test itself. The assert res == 9 statement first computes res == 9 which is a
boolean statement, either True or False. The assert keyword, then, silently passes if the argument
is True, but raises an exception if it is False.
And this is how pytest works: if your code doesn’t raise any exception the test passes, otherwise
it fails. assert is used to force an exception in case of wrong result. Remember that pytest doesn’t
consider the return value of the function, so it can detect a failure only if it raises an exception.
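The mechanism can be demonstrated outside pytest with plain Python (a toy check function, not part of the project):

```python
def check(value):
    # This is the body of a typical test: assert is silent when the
    # condition is True and raises AssertionError when it is False.
    assert value == 9

check(9)  # no exception: pytest would mark the test as passed

try:
    check(10)
except AssertionError:
    print("AssertionError: pytest would mark the test as failed")
```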
Save the file and go back to the terminal. Execute py.test -svv and you should receive the following
error message
[...]
tests/test_calc.py:4: in <module>
from calc.calc import Calc
E ImportError: cannot import name 'Calc'
!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!
============================= 1 error in 0.20 seconds =============================
No surprise here, actually, as we just tried to use something that doesn’t exist. This is good, the test
is showing us that something we suppose exists actually doesn’t.
This, by the way, is not yet an error in a test. The error happens very soon, during the tests collection
phase (as shown by the message in the bottom line Interrupted: 1 errors during collection).
Given this, the methodology is still valid, as we wrote a test and it fails because of an error or a missing
feature in the code.
Let’s fix this issue. Open the calc/calc.py file and write this code
class Calc:
    pass
But, I hear you scream, this class doesn’t implement any of the requirements that are in the project.
Yes, this is the hardest lesson you have to learn when you start using TDD. The development is
ruled by the tests, not by the requirements. The requirements are used to write the tests, the tests
are used to write the code. You shouldn’t worry about something that is more than one level above
the current one.
Run the test again, and this time you should receive a different error, that is
tests/test_calc.py::test_add_two_numbers FAILED

    def test_add_two_numbers():
        c = Calc()
>       res = c.add(4, 5)
E       AttributeError: 'Calc' object has no attribute 'add'

tests/test_calc.py:7: AttributeError
============================ 1 failed in 0.04 seconds =============================
Since the last one is the first proper pytest failure report that we meet, it’s time to learn how to read
them. The first lines show you general information about the system where the tests are run
In this case you can see that I’m using linux and get a quick list of the versions of the main
packages involved in running pytest: Python, pytest itself, py (https://py.readthedocs.io/en/latest/)
and pluggy (https://pluggy.readthedocs.io/en/latest/). You can also see here where pytest is reading
its configuration from (pytest.ini), and the pytest plugins that are installed.
The second part of the output shows the list of files containing tests and the result of each test
collected 1 items
tests/test_calc.py::test_add_two_numbers FAILED
This list is formatted with a syntax that can be given directly to pytest to run a single test. In this
case we already have only one test, but later you might run a single failing test giving the name
shown here on the command line, like for example
$ py.test -svv tests/test_calc.py::test_add_two_numbers

    def test_add_two_numbers():
        c = Calc()
>       res = c.add(4, 5)
E       AttributeError: 'Calc' object has no attribute 'add'

tests/test_calc.py:7: AttributeError
For each failing test, pytest shows a header with the name of the test and the part of the code that
raised the exception. The last line of each of these boxes shows at which line of the test file the error
happened.
Let’s go back to the Calc project. Again, the new error is no surprise, as the test uses the add method
that wasn’t defined in the class. I bet you already guessed what I’m going to do, didn’t you? This is
the code that you should add to the class
class Calc:
    def add(self):
        pass
And again, as you notice, we made the smallest possible addition to the code to pass the test. Running
the test again, the error message will be
    def test_add_two_numbers():
        c = Calc()
>       res = c.add(4, 5)
E       TypeError: add() takes 1 positional argument but 3 were given

tests/test_calc.py:7: TypeError
(Through the rest of the chapter I will only show the error part of the failure report).
The function we defined doesn’t accept any argument other than self (def add(self)), but in the
test we pass three of them (c.add(4, 5), remember that in Python self is implicit). Our move at
this point is to change the function to accept the parameters that it is supposed to receive, namely
two numbers. The code now becomes
class Calc:
    def add(self, a, b):
        pass
Run the test again, and you will receive another error
    def test_add_two_numbers():
        c = Calc()
        res = c.add(4, 5)
>       assert res == 9
E       assert None == 9

tests/test_calc.py:9: AssertionError
The function returns None as it doesn’t contain any code, while the test expects it to return 9. What
do you think is the minimum code you can add to pass this test?
Well, the answer is
class Calc:
    def add(self, a, b):
        return 9
and this may surprise you (it should!). You might have been tempted to add some code that performs
an addition between a and b, but this would violate the TDD principle, because you would have been
driven by the requirements and not by the tests.
I know this sounds weird, but think about it: if your code passes the test, for now you don’t need anything
more complex than this. Maybe in the future you will discover that this solution is not good enough,
and at that point you will have to change it (this will happen with the next test, in this case). But
for now everything works and you shouldn’t implement more than this.
Run the test suite again to check that no tests fail, after which you can move on to the second step.
def test_add_three_numbers():
    c = Calc()
    res = c.add(4, 5, 6)
    assert res == 15
    def test_add_three_numbers():
>       assert Calc().add(4, 5, 6) == 15
E       TypeError: add() takes 3 positional arguments but 4 were given

tests/test_calc.py:15: TypeError
for the obvious reason that the function we wrote in the previous section accepts only 2 arguments
other than self. What is the minimum code that you can write to fix this test?
Well, the simplest solution is to add another argument, so my first attempt is
class Calc:
    def add(self, a, b, c):
        return 9
which solves the previous error, but creates a new one. If that wasn’t enough, it also makes the first
test fail!
_____________________________ test_add_two_numbers ______________________________

    def test_add_two_numbers():
        c = Calc()
>       res = c.add(4, 5)
E       TypeError: add() missing 1 required positional argument: 'c'

tests/test_calc.py:7: TypeError
_____________________________ test_add_three_numbers _______________________________

    def test_add_three_numbers():
        c = Calc()
        res = c.add(4, 5, 6)
>       assert res == 15
E       assert 9 == 15

tests/test_calc.py:17: AssertionError
The first test now fails because the new add method requires three arguments and we are passing
only two. The second test fails because the add method returns 9 and not 15 as expected by the test.
When multiple tests fail it’s easy to feel discomforted and lost. Where are you supposed to start fixing
this? Well, one possible solution is to undo the previous change and to try a different solution, but
in general you should try to get to a situation in which only one test fails.
This is very important as it allows you to focus on one single test and thus one single problem. And
remember, commenting tests to make them inactive is a perfectly valid way to have only one failing
test. In this case I will comment the second test, so my tests file is now
def test_add_two_numbers():
    c = Calc()
    res = c.add(4, 5)

    assert res == 9

## def test_add_three_numbers():
##     c = Calc()
##     res = c.add(4, 5, 6)
##     assert res == 15
Running the test suite now produces a single failure

    def test_add_two_numbers():
        c = Calc()
>       res = c.add(4, 5)
E       TypeError: add() missing 1 required positional argument: 'c'

tests/test_calc.py:7: TypeError
To fix this error we could obviously revert the addition of the third argument, but this would mean
going back to the previous solution. Even though tests focus on a very small part of the code,
we have to keep in mind the big picture. A better solution is to give
the third argument a default value. The additive identity is 0, so the new code of the add method is
class Calc:
    def add(self, a, b, c=0):
        return 9
And this makes the failing test pass. At this point we can uncomment the second test and see what
happens.
    def test_add_three_numbers():
        c = Calc()
        res = c.add(4, 5, 6)
>       assert res == 15
E       assert 9 == 15

tests/test_calc.py:17: AssertionError
The test suite fails, because the returned value is still not correct for the second test. At this point
the tests show that our previous solution (return 9) is not sufficient anymore, and we have to try
to implement something more complex.
We know that writing return 15 will make the first test fail (you may try, if you want), so here we
have to be a bit smarter and try a better solution, which in this case is actually to implement a real
sum
class Calc:
    def add(self, a, b, c=0):
        return a + b + c
This solution makes both tests pass, so the entire suite runs without errors.
I can see your face, you are probably frowning at the fact that it took us 10 minutes to write a
method that performs the addition of two or three numbers. On the one hand, keep in mind that
I’m going at a very slow pace, this being an introduction, and for these first tests it is better to take
the time to properly understand every single step. Later, when you are used to TDD, some of
these steps will be implicit. On the other hand, TDD is slower than untested development, but the
time that you invest writing tests now is usually nothing compared to the amount of time you might
spend trying to identify and fix bugs later.
The solution, in this case, might be to test a reasonably high number of input arguments, to check
that everything works. In particular, we should aim for a generic solution, which cannot
rely on default arguments. To be clear, we easily realise that we cannot simply keep adding arguments
with default values, as such a function is not really “generic”: it just covers a greater amount of inputs (9, in this case, but not 10 or more).
That said, a good test might be the following
def test_add_many_numbers():
    s = range(100)

    assert Calc().add(*s) == 4950
which creates an array²⁵ of all the numbers from 0 to 99. The sum of all those numbers is 4950, which
is what the algorithm shall return. The test suite fails because we are giving the function too many
arguments
    def test_add_many_numbers():
        s = range(100)
>       assert Calc().add(*s) == 4950
E       TypeError: add() takes from 3 to 4 positional arguments but 101 were given

tests/test_calc.py:23: TypeError
The minimum amount of code that we can add, this time, will not be so trivial, as we have to
pass three tests. Fortunately the tests that we wrote are still there and will check that the previous
conditions are still satisfied.
The way Python provides support for a generic number of arguments (such functions are technically
called “variadic functions”) is the *args syntax, which stores in args a tuple that contains all the
positional arguments.
class Calc:
    def add(self, *args):
        return sum(args)
²⁵strictly speaking this creates a range, which is an iterable.
At that point we can use the sum built-in function to sum all the arguments. This solution makes the
whole test suite pass without errors, so it is correct.
Pay attention here, please. In TDD a solution is not correct when it is beautiful, when it is smart,
or when it uses the latest feature of the language. All these things are good, but TDD wants your
code to pass the tests. So, your code might be ugly, convoluted, and slow, but if it passes the test it is
correct. This in turn means that TDD doesn’t cover all the needs of your software project. Delivering
fast routines, for example, might be part of the advantage you have over your competitors, but it is
not really testable with the TDD methodology²⁷.
Part of the TDD methodology, then, deals with “refactoring”, which means changing the code in a
way that doesn’t change the outputs, which in turn means that all your tests keep passing. Once
you have a proper test suite in place, you can focus on the beauty of the code, or you can introduce
smart solutions according to what the language allows you to do.
Step 4 - Subtraction
From the requirements we know that we have to implement a function to subtract numbers, but they
don’t mention multiple arguments (as it would be complex to define what subtracting 3 or more
numbers actually means). The test that implements this requirement is
def test_subtract_two_numbers():
    c = Calc()
    res = c.sub(10, 3)

    assert res == 7
²⁶https://github.com/pycabook/calc/tree/step-3-adding-multiple-numbers
²⁷yes, you can test it running a function and measuring the execution time. This however, depends too much on external conditions, so
typically performance testing is done in a completely different way.
The test suite fails with the following error

    def test_subtract_two_numbers():
        c = Calc()
>       res = c.sub(10, 3)
E       AttributeError: 'Calc' object has no attribute 'sub'

tests/test_calc.py:29: AttributeError
Now that you understand the TDD process, and know that you should avoid over-engineering,
you can also skip some of the passages that we ran through in the previous sections. A good solution
for this test is
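a minimal implementation that satisfies the test, which might look like this (a sketch, since subtraction needs no variadic support):

```python
class Calc:
    def sub(self, a, b):
        # subtraction of two numbers, no multiple-argument version required
        return a - b
```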
Step 5 - Multiplication
It’s time to move to multiplication, which has many similarities to addition. The requirements state
that we have to provide a function to multiply numbers and that this function shall allow us to
multiply multiple arguments. In TDD you should try to tackle problems one by one, possibly dividing
a bigger requirement into multiple smaller ones.
In this case the first test can be the multiplication of two numbers, as it was for addition.
def test_mul_two_numbers():
    c = Calc()
    res = c.mul(6, 4)

    assert res == 24
And the test suite fails as expected with the following error
²⁸https://github.com/pycabook/calc/tree/step-4-subtraction
    def test_mul_two_numbers():
        c = Calc()
>       res = c.mul(6, 4)
E       AttributeError: 'Calc' object has no attribute 'mul'

tests/test_calc.py:37: AttributeError
We face now a classical TDD dilemma. Shall we implement the solution to this test as a function that
multiplies two numbers, knowing that the next test will invalidate it, or shall we already consider
that the target is that of implementing a variadic function and thus use *args directly?
In this case the choice is not really important, as we are dealing with very simple functions. In other
cases it might be worth recognising that we are facing the same issue we already solved in a
similar case, and trying to implement a smarter solution from the very beginning. In general, however,
you should not implement anything that you don’t plan to test in one of the next two tests that you
will write.
If we decide to follow strict TDD, that is, to implement the simplest solution first, the bare minimum
code that passes the test would be
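a plain two-argument method, which might look like this:

```python
class Calc:
    def mul(self, a, b):
        # just enough code to satisfy test_mul_two_numbers
        return a * b
```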
To show you how to deal with redundant tests I will in this case choose the second path, and
implement a smarter solution for the present test. Keep in mind, however, that it is perfectly correct
to implement the simple solution first, and then move on to solve the problem of multiple
arguments later.
The problem of multiplying a tuple of numbers can be solved in Python using the reduce function.
This function implements a typical algorithm that “reduces” an array to a single number by applying
a given two-argument function. The algorithm steps are the following

1. Apply the function to the first two elements of the array
2. Remove the first two elements
3. Apply the function to the result of the previous step and to the first element of the array
4. Remove the first element
5. If there are still elements in the array go back to step 3
Consider, for example, the following array, using multiplication as the function

a = [2, 6, 4, 8, 3]
1. Apply the function to 2 and 6 (first two elements). The result is 2 * 6, that is 12
2. Remove the first two elements, the array is now a = [4, 8, 3]
3. Apply the function to 12 (result of the previous step) and 4 (first element of the array). The new
result is 12 * 4, that is 48
4. Remove the first element, the array is now a = [8, 3]
5. Apply the function to 48 (result of the previous step) and 8 (first element of the array). The new
result is 48 * 8, that is 384
6. Remove the first element, the array is now a = [3]
7. Apply the function to 384 (result of the previous step) and 3 (first element of the array). The
new result is 384 * 3, that is 1152
8. Remove the first element, the array is now empty and the procedure ends
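The steps above can be reproduced directly with Python’s reduce, using multiplication as the two-argument function:

```python
from functools import reduce

a = [2, 6, 4, 8, 3]

# reduce applies the function pairwise, accumulating a single value:
# ((((2 * 6) * 4) * 8) * 3) = 1152
result = reduce(lambda x, y: x * y, a)
print(result)  # 1152
```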
Going back to our Calc class, we might import reduce³⁰ from the functools module and use it on
the args tuple. We need to provide a two-argument function, which we can define inside the mul function itself.
class Calc:
    [...]
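A sketch of the method, consistent with the mul2 helper mentioned later in the refactoring section, might be:

```python
from functools import reduce

class Calc:
    def mul(self, *args):
        # mul2 is the two-argument function that reduce applies pairwise
        def mul2(a, b):
            return a * b

        return reduce(mul2, args)
```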
The above code makes the test suite pass, so we can move on and address the next problem. As
happened with addition we cannot properly test that the function accepts a potentially infinite
number of arguments, so we can test a reasonably high number of inputs.
def test_mul_many_numbers():
    s = range(1, 10)

    assert Calc().mul(*s) == 362880
We might use 100 arguments as we did with addition, but the multiplication of all the numbers from
1 to 100 gives a result with 158 digits and I don’t really need to clutter the tests file with such a
monstrosity. As I said, testing multiple arguments is testing a boundary, and the idea is that if the
algorithm works for 2 numbers and for 10 it will work for 10 thousand arguments as well.
If we run the test suite now all tests pass, and this should worry you.
Yes, you shouldn’t be happy. When you follow TDD each new test that you add should fail. If it
doesn’t fail you should ask yourself if it is worth adding that test or not. This is because chances
are that you are adding a useless test and we don’t want to add useless code, because code has to be
maintained, so the less the better.
In this case, however, we know why the test already passes. We implemented a smarter algorithm
as a solution for the first test knowing that we would end up trying to solve a more generic problem.
And the value of this new test is that it shows that multiple arguments can be used, while the first
test doesn’t.
So, after these considerations, we can be happy that the second test already passes.
³¹https://github.com/pycabook/calc/tree/step-5-multiply-two-numbers-smart
³²https://github.com/pycabook/calc/tree/step-5-multiply-many-numbers
Step 6 - Refactoring
Previously, I introduced the concept of refactoring, which means changing the code without altering
the results. How can you be sure you are not altering the behaviour of your code? Well, this is what
the tests are for. If the new code keeps passing the test suite you can be sure that you didn’t remove
any feature³³.
This means that if you have no tests you shouldn’t refactor. But, after all, if you have no tests you
shouldn’t have any code, either, so refactoring shouldn’t be a problem you have. If you have some
code without tests (I know you have it, I do), you should seriously consider writing tests for it, at
least before changing it. More on this in a later section.
For the time being, let’s see if we can work on the code of the Calc class without altering the results.
I do not really like the definition of the mul2 function inside the mul one. It is obviously perfectly
fine and valid, but for the sake of example I will pretend we have to get rid of it.
Python provides support for anonymous functions with the lambda operator, so I might replace the
mul code with
class Calc:
    [...]
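A version of mul using a lambda might look like this:

```python
from functools import reduce

class Calc:
    def mul(self, *args):
        # the anonymous function replaces the named helper mul2
        return reduce(lambda x, y: x * y, args)
```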
where I define an anonymous function that accepts two inputs x, y and returns their multiplication
x*y. Running the test suite I can see that all the tests pass, so my refactoring is correct.
³³In theory, refactoring shouldn’t add any new behaviour to the code, as it should be an idempotent transformation. There is no real
practical way to check this, and we will not bother with it now. You should be concerned with this if you are discussing security, as your code
shouldn’t add any entry point you don’t want to be there. In this case you will need tests that check not the presence of some feature, but the
absence of them.
³⁴https://github.com/pycabook/calc/tree/step-6-refactoring
Step 7 - Division
The requirements state that there shall be a division function, and that it has to return a float value.
This is a simple condition to test, as it is sufficient to divide two numbers that do not give an integer
result
def test_div_two_numbers_float():
    c = Calc()
    res = c.div(13, 2)

    assert res == 6.5
The test suite fails with the usual error that signals a missing method. The implementation of this
function is very simple as the / operator in Python performs a float division
class Calc:
    [...]
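A minimal div method might be:

```python
class Calc:
    def div(self, a, b):
        # in Python 3 the / operator always performs float division
        return a / b
```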
If you run the test suite again all the tests should pass. There is a second requirement about this
operation, however, that states that division by zero shall return the string "inf". Now, this is
obviously a requirement that I introduced for the sake of giving some interesting and simple problem
to solve with TDD, as an API that returns either floats or strings is not really the best idea.
The test that comes from the requirement is simple
def test_div_by_zero_returns_inf():
    c = Calc()
    res = c.div(5, 0)

    assert res == "inf"
    def test_div_by_zero_returns_inf():
        c = Calc()
>       res = c.div(5, 0)

tests/test_calc.py:59:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <calc.calc.Calc object at 0x7f56c3dddb70>, a = 5, b = 0

    def div(self, a, b):
>       return a / b
E       ZeroDivisionError: division by zero

calc/calc.py:15: ZeroDivisionError
Note that when an exception happens in the code and not in the test, the pytest output changes
slightly. The first part of the message shows where the test fails, but then there is a second part that
shows the internal code that raised the exception and provides information about the value of local
variables on the first line (self = <calc.calc.Calc object at 0x7f56c3dddb70>, a = 5, b = 0).
We might implement two different solutions to satisfy this requirement and its test. The first one is
to prevent b from being 0, checking its value before running return a / b, and the second one is to
intercept the exception with a try/except block.
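Sketches of the two approaches (the class names here are just labels to keep the two versions apart):

```python
class CalcCheck:
    def div(self, a, b):
        # first solution: prevent b from being 0
        if not b:
            return "inf"
        return a / b


class CalcExcept:
    def div(self, a, b):
        # second solution: intercept the exception raised by the division
        try:
            return a / b
        except ZeroDivisionError:
            return "inf"
```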
Both solutions make the test suite pass, so both are correct. I leave to you the decision about which
is the best one, syntactically speaking.
Step 8 - Multiplication by zero

The requirements also state that multiplying by zero shall raise a ValueError exception. To check
that a piece of code raises an exception we can use the pytest.raises context manager

import pytest

[...]
def test_mul_by_zero_raises_exception():
    c = Calc()

    with pytest.raises(ValueError):
        c.mul(3, 0)
In this case, then, pytest runs the line c.mul(3, 0); if it doesn’t raise the ValueError exception the
test fails. Indeed, if you run the test suite now, you will get the following failure
    def test_mul_by_zero_raises_exception():
        c = Calc()

        with pytest.raises(ValueError):
>           c.mul(3, 0)
E           Failed: DID NOT RAISE <class 'ValueError'>

tests/test_calc.py:70: Failed
which explicitly signals that the code didn’t raise the expected exception.
The code that makes the test pass needs to check if one of the inputs of the mul function is 0. This
can be done with the help of the built-in Python function all, which accepts an iterable and returns
³⁶https://github.com/pycabook/calc/tree/step-7-division-by-zero
True only if all the values contained in it are True. Since in Python the value 0 is not true, we may
write
and make the test suite pass. The if condition checks that there are no false values in the args tuple,
that is, that there are no zeros.
Step 9 - Average of an iterable

The remaining requirements describe a function that computes the average of an iterable, with
optional upper and lower thresholds to remove outliers. They can be broken down into the following tests

1. The function accepts an iterable and computes the average, i.e. avg([2, 5, 12, 98]) == 29.25
2. The function accepts an optional upper threshold. It must remove all the values that are greater
than the threshold before computing the average, i.e. avg([2, 5, 12, 98], ut=90) == avg([2,
5, 12])
3. The function accepts an optional lower threshold. It must remove all the values that are less
then the threshold before computing the average, i.e. avg([2, 5, 12, 98], lt=10) == avg([12,
98])
4. The upper threshold is not included in the comparison, i.e. avg([2, 5, 12, 98], ut=98) ==
avg([2, 5, 12, 98])
5. The lower threshold is not included in the comparison, i.e. avg([2, 5, 12, 98], lt=5) ==
avg([5, 12, 98])
6. The function works with an empty list, returning 0, i.e. avg([]) == 0
7. The function works if the list is empty after outlier removal, i.e. avg([12, 98], lt=15, ut=90)
== 0
8. The function outlier removal works if the list is empty, i.e. avg([], lt=15, ut=90) == 0
³⁷https://github.com/pycabook/calc/tree/step-8-multiply-by-zero
As you can see a simple requirement can produce multiple tests. Some of these are clearly expressed
by the requirement (numbers 1, 2, 3), some are choices that we make (numbers 4, 5, 6) and
can be discussed, and some are boundary cases that we have to discover by thinking about the problem
(numbers 6, 7, 8).
There is a fourth category of tests: the ones that come from bugs that you discover. We
will discuss those later in this chapter.
def test_avg_correct_average():
    c = Calc()
    res = c.avg([2, 5, 12, 98])

    assert res == 29.25
We feed the avg function a list of generic numbers, whose average we calculated with an external
tool. The first run of the test suite fails with the usual complaint about a missing function
    def test_avg_correct_average():
        c = Calc()
>       res = c.avg([2, 5, 12, 98])
E       AttributeError: 'Calc' object has no attribute 'avg'

tests/test_calc.py:76: AttributeError
And we can make the test pass with a simple use of sum and len, as both built-in functions work on
iterables
class Calc:
    [...]
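A first version of avg along these lines might be:

```python
class Calc:
    def avg(self, it):
        # both sum and len accept a list, so the average is a one-liner
        return sum(it) / len(it)
```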
def test_avg_removes_upper_outliers():
    c = Calc()
    res = c.avg([2, 5, 12, 98], ut=90)

    assert res == pytest.approx(6.333333)
As you can see the ut=90 parameter is supposed to remove the element 98 from the list and then
compute the average of the remaining elements. Since the result has an infinite number of digits I
used the pytest.approx function to check the result.
The test suite fails because the avg function doesn’t accept the ut parameter
    def test_avg_removes_upper_outliers():
        c = Calc()
>       res = c.avg([2, 5, 12, 98], ut=90)
E       TypeError: avg() got an unexpected keyword argument 'ut'

tests/test_calc.py:84: TypeError
There are now two problems that we have to solve, as happened for the second test we wrote in
this project. The new ut argument needs a default value, so we have to manage that case, and then
we have to make the upper threshold work. My solution is
³⁸https://github.com/pycabook/calc/tree/step-9-1-average-of-an-iterable
    def avg(self, it, ut=None):
        if not ut:
            ut = max(it)

        _it = [x for x in it if x <= ut]

        return sum(_it)/len(_it)
The idea here is that ut is used to filter the iterable, keeping all the elements that are less than or equal
to the threshold. This means that the default value for the threshold has to be neutral with regard to
this filtering operation. Using the maximum value of the iterable makes the whole algorithm work
in every case, while for example using a big fixed value like 9999 would introduce a bug, as one of
the elements of the iterable might be bigger than that value.
def test_avg_removes_lower_outliers():
    c = Calc()
    res = c.avg([2, 5, 12, 98], lt=10)

    assert res == pytest.approx(55)
³⁹https://github.com/pycabook/calc/tree/step-9-2-upper-threshold
The solution mirrors the one used for the upper threshold, with lt defaulting to the minimum of
the iterable

    def avg(self, it, lt=None, ut=None):
        if not lt:
            lt = min(it)
        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

        return sum(_it)/len(_it)
The tests for requirements 4 and 5, which state that the thresholds shall be included in the
comparison, come directly from the examples in the requirements

def test_avg_upper_threshold_is_included():
    c = Calc()
    res = c.avg([2, 5, 12, 98], ut=98)

    assert res == 29.25

def test_avg_lower_threshold_is_included():
    c = Calc()
    res = c.avg([2, 5, 12, 98], lt=5)

    assert res == pytest.approx(38.33333)

Both tests already pass with the current code, as the filters use non-strict comparisons.
def test_avg_empty_list():
    c = Calc()
    res = c.avg([])

    assert res == 0
    def test_avg_empty_list():
        c = Calc()
>       res = c.avg([])

tests/test_calc.py:116:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def avg(self, it, lt=None, ut=None):
        if not lt:
>           lt = min(it)
E           ValueError: min() arg is an empty sequence

calc/calc.py:24: ValueError
The min function that we used to compute the default lower threshold doesn’t work with an empty
list, so the code raises an exception. The simplest solution is to check for the length of the iterable
before computing the default thresholds
    def avg(self, it, lt=None, ut=None):
        if not len(it):
            return 0

        if not lt:
            lt = min(it)
        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

        return sum(_it)/len(_it)
As you can see the avg function is already pretty rich, but at the same time it is well structured and
understandable. This obviously happens because the example is trivial, but cleaner code is definitely
among the benefits of TDD.
⁴³https://github.com/pycabook/calc/tree/step-9-6-empty-list
def test_avg_manages_empty_list_after_outlier_removal():
    c = Calc()
    res = c.avg([12, 98], lt=15, ut=90)

    assert res == 0
and the test suite fails with a ZeroDivisionError, because the length of the iterable is now 0.
    def test_avg_manages_empty_list_after_outlier_removal():
        c = Calc()
>       res = c.avg([12, 98], lt=15, ut=90)

tests/test_calc.py:124:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def avg(self, it, lt=None, ut=None):
        if not len(it):
            return 0

        if not lt:
            lt = min(it)
        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

>       return sum(_it)/len(_it)
E       ZeroDivisionError: division by zero

calc/calc.py:34: ZeroDivisionError
The easiest solution is to introduce a new check on the length of the iterable
    def avg(self, it, lt=None, ut=None):
        if not len(it):
            return 0

        if not lt:
            lt = min(it)
        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

        if not len(_it):
            return 0

        return sum(_it)/len(_it)
And this code makes the test suite pass. As I stated before, code that makes the tests pass is considered
correct, but you are always allowed to improve it. In this case I don’t really like the repetition of the
length check, so I might try to refactor the function to get a cleaner solution. Since I have all the tests
that show that the requirements are satisfied, I am free to try to change the code of the function.
After some attempts I found this solution
    def avg(self, it, lt=None, ut=None):
        _it = list(it)

        if lt:
            _it = [x for x in _it if x >= lt]
        if ut:
            _it = [x for x in _it if x <= ut]

        if not len(_it):
            return 0

        return sum(_it)/len(_it)
which looks reasonably clean, and makes the whole test suite pass.
⁴⁴https://github.com/pycabook/calc/tree/step-9-7-empty-list-after-thresholds
def test_avg_manages_empty_list_before_outlier_removal():
    c = Calc()
    res = c.avg([], lt=15, ut=90)

    assert res == 0
This test doesn’t fail. So, according to the TDD methodology, we should justify the reason why it
doesn’t fail, and decide if we want to keep it. The reason why it doesn’t fail is because the two list
comprehensions used to filter the elements work perfectly with empty lists. As for the test, it comes
directly from a corner case, and it checks a behaviour which is not already covered by other tests.
This makes me decide to keep the test.
The first thing you need to do is to write tests that expose the bug. This way you can easily decide
when the code that you wrote is correct or good enough. For example, let’s assume that a user files an
issue on the Calc project saying: “The add function doesn’t work with negative numbers”. You should
definitely try to get a concrete example from the user who wrote the issue and some information
about the execution environment (as it is always possible that the problem comes from a different
source, like for example an old version of a library your package relies on), but in the meanwhile
you can come up with at least 3 tests: one that involves two negative numbers, one with a negative
number as the first argument, and one with a negative number as the second argument.
You shouldn’t write down all of them at once. Write the first test that you think might expose the
issue and see if it fails. If it doesn’t, discard it and write a new one. From the TDD point of view,
if you don’t have a failing test there is no bug, so you have to come up with at least one test that
exposes the issue you are trying to solve.
At this point you can move on and try to change the code. Remember that you shouldn’t have more
than one failing test at a time, so start doing this as soon as you discover a test case that shows there
is a problem in the code.
Once you reach a point where the test suite passes without errors stop and try to run the code in the
environment where the bug was first discovered (for example sharing a branch with the user that
created the ticket) and iterate the process.
Chapter 2 - On unit testing
Describe in single words, only the good things that come into your mind about your mother.
- Blade Runner (1982)
Introduction
What I introduced in the previous chapter is commonly called “unit testing”, since it focuses on
testing a single and very small unit of code. As simple as it may seem, the TDD process has some
caveats that are worth being discussed. In this chapter I discuss some aspects of TDD and unit testing
that I consider extremely important.
The typical example is when you interact with the filesystem in your tests. A test may create a file
and not remove it, and this makes another test fail because the file already exists, or because the
directory is not empty. Whatever you do while interacting with external systems has to be reverted
after the test. If you run your tests concurrently, however, even this precaution is not enough.
This poses a big problem, as interacting with external systems is definitely to be considered
dangerous. Mocks, introduced in the next chapter, are a very good tool to deal with this aspect
of testing.
External systems
It is important to understand that the above definitions (idempotency, isolation) depend on the scope
of the test. You should consider external whatever part of the system is not directly involved in the
test, even though you need to use it to run the test itself. You should also try to reduce the scope of
the test as much as possible.
Let me give you an example. Consider a web application, and imagine a test that checks that a user
can log in. The login process involves many layers: the user inputs the username and the password
in a GUI and submits the form, the GUI communicates with the core of the application that finds the
user in the DB and checks the password hash against the one stored there, then sends back a message
that grants access to the user, and the GUI stores a cookie to keep the user logged in. Suppose now
that the test fails. Where is the error? Is it in the query that retrieves the user from the DB? Or in the
routine that hashes the password? Or is it just an issue in the connectivity between the application
and the database?
As you can see there are too many possible points of failure. While this is a perfectly valid integration
test, it is definitely not a unit test. Unit tests try to test the smallest possible units of code in your
system, usually simple routines like functions or object methods. Integration tests, instead, put
together whole systems that have already been tested and test that they can work together.
Too many times developers confuse integration tests with unit tests. One simple example: every time
a web framework makes you test your models against a real database you are mixing a unit test (the
methods of the model object work) with an integration one (the model object connects with the
database and can store/retrieve data). You have to learn how to properly identify what is external
to your system in the scope of a given test, so your tests can be focused and small.
Focus on messages
I will never recommend enough Sandi Metz’s talk “The Magic Tricks of Testing”⁴⁷ where she
considers the different messages that a software component has to deal with. She comes up with
3 different origins for messages (incoming, sent to self, and outgoing) and 2 types (query and
command). The very interesting conclusion she reaches is that you should only test half of them,
and I believe this is one of the most useful results you can learn as a software developer. In this
section I will shamelessly start from Sandi Metz’s categorisations and give a personal view of the
matter. I absolutely recommend watching the original talk, as it is both short and very effective.
Testing is all about the behaviour of a component when it is used, i.e. when it is connected to other
components that interact with it. This interaction is well represented by the word “message”, which
has hereafter the simple meaning of “data exchanged between two actors”.
We can then classify the interactions happening in our system, and thus to our components, by
flow⁴⁸ and by type.
Message flow
The flow is defined as the tuple (source, destination), that is, where the message comes from and what
its destination is. There are three different combinations that we are interested in: (outside, self),
(self, self), and (self, outside), where self is the object we are testing, and outside is a
generic object that lives in the system. There is a fourth combination, (outside, outside) that is
not relevant for the testing, since it doesn’t involve the object under analysis.
So (outside, self) contains all the messages that other parts of the system send to our component.
These messages correspond to the public API of the component, that is the set of entry points the
component makes available to interact with it. Notable examples are the public methods of an object
in an object-oriented programming language or the HTTP endpoints of a Web application. This flow
represents the incoming messages.
⁴⁷https://speakerdeck.com/skmetz/magic-tricks-of-testing-railsconf
⁴⁸Sandi Metz speaks of origin.
On the other side there is (self, outside), which is the set of messages that the component under
test sends to other parts of the system. These are for example the external calls that an object does
to a library or to other objects, or the API of other applications we rely on, like databases or Web
applications. This flow describes all the outgoing messages.
Between the two there is (self, self), which identifies the messages that the component sends
to itself, i.e. the component’s use of its own internal API. This can be the set of private
methods of an object or the business logic inside a Web application. The important thing about this
last case is that while the component is seen as a black box by the rest of the system it actually has
an internal structure and it uses it to run. This flow contains all the private messages.
Message type
Messages can be further divided according to the interaction the source requires to have with the
target: queries and commands. Queries are messages that do not change the status of the component,
they just extract information. The Calc class that we developed in the previous section is a typical
example of object that exposes query methods. Adding two numbers doesn’t change the status of
the object, and you will receive the same answer every time you call the add method.
Commands, instead, are the complete opposite. They do not extract any information, but they change
the status of the object. A method of an object that increases an internal counter or a method that
adds values to an array are perfect examples of commands.
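The difference can be sketched with a minimal, hypothetical Counter class:

```python
class Counter:
    def __init__(self):
        self._count = 0

    def increment(self):
        # command: changes the status of the object, returns nothing
        self._count += 1

    def value(self):
        # query: extracts information without changing the status
        return self._count
```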
It’s perfectly normal to combine a query and a command in a single message, as long as you are
aware that your message is changing the status of the component. Remember that changing the
status is something that can have concrete side effects.
Incoming queries
An incoming query is a message that an external actor sends to get a value from your component.
Testing this behaviour is straightforward, as you just need to write a test that sends the message and
makes an assertion on the returned value. A concrete example of this is what we did to test the add
method of Calc.
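As a sketch, assuming a Calc-like class whose add method simply sums its arguments (the book's real implementation may differ), such a test is nothing more than a call and an assertion:

```python
class Calc:
    # minimal stand-in for the book's Calc class
    def add(self, *args):
        return sum(args)


def test_add():
    # incoming query: send the message, assert on the returned value
    c = Calc()
    assert c.add(4, 5) == 9


test_add()
```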
Chapter 2 - On unit testing 51
Incoming commands
An incoming command comes from an external actor that wants to change the status of the system.
There should be a way for an external actor to check the status, which translates into the need of
having either a companion incoming query that allows the actor to extract the status (or at least the
part of the status affected by the command), or the knowledge that the change is going to affect the
behaviour of another query. A simple example might be a method that sets the precision (number
of digits) of the division in the Calc object. Setting that value changes the result of a query, which
can be used to test the effect of the incoming command.
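The book's Calc does not necessarily expose such a method; the following is only a hypothetical sketch of how a command can be tested through the query it affects:

```python
class PrecisionCalc:
    def __init__(self):
        self._digits = 2

    def set_precision(self, digits):
        # incoming command: changes the status of the object
        self._digits = digits

    def div(self, a, b):
        # incoming query whose result depends on the status
        return round(a / b, self._digits)


def test_set_precision():
    c = PrecisionCalc()
    c.set_precision(4)
    # the effect of the command is visible through the query
    assert c.div(1, 3) == 0.3333


test_set_precision()
```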
Private queries
A private query is a message that the component sends to self to get a value without affecting its
own state, and it is basically nothing more than an explicit use of some internal logic. This typically
happens in object-oriented languages when you extract some common logic from one or more
methods of an object and create a private method to avoid duplication.
Since private queries use the internal logic, you shouldn't test them. This might be surprising, as
private methods are code, and code should be tested, but remember that other methods are calling
them, so the effects of that code are not invisible: they are tested, although indirectly, by the tests
of the public entry points. The only effect you would achieve by testing private methods is to lock
the tests to the internal implementation of the component, which by definition shouldn't be used by
anyone outside of the component itself. This, in turn, makes refactoring painful, because you have
to keep redundant tests in sync with the changes that you make, instead of using the tests as a guide
for the code changes like TDD wants you to do.
As Sandi Metz says, however, this is not an inflexible rule. Whenever you see that testing an
internal method makes the structure more robust, feel free to do it. Be aware that you are locking
the implementation, so do it only where it makes a real difference businesswise.
Private commands
Private commands shouldn’t be treated differently than private queries. They change the status of
the component, but this is again part of the internal logic of the component itself, so you shouldn’t
test private commands either. As stated for private queries, feel free to do it if this makes a real
difference.
Outgoing queries
Outgoing queries are messages that the component sends to an external actor to retrieve a value,
without changing the state of that actor. Since the returned value is produced by the external actor,
testing it would mean testing the external actor. Let me repeat this: you don't want to test that the
external actor returns the correct value given some inputs.
This is perhaps one of the biggest mistakes that programmers make when they test their applications,
and it is definitely a mistake that I made many times. We tend to introduce tests that, starting from
the code of our component, end up testing different components.
Outgoing commands
Outgoing commands are messages sent to external actors in order to change their state. Since our
component sends such messages to cause an effect in another part of the system, we have to be sure
that the sent values are correct. We do not want to test that the state of the external actor changes
accordingly, as this is part of the testing suite of the external actor itself (incoming command).
From this consideration it is evident that you shouldn’t test the results of any outgoing query or
command. Possibly, you should avoid running them at all, otherwise you will need the external
system to be up and running when you run the test suite.
We want to be sure, however, that our component uses the API of the external actor in a proper
way and the standard technique to test this is to use mocks, that is components that simulate other
components. Mocks are an important tool in the TDD methodology and for this reason they are the
topic of the next chapter.
Conclusions
Since the discovery of TDD, few things have changed the way I write code more than these considerations
on what I am supposed to test. Out of 6 different types of tests we discovered that 2 shouldn't be
tested, 2 require a very simple technique based on assertions, and the last 2 are the only
ones that require an advanced technique (mocks). This should cheer you up, as for once a good
methodology doesn't add new rules and further worries, but removes one third of them, forbidding
you to implement them!
Chapter 3 - Mocks
We’re gonna get bloody on this one, Rog.
- Lethal Weapon (1987)
Basic concepts
As we saw in the previous chapter, the relationship between the component that we are testing and
the other components of the system can be complex. Sometimes idempotency and isolation are not easy
to achieve, and testing outgoing commands requires checking the parameters sent to the external
component, which is not trivial.
The main difficulty comes from the fact that your code is actually using the external system. When
you run it in production the external system will provide the data that your code needs and the
whole process can work as intended. During testing, however, you don’t want to be bound to the
external system, for the reasons explained in the previous chapter, but at the same time you need it
to make your code work.
So, you face a complex issue. On the one hand your code is connected to the external system (be
it hardcoded or chosen programmatically), but on the other hand you want it to run without the
external system being active (or even present).
This problem can be solved with the use of mocks. A mock, in the testing jargon, is an object
that simulates the behaviour of another (more complex) object. Wherever your code connects to
an external system, during testing you can replace the latter with a mock, pretending the external
system is there and properly checking that your component behaves as intended.
First steps
Let us try and work with a mock in Python and see what it can do. First of all, fire up a Python shell
and import the library

>>> from unittest import mock
The main object that the library provides is Mock and you can instantiate it without any argument
>>> m = mock.Mock()
This object has the peculiar property of creating methods and attributes on the fly when you require
them. Let us first look inside the object to get an idea of what it provides
>>> dir(m)
['assert_any_call', 'assert_called_once_with', 'assert_called_with', 'assert_has_cal\
ls', 'attach_mock', 'call_args', 'call_args_list', 'call_count', 'called', 'configur\
e_mock', 'method_calls', 'mock_add_spec', 'mock_calls', 'reset_mock', 'return_value'\
, 'side_effect']
As you can see there are some methods which are already defined in the Mock object. Let's try to
read a non-existent attribute
>>> m.some_attribute
<Mock name='mock.some_attribute' id='140222043808432'>
>>> dir(m)
['assert_any_call', 'assert_called_once_with', 'assert_called_with', 'assert_has_cal\
ls', 'attach_mock', 'call_args', 'call_args_list', 'call_count', 'called', 'configur\
e_mock', 'method_calls', 'mock_add_spec', 'mock_calls', 'reset_mock', 'return_value'\
, 'side_effect', 'some_attribute']
As you can see this class is somehow different from what you are used to. First of all, its instances
do not raise an AttributeError when asked for a non-existent attribute, but they happily return
another instance of Mock itself. Second, the attribute you tried to access has now been created inside
the object and accessing it returns the same mock object as before.
>>> m.some_attribute
<Mock name='mock.some_attribute' id='140222043808432'>
Mock objects are callables, which means that they may act both as attributes and as methods. If you
try to call the mock it just returns another mock with a name that includes parentheses to signal its
callable nature
>>> m.some_attribute()
<Mock name='mock.some_attribute()' id='140247621475856'>
As you can understand, such objects are the perfect tool to mimic other objects or systems, since
they may expose any API without raising exceptions. To use them in tests, however, we need them to
behave just like the original, which implies returning sensible values or performing real operations.
The simplest way to achieve this is to set the return_value attribute of the mock, which defines
what the mock shall return when it is called

>>> m.some_attribute.return_value = 42
>>> m.some_attribute()
42

Now, as you can see, the object does not return a mock object any more; instead it just returns the
static value stored in the return_value attribute. Since in Python everything is an object, you can
return here any type of value: simple types like an integer or a string, more complex structures like
dictionaries or lists, classes that you defined, instances of those, or functions.
Pay attention that what the mock returns is exactly the object that it is instructed to use as return
value. If the return value is a callable such as a function, calling the mock will return the function
itself and not the result of the function. Let me give you an example

>>> def print_answer():
...     print("42")
...
>>> m.some_attribute.return_value = print_answer
>>> m.some_attribute()
<function print_answer at 0x7f...>

As you can see, calling some_attribute() just returns the value stored in return_value, that is the
function itself. To make the mock call the object that we use as a return value we have to use a
slightly more complex attribute called side_effect.
If you pass an iterable, such as for example a generator, or a plain list, tuple, or similar object, the mock
will yield the values of that iterable, i.e. return every value contained in the iterable on subsequent
calls of the mock

>>> m.some_attribute.side_effect = range(3)
>>> m.some_attribute()
0
>>> m.some_attribute()
1
>>> m.some_attribute()
2
>>> m.some_attribute()
Traceback (most recent call last):
  ...
StopIteration

As promised, the mock just returns every object found in the iterable (in this case a range object)
one at a time until the iterable is exhausted. According to the iterator protocol, once every item
has been returned the object raises the StopIteration exception, which means that you can safely
use it in a loop.
Last, if you feed side_effect a callable, the latter will be executed with the parameters passed when
calling the attribute. Let's see this with a simple example

>>> def print_number(num):
...     print("The number is {}".format(num))
...
>>> m.some_attribute.side_effect = print_number
>>> m.some_attribute(5)
The number is 5
<Mock name='mock.some_attribute()' id='...'>
As you can see, the arguments passed to the attribute are directly used as arguments for the
stored function. This is very powerful, especially if you stop thinking about "functions" and start
considering "callables". Indeed, given the nature of Python objects we know that instantiating an
object is not different from calling a function, which means that side_effect can be given a class
and will return an instance of it

>>> class Number:
...     def __init__(self, value):
...         self._value = value
...     def print_value(self):
...         print("Value:", self._value)
...
>>> m.some_attribute.side_effect = Number
>>> n = m.some_attribute(42)
>>> n.print_value()
Value: 42
Asserting calls
As I explained in the previous chapter, outgoing commands shall be tested by checking the correctness
of the message arguments. This can be easily done with mocks, as these objects record every call that
they receive and the arguments passed with it.
Let’s see a practical example
def test_connect():
    external_obj = mock.Mock()
    myobj.MyObj(external_obj)
    external_obj.connect.assert_called_with()
Here, the myobj.MyObj class needs to connect to an external object, for example a remote repository
or a database. The only thing we need to know for testing purposes is whether the class called the connect
method of the external object without any parameter.
So the first thing we do in this test is to instantiate the mock object. This is a fake version
of the external object, and its only purpose is to accept calls from the MyObj object under test
and possibly return sensible values. Then we instantiate the MyObj class passing the external
object. We expect the class to call the connect method so we express this expectation calling
external_obj.connect.assert_called_with().
What happens behind the scenes? The MyObj class receives the fake external object and somewhere
in its initialization process calls the connect method of the mock object. This call creates the method
itself as a mock object. This new mock records the parameters used to call it, and the subsequent call
to its assert_called_with method checks that the method was called and that no parameters were
passed.
In this case an object like
class MyObj():
    def __init__(self, repo):
        repo.connect()
would pass the test, as the object passed as repo is a mock that does nothing but record the calls.
As you can see, the __init__() method actually calls repo.connect(), and repo is expected to be
a full-featured external object that provides connect in its API. Calling repo.connect() when repo
is a mock object, instead, silently creates the method (as another mock object) and records that the
method has been called once without arguments.
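This recording behaviour can be observed directly, using nothing but unittest.mock:

```python
from unittest import mock

external_obj = mock.Mock()

# Accessing connect creates it on the fly as a child mock;
# calling it records the call and its (empty) argument list.
external_obj.connect()

print(external_obj.connect.called)     # → True
print(external_obj.connect.call_args)  # → call()
```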
The assert_called_with method allows us to also check the parameters we passed when calling.
To show this let us pretend that we expect the MyObj.setup method to call setup(cache=True,
max_connections=256) on the external object. Remember that this is an outgoing command, so we
are interested in checking the parameters and not the result.
The new test can be something like
def test_setup():
    external_obj = mock.Mock()
    obj = myobj.MyObj(external_obj)
    obj.setup()

    external_obj.setup.assert_called_with(cache=True, max_connections=256)
and the code of the class that makes it pass is

class MyObj():
    def __init__(self, repo):
        self._repo = repo
        repo.connect()

    def setup(self):
        self._repo.setup(cache=True, max_connections=256)
If the class called the setup method of the external object with different arguments, for example

    def setup(self):
        self._repo.setup(cache=True)

the test would fail, and the error message printed by assert_called_with would show both the
expected and the actual call, which I consider a very clear explanation of what went wrong during
the test execution.
As you can read in the official documentation, the Mock object provides other methods and attributes,
like assert_called_once_with, assert_any_call, assert_has_calls, assert_not_called, called,
call_count, and many others. Each of those explores a different aspect of the mock behaviour
concerning calls; make sure to read their descriptions and go through the examples.
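As a small, self-contained taste of those attributes (this uses only unittest.mock, none of the classes discussed above):

```python
from unittest import mock

m = mock.Mock()
m.send(42)
m.send(42)

# called and call_count report how much the mock was used
assert m.send.called
assert m.send.call_count == 2

# assert_any_call passes if at least one recorded call matches
m.send.assert_any_call(42)

# assert_called_once_with fails here, as there were two calls
try:
    m.send.assert_called_once_with(42)
except AssertionError:
    print("send was called twice, not once")
```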
A simple example
To learn how to use mocks in a practical case, let’s work together on a new module in the calc
package. The target is to write a class that downloads a JSON file with data on meteorites and
computes some statistics on the dataset using the Calc class. The file is provided by NASA at this
URL⁴⁹.
The class contains a get_data method that queries the remote server and returns the data, and a
method average_mass that uses the Calc.avg method to compute the average mass of the meteorites
and return it. In a real world case, like for example in a scientific application, I would probably split
the class in two. One class manages the data, updating it whenever it is necessary, and another one
manages the statistics. For the sake of simplicity, however, I will keep the two functionalities together
in this example.
Let's see a quick example of what is supposed to happen inside our code. An excerpt of the file
provided by the server is
[
    {
        "fall": "Fell",
        "geolocation": {
            "type": "Point",
            "coordinates": [6.08333, 50.775]
        },
        "id": "1",
        "mass": "21",
        "name": "Aachen",
        "nametype": "Valid",
        "recclass": "L5",
        "reclat": "50.775000",
        "reclong": "6.083330",
        "year": "1880-01-01T00:00:00.000"
    },
    {
        "fall": "Fell",
        "geolocation": {
            "type": "Point",
            "coordinates": [10.23333, 56.18333]
        },
        "id": "2",
        "mass": "720",
        "name": "Aarhus",
        "nametype": "Valid",
        "recclass": "H6",
        "reclat": "56.183330",
        "reclong": "10.233330",
        "year": "1951-01-01T00:00:00.000"
    }
]
⁴⁹https://data.nasa.gov/resource/y77d-th95.json
import urllib.request
import json

import calc

URL = ("https://data.nasa.gov/resource/y77d-th95.json")

with urllib.request.urlopen(URL) as url:
    data = json.loads(url.read().decode())

masses = [float(d['mass']) for d in data if 'mass' in d]
print(masses)

avg_mass = calc.Calc().avg(masses)
print(avg_mass)

where the list comprehension filters out those elements which do not have a mass attribute.
An initial test for our class might be
def test_average_mass():
    m = MeteoriteStats()
    data = m.get_data()
    [...]
This little test contains, however, two big issues. First of all, the get_data method is supposed to use
the Internet connection to get the data from the server. This is a typical example of an outgoing
query, as we are not trying to change the state of the Web server providing the data. You already
know that you should not test the return value of an outgoing query, but you can see here why you
shouldn't use real data when testing either: the data coming from the server can change in time,
and this can invalidate your tests.
In this case, however, testing the code is simple. Since the class has a public method get_data that
interacts with the external component, it is enough to temporarily replace it with a mock that provides
sensible values. Create the tests/test_meteorites.py file and put this code in it
def test_average_mass():
    m = MeteoriteStats()
    m.get_data = mock.Mock()
    m.get_data.return_value = [
        {
            "fall": "Fell",
            "geolocation": {
                "type": "Point",
                "coordinates": [6.08333, 50.775]
            },
            "id": "1",
            "mass": "21",
            "name": "Aachen",
            "nametype": "Valid",
            "recclass": "L5",
            "reclat": "50.775000",
            "reclong": "6.083330",
            "year": "1880-01-01T00:00:00.000"
        },
        {
            "fall": "Fell",
            "geolocation": {
                "type": "Point",
                "coordinates": [10.23333, 56.18333]
            },
            "id": "2",
            "mass": "720",
            "name": "Aarhus",
            "nametype": "Valid",
            "recclass": "H6",
            "reclat": "56.183330",
            "reclong": "10.233330",
            "year": "1951-01-01T00:00:00.000"
        }
    ]

    avgm = m.average_mass(m.get_data())
    assert avgm == 370.5
When we run this test we are not testing that the external server provides the correct data. We are
testing the process implemented by average_mass, feeding the algorithm some known input. This is
not different from the first tests that we implemented: in that case we were testing an addition, here
we are testing a more complex algorithm, but the concept is the same.
We can now write a class that passes this test
import urllib.request
import json

import calc

URL = ("https://data.nasa.gov/resource/y77d-th95.json")

class MeteoriteStats:
    def get_data(self):
        with urllib.request.urlopen(URL) as url:
            return json.loads(url.read().decode())

    def average_mass(self, data):
        c = calc.Calc()
        masses = [float(d['mass']) for d in data if 'mass' in d]
        return c.avg(masses)
Please note that we are not testing the get_data method itself. That method uses the function
urllib.request.urlopen, which opens an Internet connection without passing through any other
public object that we can replace at run time during the test. We then need a tool to replace "internal"
parts of our objects when we run them, and this is provided by patching.
Patching
Mocks are very simple to introduce in your tests whenever your objects accept classes or instances
from outside. In that case, as shown in the previous sections, you just have to instantiate the Mock
class and pass the resulting object to your system. However, when the external classes instantiated
by your library are hardcoded this simple trick does not work. In this case you have no chance to
pass a fake object instead of the real one.
This is exactly the case addressed by patching. Patching, in a testing framework, means replacing a
globally reachable object with a mock, thus achieving the goal of having the code run unmodified,
while part of it has been hot swapped, that is, replaced at run time.
A warm-up example
Let us start with a very simple example. Patching can be complex to grasp at the beginning so it is
better to start learning it with trivial use cases.
Create a new project following the instructions given previously in the book, calling this project
fileinfo. The purpose of this library is to develop a simple class that returns information about a
given file. The class shall be instantiated with the file path, which can be relative.
The starting point is the class with the __init__ method. If you want you can develop the class using
TDD, but for the sake of brevity I will not show here all the steps that I followed. This is the set of
tests I have in tests/test_fileinfo.py
def test_init():
    filename = 'somefile.ext'
    fi = FileInfo(filename)
    assert fi.filename == filename

def test_init_relative():
    filename = 'somefile.ext'
    relative_path = '../{}'.format(filename)
    fi = FileInfo(relative_path)
    assert fi.filename == filename
and this is the code of the FileInfo class in the fileinfo/fileinfo.py file
import os

class FileInfo:
    def __init__(self, path):
        self.original_path = path
        self.filename = os.path.basename(path)
As you can see the class is extremely simple, and the tests are straightforward. So far I didn’t add
anything new to what we discussed in the previous chapter.
Now I want the class to provide a get_info() method that returns a tuple with the file name, the original path
the class was instantiated with, and the absolute path of the file. Pretending we are in the
/some/absolute/path directory, the class should work as shown here
>>> fi = FileInfo('../book_list.txt')
>>> fi.get_info()
('book_list.txt', '../book_list.txt', '/some/absolute/book_list.txt')
You can immediately realise that you have an issue in writing the test. There is no way to easily test
something such as "the absolute path", since the outcome of the function called in the test is supposed to
vary with the path of the test itself. Let us try to write part of the test
def test_get_info():
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)
    fi = FileInfo(original_path)

    assert fi.get_info() == (filename, original_path, '???')
where the '???' string highlights that I cannot put something sensible to test the absolute path of
the file.
Patching is the way to solve this problem. You know that the function will use some code to get
the absolute path of the file. So, within the scope of this test only, you can replace that code with
something different and perform the test. Since the replacement code has a known outcome writing
the test is now possible.
Patching, thus, means to inform Python that during the execution of a specific portion of the code
you want a globally accessible module/object replaced by a mock. Let’s see how we can use it in our
example
[...]

def test_get_info():
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)

    with patch('os.path.abspath') as abspath_mock:
        test_abspath = 'some/abs/path'
        abspath_mock.return_value = test_abspath

        fi = FileInfo(original_path)
        assert fi.get_info() == (filename, original_path, test_abspath)
You clearly see the context in which the patching happens, as it is enclosed in a with statement. Inside
this statement the function os.path.abspath will be replaced by a mock created by the function patch
and called abspath_mock. So, while Python executes the lines of code enclosed by the with statement,
the name os.path.abspath will refer to the abspath_mock object.
The first thing we can do, then, is to give the mock a known return_value. This way we solve
the issue that we had with the initial code, that is using an external component that returns an
unpredictable result. The line
abspath_mock.return_value = test_abspath
instructs the patching mock to return the given string as a result, regardless of the real values of the
file under consideration.
The code that makes the test pass is
class FileInfo:
    [...]

    def get_info(self):
        return (
            self.filename,
            self.original_path,
            os.path.abspath(self.filename)
        )
When this code is executed by the test the os.path.abspath function is replaced at run time by the
mock that we prepared there, which basically ignores the input value self.filename and returns
the fixed value it was instructed to use.
It is worth at this point discussing outgoing messages again. The code that we are considering here
is a clear example of an outgoing query, as the get_info method is not interested in changing the
status of the external component. In the previous chapter we reached the conclusion that testing the
return value of outgoing queries is pointless and should be avoided. With patch we are replacing the
external component with something that we know, using it to test that our object correctly handles
the value returned by the outgoing query. We are thus not testing the external component, as it got
replaced, and definitely we are not testing the mock, as its return value is already known.
Obviously to write the test you have to know that you are going to use the os.path.abspath function,
so patching is somehow a “less pure” practice in TDD. In pure OOP/TDD you are only concerned
with the external behaviour of the object, and not with its internal structure. This example, however,
shows that this pure approach has some limitations that you have to cope with, and patching is a
clean way to do it.
The patch function can also be used as a function decorator, which patches the given object for the
whole duration of the test

@patch('os.path.abspath')
def test_get_info(abspath_mock):
    test_abspath = 'some/abs/path'
    abspath_mock.return_value = test_abspath

    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)
    fi = FileInfo(original_path)

    assert fi.get_info() == (filename, original_path, test_abspath)
As you can see the patch decorator works like a big with statement for the whole function. The
abspath_mock argument passed to the test becomes internally the mock that replaces os.path.abspath.
Obviously this way you replace os.path.abspath for the whole function, so you have to decide case
by case which form of the patch function you need to use.
Multiple patches
You can patch more than one object in the same test. For example, consider the case where the
get_info method calls os.path.getsize in addition to os.path.abspath, because it needs to return
the size of the file as well. You have at this point two different outgoing queries, and you have to replace
both with mocks to make your class work during the test.
This can be easily done with an additional patch decorator
@patch('os.path.getsize')
@patch('os.path.abspath')
def test_get_info(abspath_mock, getsize_mock):
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)

    test_abspath = 'some/abs/path'
    abspath_mock.return_value = test_abspath

    test_size = 1234
    getsize_mock.return_value = test_size

    fi = FileInfo(original_path)

    assert fi.get_info() == (filename, original_path, test_abspath, test_size)
Please note that the decorator which is nearest to the function is applied first. Always remember that
the decorator syntax with @ is a shortcut to replace the function with the output of the decorator, so
two decorators result in
@decorator1
@decorator2
def myfunction():
    pass

which is a shortcut for

def myfunction():
    pass

myfunction = decorator1(decorator2(myfunction))
This explains why, in the test code, the function receives first abspath_mock and then getsize_mock.
The first decorator applied to the function is the patch of os.path.abspath, which adds the mock
that we call abspath_mock as the first argument. Then the patch of os.path.getsize is applied, and
this appends its own mock.
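The same ordering can be seen with two plain, made-up decorators that just wrap the returned string:

```python
def decorator1(func):
    def wrapper():
        return "1({})".format(func())
    return wrapper


def decorator2(func):
    def wrapper():
        return "2({})".format(func())
    return wrapper


@decorator1
@decorator2
def myfunction():
    return "x"


# decorator2, the nearest one, is applied first and ends up innermost
print(myfunction())  # → 1(2(x))
```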
The code that makes the test pass is
class FileInfo:
    [...]

    def get_info(self):
        return (
            self.filename,
            self.original_path,
            os.path.abspath(self.filename),
            os.path.getsize(self.filename)
        )
We can write the above test using two with statements as well
def test_get_info():
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)

    with patch('os.path.abspath') as abspath_mock:
        test_abspath = 'some/abs/path'
        abspath_mock.return_value = test_abspath

        with patch('os.path.getsize') as getsize_mock:
            test_size = 1234
            getsize_mock.return_value = test_size

            fi = FileInfo(original_path)

            assert fi.get_info() == (
                filename,
                original_path,
                test_abspath,
                test_size
            )
Using more than one with statement, however, makes the code difficult to read, in my opinion, so
in general I prefer to avoid complex with trees if I do not really need to use a limited scope of the
patching.
Patching immutable objects
Python is a very dynamic language, and most objects can be modified at run time. Some core objects,
however, are immutable, and their attributes cannot be replaced. See what happens if we try to
replace a method of an integer

>>> a = 1
>>> a.conjugate = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object attribute 'conjugate' is read-only
Here I’m trying to replace a method with an integer, which is pointless, but nevertheless shows the
issue we are facing.
What has this immutability to do with patching? What patch does is to temporarily replace an
attribute of an object (a method of a class, a class of a module, etc.), which also means that if we try
to replace an attribute of an immutable object the patching action will fail.
A typical example of this problem is the datetime module, which is also one of the best candidates
for patching, since the output of time functions is by definition time-varying.
Let me show the problem with a simple class that logs operations. I will temporarily break the TDD
methodology writing first the class and then the tests, so that you can appreciate the problem.
Create a file called logger.py and put there the following code
import datetime

class Logger:
    def __init__(self):
        self.messages = []

    def log(self, message):
        self.messages.append((datetime.datetime.now(), message))
This is pretty simple, but testing this code is problematic, because the log() method produces results
that depend on the actual execution time. The call to datetime.datetime.now is however an outgoing
query, and as such it can be replaced by a mock with patch.
If we try to do it, however, we will have a bitter surprise. This is the test code, that you can put in
tests/test_logger.py
@patch('datetime.datetime.now')
def test_log(mock_now):
    test_now = 123
    test_message = "A test message"
    mock_now.return_value = test_now

    test_logger = Logger()
    test_logger.log(test_message)

    assert test_logger.messages == [(test_now, test_message)]
When you try to execute this test you will get an error similar to

TypeError: can't set attributes of built-in/extension type 'datetime.datetime'

which is raised because patching tries to replace the now function in datetime.datetime with a mock,
and since that class is immutable this operation fails.
There are several ways to address this problem. All of them, however, start from the fact that
importing or subclassing an immutable object gives you a mutable “copy” of that object.
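The subclassing half of that statement can be verified in a couple of lines (the MyDatetime name is made up for illustration):

```python
import datetime


# datetime.datetime is immutable, but a subclass of it is an
# ordinary Python class whose attributes can be replaced freely.
class MyDatetime(datetime.datetime):
    pass


MyDatetime.now = classmethod(lambda cls: "patched")
print(MyDatetime.now())  # → patched
```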
The easiest example in this case is the module datetime itself. In the test_log function we tried
to patch the datetime.datetime.now object directly, affecting the builtin module datetime. The
file logger.py, however, does import datetime, so the latter becomes a local symbol in the logger
module. This is exactly the key for our patching. Let us change the test to
@patch('fileinfo.logger.datetime.datetime')
def test_log(mock_datetime):
    test_now = 123
    test_message = "A test message"
    mock_datetime.now.return_value = test_now

    test_logger = Logger()
    test_logger.log(test_message)

    assert test_logger.messages == [(test_now, test_message)]
If you run the test now, you can see that the patching works. What we did was to inject
our mock as fileinfo.logger.datetime.datetime instead of datetime.datetime.now. Two things
changed, thus, in our test. First, we are patching the symbol imported in the logger.py file
and not the object provided globally by the Python interpreter. Second, we are patching the
whole datetime.datetime class, because this class is immutable and its attributes cannot be replaced
one by one: if you try to patch fileinfo.logger.datetime.datetime.now you will find that it is still
immutable.
Another possible solution to this problem is to create a function that invokes the immutable object
and returns its value. This wrapper function can be easily patched, because it is a plain module-level
attribute, and thus is not immutable. This solution, however, requires changing the source code to allow
testing, which is far from optimal. Obviously it is better to introduce a small change in the
code and have it tested than to leave it untested, but whenever possible I try to avoid solutions that
introduce code which wouldn't be required without tests.
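A minimal sketch of this wrapper-based approach, assuming a hypothetical get_now helper (this is not the chapter's actual code):

```python
import datetime


def get_now():
    # Thin wrapper around the immutable builtin. Being a plain
    # module-level function, it can be patched without issues.
    return datetime.datetime.now()


class Logger:
    def __init__(self):
        self.messages = []

    def log(self, message):
        self.messages.append((get_now(), message))
```

A test could then patch the wrapper itself (for example with patch('fileinfo.logger.get_now'), if the class lived in fileinfo/logger.py) and never touch datetime at all.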
A warning
Mocks are a good way to approach parts of the system that are not under test but that are still
part of the code that we are running. This is particularly true for parts of the code that we wrote,
whose internal structure is ultimately known. When the external system is complex and completely
detached from our code, however, mocking starts to become complicated, and the risk is that we spend
more time faking parts of the system than actually writing code.
In these cases we have definitely crossed the barrier between unit testing and integration testing. You may
see mocks as the bridge between the two, as they allow you to keep unit-testing parts that are
naturally connected ("integrated") with external systems, but there is a point where you need to
recognise that you have to change approach.
This threshold is not fixed, and I can't give you a rule to recognise it, but I can give you some advice. First of all, keep an eye on how many things you need to mock to make a test run, as an increasing number of mocks in a single test is definitely a sign of something wrong in the testing approach.
My rule of thumb is that when I have to create more than 3 mocks, an alarm goes off in my mind
and I start questioning what I am doing.
The second piece of advice is to always consider the complexity of the mocks. You may find yourself patching a class but then having to create monsters like cls_mock().func1().func2().func3.assert_called_with(x=42), which is a sign that the part of the system that you are mocking is buried deep in some code that you cannot really access, because you don't know its internal mechanisms. This is the case with ORMs, for example, and I will discuss it later in the book.
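A small sketch shows why such chains even “work”: a Mock silently creates whatever attribute you access, so the assertion runs without complaint while the test stays coupled to every intermediate call (the names below are the hypothetical ones used above):

```python
from unittest import mock

cls_mock = mock.Mock()

# Imagine this chain buried in the code under test: an instance is
# created and three nested calls are made on it.
instance = cls_mock()
instance.func1().func2().func3(x=42)

# The test has to mirror the entire chain just to check one argument.
# Every attribute access on a Mock returns another auto-created Mock,
# so nothing stops us from writing this monster.
cls_mock().func1().func2().func3.assert_called_with(x=42)
```

When a test starts to look like this, the advice above applies: the mocked object is too deep, and a different testing approach or an intermediate abstraction is probably needed.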
The third piece of advice is to consider mocks as “hooks” that you throw at the external system, and that break its hull to reach its internal structure. These hooks are obviously against the assumption that we can interact with a system knowing only its external behaviour, or its API. As such, you should keep in mind that each mock you create is a step back from this perfect assumption, “breaking the spell” of the decoupled interaction. Doing this you will quickly become annoyed when you have to create too many mocks, and this will contribute to keeping you aware of what you are doing (or overdoing).
Recap
Mocks are a very powerful tool that allows us to test code that contains outgoing messages; in particular, they allow us to test the arguments of outgoing commands. Patching is a good way
to overcome the fact that some external components are hardcoded in our code and are thus
unreachable through the arguments passed to the classes or the methods under analysis.
Mocks are also the most complex part of testing, so don’t be surprised if you are still a bit confused
by them. Review the chapter once, maybe, but then try to go on, as in later chapters we will use
mocks in very simple and practical examples, which may shed light upon the whole matter.
Part 2 - The clean architecture
Chapter 1 - Components of a clean
architecture
Wait a minute. Wait a minute Doc, uh, are you telling me you built a time machine… out of
a DeLorean?
- Back to the Future (1985)
Main layers
Let’s have a look at the main layers of a clean architecture, keeping in mind that your implementation may require you to create new layers or to split some of these into multiple ones.
Entities
This layer of the clean architecture contains a representation of the domain models, that is, everything your project needs to interact with that is sufficiently complex to require a specific representation.
For example, strings in Python are complex and very powerful objects. They provide many methods
out of the box, so in general it is useless to create a domain model for them. If your project is a tool
to analyse medieval manuscripts, however, you might need to isolate sentences and at this point
maybe you need a specific domain model.
Since we work in Python, this layer will contain classes, with methods that simplify the interaction
with them. It is very important, however, to understand that the models in this layer are different from the usual models of frameworks like Django. These models are not connected with a storage system, so they cannot be directly saved or queried using methods of their classes; they don’t contain methods to dump themselves to JSON strings; and they are not connected with any presentation layer. They are so-called lightweight models.
This is the innermost layer. Entities have mutual knowledge, since they live in the same layer, so the architecture allows them to interact directly. This means that one of your Python classes can use another one directly, instantiating it and calling its methods. Entities don’t know anything that lives in outer layers, however. For example, entities don’t know the details of the external systems; they work only with interfaces.
Use cases
This layer contains the use cases implemented by the system. Use cases are the processes that happen in your application, where you use your domain models to work on real data. Examples can be a user logging in, a search being performed with specific filters, or a bank transaction happening when the user wants to buy the content of the cart.
A use case should be as small as possible. It is very important to isolate small actions in use cases, as this makes the whole system easier to test, understand, and maintain.
Use cases know the entities, so they can instantiate them directly and use them. They can also call
each other, and it is common to create complex use cases that put together other simpler ones.
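As an illustration of this composition, consider a hypothetical pair of use cases (these classes are not part of the project we are about to build; they only show the pattern):

```python
class AddRoomUseCase:
    # A small use case wrapping a single repository action
    def __init__(self, repo):
        self.repo = repo

    def execute(self, room):
        self.repo.add(room)
        return room


class ImportRoomsUseCase:
    # A composite use case built on top of the simpler one
    def __init__(self, add_room_use_case):
        self.add_room = add_room_use_case

    def execute(self, rooms):
        # Delegate each single addition to the smaller use case
        return [self.add_room.execute(room) for room in rooms]


class FakeRepo:
    # Minimal in-memory stand-in for a repository
    def __init__(self):
        self.rooms = []

    def add(self, room):
        self.rooms.append(room)


repo = FakeRepo()
importer = ImportRoomsUseCase(AddRoomUseCase(repo))
importer.execute(['room 1', 'room 2'])

assert repo.rooms == ['room 1', 'room 2']
```

Each small use case stays individually testable, and the composite one can be tested by mocking the smaller ones.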
External systems
This part of the architecture is made of external systems that implement the interfaces defined in the previous layer. Examples of these systems can be a specific framework that exposes an HTTP API, or a specific database.
Project overview
The goal of the “Rent-o-matic” project (fans of “Day of the Tentacle” may get the reference) is to create a simple search engine on top of a dataset of objects which are described by some quantities. The search engine shall allow the user to set some filters to narrow the search.
The objects in the dataset are houses for rent described by the following quantities:
• A unique identifier
• A size in square meters
• A renting price in Euro/day
• Latitude and longitude
The description of the house is purposely minimal, so that the whole project can easily fit in a
chapter. The concepts that I will show are however easily extendable to more complex cases.
Following the clean architecture model, we are interested in separating the different layers of the system.
I will follow the TDD methodology, but I will not show all the single steps to avoid this chapter
becoming too long.
Remember that there are multiple ways to implement the clean architecture concepts, and the code
you can come up with strongly depends on what your language of choice allows you to do. The
following is an example of clean architecture in Python, and the implementation of the models, use
cases and other components that I will show is just one of the possible solutions.
The full project is available on GitHub⁵⁹.
⁵⁹https://github.com/pycabook/rentomatic
Project setup
Follow the instructions I gave in the first chapter and create a virtual environment for the project,
install Cookiecutter, and then create a project using the recommended template. For this first project
use the name rentomatic as I did, so you can use the same code that I will show without having
to change the name of the imported modules. You also want to use pytest, so answer yes to that
question.
After you created the project, install the requirements with

pip install -r requirements/dev.txt
Try to run py.test -svv to check that everything is working correctly, and then remove the files
tests/test_rentomatic.py and rentomatic/rentomatic.py.
In this chapter I will not explicitly state that I run the test suite, as I consider it part of the standard workflow. Every time you write a test you should run the suite and check that you get an error (or more), and the code that I give as a solution should make the test suite pass. You are free to try to implement your own code before copying my solution, obviously.
Domain models
Let us start with a simple definition of the Room model. As said before, the clean architecture models are very lightweight, or at least they are lighter than their counterparts in common web frameworks. Following the TDD methodology, the first thing that I write is the tests. Create the file tests/domain/test_room.py and put this code inside it
import uuid

from rentomatic.domain import room as r


def test_room_model_init():
    code = uuid.uuid4()
    room = r.Room(code, size=200, price=10,
                  longitude=-0.09998975,
                  latitude=51.75436293)

    assert room.code == code
    assert room.size == 200
    assert room.price == 10
    assert room.longitude == -0.09998975
    assert room.latitude == 51.75436293
Remember to create an __init__.py file in every subdirectory of tests/, so in this case create tests/domain/__init__.py. This test ensures that the model can be initialised with the correct values. All the parameters of the model are mandatory. Later we may want to make some of them optional, and in that case we will have to add the relevant tests.
Now let’s write the Room class in the rentomatic/domain/room.py file.
class Room:
    def __init__(self, code, size, price, longitude, latitude):
        self.code = code
        self.size = size
        self.price = price
        self.latitude = latitude
        self.longitude = longitude
The model is very simple, and requires no further explanation. Given that we will receive data to
initialise this model from other layers, and that this data is likely to be a dictionary, it is useful to
create a method that initialises the model from this type of structure. The test for this method is
def test_room_model_from_dict():
    code = uuid.uuid4()
    room = r.Room.from_dict(
        {
            'code': code,
            'size': 200,
            'price': 10,
            'longitude': -0.09998975,
            'latitude': 51.75436293
        }
    )

    assert room.code == code
    assert room.size == 200
    assert room.price == 10
    assert room.longitude == -0.09998975
    assert room.latitude == 51.75436293
The class method that makes this test pass is

    @classmethod
    def from_dict(cls, adict):
        return cls(
            code=adict['code'],
            size=adict['size'],
            price=adict['price'],
            latitude=adict['latitude'],
            longitude=adict['longitude'],
        )
As you can see, one of the benefits of a clean architecture is that each layer contains small pieces of code that, being isolated, perform simple tasks. In this case the model provides an initialisation API and stores the information inside the class.
It is often useful to compare models, and we will use this feature later in the project. The comparison operator can be added to any Python object through the __eq__ method, which receives another object and returns either True or False. Comparing the Room fields one by one might however result in a very long chain of statements, so the first thing I will do is to write a method to convert the object into a dictionary. The test goes in tests/domain/test_room.py
def test_room_model_to_dict():
    room_dict = {
        'code': uuid.uuid4(),
        'size': 200,
        'price': 10,
        'longitude': -0.09998975,
        'latitude': 51.75436293
    }

    room = r.Room.from_dict(room_dict)

    assert room.to_dict() == room_dict
⁶¹https://github.com/pycabook/rentomatic/tree/chapter-2-domain-models-step-2
The method that makes this test pass is

    def to_dict(self):
        return {
            'code': self.code,
            'size': self.size,
            'price': self.price,
            'latitude': self.latitude,
            'longitude': self.longitude,
        }
Note that this is not yet a serialisation of the object, as the result is still a Python data structure and
not a string.
At this point writing the comparison operator is very simple. The test goes in the same file as the
previous test
def test_room_model_comparison():
    room_dict = {
        'code': uuid.uuid4(),
        'size': 200,
        'price': 10,
        'longitude': -0.09998975,
        'latitude': 51.75436293
    }

    room1 = r.Room.from_dict(room_dict)
    room2 = r.Room.from_dict(room_dict)

    assert room1 == room2
⁶²https://github.com/pycabook/rentomatic/tree/chapter-2-domain-models-step-3
⁶³https://github.com/pycabook/rentomatic/tree/chapter-2-domain-models-step-4
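One possible implementation that makes this test pass delegates the comparison to the to_dict method introduced above. The following sketch repeats the whole class so that it is self-contained; your implementation (and the one in the repository linked above) may differ in details:

```python
import uuid


class Room:
    def __init__(self, code, size, price, longitude, latitude):
        self.code = code
        self.size = size
        self.price = price
        self.latitude = latitude
        self.longitude = longitude

    @classmethod
    def from_dict(cls, adict):
        return cls(
            code=adict['code'],
            size=adict['size'],
            price=adict['price'],
            latitude=adict['latitude'],
            longitude=adict['longitude'],
        )

    def to_dict(self):
        return {
            'code': self.code,
            'size': self.size,
            'price': self.price,
            'latitude': self.latitude,
            'longitude': self.longitude,
        }

    def __eq__(self, other):
        # Two rooms are equal when all their fields match; comparing
        # the dictionary representations avoids a long chain of
        # field-by-field comparisons.
        return self.to_dict() == other.to_dict()


room_dict = {
    'code': uuid.uuid4(),
    'size': 200,
    'price': 10,
    'longitude': -0.09998975,
    'latitude': 51.75436293,
}

assert Room.from_dict(room_dict) == Room.from_dict(room_dict)
```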
Serializers
Outer layers can use the Room model, but if you want to return the model as the result of an API call you need a serializer.
The typical serialization format is JSON, as this is a broadly accepted standard for web-based APIs. The serializer is not part of the model; it is an external specialized class that receives the model instance and produces a representation of its structure and values.
To test the JSON serialization of our Room class, put the following code in the file tests/serializers/test_room_json_serializer.py
import json
import uuid

from rentomatic.serializers import room_json_serializer as ser
from rentomatic.domain import room as r


def test_serialize_domain_room():
    code = uuid.uuid4()

    room = r.Room(
        code=code,
        size=200,
        price=10,
        longitude=-0.09998975,
        latitude=51.75436293
    )

    expected_json = """
        {{
            "code": "{}",
            "size": 200,
            "price": 10,
            "longitude": -0.09998975,
            "latitude": 51.75436293
        }}
        """.format(code)

    json_room = json.dumps(room, cls=ser.RoomJsonEncoder)

    assert json.loads(json_room) == json.loads(expected_json)
Here, we create the Room object and write the expected JSON output (with some annoying escape sequences like {{ and }} due to the clash with the {} syntax of the Python string format method). Then we dump the Room object to a JSON string and compare the two. To compare them we load them again into Python dictionaries, to avoid issues with the order of the attributes. Comparing Python dictionaries, indeed, doesn’t consider the order of the dictionary fields, while comparing strings obviously does.
Put in the rentomatic/serializers/room_json_serializer.py file the code that makes the test pass
import json


class RoomJsonEncoder(json.JSONEncoder):

    def default(self, o):
        try:
            to_serialize = {
                'code': str(o.code),
                'size': o.size,
                'price': o.price,
                'latitude': o.latitude,
                'longitude': o.longitude,
            }
            return to_serialize
        except AttributeError:
            return super().default(o)
Providing a class that inherits from json.JSONEncoder lets us use the json.dumps(room, cls=RoomJsonEncoder) syntax to serialize the model. Note that we are not using the to_dict method, as the UUID code is not directly JSON serialisable. This means that there is a slight degree of code repetition in the two classes, which in my opinion is acceptable, being covered by tests. If you prefer, however, you can call the to_dict method and then adjust the code field with the str conversion.
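The alternative mentioned in the last sentence can be sketched as follows. This is a self-contained example with a minimal Room class, not the version used in the rest of the chapter:

```python
import json
import uuid


class Room:
    def __init__(self, code, size, price, longitude, latitude):
        self.code = code
        self.size = size
        self.price = price
        self.latitude = latitude
        self.longitude = longitude

    def to_dict(self):
        return {
            'code': self.code,
            'size': self.size,
            'price': self.price,
            'latitude': self.latitude,
            'longitude': self.longitude,
        }


class RoomJsonEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            # Reuse to_dict and fix up the only field that is not
            # directly JSON serialisable, the UUID code
            adict = o.to_dict()
            adict['code'] = str(adict['code'])
            return adict
        except AttributeError:
            return super().default(o)


code = uuid.uuid4()
room = Room(code=code, size=200, price=10,
            longitude=-0.09998975, latitude=51.75436293)

json_room = json.dumps(room, cls=RoomJsonEncoder)
```

This removes the repetition at the cost of a small post-processing step on the dictionary.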
Use cases
It’s time to implement the actual business logic that runs inside our application. Use cases are the place where this happens, and they might or might not be directly linked to the external API of the system.
⁶⁴https://github.com/pycabook/rentomatic/tree/chapter-2-serializers
The simplest use case we can create is one that fetches all the rooms stored in the repository and returns them. In this first part we will not implement the filters to narrow the search; that part will be introduced in the next chapter, when we discuss error management.
The repository is our storage component, and according to the clean architecture it will be implemented in an outer layer (external systems). We will access it as an interface, which in Python means that we will receive an object that we expect will expose a certain API. From the testing point of view the best way to run code that accesses an interface is to mock the latter. Put this code in the file tests/use_cases/test_room_list_use_case.py
I will make use of pytest’s powerful fixtures, but I will not introduce them. I highly recommend
reading the official documentation⁶⁵, which is very good and covers many different use cases.
import pytest
import uuid
from unittest import mock

from rentomatic.domain import room as r
from rentomatic.use_cases import room_list_use_case as uc


@pytest.fixture
def domain_rooms():
    room_1 = r.Room(
        code=uuid.uuid4(),
        size=215,
        price=39,
        longitude=-0.09998975,
        latitude=51.75436293,
    )

    room_2 = r.Room(
        code=uuid.uuid4(),
        size=405,
        price=66,
        longitude=0.18228006,
        latitude=51.74640997,
    )

    room_3 = r.Room(
        code=uuid.uuid4(),
        size=56,
        price=60,
        longitude=0.27891577,
        latitude=51.45994069,
    )

    room_4 = r.Room(
        code=uuid.uuid4(),
        size=93,
        price=48,
        longitude=0.33894476,
        latitude=51.39916678,
    )

    return [room_1, room_2, room_3, room_4]
⁶⁵https://docs.pytest.org/en/latest/fixture.html
def test_room_list_without_parameters(domain_rooms):
    repo = mock.Mock()
    repo.list.return_value = domain_rooms

    room_list_use_case = uc.RoomListUseCase(repo)
    result = room_list_use_case.execute()

    repo.list.assert_called_with()
    assert result == domain_rooms
The test is straightforward. First we mock the repository so that it provides a list method that
returns the list of models we created above the test. Then we initialise the use case with the repository
and execute it, collecting the result. The first thing we check is that the repository method was called
without any parameter, and the second is the effective correctness of the result.
Calling the list method of the repository is an outgoing query action that the use case is supposed
to perform, and according to the unit testing rules we should not test outgoing queries. We should
however test how our system runs the outgoing query, that is the parameters used to run the query.
Put the implementation of the use case in the file rentomatic/use_cases/room_list_use_case.py

class RoomListUseCase:
    def __init__(self, repo):
        self.repo = repo

    def execute(self):
        return self.repo.list()
This might seem too simple, but this particular use case is just a wrapper around a specific function of the repository. As a matter of fact, this use case doesn’t contain any error checks, which is something we haven’t taken into account yet. In the next chapter we will discuss requests and responses, and the use case will become slightly more complicated.
import pytest

from rentomatic.repository import memrepo


@pytest.fixture
def room_dicts():
    return [
        {
            'code': 'f853578c-fc0f-4e65-81b8-566c5dffa35a',
            'size': 215,
            'price': 39,
            'longitude': -0.09998975,
            'latitude': 51.75436293,
        },
        {
            'code': 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a',
            'size': 405,
            'price': 66,
            'longitude': 0.18228006,
            'latitude': 51.74640997,
        },
        {
            'code': '913694c6-435a-4366-ba0d-da5334a611b2',
            'size': 56,
            'price': 60,
            'longitude': 0.27891577,
            'latitude': 51.45994069,
        },
        {
            'code': 'eed76e77-55c1-41ce-985d-ca49bf6c0585',
            'size': 93,
            'price': 48,
            'longitude': 0.33894476,
            'latitude': 51.39916678,
        }
    ]
def test_repository_list_without_parameters(room_dicts):
    repo = memrepo.MemRepo(room_dicts)

    assert [room.to_dict() for room in repo.list()] == room_dicts
In this case we need a single test that checks the behaviour of the list method. The implementation
that passes the test goes in the file rentomatic/repository/memrepo.py
from rentomatic.domain import room as r


class MemRepo:
    def __init__(self, data):
        self.data = data

    def list(self):
        return [r.Room.from_dict(i) for i in self.data]
You can easily imagine this class being the wrapper around a real database or any other storage
type. While the code might become more complex, the structure of the repository is the same, with
a single public method list. I will dig into database repositories in a later chapter.
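As a sketch of what such a wrapper might look like, consider the following hypothetical class (it is not part of the Rent-o-matic project, and it returns plain dictionaries for brevity, while a real implementation would build Room models exactly as MemRepo does):

```python
import sqlite3


class SqliteRepo:
    def __init__(self, connection):
        self.connection = connection

    def list(self):
        # Same single-method interface as MemRepo, but backed by a
        # database instead of an in-memory list of dictionaries
        cursor = self.connection.execute(
            'SELECT code, size, price, longitude, latitude FROM rooms')
        return [
            {'code': code, 'size': size, 'price': price,
             'longitude': longitude, 'latitude': latitude}
            for code, size, price, longitude, latitude in cursor
        ]


# Quick demonstration with an in-memory database
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE rooms '
             '(code TEXT, size INTEGER, price INTEGER, '
             'longitude REAL, latitude REAL)')
conn.execute("INSERT INTO rooms VALUES "
             "('f853578c-fc0f-4e65-81b8-566c5dffa35a', "
             "215, 39, -0.09998975, 51.75436293)")

repo = SqliteRepo(conn)

assert repo.list()[0]['size'] == 215
```

Since both classes expose the same list method, the use case that receives the repository does not need to change.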
#!/usr/bin/env python

from rentomatic.repository import memrepo as mr
from rentomatic.use_cases import room_list_use_case as uc

repo = mr.MemRepo([])
use_case = uc.RoomListUseCase(repo)
result = use_case.execute()

print(result)
You can execute this file with python cli.py or, if you prefer, run chmod +x cli.py (which makes it executable) and then run it with ./cli.py directly. The expected result is an empty list
$ ./cli.py
[]
which is correct, as the MemRepo class in the cli.py file has been initialised with an empty list. The simple in-memory storage that we use has no persistence, so every time we create it we have to load some data in it. This has been done to keep the storage layer simple, but keep in mind that if the storage were a proper database this part of the code would connect to it, and there would be no need to load data into it.
The important parts of the script are the three lines
repo = mr.MemRepo([])
use_case = uc.RoomListUseCase(repo)
result = use_case.execute()
which initialise the repository, use it to initialise the use case, and run the latter. This is in general how you end up using your clean architecture in whatever external system you plug into it: you initialise other systems, you initialise the use case, and you collect the results.
For the sake of demonstration, let’s define some data in the file and load them in the repository
⁶⁸https://github.com/pycabook/rentomatic/tree/chapter-2-command-line-interface-step-1
#!/usr/bin/env python

from rentomatic.repository import memrepo as mr
from rentomatic.use_cases import room_list_use_case as uc

room1 = {
    'code': 'f853578c-fc0f-4e65-81b8-566c5dffa35a',
    'size': 215,
    'price': 39,
    'longitude': -0.09998975,
    'latitude': 51.75436293,
}

room2 = {
    'code': 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a',
    'size': 405,
    'price': 66,
    'longitude': 0.18228006,
    'latitude': 51.74640997,
}

room3 = {
    'code': '913694c6-435a-4366-ba0d-da5334a611b2',
    'size': 56,
    'price': 60,
    'longitude': 0.27891577,
    'latitude': 51.45994069,
}

repo = mr.MemRepo([room1, room2, room3])
use_case = uc.RoomListUseCase(repo)
result = use_case.execute()

print([room.to_dict() for room in result])
Again, remember that this is due to the trivial nature of our storage, and not to the architecture of the system. Note that I changed the print instruction, as the repository returns domain models, and printing them directly would result in a list of strings like <rentomatic.domain.room.Room object at 0x7fb815ec04e0>, which is not really helpful.
⁶⁹https://github.com/pycabook/rentomatic/tree/chapter-2-command-line-interface-step-2
If you run the command line tool now, you will get a richer result than before
$ ./cli.py
[{'code': 'f853578c-fc0f-4e65-81b8-566c5dffa35a', 'size': 215, 'price': 39, 'latitude': 51.75436293, 'longitude': -0.09998975},
 {'code': 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a', 'size': 405, 'price': 66, 'latitude': 51.74640997, 'longitude': 0.18228006},
 {'code': '913694c6-435a-4366-ba0d-da5334a611b2', 'size': 56, 'price': 60, 'latitude': 51.45994069, 'longitude': 0.27891577}]
HTTP API
In this section I will go through the creation of an HTTP endpoint for the room list use case. An
HTTP endpoint is a URL exposed by a Web server that runs a specific logic and returns values, often
formatted as JSON, which is a widely used format for this type of API.
The semantics of URLs, that is their structure and the requests they can accept, comes from the REST recommendations. REST is however not part of the clean architecture, which means that you can choose to model your URLs according to whatever scheme you prefer.
To expose the HTTP endpoint we need a web server written in Python, and in this case I chose
Flask. Flask is a lightweight web server with a modular structure that provides just the parts that
the user needs. In particular, we will not use any database/ORM, since we already implemented our
own repository layer. The clean architecture works perfectly with other frameworks, like Django,
web2py, Pylons, and so on.
Let us start updating the requirements files. The requirements/prod.txt file shall contain Flask, as
this package contains a script that runs a local webserver that we can use to expose the endpoint
Flask
The requirements/test.txt file will contain the pytest extension to work with Flask (more on this
later)
-r prod.txt
pytest
tox
coverage
pytest-cov
pytest-flask
Remember to run pip install -r requirements/dev.txt again after those changes to install the
new packages in your virtual environment.
The setup of a Flask application is not complex, but a lot of concepts are involved, and since this
is not a tutorial on Flask I will run quickly through these steps. I will however provide links to the
Flask documentation for every concept.
I usually define different configurations for my testing, development, and production environments.
Since the Flask application can be configured using a plain Python object (documentation⁷¹), I created
the file rentomatic/flask_settings.py to host those objects
class Config(object):
    """Base configuration."""


class ProdConfig(Config):
    """Production configuration."""

    ENV = 'production'
    DEBUG = False


class DevConfig(Config):
    """Development configuration."""

    ENV = 'development'
    DEBUG = True


class TestConfig(Config):
    """Test configuration."""

    ENV = 'test'
    TESTING = True
    DEBUG = True
⁷⁰https://github.com/pycabook/rentomatic/tree/chapter-2-http-api-step-1
⁷¹http://flask.pocoo.org/docs/latest/api/#flask.Config.from_object
from flask import Flask

from rentomatic.flask_settings import DevConfig
from rentomatic.rest import room


def create_app(config_object=DevConfig):
    app = Flask(__name__)
    app.config.from_object(config_object)
    app.register_blueprint(room.blueprint)

    return app
Before we create the proper setup of the webserver we want to create the endpoint that will be exposed. Endpoints are ultimately functions that are run when a user sends a request to a certain URL, so we can still work with TDD, as the final goal is to have code that produces certain results.
The problem we have testing an endpoint is that we need the webserver to be up and running when we hit the test URLs. This time the webserver is not an external system that we can mock to test the correct use of its API, but part of our system, so we need to run it. This is what the pytest-flask extension provides, in the form of pytest fixtures, in particular the client fixture.
This fixture hides a lot of automation, so it might be considered a bit “magic” at first glance. When you install the pytest-flask extension the fixture is available automatically, so you don’t need to import it. Moreover, it tries to access another fixture named app that you have to define. This is thus the first thing to do.
Fixtures can be defined directly in your tests file, but if we want a fixture to be globally available the best place to define it is the file conftest.py, which is automatically loaded by pytest. As you can see there is a great deal of automation, and if you are not aware of it you might be surprised by the results, or frustrated by the errors.
Let’s create the file tests/conftest.py
⁷²http://flask.pocoo.org/docs/latest/config/
⁷³http://flask.pocoo.org/docs/latest/patterns/appfactories/
⁷⁴http://flask.pocoo.org/docs/latest/blueprints/
import pytest

from rentomatic.flask_settings import TestConfig
from rentomatic.app import create_app  # assuming the app factory lives in rentomatic/app.py


@pytest.yield_fixture(scope='function')
def app():
    return create_app(TestConfig)
First of all, the fixture has been defined with the scope of a function, which means that it will be recreated for each test. This is good, as tests should be isolated, and we do not want to reuse an application that another test has already tainted.
The function itself runs the app factory to create a Flask app, using the TestConfig configuration
from flask_settings, which sets the TESTING flag to True. You can find the description of these
flags in the official documentation⁷⁵.
At this point we can write the test for our endpoint. Create the file tests/rest/test_get_rooms_list.py
import json
from unittest import mock

from rentomatic.domain.room import Room

room_dict = {
    'code': '3251a5bd-86be-428d-8ae9-6e51a8048c33',
    'size': 200,
    'price': 10,
    'longitude': -0.09998975,
    'latitude': 51.75436293
}

room = Room.from_dict(room_dict)

rooms = [room]


@mock.patch('rentomatic.use_cases.room_list_use_case.RoomListUseCase')
def test_get(mock_use_case, client):
    mock_use_case().execute.return_value = rooms

    http_response = client.get('/rooms')

    assert json.loads(http_response.data.decode('UTF-8')) == [room_dict]
    mock_use_case().execute.assert_called_with()
    assert http_response.status_code == 200
    assert http_response.mimetype == 'application/json'
⁷⁵http://flask.pocoo.org/docs/1.0/config/
import json
from unittest import mock

from rentomatic.domain.room import Room

room_dict = {
    'code': '3251a5bd-86be-428d-8ae9-6e51a8048c33',
    'size': 200,
    'price': 10,
    'longitude': -0.09998975,
    'latitude': 51.75436293
}

room = Room.from_dict(room_dict)

rooms = [room]
The first part contains imports and sets up a room from a dictionary. This way we can later directly
compare the content of the initial dictionary with the result of the API endpoint. Remember that
the API returns JSON content, and we can easily convert JSON data into simple Python structures,
so starting from a dictionary can come in handy.
@mock.patch('rentomatic.use_cases.room_list_use_case.RoomListUseCase')
def test_get(mock_use_case, client):
This is the only test that we have for the time being. During the whole test we mock the use case, as we are not interested in running it. We are however interested in checking the arguments it is called with, and a mock can provide this information. The test receives the mock from the patch decorator and client, which is one of the fixtures provided by pytest-flask. The client fixture automatically loads the app one, which we defined in conftest.py, and is an object that simulates an HTTP client that can access the API endpoints and store the responses of the server.
    mock_use_case().execute.return_value = rooms

    http_response = client.get('/rooms')
The first line initialises the execute method of the mock. Pay attention that execute is run on an instance of the RoomListUseCase class, and not on the class itself, which is why we call the mock (mock_use_case()) before accessing the method.
The central part of the test is the line where we get the API endpoint, which sends an HTTP GET request and collects the server’s response.
After this we check that the data contained in the response is actually a JSON that represents the
room_dict structure, that the execute method has been called without any parameters, that the
HTTP response status code is 200, and last that the server sends the correct mimetype back.
It’s time to write the endpoint, where we will finally see all the pieces of the architecture working
together. Let me show you a template for the minimal Flask endpoint we can create
@blueprint.route('/rooms', methods=['GET'])
def room():
    [LOGIC]
    return Response([JSON DATA],
                    mimetype='application/json',
                    status=[STATUS])
As you can see the structure is really simple. Apart from setting up the blueprint, which is the way Flask registers endpoints, we create a simple function that runs the endpoint, and we decorate it, assigning it the /rooms route that serves GET requests. The function will run some logic and eventually return a Response that contains JSON data, the correct mimetype, and an HTTP status that represents the success or failure of the logic.
The above template becomes the following code that you can put in rentomatic/rest/room.py ⁷⁶
⁷⁶The Rent-o-matic rest/room is obviously connected with Day of the Tentacle’s Chron-O-John
import json

from flask import Blueprint, Response

from rentomatic.repository import memrepo as mr
from rentomatic.use_cases import room_list_use_case as uc
from rentomatic.serializers import room_json_serializer as ser

blueprint = Blueprint('room', __name__)

room1 = {
    'code': 'f853578c-fc0f-4e65-81b8-566c5dffa35a',
    'size': 215,
    'price': 39,
    'longitude': -0.09998975,
    'latitude': 51.75436293,
}

room2 = {
    'code': 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a',
    'size': 405,
    'price': 66,
    'longitude': 0.18228006,
    'latitude': 51.74640997,
}

room3 = {
    'code': '913694c6-435a-4366-ba0d-da5334a611b2',
    'size': 56,
    'price': 60,
    'longitude': 0.27891577,
    'latitude': 51.45994069,
}
@blueprint.route('/rooms', methods=['GET'])
def room():
    repo = mr.MemRepo([room1, room2, room3])
    use_case = uc.RoomListUseCase(repo)
    result = use_case.execute()

    return Response(json.dumps(result, cls=ser.RoomJsonEncoder),
                    mimetype='application/json',
                    status=200)
As I did before, I initialised the memory storage with some data to give the use case something to return. Please note that the code that runs the use case is

repo = mr.MemRepo([room1, room2, room3])
use_case = uc.RoomListUseCase(repo)
result = use_case.execute()

which is exactly the same code that we ran in the command line interface. The rest of the code creates a proper HTTP response, serializing the result of the use case using the specific serializer that matches the domain model, and setting the HTTP status to 200 (success).
This shows you the power of the clean architecture in a nutshell. Writing a CLI interface or a Web
service is different only in the presentation layer, not in the logic, which is contained in the use case.
Now that we defined the endpoint we can finalise the configuration of the webserver, so that we can
access the endpoint with a browser. This is not strictly part of the clean architecture, but as already
happened for the CLI interface I want you to see the final result, to get the whole picture and also
to enjoy the effort you put in following the whole discussion up to this point.
Python web applications expose a common interface called Web Server Gateway Interface⁷⁸, or WSGI. So to run the Flask development web server we have to define a wsgi.py file in the main folder of the project, i.e. in the same directory as the cli.py file
from rentomatic.app import create_app  # assuming the app factory lives in rentomatic/app.py

app = create_app()
⁷⁷https://github.com/pycabook/rentomatic/tree/chapter-2-http-api-step-2
⁷⁸https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface
⁷⁹https://github.com/pycabook/rentomatic/tree/chapter-2-http-api-step-3
Chapter 2 - A basic example 100
When the Flask Command Line Interface (http://flask.pocoo.org/docs/1.0/cli/) runs, it looks for a file named wsgi.py and loads it, expecting it to contain an app variable that is an instance of the Flask object. As create_app is a factory function, we just need to call it.
At this point you can execute flask run in the directory that contains this file. Once the development server is running, point your browser to
http://localhost:5000/rooms
and enjoy the JSON returned by the first endpoint of your web application.
Conclusions
I hope you can now appreciate the power of the layered architecture that we created. We definitely
wrote a lot of code to “just” print out a list of models, but the code we wrote is a skeleton that can
easily be extended and modified. It is also fully tested, which is a part of the implementation that
many software projects struggle with.
The use case I presented is purposely very simple. It doesn’t require any input and it cannot return
error conditions, so the code we wrote completely ignored input validation and error management.
These topics are however extremely important, so we need to discuss how a clean architecture can
deal with them.
Chapter 3 - Error management
You sent them out there and you didn’t even warn them! Why didn’t you warn them, Burke?
- Aliens (1986)
Introduction
In every software project, a great part of the code is dedicated to error management, and this code has to be rock solid. Error management is a complex topic: there is always a corner case we have left out, or a condition we assumed could never fail that eventually does.
In a clean architecture, the main process is the creation of use cases and their execution. This is
therefore the main source of errors, and the use cases layer is where we have to implement the error
management. Errors can obviously come from the domain models layer, but since those models are
created by the use cases the errors that are not managed by the models themselves automatically
become errors of the use cases.
To start working on possible errors and understand how to manage them, I will expand the
RoomListUseCase to support filters that can be used to select a subset of the Room objects in the
storage.
The filters argument could be for example a dictionary that contains attributes of the Room model
and the thresholds to apply to them. Once we accept such a rich structure, we open our use case to
all sorts of errors: attributes that do not exist in the Room model, thresholds of the wrong type, filters
that make the storage layer crash, and so on. All these considerations have to be taken into account
by the use case.
In particular, we can divide the error management code into two different areas. The first one represents
and manages requests, that is the input data that reaches our use case. The second one covers the
way we return results from the use case through responses, the output data. These two concepts
shouldn’t be confused with HTTP requests and responses, even though there are similarities. We
are considering here the way data can be passed to and received from use cases, and how to manage
errors. This has nothing to do with a possible use of this architecture to expose an HTTP API.
Request and response objects are an important part of a clean architecture, as they transport call
parameters, inputs and results from outside the application into the use cases layer.
More specifically, requests are objects created from incoming API calls, thus they shall deal with
things like incorrect values, missing parameters, wrong formats, and so on. Responses, on the other
hand, have to contain the actual results of the API calls, but shall also be able to represent error cases
and to deliver rich information on what happened.
The actual implementation of request and response objects is completely free, the clean architecture
says nothing about them. The decision on how to pack and represent data is up to us.
def test_build_room_list_request_object_without_parameters():
    request = req.RoomListRequestObject()

    assert bool(request) is True


def test_build_room_list_request_object_from_empty_dict():
    request = req.RoomListRequestObject.from_dict({})

    assert bool(request) is True
While at the moment this request object is basically empty, it will come in handy as soon as we start
having parameters for the list use case. The code of the RoomListRequestObject is the following and
goes into the rentomatic/request_objects/room_list_request_object.py file
class RoomListRequestObject:
    @classmethod
    def from_dict(cls, adict):
        return cls()

    def __bool__(self):
        return True
⁸⁰https://github.com/pycabook/rentomatic/tree/chapter-3-basic-requests-and-responses-step-1
The response object is also very simple, since for the moment we just need to return a successful
result. Unlike the request, the response is not linked to any particular use case, so the test file can be
named tests/response_objects/test_response_objects.py
def test_response_success_is_true():
    assert bool(res.ResponseSuccess()) is True

class ResponseSuccess:
    def __bool__(self):
        return True
With these two objects we have just laid the foundations for a richer management of the inputs and outputs of the use case, especially in the case of error conditions.
⁸¹https://github.com/pycabook/rentomatic/tree/chapter-3-basic-requests-and-responses-step-2
import pytest
import uuid
from unittest import mock
@pytest.fixture
def domain_rooms():
    room_1 = r.Room(
        code=uuid.uuid4(),
        size=215,
        price=39,
        longitude=-0.09998975,
        latitude=51.75436293,
    )

    room_2 = r.Room(
        code=uuid.uuid4(),
        size=405,
        price=66,
        longitude=0.18228006,
        latitude=51.74640997,
    )

    room_3 = r.Room(
        code=uuid.uuid4(),
        size=56,
        price=60,
        longitude=0.27891577,
        latitude=51.45994069,
    )

    room_4 = r.Room(
        code=uuid.uuid4(),
        size=93,
        price=48,
        longitude=0.33894476,
        latitude=51.39916678,
    )

    return [room_1, room_2, room_3, room_4]
def test_room_list_without_parameters(domain_rooms):
    repo = mock.Mock()
    repo.list.return_value = domain_rooms

    room_list_use_case = uc.RoomListUseCase(repo)
    request = req.RoomListRequestObject()

    response = room_list_use_case.execute(request)

    assert bool(response) is True
    assert response.value == domain_rooms

class RoomListUseCase:
    def __init__(self, repo):
        self.repo = repo

    def execute(self, request_object):
        rooms = self.repo.list()
        return res.ResponseSuccess(rooms)
Now we have a standard way to pack input and output values, and the above pattern is valid for
every use case we can create. We are still missing some features however, because so far requests
and responses are not used to perform error management.
Request validation
The filters parameter that we want to add to the use case allows the caller to add conditions
to narrow the results of the model list operation, using a notation <attribute>__<operator>. For
⁸²https://github.com/pycabook/rentomatic/tree/chapter-3-requests-and-responses-in-a-use-case
example specifying filters={'price__lt': 100} should return all the results with a price lower
than 100.
Since the Room model has many attributes the number of possible filters is very high, so for
simplicity’s sake I will consider the following cases:
• The code attribute supports only __eq, which finds the room with the specific code, if it exists
• The price attribute supports __eq, __lt, and __gt
• All other attributes cannot be used in filters
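This acceptance table can be sketched as a small stand-alone helper (hypothetical code, not part of the project's code base), splitting each filter key into its attribute and operator:

```python
# Hypothetical helper illustrating the <attribute>__<operator> notation.
# The accepted attribute/operator pairs mirror the list above.
ACCEPTED_FILTERS = {
    'code': {'eq'},
    'price': {'eq', 'lt', 'gt'},
}


def is_valid_filter(key):
    # 'price__lt' -> attribute 'price', operator 'lt'
    attribute, _, operator = key.partition('__')
    return operator in ACCEPTED_FILTERS.get(attribute, set())


print(is_valid_filter('price__lt'))  # True
print(is_valid_filter('code__gt'))   # False
```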
The first thing to do is to change the request object, starting from the test. The new version of the
tests/request_objects/test_room_list_request_object.py file is the following
import pytest
def test_build_room_list_request_object_without_parameters():
    request = req.RoomListRequestObject()

    assert request.filters is None
    assert bool(request) is True


def test_build_room_list_request_object_from_empty_dict():
    request = req.RoomListRequestObject.from_dict({})

    assert request.filters is None
    assert bool(request) is True


def test_build_room_list_request_object_with_empty_filters():
    request = req.RoomListRequestObject(filters={})

    assert request.filters == {}
    assert bool(request) is True


def test_build_room_list_request_object_from_dict_with_empty_filters():
    request = req.RoomListRequestObject.from_dict({'filters': {}})

    assert request.filters == {}
    assert bool(request) is True
def test_build_room_list_request_object_from_dict_with_filters_wrong():
    request = req.RoomListRequestObject.from_dict({'filters': {'a': 1}})

    assert request.has_errors()
    assert request.errors[0]['parameter'] == 'filters'
    assert bool(request) is False


def test_build_room_list_request_object_from_dict_with_invalid_filters():
    request = req.RoomListRequestObject.from_dict({'filters': 5})

    assert request.has_errors()
    assert request.errors[0]['parameter'] == 'filters'
    assert bool(request) is False
@pytest.mark.parametrize(
    'key',
    ['code__eq', 'price__eq', 'price__lt', 'price__gt']
)
def test_build_room_list_request_object_accepted_filters(key):
    filters = {key: 1}

    request = req.RoomListRequestObject.from_dict({'filters': filters})

    assert request.filters == filters
    assert bool(request) is True


@pytest.mark.parametrize(
    'key',
    ['code__lt', 'code__gt']
)
def test_build_room_list_request_object_rejected_filters(key):
    filters = {key: 1}

    request = req.RoomListRequestObject.from_dict({'filters': filters})

    assert request.has_errors()
    assert request.errors[0]['parameter'] == 'filters'
    assert bool(request) is False
As you can see I added the assert request.filters is None check to the original two tests, then I added six tests for the filters syntax. Remember that if you are following TDD you should add these tests one at a time and change the code accordingly; here I am only showing you the final result of the process.
In particular, note that I used the pytest.mark.parametrize decorator to run the same test on multiple values: the accepted filters in test_build_room_list_request_object_accepted_filters and the filters that we don't consider valid in test_build_room_list_request_object_rejected_filters.
The core idea here is that requests are customised for use cases, so they can contain the logic that validates the arguments used to instantiate them. The request is valid or invalid before it reaches the use case, so it is not the responsibility of the latter to check that the input values are correct or have a proper format.
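The dispatch pattern described here can be reduced to a minimal stand-alone sketch (class and function names are illustrative, not the project's real ones): the use case never inspects the input, it only relies on the truthiness of the request object.

```python
# Minimal sketch of the pattern: requests validate themselves and
# expose the outcome through their truth value.
class InvalidRequest:
    def __bool__(self):
        return False


class ValidRequest:
    def __bool__(self):
        return True


def execute(request):
    # The use case only checks truthiness, not the request internals
    if not request:
        return 'failure'
    return 'success'


print(execute(ValidRequest()))    # success
print(execute(InvalidRequest()))  # failure
```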
To make the tests pass we have to change our RoomListRequestObject class. There are obviously multiple possible solutions that you can come up with, and I recommend that you try to find your own. This is the one I usually employ. The file rentomatic/request_objects/room_list_request_object.py becomes
import collections.abc


class InvalidRequestObject:
    def __init__(self):
        self.errors = []

    def add_error(self, parameter, message):
        self.errors.append({'parameter': parameter, 'message': message})

    def has_errors(self):
        return len(self.errors) > 0

    def __bool__(self):
        return False


class ValidRequestObject:
    @classmethod
    def from_dict(cls, adict):
        raise NotImplementedError

    def __bool__(self):
        return True


class RoomListRequestObject(ValidRequestObject):
    accepted_filters = ['code__eq', 'price__eq', 'price__lt', 'price__gt']

    def __init__(self, filters=None):
        self.filters = filters

    @classmethod
    def from_dict(cls, adict):
        invalid_req = InvalidRequestObject()

        if 'filters' in adict:
            # collections.abc.Mapping replaces the deprecated collections.Mapping
            if not isinstance(adict['filters'], collections.abc.Mapping):
                invalid_req.add_error('filters', 'Is not iterable')
                return invalid_req

            for key in adict['filters']:
                if key not in cls.accepted_filters:
                    invalid_req.add_error(
                        'filters', 'Key {} cannot be used'.format(key))

        if invalid_req.has_errors():
            return invalid_req

        return cls(filters=adict.get('filters', None))
import pytest

from rentomatic.request_objects import room_list_request_object as req
from rentomatic.response_objects import response_objects as res


@pytest.fixture
def response_value():
    return {'key': ['value1', 'value2']}


@pytest.fixture
def response_type():
    return 'ResponseError'


@pytest.fixture
def response_message():
    return 'This is a response error'
def test_response_success_is_true(response_value):
    assert bool(res.ResponseSuccess(response_value)) is True


def test_response_success_has_type_and_value(response_value):
    response = res.ResponseSuccess(response_value)

    assert response.type == res.ResponseSuccess.SUCCESS
    assert response.value == response_value


def test_response_failure_is_false(response_type, response_message):
    assert bool(res.ResponseFailure(response_type, response_message)) is False


def test_response_failure_has_type_and_message(
        response_type, response_message):
    response = res.ResponseFailure(response_type, response_message)

    assert response.type == response_type
    assert response.message == response_message
    assert response.value == {
        'type': response_type, 'message': response_message}


def test_response_failure_initialisation_with_exception(response_type):
    response = res.ResponseFailure(
        response_type, Exception('Just an error message'))

    assert bool(response) is False
    assert response.type == response_type
    assert response.message == "Exception: Just an error message"


def test_response_failure_from_empty_invalid_request_object():
    response = res.ResponseFailure.build_from_invalid_request_object(
        req.InvalidRequestObject())

    assert bool(response) is False
    assert response.type == res.ResponseFailure.PARAMETERS_ERROR


def test_response_failure_from_invalid_request_object_with_errors():
    request_object = req.InvalidRequestObject()
    request_object.add_error('path', 'Is mandatory')
    request_object.add_error('path', "can't be blank")

    response = res.ResponseFailure.build_from_invalid_request_object(
        request_object)

    assert bool(response) is False
    assert response.type == res.ResponseFailure.PARAMETERS_ERROR


def test_response_failure_build_resource_error():
    response = res.ResponseFailure.build_resource_error("test message")

    assert response.type == res.ResponseFailure.RESOURCE_ERROR
    assert response.message == "test message"


def test_response_failure_build_parameters_error():
    response = res.ResponseFailure.build_parameters_error("test message")

    assert response.type == res.ResponseFailure.PARAMETERS_ERROR
    assert response.message == "test message"


def test_response_failure_build_system_error():
    response = res.ResponseFailure.build_system_error("test message")

    assert response.type == res.ResponseFailure.SYSTEM_ERROR
    assert response.message == "test message"
Let’s have a closer look at the tests contained in this file before moving to the code that implements
a solution. The first part contains just the imports and some pytest fixtures to make it easier to write
the tests
import pytest

from rentomatic.request_objects import room_list_request_object as req
from rentomatic.response_objects import response_objects as res


@pytest.fixture
def response_value():
    return {'key': ['value1', 'value2']}


@pytest.fixture
def response_type():
    return 'ResponseError'


@pytest.fixture
def response_message():
    return 'This is a response error'
The first two tests check that ResponseSuccess can be used as a boolean (this test was already
present), that it provides a type, and that it can store a value.
def test_response_success_is_true(response_value):
    assert bool(res.ResponseSuccess(response_value)) is True


def test_response_success_has_type_and_value(response_value):
    response = res.ResponseSuccess(response_value)

    assert response.type == res.ResponseSuccess.SUCCESS
    assert response.value == response_value
The remaining tests are all about ResponseFailure. The first one checks that it behaves like a boolean. The next one checks that it can be initialised with a type and a message, that those values are stored inside the object, and that the class exposes a value attribute that contains both the type and the message.
def test_response_failure_is_false(response_type, response_message):
    assert bool(res.ResponseFailure(response_type, response_message)) is False


def test_response_failure_has_type_and_message(
        response_type, response_message):
    response = res.ResponseFailure(response_type, response_message)

    assert response.type == response_type
    assert response.message == response_message
    assert response.value == {
        'type': response_type, 'message': response_message}
We sometimes want to create responses from Python exceptions that can happen in a use case, so
we test that ResponseFailure objects can be initialised with a generic exception. We also check that
the message is formatted properly
def test_response_failure_initialisation_with_exception(response_type):
    response = res.ResponseFailure(
        response_type, Exception('Just an error message'))

    assert bool(response) is False
    assert response.type == response_type
    assert response.message == "Exception: Just an error message"
We want to be able to build a response directly from an invalid request, getting all the errors
contained in the latter.
def test_response_failure_from_empty_invalid_request_object():
    response = res.ResponseFailure.build_from_invalid_request_object(
        req.InvalidRequestObject())

    assert bool(response) is False
    assert response.type == res.ResponseFailure.PARAMETERS_ERROR


def test_response_failure_from_invalid_request_object_with_errors():
    request_object = req.InvalidRequestObject()
    request_object.add_error('path', 'Is mandatory')
    request_object.add_error('path', "can't be blank")

    response = res.ResponseFailure.build_from_invalid_request_object(
        request_object)

    assert bool(response) is False
    assert response.type == res.ResponseFailure.PARAMETERS_ERROR
The last three tests check that ResponseFailure can create three specific errors, represented by the RESOURCE_ERROR, PARAMETERS_ERROR, and SYSTEM_ERROR class attributes. This categorization is an attempt to capture the different types of issues that can happen when dealing with an external system through an API. RESOURCE_ERROR contains all those errors that are related to the resources contained in the repository, for instance when you cannot find an entry given its unique id. PARAMETERS_ERROR describes all those errors that occur when the request parameters are wrong or missing. SYSTEM_ERROR encompasses the errors that happen in the underlying system at the operating system level, such as a failure in a filesystem operation, or a network connection error while fetching data from the database.
def test_response_failure_build_resource_error():
    response = res.ResponseFailure.build_resource_error("test message")

    assert response.type == res.ResponseFailure.RESOURCE_ERROR
    assert response.message == "test message"


def test_response_failure_build_parameters_error():
    response = res.ResponseFailure.build_parameters_error("test message")

    assert response.type == res.ResponseFailure.PARAMETERS_ERROR
    assert response.message == "test message"


def test_response_failure_build_system_error():
    response = res.ResponseFailure.build_system_error("test message")

    assert response.type == res.ResponseFailure.SYSTEM_ERROR
    assert response.message == "test message"
Let’s write the classes that make the tests pass in rentomatic/response_objects/response_-
objects.py
class ResponseFailure:
    RESOURCE_ERROR = 'ResourceError'
    PARAMETERS_ERROR = 'ParametersError'
    SYSTEM_ERROR = 'SystemError'

    def __init__(self, type_, message):
        self.type = type_
        self.message = self._format_message(message)

    def _format_message(self, msg):
        if isinstance(msg, Exception):
            return "{}: {}".format(msg.__class__.__name__, "{}".format(msg))
        return msg

    @property
    def value(self):
        return {'type': self.type, 'message': self.message}

    def __bool__(self):
        return False

    @classmethod
    def build_from_invalid_request_object(cls, invalid_request_object):
        message = "\n".join(["{}: {}".format(err['parameter'], err['message'])
                             for err in invalid_request_object.errors])
        return cls(cls.PARAMETERS_ERROR, message)

    @classmethod
    def build_resource_error(cls, message=None):
        return cls(cls.RESOURCE_ERROR, message)

    @classmethod
    def build_system_error(cls, message=None):
        return cls(cls.SYSTEM_ERROR, message)

    @classmethod
    def build_parameters_error(cls, message=None):
        return cls(cls.PARAMETERS_ERROR, message)


class ResponseSuccess:
    SUCCESS = 'Success'

    def __init__(self, value=None):
        self.type = self.SUCCESS
        self.value = value

    def __bool__(self):
        return True
Through the _format_message() method we enable the class to accept both string messages and
Python exceptions, which is very handy when dealing with external libraries that can raise
exceptions we do not know or do not want to manage.
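The idea behind _format_message can be tried in isolation (a stand-alone sketch with a hypothetical function name, not the class method itself): accept either a plain string or an exception and normalise it to a string.

```python
# Stand-alone sketch of the idea behind _format_message: accept either
# a plain string or an exception instance and normalise it to a string.
def format_message(msg):
    if isinstance(msg, Exception):
        return "{}: {}".format(msg.__class__.__name__, msg)
    return msg


print(format_message("plain error"))            # plain error
print(format_message(ValueError("bad value")))  # ValueError: bad value
```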
As explained before, the PARAMETERS_ERROR type encompasses all those errors that come from an invalid set of parameters. This is the case for build_from_invalid_request_object, which shall be called whenever the request is wrong, that is when some parameters contain errors or are missing.
def test_room_list_without_parameters(domain_rooms):
    repo = mock.Mock()
    repo.list.return_value = domain_rooms

    room_list_use_case = uc.RoomListUseCase(repo)
    request = req.RoomListRequestObject()

    response = room_list_use_case.execute(request)

    assert bool(response) is True
    assert response.value == domain_rooms
There are three new tests that we can add to check the behaviour of the use case when filters is not None. The first one checks that the value of the filters key in the dictionary used to create the request is actually used when calling the repository. The last two tests check the behaviour of the use case when the repository raises an exception or when the request is badly formatted.
[...]
def test_room_list_with_filters(domain_rooms):
    repo = mock.Mock()
    repo.list.return_value = domain_rooms

    room_list_use_case = uc.RoomListUseCase(repo)
    qry_filters = {'code__eq': 5}
    request_object = req.RoomListRequestObject.from_dict(
        {'filters': qry_filters})

    response_object = room_list_use_case.execute(request_object)

    assert bool(response_object) is True
    repo.list.assert_called_with(filters=qry_filters)
    assert response_object.value == domain_rooms


def test_room_list_handles_generic_error():
    repo = mock.Mock()
    repo.list.side_effect = Exception('Just an error message')

    room_list_use_case = uc.RoomListUseCase(repo)
    request_object = req.RoomListRequestObject.from_dict({})

    response_object = room_list_use_case.execute(request_object)

    assert bool(response_object) is False
    assert response_object.value == {
        'type': res.ResponseFailure.SYSTEM_ERROR,
        'message': "Exception: Just an error message"
    }


def test_room_list_handles_bad_request():
    repo = mock.Mock()

    room_list_use_case = uc.RoomListUseCase(repo)
    request_object = req.RoomListRequestObject.from_dict({'filters': 5})

    response_object = room_list_use_case.execute(request_object)

    assert bool(response_object) is False
    assert response_object.value == {
        'type': res.ResponseFailure.PARAMETERS_ERROR,
        'message': "filters: Is not iterable"
    }
class RoomListUseCase(object):
    def __init__(self, repo):
        self.repo = repo

    def execute(self, request_object):
        if not request_object:
            return res.ResponseFailure.build_from_invalid_request_object(
                request_object)

        try:
            rooms = self.repo.list(filters=request_object.filters)
            return res.ResponseSuccess(rooms)
        except Exception as exc:
            return res.ResponseFailure.build_system_error(
                "{}: {}".format(exc.__class__.__name__, "{}".format(exc)))
As you can see the first thing that the execute() method does is to check if the request is valid,
otherwise it returns a ResponseFailure built with the same request object. Then the actual business
logic is implemented, calling the repository and returning a successful response. If something goes
wrong in this phase the exception is caught and returned as an aptly formatted ResponseFailure.
⁸⁵https://github.com/pycabook/rentomatic/tree/chapter-3-error-management-in-a-use-case
Actually, after the introduction of requests and responses, we didn’t change the REST endpoint,
which is one of the connections between the external world and the use case. Given that the API of
the use case changed, we surely need to change the endpoint code, which calls the use case.
import json
from unittest import mock
room_dict = {
    'code': '3251a5bd-86be-428d-8ae9-6e51a8048c33',
    'size': 200,
    'price': 10,
    'longitude': -0.09998975,
    'latitude': 51.75436293
}

room = Room.from_dict(room_dict)

rooms = [room]
@mock.patch('rentomatic.use_cases.room_list_use_case.RoomListUseCase')
def test_get(mock_use_case, client):
    mock_use_case().execute.return_value = res.ResponseSuccess(rooms)

    http_response = client.get('/rooms')

    assert json.loads(http_response.data.decode('UTF-8')) == [room_dict]

    mock_use_case().execute.assert_called()
    args, kwargs = mock_use_case().execute.call_args
    assert args[0].filters == {}

    assert http_response.status_code == 200


@mock.patch('rentomatic.use_cases.room_list_use_case.RoomListUseCase')
def test_get_with_filters(mock_use_case, client):
    mock_use_case().execute.return_value = res.ResponseSuccess(rooms)

    http_response = client.get(
        '/rooms?filter_price__gt=2&filter_price__lt=6')

    assert json.loads(http_response.data.decode('UTF-8')) == [room_dict]

    mock_use_case().execute.assert_called()
    args, kwargs = mock_use_case().execute.call_args
    assert args[0].filters == {'price__gt': '2', 'price__lt': '6'}

    assert http_response.status_code == 200
The test_get function was already present but has been changed to reflect the use of requests and
responses. The first change is that the execute method in the mock has to return a proper response
[...]
and the second is the assertion on the call of the same method. It should be called with a properly formatted request, but since we can't compare requests directly we need a way to look into the call arguments. This can be done with
mock_use_case().execute.assert_called()
args, kwargs = mock_use_case().execute.call_args

assert args[0].filters == {}
as execute should receive as an argument a request with empty filters. The test_get_with_filters function performs the same operation, but passes a querystring to the /rooms URL, which requires a different assertion
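The call_args technique can be tried in isolation with a throwaway mock (a stand-alone sketch, not part of the project's test suite; the verbose keyword is purely illustrative):

```python
from unittest import mock

# A mock records every call; call_args holds the positional and keyword
# arguments of the most recent one.
m = mock.Mock()
m.execute({'filters': {}}, verbose=True)

args, kwargs = m.execute.call_args
print(args[0])  # {'filters': {}}
print(kwargs)   # {'verbose': True}
```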
Both tests are passed by a new version of the room endpoint in the rentomatic/rest/room.py file
import json

STATUS_CODES = {
    res.ResponseSuccess.SUCCESS: 200,
    res.ResponseFailure.RESOURCE_ERROR: 404,
    res.ResponseFailure.PARAMETERS_ERROR: 400,
    res.ResponseFailure.SYSTEM_ERROR: 500
}
room1 = {
    'code': 'f853578c-fc0f-4e65-81b8-566c5dffa35a',
    'size': 215,
    'price': 39,
    'longitude': -0.09998975,
    'latitude': 51.75436293,
}

room2 = {
    'code': 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a',
    'size': 405,
    'price': 66,
    'longitude': 0.18228006,
    'latitude': 51.74640997,
}

room3 = {
    'code': '913694c6-435a-4366-ba0d-da5334a611b2',
    'size': 56,
    'price': 60,
    'longitude': 0.27891577,
    'latitude': 51.45994069,
}
@blueprint.route('/rooms', methods=['GET'])
def room():
    qrystr_params = {
        'filters': {},
    }

    for arg, values in request.args.items():
        if arg.startswith('filter_'):
            qrystr_params['filters'][arg.replace('filter_', '')] = values

    request_object = req.RoomListRequestObject.from_dict(qrystr_params)

    repo = mr.MemRepo([room1, room2, room3])
    use_case = uc.RoomListUseCase(repo)

    response = use_case.execute(request_object)

    return Response(json.dumps(response.value, cls=ser.RoomJsonEncoder),
                    mimetype='application/json',
                    status=STATUS_CODES[response.type])
⁸⁶https://github.com/pycabook/rentomatic/tree/chapter-3-the-http-server
The repository
If we run the Flask development webserver now and try to access the /rooms endpoint, we will get a response containing an error. If you look at the HTTP response⁸⁷ you can see an HTTP 500 status, which is exactly the mapping of our SystemError use case error, which in turn signals a Python exception, whose text is contained in the message part of the error.
This error comes from the repository, which has not been migrated to the new API. We then need to change the list method of the MemRepo class to accept the filters parameter and act accordingly.
The new version of the tests/repository/test_memrepo.py file is
import pytest


@pytest.fixture
def room_dicts():
    return [
        {
            'code': 'f853578c-fc0f-4e65-81b8-566c5dffa35a',
            'size': 215,
            'price': 39,
            'longitude': -0.09998975,
            'latitude': 51.75436293,
        },
        {
            'code': 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a',
            'size': 405,
            'price': 66,
            'longitude': 0.18228006,
            'latitude': 51.74640997,
        },
        {
            'code': '913694c6-435a-4366-ba0d-da5334a611b2',
            'size': 56,
            'price': 60,
            'longitude': 0.27891577,
            'latitude': 51.45994069,
        },
        {
            'code': 'eed76e77-55c1-41ce-985d-ca49bf6c0585',
            'size': 93,
            'price': 48,
            'longitude': 0.33894476,
            'latitude': 51.39916678,
        }
    ]
⁸⁷For example using the browser developer tools. In Chrome, press F12 and open the Network tab, then refresh the page.
def test_repository_list_without_parameters(room_dicts):
    repo = memrepo.MemRepo(room_dicts)

    repo_rooms = repo.list()

    assert len(repo_rooms) == 4
def test_repository_list_with_code_equal_filter(room_dicts):
    repo = memrepo.MemRepo(room_dicts)

    repo_rooms = repo.list(
        filters={'code__eq': 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a'}
    )

    assert len(repo_rooms) == 1
    assert repo_rooms[0].code == 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a'


def test_repository_list_with_price_equal_filter(room_dicts):
    repo = memrepo.MemRepo(room_dicts)

    repo_rooms = repo.list(
        filters={'price__eq': 60}
    )

    assert len(repo_rooms) == 1
    assert repo_rooms[0].code == '913694c6-435a-4366-ba0d-da5334a611b2'


def test_repository_list_with_price_less_than_filter(room_dicts):
    repo = memrepo.MemRepo(room_dicts)

    repo_rooms = repo.list(
        filters={'price__lt': 60}
    )

    assert len(repo_rooms) == 2
    assert set([r.code for r in repo_rooms]) == {
        'f853578c-fc0f-4e65-81b8-566c5dffa35a',
        'eed76e77-55c1-41ce-985d-ca49bf6c0585'
    }


def test_repository_list_with_price_greater_than_filter(room_dicts):
    repo = memrepo.MemRepo(room_dicts)

    repo_rooms = repo.list(
        filters={'price__gt': 48}
    )

    assert len(repo_rooms) == 2
    assert set([r.code for r in repo_rooms]) == {
        '913694c6-435a-4366-ba0d-da5334a611b2',
        'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a'
    }


def test_repository_list_with_price_between_filter(room_dicts):
    repo = memrepo.MemRepo(room_dicts)

    repo_rooms = repo.list(
        filters={
            'price__lt': 66,
            'price__gt': 48
        }
    )

    assert len(repo_rooms) == 1
    assert repo_rooms[0].code == '913694c6-435a-4366-ba0d-da5334a611b2'
As you can see, I added many tests: one for each of the four accepted filters (code__eq, price__eq, price__lt, price__gt, see rentomatic/request_objects/room_list_request_object.py), and one final test that tries two different filters at the same time. The new version of the rentomatic/repository/memrepo.py file that passes all the tests is
from rentomatic.domain.room import Room


class MemRepo:
    def __init__(self, data):
        self.data = data

    def list(self, filters=None):
        # build Room domain models from the stored dictionaries
        result = [Room.from_dict(i) for i in self.data]

        if filters is None:
            return result

        if 'code__eq' in filters:
            result = [r for r in result if r.code == filters['code__eq']]

        if 'price__eq' in filters:
            result = [r for r in result if r.price == filters['price__eq']]

        if 'price__lt' in filters:
            result = [r for r in result if r.price < filters['price__lt']]

        if 'price__gt' in filters:
            result = [r for r in result if r.price > filters['price__gt']]

        return result
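The chained-filter mechanics of the list method can be seen in isolation with plain dictionaries (a simplified stand-alone sketch, not the repository itself): each supported filter narrows down the result list in turn.

```python
# Stand-alone demo of the chained-filter approach used in MemRepo.list.
rooms = [
    {'code': 'a', 'price': 39},
    {'code': 'b', 'price': 66},
    {'code': 'c', 'price': 60},
]


def apply_filters(items, filters=None):
    result = list(items)
    if filters is None:
        return result
    if 'price__lt' in filters:
        result = [r for r in result if r['price'] < filters['price__lt']]
    if 'price__gt' in filters:
        result = [r for r in result if r['price'] > filters['price__gt']]
    return result


codes = [r['code'] for r in apply_filters(rooms, {'price__lt': 66, 'price__gt': 39})]
print(codes)  # ['c']
```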
At this point you can fire up the Flask development webserver with flask run, and get the list of all your rooms at
http://localhost:5000/rooms
You can also use the filters. For example
http://localhost:5000/rooms?filter_code__eq=f853578c-fc0f-4e65-81b8-566c5dffa35a
returns the single room with that code, while
http://localhost:5000/rooms?filter_price__lt=50
returns all the rooms with a price less than 50.
⁸⁸https://github.com/pycabook/rentomatic/tree/chapter-3-the-repository
Conclusions
We now have a very robust system to manage input validation and error conditions, and it is generic
enough to be used with any possible use case. Obviously we are free to add new types of errors to
increase the granularity with which we manage failures, but the present version already covers
everything that can happen inside a use case.
In the next chapter we will have a look at repositories based on real database engines, showing how
to test external systems with integration tests, and how the clean architecture allows us to simply
switch between very different backends for services.
Chapter 4 - Database repositories
Ooooh, I’m very sorry Hans. I didn’t get that memo. Maybe you should’ve put it on the bulletin
board.
- Die Hard (1988)
The basic in-memory repository I implemented for the project is enough to show the concept of
the repository layer abstraction, and any other type of repository will follow the same idea. In the
spirit of providing a simple but realistic solution, however, I believe it is worth reimplementing the
repository layer with a proper database.
This gives me the chance to show you one of the big advantages of a clean architecture, namely
the simplicity with which you can replace existing components with others, possibly based on a
completely different technology.
Introduction
The clean architecture we devised in the previous chapters defines a use case that receives a repository instance as an argument and uses its list method to retrieve the contained entries. This allows the use case to form a very loose coupling with the repository, being connected only through the API exposed by the object and not to the real implementation. In other words, the use cases are polymorphic with respect to the list method.
This is very important and it is the core of the clean architecture design. Being connected through
an API, the use case and the repository can be replaced by different implementations at any time,
given that the new implementation provides the requested interface.
It is worth noting, for example, that the initialisation of the object is not part of the API that the use cases are using, since the repository is initialised in the main script and not in each use case. The __init__ method, thus, doesn't need to be the same among the repository implementations, which gives us a great deal of flexibility, as different storages may need different initialisation values.
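This flexibility can be sketched with two hypothetical repositories that share the list API but are initialised differently (illustrative names, not the project's real classes):

```python
# Sketch: two repositories with different initialisation but the same
# `list` interface, interchangeable from the use case's point of view.
class InMemoryRepo:
    def __init__(self, data):
        self._data = data

    def list(self, filters=None):
        return list(self._data)


class DatabaseRepo:
    def __init__(self, connection_string):
        self._connection_string = connection_string  # a different __init__

    def list(self, filters=None):
        return []  # a real implementation would query the database here


def run_use_case(repo):
    # The use case never knows which concrete repository it received
    return repo.list()


print(run_use_case(InMemoryRepo([1, 2, 3])))  # [1, 2, 3]
```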
The simple repository we implemented in one of the previous chapters was
from rentomatic.domain.room import Room


class MemRepo:
    def __init__(self, data):
        self.data = data

    def list(self, filters=None):
        result = [Room.from_dict(i) for i in self.data]

        if filters is None:
            return result

        if 'code__eq' in filters:
            result = [r for r in result if r.code == filters['code__eq']]

        if 'price__eq' in filters:
            result = [r for r in result if r.price == filters['price__eq']]

        if 'price__lt' in filters:
            result = [r for r in result if r.price < filters['price__lt']]

        if 'price__gt' in filters:
            result = [r for r in result if r.price > filters['price__gt']]

        return result
whose interface is made of two parts: the initialisation and the list method. The __init__ method accepts values because this specific object doesn't act as long-term storage, so we are forced to pass some data every time we instantiate the class.
A repository based on a proper database will not need to be filled with data when initialised, its
main job being that of storing data between sessions, but will nevertheless need to be initialised at
least with the database address and access credentials.
Furthermore, we have to deal with a proper external system, so we have to devise a strategy to test it,
as this might require a running database engine in the background. Remember that we are creating a
specific implementation of a repository, so everything will be tailored to the actual database system
that we will choose.
import pytest

pytestmark = pytest.mark.integration


def test_dummy():
    pass
The pytestmark module attribute labels every test in the module with the integration tag. To verify that this works I added a test_dummy test function which always passes. You can now run py.test -svv -m integration to ask pytest to run only the tests marked with that label. The -m option supports a rich syntax that you can learn by reading the documentation⁹².
⁸⁹https://www.postgresql.org
⁹⁰https://www.sqlalchemy.org
⁹¹unless you consider things like sessionmaker_mock()().query.assert_called_with(Room) something attractive. And this was by far the
simplest mock I had to write.
⁹²https://docs.pytest.org/en/latest/example/markers.html
While this is enough to run integration tests selectively, it is not enough to skip them by default. To
do this we can alter the pytest setup to label all those tests as skipped, but this will give us no means
to run them. The standard way to implement this is to define a new command line option and to
process each marked test according to the value of this option.
To do this, open the tests/conftest.py file that we already created and add the following code
import pytest


def pytest_addoption(parser):
    parser.addoption("--integration", action="store_true",
                     help="run integration tests")


def pytest_runtest_setup(item):
    if 'integration' in item.keywords and not \
            item.config.getvalue("integration"):
        pytest.skip("need --integration option to run")
The first function is a hook into the pytest CLI parser that adds the --integration option. When
this option is specified on the command line the pytest setup will contain the key integration with
value True.
The second function is a hook into the pytest setup of each single test. The item variable contains the
test itself (actually a _pytest.python.Function object), which in turn contains two useful pieces of
information. The first is the item.keywords attribute, that contains the test marks, alongside many
other interesting things like the name of the test, the file, the module, and also information about
the patches that happen inside the test. The second is the item.config attribute that contains the
parsed pytest command line.
So, if the test is marked with integration ('integration' in item.keywords) and the --integration
option is not present (not item.config.getvalue("integration")) the test is skipped.
from sqlalchemy import Column, Integer, String, Float
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Room(Base):
    __tablename__ = 'room'

    id = Column(Integer, primary_key=True)
    code = Column(String(36), nullable=False)
    size = Column(Integer)
    price = Column(Integer)
    longitude = Column(Float)
    latitude = Column(Float)
Base = declarative_base()
We need to import many things from the SQLAlchemy package to setup the database and to create
the table. Remember that SQLAlchemy has a declarative approach, so we need to instantiate the
Base object and then use it as a starting point to declare the tables/objects.
class Room(Base):
    __tablename__ = 'room'

    id = Column(Integer, primary_key=True)
    code = Column(String(36), nullable=False)
    size = Column(Integer)
    price = Column(Integer)
    longitude = Column(Float)
    latitude = Column(Float)
This is the class that represents the Room in the database. It is important to understand that this is
not the class we are using in the business logic, but the class that we want to map into the SQL
database. The structure of this class is thus dictated by the needs of the storage layer, and not by the
use cases. You might want, for instance, to store longitude and latitude in a JSON field, to allow for
easier extensibility, without changing the definition of the domain model. In the simple case of the
Rent-o-matic project the two classes almost overlap, but this is not the case generally speaking.
Obviously this means that you have to keep in sync the storage and the domain levels, and that
you need to manage migrations on your own. You can obviously use tools like Alembic, but the
migrations will not come directly from domain model changes.
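As a sketch of such a divergence (the JSON layout below is hypothetical, it is not the schema used by Rent-o-matic), the storage layer might pack the coordinates into a single JSON field and convert back to the flat domain representation when reading:

```python
import json

# Hypothetical storage-side layout: coordinates packed into one JSON field.
# This illustrates how storage and domain representations may diverge.

def to_storage(domain_dict):
    # keep every field except the coordinates, which go into 'position'
    row = {k: v for k, v in domain_dict.items()
           if k not in ('longitude', 'latitude')}
    row['position'] = json.dumps({
        'longitude': domain_dict['longitude'],
        'latitude': domain_dict['latitude'],
    })
    return row


def to_domain(row):
    # unpack the JSON field back into the flat domain representation
    domain_dict = {k: v for k, v in row.items() if k != 'position'}
    domain_dict.update(json.loads(row['position']))
    return domain_dict


room = {'code': 'f853578c-fc0f-4e65-81b8-566c5dffa35a', 'size': 215,
        'price': 39, 'longitude': -0.09998975, 'latitude': 51.75436293}
```

The conversion functions live entirely in the repository layer, so the domain model never learns about the JSON field.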
@pytest.fixture(scope='session')
def docker_setup(docker_ip):
return {
'postgres': {
'dbname': 'rentomaticdb',
'user': 'postgres',
'password': 'rentomaticdb',
'host': docker_ip
}
}
This way I have a single source of parameters that I will use to spin up the Docker container, but
also to set up the connection with the container itself during the tests.
The other two fixtures in the same file are the one that creates a temporary file and the one that
creates the configuration for docker-compose, storing it in the previously created file.
import os
import tempfile
import yaml
[...]
@pytest.fixture(scope='session')
def docker_tmpfile():
f = tempfile.mkstemp()
yield f
os.remove(f[1])
@pytest.fixture(scope='session')
def docker_compose_file(docker_tmpfile, docker_setup):
content = {
'version': '3.1',
'services': {
'postgresql': {
'restart': 'always',
'image': 'postgres',
'ports': ["5432:5432"],
'environment': [
'POSTGRES_PASSWORD={}'.format(
docker_setup['postgres']['password']
)
]
}
}
}
f = os.fdopen(docker_tmpfile[0], 'w')
f.write(yaml.dump(content))
f.close()
return docker_tmpfile[1]
The pytest-docker plugin leaves it to us to define a function that checks whether the container
is responsive, as the way to do this depends on the actual system that we are running (in this case
PostgreSQL). I also have to define the final fixture related to docker-compose, which makes use of
everything I defined previously to create the connection with the PostgreSQL database. Both are
defined in tests/repository/postgres/conftest.py
import psycopg2
import sqlalchemy
import sqlalchemy_utils
import pytest


def pg_is_responsive(ip, docker_setup):
    try:
        conn = psycopg2.connect(
            host=ip,
            user=docker_setup['postgres']['user'],
            password=docker_setup['postgres']['password'],
            dbname='postgres'
        )
        conn.close()
        return True
    except psycopg2.OperationalError:
        return False


@pytest.fixture(scope='session')
def pg_engine(docker_ip, docker_services, docker_setup):
    docker_services.wait_until_responsive(
        timeout=30.0, pause=0.1,
        check=lambda: pg_is_responsive(docker_ip, docker_setup)
    )

    conn_str = "postgresql+psycopg2://{}:{}@{}/{}".format(
        docker_setup['postgres']['user'],
        docker_setup['postgres']['password'],
        docker_setup['postgres']['host'],
        docker_setup['postgres']['dbname']
    )
    engine = sqlalchemy.create_engine(conn_str)
    sqlalchemy_utils.create_database(engine.url)
    conn = engine.connect()

    yield engine

    conn.close()
As you can see, the pg_is_responsive function relies on a setup dictionary like the one that we
defined in the docker_setup fixture (the input argument is aptly named the same way) and returns
a boolean after checking whether it is possible to establish a connection with the server.
The second fixture receives docker_services, which spins up docker-compose automatically using
the docker_compose_file fixture I defined previously. The pg_is_responsive function is used to
wait for the container to reach a running state, then a connection is established and the database is
created. To simplify this last operation I imported and used the package sqlalchemy_utils. The
fixture yields the SQLAlchemy engine object, so it can be correctly closed once the session is
finished.
To properly run these fixtures we need to add some requirements. The new requirements/test.txt
file is
-r prod.txt
tox
coverage
pytest
pytest-cov
pytest-flask
pytest-docker
docker-compose
pyyaml
psycopg2
sqlalchemy_utils
Remember to run pip again to actually install the requirements after you have edited the file.
Database fixtures
With the pg_engine fixture we can define higher-level functions such as pg_session_empty that
gives us access to the pristine database, pg_data, which defines some values for the test queries, and
pg_session that creates the rows of the Room table using the previous two fixtures. All these fixtures
will be defined in tests/repository/postgres/conftest.py
[...]
@pytest.fixture(scope='session')
def pg_session_empty(pg_engine):
Base.metadata.create_all(pg_engine)
Base.metadata.bind = pg_engine
DBSession = sqlalchemy.orm.sessionmaker(bind=pg_engine)
session = DBSession()
yield session
session.close()
@pytest.fixture(scope='function')
def pg_data():
return [
{
'code': 'f853578c-fc0f-4e65-81b8-566c5dffa35a',
'size': 215,
'price': 39,
'longitude': -0.09998975,
'latitude': 51.75436293,
},
{
'code': 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a',
'size': 405,
'price': 66,
'longitude': 0.18228006,
'latitude': 51.74640997,
},
{
'code': '913694c6-435a-4366-ba0d-da5334a611b2',
'size': 56,
'price': 60,
'longitude': 0.27891577,
'latitude': 51.45994069,
},
{
'code': 'eed76e77-55c1-41ce-985d-ca49bf6c0585',
'size': 93,
'price': 48,
'longitude': 0.33894476,
'latitude': 51.39916678,
}
]
@pytest.fixture(scope='function')
def pg_session(pg_session_empty, pg_data):
for r in pg_data:
new_room = Room(
code=r['code'],
size=r['size'],
price=r['price'],
longitude=r['longitude'],
latitude=r['latitude']
)
pg_session_empty.add(new_room)
pg_session_empty.commit()
yield pg_session_empty
pg_session_empty.query(Room).delete()
Note that this last fixture has a function scope, thus it is run for every test. Therefore, we delete all
rooms after the yield returns, leaving the database in the same state it had before the test. This is not
strictly necessary in this particular case, as during the tests we are only reading from the database,
so we might add the rooms at the beginning of the test session and just destroy the container at the
end of it. This doesn’t however work in general, for instance when tests add entries to the database,
so I preferred to show you a more generic solution.
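Under the hood a yield-based fixture is just a generator: pytest runs the code before the yield as setup and the code after it as teardown. A plain-Python sketch of the same flow (no pytest involved, names invented for illustration):

```python
# Sketch of the mechanism behind a yield-based fixture: pytest treats the
# fixture function as a generator, running the code before the yield as
# setup and the code after it as teardown.

def rooms_fixture(db):
    db.extend(['room1', 'room2'])   # setup: populate the storage
    yield db                        # value handed over to the test
    db.clear()                      # teardown: restore the pristine state


db = []
gen = rooms_fixture(db)
fixture_value = next(gen)           # setup has run, value is available
snapshot = list(fixture_value)      # what the test would see

try:
    next(gen)                       # resuming the generator runs the teardown
except StopIteration:
    pass
```

This is why a function-scoped fixture can clean up after each test: pytest resumes the generator once the test is over.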
We can test this whole setup by changing the test_dummy function so that it fetches all the rows of
the Room table, verifying that the query returns 4 values.
import pytest
from rentomatic.repository.postgres_objects import Room
pytestmark = pytest.mark.integration
def test_dummy(pg_session):
assert len(pg_session.query(Room).all()) == 4
Integration tests
At this point we can create the real tests in the tests/repository/postgres/test_postgresrepo.py
file, replacing the test_dummy one. The first function is test_repository_list_without_parameters,
which runs the list method without any argument. The test receives the docker_setup fixture that
allows us to initialise the PostgresRepo class, the pg_data fixture with the test data that we put in
the database, and the pg_session fixture that creates the actual test database in the background. The
actual test code compares the codes of the rooms returned by the list method with the test data of
the pg_data fixture.
The file is basically a copy of tests/repository/test_memrepo.py, which is not surprising. Usually
you want to test the very same conditions, whatever the storage system. Towards the end of the
chapter we will see, however, that while these files are initially the same, they can evolve differently
as we find bugs or corner cases that come from the specific implementation (in-memory storage,
PostgreSQL, and so on).
import pytest

from rentomatic.repository import postgresrepo

pytestmark = pytest.mark.integration


def test_repository_list_without_parameters(
        docker_setup, pg_data, pg_session):
    repo = postgresrepo.PostgresRepo(docker_setup['postgres'])

    repo_rooms = repo.list()

    assert len(repo_rooms) == len(pg_data)
    assert set([r.code for r in repo_rooms]) == \
        set([r['code'] for r in pg_data])
The rest of the test suite is basically doing the same. Each test creates the PostgresRepo object,
runs its list method with a given value of the filters argument, and compares the actual result
with the expected one.
def test_repository_list_with_code_equal_filter(
docker_setup, pg_data, pg_session):
repo = postgresrepo.PostgresRepo(docker_setup['postgres'])
repo_rooms = repo.list(
filters={'code__eq': 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a'}
)
assert len(repo_rooms) == 1
assert repo_rooms[0].code == 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a'
def test_repository_list_with_price_equal_filter(
docker_setup, pg_data, pg_session):
repo = postgresrepo.PostgresRepo(docker_setup['postgres'])
repo_rooms = repo.list(
filters={'price__eq': 60}
)
assert len(repo_rooms) == 1
assert repo_rooms[0].code == '913694c6-435a-4366-ba0d-da5334a611b2'
def test_repository_list_with_price_less_than_filter(
docker_setup, pg_data, pg_session):
repo = postgresrepo.PostgresRepo(docker_setup['postgres'])
repo_rooms = repo.list(
filters={'price__lt': 60}
)
assert len(repo_rooms) == 2
assert set([r.code for r in repo_rooms]) ==\
{
'f853578c-fc0f-4e65-81b8-566c5dffa35a',
'eed76e77-55c1-41ce-985d-ca49bf6c0585'
}
def test_repository_list_with_price_greater_than_filter(
docker_setup, pg_data, pg_session):
repo = postgresrepo.PostgresRepo(docker_setup['postgres'])
repo_rooms = repo.list(
filters={'price__gt': 48}
)
assert len(repo_rooms) == 2
assert set([r.code for r in repo_rooms]) ==\
{
'913694c6-435a-4366-ba0d-da5334a611b2',
'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a'
}
def test_repository_list_with_price_between_filter(
docker_setup, pg_data, pg_session):
repo = postgresrepo.PostgresRepo(docker_setup['postgres'])
repo_rooms = repo.list(
filters={
'price__lt': 66,
'price__gt': 48
}
)
assert len(repo_rooms) == 1
assert repo_rooms[0].code == '913694c6-435a-4366-ba0d-da5334a611b2'
Remember that I introduced these tests one at a time and that I'm not showing you the full
TDD workflow only for brevity's sake. The code of the PostgresRepo class has been developed
following a strict TDD approach, and I recommend you do the same. The resulting code goes
in rentomatic/repository/postgresrepo.py, in the same directory where we created the
postgres_objects.py file.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from rentomatic.domain import room
from rentomatic.repository.postgres_objects import Base, Room


class PostgresRepo:
    def __init__(self, connection_data):
        connection_string = "postgresql+psycopg2://{}:{}@{}/{}".format(
            connection_data['user'],
            connection_data['password'],
            connection_data['host'],
            connection_data['dbname']
        )

        self.engine = create_engine(connection_string)
        Base.metadata.bind = self.engine

    def list(self, filters=None):
        DBSession = sessionmaker(bind=self.engine)
        session = DBSession()

        query = session.query(Room)

        if filters is None:
            return query.all()

        if 'code__eq' in filters:
            query = query.filter(Room.code == filters['code__eq'])

        if 'price__eq' in filters:
            query = query.filter(Room.price == filters['price__eq'])

        if 'price__lt' in filters:
            query = query.filter(Room.price < filters['price__lt'])

        if 'price__gt' in filters:
            query = query.filter(Room.price > filters['price__gt'])

        return [
            room.Room(
                code=q.code,
                size=q.size,
                price=q.price,
                latitude=q.latitude,
                longitude=q.longitude
            )
            for q in query.all()
        ]
I opted for a very simple solution with multiple if statements, but if this were a real-world project
the list method would require a smarter solution to manage a richer set of filters. This class is a
good starting point, however, as it passes the whole test suite. Note that the list method returns
domain models, which is allowed as the repository is implemented in one of the outer layers of the
architecture.
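One possible richer solution (a sketch of the idea, not the book's implementation) replaces the chain of if statements with a dispatch table mapping operator suffixes to comparison functions:

```python
import operator

# Sketch (not the book's implementation): a dispatch table maps filter
# operator suffixes to comparison functions, so adding a new operator
# means adding one dictionary entry instead of another if statement.

OPERATORS = {
    'eq': operator.eq,
    'lt': operator.lt,
    'gt': operator.gt,
}


def apply_filters(items, filters=None):
    for key, value in (filters or {}).items():
        field, op_name = key.split('__')
        compare = OPERATORS[op_name]
        items = [i for i in items if compare(i[field], value)]
    return items


rooms = [{'code': 'a', 'price': 39}, {'code': 'b', 'price': 66},
         {'code': 'c', 'price': 60}]
```

The same table-driven idea can be translated into SQLAlchemy filter expressions; the sketch above works on plain dictionaries to keep it self-contained.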
You can launch a PostgreSQL container with a Docker command line like

docker run --name rentomatic -e POSTGRES_PASSWORD=rentomaticdb -p 5432:5432 -d postgres

This executes the postgres image in a container named rentomatic, setting the environment
variable POSTGRES_PASSWORD to rentomaticdb. The container maps the standard PostgreSQL port
5432 to the same port in the host and runs in detached mode (leaving the terminal free).
You can verify that the container is properly running by trying to connect with psql
docker run -it --rm --link rentomatic:rentomatic postgres psql -h rentomatic -U post\
gres
Password for user postgres:
psql (11.1 (Debian 11.1-1.pgdg90+1))
Type "help" for help.
postgres=#
Check the Docker documentation⁹⁹ and the PostgreSQL image documentation¹⁰⁰ to get a better
understanding of all the flags used in this command line. The password asked is the one set
previously with the POSTGRES_PASSWORD environment variable.
Now create the initial_postgres_setup.py file in the main directory of the project (alongside
wsgi.py)
import sqlalchemy
import sqlalchemy_utils

from rentomatic.repository.postgres_objects import Base, Room
setup = {
'dbname': 'rentomaticdb',
'user': 'postgres',
'password': 'rentomaticdb',
'host': 'localhost'
}
conn_str = "postgresql+psycopg2://{}:{}@{}/{}".format(
setup['user'],
setup['password'],
setup['host'],
setup['dbname']
)
⁹⁹https://docs.docker.com/engine/reference/run/
¹⁰⁰https://hub.docker.com/_/postgres/
engine = sqlalchemy.create_engine(conn_str)
sqlalchemy_utils.create_database(engine.url)
conn = engine.connect()
Base.metadata.create_all(engine)
Base.metadata.bind = engine
DBSession = sqlalchemy.orm.sessionmaker(bind=engine)
session = DBSession()
data = [
{
'code': 'f853578c-fc0f-4e65-81b8-566c5dffa35a',
'size': 215,
'price': 39,
'longitude': -0.09998975,
'latitude': 51.75436293,
},
{
'code': 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a',
'size': 405,
'price': 66,
'longitude': 0.18228006,
'latitude': 51.74640997,
},
{
'code': '913694c6-435a-4366-ba0d-da5334a611b2',
'size': 56,
'price': 60,
'longitude': 0.27891577,
'latitude': 51.45994069,
},
{
'code': 'eed76e77-55c1-41ce-985d-ca49bf6c0585',
'size': 93,
'price': 48,
'longitude': 0.33894476,
'latitude': 51.39916678,
}
]
for r in data:
new_room = Room(
code=r['code'],
size=r['size'],
price=r['price'],
longitude=r['longitude'],
latitude=r['latitude']
)
session.add(new_room)
session.commit()
As you can see, this file is basically a collection of what we already did in some of the fixtures. This
is not surprising, as the fixtures simulated the creation of a production database for each test. This
file, however, is meant to be run only once, at the very beginning of the life of the database.
We are ready to configure the database, then. Run the Postgres initialization
$ python initial_postgres_setup.py
and then you can verify that everything worked by connecting again to PostgreSQL with psql. If you
are not familiar with the tool you can find the description of the commands in the documentation¹⁰¹
$ docker run -it --rm --link rentomatic:rentomatic postgres psql -h rentomatic -U po\
stgres
Password for user postgres:
psql (11.1 (Debian 11.1-1.pgdg90+1))
Type "help" for help.
postgres=# \c rentomaticdb
You are now connected to database "rentomaticdb" as user "postgres".
rentomaticdb=# \dt
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | room | table | postgres
(1 row)
rentomaticdb=#
The last thing to do is to change the Flask app, in order to make it connect to the Postgres
database using the PostgresRepo class instead of using the MemRepo one. The new version of the
rentomatic/rest/room.py is
import json

[...]
STATUS_CODES = {
res.ResponseSuccess.SUCCESS: 200,
res.ResponseFailure.RESOURCE_ERROR: 404,
res.ResponseFailure.PARAMETERS_ERROR: 400,
res.ResponseFailure.SYSTEM_ERROR: 500
}
connection_data = {
'dbname': 'rentomaticdb',
'user': 'postgres',
'password': 'rentomaticdb',
'host': 'localhost'
}
@blueprint.route('/rooms', methods=['GET'])
def room():
qrystr_params = {
'filters': {},
}
for arg, values in request.args.items():
    if arg.startswith('filter_'):
        qrystr_params['filters'][arg.replace('filter_', '')] = values
request_object = req.RoomListRequestObject.from_dict(qrystr_params)
repo = pr.PostgresRepo(connection_data)
use_case = uc.RoomListUseCase(repo)
response = use_case.execute(request_object)
Apart from the import and the definition of the connection data, the only line we have to change is
the one that instantiates the repository, which becomes
repo = pr.PostgresRepo(connection_data)
Now you can run the Flask development server with flask run and connect to
http://localhost:5000/rooms
You will quickly understand the benefits of the complex test structure that I created in the previous
section. That structure allows me to reuse some of the fixtures now that I want to implement tests
for a new storage system.
Let’s start defining the tests/repository/mongodb/conftest.py file, which contains the following
code
import pymongo
import pytest


def mg_is_responsive(ip, docker_setup):
    try:
        client = pymongo.MongoClient(
            host=ip,
            username=docker_setup['mongo']['user'],
            password=docker_setup['mongo']['password'],
            authSource='admin',
            serverSelectionTimeoutMS=100
        )
        # fails with ConnectionFailure while the container is starting
        client.admin.command('ismaster')
        return True
    except pymongo.errors.ConnectionFailure:
        return False


@pytest.fixture(scope='session')
def mg_client(docker_ip, docker_services, docker_setup):
    docker_services.wait_until_responsive(
        timeout=30.0, pause=0.1,
        check=lambda: mg_is_responsive(docker_ip, docker_setup)
    )

    client = pymongo.MongoClient(
        host=docker_setup['mongo']['host'],
        username=docker_setup['mongo']['user'],
        password=docker_setup['mongo']['password'],
        authSource='admin'
    )

    yield client

    client.close()
@pytest.fixture(scope='session')
def mg_database_empty(mg_client, docker_setup):
db = mg_client[docker_setup['mongo']['dbname']]
yield db
mg_client.drop_database(docker_setup['mongo']['dbname'])
@pytest.fixture(scope='function')
def mg_data():
return [
{
'code': 'f853578c-fc0f-4e65-81b8-566c5dffa35a',
'size': 215,
'price': 39,
'longitude': -0.09998975,
'latitude': 51.75436293,
},
{
'code': 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a',
'size': 405,
'price': 66,
'longitude': 0.18228006,
'latitude': 51.74640997,
},
{
'code': '913694c6-435a-4366-ba0d-da5334a611b2',
'size': 56,
'price': 60,
'longitude': 0.27891577,
'latitude': 51.45994069,
},
{
'code': 'eed76e77-55c1-41ce-985d-ca49bf6c0585',
'size': 93,
'price': 48,
'longitude': 0.33894476,
'latitude': 51.39916678,
}
]
@pytest.fixture(scope='function')
def mg_database(mg_database_empty, mg_data):
collection = mg_database_empty.rooms
collection.insert_many(mg_data)
yield mg_database_empty
collection.delete_many({})
As you can see these functions are very similar to the ones that we defined for Postgres. The
mg_is_responsive function is tasked with monitoring the MongoDB container and returning True
when the latter is ready. The specific way to do this is different from the one employed for PostgreSQL,
as these are solutions tailored to the specific technology. The mg_client fixture is similar to the
pg_engine one developed for PostgreSQL, and the same happens for mg_database_empty, mg_data, and
mg_database. While the SQLAlchemy package works through a session, the PyMongo library creates a
client and uses it directly, but the overall structure is the same.
Since we are importing the PyMongo library, remember to add pymongo to the requirements/prod.txt
file and run pip again. We need to change tests/repository/conftest.py to add the configuration
of the MongoDB container. Unfortunately, due to a limitation of the pytest-docker package it
is impossible to define multiple versions of docker_compose_file, so we need to add the MongoDB
configuration alongside the PostgreSQL one. The docker_setup fixture becomes
@pytest.fixture(scope='session')
def docker_setup(docker_ip):
return {
'mongo': {
'dbname': 'rentomaticdb',
'user': 'root',
'password': 'rentomaticdb',
'host': docker_ip
},
'postgres': {
'dbname': 'rentomaticdb',
'user': 'postgres',
'password': 'rentomaticdb',
'host': docker_ip
}
}
@pytest.fixture(scope='session')
def docker_compose_file(docker_tmpfile, docker_setup):
content = {
'version': '3.1',
'services': {
'postgresql': {
'restart': 'always',
'image': 'postgres',
'ports': ["5432:5432"],
'environment': [
'POSTGRES_PASSWORD={}'.format(
docker_setup['postgres']['password']
)
]
},
'mongo': {
'restart': 'always',
'image': 'mongo',
'ports': ["27017:27017"],
'environment': [
'MONGO_INITDB_ROOT_USERNAME={}'.format(
docker_setup['mongo']['user']
),
'MONGO_INITDB_ROOT_PASSWORD={}'.format(
docker_setup['mongo']['password']
)
]
}
}
}
f = os.fdopen(docker_tmpfile[0], 'w')
f.write(yaml.dump(content))
f.close()
return docker_tmpfile[1]
As you can see setting up MongoDB is not that different from PostgreSQL. Both systems are
databases, and the way you connect to them is similar, at least in a testing environment, where
you don’t need specific settings for the engine.
With the above fixtures we can write the MongoRepo class following TDD.
The tests/repository/mongodb/test_mongorepo.py file contains all the tests for this class
import pytest
from rentomatic.repository import mongorepo
pytestmark = pytest.mark.integration
def test_repository_list_without_parameters(
        docker_setup, mg_data, mg_database):
    repo = mongorepo.MongoRepo(docker_setup['mongo'])

    repo_rooms = repo.list()

    assert len(repo_rooms) == len(mg_data)
    assert set([r.code for r in repo_rooms]) == \
        set([r['code'] for r in mg_data])
def test_repository_list_with_code_equal_filter(
docker_setup, mg_data, mg_database):
repo = mongorepo.MongoRepo(docker_setup['mongo'])
repo_rooms = repo.list(
filters={'code__eq': 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a'}
)
assert len(repo_rooms) == 1
assert repo_rooms[0].code == 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a'
def test_repository_list_with_price_equal_filter(
docker_setup, mg_data, mg_database):
repo = mongorepo.MongoRepo(docker_setup['mongo'])
repo_rooms = repo.list(
filters={'price__eq': 60}
)
assert len(repo_rooms) == 1
assert repo_rooms[0].code == '913694c6-435a-4366-ba0d-da5334a611b2'
def test_repository_list_with_price_less_than_filter(
docker_setup, mg_data, mg_database):
repo = mongorepo.MongoRepo(docker_setup['mongo'])
repo_rooms = repo.list(
filters={'price__lt': 60}
)
assert len(repo_rooms) == 2
assert set([r.code for r in repo_rooms]) ==\
{
'f853578c-fc0f-4e65-81b8-566c5dffa35a',
'eed76e77-55c1-41ce-985d-ca49bf6c0585'
}
def test_repository_list_with_price_greater_than_filter(
docker_setup, mg_data, mg_database):
repo = mongorepo.MongoRepo(docker_setup['mongo'])
repo_rooms = repo.list(
filters={'price__gt': 48}
)
assert len(repo_rooms) == 2
assert set([r.code for r in repo_rooms]) ==\
{
'913694c6-435a-4366-ba0d-da5334a611b2',
'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a'
}
def test_repository_list_with_price_between_filter(
docker_setup, mg_data, mg_database):
repo = mongorepo.MongoRepo(docker_setup['mongo'])
repo_rooms = repo.list(
filters={
'price__lt': 66,
'price__gt': 48
}
)
assert len(repo_rooms) == 1
assert repo_rooms[0].code == '913694c6-435a-4366-ba0d-da5334a611b2'
def test_repository_list_with_price_as_string(
docker_setup, mg_data, mg_database):
repo = mongorepo.MongoRepo(docker_setup['mongo'])
repo_rooms = repo.list(
filters={
'price__lt': '60'
}
)
assert len(repo_rooms) == 2
assert set([r.code for r in repo_rooms]) ==\
{
'f853578c-fc0f-4e65-81b8-566c5dffa35a',
'eed76e77-55c1-41ce-985d-ca49bf6c0585'
}
These tests obviously mirror the tests written for Postgres, as the Mongo interface has to provide
the very same API. Actually, since the initialization of the MongoRepo class doesn't differ from that
of the PostgresRepo one, the test suite is almost exactly the same.
I added a test called test_repository_list_with_price_as_string that checks what happens when
the price in the filter is expressed as a string. Experimenting with the MongoDB shell I found that
in this case the query wasn’t working, so I included the test to be sure the implementation didn’t
forget to manage this condition.
The MongoRepo class is obviously not the same as the Postgres one, as the PyMongo library
is different from SQLAlchemy, and the structure of a NoSQL database differs from that of a
relational one. The file rentomatic/repository/mongorepo.py is
import pymongo

from rentomatic.domain import room


class MongoRepo:
    def __init__(self, connection_data):
        client = pymongo.MongoClient(
            host=connection_data['host'],
            username=connection_data['user'],
            password=connection_data['password'],
            authSource='admin'
        )
        self.db = client[connection_data['dbname']]

    def list(self, filters=None):
        collection = self.db.rooms

        if filters is None:
            result = collection.find()
        else:
            mongo_filter = {}
            for key, value in filters.items():
                key, operator = key.split('__')

                filter_value = mongo_filter.get(key, {})

                if key == 'price':
                    value = int(value)

                filter_value['${}'.format(operator)] = value
                mongo_filter[key] = filter_value

            result = collection.find(mongo_filter)

        return [
            room.Room(
                code=d['code'],
                size=d['size'],
                price=d['price'],
                latitude=d['latitude'],
                longitude=d['longitude']
            )
            for d in result
        ]
which makes use of the similarity between the filters of the Rent-o-matic project and the ones of
the MongoDB system¹⁰⁴.
¹⁰⁴The similitude between the two systems is not accidental, as I was studying MongoDB at the time I wrote the first article about clean
architectures, so I was obviously influenced by it.
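The translation at the heart of the method can be sketched in isolation: each key__operator filter becomes an entry of a MongoDB-style query dictionary, and two conditions on the same field merge into a single entry (a simplified sketch, leaving out the price coercion):

```python
# Sketch: translating Rent-o-matic filters into MongoDB-style filters.
# Each key__operator pair becomes a {'field': {'$operator': value}} entry;
# conditions on the same field merge into one entry.

def to_mongo_filter(filters):
    mongo_filter = {}
    for key, value in filters.items():
        field, op = key.split('__')
        entry = mongo_filter.setdefault(field, {})
        entry['${}'.format(op)] = value
    return mongo_filter
```

The merging step is what makes a "between" query work: price__lt and price__gt end up as two operators inside the same price entry.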
At this point we can follow the same steps we did for Postgres, that is creating a stand-alone
MongoDB container, filling it with real data, changing the REST endpoint to use MongoRepo, and
running the Flask webserver.
To create a MongoDB container you can run this Docker command line

docker run --name rentomatic -e MONGO_INITDB_ROOT_USERNAME=root -e MONGO_INITDB_ROOT_PASSWORD=rentomaticdb -p 27017:27017 -d mongo
To check the connectivity you may run the MongoDB shell in the same container (then exit with
Ctrl-D)
$ docker exec -it rentomatic mongo --port 27017 -u "root" -p "rentomaticdb" --authen\
ticationDatabase "admin"
MongoDB shell version v4.0.4
connecting to: mongodb://127.0.0.1:27017/
Implicit session: session { "id" : UUID("44f615e3-ec0b-4a16-8b58-f0ae1c48c187") }
MongoDB server version: 4.0.4
>
The initialisation file is similar to the one I created for PostgreSQL, and like that one it borrows
code from the fixtures that run in the test suite. The file is named initial_mongo_setup.py and is
saved in the main project directory.
import pymongo
setup = {
'dbname': 'rentomaticdb',
'user': 'root',
'password': 'rentomaticdb',
'host': 'localhost'
}
client = pymongo.MongoClient(
host=setup['host'],
username=setup['user'],
password=setup['password'],
authSource='admin'
)

db = client[setup['dbname']]
data = [
{
'code': 'f853578c-fc0f-4e65-81b8-566c5dffa35a',
'size': 215,
'price': 39,
'longitude': -0.09998975,
'latitude': 51.75436293,
},
{
'code': 'fe2c3195-aeff-487a-a08f-e0bdc0ec6e9a',
'size': 405,
'price': 66,
'longitude': 0.18228006,
'latitude': 51.74640997,
},
{
'code': '913694c6-435a-4366-ba0d-da5334a611b2',
'size': 56,
'price': 60,
'longitude': 0.27891577,
'latitude': 51.45994069,
},
{
'code': 'eed76e77-55c1-41ce-985d-ca49bf6c0585',
'size': 93,
'price': 48,
'longitude': 0.33894476,
'latitude': 51.39916678,
}
]
collection = db.rooms
collection.insert_many(data)
Run the initialisation script with

$ python initial_mongo_setup.py
If you want to check what happened in the database you can connect again to the container and run
a manual query that should return 4 rooms
$ docker exec -it rentomatic mongo --port 27017 -u "root" -p "rentomaticdb" --authen\
ticationDatabase "admin"
MongoDB shell version v4.0.4
connecting to: mongodb://127.0.0.1:27017/
Implicit session: session { "id" : UUID("44f615e3-ec0b-4a16-8b58-f0ae1c48c187") }
MongoDB server version: 4.0.4
> use rentomaticdb
switched to db rentomaticdb
> db.rooms.find({})
{ "_id" : ObjectId("5c123219a9a0ca3e85ab34b8"), "code" : "f853578c-fc0f-4e65-81b8-56\
6c5dffa35a", "size" : 215, "price" : 39, "longitude" : -0.09998975, "latitude" : 51.\
75436293 }
{ "_id" : ObjectId("5c123219a9a0ca3e85ab34b9"), "code" : "fe2c3195-aeff-487a-a08f-e0\
bdc0ec6e9a", "size" : 405, "price" : 66, "longitude" : 0.18228006, "latitude" : 51.7\
4640997 }
{ "_id" : ObjectId("5c123219a9a0ca3e85ab34ba"), "code" : "913694c6-435a-4366-ba0d-da\
5334a611b2", "size" : 56, "price" : 60, "longitude" : 0.27891577, "latitude" : 51.45\
994069 }
{ "_id" : ObjectId("5c123219a9a0ca3e85ab34bb"), "code" : "eed76e77-55c1-41ce-985d-ca\
49bf6c0585", "size" : 93, "price" : 48, "longitude" : 0.33894476, "latitude" : 51.39\
916678 }
The last step is to modify the rentomatic/rest/room.py file to make it use the MongoRepo class. The
new version of the file is
import json

[...]
STATUS_CODES = {
res.ResponseSuccess.SUCCESS: 200,
res.ResponseFailure.RESOURCE_ERROR: 404,
res.ResponseFailure.PARAMETERS_ERROR: 400,
res.ResponseFailure.SYSTEM_ERROR: 500
}
connection_data = {
'dbname': 'rentomaticdb',
'user': 'root',
'password': 'rentomaticdb',
'host': 'localhost'
}
@blueprint.route('/rooms', methods=['GET'])
def room():
qrystr_params = {
'filters': {},
}

for arg, values in request.args.items():
    if arg.startswith('filter_'):
        qrystr_params['filters'][arg.replace('filter_', '')] = values

request_object = req.RoomListRequestObject.from_dict(qrystr_params)
repo = mr.MongoRepo(connection_data)
use_case = uc.RoomListUseCase(repo)
response = use_case.execute(request_object)
Please note that, since the content of connection_data depends only on choices made in the database configuration, the relevant changes are really just two: the repository module we import and the connection data we pass to it. This is what you can achieve with a well-decoupled architecture. As I said in the introduction, this might be overkill for some applications, but if you want to provide support for multiple database backends this is definitely one of the best ways to achieve it.
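The decoupling can be demonstrated in isolation: the use case depends only on an object that exposes a list method, so swapping backends amounts to changing which repository is instantiated. The two classes below are a simplified sketch, not the actual Rent-o-matic code.

```python
# Simplified sketch of repository swapping. Neither class is the real
# Rent-o-matic code; they only mirror the interface the use case relies on.

class MemRepo:
    """In-memory repository: stores plain dictionaries."""

    def __init__(self, entries):
        self._entries = entries

    def list(self, filters=None):
        # A real repository would apply the filters here
        return list(self._entries)


class RoomListUseCase:
    """The use case knows only the repository interface, not its backend."""

    def __init__(self, repo):
        self.repo = repo

    def execute(self):
        return self.repo.list()


# Any object exposing list() can be injected: replacing MemRepo with a
# Mongo- or Postgres-backed repository does not touch the use case at all.
repo = MemRepo([{"code": "f853578c", "price": 39}])
use_case = RoomListUseCase(repo)
print(use_case.execute())  # [{'code': 'f853578c', 'price': 39}]
```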
If you now run the Flask development server with flask run and head to
http://localhost:5000/rooms
you will receive the very same result that the Postgres-based interface was returning.
Conclusions
This chapter concludes the overview of the clean architecture example. Starting from scratch, we created domain models, serializers, use cases, an in-memory storage system, a command line interface, and an HTTP endpoint. We then improved the whole system with very generic request/response management code that provides robust error handling. Last, we implemented two new storage systems, using a relational and a NoSQL database.
This is by no means a small achievement. Our architecture covers a very small use case, but it is robust and fully tested. Whatever error we might find in the way we deal with data, databases, requests, and so on can be isolated and tamed much faster than in a system without tests. Moreover, the decoupling philosophy not only allows us to provide support for multiple storage systems, but also to quickly implement new access protocols, or new serialisations for our objects.
¹⁰⁶https://github.com/pycabook/rentomatic/tree/chapter-4-a-repository-based-on-mongodb-step-3
Part 3 - Appendices
Changelog
What’s the last thing you do remember? Hmm?
- Alien (1979)
I will track here changes between releases of the book, following Semantic Versioning¹⁰⁷. A change in the major number means an incompatible change, that is, a big rewrite of the book, also known as a 2nd edition, 3rd edition, and so on. I don’t know if this will ever happen, but the version number comes for free. A change in the minor number means that something important was added to the content, like a new section or chapter. A change in the patch number signals minor fixes like typos in the text or the code, rewording of sentences, and so on.
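The precedence rule between such versions can be computed by comparing the three numbers component by component, which in Python is a plain tuple comparison. A minimal sketch (version numbers only, ignoring SemVer pre-release tags):

```python
# Minimal sketch of Semantic Versioning precedence for major.minor.patch
# numbers: tuples compare left to right, exactly as SemVer prescribes.

def parse(version):
    """'1.0.1' -> (1, 0, 1)"""
    return tuple(int(part) for part in version.split("."))

assert parse("1.0.1") > parse("1.0.0")   # patch: typo fixes
assert parse("1.1.0") > parse("1.0.9")   # minor: a new section or chapter
assert parse("2.0.0") > parse("1.9.9")   # major: a new edition
```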
Current version: 1.0.1
Version 1.0.1 (2019-01-01)
• Max H. Gerlach¹⁰⁸, Paul Schwendenman¹⁰⁹, and Eric Smith¹¹⁰ kindly fixed many typos and
grammar mistakes. Thank you very much!
Version 1.0.0
• Initial release
¹⁰⁷https://semver.org/
¹⁰⁸https://github.com/maxhgerlach
¹⁰⁹https://github.com/paul-schwendenman
¹¹⁰https://github.com/genericmoniker