Principles of Algorithmic Problem Solving
Johan Sannemo
Preface ix
I Preliminaries 1
1 Algorithms and Problems 3
1.1 Computational Problems . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Programming Languages . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Pseudo Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 The Kattis Online Judge . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 Additional Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.7 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Programming in C++ 17
2.1 Development Environments . . . . . . . . . . . . . . . . . . . . . 18
2.1.1 Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.2 Ubuntu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.3 macOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.4 Installing the C++ tools . . . . . . . . . . . . . . . . . . . 19
2.2 Hello World! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Variables and Types . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Input and Output . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.5 Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.6 If Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.7 For Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.8 While Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.9 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.10 Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.11 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.12 The Preprocessor . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.13 Template . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.14 Additional Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.15 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4 Implementation Problems 69
4.1 Additional Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.2 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5 Time Complexity 87
5.1 The Complexity of Insertion Sort . . . . . . . . . . . . . . . . . . 87
5.2 Asymptotic Notation . . . . . . . . . . . . . . . . . . . . . . . . . 91
II Basics 121
7 Brute Force 123
7.1 Optimization Problems . . . . . . . . . . . . . . . . . . . . . . . . 123
7.2 Generate and Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
7.3 Backtracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.4 Fixing Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.5 Meet in the Middle . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7.6 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
14 Strings 231
14.1 Tries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
14.2 String Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
14.3 Hashing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
14.3.1 The Parameters of Polynomial Hashes . . . . . . . . . . . 248
14.3.2 2D Polynomial Hashing . . . . . . . . . . . . . . . . . . . 250
14.4 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
15 Combinatorics 253
15.1 The Addition and Multiplication Principles . . . . . . . . . . . . 253
15.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
15.2.1 Permutations as Bijections . . . . . . . . . . . . . . . . . . 258
15.3 Ordered Subsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
15.4 Binomial Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . 265
15.4.1 Dyck Paths . . . . . . . . . . . . . . . . . . . . . . . . . . 269
15.4.2 Catalan Numbers . . . . . . . . . . . . . . . . . . . . . . . 272
15.5 The Principle of Inclusion and Exclusion . . . . . . . . . . . . . . 274
15.6 Invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
15.7 Monovariants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
15.8 Chapter Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
Bibliography 335
Index 337
Preface
This book consists of two parts. The first part contains some preliminary
background, such as algorithm analysis and programming in C++. If you have
an undergraduate education in computer science, most of these chapters are
probably familiar to you. It is recommended that you at least skim through
the first part, since the remainder of the book assumes you know the contents
of the preliminary chapters.
The second part makes up most of the material in the book. Some of it should
be familiar if you have taken a course in algorithms and data structures. The
take on those topics is a bit different compared to an algorithms course. We
therefore recommend that you read through the parts even if you feel familiar
with them – in particular those on the basic problem solving paradigms, i.e.
brute force, greedy algorithms, dynamic programming and divide & conquer.
The chapters in this part are structured so that a chapter builds upon only the
preliminaries and previous chapters to the largest extent possible.
At the end of the book you can find an appendix with some mathematical
background.
This book can also be used to improve your competitive programming skills.
Some parts are unique to competitive programming (in particular Chapter 17
on contest strategy). This knowledge is extracted into competitive tips.
The book often refers to exercises from the Kattis online judge. These are
named Kattis Exercise, given as problem names and IDs.
Part I
Preliminaries
Chapter 1
Algorithms and Problems
The greatest technical invention of the last century was probably the digital
general purpose computer. It was the start of the revolution which provided
us with the Internet, smartphones, tablets, and the computerization of soci-
ety.
To harness the power of computers we use programming. Programming is the
art of developing a solution to a computational problem, in the form of a set of
instructions that a computer can execute. These instructions are what we call
code, and the language in which they are written a programming language.
The abstract method that such code describes is what we call an algorithm. The
aim of algorithmic problem solving is thus to, given a computational problem,
devise an algorithm that solves it. One does not necessarily need to complete
the full programming process (i.e. write code that implements the algorithm
in a programming language) to enjoy solving algorithmic problems. However,
it often provides more insight and trains you at finding simpler algorithms to
problems.
1.1 Computational Problems
Sorting
Your task is to sort a sequence of integers in ascending order, i.e. from the
lowest to the highest.
Input
The input is a sequence of N integers a_0, a_1, ..., a_{N−1}.
Output
Output a permutation a' of the sequence a, such that a'_0 ≤ a'_1 ≤ ... ≤ a'_{N−1}.
Some variations of this format appear later (such as problems without inputs)
but in general this is what our problems look like.
Exercise 1.1
What is the input and output for the following computational problems?
1) Compute the greatest common divisor (Def. 16.4, page 294) of two
numbers.
2) Find a root of a polynomial.
3) Multiply two numbers.
Exercise 1.2
Consider the following problem. I am thinking of an integer between 1 and
100. Your task is to find this number by asking me questions of the form
“is your number higher, lower, or equal to x” for different numbers x.
This is an interactive, or online, computational problem. How would you de-
scribe the input and output to it? Why do you think it is called interactive?
1.2 Algorithms
You can see this algorithm in practice, performed on our previous example
instance (the sequence 3, 6, 1, −1, 2, 2) in Figures 1.1a-1.1f.
3 6 1 −1 2 2
(b) The smallest element of the sequence is −1, so this is the first element of the sorted
sequence.
−1 1 3 6 2 2
(c) We find the next element of the output by removing the −1 and finding the smallest
remaining element – in this case 1.
−1 1 2 3 6 2
(d) Here, there is no unique smallest element. We can choose any of the two 2’s in this
case.
−1 1 2 2 3 6
(f) Finally, we choose the last remaining element of the input sequence – the 6. This
concludes the sorting of our sequence.
making the correctness of the algorithm more obvious. The main downsides
of such a description are ambiguity and a lack of detail.
Until an algorithm is described in sufficient detail, it is possible to accidentally
abstract away operations we may not know how to perform behind a few
English words. As a somewhat contrived example, our plain text description
of selection sort includes actions such as “choosing the smallest number of
a sequence”. While such an operation may seem very simple to us humans,
algorithms are generally constructed with regards to some kind of computer.
Unfortunately, computers can not map such English expressions to their code
counterparts yet. Instructing a computer to execute an algorithm thus requires
us to formulate our algorithm in steps small enough that even a computer
knows how to perform them. In this sense, a computer is rather stupid.
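To make this concrete, here is selection sort written out in steps small enough
for a computer. This is a sketch of our own (the C++ syntax used here is
introduced in Chapter 2), not a definitive implementation:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> seq = {3, 6, 1, -1, 2, 2};
    vector<int> result;
    while (!seq.empty()) {
        // Find the position of the smallest remaining element.
        int smallest = 0;
        for (int i = 1; i < (int) seq.size(); i++) {
            if (seq[i] < seq[smallest]) smallest = i;
        }
        // Move it from the input sequence to the output.
        result.push_back(seq[smallest]);
        seq.erase(seq.begin() + smallest);
    }
    for (int i = 0; i < (int) result.size(); i++) {
        cout << result[i] << " ";
    }
    cout << endl;
}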
The English language is also ambiguous. We are sloppy with references to
“this variable” and “that set”, relying on context to clarify meaning for us. We
use confusing terminology and frequently misunderstand each other. Real
code does not have this problem. It forces us to be specific with what we
mean.
We will generally describe our algorithms in a representation called pseudo
code (Section 1.4), accompanied by an online exercise to implement the code.
Sometimes, we will instead give explicit code that solves a problem. This
will be the case whenever an algorithm is very complex, or care must be
taken to make the implementation efficient. The goal is that you should get
to practice understanding pseudo code, while still ending up with correct
implementations of the algorithms (thus the online exercises).
Exercise 1.3
Do you know any algorithms, for example from school? (Hint: you use
many algorithms to solve certain arithmetic and algebraic problems, such
as those in Exercise 1.1.)
Exercise 1.4
Construct an algorithm that solves the guessing problem in exercise 1.2.
How many questions does it use? The optimal number of questions is
about log2 100 ≈ 7 questions. Can you achieve this?
1.2.1 Correctness
One subtle, albeit important, point that we glossed over is what it means for
an algorithm to actually be correct.
There are two common notions of correctness – partial correctness and total
correctness. The first notion requires an algorithm to, upon termination, have
produced an output that fulfills all the criteria laid out in the output description.
Total correctness additionally requires an algorithm to terminate within finite
time. When we talk about correctness of our algorithms later on, we generally
focus on the partial correctness. Termination is instead proved implicitly, as
we consider a more granular measure of efficiency (called time complexity,
Chapter 5) than just finite termination. This measure implies the termination
of the algorithm, completing the proof of total correctness.
Proving that the selection sort algorithm terminates in finite time is quite easy.
It performs one iteration of the selection step for each element in the original
sequence (which is finite). Furthermore, each such iteration can be performed
in finite time by considering each remaining element of the selection when
finding the smallest one. The remaining sequence is a subsequence of the
original one and is therefore also finite.
Proving formally that the algorithm produces the correct output is a bit more
difficult. The main idea behind a formal proof is contained within our
description of the algorithm itself.
At later points in the book we will compromise on both conditions. Generally,
we are satisfied with an algorithm terminating in expected finite time or
answering correctly with, say, probability 0.75 for every input. Similarly, we
are sometimes happy to find an approximate solution to a problem. What this
means more concretely will become clear in due time when we study such
algorithms.
Exercise 1.5
Prove the correctness of your algorithm to the guessing problem from
Exercise 1.4.
Exercise 1.6
Why would an algorithm that is correct with e.g. probability 0.75 still be
very useful to us?
Why is it important that such an algorithm is correct with probability 0.75
on every problem instance, instead of always being correct for 75% of all
cases?
1.4 Pseudo Code
Pseudo code reads somewhat like our English language variant of the algorithm,
except the actions are broken down into much smaller pieces. Most
of the constructs of our pseudo code are more or less obvious. The notation
variable ← value is how we denote an assignment in pseudo code. For those
without programming experience, this means that the variable named variable
now takes the value value. Pseudo code appears when we try to explain some
part of a solution in great detail but programming language specific aspects
would draw attention away from the algorithm itself.
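As a sketch of what this notation looks like (our own illustration, not a
listing from the book), selection sort could be written as:

procedure SelectionSort(A):
    result ← empty sequence
    while A is not empty:
        m ← the smallest element of A
        remove m from A
        append m to result
    return result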
Exercise 1.7
Write pseudo code for your algorithm to the guessing problem from Exer-
cise 1.4.
1.5 The Kattis Online Judge
Most of the exercises in this book exist as problems on the Kattis web system.
You can find it at https://open.kattis.com. Kattis is a so called online judge,
which has been used in the university competitive programming world finals
(the International Collegiate Programming Contest World Finals) for several
years. It contains a large collection of computational problems, and allows you
to submit a program you have written that purports to solve a problem. Kattis
will then run your program on a large number of predetermined instances of
the problem called the problem’s test data.
Sorting
Time: 1s, memory: 1MB
Your task is to sort a sequence of integers in ascending order, i.e. from the
lowest to the highest.
Input
The input is a sequence of N integers (0 ≤ N ≤ 1000) a_0, a_1, ..., a_{N−1}
(|a_i| ≤ 10^9).
Output
Output a permutation a' of the sequence a, such that a'_0 ≤ a'_1 ≤ ... ≤ a'_{N−1}.
If your program exceeds the allowed resource limits (i.e. takes too much
time or memory), crashes, or gives an invalid output, Kattis will tell you
so with a rejected judgment. There are many kinds of rejected judgments,
such as Wrong Answer, Time Limit Exceeded, and Run-time Error. These mean
your program gave an incorrect output, took too much time, and crashed,
respectively. Assuming your program passes all the instances, it will be
given the Accepted judgment.
Note that getting a program accepted by Kattis is not the same as having a
correct program – it is a necessary but not sufficient criterion for correctness.
This fact can sometimes be exploited during competitions by writing a
knowingly incorrect solution that one believes will pass all the test cases
the competition's judges designed.
We strongly recommend that you get a (free) account on Kattis so that you can
follow along with the book’s exercises.
Exercise 1.8
Register an account on Kattis and read the documentation at https://
open.kattis.com/help.
Exercise 1.10
Consider the following problem:
Primality
We call an integer n > 1 a prime if its only divisors are 1 and n. Deter-
mine if a particular integer is a prime.
Input
The input consists of a single integer n > 1.
Output
Output yes if the number was a prime and no otherwise.
1.7 Chapter Notes
The introductions given in this chapter are very bare, mostly stripped down to
what you need to get by when solving algorithmic problems.
Many other books delve deeper into the theoretical study of algorithms than
we do, in particular regarding subjects not relevant to algorithmic problem
solving. Introduction to Algorithms [5] is a rigorous introductory text book on
algorithms with both depth and width.
For a gentle introduction to the technology that underlies computers, CODE
[19] is a well-written journey from the basics of bits and bytes all the way up
to assembly code and operating systems.
Sorting algorithms are a common algorithmic example, in part because of their
rich theory and the fact that the task at hand is familiar to beginners. The
Wikipedia category on sorting algorithms² contains descriptions of many more
¹ https://en.wikipedia.org/wiki/Bubble_sort
² https://en.wikipedia.org/wiki/Category:Sorting_algorithms
Chapter 2
Programming in C++
In this chapter we learn some more practical matters – the basics of the C++
programming language. This language is the most common programming
language within the competitive programming community for a few reasons
(aside from C++ being a popular language in general). Programs coded in
C++ are generally somewhat faster than programs written in most other
competitive programming languages. There are also many routines in the accompanying standard code
libraries that are useful when implementing algorithms.
Of course, no language is without downsides. C++ is a bit difficult to learn
as your first programming language to say the least. Its error management is
unforgiving, often causing erratic behavior in programs instead of crashing
with an error. Programming certain things becomes quite verbose, compared to
many other languages.
After bashing the difficulty of C++, you might ask if it really is the best
language to get started with algorithmic problem solving. While there
certainly are simpler languages, we believe that the benefits of C++ outweigh
the disadvantages in the long term, even though it demands more from
you as a reader. Either way, it is definitely the language we have the most
experience teaching problem solving with.
When you study this chapter, you will see a lot of example code. Type this code
and run it. We can not really stress this point enough. Learning programming
from scratch – in particular a complicated language such as C++ – is not
2.1 Development Environments
Before we get to the juicy parts of C++ you need to install a compiler for C++
and (optionally) a code editor.
We recommend the editor Visual Studio Code. The installation procedure
varies depending on what operating system you use. We provide them for
Windows, Ubuntu and macOS. If you choose to use some other editor, compiler
or operating system you must find out how to perform the corresponding
actions (such as compiling and running code) yourself.
Note that instructions like these tend to rot, with applications disappearing
from the web, operating systems changing names, and so on. In that case, you
are on your own and have to find instructions by yourself.
2.1.1 Windows
2.1.2 Ubuntu
2.1.3 macOS
When using macOS, you first need to install the Clang compiler by installing
Xcode from the Mac App Store. This is also a code editor, but the compiler is
bundled with it.
After installing the compiler, you can download the installer for Visual Studio
Code from https://code.visualstudio.com/. It is available as a normal
macOS package for installation.
Now that you have installed the compiler and Visual Studio Code, you need
to install the C++ plugin for Visual Studio Code. You can do this by open-
ing the program, launching Quick Open (using Ctrl+P), typing ext install
ms-vscode.cpptools, and pressing Enter. Then, launch Quick Open again,
but this time type ext install formulahendry.code-runner instead. Now,
restart your editor.
The tools need to be configured a bit. Press Ctrl+, to open the settings dialog. In
the settings dialog, go to Extensions > Run Code configuration. Here, enable
Run in Terminal and Save All Files Before Run. Then, restart your editor
again.
2.2 Hello World!
Now that you have a compiler and editor ready, it is time to learn the basic
structure of a C++ program. The classical example of a program when learning
a new language is to print the text Hello World!. We also solve our first Kattis
problem in this section.
Start by opening Visual Studio Code and create a new file by going to File ⇒
New File. Save the file as hello.cpp by pressing Ctrl+S. Make sure to save it
somewhere you can find it.
Now, type the code from Listing 2.1 into your editor.
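Listing 2.1 itself is not reproduced here; the following reconstruction is
consistent with the line-by-line walkthrough below (in particular, the
comment on line 6 and the printing statement on line 7):

#include <iostream>

using namespace std;

int main() {
    // Print Hello World!
    cout << "Hello World!" << endl;
}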
To run the program in Visual Studio Code, you press Ctrl+Alt+N. A tab below
your code named TERMINAL containing the text Hello World! should appear.
If no window appears, you probably mistyped the program.
When you submit your solution, Kattis grades it and gives you her judgment.
If you typed everything correctly, Kattis tells you it got Accepted. Otherwise,
you probably got Wrong Answer, meaning your program output the wrong
text (and you mistyped the code).
Now that you have managed to solve the problem, it is time to talk a bit about
the code you typed.
The first line of the code,
#include <iostream>
is used to include the iostream – input and output stream – file from the so-
called standard library of C++. The standard library is a large collection of
ready-to-use algorithms, data structures, and other routines which you can
use when coding. For example, there are sorting routines in the C++ standard
library, meaning you do not need to implement your own sorting algorithm
when coding solutions.
Later on, we will see other useful examples of the standard library and include
many more files. The iostream file contains routines for reading and writing
data to your screen. Your program used code from this file when it printed
Hello World! upon execution.
The first line of the main function is thus where the program starts to run with
further lines in the function executed sequentially. Later on we learn how to
define and use additional functions as a way of structuring our code. Note that
the code in a function – its body – must be enclosed by curly brackets. Without
them, we would not know which lines belonged to the function.
On line 6, we wrote a comment
// Print Hello World!
Comments are explanatory lines which are not executed by the computer.
The purpose of a comment is to explain what the code around it does and
why. They begin with two slashes // and continue until the end of the current
line.
It is not until the seventh line that things start happening in the program. We
use the standard library utility cout to print text to the screen. This is done by
writing e.g.:
cout << "this is text you want to print. ";
cout << "you can " << "also print " << "multiple things. ";
cout << "to print a new line" << endl << "you print endl" << endl;
cout << "without any quotes" << endl;
Lines that do things in C++ are called statements. Note the semicolon at the
end of the line! Semicolons are used to specify the end of a statement, and are
mandatory.
Exercise 2.2
Must the main function be named main? What happens if you changed
main to something else and try to run your program?
Exercise 2.3
Play around with cout a bit, printing various things. For example, you can
print a pretty haiku.
2.3 Variables and Types
The Hello World! program is boring. It only prints text – seldom the only
necessary component of an algorithm (aside from the Hello World! problem
on Kattis). We now move on to a new but hopefully familiar concept.
When we solve mathematical problems, it often proves useful to introduce
all kinds of names for known and unknown values. Math problems often
deal with classes of N students, ice cream trucks with velocity vcar km/h, and
candy prices of pcandy $/kg.
This concept naturally translates into C++ but with a twist. In most program-
ming languages, we first need to say what type a variable has! We do not
bother with this in mathematics. We say “let x = 5”, and that is that. In C++,
we need to be a bit more verbose. We must write that “I want to introduce a
variable x now. It is going to be an integer – more specifically, 5”. Once we
have decided what kind of value x will be (in this case integer) it will always
be an integer. We cannot just go ahead and say “oh, I’ve changed my mind.
x = 2.5 now!” since 2.5 is of the wrong type (a decimal number rather than an
integer).
Another major difference is that variables in C++ are not tied to a single value
for the entirety of their lifespans. Instead, we are able to modify the value
which our variables have using something called assignment. Some languages
do not permit this, preferring their variables to be immutable.
In Listing 2.2 we demonstrate how variables are used in C++. Type this
program into your editor and run it. What is the output? What did you expect
the output to be?
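Listing 2.2 is likewise missing here. The following reconstruction matches
the line numbers referenced in the paragraphs below (lines 12, 14, and 15),
though the original listing may have differed in its details:

#include <iostream>

using namespace std;

int main() {
    int five = 5;
    cout << "five is " << five << endl;

    int seven = 7;

    five = seven + 2;
    cout << "five is now " << five << endl;

    seven = 0;
    cout << "five is still " << five << endl;
}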
The first time we use a variable in C++ we must decide what kind of values
it may contain. This is called declaring the variable of a certain type. For
example the statement
int five = 5;
declares an integer variable five and assigns the value 5 to it. The int part
is C++ for integer and is what we call a type. After the type, we write the
name of the variable – in this case five. Finally, we may assign a value to the
variable. Note that further uses of the variable never include the int part. We
declare the type of a variable once and only once.
Later on in Listing 2.2 we decide that 5 is a somewhat small value for a variable
called five. We can change the value of a variable by using the assignment
operator – the equality sign =. The assignment
five = seven + 2;
states that from now on the variable five should take the value given by the
expression seven + 2. Since (at least for the moment) seven has the value 7
the expression evaluates to 7 + 2 = 9. Thus five will actually be 9, explaining
the output we get from line 12.
On line 14 we change the value of the variable seven. Note that line 15 still
prints the value of five as 9. Some people find this model of assignment
confusing. We first performed the assignment five = seven + 2;, but the
value of five did not change with the value of seven. This is mostly an
unfortunate consequence of the choice of = as operator for assignment. One
could think that “once an equality, always an equality” – that the value of five
should always be the same as the value of seven + 2. This is not the case. An
assignment sets the value of the variable on the left hand side to the value of
the expression on the right hand side at a particular moment in time, nothing
more.
The snippet also demonstrates how to print the value of a variable on the
screen – we cout it the same way as with text. This also clarifies why text
Exercise 2.4
C++ allows declarations of immutable (constant) variables, using the key-
word const. For example
const int FIVE = 5;
What happens if you try to perform an assignment to such a variable?
Exercise 2.5
What values will the variables a, b, and c have after executing the following
code:
int a = 4;
int b = 2;
int c = 7;
b = a + c;
c = b - 2;
a = a + a;
b = b * 2;
c = c - c;
Here, the operator - denotes subtraction and * represents multiplication.
Once you have arrived at an answer, type this code into the main function
of a new program and print the values of the variables. Did you get it right?
Exercise 2.6
What happens when an integer is divided by another integer? Try running
the following code:
cout << (5 / 3) << endl;
cout << (15 / 5) << endl;
cout << (2 / 2) << endl;
cout << (7 / 2) << endl;
cout << (-7 / 2) << endl;
cout << (7 / -2) << endl;
There are many other types than int. We have seen one (although without its
correct name), the type for text. You can see some of the most common types
in Listing 2.3.
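Listing 2.3 is not reproduced here; a sketch of declarations covering the
types the following paragraphs discuss (our own example values):

string text = "We can store text in strings";
char letter = 'a';
int number = -42;
long long bigNumber = 1000000000000LL;
double decimalNumber = 2.5;
bool truth = true;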
The text data type is called string. Values of this type must be enclosed with
double quotes. If we actually want to include a quote in a string, we type \".
There exists a data type containing one single letter, the char. Such a value is
surrounded by single quotes. The char value containing the single quote is
written '\'', similarly to how we included double quotes in strings.
Then comes the int, which we discussed earlier. The long long type contains
integers just like the int type. They differ in how large integers they can
contain. An int can only contain integers between −2^31 and 2^31 − 1, while a
long long extends this range to −2^63 to 2^63 − 1.
Exercise 2.7
Since \" is used to include a double quote in a string, we can not include
backslashes in a string like any other character. For example, how would
you output the verbatim string \"? Find out how to include a literal
backslash in a string (for example by searching the web or thinking about
how we included the different quote characters).
Exercise 2.8
Write a program that assigns the minimum and maximum values of an int
to an int variable x. What happens if you increment or decrement this value
using x = x + 1; or x = x - 1; respectively and print its new value?
Competitive Tip
One of the most common sources for errors in code is trying to store
an integer value outside the range of the type. Always make sure your
values fit inside the range of an int if you use it. Otherwise, use long
longs!
One of the reasons why we do not simply use long long all the time
is that some operations involving long longs can be slower than the
corresponding operations on ints under certain conditions.
Next comes the double type. This type represents decimal numbers. Note that
the decimal sign in C++ is a dot, not a comma. There is also another similar
28 CHAPTER 2. PROGRAMMING IN C++
type called the float. The difference between these types is similar to that of
the int and long long. A double can represent “more” decimal numbers than
a float. This may sound weird considering that there is an infinite number of
decimal numbers even between 0 and 1. However, a computer can clearly not
represent every decimal number – not even those between 0 and 1. To do this,
it would need infinite memory to distinguish between these numbers. Instead,
they represent a limited set of numbers – those with a relatively short binary
expansion, such as (101011.01)_2 (the number 43.25 in decimal). Doubles are
able to represent numbers with higher precision than floats.
The last of our common types is the bool (short for boolean). This type can
only contain one of two values – it is either true or false. While this may
look useless at first glance, the importance of the boolean becomes apparent
later.
Exercise 2.9
In the same way the integer types had a valid range of values, a double
cannot represent arbitrarily large values. Find out what the minimum and
maximum values a double can store are.
2.4 Input and Output
Reading input data is done just as you would expect, almost entirely symmetric
to printing output. Instead of cout we use cin, and instead of << variable
we use >> variable, i.e.
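The listing referred to here is missing; a sketch of what it presumably
showed, reading an integer and printing it back:

int age;
cin >> age;
cout << "You are " << age << " years old" << endl;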
Exercise 2.10
What happens if you type an invalid input, such as your first name instead
of your age?
When the program reads input into a string variable it only reads the text until
the first whitespace. To read an entire line you should instead use the getline
function (Listing 2.5).
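Listing 2.5 is also missing; a minimal getline sketch:

string line;
getline(cin, line);
cout << "You wrote: " << line << endl;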
We revisit more advanced input and output concepts in Section 3.9 about the
standard library.
2.5 Operators
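The body of this section, including Listing 2.6, is missing from this text.
As a stand-in, a sketch of the basic arithmetic operators (assumed to be
roughly what the listing demonstrated):

int a, b;
cin >> a >> b;
cout << a + b << endl; // addition
cout << a - b << endl; // subtraction
cout << a * b << endl; // multiplication
cout << a / b << endl; // integer division
cout << a % b << endl; // modulo (the remainder of the division)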
Exercise 2.11
Type in Listing 2.6 and test it on a few different values. Most importantly,
test:
• b=0
• Negative values for a and/or b
• Values where the expected result is outside the valid range of an int
Exercise 2.12
If division rounds down towards zero, how do you compute x/y rounded to
an integer away from zero?
We end this section with some shorthand operators. Check out Listing 2.8
for some examples. Each arithmetic operator has a corresponding combined
assignment operator. Such an operator, e.g. a += 5;, is equivalent to a = a + 5;.
They act as if the variable on the left hand side is also the left hand side of the
corresponding arithmetic operator and assign the result of this computation
to said variable. Hence, the above statement increases the variable a by 5.
It turns out that addition and subtraction with 1 is a fairly common operation.
So common, in fact, that additional operators were introduced into C++ for
this purpose of saving an entire character compared to the highly verbose +=1
operator. These operators consist of two plus signs or two minus signs. For
instance, a++ increments the variable by 1.
We sometimes use the fact that these expressions also evaluate to a value.
Which value this is depends on whether we put the operator before or after the
variable name. By putting ++ before the variable, the value of the expression
will be the incremented value. If we put it afterwards we get the original value.
To get a better understanding of how this works it is best if you type the code
in Listing 2.8 in yourself and analyze the results.
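Listing 2.8 is not reproduced here; a sketch demonstrating both the combined
assignment operators and the pre- and post-increment forms:

int a = 10;
a += 5;              // a is now 15
cout << a++ << endl; // prints 15, then a becomes 16
cout << a << endl;   // prints 16
cout << ++a << endl; // a becomes 17 first, then 17 is printed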
We end the discussion on operators by saying something about operator precedence,
i.e. the order in which operators are evaluated in expressions. In mathe-
matics, there is a well-defined precedence: brackets go first, then exponents,
followed by division, multiplication, addition, and subtraction. Furthermore,
most operations (exponents being a notable exception) have left-to-right asso-
ciativity so that 5 − 3 − 1 equals ((5 − 3) − 1) = 1 rather than (5 − (3 − 1)) = 3.
In C++, there are a lot of operators, and knowing precedence rules can easily
save you from bugs in your future code. We recommend that you look up their
precedence rules on-line.
Faktor – faktor
Herman – herman
2.6 If Statements
Exercise 2.14
Write a program that reads two integers as input, and prints the result of
the different comparison operators from Listing 2.9.
A bool can also be negated using the ! operator. So the expression !false
(which we read as “not false”) has the value true and vice versa !true eval-
uates to false. The operator works on any boolean expressions, so that if
b would be a boolean variable with the value true, then the expression !b
evaluates to false.
There are two more important boolean operators. The and operator && takes
two boolean values and evaluates to true if and only if both values are true.
Similarly, the or operator || evaluates to true if and only if at least one of its
operands is true.
A major use of boolean variables is in conjunction with if statements (also
called conditional statements). They come from the necessity of executing
certain lines of code if (and only if) some condition is true. Let us write a
program that takes an integer as input, and tells us whether it is odd or even.
We can do this by computing the remainder of the input when divided by 2
(using the modulo operator) and checking if it is 0 (even number), 1 (positive
odd number) or, -1 (negative odd number). An implementation of this can be
seen in Listing 2.10.
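Listing 2.10 is missing from this text; a sketch consistent with the
description above:

int input;
cin >> input;
int remainder = input % 2;
if (remainder == 0) {
    cout << input << " is even" << endl;
}
if (remainder == 1 || remainder == -1) {
    cout << input << " is odd" << endl;
}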
An if statement consists of two parts – a condition, given inside brackets after
the if keyword, followed by a body – some lines of code surrounded by curly
brackets. The code inside the body will be executed in case the condition
evaluates to true.
Our odd or even example contains a certain redundancy. If a number is not
even we already know it is odd. Checking this explicitly using the modulo
There is one last if-related construct – the else if. Since code is worth a thousand
words, we demonstrate how it works in Listing 2.12 by implementing a helper
for the children’s game FizzBuzz. In FizzBuzz, one goes through the natural
numbers in increasing order and say them out loud. When the number is
divisible by 3 you instead say Fizz. If it is divisible by 5 you say Buzz, and if it
is divisible by both you say FizzBuzz.
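Listing 2.12 is not reproduced here; a sketch of such a FizzBuzz helper,
reading a number and printing what to say:

int n;
cin >> n;
if (n % 15 == 0) {          // divisible by both 3 and 5
    cout << "FizzBuzz" << endl;
} else if (n % 3 == 0) {
    cout << "Fizz" << endl;
} else if (n % 5 == 0) {
    cout << "Buzz" << endl;
} else {
    cout << n << endl;
}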
Exercise 2.15
Run the program with the values 30, 10, 6, 4. Explain the output you get.
2.7 For Loops
Another rudimentary building block of programs is the for loop. A for loop is
used to execute a block of code multiple times. The most basic loop repeats
code a fixed number of times as in the example from Listing 2.13.
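Listing 2.13 is missing; a sketch matching the description in the following
paragraphs:

int repetitions;
cin >> repetitions;
for (int i = 0; i < repetitions; i++) {
    cout << "This is repetition number " << i << endl;
}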
A for loop is built up from four parts. The first three parts are the semi-colon
separated statements immediately after the for keyword. In the first of these
parts you write some expression, such as a variable declaration. In the second
part you write an expression that evaluates to a bool, such as a comparison
between two values. In the third part you write another expression.
The first part will be executed only once – it is the first thing that happens in a
loop. In this case, we declare a new variable i and set it to 0.
The loop will then be repeated until the condition in the second part is false.
Our example loop will repeat until i is no longer less than repetitions.
The third part executes after each execution of the loop. Since we use the vari-
able i to count how many times the loop has executed, we want to increment
this by 1 after each iteration.
Together, these three parts make sure our loop will run exactly repetitions
times. The final part of the loop is the statements within curly brackets. Just as
with the if statements, this is called the body of the loop and contains the code
that will be executed in each repetition of the loop. A repetition of a loop is in
algorithm language more commonly referred to as an iteration.
Exercise 2.17
What happens if you enter a negative value as the number of loop repeti-
tions?
Exercise 2.18
Design a loop that instead counts backwards, from repetitions − 1 to 0.
Tarifa – tarifa
Trik – trik
Within a loop, two useful keywords can be used to modify the loop – continue
and break. Using continue; inside a loop exits the current iteration and starts
the next one. break; on the other hand, exits the loop altogether. For an
example, consider Listing 2.14.
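Listing 2.14 is not reproduced here; a sketch demonstrating the two keywords:

for (int i = 0; i < 10; i++) {
    if (i == 3) continue; // skip the rest of this iteration
    if (i == 6) break;    // exit the loop entirely
    cout << i << endl;
}
// prints 0, 1, 2, 4, 5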
Exercise 2.20
What will the following code snippet output?
for (int i = 0; false; i++) {
    cout << i << endl;
}

for (int i = 0; i >= -10; --i) {
    cout << i << endl;
}

for (int i = 0; i <= 10; ++i) {
    if (i % 2 == 0) continue;
    if (i == 8) break;
    cout << i << endl;
}
2.8 While Loops
There is a second kind of loop, which is simpler than the for loop. It is called
a while loop, and works like a for loop where the initial statement and the
update statement are removed, leaving only the condition and the body. It can
be used when you want to loop over something until a certain condition is
false (Listing 2.15).
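Listing 2.15 is missing; a sketch of a while loop, here doubling a value
while it stays below 100 (our own example):

int value = 1;
while (value < 100) {
    cout << value << endl;
    value *= 2;
}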
The break; and continue; statements work the same way as they do in a for
loop.
2.9 Functions
In the same way that a variable declaration starts by proclaiming what data
type the variable contains, a function declaration states what data type the
function evaluates to. Afterwards, we write the name of the function followed
by its arguments (which is a comma-separated list of variable declarations).
Finally, we give it a body of code wrapped in curly brackets.
All of these functions contain a statement with the return keyword, unlike
our main function. A return statement says “stop executing this function,
and return the following value!”. Thus, when we call the squaring function
by square(x), the function will compute the value x * x and make sure that
square(x) evaluates to just that.
Why have we left a return statement out of the main function? In main(),
the compiler inserts an implicit return 0; statement at the end of the func-
tion.
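The listing the surrounding text refers to is missing; sketches of the
arithmetic functions it appears to define (the exact bodies are our guess,
but the names match those used in Exercise 2.21):

int square(int x) {
    return x * x;
}

int add(int a, int b) {
    return a + b;
}

int min(int a, int b) {
    if (a < b) return a;
    return b;
}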
Exercise 2.21
What will the following function calls evaluate to?
square(5);
add(square(3), 10);
min(square(10), add(square(9), 23));
Exercise 2.22
We declared all of the new arithmetic functions above our main function in
the example. Why did we do this? What happens if you move one below
the main function instead? (Hint: what happens if you try to use a variable
before declaring it?)
An important fact of function calling is that the arguments we send along are
copied. If we try to change them by assigning values to our arguments, we
will not change the original variables in the calling function (see Listing 2.17
for an example).
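Listing 2.17 itself is missing; a sketch demonstrating that arguments are
copied (the function name setToTen is our own, hypothetical choice):

void setToTen(int x) {
    x = 10; // changes only the local copy of the argument
}

int main() {
    int y = 5;
    setToTen(y);
    cout << y << endl; // still prints 5
}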
We can also choose to not return anything by using the void return type. This
may seem useless since nothing ought to happen if we call a function but does
not get anything in return. However, there are ways we can affect the program
without returning.
The first one is by using global variables. It turns out that variables may be
declared outside of a function. It is then available to every function in your
program. Changes to a global variable by one function are also seen by
other functions (try out Listing 2.18 to see them in action).
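Listing 2.18 is not reproduced here; a sketch of a global variable shared
between functions (our own illustration):

int calls = 0; // a global variable, visible to every function

void report() {
    calls++;
    cout << "report() has been called " << calls << " times" << endl;
}

int main() {
    report(); // prints 1
    report(); // prints 2
}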
Exercise 2.23
Why is the function call change(4) not valid C++? (Hint: what exactly are
we changing when we assign to the reference in func?)
2.10 Structures
// The opening lines of this listing are missing; reconstructed from the
// description below.
struct Point {
    double x;
    double y;
};
This particular structure contains two member variables, x and y, representing
the coordinates of a point in 2D Euclidean space.
Once we have defined a structure we can create instances of it. Every instance
has its own copy of the member variables of the structure.
To create an instance the same syntax as with other variables is used. We
can access the member variables of a structure using the instance.variable
syntax:
Point origin; // create an instance of the Point structure
cout << "The origin is (" << origin.x << ", "
<< origin.y << ")." << endl;
As you can see structures allow us to group certain kinds of data together in a
logical fashion. Later on, this will simplify the coding of certain algorithms
and data structures immensely.
There is an alternate way of constructing instances called constructors. A
constructor looks like a function inside our structure and allows us to pass
arguments when we create a new instance of a struct. The constructor receives
these arguments to help set up the instance.
Let us add a constructor to our point structure, to more easily create in-
stances:
struct Point {
    double x;
    double y;
    // Reconstructed constructor (the original lines were lost; the
    // parameter names are our own choice):
    Point(double x_, double y_) : x(x_), y(y_) {}
};
The newly added constructor lets us pass two arguments when constructing
the instance to set the coordinates correctly. With it, we avoid the two extra
statements to set the member variables.
Point p(4, 2.1);
cout << "The point is (" << p.x << ", " << p.y << ")." << endl;
Structure values can also be constructed outside of a variable declaration using
the syntax
Point(1, 2);
so that we can reassign a previously declared variable with
p = Point(1, 2);
We can also define functions inside the structure. These functions work just
like any other functions except they can also access the member variables of
the instance that the member function is called on. For example, we might
want a convenient way to mirror a certain point in the x-axis. This could be
accomplished by adding a member function:
struct Point {
double x;
double y;
Point mirror() {
return Point(x, -y);
}
};
To call the member function mirror() on the point p, we write p.mirror().
Exercise 2.25
Add a translate member function to the point structure. It should take
two double values x and y as arguments, returning a new point which is
the instance point translated by (x, y).
In our mirror function, we are not modifying any of the internal state of the
instance. We can make this fact clearer by declaring the function to be const
(similarly to a const variable):
Point mirror() const {
return Point(x, -y);
}
This change ensures that our function can not change any of the member
variables.
Exercise 2.26
What happens if we try to change a member variable in a const member
function?
Exercise 2.27
Fill in the remaining code to implement this structure:
struct Quotient {
int nominator;
int denominator;
};
2.11 Arrays
In the Sorting Problem from Chapter 1 we often spoke of the data type “se-
quence of integers”. Until now, none of the data types we have seen in C++
represents this kind of data. We present the array. It is a special type of
variable, which can contain a large number of variables of the same type. For
example, it could be used to represent the recurring data type “sequence of
integers” from the Sorting Problem in Chapter 1. When declaring an array, we
specify the type of variable it should contain, its name, and its size using the
syntax:
type name[size];
For example, an integer array of size 50 named seq would be declared with
int seq[50];
This creates 50 integer “variables” which we can refer to using the syntax
seq[index], starting from zero (they are zero-indexed). Thus we can use
seq[0], seq[1], etc., all the way up to seq[49]. The values are called the
elements of the array.
Be aware that using an index outside the valid range for a particular array
(i.e. below 0 or above the size − 1) can cause erratic behavior in the program
without crashing it.
If you declare a global array all elements get a default value. For numeric
types this is 0, for booleans this is false, for strings this is the empty string
and so on. If, on the other hand, the array is declared in the body of a function,
that guarantee does not apply. Instead of being zero-initialized, the elements
can have random values.
Later on (Section 3.1) we transition from using arrays to a structure from the
standard library which serves the same purpose – the vector.
Patuljci – patuljci
2.12 The Preprocessor
C++ has a powerful tool called the preprocessor. This utility is able to read
and modify your code using certain rules during compilation. The commonly
used #include is a preprocessor directive that includes a certain file in your
code.
Besides file inclusion, we mostly use the #define directive. It allows us to
replace certain tokens in our code with other ones. The most basic usage
is
#define TOREPLACE REPLACEWITH
which replaces the token TOREPLACE in our program with REPLACEWITH. The
true power of the define comes when using define directives with parameters.
These look similar to functions and allows us to replace certain expressions
with another one, additionally inserting certain values into it. We call these
macros. For example the macro
#define rep(i,a,b) for (int i = a; i < b; i++)
means that the expression
rep(i,0,5) {
cout << i << endl;
}
is expanded to
for (int i = 0; i < 5; i++) {
cout << i << endl;
}
You can probably get by without ever using macros in your code. The reason
we discuss them is that we are going to use them in code in the book, so it
is a good idea to at least be familiar with their meaning. They are also used in
competitive programming in general.
2.13 Template
#include <bits/stdc++.h>
using namespace std;
// The macro definitions are missing from this copy of the template; these
// reconstructions are consistent with how they are used in the book:
#define rep(i, a, b) for (int i = a; i < b; i++)
#define trav(a, x) for (auto& a : x)
int main() {
}
The trav(a, x) macro is used to iterate through all members of a data struc-
ture from the standard library such as the vector – the first topic of Chap-
ter 3.
2.14 Additional Exercises
Modulo – modulo
3D Printer – 3dprinter
Sibice – sibice
Kornislav – kornislav
2.15 Chapter Notes
C++ was invented by Danish computer scientist Bjarne Stroustrup. Bjarne has
also published a book on the language, The C++ Programming Language [23],
that contains a more in-depth treatment of the language. It is rather accessible
to C++ beginners but is better read by someone who has some prior
programming experience (in any programming language).
C++ is standardized by the International Organization for Standardization
(ISO). These standards are the authoritative source on what C++ is. The final
drafts of the standards can be downloaded at the homepage of the Standard
C++ Foundation¹.
There are many online references of the language and its standard library. The
two we use most are:
• http://en.cppreference.com/w/
• http://www.cplusplus.com/reference/
¹ https://isocpp.org/
Chapter 3
The C++ Standard Library
In this chapter we study parts of the C++ standard library. We start by exam-
ining a number of basic data structures. Data structures help us organize the
data we work with in the hope of making processing both easier and more
efficient. Different data structures serve widely different purposes and solve
different problems. Whether a data structure fits our needs depends on what
operations we wish to perform on the data. We consider neither the efficiency
of the various operations in this chapter nor how they are implemented. These
concerns are postponed until Chapter 6.
The standard library also contains many useful algorithms such as sorting
and various mathematical functions. These are discussed after the data struc-
tures.
In the end, we take a deeper look at string handling in C++ and some more
input/output routines.
3.1 vector
One of the final things discussed in the C++ chapter was the fixed-size array.
As you might remember the array is a special kind of data type that allows us
to store multiple values of the same data type inside what appeared to us as
a single variable. Arrays are a bit awkward to work with in practice. When
passing them as parameters we must also pass along the size of the array. We
are also unable to change the size of arrays once declared nor can we easily
remove or insert elements, or copy arrays.
The dynamic array is a special type of array that can change size (hence the
name dynamic). It also supports operations such as removing and inserting
elements at any position in the list.
The C++ standard library includes a dynamic array under the name vector,
an alternative name for dynamic arrays. To use it you must include the vector
file by adding the line
#include <vector>
among your other includes at the top of your program.
Just like ordinary arrays vectors need to know what types they store. This is
done using a somewhat peculiar syntax. To create a vector containing strings
named words we write
vector<string> words;
This angled bracket syntax appears again later when using other C++ struc-
tures from the standard library.
Once a vector is created elements can be appended to it using the push_back
member function. The following four statements would add the words Simon
is a fish as separate elements to the vector:
words.push_back("Simon");
words.push_back("is");
words.push_back("a");
words.push_back("fish");
To refer to a specific element in a vector you can use the same operator [] as
for arrays. Thus, words[i] refers to the i’th value in the vector (starting at
0).
cout << words[0] << " " << words[1] << " "; // Prints Simon is
cout << words[2] << " " << words[3] << " "; // Prints a fish
Just as with normal arrays accessing indices outside the valid range of the
vector can cause weird behaviour in your program.
We can get the current size of an array using the size() member function:
cout << "The vector contains " << words.size() << " words" << endl;
There is also an empty() function that can be used to check if the vector
contains no elements. These two functions are part of basically every standard
library data structure.
You can also create dynamic arrays that already contain a number of elements.
This is done by passing an integer argument when first declaring the vec-
tor. They are filled with the same default value as (global) arrays are when
created.
vector<int> vec(5); // creates a vector containing 5 zeroes
3.1.1 Iterators
cout << "The first word is " << *first << endl;
If we have an iterator it pointing at the i:th element of a vector we can get
a new iterator pointing to another value by adding or subtracting an integer
value to the iterator. For example, it + 4 points to the (i + 4)’th element of
the vector, and it - 1 is the iterator pointing to the (i − 1)’st element.
Not all iterators support adding or subtracting arbitrary integer values though.
Some iterators can only move one step backwards or forwards by using the ++
and -- operators.
There is a special kind of iterator which points to the first position after the
last element. We get this iterator by using the function end(). It allows us to
iterate through a vector in the following way:
for (auto it = words.begin(); it != words.end(); it++) {
string value = *it;
cout << value << endl;
}
In this loop we start by creating an iterator which points to the first element of
the vector. Our update condition will repeatedly move the iterator to the next
element in the vector. The loop condition ensures that the loop breaks when
the iterator first points to the element past the end of the vector.
In addition to the begin() and end() pair of iterators, there is also rbegin()
and rend(). They work similarly, except that they are reverse iterators - they
iterate in the other direction. Thus, rbegin() actually points to the last element
of the vector, and rend() to an imaginary element before the first element of
the vector. If we move a reverse iterator in a positive direction, we will actually
move it in the opposite direction (i.e. adding 1 to a reverse iterator makes it
point to the element before it in the vector).
Exercise 3.2
Use the rbegin()/rend() iterators to code a loop that iterates through a
vector in the reverse order.
Certain operators on a vector require the use of vector iterators. For example,
the insert and erase member functions, used to insert and erase elements at
arbitrary positions, take iterators to describe positions. When removing the
Exercise 3.3
After adding these two lines, what would the loop printing every element
of the vector words output?
3.2 queue
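The body of this section is missing from this text. As a stand-in, a sketch
of basic queue usage (first in, first out; requires #include <queue>):

queue<int> q;
q.push(4);                 // insert at the back
q.push(7);
cout << q.front() << endl; // 4 – the element at the front
q.pop();                   // remove the front element
cout << q.front() << endl; // 7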
Exercise 3.4
There is a similar data structure called the double-ended queue. The standard
library version is named after the abbreviation deque instead. Use one of the C++
references from the C++ chapter notes (Section 2.15) to find out what this
structure does and what its member functions are called.
3.3 stack
3.4 priority_queue
The queue and stack structures are arguably unnecessary, since they can be
emulated using a vector (see Sections 6.2, 6.3). This is not the case for the next
structure, the priority_queue.
The structure is similar to a queue or a stack, but instead of insertions and
extractions happening at one of the endpoints of the structure, the greatest
element is always returned during the extraction.
The structure is located in the same file as the queue structure, so add
#include <queue>
to use it.
To initialize a priority queue, use the same syntax as for the other struc-
tures:
priority_queue<int> pq;
This time there is one more way to create the structure that is important to
remember. It is not uncommon to prefer the sorting to be done according to
some other order than descending. For this reason there is another way of
creating a priority queue. One can specify a comparison function that takes
two arguments of the type stored in the queue and returns true if the first one
should be considered less than the second. This function can be given as an
argument to the type in the following way:
bool cmp(int a, int b) {
    return a > b;
}

// Reconstructed – the line showing how the function is passed was lost:
priority_queue<int, vector<int>, decltype(&cmp)> pq(cmp);

// or equivalently
priority_queue<int, vector<int>, greater<int>> pq;
Note that a priority queue by default returns the greatest element. If we
want to make it return the smallest element, the comparison function needs
to instead say that the smallest of the two elements actually is the greatest,
somewhat counter-intuitively.
Akcija – akcija
Pivot – pivot
3.5 set and map
The final data structures to be studied in this chapter are also the most
powerful: the set and the map.
The set structure is similar to a mathematical set (Section A.2), in that it
contains a collection of unique elements. Unlike the vector, particular positions
in the structure can not be accessed using the [] operator. This may make
sets seem worse than vectors. The advantage of sets is twofold. First, we can
determine membership of elements in a set much more efficiently compared to
when using vectors (in Chapters 5 and 6, what this means will become clear).
Secondly, sets are also sorted. This means we can quickly find the smallest and
greatest values of the set.
Elements are instead accessed only through iterators, obtained using the
begin(), end() and find() member functions. These iterators can be moved
using the ++ and -- operators, allowing us to navigate through the set in sorted
(ascending) order (with begin() referring to the smallest element).
Elements are inserted using the insert function and removed using the erase
function.
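For example, a small sketch of basic set usage:

#include <iostream>
#include <set>
using namespace std;

int main() {
    set<int> s;
    s.insert(5);
    s.insert(2);
    s.insert(2); // duplicates are ignored
    s.insert(8);
    cout << *s.begin() << endl;  // 2 – the smallest element
    cout << *s.rbegin() << endl; // 8 – the greatest element
    if (s.find(5) != s.end()) cout << "5 is a member" << endl;
    s.erase(2);
    for (int x : s) cout << x << " "; // 5 8, in ascending order
    cout << endl;
}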
A structure similar to the set is the map. It is essentially the same as a set,
except the elements are called keys and have associated values. When declaring
a map two types need to be provided – that of the key and that of the value.
To declare a map with string keys and int values you write
map<string, int> m;
Accessing the value associated with a key x is done using the [] operator, for
example, m["Johan"]; would access the value associated with the "Johan"
key.
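A small usage sketch (ours); note that accessing a missing key with [] inserts
it with a default value (0 for an int):

#include <iostream>
#include <map>
#include <string>
using namespace std;

int main() {
    map<string, int> m;
    m["Johan"] = 5;
    m["Anton"] = 2;
    m["Johan"] += 1;            // keys are unique; this updates the value
    cout << m["Johan"] << endl; // 6
    for (auto& kv : m)          // iterates in sorted key order
        cout << kv.first << ": " << kv.second << endl;
}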
Babelfish – babelfish
3.6 Math
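A few of the commonly used mathematical functions live in the <cmath>
header. A small, non-exhaustive sketch (the selection here is ours):

#include <cmath>
#include <iostream>
using namespace std;

int main() {
    cout << abs(-5.0) << endl;      // 5 – absolute value
    cout << sqrt(2.0) << endl;      // 1.41421 – square root
    cout << pow(2.0, 10.0) << endl; // 1024 – exponentiation
    cout << log2(1024.0) << endl;   // 10 – base-2 logarithm
    cout << cos(0.0) << endl;       // 1 – trigonometry (radians)
}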
A1 Paper – a1paper
3.7 Algorithms
A majority of the algorithms we regularly use from the standard library operate
on sequences. To use algorithms, you need to include
#include <algorithm>
3.7.1 Sorting
Sorting a sequence is very easy in C++. The function for doing so is named
sort. It takes two iterators marking the beginning and end of the interval to
be sorted and sorts it in-place in ascending order. For example, to sort the first
10 elements of a vector named v you would use
sort(v.begin(), v.begin() + 10);
Note that the right endpoint of the interval is exclusive – it is not included in
the interval itself. This means that you can provide v.end() as the end of the
interval if you want to sort the vector until the end.
As with priority_queues or sets, the sorting algorithm can take a custom
comparator if you want to sort according to some other order than that defined
by the < operator. For example,
sort(v.begin(), v.end(), greater<int>());
would sort the vector v in descending order. You can provide other sorting
functions as well. For example, you can sort numbers by their absolute value
by passing in the following comparator:
bool cmp(int a, int b) {
return abs(a) < abs(b);
}
sort(v.begin(), v.end(), cmp);
What happens if two values have the same absolute value when sorted with
the above comparator? With sort, this behaviour is not specified: they can
be ordered in any way. Occasionally you want values that your comparison
function considers equal to keep the same relative order as in the input.
This is called a stable sort, and is implemented in C++ with the function
stable_sort.
To check if a vector is sorted, the is_sorted function can be used. It takes the
same arguments as the sort function.
3.7.2 Searching
The most basic search operation is the find function. It takes two iterators
representing an interval and a value. If one of the elements in the interval
equals the value, an iterator to the element is returned. In case of multiple
matches the first one is returned. Otherwise, the iterator provided as the end
of the interval is returned. The common usage is
find(v.begin(), v.end(), 5);
which would return an iterator to the first instance of 5 in the vector.
To find out how many times an element appears in a vector, the count function
takes the same arguments as the find function and returns the total number
of matches.
If the array is sorted, you can use the much faster binary search operations
instead. The binary_search function takes as arguments a sorted interval
given by two iterators and a value. It returns true if the interval contains the
value. The lower_bound and upper_bound functions take the same arguments
as binary_search, but instead return an iterator to the first element not less
than and the first element greater than the specified value, respectively. For
more details on how these are implemented, read Section 10.3.
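For example (a small sketch of the three functions on a sorted vector):

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> v = {1, 3, 3, 5, 8};
    cout << binary_search(v.begin(), v.end(), 5) << endl; // 1 (true)
    // First element not less than 3 – the first 3, at index 1:
    cout << (lower_bound(v.begin(), v.end(), 3) - v.begin()) << endl;
    // First element greater than 3 – the 5, at index 3:
    cout << (upper_bound(v.begin(), v.end(), 3) - v.begin()) << endl;
}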
3.7.3 Permutations
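The central permutation tool in the library is next_permutation, which
rearranges an interval into the next lexicographically greater permutation and
returns false once it wraps around to the smallest one. A sketch (ours) that
prints all permutations of 1 2 3:

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> v = {1, 2, 3};
    do {
        for (int x : v) cout << x << " ";
        cout << endl;
    } while (next_permutation(v.begin(), v.end()));
}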
3.8 Strings
We have already used the string type many times before. Until now one
of the essential features of a string has been omitted – a string is to a large
extent like a vector of chars. This is especially true in that you can access the
individual characters of a string using the [] operator. For a string
string thecowsays = "boo";
the expression thecowsays[0] is the character 'b'. Furthermore, you can
push_back new characters to the end of a string.
thecowsays.push_back('p');
would instead make the string boop.
Autori – autori
Skener – skener
3.8.1 Conversions
In some languages, the barrier between strings and e.g. integers is more
fuzzy than in C++. In Java, for example, the code "4" + 2 would append the
character '2' to the string "4", yielding the string "42". This is not the case in
C++ (what errors do you get if you try to do this?).
Instead, there are other ways to convert between strings and other types.
The easiest way is through using the stringstream class. A stringstream
essentially works as a combined cin and cout. An empty stream is declared
by
stringstream ss;
Values can be written to the stream using the << operator and read from it
using the >> operator. This can be exploited to convert strings to and from e.g.
numeric types like this:
stringstream numToString;
numToString << 5;
string val;
numToString >> val; // val is now the string "5"
stringstream stringToNum;
stringToNum << "5";
int val;
stringToNum >> val; // val is now the integer 5
Just as with cin, you can use a stringstream to determine what type the next
word is. If you try to read from a stringstream into an int but the next word
is not an integer, the expression will evaluate to false:
stringstream ss;
ss << "notaninteger";
int val;
if (ss >> val) {
cout << "read an integer!" << endl;
} else {
cout << "next word was not an integer" << endl;
}
3.9 Input/Output
Input and output is primarily handled by the cin and cout objects, as
previously witnessed. While they are very easy to use, adjustments are
sometimes necessary.
The first advanced usage is reading input until we run out of input (often
called reading until the end-of-file). Normally, input formats are constructed
so that you always know beforehand how many tokens of input you need to
read. For example, lists of integers are often either prefixed by the size of the
list or terminated by some special sentinel value. For those few times when we
need to read input until the end we use the fact that cin >> x is an expression
that evaluates to false if the input reading failed. This is also the case if you
try to read an int but the next word is not actually an integer. This kind of
input loop thus looks something like the following:
int num;
while (cin >> num) {
// do something with num
}
As we stated briefly in the C++ chapter, cin only reads a single word when
used as input to a string. This is a problem if the input format requires us to
read input line by line. The solution to this is the getline function, which
reads text until the next newline:
getline(cin, str);
Be warned that if you use cin to read a single word that is the last on its line,
the final newline is not consumed. That means that for an input such as
word
blah blah
the code
string word;
cin >> word;
string line;
getline(cin, line);
would read "word" into word but leave line empty – getline only consumes the
newline left behind on the first line. A common fix is an extra getline (or a
cin.ignore()) to skip past that newline before reading lines of input.
Another common problem is that outputting decimal values with cout pro-
duces numbers with too few decimals. Many problems stipulate that an
answer is considered correct if it is within some specified relative or absolute
precision of the judges’ answer. The default precision of cout is 10^-6. If a
problem requires higher precision, it must be set manually using e.g.
cout << setprecision(10);
If the function argument is x, the precision is set to 10^-x. This means that the
above statement would set the precision of cout to 10^-10. This precision is
normally the relative precision of the output (i.e. the total number of digits
to print). If you want the precision to be absolute (i.e. specify the number of
digits after the decimal point) you write
cout << fixed;
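Both manipulators can be combined. Note that setprecision lives in the
<iomanip> header. A small sketch:

#include <iomanip>
#include <iostream>
using namespace std;

int main() {
    double x = 1.0 / 3.0;
    cout << setprecision(10) << x << endl;         // 0.3333333333
    cout << fixed << setprecision(3) << x << endl; // 0.333
}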
3.10 Chapter Notes
In this chapter, only the parts of the standard library we deemed most
important to problem solving were extracted. The standard library is much
larger than this, of course. While you will almost always get by using only
what we discussed, additional knowledge of the library can make you a faster,
more effective coder.
For a good overview of the library, cppreference.com¹ contains lists of the
library contents categorized by topic.
¹ http://en.cppreference.com/w/cpp
Chapter 4
Implementation Problems
The Recipe
Swedish Olympiad in Informatics 2011, School Qualifiers
You have decided to cook some food. The dish you are going to make
requires N different ingredients. For every ingredient, you know the
amount you have at home, how much you need for the dish, and how
much it costs to buy (per unit).
If you do not have a sufficient amount of some ingredient you need to buy
the remainder from the store. Your task is to compute the cost of buying
the remaining ingredients.
Input
The first line of input is an integer N ≤ 10, the number of ingredients in
the dish.
The next N lines contain the information about the ingredients, one per line.
This problem is not particularly hard. For every ingredient we need to calculate
the amount which we need to purchase. If we need n units of an ingredient and
have h units at home, the only gotcha in the problem is the mistake of computing
this as n − h. The correct formula is max(0, n − h), required in case of the luxury
problem of having more than we need. We then multiply this number by the
ingredient cost and sum the costs up for all the ingredients.
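A sketch of this solution follows. We assume here that each ingredient line
contains the amount at home h, the amount needed n, and the unit cost c, in
that order (the exact order is given by the full input specification):

#include <bits/stdc++.h>
using namespace std;

int main() {
    int N;
    cin >> N;
    long long total = 0;
    for (int i = 0; i < N; i++) {
        long long h, n, c;
        cin >> h >> n >> c;
        total += max(0LL, n - h) * c; // buy only what is missing
    }
    cout << total << endl;
}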
Game Rank
Nordic Collegiate Programming Contest 2016 – Jimmy Mårdell
The gaming company Sandstorm is developing an online two player game.
You have been asked to implement the ranking system. All players have a
rank determining their playing strength which gets updated after every
game played. There are 25 regular ranks, and an extra rank, “Legend”,
above that. The ranks are numbered in decreasing order, 25 being the
lowest rank, 1 the second highest rank, and Legend the highest rank.
Each rank has a certain number of “stars” that one needs to gain before
advancing to the next rank. If a player wins a game, she gains a star. If
before the game the player was on rank 6-25, and this was the third or
more consecutive win, she gains an additional bonus star for that win.
When she has all the stars for her rank (see list below) and gains another
star, she will instead gain one rank and have one star on the new rank.
For instance, if before a winning game the player had all the stars on her
current rank, she will after the game have gained one rank and have 1 or
2 stars (depending on whether she got a bonus star) on the new rank. If
on the other hand she had all stars except one on a rank, and won a game
that also gave her a bonus star, she would gain one rank and have 1 star
on the new rank.
If a player on rank 1-20 loses a game, she loses a star. If a player has zero
stars on a rank and loses a star, she will lose a rank and have all stars
minus one on the rank below. However, one can never drop below rank
20 (losing a game at rank 20 with no stars will have no effect).
If a player reaches the Legend rank, she will stay legend no matter how
many losses she incurs afterwards.
The number of stars on each rank is as follows:
• Rank 25-21: 2 stars
• Rank 20-16: 3 stars
• Rank 15-11: 4 stars
• Rank 10-1: 5 stars
A very long problem statement! The first hurdle is finding the energy to read
it from start to finish without skipping any details. Not much creativity is
needed here – indeed, the algorithm to implement is given in the statement.
Despite this, it is not as easy as one would think. Although it was the second
most solved problem at the contest where it was used, it was also the one
with the worst success ratio. On average, a team needed 3.59 attempts before
getting a correct solution, compared to the runner-up problem at 2.92 attempts.
None of the top 6 teams in the contest got the problem accepted on their
first attempt. Failed attempts cost a lot, not only in absolute time – many
forms of competition also include additional penalties for submitting incorrect
solutions.
Implementation problems get much easier when you know your programming
language well and can use it to write good, structured code. Split your code into
functions, use structures, and give your variables descriptive names, and imple-
mentation problems become much easier to code. A solution to the Game Rank
problem which attempts to use this approach is given here:
#include <bits/stdc++.h>

using namespace std;

int curRank = 25, curStars = 0, conseqWins = 0;

int starsOfRank() {
    if (curRank >= 21) return 2;
    if (curRank >= 16) return 3;
    if (curRank >= 11) return 4;
    if (curRank >= 1) return 5;
    assert(false);
}

void addStar() {
    if (curStars == starsOfRank()) {
        --curRank;
        curStars = 0;
    }
    ++curStars;
}

void addWin() {
    int curStarsWon = 1;
    ++conseqWins;
    if (conseqWins >= 3 && curRank >= 6) curStarsWon++;

    for (int i = 0; i < curStarsWon; i++) {
        addStar();
    }
}

void loseStar() {
    if (curStars == 0) {
        if (curRank == 20) return;
        ++curRank;
        curStars = starsOfRank();
    }
    --curStars;
}

void addLoss() {
    conseqWins = 0;
    if (curRank <= 20) loseStar();
}

int main() {
    string seq;
    cin >> seq;
    for (char res : seq) {
        if (res == 'W') addWin();
        else addLoss();
        if (curRank == 0) break;
        assert(1 <= curRank && curRank <= 25);
        assert(0 <= curStars && curStars <= starsOfRank());
    }
    if (curRank == 0) cout << "Legend" << endl;
    else cout << curRank << endl;
}
Note the use of the assert() function. The function takes a single boolean
parameter and crashes the program with an assertion failure if the parameter
evaluates to false. This is helpful when solving problems since it allows us to
verify that the assumptions we make regarding the internal state of the program
indeed hold. In fact, when the above solution was written the assertions in it
actually managed to catch some bugs before the problem was submitted!
Mate in One
Introduction to Algorithms at Danderyds Gymnasium
1 If you are not aware of this special pawn rule, do not worry – knowledge of it is irrelevant
    {2, 1},
    {-2, 1},
    {2, -1},
    {-2, -1}
};

void display(int x, int y) {
    printf("%c%d", y + 'a', 7 - x + 1);
}

vii next(int x, int y) {
    vii res;

    if (board[x][y] == 'P' || board[x][y] == 'p') {
        // pawn

        int dx = is_white(x, y) ? -1 : 1;

        if (is_valid(x + dx, y) && iz_empty(x + dx, y)) {
            res.push_back(ii(x + dx, y));
        }

        if (is_valid(x + dx, y - 1) && is_white(x, y) != is_white(x + dx, y - 1)) {
            res.push_back(ii(x + dx, y - 1));
        }

        if (is_valid(x + dx, y + 1) && is_white(x, y) != is_white(x + dx, y + 1)) {
            res.push_back(ii(x + dx, y + 1));
        }

    } else if (board[x][y] == 'N' || board[x][y] == 'n') {
        // knight

        for (int i = 0; i < 8; i++) {
            int nx = x + rook[i][0],
                ny = y + rook[i][1];

            if (is_valid(nx, ny) && (iz_empty(nx, ny) ||
                    is_white(x, y) != is_white(nx, ny))) {
                res.push_back(ii(nx, ny));
            }
        }

    } else if (board[x][y] == 'B' || board[x][y] == 'b') {
        // bishop

        for (int dx = -1; dx <= 1; dx++) {
            for (int dy = -1; dy <= 1; dy++) {
                if (dx == 0 && dy == 0)
                    continue;

                if ((dx == 0) != (dy == 0))
                    continue;

                for (int k = 1; ; k++) {
                    int nx = x + dx * k,
                        ny = y + dy * k;

                    if (!is_valid(nx, ny)) {
                        break;
                    }

                    if (iz_empty(nx, ny) || is_white(x, y) != is_white(nx, ny)) {
                        res.push_back(ii(nx, ny));
                    }

                    if (!iz_empty(nx, ny)) {
                        break;
                    }
                }
            }
        }

    } else if (board[x][y] == 'R' || board[x][y] == 'r') {
        // rook

        for (int dx = -1; dx <= 1; dx++) {
            for (int dy = -1; dy <= 1; dy++) {
                if ((dx == 0) == (dy == 0))
                    continue;

                for (int k = 1; ; k++) {
                    int nx = x + dx * k,
                        ny = y + dy * k;

                    if (!is_valid(nx, ny)) {
                        break;
                    }

                    if (iz_empty(nx, ny) || is_white(x, y) != is_white(nx, ny)) {
                        res.push_back(ii(nx, ny));
                    }

                    if (!iz_empty(nx, ny)) {
                        break;
                    }
                }
            }
        }

    } else if (board[x][y] == 'Q' || board[x][y] == 'q') {
        // queen

        for (int dx = -1; dx <= 1; dx++) {
            for (int dy = -1; dy <= 1; dy++) {
                if (dx == 0 && dy == 0)
                    continue;

                for (int k = 1; ; k++) {
                    int nx = x + dx * k,
                        ny = y + dy * k;

                    if (!is_valid(nx, ny)) {
                        break;
                    }

                    if (iz_empty(nx, ny) || is_white(x, y) != is_white(nx, ny)) {
                        res.push_back(ii(nx, ny));
                    }

                    if (!iz_empty(nx, ny)) {
                        break;
                    }
                }
            }
        }

    } else if (board[x][y] == 'K' || board[x][y] == 'k') {
        // king

        for (int dx = -1; dx <= 1; dx++) {
            for (int dy = -1; dy <= 1; dy++) {
                if (dx == 0 && dy == 0)
                    continue;

                int nx = x + dx,
                    ny = y + dy;

                if (is_valid(nx, ny) && (iz_empty(nx, ny) ||
                        is_white(x, y) != is_white(nx, ny))) {
                    res.push_back(ii(nx, ny));
                }
            }
        }
    } else {
        assert(false);
    }

    return res;
}

bool is_mate() {

    bool can_escape = false;

    char new_board[8][8];

    for (int x = 0; !can_escape && x < 8; x++) {
        for (int y = 0; !can_escape && y < 8; y++) {
            if (!iz_empty(x, y) && !is_white(x, y)) {

                vii moves = next(x, y);
                for (int i = 0; i < size(moves); i++) {
                    for (int j = 0; j < 8; j++)
                        for (int k = 0; k < 8; k++)
                            new_board[j][k] = board[j][k];

                    new_board[moves[i].first][moves[i].second] = board[x][y];
                    new_board[x][y] = '.';

                    swap(new_board, board);

                    bool is_killed = false;
                    for (int j = 0; !is_killed && j < 8; j++) {
                        for (int k = 0; !is_killed && k < 8; k++) {
                            if (!iz_empty(j, k) && is_white(j, k)) {
                                vii nxts = next(j, k);

                                for (int l = 0; l < size(nxts); l++) {
                                    if (board[nxts[l].first][nxts[l].second] == 'k') {
                                        is_killed = true;
                                        break;
                                    }
                                }
                            }
                        }
                    }

                    swap(new_board, board);

                    if (!is_killed) {
                        can_escape = true;
                        break;
                    }
                }

            }
        }
    }

    return !can_escape;
}

int main()
{
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++) {
            scanf("%c", &board[i][j]);
        }

        scanf("\n");
    }

    char new_board[8][8];
    for (int x = 0; x < 8; x++) {
        for (int y = 0; y < 8; y++) {
            if (!iz_empty(x, y) && is_white(x, y)) {

                vii moves = next(x, y);

                for (int i = 0; i < size(moves); i++) {

                    for (int j = 0; j < 8; j++)
                        for (int k = 0; k < 8; k++)
                            new_board[j][k] = board[j][k];

                    new_board[moves[i].first][moves[i].second] = board[x][y];
                    new_board[x][y] = '.';

                    swap(new_board, board);

                    if (board[moves[i].first][moves[i].second] == 'P' &&
                            moves[i].first == 0) {

                        board[moves[i].first][moves[i].second] = 'Q';
                        if (is_mate()) {
                            printf("%c%d%c%d\n", y + 'a', 7 - x + 1,
                                   moves[i].second + 'a', 7 - moves[i].first + 1);
                            return 0;
                        }

                        board[moves[i].first][moves[i].second] = 'N';
                        if (is_mate()) {
                            printf("%c%d%c%d\n", y + 'a', 7 - x + 1,
                                   moves[i].second + 'a', 7 - moves[i].first + 1);
                            return 0;
                        }

                    } else {
                        if (is_mate()) {
                            printf("%c%d%c%d\n", y + 'a', 7 - x + 1,
                                   moves[i].second + 'a', 7 - moves[i].first + 1);
                            return 0;
                        }
                    }

                    swap(new_board, board);
                }
            }
        }
    }

    assert(false);

    return 0;
}
That is a lot of code! Note how a few obvious mistakes make the code harder
to read, such as the typo iz_empty instead of is_empty, or how the list of
moves for the knight is called rook. Our final solution reduces this to less
than half the size.
Exercise 4.2
Read through the above code carefully and consider if there are better ways
to solve the problem. Furthermore, it has a bug – can you find it?
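The refactoring begins with a helper that generates moves along a list of
directions, stepping at most a given number of times. The helper itself is not
among the excerpts shown, so the following is a rough reconstruction based
on how it is called (directionMoves, with direction lists such as ALL_MOVES,
CROSS, DIAGONAL and KNIGHT):

// Rough sketch, guessed from the call sites: walk in each direction
// at most 'steps' times, adding empty squares and captures, and
// stopping once the path is blocked.
vii directionMoves(const vii& directions, int steps, int x, int y) {
    vii moves;
    for (ii d : directions) {
        for (int k = 1; k <= steps; k++) {
            int nx = x + d.first * k, ny = y + d.second * k;
            if (!isValid(nx, ny)) break;
            if (isEmpty(nx, ny) || isWhite(x, y) != isWhite(nx, ny))
                moves.emplace_back(nx, ny);
            if (!isEmpty(nx, ny)) break; // blocked; cannot move past
        }
    }
    return moves;
}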
A short and sweet abstraction that will prove very useful. It handles all
possible moves, except for pawns. These have a few special cases.
vii pawnMoves(int x, int y) {
    vii moves;
    if (x == 0 || x == 7) {
        vii queenMoves = directionMoves(ALL_MOVES, 16, x, y);
        vii knightMoves = directionMoves(KNIGHT, 1, x, y);
        queenMoves.insert(queenMoves.begin(), all(knightMoves));
        return queenMoves;
    }
    int mv = (isWhite(x, y) ? -1 : 1);
    if (isValid(x + mv, y) && isEmpty(x + mv, y)) {
        moves.emplace_back(x + mv, y);
        bool canMoveTwice = (isWhite(x, y) ? x == 6 : x == 1);
        if (canMoveTwice && isValid(x + 2 * mv, y) && isEmpty(x + 2 * mv, y)) {
            moves.emplace_back(x + 2 * mv, y);
        }
    }
    auto take = [&](int nx, int ny) {
        if (isValid(nx, ny) && !isEmpty(nx, ny)
                && isWhite(x, y) != isWhite(nx, ny))
            moves.emplace_back(nx, ny);
    };
    take(x + mv, y - 1);
    take(x + mv, y + 1);
    return moves;
}
This pawn implementation also takes care of promotion, rendering the logic
previously implementing this obsolete.
The remainder of the move generation is now implemented as:
vii next(int x, int y) {
    vii moves;
    switch (toupper(board[x][y])) {
        case 'Q': return directionMoves(ALL_MOVES, 16, x, y);
        case 'R': return directionMoves(CROSS, 16, x, y);
        case 'B': return directionMoves(DIAGONAL, 16, x, y);
        case 'N': return directionMoves(KNIGHT, 1, x, y);
        case 'K': return directionMoves(ALL_MOVES, 1, x, y);
        case 'P': return pawnMoves(x, y);
    }
    return moves;
}
We also have some duplication in the code making the moves. Before extract-
ing this logic, we will change the structure used to represent the board. A
char[8][8] is a tedious structure to work with; it is not easily copied or sent
as a function argument. The refactored code below instead stores the board
in a Board type, a vector of strings.
Hmm... there should be one more thing in common between the main and
is_mate functions. Namely, to check if the current player is in check after a
move. However, it seems this is not done in the main function – a bug. Since
we do need to do this twice, it should probably be its own function:
bool inCheck(bool white) {
    trav(mv, getMoves(!white)) {
        ii to = mv.second;
        if (!isEmpty(to.first, to.second)
                && isWhite(to.first, to.second) == white
                && toupper(board[to.first][to.second]) == 'K') {
            return true;
        }
    }
    return false;
}
Now, the long is_mate function is much shorter and readable, thanks to our
refactoring:
bool isMate() {
    if (!inCheck(false)) return false;
    Board oldBoard = board;
    trav(mv, getMoves(false)) {
        board = doMove(mv);
        if (!inCheck(false)) return false;
        board = oldBoard;
    }
    return true;
}
A similar transformation is now possible for the main function, which loops
over all moves white can make and checks whether black is then mated:
int main() {
    rep(i,0,8) {
        string row;
        cin >> row;
        board.push_back(row);
    }
    Board oldBoard = board;
    trav(mv, getMoves(true)) {
        board = doMove(mv);
        if (!inCheck(true) && isMate()) {
            outputSquare(mv.first.first, mv.first.second);
            outputSquare(mv.second.first, mv.second.second);
            cout << endl;
            break;
        }
        board = oldBoard; // undo the move before trying the next one
    }
    return 0;
}
Now, we have actually rewritten the entire solution. From the 300-line behe-
moth with gigantic functions, we have refactored the solution into a few short
functions which are easy to follow. The rewritten solution is less than half
the size, clocking in at less than 140 lines (the author’s own solution is 120
lines). Learning to code such structured solutions comes to a large extent from
experience. During a competition, we might not spend time thinking about
how to structure our solutions, instead focusing on getting them done as soon as
possible. However, spending 1-2 minutes thinking about how to best imple-
ment a complex solution could pay off not only in faster implementation times
(such as halving the size of the program) but also in fewer bugs.
Cross – cross
Many good sources exist to become more proficient at writing readable and
simple code. Clean Code[13] describes many principles that help in writing
better code. It includes good walk-throughs on refactoring, and shows in a
very tangible fashion how coding cleanly also makes coding easier.
Code Complete[14] is a huge tome on improving your programming skills.
While much of the content is not particularly relevant to coding algorithmic
problems, chapters 5-19 give many suggestions on coding style.
Different languages have different best practices. Some resources on improving
your skills in whatever language you code in are:
C++ Effective C++[16], Effective Modern C++[17], Effective STL[15], by Scott
Meyers,
Java Effective Java[3] by Joshua Bloch,
Python Effective Python[21] by Brett Slatkin, Python Cookbook[2] by David Bea-
zley and Brian K. Jones.
Chapter 5
Time Complexity
How do you know if your algorithm is fast enough before you have coded
it? In this chapter we examine this question from the perspective of time
complexity, a common tool of algorithm analysis to determine roughly how
fast an algorithm is.
We start our study of complexity by looking at a new sorting algorithm –
insertion sort. Just like selection sort (studied in Chapter 1), insertion sort
works by iteratively sorting a sequence.
The insertion sort algorithm works by ensuring that all of the first i elements
of the input sequence are sorted. First for i = 1, then for i = 2, etc, up to i = n,
at which point the entire sequence is sorted.
5.1 The Complexity of Insertion Sort
In this section we determine how long insertion sort takes to run. When
analyzing an algorithm we do not attempt to compute the actual wall clock
time an algorithm takes. Indeed, this would be nearly impossible a priori
– modern computers are complex beasts with often unpredictable behavior.
Instead, we try to approximate the growth of the running time, as a function of
the size of the input.
Competitive Tip
When sorting fixed-size integers the size of the input would be the number of
elements we are sorting, N. We denote the time the algorithm takes in relation
to N as T (N). Since an algorithm often has different behaviours depending
on how an instance is constructed, this time is taken to be the worst-case time,
over every instance of N elements.
To properly analyze an algorithm, we need to be more precise about exactly
5 2 4 1 3 0   t_0 = 0
5 2 4 1 3 0   t_1 = 1
2 5 4 1 3 0   t_2 = 1
2 4 5 1 3 0   t_3 = 3
1 2 4 5 3 0   t_4 = 2
1 2 3 4 5 0   t_5 = 5
0 1 2 3 4 5
Figure: The states of insertion sort run on the sequence 5 2 4 1 3 0; t_i denotes the number of swaps performed in the i'th iteration.
what it does. We give the following pseudo code for insertion sort:
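One standard formulation is the following (our rendering; the original listing
may differ in small details):

procedure InsertionSort(A)
    for i ← 1 to N − 1 do
        j ← i
        while j > 0 and A[j − 1] > A[j] do
            swap A[j − 1] and A[j]
            j ← j − 1

Here, the while loop for a given i performs the t_i swaps counted in the figure
above.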
To analyze the running time of the algorithm, we make the assumption that
any “sufficiently small” operation takes the same amount of time – exactly 1
unit.
T(N) = N + N + (Σ_{i=0}^{N−1} t_i) + (Σ_{i=0}^{N−1} t_i) + (Σ_{i=0}^{N−1} t_i)
     = 3(Σ_{i=0}^{N−1} t_i) + 2N

In the worst case the input is sorted in reverse order, so that t_i = i. Then:

T(N) = 3(Σ_{i=0}^{N−1} i) + 2N
     = 3 · (N − 1)N/2 + 2N
     = (3/2)(N^2 − N) + 2N
     = (3/2)N^2 + N/2
This function grows quadratically with the number of elements N. Since the
approximate growth of the time an algorithm takes is assigned such importance,
a notation was developed for it.
5.2 Asymptotic Notation
Almost always when we express the running time of an algorithm we use what
is called asymptotic notation. The notation captures the behavior of a function
as its arguments grow. For example, the function T(N) = (3/2)N^2 + N/2, which
described the running time of insertion sort, is bounded by c · N^2 for large N,
for some constant c. We write

T(N) = O(N^2)
Figure: the growth of T(N) compared to the functions 100N + 1337 and N^2.
Intuitively, the notation means that f(n) grows slower than or as fast as g(n),
within a constant factor. Any quadratic function an^2 + bn + c = O(n^2).
Similarly, any linear function an + b = O(n^2) as well. This definition implies
that for two functions f and g which are always within a constant factor of
each other, we have that both f(n) = O(g(n)) and g(n) = O(f(n)).
We can use this definition to prove that the running time of insertion sort is
O(N^2), even in the worst case.
Example 5.1 Prove that (3/2)N^2 + N/2 = O(N^2).
Competitive Tip
Complexity analysis can also be used to determine lower bounds of the time
an algorithm takes. To reason about lower bounds we use Ω-notation. It is
We know that the complexity of insertion sort has an upper bound of O(N^2)
in the worst case, but does it have a lower bound? It actually has the same
lower bound as upper bound, i.e. T(N) = Ω(N^2).
Example 5.2 Prove that (3/2)N^2 + N/2 = Ω(N^2).
In this case, both the lower and the upper bound of the worst-case running
time of insertion sort coincided (asymptotically). We have another notation for
when this is the case:
Exercise 5.1
Find a lower and an upper bound that coincide for the best-case running
time for insertion sort.
Exercise 5.2
Give an O(n) algorithm and an O(1) algorithm to compute the sum of the
n first integers.
Exercise 5.3
Prove, using the definition, that 10n^2 + 7n − 5 + log₂ n = O(n^2). What
constants c, n_0 did you get?
Exercise 5.4
Prove that f(n) + g(n) = Θ(max{f(n), g(n)}) for non-negative functions f
and g.
Exercise 5.5
Is 2^(n+1) = O(2^n)? Is 2^(2n) = O(2^n)?
Exercise 5.6
Prove that (n + a)^b = O(n^b) for positive constants a, b.
5: while i ≠ N do
6:   while i < N and A[i] ≠ v do
7:     i ← i + 1
8:   if i ≠ N then
9:     ans ← ans + 1
10:    i ← i + 1
b_1 + b_2 + · · · + b_k = Θ(N)
Our reasoning is as follows. There are two ways the variable i can increase: it
can either be increased inside the loop at line 7, or at line 10. If the loop executes
N times in total, it will certainly complete and never be executed again, since
the loop at line 5 then completes too. This gives us Σ_{i=1}^{k} b_i = O(N).
On the other hand, we get one iteration for every time i is increased. If i is
increased on line 7, it was done within a loop iteration. If i is increased on line
10, we instead count the final check of the loop just before it once. Each addition
to i thus happens together with an iteration of the loop, so Σ_{i=1}^{k} b_i = Ω(N).
Together, these two results prove our claim.
This particular application of amortized complexity is called the aggregate
method.
Exercise 5.7
Consider the following method of implementing the addition of 1 to an integer
represented by its binary digits.
The algorithm is the binary version of the normal addition algorithm, where
the two addends are written above each other and each resulting digit is
computed one at a time, possibly with a carry digit.
What is the amortized complexity of this procedure over 2^n calls, if D starts
out as 0?
There are several other types of complexities aside from the time complexity.
For example, the memory complexity of an algorithm measures the amount of
memory it uses. We use the same asymptotic notation when analyzing memory
complexity. In most modern programming competitions, the allowed memory
usage is high enough for the memory complexity not to be a problem – if you get
memory limit problems you also tend to have time limit problems. However,
it is still of interest in computer science (and thus algorithmic problem solving)
and computer engineering in general.
Another common type of complexity is the query complexity. In some prob-
lems (like the Guessing Problem from chapter 1), we are given access to some
kind of external procedure (called an oracle) that computes some value given
parameters that we provide. A procedure call of this kind is called a query.
The number of queries that an algorithm makes to the oracle is called its query
complexity. Problems where the algorithm is allowed access to an oracle often
bound the number of queries the algorithm may make. In these problems the
query complexity of the algorithm is of interest.
5.5 The Importance of Constant Factors
In this chapter, we have essentially told you not to worry about the constant
factors that the Θ notation hides from you. While this is true when you are
solving problems in theory, only attempting to get a good asymptotic time
complexity, constant factors can unfortunately be of large importance when
implementing the problems subject to time limits.
Speeding up a program that has the correct time complexity but still gets
time limit exceeded when submitted to an online judge is half art and half
engineering. The first trick is usually to generate the worst-case test instance.
It is often enough to create a test case where the input matches the input limits
of the problem, but sometimes your program behaves differently depending
on how the input looks. In these cases, more complex reasoning may be
required.
Once the worst-case instance has been generated, what remains is to improve
your code until you have gained a satisfactory decrease in how much time your
program uses. When doing this, focus should be placed on the segments of
your code that take the longest wall-clock time. Decreasing time usage by
10% in a segment that takes 1 second is clearly a larger win than decreasing
time usage by 100% in a segment that takes 0.01 seconds.
There are many tricks to improving your constant factors, such as:
• using symmetry to perform less calculations
• precomputing oft-repeated expressions, especially involving trigonomet-
ric functions
• passing very large data structures by reference instead of by copying
• avoiding repeatedly allocating large amounts of memory
• using SIMD (single instruction, multiple data) instructions
Exercise 5.9
Order the following functions by their asymptotic growth with proof!
• x
• √x
• x^2
• 2^x
• e^x
• x!
• log x
• 1/x
• x log x
• x^3
Exercise 5.10
Prove that if a(x) = O(b(x)) and c(x) = O(d(x)), then a(x) + c(x) =
O(b(x) + d(x)).
Chapter 6
Foundational Data Structures
The most basic data structure is the fixed-size array. It consists of a fixed size
block of memory and can be viewed as a sequence of N variables of the same
type T. It supports the operations:
• new T[size]: creating a new array of a given size.
Complexity: Θ(1)¹
• delete[] arr: deleting an existing array.
Complexity: Θ(1)
• arr[index]: accessing or modifying the value at a given index.
Complexity: Θ(1)
In Chapter 2 we saw how to create fixed-size arrays where we knew the size
beforehand. In C++ we can create fixed-size arrays using an expression as size
instead. This is done using the new T[size] syntax above, for example:
int size = 5;
int* arr = new int[size];
arr[2] = 5;
cout << arr[2] << endl;
delete[] arr;
Exercise 6.1
What happens if you try to create an array with a negative size?
6.1 Dynamic Arrays
The fixed-size array can be used to implement a more useful data structure,
the dynamic array. This is an array that can change size when needed. For ex-
ample, we may want to repeatedly insert values in the array without knowing
the total number of values beforehand. This is a very common requirement in
programming problems. In particular, we want to support two additional op-
erations in addition to the operations supported by the fixed-size array.
• insert(pos, val): inserting a value in the array at a given position.
Amortized complexity: Θ(size − pos)
Worst case complexity: Θ(size)
• remove(pos): erase the element at a given position in the array.
Complexity: Θ(size − pos)
The complexities we list above are a result of the usual implementation of the
dynamic array. A key consequence of these complexities is that addition and
¹ This complexity is debatable, and highly dependent on what computational model one uses.
In practice, this is roughly “constant time” in most memory management libraries used in C++. In
all Java and Python implementations we tried, it is instead linear in the size.
removal of elements to the end of the dynamic array takes Θ(1) amortized
time.
A dynamic array can be implemented in numerous ways, and underlies the
implementation of essentially every other data structure that we use. A naive
implementation of a dynamic array is using a fixed-size array to store the data
and creating a new one with the correct size whenever we add or remove
an element, copying the elements from the old array to the new array. The
complexity for this approach is linear in the size of the array for every operation
that changes the size since we need to copy Θ(size) elements during every
update. We need to do better than this.
To achieve the targeted complexity, we can instead modify this naive approach
slightly by not creating a new array every time we have to change the size of
the dynamic array. Whenever we need to increase the size of the dynamic
array, we create a fixed-size array that is larger than we actually need it to
be. For example, if we create a fixed-size array with n more elements than
our dynamic array needs to store, we will not have to increase the size of the
backing fixed-size array until we have added n more elements to the dynamic
array. This means that a dynamic array does not only have a size, the number
of elements we currently store in it, but also a capacity, the number of elements
we could store in it. See Figure 6.1 for a concrete example of what happens
when we add elements to a dynamic array that is both within its capacity and
when we exceed it.
Figure 6.1: Appending to a dynamic array. While size < capacity (here, capacity 5), the new element fits in the existing backing array; when the capacity is exceeded, a larger backing array (capacity 10) is allocated.
An example of how such a structure could look can be found in Listing 6.1.
We are almost ready to add and remove elements in our array now. First, we
need to handle the case where inserting a new element would make the size
of the dynamic array exceed its capacity, that is, when size = capacity. Our
previous suggestion was to allocate a new, bigger backing array, but just how
big? If we always add, say, 10 new elements to the capacity, we have to
perform the copying of the old elements with every 10th addition. This still
results in additions to the end of the array taking linear time on average. There
is a neat trick that avoids this problem: creating the new backing array with
double the current capacity.
This ensures that all the copying needed for an array to reach some certain
capacity cap has an amortized complexity of Θ(cap). Assume that we have just
increased the capacity of our array to cap, which required us to copy cap/2
elements. Then, the previous increase will have happened at around capacity
cap/2 and took time cap/4. The one before that occurred at capacity cap/4, and
so on.
We can sum up all of this copying:

cap/2 + cap/4 + · · · ≤ cap

Since each copy is assumed to take Θ(1) time, the total time to create this array
was Θ(cap). As cap/2 ≤ size ≤ cap, this is also Θ(size), meaning that adding size
elements to the end of the dynamic array takes amortized Θ(size) time.
When implementing this in code, we use a function that takes as argument the
capacity we require the dynamic array to have, and ensures that the backing
array has at least this size, possibly by creating a new one of double the size
until it is sufficiently large. Example code for this can be found in Listing 6.2.
With this method in hand, insertion and removal of elements is actually pretty
simple. Whenever we remove an element, we simply need to move the ele-
ments coming after it in the dynamic array forward one step. When adding
an element, we reverse this process by moving the elements coming after
the position we wish to insert a new element at one step towards the back.
Figure 6.2 shows the effects of these operations in a concrete way.
Exercise 6.2
Implement insertion and removal of elements in a dynamic array.
Dynamic arrays are called vectors in C++ (Section 3.1). They have the same
complexities as the one described at the beginning of this section.
Figure 6.2: Removing an element from a dynamic array by moving the later elements one step towards the front, and inserting an element by moving the later elements one step towards the back.
Exercise 6.3
Assume that you want to implement shrinking of an array where many
elements were deleted so that the capacity is unnecessarily large. We will
call this shrinking function whenever we perform a removal, to see if we
potentially want to shrink the array. What is the problem with the following
implementation?
1: procedure S HRINK V ECTOR (V)
2: while V.capacity > 2 · V.size do
3: arr ← new T [V.capacity/2]
4: copy the elements of V.backing to arr
5: V.backing ← arr
6: V.capacity ← V.capacity/2
Exercise 6.4
How can we do any removal in Θ(1) if we do not care about the ordering
of values in the array?
6.2 Stacks
The stack is a data structure that contains an ordered lists of values and
supports the following operations:
• push(val): inserting a value at the top of the stack.
Amortized complexity: Θ(1)
• pop(): removing the value at the top of the stack.
Amortized complexity: Θ(1)
• top(): returning the value currently at the top of the stack.
Complexity: Θ(1)
Exercise 6.5
Implement a stack using a dynamic vector.
6.3 Queues
Like the vector and the stack, the queue contains an ordered list of values, but
where the stack removes and returns values from the end, the queue takes them
from the front. It supports the operations:
• push(val): inserting a value at the back of the queue.
Amortized complexity: Θ(1)
• pop(): removing the value at the front of the queue.
Amortized complexity: Θ(1)
• front(): returning the value currently at the front of the queue.
Complexity: Θ(1)
Figure 6.3: Pushing the value 5 to the back of a queue and popping a value from the front; instead of moving elements, the front index is increased.
Exercise 6.6
Implement a queue using a vector.
Exercise 6.7
A queue can also be implemented using two stacks. How?
In a similar manner, a stack can also be implemented using two queues.
How?
6.4 Graphs
Finally, we discuss a data structure called a graph. Graphs are very important
objects in algorithmic problem solving. If you are not familiar with them, you
should first read appendix Section A.4 that introduces you to the mathematical
terminology of graphs. This chapter instead deals with the representation of
graphs in an algorithm.
There are three common ways of representing graphs in algorithms: adjacency
matrices, adjacency lists and adjacency maps. Occasionally we also represent
graphs implicitly – for example by a function that gives us the neighbours of a
single vertex. This latter representation is common when dealing with searches
in the graph corresponding to the positions in a game such as chess.
In the following sections, we present the representation of the directed, un-
weighted graph in Figure 6.4.
Figure 6.4: A directed, unweighted graph with vertices 1–6.
An adjacency matrix represents a graph as a |V| × |V| matrix, where the entry
at row i, column j indicates whether there is an edge from vertex i to vertex j.
This representation uses Θ(|V|^2) memory, and takes O(1) time for adding,
modifying and removing edges. To iterate through the neighbours of a vertex,
you need Θ(|V|) time, independent of the number of neighbours of the vertex
itself.
Adjacency matrices are best to use when |V|^2 ≈ |E|, i.e. when the graph is
dense.
The adjacency matrix for the directed, unweighted graph in Figure 6.4 is:
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 1 0 0
0 1 0 0 0 0
0 0 0 0 1 0
0 0 0 0 0 0
An adjacency list instead stores, for every vertex, a list of its neighbours. This
uses only Θ(|V| + |E|) memory and lets us iterate through the neighbourhood
of a vertex in time proportional to its size. When representing weighted
graphs, the list usually contains pairs of (neighbour, weight) for the edges
instead. For undirected graphs, both endpoints of an edge contain the other in
their adjacency lists.
An adjacency map combines the adjacency matrix with the adjacency list to
get the benefits of both the matrix (Θ(1) time to check if an edge exists) and
the lists (low memory usage and fast neighbourhood iteration). Instead of
using lists of neighbours for each vertex, we can use a hash table for each
vertex.
This has the same time and memory complexities as the adjacency lists, but it
also allows for checking if an edge is present in Θ(1) time. The downsides are
that hash tables have a higher constant factor than the adjacency list, and that
you lose the ordering of your neighbours (if this is important). The adjacency
map also inherits another sometimes important property from the matrix: you
can remove arbitrary edges in Θ(1) time!
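As a concrete sketch, the three representations could be declared and filled like
this (the edge list here is only an example):

#include <bits/stdc++.h>
using namespace std;

int main() {
    int V = 6;
    vector<pair<int, int>> edges = {{0, 1}, {1, 2}, {2, 3}, {3, 1}};

    vector<vector<bool>> matrix(V, vector<bool>(V)); // Θ(|V|^2) memory
    vector<vector<int>> lists(V);                    // Θ(|V| + |E|) memory
    vector<unordered_set<int>> maps(V);              // Θ(|V| + |E|), O(1) lookup

    for (auto e : edges) {
        matrix[e.first][e.second] = true;
        lists[e.first].push_back(e.second);
        maps[e.first].insert(e.second);
    }

    // Edge lookup: O(1) in the matrix and the map, O(degree) in the list.
    cout << matrix[0][1] << " " << maps[0].count(1) << endl;
    // Neighbourhood iteration: proportional to the degree in the list.
    for (int w : lists[1]) cout << "1 -> " << w << endl;
}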
Exercise 6.8
Given a graph, which representation or representations are suitable if
a) |V| = 1000 and |E| = 499500
b) |V| = 10000 and |E| = 20000
c) |V| = 1000 and |E| = 10000
6.5 Priority Queues
Now, let us look at our first more complicated data structure. The priority
queue is an unordered bag of elements, from which we can get and remove the
largest one quickly. It supports the operations
• push(val): inserting a value into the heap.
Complexity: O(log n)
• pop(): removing the largest value from the heap.
Complexity: O(log n)
• top(): returning the largest value in the heap.
Complexity: Θ(1)
6.5.1 Binary Trees
A binary tree is a rooted tree where every vertex has either 0, 1 or 2 children.
In Figure 6.5, you can see an example of a binary tree.
Figure 6.5: A binary tree
Figure 6.6: A complete binary tree
We call a binary tree complete if every level of the tree is completely filled,
except possibly the bottom one. If the bottom level is not filled, all the vertices
need to be as far as possible to the left. In a complete binary tree, we can order
every vertex as we do in Figure 6.6, i.e. from the top down, left to right at each
layer.
The beauty of this numbering is that we can use it to store a binary tree in a
vector. Since each vertex is given a number, we can map the number of each
vertex into a position in a vector. The n vertices of a complete binary tree then
occupy all the indices [1, n]. An important property of this numbering is that
it is easy to compute the number of the parent, left child and right child of a
vertex. If a vertex has number i, the parent has number ⌊i/2⌋, the left
child has number 2i and the right child has number 2i + 1.
Exercise 6.9
Prove that the above properties of the numbering of a complete binary tree
hold.
Note: if you use a vector to represent a complete binary tree in this manner it
needs to have the size n + 1 where n is the number of vertices, since the tree
numbering is 1-indexed and the vector is 0-indexed!
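For example, the numbering arithmetic could be wrapped up like this (a
hypothetical helper, with index 0 left unused):

#include <vector>
using namespace std;

struct CompleteTree {
    vector<int> values;
    CompleteTree(int n) : values(n + 1) {} // 1-indexed, so n + 1 slots
    int parent(int i) { return i / 2; }    // integer division: ⌊i/2⌋
    int leftChild(int i) { return 2 * i; }
    int rightChild(int i) { return 2 * i + 1; }
};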
6.5.2 Heaps
Figure 6.7: A heap
A heap is a complete binary tree where the value of every vertex is at least as
large as the values of its children – the heap property. In particular, the greatest
value is always found at the root. Pushing a value reduces to appending it to
the end of the tree and bubbling it up: while the new value is greater than the
value of its parent, we swap the two. You can see this procedure in action in
Figure 6.8.
Removing a value is slightly harder. First off, the tree will no longer be a binary
tree – it is missing its root! To rectify this, we can take the last element of
Figure 6.8: Pushing the value 13 onto a heap: it is appended at the end of the tree and bubbled up, swapping with its parent until the heap property is restored.
the tree and put it as root instead. This keeps the binary tree complete, but
may cause it to violate the heap property, since our new root may be smaller
than either or both of its children. The solution to this problem is similar
to that of adding an element. Instead of bubbling up, we bubble it down by
repeatedly swapping it with one of its children until it is no longer smaller
than any of its children. The only question mark is which of its children we
should bubble down to, in case the element is smaller than both of its children.
The answer is clearly the larger of the two children. If we take the smaller of
the two children, we will again violate the heap property.
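As a sketch (our own simplified version), the two bubbling procedures on the
1-indexed vector representation might look like this:

#include <bits/stdc++.h>
using namespace std;

vector<int> heapVals(1); // index 0 unused

void bubbleUp(int i) {
    // Swap with the parent while we are greater than it.
    while (i > 1 && heapVals[i] > heapVals[i / 2]) {
        swap(heapVals[i], heapVals[i / 2]);
        i /= 2;
    }
}

void bubbleDown(int i) {
    int n = heapVals.size() - 1;
    while (true) {
        // Find the largest of the vertex and its (up to two) children.
        int largest = i;
        if (2 * i <= n && heapVals[2 * i] > heapVals[largest]) largest = 2 * i;
        if (2 * i + 1 <= n && heapVals[2 * i + 1] > heapVals[largest]) largest = 2 * i + 1;
        if (largest == i) break; // heap property restored
        swap(heapVals[i], heapVals[largest]);
        i = largest;
    }
}

void push(int v) {
    heapVals.push_back(v);
    bubbleUp(heapVals.size() - 1);
}

int pop() { // remove and return the greatest element
    int top = heapVals[1];
    heapVals[1] = heapVals.back();
    heapVals.pop_back();
    if (heapVals.size() > 1) bubbleDown(1);
    return top;
}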
A final piece of our analysis is missing. It is not yet proven that the time
complexity of adding and removing elements is indeed O(log n). To do
this, we first need to state a basic fact of complete binary trees: their height
is at most log₂ n. This is easily proven by contradiction. Assume that the
height of the tree is at least log₂ n + 1. We claim that any such tree must have
strictly more than n vertices. Since all but the last layer of the tree must be
complete, it must have at least 1 + 2 + · · · + 2^(log₂ n) = 2^(log₂ n + 1) − 1 vertices.
But 2^(log₂ n + 1) − 1 = 2n − 1 > n for n > 1 – the tree has more than n
vertices. This means that a tree with n vertices cannot have height more than
log₂ n.
The next piece of the puzzle is analyzing just how many iterations the loops
in the bubble up and bubble down procedures can perform. In the bubble
up procedure, we keep an index to a vertex that, for every iteration, moves
up in the tree. This can only happen as many times as there are levels in
the tree. Similarly, the bubble down procedure tracks a vertex that moves
down in the tree for every iteration. Again, this is bounded by the number of
levels in the tree. We are forced to conclude that since the complexity of each
iteration is Θ(1) as they only perform simple operations, the complexities of
the procedures as a whole are O(log n).
Exercise 6.11
Prove that adding an element using Push never violates the heap property.
Exercise 6.12
To construct a heap with n elements by repeatedly adding one at a time
takes O(n log n) time, since the add function takes O(log n) time in the
worst case.
One can also construct it in Θ(n) time in the following way: arbitrar-
ily construct a complete binary tree with all the n elements. Now, call
Bubble-Down on each of the elements in reverse order n, n − 1, . . . , 2, 1.
Prove that this correctly constructs a heap, and that it takes Θ(n) time.
6.6 Chapter Notes
For a more rigorous treatment of the basic data structures, we again refer to
Introduction to Algorithms [5]. If you want to dive deeper into proper imple-
mentations of the algorithms in C++, Data Structures and Algorithm Analysis in
C++[25] covers what we brought up in this chapter and a bit more.
Part II
Basics
Chapter 7
Brute Force
7.2 Generate and Test
Our first brute force method is the generate and test method. This particular
brute force strategy consists of generating solutions – naively constructing
candidate solutions to a problem – and then testing them – removing invalid
solutions. It is applicable whenever the number of candidate solutions is quite
small.
Max Clique
In a graph, a subset of the vertices form a clique if each pair of vertices is
connected by an edge. Given a graph, determine the size of the largest
clique.
Input
The first line of input contains an integer N and M – the number of
vertices and the number of edges of the graph. The vertices are numbered
0, 1, . . . , N − 1. The next M lines contain the edges of the graph. An edge
is given as two space-separated integers A and B – the endpoints of the
edge.
Output
Output a single number – the size of the largest clique in the graph.
int main() {
    int N, M;
    cin >> N >> M;
    vector<vector<bool>> adj(N, vector<bool>(N));
    rep(i,0,M) {
        int A, B;
        cin >> A >> B;
        adj[A][B] = adj[B][A] = true;
    }
    rep(i,0,N) adj[i][i] = true;
    int ans = 0;
    rep(i,0,1<<N) {
        rep(j,0,N) {
            if (!(i & (1 << j))) continue;
            rep(k,0,N) {
                if (!(i & (1 << k))) continue;
                if (!adj[j][k]) goto skip;
            }
        }
        ans = max(ans, __builtin_popcount(i));
skip:;
    }
    cout << ans << endl;
}
This kind of brute force problem is often rather easy to spot. There will be
a very small input limit on the parameter you are to brute force over. The
solution will often be subsets of some larger base set (such as the vertices of a
graph).
Let us look at another example of this technique, where the answer is not just
a subset.
The Clock
Swedish Olympiad in Informatics 2004, School Qualifiers
When someone asks you what time it is, most people respond “a quarter
past five”, “15:29” or something similar. If you want to make things a bit
harder, you can answer with the angle between the minute and the hour
hands, since this uniquely determines the time. However, many people
are unused to this way of specifying the time, so it would be nice to have
a program which translates this to a more common format.
We assume that our clock has no seconds hand, and only displays the
time at whole minutes (i.e., both hands only move forward once a minute).
The angle is determined by starting at the hour hand and measuring the
number of degrees clockwise to the minute hand. To avoid decimals, this
angle will be specified in tenths of a degree.
Input
The first and only line of input contains a single integer 0 ≤ A < 3600, the
angle specified in tenths of a degree.
Output
Output the time in the format hh:mm between 00:00 and 11:59.
It is difficult to come up with a formula that gives the correct times as a function
of the angles between the hands on a clock. Instead, we can turn the problem
around. If we know what the time is, can we compute the angle between the
two hands of the clock?
Assume that the time is currently h hours and m minutes. The minute hand
will then be at angle (360/60) · m = 6m degrees. Similarly, the hour hand moves
(360/12) · h = 30h degrees due to the hours, and (360/12) · (1/60) · m = 0.5m
degrees due to the minutes. While computing the current time directly from the
angle is difficult, computing the angle directly from the current time is easy.
Our solution will be to test the 60 · 12 = 720 different times, and pick the one
which matches the given angle (Algorithm 7.2).
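A possible implementation of this idea (our own rendering of the algorithm):
in tenths of a degree, the minute hand is at 60m and the hour hand at
300h + 5m, so the sought angle is (55m − 300h) mod 3600.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int A;
    cin >> A;
    for (int h = 0; h < 12; h++) {
        for (int m = 0; m < 60; m++) {
            // Angle from the hour hand clockwise to the minute hand.
            int angle = ((55 * m - 300 * h) % 3600 + 3600) % 3600;
            if (angle == A) printf("%02d:%02d\n", h, m);
        }
    }
}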
Competitive Tip
Sometimes, competitions pose problems which are solvable quite fast, but
a brute force algorithm will suffice as well. As a general rule, code the
simplest correct solution that is fast enough, even if you see a faster one.
Natjecanje – natjecanje
Perket – perket
7.3 Backtracking
Backtracking is a variation of the generate and test method. Most of the time it
is faster, but it can sometimes be more difficult to code (in particular when the
solutions are subsets).
Let us return to the Max Clique problem. In the problem, we generated all the
candidate solutions (i.e., subsets of vertices) by using bitsets. In problems where
the candidate solutions are other objects than subsets, or the number of subsets
is too large to iterate through, we need to construct the solutions in another
way. Generally, we do this recursively. For example, when generating subsets,
we would go through every element one at a time, and decide whether to
include it. Backtracking extends generate and test to not only testing all the
candidate solutions, but also these partial candidates. Thus, a backtracking
approach can be faster than an iterative approach, by testing fewer candidate
solutions.
For example, assume that two vertices a and b are not connected by an edge
in the graph. If we decide to include a in our solution, we already know that a
candidate solution can not include b. Thus, we have managed to reduce four
choices – including or excluding a and b – to three choices, by eliminating the
inclusion of both vertices. Algorithm 7.3 contains a C++ implementation of
this.
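A sketch of this backtracking idea (one possible rendering, with the adjacency
matrix adj read as in the earlier listing):

#include <bits/stdc++.h>
using namespace std;

int N;
vector<vector<bool>> adj; // filled in as in the generate-and-test solution
vector<int> clique;       // the vertices included so far
int best = 0;

void backtrack(int at) {
    if (at == N) {
        best = max(best, (int)clique.size());
        return;
    }
    // Include vertex 'at' only if it is adjacent to everything chosen so
    // far – this prunes entire families of invalid partial candidates.
    bool ok = true;
    for (int v : clique) ok = ok && adj[v][at];
    if (ok) {
        clique.push_back(at);
        backtrack(at + 1);
        clique.pop_back();
    }
    backtrack(at + 1); // exclude vertex 'at'
}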
Geppetto – geppetto
Sudokunique – sudokunique
All Friends – friends
Input
The first line of input contains an integer 1 ≤ k ≤ 15, the number of
drones to position. Then follows one line with 1 ≤ n ≤ 100 000, the total
number of intersections in Basin City. Finally follow n lines describing
consecutive intersections. The i’th line describes the i’th intersection in the
following format: The line starts with one integer d (0 ≤ d ≤ 4) describing
the number of intersections neighbouring the i’th one. Then follow d
integers denoting the indices of these neighbouring intersections. They
will be all distinct and different from i. The intersections are numbered
from 1 to n.
Output
If it is possible to position k drones such that no two neighbouring in-
tersections have been assigned a drone, output a single line containing
possible. Otherwise, output a single line containing impossible.
At a first glance, it is not even obvious whether the problem is a brute force
problem, or if some smarter principle should be applied. After all, 100 000
vertices is a huge number of intersections! We can make the problem a bit
more reasonable with our first insight. If we have a large number of inter-
sections, and every intersection is adjacent to very few other intersections, it
is probably very easy to place the drones at appropriate intersections. To
formalize this insight, consider what happens when we place a drone at an
intersection.
By placing a drone at the intersection marked in black in Figure 7.1, at most five
intersections are affected – the intersection we placed the drone at, along with
its neighbouring intersections. If we would remove these five intersections, we
would be left with a new city where we need to place k − 1 drones. This simple
fact – which is the basis of a recursive solution to the problem – tells us that if
we have N ≥ 5k − 4 intersections, we immediately know the answer is possible.
The −4 term comes from the fact that when placing the final drone, we no
longer care about removing its neighbourhood, since no further placements
will take place.
Therefore, we can assume that the number of vertices is less than 5 · 15 − 4 = 71,
i.e., n ≤ 70. This certainly makes the problem seem much more tractable. Now,
let us start developing solutions to the problem.
First of all, we can attempt to use the same algorithm as we used for the Max
Clique problem. We could recursively construct the set of our k drones by,
for each intersection, try to either place a drone there or not. If placing a
drone at an intersection, we would forbid placing drones at any neighbouring
intersection.
Unfortunately, this basically means that we are testing every intersection
when placing a certain drone somewhere. This would give us a complexity of
O(n^k). More specifically, the execution time T(n, k) would satisfy T(n, k) ≈
T(n − 1, k) + T(n − 1, k − 1), which implies T(n, k) ≈ (n choose k) = Ω(n^k) (see
Section 15.4 for more details). For n = 70, k = 15, this will almost certainly be
way too high. The values of n and k do suggest that an exponential complexity
is in order, just not of this kind. Instead, something similar to O(c^k) where c is
a small constant would be a better fit. One way of achieving such a complexity
would be to limit the number of intersections we must test to place a drone at
before trying one that definitely works. If we could manage to test only c such
intersections, we would get a complexity of O(c^k).
The trick, yet again, comes from Figure 7.1. Assume that we choose to include
this intersection in our solution, but still can not construct a solution. The
only reason this case can happen is (aside from bad previous choices) that
no optimal solution includes this intersection. What could possibly stop this
intersection from being included in an optimal solution? Basically, one of its
neighbours would have to be included in every optimal solution. Fortunately
for us, this gives us just what we need to improve our algorithm – either a
given intersection, or one of its neighbours, must be included in any optimal
solution.
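A sketch of this branching (our own illustration, not the book's listing; the names n, adj and removed are assumptions): we pick any remaining intersection v and only try placing a drone at v or at one of its at most four neighbours, so each drone placement branches into at most 5 choices.

#include <bits/stdc++.h>
using namespace std;

int n, k;                  // intersections and drones
vector<vector<int>> adj;   // 0-indexed adjacency lists, degrees <= 4
vector<bool> removed;      // intersections deleted so far

bool place(int left) {
    if (left == 0) return true;
    // the N >= 5k - 4 insight: enough remaining intersections always suffice
    int remaining = count(removed.begin(), removed.end(), false);
    if (remaining >= 5 * left - 4) return true;
    // pick any remaining intersection v...
    int v = 0;
    while (removed[v]) v++;
    // ...and only try placing a drone at v or at one of its neighbours
    vector<int> candidates = {v};
    for (int u : adj[v]) if (!removed[u]) candidates.push_back(u);
    for (int c : candidates) {
        // place a drone at c: remove c together with its whole neighbourhood
        vector<int> deleted = {c};
        for (int u : adj[c]) if (!removed[u]) deleted.push_back(u);
        for (int u : deleted) removed[u] = true;
        if (place(left - 1)) return true;
        for (int u : deleted) removed[u] = false; // undo, try next candidate
    }
    return false;
}

Since at most 5 candidate intersections are tested per drone, this makes roughly O(5^k) recursive calls – exactly the O(c^k) behaviour we were aiming for.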
Infiltration – infiltration
The parameter fixing technique is also similar to the generate and test method,
but it is not used to test the solution set itself. Instead, you perform brute force
to fix some parameters of the problem, by trying the possible values they can assume.
Hopefully, fixing the correct parameters allows you to solve the remaining
problem easily. The intuition is that while any particular choice of parameter
may be wrong, testing every choice allows us to assume that we at some point
used the correct parameter.
Buying Books
Swedish Olympiad in Informatics 2010, Finals
You are going to buy N books, and are currently checking the M different
internet book shops for prices. Each book is sold by at least one book
store, and its price can vary between the different stores. Furthermore, if
you order anything from a certain book store, you must pay for postage.
Postage may vary between book stores as well, but is the same no matter
how many books you decide to order. You may order books from any
number of book stores. Compute the smallest amount of money you need
to pay for all the books.
Input
The first line contains two integers 1 ≤ N ≤ 100 – the number of books,
and 1 ≤ M ≤ 15 – the number of book stores.
Then, M descriptions of the book stores follow. The description of the i’th
store starts with a line containing two integers 0 ≤ Pi ≤ 1 000 (the postage
for this book store), and 1 ≤ Li ≤ N (the number of books this store sells).
The next Li lines contain the books sold by the store. The j’th book is
described by two integers 0 ≤ Bi,j < N – the (zero-indexed) number of a
book being sold here, and 1 ≤ Ci,j ≤ 1 000 – the price of this book at the
current book store.
Output
Output a single integer – the smallest amount of money you need to pay
for the books.
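A sketch of the parameter fixing approach for this problem (our own; postage[i] and price[i][j] are assumed representations, with price[i][j] set to a large INF when store i does not sell book j). We fix the set of stores we order from; after that, each book is simply bought as cheaply as possible among the chosen stores.

#include <bits/stdc++.h>
using namespace std;
const long long INF = 1e9;

long long cheapestTotal(int N, int M, const vector<long long>& postage,
                        const vector<vector<long long>>& price) {
    long long best = INF;
    for (int mask = 1; mask < (1 << M); mask++) { // fix the set of stores
        long long cost = 0;
        for (int i = 0; i < M; i++)
            if (mask & (1 << i)) cost += postage[i]; // pay postage once per store
        for (int j = 0; j < N; j++) {
            long long cheapest = INF;
            for (int i = 0; i < M; i++)   // cheapest price among chosen stores
                if (mask & (1 << i)) cheapest = min(cheapest, price[i][j]);
            if (cheapest == INF) { cost = INF; break; } // book unavailable here
            cost += cheapest;
        }
        best = min(best, cost);
    }
    return best;
}

With M ≤ 15 there are at most 2^15 = 32 768 store subsets, so this brute force is fast enough.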
Integer Equation
Codeforces Round #262, Problem B
Find all integers 0 < x < 10^9 which satisfy the equation

x = a · s(x)^b + c

where a, b and c are given integers, and s(x) is the digit sum of x.
Input
The input contains the three integers a, b, and c, where |a| ≤ 10 000,
1 ≤ b ≤ 5, and |c| ≤ 10 000.
Output
Output a single integer – number of integer solutions x to the equation.
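Here, the parameter to fix is the digit sum s(x) itself: since 0 < x < 10^9, the digit sum is at most 9 · 9 = 81, and each fixed value s leaves only a single candidate x = a · s^b + c to verify. A sketch (our own):

#include <bits/stdc++.h>
using namespace std;

int main() {
    long long a, b, c;
    cin >> a >> b >> c;
    int count = 0;
    for (int s = 1; s <= 81; s++) {   // fix the digit sum s(x)
        long long p = 1;
        for (int i = 0; i < b; i++) p *= s; // s^b
        long long x = a * p + c;            // the only possible x for this s
        if (x <= 0 || x >= 1000000000) continue;
        // verify that the digit sum of the candidate really is s
        long long t = x, digitSum = 0;
        while (t > 0) { digitSum += t % 10; t /= 10; }
        if (digitSum == s) count++;
    }
    cout << count << endl;
}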
The meet in the middle technique is essentially a special case of the parameter
fixing technique. The general theme will be to fix half of the parameter space
and build some fast structure such that when testing the other half of the
parameter space, we can avoid explicitly re-testing the first half. It is a space-
time tradeoff, in the sense that we improve the time usage (testing half of the
parameter space much faster), by paying with increased memory
usage (to save the pre-computed structures).
Subset Sum
Given a set of integers S, is there some subset A ⊆ S with a sum equal to
T?
Input
The first line contains an integer N – the size of S – and the integer T. The
next line contains N integers s1, s2, ..., sN, separated by spaces – the elements
of S. It is guaranteed that si ≠ sj for i ≠ j.
Output
Output possible if such a subset exists, and impossible otherwise.
In this problem, we have N parameters in our search space. For each element
of S, we either choose to include it in A or not – a total of two choices for
each parameter. This naive attempt at solving the problem (which amounts to
computing the sum of every subset) gives us a complexity of O(2N ). While
sufficient for e.g. N = 20, we can make an improvement that makes the
problem tractable even for N = 40.
As it happens, our parameters are to a large extent independent. If we fix e.g.
the first N/2 parameters, the only constraint they place on the remaining N/2
parameters is the sum of the integers in the subset they decide. This latter constraint
takes O(2^(N/2)) time to check for each choice of the first half of parameters if we
use brute force. However, the computation performed is essentially the same,
only differing in what sum we try to find. We will instead trade away some
memory in order to only compute this information once, by first computing all
possible sums that the latter half of the elements can form. This information
can be inserted into a hash set, which allows us to determine if a sum can
be formed by the elements in Θ(1) instead. Then, the complexity is instead
Θ(N · 2^(N/2)) in total.
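A sketch of this meet in the middle solution (our own formulation):

#include <bits/stdc++.h>
using namespace std;

bool subsetSum(const vector<long long>& s, long long T) {
    int n = s.size(), h = n / 2;
    unordered_set<long long> sums;
    // store every sum formed by a subset of the first half
    for (int mask = 0; mask < (1 << h); mask++) {
        long long sum = 0;
        for (int i = 0; i < h; i++)
            if (mask & (1 << i)) sum += s[i];
        sums.insert(sum);
    }
    // for every sum of the second half, look up the needed complement
    int r = n - h;
    for (int mask = 0; mask < (1 << r); mask++) {
        long long sum = 0;
        for (int i = 0; i < r; i++)
            if (mask & (1 << i)) sum += s[h + i];
        if (sums.count(T - sum)) return true;
    }
    return false;
}

For N = 40 each half contributes about 2^20 ≈ 10^6 sums – entirely manageable – whereas the naive O(2^N) enumeration is hopeless.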
Indoorienteering – indoorienteering
Key to Knowledge – keytoknowledge
Chapter Notes
Chapter 8
Greedy Algorithms
One way to solve the problem would be to evaluate G for every path in
the graph. Unfortunately, the number of paths in a general graph can be
huge (growing exponentially with the number of vertices). However, the
optimal substructure property allows us to make a simplification. Assume
that s has neighbors v1, v2, ..., vm. Then, by the definition of B and g, B(s) =
max_i g({s, vi}, B(vi)). Thus, we can solve the problem by first solving the same
problem on all vertices vi instead.
We can phrase this problem using the kind of graph we previously discussed.
Let the graph have vertices labeled 0, 1, 2, ..., T , representing the amount of
money we wish to sum up to. For a vertex labeled x, we add edges to vertices
labeled x − 1, x − 2, x − 5 (if those vertices exist), weighted 1. Traversing such
an edge represents adding a coin of denomination 1, 2 or 5. Then, the Change-
making Problem can be phrased as computing the shortest path from the vertex
T to 0. The corresponding graph for T = 5 can be seen in Figure 8.1.
So, how does this graph formulation help us? Solving the problem on the graph
as before using simple recursion would be very slow (with an exponential
complexity, even). In Chapter 9 on Dynamic Programming, we will see how to
solve such problems in polynomial time. For now, we will settle with solving
problems exhibiting yet another property besides having optimal substructure
– that of local optimality.
Figure 8.1: The graph formulation of the Change-making Problem for T = 5, with vertices labeled 5, 4, 3, 2, 1, 0.
Exercise 8.1
Compute the shortest path from each vertex in Figure 8.1 to T using the
optimal substructure property.
Exercise 8.2
Prove that the greedy choice may fail if the coins have denominations 1, 6
and 7.
Assume that the optimal solution uses the 1, 2 and 5 coins a, b and c times
respectively. We then have that either:
Competitive Tip
If you have the choice between a greedy algorithm and another algorithm
(such as one based on brute force or dynamic programming), use the
other algorithm unless you are certain the greedy choice works.
of arguments we are going to use – there are a few common types of argument
which often appear in proofs of greedy algorithms.
8.3 Scheduling
Scheduling problems are a class of problems dealing with constructing large
subsets of non-overlapping intervals, from some given set of intervals.
The classical Scheduling Problem is the following.
Scheduling Problem
Given is a set of half-open (open on the right) intervals S. Determine
the largest subset A ⊆ S of non-overlapping intervals.
Input
The input contains the set of intervals S.
Output
The output should contain the subset A.
We will construct the solution iteratively, adding one interval at a time. When
looking for greedy choices to perform, extremal cases are often the first ones
you should consider. Hopefully, one of these extremal cases can be proved to
be included in an optimal solution. For intervals, some extremal cases would
be:
• a shortest interval,
• a longest interval,
• an interval overlapping as few other intervals as possible,
• an interval with the leftmost left endpoint (and symmetrically, the right-
most right endpoint),
• an interval with the leftmost right endpoint (and symmetrically, the
rightmost left endpoint).
Figure 8.2: An instance of the scheduling problem, with the optimal solution
at the bottom.
As it turns out, we can always select an interval satisfying the fifth case. In
the example instance in Figure 8.2, this results in four intervals. First, the
interval with the leftmost right endpoint is the interval [1, 2). If we include
this in the subset A, intervals [0, 3) and [1, 6) must be removed since they
overlap [1, 2). Then, the interval [3, 4) would be the one with the leftmost
right endpoint of the remaining intervals. This interval overlaps no other
interval, so it should obviously be included. Next, we would choose [4, 6)
(overlapping with [4, 7)), and finally [7, 8). Thus, the answer would be A =
{[1, 2), [3, 4), [4, 6), [7, 8)}.
We can prove that this strategy is optimal using a swapping argument, one of
the main greedy proof techniques. In a swapping argument, we attempt to
prove that given any solution, we can always modify it in such a way that
our greedy choice is no worse. This is what we did in the Change-making
Problem, where we argued that an optimal solution had to conform to a small
set of possibilities, or else we could swap some set of coins for another (such
as two coins worth 1 for a single coin worth 2).
Assume that the optimal solution does not contain the interval [l, r), an interval
with the leftmost right endpoint. The interval [l′, r′) that has the leftmost right
endpoint of the intervals in the solution must have r′ > r. In particular, this
means any other interval [a, b) in the solution must have a ≥ r′ (or else the
intervals would overlap). However, this means that the interval [l, r) does
not overlap any other interval either, since a ≥ r′ > r so that a ≥ r. Then,
exchanging the interval [l′, r′) in the solution for [l, r) still constitutes a valid
solution of the same size. This proves that we could have included the interval
[l, r) in the optimal solution. Note that the argument in no way says that the
interval [l, r) must be in an optimal solution. It is possible for a scheduling
problem to have many distinct solutions. For example, in the example in
Figure 8.2, we might just as well switch [4, 6) for [4, 7) and still get an optimal
solution.
This solution can be implemented in Θ(|S| log |S|) time. In Algorithm 8.1, this
is accomplished by first performing a sort (in Θ(|S| log |S|)), followed by a loop
over the intervals, where each iteration takes Θ(1) time.
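A sketch along these lines (our own formulation of what Algorithm 8.1 does, since the listing itself is not reproduced here):

#include <bits/stdc++.h>
using namespace std;

// intervals are half-open [first, second)
vector<pair<int,int>> schedule(vector<pair<int,int>> intervals) {
    // sort by right endpoint, so the leftmost right endpoint comes first
    sort(intervals.begin(), intervals.end(),
         [](const pair<int,int>& a, const pair<int,int>& b) {
             return a.second < b.second;
         });
    vector<pair<int,int>> answer;
    int lastEnd = INT_MIN;
    for (auto& iv : intervals) {
        if (iv.first >= lastEnd) { // does not overlap anything chosen so far
            answer.push_back(iv);
            lastEnd = iv.second;
        }
    }
    return answer;
}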
Exercise 8.3
For each of the first four strategies, find a set of intervals where they fail to
find an optimal solution.
Exercise 8.4
Prove that choosing to place the interval in the subset with the rightmost
right endpoint is optimal.
Chapter 9
Dynamic Programming
This chapter will study a technique called dynamic programming (often abbre-
viated DP). In one sense, it is simply a technique to solve the general case of
the best path in a directed acyclic graph problem (Section 8.1) in cases where
the graph does not admit locally optimal choices, in time approximately equal
to the number of edges in the graph. For graphs which are essentially trees
with a unique path to each vertex, dynamic programming is no better than
brute force. In more interconnected graphs, where many paths lead to the
same vertex, the power of dynamic programming shines through. It can also
be seen as a way to speed up recursive functions (called memoization), which
will be our first application.
First, we will see a familiar example – the Change-making problem, with a
different set of denominations. Then, we will discuss a little bit of theory, and
finally round off with a few concrete examples and standard problems.
We promised a generalization that can find the best path in a DAG that exhibits
optimal substructure, even when locally optimal choices do not lead to an
optimal solution. In fact, we already have such a problem – the Change-
making Problem from Section 8.1, but with certain other sets of denominations.
Exercise 8.2 even asked you to prove that the case with coins worth 1, 6 and 7
could not be solved in the same greedy fashion.
So, how can we adapt our solution to this case? The secret lies in the graph
formulation of the problem which we constructed (Figure 8.1). For the greedy
solution, we essentially performed a recursive search on this graph, except we
always knew which edge to pick. When we do not know, the solution ought
to be obvious – let us test all the edges.
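A direct recursive translation (a sketch standing in for the book's listing, which is not reproduced here) for the coins worth 1, 6 and 7:

#include <bits/stdc++.h>
using namespace std;

// minimum number of coins worth 1, 6 and 7 summing to n
int change(int n) {
    if (n == 0) return 0;
    int best = change(n - 1);                 // use a coin worth 1
    if (n >= 6) best = min(best, change(n - 6)); // or a coin worth 6
    if (n >= 7) best = min(best, change(n - 7)); // or a coin worth 7
    return best + 1;
}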
This solution as written is actually exponential in T (it is Ω(3^(T/7))). The recursion
tree for the case T = 10 can be seen in Figure 9.1.
The key behind the optimal substructure property is that the answer for any
particular call in this graph with the same parameter c is the same, indepen-
dently of the previous calls in the recursion. Right now, we perform calls with
the same parameters multiple times. Instead, we can save the result of a call
the first time we perform it (Algorithm 9.2).
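In the spirit of Algorithm 9.2, a memoized sketch (our own) differs from the recursion above only in that each value is computed once:

#include <bits/stdc++.h>
using namespace std;

vector<int> memo; // memo[n] = -1 means change(n) is not yet computed

int change(int n) {
    if (n == 0) return 0;
    if (memo[n] != -1) return memo[n]; // already computed: reuse the result
    int best = change(n - 1);
    if (n >= 6) best = min(best, change(n - 6));
    if (n >= 7) best = min(best, change(n - 7));
    return memo[n] = best + 1;
}

// usage: memo.assign(T + 1, -1); then call change(T)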
Figure 9.1: The recursion tree for the Change-making problem with T = 10.
Figure 9.2: The recursion tree for the Change-making problem with T = 10,
with duplicate calls merged.
With these examples in hand, we are ready to give a more concrete character-
ization of dynamic programming. In principle, it can be seen as solving the
kind of “sequence of choices” problems that we used brute force to solve, but
where different choices can result in the same situation. For example, in the
Change-making Problem, after adding two coins worth 1 and 6, we might just
as well have added a 7 coin instead. After we have performed a sequence of
choices, how we got to the resulting state is no longer relevant – only where we
can go from there. Basically, we throw away the information (what exact coins
we used) that is no longer needed. This view of dynamic programming prob-
lems as having a “forgetful” property, that the exact choices we have made do
not affect the future, is useful in most dynamic programming problems.
Another, more naive view, is that dynamic programming solutions are simple
recursions, where we happen to solve the same recursive subproblem a large
number of times. In this view, a DP solution is basically nothing more than a
recursive solution – find the correct base cases, a fast enough recursion, and
memoize the results.
More pragmatically, DP consists of two important parts – the states of the
DP, and the computation of a state. Both of these parts are equally impor-
tant. Fewer states generally mean fewer computations to make, and a better
complexity per state gives a better complexity overall.
How do you choose between bottom-up and top-down? Mostly, it comes down
to personal choice. A dynamic programming solution will almost always be
fast enough no matter if you code it recursively or iteratively. There are some
performance concerns, both ways. A recursive solution is affected by the
overhead of recursive function calls. This problem is not as bad in C++ as
in many other languages, but it is still noticeable. When you notice that the
number of states in your DP solution is running a bit high, you might want to
consider coding it iteratively. Top-down DP, on the other hand, has the upside
that only the states reachable from the starting state will be visited. In some
DP solutions, the number of unreachable states which are still in some sense
“valid” enough to be computed bottom-up is so large that excluding
them outweighs the function call overhead. In extreme cases, it might turn
out that an entire parameter is uniquely given by other parameters (such as
the Ferry Loading problem in Section 9.3). While we probably would notice
when this is the case, the top-down DP saves us when we do not.
For top-down DP, the memory usage is often quite clear and unavoidable. If a
DP solution has N states, it will have an Ω(N) memory usage. For a bottom-up
solution, the situation is quite different.
Firstly, let us consider one of the downsides of bottom-up DP. When coding
a top-down DP, you do not need to bother with the structure of the graph
you are solving your problem on. For a bottom-up DP, you need to ensure
that whenever you solve a subproblem, you have already solved its subprob-
lems too. This requires you to define an order of computation, such that if the
subproblem a is used in solving subproblem b, a is computed before b.
In most cases, such an order is quite easy to find. Most parameters can simply
be increasing or decreasing, using a nested loop for each parameter. When a
DP is done over intervals, the order of computation is often over increasing or
decreasing length of the interval. DP over trees usually requires a post-order
traversal of the tree. In terms of the recurring graph representation we often
use for DP problems, the order of computation must be a topological ordering
of the graph.
While this order of computation business may seem to be nothing but a nui-
sance that bottom-up users have to deal with, it is related to one of the perks
of bottom-up computation. If the order of computation is chosen in a clever
way, we need not save every state during our computation. Consider e.g. the
Change-making Problem again, which had the following recursion:
change(n) = 0 if n = 0
change(n) = min(change(n − 1), change(n − 6), change(n − 7)) + 1 if n > 0
It should be clear that using the order of computation 0, 1, 2, 3, ..., once we have
computed e.g. change(k), the subproblems change(k − 7), change(k − 8), ...
etc. are never used again.
Thus, we only need to save the value of 7 subproblems at a time. This Θ(1)
memory usage is pretty neat compared to the Θ(K) usage needed to compute
change(K) otherwise.
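A sketch of this trick (our own): a cyclic buffer of the 7 most recent values replaces the full table.

#include <bits/stdc++.h>
using namespace std;

int change(int K) {
    vector<int> last(7, 0); // last[n % 7] = change(n) for the 7 latest n
    for (int n = 1; n <= K; n++) {
        int best = last[(n - 1) % 7];
        if (n >= 6) best = min(best, last[(n - 6) % 7]);
        if (n >= 7) best = min(best, last[n % 7]); // n % 7 == (n - 7) % 7
        last[n % 7] = best + 1; // overwrite the slot we no longer need
    }
    return K == 0 ? 0 : last[K % 7];
}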
Competitive Tip
9.3 Multidimensional DP
Now, we are going to look at a DP problem where our state consists of more
than one variable. The example will demonstrate the importance of carefully
choosing your DP parameters.
Ferry Loading
Swedish Olympiad in Informatics 2013, Online Qualifiers
A ferry is to be loaded with cars of different lengths, with a long line of
cars currently queued up for a place. The ferry consists of four lanes, each
of the same length. When the next car in the line enters the ferry, it picks
one of the lanes and parks behind the last car in that lane. There must be
a safety margin of 1 meter between any two parked cars.
Given the length of the ferry and the lengths of the cars in the queue,
compute the maximal number of cars that can park if they choose the lanes
optimally.
Figure 9.3: An optimal placement on a ferry of length 5 meters, of the cars
with lengths 2, 1, 2, 5, 1, 1, 2, 1, 1, 2 meters. Only the first 8 cars could fit on
the ferry.
Input
The first line contains the number of cars 0 ≤ N ≤ 200 and the length of
the ferry 1 ≤ L ≤ 60. The second line contains N integers, the length of the
cars 1 ≤ ai ≤ L.
Output
Output a single integer – the maximal number of cars that can be loaded
on the ferry.
9.4 Subset DP
Amusement Park
Swedish Olympiad in Informatics 2012, Online Qualifiers
Lisa has just arrived at an amusement park, and wants to visit each of
the N attractions exactly once. For each attraction, there are two identical
facilities at different locations in the park. Given the locations of all the
facilities, determine which facility Lisa should choose for each attraction,
in order to minimize the total distance she must walk. Originally, Lisa is
at the entrance at coordinates (0, 0). Lisa must return to the entrance once
she has visited every attraction.
Input
The first line contains the integer 1 ≤ N ≤ 15, the number of attractions
Lisa wants to visit. Then, N lines follow. The i’th of these lines contains
four integers −10^6 ≤ x1, y1, x2, y2 ≤ 10^6. These are the coordinates
(x1, y1) and (x2, y2) for the two facilities of the i’th attraction.
Output
First, output the smallest distance Lisa must walk. Then, output N lines,
one for each attraction. The i’th line should contain two numbers a and
f – the i’th attraction Lisa visited (a number between 1 and N), and the
facility she visited (1 or 2).
Consider a partial walk, where we have visited a set S of attractions and cur-
rently stand at coordinates (x, y). Then, any choice up to this point is irrelevant
for the remainder of the problem, which suggests that the parameters S, x, y
form a good DP state. Note that (x, y) has at most 31 possibilities – two for
each attraction, plus the entrance at (0, 0). Since we have at most 15 attractions,
the set S of visited attractions has 2^15 possibilities. This gives us 31 · 2^15 ≈ 10^6
states. Each state can be computed in Θ(N) time, by choosing what attraction
to visit next. All in all, we get a complexity of Θ(N^2 · 2^N). When coding DP over
subsets, we generally use bitsets to represent the subset, since these map very
cleanly to integers (and therefore indices into a vector):
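The listing itself is not reproduced in this excerpt; a sketch of how such a subset DP could look (names are ours). Here best[mask][p] memoizes the minimum remaining distance when the attractions in mask have been visited and we stand at position p, where positions 2i and 2i + 1 are the facilities of attraction i and position 2N is the entrance:

#include <bits/stdc++.h>
using namespace std;

int N;
vector<double> X, Y;          // coordinates of the 2N + 1 positions
vector<vector<double>> best;  // best[mask][p], -1 = not yet computed

double dist(int p, int q) {
    return hypot(X[p] - X[q], Y[p] - Y[q]);
}

double solve(int mask, int p) {
    if (mask == (1 << N) - 1) return dist(p, 2 * N); // all visited: walk home
    double& ans = best[mask][p];
    if (ans >= 0) return ans;
    ans = 1e18;
    for (int i = 0; i < N; i++) {
        if (mask & (1 << i)) continue; // attraction i already visited
        for (int f = 0; f < 2; f++)    // try both of its facilities
            ans = min(ans, dist(p, 2 * i + f) + solve(mask | (1 << i), 2 * i + f));
    }
    return ans;
}

// usage: best.assign(1 << N, vector<double>(2 * N + 1, -1));
//        answer = solve(0, 2 * N);

The actual facilities chosen can afterwards be recovered by retracing the optimal transitions, in the same manner as the knapsack backtracking later in this chapter.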
9.5 Digit DP
Digit DP is a class of problems where we count numbers with certain proper-
ties that contain a large number of digits, up to a certain limit. These properties
are characterized by having the classical properties of DP problems, i.e. being
easily computable if we were to construct the numbers digit by digit, remem-
bering very little information about what those numbers actually were.
Palindrome-Free Numbers
Baltic Olympiad in Informatics 2013 – Antti Laaksonen
A string is a palindrome if it remains the same when it is read backwards.
A number is palindrome-free if it does not contain a palindrome with a
length greater than 1 as a substring. Count the palindrome-free numbers
within a given interval.
We fix all numbers to have length n, by giving shorter num-
bers leading zeroes. Since leading zeroes in a number are not subject to the
palindrome restriction, they must be treated differently. In our case, they are
given the special digit −1 instead, resulting in 11 possible “digits”. Once this
function is memoized, it will have n · 2 · 11 · 11 different states, with each
state using a loop iterating only 10 times. Thus, it uses on the order of 1000n
operations. In our problem, the upper limit has at most 19 digits. Thus, the
solution only requires about 20 000 operations.
Once a solution has been formulated for this simple upper limit, extending it
to a general upper limit is quite natural. First, we will save the upper limit as
a sequence of digits L. Then, we need to differentiate between two cases in
our recursive function. The partially constructed number is either equal to the
corresponding partial upper limit, or it is less than the limit. In the first case,
we are still constrained by the upper limit – the next digit of our number can
not exceed the next digit of the upper limit. In the other case, the upper
limit is no longer relevant. If a prefix of our number is strictly lower than the
prefix of the upper limit, our number can never exceed the upper limit.
This gives us our final solution:
vector<int> L;

ll sol(int at, int len, bool limEq, int b1, int b2) {
    if (at == len) return 1;
    ll ans = 0;
    // we may not exceed the limit for this digit
    // if equal to the prefix of the limit
    rep(d,0,(limEq ? L[at] + 1 : 10)) {
        if (d == b2 || d == b1) continue;
        // the next step will be equal to the prefix if it was now,
        // and our added digit was exactly the limit
        bool limEqNew = limEq && d == L[at];
        bool leadingZero = b2 == -1 && d == 0;
        ans += sol(at + 1, len, limEqNew, b2, leadingZero ? -1 : d);
    }
    return ans;
}

// initially, the number is equal to the
// prefix limit (both being the empty number)
sol(0, n, true, -1, -1);
9.6.1 Knapsack
The knapsack problem is one of the most common standard DP problems. The
problem itself has countless variations. We will look at the “original” knapsack
problem, with constraints making it suitable for a dynamic programming
approach.
Knapsack
Given is a knapsack with an integer capacity C, and n different objects,
each with an integer weight and value. Your task is to select a subset of
the items with maximal value, such that the sum of their weights does not
exceed the capacity of the knapsack.
Input
The integer C giving the capacity of the knapsack, and an integer n, giving
the number of objects. This is followed by the n objects, given by their
value vi and weight wi .
Output
Output the indices of the chosen items.
However, this only helps us compute the answer. The problem asks us to
explicitly construct the subset. This step, i.e., tracing what choices we made to
arrive at an optimal solution is called backtracking.
For this particular problem, the backtracking is relatively simple. One usually
proceeds by starting at the optimal state, and then considers all transitions that
could lead to this state. Among these, the “best” one is picked. In our case, the
transitions correspond to either choosing the current item, or not choosing it.
Both lead to states which are simple to compute. In the first case, the
value should differ by V[i] and the weight by W[i], while in the second case
the state we arrived from must have the same value and capacity:
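A sketch of the DP together with the backtracking step (our own formulation, since the listing is not reproduced here):

#include <bits/stdc++.h>
using namespace std;

// returns the indices of the chosen items
vector<int> knapsack(int C, const vector<int>& V, const vector<int>& W) {
    int n = V.size();
    // best[i][c] = maximum value using the first i items with capacity c
    vector<vector<int>> best(n + 1, vector<int>(C + 1, 0));
    for (int i = 0; i < n; i++)
        for (int c = 0; c <= C; c++) {
            best[i + 1][c] = best[i][c];                  // skip item i
            if (c >= W[i])                                 // or take it
                best[i + 1][c] = max(best[i + 1][c], best[i][c - W[i]] + V[i]);
        }
    // backtracking: which transition explains best[n][C]?
    vector<int> chosen;
    int c = C;
    for (int i = n - 1; i >= 0; i--) {
        if (c >= W[i] && best[i + 1][c] == best[i][c - W[i]] + V[i]) {
            chosen.push_back(i); // taking item i explains the value
            c -= W[i];
        } // otherwise best[i+1][c] == best[i][c], so item i was skipped
    }
    return chosen;
}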
Set Cover
You are given a family of subsets S1, S2, ..., Sk of some larger set S of size
n. Find a minimum number of subsets Sa1, Sa2, ..., Sal such that

Sa1 ∪ Sa2 ∪ · · · ∪ Sal = S

i.e., cover the set S by taking the union of as few of the subsets Si as
possible.
For small k and large n, we can solve the problem in Θ(n · 2^k), by simply testing
each of the 2^k covers. In the case where we have a small n but k can be
large, this becomes intractable. Instead, let us apply the principle of dynamic
programming. In a brute force approach, we would perform k choices. For
each subset, we would try including it or excluding it. After deciding which
of the first m subsets to include, what information is relevant? Well, if we
consider what the goal of the problem is – covering S – it would make sense to
record what elements have been included so far. This little trick leaves us with
a DP of Θ(k · 2^n) states: one for each pair of a subset of S we might have reached
and the number of subsets we have tried to use so far. Computing a
state takes Θ(n) time, by computing the union of the current cover with the
set we might potentially add. The recursion thus looks like:

cover(C, k) = 0 if C = S
cover(C, k) = min(cover(C, k + 1), cover(C ∪ Sk, k + 1) + 1) else
solution was to limit our choices to a set where we knew an optimal solution
would be found.
Applying the same change to our set cover solution, we should instead do DP
over our current cover, and only try including sets which are not subsets of
the current cover. So, does this help? How many subsets are there, for a given
cover C, which are not its subsets? If the size of C is m, there are 2^m subsets of
C, meaning 2^n − 2^m subsets can add a new element to our cover.
To find out how much time this needs, we will use two facts. First of all, there
are (n choose m) subsets of size m of a size-n set. Secondly, the sum
∑_{m=0}^{n} (n choose m) · 2^m = 3^n.
If you are not familiar with this notation or this fact, you probably want to
take a look at Section 15.4 on binomial coefficients.
So, summing over all possible extending subsets for each possible partial C,
we get:

∑_{m=0}^{n} (n choose m) · (2^n − 2^m) = 2^n · 2^n − 3^n = 4^n − 3^n
Closer, but no cigar. Intuitively, we still have a large number of redundant
choices. If our cover contains, say, n − 1 elements, there are 2^(n−1) sets which can
extend our cover, but they all extend it in the same way. This sounds wasteful,
and avoiding it is probably the key to getting an asymptotic speedup.
It seems that we are missing some key function which, given a set A, can
respond to the question: “is there some subset Si that could extend our cover
with some subset A ⊆ S?”. If we had such a function, computing all possible
extensions of a cover of size m would instead take time 2^(n−m) – the number of
possible extensions to the cover. Last time we managed to extend a cover in
time 2^n − 2^m, but this is exponentially better!
In fact, if we do our summing this time, we get:

∑_{m=0}^{n} (n choose m) · 2^(n−m) = ∑_{m=0}^{n} (n choose n−m) · 2^(n−m)
                                   = ∑_{m=0}^{n} (n choose m) · 2^m
                                   = 3^n
It turns out our exponential speedup in extending a cover translated into an
exponential speedup of the entire DP.
We are not done yet – this entire algorithm depended on the assumption of
our magical “can we extend a cover with a subset A?” function. Sometimes,
this function may be quick to compute. For example, if S = {1, 2, ..., n} and the
family Si consists of all sets whose sum is less than n, an extension is possible
if and only if its sum is also less than n. In the general case, our Si are not this
nice. Naively, one might think that in the general case, an answer to this query
would take Θ(nk) time to compute, by checking if A is a subset of each of our
k sets. Yet again, the same clever trick comes to the rescue.
If we have a set Si of size m available for use in our cover, just how many
possible extensions could this subset provide? Well, Si itself only has 2^m
subsets. Thus, if we for each Si mark each of its subsets as a
possible extension to a cover, this precomputation only takes 3^n time (by the
same sum as above).
Since both steps are O(3^n), this is also our final complexity.
Chapter Notes
Chapter 10
Divide and Conquer
Grid Tiling
In a square grid of side length 2^n, one unit square is blocked (represented
by coloring it black). Your task is to cover the remaining 4^n − 1 squares
with triominos, L-shaped tiles consisting of three squares, in the following
fashion. The triominos can be rotated by any multiple of 90° (Fig-
ure 10.1).
When tiling a 2^n × 2^n grid, it is not immediately clear how the divide and
conquer principle can be used. To be applicable, we must be able to reduce
the problem into smaller instances of the same problem and combine them.
The peculiar side length 2^n does hint at a possible solution. Aside from
the property that 2^n · 2^n − 1 is evenly divisible by 3 (a necessary condition for
a tiling to be possible), it also gives us a natural way of splitting an instance,
namely into its 4 quadrants.
Each of these has the size 2^(n−1) × 2^(n−1), which is also of the form we require of
grids in the problem. The crux lies in that these four new grids do not comply
with the input specification of the problem. While smaller and disjoint, three
of them contain no black square, a requirement of the input. Indeed, a grid of
this size without any black squares can not be tiled using triominos.
The solution lies in the trivial solution to the n = 1 case, where we can easily
reduce the problem to four instances of the n = 0 case:
In the solution, we use a single triomino which blocks a single square of each
of the four quadrants. This gives us four trivial subproblems of size 1 × 1,
where each grid has one blocked square. We can actually place such a triomino
in every grid by placing it in the center (the only place where a triomino may
cover three quadrants at once).
After this transformation, we can now apply the divide and conquer principle.
We split the grid into its four quadrants, each of which now contain one black
square. This allows us to recursively solve four new subproblems. At some
point, this recursion will finally reach the base case of a 1 × 1 square, which
consists entirely of its blocked square and therefore needs no tiles.
Figure 10.5: Placing a triomino in the corners of the quadrants without a black
square.
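A sketch of the recursion (our reconstruction; only fragments of the original listing survive in this excerpt, so names and details are assumptions):

#include <bits/stdc++.h>
using namespace std;
typedef pair<int,int> cell; // (row, column)

// output one L-shaped tile covering the three given squares
void placeTriomino(cell a, cell b, cell c) {
    printf("(%d,%d) (%d,%d) (%d,%d)\n",
           a.first, a.second, b.first, b.second, c.first, c.second);
}

// tiles the 2^N x 2^N grid with top-left corner (Ty, Tx) and one blocked square
void tile(int N, cell blocked, int Tx, int Ty) {
    if (N == 0) return; // a 1x1 grid is entirely blocked: nothing to tile
    int mid = 1 << (N - 1);
    // which quadrant contains the blocked square?
    int q = (blocked.first >= Ty + mid ? 2 : 0)
          + (blocked.second >= Tx + mid ? 1 : 0);
    // the square of each quadrant closest to the center of the grid
    cell centers[4] = {{Ty + mid - 1, Tx + mid - 1}, {Ty + mid - 1, Tx + mid},
                       {Ty + mid, Tx + mid - 1},     {Ty + mid, Tx + mid}};
    // place one triomino in the center, covering one square of each
    // quadrant that lacks a blocked square
    cell sub[4];
    vector<cell> tri;
    for (int i = 0; i < 4; i++) {
        sub[i] = (i == q) ? blocked : centers[i];
        if (i != q) tri.push_back(centers[i]);
    }
    placeTriomino(tri[0], tri[1], tri[2]);
    tile(N - 1, sub[0], Tx, Ty);
    tile(N - 1, sub[1], Tx + mid, Ty);
    tile(N - 1, sub[2], Tx, Ty + mid);
    tile(N - 1, sub[3], Tx + mid, Ty + mid);
}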
The time complexity of the algorithm can be computed easily if we use the
fact that each call to tile only takes Θ(1) time, except for the four recursive calls.
Furthermore, each call places exactly one tile on the board. Since there are
(4^n − 1)/3 tiles to be placed, the time complexity must be Θ(4^n). This is
asymptotically optimal, since this is also the size of the output.
Exercise 10.1
It is possible to tile such a grid with triominos colored red, blue and green
such that no two triominos sharing an edge have the same color. Prove this
fact, and give an algorithm to generate such a coloring.
Divisible Subset
Let n = 2^k. Given a set A of 2n − 1 integers, find a subset S of size exactly
n such that

∑_{x∈S} x

is a multiple of n.
Input
The input contains an integer 1 ≤ n ≤ 2^15 that is a power of two, followed
by the 2n − 1 elements of A.
Output
Output the n elements of S.
When given a problem, it is often a good idea to solve a few small cases by
hand. This applies especially to this kind of construction problem, where
constructions for small inputs often show some pattern or insight into how to
solve larger instances. The case n = 1 is not particularly meaningful, since it is
trivially true (any integer is a multiple of 1). When n = 2, we get an insight
which might not seem particularly interesting, but is key to the problem. We
are given 2 · 2 − 1 = 3 numbers, and seek two numbers whose sum is even.
Among any three numbers, there must either be two even numbers or two
odd numbers. Both of these cases yield a pair with an even sum.
It turns out that this construction generalizes to larger instances. Generally, it
is easier to do the “divide” part of a divide and conquer solution first, but in
this problem we will do it the other way around. The recursion will follow
quite naturally after we find a way of combining solutions of the smaller
instances into a larger one.
We will lay the groundwork for a reduction of the case 2n to n. First,
assume that we could solve the problem for a given n. The larger instance then
contains 2(2n − 1) = 4n − 1 numbers, of which we seek 2n numbers whose
sum is a multiple of 2n. This situation is essentially the same as for the case
n = 2, except everything is scaled up by n. Can we scale our solution up as
well?
If we have three sets of n numbers whose respective sums are all multiples
of n, we can find two sets of n numbers whose total sum is divisible by
2n. This construction essentially uses the same argument as for n = 2. If the
three subsets have sums an, bn, cn and we wish to find two whose sum is a
multiple of 2n, this is the same as finding two of the numbers a, b, c whose sum
is a multiple of 2. This is possible, according to the case n = 2.
A beautiful generalization indeed, but we still have some remnants of wishful
thinking we need to take care of. The construction assumes that, given 4n − 1
numbers, we can find three sets of n numbers whose sums are divisible by n.
We have now come to the recursive aspect of the problem. By assumption, we
could solve the problem for n. This means we can pick any 2n − 1 of our 4n − 1
numbers to get our first subset. The subset uses up n of our 4n − 1 numbers,
leaving us with only 3n − 1 numbers. We keep going, and pick any 2n − 1 of
these numbers and recursively get a second subset. After this, 2n − 1 numbers
are left, exactly how many we need to construct our third subset.
The division of the problem was thus into four parts. Three subsets of n
numbers, and one set of n − 1 which we throw away. Coming up with such
a division essentially required us to solve the combination part first with the
generalizing of the case n = 2.
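A sketch of the whole construction (our own formulation; the elements are distinct by the problem statement, so removing used numbers by value is safe):

#include <bits/stdc++.h>
using namespace std;

// given 2n - 1 numbers (n a power of two), returns n of them
// whose sum is divisible by n
vector<long long> solve(int n, vector<long long> A) {
    if (n == 1) return {A[0]}; // any number is a multiple of 1
    int m = n / 2;
    // build three groups of m numbers, each with sum divisible by m,
    // by solving the problem recursively on any 2m - 1 remaining numbers
    vector<vector<long long>> groups;
    for (int g = 0; g < 3; g++) {
        vector<long long> part(A.end() - (2 * m - 1), A.end());
        vector<long long> group = solve(m, part);
        for (long long x : group) A.erase(find(A.begin(), A.end(), x));
        groups.push_back(group);
    }
    // two of the three group sums, divided by m, must have the same
    // parity; their union of 2m = n numbers has a sum divisible by n
    for (int i = 0; i < 3; i++)
        for (int j = i + 1; j < 3; j++) {
            long long si = accumulate(groups[i].begin(), groups[i].end(), 0LL);
            long long sj = accumulate(groups[j].begin(), groups[j].end(), 0LL);
            if ((si / m + sj / m) % 2 == 0) {
                vector<long long> res = groups[i];
                res.insert(res.end(), groups[j].begin(), groups[j].end());
                return res;
            }
        }
    return {}; // unreachable by the argument above
}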
Exercise 10.2
What happens if we, when solving the problem for some k, construct k − 1
pairs of integers whose sums are even, throw away the remaining element,
and scale the problem down by 2 instead? What is the complexity then?
Exercise 10.3
The problem can be solved using a similar divide and conquer algorithm
for any k, not just those which are powers of two. In this case, those k
which are prime numbers can be treated as base cases. How is this done
for composite k? What is the complexity?
Exercise 10.4
The knight piece in chess can move in 8 possible ways (moving 2 steps in
any one direction, and 1 step in one of the two perpendicular directions). A
closed tour exists for an 8 × 8 grid.
Exercise 10.5
An n-bit Gray code is a sequence of all 2n bit strings of length n, such that
two adjacent bit strings differ in only one position. The first and last strings
of the sequence are considered adjacent. Possible Gray codes for the first
few n are
n = 1: 0 1
n = 2: 00 01 11 10
n = 3: 000 010 110 100 101 111 011 001
Give an algorithm to construct an n-bit Gray code for any n.
Merge sort is a sorting algorithm which uses divide and conquer. It is rather
straightforward, and works by recursively sorting smaller and smaller parts of
the array. When sorting an array by dividing it into parts and combining their
solution, there is an obvious candidate for how to perform this partitioning.
Namely, splitting the array into two halves and sorting them. When splitting
an array in half repeatedly, we will eventually reach a rather simple base case.
An array containing a single element is already sorted, so it is trivially solved.
If we do so recursively, we get the recursion tree in Figure 10.7. Coding this
recursive split is easy.
When we have sorted the two halves, we need to combine them to get a sorted
version of the entire array. The procedure to do this is based on a simple
insight. If an array A is partitioned into two smaller arrays P1 and P2 , the
smallest value of A must be either the smallest value of P1 or the smallest
value of P2 . This insight gives rise to a simple iterative procedure, where we
repeatedly compare the smallest values of P1 and P2 , extract the smaller one
of them, and append it to our sorted array.
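A sketch of the full procedure (our own formulation of the recursive split and merge just described):

#include <bits/stdc++.h>
using namespace std;

void mergeSort(vector<int>& a, int lo, int hi) { // sorts the range [lo, hi)
    if (hi - lo <= 1) return; // a single element is already sorted
    int mid = (lo + hi) / 2;
    mergeSort(a, lo, mid);    // sort the left half...
    mergeSort(a, mid, hi);    // ...and the right half
    // merge: repeatedly extract the smaller of the two front elements
    vector<int> merged;
    merged.reserve(hi - lo);
    int i = lo, j = mid;
    while (i < mid || j < hi) {
        if (j == hi || (i < mid && a[i] <= a[j])) merged.push_back(a[i++]);
        else merged.push_back(a[j++]);
    }
    copy(merged.begin(), merged.end(), a.begin() + lo);
}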
Figure 10.7: The recursion tree given when performing a recursive split of the
array [5, 1, 6, 3, 7, 2, 0, 4].
To compute the complexity, consider the recursion tree in Figure 10.7. We make
one call with 8 elements, two calls with 4 elements, and so on. Further, the
combining procedure takes Θ(l) time for a call with l elements. In the general
case, where the array has 2^k elements, the total time is thus

∑_{i=0}^{k} 2^i · Θ(2^(k−i)) = Θ(k · 2^k)
Exercise 10.7
Our complexity analysis assumed that the length of the array is a power of
2. The complexity is the same in the general case. Prove this fact.
Exercise 10.8
Given an array A of size n, we call the pair i < j an inversion of A if A[i] >
A[j]. Adapt the merge sort algorithm to count the number of inversions of
an array in Θ(n log n).
(Figure: binary searching for the value L of a monotone function f(x), repeatedly halving the interval [lo, hi] around its midpoint mid.)
Competitive Tip
The number of iterations c needed to binary search an interval of size
hi − lo down to a precision of p satisfies

log2((hi − lo)/p) ≤ c

For example, if we have an interval of size 10^9 which we wish to binary
search down to 10^−7, this would require log2(10^16) ≈ 54 iterations of binary
search.
Now, let us study some applications of binary search.
This problem is not only an optimization problem, but a monotone one. The
monotonicity lies in that while we only ask for a certain maximal hot dog
length, all lengths below it would also work (in the sense of being able to have
M hot dogs cut of this length), while all lengths above the maximal length
produce fewer than M hot dogs. Monotone optimization problems make it
possible to remove the optimization aspect by inverting the problem. Instead
of asking ourselves what the maximum length is, we can instead ask how many
hot dogs f(x) can be constructed from a given length x. After this inversion,
the problem is now on the form which binary search solves: we wish to find
the greatest x such that f(x) = M (replacing ≥ with = is equivalent in the cases
where we know that f(x) assumes the value we are looking for). We know that
this length is at most max_i a_i ≤ 10^6, which gives us the interval (0, 10^6] to
search in.
What remains is to actually compute the function f(x). In our case, this can
be done by considering just a single rod. If we want to construct hot dogs of
length x, we can get at most ⌊a_i/x⌋ hot dogs from a rod of length a_i. Summing
this for every rod gives us our solution.
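A sketch of the resulting binary search (our own; the problem statement is abbreviated above, so the exact input format and variable names are assumptions):

#include <bits/stdc++.h>
using namespace std;

// largest length x such that the rods a[] yield at least M hot dogs
double maxLength(const vector<double>& a, long long M) {
    // f(x) = the number of hot dogs of length x we can cut
    auto f = [&](double x) {
        long long cnt = 0;
        for (double rod : a) {
            cnt += (long long)min(rod / x, 2e9); // cap to avoid overflow
            if (cnt >= M) return cnt;            // early exit: M suffices
        }
        return cnt;
    };
    double lo = 0, hi = 1e6; // f decreases with x; the answer lies in (lo, hi]
    for (int it = 0; it < 60; it++) { // enough iterations for high precision
        double mid = (lo + hi) / 2;
        if (f(mid) >= M) lo = mid; // mid works: the answer is at least mid
        else hi = mid;             // too long: the answer is below mid
    }
    return lo;
}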
The key to our problem was that the number of hot dogs constructible with a
length x was monotonically decreasing in x. This allowed us to perform binary
search over the answer.
Competitive Tip
When binary searching over discrete domains, care must be taken. Many
bugs have been caused by improper binary searches².
The most common class of bugs is related to the endpoints of your interval
(i.e. whether they are inclusive or exclusive). Be explicit regarding this,
and take care that each part of your binary search (termination condition,
midpoint selection, endpoint updates) use the same interval endpoints.
² In fact, for many years the binary search in the standard Java run-time had a bug: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6412541
Binary search can also be used to find all points where a monotone function
changes value (or equivalently, all the intervals on which a monotone function
is constant). Often, this is used in problems on large sequences (often with
n = 100 000 elements), which could naively be solved by iterating through all
contiguous sub-sequences in Θ(n^2) time.
Or Max
Petrozavodsk Winter Training Camp 2015
Given is an array A of integers. Let B(i, k) be the bitwise or of the k
consecutive numbers starting with the i’th, and M(i, k) the maximum of
the k consecutive numbers starting with the i’th. For each k, compute the
maximum of S(i, k) = B(i, k) + M(i, k) over all i.
As an example, consider the array in Figure 10.9. The best answer for k = 1
would be S(0, 1), with both maximal element and bitwise or 5, totaling 10. For
k = 2, we have S(6, 2) = 7 + 4 = 11.
i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9
5 1 4 2 2 0 4 3 1 2
101 001 100 010 010 000 100 011 001 010
Figure 10.9: Example array, with the numbers additionally written in binary.
This problem can easily be solved in Θ(n^2), by computing every S(i, k) itera-
tively. We can compute all the B(i, k) and M(i, k) using the recursions

B(i, k) = 0 if k = 0
B(i, k) = B(i, k − 1) | A[i + k − 1] if k > 0

M(i, k) = 0 if k = 0
M(i, k) = max(M(i, k − 1), A[i + k − 1]) if k > 0

by looping over k, once we fix an i. With n = 100 000, this approach is too
slow.
The difficulty of the problem lies in S(i, k) consisting of two basically unre-
lated parts – the maximal element and the bitwise or of a segment. When
maximizing sums of unrelated quantities that put constraints on each other,
brute force often seems like a good idea. This is basically what we did in the
Buying Books problem (Section 7.4), where we minimized the sum of two
parts (postage and book costs) which constrained each other (buying a book
forced us to pay postage to its store) by brute forcing over one of the parts
(the set of stores to buy from). Since the bitwise or is much more complicated
than the maximal element – it is decided by an entire interval rather than a
single element – we are probably better of doing brute force over the maximal
element. Our brute force will consist of fixing which element is our maximal
element, by assuming that A[m] is the maximal element.
With this simplification in hand, only the bitwise or remains. We could now
solve the problem by looping over all the left endpoints of the interval and
all the right endpoints of the interval. At a first glance, this seems to actually
worsen the complexity. Indeed, this takes quadratic time for each m (on
average), resulting in a cubic complexity.
This is where we use our new technique. It turns out that, once we fix m, there
are only a few possible values for the bitwise or of the intervals containing
the m’th element. Any such interval A[l], A[l + 1], ..., A[m − 1], A[m], A[m +
1], ..., A[r − 1], A[r] can be split into two parts: one to the left, A[l], A[l +
1], ..., A[i − 1], A[i], and one to the right, A[i], A[i + 1], ..., A[r − 1], A[r]. The
bitwise or of either of these two parts is actually a monotone function (in their
length), and can only assume at most 16 different values!
i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9
101 001 100 010 010 000 100 011 001 010
111 111 110 110 110 100 100 111 111 111
7 7 6 6 6 4 4 7 7 7
Figure 10.10: The bitwise or of the left and right parts, with an endpoint at
m = 6.
Studying Figure 10.10 gives a hint about why. The first row shows the binary
values of the array, with m = 6 (our presumed maximal element) marked. The
second row shows the binary values of the bitwise or of the interval [i, m] or
[m, i] (depending on whether m is the right or left endpoint). The third line
shows the decimal values of the second row.
For example, when extending the interval [2, 6] (with bitwise or 110) to the
left, the new bitwise or will be 110 | 001 = 111. This is the only way the bitwise
or can change – when the new value includes bits which so far have not been set.
Obviously, this can only happen at most 16 times, since the values in A are
bounded by 2^16.
For a given m, this gives us a partition of all the elements, by the bitwise or
of the interval [m, i]. In Figure 10.10, the left elements will be partitioned into
[0, 1], [2, 4], [5, 6]. The right elements will be partitioned into [6, 6], [7, 9]. These
partitions are everything we need to compute the final answer.
For example, if we pick the left endpoint from the part [2, 4] and the right
endpoint from the part [7, 9], we would get a bitwise or that is 6 | 7 = 7, of a
length between 4 and 8, together with the 4 as the presumed maximal element.
For each maximal element, we get at most 16 · 16 such choices, totaling less
than 256N such choices. From these, we can compute the final answer using a
simple sweep line algorithm.
Polynomial Multiplication
Given two n-degree polynomials (where n can be large)

p(x) = ∑_{i=0}^{n} a_i x^i and q(x) = ∑_{i=0}^{n} b_i x^i

compute their product

(pq)(x) = ∑_{i=0}^{2n} x^i (∑_{j=0}^{i} a_j b_{i−j})

Write each polynomial as a sum of a low and a high half, p(x) = pl(x) + x^k pr(x)
and q(x) = ql(x) + x^k qr(x), where k = n/2. Then

(pl(x) + pr(x))(ql(x) + qr(x)) = pl(x)ql(x) + pl(x)qr(x) + pr(x)ql(x) + pr(x)qr(x)

so that

pl(x)qr(x) + pr(x)ql(x) = (pl(x) + pr(x))(ql(x) + qr(x)) − pl(x)ql(x) − pr(x)qr(x)

This means we only need to compute three k-degree multiplications: (pl(x) +
pr(x))(ql(x) + qr(x)), pl(x)ql(x) and pr(x)qr(x). Our time complexity recurrence
is then reduced to T(n) = 3T(n/2) + O(n), which by the master theorem is
Θ(n^(log2 3)) ≈ Θ(n^1.585).
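A sketch of Karatsuba's algorithm along these lines (our own; both polynomials are padded to a common power-of-two size to keep the splitting simple):

#include <bits/stdc++.h>
using namespace std;
typedef vector<long long> Poly;

// requires p.size() == q.size(), a power of two
Poly karatsuba(const Poly& p, const Poly& q) {
    int n = p.size();
    Poly res(2 * n, 0);
    if (n <= 8) { // small cases: naive quadratic multiplication
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                res[i + j] += p[i] * q[j];
        return res;
    }
    int k = n / 2; // split p(x) = pl(x) + x^k * pr(x), similarly for q
    Poly pl(p.begin(), p.begin() + k), pr(p.begin() + k, p.end());
    Poly ql(q.begin(), q.begin() + k), qr(q.begin() + k, q.end());
    Poly psum(k), qsum(k);
    for (int i = 0; i < k; i++) psum[i] = pl[i] + pr[i], qsum[i] = ql[i] + qr[i];
    Poly low = karatsuba(pl, ql);     // pl * ql
    Poly high = karatsuba(pr, qr);    // pr * qr
    Poly mid = karatsuba(psum, qsum); // (pl + pr) * (ql + qr)
    for (int i = 0; i < n; i++) {
        res[i] += low[i];
        res[i + k] += mid[i] - low[i] - high[i]; // = (pl*qr + pr*ql)[i]
        res[i + 2 * k] += high[i];
    }
    return res;
}

Poly multiply(Poly p, Poly q) { // pad to a common power-of-two length
    int n = 1;
    while (n < (int)max(p.size(), q.size())) n *= 2;
    p.resize(n); q.resize(n);
    return karatsuba(p, q);
}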
Exercise 10.10
Polynomial Multiplication 2 – polymul2
Chapter 11
Data Structures
1 The Dynamic Connectivity problem where edges may also be removed can be solved by a
data structured called a Link-Cut tree, which is not discussed in this book.
This problem can easily be solved in O(Q^2): after each query, we perform a
DFS to partition the graph into its connected components. Since the graph has
at most Q edges, a DFS takes O(Q) time.
Finally, we will use a common trick to improve the complexity to O(V log V +
Q) instead. Whenever we merge two components of size a and b, we can do
this in O(min(a, b)) instead of O(a + b) by merging the smaller component
into the larger component. Any individual vertex can be part of the smaller
component at most O(log V) times. Since the total size of a component is
always at least twice the size of the smaller component, this means that if a
vertex is merged as part of the smaller component k times, the new component
size must be at least 2^k. This cannot exceed V, meaning k ≤ log2 V. If we sum
this up for every vertex, we arrive at the O(V log V + Q) complexity.
struct DisjointSets {
    vector<vector<int>> components;
    vector<int> comp;
    DisjointSets(int elements) : components(elements), comp(elements) {
        iota(all(comp), 0);
        for (int i = 0; i < elements; ++i) components[i].push_back(i);
    }

    void unionSets(int a, int b) {
        a = comp[a]; b = comp[b];
        if (a == b) return;
        // merge the smaller component into the larger one
        if (components[a].size() < components[b].size()) swap(a, b);
        for (int it : components[b]) {
            comp[it] = a;
            components[a].push_back(it);
        }
    }
};
struct DisjointSets {
    vector<int> comp;
    DisjointSets(int elements) : comp(elements, -1) {}

    void unionSets(int a, int b) {
        a = repr(a); b = repr(b);
        if (a == b) return;
        // union by size: comp[x] of a representative stores the negated
        // size of its component (these merging lines are our completion
        // of a listing truncated in this excerpt)
        if (comp[a] > comp[b]) swap(a, b);
        comp[a] += comp[b];
        comp[b] = a;
    }
};
However, it turns out we can sometimes perform many merges at once. Con-
sider the case where we have k merges lazily stored for some vertex v. Then,
we can perform all the merges of v, comp[v], comp[comp[v]], . . . at the same time
since they all have the same representative: the representative of v.
int repr(int x) {
    if (comp[x] < 0) return x;
    int par = comp[x];
    comp[x] = repr(par);
    return comp[x];
}
Interval Sum
Given a sequence of integers a0 , a1 , . . . , aN−1 , you will be given Q queries
of the form [L, R). For each query, compute S(L, R) = aL +aL+1 +· · ·+aR−1 .
Computing the sums naively would require Θ(N) worst-case time per query
if the intervals are large, for a total complexity of Θ(NQ). If Q = Ω(N), we
can improve this to Θ(N^2 + Q) by precomputing all the answers. To do this in
quadratic time, we use the recurrence

S(L, R) = 0 if L = R
S(L, R) = S(L, R − 1) + aR−1 otherwise

Using this recurrence, we can compute the sequence S(L, L), S(L, L + 1), S(L, L +
2), . . . , S(L, N) in Θ(N) time on average for every L. This gives us the Θ(N^2 + Q)
complexity.
If the function we are computing has an inverse, we can speed this pre-
computation up a bit. Assume that we have computed the values P(R) =
a0 + a1 + · · · + aR−1 , i.e. the prefix sums of ai . Since this function is invertible
(with inverse −P(R)), we can compute S(L, R) = P(R) − P(L). Basically, the
interval [L, R) consists of the prefix [0, R) with the prefix [0, L) removed. As
addition is invertible, we could simply remove the latter prefix P(L) from the
prefix P(R) using subtraction. Indeed, expanding this expression shows us
that
P(R) − P(L) = (a0 + a1 + · · · + aR−1) − (a0 + a1 + · · · + aL−1)
            = aL + · · · + aR−1 = S(L, R)
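A minimal sketch of the prefix sum technique (our own):

#include <bits/stdc++.h>
using namespace std;

int main() {
    int N, Q;
    cin >> N >> Q;
    vector<long long> a(N), P(N + 1, 0);
    for (int i = 0; i < N; i++) cin >> a[i];
    // P[R] = a[0] + ... + a[R-1], the prefix sums
    for (int i = 0; i < N; i++) P[i + 1] = P[i] + a[i];
    while (Q--) {
        int L, R; // the half-open interval [L, R)
        cin >> L >> R;
        cout << P[R] - P[L] << "\n"; // S(L, R) in Theta(1) per query
    }
}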
Exercise 11.1
The above technique does not work straight-off for non-commutative oper-
ations. How can it be adapted to this case?
The case where a function does not have an inverse is a bit more difficult.
Interval Minimum
Given a sequence of integers a0 , a1 , . . . , aN−1 , you will be given Q queries
of the form [L, R). For each query, compute the value min(aL, aL+1, . . . , aR−1).
to check in half. We can take it one step further, by using this information
to precompute the minimum of all subarrays of four elements, by taking the
minimum of two pairs. By repeating this procedure for every power of two,
we will end up with a table m[l][i] containing the minimum of the interval
[l, l + 2^i), computable in Θ(N log N).
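The listing is incomplete in this excerpt; a sketch of the sparse table and its logarithmic query (our own, with the table indexed table[level][position] to match the Θ(1) query code further below):

#include <bits/stdc++.h>
using namespace std;

// table[i][l] holds the minimum of the interval [l, l + 2^i)
vector<vector<int>> build(const vector<int>& a) {
    int n = a.size(), K = 32 - __builtin_clz(n);
    vector<vector<int>> table(K, vector<int>(n));
    table[0] = a;
    for (int i = 1; i < K; i++)
        for (int l = 0; l + (1 << i) <= n; l++)
            table[i][l] = min(table[i - 1][l], table[i - 1][l + (1 << (i - 1))]);
    return table;
}

// minimum over [L, R), by greedily covering the interval with
// power-of-two blocks (O(log N) of them)
int rangeMin(const vector<vector<int>>& table, int L, int R) {
    int ans = INT_MAX;
    for (int i = table.size() - 1; i >= 0; i--)
        if (L + (1 << i) <= R) {
            ans = min(ans, table[i][L]);
            L += 1 << i;
        }
    return ans;
}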
This is Θ((N + Q) log N) time, since the preprocessing uses Θ(N log N) time
and each query requires Θ(log N) time. This structure is called a Sparse Table,
or sometimes just the Range Minimum Query data structure.
We can improve the query time to Θ(1) by using that the min operation is
idempotent, meaning that min(a, a) = a. Whenever this is the case (and the
operation at hand is commutative), we can use just two intervals to cover the
entire interval. If 2^k is the largest power of two that is at most R − L, then

[L, L + 2^k)
[R − 2^k, R)

covers the entire interval.
int rangeMinimum(const vector<vi>& table, int L, int R) {
    int maxLen = 31 - __builtin_clz(R - L);
    return min(table[maxLen][L], table[maxLen][R - (1 << maxLen)]);
}
While most functions either have inverses (so that we can use the prefix
precomputation) or are idempotent (so that we can use the Θ(1) sparse table),
some functions are neither. In such cases (for example matrix multiplication), we
must use the logarithmic querying of the sparse table.
The most interesting range queries occur on dynamic sequences, where values
can change.
In the first case, we are done (and respond with the sum of the current interval).
In the second case, we perform a recursive call on the half of the interval that
the query lies in. In the third case, we make the same recursive construction
for both the left and the right interval.
Since there is a possibility we perform two recursive calls, we might think
that the worst-case complexity of this query would be Θ(N) time. However,
the calls that the third call results in will have a very specific form – they will
always have one endpoint in common with the interval in the tree. In this
case, the only time the recursion will branch is to one interval that is entirely
contained in the query, and one that is not. The first call will not make any
further calls. All in all, this means that there will be at most two branches of
logarithmic height, so that queries are O(log N).
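As a concrete sketch (our own), here is a compact segment tree for interval sums with point updates. It is an iterative, bottom-up variant rather than the recursive formulation described above, but supports the same O(log N) operations:

#include <bits/stdc++.h>
using namespace std;

struct SegmentTree {
    int n;
    vector<long long> tree; // node i has children 2i and 2i + 1;
                            // the leaves occupy positions n..2n-1
    SegmentTree(int n) : n(n), tree(2 * n, 0) {}

    void update(int pos, long long value) {
        pos += n;
        tree[pos] = value;
        // recompute the sums on the path up to the root
        for (pos /= 2; pos >= 1; pos /= 2)
            tree[pos] = tree[2 * pos] + tree[2 * pos + 1];
    }

    // sum over the half-open interval [from, to)
    long long query(int from, int to) {
        long long sum = 0;
        for (from += n, to += n; from < to; from /= 2, to /= 2) {
            if (from % 2 == 1) sum += tree[from++]; // from is a right child
            if (to % 2 == 1) sum += tree[--to];     // to - 1 is a left child
        }
        return sum;
    }
};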
Chapter 12
Graph Algorithms
Graph theory is probably the richest of all algorithmic areas. You are almost
guaranteed to see at least one graph problem in any given contest, so it is
important to be well versed in the common algorithms that operate on graphs.
The most important such algorithms concern shortest paths from some vertex,
which will also be our first object of study.
One of the most common basic graph algorithms is the breadth-first search.
Its main usage is to find the distances from a certain vertex in an unweighted
graph:
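A sketch of the algorithm (standing in for the book's listing, which is not reproduced here):

#include <bits/stdc++.h>
using namespace std;

// dist[v] becomes the number of edges on a shortest path from s to v,
// or -1 if v is unreachable
vector<int> bfs(int s, const vector<vector<int>>& adj) {
    vector<int> dist(adj.size(), -1);
    queue<int> q;
    dist[s] = 0;
    q.push(s);
    while (!q.empty()) {
        int v = q.front(); q.pop();
        for (int u : adj[v])
            if (dist[u] == -1) { // u seen for the first time
                dist[u] = dist[v] + 1;
                q.push(u);
            }
    }
    return dist;
}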
For simplicity, we will first consider the problem on a grid graph, where
the unit squares constitute vertices, and squares which share an edge are
adjacent.
Let us solve this problem inductively. First of all, what vertices have distance 0?
Clearly, this can only be the source vertex s itself. This seems like a reasonable
base case, since the problem is about shortest paths from s. Then, what vertices
have distance 1? These are exactly those with a path consisting of a single edge
from s, meaning they are the neighbors of s (marked in Figure 12.2).
(Figures 12.2 and 12.3: the distances from the source square s in the grid, computed one layer at a time.)
Exercise 12.1
Use the BFS algorithm to compute the distance to every square in the
following grid:
Figure 12.4
Exercise 12.2
Prove that the shorter way of coding the BFS loop (Algorithm 12.2) is
equivalent to the longer version (Algorithm 12.1).
In many problems the task is to find a shortest path between some pair of
vertices where the graph is given implicitly:
8-puzzle
In the 8-puzzle, 8 tiles are arranged in a 3 × 3 grid, with one square left
empty. A move in the puzzle consists of sliding a tile into the empty
square. The goal of the puzzle is to perform some moves to reach the
target configuration. The target configuration has the empty square in the
bottom right corner, with the numbers in order 1, 2, 3, 4, 5, 6, 7, 8 on the
three lines.
(Figure: an 8-puzzle position, the position after one move, and the target configuration.)
In such a problem, most of the code is usually concerned with the represen-
tation of a state as a vertex, and generating the edges that a certain vertex is
adjacent to. When an implicit graph is given, we generally do not compute the
entire graph explicitly. Instead, we use the states from the problems as-is, and
generate the edges of a vertex only when it is being visited in the breadth-first
search. In the 8-puzzle, we can represent each state as a 3 × 3 2D-vector. The
difficult part is generating all the states that we can reach from a certain state.
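A sketch of such edge generation (our own):

#include <bits/stdc++.h>
using namespace std;
typedef vector<vector<int>> State; // 3x3 grid, 0 = the empty square

// the neighbours of a state are the states reached by sliding one of the
// (up to four) tiles next to the empty square into it
vector<State> neighbours(const State& s) {
    int er = 0, ec = 0; // locate the empty square
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++)
            if (s[r][c] == 0) { er = r; ec = c; }
    vector<State> result;
    int dr[] = {-1, 1, 0, 0}, dc[] = {0, 0, -1, 1};
    for (int d = 0; d < 4; d++) {
        int nr = er + dr[d], nc = ec + dc[d];
        if (nr < 0 || nr >= 3 || nc < 0 || nc >= 3) continue;
        State next = s;
        swap(next[er][ec], next[nr][nc]); // slide the tile into the hole
        result.push_back(next);
    }
    return result;
}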
With the edge generation in hand, the rest of the solution is a normal BFS,
slightly modified to account for the fact that our vertices are no longer num-
bered 0, . . . , V − 1. We can solve this by using e.g. maps instead.
Coast Length
KTH Challenge 2011 – Ulf Lundström
The residents of Soteholm value their coast highly and therefore want
to maximize its total length. For them to be able to make an informed
decision on their position in the issue of global warming, you have to help
them find out whether their coastal line will shrink or expand if the sea
level rises. From height maps they have figured out what parts of their
islands will be covered by water, under the different scenarios described
in the latest IPCC report on climate change, but they need your help to
calculate the length of the coastal lines.
You will be given a map of Soteholm as an N × M grid. Each square in the
grid has a side length of 1 km and is either water or land. Your goal is to
compute the total length of sea coast of all islands. Sea coast is all borders
between land and sea, and sea is any water connected to an edge of the
map only through water. Two squares are connected if they share an edge.
You may assume that the map is surrounded by sea. Lakes and islands in lakes do not contribute to the sea coast.
Figure 12.6: Gray squares are land and white squares are water. The thick
black line is the sea coast.
We can consider the grid as a graph, where all the water squares are vertices, and two squares have an edge between them if they share an edge. If we surround the entire grid by water tiles (a useful trick to avoid special cases in this kind of grid problem), the sea consists of exactly those vertices that are connected to the surrounding water tiles. This means we need to compute the vertices which lie in the same connected component as the sea – a typical DFS task¹. After computing this component, we can determine the coast length by looking at all the squares which belong to the sea. If such a square shares an edge with a land tile, that edge contributes 1 km to the coast length.
const vpi moves = {pii(-1, 0), pii(1, 0), pii(0, -1), pii(0, 1)};

int coastLength(const vector<vector<bool>>& G) {
    // G[i][j] is true for land squares. Pad the grid with a border of
    // water squares to avoid special cases along the edges of the map.
    int H = sz(G) + 4;
    int W = sz(G[0]) + 4;
    vector<vector<bool>> G2(H, vector<bool>(W, false));
    rep(i,0,sz(G)) rep(j,0,sz(G[i])) G2[i+2][j+2] = G[i][j];
    vector<vector<bool>> sea(H, vector<bool>(W, false));

    function<void(int, int)> floodFill = [&](int row, int col) {
        if (row < 0 || row >= H || col < 0 || col >= W) return;
        if (sea[row][col] || G2[row][col]) return; // visited, or land
        sea[row][col] = true;
        trav(move, moves) floodFill(row + move.first, col + move.second);
    };
    floodFill(0, 0);

    int coast = 0;
    rep(i,1,H-1) rep(j,1,W-1) {
        if (!sea[i][j]) continue;
        // A neighbour of a sea square that is not itself sea must be land
        // (lake water can never be adjacent to sea), so each such border
        // contributes 1 km of coast.
        trav(move, moves) if (!sea[i + move.first][j + move.second]) coast++;
    }
    return coast;
}

¹This particular application of DFS, i.e. computing a connected area in a 2D grid, is called a flood fill.
The theory of computing shortest paths in the case of weighted graphs is a bit richer than the unweighted case. Which algorithm to use depends on three factors:
• The number of vertices.
• Whether edge weights are non-negative or not.
• If we seek shortest paths only from a single vertex or between all pairs
of vertices.
There are mainly three algorithms used: Dijkstra’s Algorithm, the Bellman-Ford
algorithm, and the Floyd-Warshall algorithm.
12.3.2 Bellman-Ford
When edges can have negative weights, the idea behind Dijkstra’s algorithm no
longer works. It is very much possible that a negative weight edge somewhere
else in the graph could be used to construct a shorter path to a vertex which
was already marked as completed. However, the concept of relaxing an edge
is still very much applicable and allows us to construct a slower, inductive
solution.
Initially, we know the shortest distances to each vertex, assuming we are allowed to traverse 0 edges from the source. Assuming that we know these values when allowed to traverse up to k edges, can we find the shortest paths that traverse up to k + 1 edges? This way of thinking is similar to how we solved the BFS problem. Using a BFS, we computed the vertices at distance d + 1 by taking the neighbours of the vertices at distance d. Similarly, we can find the shortest path to a vertex v that traverses up to k + 1 edges, by attempting to extend the shortest paths using k edges from the neighbours of v. Letting D(k, v) be the shortest distance to v by traversing up to k edges, we arrive at the following recursion:
$$D(k, v) = \begin{cases} 0 & \text{if } k = 0 \text{ and } v = s \\ \infty & \text{if } k = 0 \text{ and } v \neq s \\ \min\left(D(k-1, v),\ \min_{e=(u,v) \in E} D(k-1, u) + W(e)\right) & \text{if } k > 0 \end{cases}$$
All in all, the states D(k, v) for a particular k take Θ(|E|) time to evaluate.
To compute the distance d(s, v), we still need to know what the maximum
possible k needed to arrive at this shortest path could be. It turns out that this
could potentially be infinite, in the case where the graph contains a negative-
weight cycle. Such a cycle can be exploited to construct arbitrarily short
paths.
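The recursion translates directly into code. The following is a minimal sketch of ours (not the book's listing) which evaluates D bottom-up for k up to |V| − 1; this suffices when no negative-weight cycle is reachable, since a shortest path then never needs to repeat a vertex:

#include <vector>
#include <tuple>
#include <algorithm>
using namespace std;

const long long INF = 1LL << 60;

// Edges are (u, v, w) triples; returns D(|V| - 1, v) for every v.
vector<long long> bellmanFord(int V, const vector<tuple<int,int,long long>>& E, int s) {
    vector<long long> prev(V, INF), cur(V, INF);
    prev[s] = 0; // D(0, s) = 0, D(0, v) = infinity otherwise
    for (int k = 1; k < V; ++k) {
        cur = prev; // the option D(k, v) = D(k - 1, v)
        for (const auto& [u, v, w] : E)
            if (prev[u] != INF) // extend a path of at most k - 1 edges by e
                cur[v] = min(cur[v], prev[u] + w);
        swap(prev, cur);
    }
    return prev;
}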
Exercise 12.5
Bellman-Ford can be adapted to instead use only Θ(V) memory, by only keeping the currently known shortest distances and repeatedly relaxing every edge. Sketch out the pseudo code for such an approach, and prove its correctness.
Exercise 12.6
We may terminate Bellman-Ford earlier without loss of correctness, in case
D[k] = D[k − 1]. How can this fact be used to determine whether the graph
in question contains a negative-weight cycle?
12.3.3 Floyd-Warshall
Initially, the distance matrix D contains the distances of all the edges in E, so that D[i][j] is the weight of the edge (i, j) if such an edge exists, 0 if i = j, and ∞ if there is no edge between i and j. Note that if multiple edges exist between i and j, D[i][j] must be given the minimum weight of them all. Additionally, if there is a self-loop (i.e. an edge from i to i itself) of negative weight, D[i][i] must be set to this value.
To see why this approach works, we can use the following invariant, proven by induction. After the k'th iteration of the loop, D[i][j] will be at most the minimum distance between i and j that uses only vertices 0, 1, . . . , k − 1 as intermediate vertices. Assume that this is true for a particular k. After the next iteration, there are two cases for D[i][j]. Either there is no shorter path using vertex k than those using only vertices 0, 1, . . . , k − 1, in which case D[i][j] fulfills the condition by the induction assumption. If there is a shorter path between i and j when we may use the vertex k, it must have length D[i][k] + D[k][j], since D[i][k] and D[k][j] both contain the shortest paths between i and k, and k and j, using vertices 0, 1, . . . , k − 1. Since we set D[i][j] = min(D[i][j], D[i][k] + D[k][j]) in the inner loop, we will surely find this path too in this iteration. Thus, the statement is true after the k + 1'th iteration too. By induction, it is true for k = |V|, meaning D[i][j] contains the minimum distance between i and j using any vertex in the graph.
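Spelled out in code, the entire algorithm is the triple loop below – a minimal sketch of ours, assuming D has been initialized as described above with INF standing in for ∞:

#include <vector>
#include <algorithm>
using namespace std;

const long long INF = 1LL << 60;

// D must contain edge weights, 0 on the diagonal and INF elsewhere;
// afterwards D[i][j] is the shortest distance from i to j.
void floydWarshall(vector<vector<long long>>& D) {
    int n = D.size();
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (D[i][k] != INF && D[k][j] != INF)
                    D[i][j] = min(D[i][j], D[i][k] + D[k][j]);
}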
[Figure: an example weighted graph on the vertices A–F.]
Mainly two algorithms are used to solve the Minimum Spanning Tree (MST)
problem: either Kruskal’s Algorithm which is based on the Union-Find data
structure, or Prim’s Algorithm which is an extension of Dijkstra’s Algorithm.
We will demonstrate Kruskal's, since it is by far the more common of the two.
Kruskal’s algorithm is based on a greedy, incremental approach that reminds
us of the Scheduling problem (Section 8.3). In the Scheduling problem, we
tried to find any interval that we could prove to be part of an optimal solution,
by considering various extremal cases. Can we do the same thing when finding
a minimum spanning tree?
First of all, if we can always find such an edge, we are essentially done. Given
an edge (a, b) that we know is part of a minimum spanning tree, we can
contract the two vertices a and b to a single vertex ab, with all edges adjacent
to the two vertices. Any edges that go between contracted vertices are ignored.
An example of this process in action can be seen in Figure 12.8. Note how the
contraction of an edge reduces the problem to finding an MST in a smaller
graph.
A natural extremal case to consider is the edge with the minimum weight. After all, we are trying to minimize the edge sum. Our proof is similar in structure to the Scheduling problem as well, using a swapping argument. Assume that a minimum-weight edge {a, b} with weight w is not part of any minimum spanning tree.
[Figure 12.8: repeatedly contracting edges chosen for the minimum spanning tree, reducing the problem to a smaller graph each time.]
Then, consider a minimum spanning tree with this edge appended. The graph will then contain exactly one cycle. In a cycle, any edge can be removed while maintaining connectivity. This means that if any edge {c, d} on this cycle has a weight w′ larger than w, we can erase it. We will thus have replaced the edge {c, d} by {a, b}, changing the weight of the tree by w − w′ < 0 and reducing the sum of weights. Thus, the tree was improved by using the minimum-weight edge, proving that it could have been part of a minimum spanning tree all along – a contradiction.
Exercise 12.7
What happens if all edges on the cycle that appears have weight w? Is this
a problem for the proof?
When implementing the algorithm, the contraction of the edge added to the minimum spanning tree is generally not performed explicitly. Instead, a disjoint set data structure is used to keep track of which subsets of vertices have been contracted. Then, all the original edges are iterated through in increasing order of weight. An edge is added to the spanning tree if and only if the two endpoints of the edge are not already connected (as in Figure 12.8). The complexity of this algorithm is dominated by the sorting (which is O(E log V)), since the operations on the disjoint set structure are O(log V).
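A minimal sketch of ours of this implementation; the disjoint set structure here is the simplest possible one (path compression only, no union by size):

#include <vector>
#include <tuple>
#include <algorithm>
using namespace std;

struct DisjointSet {
    vector<int> parent;
    DisjointSet(int n) : parent(n, -1) {}
    int find(int x) { return parent[x] < 0 ? x : parent[x] = find(parent[x]); }
    // Returns false if a and b were already connected.
    bool join(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

// Edges are (weight, a, b) triples; returns the weight of an MST,
// assuming the graph is connected.
long long kruskal(int V, vector<tuple<long long,int,int>> edges) {
    sort(edges.begin(), edges.end());
    DisjointSet ds(V);
    long long total = 0;
    for (const auto& [w, a, b] : edges)
        if (ds.join(a, b)) total += w; // add iff endpoints not yet connected
    return total;
}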
Maximum Flows
This chapter studies so called flow networks, and algorithms we use to solve the
so-called maximum flow and minimum cut problems on such networks. Flow
problems are common algorithmic problems, particularly in ICPC competitions
(while they are out-of-scope for IOI contests). They are often hidden behind
statements which seem unrelated to graphs and flows, especially the minimum
cut problem.
Informally, a flow network is a directed graph that models any kind of network
where paths have a fixed capacity, or throughput. For example, in a road
network, each road might have a limited throughput, proportional to the
number of lanes on the road. A computer network may have different speeds
along different connections due to e.g. the type of material. These natural models are often used when describing a problem that is related to flows. A more formal definition is the following.
[Figure 13.1: an example flow network, with the capacity of each edge shown.]
In such a network, we can assign another value to each edge that models the current throughput (which generally does not need to match the capacity). These values are what we call flows.
In a computer network, the flows could e.g. represent the current rate of
transfer through each connection.
Exercise 13.1
Prove that the size of a given flow also equals
$$\sum_{v \in \text{in}(T)} f(v) - \sum_{v \in \text{out}(T)} f(v)$$
i.e. the excess flow out from S must be equal to the excess flow in to T.
In Figure 13.2, flows have been added to the network from Figure 13.1.
Figure 13.2: An example flow network, where each edge has an assigned flow.
The size of the flow is 8.
Given such a flow, we are generally interested in determining the flow of the
largest size. This is what we call the maximum flow problem. The problem is
not only interesting on its own. Many problems which we study might initially
seem unrelated to maximum flow, but will turn out to be reducible to finding
a maximum flow.
Maximum Flow
Given a flow network (V, E, c, S, T ), construct a maximum flow from S to
T.
Input
A flow network.
Output
Output the maximal size of a flow, and the flow assigned to each edge in
one such flow.
Exercise 13.2
The flow of the network in Figure 13.2 is not maximal – there is a flow of
size 9. Find such a flow.
13.2 Edmonds-Karp
There are plenty of algorithms which solve the maximum flow problem. Most of these are too complicated to be practical to implement. We are going to study two very similar classical algorithms that compute a maximum flow. We will start with proving the correctness of the Ford-Fulkerson algorithm. Afterwards, a modification known as Edmonds-Karp will be analyzed (and found to have a better worst-case complexity).
For each edge, we define the residual flow r(e) on the edge to be c(e) − f(e). The residual flow represents the additional amount of flow we may push along an edge.
In Ford-Fulkerson, we associate every edge e with an additional back edge b(e) which points in the reverse direction. Each back edge is originally given a flow and capacity 0. If e has a certain flow f, we assign the flow of the back edge b(e) to be −f (i.e. f(b(e)) = −f(e)). Since the back edge b(e) of e has capacity 0, its residual capacity is r(b(e)) = c(b(e)) − f(b(e)) = 0 − (−f(e)) = f(e).
Intuitively, the residual flow represents the amount of flow we can add to a
certain edge. Having a back-edge thus represents “undoing” flows we have
added to a normal edge, since increasing the flow along a back-edge will
decrease the flow of its associated edge.
Figure 13.3: The residual flows from the network in Figure 13.2.
Performing this kind of augmentation on an admissible flow will keep the flow
admissible. A path must have either zero or two edges adjacent to any vertex
(aside from the source and sink). One of these will be an incoming edge, and
one an outgoing edge. Increasing the flow of these edges by the same amount
conserves the equality of flows between in-edges and out-edges, meaning the
flow is still admissible.
This means that a flow can be maximum only if it contains no augmenting paths. It turns out this is also a sufficient condition, i.e. a flow is maximum if it contains no augmenting path. Thus, we can solve the maximum flow problem by repeatedly finding augmenting paths, until no more exist.
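To make the procedure concrete, here is a minimal sketch of ours of the Edmonds-Karp variant, which always augments along a shortest augmenting path (found by a BFS) in the residual graph. The adjacency-matrix representation keeps the back edges implicit: pushing flow along u → v simply increases the residual capacity cap[v][u].

#include <vector>
#include <queue>
#include <algorithm>
using namespace std;

// cap[u][v] is the residual capacity from u to v; returns the maximum flow.
int maxFlow(vector<vector<int>>& cap, int S, int T) {
    int n = cap.size(), flow = 0;
    while (true) {
        // BFS for a shortest augmenting path in the residual graph.
        vector<int> parent(n, -1);
        parent[S] = S;
        queue<int> q;
        q.push(S);
        while (!q.empty() && parent[T] == -1) {
            int u = q.front(); q.pop();
            for (int v = 0; v < n; ++v)
                if (parent[v] == -1 && cap[u][v] > 0) {
                    parent[v] = u;
                    q.push(v);
                }
        }
        if (parent[T] == -1) return flow; // no augmenting path left: maximum
        // Find the bottleneck residual capacity along the path, then push
        // that much flow along it (implicitly updating the back edges).
        int aug = 1 << 30;
        for (int v = T; v != S; v = parent[v]) aug = min(aug, cap[parent[v]][v]);
        for (int v = T; v != S; v = parent[v]) {
            cap[parent[v]][v] -= aug;
            cap[v][parent[v]] += aug;
        }
        flow += aug;
    }
}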
We will now study a number of problems which are reducible to finding a max-
imum flow in a network. Some of these problems are themselves considered
to be standard problems.
This is nearly the standard maximum flow problem, with the addition of
vertex capacities. We are still going to use the normal algorithms for maximum
flow. Instead, we will make some minor modifications to the network. The
additional constraint given is similar to the constraint placed on an edge. An
edge has a certain amount of flow passing through it, implying that the same
amount must enter and exit the edge. For this reason, it seems like a reasonable
approach to reduce the vertex capacity constraint to an ordinary edge capacity,
by forcing all the flow that passes through a vertex v with capacity Cv through
a particular edge.
If we partition all the edges adjacent to v into incoming and outgoing edges,
it becomes clear how to do this. We can split up v into two vertices vin and
vout , where all the incoming edges to v are now incoming edges to vin and
the outgoing edges instead become outgoing edges from vout . If we then add
an edge of infinite capacity from vin to vout , we claim that the maximum flow
of the network does not change. All the flow that passes through this vertex
must now pass through this edge between vin and vout. This construction thus accomplishes our goal of forcing the vertex flow through a particular edge. We can now enforce the vertex capacity by changing the capacity of this edge to Cv.
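A sketch of ours of this transformation (the Edge type and function name are our own):

#include <vector>
using namespace std;

struct Edge { int from, to; long long cap; };

// Vertex v becomes v_in = 2v and v_out = 2v + 1. Each original edge
// (u, v) now goes from u_out to v_in, and the vertex capacity C[v]
// becomes the capacity of the internal edge v_in -> v_out.
vector<Edge> splitVertices(int V, const vector<Edge>& edges, const vector<long long>& C) {
    vector<Edge> result;
    for (int v = 0; v < V; ++v) result.push_back({2*v, 2*v + 1, C[v]});
    for (const auto& e : edges) result.push_back({2*e.from + 1, 2*e.to, e.cap});
    return result;
}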
How can we find such a reduction? In general, we try to find some kind of
graph structure in the problem, and model what it “means” for an edge to have
flow pushed through it. In the bipartite matching problem, we are already
given a graph. We also have a target we wish to maximize – the size of the
matching – and an action that is already associated with edges – including
it in the matching. It does not seem unreasonable that this is how we wish
to model the flow, i.e. that we want to construct a network based on this
graph where pushing flow along one of the edges means that we include the
edge in the matching. No two selected edges may share an endpoint, which
brings only a minor complication. After all, this condition is equivalent to each
of the vertices in the graph having a vertex capacity of 1. We already know
how to enforce vertex capacities from the previous problem, where we split
each such vertex into two, one for in-edges and one for out-edges. Then, we
added an edge between them with the required capacity. After performing
this modification on the given graph, we are still missing one important part
of a flow network. The network does not yet have a source and sink. Since we
want flow to go along the edges, from one of the parts to another part of the
graph, we should place the source at one side of the graph and the sink at the
other, connecting the source to all vertices on one side and all the vertices on
the other side to the sink.
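As a concrete sketch (ours), the network for a bipartite graph with L left and R right vertices can be built and solved using the maxFlow function sketched earlier. With unit capacities on the source and sink edges, each vertex can carry at most one unit of flow, so in this special case those edges happen to enforce the vertex capacities on their own, and the explicit splitting can be skipped:

#include <vector>
#include <utility>
using namespace std;

// edges contains pairs (u, v) with 0 <= u < L and 0 <= v < R; returns the
// size of a maximum matching. Reuses the maxFlow sketch from above.
int maxMatching(int L, int R, const vector<pair<int,int>>& edges) {
    int n = L + R + 2, S = L + R, T = L + R + 1;
    vector<vector<int>> cap(n, vector<int>(n, 0));
    for (int u = 0; u < L; ++u) cap[S][u] = 1;     // source to left side
    for (int v = 0; v < R; ++v) cap[L + v][T] = 1; // right side to sink
    for (const auto& [u, v] : edges) cap[u][L + v] = 1;
    return maxFlow(cap, S, T);
}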
Now, consider any path cover of this new graph, where we ignore the added edges. Each vertex is then adjacent to at most a single edge, since the paths are vertex-disjoint.
Exercise 13.3
The minimum path cover reduction can be modified slightly to find a
minimum cycle cover in a directed graph instead. Construct such a reduction.
Strings
14.1 Tries
The trie (also called a prefix tree) is the most common string-related data struc-
ture. It represents a set of words as a rooted tree, where every prefix of every
word is a vertex, with children from a prefix P to all strings Pc which are
also prefixes of a word. If two words have a common prefix, the prefix only
appears once as a vertex. The root of the tree is the empty prefix. The trie
is very useful when we want to associate some information with prefixes of
strings and quickly get the information from neighboring strings.
The most basic operation of the trie is the insertion of strings, which may be
implemented as follows.
struct Trie {
    map<char, Trie> children;
    bool isWord = false;

    void insert(const string& s, int pos) {
        if (pos != sz(s)) children[s[pos]].insert(s, pos + 1);
        else isWord = true;
    }
};
We mark the vertices which correspond to an inserted word using a boolean flag isWord. Many problems can essentially be solved by very simple usage of a trie, such as the following IOI problem.
Type Printer
International Olympiad in Informatics 2008
You need to print N words on a movable type printer. Movable type
printers are those old printers that require you to place small metal pieces
(each containing a letter) in order to form words. A piece of paper is then
pressed against them to print the word. The printer you have allows you
to do any of the following operations:
• Add a letter to the end of the word currently in the printer.
• Remove the last letter from the end of the word currently in the
printer. You are only allowed to do this if there is at least one letter
currently in the printer.
• Print the word currently in the printer.
Initially, the printer is empty; it contains no metal pieces with letters. At
the end of printing, you are allowed to leave some letters in the printer.
Also, you are allowed to print the words in any order you like. As ev-
ery operation requires time, you want to minimize the total number of
operations.
Your task is to output a sequence of operations that prints all the words
using the minimum number of operations needed.
Input
The first line contains the number of words 1 ≤ N ≤ 25 000. The next N
lines contain the words to be printed, one per line. Each word is at most
20 letters long and consists only of lower case letters a-z. All words will be distinct.
Output
Output a sequence of operations that prints all the words. The operations
should be given in order, one per line, starting with the first. Adding a
letter c is represented by outputting c on a line. Removing the last letter of
the current word is represented by a -. Printing the current word is done
by outputting P.
Let us start by solving a variation of the problem, where we are not allowed to leave letters in the printer at the end. First of all, are there actions that never make sense? For example, what sequences of letters will ever appear in the type printer during an optimal sequence of operations? Clearly we never wish to input a sequence that is not a prefix of a word we wish to type. For example, if we input abcdef and this is not a prefix of any word, we must at some point erase the last letter f, without having printed any words. But then we can erase the entire sequence of operations between inputting the f and erasing the f, without changing what words we print.
On the other hand, every prefix of a word we wish to print must at some point appear on the type printer. Otherwise, we would not be able to reach the word we wish to print. Therefore, the partial words to ever appear on the type printer are exactly the prefixes of the words we wish to print – strongly hinting at a trie-based solution.
If we build the trie of all words we wish to print, it contains as vertices exactly
those strings which will appear as partial words on the printer. Furthermore,
the additions and removals of letters form a sequence of vertices that are
connected by edges in this trie. We can move either from a prefix P to a prefix
Pc, or from a prefix Pc to a prefix P, which are exactly the edges of a trie. The
goal is then to construct the shortest possible tour starting at the root of the
trie and passing through all the vertices of the trie.
Since a trie is a tree, any such tour must pass through every edge of the trie at least twice. If we only passed through an edge once, we could never get back to the root, since every edge disconnects the root from the endpoint of the edge further away from the root. It is actually possible to construct a tour which passes through every edge exactly twice (which is not particularly difficult if you attempt this task by hand). As it happens, the depth-first search of a tree passes through an edge exactly twice – once when first traversing the edge to an unvisited vertex, and once when backtracking.
The problem is subtly different once we are allowed to leave some letters in the printer at the end. Clearly, the only difference between an optimal sequence when letters may remain and an optimal sequence when we must leave the printer empty is that we are allowed to skip some trailing removal operations. If the last word we print is S, the difference will be exactly |S| removal (-) operations.
An optimal solution will therefore print the longest word last, in order to “win”
as many - operations as possible. We would like this last word to be the
longest word of all the ones we print if possible. In fact, we can order our DFS
such that this is possible. First of all, our DFS starts from the root and the
longest word is s1 s2 . . . sn . When selecting which order the DFS should visit
the children of the root in, we can select the child s1 last. Thus, all words
starting with the letter s1 will be printed last. When visiting s1 , we use the
same trick and visit the child s1 s2 last of the children of s1 , and so on. This
guarantees S to be the last word to be printed.
Note that the solution requires no additional data to be stored in the trie – the
only modification to our basic trie is the DFS.
struct Trie {
    ...

    // Called from the root as dfs(0, longest, true).
    void dfs(int depth, const string& longest, bool onPath) {
        trav(it, children)
            if (!onPath || it.first != longest[depth])
                dfs2(depth, longest, it.first, false);
        // Visit the child along the longest word last, if we are still on
        // the longest word's path.
        if (onPath && children.count(longest[depth]))
            dfs2(depth, longest, longest[depth], true);
    }

    void dfs2(int depth, const string& longest, char output, bool onPath) {
        cout << output << endl;
        Trie& child = children[output];
        if (child.isWord) cout << "P" << endl;
        child.dfs(depth + 1, longest, onPath);
        // Skip the removal along the longest word's path - these are the
        // letters we may leave in the printer.
        if (!onPath) cout << "-" << endl;
    }
};
Generally, the uses of tries are not this simple, where we only need to construct
the trie and fetch the answer through a simple traversal. We often need to
augment tries with additional information about the prefixes we insert. This is
when tries start to become really powerful. The next problem requires only a
small augmentation of a trie, to solve a problem which looks complex.
Rareville
In Rareville, everyone must have a distinct name. When a new-born baby
is to be given a name, its parents must first visit NAME, the Naming
Authority under the Ministry of Epithets, to have its name approved. The
authority has a long list of all names assigned to the citizens of Rareville.
When deciding whether to approve a name or not, a case officer uses the
following procedure. They start at the first name in the list, and read the
first letter of it. If this letter matches the first letter of the proposed name,
they proceed to read the next letter in the word. This is repeated for every
letter of the name in the list. After reading a letter from the word, the case officer can sometimes determine that this could not possibly be the same name as the proposed one. This happens if either
• the next letter in the proposed name did not match the name in the
list
• there was no next letter in the proposed name
• there was no next letter in the name in the list
When this happens, the case officer starts over with the next name in the list, until exhausting all names in the list. For each letter the case officer reads (or attempts to read) from a name in the list, one second passes.
Currently, there are N people in line waiting to apply for a name. Can you determine how long the decision process will take for each person?
Input
The first line contains integers 1 ≤ D ≤ 200 000 and 1 ≤ N ≤ 200 000, the size of the dictionary and the number of people waiting in line. The next D lines contain one lowercase name each, the contents of the dictionary. The next N lines contain one lowercase name each, the names the people in line wish to apply with. The total size of the lists is at most 10⁶ letters.
Output
For each of the N names, output the time (in seconds) the case officer
needs to decide on the application.
The problem clearly relates to prefixes in some way. Given a dictionary word
A and an application for a name B, the case officer needs to read letters from
A corresponding to the longest common prefix of A and B, plus 1. Hence, our
solution will probably be to consider all the prefixes of each proposed name,
which is exactly what tries are good at.
Instead of thinking about this process one name at a time, we use a common trie technique and look at the transpose of the problem: for every i, how many names Ci have a longest common prefix of length at least i when handling the application for a name S? This way, we have transformed the problem from being about D individual processes to |S| smaller problems which treat the dictionary as a unified group of strings. Then, we will have to read C0 + C1 + · · · + C|S| letters.
Now, the solution should be clear. We augment the trie vertex for a particular prefix p with the number Pp of strings in the list that start with this prefix. Initially, an empty trie has Pp = 0 for every p. Whenever we insert a new word W = w1w2 . . . in the trie, we need to increment Pw1, Pw1w2, . . . , to keep all the Pp correct, since we have added a new string which has those prefixes. Then, we have that Ci = Ps1s2...si, so that we can compute all the Ci by following the word S in the trie. The construction of the trie is linear in the number of characters we insert, and responding to a query is linear in the length of the proposed name.
struct Trie {
    map<char, Trie> children;
    int P = 0;

    void insert(const string& s, int pos) {
        P++;
        if (pos != sz(s)) children[s[pos]].insert(s, pos + 1);
    }

    int query(const string& s, int pos) {
        int ans = P;
        if (pos != sz(s)) {
            auto it = children.find(s[pos]);
            if (it != children.end()) ans += it->second.query(s, pos + 1);
        }
        return ans;
    }
};
String Matching
Find all occurrences of the pattern P as a substring in the string W.
We can solve this problem naively in O(|W| · |P|). If we assume that an occur-
rence of P starts at position i in W, we can compare the substring W[i...i+|P|−1]
to P in O(|P|) time by looping through both strings, one character at a time.
k − l letters of the partial match, i.e. P[l . . . k − 1]. But a partial match is just a prefix of P, so we must have P[l . . . k − 1] = P[0 . . . k − l − 1]. In other words, for every given k, we must find the longest suffix of P[0 . . . k − 1] that is also a prefix of P (besides P[0 . . . k − 1] itself, of course).
We can compute these suffixes rather easily in O(n²). For each possible position for the next possible match l, we perform a string matching to find all occurrences of prefixes of P within P.
In each iteration of the loop, we see that either match is increased by one, or match is decreased by match − T[match] and pos is increased by the same amount. Since match is bounded by |P| and pos is bounded by |W|, this can happen at most |W| + |P| times. Each iteration takes constant time, meaning our matching is Θ(|W| + |P|) time.
While this is certainly better than the naive string matching, it is not particularly helpful when |P| = Θ(|W|), since we need an O(|P|²) preprocessing. The solution
lies in how we computed the table of suffix matches, or rather, the fact that
it is entirely based on string matching itself. We just learned how to use this
table to perform string matching in linear time. Maybe we can use this table
to extend itself and get the precomputation down to O(|P|)? After all, we are
looking for occurrences of prefixes of P in P itself, which is exactly what string
matching does. If we modify the string matching algorithm for this purpose,
we get what we need:
Using the same analysis as for the improved string matching, this precom-
putation is instead Θ(|P|). The resulting string matching then takes Θ(|P| +
|W|).
This string matching algorithm is called the Knuth-Morris-Pratt (KMP) algo-
rithm.
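As a concrete reference, here is a minimal sketch of ours of the two parts – computing the table T, and then matching – with T indexed by the length of the current partial match, as in the analysis above:

#include <string>
#include <vector>
using namespace std;

// T[k] is the length of the longest proper suffix of P[0..k-1] that is
// also a prefix of P.
vector<int> computeTable(const string& P) {
    vector<int> T(P.size() + 1, 0);
    int match = 0;
    for (int pos = 1; pos < (int)P.size(); ++pos) {
        while (match > 0 && P[pos] != P[match]) match = T[match];
        if (P[pos] == P[match]) match++;
        T[pos + 1] = match;
    }
    return T;
}

// Returns the starting positions of all occurrences of P in W.
vector<int> kmp(const string& W, const string& P) {
    vector<int> T = computeTable(P), occurrences;
    int match = 0;
    for (int pos = 0; pos < (int)W.size(); ++pos) {
        while (match > 0 && W[pos] != P[match]) match = T[match];
        if (W[pos] == P[match]) match++;
        if (match == (int)P.size()) {
            occurrences.push_back(pos - match + 1);
            match = T[match]; // continue searching for further occurrences
        }
    }
    return occurrences;
}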
14.3 Hashing
Hashing is a concept most familiar from the hash table data structure. The
idea behind the structure is to compress a set S of elements from a large set
to a smaller set, in order to quickly determine memberships of S by having
a direct indexing of the smaller set into an array (which has Θ(1) look-ups).
In this section, we are going to look at hashing in a different light, as a way
of speeding up comparisons of data. When comparing two pieces of data
a and b of size n for equality, we need to use Θ(n) time in the worst case since every bit of data must be compared. This is fine if we perform only a few comparisons.
FriendBook
Swedish Olympiad in Informatics 2011, Finals
FriendBook is a web site where you can chat with your friends. For a long
time, they have used a simple “friend system” where each user has a list of
which other users are their “friends”. Recently, a somewhat controversial
feature was added, namely a list of your “enemies”. While the friend
relation will always be mutual (two users must confirm that they wish to
be friends), enmity is sometimes one-way – a person A can have an enemy
B, who – by plain animosity – refuses to accept A as an enemy.
Being a poet, you have lately been pondering the following quote.
A friend is someone who dislikes the same people as yourself.
Given a FriendBook network, you wonder to what extent this quote applies. More specifically, for how many pairs of users is it the case that they are either friends with identical enemy lists, or are not friends and do not have identical enemy lists?
Input
The first line contains an integer 2 ≤ N ≤ 5000, the number of friends on FriendBook. N lines follow, each containing N characters. The c'th character on the r'th line, Src, specifies what relation person r has to person c. This character is either
V – in case they are friends.
F – if r thinks of c as an enemy.
. – r has a neutral attitude towards c.
This problem lends itself very well to hashing. It is clear that the problem is about comparisons – indeed, we are to count the number of pairs of persons who are either friends and have equal enemy lists, or are not friends and have unequal enemy lists. The first step is to extract the enemy list Ei for each person i. This will be an N-length string, where the j'th character is F if person j is an enemy of person i, and . otherwise. Basically, we remove all the friendships from the input matrix. Performing naive comparisons on these strings would only give us an O(N³) time bound, since we need to perform N² comparisons of enemy lists, each of length bounded only by O(N) in the worst case. Here, hashing comes to our aid. By instead computing hi = H(Ei) for every i, comparisons of enemy lists instead become comparisons of the integers hi – a Θ(1) operation – thereby reducing the complexity to Θ(N²).
Alternative solutions exist. For example, we could instead have sorted all the enemy lists, after which we can perform a partitioning of the lists by equality in Θ(N²) time. However, this takes O(N² log N) time with naive sorting (or O(N²) if radix sort is used, but it is more complex) and is definitely more complicated to code than the hashing approach. Another option is to insert all the strings into a trie, simplifying this partitioning and avoiding the sorting altogether. This is better, but still more complex. While it would have the same complexity, the constant factor would be significantly worse compared to the hashing approach.
This is a common theme among string problems. While most string problems
can be solved without hashes, solutions using them tend to be simpler.
The true power of string hashing is not this basic preprocessing step where we can only compare two strings. Another hashing technique allows us to compare arbitrary substrings of a string in constant time.
Exercise 14.1
Prove the properties of Theorem 14.1
Exercise 14.2
How can we compute the hash of S||T in O(1) given the hashes of the strings
S and T ?
Properties 1-4 alone allow us to append and remove characters from the beginning and end of a hash in constant time. We refer to this property as polynomial hashes being rolling. This property allows us to solve the String Matching problem with a single pattern (Section 14.2) with the same complexity as KMP, by computing the hash of the pattern P and then rolling a |P|-length hash through the string we are searching in. This algorithm is called the Rabin-Karp algorithm.
Property 5 allows us to compute the hash of any substring of a string in constant time, provided we have computed the hashes H(s1), H(s1s2), . . . , H(s1s2 . . . sn) first. Naively this computation would be Θ(n²), but property 1 allows us to compute them recursively, resulting in Θ(n) precomputation.
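A minimal sketch of ours of this precomputation; the parameters p and M are example values only (their choice is discussed at the end of the section):

#include <string>
#include <vector>
using namespace std;
typedef long long ll;

const ll p = 131, M = 1000000007;

struct PolyHash {
    vector<ll> h, pw; // h[i] = H(s_1 ... s_i), pw[i] = p^i mod M
    PolyHash(const string& s) : h(s.size() + 1, 0), pw(s.size() + 1, 1) {
        for (size_t i = 0; i < s.size(); ++i) {
            h[i + 1] = (h[i] * p + s[i]) % M; // property 1: append a character
            pw[i + 1] = pw[i] * p % M;
        }
    }
    // Hash of the substring s[l], ..., s[r-1] (0-indexed), in O(1).
    ll get(int l, int r) {
        return ((h[r] - h[l] * pw[r - l]) % M + M) % M;
    }
};

For example, PolyHash h(W); followed by h.get(i, i + n) gives the hash of the length-n substring of W starting at position i.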
Radio Transmission
Baltic Olympiad in Informatics 2009
Given is a string S. Find the shortest string L, such that S is a substring of
the infinite string T = . . . LLLLL . . . .
Input
The first and only line of the input contains the string S, with 1 ≤ |S| ≤ 10⁶.
Output
Output the string L. If there are multiple strings L of the shortest length,
you can output any of them.
Assume that L has a particular length l. Then, since T is periodic with length l, S must be too (since it is a substring of T). Conversely, if S is periodic with some length l, we can choose L = s1s2 . . . sl. Thus, we are actually seeking the smallest l such that S is periodic with length l. The constraints this puts on S are simple. We must have that
s1 = sl+1 = s2l+1 = . . .
s2 = sl+2 = s2l+2 = . . .
...
sl = s2l = s3l = . . .
Using this insight as-is gives us an O(|S|²) algorithm, where we first fix l and then verify that those constraints hold. The idea is sound, but a bit slow. Again, the problematic step is that we need to perform many slow, linear-time comparisons. If we look at what comparisons we actually perform, we are actually comparing two substrings of S with each other:
s1s2 . . . sn−l = sl+1sl+2 . . . sn
H Lh = 0, Rh = 0, P = 1;
int l = 0;
for (int i = 1; i <= n; ++i) {
    Lh = (Lh * p + S[i]) % M;         // hash of the prefix s_1 ... s_i
    Rh = (S[n - i + 1] * P + Rh) % M; // hash of the suffix s_(n-i+1) ... s_n
    P = P * p % M;                    // P = p^i
    if (Lh == Rh) l = i;              // prefix of length i equals suffix of length i
}
cout << n - l << endl;
Output
For each query L, R, S, output a line with the answer to the query.
Let us focus on how to solve the problem where every query has the same string S. In this case, we would first find which of the strings si contain S, using polynomial hashing. To respond to a query, we could for example keep a set of all the i where si was an occurrence, together with how many smaller si contained the string (i.e. some kind of partial sum). This would allow us to respond to a query where L = 1 using an upper bound in our set. Solving queries of the form [1, R] is equivalent to general intervals however, since the interval [L, R] is simply the interval [1, R] with the interval [1, L − 1] removed. This procedure would take Θ(∑ |si|) time to find the occurrences of S, and O(Q log N) time to answer the queries.
When extending this to the general case where our queries may contain different S, we do the same thing but instead find the occurrences of all the patterns of the same length simultaneously. This can be done by keeping the hashes of those patterns in a map, to allow for fast look-up of our rolling hash. Since there can only be at most √20 000 ≈ 140 different pattern lengths, we must perform about 140 · 50 000 ≈ 7 000 000 set look-ups, which is feasible.
        string S;
        cin >> L >> R >> S;
        queries.emplace_back(L, R, S);
        patterns[sz(S)].insert(S);
    }

    map<H, set<pii>> hits;
    trav(pat, patterns) {
        rep(i,0,N) {
            vector<H> hashes = rollHash(s[i], pat.first);
            trav(h, hashes)
                if (pat.second.count(h))
                    hits[h].emplace(i, sz(hits[h]) + 1);
        }
    }

    trav(query, queries) {
        H h = polyHash(get<2>(query));
        cout << countInterval(get<1>(query), hits[h])
              - countInterval(get<0>(query) - 1, hits[h]) << endl;
    }
}
Exercise 14.3
Hashing can be used to determine which of two substrings are the lexico-
graphically smallest one. How? Extend this result to a simple Θ(n log S+S)
construction of a suffix array, where n is the number of strings and S is the
length of the string.
Until now, we have glossed over the choice of M and p in our polynomial hashing. These choices happen to be important. First of all, we want M and p to be relatively prime. This ensures p has an inverse modulo M, which we use when erasing characters from the end of a hash. Additionally, p^i mod M has a smaller period when p and M share a factor.
We wish M to be sufficiently large, to avoid hash collisions. If we compare the hashes of c strings, we want √M = Ω(c) to get a reasonable chance at avoiding collisions. However, this depends on how we use hashing. p must be somewhat large as well. If p is smaller than the alphabet, we get trivial collisions already between short strings.
Exercise 14.4
Prove that $\tau_{2i}$ is a palindrome.

Theorem 14.2 For a polynomial hash H with an odd p, $2^{n(n+1)/2} \mid H(\tau_n) - H(\bar{\tau}_n)$.

We have
$$H(\tau_n) = H(\tau_{n-1} \| \bar{\tau}_{n-1}) = p^{2^{n-1}} \cdot H(\tau_{n-1}) + H(\bar{\tau}_{n-1})$$
and
$$H(\bar{\tau}_n) = H(\bar{\tau}_{n-1} \| \tau_{n-1}) = p^{2^{n-1}} \cdot H(\bar{\tau}_{n-1}) + H(\tau_{n-1})$$
Then,
$$H(\tau_n) - H(\bar{\tau}_n) = p^{2^{n-1}}\left(H(\tau_{n-1}) - H(\bar{\tau}_{n-1})\right) - \left(H(\tau_{n-1}) - H(\bar{\tau}_{n-1})\right) = \left(p^{2^{n-1}} - 1\right)\left(H(\tau_{n-1}) - H(\bar{\tau}_{n-1})\right)$$
This means that we can construct a string of length linear in the bit size of M
that causes hash collisions if we choose M as a power of 2, explaining why it
is a bad choice.
Surveillance
Swedish Olympiad in Informatics 2016, IOI Qualifiers
$$a_{1,j} - p_{1,1} = c$$
$$\vdots$$
$$a_{1,j+n-1} - p_{1,n} = c$$
Since c is arbitrary, this means the only condition is that
Combinatorics
Combinatorics deals with various discrete structures, such as graphs and per-
mutations. In this chapter, we will mainly study the branch of combinatorics
known as enumerative combinatorics – the art of counting. We will count the
number of ways to choose K different candies from N different candies, the
number of distinct seating arrangements around a circular table, the sum of
sizes of all subsets of a set and many more objects. Many combinatorial count-
ing problems are based on a few standard techniques which we will learn in
this chapter.
The addition principle states that, given a finite collection of disjoint sets
S1 , S2 , . . . , Sn , we can compute the size of the union of all sets by simply
adding up the sizes of our sets, i.e.
|S1 ∪ S2 ∪ · · · ∪ Sn | = |S1 | + |S2 | + · · · + |Sn |
Example 15.1 Assume we have 5 different types of chocolate bars (the set C), 3 different types of bubble gum (the set G), and 4 different types of lollipops (the set L). These form three disjoint sets, meaning we can compute the total number of types of candy as |C| + |G| + |L| = 5 + 3 + 4 = 12.
Later on, we will see a generalization of the addition principle that handles
cases where our sets are not disjoint.
The multiplication principle, on the other hand, states that the size of the Cartesian product S1 × S2 × · · · × Sn equals the product of the individual sizes of these sets, i.e.
|S1 × S2 × · · · × Sn| = |S1| · |S2| · · · |Sn|
Example 15.2 Assume that we have the same sets of candies C, G and L
as in Example 15.1. We want to compose an entire dinner out of snacks,
by choosing one chocolate bar, one bubble gum and a lollipop. The multiplication principle tells us that, modeling a snack dinner as a tuple (c, g, l) ∈ C × G × L, we can form our dinner in 5 · 3 · 4 = 60 ways.
Example 15.3 How many four letter words consisting of the letters a, b, c
and d contain exactly two letters a?
There are six possible ways to place the two letters a:
aa__
a_a_
a__a
_aa_
_a_a
__aa
For each of these ways, there are four ways of choosing the other two
letters (bb, bc, cb, cc). Thus, there are 4 + 4 + 4 + 4 + 4 + 4 = 6 · 4 = 24
such words.
Let us now apply these basic principles to solve the following problem:
Kitchen Combinatorics
Northwestern Europe Regional Contest 2015 – Per Austrin
The world-renowned Swedish Chef is planning a gourmet three-course
dinner for some muppets: a starter course, a main course, and a dessert.
His famous Swedish cook-book offers a wide variety of choices for each
of these three courses, though some of them do not go well together (for
instance, you of course cannot serve chocolate moose and sooted shreemp
at the same dinner).
Each potential dish has a list of ingredients. Each ingredient is in turn
available from a few different brands. Each brand is of course unique in its
own special way, so using a particular brand of an ingredient will always
result in a completely different dinner experience than using another brand
of the same ingredient.
Some common ingredients such as pølårber may appear in two of the
three chosen dishes, or in all three of them. When an ingredient is used
in more than one of the three selected dishes, Swedish Chef will use the
same brand of the ingredient in all of them.
While waiting for the meecaroo, Swedish Chef starts wondering: how
many different dinner experiences are there that he could make, by differ-
ent choices of dishes and brands for the ingredients?
Input
The input consists of:
• five integers r, s, m, d, n, where 1 ≤ r ≤ 1 000 is the number of
different ingredients that exist, 1 ≤ s, m, d ≤ 25 are the number of
available starter dishes, main dishes, and desserts, respectively, and
0 ≤ n ≤ 2 000 is the number of pairs of dishes that do not go well
together.
• r integers b1 , . . . , br , where 1 ≤ bi ≤ 100 is the number of different
brands of ingredient i.
• s + m + d dishes – the s starter dishes, then the m main dishes,
then the d desserts. Each dish starts with an integer 1 ≤ k ≤ 20
15.2 Permutations
The six permutations of the set {1, 2, 3}:
123 132
213 231
312 321
Our first “real” combinatorial problem will be to count the number of per-
mutations of an n-element set S. When counting permutations, we use the
multiplication principle. We will show a procedure that can be used to con-
struct permutations one element at a time. Assume that the permutation is
the sequence ⟨a1, a2, . . . , an⟩. The first element of the permutation, a1, can
be assigned any of the n elements of S. Once this assignment has been made,
we have n − 1 elements we can choose to be a2 (any element of S except a1 ).
In general, when we are to select the (i + 1)’th value ai+1 of the permutation,
i elements have already been included in the permutation, leaving n − i op-
tions for ai+1 . Using this argument for all n elements of the sequence, we
can construct a permutation in n · (n − 1) · · · 2 · 1 ways (by the multiplication
principle).
This number is so useful that it has its own name and notation.
$$n! = 1 \cdot 2 \cdots n = \prod_{i=1}^{n} i$$
This sequence of numbers thus begins 1, 1, 2, 6, 24, 120, 720, 5 040, 40 320, 362 880, 3 628 800, 39 916 800 for n = 0, 1, 2, . . . , 11. It is good to know the magnitudes of these numbers, since they are frequent in time complexities when doing brute force over permutations. Asymptotically, they grow as n^Θ(n). More precisely, Stirling's formula¹ gives
$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n\left(1 + O\left(\frac{1}{n}\right)\right)$$
¹Named after James Stirling, who has other important combinatorial objects named after him.
Exercise 15.1
In how many ways can 8 persons be seated around a round table, if we
consider cyclic rotations of a seating to be different? What if we consider
cyclic rotations to be equivalent?
The word permutation has roots in Latin, meaning “to change completely”. We are now going to look at permutations in a very different light, which gives some justification to the etymology of the word.
Given a set such as [5], we can fix some ordering of its elements such as ⟨1, 2, 3, 4, 5⟩. A permutation π = ⟨1, 3, 4, 5, 2⟩ of this set can then be seen as a movement of these elements. Of course, this same movement can be applied to any other 5-element set with a fixed ordering, such as ⟨a, b, c, d, e⟩ being transformed to ⟨a, c, d, e, b⟩. This suggests that we can consider a permutation as a “rule” which describes how to move – permute – the elements.
i 1 2 3 4 5
↓ ↓ ↓ ↓ ↓
π(i) 1 3 4 5 2
Since each element is mapped to a different element, the function induced by a permutation is actually a bijection. By interpreting permutations as functions, all the theory of functions applies to permutations too.
We call ⟨1, 2, 3, 4, . . . , n⟩ the identity permutation, since the function given by the identity permutation is actually the identity function. As functions, we can also consider the composition of two permutations. Given two permutations, α and β, their composition αβ is also a permutation, given by αβ(k) = α(β(k)).
If we let σ = ⟨5, 4, 3, 2, 1⟩, the composition with π = ⟨1, 3, 4, 5, 2⟩ from above would then be
i 1 2 3 4 5
↓ ↓ ↓ ↓ ↓
π(i) 1 3 4 5 2
↓ ↓ ↓ ↓ ↓
σπ(i) 5 3 2 1 4
This is called multiplying permutations, i.e. σπ is the product of σ and π. If we multiply a permutation π by itself n times, we call the resulting product πⁿ.
An important property regarding the multiplication of permutations follows
from their functional properties, namely associativity. We have that (αβ)γ = α(βγ), so we will take the liberty of dropping the parentheses and writing αβγ.
Permutations also have inverses, which are just the inverses of their functions. The permutation π = ⟨1, 3, 4, 5, 2⟩ which we looked at in the beginning thus has the inverse given by
π⁻¹(1) = 1, π⁻¹(3) = 2, π⁻¹(4) = 3, π⁻¹(5) = 4, π⁻¹(2) = 5
written in permutation notation as ⟨1, 5, 2, 3, 4⟩. Since this is the functional inverse, we expect π⁻¹π = id.
i 1 2 3 4 5
↓ ↓ ↓ ↓ ↓
π(i) 1 3 4 5 2
↓ ↓ ↓ ↓ ↓
π−1 π(i) 1 2 3 4 5
We call the k distinct numbers of this sequence the cycle of i. For the permutation π = ⟨2, 1, 4, 5, 3⟩, we have two cycles: (1, 2) and (3, 4, 5). Note how π(1) = 2 and π(2) = 1 for the first cycle, and π(3) = 4, π(4) = 5, π(5) = 3 for the second. This gives us an alternative way of writing the permutation, namely as the concatenation of its cycles: (1, 2)(3, 4, 5).
To compute the cycle decomposition of a permutation π, we repeatedly pick
any element of the permutation which is currently not a part of a cycle, and
compute the cycle it is in using the method described above. Since we will
consider every element exactly once, this procedure is Θ(n) for n-element
permutations.
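A minimal sketch of ours of this procedure, for a permutation stored 0-indexed in a vector:

#include <vector>
using namespace std;

// Returns the cycle decomposition of the permutation pi (0-indexed).
vector<vector<int>> cycleDecomposition(const vector<int>& pi) {
    int n = pi.size();
    vector<bool> seen(n, false);
    vector<vector<int>> cycles;
    for (int i = 0; i < n; ++i) {
        if (seen[i]) continue; // i is already part of a computed cycle
        vector<int> cycle;
        for (int j = i; !seen[j]; j = pi[j]) {
            seen[j] = true;
            cycle.push_back(j);
        }
        cycles.push_back(cycle);
    }
    return cycles;
}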
Given a permutation π, we define its order, denoted ord π, as the size of the set {π, π², π³, . . . }. For all permutations except the identity permutation, this is the smallest integer k > 0 such that πᵏ is the identity permutation. In our example, we have that ord π = 6, since π⁶ was the first power of π that was equal to the identity permutation. How can we quickly compute the order of π?
This fact gives us an upper bound on the order of π. If its cycle decomposition has cycles of length l1, l2, . . . , lm, the smallest positive number that is a multiple of every li is lcm(l1, l2, . . . , lm). The permutation π = ⟨2, 1, 4, 5, 3⟩ had two cycles, one of length 2 and one of length 3. Its order was lcm(2, 3) = 2 · 3 = 6. This is also a lower bound on the order, which relies on the following fact, left as an exercise:
Exercise 15.6
Prove that if π has a cycle of length l, we must have l | ord π.
Dance Reconstruction
Nordic Collegiate Programming Contest 2013 – Lukáš Poláček
Marek loves dancing, and got really excited when he heard about the coming wedding of his best friend Miroslav. For a whole month he worked on a
special dance for the wedding. The dance was performed by N people
and there were N marks on the floor. There was an arrow from each mark
to another mark and every mark had exactly one incoming arrow. The
arrow could also point back to the same mark.
At the wedding, every person first picked a mark on the floor and no 2
persons picked the same one. Every 10 seconds, there was a loud signal
when all dancers had to move along the arrow on the floor to another
mark. If an arrow was pointing back to the same mark, the person at the
mark just stayed there and maybe did some improvised dance moves on
the spot.
Another wedding is now coming up a year later, and Marek would like
to do a similar dance. He found two photos from exactly when the dance
started and when it ended. Marek also remembers that the signal was
triggered K times during the time the song was played, so people moved
K times along the arrows.
Given the two photos, can you help Marek reconstruct the arrows on the
floor? On the two photos it can be seen for every person to which position
he or she moved. Marek numbered the people in the first photo from 1 to
N and then wrote the number of the person whose place they took in the
second photo.
Marek’s time is running out, so he is interested in any placement of arrows
that could produce the two photos.
Input
Two integers 2 ≤ N ≤ 10 000 and 1 ≤ K ≤ 10⁹. Then, N integers 1 ≤
a1 , . . . , aN ≤ N, denoting that dancer number i ended up at the place of
dancer number ai . Every number between 1 and N appears exactly once
in the sequence ai .
Output
If it is impossible to find a placement of arrows such that the dance performed K times would produce the two photos, print “Impossible”. Otherwise, output any placement of arrows that could produce the two photos.
The problem can be rephrased in terms of permutations. First of all, the dance corresponds to some permutation π of the dancers, given by where the arrows pointed. This is the permutation we seek in the problem. We are given the permutation a, so we seek a permutation π such that πᴷ = a.
When given permutation problems of this kind, we should probably attack them using cycle decompositions in some way. Since the cycles of π are all independent of each other under multiplication, it is a good guess that the decomposition can simplify the problem. The important question is then how a cycle of π is affected when taking powers. For example, a cycle of 10 elements in π would decompose into two cycles of length 5 in π², and five cycles of length 2 in π⁵. The general case involves the divisors of l and K:
Exercise 15.8
Prove that a cycle of length l in a permutation π decomposes into gcd(l, K) cycles of length l / gcd(l, K) in πᴷ.
This suggests our first simplification of the problem: to consider all cycles of πᴷ partitioned by their lengths. By Exercise 15.8, cycles of different lengths are completely unrelated in the cycle decomposition of πᴷ.
The result also gives us a way to “reverse” the decomposition that happens to the cycles of π. Given m cycles of length l/m in πᴷ, we can combine them into an l-cycle in π in the case where gcd(l, K) = m. By looping over every possible cycle length l (from 1 to N), we can then find all possible ways to combine cycles of πᴷ into larger cycles of π. This step takes Θ(N log(N + K)) due to the GCD computation.
Given all the ways to combine cycles, a knapsack problem remains for each cycle length of πᴷ. If we have a cycles of a given length in πᴷ, we want to partition them into sets of certain sizes (given by the previous computation). This step takes Θ(a · c) time, if there are c ways to combine a-length cycles.
Once it has been decided what cycles are to be combined, only the act of combining them remains.
Once we have chosen the first k elements of a permutation, there are (n − k)! ways to order the remaining n − k elements. Thus, we must have divided our n! permutations into one group for each ordered k-length sequence, with each group containing (n − k)! elements. To get the correct total, there must be n!/(n − k)! such groups – and k-length sequences.
We call these objects ordered k-subsets of an n-element set, and denote the number of such ordered sets by
$$P(n, k) = \frac{n!}{(n-k)!}$$
We can also count these directly: the first element can be chosen in n ways, the second as any element but the first, leaving us with n − 1 choices, and so on. The difference to the permutation is that we stop after choosing the k'th element, which we can do in (n − k + 1) ways.
Finally, we are going to do away with the “ordered” part of the ordered k-subsets, and count the number of subsets of size k of an n-element set. This number is called the binomial coefficient, and is probably the most important combinatorial number there is.
To compute the number of k-subsets of a set of size n, we start with all the P(n, k) ordered subsets. Any particular unordered k-subset can be ordered in exactly k! different ways. Hence, there must be P(n, k)/k! unordered subsets, by the same grouping argument we used when determining P(n, k) itself.
For example, consider again the ordered 2-subsets of the set {a, b, c, d}, of
which there are 12.
ab ba ca da
ac bc cb db
ad bd cd dc
The subset {a, b} can be ordered in 2! ways - the ordered subsets ab and ba.
Since each unordered subset is responsible for the same number of ordered
subsets, we get the number of unordered subsets by dividing 12 with 2!, giving
us the 6 different 2-subsets of {a, b, c, d}.
ab
ac bc
ad bd cd
Note that
$$\binom{n}{k} = \frac{(n-k+1) \cdot (n-k+2) \cdots (n-1) \cdot n}{1 \cdot 2 \cdots (k-1) \cdot k}$$
They are thus the product of k numbers, divided by another k numbers. With this fact in mind, it does not seem unreasonable that they should be computable in O(k) time. Naively, one might try to compute them by first multiplying the k numbers in the numerator, then the k numbers in the denominator, and finally dividing them.
Unfortunately, both of these numbers grow quickly. Indeed, already at 21! we have outgrown a 64-bit integer. Instead, we will compute the binomial coefficient by alternating multiplications and divisions. We start by storing 1 = 1/1. Then, we multiply with n − r + 1 and divide with 1, leaving us with (n − r + 1)/1. In the next step we multiply with n − r + 2 and divide with 2, having computed ((n − r + 1) · (n − r + 2))/(1 · 2). After doing this r times, we will be left with our binomial coefficient.
There is one big question mark from performing this procedure – why must our intermediate results always be integers? This must be true if our procedure is correct, or we would at some point perform an inexact integer division, leaving us with an incorrect intermediate quotient. If we study the partial results more closely, we see that they are binomial coefficients themselves, namely $\binom{n-r+1}{1}, \binom{n-r+2}{2}, \ldots, \binom{n-1}{r-1}, \binom{n}{r}$. Certainly, these numbers must be integers. As we just showed, the binomial coefficients count things, and counting things tends to result in integers.
As a bonus, we discovered another useful identity in computing binomial coefficients:
$$\binom{n}{r} = \frac{n}{r}\binom{n-1}{r-1}$$
Exercise 15.9
Prove this identity combinatorially, by first multiplying both sides with
r. (Hint: both sides count the number of ways to do the same two-choice
process, but in different order.)
We have one more useful trick up our sleeves. Currently, if we want to compute e.g. $\binom{10^9}{10^9−1}$, we have to perform $10^9 − 1$ operations. To avoid this, we exploit the symmetry $\binom{n}{r} = \binom{n}{n−r}$, so that $\binom{10^9}{10^9−1} = \binom{10^9}{1} = 10^9$. More generally, this enables us to compute binomial coefficients in O(min{r, n − r}) instead of O(r).
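Putting the pieces together, a minimal sketch of this scheme might look as follows (the function name, and the assumption that the result fits in 64 bits, are ours):

// Computes C(n, r) by alternating multiplications and divisions.
// Every intermediate value is itself a binomial coefficient, so the
// divisions are always exact. Assumes the result fits in a long long.
long long binomial(long long n, long long r) {
    if (r > n - r) r = n - r;  // use C(n, r) = C(n, n - r)
    long long result = 1;
    for (long long i = 1; i <= r; ++i)
        result = result * (n - r + i) / i;  // now equals C(n - r + i, i)
    return result;
}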
Sjecista
Croatian Olympiad in Informatics 2006/2007, Contest #2
In a convex polygon with N sides, line segments are drawn between all pairs of corners, with no three segments intersecting in a single point. Compute the number of intersection points formed by these segments inside the polygon.
Instead, let us find some kind of bijection between the objects we count (intersections of line segments) and something easier to count. This strategy is one of the basic principles of combinatorial counting. An intersection is defined by two line segments, of which there are $\binom{N}{2}$. Does every pair of segments intersect? In Figure 15.2, two segments (the solid segments) do not intersect. However, two other segments which together have the same four endpoints do intersect with each other. This suggests that line segments were the wrong level of abstraction for finding a bijection. On the other hand, if we choose a set of four corners, the segments formed by the two diagonals in the convex quadrilateral given by those four points will intersect at some point (the dashed segments in Figure 15.2).
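Every intersection thus corresponds to exactly one choice of four corners, so the answer is $\binom{N}{4}$. Using the alternating multiply-and-divide scheme from above, a solution sketch (assuming the answer fits in 64 bits) is only a few lines:

#include <cstdio>

int main() {
    long long n;
    scanf("%lld", &n);
    long long ans = 1;
    for (long long i = 1; i <= 4; ++i)
        ans = ans * (n - 4 + i) / i;  // ans = C(n - 4 + i, i), always exact
    printf("%lld\n", ans);            // C(n, 4)
    return 0;
}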
Exercise 15.11
Prove that

1) $\binom{n}{k} = \binom{n−1}{k−1} + \binom{n−1}{k}$

2) $\sum_{k=0}^{n} \binom{n}{k} = 2^n$

3) $\sum_{k=0}^{n} (−1)^k \binom{n}{k} = 0$

4) $\sum_{k=0}^{n} \binom{n}{k} 2^k = 3^n$

5) $\sum_{k=0}^{n} \binom{n}{k} \left[ \sum_{l=0}^{k} \binom{k}{l} 2^l \right] = 4^n$
As is the spirit of this chapter, we ask how many Dyck paths there are in a grid of size W × H. The solution is based on two facts: a Dyck path consists of exactly H + W moves, of which exactly H are northbound and W eastbound. Conversely, any path consisting of exactly H + W moves where exactly H of those are northbound moves is a Dyck path.
If we consider e.g. the Dyck path in Figure 15.3, we can write down the
sequence of moves we made, with the symbol N for northbound moves and E
for eastbound moves:
EENENNEEENEEN
Figure 15.4: The two options for the last possible move in a Dyck path.
If we look at Figure 15.3, we can find another way to arrive at the same answer. Letting D′(n, h) be the number of Dyck paths consisting of n moves of which h are northbound (so that a W × H grid has D′(W + H, H) Dyck paths), some case work on the last move gives us the recurrence

$$D′(W + H, H) = D′(W − 1 + H, H) + D′(W + H − 1, H − 1)$$
Exercise 15.12
Prove that $\sum_{i=0}^{n} \binom{n}{i}\binom{n}{n−i} = \binom{2n}{n}$.
While Dyck paths sometimes do appear directly in problems, they are also a
useful tool to find bijections to other objects.
Sums
In how many ways can the numbers 0 ≤ a1, a2, . . . , ak be chosen such that

$$\sum_{i=1}^{k} a_i = n$$
Input
The integers 0 ≤ n ≤ 10^6 and 0 ≤ k ≤ 10^6.
Output
Output the number of ways modulo 10^9 + 7.
Given a Dyck path such as the one in Figure 15.3, what happens if we count the number of northbound steps we take at each x-coordinate? There are a total of W + 1 coordinates and H northbound steps, so we expect this to be a sum of W + 1 non-negative variables with a sum of H. This is indeed similar to what we are counting, and Figure 15.5 shows this connection explicitly.
[Figure 15.5: the northbound steps of the path at each x-coordinate, read as a1, a2, . . . , a9 with 0 + 0 + 1 + 2 + 0 + 0 + 1 + 0 + 1 = 5.]
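By this bijection, the answer equals the number of Dyck paths in a (k − 1) × n grid, i.e. $\binom{n+k−1}{k−1}$. A sketch of the computation modulo 10^9 + 7, using Fermat's theorem (see Section 16.4) for the modular inverse of the denominator (the function names are ours):

#include <cstdio>

const long long MOD = 1'000'000'007;

// Fast exponentiation a^e mod m; gives inverses via Fermat: a^(m-2) = a^(-1).
long long modpow(long long a, long long e, long long m) {
    long long r = 1 % m;
    a %= m;
    while (e > 0) {
        if (e & 1) r = r * a % m;
        a = a * a % m;
        e >>= 1;
    }
    return r;
}

// C(n + k - 1, k - 1) mod MOD: the product (n+1)(n+2)...(n+k-1) over (k-1)!.
long long sums(long long n, long long k) {
    if (k == 0) return n == 0 ? 1 : 0;  // the empty sum equals 0 only
    long long num = 1, den = 1;
    for (long long i = 1; i < k; ++i) {
        num = num * ((n + i) % MOD) % MOD;
        den = den * (i % MOD) % MOD;
    }
    return num * modpow(den, MOD - 2, MOD) % MOD;
}

int main() {
    long long n, k;
    scanf("%lld %lld", &n, &k);
    printf("%lld\n", sums(n, k));
    return 0;
}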
A special case of the Dyck paths are the paths on a square grid that do not cross the diagonal of the grid. See Figure 15.6 for an example.

We are now going to count the number of such paths, the most complex counting problem we have encountered so far. It turns out there is a straightforward bijection between the invalid Dyck paths, i.e. those which do cross the diagonal of the grid, and Dyck paths in a grid of different dimensions. In Figure 15.6, the right grid contains a path that crosses the diagonal. If we take the part of the path just after the first segment that crosses the diagonal and mirror it in the diagonal translated one unit upwards, we get the situation in Figure 15.7.
Figure 15.7: Mirroring the part of the Dyck path after its first diagonal crossing.
We claim that when mirroring the remainder of the path in this translated
diagonal, we will get a new Dyck path on the grid of size (n − 1) × (n + 1).
Assume that the first crossing is at the point (c, c). Then, after taking one step
up in order to cross the diagonal, the remaining path goes from (c, c + 1) to
(n, n). This needs n−c steps to the right and n−c−1 steps up. When mirroring,
this instead turns into n − c − 1 steps up and n − c steps right. Continuing
from (c, c + 1), the new path must thus end at (c + (n − c − 1), c + 1 + (n − c)) =
(n − 1, n + 1). This mapping is also bijective.
This bijection lets us count the number of paths that do cross the diagonal: they are $\binom{2n}{n+1}$. The number of paths that do not cross the diagonal is then

$$\binom{2n}{n} − \binom{2n}{n+1}$$
The first few Catalan numbers3 are 1, 1, 2, 5, 14, 42, 132, 429, 1430.
Catalan numbers count many other objects, most notably the number of balanced bracket sequences.
3 https://oeis.org/A000108
[Figure 15.8: a Venn diagram of two sets A and B, showing the intersection A ∩ B within the union A ∪ B.]
Let us consider the most basic case of the principle, using two sets A and B. If we wish to compute the size of their union |A ∪ B|, we at least need to count every element in A and every element in B, i.e. |A| + |B|. The problem with this formula is that whenever an element is in both A and B, we count it twice. Fortunately, this is easily mitigated: the number of elements in both sets equals |A ∩ B| (Figure 15.8). Thus, we see that |A ∪ B| = |A| + |B| − |A ∩ B|.
Similarly, we can determine a formula for the union of three sets |A ∪ B ∪ C|. We begin by including every element: |A| + |B| + |C|. Again, we have included the pairwise intersections too many times, so we remove those and get

|A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C|

This time, however, we are not done. While we have counted the elements which are in exactly one of the sets correctly (using the first three terms), and the elements which are in exactly two of the sets correctly (by removing the double-counting using the three latter terms), we currently do not count the elements which are in all three sets at all! Thus, we need to add them back, which gives us the final formula:

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|
Exercise 15.14
Compute the number of integers between 1 and 1000 that are divisible by
2, 3 or 5.
From the two examples, you can probably guess the formula in the general case, which we write in the following way:

$$\left|\bigcup_{i=1}^{n} S_i\right| = \sum_{i} |S_i| − \sum_{i<j} |S_i \cap S_j| + \sum_{i<j<k} |S_i \cap S_j \cap S_k| − \cdots + (−1)^{n+1} |S_1 \cap S_2 \cap \cdots \cap S_n|$$
From this formula, we see the reason behind the naming of the principle. We
include every element, exclude the ones we double-counted, include the ones
we removed too many times, and so on. The principle is based on a very
important assumption – that it is easier to compute intersections of sets than
their unions. Whenever this is the case, you might want to consider if the
principle is applicable.
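As a concrete illustration (which also happens to answer Exercise 15.14), intersections here are indeed easy: the integers in [1, n] divisible by every element of a subset are exactly the multiples of the subset's least common multiple. A C++17 sketch, summing over all non-empty subsets with a bitmask (the function name is ours):

#include <cstdio>
#include <numeric>
#include <vector>

// Count integers in [1, n] divisible by at least one element of ds,
// by inclusion-exclusion over every non-empty subset of ds.
long long count_divisible(long long n, const std::vector<long long>& ds) {
    int k = ds.size();
    long long total = 0;
    for (int mask = 1; mask < (1 << k); ++mask) {
        long long l = 1;
        int bits = 0;
        for (int i = 0; i < k; ++i)
            if (mask & (1 << i)) { l = std::lcm(l, ds[i]); ++bits; }
        // floor(n / l) integers in [1, n] are multiples of the subset's
        // lcm; the sign alternates with the size of the subset.
        total += (bits % 2 == 1 ? +1 : -1) * (n / l);
    }
    return total;
}

int main() {
    printf("%lld\n", count_divisible(1000, {2, 3, 5}));  // prints 734
    return 0;
}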
Derangements
Compute the number of permutations π of length N such that π(i) ≠ i for every i = 1 . . . N.
To apply the inclusion-exclusion formula, we must be able to compute the size of intersections of the sets Di. This task is simplified greatly since the intersection of k such sets is entirely symmetrical (it does not matter for which elements the condition π(i) ≠ i is false, only their number).
If we want to compute the intersection of k such sets, this means that there are k indices i where π(i) = i. There are N − k other elements, which can be arranged in (N − k)! ways, so the intersection of these sets has size (N − k)!. Since we can choose which k elements should be fixed in $\binom{N}{k}$ ways, the term in the formula where we compute all k-way intersections will evaluate to $\binom{N}{k}(N − k)! = \frac{N!}{k!}$. Thus, the formula can be simplified to

$$\frac{N!}{1!} − \frac{N!}{2!} + \frac{N!}{3!} − \dots$$
Subtracting this from N! means that there are

$$N!\left(1 − 1 + \frac{1}{2!} − \frac{1}{3!} + \dots\right)$$

derangements.
This gives us a Θ(N) algorithm to compute the answer.
It is possible to simplify this further, using some insights from calculus. We have that

$$e^{−1} = 1 − 1 + \frac{1}{2!} − \frac{1}{3!} + \dots$$

Then, we expect that the answer should converge to N!/e. As it happens, the answer will always be N!/e rounded to the nearest integer.
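A sketch of the Θ(N) algorithm, evaluating the alternating sum from the k = N term downwards. Since the exact answer grows astronomically, we assume it is wanted modulo a prime (the excerpt of the statement above shows no modulus, so this is our assumption):

// Number of derangements of length n, modulo the prime mod:
// sum_{k=0}^{n} (-1)^k * n!/k!, built from k = n down to k = 0.
long long derangements(long long n, long long mod) {
    long long term = 1;                          // n!/k! for the current k
    long long ans = (n % 2 == 0) ? 1 : mod - 1;  // the k = n term
    for (long long k = n - 1; k >= 0; --k) {
        term = term * ((k + 1) % mod) % mod;     // n!/k! = (k+1) * n!/(k+1)!
        ans = (ans + (k % 2 == 0 ? term : mod - term)) % mod;
    }
    return ans;
}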
Exercise 15.15
8 persons are to be seated around a circular table. The company is made
up of 4 married couples, where the two members of a couple prefer not to
be seated next to each other. How many seating arrangements are possible, assuming the cyclic rotations of an arrangement are considered equivalent?
15.6 Invariants
Many problems deal with processes which consist of many steps. During such
processes, we are often interested in certain properties that never change. We
call such a property an invariant. For example, consider the binary search
algorithm to find a value in a sorted array. During the execution of the algo-
rithm, we maintain the invariant that the value we are searching for must be
contained in some given segment of the array indexed by [lo, hi) at any time.
The fact that this property is invariant basically constitutes the entire proof of correctness of binary search. Invariants are tightly attached to greedy algorithms, and are a common tool in proving the correctness of various greedy algorithms. They are also one of the main tools in proving impossibility results (for example, when answering NO in decision problems).
Permutation Swaps
Given is a permutation ai of ⟨1, 2, ..., N⟩. Can you perform exactly K swaps, i.e. exchanging pairs of elements of the permutation, to obtain the identity permutation ⟨1, 2, ..., N⟩?
Input
The first line of input contains the size of the permutation 1 ≤ N ≤ 100 000. The next line contains N space-separated integers, the permutation a1, a2, ..., aN.
Output
Output YES if it is possible, and NO if it is impossible.
Any permutation requires some minimum number of swaps S to return it to the identity permutation, a fact you will be asked to prove in the next section on monovariants. This gives us one necessary condition: K ≥ S. However, this is not sufficient. A single additional condition is needed
– that S and K have the same parity! To prove this, we will look at the number
of inversions of a permutation, one of the common invariant properties of
permutations.
Given a permutation ai, we say that the pair (i, j) is an inversion if i < j but ai > aj. Intuitively, the number of inversions is the number of pairs of elements that are “out of place” in relation to each other.
3 5 2 1 4 – 6 inversions
1 5 2 3 4 – 3 inversions
1 5 3 2 4 – 4 inversions
1 5 3 4 2 – 5 inversions
1 2 3 4 5 – 0 inversions

Figure 15.9: A sequence of swaps transforming a permutation into the identity, with the number of inversions after each swap.
If we look at Figure 15.9, where we started out with a permutation and performed a number of swaps (transforming it to the identity permutation), we can spot a simple invariant: the parity of the number of swaps and the parity of the number of inversions seem to always be the same. A permutation is accordingly called odd or even depending on whether its number of inversions is odd or even. Let us prove that this invariant actually holds.
If this is the case, it is obvious why S and K must have the same parity. Since S is
the number of swaps needed to transform the identity permutation to the given
permutation, it must have the same parity as the number of inversions. By
performing K swaps, K must have the same parity as the number of inversions.
As K and S must have the same parity as the number of inversions, they must
have the same parity as each other.
To see why these two conditions are sufficient, we can, after performing S swaps to obtain the identity permutation, simply swap two numbers back and forth for the remaining swaps. This can be done since K − S will be an even number due to their equal parity.
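A solution sketch follows. The minimum number of swaps S equals N minus the number of cycles of the permutation, and S has the same parity as the number of inversions. Where K appears in the input is not visible in the statement above, so reading it after N is our assumption:

#include <cstdio>
#include <vector>
using namespace std;

int main() {
    // Assumed input order: N, then K, then the permutation.
    long long n, k;
    scanf("%lld %lld", &n, &k);
    vector<int> a(n + 1);
    for (int i = 1; i <= n; ++i) scanf("%d", &a[i]);

    // S = N minus the number of cycles in the permutation.
    vector<bool> seen(n + 1, false);
    long long cycles = 0;
    for (int i = 1; i <= n; ++i) {
        if (seen[i]) continue;
        ++cycles;
        for (int j = i; !seen[j]; j = a[j]) seen[j] = true;
    }
    long long s = n - cycles;

    // Possible iff we have enough swaps and K has the same parity as S.
    puts(k >= s && (k - s) % 2 == 0 ? "YES" : "NO");
    return 0;
}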
15.7 Monovariants
[Figure 15.10: an example graph on the vertices A–G.]
Monovariant problems usually differ from greedy problems in that the choice of the next action has less thought behind it. We will often focus not on optimally short sequences of choices as we do with greedy algorithms, but merely on finding any valid configuration. For example, in the problem above, one might try to construct a greedy algorithm based on, for example, the degrees of the vertices, which seems reasonable. However, it turns out there is not enough structure in the problem to find any simple greedy algorithm that solves it.
Instead, we will attempt to use the most common monovariant attack. Roughly,
the process follows these steps:
1. Start with any arbitrary state s.
2. Look for some kind of modification to this state, which is possible if and
only if the state is not admissible. Generally, the goal of this modification
is to “fix” whatever makes the state inadmissible.
3. Prove that there is some value p(s) that must decrease whenever such a
modification is done.
4. Prove that this value cannot decrease infinitely many times.
Using these four rules, we prove the existence of an admissible state. If (and
only if) s is not admissible, by step 2 we can perform some specified action on
it, which by step 3 will decrease the value p(s). Step 4 usually follows from
one of the two value functions discussed previously. Hence, by performing
finitely many such actions, we must (by rule 4) reach a state where no such
action is possible. This happens only when the state is admissible, meaning
such a state must exist. The process might seem a bit abstract, but will become
clear once we walk you through the bipartitioning step.
Our algorithm will work as follows. First, consider any bipartition of the graph. Assume that this bipartition does not fulfill the neighbor condition. Then, there must exist a vertex v which has more than |N(v)|/2 neighbors in the same part as v itself. Whenever such a vertex exists, we move it to the other side of the partition. See Figure 15.11 for two iterations of this process.
Figure 15.11: Two iterations of the algorithm, which brings the graph to a valid
state.
One question remains – why does this move guarantee a finite process? We
now have a general framework to prove such things, which suggests that
perhaps we should look for a value function p(s) which is either strictly
increasing or decreasing as we perform an action. By studying the algorithm
in action in Figure 15.11 we might notice that more and more edges tend to go
between the two parts. In fact, this number never decreased in our example,
and it turns out this is always the case.
If a vertex v has a neighbors in the same part, b neighbors in the other part,
and violates the neighbor condition, this means that a > b. When we move
v to the other part, the b edges from v to its neighbors in the other part will
no longer be between the two parts, while the a edges to its neighbors in
the same part will. This means the number of edges between the parts will
change by a − b > 0. Thus, we can choose this as our value function. Since
this is an integer function with the obvious upper bound of E, we complete
step 4 of our proof technique and can thus conclude the final state must be
admissible.
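A sketch of this local-search algorithm (the function name and the adjacency-list representation are ours):

#include <vector>
using namespace std;

// Repeatedly move a vertex with more than half of its neighbors on its
// own side to the other side. adj is an adjacency list; the returned
// side[v] is 0 or 1 for each vertex.
vector<int> bipartition(const vector<vector<int>>& adj) {
    int n = adj.size();
    vector<int> side(n, 0);  // start from an arbitrary bipartition
    bool changed = true;
    while (changed) {
        changed = false;
        for (int v = 0; v < n; ++v) {
            int same = 0;
            for (int u : adj[v])
                if (side[u] == side[v]) ++same;
            // The neighbor condition fails for v: flipping it strictly
            // increases the number of edges between the parts, so the
            // loop terminates after at most E flips.
            if (2 * same > (int)adj[v].size()) {
                side[v] = 1 - side[v];
                changed = true;
            }
        }
    }
    return side;
}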
Water Pistols
N girls and N boys stand on a large field, with no line going through three different children.

Each girl is equipped with a water pistol, and wants to pick a boy to fire at. While the boys probably will not appreciate being drenched in water, at least the girls are a fair menace – they will only fire at a single boy each. Unfortunately, it may be the case that two girls choose which boys to fire at in such a way that the water from their pistols will cross at some point. If this happens, they will cancel each other out, never hitting their targets. Help the girls choose which boys to fire at, in such a way that no two girls fire at the same boy, and the water fired by two girls will not cross.
[Figure 15.13: two crossing water beams and the two swapped, non-crossing beams, with the segment lengths labeled A–F.]
Input
The first line contains the integer N. The following N lines each contain two integers, where the i'th line is the coordinate (x, y) of a girl. The next and final N lines contain the coordinates of the boys, in the same format.
Output
Output N lines. The i’th line should contain the zero-indexed number of
the boy which the i’th girl should fire at.
After seeing the solution to the previous problem, this solution should not come as a surprise. We start by randomly assigning the girls to one boy each, with no two girls shooting at the same boy. If this assignment contains two girls firing water beams which cross, we simply swap their targets.
Unless you are geometrically minded, it may be hard to figure out an appropri-
ate value function. The naive value function of counting the current number
of water beams crossing unfortunately fails – and might even increase after a
move.
Instead, let us look closer at what happens when we switch the targets of two girls. In Figure 15.13, we see the before and after of such an example, as well as the two situations interposed. If we consider the sum of the lengths of the two water beams before the swap ((C + D) + (E + F)) versus the lengths after the swap (A + B), we see that the latter must be less than the former. Indeed, we have A < C + D and B < E + F by the triangle inequality, and summing the two inequalities gives the desired result. Thus the sum of all water beam lengths will decrease whenever we perform such a move. As students of algorithmics, we can make the additional note that this means the minimum-cost matching of the complete bipartite graph of girls and boys, with edge costs given by the distance between a particular girl and boy, is a valid assignment. If this was not the case, we would be able to swap two targets and decrease the cost of the matching, contradicting the assumption that it was minimum-cost. Thus, this rather mathematical proof actually ended up giving us a very simple reduction to min-cost matching.
Chapter 16
Number Theory

16.1 Divisibility
All of the number theory in this chapter relates to a single property of integers: divisibility.
Exercise 16.1
Determine the divisors of 18.
The concept of divisibility raises many questions. First and foremost – how do
we check if a number is divisible by another? This question has one short and
one long answer. For small numbers – those that fit inside the native integer
types of a language – checking for divisibility is as simple as using the modulo
operator (%) of your favorite programming language – n is divisible by d if and
only if n mod d = 0. The situation is not as simple for large numbers. Some
programming languages, such as Java and Python, have built-in support for
dealing with large integers, but e.g. C++ does not. In Section 16.4 on modular
arithmetic, we discuss the implementation of the modulo operator on large
integers.
Secondly, how do we compute the divisors of a number? Every integer n has at least two particular divisors called the trivial divisors, namely 1 and n itself.
If we exclude the divisor n, we get the proper divisors. To find the remaining
divisors, we can use the fact that any divisor d of n must satisfy |d| ≤ |n|. This
means that we can limit ourselves to testing whether the integers between 1
and n are divisors of n, a Θ(n) algorithm. We can do a bit better though.
Almost Perfect
Baylor Competitive Learning course – David Sturgill
A positive integer p is called a perfect number if all the proper divisors of
p sum to p exactly. Perfect numbers are rare; only 10 of them are known.
Perhaps the definition of perfection is a little too strict. Instead, we will
consider numbers that we’ll call almost perfect. A positive integer p is
almost perfect if the proper divisors of p sum to a value that differs from
p by no more than two.
Input
Input consists of a sequence of up to 500 integers, one per line. Each integer is in the range 2 to 10^9 (inclusive).
Output
For each input value, output the same value and then one of the following:
“perfect” (if the number is perfect), “almost perfect” (if it is almost perfect
but not perfect), or “not perfect” (otherwise).
In this problem, computing the divisors of the numbers of the input sequence would be way too slow, requiring upwards of 10^11 operations. Hidden in Example 16.1 lies the key insight to speeding this up. It seems that whenever we had a divisor d, we were immediately given another divisor q. For example, when claiming 3 was a divisor of 12 since 3 · 4 = 12, we found another divisor, 4. This should not be a surprise, since our definition of divisibility (Definition 16.1) – the existence of the integer q in n = dq – is symmetric in d and q, meaning divisors come in pairs (d, n/d).
Exercise 16.2
Prove that an integer has an odd number of divisors if and only if it is a
perfect square (except 0, which has an infinite number of divisors).
Since divisors come in pairs, we can limit ourselves to finding one member of each such pair. Furthermore, one of the elements in each such pair must be bounded by √n; otherwise, we would have that n = d · (n/d) > √n · √n = n, a contradiction (again, except for 0). This limit helps us reduce the time it takes to find the divisors of a number to Θ(√n), which allows us to solve the problem. You can see the pseudo code for this in Algorithm 16.1.
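In C++, the same idea might look like this (a sketch; the function name is ours):

#include <vector>
using namespace std;

// Enumerate all divisors of n in Θ(sqrt(n)) time: every divisor
// d <= sqrt(n) is paired with the divisor n / d.
vector<long long> divisors(long long n) {
    vector<long long> divs;
    for (long long d = 1; d * d <= n; ++d) {
        if (n % d == 0) {
            divs.push_back(d);
            if (d != n / d) divs.push_back(n / d);  // avoid adding sqrt(n) twice
        }
    }
    return divs;
}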
This also happens to give us some help in answering our next question, regarding the plurality of divisors. The above result gives us an upper bound of 2√n on the number of divisors of an integer n. We can do a little better, with O(n^{1/3}) being a commonly used bound for the number of divisors when dealing with integers which fit in the native integer types.¹ For example, the maximal number of divisors of a number less than 10^3 is 32; less than 10^6 it is 240; less than 10^9 it is 1 344; and less than 10^18 it is 103 680.
A bound we will find more useful when solving problems concerns the average
number of divisors of the integers between 1 and n.
$$\sum_{i=1}^{n} d(i) \approx n \ln n$$

To see this, note that the integer j is a divisor of exactly ⌊n/j⌋ of the integers in [1, n], so that

$$\sum_{j=1}^{n} \frac{n}{j} = n \sum_{j=1}^{n} \frac{1}{j} \approx n \ln n$$

¹ In reality, the maximal number of divisors of an integer in [1, n] grows sub-polynomially, i.e., it is O(n^ε) for every ε > 0.
This proof also suggests a way to compute the divisors of all the integers 1, 2, ..., n in Θ(n ln n) time. For each integer i, we find all the numbers divisible by i (in Θ(n/i) time), which are i, 2i, . . . , ⌊n/i⌋ · i. This is an extension of the algorithm commonly known as the Sieve of Eratosthenes, an algorithm to find the objects which are our next topic of study – prime numbers.
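A sketch of this divisor sieve (names ours):

#include <vector>
using namespace std;

// For every integer in [1, n], collect all of its divisors. Integer i is
// appended to each of its n/i multiples, for Θ(n ln n) work in total.
vector<vector<int>> all_divisors(int n) {
    vector<vector<int>> divs(n + 1);
    for (int i = 1; i <= n; ++i)
        for (int j = i; j <= n; j += i)
            divs[j].push_back(i);
    return divs;
}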
From the concept of divisibility come the building blocks of the integers, the famous prime numbers. With divisibility, we got factorizations. For example,
given the number 12, we could factor it as 2 · 6, or 3 · 4, or even 2 · 2 · 3. This last
factorization is special, in that no matter how hard we try, it cannot be factored
further. It consists only of prime numbers.
An integer p ≥ 2 is a prime number if its only positive divisors are 1 and p.
The numbers that are not prime numbers are called composite numbers. There are infinitely many primes. This can be proven by a simple proof by contradiction: if p1, p2, ..., pq were the only primes, then P = p1p2 · · · pq + 1 is not divisible by any prime number (and by extension has no divisors but the trivial ones), so it is not composite. However, P is larger than any prime, so it is not a prime number either, a contradiction.
Example 16.2 The first 10 prime numbers are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29.
Which is the next one?
Since the prime numbers have no other divisors besides the trivial ones, a
factorization consisting only of prime numbers is special.
A List Game
Spotify Challenge 2010 – Per Austrin
You are playing the following simple game with a friend:
1. The first player picks a positive integer X.
2. The second player gives a list of k positive integers Y1 , . . . , Yk such
that
(Y1 + 1)(Y2 + 1) · · · (Yk + 1) = X
and gets k points.
Write a program that plays the second player.
Input
The input consists of a single integer X satisfying 10^3 ≤ X ≤ 10^9, giving the number picked by the first player.
Output
Write a single integer k, giving the number of points obtained by the second player, assuming she plays as well as possible.
The problem seeks the factorization of X that contains the largest number of
factors, where every factor is at least 2. This factorization must in fact be the
prime factorization of X, which we will prove by contradiction. Assume that
the optimal list of integers Y1 , . . . , Yk contains a number Yi that is composite.
In this case, it can be further factored into Yi = ab with a, b > 1. Replacing
Yi with the numbers a and b will keep the product of the list invariant (since
Yi = ab), but extend the length of the list by 1, thus giving a higher-scoring
sequence. The only case when this is impossible is if all the Yi are primes, so
the list we seek must indeed be the prime factorization of X.
This begs the question: how do we compute the prime factorization of a number? Mainly two different algorithms are used when computing prime factorizations: trial division and the Sieve of Eratosthenes³. For computing the prime factorization of a few large numbers, trial division is used. It runs in O(√N) time, and uses the same insight we used in the improved algorithm to compute the divisors of a number: since any composite number N must have a divisor less than √N, it must also have a prime divisor less than √N. Implementing this is straightforward.
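For instance, a sketch might look as follows; for A List Game, the answer k is then simply the size of the returned list:

#include <vector>
using namespace std;

// Trial division in O(sqrt(n)): repeatedly strip off the smallest
// remaining divisor, which is always prime. Whatever is left above 1
// after the loop is itself a prime larger than sqrt of the original n.
vector<long long> factor(long long n) {
    vector<long long> primes;
    for (long long d = 2; d * d <= n; ++d) {
        while (n % d == 0) {
            primes.push_back(d);
            n /= d;
        }
    }
    if (n > 1) primes.push_back(n);
    return primes;
}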
The other algorithm is used to factor every number in a large interval.
Product Divisors
Given a sequence of integers a1, a2, . . . , an, compute the number of divisors of $A = \prod_{i=1}^{n} a_i$.
Input
Input starts with an integer 0 ≤ n ≤ 1 000 000, the length of the sequence.

³ Many more exist, but are too complicated to be used in a problem solving context.
Let A have the prime factorization $A = p_1^{e_1} \cdot p_2^{e_2} \cdots p_k^{e_k}$. A divisor of A must be of the form $d = p_1^{e_1'} \cdot p_2^{e_2'} \cdots p_k^{e_k'}$, where $0 \le e_i' \le e_i$. This can be proven using the uniqueness of the prime factorization, and the fact that A = dq for some integer q. More importantly, any number of this form is a divisor of A, by the same cancellation argument used when proving the uniqueness of the prime factorization.
As of now, the crucial step of factoring all integers in [1..10^6] remains. This is the purpose of the Sieve of Eratosthenes. We have already seen the basic idea when computing the divisors of all numbers in the interval [1..n] in Section 16.1. Extending this to factoring is only a matter of restricting the algorithm to finding prime divisors (Algorithm 16.3).
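One common variant (a sketch; names ours) stores the smallest prime factor of every integer, after which any m ≤ n can be factored in O(log m) divisions. For Product Divisors, one then accumulates the exponent e_p of every prime over all the a_i; by the characterization above, the divisor count is the product of all (e_p + 1):

#include <vector>
using namespace std;

// Sieve of Eratosthenes storing the smallest prime factor (spf) of
// every integer in [2, n].
vector<int> smallest_factor(int n) {
    vector<int> spf(n + 1, 0);
    for (int p = 2; p <= n; ++p)
        if (spf[p] == 0)                     // p was never marked: p is prime
            for (int m = p; m <= n; m += p)
                if (spf[m] == 0) spf[m] = p;
    return spf;
}

// Factor m <= n by repeatedly dividing out its smallest prime factor.
vector<int> factor_sieved(int m, const vector<int>& spf) {
    vector<int> primes;
    while (m > 1) {
        primes.push_back(spf[m]);
        m /= spf[m];
    }
    return primes;
}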
iterations, and so on. Summing this over every p which is used in the sieve
gives us the bound
$$\sum_{p \le \sqrt{n}} \left( \frac{n}{p} + \frac{n}{p^2} + \frac{n}{p^3} + \dots \right) = n \sum_{p \le \sqrt{n}} \left( \frac{1}{p} + \frac{1}{p^2} + \frac{1}{p^3} + \dots \right)$$

Using the formula for the sum of a geometric series ($\frac{1}{p} + \frac{1}{p^2} + \dots = \frac{1}{p−1}$) gives us the simplification

$$n \sum_{p \le \sqrt{n}} \frac{1}{p−1} = \Theta\left( n \sum_{p \le \sqrt{n}} \frac{1}{p} \right)$$
This last sum is out of reach to estimate with our current tools. It turns out that $\sum_{p \le n} \frac{1}{p} = O(\ln \ln n)$. With this, the final complexity becomes $O(n \ln \ln \sqrt{n}) = O(n \ln \ln n)$.
Factovisors – factovisors
Divisors – divisors
The Euclidean algorithm is one of the oldest known algorithms, dating back to the Greek mathematician Euclid, who wrote of it in his mathematical treatise Elements. It regards those numbers which are divisors of two different integers, with an extension capable of solving integer equations of the form ax + by = c.

We already know of an O(√a + √b) algorithm to compute (a, b), namely to enumerate all divisors of a and b. Two simple identities are key to the much faster Euclidean algorithm.
Identity 16.1, that (a, 0) = a, is obvious. An integer a cannot have a larger divisor than |a|, and this is certainly a divisor of a. Identity 16.2, that (a, b) = (a, b − a), needs a bit more work. We can prove their equality by proving an even stronger result – that all common divisors of a and b are also common divisors of a and b − a. Assume d is a common divisor of a and b, so that a = da′ and b = db′ for integers a′, b′. Then b − a = db′ − da′ = d(b′ − a′), with b′ − a′ being an integer, which is sufficient for d to also be a divisor of b − a. Hence the common divisors of a and b are also common divisors of a and b − a. In particular, their greatest common divisor is the same.
The application of these identities yields a recursive solution to the problem. If we wish to compute (a, b) where a, b are positive and a > b, we reduce the problem to a smaller one: instead of computing (a, b), we compute (a − b, b). This gives us a smaller problem, in the sense that a + b decreases. Since both a and b are non-negative, we must at some point arrive at the situation where either a or b is 0. In this case, we use the base case that is Identity 16.1.
One simple but important step remains before the algorithm is useful. Note how computing (10^9, 1) requires about 10^9 steps right now, since we will do the reductions (10^9 − 1, 1), (10^9 − 2, 1), (10^9 − 3, 1), ... The fix is easy – the repeated subtraction of a number b from a while a > b is exactly the modulo operation, meaning

(a, b) = (a mod b, b)

This last piece of our Euclidean puzzle completes our algorithm, and gives us a remarkably short algorithm, as seen in Algorithm 16.4.
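In C++, an iterative version of this algorithm might look like this (a sketch):

// The Euclidean algorithm: gcd(a, 0) = a, and otherwise
// gcd(a, b) = gcd(b, a mod b).
long long gcd(long long a, long long b) {
    while (b != 0) {
        long long r = a % b;
        a = b;
        b = r;
    }
    return a;
}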
Competitive Tip
Granica
Croatian Open Competition in Informatics 2007/2008, Contest #6
Given integers a1 , a2 , ..., an , find all those numbers d such that upon
division by d, all of the numbers ai leave the same remainder.
Input
The first line contains the integer 2 ≤ n ≤ 100, the length of the sequence ai. The second line contains n integers 1 ≤ a1, a2, . . . , an ≤ 10^9.
Output
Output all such integers d, separated by spaces.
What does it mean for two numbers ai and aj to have the same remainder when divided by d? Letting this remainder be r, we can write ai = dn + r and aj = dm + r for integers n and m. Thus, ai − aj = d(n − m), so that d is a divisor of ai − aj! This gives us a necessary condition on our numbers d. Is it sufficient? If ai = dn + r and aj = dm + r′, we have ai − aj = d(n − m) + (r − r′). Since d is a divisor of ai − aj, it must be a divisor of d(n − m) + (r − r′) too, and therefore of r − r′. As |r − r′| < d, this forces r = r′. Thus, the valid d are exactly the common divisors of all the pairwise differences, i.e., the divisors of

$$\gcd_{1 \le i < j \le n} (a_i − a_j)$$

which seeks the greatest common divisor of many numbers rather than just two.
To our rescue comes the prime factor interpretation of divisors, namely that a divisor of a number

$$n = p_1^{e_1} \cdots p_k^{e_k}$$

is of the form

$$d = p_1^{e_1'} \cdots p_k^{e_k'}$$

where $0 \le e_i' \le e_i$. Then, the requirement for d to be a common divisor of n and another number

$$m = p_1^{f_1} \cdots p_k^{f_k}$$

is that $0 \le e_i' \le \min(f_i, e_i)$, with $e_i' = \min(f_i, e_i)$ giving us the GCD.
Using this interpretation of the GCD, the result can be extended to finding the GCD d of a sequence b1, b2, . . . Consider any prime p such that $p^{q_i} \,\|\, b_i$. Then, we must have $p^{\min(q_1, q_2, \dots)} \,\|\, d$. This suggests the recursion formula d = gcd(b1, b2, . . . ) = gcd(b1, gcd(b2, . . . )).
We only need one more insight to solve the problem, namely that the common
divisors d of a and b are exactly the divisors of (a, b).
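Putting the pieces together gives a solution sketch for Granica. Since ai − aj = (ai − a1) − (aj − a1), the gcd over all pairwise differences equals the gcd of the differences to a1 alone, and the divisors of that gcd are found as in Section 16.1 (the output order, and that the ai are not all equal, are our assumptions):

#include <cstdio>
#include <cstdlib>
#include <vector>
using namespace std;

long long gcd(long long a, long long b) {
    while (b != 0) { long long r = a % b; a = b; b = r; }
    return a;
}

int main() {
    int n;
    scanf("%d", &n);
    vector<long long> a(n);
    for (auto& x : a) scanf("%lld", &x);

    // gcd of all pairwise differences = gcd of differences to a[0].
    long long g = 0;
    for (int i = 1; i < n; ++i) g = gcd(g, llabs(a[i] - a[0]));

    // Every divisor of g works; enumerate them in O(sqrt(g)).
    for (long long d = 1; d * d <= g; ++d) {
        if (g % d == 0) {
            printf("%lld ", d);
            if (d != g / d) printf("%lld ", g / d);
        }
    }
    printf("\n");
    return 0;
}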
A related concept is given if we instead of taking the minimum of the prime factor exponents of two numbers take the maximum.
The computation of the LCM is basically the same as for the GCD.
A multiple $d = p_1^{e_1'} \cdots p_k^{e_k'}$ of an integer

$$a = p_1^{e_1} \cdots p_k^{e_k}$$

must have $e_i \le e_i'$. For d to also be a multiple of

$$b = p_1^{f_1} \cdots p_k^{f_k}$$

it must be that $\max(f_i, e_i) \le e_i'$, with $e_i' = \max(f_i, e_i)$ giving us the LCM. Since $\max(e_i, f_i) + \min(e_i, f_i) = e_i + f_i$, we must have that lcm(a, b) · gcd(a, b) = ab. This gives us the formula lcm(a, b) = (a / gcd(a, b)) · b to compute the LCM. The order of operations is chosen to avoid overflows in computing the product ab.
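As one line of C++ (dividing first, as the text prescribes):

#include <numeric>  // std::gcd (C++17)

long long lcm(long long a, long long b) {
    // Divide before multiplying so the intermediate value stays in 64 bits.
    return a / std::gcd(a, b) * b;
}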
Since max is associative just like min, the LCM operation is too, meaning lcm(b1, b2, b3, . . . ) = lcm(b1, lcm(b2, lcm(b3, . . . ))).
Diophantine Equation
Given integers a, b, c, find an integer solution x, y to
ax + by = (a, b)
First of all, it is not obvious that such an integer solution exists. It does, however, which we will prove constructively, thus gaining an algorithm to find such a solution at the same time. The trick lies in the exact same identities used when we formulated the Euclidean algorithm. Since (a, 0) = a, the base case equation a · x + 0 · y = (a, 0) is solved by x = 1, y = 0.
Originally, we have the solution [1, 0] to the equation with coefficients [1, 0]:

1 · 1 + 0 · 0 = (1, 0)

Working backwards through the reductions of the Euclidean algorithm, we next get the solution [0, 1] to [3, 1]:

3 · 0 + 1 · 1 = (3, 1)

The next application gives us the solution [1, 0 − ⌊4/3⌋ · 1] = [1, −1] to [4, 3]:

4 · 1 + 3 · (−1) = (4, 3)

Finally, [4, 3] → [−1, 1 − ⌊11/4⌋ · (−1)] = [−1, 3], which gives us

11 · (−1) + 4 · 3 = (11, 4)
Exercise 16.5
Find an integer solution to the equation 24x + 52y = 2.
This gives us a single solution, but can we find all solutions? First, assume a and b are co-prime. Then, given two solutions

ax1 + by1 = 1
ax2 + by2 = 1

a simple subtraction gives us that

a(x1 − x2) + b(y1 − y2) = 0
a(x1 − x2) = b(y2 − y1)

Since a and b are co-prime, we have that b | x1 − x2, so we can write x1 − x2 = −kb for some integer k, i.e., x2 = x1 + kb. Substituting this back gives

−akb = b(y2 − y1)
−ak = y2 − y1
y2 = y1 − ak

so every solution is of the form (x1 + kb, y1 − ak). That these are indeed solutions to the original equation can be verified by substituting x and y for these values. This result is called Bézout's identity.
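The backward substitution above is exactly the extended Euclidean algorithm. A recursive sketch (C++17 structured bindings; the function name is ours):

#include <tuple>
using namespace std;

// Returns (g, x, y) with ax + by = g = gcd(a, b).
tuple<long long, long long, long long> ext_gcd(long long a, long long b) {
    if (b == 0) return {a, 1, 0};            // a * 1 + 0 * 0 = a
    auto [g, x, y] = ext_gcd(b, a % b);
    // b * x + (a mod b) * y = g, and a mod b = a - (a / b) * b, so
    // a * y + b * (x - (a / b) * y) = g.
    return {g, y, x - (a / b) * y};
}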
When first learning division, one is often introduced to the concept of remainders. For example, when dividing 7 by 3, you would get “2 with a remainder of 1”. In general, when dividing a number a by a number n, you would get a quotient q and a remainder r. These numbers satisfy the identity a = nq + r, with 0 ≤ r < n.
5 / 4 = 1, remainder 1
6 / 4 = 1, remainder 2
Note how the remainder increases by 1 when the numerator increases. As you might remember from Chapter 2 on C++ (or from your favorite programming language), there is an operator which computes this remainder, called the modulo operator. Modular arithmetic is then computation on numbers where every number is taken modulo some integer n. Under such a scheme, we have that e.g. 3 and 7 are basically the same if computing modulo 4, since 3 mod 4 = 3 = 7 mod 4. This concept, where numbers with the same remainder are treated as if they were equal, is called congruence.
a≡b (mod n)
Exercise 16.6
What does it mean for a number a to be congruent to 0 modulo n?
+ 0 1 2
0 0 1 2
1 1 2 3≡0
2 2 3≡0 4≡1
* 0 1 2
0 0 0 0
1 0 1 2
2 0 2 4≡1
When we wish to perform arithmetic of this form, we use the integers modulo n rather than the ordinary integers. These have a special set notation as well: Zn.

While addition and multiplication are quite natural (i.e. performing the operation as usual and then taking the result modulo n), division is a more complicated story. For real numbers, the inverse x⁻¹ of a number x is defined as the number which satisfies the equation xx⁻¹ = 1. For example, the inverse of 4 is 0.25, since 4 · 0.25 = 1. The division a/b is then simply a multiplied with the inverse of b. The same definition is applicable to modular arithmetic:
Proof. Since a⊥n, there is a number a⁻¹ such that aa⁻¹ ≡ 1 (mod n). Multiplying both sides of

ab ≡ ac (mod n)

by a⁻¹ thus gives

b ≡ c (mod n)
This procedure is clearly Θ(log₂ m), since applying the recursive formula to an even exponent halves the m to be computed, while applying it to an odd exponent will first make it even and then halve it in the next iteration. It is very important that a^{m/2} mod n is computed only once, even though it is squared! Computing it twice causes the complexity to degrade to Θ(m) again.
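A recursive sketch of the procedure just described:

// Computes a^m mod n in Θ(log m) multiplications.
long long modpow(long long a, long long m, long long n) {
    if (m == 0) return 1 % n;
    long long half = modpow(a, m / 2, n);  // computed once, then squared
    long long result = half * half % n;
    if (m % 2 == 1) result = result * (a % n) % n;
    return result;
}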
x ≡ a1 (mod m1)
x ≡ a2 (mod m2)
. . .
x ≡ an (mod mn)
Proof. We will prove the theorem inductively. The theorem is clearly true for
n = 1, with the unique solution x = a1 . Now, consider the two equations
x ≡ a1 (mod m1 )
x ≡ a2 (mod m2 )
Since a solution exists for every pair a1, a2, the solution must be unique by the pigeonhole principle – there are m1 · m2 possible pairs (a1, a2), and m1 · m2 possible values for x. Thus, the theorem is also true for n = 2.
Assume the theorem is true for k − 1 equations. Then, we can replace the equations

x ≡ a1 (mod m1)
x ≡ a2 (mod m2)

with the single equation

x ≡ x∗ (mod m1m2)

where x∗ is the solution to the first two equations. We just proved those two equations are equivalent with regards to x. This reduces the number of equations to k − 1, which by assumption the theorem holds for. Thus, it also holds for k equations.
Note that the theorem used an explicit construction of the solution, allowing
us to find what the unique solution to such a system is.
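A sketch of the two-equation combination step, which can be iterated to solve a whole system. We use the extended Euclidean algorithm from Section 16.3 and __int128 (a GCC/Clang extension) to dodge overflow in the intermediate products:

#include <tuple>
using namespace std;

// ax + by = gcd(a, b); returns (g, x, y).
tuple<long long, long long, long long> ext_gcd(long long a, long long b) {
    if (b == 0) return {a, 1, 0};
    auto [g, x, y] = ext_gcd(b, a % b);
    return {g, y, x - (a / b) * y};
}

// Combines x ≡ a1 (mod m1) and x ≡ a2 (mod m2) into x ≡ r (mod m1 * m2),
// assuming m1 and m2 are co-prime.
long long crt(long long a1, long long m1, long long a2, long long m2) {
    auto [g, p, q] = ext_gcd(m1, m2);  // m1 * p + m2 * q = 1
    long long m = m1 * m2;
    // m2 * q ≡ 1 (mod m1) and m1 * p ≡ 1 (mod m2), so this x is
    // congruent to a1 modulo m1 and to a2 modulo m2.
    __int128 x = (__int128)a1 * m2 % m * q + (__int128)a2 * m1 % m * p;
    long long r = (long long)(x % m);
    return r < 0 ? r + m : r;
}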
Radar
KTH Challenge 2014
We say that an integer z is within distance y of an integer x modulo an
integer m if
z ≡ x + t (mod m)
where |t| ≤ y.
Find the smallest non-negative integer z such that it is:
• within distance y1 of x1 modulo m1
• within distance y2 of x2 modulo m2
• within distance y3 of x3 modulo m3
Input
The integers 0 ≤ m1, m2, m3 ≤ 10^6, the integers 0 ≤ x1, x2, x3 ≤ 10^6, and the integers 0 ≤ y1, y2, y3 ≤ 300.
Output
The integer z.
One approach is to try every combination of the ti with |ti| ≤ yi in

z ≡ xi + ti (mod mi)

solving each resulting system of equations using the CRT. We could then find all possible values of z, and choose the minimum one. This requires applying the CRT construction about 2 · 600³ = 432 000 000 times. Since the modulo operation involved is quite expensive, this approach would use too much time. Instead, let us exploit a useful greedy principle in finding minimal solutions.
Assume that z is the minimal answer to an instance. There are only two situations where z − 1 cannot be a solution as well:

• z = 0 – since z must be non-negative, this is the smallest possible answer
• z ≡ xi − yi (mod mi) for some i – then, decreasing z would violate one of the constraints
In the first case, we only need to verify whether z = 0 is a solution to the three inequalities. In the second case, we managed to change an inequality into a linear equation. By testing which of the i this equation holds for, we only need to test the values of ti for the two other equations. This reduces the number of times we need to use the CRT to 600² = 360 000, a modest amount well within the time limit.
Now that we have talked about modular arithmetic, we can give the numbers which share no divisors (other than 1) with some integer n their well-deserved attention. This discussion will start with the φ-function.
Definition 16.8 Two integers a and b are said to be relatively prime if their
only (and thus greatest) common divisor is 1. If a and b are relatively
prime, we write that a⊥b.
Example 16.6 The numbers 74 and 22 are not relatively prime, since they are
both divisible by 2.
The numbers 72 and 65 are relatively prime. The prime factorization of 72
is 2 · 2 · 2 · 3 · 3, and the factorization of 65 is 5 · 13. Since these numbers
have no prime factors in common, they have no divisors other than 1 in
common.
Example 16.7 What is φ(12)? The numbers 2, 4, 6, 8, 10 all have the factor 2
in common with 12 and the numbers 3, 6, 9 all have the factor 3 in common
with 12.
This leaves us with the integers 1, 5, 7, 11 which are relatively prime to 12.
Thus, φ(12) = 4.
For prime powers, φ(p^k) is easy to compute. The only integers which are not relatively prime to p^k are the multiples of p, of which there are p^k / p = p^{k−1}, meaning

$$φ(p^k) = p^k − p^{k−1} = p^{k−1}(p − 1)$$
It turns out φ(n) has a property which is highly useful in computing certain number theoretical functions – it is multiplicative, meaning that φ(ab) = φ(a)φ(b) whenever a⊥b.
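Combining multiplicativity with the prime power formula gives a simple way to compute φ(n) from a factorization; a trial-division sketch:

// Euler's phi function: for each prime power p^k dividing n,
// multiply in phi(p^k) = p^(k-1) * (p - 1).
long long phi(long long n) {
    long long result = 1;
    for (long long p = 2; p * p <= n; ++p) {
        if (n % p != 0) continue;
        long long pk = 1;  // accumulates p^(k-1)
        n /= p;
        while (n % p == 0) { pk *= p; n /= p; }
        result *= pk * (p - 1);
    }
    if (n > 1) result *= n - 1;  // a single remaining prime contributes p - 1
    return result;
}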
Euler's theorem states that for a⊥n,

a^φ(n) ≡ 1 (mod n)
Proof. The proof of this theorem isn’t trivial, but it is number theoretically
interesting and helps to build some intuition for modular arithmetic. The
idea behind the proof will be to consider the product of the φ(n) positive
integers less than n which are relatively prime to n. We will call these
x1, x2, . . . , xφ(n). Since these are all distinct integers between 1 and n, they are incongruent modulo n. We call such a set of φ(n) numbers, all incongruent modulo n, a complete residue system (CRS) modulo n.
Next, we will prove that ax1, ax2, . . . , axφ(n) also form a CRS modulo n. We need to show two properties for this:

1. All the numbers are relatively prime to n
2. All the numbers are incongruent modulo n
We will start with the first property. Since both a and xi are relatively prime to n, neither number has a prime factor in common with n. This means axi has no prime factor in common with n either, so the two numbers are relatively prime. The second property requires us to make use of the cancellation law we just proved.
Since all the xi are relatively prime to n, we can again use the cancellation law, leaving

a^φ(n) ≡ 1 (mod n)

completing our proof of Euler's theorem.
For primes p we get a special case of Euler's theorem, since φ(p) = p − 1.

Corollary 16.1 (Fermat's Theorem) For a prime p and an integer a⊥p, we have

a^{p−1} ≡ 1 (mod p)
Competitive Tip
Exponial
Nordic Collegiate Programming Contest 2016
Define the exponial of n as the function

$$\text{exponial}(n) = n^{(n−1)^{(n−2)^{\cdots^{2^1}}}}$$
$$p_i^{e_i} \le N \Rightarrow e_i \le \frac{\log_2 N}{\log_2 p_i} \le \log_2 N$$
Chapter 17
Competitive Programming
Strategy
17.1 IOI
The IOI is an international event where a large number of countries send teams
of up to 4 high school students to compete individually against each other
during two days of competition. Every participating country has its own
national selection olympiad first.
During a standard IOI contest, contestants are given 5 hours to solve 3 prob-
lems, each worth at most 100 points. These problems are not given in any
particular order, and the scores of the other contestants are hidden until the
end of the contest. Generally none of the problems are “easy” in the sense that
it is immediately obvious how to solve the problem in the same way the first
1-2 problems of most other competitions are. This poses a large problem, in
particular for the amateur. Without any trivial problems nor guidance from
other contestants on what problems to focus on, how does an IOI competi-
tor prioritize? The problem is further exacerbated by problems not having
a simple binary scoring, with a submission being either accepted or rejected.
Instead, IOI problems contain many so-called subtasks. These subtasks give
partial credit for the problem, and contain additional restrictions and limits on
either input or output. Some problems do not even use discrete subtasks. In
these tasks, scoring is done on some scale which determines how “good” the
output produced by your program is.
17.1.1 Strategy
Very few contestants manage to solve every problem fully during an IOI
contest. There is a very high probability you are not one of them, which
leaves you with two options – you either skip a problem entirely, or you solve
some of its subtasks. At the start of the competition, you should read through
every problem and all of the subtasks. In the IOI you do not get extra points
for submitting faster. Thus, it does not matter if you read the problems at
the beginning instead of rushing to solve the first problem you read. Once
you have read all the subtasks, you will often see the solutions to some of
the subtasks immediately. Take note of the subtasks which you know how to
solve!
Deciding on which order you should solve subtasks in is probably one of the
most difficult parts of the IOI for contestants at or below the silver medal level.
In IOI 2016, the difference between receiving a gold medal and a silver medal
was a mere 3 points. On one of the problems, with subtasks worth 11, 23,
30 and 36 points, the first silver medalist solved the third subtask, worth 30
points (a submission that possibly was a failed attempt at 100 points). Most
competitors instead solved the first two subtasks, together worth 34 points. If
the contestant had solved the first two subtasks instead, he would have gotten
a gold medal.
The problem basically boils down to the question when should I solve subtasks
instead of focusing on a 100 point solution? There is no easy answer to this ques-
tion, due to the lack of information about the other contestants’ performances.
First of all, you need to get a good sense of how difficult a solution will be to
implement correctly before you attempt it. If you only have 30 minutes left
of a competition, it might not be a great idea to go for a 100 point solution
on a very tricky problem. Instead, you might want to focus on some of the
easier subtasks you have left on this or other problems. If you fail your 100
point solution which took over an hour to code, it is nice to know you did not
have some easy subtasks worth 30-60 points which could have given you a
medal.
Problems without discrete scoring (often called heuristic problems) are almost
always the hardest ones to get a full score on. These problems tend to be very
fun, and some contestants often spend way too much time on these problems.
They are treacherous in that it is often easy to increase your score by something.
However, those 30 minutes you spent to gain one additional point may have
been better spent coding a 15 point subtask on another problem. As a general
rule, go for the heuristic problem last during a competition. This does not
mean to skip the problem unless you completely solve the other two, just to
focus on them until you decide that the heuristic problem is worth more points
if given the remaining time.
In IOI, you are allowed to submit solution attempts a large number of times,
without any penalty. Use this opportunity! When submitting a solution, you
will generally be told the results of your submission on each of the secret test cases. This provides you with a lot of detail. For example, you can get a sense
of how correct or wrong your algorithm is. If you only fail 1-2 cases, you
probably just have a minor bug, but your algorithm in general is probably
correct. You can also see if your algorithm is fast enough, since you will be
told the execution time of your program on the test cases. Whenever you
make a change to your code which you think affect correctness or speed –
submit it again! This gives you a sense of your progress, and also works as
a good regression test. If your change introduced more problems, you will
know.
Whenever your solution should pass a subtask, submit it. These subtask results
will help you catch bugs earlier when you have less code to debug.
The IOI usually tend to have pretty hard problems. Some areas get rather little
attention. For example, there are basically no pure implementation tasks and
very little geometry.
First and foremost, make sure you are familiar with all the content in the IOI
syllabus1 . This is an official document which details what areas are allowed
in IOI tasks. This book deals with most, if not all of the topics in the IOI
syllabus.
In the Swedish IOI team, most of the top performers tend to also be good
mathematical problem solvers (also getting IMO medals). Combinatorial
problems from mathematical competitions tend to be somewhat similar to
the algorithmic frame of mind, and can be good practice for the difficult IOI
problems.
When selecting problems to practice on, there are a large number of national
olympiads with great problems. The Croatian Open Competition in Informat-
ics2 is a good source. Their competitions are generally a bit easier than solving
IOI with full marks, but are good practice. Additionally, they have a final
round (the Croatian Olympiad in Informatics) which are of high quality and
difficulty. COCI publishes solutions for all of their contests. These solutions
help a lot in training.
One step up in difficulty from COCI is the Polish Olympiad in Informatics³. This is one of the most difficult European national olympiads published in English, but unfortunately they do not publish solutions in English for their competitions.
There are also many regional olympiads, such as the Baltic, Balkan, Central
European, Asia-Pacific Olympiads in Informatics. Their difficulty is often
higher than that of national olympiads, and of the same format as an IOI
contest (3 problems, 5 hours). These, and old IOI problems, are probably the
best sources of practice.
1 https://people.ksp.sk/~misof/ioi-syllabus/
2 http://hsin.hr/coci/
3 http://main.edu.pl/en/archive/oi
17.2 ICPC
In ICPC, you compete in teams of three to solve about 10–12 problems during 5 hours. A twist in the ICPC-style competitions is that the team shares a single computer. This makes it a bit harder to prioritize tasks in ICPC competitions than in IOI competitions. You will often have multiple problems ready to be
coded, and wait for the computer. In ICPC, you see the progress of every
other team as well, which gives you some suggestions on what to solve. As a
beginner or medium-level team, this means you will generally have a good
idea on what to solve next, since many better teams will have prioritized tasks
correctly for you.
ICPC scoring is based on two factors. First, teams are ranked by the number
of solved problems. As a tie breaker, the penalty time of the teams are used.
The penalty time of a single problem is the number of minutes into the contest
when your first successful attempt was submitted, plus a 20 minute penalty
for any rejected attempts. Your total penalty time is the sum of penalties for
every problem.
17.2.1 Strategy
In ICPC, you never really know how many problems you have to solve to get the position that you want. Learning your own limits and practicing a lot as a team – especially on difficult contests – will help you get a feeling for how likely you are to get in all of your problems if you parallelize.
Read all the problems! You do not want to be in a situation where you run out
of time during a competition, just to discover there was some easy problem you
knew how to solve but never read the statement of. ICPC contests are made more complex by the fact that you are three different people, with different skills and knowledge. Just because you cannot solve a problem does not mean
your team mates will not find the problem trivial, have seen something similar
before or are just better at solving this kind of problem.
The scoreboard also displays failed attempts. If you see a problem where many
teams require extra attempts, be more careful in your coding. Maybe you can
perform some extra tests before submitting, or make a final read-through of
the problem and solution to make sure you did not miss any details.
If you get Wrong Answer, you may want to spend a few minutes to code up
your own test case generators. Prefer generators which create cases where you
already know the answers. Learning e.g. Python for this helps, since it usually
takes under a minute to code a reasonably complex input generator.
If you get Time Limit Exceeded, or even suspect time might be an issue – code a test case generator. Losing a minute on testing your program on the worst case, versus risking losing 20 minutes to penalty, is a trade-off worth considering on some problems.
You are allowed to ask questions to the judges about ambiguities in the prob-
lems. Do this the moment you think something is ambiguous (judges generally
take a few valuable minutes in answering). Most of the time they give you a
“No comment” response, in which case the perceived ambiguity probably was
not one.
If neither you nor your team mates can find a bug in a rejected solution,
consider coding it again from scratch. Often, this can be done rather quickly
when you have already coded a solution.
• Practice a lot with your team. Having a good team dynamic and learning
what problems the other team members excel at can be the difference
that helps you to solve an extra problem during a contest.
• Learn to debug on paper. Wasting computer time for debugging means
not writing code! Whenever you submit a problem, print the code. This
can save you a few minutes in getting your print-outs when judging is
slow (in case your submission will need debugging). If your attempt
was rejected, you can study your code on paper to find bugs. If you fail
on the sample test cases and it takes more than a few minutes to fix, add
a few lines of debug output and print it as well (or display it on half the
computer screen).
• Learn to write code on paper while waiting for the computer. In particu-
lar, tricky subroutines and formulas are great to hammer out on paper
before occupying valuable computer time.
• Focus your practice on your weak areas. If you write buggy code, learn
your programming language better and code many complex solutions.
If your team is bad at geometry, practice geometry problems. If you get
stressed during contests, make sure you practice under time pressure.
• Practice online. For example, Codeforces (codeforces.com) has an excellent gym feature, where you can compete retroactively in a contest using the same amount of time as in the original contest. The scoreboard will then show the corresponding scoreboard from the original contest during any given time.
Appendix A
Discrete Mathematics
This appendix reviews some basic discrete mathematics. Without a good grasp
on the foundations of mathematics, algorithmic problem solving is basically
impossible. When we analyze the efficiency of algorithms, we use sums,
recurrence relations and a bit of algebra. One of the most common topics of
algorithms is graph theory, an area within discrete mathematics. Without some
basic topics, such as set theory, some of the proofs – and even problems – in
this book will be hard to understand.
This mathematical preliminary touches lightly upon these topics and is meant
to complement a high school education in mathematics in preparation for the
remaining text. While you can probably get by with the mathematics from
this chapter, we highly recommend that you (at some point) delve deeper into
discrete mathematics.
We do assume that you are familiar with basic proof techniques, such as proofs
by induction or contradiction, and mathematics that is part of a pre-calculus
course (trigonometry, polynomials, etc). Some more mathematically advanced
parts of this book will go beyond these assumptions, but this is only the case
in very few places.
A.1 Logic
To express such statements, a language has been developed where all these
logical operations such as existence, implication and so on have symbols
assigned to them. This enables us to remove the ambiguity inherent in the
English language, which is of utmost importance when dealing with the
exactness required by logic.
The disjunction (a is true or b is true) is a common logical connective. It is
given the symbol ∨, so that the above statement is written as a ∨ b. The other
common connective, the conjunction (a is true and b is true) is assigned the
symbol ∧. For example, we write that a ∧ b for the statement that both a and
b are true.
An implication is a statement of the form “if a is true, then b must also be true”.
This is a statement on its own, which is true whenever a is false (meaning it
does not say anything of b), or when a is true and b is true. We use the symbol
→ for this, writing the statement as a → b. The third statement would hence be written as

x < 0 ↔ x³ < 0
Logic also contains quantifiers. The fifth statement, that every apple is blue, actually makes a large number of statements – one for each apple. This concept is captured using the universal quantifier ∀, read as “for every”. For example, we could write the statement as

∀ apple a : a is blue
In the final statement, another quantifier was used, which speaks of the ex-
istence of something; the existential quantifier ∃, which we read as “there
exists”. We would write the second statement as
∃x : x is an integer
The negation operator ¬ inverts a statement. The statement “no penguin can fly” would thus be written as

¬(∃ penguin p : p can fly)

or, equivalently,

∀ penguin p : ¬(p can fly)
Exercise A.1
Write the following statements using the logical symbols, and determine
whether they are true or false:
1) If a and b are odd integers, a + b is an even integer,
2) a and b are odd integers if and only if a + b is an even integer,
3) Whenever it rains, the sun does not shine,
4) ab is 0 if and only if a or b is 0
Our treatment of logic ends here. Note that much is left unsaid – it is the most
rudimentary walk-through in this chapter. This section is mainly meant to
give you some familiarity with the basic symbols used in logic, since they
will appear later. If you wish to gain a better understanding of logic, you can
follow the references in the chapter notes.
Certain sets are used often enough to be assigned their own symbols:
• Z – the set of integers {. . . , −2, −1, 0, 1, 2, . . . },
• Z+ – the set of positive integers {1, 2, 3, . . . },
• N – the set of non-negative integers {0, 1, 2, . . . },
• Q – the set of all rational numbers {p/q | p, q integers where q ≠ 0},
• R – the set of all real numbers,
• [n] – the set of the first n positive integers {1, 2, . . . , n},
• ∅ – the empty set.
Exercise A.2
1) Use the set builder notation to describe the set of all odd integers.
2) Use the set builder notation to describe the set of all negative integers.
3) Compute the elements of the set {k | k is prime and k² ≤ 30}.
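
A set-builder description translates naturally into a filtering loop. As an illustration (not part of the exercise), the set from part 3) can be computed like this; the bound 30 comes directly from the condition k² ≤ 30:

#include <iostream>
#include <set>

// Trial division primality check, sufficient for the tiny range used here.
bool isPrime(int k) {
    if (k < 2) return false;
    for (int d = 2; d * d <= k; d++)
        if (k % d == 0) return false;
    return true;
}

int main() {
    std::set<int> result;
    for (int k = 1; k * k <= 30; k++)  // the condition k² ≤ 30 bounds the search
        if (isPrime(k)) result.insert(k);
    for (int k : result) std::cout << k << " ";  // prints 2 3 5
    std::cout << std::endl;
}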
A set A is called a subset of a set B, written A ⊆ B, if every element of A is
also an element of B. For example,

{2, 3} ⊆ {2, 3, 5, 7}

and

{2/4, −1/7, 2} ⊆ Q
For any set S, we have that ∅ ⊆ S and S ⊆ S. Whenever a set A is not a subset
of another set B, we write A ⊈ B. For example,

{2, π} ⊈ Q
Exercise A.3
1) List all subsets of the set {1, 2, 3}.
2) How many subsets does a set containing n elements have?
3) Determine which of the following sets are subsets of each other:
• ∅
• Z
• Z+
• {2k | k ∈ Z}
• {2, 16, 12}
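
Parts 1) and 2) have a direct computational counterpart: subsets of an n-element set correspond to n-bit bitmasks, a correspondence used heavily in the brute force chapter. A minimal sketch, taking the set {1, 2, 3} from part 1):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> s = {1, 2, 3};
    int n = s.size();
    // Each bitmask 0 .. 2^n - 1 selects one subset: bit i set means
    // s[i] is included. Hence a set of n elements has exactly 2^n subsets.
    for (int mask = 0; mask < (1 << n); mask++) {
        std::cout << "{ ";
        for (int i = 0; i < n; i++)
            if (mask & (1 << i)) std::cout << s[i] << " ";
        std::cout << "}" << std::endl;
    }
}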
Sets also have many useful operations defined on them. The intersection A ∩ B
of two sets A and B is the set containing all the elements which are members
of both sets, i.e.,
x ∈ A ∩ B ⇔ x ∈ A ∧ x ∈ B
If the intersection of two sets is the empty set, we call the sets disjoint. A
similar concept is the union A ∪ B of A and B, defined as the set containing
those elements which are members of either set.
For example, let X = {1, 2, 3, 4}, Y = {4, 5, 6, 7} and Z = {6, 7}. Then,

X ∩ Y = {4}
X ∩ Y ∩ Z = ∅
X ∪ Y = {1, 2, 3, 4, 5, 6, 7}
X ∪ Z = {1, 2, 3, 4, 6, 7}
Exercise A.4
Compute the intersection and union of:
1) A = {1, 4, 2}, B = {4, 5, 6}
2) A = {a, b, c}, B = {d, e, f}
3) A = {apple, orange}, B = {pear, orange}
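
In C++, these operations are available for sorted ranges through the standard library. A small sketch, using the sets from part 1) of the exercise as sample data:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>
#include <vector>

int main() {
    std::set<int> A = {1, 4, 2}, B = {4, 5, 6};
    std::vector<int> inter, uni;
    // std::set stores its elements in sorted order, which is exactly
    // what set_intersection and set_union require.
    std::set_intersection(A.begin(), A.end(), B.begin(), B.end(),
                          std::back_inserter(inter));
    std::set_union(A.begin(), A.end(), B.begin(), B.end(),
                   std::back_inserter(uni));
    for (int x : inter) std::cout << x << " ";  // 4
    std::cout << std::endl;
    for (int x : uni) std::cout << x << " ";    // 1 2 4 5 6
    std::cout << std::endl;
}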
A.3 Sums and Products
The most common mathematical expressions we deal with are sums of se-
quences of numbers, such as 1 + 2 + · · · + n. Such sums often have a variable
number of terms and complex summands, such as 1 · 3 · 5 + 3 · 5 · 7 + · · · + (2n +
1)(2n + 3)(2n + 5). In these cases, sums given in the form of a few leading and
trailing terms, with the remaining part hidden by . . . , are too imprecise. Instead,
we use a special syntax for writing sums in a formal way – the sum operator:

∑_{i=j}^{k} a_i = a_j + a_{j+1} + · · · + a_k
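
Operationally, the sum operator is just a loop. A minimal sketch (the summand 2 · i − 1 and the limits are borrowed from Exercise A.5 below):

#include <iostream>

int main() {
    // A direct translation of the sum operator into a loop:
    // computes the sum of (2i - 1) for i = -2, -1, ..., 4.
    int sum = 0;
    for (int i = -2; i <= 4; i++)
        sum += 2 * i - 1;
    std::cout << sum << std::endl;  // prints 7
}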
Exercise A.5
Compute the sum

∑_{i=−2}^{4} (2 · i − 1)
Many useful sums have closed forms – expressions in which we do not need
sums of a variable number of terms.
Exercise A.6
Prove the following identities:

∑_{i=1}^{n} c = cn

∑_{i=1}^{n} i = n(n + 1)/2

∑_{i=1}^{n} i² = n(n + 1/2)(n + 1)/3

∑_{i=0}^{n} 2^i = 2^{n+1} − 1
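
All four identities can be proved by induction, but a numeric spot check is a quick sanity test first. A minimal sketch for the two middle identities, with the bound 1000 chosen arbitrarily:

#include <iostream>

int main() {
    for (long long n = 0; n <= 1000; n++) {
        long long linear = 0, squares = 0;
        for (long long i = 1; i <= n; i++) {
            linear += i;
            squares += i * i;
        }
        // The i² closed form is written as the equivalent integer
        // expression n(n+1)(2n+1)/6 to avoid the fraction n + 1/2.
        if (linear != n * (n + 1) / 2 ||
            squares != n * (n + 1) * (2 * n + 1) / 6) {
            std::cout << "identity fails for n = " << n << std::endl;
            return 0;
        }
    }
    std::cout << "both identities hold for all n up to 1000" << std::endl;
}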
The sum of the inverses of the first n natural numbers happens to have a very
neat approximation, which we will occasionally make use of later on:

∑_{i=1}^{n} 1/i ≈ ln n

This is a reasonable approximation, since ∫_1^n (1/x) dx = ln n.
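
A quick experiment shows how tight the approximation is; the gap in fact converges to the Euler–Mascheroni constant, roughly 0.577. A minimal sketch:

#include <cmath>
#include <iostream>

int main() {
    double harmonic = 0;
    for (int i = 1; i <= 1000000; i++) {
        harmonic += 1.0 / i;
        // Print the gap between the sum and ln n at a few powers of ten.
        if (i == 10 || i == 1000 || i == 1000000)
            std::cout << "n = " << i << ": gap = "
                      << harmonic - std::log(i) << std::endl;
    }
}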
There is an analogous notation for products, using the product operator ∏:

∏_{i=j}^{k} a_i = a_j · a_{j+1} · · · a_k
Exercise A.7
Prove that

(n + 2) ∏_{i=1}^{n} i + (n / (n + 1)) ∏_{i=1}^{n+2} i = ∏_{i=1}^{n+2} i
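
Since ∏_{i=1}^{k} i is just k!, the identity can be spot-checked numerically before attempting the proof. A minimal sketch, using double to sidestep the division by n + 1:

#include <cmath>
#include <iostream>

// Computes 1 * 2 * ... * k, i.e. k!, as a double.
double prod(int k) {
    double p = 1;
    for (int i = 1; i <= k; i++) p *= i;
    return p;
}

int main() {
    for (int n = 1; n <= 15; n++) {
        double lhs = (n + 2) * prod(n) + n / (n + 1.0) * prod(n + 2);
        double rhs = prod(n + 2);
        // Allow a tiny relative error from floating-point rounding.
        if (std::abs(lhs - rhs) > 1e-6 * rhs)
            std::cout << "mismatch at n = " << n << std::endl;
    }
    std::cout << "checked n = 1..15" << std::endl;
}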
A.4 Graphs
Graphs are one of the most common objects of study in algorithmic problem
solving. They are an abstract way of representing various types of relations,
such as roads between cities, friendships, computer networks and so on. Formally,
we say that a graph consists of two things – a set V of vertices, and a
set E of edges. An edge consists of a pair of vertices {u, v}, which are called the
endpoints of the edge. We say that the two endpoints of the edge are connected
by the edge.
[Figure A.1: an undirected graph with the vertices 1, 2, 3, 4, 5]
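
On a computer, a graph like the one in Figure A.1 is commonly stored as an adjacency list: for every vertex, a list of its neighbours. A sketch follows; the edge set {1, 2}, {1, 3}, {1, 4}, {2, 4} is inferred from the paths and cycles discussed below, not taken verbatim from the figure:

#include <iostream>
#include <vector>

int main() {
    int V = 5;
    // adj[u] lists the neighbours of vertex u. Vertices are 1-indexed,
    // so we allocate V + 1 lists and leave index 0 unused.
    std::vector<std::vector<int>> adj(V + 1);
    std::vector<std::pair<int, int>> edges = {{1, 2}, {1, 3}, {1, 4}, {2, 4}};
    for (auto [u, v] : edges) {
        // An undirected edge is stored in both endpoints' lists.
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    for (int u = 1; u <= V; u++) {
        std::cout << u << ":";
        for (int v : adj[u]) std::cout << " " << v;
        std::cout << std::endl;  // vertex 5 gets an empty list
    }
}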
Graphs bring a large vocabulary with them. While you might find it hard to
remember all of them now, you will become intimately familiar with them
when studying graph theoretical algorithms later on.
A path is a sequence of distinct vertices p_0, p_1, ..., p_{l−1}, p_l such that
{p_i, p_{i+1}} ∈ E (i = 0, 1, ..., l − 1). This means that any two consecutive
vertices on a path must be connected by an edge. We say that this path has
length l, since it consists of l edges. In Figure A.1, the sequence 3, 1, 4, 2
is a path of length 3.
If we relax the constraint that a path must contain only distinct vertices, we
instead get a walk. A walk which contains only distinct edges is called a trail.
The graph in Figure A.1 contains the walk 1, 3, 1, 2 (which is not a trail, since
the edge {1, 3} is used twice). The walk 3, 1, 4, 2, 1, on the other hand, is a trail.
If a path additionally satisfies {p_0, p_l} ∈ E, we may append this edge to
make the path cyclical. This is called a cycle. Similarly, a walk which starts
and ends at the same vertex is called a closed walk. If a trail starts and ends
at the same vertex, we call it a closed trail.
A graph where any pair of vertices has a path between them is called a
connected graph. The maximal subsets of vertices which are connected form
the connected components of the graph. In Figure A.1, the graph consists of
two components, {1, 2, 3, 4} and {5}.
A tree is a special kind of graph – a connected graph which does not contain
any cycle. The graph in Figure A.1 is not a tree, since it contains the cycle
1, 2, 4, 1. The graph in Figure A.2, on the other hand, contains no cycle.
[Figure A.2: The tree given by V = {1, 2, 3, 4} and E = {{1, 2}, {3, 1}, {4, 1}}.]
Exercise A.8
Prove that a tree of n vertices has exactly n − 1 edges.
So far, we have only considered undirected graphs, in which an edge simply
connects two vertices. Often, we want to use graphs to model asymmetric
relations, in which an edge should be given a direction – it should go from
one vertex to another. This is called a directed graph. In this case, we
write edges as ordered pairs of vertices (u, v), where the edge goes from u to v.
When representing directed graphs graphically, edges are drawn as arrows, with
the arrowhead pointing at v (Figure A.3).
[Figure A.3: a directed graph with no directed cycle]
Terms such as cycles, paths, trails and walks translate to directed graphs in a
natural way, except that we must follow edges in their direction. Thus, a path
is a sequence of edges (p_0, p_1), (p_1, p_2), ..., (p_{l−1}, p_l). In the
graph in Figure A.3,
you may notice that there is no directed cycle. We call such a graph a directed
acyclic graph (DAG). This type of graph will be a recurring topic, so it is an
important term to be well acquainted with.
Another augmentation of a graph is the weighted graph. In a weighted graph,
each edge e ∈ E is assigned a weight w(e). This will often represent a length
or a cost of the edge. For example, when using a graph to model a network of
roads, we may associate each road {u, v} between two locations u and v with
the length of the road w({u, v}).
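
An adjacency list accommodates weights by storing the weight next to each neighbour. A minimal sketch of the road network example, with made-up road lengths:

#include <iostream>
#include <vector>

int main() {
    int V = 3;
    // adj[u] holds pairs (v, w): a road between u and v of length w.
    std::vector<std::vector<std::pair<int, int>>> adj(V + 1);
    // Hypothetical roads, given as {u, v, length}.
    int roads[][3] = {{1, 2, 5}, {2, 3, 7}, {1, 3, 9}};
    for (auto& r : roads) {
        adj[r[0]].push_back({r[1], r[2]});
        adj[r[1]].push_back({r[0], r[2]});  // roads go both ways
    }
    for (int u = 1; u <= V; u++)
        for (auto [v, w] : adj[u])
            std::cout << u << " - " << v << " (length " << w << ")" << std::endl;
}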
So far, we have only allowed E to be a set. Additionally, an edge has always
connected two different vertices (in contrast to a self-loop, an edge from a vertex
to itself). Such a graph is sometimes called a simple graph. Generally, our
examples of graphs will be simple graphs (even though most of our algorithms
can handle duplicate edges and self-loops). A graph where E may be a multiset,
or contain self-loops, is called a multigraph.
Graph Theory [6] by Reinhard Diestel is widely acknowledged as the go-to book
on more advanced graph theory concepts. The book is freely available for
viewing at its home page1.
1 http://diestel-graph-theory.com/
Bibliography

[6] Reinhard Diestel. Graph Theory. Graduate Texts in Mathematics. Springer.