Notes

ANALYSIS OF BOOLEAN FUNCTIONS
TOM SANDERS
1. Introduction
There are two sources of problems which motivate the ideas in this course: the first is
additive number theory, and the second is computer science. In number theory it is the
many and varied questions on the representation of integers which we shall want to keep
at the back of our minds.
It will be helpful to have a little notation. For sets of integers A and B we write
A + B := {a + b : a ∈ A, b ∈ B},
and then if k ∈ N we put
kA := A + · · · + A,
where the sum is k-fold, and
k.A := {ka : a ∈ A}.
Note that kA and k.A are very different beasts, for example 2N is the set of natural numbers
bigger than 1, whereas 2.N is the set of even natural numbers.
We turn to some examples.
Theorem (Lagrange’s theorem). Every non-negative integer can be written as the sum of
four squares, that is N0 = 4S where S := {0, 1, 4, 9, . . . }.
Conjecture (Goldbach’s conjecture). Every even integer bigger than 2 can be written as
the sum of two primes, that is 2.N \ {2} = 2P where P := {2, 3, 5, 7, 11, . . . }.
Theorem (Roth’s theorem). Every subset of the integers of positive relative density con-
tains three distinct elements in arithmetic progression.
The problems above involve showing the existence of something and often when we try
to do this it is helpful to show that there are many of that thing by counting. To this end
we introduce the notion of convolution: given sets of integers A and B we write
\[ 1_A * 1_B(x) := \sum_{z+y=x} 1_A(z) 1_B(y) \]
and call it the convolution of $1_A$ with $1_B$. What is important about convolution is that
\[ A + B = \operatorname{supp} 1_A * 1_B := \{x : 1_A * 1_B(x) \neq 0\}, \]
the support of $1_A * 1_B$.
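The identity A + B = supp 1A ∗ 1B can be checked directly on small sets of integers. The following is a minimal sketch; the function names and the two sample sets are illustrative choices, not part of the notes.

```python
from collections import Counter

def convolve_indicators(A, B):
    """Return the map x -> number of pairs (z, y) in A x B with z + y = x."""
    counts = Counter()
    for z in A:
        for y in B:
            counts[z + y] += 1
    return counts

def sumset(A, B):
    """Return A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}

# Illustrative sample sets (hypothetical data).
A = {0, 1, 4, 9}
B = {2, 3, 5}

conv = convolve_indicators(A, B)
support = {x for x, count in conv.items() if count != 0}

print(sorted(support))
print(conv[3])  # counts the pairs (0, 3) and (1, 2)
```

The convolution records how many ways each x arises as a sum, which is the counting information that the sumset alone discards.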
Last updated : 28th April, 2012.
Focussing on the example of Lagrange’s theorem, we should like to show that

1S ∗ 1S ∗ 1S ∗ 1S (x) > 0 whenever x ∈ N0 .
In fact, it is easier to show that it is really quite large if x is large, that is to say one
develops an asymptotic for this four-fold convolution. The other two examples also have
expressions involving convolutions (and the inner product) and these can be analysed with
varying degrees of success.
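For small x the positivity of the four-fold convolution in Lagrange's theorem can be verified by brute force. The sketch below counts ordered representations as a sum of four elements of S, truncating at a bound `limit` that is an assumption of the example; it is a numerical check, not the asymptotic argument referred to above.

```python
from math import isqrt

def convolve_truncated(f, g, limit):
    """Unnormalised convolution of two sequences on {0, ..., limit}."""
    h = [0] * (limit + 1)
    for z, fz in enumerate(f):
        if fz == 0:
            continue
        for y, gy in enumerate(g):
            if z + y > limit:
                break
            h[z + y] += fz * gy
    return h

def four_square_counts(limit):
    """counts[x] = number of ordered (s1, s2, s3, s4) in S^4 with sum x."""
    indicator = [0] * (limit + 1)
    for s in range(isqrt(limit) + 1):
        indicator[s * s] = 1
    two_fold = convolve_truncated(indicator, indicator, limit)
    return convolve_truncated(two_fold, two_fold, limit)

counts = four_square_counts(200)
print(all(c > 0 for c in counts))
```

Computing the two-fold convolution first and convolving it with itself uses associativity of convolution and keeps the work quadratic in `limit`.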
Convolution requires nothing more than a group structure and, indeed, many of the
questions of additive number theory have formulations in general abelian groups. More
than this it turns out that there are much better behaved groups which are very good
models for Z: the dyadic groups. These groups are the focus of this course.
1.1. The dyadic groups. Throughout the course we shall write G for a finite dyadic
group, that is a group in which every element has order 2. It is an exercise to check that
any such G is isomorphic to (the additive group of) $\mathbb{F}_2^n$ for some n, where $\mathbb{F}_2$ is the field with
two elements. In view of this we shall often put $G := \mathbb{F}_2^n$, and tend to use the language
of vector spaces for discussing these groups, so that subgroups are (vector) subspaces and
cosets of subgroups are affine subspaces.
One of the key ideas of harmonic analysis is to analyse a group G through the space of
functions on G and to this end we introduce the Lebesgue spaces.
1.2. Lebesgue spaces. Given a finite set X there are two classes of Lebesgue spaces
which we shall be interested in corresponding to two natural measures on X. The first is
counting measure δX defined by
δX (A) := |A| for all A ⊂ X;
the second is normalised¹ counting measure $\mu_X$ defined by
\[ \mu_X(A) = \frac{|A|}{|X|} \text{ for all } A \subset X. \]
For obvious reasons we call δX (A) the size of the set A, and µX (A) the density of the set
A. Note that for a set A ⊂ X the measure µA can be decomposed as
\[ \mu_A = \frac{1}{|A|} \sum_{a \in A} \delta_{\{a\}} \]
and as the measure induced by the map
\[ f \mapsto \frac{1}{\mu_X(A)} \int f 1_A \, d\mu_X, \text{ or equivalently } \mu_A = \frac{1_A}{\mu_X(A)} \mu_X. \]
¹ Normalised here refers to the fact that the integral is 1; it is normalised to have norm 1 with respect
to the natural norm on measures $\mu \in M(X)$ defined by $\|\mu\| := \int d|\mu| := \sup\{\int f \, d\mu : \|f\|_{L^\infty(X)} \le 1\}$. In
particular it does not refer to the $L^2$-norm, which, to the extent that it makes sense, has $\|\mu_X\|_{L^2(X)}^2 = |X|^{-1}$.
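The Lebesgue spaces of interest are $L^p(X)$, the space of real valued functions on X with
norm defined by
\[ \|f\|_{L^p(X)} := \left( \int |f|^p \, d\mu_X \right)^{1/p} = \left( \frac{1}{|X|} \sum_{x \in X} |f(x)|^p \right)^{1/p}, \]
and $\ell^p(X)$, the same space of functions with the norm
\[ \|f\|_{\ell^p(X)} := \left( \int |f|^p \, d\delta_X \right)^{1/p} = \left( \sum_{x \in X} |f(x)|^p \right)^{1/p}. \]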
The Lebesgue spaces satisfy a useful nesting of norms property:
\[ \|f\|_{L^p(X)} \le \|f\|_{L^q(X)} \text{ whenever } p \le q \]
and
\[ \|f\|_{\ell^p(X)} \le \|f\|_{\ell^q(X)} \text{ whenever } p \ge q. \]
We take the usual convention for p = ∞ that
\[ \|f\|_{L^\infty(X)} = \|f\|_{\ell^\infty(X)} = \sup_{x \in X} |f(x)| = \max_{x \in X} |f(x)|, \]
and so $\|f\|_{L^p(X)}$ tends to $\|f\|_{L^\infty(X)}$ from below as p → ∞ and $\|f\|_{\ell^p(X)}$ tends to $\|f\|_{\ell^\infty(X)}$
from above in the same limit.
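The nesting of norms is easy to observe numerically. The following sketch evaluates both families of norms for one sample function; the function names, the sample values and the chosen exponents are assumptions made for illustration.

```python
def big_L_norm(f, p):
    """L^p norm with respect to normalised counting measure."""
    return (sum(abs(v) ** p for v in f) / len(f)) ** (1.0 / p)

def little_l_norm(f, p):
    """l^p norm with respect to counting measure."""
    return sum(abs(v) ** p for v in f) ** (1.0 / p)

def sup_norm(f):
    return max(abs(v) for v in f)

# A hypothetical function on a five-point set X, stored as its list of values.
f = [3.0, -1.0, 0.5, 0.0, 2.0]
exponents = [1, 2, 4, 8, 16]

L_values = [big_L_norm(f, p) for p in exponents]
l_values = [little_l_norm(f, p) for p in exponents]

for p, big, little in zip(exponents, L_values, l_values):
    print(p, round(big, 4), round(little, 4))
print("sup norm:", sup_norm(f))
```

The first column of norms increases towards the sup norm and the second decreases towards it, as described above.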
When p = 2 the spaces are, of course, Hilbert spaces so that they have an inner product
and we write these
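\[ \langle f, g \rangle_{L^2(X)} := \int f g \, d\mu_X = \frac{1}{|X|} \sum_{x \in X} f(x) g(x) \text{ for all } f, g \in L^2(X), \]
and
\[ \langle f, g \rangle_{\ell^2(X)} := \int f g \, d\delta_X = \sum_{x \in X} f(x) g(x) \text{ for all } f, g \in \ell^2(X). \]
Finally we have Hölder’s inequality that
\[ \langle f, g \rangle_{L^2(X)} \le \|f\|_{L^p(X)} \|g\|_{L^q(X)} \text{ for all } f \in L^p(X), g \in L^q(X) \]
and
\[ \langle f, g \rangle_{\ell^2(X)} \le \|f\|_{\ell^p(X)} \|g\|_{\ell^q(X)} \text{ for all } f \in \ell^p(X), g \in \ell^q(X) \]
whenever $p^{-1} + q^{-1} = 1$. It is easy to check that these inequalities are sharp by considering
δ-functions.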
The pair (p, q) is called a pair of conjugate indices, and the case p = q = 2 is the
Cauchy-Schwarz inequality.
1.3. Convolution. Suppose that $G := \mathbb{F}_2^n$ and $f, g \in L^1(G)$. Then we define the convolution
of f and g with a slightly different normalisation to before:
\[ f * g(x) := \int f(x - y) g(y) \, d\mu_G(y) = \frac{1}{|G|} \sum_{z+y=x} f(z) g(y) \text{ for all } x \in G, \]
and so if µ and ν are measures on G then
\[ \mu * \nu(E) = \int 1_E(x + y) \, d\mu(x) \, d\nu(y) \text{ for all } E \subset G. \]
Since our groups are always finite we often make the abuse of writing µ(x) for µ({x}).
We shall frequently find ourselves changing the order of integration (really summation),
and here we get that
\[ \int f * g \, d\mu_G = \int f \, d\mu_G \int g \, d\mu_G. \]
It turns out that with absolute value signs this becomes a special case of Young’s inequality.
In general by Young’s inequality we shall mean the statement
\[ \|f * g\|_{L^r(G)} \le \|f\|_{L^p(G)} \|g\|_{L^q(G)} \text{ for all } f \in L^p(G) \text{ and } g \in L^q(G) \]
for a triple p, q, r provided $1 + r^{-1} = p^{-1} + q^{-1}$. Of particular interest are the cases
p, q = 2 and r = ∞ which encodes the idea that the convolution of two functions in $L^2$ is
‘continuous’, and p, q, r = 1 which tells us that $L^1$ is ‘closed under convolution’.
As a check of understanding it may be helpful to note that
\[ f * f(0_G) = \int f(-x) f(x) \, d\mu_G(x) = \|f\|_{L^2(G)}^2 \]
since −x = x in G. It follows that Young’s inequality certainly can’t be any better than
this for r = ∞ and p, q = 2 and, in fact, it is relatively easy to see that it is tight by
considering δ-functions.
As before the crucial identity for us is that
A + B = supp 1A ∗ 1B .
Indeed, it is far easier to analyse the function 1A ∗ 1B than 1A+B because the former is far
smoother: it is literally an average of 1A over translates of B. Indeed, there is a useful
maxim here that the more times you convolve the smoother a function becomes.
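On $\mathbb{F}_2^n$ the convolution is straightforward to compute if vectors are stored as n-bit integers, since addition is then bitwise XOR. The sketch below, whose encoding, function names and sample sets are all illustrative assumptions, checks the identity A + B = supp 1A ∗ 1B, the integral identity for f ∗ g, and the value of f ∗ f at the identity.

```python
def convolve(f, g):
    """f * g(x) = (1/|G|) sum_y f(x + y) g(y) on G = F_2^n (addition is XOR)."""
    size = len(f)
    return [sum(f[x ^ y] * g[y] for y in range(size)) / size for x in range(size)]

def indicator(A, size):
    return [1 if x in A else 0 for x in range(size)]

n = 4
size = 1 << n
A = {1, 2, 7, 12}   # hypothetical subsets of F_2^4, encoded as bitmasks
B = {0, 3, 5}

conv = convolve(indicator(A, size), indicator(B, size))
support = {x for x in range(size) if conv[x] > 0}
sumset = {a ^ b for a in A for b in B}

integral_of_conv = sum(conv) / size
self_conv_at_zero = convolve(indicator(A, size), indicator(A, size))[0]

print(support == sumset)
print(integral_of_conv, self_conv_at_zero)
```

Because −y = y in G, the code uses `x ^ y` for both x − y and x + y.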
It will be instructive to bear some examples in mind.
Example (Convolution of subspaces). Suppose that W = x + V is an affine subspace with
vector subspace V . We see immediately that supp 1W ∗ 1W = W + W = V (since we are
in characteristic 2) and if y ∈ V then
\[ 1_W * 1_W(y) = \int 1_W(y - z) 1_W(z) \, d\mu_G(z) = \int 1_W(-z) 1_W(z) \, d\mu_G(z) = \mu_G(W), \]
so
\[ 1_W * 1_W = \mu_G(W) 1_V. \]
Example (Convolution of random sets). Suppose that x ∈ G is placed in the set A
independently with probability α. Then $\mathbb{E}\mu_G(A) = \alpha$ and
\[ \mathbb{E} 1_A * 1_A(y) = \mathbb{E} \int 1_A(y - x) 1_A(x) \, d\mu_G(x) = \begin{cases} \alpha^2 & \text{if } y \neq 0_G \\ \alpha & \text{otherwise,} \end{cases} \]
so we expect it to be very likely that A + A is essentially the whole of G provided α is not
too small. In particular,
\[ \operatorname{Var}(1_A * 1_A(x)) = \alpha^2 (1 - \alpha^2)/|G| \text{ if } x \neq 0_G, \]
and so by the central limit theorem we expect
\[ \mathbb{P}(1_A * 1_A(x) - \alpha^2 < -\alpha^2/2) \sim \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-\sqrt{\alpha^2 |G| / (4(1 - \alpha^2))}} \exp(-u^2/2) \, du = O_\alpha\left( \sqrt{|G|} \exp(-|G|\alpha^2 / 8(1 - \alpha^2)) \right) \]
for each $x \neq 0_G$. If $x \notin A + A$ then $1_A * 1_A(x) = 0$ and so (certainly) $1_A * 1_A(x) - \alpha^2 < -\alpha^2/2$.
Thus the expected number of such x is $O_\alpha(|G|^{3/2} \exp(-\Omega_\alpha(|G|)))$. This is much less than
1 if |G| is large (and α is not too small) which leads to the conclusion.
Example (Convolution as a sum of random variables). Suppose that A ⊂ G and X and
Y are two independent A-valued uniform random variables, so that
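\[ \mathbb{P}(X = x) = \begin{cases} \frac{1}{|A|} & \text{if } x \in A \\ 0 & \text{otherwise,} \end{cases} \]
i.e. $\mathbb{P}(X = x) = \mu_A(x)$ and similarly for Y. Then Z = X + Y has
\[ \mathbb{P}(Z = x) = \mu_A * \mu_A(x); \]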
the law of the sum of two independent random variables is the convolution of their laws. In
general if we sample uniformly and independently at random k times from the set A then
the probability that the sum of the samples is x is µA ∗ · · · ∗ µA (x) where the convolution
is k-fold.
Example (Convolution as a measure of relative density). Suppose that X, A ⊂ G. Then
1A ∗ µX (x) is the relative density of A on the set x + X, that is the number of points in
A ∩ (x + X) divided by the number of points in x + X. To see this note that
\[ 1_A * \mu_X(x) = \int 1_A(y) \, d\mu_X(x + y) = \int 1_A(y) \frac{1_X(x + y)}{\mu_G(X)} \, d\mu_G(x + y) = \frac{1}{\mu_G(X)} \int 1_A(y) 1_{x+X}(y) \, d\mu_G(x + y). \]
But then by translation invariance of $\mu_G$ we have
\[ \frac{1}{\mu_G(X)} \int 1_A(y) 1_{x+X}(y) \, d\mu_G(x + y) = \frac{1}{\mu_G(X)} \int 1_A(y) 1_{x+X}(y) \, d\mu_G(y) = \frac{\mu_G(A \cap (x + X))}{\mu_G(X)} = \frac{|A \cap (x + X)|}{|x + X|}. \]
We shall frequently use this in the case when X = V for some subspace $V \le G$. In this
case $1_A * \mu_V(x)$ is the relative density of A on the coset x + V.
More than this, if v ∈ V then x+v+V = x+V , so we see that 1A ∗µV (x+v) = 1A ∗µV (x),
and hence 1A ∗ µV is constant on cosets of V .
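The relative-density interpretation can be made concrete as follows. This is a small sketch in which vectors of $\mathbb{F}_2^4$ are encoded as bitmasks; the helper names, the generators of V and the set A are hypothetical choices.

```python
def span(generators):
    """Return the subspace of F_2^n generated by the given bitmask vectors."""
    V = {0}
    for g in generators:
        V |= {v ^ g for v in V}
    return V

def relative_density(A, V, x):
    """1_A * mu_V(x): the average of 1_A over the coset x + V."""
    return sum(1 for v in V if (x ^ v) in A) / len(V)

n = 4
size = 1 << n
V = span([0b0011, 0b0100])           # a 2-dimensional subspace
A = {0, 1, 2, 5, 9, 12, 15}          # an arbitrary subset of G

densities = [relative_density(A, V, x) for x in range(size)]
print(densities)
```

The printed list takes one value per coset of V, repeated |V| times, and its average is the density of A in G.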
The case of highly structured sets such as subspaces, and random-like sets (such as
random sets!) will form a dichotomy which will pervade our work; to quantify the notion
of being random-like we introduce the Fourier transform.
1.4. The Fourier transform. For $G := \mathbb{F}_2^n$ we write $\widehat{G}$ for the collection of characters on
G, that is maps of the form
\[ x \mapsto (-1)^{r.x} \text{ where } r.x := r_1 x_1 + \cdots + r_n x_n \]
and $r \in \mathbb{F}_2^n$. Characters can be added via the slightly confusing formula
\[ (\gamma + \gamma')(x) := \gamma(x)\gamma'(x) \text{ for all } x \in G \]
and form a group which is (non-canonically) isomorphic to G. They are easily seen to be
homomorphisms from G to {−1, 1} under multiplication and remarkably they turn out to
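be an orthonormal basis of $L^2(G)$. To see that they are pair-wise orthogonal we note that, for any y ∈ G,
translating the variable of integration by y and using the homomorphism property gives
\[ \langle \gamma, \gamma' \rangle_{L^2(G)} = \int \gamma(x)\gamma'(x) \, d\mu_G(x) = \gamma(y)\gamma'(y) \int \gamma(x)\gamma'(x) \, d\mu_G(x) = \gamma(y)\gamma'(y) \langle \gamma, \gamma' \rangle_{L^2(G)}. \]
Hence, either $\gamma(y)\gamma'(y) = 1$ for all y ∈ G, whereupon $\gamma = \gamma'$ and we have that $\|\gamma\|_{L^2(G)} = 1$;
or $\gamma(y)\gamma'(y) = -1$ for some y and we conclude that the inner product is 0. We write this
formally as
\[ \langle \gamma, \gamma' \rangle_{L^2(G)} = \begin{cases} 1 & \text{if } \gamma = \gamma' \\ 0 & \text{otherwise.} \end{cases} \]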
The characters then form a basis of L2 (G) because they are orthogonal (and so linearly
independent) and there are |G| of them which is the dimension of L2 (G). Since they form
an orthonormal basis we define the Fourier transform to be the map taking $f \in L^1(G)$ to
$\widehat{f} \in \ell^\infty(\widehat{G})$ determined by
\[ \widehat{f}(\gamma) := \langle f, \gamma \rangle_{L^2(G)} = \int f(x)\gamma(x) \, d\mu_G(x), \]
and so that it is completely clear if µ is a measure on G then
\[ \widehat{\mu}(\gamma) := \int \gamma(x) \, d\mu(x). \]
It is easy to see by the triangle inequality that we have the Hausdorff-Young inequality:
\[ \|\widehat{f}\|_{\ell^\infty(\widehat{G})} \le \|f\|_{L^1(G)} \text{ for all } f \in L^1(G). \]
Since $\widehat{G}$ is an orthonormal basis we have the Fourier inversion formula:
\[ f(x) = \sum_{\gamma \in \widehat{G}} \widehat{f}(\gamma)\gamma(x) \text{ for all } x \in G. \]
More than this the change of basis is unitary and so we have Plancherel’s theorem:
\[ \langle f, g \rangle_{L^2(G)} = \langle \widehat{f}, \widehat{g} \rangle_{\ell^2(\widehat{G})} \text{ for all } f, g \in L^2(G), \]
and the special case when f = g, called Parseval’s theorem:
\[ \|f\|_{L^2(G)} = \|\widehat{f}\|_{\ell^2(\widehat{G})} \text{ for all } f \in L^2(G). \]
The Fourier transform is so useful because it is an (essentially unique) change of basis
which simultaneously diagonalises all convolution operators. Specifically, given $f \in L^1(G)$,
we get a linear operator
\[ L^2(G) \to L^2(G); \quad g \mapsto f * g. \]
This operator is diagonalised by the Fourier basis:
\[ f * \gamma(y) = \int f(y - x)\gamma(x) \, d\mu_G(x) = \gamma(y) \int f(z)\gamma(z) \, d\mu_G(z) = \widehat{f}(\gamma)\gamma(y) \]
by the change of variables z = y − x. Thus,
\[ f * g = \sum_{\gamma \in \widehat{G}} \widehat{f}(\gamma)\widehat{g}(\gamma)\gamma, \]
so that $\widehat{f * g} = \widehat{f} \cdot \widehat{g}$.
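The inversion formula, Parseval's theorem and the diagonalisation of convolution can all be confirmed numerically for small n. The sketch below uses a direct quadratic-time transform rather than a fast Walsh-Hadamard transform; the bitmask encoding, the function names and the two sample functions are assumptions of the example.

```python
def character(r, x):
    """The character x -> (-1)^(r.x), with r and x stored as bitmasks."""
    return -1 if bin(r & x).count("1") % 2 else 1

def fourier(f):
    """f_hat(r) = (1/|G|) sum_x f(x) * character(r, x)."""
    size = len(f)
    return [sum(f[x] * character(r, x) for x in range(size)) / size
            for r in range(size)]

def inverse_fourier(f_hat):
    """f(x) = sum_r f_hat(r) * character(r, x)."""
    size = len(f_hat)
    return [sum(f_hat[r] * character(r, x) for r in range(size))
            for x in range(size)]

def convolve(f, g):
    size = len(f)
    return [sum(f[x ^ y] * g[y] for y in range(size)) / size for x in range(size)]

# Two hypothetical functions on G = F_2^3, listed by their values.
f = [1, 0, 2, -1, 0, 3, 1, 1]
g = [0, 1, 1, 0, 2, 0, -2, 1]

f_hat, g_hat = fourier(f), fourier(g)
recovered = inverse_fourier(f_hat)
conv_hat = fourier(convolve(f, g))

print(recovered)
print([round(c, 6) for c in conv_hat])
print([round(a * b, 6) for a, b in zip(f_hat, g_hat)])
```

The last two printed lists agree, which is the statement that the transform of f ∗ g is the pointwise product of the transforms.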
Example (Annihilators and the Fourier transform of subspaces). Given A ⊂ G we write
$A^\perp$ for the annihilator of A, that is the set $\{\gamma \in \widehat{G} : \gamma(x) = 1 \text{ for all } x \in A\}$, and similarly
if $\Gamma \subset \widehat{G}$ we write $\Gamma^\perp$ for the annihilator of Γ, the set $\{x \in G : \gamma(x) = 1 \text{ for all } \gamma \in \Gamma\}$.
It is immediate that annihilators are subspaces and that if $V \le G$ then $V \subset (V^\perp)^\perp$; in
fact we have equality as we shall see shortly.
From our calculation on the convolution of indicator functions of subspaces we see that
$\widehat{1_V}^2 = \mu_G(V)\widehat{1_V}$, whence $\widehat{1_V}(\gamma)$ takes only the values 0 and $\mu_G(V)$. On the other hand, if
$\widehat{1_V}(\gamma) = \mu_G(V)$ then $\gamma \in V^\perp$ and conversely so we have
\[ \widehat{1_V}(\gamma) = \begin{cases} \mu_G(V) & \text{if } \gamma \in V^\perp \\ 0 & \text{otherwise.} \end{cases} \]
It follows by Parseval’s theorem that

\[ \mu_G(V)|V^\perp| = \frac{1}{\mu_G(V)} \sum_{\gamma \in V^\perp} |\widehat{1_V}(\gamma)|^2 = \frac{1}{\mu_G(V)} \cdot \mu_G(V) = 1. \]
It follows from this that µG ((V ⊥ )⊥ ) = µG (V ) and hence that V = (V ⊥ )⊥ . Finally, the
co-dimension of V is the dimension of G/V , that is n − dim V and so from the previous
dim V ⊥ = cod V.
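The description of the Fourier transform of a subspace can be checked on a small case. In this sketch characters are identified with bitmasks r through the pairing r.x, which is an assumption of the encoding, and the subspace V is a hypothetical example.

```python
def character(r, x):
    """The character x -> (-1)^(r.x), with r and x stored as bitmasks."""
    return -1 if bin(r & x).count("1") % 2 else 1

def span(generators):
    V = {0}
    for g in generators:
        V |= {v ^ g for v in V}
    return V

def annihilator(S, size):
    """All r with character(r, x) = 1 for every x in S."""
    return {r for r in range(size) if all(character(r, x) == 1 for x in S)}

n = 4
size = 1 << n
V = span([0b0011, 0b0101])            # a 2-dimensional subspace of F_2^4
mu_V = len(V) / size

hat_1V = [sum(character(r, x) for x in V) / size for r in range(size)]
V_perp = annihilator(V, size)

print(sorted(V_perp))
print(hat_1V)
```

Here V has dimension 2 and its annihilator also has dimension 2 = cod V, in line with the final identity of the example.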
Example (An uncertainty principle). By Hölder’s inequality, the Hausdorff-Young
inequality and Parseval’s theorem we have that any function f with unit $L^2$-norm has
\[ \|f\|_{L^1(G)} \|\widehat{f}\|_{\ell^1(\widehat{G})} \ge \|\widehat{f}\|_{\ell^\infty(\widehat{G})} \|\widehat{f}\|_{\ell^1(\widehat{G})} \ge \|\widehat{f}\|_{\ell^2(\widehat{G})}^2 = \|f\|_{L^2(G)}^2 = 1. \]
It follows that a function cannot both have concentrated support on G (physical space)
and $\widehat{G}$ (momentum space). In particular, by Cauchy-Schwarz we have
\[ \|f\|_{L^1(G)} \le \mu_G(\operatorname{supp} f)^{1/2} \|f\|_{L^2(G)} = \mu_G(\operatorname{supp} f)^{1/2} \]
and similarly $\|\widehat{f}\|_{\ell^1(\widehat{G})} \le |\operatorname{supp} \widehat{f}|^{1/2}$. Thus,
\[ \mu_G(\operatorname{supp} f) |\operatorname{supp} \widehat{f}| \ge 1 \]
and we see from the preceding example that equality can be achieved when f is a scalar
multiple of the indicator function of an affine subspace.
2. Subspaces, sumsets and counting solutions to equations

Our first result along the theme of the course shows that by adding a set to itself a few
times we can ensure that the resulting set contains a large algebraic structure.
To get a grip on the problem notice that if A is a subspace of density α then 4A is
also a subspace of density α (and so co-dimension $\log_2 \alpha^{-1}$); whereas if A is a random
set of density about α then in all likelihood 4A is the whole of G (that is a subspace of
co-dimension 0).
Theorem 2.1 (Bogolyubov’s lemma). Suppose that $G := \mathbb{F}_2^n$ and A, B ⊂ G have density
α, β > 0. Then 2(A − B) contains a subspace of co-dimension $O(\alpha^{-1}\beta^{-1})$.
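Proof. We shall examine the convolution of $1_A * 1_B$ with itself:
\[ 1_A * 1_B * 1_A * 1_B(x) = \sum_{\gamma \in \widehat{G}} |\widehat{1_A}(\gamma)|^2 |\widehat{1_B}(\gamma)|^2 \gamma(x). \]
We separate into those characters supporting large and small values of the Fourier transform:
\[ L := \{\gamma \in \widehat{G} : |\widehat{1_B}(\gamma)| \ge \epsilon\beta\}. \]
Then Parseval’s theorem gives an upper bound on L:
\[ |L|(\epsilon\beta)^2 \le \sum_{\gamma \in L} |\widehat{1_B}(\gamma)|^2 \le \|\widehat{1_B}\|_{\ell^2(\widehat{G})}^2 = \|1_B\|_{L^2(G)}^2 = \beta, \]
so $|L| \le \epsilon^{-2}\beta^{-1}$. We put $V := L^\perp$ and note that the bound on L implies that the
co-dimension of V is at most $\epsilon^{-2}\beta^{-1}$. On the other hand if x ∈ V then γ(x) = 1 for all γ ∈ L
and so we have that
\[ \sum_{\gamma \in L} |\widehat{1_A}(\gamma)|^2 |\widehat{1_B}(\gamma)|^2 \gamma(x) \ge |\widehat{1_A}(0_{\widehat{G}})|^2 |\widehat{1_B}(0_{\widehat{G}})|^2 = \alpha^2\beta^2 \]
since $\widehat{1_A}(0_{\widehat{G}}) = \alpha$, and $\widehat{1_B}(0_{\widehat{G}}) = \beta$ and hence $0_{\widehat{G}} \in L$. On the other hand by the triangle
inequality and Parseval’s theorem we have
\[ \Big| \sum_{\gamma \notin L} |\widehat{1_A}(\gamma)|^2 |\widehat{1_B}(\gamma)|^2 \gamma(x) \Big| \le \sup_{\gamma \notin L} |\widehat{1_B}(\gamma)|^2 \sum_{\gamma \in \widehat{G}} |\widehat{1_A}(\gamma)|^2 \le (\epsilon\beta)^2 \|1_A\|_{L^2(G)}^2 = \epsilon^2\beta^2\alpha. \]
Thus, if $\epsilon := \sqrt{\alpha/2}$ then we see that for all x ∈ V we have $1_A * 1_B * 1_A * 1_B(x) \ge \alpha^2\beta^2/2$.
The result is proved.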
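The proof is constructive, so it can be followed step by step on a small group. The sketch below builds the set L of large Fourier coefficients of $1_B$, takes V to be its annihilator, and checks that V lies in A + B + A + B (which is 2(A − B) in characteristic 2). The sets A and B, the bitmask encoding and all names are illustrative assumptions.

```python
from math import sqrt

def character(r, x):
    """The character x -> (-1)^(r.x), with r and x stored as bitmasks."""
    return -1 if bin(r & x).count("1") % 2 else 1

def fourier(f):
    size = len(f)
    return [sum(f[x] * character(r, x) for x in range(size)) / size
            for r in range(size)]

n = 5
size = 1 << n
A = {x for x in range(size) if x % 3 == 0}               # arbitrary sample set
B = {x for x in range(size) if bin(x).count("1") <= 2}   # arbitrary sample set
alpha, beta = len(A) / size, len(B) / size

eps = sqrt(alpha / 2)                                    # the choice made in the proof
hat_B = fourier([1 if x in B else 0 for x in range(size)])
large = [r for r in range(size) if abs(hat_B[r]) >= eps * beta]

# V is the annihilator of the large spectrum.
V = {x for x in range(size) if all(character(r, x) == 1 for r in large)}

AB = {a ^ b for a in A for b in B}
four_fold = {s ^ t for s in AB for t in AB}
codim = n - (len(V).bit_length() - 1)                    # |V| is a power of two

print(len(large), codim, V <= four_fold)
```

For sets this small the four-fold sumset is often all of G, so the check is mainly a way to see the objects in the proof rather than a stress test of the bound.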
This lemma is typical of the sort of results in the course. It inspires the definition of the
spectrum of a function: suppose that $f \in L^2(G)$ and ε ∈ (0, 1]. Then we write
\[ \operatorname{Spec}_\epsilon(f) := \{\gamma \in \widehat{G} : |\widehat{f}(\gamma)| \ge \epsilon\|f\|_{L^1(G)}\}, \]
for the ε-spectrum of f. Note that if f is non-negative then $0_{\widehat{G}}$ is in $\operatorname{Spec}_\epsilon(f)$ for all
ε ∈ (0, 1], and it makes no sense to consider ε > 1 since the set is invariably empty by the
Hausdorff-Young inequality.
An essential ingredient of Bogolyubov’s lemma was the so called Parseval bound on the
size of the spectrum which lets us project out large Fourier coefficients. In particular we
implicitly proved the following lemma.
Lemma 2.2. Suppose that B ⊂ G has density β and ε ∈ (0, 1]. Then $\operatorname{cod} \operatorname{Spec}_\epsilon(1_B)^\perp = O(\epsilon^{-2}\beta^{-1})$ and
\[ \sup_{\gamma \notin (\operatorname{Spec}_\epsilon(1_B)^\perp)^\perp} |\widehat{1_B}(\gamma)| \le \epsilon\beta. \]
This may be seen as giving us a low complexity approximation to 1B . In particular, we

get a subspace V of controlled co-dimension such that 1B ≈ 1B ∗ µV in a certain norm. It
is not, however, clear why this norm should be useful.
Suppose that we have a set A ⊂ G and wish to count the number of sums in A, that is
triples (x, y, z) ∈ A3 such that x + y = z. We write the density of such as
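\[ T(A) := \langle 1_A * 1_A, 1_A \rangle_{L^2(G)} = \sum_{\gamma \in \widehat{G}} \widehat{1_A}(\gamma)^3. \]
As usual $\widehat{1_A}(0_{\widehat{G}}) = \alpha$, and if $\sup_{\gamma \neq 0_{\widehat{G}}} |\widehat{1_A}(\gamma)| \le \epsilon\alpha$ then
\[ T(A) \ge \alpha^3 - \epsilon\alpha \sum_{\gamma \in \widehat{G}} |\widehat{1_A}(\gamma)|^2. \]
It follows that $T(A) \ge \alpha^3/2$ if $\epsilon \le \alpha/2$. We think of $\alpha^3$ as being the ‘expected’ number
of solutions to x + y = z in a random set of density α; $\sup_{\gamma \neq 0_{\widehat{G}}} |\widehat{1_A}(\gamma)|$ measures how far
from being random we are.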
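The two expressions for T(A) and the lower bound in terms of the largest non-trivial Fourier coefficient can be compared directly. This is a sketch under the same bitmask encoding as before; the set A and the helper names are hypothetical.

```python
def character(r, x):
    """The character x -> (-1)^(r.x), with r and x stored as bitmasks."""
    return -1 if bin(r & x).count("1") % 2 else 1

def fourier(f):
    size = len(f)
    return [sum(f[x] * character(r, x) for x in range(size)) / size
            for r in range(size)]

def solution_density(A, size):
    """T(A): the proportion of pairs (x, y) in G^2 with x, y and x + y all in A."""
    return sum(1 for x in A for y in A if (x ^ y) in A) / size ** 2

n = 4
size = 1 << n
A = {1, 2, 3, 5, 8, 13}               # an arbitrary subset of F_2^4
alpha = len(A) / size

hat_A = fourier([1 if x in A else 0 for x in range(size)])
T_direct = solution_density(A, size)
T_fourier = sum(c ** 3 for c in hat_A)

# Largest non-trivial coefficient; index 0 is the trivial character.
bias = max(abs(c) for c in hat_A[1:])
lower_bound = alpha ** 3 - bias * alpha

print(T_direct, T_fourier, lower_bound)
```

The bound uses Parseval's theorem in the form that the squared coefficients of $1_A$ sum to α, so it is only informative when `bias` is small compared with α².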
At the other end of the spectrum from appearing random we have subspaces. Of course,
if A is a vector subspace then $T(A) = \alpha^2$. However, if A is an affine subspace that is not a
vector subspace then T (A) = 0 and so we see that T (A) is not always large. Despite this
we shall prove the following theorem.
Theorem 2.3 (Arithmetic removal lemma). Suppose that A ⊂ G, and that if $A' \subset A$ has
$T(A') = 0$ then $\mu_G(A \setminus A') \ge \epsilon$. Then $T(A) = \Omega_\epsilon(1)$.
This result and the approach we take to it was first developed by Green in [Gre05].
It will be useful to begin by introducing a more general tri-linear form based on T : for
functions f0 , f1 , f2 on G we put
T (f0 , f1 , f2 ) := hf0 ∗ f1 , f2 iL2 (G) ,
so that T (A) = T (1A , 1A , 1A ). Importantly we have the following lemma for governing
the behaviour of T which captures the content of our earlier argument for sets behaving
‘randomly’.
Lemma 2.4. Suppose that $f_0, f_1, f_2$ are functions on G. Then
\[ |T(f_0, f_1, f_2)| \le \|\widehat{f_i}\|_{\ell^\infty(\widehat{G})} \|f_j\|_{L^2(G)} \|f_k\|_{L^2(G)} \le \|f_i\|_{L^1(G)} \|f_j\|_{L^2(G)} \|f_k\|_{L^2(G)} \]
for any permutation {i, j, k} of {0, 1, 2}.

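Proof. Notice that the second inequality is a consequence of the first and the Hausdorff-Young
inequality. Otherwise, by Fourier inversion we have
\[ T(f_0, f_1, f_2) = \sum_{\gamma \in \widehat{G}} \widehat{f_0}(\gamma)\widehat{f_1}(\gamma)\widehat{f_2}(\gamma) = \sum_{\gamma \in \widehat{G}} \widehat{f_i}(\gamma)\widehat{f_j}(\gamma)\widehat{f_k}(\gamma) \]
for any permutation {i, j, k} of {0, 1, 2}. We apply Hölder’s inequality and Cauchy-Schwarz
to this to see that
\[ |T(f_0, f_1, f_2)| \le \sup_{\gamma \in \widehat{G}} |\widehat{f_i}(\gamma)| \Big( \sum_{\gamma \in \widehat{G}} |\widehat{f_j}(\gamma)|^2 \Big)^{1/2} \Big( \sum_{\gamma \in \widehat{G}} |\widehat{f_k}(\gamma)|^2 \Big)^{1/2}. \]
The result now follows by Parseval’s theorem.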

For this to be useful we require a scale on which we can control the uniformity of a
function and this is provided to us by the following arithmetic regularity lemma.
Proposition 2.5 (Arithmetic regularity lemma). Suppose that A ⊂ G has density α,
B ⊂ G and δ, η ∈ (0, 1]. Then there are subspaces $V' \le V \le G$ with $\operatorname{cod} V' = O_{\delta,\eta}(1)$ such
that
\[ \|1_A * \mu_V - 1_A * \mu_{V'}\|_{L^2(G)}^2 \le \delta\alpha \quad \text{and} \quad \sup_{\gamma \notin V'^\perp} |\widehat{1_B}(\gamma)| \le \eta\mu_G(V). \]
Proof. We define a sequence of subspaces iteratively letting $V_0 = G$, and assuming we have
defined $V_i$ we let U be the subspace of co-dimension $O(\eta^{-2}\mu_G(V_i)^{-2})$ provided by Lemma
2.2 (with parameter $\min\{1, \eta\mu_G(V_i)\beta^{-1}\}$, where β is the density of B), and $V_{i+1} := V_i \cap U$ so that
\[ \sup_{\gamma \notin V_{i+1}^\perp} |\widehat{1_B}(\gamma)| \le \sup_{\gamma \notin U^\perp} |\widehat{1_B}(\gamma)| \le \eta\mu_G(V_i). \]
By Parseval’s theorem and the fact that convolution goes to multiplication we have
\[ \|1_A * \mu_{V_i} - 1_A * \mu_{V_{i+1}}\|_{L^2(G)}^2 = \sum_{\gamma \in V_{i+1}^\perp \setminus V_i^\perp} |\widehat{1_A}(\gamma)|^2. \]
However, the sets $(V_{i+1}^\perp \setminus V_i^\perp)_i$ are disjoint so it follows by averaging that there is some
$i = O(\delta^{-1})$ such that
\[ \|1_A * \mu_{V_i} - 1_A * \mu_{V_{i+1}}\|_{L^2(G)}^2 \le \delta \sum_{\gamma \in \widehat{G}} |\widehat{1_A}(\gamma)|^2 = \delta\alpha. \]
The result follows on setting $V := V_i$ and $V' := V_{i+1}$.

Notice that the proof leads to a very poor bound on $\operatorname{cod} V'$. Indeed, it is a tower of
$\eta^{-O(1)}$s of height $\delta^{-1}$.
We can now prove the arithmetic removal lemma.
Proof of Theorem 2.3. We apply the regularity lemma with the sets A and B := A to get
subspaces $V' \le V \le G$ with $\operatorname{cod} V' = O_{\delta,\eta}(1)$ and
\[ \|1_A * \mu_V - 1_A * \mu_{V'}\|_{L^2(G)}^2 \le \delta\alpha \quad \text{and} \quad \sup_{\gamma \notin V'^\perp} |\widehat{1_A}(\gamma)| \le \eta\mu_G(V). \]
Let
\[ A_1 := \{x \in A : 1_A * \mu_V(x) \le \epsilon/4\} \]
and
\[ A_2 := \{x \in A : |1_A * \mu_{V'} - 1_A * \mu_V|^2 * \mu_V(x) > 4\delta\epsilon^{-1}\}. \]
We want to get upper bounds for the sizes of the sets $A_1$ and $A_2$. For $A_1$, first note that
\[ A_1' := \{x \in G : 1_A * \mu_V(x) \le \epsilon/4\} \]
is invariant under translation by elements of V, so $1_{A_1'} * \mu_V = 1_{A_1'}$. Now
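\begin{align*}
\mu_G(A_1) = \int 1_{A_1'} 1_A \, d\mu_G &= \int (1_{A_1'} * \mu_V) 1_A \, d\mu_G \\
&= \int \int 1_{A_1'}(y) \frac{1_V(x - y)}{\mu_G(V)} \, d\mu_G(y) \, 1_A(x) \, d\mu_G(x) \\
&= \int \int 1_A(x) \frac{1_V(y - x)}{\mu_G(V)} \, d\mu_G(x) \, 1_{A_1'}(y) \, d\mu_G(y) \\
&= \int (1_A * \mu_V) 1_{A_1'} \, d\mu_G \le \epsilon/4
\end{align*}
where the last inequality is since $1_A * \mu_V(x) \le \epsilon/4$ on $A_1'$.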

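For $A_2$ note that
\begin{align*}
\mu_G(A_2) 4\delta\epsilon^{-1} &\le \int |1_A * \mu_{V'} - 1_A * \mu_V|^2 * \mu_V(x) \, d\mu_G(x) \\
&= \int \int |1_A * \mu_{V'} - 1_A * \mu_V|^2(y) \frac{1_V(x - y)}{\mu_G(V)} \, d\mu_G(y) \, d\mu_G(x) \\
&= \int \int |1_A * \mu_{V'} - 1_A * \mu_V|^2(y) \frac{1_V(x - y)}{\mu_G(V)} \, d\mu_G(x) \, d\mu_G(y) \\
&= \int |1_A * \mu_{V'} - 1_A * \mu_V|^2(y) \, d\mu_G(y) \\
&= \|1_A * \mu_{V'} - 1_A * \mu_V\|_{L^2(G)}^2 \le \delta\alpha,
\end{align*}
whence $\mu_G(A_2) \le \epsilon/4$. It follows that $A' := A \setminus (A_1 \cup A_2)$ has $T(A') \neq 0$, and so there is a
triple $(x_0, x_1, x_2)$ with $x_0 + x_1 = x_2$ and
\[ 1_A * \mu_V(x_i) > \epsilon/4 \quad \text{and} \quad |1_A * \mu_{V'} - 1_A * \mu_V|^2 * \mu_V(x_i) \le 4\delta\epsilon^{-1} \]
for all i ∈ {0, 1, 2}. We put $S_i := A \cap (x_i + V)$, so that
\[ \mu_G(S_i) = 1_A * \mu_V(x_i)\mu_G(V), \]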
and put $f_i := (1_A - 1_A * \mu_V)|_{x_i + V}$ and $g_i := (1_A - 1_A * \mu_{V'})|_{x_i + V}$. By Cauchy-Schwarz we
have
\[ \|f_i - g_i\|_{L^1(G)}^2 \le \mu_G(V) \|f_i - g_i\|_{L^2(G)}^2 \tag{2.1} \]
since $\operatorname{supp}(f_i - g_i) \subset x_i + V$. However,
\begin{align*}
\|f_i - g_i\|_{L^2(G)}^2 &= \int |1_A * \mu_{V'}(y) - 1_A * \mu_V(y)|^2 1_{x_i + V}(y) \, d\mu_G(y) \\
&= \mu_G(V) \int |1_A * \mu_{V'}(y) - 1_A * \mu_V(y)|^2 \frac{1_V(x_i - y)}{\mu_G(V)} \, d\mu_G(y).
\end{align*}
Now $1_A * \mu_V(y) = 1_A * \mu_V(x_i)$ if $y \in x_i + V$, whence
\begin{align*}
\|f_i - g_i\|_{L^2(G)}^2 &= \mu_G(V) \int |1_A * \mu_{V'}(y) - 1_A * \mu_V(x_i)|^2 \frac{1_V(x_i - y)}{\mu_G(V)} \, d\mu_G(y) \\
&= \mu_G(V) |1_A * \mu_{V'} - 1_A * \mu_V(x_i)|^2 * \mu_V(x_i).
\end{align*}
Combining this with (2.1) we get that
\[ \|f_i - g_i\|_{L^1(G)}^2 \le \mu_G(V)^2 |1_A * \mu_{V'} - 1_A * \mu_V(x_i)|^2 * \mu_V(x_i) \le 4\delta\epsilon^{-1}\mu_G(V)^2. \]

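Finally,
\begin{align*}
\|\widehat{g_i}\|_{\ell^\infty(\widehat{G})} &= \sup_{\gamma \in \widehat{G}} \Big| \int \sum_{\gamma' \notin V'^\perp} \widehat{1_A}(\gamma')\gamma'(x) 1_{x_i + V}(x)\gamma(x) \, d\mu_G(x) \Big| \\
&= \sup_{\gamma \in \widehat{G}} \Big| \sum_{\gamma' \notin V'^\perp} \widehat{1_A}(\gamma')\gamma(x_i)\gamma'(x_i)\mu_G(x_i + V) 1_{V^\perp}(\gamma + \gamma') \Big| \le \eta\mu_G(V)
\end{align*}
by Fourier inversion.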
Now we examine
\[ T(1_{S_0}, 1_{S_1}, 1_{S_2}) = \mu_G(V)^2 \prod_{i=0}^{2} 1_A * \mu_V(x_i) + 1_A * \mu_V(x_1) T(f_0, 1_V, 1_{S_2}) + T(1_{S_0}, f_1, 1_{S_2}). \]
The first of these terms is at least $(\epsilon/4)^3\mu_G(V)^2$; the second has
\begin{align*}
|T(f_0, 1_V, 1_{S_2})| &\le |T(f_0 - g_0, 1_V, 1_{S_2})| + |T(g_0, 1_V, 1_{S_2})| \\
&\le (\|f_0 - g_0\|_{L^1(G)} + \eta\mu_G(V)) \|1_V\|_{L^2(G)} \|1_{S_2}\|_{L^2(G)} \\
&\le (2\delta^{1/2}\epsilon^{-1/2} + \eta)\mu_G(V)^2
\end{align*}
by Lemma 2.4; and similarly for the third. We conclude that
\[ T(A) \ge ((\epsilon/4)^3 - 4\delta^{1/2}\epsilon^{-1/2} - 2\eta)\mu_G(V)^2 \]
and the result follows on suitable choice of δ and η.
The bound resulting from this proof shows that T(A) is at least the reciprocal of a tower
of 2s of height $\epsilon^{-O(1)}$. It is possible (see the work of Fox [Fox11]) to make this a tower of
height $O(\log \epsilon^{-1})$, but nothing better is known.
3. Sums of independent random variables

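The setting $G := \mathbb{F}_2^n$ affords a remarkable unification of the algebraic and statistical
notions of independence. Thinking of $\widehat{G}$ as another vector space over $\mathbb{F}_2$, a set $\Lambda \subset \widehat{G}$ is
algebraically independent if
\[ \sum_{\gamma \in \Lambda} \sigma_\gamma.\gamma = 0_{\widehat{G}} \text{ and } \sigma : \Lambda \to \mathbb{F}_2 \text{ iff } \sigma \equiv 0. \]
However, the elements of $\widehat{G}$ can also be thought of as random variables on G with underlying
probability measure $\mu_G$. A set $\Lambda \subset \widehat{G}$ is statistically independent if
\[ \mu_G(\{x : \gamma(x) = z_\gamma \text{ for all } \gamma \in \Lambda'\}) = \prod_{\gamma \in \Lambda'} \mu_G(\{x : \gamma(x) = z_\gamma\}) \]
for all $\Lambda' \subset \Lambda$ and $z : \Lambda' \to \{-1, 1\}$.
Theorem 3.1. Suppose that $\Lambda \subset \widehat{G}$. Then Λ is algebraically independent iff it is statistically
independent.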
Proof. Since the characters γ are homomorphisms we see that any $y, z \in \{x : \gamma(x) = z_\gamma \text{ for all } \gamma \in \Lambda'\}$
have γ(y − z) = 1 for all $\gamma \in \Lambda'$ and so the set is just a translate of the
annihilator of $\Lambda'$. Thus Λ is statistically independent iff
\[ \mu_G(\Lambda'^\perp) = \prod_{\gamma \in \Lambda'} \mu_G(\{\gamma\}^\perp) \text{ for all } \Lambda' \subset \Lambda. \]
Now, if Λ is algebraically independent then none of the γs in Λ are identically 1 and so
$\mu_G(\{\gamma\}^\perp) = 1/2$ for all γ ∈ Λ. On the other hand $(\Lambda'^\perp)^\perp$ is the subspace generated by $\Lambda'$
which has size $2^d$, where $d := |\Lambda'|$, since $\Lambda'$ is independent. Hence $\mu_G(\Lambda'^\perp) = \mu_G(((\Lambda'^\perp)^\perp)^\perp) = 2^{-d}$ and we
see that Λ is statistically independent.
On the other hand if Λ is statistically independent then by a similar argument the
subspace generated by Λ has size $2^{|\Lambda|}$ and hence is |Λ|-dimensional. It follows that Λ is
algebraically independent.
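The equivalence can be tested exhaustively in a small group. The sketch below compares the two definitions for every set of at most three non-trivial characters of $\mathbb{F}_2^3$. Restricting to non-trivial characters is an assumption of this example: the proof uses that no character in Λ is identically 1, and the trivial character, being a constant random variable, is a degenerate case for the statistical definition. All names are illustrative.

```python
from itertools import combinations, product

def character(r, x):
    """The character x -> (-1)^(r.x), with r and x stored as bitmasks."""
    return -1 if bin(r & x).count("1") % 2 else 1

def algebraically_independent(chars):
    """True if no non-empty subset of chars sums (XORs) to the trivial character."""
    for k in range(1, len(chars) + 1):
        for subset in combinations(chars, k):
            total = 0
            for r in subset:
                total ^= r
            if total == 0:
                return False
    return True

def statistically_independent(chars, n):
    """Check the product formula for every subset and every choice of signs."""
    size = 1 << n
    for k in range(1, len(chars) + 1):
        for subset in combinations(chars, k):
            for signs in product((-1, 1), repeat=k):
                joint = sum(
                    1 for x in range(size)
                    if all(character(r, x) == z for r, z in zip(subset, signs))
                ) / size
                marginals = 1.0
                for r, z in zip(subset, signs):
                    marginals *= sum(
                        1 for x in range(size) if character(r, x) == z
                    ) / size
                if joint != marginals:   # exact: all values are dyadic rationals
                    return False
    return True

n = 3
nontrivial = range(1, 1 << n)
agreement = all(
    algebraically_independent(chars) == statistically_independent(chars, n)
    for k in (1, 2, 3)
    for chars in combinations(nontrivial, k)
)
print(agreement)
```

For instance the characters with bitmasks 1, 2 and 3 are pairwise independent in both senses but fail both tests as a triple, since the third is the sum of the first two.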
In view of the above theorem we shall treat sums of algebraically independent characters
as sums of independent random variables, and hence the central limit theorem will provide
a useful heuristic. This asserts that if Λ is an independent set of characters then
\[ \mu_G\Big( \Big\{ x : \sum_{\lambda \in \Lambda} \lambda(x) \le \eta\sqrt{n} \Big\} \Big) \sim \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\eta} \exp(-x^2/2) \, dx \]
as n → ∞. We should like to use this to estimate the probability when η is large and
negative, and to show that this probability is very small. However, the rate of
convergence in the central limit theorem is not very rapid, and necessarily so when η is
close to zero. Since this is not our range of interest we shall formulate some other, rather
simpler, tools for dealing with the range of interest.
If one is interested in estimating the probability that a function (on G) takes large values
then one naturally turns to the higher moments of that function. To this end we have the
following simple inequality which may also be found in [Rud90].
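Proposition 3.2 (Rudin’s inequality). Suppose that $\Lambda \subset \widehat{G}$ is independent and p ∈ [2, ∞).
Then
\[ \Big\| \sum_{\gamma \in \Lambda} f(\gamma)\gamma \Big\|_{L^p(G)} = O(\sqrt{p} \|f\|_{\ell^2(\Lambda)}) \text{ for all } f \in \ell^2(\Lambda). \]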
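Proof. By nesting of norms the result follows if we can show it for p an even integer, say
2k. In this case we can multiply out the left hand side:
\[ \Big\| \sum_{\gamma \in \Lambda} f(\gamma)\gamma \Big\|_{L^{2k}(G)}^{2k} = \sum_{\gamma_1, \dots, \gamma_{2k} \in \Lambda} \prod_{i=1}^{2k} f(\gamma_i) \int \prod_{i=1}^{2k} \gamma_i \, d\mu_G = \sum_{\substack{\gamma_1, \dots, \gamma_{2k} \in \Lambda \\ \gamma_1 + \cdots + \gamma_{2k} = 0_{\widehat{G}}}} \prod_{i=1}^{2k} f(\gamma_i). \]
Since Λ is independent it follows that for each summand there is a set I ⊂ {1, . . . , 2k} of
size k and a bijection φ : I → {1, . . . , 2k} \ I such that $\gamma_i = \gamma_{\phi(i)}$ for all i ∈ I. In this case
the summand is just $\prod_{i \in I} |f(\gamma_i)|^2$, and so
\[ \sum_{\substack{\gamma_1, \dots, \gamma_{2k} \in \Lambda \\ \gamma_1 + \cdots + \gamma_{2k} = 0_{\widehat{G}}}} \prod_{i=1}^{2k} f(\gamma_i) \le \sum_{I, \phi} \sum_{\gamma_i \in \Lambda \text{ for all } i \in I} \prod_{i \in I} |f(\gamma_i)|^2 = \sum_{I, \phi} \Big( \sum_{\gamma \in \Lambda} |f(\gamma)|^2 \Big)^k \le \binom{2k}{k} k! \, \|f\|_{\ell^2(\Lambda)}^{2k}. \]
The result follows since $\binom{2k}{k} k! = (2k)!/k! \le (2k)^k$.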
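The inequality can be observed numerically for even exponents, where the count in the proof gives the bound with constant 1, that is, at most $\sqrt{p}\,\|f\|_{\ell^2(\Lambda)}$. The sketch below takes Λ to be the coordinate characters of $\mathbb{F}_2^6$; that choice, the coefficients and the helper names are assumptions of the example.

```python
from math import sqrt

def character(r, x):
    """The character x -> (-1)^(r.x), with r and x stored as bitmasks."""
    return -1 if bin(r & x).count("1") % 2 else 1

def big_L_norm(values, p):
    """L^p norm with respect to normalised counting measure on G."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)

n = 6
size = 1 << n
independent = [1 << i for i in range(n)]          # coordinate characters
coefficients = [3.0, -1.0, 2.0, 0.5, -2.5, 1.0]   # hypothetical f on Lambda

# The function sum_gamma f(gamma) gamma, tabulated on all of G.
values = [
    sum(c * character(r, x) for c, r in zip(coefficients, independent))
    for x in range(size)
]

l2 = sqrt(sum(c * c for c in coefficients))
ratios = {p: big_L_norm(values, p) / (sqrt(p) * l2) for p in (2, 4, 6, 8)}
print(ratios)
```

Each ratio stays below 1, and for p = 2 the $L^2$ norm equals $\|f\|_{\ell^2(\Lambda)}$ exactly by Parseval's theorem.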
Although we call this Rudin’s inequality, it is properly called Khintchine’s inequality as
established by Khintchine and Littlewood in the early 1920s. We use the name Rudin’s
inequality because in the more general case of arbitrary finite abelian groups that is the
result one uses and it is the result to which modern literature points.
We shall find the dual formulation of this particularly useful.
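Proposition 3.3. Suppose that $\Lambda \subset \widehat{G}$ is independent and p ∈ (1, 2]. Then
\[ \|\widehat{f}\|_{\ell^2(\Lambda)} = O\Big( \sqrt{\frac{p}{p-1}} \|f\|_{L^p(G)} \Big) \text{ for all } f \in L^p(G). \]
Proof. The argument is by duality. Suppose that $f \in L^p(G)$ and put
\[ g := \sum_{\lambda \in \Lambda} \widehat{f}(\lambda)\lambda. \]
Then by Plancherel’s theorem and Hölder’s inequality we have
\[ \|\widehat{f}\|_{\ell^2(\Lambda)}^2 = \langle \widehat{f}, \widehat{f} \rangle_{\ell^2(\Lambda)} = \langle \widehat{f}, \widehat{g} \rangle_{\ell^2(\widehat{G})} = \langle f, g \rangle_{L^2(G)} \le \|f\|_{L^p(G)} \|g\|_{L^{p'}(G)} \]
where $p'$ is conjugate to p. We then apply Rudin’s inequality to see that
\[ \|g\|_{L^{p'}(G)} = O(\sqrt{p'} \|\widehat{g}\|_{\ell^2(\Lambda)}) = O\Big( \sqrt{\frac{p}{p-1}} \|\widehat{f}\|_{\ell^2(\Lambda)} \Big) \]
and the result follows on dividing out by $\|\widehat{f}\|_{\ell^2(\Lambda)}$.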
As a simple corollary of this we can refine Lemma 2.2, improving the bound on the
co-dimension of the subspace there from $O(\epsilon^{-2}\beta^{-1})$ to $O(\epsilon^{-2}\log\beta^{-1})$.
Corollary 3.4 (Chang’s theorem, [Cha02]). Suppose that B ⊂ G has density β and
ε ∈ (0, 1]. Then
\[ \operatorname{cod} \operatorname{Spec}_\epsilon(1_B)^\perp = O(\epsilon^{-2}\log\beta^{-1}). \]
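Proof. Let $\Lambda \subset \operatorname{Spec}_\epsilon(1_B)$ be a maximal algebraically independent subset. Then we have
that $\operatorname{Spec}_\epsilon(1_B)^\perp = \Lambda^\perp$, and $\operatorname{cod} \operatorname{Spec}_\epsilon(1_B)^\perp = |\Lambda|$. On the other hand
\[ \epsilon^2\beta^2|\Lambda| \le \sum_{\lambda \in \Lambda} |\widehat{1_B}(\lambda)|^2 = O\Big( \frac{p}{p-1} \|1_B\|_{L^p(G)}^2 \Big) = O\Big( \frac{p}{p-1} \beta^{2/p} \Big) \]
by the dual of Rudin’s inequality. Setting $p = 1 + 1/\log\beta^{-1}$ then gives the result.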
Chang’s theorem can be easily used to refine Bogolyubov’s lemma (Theorem 2.1), but
we shall use it to look at a harder problem: two-fold sumsets.
3.5. Application: Subspaces in sumsets. We have seen that if A has density α then
4A contains a subspace of co-dimension Oα (1). On the other hand a random set shows
that A itself need not contain a large subspace. What happens in between?
If A is highly structured or highly random then A+A contains a large subspace. However
there is an example due to Ruzsa [Ruz91] adapted to the model setting by Green [Gre02b]
which indicates some limitations.
Example (Niveau set construction). Let $G := \mathbb{F}_2^n$ and
\[ A := \{x \in G : x \text{ has at least } n/2 + \eta\sqrt{n}/2 \text{ ones in it}\}. \]
Then
\[ \mu_G(A) \sim \frac{1}{\sqrt{2\pi}} \int_\eta^\infty \exp(-x^2/2) \, dx \ge \frac{1}{2} - O(\eta), \]
by the central limit theorem so we think of A as having density close to 1/2. On the other
hand, if x, y ∈ A then x + y has at most $n - 2\eta\sqrt{n}$ ones in it.
Now suppose that W is an affine subspace, say a coset of some linear subspace V, and
$\operatorname{cod} V \le d$. Then it is an exercise in linear algebra to show that W contains a vector with
at least n − d ones in it, and it follows that A + A cannot contain a subspace of co-dimension
less than $2\eta\sqrt{n} + 1$.
Example (Sumsets of very large sets). Suppose that A ⊂ G has µG (A) > 1/2. Then
\[ 1_A * 1_A(x) = \mu_G(A \cap (x + A)) = 2\mu_G(A) - \mu_G(A \cup (x + A)) \ge 2\mu_G(A) - 1 > 0 \]
for all x ∈ G. It follows that A + A = G and so the sumset contains a subspace of
co-dimension 0.
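The inclusion-exclusion argument can be watched on a concrete set. In this sketch A is an arbitrary choice of 9 of the 16 points of $\mathbb{F}_2^4$, encoded as bitmasks; the set and the variable names are assumptions of the example.

```python
n = 4
size = 1 << n
A = {0, 1, 2, 4, 7, 8, 11, 13, 14}    # 9 of 16 points, so density > 1/2

# Addition in F_2^n is bitwise XOR.
sumset = {a ^ b for a in A for b in A}

# |A ∩ (x + A)| for each x; inclusion-exclusion bounds it below by 2|A| - |G|.
overlaps = [len(A & {x ^ a for a in A}) for x in range(size)]

print(sumset == set(range(size)))
print(min(overlaps), 2 * len(A) - size)
```

Every translate x + A meets A in at least 2|A| − |G| = 2 points, so every x is a sum of two elements of A.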
Complementing this we have the following result due to Green. It is formally proved in
[Gre02b], although the method is that of [Gre02a].
Theorem 3.6. Suppose that A ⊂ G has density α. Then A + A contains an affine subspace
of dimension $\Omega(\alpha^2 n)$.
The argument is iterative and based around the following lemma. The proof of this
lemma uses Chang’s theorem which is not altogether surprising given the Niveau set
example.
Lemma 3.7. Suppose that $G := \mathbb{F}_2^n$, A ⊂ G has density α and $k \le n$ is a natural. Then
either A + A contains an affine subspace of dimension k or else there is a subspace V of
co-dimension $O(\alpha^{-2}k)$ such that $\|1_A * \mu_V\|_{L^\infty(G)} \ge \alpha(1 + 1/2)$.
Proof. Write $S := (A + A)^c$ and suppose that $\sigma := \mu_G(S) < 2^{-k}$. Now, suppose that
$H \le G$ is any subspace of dimension k, then
\[ \int \mu_G(S \cap (x + H)) \, d\mu_G(x) = \int 1_S * 1_H(x) \, d\mu_G(x) = \sigma\mu_G(H) < 1/|G|, \]
and so there is some x ∈ G such that S ∩ (x + H) = ∅, whence A + A ⊃ x + H. Thus we
assume that this is not the case so that $\sigma \ge 2^{-k}$.
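Now we apply Plancherel’s theorem to the obvious inner product:
\[ 0 = \langle 1_A * 1_A, 1_S \rangle_{L^2(G)} = \sum_{\gamma \in \widehat{G}} |\widehat{1_A}(\gamma)|^2 \widehat{1_S}(\gamma). \]
We have $\widehat{1_A}(0_{\widehat{G}}) = \alpha$ and $\widehat{1_S}(0_{\widehat{G}}) = \sigma$, so
\[ \alpha^2\sigma \le \sum_{\gamma \neq 0_{\widehat{G}}} |\widehat{1_A}(\gamma)|^2 |\widehat{1_S}(\gamma)| \]
by the triangle inequality. As before we apply Parseval to get that
\[ \sum_{\gamma \in \operatorname{Spec}_{\alpha/2}(1_S) \setminus \{0_{\widehat{G}}\}} |\widehat{1_A}(\gamma)|^2 |\widehat{1_S}(\gamma)| \ge \alpha^2\sigma - (\alpha\sigma/2) \cdot \sum_{\gamma \in \widehat{G}} |\widehat{1_A}(\gamma)|^2 \ge \alpha^2\sigma/2. \]
Dividing out by $|\widehat{1_S}|$ (which is at most σ by Hausdorff-Young) we get that
\[ \sum_{\gamma \in \operatorname{Spec}_{\alpha/2}(1_S)} |\widehat{1_A}(\gamma)|^2 \ge \alpha^2 + \sum_{\gamma \in \operatorname{Spec}_{\alpha/2}(1_S) \setminus \{0_{\widehat{G}}\}} |\widehat{1_A}(\gamma)|^2 \ge \alpha^2(1 + 1/2). \]
Let $V = \operatorname{Spec}_{\alpha/2}(1_S)^\perp$ and apply Chang’s theorem to bound its co-dimension from above
by $O(\alpha^{-2}\log\sigma^{-1}) = O(\alpha^{-2}k)$. On the other hand $\widehat{\mu_V}(\gamma) = 1$ if $\gamma \in \operatorname{Spec}_{\alpha/2}(1_S)$ and so by
Parseval’s theorem we have
\[ \alpha^2(1 + 1/2) \le \sum_{\gamma \in \widehat{G}} |\widehat{1_A}(\gamma)|^2 |\widehat{\mu_V}(\gamma)|^2 = \|1_A * \mu_V\|_{L^2(G)}^2. \]
Now Hölder’s inequality tells us that
\[ \|1_A * \mu_V\|_{L^2(G)}^2 \le \|1_A * \mu_V\|_{L^1(G)} \|1_A * \mu_V\|_{L^\infty(G)} = \alpha\|1_A * \mu_V\|_{L^\infty(G)}, \]
and the result follows on dividing by α.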
Proof of Theorem 3.6. We define a sequence of (linear) subspaces $(V_i)_i$ iteratively and write
\[ \alpha_i := \|1_A * \mu_{V_i}\|_{L^\infty(G)} \quad \text{and} \quad d_i := \operatorname{cod}_G(V_i). \]
Here $\operatorname{cod}_G(V)$ denotes the co-dimension of V in G, rather than any other superspace. We
initiate the sequence with $V_0 := G$ and so $\alpha_0 = \alpha$.
Suppose that we are at stage i. Then let $x_i$ be such that $1_A * \mu_{V_i}(x_i) = \alpha_i$ and apply
Lemma 3.7 to $(A - x_i) \cap V_i$ and the linear space $V_i$. Then either $\dim V_i \le k$; or
$(A - x_i) \cap V_i + (A - x_i) \cap V_i$ contains an affine subspace of dimension k; or there is a subspace
$V_{i+1} \le V_i$ such that
\[ d_{i+1} \le d_i + O(\alpha_i^{-2}k) \]
and
\[ \alpha_i(1 + 1/2) \le \|1_{(A - x_i) \cap V_i} * \mu_{V_{i+1}}\|_{L^\infty(V_i)} \le \|1_A * \mu_{V_{i+1}}\|_{L^\infty(G)}. \]
Note that after i steps of the last case we have $\alpha_i \ge (1 + 1/2)^i\alpha$. However, this quantity can
be at most 1 so we must have ended up in one of the first two cases inside $i_0 = O(\log\alpha^{-1})$
steps; stop the iteration at this point. Then
\begin{align*}
d_{i_0} &= O(\alpha_0^{-2}k + \alpha_1^{-2}k + \cdots + \alpha_{i_0 - 1}^{-2}k) \\
&= O(\alpha^{-2}k(1 + (1 + 1/2)^{-2} + (1 + 1/2)^{-4} + \dots)) = O(\alpha^{-2}k).
\end{align*}
If $\dim V_{i_0} \le k$ then $n \le k + d_{i_0} = O(\alpha^{-2}k)$, so we can pick $k = \Omega(\alpha^2 n)$ such that this is
not so. In that case we have some (affine) subspace of dimension k inside
\[ (A - x_i) \cap V_i + (A - x_i) \cap V_i \subset A + A, \]
and the result is proved.
This sort of density increment argument is very common in additive combinatorics and
originates with the work of Roth [Rot52, Rot53]. The above approach actually uses an
energy-increment variant due to Heath-Brown [HB87] and Szemerédi [Sze90].
3.8. Application: Grothendieck’s inequality. We now turn to an application of a
more function-analytic nature. Suppose that M is an n × n matrix such that
\[ \Big|\sum_{i,j} M_{ij}x_iy_j\Big| \le \sup_{i,j}|x_i||y_j| \quad\text{for all real sequences } (x_i)_i, (y_j)_j. \tag{3.1} \]
(Another way of saying this is that $\|M\|_{\infty\to 1} \le 1$.) The real numbers equipped with multiplication form a 1-dimensional (real) Hilbert space and we can ask to what extent the sequence of real numbers can be replaced with elements of a higher dimensional (real) Hilbert space.
In particular, suppose that H is a (real) d-dimensional Hilbert space. We may certainly suppose² that $H \cong L^2(X)$ for some finite set X of size d, and hence that $H = L^2(X)$. Then
\[ \sum_{i,j} M_{i,j}\langle v_i, w_j\rangle_{L^2(X)} = \int \sum_{i,j} M_{i,j}v_i(x)w_j(x)\,d\mu_X(x). \tag{3.2} \]
On the other hand the Cauchy-Schwarz inequality tells us that
\[ \|v\|_{L^\infty(X)} \le \sqrt{d}\,\|v\|_{L^2(X)} \quad\text{for all } v\in L^2(X), \tag{3.3} \]
so we get that
\[ \Big|\sum_{i,j} M_{i,j}\langle v_i, w_j\rangle_{L^2(X)}\Big| \le d\,\sup_{i,j}\|v_i\|_{L^2(X)}\|w_j\|_{L^2(X)} \]
2See the exercises.

from the hypothesis (3.1). It follows that we can write $K_d$ for the smallest constant such that for all n × n matrices M satisfying (3.1) and all d-dimensional real Hilbert spaces H we have
\[ \Big|\sum_{i,j} M_{i,j}\langle v_i, w_j\rangle_H\Big| \le K_d\,\sup_{i,j}\|v_i\|_H\|w_j\|_H. \]
Note that if we restrict to the (Hilbert) subspace generated by $(v_i)_i, (w_j)_j$, then none of the quantities of concern change and so we have $K_d \le K_{2n}$.
In this notation our previous argument showed that $K_d \le d$, whence $K_d \le \min\{d, 2n\}$, and Grothendieck's inequality tells us that $K_d$ is bounded by an absolute constant.
Theorem 3.9 (Grothendieck’s inequality). We have that Kd = O(1).
Proof. We continue to assume, as we may, that $H = L^2(X)$. In general we cannot do better than (3.3). However, if the large values of the vectors $v_i$ and $w_j$ have small $L^2$-mass then we can. Let $v_i$ and $w_j$ be such that
\[ K_d = \Big|\sum_{i,j} M_{i,j}\langle v_i, w_j\rangle_{L^2(X)}\Big| \quad\text{and}\quad \|v_i\|_{L^2(X)}, \|w_j\|_{L^2(X)} \le 1. \]
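Decompose the $v_i$s and $w_j$s into their large and small parts: $v_i = v_i^L + v_i^S$ and $w_j = w_j^L + w_j^S$ where
\[ v_i^L(x) := \begin{cases} v_i(x) & \text{if } |v_i(x)| > K\\ 0 & \text{otherwise} \end{cases} \quad\text{and}\quad w_j^L(x) := \begin{cases} w_j(x) & \text{if } |w_j(x)| > K\\ 0 & \text{otherwise.} \end{cases} \]
Then
\begin{align*}
\Big|\sum_{i,j} M_{i,j}\langle v_i, w_j\rangle_{L^2(X)}\Big| &\le \Big|\sum_{i,j} M_{i,j}\langle v_i^S, w_j^S\rangle_{L^2(X)}\Big| + \Big|\sum_{i,j} M_{i,j}\langle v_i^L, w_j\rangle_{L^2(X)}\Big| + \Big|\sum_{i,j} M_{i,j}\langle v_i^S, w_j^L\rangle_{L^2(X)}\Big|\\
&\le K^2 + K_d\max_i\|v_i^L\|_{L^2(X)} + K_d\max_j\|w_j^L\|_{L^2(X)}.
\end{align*}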
Since the left hand side is just Kd , we are done if we can show that the two maxima on
the right are small for some K = O(1). Of course this is not true, but Rudin’s inequality
provides us with an isometric embedding to a space where it is.
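Specifically, let $G = \mathbb{F}_2^d$ and $(\lambda_x)_{x\in X}$ be a set of d independent characters in $\widehat{G}$ and put
\[ \tilde{v}_i := \frac{1}{\sqrt{d}}\sum_{x\in X} v_i(x)\lambda_x \quad\text{and}\quad \tilde{w}_j := \frac{1}{\sqrt{d}}\sum_{x\in X} w_j(x)\lambda_x. \]
By Plancherel's theorem we have that
\[ \langle\tilde{v}_i, \tilde{w}_j\rangle_{L^2(G)} = \langle v_i, w_j\rangle_{L^2(X)}. \]
Now, writing $\tilde{v}_i = \tilde{v}_i^L + \tilde{v}_i^S$ and $\tilde{w}_j = \tilde{w}_j^L + \tilde{w}_j^S$ in the same way as before, we see that
\[ \Big|\sum_{i,j} M_{i,j}\langle\tilde{v}_i, \tilde{w}_j\rangle_{L^2(G)}\Big| \le K^2 + K_{2^d}\max_i\|\tilde{v}_i^L\|_{L^2(G)} + K_{2^d}\max_j\|\tilde{w}_j^L\|_{L^2(G)}. \tag{3.4} \]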
On the other hand, by Rudin's inequality for p = 4 we have that
\[ K^2\|\tilde{v}_i^L\|_{L^2(G)}^2 = \int |\tilde{v}_i^L|^2K^2\,d\mu_G \le \|\tilde{v}_i\|_{L^4(G)}^4 = O(\|v_i\|_{L^2(X)}^4) = O(1), \]
and similarly for $\tilde{w}_j^L$. It follows that there is a choice of K = O(1) such that the maxima in (3.4) are each at most 1/4, and hence
\[ K_d = \Big|\sum_{i,j} M_{i,j}\langle\tilde{v}_i, \tilde{w}_j\rangle_{L^2(G)}\Big| \le O(1) + \frac{1}{2}K_{2^d}. \]
Finally $K_{2^d} \le 2n$ for all d whence
\[ K_d \le O(1) + \frac{1}{2}O(1) + \dots + \frac{1}{2^l}O(1) + \frac{1}{2^{l+1}}n \le O(1) + \frac{1}{2^l}n \]
for all l. Letting l tend to infinity completes the proof.
The smallest universal constant above is called Grothendieck’s constant for the reals,
and determining its value is an open problem. One can prove a version for Hilbert spaces
over the complex numbers by splitting into real and imaginary parts, and the constant
there is different.
4. Voting, influence and Boolean functions

We are finally at a point where we shall introduce the definition of a Boolean function,
although we have been using them for some time.
A Boolean function is a function $f : \{0,1\}^n \to \{0,1\}$. Such functions are the indicator functions of subsets of $\{0,1\}^n$, in particular $f = 1_{\operatorname{supp} f}$, and we shall think of Boolean functions and sets interchangeably.
Often Boolean functions are envisaged as voting schemes: one imagines having two
candidates 0 and 1 and n voters each of whom votes for one of the candidates. The winner
is the value of f(x) when the ith voter votes for candidate $x_i$. With this in mind we have
some examples.
Example (Dictatorships). The function f (x) = xi for some i is called a dictatorship
because the candidate chosen only depends on who the ith person voted for.
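Example (Majority). The majority function is the function
\[ f(x) := \begin{cases} 1 & \text{if } x_1 + \dots + x_n > n/2\\ 0 & \text{otherwise.} \end{cases} \]
Example (Parity). The parity function is the function
\[ f(x) := \begin{cases} 1 & \text{if } x_1 + \dots + x_n \text{ is odd}\\ 0 & \text{otherwise.} \end{cases} \]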
Example (Juntas). A function is a k-junta if there is some subset S ⊂ {1, . . . , n} such

that the value of f (x) only depends on the values of (xi )i∈S . In particular a dictatorship
is a 1-junta.
We shall be interested in questions such as how much influence a voter or block of voters
has in a system, but we should remark that we make no assumptions about the voting
system being particularly sensible. Indeed, it seems unlikely that one would choose to
implement the parity system above.
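The examples above can be sketched in a few lines of Python. This is only an illustration: the names `dictatorship`, `majority`, `parity` and `is_junta` are ours, voters are indexed from 0 rather than 1, and the junta test is a brute-force check over all $2^n$ inputs.

```python
from itertools import product

def dictatorship(i):
    """Voting scheme in which voter i (0-indexed) decides the outcome."""
    return lambda x: x[i]

def majority(x):
    """1 if strictly more than half of the voters vote 1, else 0."""
    return 1 if sum(x) > len(x) / 2 else 0

def parity(x):
    """1 if an odd number of voters vote 1, else 0."""
    return sum(x) % 2

def is_junta(f, n, S):
    """Brute-force check that f depends only on the coordinates in S."""
    coords = sorted(S)
    seen = {}
    for x in product((0, 1), repeat=n):
        key = tuple(x[i] for i in coords)
        # The first value seen for this restriction must agree with all later ones.
        if seen.setdefault(key, f(x)) != f(x):
            return False
    return True
```

For instance, `is_junta(dictatorship(1), 3, {1})` holds, whereas majority on three voters is not a junta on any two of them.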
4.1. The connection to dyadic groups and Beckner's inequality. The set $\{0,1\}^n$ may be endowed with a group (in fact vector space) structure in a natural way. We put
\[ (x + y)_i := x_i + y_i \pmod 2 \quad\text{for all } x, y\in\{0,1\}^n. \]
The resulting group is (isomorphic to) $\mathbb{F}_2^n$, and we shall denote it by G. What is different about this chapter is that we have implicitly chosen a set of generators: the canonical basis of $\{0,1\}^n$, denoted $e_1,\dots,e_n$, so that $e_i$ has zeros everywhere except the ith position where it has a 1. In particular, $x_i = x\cdot e_i$.
If G has the above form then we let $\gamma_1,\dots,\gamma_n$ be the maps $x\mapsto(-1)^{x_i}$. The vectors $\gamma_1,\dots,\gamma_n$ are independent and so form a basis for $\widehat{G}$, and we write |γ| for the number of characters from $\{\gamma_1,\dots,\gamma_n\}$ in the expression for γ.
A rather powerful tool in this setting (where a basis is specified) is called Beckner's inequality. Suppose $G = \{0,1\}^n$ (thought of as a group of exponent 2) and $\varepsilon\in(0,1]$. We define
\[ p_\varepsilon(x) := \prod_{i=1}^n (1 + \varepsilon\gamma_i(x)) = \prod_{i=1}^n (1 + \varepsilon(-1)^{x_i}). \]
We can then calculate the Fourier transform and see that
\[ \widehat{p_\varepsilon}(\gamma) = \sum_{S\subset[n]} \varepsilon^{|S|}\int \gamma\prod_{i\in S}\gamma_i\,d\mu_G = \varepsilon^{|\gamma|}. \]
We can now state Beckner's inequality which should remind you of the dual of Rudin's inequality since $\widehat{p_\varepsilon}$ is ε on the set $\{\gamma_1,\dots,\gamma_n\}$ and o(ε) everywhere else except the trivial character.
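As a sanity check on the Fourier transform computed above, the following sketch evaluates the coefficients of the product $\prod_i (1 + \varepsilon(-1)^{x_i})$ by brute force and compares them with $\varepsilon^{|S|}$, where S is the set of basis characters making up γ. The function names and the choice n = 3, ε = 0.3 in the usage note are illustrative assumptions, not part of the notes.

```python
from itertools import product

def p_eps(x, eps):
    """The product of (1 + eps * (-1)^{x_i}) over the coordinates of x."""
    out = 1.0
    for xi in x:
        out *= 1 + eps * (-1) ** xi
    return out

def fourier_coefficient(f, n, S):
    """Average of f(x) * gamma_S(x) over {0,1}^n, with gamma_S(x) = (-1)^{sum of x_i, i in S}."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        total += f(x) * (-1) ** sum(x[i] for i in S)
    return total / 2 ** n
```

With n = 3 and ε = 0.3, `fourier_coefficient(lambda x: p_eps(x, 0.3), 3, (0, 2))` agrees with 0.3² up to rounding.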
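Theorem 4.2 (Beckner's inequality). Suppose that G and $p_\varepsilon$ are as above. Then
\[ \|\widehat{p_\varepsilon}\widehat{f}\|_{\ell^2(\widehat{G})} = \|p_\varepsilon * f\|_{L^2(G)} \le \|f\|_{L^{1+\varepsilon^2}(G)} \quad\text{for all } f\in L^{1+\varepsilon^2}(G). \]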
We shall effectively prove this by taking tensor products of the n = 1 case which we
begin with in the following lemma.
Lemma 4.3 (The two-point lemma). For all reals a, b we have
\[ a^2 + \varepsilon^2b^2 \le \left(\frac{|a+b|^{1+\varepsilon^2} + |a-b|^{1+\varepsilon^2}}{2}\right)^{2/(1+\varepsilon^2)}. \]
Proof. Clearly we may assume that |a| ≥ |b| since the right hand side is symmetric in a and b and so dividing by |a| it follows that the result is proved if we can show that
\[ 1 + \varepsilon^2y^2 \le \left(\frac{(1+y)^{1+\varepsilon^2} + (1-y)^{1+\varepsilon^2}}{2}\right)^{2/(1+\varepsilon^2)} \quad\text{for all } |y|\le 1. \tag{4.1} \]
Both sides are continuous in y, so it suffices to prove the inequality for |y| < 1 where we can use the binomial theorem to expand out the powers of (1 ± y) on the right. Specifically, put $p = 1 + \varepsilon^2$ and note that
\begin{align*}
\frac{1}{2}\left((1+y)^{1+\varepsilon^2} + (1-y)^{1+\varepsilon^2}\right) &= \sum_{r=0}^{\infty}\binom{p}{r}\frac{1}{2}\left(y^r + (-y)^r\right)\\
&= \sum_{l=0}^{\infty}\binom{p}{2l}y^{2l} \ge 1 + \frac{p}{2}\varepsilon^2y^2
\end{align*}
since $\binom{p}{2l} \ge 0$ for all $l\in\mathbb{N}_0$. On the other hand $(1+x)^\theta \le 1 + \theta x$ for all θ ∈ [0, 1] and x ≥ 0, which applied with $x = \varepsilon^2y^2$ and θ = p/2 gives us (4.1). The result is proved.
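The two-point lemma, in the form $a^2 + \varepsilon^2b^2 \le \big((|a+b|^p + |a-b|^p)/2\big)^{2/p}$ with $p = 1 + \varepsilon^2$, is easy to test numerically. The sketch below (the function name `two_point_gap` is ours) returns the right hand side minus the left hand side, which should be non-negative up to rounding; it is a numerical illustration, not a proof.

```python
def two_point_gap(a, b, eps):
    """Right side minus left side of the two-point inequality with p = 1 + eps^2."""
    p = 1 + eps ** 2
    lhs = a * a + eps ** 2 * b * b
    rhs = ((abs(a + b) ** p + abs(a - b) ** p) / 2) ** (2 / p)
    return rhs - lhs
```

When ε = 1 the exponent p is 2 and the two sides coincide by the parallelogram law, so the gap is zero up to floating point error.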
Proof of Theorem 4.2. We proceed by induction on n. The base case is done by the two-point lemma. We write $G_j := \{0,1\}^j$ and define the operator $T_j$ as follows:
\[ T_j : L^2(G_j)\to L^p(G_j);\quad f\mapsto \int\prod_{i=1}^{j}\left(1 + \varepsilon(-1)^{x_i+y_i}\right)f(y_1,\dots,y_j)\,dy_1\dots dy_j, \]
and suppose that we have established the jth case of the induction. Given $f\in L^2(G_{j+1})$ write $f = g + \gamma_{j+1}h$ for two functions $g, h\in L^2(G_j)$, and note that
\[ T_{j+1}f = T_jg + \varepsilon\gamma_{j+1}T_jh. \]
In particular
\[ T_{j+1}f(x_1,\dots,x_{j+1}) = T_jg(x_1,\dots,x_j) + \varepsilon(-1)^{x_{j+1}}T_jh(x_1,\dots,x_j). \]
Now, put $p = 1 + \varepsilon^2$ and note that
\begin{align*}
\|T_{j+1}f\|_{L^2(G_{j+1})}^2 &= \int |T_jg(x)|^2 + \varepsilon^2|T_jh(x)|^2\,d\mu_{G_j}(x)\\
&\le \int\left(\frac{|T_j(g+h)(x)|^p + |T_j(g-h)(x)|^p}{2}\right)^{2/p}d\mu_{G_j}(x)
\end{align*}
by the two-point lemma. Now, put $X := G_j$ and $Y := \{0,1\}$ and define k on X × Y by
\[ k(x,y) := \begin{cases} |T_j(g+h)(x)|^p & \text{if } y = 0\\ |T_j(g-h)(x)|^p & \text{if } y = 1, \end{cases} \]
so that
\[ \int\left(\frac{|T_j(g+h)(x)|^p + |T_j(g-h)(x)|^p}{2}\right)^{2/p}d\mu_{G_j}(x) = \int\left(\int |k(x,y)|\,d\mu_Y(y)\right)^{2/p}d\mu_X(x). \]
By the integral triangle inequality (see Appendix A for a statement and proof) with q = 2/p we get that
\begin{align*}
\int\left(\int |k(x,y)|\,d\mu_Y(y)\right)^{2/p}d\mu_X(x) &\le \left(\int\left(\int |k(x,y)|^{2/p}\,d\mu_X(x)\right)^{p/2}d\mu_Y(y)\right)^{2/p}\\
&= \left(\frac{1}{2}\left(\left(\int |T_j(g+h)(x)|^2\,d\mu_{G_j}(x)\right)^{p/2} + \left(\int |T_j(g-h)(x)|^2\,d\mu_{G_j}(x)\right)^{p/2}\right)\right)^{2/p}.
\end{align*}
However, by the inductive hypothesis we have
\[ \left(\int |T_j(g+h)(x)|^2\,d\mu_{G_j}(x)\right)^{p/2} \le \|g+h\|_{L^p(G_j)}^p, \]
and similarly for g − h, hence (combining everything we have done so far)
\[ \|T_{j+1}f\|_{L^2(G_{j+1})}^2 \le \left(\frac{1}{2}\left(\|g+h\|_{L^p(G_j)}^p + \|g-h\|_{L^p(G_j)}^p\right)\right)^{2/p} = \|f\|_{L^p(G_{j+1})}^2. \]
The result is proved.
Again the name here is not quite right. Theorem 4.2 is more properly attributed to
Bonami [Bon70], although some point to Nelson [Nel73]. The additive combinatorial liter-
ature often refers to it as Beckner’s inequality and much of the computer science literature
to it as the Bonami-Beckner inequality.
4.4. Influence. Given a Boolean function f, the influence of voter i is denoted $\sigma_i(f)$ and is defined to be the probability that i changing his vote affects the outcome if all voters vote uniformly at random:
\[ \sigma_i(f) := \int |f_i|^2\,d\mu_G(x) \quad\text{where } f_i(x) = f(x) - f(x + e_i). \]
In particular, notice by Parseval that
\[ \sigma_i(f) = 4\sum_{\gamma\in\widehat{G}:\,\gamma(e_i) = -1} |\widehat{f}(\gamma)|^2. \tag{4.2} \]
Example (Influence in dictatorships). If f is a dictatorship with i the dictator then $\sigma_i(f) = 1$ and $\sigma_j(f) = 0$ for all j ≠ i. This is, perhaps, not altogether surprising.
Example (Influence in parity). If f is the parity function then it is easy to see that
σi (f ) = 1 for all i.
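The two examples, and the Fourier expression (4.2), can be checked by brute force for small n. In the sketch below the function names are ours, voters are 0-indexed, and a character γ is identified with the 0/1 vector s for which $\gamma(x) = (-1)^{s\cdot x}$, so that $\gamma(e_i) = -1$ exactly when `s[i] == 1`.

```python
from itertools import product

def influence(f, n, i):
    """Fraction of inputs x for which flipping coordinate i changes f(x)."""
    changed = 0
    for x in product((0, 1), repeat=n):
        y = x[:i] + (1 - x[i],) + x[i + 1:]
        changed += (f(x) - f(y)) ** 2
    return changed / 2 ** n

def fourier_coefficients(f, n):
    """Map each s in {0,1}^n to the average of f(x) * (-1)^(s.x)."""
    points = list(product((0, 1), repeat=n))
    return {
        s: sum(f(x) * (-1) ** sum(a * b for a, b in zip(s, x)) for x in points) / 2 ** n
        for s in points
    }

def influence_from_fourier(f, n, i):
    """Right hand side of (4.2): four times the energy on characters with s[i] == 1."""
    return 4 * sum(c * c for s, c in fourier_coefficients(f, n).items() if s[i] == 1)

def parity(x):
    return sum(x) % 2

def dictator(x):
    return x[0]
```

Both routes give influence 1 for every voter under parity, and influence 1 for the dictator and 0 for everyone else under a dictatorship.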
One of the central questions we want to ask is whether there is always a voter with large
influence. In some sense this is obviously not the case: if f has very small variance then
clearly no voter has a large influence. However, if we insist that the voting scheme is 'fair' in the sense that
\[ \int f\,d\mu_G \sim 1/2 \]
then we might hope to find an influential voter. A trivial estimate follows from (4.2): the total influence is
\[ I(f) := \sum_{i=1}^{n}\sigma_i(f) = 4\sum_{\gamma\in\widehat{G}} |\gamma||\widehat{f}(\gamma)|^2 \ge 4\sum_{\gamma\ne 0_{\widehat{G}}} |\widehat{f}(\gamma)|^2 = 4\operatorname{Var}(f). \]
If f is ‘fair’ then this is asymptotically 1 and hence, by averaging, there is a voter with
influence at least Ω(1/n). It turns out that there are examples where no voter has much
more influence than this.
Example (The tribes example). Suppose that
\[ f(x) := x_1\cdots x_k \vee x_{k+1}\cdots x_{2k} \vee \dots \vee x_{(r-1)k+1}\cdots x_{rk} \]
where ∨ denotes logical OR. First we need to determine a relationship between r and k that will make this function 'fair'. Specifically, then, we put
\[ \frac{1}{2} \sim \int f\,d\mu_G = 1 - (1 - 2^{-k})^r. \]
This tells us that we want to take $r \sim 2^k\log 2$, and of course putting n := rk we find that
\[ k = \log_2 n - \log_2\log n + O(1) \quad\text{and}\quad r \sim \frac{n}{\log_2 n}. \]
Now, by symmetry all voters have the same influence: so, for example, $x_1$ influences the outcome iff $x_{(i-1)k+1}\cdots x_{ik} = 0$ for all i ∈ {2, . . . , r} and $x_i = 1$ for all i ∈ {2, . . . , k}. The probability that $x_{(i-1)k+1}\cdots x_{ik} = 0$ for a given i is $1 - 2^{-k}$, so the probability that $x_1$ influences the outcome is
\[ 2^{1-k}(1 - 2^{-k})^{r-1} \le 2^{1-k}\exp(-2^{-k}(r-1)) = O\left(\frac{\log n}{n}\right). \]
That is to say, none of the voters has very much influence.
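For small parameters the tribes computation can be confirmed by exhaustive enumeration. The sketch below uses hypothetical helper names and 0-indexed voters; it compares the brute-force influence with the closed form $2^{1-k}(1-2^{-k})^{r-1}$ and the mean of f with $1 - (1-2^{-k})^r$.

```python
from itertools import product

def tribes(x, k):
    """OR over consecutive blocks of size k of the AND of each block."""
    return int(any(all(x[j:j + k]) for j in range(0, len(x), k)))

def influence(f, n, i):
    """Fraction of inputs x for which flipping coordinate i changes f(x)."""
    flips = 0
    for x in product((0, 1), repeat=n):
        y = x[:i] + (1 - x[i],) + x[i + 1:]
        flips += f(x) != f(y)
    return flips / 2 ** n

def mean(f, n):
    """Average of f over {0,1}^n."""
    return sum(f(x) for x in product((0, 1), repeat=n)) / 2 ** n
```

With k = 2 and r = 3 (so n = 6) every voter has influence 9/32, matching the formula.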
Interestingly, Beckner’s inequality was used by Kahn, Kalai and Linial in [KKL88] to
establish that the tribes upper bound has a matching lower bound although the argument
is more complicated than the trivial averaging earlier.
Theorem 4.5 (KKL). Suppose that f is a Boolean function with Var(f) = Ω(1). Then there is some i such that
\[ \sigma_i(f) = \Omega\left(\frac{\log n}{n}\right). \]
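Proof. By Beckner's inequality with ε = 1/2 we have that
\begin{align*}
\sum_{|\gamma|\le d,\,\gamma(e_i)=-1} |\widehat{f}(\gamma)|^2 &\le 4^d\sum_{\gamma(e_i)=-1} 2^{-2|\gamma|}|\widehat{f}(\gamma)|^2\\
&= 4^{d-1}\|p_{1/2} * f_i\|_{L^2(G)}^2 \le 4^{d-1}\|f_i\|_{L^{5/4}(G)}^2 = 4^{d-1}\sigma_i(f)^{8/5}.
\end{align*}
It follows that if $\sigma_i(f) \le n^{-5/6}$ for all i (which we may as well assume since otherwise we'd be done) then
\[ \sum_{|\gamma|\le d} |\gamma||\widehat{f}(\gamma)|^2 \le 4^{d-1}n\cdot(n^{-5/6})^{8/5} \le 4^{d-1}n^{-1/3}. \]
Now pick d = Ω(log n) such that this is at most Var(f)/2 for sufficiently large n. Then
\[ \operatorname{Var}(f) = \sum_{\gamma\ne 0_{\widehat{G}}} |\widehat{f}(\gamma)|^2 \le \sum_{\gamma:|\gamma|>d} |\widehat{f}(\gamma)|^2 + \operatorname{Var}(f)/2, \]
and it follows immediately that
\[ I(f) = 4\sum_{\gamma} |\gamma||\widehat{f}(\gamma)|^2 \ge d\sum_{\gamma:|\gamma|>d} |\widehat{f}(\gamma)|^2 = \Omega(d) = \Omega(\log n), \]
and we are done by averaging.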

What we actually showed in this argument is that if the individual influences are small,
then the total influence is rather large. It turns out that we can establish something
stronger, namely that if f has small total influence then it is in some sense close to a junta.
Theorem 4.6 (Friedgut). Suppose that f is a Boolean function with I(f) ≤ C and η ∈ (0, 1] is a parameter. Then there is an $\exp(O(C\eta^{-1}))$-junta g such that $\|f - g\|_{L^2(G)}^2 \le \eta$.
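Proof. Suppose that V is a subspace of G and put
\[ g(x) = \begin{cases} 1 & \text{if } f * \mu_V(x) > 1/2\\ 0 & \text{otherwise.} \end{cases} \]
Note that g is also Boolean, constant on cosets of V and we have the point-wise estimate
\[ |f(x) - g(x)| \le 2|f(x) - f * \mu_V(x)| \quad\text{for all } x\in G. \]
Hence
\[ \|f - g\|_{L^2(G)}^2 \le 4\|f - f * \mu_V\|_{L^2(G)}^2 = 4\sum_{\gamma\notin V^{\perp}} |\widehat{f}(\gamma)|^2. \]
Of course control of the total influence implies that the energy is concentrated on low levels:
\[ d\sum_{\gamma:|\gamma|>d} |\widehat{f}(\gamma)|^2 \le \sum_{\gamma\in\widehat{G}} |\gamma||\widehat{f}(\gamma)|^2 \le C, \]
and so
\[ \sum_{\gamma\notin V^{\perp}} |\widehat{f}(\gamma)|^2 \le 4^d\sum_{\gamma\notin V^{\perp}:|\gamma|\le d} 2^{-2|\gamma|}|\widehat{f}(\gamma)|^2 + \frac{C}{d}. \]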
Write $I := \{i : \sigma_i(f) \ge \tau\}$ (whence $|I| \le \tau^{-1}C$) and let $V = \{\gamma_i : i\in I\}^{\perp}$, so that if $\gamma\notin V^{\perp}$ then there is some i ∉ I such that $\gamma(e_i) = -1$. It follows that
\begin{align*}
\sum_{\gamma\notin V^{\perp}:|\gamma|\le d} 2^{-2|\gamma|}|\widehat{f}(\gamma)|^2 &\le \sum_{i\notin I}\sum_{\gamma(e_i)=-1} 2^{-2|\gamma|}|\widehat{f}(\gamma)|^2\\
&= 4^{-1}\sum_{i\notin I}\|p_{1/2} * f_i\|_{L^2(G)}^2 \le \sum_{i\notin I}\sigma_i(f)\tau^{3/5} \le C\tau^{3/5},
\end{align*}
where the $f_i$s are as in the previous proof. Combining all this we conclude that
\[ \|f - g\|_{L^2(G)}^2 \le C4^d\tau^{3/5} + \frac{C}{d}. \]
Putting $d = \lceil 2C\eta^{-1}\rceil$ we see that we can take $\tau = \exp(-O(C\eta^{-1}))$ and ensure that the difference is at most η. The result is proved since g is clearly an $O(\tau^{-1}C)$-junta.
Note that in both proofs we were only interested in using Beckner's inequality for some ε < 1 − Ω(1), rather than a very small value as we did in Chapter 3. On the other hand we needed the additional strength of Beckner to estimate the $\ell^2$-mass of $\widehat{f}$ on the sets {γ : |γ| = d}.
5. Approximate structures: sets with small sumset

In this chapter we are interested in approximate substructures of Fn2 . The reason for
studying them is that approximate structures are often far more plentiful than exact ones
(by virtue of having relaxed requirements), but frequently still support a lot of analysis
making them almost as useful as their exact counterparts. Our basic aim is to tease out
the structure of ‘approximate subspaces’.
Suppose that H is an affine subspace of G. Then it is easy to see that |H + H| = |H|.
Conversely, if A ⊂ G has |A + A| = |A| then it is not much harder to see that A + A is a
subspace of G, and hence A is an affine subspace. It turns out that if the sumset is only a
bit bigger then much of this phenomenon persists.
Proposition 5.1. Suppose that A ⊂ G is a non-empty set with |A + A| < 1.5|A|. Then
A + A is a subspace of G.
To prove this it will be convenient to introduce the notion of a symmetry set. The
symmetry set of a (non-empty) set A at threshold η is
\[ \mathrm{Sym}_\eta(A) := \{x\in G : 1_A * 1_A(x) \ge \eta\mu_G(A)\}. \]
Note that Symη (A) is a (symmetric) neighbourhood of 0G contained in A+A; these sets are
particularly useful because of the following trivial application of the pigeonhole principle.
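Lemma 5.2. Suppose that A ⊂ G is a non-empty set and ε, ε′ ∈ [0, 1) are parameters. Then
\[ \mathrm{Sym}_{1-\varepsilon}(A) + \mathrm{Sym}_{1-\varepsilon'}(A) \subset \mathrm{Sym}_{1-(\varepsilon+\varepsilon')}(A). \]
Proof. Suppose that $x\in\mathrm{Sym}_{1-\varepsilon}(A)$ and $y\in\mathrm{Sym}_{1-\varepsilon'}(A)$. Then
\begin{align*}
1_A * 1_A(x+y) &= \mu_G((x+A)\cap(A+y))\\
&\ge \mu_G(((x+A)\cap A)\cap((A+y)\cap A))\\
&\ge \mu_G((x+A)\cap A) + \mu_G((A+y)\cap A) - \mu_G(((x+A)\cap A)\cup((A+y)\cap A))\\
&\ge 1_A * 1_A(x) + 1_A * 1_A(y) - \mu_G(A)\\
&\ge (1 - \varepsilon + 1 - \varepsilon' - 1)\mu_G(A),
\end{align*}
and $x + y\in\mathrm{Sym}_{1-(\varepsilon+\varepsilon')}(A)$. The result follows.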
Notice, in particular, that Sym1 (A) is a subspace of G.
Proof of Proposition 5.1. Suppose that a, a′ ∈ A and write $K := \mu_G(A+A)/\mu_G(A)$. Then
\begin{align*}
1_A * 1_A(a + a') &= \mu_G((a+A)\cap(A+a'))\\
&\ge \mu_G(a+A) + \mu_G(A+a') - \mu_G((a+A)\cup(A+a'))\\
&\ge 2\mu_G(A) - \mu_G(A+A) \ge (2-K)\mu_G(A).
\end{align*}
It follows that $A - A\subset\mathrm{Sym}_{2-K}(A)$ and hence, by the previous lemma, we have that
\[ (A+A) + (A+A) \subset \mathrm{Sym}_{2-K}(A) + \mathrm{Sym}_{2-K}(A) \subset \mathrm{Sym}_{3-2K}(A) \subset A + A \]
since K < 3/2; it follows that (A + A) + (A + A) = (A + A). On the other hand it is also a (symmetric) neighbourhood of the identity so it follows that it is a subspace.
To make good use of Lemma 5.2 we needed the symmetry sets we found to be large.
For large threshold values this will typically not be the case, however, a simple application
of Cauchy-Schwarz actually shows that there are some not too small threshold values for
which it is.
Lemma 5.3. Suppose that A has |A + A| ≤ K|A|. Then
\[ \mu_G(\mathrm{Sym}_{1/2K}(A)) \ge \mu_G(A)/2K. \]
Proof. First of all by the Cauchy-Schwarz inequality we have that
\[ \int (1_A * 1_A)^2\,d\mu_G \ge \frac{1}{\mu_G(A+A)}\left(\int 1_A * 1_A\,d\mu_G\right)^2 \ge \mu_G(A)^3/K. \]
On the other hand
\[ \int_{\mathrm{Sym}_{1/2K}(A)^c} (1_A * 1_A)^2\,d\mu_G \le \frac{\mu_G(A)}{2K}\int 1_A * 1_A\,d\mu_G = \frac{\mu_G(A)^3}{2K}, \]
and it follows by the triangle inequality that
\[ \frac{\mu_G(A)^3}{2K} \le \int_{\mathrm{Sym}_{1/2K}(A)} (1_A * 1_A)^2\,d\mu_G \le \mu_G(A)^2\mu_G(\mathrm{Sym}_{1/2K}(A)) \]
and the result follows on dividing out $\mu_G(A)^2$.

At a certain point the phenomenon in Proposition 5.1 starts to fail. Indeed, if $A = \{0_G, e_1, e_2, e_3\}$ then
\[ A + A = \{0_G, e_1, e_2, e_3, e_1+e_2, e_2+e_3, e_1+e_3\}, \]
and so |A + A| ≤ 1.75|A|, but A + A is not a subspace of G.
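This example, and Proposition 5.1 itself in a very small group, can be checked mechanically. In the sketch below elements of $\mathbb{F}_2^n$ are encoded as integers with addition given by XOR (so $e_1, e_2, e_3$ become 1, 2, 4); this encoding and the function names are our own choices for illustration.

```python
from itertools import combinations

def sumset(A, B):
    """A + B in F_2^n, with vectors encoded as integers and addition as XOR."""
    return {a ^ b for a in A for b in B}

def is_subspace(V):
    """A subset of F_2^n is a subspace iff it contains 0 and is closed under addition."""
    return 0 in V and all((u ^ v) in V for u in V for v in V)

def small_doubling_gives_subspace(n):
    """Check Proposition 5.1 exhaustively over all non-empty subsets of F_2^n."""
    elements = range(2 ** n)
    for size in range(1, 2 ** n + 1):
        for A in combinations(elements, size):
            AA = sumset(A, A)
            # |A + A| < 1.5|A| written without fractions.
            if 2 * len(AA) < 3 * len(A) and not is_subspace(AA):
                return False
    return True
```

For A = {0, 1, 2, 4} the sumset has 7 elements and misses 7 = 1 ^ 2 ^ 4, so it is not closed under addition.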
Despite this example it turns out that a covering argument of Ruzsa [Ruz99] shows that A is contained in a subspace which is not too large. This is the $\mathbb{F}_2^n$ analogue of Freĭman's theorem [Fre73] for subsets of Z with small sumset.
Theorem 5.4 (Freĭman's theorem for $\mathbb{F}_2^n$). Suppose that A ⊂ G is non-empty with |A + A| ≤ K|A|. Then ⟨A⟩, the group generated by A, has size at most $\exp(O(K^{O(1)}))|A|$.
The constant K is called the doubling constant of A, and roughly we think of A as
having ‘small’ doubling if K = O(1).
The theorem is not far from best possible. Indeed, suppose that A is a set of K linearly independent elements. Then |A + A| ≤ K|A|, but $|\langle A\rangle| \ge 2^K$, whence the bound on the size of ⟨A⟩ cannot be exp(o(K))|A|. In fact a rather precise answer has been given by Green and Tao in [GT09] using compressions.
The core Ruzsa covering argument takes the following form.
Lemma 5.5 (Ruzsa's covering lemma). Suppose that A, S ⊂ G are non-empty with |A + S| ≤ K|S|. Then there is a set X with |X| ≤ K such that A ⊂ X + S + S.
Proof. Let X ⊂ A be a maximal S-separated subset, meaning maximal such that if x, x′ ∈ X are distinct then (x + S) ∩ (x′ + S) = ∅. By separation we see that the sets x + S are disjoint subsets of A + S whence |X|·|S| ≤ |A + S| ≤ K|S| and |X| ≤ K.
On the other hand, by maximality we see that if x ∈ A then either x ∈ X ⊂ X + S + S, or else x ∉ X and so there is some x′ ∈ X such that (x + S) ∩ (x′ + S) ≠ ∅ and hence x ∈ x′ + S + S ⊂ X + S + S. The lemma is proved.
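The proof is constructive: a maximal S-separated subset can be built greedily. The following sketch does this with elements of $\mathbb{F}_2^n$ encoded as integers under XOR; the encoding and the names `translate` and `ruzsa_cover` are assumptions made for the illustration.

```python
def translate(S, x):
    """The translate x + S in F_2^n (integers under XOR)."""
    return {x ^ s for s in S}

def ruzsa_cover(A, S):
    """Greedily build a maximal subset X of A whose translates x + S are pairwise disjoint."""
    X = []
    for a in sorted(A):
        # Keep a only if a + S misses every translate chosen so far.
        if all(translate(S, a).isdisjoint(translate(S, x)) for x in X):
            X.append(a)
    return X
```

Any element of A rejected by the loop has a translate meeting some x + S with x in X, which is exactly the maximality used in the proof, so A ⊂ X + S + S and |X||S| ≤ |A + S|.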
Corollary 5.6. Suppose that S ⊂ G is non-empty with |4S| ≤ K|S|. Then ⟨S⟩ has size at most $K2^K|S|$.
Proof. We apply the previous lemma with A = 3S to get a set X with |X| ≤ K such that 3S ⊂ X + 2S. It follows by induction that nS ⊂ (n − 2)X + 2S, and hence ⟨S⟩ ⊂ ⟨X⟩ + 2S. On the other hand $|\langle X\rangle| \le 2^K$, and we certainly have |⟨S⟩| ≤ |⟨X⟩||2S| so the result follows.
Ideally we should now like to prove that |A + A| ≤ K|A| implies that $|4A| \le K^{O(1)}|A|$. This is true and a special case of the Plünnecke-Ruzsa inequalities which show that
\[ |A + A| \le K|A| \Rightarrow |nA| \le K^n|A|. \]
The proofs use graph theoretic methods and take some time, so we shall follow an easier route. Those interested in the Plünnecke-Ruzsa arguments may wish to consult [TV06, Chapter 6].
We shall work with symmetry sets again because they are somewhat smoother than
arbitrary sets with small doubling. This means that the next argument of Tao [Tao08,
Proposition 4.5] will give us a Plünnecke-type handle on the growth of symmetry sets.
Proposition 5.7. Suppose that A has |A + A| ≤ K|A|. Then
\[ |n\,\mathrm{Sym}_c(A) + A + A| \le c^{-n}K^{n+1}|A|. \]
Proof. Suppose that $s_1,\dots,s_n\in\mathrm{Sym}_c(A)$ and a, a′ ∈ A, and note that
\[ 1_{2A}^{(n+1)}(s_1 + \dots + s_n + a + a') = \frac{1}{|G|^n}\sum_{z_1+\dots+z_{n+1} = s_1+\dots+s_n+a+a'} 1_{2A}(z_1)\cdots 1_{2A}(z_{n+1}). \]
On the other hand if $(b_1,\dots,b_n)\in G^n$ and we put
\[ z_1 = a + b_1,\ z_2 = b_1 + s_1 + b_2,\ \dots,\ z_n = b_{n-1} + s_{n-1} + b_n,\ z_{n+1} = b_n + s_n + a' \]
then
\[ z_1 + \dots + z_{n+1} = s_1 + \dots + s_n + a + a' \]
and each vector $b\in G^n$ determines a unique vector $z\in G^{n+1}$. It follows that
\begin{align*}
1_{2A}^{(n+1)}(s_1 + \dots + s_n + a + a') &\ge \frac{1}{|G|^n}\sum_{b_1,\dots,b_n} 1_{2A}(a + b_1)\cdots 1_{2A}(b_n + s_n + a')\\
&\ge \frac{1}{|G|^n}\sum_{b_1,\dots,b_n} 1_A(a)1_A(b_1)\cdots 1_A(b_n + s_n)1_A(a')\\
&= \prod_{i=1}^{n} 1_A * 1_A(s_i) \ge (c\mu_G(A))^n.
\end{align*}
Thus
\[ \mu_G(n\,\mathrm{Sym}_c(A) + A + A)(c\mu_G(A))^n \le \int 1_{2A}^{(n+1)}\,d\mu_G \le K^{n+1}\mu_G(A)^{n+1} \]
and the result follows on rearrangement.
Proof of Theorem 5.4. By Lemma 5.3 we see that $S := \mathrm{Sym}_{1/2K}(A)$ has |S| ≥ |A|/2K. Then by Proposition 5.7 we have that
\[ |4S| \le |4S + 2A| \le (2K)^4K^5|A| = O(K^{O(1)}|S|). \]
It follows by Corollary 5.6 that the group generated by S has size at most
\[ |\langle S\rangle| \le \exp(O(K^{O(1)}))|S| \le \exp(O(K^{O(1)}))|4S + 2A| \le \exp(O(K^{O(1)}))|A|. \]
Finally, by Ruzsa's covering lemma we have a set X of size $O(K^{O(1)})$ such that
\[ A \subset X + S + S \subset \langle X\rangle + \langle S\rangle \]
and the result follows.

6. Correlation with approximate structures

A quantity which is closely connected to the doubling constant, and also to symmetry
sets is the additive energy of a set. Specifically, if A ⊂ G then the additive energy of A is defined to be
\[ E(A) := \sum_{x+y=z+w} 1_A(x)1_A(y)1_A(z)1_A(w) = \sum_{u}\left(\sum_{x+y=u} 1_A(x)1_A(y)\right)^2. \]
In our usual notation this is
\[ E(A) = |G|^3\int (1_A * 1_A)^2\,d\mu_G, \]
and in words it is the number of additive quadruples in A, that is the number of quadruples $(x, y, z, w)\in A^4$ such that x + y = z + w. It is easy to see that
\[ E(A) \le \sup_{u}\sum_{x+y=u} 1_A(x)1_A(y)\cdot\sum_{u}\sum_{x+y=u} 1_A(x)1_A(y) \le |A|\cdot|A|^2 = |A|^3. \tag{6.1} \]
On the other hand, if |A + A| ≤ K|A| then by the same application of Cauchy-Schwarz as in Lemma 5.3 we get that
\[ E(A) = \sum_{u}\left(\sum_{x+y=u} 1_A(x)1_A(y)\right)^2 \ge \frac{1}{|A+A|}\left(\sum_{u}\sum_{x+y=u} 1_A(x)1_A(y)\right)^2 \ge |A|^3/K, \]
which is to say it is close to the maximum value in (6.1). Of course this maximum in (6.1) is achieved by affine subspaces since another way of thinking of the quantity E(A) is as the number of triples $(x, y, z)\in A^3$ which have x + y − z ∈ A; if a non-empty set A has this property, it is well known that it is a coset of a subgroup.
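The additive energy and the two bounds above are straightforward to compute for small sets. The sketch below (with elements of $\mathbb{F}_2^n$ encoded as integers under XOR, an encoding chosen here for illustration) counts representations r(u) of each u as x + y with x, y ∈ A and sums their squares.

```python
from collections import Counter

def additive_energy(A):
    """Number of quadruples (x, y, z, w) in A^4 with x + y = z + w, addition being XOR."""
    representations = Counter(a ^ b for a in A for b in A)
    return sum(count * count for count in representations.values())

def sumset_size(A):
    """|A + A| in F_2^n."""
    return len({a ^ b for a in A for b in A})
```

For the subspace {0, 1, 2, 3} the energy is 4³ = 64, the maximum allowed by (6.1). For A = {0, 1, 2, 4} it is 40, which sits between $|A|^4/|A+A| = 256/7$ and $|A|^3 = 64$.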
The additive energy is a particularly useful quantity because it is very stable under small
perturbations of the underlying set. Indeed, if we add or remove o(|A|) elements from A
then the additive energy changes by o(|A|3 ) which is not much if E(A) = Ω(|A|3 ). On the
other hand, because of this stability under small perturbations, large additive energy does
not imply small doubling.
To see this concretely, consider the example of A as a subspace V union η|A| independent elements of G whose span intersects V in the trivial vector. It is easy to see that
\[ E(A) \ge (1 - O(\eta))|A|^3 \quad\text{and}\quad |A + A| = \Omega(\eta|A|^2). \]
In this example, A nevertheless has a large structured part and fortunately this can be
recovered.
Theorem 6.1 (Balog-Szemerédi-Gowers). Suppose that A ⊂ G satisfies $E(A) \ge c|A|^3$. Then there is a subset A′ ⊂ A with $|A'| \ge c^{O(1)}|A|$ such that $|A' + A'| \le c^{-O(1)}|A'|$.
We shall prove this using symmetry sets because in this regard it does turn out that sets
with large additive energy behave in a similar way to sets with small sumset. Indeed, the
reader may wish to compare the next lemma with Lemma 5.3.
Lemma 6.2. Suppose that A ⊂ G has $E(A) \ge c|A|^3$. Then
\[ c^{-1}\mu_G(A) \ge \mu_G(\mathrm{Sym}_{c/2}(A)) \ge c\mu_G(A)/2 \quad\text{and}\quad \langle 1_A * 1_A, 1_{\mathrm{Sym}_{c/2}(A)}\rangle_{L^2(G)} \ge c\mu_G(A)^2/2. \]
Proof. The first inequality is a simple application of the triangle inequality:
Z
cµG (A)
µG (Symc/2 (A)) 6 1A ∗ 1A dµG = µG (A)2 .
2 Symc/2 (A)
The third inequality follows since

1
h1A ∗ 1A , 1Symc/2 (A) iL2 (G) > h(1A ∗ 1A )2 , 1Symc/2 (A) iL2 (G)
µG (A)
Z
1
= (1A ∗ 1A )2 dµG
µG (A) Symc/2 (A)
Z Z !
1
= (1A ∗ 1A )2 dµG − (1A ∗ 1A )2 dµG
µG (A) Symc/2 (A)c

1 cµG (A)
> cµG (A)2 − µG (A)2
µG (A) 2
2
> cµG (A) /2,
and hence the second inequality follows immediately by Hölder’s inequality.
The aim now is to prove Theorem 6.1 in two parts following Sudakov, Szemerédi and
Vu [SSV05]. The result was first proved by Balog and Szemerédi in [BS94], and with good
bounds by Gowers in [Gow98].
In the next lemma we shall find a large subset A0 of A such that almost all pairs (x, y) ∈
A02 have x + y in a symmetry set of A. In particular this means that almost all of the pairs
in A02 represent one of at most O(|A|) elements. This is close to having small doubling
and we then complete the proof by a pigeonhole argument.
Lemma 6.3. Suppose that A ⊂ G has $E(A) \ge c|A|^3$ and ε ∈ (0, 1] is a parameter. Then there is a subset A′ ⊂ A with |A′| = Ω(c|A|) such that
\[ |\{(x, y)\in A'^2 : x + y\in\mathrm{Sym}_{\varepsilon c^2/2}(A)\}| \ge (1 - \varepsilon)|A'|^2. \]
Proof. We let X be a random variable such that
1A ∗ 1A (z)
P(X = z) = ,
µG (A)|A|
and put A0 := A ∩ (X + A). (Note that this is a valid probability distribution for X.) Then
1 X 1
E|A0 | = E|G|1A ∗ 1A (X) = 2
1A ∗ 1A (z)2 = E(A) > c|A|,
µG (A) z∈G |A|2
and so, by the Cauchy-Schwarz inequality, we have that E|A0 |2 > c2 |A|2 .
On the other hand for x, y ∈ A we have

P((x, y) ∈ A02 ) = P(x ∈ X + A, y ∈ X + A)
= P(X ∈ (x + A) ∩ (y + A))
6 sup P(X = z).|(x + A) ∩ (y + A)|
z∈G
1A ∗ 1A (x + y)
6 .
µG (A)
If we now write B := {(x, y) ∈ A02 : x + y 6∈ Symc2 /2 (A)} then
X
E|B| = P((x, y) ∈ A02 ) 6 |A|2 .c2 /2 6 E|A0 |2 .
x,y∈A
2
x+y6∈Sym 2 (A)
c /2
It follows that
1 1
E(|A0 |2 − −1 |B|) > E|A0 |2 > c2 |A|2
2 2 √
and we may pick X such that |A | − |B|) > c |A| /2, and hence |A0 | > c|A|/ 2 and
0 2 −1 2 2
|B| 6 |A0 |2 . The result is proved.

Proof of Theorem 6.1. Apply Lemma 6.3 with3 = 1/6 to see that there is a set A00 ⊂ A
with |A00 | = Ω(c|A|) such that
(6.2) |{(x, y) ∈ A002 : x + y ∈ Symc2 /12 (A)}| > (1 − 1/6)|A00 |2 .
In words this says that for most pairs (x, y) ∈ A002 we have x + y ∈ Symc2 /12 (A). We let
A0 be the set of xs in A00 such that there are many ys in A00 with x + y ∈ Symc2 /12 (A). In
particular, for each x ∈ A00 write
Nx := {y ∈ A00 : x + y ∈ Symc2 /12 (A)}
and put
A0 := {x ∈ A00 : |Nx | > 2|A00 |/3}.
The total number of pairs (x, y) ∈ A00 such that x + y ∈ Symc2 /12 (A) is then at most
|A0 |. max0 |Nx | + |A00 \ A0 |. max
00 0
|Nx | 6 |A0 |.|A00 | + (|A00 | − |A0 |).2|A00 |/3
x∈A x∈A \A
= |A0 ||A00 |/3 + 2|A00 |2 /3.

On the other hand combining this with (6.2) we see that
|A0 ||A00 |/3 + 2|A00 |2 /3 > (1 − 1/6)|A00 |2 ,
and hence |A0 | > |A00 |/2.
Now, if (x, y) ∈ A02 then |Nx |, |Ny | > 2|A00 |/3 and Nx , Ny ⊂ A00 . It follows that
|Nx ∩ Ny | = |Nx | + |Ny | − |Nx ∪ Ny | > |Nx | + |Ny | − |A00 | > |A00 |/3;
3The reason for the choice of will become clear later.
in words, there are at least |A″|/3 elements z ∈ A″ such that $x + z\in\mathrm{Sym}_{c^2/12}(A)$ and
y + z ∈ Symc2 /12 (A). We conclude that if x, y ∈ A0 then
Z
1A ∗ 1A ∗ 1A ∗ 1A (x + y) = 1A ∗ 1A (x + y + z)1A ∗ 1A (z)dµG (z)
Z
= 1A ∗ 1A (x + z)1A ∗ 1A (y + z)dµG (z)
Z
> 1A ∗ 1A (x + z)1A ∗ 1A (y + z)1A00 (z)dµG (z)
> (c2 µG (A)/12)2 µG (A00 )/3,

whence
µG (A0 + A0 ).(c2 µG (A)/12)2 µG (A00 )/3 6 µG (A)4 .
This rearranges to give
µG (A0 + A0 ) 6 O(c−4 µG (A)2 µG (A00 )−1 ) = O(c−6 µG (A0 )),
We now turn our attention to rough morphisms. A map φ : G → G is a morphism
(linear map) if
φ(x + y) = φ(x) + φ(y) for all x, y ∈ G.
Our next result combines the work of this section and the last to show that if a function φ
satisfies this relationship for many pairs (x, y) then it is equal to a genuine morphism on
a large set. The result itself is due to Samorodnitsky [Sam07] by an argument which was
originally developed for the integers by Gowers in [Gow98].
Theorem 6.4 (Rough morphisms). Suppose that φ : G → G is such that
\[ \mu_{G^2}(\{(x, y) : \varphi(x + y) = \varphi(x) + \varphi(y)\}) \ge c. \]
Then there is a homomorphism θ : G → G such that
\[ \mu_G(\{x : \varphi(x) = \theta(x)\}) \ge \exp(-O(c^{-O(1)})). \]
Proof. We examine the set A := {(x, φ(x)) : x ∈ G} ⊂ G2 which has size |G|. It turns out
that this has large additive energy:
 2
X X X
E(A) = 1 > 1


x+y=z+w u x+y=u
φ(x)+φ(y)=φ(z)+φ(w) φ(x)+φ(y)=φ(u)
 2
1 X X
> 1 > c2 |G|3 = c2 |A|3 .

|G|

u x+y=u
φ(x)+φ(y)=φ(u)
By the Balog-Szemerédi-Gowers lemma there is a set A0 ⊂ A with |A0 | > cO(1) |A| and
|A0 + A0 | 6 c−O(1) |A0 |.
Let π : G2 → G be the natural co-ordinate projection map (x, y) 7→ x so that π(A) = G

and |π(A0 )| > cO(1) |G|, and let b1 , . . . , bm be a maximal4 set of linearly independent elements
in π(A0 ). Now, by maximality if a ∈ π(A0 ) then either a = bi for some i or {a, b1 , . . . , bm }
is linearly dependent. In the second case
Xm
µ.a + µi .bi = 0G for some µ, µ1 , . . . , µm ∈ F2
i=1
not all zero. Since b1 , . . . , bm are linearly independent we see that µ 6= 0, hence µ = 1
and a ∈ hb1 , . . . , bm i. Thus in either case a ∈ hb1 , . . . , bm i and so it follows that π(A0 ) ⊂
hb1 , . . . , bm i, and hence
cO(1) |G| 6 2m ,
so m > n − O(log c−1 ). Now let v1 , . . . , vm ∈ G be such that (bi , vi ) ∈ hA0 i for all
i ∈ {1, . . . , m} and put H := h(b1 , v1 ), . . . , (bm , vm )i.
It is easy to see that π(H) = π(hA0 i) and so |H| > cO(1) |G|. Moreover, by Freı̆man’s
theorem |hA0 i| 6 exp(O(c−O(1) ))|G|, and hence |hA0 i/H| 6 exp(O(c−O(1) )). The cosets of
H in hA0 i partition hA0 i and hence
X
|A0 | = |A0 ∩ hA0 i| = |A0 ∩ W | 6 |hA0 i/H| sup |A0 ∩ W |.
W ∈hA0 i/H
W ∈hA0 i/H
It follows that there is some coset W = (w, z) + H with (w, z) ∈ hA0 i such that
|((w, z) + H) ∩ A0 | > exp(−O(c−O(1) ))|A0 | = exp(−O(c−O(1) ))|G|.
Since π(H) = π(hA0 i) we see that w + π(H) = π(H), so we may assume that w = 0G and
hence
|((0G , z) + H) ∩ A| > |((0G , z) + H) ∩ A0 | > exp(−O(c−O(1) ))|G|.
If z were also to be 0G we’d be done as we could define θ to be the linear extension of
θ(bi ) = vi ; unfortunately this is not true, but we can correct the situation.
Claim. We may assume that there is some $j$ with $1 \le j \le m$ such that for at least $1/4$ of
elements $a \in \pi(((0_G,z)+H) \cap A)$ we have $a \cdot b_j = 1$.
Proof. Write $P := \pi(((0_G,z)+H) \cap A)$, which has $\mu_G(P) \ge \exp(-O(c^{-O(1)}))$, and define
characters $\lambda_i(x) := (-1)^{x \cdot b_i}$. Suppose that $|\widehat{1_P}(\lambda_i)| > \mu_G(P)/2$ for all $1 \le i \le m$. Then by
Chang's theorem, since $b_1,\dots,b_m$ are independent, we have
\[ n - O(\log c^{-1}) \le m = O(\log \mu_G(P)^{-1}) = O(c^{-O(1)}). \]
It follows that $n = O(c^{-O(1)})$ and we are trivially done since the lower bound in the
conclusion simply asserts that $\phi(x) = \theta(x)$ for at least one $x$; defining such a $\theta$ is trivial. It
follows that we may assume there is some $1 \le j \le m$ such that $|\widehat{1_P}(\lambda_j)| \le \mu_G(P)/2$. Now,
write
\[ P^+ := \{x \in P : x \cdot b_j = 1\} \quad\text{and}\quad P^- := \{x \in P : x \cdot b_j = 0\}. \]
From the definition of the Fourier transform and $P$ as a disjoint union of $P^+$ and $P^-$ we
have
\[ |\mu_G(P^+) - \mu_G(P^-)| \le \mu_G(P)/2 \quad\text{and}\quad \mu_G(P^+) + \mu_G(P^-) = \mu_G(P). \]
It follows that $\mu_G(P^+) \ge \mu_G(P)/4$, that is $a \cdot b_j = 1$ for at least $1/4$ of the elements
$a \in P$.
Footnote 4. The reader may care to compare this use of maximality with that in Ruzsa's covering lemma, Lemma
5.5.
Finally, we extend the vectors $b_1,\dots,b_m$ by $b_{m+1},\dots,b_n$ such that $b_1,\dots,b_n$ is a basis
for $G$ and define $\theta$ by linear extension from its definition on the basis $(b_i)_i$:
\[ \theta(b_i) = \begin{cases} v_i & \text{if } 1 \le i \le m,\ i \neq j, \\ v_j + z & \text{if } i = j, \\ 0_G & \text{otherwise.} \end{cases} \]
Now, suppose that $a \in \pi(((0_G,z)+H) \cap A)$ has $a \cdot b_j = 1$. Then
\[ a = \sum_{i : a \cdot b_i = 1} b_i \quad\text{and}\quad \phi(a) = z + \sum_{i : a \cdot b_i = 1} v_i. \]
On the other hand
\[ \theta(a) = \sum_{i : a \cdot b_i = 1} \theta(b_i) = z + \sum_{i : a \cdot b_i = 1} v_i = \phi(a), \]
and there are many such $a$ by the claim and the lower bound on the size of
$\pi(((0_G,z)+H) \cap A)$.
7. Polynomial testing
In this section we shall consider polynomials $p : \mathbb{F}_2^n \to \mathbb{F}_2$. These are, of course, the same
as Boolean functions in the sense that if we are given $A \subset \mathbb{F}_2^n$ then there is a polynomial
$p : \mathbb{F}_2^n \to \mathbb{F}_2$ such that $\{x : p(x) = 1\} = A$. Indeed, we simply define
\[ p(x) = \sum_{a \in A} \prod_{i : a \cdot e_i = 1} x_i \prod_{i : a \cdot e_i = 0} (1 - x_i). \]
We are interested in determining whether or not a given function $f : \mathbb{F}_2^n \to \mathbb{F}_2$ correlates
with a low degree polynomial.
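The construction of $p$ from $A$ can be checked by brute force for small $n$. The following is a minimal sketch, assuming vectors in $\mathbb{F}_2^n$ are stored as tuples of 0s and 1s; the helper name `indicator_polynomial` and the example set `A` are hypothetical and not part of the notes.

```python
from itertools import product

def indicator_polynomial(A):
    """Return p(x) = sum_{a in A} prod_{i: a_i=1} x_i * prod_{i: a_i=0} (1 - x_i) over F_2."""
    def p(x):
        total = 0
        for a in A:
            term = 1
            for a_i, x_i in zip(a, x):
                # Each factor is 1 exactly when x_i agrees with a_i.
                term *= x_i if a_i == 1 else (1 - x_i)
            total += term
        return total % 2
    return p

n = 3
A = {(0, 0, 1), (1, 1, 0), (1, 1, 1)}   # hypothetical example set
p = indicator_polynomial(A)
support = {x for x in product((0, 1), repeat=n) if p(x) == 1}
```

Each summand equals 1 only at the single point $x = a$, so the set where $p$ equals 1 is exactly $A$.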
7.1. Correlation with linear polynomials. In the first instance we should like to see
if $f$ has a large (affine) linear part, meaning whether it correlates with a function of the
form
\[ x \mapsto a \cdot x + b \]
where $a \in \mathbb{F}_2^n$, $b \in \mathbb{F}_2$ and $a \cdot x = a_1x_1 + \dots + a_nx_n$. If $f$ has this form then it is easy to see
(by linearity) that
\[ f(x) + f(x+y) + f(x+z) + f(x+y+z) = 0_{\mathbb{F}_2} \tag{7.1} \]
for all $x, y, z \in \mathbb{F}_2^n$. Bearing in mind the Rough Morphism Theorem (Theorem 6.4), we
shall be interested in what we can say if this equality is satisfied an unusually large amount
of the time. We define
\[ \|g\|_{U^2}^4 := \int g(x)g(x+y)g(x+z)g(x+y+z)\,d\mu_G(x)\,d\mu_G(y)\,d\mu_G(z), \]
which it turns out is a norm:

Lemma 7.2. We have the identity
\[ \|g\|_{U^2} = \|\widehat{g}\|_{\ell^4(\widehat{G})} \quad\text{for all } g \in L^\infty(G), \]
so that, in particular, $\|\cdot\|_{U^2}$ is a norm.

Proof. This is an easy calculation using Parseval's theorem and the change of variables
$w = x + z$:
\begin{align*}
\|g\|_{U^2}^4 &= \int g(x)g(x+y)g(x+z)g(x+y+z)\,d\mu_G(x)\,d\mu_G(y)\,d\mu_G(z) \\
&= \int g(x)g(y+x)g(w)g(y+w)\,d\mu_G(x)\,d\mu_G(w)\,d\mu_G(y) \\
&= \langle g * g, g * g\rangle_{L^2(G)} = \|\widehat{g}\|_{\ell^4(\widehat{G})}^4.
\end{align*}
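The identity of Lemma 7.2 can be sanity-checked numerically on a small group. The sketch below assumes elements of $\mathbb{F}_2^n$ are packed into the bits of an integer (so addition is XOR) and that $\widehat{g}(r) = \int g(x)(-1)^{r \cdot x}\,d\mu_G(x)$ with $\mu_G$ the uniform probability measure; the test function $f(x) = x_1x_2 + x_3$ is a hypothetical choice.

```python
def dot(r, x):
    """Inner product r.x over F_2 for vectors packed into integer bits."""
    return bin(r & x).count("1") % 2

def fourier(g, n):
    """Fourier coefficients of g (a list of length 2^n) at every r."""
    N = 1 << n
    return [sum(g[x] * (-1) ** dot(r, x) for x in range(N)) / N for r in range(N)]

def u2_fourth_power(g, n):
    """||g||_{U^2}^4 computed directly from the defining average over (x, y, z)."""
    N = 1 << n
    total = sum(g[x] * g[x ^ y] * g[x ^ z] * g[x ^ y ^ z]
                for x in range(N) for y in range(N) for z in range(N))
    return total / N ** 3

n = 3
# Hypothetical example: f(x) = x_1 x_2 + x_3 and g = (-1)^f.
f = [((x & 1) * ((x >> 1) & 1) + ((x >> 2) & 1)) % 2 for x in range(1 << n)]
g = [(-1) ** v for v in f]

lhs = u2_fourth_power(g, n)                       # definition of the U^2 norm
rhs = sum(abs(c) ** 4 for c in fourier(g, n))     # l^4 norm of the transform
```

For this $g$ the transform is supported on four characters, each with coefficient of size $1/2$, so both sides equal $1/4$.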

Now, the proportion of triples for which (7.1) holds is
\[ P(f) := \int \tfrac{1}{2}\bigl(1 + (-1)^{f(x)+f(x+y)+f(x+z)+f(x+y+z)}\bigr)\,d\mu_G(x)\,d\mu_G(y)\,d\mu_G(z) = \tfrac{1}{2} + \tfrac{1}{2}\|g\|_{U^2}^4 \]
where $g = (-1)^f$. Since $\|\cdot\|_{U^2}$ is a norm we see that (7.1) holds at least half the time for
any function $f : \mathbb{F}_2^n \to \mathbb{F}_2$; it turns out that if it holds an absolute proportion more of the time
then $f$ correlates with a linear polynomial.
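Theorem 7.3 ($U^2$-inverse theorem). Suppose that $f : \mathbb{F}_2^n \to \mathbb{F}_2$ has $\|(-1)^f\|_{U^2} \ge \epsilon$. Then
there is a linear polynomial $p : \mathbb{F}_2^n \to \mathbb{F}_2$ such that
\[ \langle (-1)^f, (-1)^p\rangle_{L^2(G)} \ge \epsilon^{O(1)}. \]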
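Proof. We put $g := (-1)^f$ and note that by hypothesis, the previous lemma and Parseval's
theorem we have
\[ \epsilon^4 \le \|g\|_{U^2}^4 = \|\widehat{g}\|_{\ell^4(\widehat{G})}^4 \le \sup_{\gamma \in \widehat{G}} |\widehat{g}(\gamma)|^2 \|\widehat{g}\|_{\ell^2(\widehat{G})}^2 = \sup_{\gamma \in \widehat{G}} |\widehat{g}(\gamma)|^2 \|g\|_{L^2(G)}^2. \]
Of course $\|g\|_{L^2(G)}^2 = 1$ and so there is some character $\gamma \in \widehat{G}$ and phase $\sigma \in \{-1,1\}$ such
that
\[ \sigma\widehat{g}(\gamma) \ge \epsilon^2. \]
Now, since $\gamma$ is a character there is some $a \in \mathbb{F}_2^n$ such that $\gamma(x) = (-1)^{a \cdot x}$; since $\sigma \in \{-1,1\}$
there is some $b \in \mathbb{F}_2$ such that $\sigma = (-1)^b$. It follows that the linear polynomial $p$ defined
by $p(x) = a \cdot x + b$ satisfies
\[ \langle g, (-1)^p\rangle_{L^2(G)} = \langle g, \gamma \cdot (-1)^b\rangle_{L^2(G)} = \sigma\widehat{g}(\gamma) \ge \epsilon^2. \]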
The point is that if we are given a black box into which we can input values of x and
which outputs f (x), then in a number of steps independent of the size of the underlying
group, we can determine (with, say, 99% reliability) if k(−1)f kU 2 is large or not and hence
whether f correlates with a linear polynomial.
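The black-box procedure described here can be sketched as a sampling estimator for $P(f)$, and hence for $\|(-1)^f\|_{U^2}^4 = 2P(f) - 1$. This is only an illustration under stated assumptions: vectors are packed into integer bits, the names `estimate_pass_rate`, `linear` and `bent` are hypothetical, and the sample size of 2000 is an arbitrary choice rather than one taken from the notes.

```python
import random

def parity(v):
    """Sum of the bits of v, modulo 2."""
    return bin(v).count("1") % 2

def estimate_pass_rate(f, n, samples, rng):
    """Fraction of sampled triples (x, y, z) satisfying (7.1); four queries to f per sample."""
    passed = 0
    for _ in range(samples):
        x, y, z = (rng.getrandbits(n) for _ in range(3))
        if (f(x) ^ f(x ^ y) ^ f(x ^ z) ^ f(x ^ y ^ z)) == 0:
            passed += 1
    return passed / samples

n = 20
a = 0b10110011100011110000          # hypothetical coefficient vector

def linear(x):
    # An affine linear polynomial a.x + 1.
    return parity(a & x) ^ 1

def bent(x):
    # x_1 x_2 + x_3 x_4 + ... + x_19 x_20, which has no large linear part.
    return parity(x & (x >> 1) & 0x55555)

rng = random.Random(0)
rate_linear = estimate_pass_rate(linear, n, 2000, rng)
rate_bent = estimate_pass_rate(bent, n, 2000, rng)
u2_estimate_bent = 2 * rate_bent - 1    # estimate of ||(-1)^f||_{U^2}^4
```

The number of queries depends only on the number of samples, not on $n$. An affine function passes every test, whereas for the second function the pass rate should sit close to $1/2$.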
7.4. Characterising higher degree polynomials. In the previous subsection we encoded
the idea of $f$ being linear by differencing (differentiating). If $f$ is a linear polynomial,
then for each fixed $y$, the map $x \mapsto f(x+y) - f(x)$ is constant and so, if we difference
again we get that
\[ (f(x+y+z) - f(x+z)) - (f(x+y) - f(x)) = 0_{\mathbb{F}_2}. \]
This can be rewritten as (7.1). With higher order polynomials we have to keep differencing,
so that if $f$ is a polynomial of degree $d$ then
\[ \sum_{\omega \in \{0,1\}^{d+1}} f(x + \omega \cdot h) = 0_{\mathbb{F}_2} \tag{7.2} \]
for all $x \in G$, $h \in G^{d+1}$. As with the previous section we define
\[ \|g\|_{U^k}^{2^k} := \int \prod_{\omega \in \{0,1\}^k} g(x + \omega \cdot h)\,d\mu_G(x) \prod_{i=1}^k d\mu_G(h_i). \]
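Identity (7.2) can be verified exhaustively for a small example. The sketch below assumes vectors packed into integer bits (addition is XOR) and uses the hypothetical degree 2 polynomial $f(x) = x_1x_2 + x_3$ on $\mathbb{F}_2^3$; it checks that summing over cubes of dimension $d + 1 = 3$ always gives zero, while cubes of dimension 2 do not suffice.

```python
from itertools import product

def cube_sum(f, x, hs):
    """Sum over omega in {0,1}^k of f(x + omega.h), computed in F_2."""
    total = 0
    for omega in product((0, 1), repeat=len(hs)):
        point = x
        for w, h in zip(omega, hs):
            if w:
                point ^= h          # add h_i when omega_i = 1
        total ^= f(point)
    return total

def satisfies_cube_identity(f, n, k):
    """True if the k-dimensional cube sum vanishes for every x and every h in G^k."""
    N = 1 << n
    return all(cube_sum(f, x, hs) == 0
               for x in range(N) for hs in product(range(N), repeat=k))

def f(x):
    # Hypothetical degree 2 example: x_1 x_2 + x_3.
    return ((x & 1) & ((x >> 1) & 1)) ^ ((x >> 2) & 1)

n = 3
holds_with_three_differences = satisfies_cube_identity(f, n, 3)
holds_with_two_differences = satisfies_cube_identity(f, n, 2)
```

With only two differences the sum equals $h_1^t(A + A^t)h_2$ for the matrix $A$ of the quadratic part, which is non-zero for suitable $h_1, h_2$.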
It is also helpful to formulate this inductively: we define a differencing operator on functions
$g \in L^\infty(G)$ by
\[ \partial_y(g)(x) := g(x+y)g(x), \]
and then note that
\[ \|g\|_{U^k}^{2^k} = \int \prod_{\omega \in \{0,1\}^{k-1}} \partial_{h_k}(g)(x + \omega \cdot h)\,d\mu_G(x) \prod_{i=1}^k d\mu_G(h_i) = \int \|\partial_{h_k}(g)\|_{U^{k-1}}^{2^{k-1}}\,d\mu_G(h_k). \tag{7.3} \]
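The inductive formula (7.3) gives a convenient way to compute $\|g\|_{U^k}^{2^k}$ for real-valued $g$ on a small group. The following sketch assumes vectors packed into integer bits and compares the recursion with the direct definition; the function names and the test function are hypothetical.

```python
from itertools import product

def uk_power_recursive(g, n, k):
    """||g||_{U^k}^{2^k} via (7.3), with base case ||g||_{U^1}^2 = (mean of g)^2."""
    N = 1 << n
    if k == 1:
        return (sum(g) / N) ** 2
    total = 0.0
    for h in range(N):
        dg = [g[x ^ h] * g[x] for x in range(N)]    # the function d_h g
        total += uk_power_recursive(dg, n, k - 1)
    return total / N

def uk_power_direct(g, n, k):
    """||g||_{U^k}^{2^k} from the defining average over x and h = (h_1, ..., h_k)."""
    N = 1 << n
    total = 0
    for x in range(N):
        for hs in product(range(N), repeat=k):
            term = 1
            for omega in product((0, 1), repeat=k):
                point = x
                for w, h in zip(omega, hs):
                    if w:
                        point ^= h
                term *= g[point]
            total += term
    return total / N ** (k + 1)

n = 3
# Hypothetical example: g = (-1)^f with f(x) = x_1 x_2 + x_3.
g = [(-1) ** (((x & 1) * ((x >> 1) & 1) + ((x >> 2) & 1)) % 2) for x in range(1 << n)]
u2 = uk_power_recursive(g, n, 2)
u3 = uk_power_recursive(g, n, 3)
u3_direct = uk_power_direct(g, n, 3)
```

Since $f$ is quadratic, every derivative $\partial_h g$ is a signed character, so the $U^3$ quantity is 1 while the $U^2$ quantity is $1/4$.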
It turns out that $\|\cdot\|_{U^k}$ is a norm for $k \ge 2$, and to prove this we need an analogue of the
Cauchy-Schwarz inequality. Given a family of functions $(f_\omega)_{\omega \in \{0,1\}^k}$ we define the Gowers
inner product to be
\[ \langle (f_\omega)_{\omega \in \{0,1\}^k}\rangle_{U^k} := \int \prod_{\omega \in \{0,1\}^k} f_\omega(x + \omega \cdot h)\,d\mu_G(x) \prod_{i=1}^k d\mu_G(h_i). \]
Lemma 7.5 (Gowers-Cauchy-Schwarz inequality). For all families of functions $(f_\omega)_{\omega \in \{0,1\}^k}$
we have
\[ |\langle (f_\omega)_{\omega \in \{0,1\}^k}\rangle_{U^k}| \le \prod_{\omega \in \{0,1\}^k} \|f_\omega\|_{U^k}. \]
The proof here is notationally heavy; readers interested in an alternative source may
wish to consult [TV06, p419] or [Gow01, Lemma 3.8].
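Proof. Note that the inner product is equal to
\[ \int\!\!\int \prod_{\omega \in \{0,1\}^{k-1}} f_{\omega,0}(x + \omega \cdot h) f_{\omega,1}(x + \omega \cdot h + h_k)\,d\mu_G(x)\,d\mu_G(h_k)\,d\mu_{G^{k-1}}(h). \]
Make the change of variables $y = x + h_k$ so that this is, in turn, equal to
\[ \int \Bigl(\int \prod_{\omega \in \{0,1\}^{k-1}} f_{\omega,0}(x + \omega \cdot h)\,d\mu_G(x)\Bigr) \Bigl(\int \prod_{\omega \in \{0,1\}^{k-1}} f_{\omega,1}(y + \omega \cdot h)\,d\mu_G(y)\Bigr)\,d\mu_{G^{k-1}}(h). \]
By Cauchy-Schwarz this is at most
\begin{align*}
&\Bigl(\int \Bigl(\int \prod_{\omega \in \{0,1\}^{k-1}} f_{\omega,0}(x + \omega \cdot h)\,d\mu_G(x)\Bigr)^2 d\mu_{G^{k-1}}(h)\Bigr)^{1/2} \\
&\quad\times \Bigl(\int \Bigl(\int \prod_{\omega \in \{0,1\}^{k-1}} f_{\omega,1}(y + \omega \cdot h)\,d\mu_G(y)\Bigr)^2 d\mu_{G^{k-1}}(h)\Bigr)^{1/2}.
\end{align*}
Changing variables back we see that this is equal to
\[ \langle (f_{\omega',0})_{\omega \in \{0,1\}^k}\rangle_{U^k}^{1/2} \langle (f_{\omega',1})_{\omega \in \{0,1\}^k}\rangle_{U^k}^{1/2} \]
where $\omega'$ is $\omega$ projected onto the first $k - 1$ co-ordinates. By symmetry the same is true
for all the other co-ordinates and so after $k$ applications we get that
\[ |\langle (f_\omega)_{\omega \in \{0,1\}^k}\rangle_{U^k}| \le \prod_{\rho \in \{0,1\}^k} \langle (f_\rho)_{\omega \in \{0,1\}^k}\rangle_{U^k}^{1/2^k}, \]
where $(f_\rho)_{\omega \in \{0,1\}^k}$ just means that all $2^k$ functions in the vector are the same function,
$f_\rho$.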
Lemma 7.6. We have the nesting property
\[ \Bigl|\int f\,d\mu_G\Bigr| = \|f\|_{U^1} \le \|f\|_{U^2} \le \dots \le \|f\|_{U^k} \le \dots, \]
and $\|\cdot\|_{U^k}$ is a norm for all $k \ge 2$.

Proof. The nesting follows immediately from the Gowers-Cauchy-Schwarz inequality: given
$f \in L^\infty(G)$ write $f_\omega = f$ if $\omega_k = 1$ and $f_\omega \equiv 1$ if $\omega_k = 0$. Then
\[ \|f\|_{U^{k-1}}^{2^{k-1}} = \langle (f_\omega)_{\omega \in \{0,1\}^k}\rangle_{U^k} \le \|f\|_{U^k}^{2^{k-1}} \|1\|_{U^k}^{2^{k-1}} = \|f\|_{U^k}^{2^{k-1}} \]
and we are done.
It is immediate that $\|\cdot\|_{U^k}$ is homogeneous and zero on the zero function. Since $\|\cdot\|_{U^2}$
is a norm by Lemma 7.2 we see that $\|f\|_{U^k} = 0$ must imply $f = 0$ for $k \ge 2$. It remains to
check the triangle inequality. First note that
\[ \|f_0 + f_1\|_{U^k}^{2^k} = \langle (f_0 + f_1)_{\omega \in \{0,1\}^k}\rangle_{U^k} = \sum_{I \subset \{0,1\}^k} \langle (f_{1_{\omega \in I}})_{\omega \in \{0,1\}^k}\rangle_{U^k}, \]
where we are using the multi-linearity of the Gowers inner product. In particular the Gowers
inner product takes $2^k$ terms indexed by $\omega \in \{0,1\}^k$ and the product $\langle (f_{1_{\omega \in I}})_{\omega \in \{0,1\}^k}\rangle_{U^k}$
simply denotes the Gowers inner product of some copies of $f_0$ and $f_1$, with $f_0$ in the position
indexed by $\omega$ if $\omega \in \{0,1\}^k \setminus I$, and $f_1$ in the position indexed by $\omega$ if $\omega \in I$.
Now, by the Gowers-Cauchy-Schwarz inequality we see that
\[ |\langle (f_{1_{\omega \in I}})_{\omega \in \{0,1\}^k}\rangle_{U^k}| \le \|f_0\|_{U^k}^{2^k - |I|} \|f_1\|_{U^k}^{|I|}, \]
and hence
\[ \|f_0 + f_1\|_{U^k}^{2^k} \le \sum_{I \subset \{0,1\}^k} \|f_0\|_{U^k}^{2^k - |I|} \|f_1\|_{U^k}^{|I|} = (\|f_0\|_{U^k} + \|f_1\|_{U^k})^{2^k}. \]
The inequality follows on taking $2^k$th roots, and the lemma is proved.
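Both conclusions of Lemma 7.6 can be spot-checked numerically. The sketch below assumes real-valued functions on $\mathbb{F}_2^3$ stored as lists indexed by integers (addition is XOR), and computes the norms through the inductive formula (7.3); the random test functions and the seed are arbitrary, hypothetical choices.

```python
import random

def uk_power(g, n, k):
    """||g||_{U^k}^{2^k} by the inductive formula, starting from ||g||_{U^1}^2 = (mean of g)^2."""
    N = 1 << n
    if k == 1:
        return (sum(g) / N) ** 2
    return sum(uk_power([g[x ^ h] * g[x] for x in range(N)], n, k - 1)
               for h in range(N)) / N

def uk_norm(g, n, k):
    """The U^k norm itself; max() guards against tiny negative rounding errors."""
    return max(uk_power(g, n, k), 0.0) ** (1.0 / 2 ** k)

rng = random.Random(7)
n = 3
f0 = [rng.uniform(-1, 1) for _ in range(1 << n)]
f1 = [rng.uniform(-1, 1) for _ in range(1 << n)]
fsum = [u + v for u, v in zip(f0, f1)]

norms = [uk_norm(f0, n, k) for k in (1, 2, 3)]                 # nesting chain
triangle_gap = uk_norm(f0, n, 3) + uk_norm(f1, n, 3) - uk_norm(fsum, n, 3)
```

A numerical check of this kind does not prove anything, but it is a quick way to catch a mis-transcribed exponent in the definitions.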

Finally, as before, the proportion of tuples satisfying (7.2) with $d = k - 1$ is
\[ P_k(f) = \tfrac{1}{2} + \tfrac{1}{2}\|(-1)^f\|_{U^k}^{2^k}, \]
which again means that (7.2) holds at least half the time, and so again we should like
to show that $\|(-1)^f\|_{U^k} \ge \epsilon$ implies correlation with a polynomial of degree $k - 1$. There are
subtleties to this request as it is stated and we shall only be able to deal with the quadratic
case here.
7.7. Correlating with quadratic polynomials. Our object here is to prove the following
theorem due to Samorodnitsky [Sam07]. In other groups the $U^3$-inverse theorem is
more complicated to state and is due to Green and Tao [GT08].
Theorem 7.8 ($U^3$-inverse theorem). Suppose that $f : \mathbb{F}_2^n \to \mathbb{F}_2$ has $\|(-1)^f\|_{U^3} \ge \epsilon$. Then
there is a quadratic polynomial $p : \mathbb{F}_2^n \to \mathbb{F}_2$ such that
\[ \langle (-1)^f, (-1)^p\rangle_{L^2(G)} \ge \exp(-O(\epsilon^{-O(1)})). \]
If $f(x)$ is a quadratic polynomial then it has the form
\[ x \mapsto \langle Ax, x\rangle + \langle a, x\rangle + b \]
where $A$ is an $\mathbb{F}_2$-valued matrix, $a \in \mathbb{F}_2^n$ and $b \in \mathbb{F}_2$. By symmetry and since the
term $\langle a, x\rangle$ can be absorbed into the diagonal of $A$ by replacing $A_{ii}$ by $A_{ii} + a_i$, we may assume that $a = 0_G$
and $A$ is upper triangular. (Note that $x_i^2 = x_i$ in $\mathbb{F}_2$.)
Differencing once we have
\[ f(x+y) - f(x) = \langle (A + A^t)y, x\rangle + \langle Ay, y\rangle, \]
which is a linear polynomial in $x$ for fixed $y$. It follows that
\[ (f(x+y+z) - f(x+z)) - (f(x+y) - f(x)) = \langle (A + A^t)y, z\rangle \]
which is constant in $x$ for fixed $y$ and $z$. Crucially $A + A^t$ is symmetric and has zero
diagonal; we shall now set about establishing a converse.
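The two differencing identities for a quadratic form can be confirmed by exhaustive search. This is a small sketch, assuming vectors are tuples over $\mathbb{F}_2$ and matrices are lists of rows; the upper triangular matrix `A` below is a hypothetical example and the constant term $b$ is taken to be zero.

```python
from itertools import product

def mat_vec(M, x):
    """Matrix-vector product over F_2."""
    return tuple(sum(m * x_i for m, x_i in zip(row, x)) % 2 for row in M)

def inner(u, v):
    """Inner product over F_2."""
    return sum(a * b for a, b in zip(u, v)) % 2

def add(u, v):
    """Vector addition over F_2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

A = [[1, 1, 0],
     [0, 0, 1],
     [0, 0, 1]]                       # hypothetical upper triangular matrix
n = len(A)
S = [[(A[i][j] + A[j][i]) % 2 for j in range(n)] for i in range(n)]   # A + A^t

def f(x):
    return inner(mat_vec(A, x), x)

points = list(product((0, 1), repeat=n))
first_difference_ok = all(
    (f(add(x, y)) - f(x)) % 2 == (inner(mat_vec(S, y), x) + inner(mat_vec(A, y), y)) % 2
    for x in points for y in points)
second_difference_ok = all(
    (f(add(add(x, y), z)) - f(add(x, z)) - f(add(x, y)) + f(x)) % 2 == inner(mat_vec(S, y), z)
    for x in points for y in points for z in points)
```

The matrix `S` computed here is symmetric with zero diagonal, as the text asserts for $A + A^t$.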
For the remainder of this section it will be convenient to identify $G$ with $\widehat{G}$ via the map
\[ r \mapsto (x \mapsto (-1)^{r \cdot x}). \]
Lemma 7.9. Suppose that $f : \mathbb{F}_2^n \to \mathbb{F}_2$, $S$ is a symmetric matrix with zero diagonal and
$g := (-1)^f$ has
\[ \int |\widehat{\partial_y g}(Sy)|^2\,d\mu_G(y) \ge \delta. \]
Then there is a quadratic polynomial $p$ such that $\langle (-1)^f, (-1)^p\rangle_{L^2(G)} \ge \delta^{O(1)}$.
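Proof. Let $A$ be a matrix such that $A + A^t = S$, and consider $h(x) := (-1)^{\langle Ax, x\rangle}$. By our
earlier calculation we have
\[ \partial_y h(x) = (-1)^{\langle (A + A^t)y, x\rangle + \langle Ay, y\rangle}, \]
and so
\[ \langle \partial_y g, \partial_y h\rangle_{L^2(G)} = (-1)^{\langle Ay, y\rangle}\,\widehat{\partial_y g}((A + A^t)y) = (-1)^{\langle Ay, y\rangle}\,\widehat{\partial_y g}(Sy). \]
It follows that
\[ \int \langle \partial_y g, \partial_y h\rangle_{L^2(G)}^2\,d\mu_G(y) \ge \delta. \]
On the other hand the integral is equal to
\[ \int g(x)g(x+y)g(z)g(z+y)h(x)h(x+y)h(z)h(z+y)\,d\mu_G(x)\,d\mu_G(z)\,d\mu_G(y), \]
which is in turn equal to
\[ \int (gh)(x)(gh)(x+y)(gh)(z)(gh)(z+y)\,d\mu_G(x)\,d\mu_G(z)\,d\mu_G(y) = \|gh\|_{U^2}^4. \]
It follows by the $U^2$-inverse theorem that there is some $a \in \mathbb{F}_2^n$ and $b \in \mathbb{F}_2$ such that
$l(x) := \langle a, x\rangle + b$ has
\[ \langle gh, (-1)^{l}\rangle_{L^2(G)} \ge \delta^{O(1)}. \]
The result follows on putting $p(x) = \langle Ax, x\rangle + \langle a, x\rangle + b$.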
Our problem now is to find a suitable linear map, and to do that we shall find a roughly
linear choice function.
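Lemma 7.10. Suppose that $g : G \to \{-1,1\}$ has $\|g\|_{U^3} \ge \epsilon$. Then there is a function
$\phi : G \to \widehat{G}$ and a set $A$ of density at least $\epsilon^{O(1)}$ such that $|\widehat{\partial_x g}(\phi(x))|^2 \ge \epsilon^{O(1)}$ for all $x \in A$
and
\[ \mu_{G^2}(\{(x,y) \in G^2 : \phi(x) + \phi(y) = \phi(x+y),\ x, y, x+y \in A\}) \ge \epsilon^{O(1)}. \]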
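Proof. Pick a function $\phi : G \to \widehat{G}$ randomly with
\[ \mathbb{P}(\phi(x) = \gamma) = |\widehat{\partial_x g}(\gamma)|^2, \]
such that the choices are independent for distinct $x$s. Note that
\[ \sum_{\gamma \in \widehat{G}} \mathbb{P}(\phi(x) = \gamma) = \sum_{\gamma \in \widehat{G}} |\widehat{\partial_x g}(\gamma)|^2 = g^2 * g^2(x) = 1 \]
by Parseval's theorem so these are genuine probability distributions.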
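The random choice of $\phi$ can be simulated directly for a small group. The sketch below assumes vectors packed into integer bits, identifies $\widehat{G}$ with $G$ via $r \mapsto (x \mapsto (-1)^{r \cdot x})$, and uses an arbitrary random sign function $g$; the names `derivative_spectrum` and `phi` are hypothetical.

```python
import random

def dot(r, x):
    """Inner product r.x over F_2 for vectors packed into integer bits."""
    return bin(r & x).count("1") % 2

def derivative_spectrum(g, n, x):
    """List over r of |(d_x g)^(r)|^2, where d_x g(y) = g(y + x) g(y)."""
    N = 1 << n
    dg = [g[y ^ x] * g[y] for y in range(N)]
    return [(sum(dg[y] * (-1) ** dot(r, y) for y in range(N)) / N) ** 2
            for r in range(N)]

rng = random.Random(1)
n = 4
N = 1 << n
g = [rng.choice((-1, 1)) for _ in range(N)]      # hypothetical sign function

# One probability distribution on the dual group for each x.
dists = [derivative_spectrum(g, n, x) for x in range(N)]

# Sample phi(x) independently for each x from the corresponding distribution.
phi = [rng.choices(range(N), weights=dists[x])[0] for x in range(N)]
```

For $x = 0_G$ the derivative $\partial_x g$ is identically 1, so the distribution is a point mass at the trivial character.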

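Now, write
\[ A(\phi) := \{x \in G : |\widehat{\partial_x g}(\phi(x))|^2 \ge \epsilon^{16}/6\}, \]
and
\[ L(\phi) := \mu_{G^2}(\{(x,y) \in G^2 : \phi(x) + \phi(y) = \phi(x+y),\ x, y, x+y \in A(\phi)\}). \]
Then
\begin{align*}
\mathbb{E}L(\phi) &= \int_{x,y,x+y \in A(\phi)} \sum_{\gamma,\gamma'} |\widehat{\partial_x g}(\gamma)|^2 |\widehat{\partial_y g}(\gamma')|^2 |\widehat{\partial_{x+y} g}(\gamma + \gamma')|^2\,d\mu_G(x)\,d\mu_G(y) \\
&\ge \int \sum_{\gamma,\gamma'} |\widehat{\partial_x g}(\gamma)|^2 |\widehat{\partial_y g}(\gamma')|^2 |\widehat{\partial_{x+y} g}(\gamma + \gamma')|^2\,d\mu_G(x)\,d\mu_G(y) \\
&\qquad - 3 \cdot \frac{\epsilon^{16}}{6} \int \sum_{\gamma,\gamma'} |\widehat{\partial_x g}(\gamma)|^2 |\widehat{\partial_y g}(\gamma')|^2\,d\mu_G(x)\,d\mu_G(y).
\end{align*}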
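On the other hand, by Parseval's theorem we have that
\[ \int \sum_{\gamma \in \widehat{G}} |\widehat{\partial_z g}(\gamma)|^2\,d\mu_G(z) = \int\!\!\int \partial_z(g)(x)^2\,d\mu_G(x)\,d\mu_G(z) = \|g\|_{L^2(G)}^4 = 1, \tag{7.4} \]
whence
\[ \mathbb{E}L(\phi) \ge \int \sum_{\gamma,\gamma'} |\widehat{\partial_x g}(\gamma)|^2 |\widehat{\partial_y g}(\gamma')|^2 |\widehat{\partial_{x+y} g}(\gamma + \gamma')|^2\,d\mu_G(x)\,d\mu_G(y) - \epsilon^{16}/2. \]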
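It follows that if we can show that
\[ T := \int \sum_{\gamma,\gamma'} |\widehat{\partial_x g}(\gamma)|^2 |\widehat{\partial_y g}(\gamma')|^2 |\widehat{\partial_{x+y} g}(\gamma + \gamma')|^2\,d\mu_G(x)\,d\mu_G(y) \ge \epsilon^{16}, \tag{7.5} \]
then by averaging we can pick a $\phi$ such that $L(\phi) \ge \epsilon^{16}/2$. In particular $\mu_G(A(\phi))^2 \ge L(\phi)$
so $\mu_G(A(\phi)) \ge \epsilon^8/\sqrt{2}$, and so, putting $A := A(\phi)$ we have
\[ A \subset \{x \in G : |\widehat{\partial_x g}(\phi(x))|^2 \ge \epsilon^{16}/6\} \]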
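and
\[ \mu_{G^2}(\{(x,y) \in G^2 : \phi(x) + \phi(y) = \phi(x+y),\ x, y, x+y \in A\}) \ge \epsilon^{16}/2. \]
It will follow that we are done, and it remains to establish (7.5). By the definition of the
Fourier transform
\[ |\widehat{\partial_w g}(\lambda)|^2 = \int (\partial_w g) * (\partial_w g)(z)\lambda(z)\,d\mu_G(z) \quad\text{for all } w \in G,\ \lambda \in \widehat{G}, \]
which, inserted in place of all the Fourier transforms in (7.5) gives
\[ T = \int\!\!\int (\partial_x g) * (\partial_x g)(z)\,(\partial_y g) * (\partial_y g)(z)\,(\partial_{x+y} g) * (\partial_{x+y} g)(z)\,d\mu_G(z)\,d\mu_G(x)\,d\mu_G(y). \]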
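Now, note that
\[ \partial_x g * \partial_x g(z) = \partial_z g * \partial_z g(x) \quad\text{for all } x, z \in G, \tag{7.6} \]
so writing $h_z$ for the function $(\partial_z g) * (\partial_z g)$ we get
\[ T = \int\!\!\int\!\!\int h_z(x)h_z(y)h_z(x+y)\,d\mu_G(z)\,d\mu_G(x)\,d\mu_G(y) = \int \sum_{\gamma \in \widehat{G}} \widehat{h_z}(\gamma)^3\,d\mu_G(z), \]
where the equality is by Fourier inversion. We conclude that
\[ \int \sum_{\gamma,\gamma'} |\widehat{\partial_x g}(\gamma)|^2 |\widehat{\partial_y g}(\gamma')|^2 |\widehat{\partial_{x+y} g}(\gamma + \gamma')|^2\,d\mu_G(x)\,d\mu_G(y) = \int \sum_{\gamma \in \widehat{G}} |\widehat{\partial_z g}(\gamma)|^6\,d\mu_G(z), \]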
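an identity called Samorodnitsky's identity. By Cauchy-Schwarz applied to $|\widehat{\partial_z g}(\gamma)|^3 \times |\widehat{\partial_z g}(\gamma)|$
(or log-convexity of $L^p$-norms) we have
\[ \Bigl(\int \sum_{\gamma \in \widehat{G}} |\widehat{\partial_z g}(\gamma)|^2\,d\mu_G(z)\Bigr)^{1/2} \Bigl(\int \sum_{\gamma \in \widehat{G}} |\widehat{\partial_z g}(\gamma)|^6\,d\mu_G(z)\Bigr)^{1/2} \ge \int \sum_{\gamma \in \widehat{G}} |\widehat{\partial_z g}(\gamma)|^4\,d\mu_G(z). \]
Of course, the term on the right is equal to $\|g\|_{U^3}^8 \ge \epsilon^8$ by the inductive definition of the
$U^3$-norm and Lemma 7.2, and so inserting (7.4) into the first term on the left we get
\[ \int \sum_{\gamma,\gamma'} |\widehat{\partial_x g}(\gamma)|^2 |\widehat{\partial_y g}(\gamma')|^2 |\widehat{\partial_{x+y} g}(\gamma + \gamma')|^2\,d\mu_G(x)\,d\mu_G(y) \ge \epsilon^{16}. \]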

We should like to couple the previous lemma with the Rough Morphism Theorem, but
the set A presents a problem. It could be that an application of the theorem gives us
a morphism which coincides with φ on a set disjoint from A. The proof of the Rough
Morphism Theorem can be adapted to ensure that the coincidences happen on A; we shall
establish this strengthening as a consequence.
We shall take φ to be highly random on the complement of A which will force any
correlation with a morphism to be largely on A. The following trivial counting argument
will let us do this.
Lemma 7.11. There is a function $\nu : G \to \widehat{G}$ such that for any morphism $\theta : G \to \widehat{G}$ we
have
\[ |\{x : \nu(x) = \theta(x)\}| = O(n^2). \]
Proof. Write $N := 2^n$ and note that there are $N^N$ functions $G \to \widehat{G}$. On the other hand
there are at most $N^n$ morphisms $G \to \widehat{G}$, and hence at most
\[ N^n \cdot \binom{N}{N-m} \cdot N^{N-m} = \binom{N}{m} \cdot N^{N+n-m} \le \Bigl(\frac{eN}{m}\Bigr)^m N^{N+n-m} = \frac{e^m N^{N+n}}{m^m} =: M \]
functions $G \to \widehat{G}$ which differ from a morphism in at most $N - m$ places. Thus, if $M < N^N$
then there is a function which differs from every morphism in at least $N - m$ places. We
can take $m = O(n^2)$ to guarantee this and the result is proved.
We could actually have taken $m = O(n^2/\log n)$, but this slight improvement is not
important.
Lemma 7.12. Suppose that $g : G \to \{-1,1\}$ has $\|g\|_{U^3} \ge \epsilon$. Then there is a symmetric
matrix $S$ with zero diagonal such that
\[ \int |\widehat{\partial_x g}(Sx)|^2\,d\mu_G(x) \ge \exp(-O(\epsilon^{-O(1)})). \]
Proof. Apply Lemma 7.10 to get a function $\phi$ and set $A$ and let $\tilde{\phi}$ be equal to $\phi$ on $A$,
and equal to $\nu$ (from Lemma 7.11) on $A^c$. By the Rough Morphism Theorem there is a
morphism $\theta$ such that
\[ \mu_G(\{x \in G : \theta(x) = \tilde{\phi}(x)\}) \ge \exp(-O(\epsilon^{-O(1)})). \]
It follows that either $\exp(-O(\epsilon^{-O(1)})) = 2^{-n} \cdot O(n^2)$, or else
\[ \mu_G(\{x \in A : \theta(x) = \phi(x)\}) \ge \exp(-O(\epsilon^{-O(1)})). \]
In the first case the conclusion is trivial by taking $S \equiv 0$, since
\[ \widehat{\partial_{0_G} g}(S0_G) = \widehat{\partial_{0_G} g}(0_{\widehat{G}}) = g * g(0_G) = \|g\|_{L^2(G)}^2 = 1, \]
and so the integral is at least $2^{-n}$ which has the required size in this case. Thus we assume
we are in the second case.
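We write $M$ for the matrix corresponding to $\theta$ under the identification $r \mapsto (x \mapsto (-1)^{r \cdot x})$
of $G$ and $\widehat{G}$, so that
\[ \int |\widehat{\partial_x g}(Mx)|^2\,d\mu_G(x) \ge \exp(-O(\epsilon^{-O(1)})). \]
Write $h(x) := (-1)^{\langle Mx, x\rangle}$ and suppose that $k$ is any function.

Claim.
\[ \int k(x)|\widehat{\partial_x g}(Mx)|^2\,d\mu_G(x) = \sum_r \widehat{k}(r) \int |\widehat{\partial_y g}(M^t y + r)|^2\,d\mu_G(y). \]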
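Proof. This is a calculation following from the Fourier inversion formula for $k$ and (7.6).
The left hand side is equal to
\begin{align*}
&\int \sum_r \widehat{k}(r)(-1)^{\langle r, x\rangle} \int (\partial_x g) * (\partial_x g)(y)(-1)^{\langle Mx, y\rangle}\,d\mu_G(y)\,d\mu_G(x) \\
&\quad= \int \sum_r \widehat{k}(r) \int (\partial_y g) * (\partial_y g)(x)(-1)^{\langle Mx, y\rangle}(-1)^{\langle r, x\rangle}\,d\mu_G(y)\,d\mu_G(x) \\
&\quad= \int \sum_r \widehat{k}(r) \int (\partial_y g) * (\partial_y g)(x)(-1)^{\langle x, M^t y + r\rangle}\,d\mu_G(y)\,d\mu_G(x) \\
&\quad= \sum_r \widehat{k}(r) \int |\widehat{\partial_y g}(M^t y + r)|^2\,d\mu_G(y).
\end{align*}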

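Now, apply the claim with $k = h * h$ to get
\[ \int h * h(x)|\widehat{\partial_x g}(Mx)|^2\,d\mu_G(x) = \sum_r \widehat{h}(r)^2 \int |\widehat{\partial_x g}(M^t x + r)|^2\,d\mu_G(x). \tag{7.7} \]
On the other hand
\[ \sum_r \int |\widehat{\partial_x g}(M^t x + r)|^2\,d\mu_G(x) = \sum_{\gamma \in \widehat{G}} |\widehat{\partial_x g}(\gamma)|^2 = g^2 * g^2(x) = 1, \]
and so the measure $\mathbb{P}$ defined by
\[ \mathbb{P}(R) := \sum_{r \in R} \int |\widehat{\partial_x g}(M^t x + r)|^2\,d\mu_G(x) \]
is a probability measure. Hence we can apply Cauchy-Schwarz in (7.7) to get that
\begin{align*}
\sum_r \widehat{h}(r)^2 \int |\widehat{\partial_x g}(M^t x + r)|^2\,d\mu_G(x) &= \mathbb{E}(\widehat{h}(r)^2) \\
&\ge (\mathbb{E}\widehat{h}(r))^2 = \Bigl(\sum_r \widehat{h}(r) \int |\widehat{\partial_x g}(M^t x + r)|^2\,d\mu_G(x)\Bigr)^2,
\end{align*}
where $\mathbb{E}$ denotes integration against the probability measure $\mathbb{P}$.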

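What we have shown, then, is that
\begin{align*}
\int h * h(x)|\widehat{\partial_x g}(Mx)|^2\,d\mu_G(x) &\ge \Bigl(\sum_r \widehat{h}(r) \int |\widehat{\partial_x g}(M^t x + r)|^2\,d\mu_G(x)\Bigr)^2 \\
&= \Bigl(\int h(x)|\widehat{\partial_x g}(Mx)|^2\,d\mu_G(x)\Bigr)^2,
\end{align*}
where the last equality is from the claim taken with $k = h$.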

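Now note that
\[ 1_{\{y : M^t y = My\}}(x) = h * h(x)h(x) \ge h * h(x), \]
so it follows that
\begin{align*}
\int_{M^t x = Mx} |\widehat{\partial_x g}(Mx)|^2\,d\mu_G(x) &\ge \int h * h(x)|\widehat{\partial_x g}(Mx)|^2\,d\mu_G(x) \\
&\ge \Bigl(\int h(x)|\widehat{\partial_x g}(Mx)|^2\,d\mu_G(x)\Bigr)^2.
\end{align*}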
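To lower bound the right hand side we have the following claim.
Claim.
\[ \widehat{\partial_x g}(r) = 0 \text{ unless } \langle r, x\rangle = 0_{\mathbb{F}_2}. \tag{7.8} \]
Proof. By definition we have
\begin{align*}
\widehat{\partial_x g}(r) &= \int g(y)g(x+y)(-1)^{\langle r, y\rangle}\,d\mu_G(y) \\
&= (-1)^{\langle r, x\rangle} \int g(z+x)g(z)(-1)^{\langle r, z\rangle}\,d\mu_G(z) = (-1)^{\langle r, x\rangle}\,\widehat{\partial_x g}(r),
\end{align*}
and the claim follows.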
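Claim (7.8) can be verified exactly on a small group by working with integer sums. The sketch below assumes vectors packed into integer bits and an arbitrary random sign function $g$; the helper name `scaled_derivative_hat` is hypothetical and returns $|G|$ times the Fourier coefficient so that no floating point is involved.

```python
import random

def dot(r, x):
    """Inner product r.x over F_2 for vectors packed into integer bits."""
    return bin(r & x).count("1") % 2

def scaled_derivative_hat(g, n, x, r):
    """|G| times the Fourier coefficient of d_x g at r, as an exact integer."""
    N = 1 << n
    return sum(g[y] * g[y ^ x] * (-1) ** dot(r, y) for y in range(N))

rng = random.Random(3)
n = 4
N = 1 << n
g = [rng.choice((-1, 1)) for _ in range(N)]      # hypothetical sign function

# Collect every coefficient at a pair (x, r) with <r, x> = 1.
off_support = [scaled_derivative_hat(g, n, x, r)
               for x in range(N) for r in range(N) if dot(r, x) == 1]
```

The substitution $y \mapsto y + x$ pairs each term of the sum with its negative when $\langle r, x\rangle = 1$, which is why every entry of `off_support` vanishes.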

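In particular, $h(x) = 1$ if $\widehat{\partial_x g}(Mx) \neq 0$ and so
\[ \int h(x)|\widehat{\partial_x g}(Mx)|^2\,d\mu_G(x) = \int |\widehat{\partial_x g}(Mx)|^2\,d\mu_G(x) \ge \exp(-O(\epsilon^{-O(1)})). \]
Let $U$ be the space generated by $\{x : M^t x = Mx\}$, and let $Bx = Mx$ for all $x \in U$. $B$ is
symmetric by construction and can be extended to the whole space while preserving this
symmetry, whence
\[ \int |\widehat{\partial_x g}(Bx)|^2\,d\mu_G(x) \ge \exp(-O(\epsilon^{-O(1)})). \]
Finally, note that $\widehat{\partial_x g}(Bx) = 0$ unless $\langle Bx, x\rangle = 0_{\mathbb{F}_2}$ by (7.8), but since $B$ is symmetric
$\langle Bx, x\rangle = \langle r, x\rangle$, where $r$ is the vector on the diagonal of $B$, whence
\[ \int_{\langle r, x\rangle = 0_{\mathbb{F}_2}} |\widehat{\partial_x g}(Bx)|^2\,d\mu_G(x) \ge \exp(-O(\epsilon^{-O(1)})). \]
We now define $Sx = Bx + \langle x, r\rangle r$, which is symmetric and $Sx = Bx$ if $\langle r, x\rangle = 0_{\mathbb{F}_2}$. Finally
$S$ has zero diagonal: write $r'$ for the diagonal of $S$, so that $\langle r', x\rangle = \langle Sx, x\rangle$ for all $x$. So
\[ \langle r', x\rangle = \langle Sx, x\rangle = \langle Bx, x\rangle + \langle x, r\rangle^2 = \langle x, r\rangle + \langle x, r\rangle^2 = 0_{\mathbb{F}_2} \quad\text{for all } x \in G, \]
and hence $r' = 0_G$. The result is proved.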
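The last step of the symmetrisation argument is pure linear algebra over $\mathbb{F}_2$ and is easy to check on an example. In the sketch below the symmetric matrix `B` is a hypothetical choice, vectors are tuples over $\mathbb{F}_2$, and the map $Sx = Bx + \langle x, r\rangle r$ is represented by the matrix $B + rr^t$.

```python
from itertools import product

def mat_vec(M, x):
    """Matrix-vector product over F_2."""
    return tuple(sum(m * x_i for m, x_i in zip(row, x)) % 2 for row in M)

def inner(u, v):
    """Inner product over F_2."""
    return sum(a * b for a, b in zip(u, v)) % 2

B = [[1, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 1, 1],
     [1, 0, 1, 0]]                                 # hypothetical symmetric matrix
n = len(B)
r = tuple(B[i][i] for i in range(n))               # the diagonal of B

# S x = B x + <x, r> r corresponds to the matrix B + r r^t over F_2.
S = [[(B[i][j] + r[i] * r[j]) % 2 for j in range(n)] for i in range(n)]

points = list(product((0, 1), repeat=n))
diagonal_identity = all(inner(mat_vec(B, x), x) == inner(r, x) for x in points)
agree_on_kernel = all(mat_vec(S, x) == mat_vec(B, x) for x in points if inner(r, x) == 0)
```

The zero diagonal of `S` comes from $r_i^2 = r_i$ in $\mathbb{F}_2$, so each diagonal entry is $B_{ii} + r_i = 0$.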
The argument for converting the matrix M into the symmetric matrix S with zero
diagonal is, for obvious reasons, called the symmetrisation argument.
Proof of Theorem 7.8. This follows immediately on combining Lemma 7.12 with Lemma
7.9.
Acknowledgement
The author should like to thank all those who have provided feedback, comments and
corrections, and particularly the students who took the course in the academic year 2010–
2011.
Appendix A. The integral triangle inequality

In this brief section we shall prove the integral triangle inequality, more commonly
referred to as the integral Minkowski inequality which is used in the proof of Beckner’s
inequality (Theorem 4.2).
Lemma A.1 (Integral triangle inequality). Suppose $X$ and $Y$ are finite sets, $q \in [1, \infty]$
and $f : X \times Y \to \mathbb{C}$. Then we have
\[ \Bigl(\int \Bigl(\int |f(x,y)|\,d\mu_Y(y)\Bigr)^q d\mu_X(x)\Bigr)^{1/q} \le \int \Bigl(\int |f(x,y)|^q\,d\mu_X(x)\Bigr)^{1/q} d\mu_Y(y). \]
Proof. Define auxiliary functions $g_y$ by $g_y(x) := |f(x,y)|$, and then note that the inequality
is simply the statement
\[ \Bigl\| \int g_y\,d\mu_Y(y) \Bigr\|_{L^q(X)} \le \int \|g_y\|_{L^q(X)}\,d\mu_Y(y). \]
On the other hand since $Y$ is finite this is just a weighted sum and hence follows from the
usual triangle inequality for $L^q(X)$.
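For finite $q$ the inequality of Lemma A.1 can be spot-checked numerically. The sketch below assumes $\mu_X$ and $\mu_Y$ are the uniform probability measures on $X$ and $Y$, and stores $f$ as a nested list `F[x][y]` of complex numbers; the sizes and random entries are hypothetical.

```python
import random

def mixed_norm_lhs(F, q):
    """L^q(X) norm of the function x -> average over y of |f(x, y)|."""
    X, Y = len(F), len(F[0])
    inner = [sum(abs(F[x][y]) for y in range(Y)) / Y for x in range(X)]
    return (sum(v ** q for v in inner) / X) ** (1 / q)

def mixed_norm_rhs(F, q):
    """Average over y of the L^q(X) norm of x -> |f(x, y)|."""
    X, Y = len(F), len(F[0])
    return sum((sum(abs(F[x][y]) ** q for x in range(X)) / X) ** (1 / q)
               for y in range(Y)) / Y

rng = random.Random(5)
F = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(4)]
     for _ in range(6)]

gaps = {q: mixed_norm_rhs(F, q) - mixed_norm_lhs(F, q) for q in (1, 2, 3.5)}
```

At $q = 1$ the two sides coincide by interchanging the order of summation, so the gap there should vanish up to rounding.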
References
[Bon70] A. Bonami. Étude des coefficients de Fourier des fonctions de Lp (G). Ann. Inst. Fourier (Greno-
ble), 20(fasc. 2):335–402 (1971), 1970.
[BS94] A. Balog and E. Szemerédi. A statistical theorem of set addition. Combinatorica, 14(3):263–268,
1994.
[Cha02] M.-C. Chang. A polynomial bound in Freı̆man’s theorem. Duke Math. J., 113(3):399–419, 2002.
[Fox11] J. Fox. A new proof of the graph removal lemma. Ann. of Math. (2), 174(1):561–579, 2011,
arXiv:1006.1300.
[Fre73] G. A. Freı̆man. Foundations of a structural theory of set addition. American Mathematical Soci-
ety, Providence, R. I., 1973. Translated from the Russian, Translations of Mathematical Mono-
graphs, Vol 37.
[Gow98] W. T. Gowers. A new proof of Szemerédi’s theorem for arithmetic progressions of length four.
Geom. Funct. Anal., 8(3):529–551, 1998.
[Gow01] W. T. Gowers. A new proof of Szemerédi’s theorem. Geom. Funct. Anal., 11(3):465–588, 2001.
[Gre02a] B. J. Green. Arithmetic progressions in sumsets. Geom. Funct. Anal., 12(3):584–597, 2002.
[Gre02b] B. J. Green. Restriction and Kakeya phenomena. Available at www.dpmms.cam.ac.uk/~bjg23,
2002.
[Gre05] B. J. Green. A Szemerédi-type regularity lemma in abelian groups, with applications. Geom.
Funct. Anal., 15(2):340–376, 2005.
[GT08] B. J. Green and T. C. Tao. An inverse theorem for the Gowers U 3 (G) norm. Proc. Edinb. Math.
Soc. (2), 51(1):73–153, 2008.
[GT09] B. J. Green and T. C. Tao. Freı̆man’s theorem in finite fields via extremal set theory. Combin.
Probab. Comput., 18(3):335–355, 2009.
[HB87] D. R. Heath-Brown. Integer sets containing no arithmetic progressions. J. London Math. Soc.
(2), 35(3):385–394, 1987.
[KKL88] J. Kahn, G. Kalai, and N. Linial. The influence of variables on Boolean functions. In Proceedings
of the 29th Annual Symposium on Foundations of Computer Science, pages 68–80, Washington,
DC, USA, 1988. IEEE Computer Society.
[Nel73] E. Nelson. The free Markoff field. J. Functional Analysis, 12:211–227, 1973.
[Rot52] K. F. Roth. Sur quelques ensembles d’entiers. C. R. Acad. Sci. Paris, 234:388–390, 1952.
[Rot53] K. F. Roth. On certain sets of integers. J. London Math. Soc., 28:104–109, 1953.
[Rud90] W. Rudin. Fourier analysis on groups. Wiley Classics Library. John Wiley & Sons Inc., New
York, 1990. Reprint of the 1962 original, A Wiley-Interscience Publication.
[Ruz91] I. Z. Ruzsa. Arithmetic progressions in sumsets. Acta Arith., 60(2):191–202, 1991.
[Ruz99] I. Z. Ruzsa. An analog of Freı̆man’s theorem in groups. Astérisque, (258):xv, 323–326, 1999.
Structure theory of set addition.
[Sam07] A. Samorodnitsky. Low-degree tests at large distances. In STOC’07—Proceedings of the 39th
Annual ACM Symposium on Theory of Computing, pages 506–515. ACM, New York, 2007.
[SSV05] B. Sudakov, E. Szemerédi, and V. H. Vu. On a question of Erdős and Moser. Duke Math. J.,
129(1):129–155, 2005.
[Sze90] E. Szemerédi. Integer sets containing no arithmetic progressions. Acta Math. Hungar., 56(1-
2):155–158, 1990.
[Tao08] T. C. Tao. Product set estimates for non-commutative groups. Combinatorica, 28(5):547–594,
2008.
[TV06] T. C. Tao and H. V. Vu. Additive combinatorics, volume 105 of Cambridge Studies in Advanced
Mathematics. Cambridge University Press, Cambridge, 2006.
Mathematical Institute, University of Oxford, 24-29 St. Giles’, Oxford OX1 3LB,
England
E-mail address: tom.sanders@maths.ox.ac.uk
