Notes
TOM SANDERS
1. Introduction
There are two sources of problems which motivate the ideas in this course: the first is
additive number theory, and the second is computer science. In number theory it is the
many and varied questions on the representation of integers which we shall want to keep
at the back of our minds.
It will be helpful to have a little notation. For sets of integers A and B we write
A + B := {a + b : a ∈ A, b ∈ B},
and then if k ∈ N we put
kA := A + · · · + A,
where the sum is k-fold, and
k.A := {ka : a ∈ A}.
Note that kA and k.A are very different beasts: for example, 2N is the set of natural numbers
bigger than 1, whereas 2.N is the set of even natural numbers.
We turn to some examples.
Theorem (Lagrange’s theorem). Every non-negative integer can be written as the sum of
four squares, that is N0 = 4S where S := {0, 1, 4, 9, . . . }.
Conjecture (Goldbach’s conjecture). Every even integer bigger than 2 can be written as
the sum of two primes, that is 2.N \ {2} = 2P where P := {2, 3, 5, 7, 11, . . . }.
Theorem (Roth’s theorem). Every subset of the integers of positive relative density contains three distinct elements in arithmetic progression.
The problems above involve showing that something exists, and when we try to do this it often helps to show, by counting, that there are many such things. To this end we introduce the notion of convolution: given sets of integers A and B we write
$$1_A \ast 1_B(x) := \sum_{z + y = x} 1_A(z)1_B(y),$$
and call it the convolution of $1_A$ with $1_B$. What is important about convolution is that
$$A + B = \operatorname{supp} 1_A \ast 1_B := \{x : 1_A \ast 1_B(x) \neq 0\},$$
the support of $1_A \ast 1_B$.
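The identity $A + B = \operatorname{supp} 1_A \ast 1_B$ can be checked by brute force on small sets of integers. The following is a minimal sketch of our own (not from the notes), assuming finite sets of integers given as Python sets; the helper names `indicator_convolution` and `sumset` are hypothetical. The value $1_A \ast 1_B(x)$ counts the representations $x = z + y$ with $z \in A$ and $y \in B$.

```python
from collections import Counter


def indicator_convolution(A, B):
    """Return 1_A * 1_B as a Counter mapping x to #{(z, y) in A x B : z + y = x}."""
    counts = Counter()
    for z in A:
        for y in B:
            counts[z + y] += 1
    return counts


def sumset(A, B):
    """Return A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}


S = {0, 1, 4, 9}  # the first few squares
conv = indicator_convolution(S, S)
support = {x for x, c in conv.items() if c != 0}

print(sorted(support))  # [0, 1, 2, 4, 5, 8, 9, 10, 13, 18]
print(conv[5])          # 2, since 5 = 1 + 4 = 4 + 1
```

Recording the number of representations, rather than mere membership, is what allows existence to be proved by showing $1_A \ast 1_B(x) > 0$.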
Last updated : 28th April, 2012.
1.1. The dyadic groups. Throughout the course we shall write G for a finite dyadic
group, that is a group in which every non-identity element has order 2. It is an exercise to check that
any such G is isomorphic to (the additive group of) $\mathbb{F}_2^n$ for some n, where $\mathbb{F}_2$ is the field with
two elements. In view of this we shall often put $G := \mathbb{F}_2^n$, and tend to use the language
of vector spaces for discussing these groups, so that subgroups are (vector) subspaces and
cosets of subgroups are affine subspaces.
One of the key ideas of harmonic analysis is to analyse a group G through the space of
functions on G and to this end we introduce the Lebesgue spaces.
1.2. Lebesgue spaces. Given a finite set X there are two classes of Lebesgue spaces
which we shall be interested in corresponding to two natural measures on X. The first is
counting measure δX defined by
δX (A) := |A| for all A ⊂ X;
the second is normalised¹ counting measure $\mu_X$ defined by
$$\mu_X(A) := \frac{|A|}{|X|} \text{ for all } A \subset X.$$
For obvious reasons we call $\delta_X(A)$ the size of the set A, and $\mu_X(A)$ the density of the set
A. Note that for a set $A \subset X$ the measure $\mu_A$ can be decomposed as
$$\mu_A = \frac{1}{|A|}\sum_{a \in A}\delta_{\{a\}}$$
and as the measure induced by the map
$$f \mapsto \frac{1}{\mu_X(A)}\int f1_A \, d\mu_X, \quad\text{or equivalently}\quad \mu_A = \frac{1_A}{\mu_X(A)}\mu_X.$$
¹Normalised here refers to the fact that the integral is 1; it is normalised to have norm 1 with respect to the natural norm on measures $\mu \in M(X)$ defined by $\|\mu\| := \int d|\mu| := \sup\{\int f \, d\mu : \|f\|_{L^\infty(X)} \leq 1\}$. In particular it does not refer to the $L^2$-norm, which, to the extent that it makes sense, has $\|\mu_X\|_{L^2(X)}^2 = |X|^{-1}$.
ANALYSIS OF BOOLEAN FUNCTIONS 3
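The Lebesgue spaces of interest are $L^p(X)$, the space of real valued functions on X with
norm defined by
$$\|f\|_{L^p(X)} := \left(\int |f|^p \, d\mu_X\right)^{1/p} = \left(\frac{1}{|X|}\sum_{x \in X}|f(x)|^p\right)^{1/p},$$
and $\ell^p(X)$, the same space with norm $\|f\|_{\ell^p(X)} := (\int |f|^p \, d\delta_X)^{1/p} = (\sum_{x \in X}|f(x)|^p)^{1/p}$.
Thus $\|f\|_{L^p(X)}$ tends to $\|f\|_{L^\infty(X)}$ from below as $p \to \infty$, and $\|f\|_{\ell^p(X)}$ tends to $\|f\|_{\ell^\infty(X)}$
from above in the same limit.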
When p = 2 the spaces are, of course, Hilbert spaces so that they have an inner product
and we write these
$$\langle f, g\rangle_{L^2(X)} := \int fg \, d\mu_X = \frac{1}{|X|}\sum_{x \in X}f(x)g(x) \text{ for all } f, g \in L^2(X),$$
and
$$\langle f, g\rangle_{\ell^2(X)} := \int fg \, d\delta_X = \sum_{x \in X}f(x)g(x) \text{ for all } f, g \in \ell^2(X).$$
1.3. Convolution. Suppose that $G := \mathbb{F}_2^n$ and $f, g \in L^1(G)$. Then we define the convolution of f and g with a slightly different normalisation to before:
$$f \ast g(x) := \int f(x - y)g(y) \, d\mu_G(y) = \frac{1}{|G|}\sum_{z + y = x}f(z)g(y) \text{ for all } x \in G,$$
and so if $\mu$ and $\nu$ are measures on G then
$$\mu \ast \nu(E) = \int 1_E(x + y) \, d\mu(x)d\nu(y) \text{ for all } E \subset G.$$
Since our groups are always finite we often abuse notation by writing $\mu(x)$ for $\mu(\{x\})$.
We shall frequently find ourselves changing the order of integration (really summation),
and here we get that
$$\int f \ast g \, d\mu_G = \int f \, d\mu_G\int g \, d\mu_G.$$
It turns out that with absolute value signs this becomes a special case of Young’s inequality.
In general by Young’s inequality we shall mean the statement
$$\|f \ast g\|_{L^r(G)} \leq \|f\|_{L^p(G)}\|g\|_{L^q(G)} \text{ for all } f \in L^p(G) \text{ and } g \in L^q(G)$$
for a triple p, q, r provided $1 + r^{-1} = p^{-1} + q^{-1}$. Of particular interest are the cases
p, q = 2 and r = ∞ which encodes the idea that the convolution of two functions in L2 is
‘continuous’, and p, q, r = 1 which tells us that L1 is ‘closed under convolution’.
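A quick numerical sanity check of the normalised convolution on $G = \mathbb{F}_2^n$ may help. In this sketch (our own; the encoding is an assumption, not from the notes) elements of $\mathbb{F}_2^4$ are stored as 4-bit integers, so that $x - y$ in G is the bitwise XOR `x ^ y`, and all norms are taken with respect to $\mu_G$. It checks the change-of-order identity and the two cases of Young's inequality singled out above.

```python
import random

n = 4
N = 1 << n  # |G| for G = F_2^n; elements are n-bit integers and x - y is x ^ y


def conv(f, g):
    """Normalised convolution f * g(x) = E_y f(x - y) g(y) on F_2^n."""
    return [sum(f[x ^ y] * g[y] for y in range(N)) / N for x in range(N)]


def norm(f, p):
    """L^p(G) norm with respect to the uniform probability measure mu_G."""
    if p == float("inf"):
        return max(abs(v) for v in f)
    return (sum(abs(v) ** p for v in f) / N) ** (1 / p)


def mean(f):
    return sum(f) / N


random.seed(0)
f = [random.uniform(-1, 1) for _ in range(N)]
g = [random.uniform(-1, 1) for _ in range(N)]
h = conv(f, g)
TOL = 1e-9

fubini_ok = abs(mean(h) - mean(f) * mean(g)) < TOL
young_2_2_inf = norm(h, float("inf")) <= norm(f, 2) * norm(g, 2) + TOL
young_1_1_1 = norm(h, 1) <= norm(f, 1) * norm(g, 1) + TOL
# f * f(0_G) equals ||f||_2^2 because -x = x in G
self_conv_ok = abs(conv(f, f)[0] - norm(f, 2) ** 2) < TOL
print(fubini_ok, young_2_2_inf, young_1_1_1, self_conv_ok)  # True True True True
```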
As a check of understanding it may be helpful to note that
$$f \ast f(0_G) = \int f(-x)f(x) \, d\mu_G(x) = \|f\|_{L^2(G)}^2$$
since $-x = x$ in G. It follows that Young’s inequality certainly can’t be any better than
this for $r = \infty$ and $p = q = 2$ and, in fact, it is relatively easy to see that it is tight by
considering δ-functions.
As before the crucial identity for us is that
A + B = supp 1A ∗ 1B .
Indeed, it is far easier to analyse the function $1_A \ast 1_B$ than $1_{A+B}$ because the former is far
smoother: it is literally an average of $1_A$ over translates of B. There is a useful
maxim here: the more times you convolve, the smoother a function becomes.
It will be instructive to bear some examples in mind.
Example (Convolution of subspaces). Suppose that W = x + V is an affine subspace with
vector subspace V . We see immediately that supp 1W ∗ 1W = W + W = V (since we are
in characteristic 2) and if y ∈ V then
$$1_W \ast 1_W(y) = \int 1_W(y - z)1_W(z) \, d\mu_G(z) = \int 1_W(-z)1_W(z) \, d\mu_G(z) = \mu_G(W),$$
so
$$1_W \ast 1_W = \mu_G(W)1_V.$$
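The subspace example can be confirmed directly. The sketch below is ours; the particular subspace and translate are arbitrary choices. It builds a 2-dimensional subspace V of $\mathbb{F}_2^5$ and a coset $W = x_0 + V$ different from V, again encoding vectors as bit masks with XOR as addition, and compares $1_W \ast 1_W$ with $\mu_G(W)1_V$.

```python
n = 5
N = 1 << n


def span(gens):
    """All F_2-linear combinations of the given n-bit vectors."""
    S = {0}
    for v in gens:
        S |= {s ^ v for s in S}
    return S


def conv(f, g):
    """Normalised convolution on F_2^n (subtraction is XOR)."""
    return [sum(f[x ^ y] * g[y] for y in range(N)) / N for x in range(N)]


V = span([0b00011, 0b01100])   # a 2-dimensional vector subspace
x0 = 0b10000                   # a translate lying outside V
W = {x0 ^ v for v in V}        # the affine subspace W = x0 + V

one_W = [1 if x in W else 0 for x in range(N)]
lhs = conv(one_W, one_W)
mu_W = len(W) / N
rhs = [mu_W if x in V else 0.0 for x in range(N)]   # mu_G(W) 1_V

print(all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs)))  # True
```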
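Hence, either $\gamma(y)\gamma'(y) = 1$ for all $y \in G$, whereupon $\gamma = \gamma'$ and we have that $\|\gamma\|_{L^2(G)} = 1$;
or $\gamma(y)\gamma'(y) = -1$ for some y and we conclude that the inner product is 0. We write this
formally as
$$\langle \gamma, \gamma'\rangle_{L^2(G)} = \begin{cases} 1 & \text{if } \gamma = \gamma'\\ 0 & \text{otherwise.}\end{cases}$$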
The characters then form a basis of L2 (G) because they are orthogonal (and so linearly
independent) and there are |G| of them which is the dimension of L2 (G). Since they form
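an orthonormal basis we define the Fourier transform to be the map taking $f \in L^1(G)$ to
$\widehat{f} \in \ell^\infty(\widehat{G})$ determined by
$$\widehat{f}(\gamma) := \langle f, \gamma\rangle_{L^2(G)} = \int f(x)\gamma(x) \, d\mu_G(x) \text{ for all } \gamma \in \widehat{G}.$$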
It is easy to see by the triangle inequality that we have the Hausdorff-Young inequality:
$$\|\widehat{f}\|_{\ell^\infty(\widehat{G})} \leq \|f\|_{L^1(G)} \text{ for all } f \in L^1(G).$$
Since $\widehat{G}$ is an orthonormal basis we have the Fourier inversion formula:
$$f(x) = \sum_{\gamma \in \widehat{G}}\widehat{f}(\gamma)\gamma(x) \text{ for all } x \in G.$$
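More than this, the change of basis is unitary and so we have Plancherel’s theorem:
$$\langle f, g\rangle_{L^2(G)} = \langle \widehat{f}, \widehat{g}\rangle_{\ell^2(\widehat{G})} \text{ for all } f, g \in L^2(G).$$
Moreover, convolution goes to multiplication: $\widehat{f \ast g} = \widehat{f} \cdot \widehat{g}$.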
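For experimentation it can help to have the Fourier transform in executable form. This is a minimal sketch of our own, not an efficient implementation: it identifies $\widehat{G}$ with $\mathbb{F}_2^n$ via $\gamma(x) = (-1)^{\gamma \cdot x}$ (a standard identification, assumed here rather than taken from the text), uses the normalisation $\widehat{f}(\gamma) = \int f\gamma \, d\mu_G$ from above, and checks inversion, Plancherel, Hausdorff-Young and $\widehat{f \ast g} = \widehat{f}\widehat{g}$ on random data. In practice a fast Walsh-Hadamard transform would replace the quadratic loops.

```python
import random

n = 4
N = 1 << n


def chi(gamma, x):
    """The character gamma(x) = (-1)^(gamma . x) on F_2^n."""
    return -1 if bin(gamma & x).count("1") % 2 else 1


def fourier(f):
    """hat f(gamma) = <f, gamma>_{L^2(G)} = E_x f(x) gamma(x)."""
    return [sum(f[x] * chi(g, x) for x in range(N)) / N for g in range(N)]


def inverse(fh):
    """Fourier inversion: f(x) = sum_gamma hat f(gamma) gamma(x)."""
    return [sum(fh[g] * chi(g, x) for g in range(N)) for x in range(N)]


def conv(f, g):
    return [sum(f[x ^ y] * g[y] for y in range(N)) / N for x in range(N)]


def close(a, b, tol=1e-9):
    return abs(a - b) < tol


random.seed(1)
f = [random.uniform(-1, 1) for _ in range(N)]
g = [random.uniform(-1, 1) for _ in range(N)]
fh, gh = fourier(f), fourier(g)

inversion_ok = all(close(a, b) for a, b in zip(inverse(fh), f))
# Plancherel: normalised <f, g>_{L^2(G)} equals counting <hat f, hat g>_{l^2}
plancherel_ok = close(sum(a * b for a, b in zip(f, g)) / N,
                      sum(a * b for a, b in zip(fh, gh)))
hausdorff_young_ok = max(abs(v) for v in fh) <= sum(abs(v) for v in f) / N + 1e-12
convolution_ok = all(close(a, b * c) for a, b, c in zip(fourier(conv(f, g)), fh, gh))
print(inversion_ok, plancherel_ok, hausdorff_young_ok, convolution_ok)  # True True True True
```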
Example (Annihilators and the Fourier transform of subspaces). Given $A \subset G$ we write
$A^\perp$ for the annihilator of A, that is the set $\{\gamma \in \widehat{G} : \gamma(x) = 1 \text{ for all } x \in A\}$, and similarly
if $\Gamma \subset \widehat{G}$ we write $\Gamma^\perp$ for the annihilator of Γ, the set $\{x \in G : \gamma(x) = 1 \text{ for all } \gamma \in \Gamma\}$.
It is immediate that annihilators are subspaces and that if $V \leq G$ then $V \subset (V^\perp)^\perp$; in
fact we have equality as we shall see shortly.
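From our calculation on the convolution of indicator functions of subspaces we see that
$\widehat{1_V}^2 = \mu_G(V)\widehat{1_V}$, whence $\widehat{1_V}(\gamma)$ takes only the values 0 and $\mu_G(V)$. On the other hand, if
$\widehat{1_V}(\gamma) = \mu_G(V)$ then $\gamma \in V^\perp$ and conversely, so we have
$$\widehat{1_V}(\gamma) = \begin{cases} \mu_G(V) & \text{if } \gamma \in V^\perp\\ 0 & \text{otherwise.}\end{cases}$$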
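It follows from this that $\mu_G((V^\perp)^\perp) = \mu_G(V)$ and hence that $V = (V^\perp)^\perp$. Finally, the
co-dimension of V is the dimension of G/V, that is $n - \dim V$, and evaluating the Fourier inversion formula for $1_V$ at $0_G$ gives $1 = \mu_G(V)|V^\perp|$, so that $|V^\perp| = |G|/|V|$ and
$$\dim V^\perp = \operatorname{cod} V.$$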
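The relations $|V^\perp| = \mu_G(V)^{-1}$, $\dim V^\perp = \operatorname{cod} V$ and $(V^\perp)^\perp = V$ are easy to test by enumeration. In this sketch (ours) characters are again identified with vectors via $\gamma(x) = (-1)^{\gamma \cdot x}$, so that $\gamma(x) = 1$ exactly when $\gamma \cdot x = 0$; the subspace chosen is arbitrary.

```python
n = 6
N = 1 << n


def dot(a, b):
    """The F_2 inner product of two n-bit vectors."""
    return bin(a & b).count("1") % 2


def span(gens):
    S = {0}
    for v in gens:
        S |= {s ^ v for s in S}
    return S


def annihilator(A):
    """A^perp = {gamma : gamma(x) = 1 for all x in A}, i.e. gamma . x = 0."""
    return {g for g in range(N) if all(dot(g, x) == 0 for x in A)}


V = span([0b000011, 0b001100, 0b110001])   # a 3-dimensional subspace
V_perp = annihilator(V)

dim_V = len(V).bit_length() - 1
dim_V_perp = len(V_perp).bit_length() - 1
print(len(V_perp) * len(V) == N, dim_V_perp == n - dim_V)  # True True
print(annihilator(V_perp) == V)                            # True
```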
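Example (An uncertainty principle). By Hölder’s inequality, the Hausdorff-Young inequality and Parseval’s theorem we have that any function with unit $L^2$-norm has
$$\|f\|_{L^1(G)}\|\widehat{f}\|_{\ell^1(\widehat{G})} \geq \|\widehat{f}\|_{\ell^\infty(\widehat{G})}\|\widehat{f}\|_{\ell^1(\widehat{G})} \geq \|\widehat{f}\|_{\ell^2(\widehat{G})}^2 = \|f\|_{L^2(G)}^2 = 1.$$
It follows that a function cannot have concentrated support on both G (physical space)
and $\widehat{G}$ (momentum space). In particular, by Cauchy-Schwarz we have
$$\|f\|_{L^1(G)} \leq \mu_G(\operatorname{supp} f)^{1/2}\|f\|_{L^2(G)} = \mu_G(\operatorname{supp} f)^{1/2}$$
and similarly $\|\widehat{f}\|_{\ell^1(\widehat{G})} \leq |\operatorname{supp}\widehat{f}|^{1/2}$. Thus $\mu_G(\operatorname{supp} f)|\operatorname{supp}\widehat{f}| \geq 1$.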
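We separate into those characters supporting large and small values of the Fourier transform:
$$L := \{\gamma \in \widehat{G} : |\widehat{1_B}(\gamma)| \geq \epsilon\beta\}.$$
Then Parseval’s theorem gives an upper bound on L:
$$|L|(\epsilon\beta)^2 \leq \sum_{\gamma \in L}|\widehat{1_B}(\gamma)|^2 \leq \|\widehat{1_B}\|_{\ell^2(\widehat{G})}^2 = \|1_B\|_{L^2(G)}^2 = \beta,$$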
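so $|L| \leq \epsilon^{-2}\beta^{-1}$. We put $V := L^\perp$ and note that the bound on L implies that the co-dimension of V is at most $\epsilon^{-2}\beta^{-1}$. On the other hand if $x \in V$ then $\gamma(x) = 1$ for all $\gamma \in L$
and so we have that
$$\sum_{\gamma \in L}|\widehat{1_A}(\gamma)|^2|\widehat{1_B}(\gamma)|^2\gamma(x) \geq |\widehat{1_A}(0_{\widehat{G}})|^2|\widehat{1_B}(0_{\widehat{G}})|^2 = \alpha^2\beta^2$$
since $\widehat{1_A}(0_{\widehat{G}}) = \alpha$, and $\widehat{1_B}(0_{\widehat{G}}) = \beta$ and hence $0_{\widehat{G}} \in L$. On the other hand by the triangle
inequality and Parseval’s theorem we have
$$\Big|\sum_{\gamma \notin L}|\widehat{1_A}(\gamma)|^2|\widehat{1_B}(\gamma)|^2\gamma(x)\Big| \leq \sup_{\gamma \notin L}|\widehat{1_B}(\gamma)|^2\sum_{\gamma \in \widehat{G}}|\widehat{1_A}(\gamma)|^2.$$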
It follows that $T(A) \geq \alpha^3/2$ if $\epsilon \leq \alpha/2$. We think of $\alpha^3$ as being the ‘expected’ number
of solutions to $x + y = z$ in a random set of density α; $\sup_{\gamma \neq 0_{\widehat{G}}}|\widehat{1_A}(\gamma)|$ measures how far
from being random we are.
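The count T and its Fourier expression $T(A) = \sum_\gamma \widehat{1_A}(\gamma)^3$ can be compared numerically. Here $T(A) = \langle 1_A \ast 1_A, 1_A\rangle_{L^2(G)}$, which is the density of pairs $(x, y) \in G^2$ with $x, y, x + y \in A$. The sketch below is ours, with an arbitrary random set; it also checks the estimate $T(A) \geq \alpha^3 - \alpha\sup_{\gamma \neq 0_{\widehat{G}}}|\widehat{1_A}(\gamma)|$, which follows from Parseval's theorem, and the two subspace cases discussed next.

```python
import random

n = 6
N = 1 << n


def chi(g, x):
    return -1 if bin(g & x).count("1") % 2 else 1


def fourier(f):
    return [sum(f[x] * chi(g, x) for x in range(N)) / N for g in range(N)]


def T(A):
    """T(A) = <1_A * 1_A, 1_A> = N^{-2} #{(x, y) in A^2 : x + y in A}."""
    return sum(1 for x in A for y in A if x ^ y in A) / N ** 2


random.seed(2)
A = {x for x in range(N) if random.random() < 0.4}
alpha = len(A) / N
Ah = fourier([1 if x in A else 0 for x in range(N)])
T_fourier = sum(v ** 3 for v in Ah)
bias = max(abs(v) for v in Ah[1:])   # sup over the non-trivial characters

V = set(range(8))          # vectors supported on the first three coordinates
W = {v ^ 8 for v in V}     # a coset of V that is not a subspace
print(abs(T(A) - T_fourier) < 1e-9)                 # True
print(T(A) >= alpha ** 3 - alpha * bias - 1e-12)    # True
print(T(V) == (len(V) / N) ** 2, T(W) == 0)         # True True
```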
At the other end of the spectrum from appearing random we have subspaces. Of course,
if A is a vector subspace then $T(A) = \alpha^2$. However, if A is an affine subspace that is not a
vector subspace then $T(A) = 0$ and so we see that T(A) is not always large. Despite this
we shall prove the following theorem.
Theorem 2.3 (Arithmetic removal lemma). Suppose that $A \subset G$, and that if $A' \subset A$ has
$T(A') = 0$ then $\mu_G(A \setminus A') \geq \epsilon$. Then $T(A) = \Omega_\epsilon(1)$.
This result and the approach we take to it was first developed by Green in [Gre05].
It will be useful to begin by introducing a more general tri-linear form based on T : for
functions f0 , f1 , f2 on G we put
$$T(f_0, f_1, f_2) := \langle f_0 \ast f_1, f_2\rangle_{L^2(G)},$$
so that $T(A) = T(1_A, 1_A, 1_A)$. Importantly, we have the following lemma governing
the behaviour of T, which captures the content of our earlier argument for sets behaving
‘randomly’.
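Lemma 2.4. Suppose that $f_0, f_1, f_2$ are functions on G. Then
$$|T(f_0, f_1, f_2)| \leq \|\widehat{f_i}\|_{\ell^\infty(\widehat{G})}\|f_j\|_{L^2(G)}\|f_k\|_{L^2(G)} \leq \|f_i\|_{L^1(G)}\|f_j\|_{L^2(G)}\|f_k\|_{L^2(G)}$$
for any permutation (i, j, k) of (0, 1, 2). We apply Hölder’s inequality and Cauchy-Schwarz
to this to see that
$$|T(f_0, f_1, f_2)| \leq \sup_{\gamma \in \widehat{G}}|\widehat{f_i}(\gamma)|\Big(\sum_{\gamma \in \widehat{G}}|\widehat{f_j}(\gamma)|^2\Big)^{1/2}\Big(\sum_{\gamma \in \widehat{G}}|\widehat{f_k}(\gamma)|^2\Big)^{1/2}.$$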
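A numerical check of Lemma 2.4 can be sketched under the same conventions as before (vectors as bit masks, characters $\gamma(x) = (-1)^{\gamma \cdot x}$, both assumptions of ours). The code computes $T(f_0, f_1, f_2)$ directly and via $\sum_\gamma \widehat{f_0}(\gamma)\widehat{f_1}(\gamma)\widehat{f_2}(\gamma)$, then verifies both inequalities of the lemma for every permutation (i, j, k).

```python
import itertools
import random

n = 4
N = 1 << n


def chi(g, x):
    return -1 if bin(g & x).count("1") % 2 else 1


def fourier(f):
    return [sum(f[x] * chi(g, x) for x in range(N)) / N for g in range(N)]


def conv(f, g):
    return [sum(f[x ^ y] * g[y] for y in range(N)) / N for x in range(N)]


def norm(f, p):
    return (sum(abs(v) ** p for v in f) / N) ** (1 / p)


random.seed(3)
fs = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(3)]
hats = [fourier(f) for f in fs]

# T(f0, f1, f2) = <f0 * f1, f2>_{L^2(G)} computed two ways
T_direct = sum(a * b for a, b in zip(conv(fs[0], fs[1]), fs[2])) / N
T_fourier = sum(a * b * c for a, b, c in zip(*hats))

first_bound_ok = True
second_bound_ok = True
for i, j, k in itertools.permutations(range(3)):
    middle = max(abs(v) for v in hats[i]) * norm(fs[j], 2) * norm(fs[k], 2)
    right = norm(fs[i], 1) * norm(fs[j], 2) * norm(fs[k], 2)
    first_bound_ok &= abs(T_direct) <= middle + 1e-12
    second_bound_ok &= middle <= right + 1e-12
```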
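By Parseval’s theorem and the fact that convolution goes to multiplication we have
$$\|1_A \ast \mu_{V_i} - 1_A \ast \mu_{V_{i+1}}\|_{L^2(G)}^2 = \sum_{\gamma \in V_{i+1}^\perp \setminus V_i^\perp}|\widehat{1_A}(\gamma)|^2.$$
Since the sets $(V_{i+1}^\perp \setminus V_i^\perp)_i$ are disjoint, it follows by averaging that there is some
$i = O(\eta^{-1})$ such that
$$\|1_A \ast \mu_{V_i} - 1_A \ast \mu_{V_{i+1}}\|_{L^2(G)}^2 \leq \eta\sum_{\gamma \in \widehat{G}}|\widehat{1_A}(\gamma)|^2 = \eta\alpha.$$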
Let
$$A_1 := \{x \in A : 1_A \ast \mu_V(x) \leq \epsilon/4\}$$
and
$$A_2 := \{x \in A : |1_A \ast \mu_{V'} - 1_A \ast \mu_V(x)|^2 \ast \mu_V(x) \geq 4\delta\epsilon^{-1}\}.$$
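We want to get upper bounds for the sizes of the sets $A_1$ and $A_2$. For $A_1$, first note that
$$A_1' := \{x \in G : 1_A \ast \mu_V(x) \leq \epsilon/4\}$$
is invariant under translation by elements of V, so $1_{A_1'} \ast \mu_V = 1_{A_1'}$. Now
$$\begin{aligned}\mu_G(A_1) &= \int 1_{A_1'}1_A \, d\mu_G = \int 1_{A_1'} \ast \mu_V \, 1_A \, d\mu_G\\ &= \int\int 1_{A_1'}(y)\frac{1_V(x - y)}{\mu_G(V)} \, d\mu_G(y) \, 1_A(x) \, d\mu_G(x)\\ &= \int\int 1_A(x)\frac{1_V(y - x)}{\mu_G(V)} \, d\mu_G(x) \, 1_{A_1'}(y) \, d\mu_G(y)\\ &= \int 1_A \ast \mu_V \, 1_{A_1'} \, d\mu_G \leq \epsilon/4\end{aligned}$$
whence $\mu_G(A_2) \leq \epsilon/4$. It follows that $A' := A \setminus (A_1 \cup A_2)$ has $T(A') \neq 0$, and so there is a
triple $(x_0, x_1, x_2)$ with $x_0 + x_1 = x_2$ and
$$1_A \ast \mu_V(x_i) > \epsilon/4 \text{ and } |1_A \ast \mu_{V'} - 1_A \ast \mu_V(x_i)|^2 \ast \mu_V(x_i) \leq 4\delta\epsilon^{-1}$$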
$$\|f_i - g_i\|_{L^2(G)}^2 = \mu_G(V)\int |1_A \ast \mu_{V'}(y) - 1_A \ast \mu_V(x_i)|^2\frac{1_V(x_i - y)}{\mu_G(V)} \, d\mu_G(y) = \mu_G(V)|1_A \ast \mu_{V'} - 1_A \ast \mu_V(x_i)|^2 \ast \mu_V(x_i).$$
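Finally,
$$\begin{aligned}\|\widehat{g_i}\|_{\ell^\infty(\widehat{G})} &= \sup_{\gamma \in \widehat{G}}\Big|\int\sum_{\gamma' \notin V'^\perp}\widehat{1_A}(\gamma')\gamma'(x)1_{x_i + V}(x)\gamma(x) \, d\mu_G(x)\Big|\\ &= \sup_{\gamma \in \widehat{G}}\Big|\sum_{\gamma' \notin V'^\perp}\widehat{1_A}(\gamma')\gamma(x_i)\gamma'(x_i)\mu_G(x_i + V)1_{V^\perp}(\gamma + \gamma')\Big| \leq \eta\mu_G(V)\end{aligned}$$
by Fourier inversion.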
Now we examine
$$T(1_{S_0}, 1_{S_1}, 1_{S_2}) = \mu_G(V)^2\prod_{i=0}^{2}1_A \ast \mu_V(x_i) + 1_A \ast \mu_V(x_1)T(f_0, 1_V, 1_{S_2}) + T(1_{S_0}, f_1, 1_{S_2}).$$
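However, the elements of $\widehat{G}$ can also be thought of as random variables on G with underlying
probability measure $\mu_G$. A set $\Lambda \subset \widehat{G}$ is statistically independent if
$$\mu_G(\{x : \gamma(x) = z_\gamma \text{ for all } \gamma \in \Lambda'\}) = \prod_{\gamma \in \Lambda'}\mu_G(\{x : \gamma(x) = z_\gamma\})$$
for all $\Lambda' \subset \Lambda$ and all choices of signs $z_\gamma \in \{-1, 1\}$.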
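Proof. Since the characters γ are homomorphisms we see that any $y, z \in \{x : \gamma(x) = z_\gamma \text{ for all } \gamma \in \Lambda'\}$ have $\gamma(y - z) = 1$ for all $\gamma \in \Lambda'$ and so the set is just a translate of the
annihilator of $\Lambda'$. Thus Λ is statistically independent iff
$$\mu_G(\Lambda'^\perp) = \prod_{\gamma \in \Lambda'}\mu_G(\{\gamma\}^\perp) \text{ for all } \Lambda' \subset \Lambda.$$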
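The equivalence between statistical and algebraic (linear) independence of characters can be tested exhaustively in a small group. The sketch below is ours: it checks statistical independence straight from the definition, by tabulating the joint distribution of $(\gamma(x))_{\gamma \in \Lambda'}$ over all x, and algebraic independence by computing an $\mathbb{F}_2$-rank, for every 3-element set of characters on $\mathbb{F}_2^4$ (characters encoded as bit masks, an assumption as before).

```python
import itertools
from collections import Counter

n = 4
N = 1 << n


def chi(g, x):
    return -1 if bin(g & x).count("1") % 2 else 1


def rank_f2(vectors):
    """Rank over F_2 of a list of n-bit vectors (XOR-basis elimination)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)   # clears the leading bit of b from v if present
        if v:
            basis.append(v)
    return len(basis)


def statistically_independent(chars):
    """Every sub-family takes each sign pattern equally often (uniform joint law)."""
    for r in range(1, len(chars) + 1):
        for sub in itertools.combinations(chars, r):
            counts = Counter(tuple(chi(g, x) for g in sub) for x in range(N))
            if len(counts) != 2 ** r or len(set(counts.values())) != 1:
                return False
    return True


agree = all(
    statistically_independent(trip) == (rank_f2(list(trip)) == 3)
    for trip in itertools.combinations(range(N), 3)
)
print(agree)  # True
```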
In view of the above theorem we shall treat sums of algebraically independent characters
as sums of independent random variables, and hence the central limit theorem will provide
a useful heuristic. This asserts that if Λ is an independent set of n characters then
$$\mu_G\Big(\Big\{x : \sum_{\lambda \in \Lambda}\lambda(x) \leq \eta\sqrt{n}\Big\}\Big) \sim \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\eta}\exp(-x^2/2) \, dx$$
as $n \to \infty$. We should like to use this to estimate the probability that the sum is at most $\eta\sqrt{n}$ when
η is large and negative, and show that this probability is very small. However, the rate of
convergence in the central limit theorem is not very rapid, and necessarily so when η is
close to zero. Since this is not our range of interest we shall formulate some other, rather
simpler, tools for dealing with the range we care about.
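To see how the heuristic performs, one can compare the exact distribution with the Gaussian prediction. If $\lambda_i(x) = (-1)^{x_i}$ are the n coordinate characters, then $\sum_i \lambda_i(x) = n - 2k$ where k is the number of ones in x, so the left hand side is an explicit binomial sum. This sketch (ours; the function names are hypothetical) evaluates it and the standard normal distribution function using only the standard library.

```python
import math


def exact_lower_tail(n, eta):
    """mu_G({x : sum_i lambda_i(x) <= eta * sqrt(n)}) for the n coordinate characters."""
    threshold = eta * math.sqrt(n)
    good = sum(math.comb(n, k) for k in range(n + 1) if n - 2 * k <= threshold)
    return good / 2 ** n


def normal_cdf(t):
    """(2 pi)^{-1/2} * integral_{-inf}^{t} exp(-x^2/2) dx."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))


for eta in (-1.0, 0.0):
    print(eta, exact_lower_tail(400, eta), normal_cdf(eta))
```

At η = 0 the exact value exceeds 1/2 by half the probability that the sum is exactly zero, a discrepancy of order $n^{-1/2}$; this lattice effect is one concrete source of the slow convergence mentioned above.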
If one is interested in estimating the probability that a function (on G) takes large values
then one naturally turns to the higher moments of that function. To this end we have the
following simple inequality which may also be found in [Rud90].
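Proposition 3.2 (Rudin’s inequality). Suppose that $\Lambda \subset \widehat{G}$ is independent and $p \in [2, \infty)$.
Then
$$\Big\|\sum_{\gamma \in \Lambda}f(\gamma)\gamma\Big\|_{L^p(G)} = O(\sqrt{p}\|f\|_{\ell^2(\Lambda)}) \text{ for all } f \in \ell^2(\Lambda).$$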
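Proof. By nesting of norms the result follows if we can show it for p an even integer, say
2k. In this case we can multiply out the left hand side:
$$\begin{aligned}\Big\|\sum_{\gamma \in \Lambda}f(\gamma)\gamma\Big\|_{L^{2k}(G)}^{2k} &= \sum_{\gamma_1, \dots, \gamma_{2k} \in \Lambda}\prod_{i=1}^{2k}f(\gamma_i)\int\prod_{i=1}^{2k}\gamma_i \, d\mu_G\\ &= \sum_{\substack{\gamma_1, \dots, \gamma_{2k} \in \Lambda\\ \gamma_1 + \dots + \gamma_{2k} = 0_{\widehat{G}}}}\prod_{i=1}^{2k}f(\gamma_i).\end{aligned}$$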
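Since Λ is independent it follows that for each summand there is a set $I \subset \{1, \dots, 2k\}$ of
size k and a bijection $\phi : I \to \{1, \dots, 2k\} \setminus I$ such that $\gamma_i = \gamma_{\phi(i)}$ for all $i \in I$. In this case
the summand is just $\prod_{i \in I}|f(\gamma_i)|^2$, and so
$$\sum_{\substack{\gamma_1, \dots, \gamma_{2k} \in \Lambda\\ \gamma_1 + \dots + \gamma_{2k} = 0_{\widehat{G}}}}\prod_{i=1}^{2k}f(\gamma_i) \leq \sum_{I, \phi}\ \sum_{\gamma_i \in \Lambda \text{ for all } i \in I}\ \prod_{i \in I}|f(\gamma_i)|^2 = \sum_{I, \phi}\Big(\sum_{\gamma \in \Lambda}|f(\gamma)|^2\Big)^k \leq \binom{2k}{k}k!\|f\|_{\ell^2(\Lambda)}^{2k}.$$
Since $\binom{2k}{k}k! = (2k)!/k! \leq (2k)^k$, taking 2k-th roots gives the bound $\sqrt{2k}\|f\|_{\ell^2(\Lambda)}$, as required.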
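The moment computation in this proof can be checked exactly for small examples. For independent characters the values $(\lambda(x))_{\lambda \in \Lambda}$ are uniformly distributed over all sign patterns, so the 2k-th moment is an average over $\{-1, 1\}^{|\Lambda|}$. The sketch below is ours, with random real coefficients standing in for f; it compares the moment with the bound $\binom{2k}{k}k!\|f\|_{\ell^2(\Lambda)}^{2k}$ from the proof and with the cruder $(2k)^k\|f\|_{\ell^2(\Lambda)}^{2k}$ that yields the $\sqrt{p}$ growth.

```python
import itertools
import math
import random


def moment(coeffs, k):
    """E (sum_i a_i s_i)^(2k) over uniformly random signs s in {-1, 1}^len(coeffs)."""
    total = 0.0
    for signs in itertools.product((1, -1), repeat=len(coeffs)):
        total += sum(a * s for a, s in zip(coeffs, signs)) ** (2 * k)
    return total / 2 ** len(coeffs)


random.seed(4)
a = [random.uniform(-1, 1) for _ in range(8)]
l2_sq = sum(v * v for v in a)

for k in range(1, 5):
    m = moment(a, k)
    proof_bound = math.comb(2 * k, k) * math.factorial(k) * l2_sq ** k
    crude_bound = (2 * k) ** k * l2_sq ** k
    print(k, m <= proof_bound <= crude_bound)   # True for each k
```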
As a simple corollary of this we can refine Lemma 2.2, improving the bound on the
co-dimension of the subspace there from $O(\epsilon^{-2}\beta^{-1})$ to $O(\epsilon^{-2}\log\beta^{-1})$.
Corollary 3.4 (Chang’s theorem, [Cha02]). Suppose that $B \subset G$ has density β and $\epsilon \in (0, 1]$. Then
$$\operatorname{cod}\operatorname{Spec}_\epsilon(1_B)^\perp = O(\epsilon^{-2}\log\beta^{-1}).$$
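Proof. Let $\Lambda \subset \operatorname{Spec}_\epsilon(1_B)$ be a maximal algebraically independent subset. Then we have
that $\operatorname{Spec}_\epsilon(1_B)^\perp = \Lambda^\perp$, and $\operatorname{cod}\operatorname{Spec}_\epsilon(1_B)^\perp = |\Lambda|$. On the other hand
$$\epsilon^2\beta^2|\Lambda| \leq \sum_{\lambda \in \Lambda}|\widehat{1_B}(\lambda)|^2 = O\Big(\frac{p}{p - 1}\|1_B\|_{L^p(G)}^2\Big) = O\Big(\frac{p}{p - 1}\beta^{2/p}\Big)$$
by the dual of Rudin’s inequality. Setting $p = 1 + 1/\log\beta^{-1}$ then gives the result.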
Chang’s theorem can be easily used to refine Bogolyubov’s lemma (Theorem 2.1), but
we shall use it to look at a harder problem: two-fold sumsets.
3.5. Application: Subspaces in sumsets. We have seen that if A has density α then
4A contains a subspace of co-dimension $O_\alpha(1)$. On the other hand a random set shows
that A itself need not contain a large subspace. What happens in between?
If A is highly structured or highly random then A+A contains a large subspace. However
there is an example due to Ruzsa [Ruz91] adapted to the model setting by Green [Gre02b]
which indicates some limitations.
Example (Niveau set construction). Let $G := \mathbb{F}_2^n$ and
$$A := \{x \in G : x \text{ has at least } n/2 + \eta\sqrt{n}/2 \text{ ones in it}\}.$$
Then
$$\mu_G(A) \sim \frac{1}{\sqrt{2\pi}}\int_\eta^\infty\exp(-x^2/2) \, dx \geq \frac{1}{2} - O(\eta),$$
by the central limit theorem, so we think of A as having density close to 1/2. On the other
hand, if $x, y \in A$ then the ones of x and y coincide in at least $\eta\sqrt{n}$ places, so $x + y$ has at most $n - \eta\sqrt{n}$ ones in it.
Now suppose that W is an affine subspace, say a coset of some linear subspace V, and
$\operatorname{cod} V \leq d$. Then it is an exercise in linear algebra to show that W contains a vector with
at least $n - d$ ones in it, and it follows that $A + A$ cannot contain an affine subspace of co-dimension
less than $\eta\sqrt{n}$.
Example (Sumsets of very large sets). Suppose that $A \subset G$ has $\mu_G(A) > 1/2$. Then
$$1_A \ast 1_A(x) = \mu_G(A \cap (x + A)) = 2\mu_G(A) - \mu_G(A \cup (x + A)) \geq 2\mu_G(A) - 1 > 0$$
for all $x \in G$. It follows that $A + A = G$ and so the sumset contains a subspace of
co-dimension 0.
Complementing this we have the following result due to Green. It is formally proved in
[Gre02b], although the method is that of [Gre02a].
Theorem 3.6. Suppose that $A \subset G$ has density α. Then $A + A$ contains an affine subspace
of dimension $\Omega(\alpha^2 n)$.
The argument is iterative and based around the following lemma. The proof of this
lemma uses Chang’s theorem, which is not altogether surprising given the Niveau set example.
Lemma 3.7. Suppose that $G := \mathbb{F}_2^n$, $A \subset G$ has density α and $k \leq n$ is a natural number. Then
either $A + A$ contains an affine subspace of dimension k or else there is a subspace V of
co-dimension $O(\alpha^{-2}k)$ such that $\|1_A \ast \mu_V\|_{L^\infty(G)} \geq \alpha(1 + 1/2)$.
Proof. Write $S := (A + A)^c$ and suppose that $\sigma := \mu_G(S) < 2^{-k}$. Now, suppose that
$H \leq G$ is any subspace of dimension k; then
$$\int \mu_G(S \cap (x + H)) \, d\mu_G(x) = \int 1_S \ast 1_H(x) \, d\mu_G(x) = \sigma\mu_G(H) < 1/|G|,$$
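We have $\widehat{1_A}(0_{\widehat{G}}) = \alpha$ and $\widehat{1_S}(0_{\widehat{G}}) = \sigma$, and since $1_A \ast 1_A$ is supported on $A + A$, which is disjoint from S, Plancherel’s theorem gives $\sum_\gamma|\widehat{1_A}(\gamma)|^2\widehat{1_S}(\gamma) = \langle 1_A \ast 1_A, 1_S\rangle_{L^2(G)} = 0$, so
$$\alpha^2\sigma \leq \sum_{\gamma \neq 0_{\widehat{G}}}|\widehat{1_A}(\gamma)|^2|\widehat{1_S}(\gamma)|.$$
Let $V = \operatorname{Spec}_{\alpha/2}(1_S)^\perp$ and apply Chang’s theorem to bound its co-dimension from above
by $O(\alpha^{-2}\log\sigma^{-1}) = O(\alpha^{-2}k)$. The characters outside $\operatorname{Spec}_{\alpha/2}(1_S)$ contribute at most $(\alpha\sigma/2)\sum_\gamma|\widehat{1_A}(\gamma)|^2 = \alpha^2\sigma/2$ to the sum above, and $|\widehat{1_S}(\gamma)| \leq \sigma$ for all γ, so $\sum_{0_{\widehat{G}} \neq \gamma \in \operatorname{Spec}_{\alpha/2}(1_S)}|\widehat{1_A}(\gamma)|^2 \geq \alpha^2/2$. On the other hand $\widehat{\mu_V}(\gamma) = 1$ if $\gamma \in \operatorname{Spec}_{\alpha/2}(1_S)$ and so by
Parseval’s theorem we have
$$\alpha^2(1 + 1/2) \leq \sum_{\gamma \in \widehat{G}}|\widehat{1_A}(\gamma)|^2|\widehat{\mu_V}(\gamma)|^2 = \|1_A \ast \mu_V\|_{L^2(G)}^2.$$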
(Another way of saying this is that $\|M\|_{\infty \to 1} \leq 1$.) The real numbers equipped with
multiplication form a 1-dimensional (real) Hilbert space and we can ask to what extent
the sequence of real numbers can be replaced with elements of a higher dimensional (real)
Hilbert space.
In particular, suppose that H is a (real) d-dimensional Hilbert space. We may certainly
suppose² that $H \cong L^2(X)$ for some finite set X of size d, and hence that $H = L^2(X)$.
Then
$$\sum_{i,j}M_{i,j}\langle v_i, w_j\rangle_{L^2(X)} = \int\sum_{i,j}M_{i,j}v_i(x)w_j(x) \, d\mu_X(x). \qquad (3.2)$$
from the hypothesis (3.1). It follows that we can write $K_d$ for the smallest constant such
that for all n × n matrices M satisfying (3.1), all d-dimensional real Hilbert spaces H and all vectors $(v_i)_i, (w_j)_j$ in H
we have
$$\Big|\sum_{i,j}M_{i,j}\langle v_i, w_j\rangle_H\Big| \leq K_d\sup_{i,j}\|v_i\|_H\|w_j\|_H.$$
Note that if we restrict to the (Hilbert) subspace generated by $(v_i)_i, (w_j)_j$, then none of
the quantities of concern change and so we have $K_d \leq K_{2n}$.
In this notation our previous argument showed that $K_d \leq d$, whence $K_d \leq \min\{d, 2n\}$,
and Grothendieck’s inequality tells us that $K_d$ is bounded by an absolute constant.
Theorem 3.9 (Grothendieck’s inequality). We have that $K_d = O(1)$.
Proof. We continue to assume, as we may, that $H = L^2(X)$. In general we cannot do
better than (3.3). However, if the large values of the vectors $v_i$ and $w_j$ have small $L^2$-mass
then we can. Let $v_i$ and $w_j$ be such that
$$K_d = \Big|\sum_{i,j}M_{i,j}\langle v_i, w_j\rangle_{L^2(X)}\Big| \text{ and } \|v_i\|_{L^2(X)}, \|w_j\|_{L^2(X)} \leq 1.$$
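Decompose the $v_i$s and $w_j$s into their large and small parts: $v_i = v_i^L + v_i^S$ and $w_j = w_j^L + w_j^S$
where
$$v_i^L(x) := \begin{cases} v_i(x) & \text{if } |v_i(x)| > K\\ 0 & \text{otherwise}\end{cases} \quad\text{and}\quad w_j^L(x) := \begin{cases} w_j(x) & \text{if } |w_j(x)| > K\\ 0 & \text{otherwise.}\end{cases}$$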
Then
$$\Big|\sum_{i,j}M_{i,j}\langle v_i, w_j\rangle_{L^2(X)}\Big| \leq \Big|\sum_{i,j}M_{i,j}\langle v_i^S, w_j^S\rangle_{L^2(X)}\Big| + \Big|\sum_{i,j}M_{i,j}\langle v_i^L, w_j\rangle_{L^2(X)}\Big| + \Big|\sum_{i,j}M_{i,j}\langle v_i^S, w_j^L\rangle_{L^2(X)}\Big|.$$
Since the left hand side is just $K_d$, we are done if we can show that the two maxima on
the right are small for some $K = O(1)$. Of course this is not true, but Rudin’s inequality
provides us with an isometric embedding into a space where it is.
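Specifically, let $G = \mathbb{F}_2^d$ and $(\lambda_x)_{x \in X}$ be a set of d independent characters in $\widehat{G}$, and put
$$\widetilde{v}_i := \frac{1}{\sqrt{d}}\sum_{x \in X}v_i(x)\lambda_x \quad\text{and}\quad \widetilde{w}_j := \frac{1}{\sqrt{d}}\sum_{x \in X}w_j(x)\lambda_x.$$
By Plancherel’s theorem we have that
$$\langle \widetilde{v}_i, \widetilde{w}_j\rangle_{L^2(G)} = \langle v_i, w_j\rangle_{L^2(X)}.$$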
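The embedding $v \mapsto \widetilde{v}$ is easy to implement, and the identity $\langle \widetilde{v}_i, \widetilde{w}_j\rangle_{L^2(G)} = \langle v_i, w_j\rangle_{L^2(X)}$ can be checked directly. In this sketch (ours) X is identified with $\{0, \dots, d-1\}$ and, as one admissible choice of independent characters, $\lambda_x$ is the x-th coordinate character $y \mapsto (-1)^{y_x}$ on $\mathbb{F}_2^d$.

```python
import math
import random

d = 5
G_SIZE = 1 << d   # |G| for G = F_2^d


def coordinate_character(i, y):
    """lambda_i(y) = (-1)^{y_i}."""
    return -1 if (y >> i) & 1 else 1


def embed(v):
    """v in L^2(X) with |X| = d  ->  d^{-1/2} sum_x v(x) lambda_x in L^2(G)."""
    return [sum(v[i] * coordinate_character(i, y) for i in range(d)) / math.sqrt(d)
            for y in range(G_SIZE)]


def ip_X(v, w):
    """<v, w>_{L^2(X)} with the normalised counting measure mu_X."""
    return sum(a * b for a, b in zip(v, w)) / d


def ip_G(f, g):
    """<f, g>_{L^2(G)} with the normalised counting measure mu_G."""
    return sum(a * b for a, b in zip(f, g)) / G_SIZE


random.seed(6)
v = [random.uniform(-1, 1) for _ in range(d)]
w = [random.uniform(-1, 1) for _ in range(d)]
print(abs(ip_G(embed(v), embed(w)) - ip_X(v, w)) < 1e-9)  # True
```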
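Now, writing $\widetilde{v}_i = \widetilde{v}_i^L + \widetilde{v}_i^S$ and $\widetilde{w}_j = \widetilde{w}_j^L + \widetilde{w}_j^S$ in the same way as before, we see that
$$\Big|\sum_{i,j}M_{i,j}\langle \widetilde{v}_i, \widetilde{w}_j\rangle_{L^2(G)}\Big| \leq K^2 + K_{2d}\max_i\|\widetilde{v}_i^L\|_{L^2(G)} + K_{2d}\max_j\|\widetilde{w}_j^L\|_{L^2(G)}. \qquad (3.4)$$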
4.1. The connection to dyadic groups and Beckner’s inequality. The set $\{0, 1\}^n$
may be endowed with a group (in fact vector space) structure in a natural way. We put
$$(x + y)_i := x_i + y_i \pmod 2 \text{ for all } x, y \in \{0, 1\}^n.$$
The resulting group is (isomorphic to) $\mathbb{F}_2^n$, and we shall denote it by G. What is different
about this chapter is that we have implicitly chosen a set of generators, namely the canonical
basis of $\{0, 1\}^n$, denoted $e_1, \dots, e_n$, so that $e_i$ has zeros everywhere except the ith position
where it has a 1. In particular, $x_i = x \cdot e_i$.
If G has the above form then we let $\gamma_1, \dots, \gamma_n$ be the maps $x \mapsto (-1)^{x_i}$. The vectors
$\gamma_1, \dots, \gamma_n$ are independent and so form a basis for $\widehat{G}$, and we write $|\gamma|$ for the number of
characters from $\{\gamma_1, \dots, \gamma_n\}$ in the expression for γ.
A rather powerful tool in this setting (where a basis is specified) is called Beckner’s
inequality. Suppose $G = \{0, 1\}^n$ (thought of as a group of exponent 2) and $\epsilon \in (0, 1]$. We
define
$$p_\epsilon(x) := \prod_{i=1}^n(1 + \epsilon\gamma_i(x)) = \prod_{i=1}^n(1 + \epsilon(-1)^{x_i}).$$
We can then calculate the Fourier transform and see that
$$\widehat{p_\epsilon}(\gamma) = \sum_{S \subset [n]}\epsilon^{|S|}\int\gamma\prod_{i \in S}\gamma_i \, d\mu_G = \epsilon^{|\gamma|}.$$
We can now state Beckner’s inequality, which should remind you of the dual of Rudin’s
inequality since $\widehat{p_\epsilon}$ is ε on the set $\{\gamma_1, \dots, \gamma_n\}$ and $o(\epsilon)$ everywhere else except the trivial
character.
Theorem 4.2 (Beckner’s inequality). Suppose that G and $p_\epsilon$ are as above. Then
$$\|\widehat{p_\epsilon}\widehat{f}\|_{\ell^2(\widehat{G})} = \|p_\epsilon \ast f\|_{L^2(G)} \leq \|f\|_{L^{1+\epsilon^2}(G)} \text{ for all } f \in L^{1+\epsilon^2}(G).$$
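Both the formula $\widehat{p_\epsilon}(\gamma) = \epsilon^{|\gamma|}$ and Theorem 4.2 itself can be tested numerically for small n. The following sketch is ours; n, ε and the random test functions are arbitrary choices. It builds $p_\epsilon$ from its product definition, identifies γ with a bit mask so that $|\gamma|$ is the number of set bits, and compares $\|p_\epsilon \ast f\|_{L^2(G)}$ with $\|f\|_{L^{1+\epsilon^2}(G)}$. A numerical check of this kind is of course no substitute for the proof.

```python
import random

n, eps = 4, 0.6
N = 1 << n
q = 1 + eps ** 2   # the exponent 1 + eps^2 in Beckner's inequality


def chi(g, x):
    return -1 if bin(g & x).count("1") % 2 else 1


def fourier(f):
    return [sum(f[x] * chi(g, x) for x in range(N)) / N for g in range(N)]


def conv(f, g):
    return [sum(f[x ^ y] * g[y] for y in range(N)) / N for x in range(N)]


def norm(f, p):
    return (sum(abs(v) ** p for v in f) / N) ** (1 / p)


def p_eps(x):
    """p_eps(x) = prod_i (1 + eps * (-1)^{x_i})."""
    value = 1.0
    for i in range(n):
        value *= 1 + eps * (-1 if (x >> i) & 1 else 1)
    return value


p = [p_eps(x) for x in range(N)]
ph = fourier(p)
formula_ok = all(abs(ph[g] - eps ** bin(g).count("1")) < 1e-12 for g in range(N))

random.seed(5)
beckner_ok = True
for _ in range(20):
    f = [random.uniform(-1, 1) for _ in range(N)]
    beckner_ok &= norm(conv(p, f), 2) <= norm(f, q) + 1e-12
print(formula_ok, beckner_ok)  # True True
```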
We shall effectively prove this by taking tensor products of the n = 1 case which we
begin with in the following lemma.
Lemma 4.3 (The two-point lemma). For all reals a, b we have
$$a^2 + \epsilon^2b^2 \leq \left(\frac{|a + b|^{1+\epsilon^2} + |a - b|^{1+\epsilon^2}}{2}\right)^{2/(1+\epsilon^2)}.$$
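Proof. Clearly we may assume that $|a| \geq |b|$, since the right hand side is symmetric in a
and b while $a^2 + \epsilon^2b^2 \leq b^2 + \epsilon^2a^2$ when $|a| \leq |b|$. We may also assume that $a \neq 0$ (otherwise $a = b = 0$), and so dividing through by $a^2$ it follows that the result is proved if we can show that
$$1 + \epsilon^2y^2 \leq \left(\frac{(1 + y)^{1+\epsilon^2} + (1 - y)^{1+\epsilon^2}}{2}\right)^{2/(1+\epsilon^2)} \text{ for all } |y| \leq 1. \qquad (4.1)$$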
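Both sides are continuous in y, so it suffices to prove the inequality for $|y| < 1$, where we
can use the binomial theorem to expand out the powers of $(1 \pm y)$ on the right. Specifically,
put $p = 1 + \epsilon^2$ and note that
$$\frac{1}{2}\left((1 + y)^p + (1 - y)^p\right) = \sum_{r=0}^\infty\binom{p}{r}\frac{1}{2}(y^r + (-y)^r) = \sum_{l=0}^\infty\binom{p}{2l}y^{2l} \geq 1 + \binom{p}{2}y^2$$
since $\binom{p}{2l} \geq 0$ for all $l \in \mathbb{N}_0$. On the other hand $(1 + x)^\theta \leq 1 + \theta x$ for all $\theta \in [0, 1]$ and
$x \geq 0$, which applied with $x = \epsilon^2y^2$ and $\theta = p/2$ gives $(1 + \epsilon^2y^2)^{p/2} \leq 1 + \binom{p}{2}y^2$, because $\binom{p}{2} = p\epsilon^2/2$. Raising to the power $2/p$ gives us (4.1). The result is proved.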
Proof of Theorem 4.2. We proceed by induction on n. The base case is done by the two-point lemma. We write $G_j := \{0, 1\}^j$ and define the operator $T_j$ as follows:
$$T_j : L^{1+\epsilon^2}(G_j) \to L^2(G_j);\ f \mapsto \Big(x \mapsto \int\prod_{i=1}^j(1 + \epsilon(-1)^{x_i + y_i})f(y_1, \dots, y_j) \, d\mu_{G_j}(y)\Big),$$
and suppose that we have established the jth case of the induction. Given $f \in L^2(G_{j+1})$
write $f = g + \gamma_{j+1}h$ for two functions $g, h \in L^2(G_j)$, and note that
$$T_{j+1}f = T_jg + \epsilon\gamma_{j+1}T_jh.$$
In particular
$$T_{j+1}f(x_1, \dots, x_{j+1}) = T_jg(x_1, \dots, x_j) + \epsilon(-1)^{x_{j+1}}T_jh(x_1, \dots, x_j).$$
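Now, put $p = 1 + \epsilon^2$ and note that
$$\|T_{j+1}f\|_{L^2(G_{j+1})}^2 = \int|T_jg(x)|^2 + \epsilon^2|T_jh(x)|^2 \, d\mu_{G_j}(x) \leq \int\left(\frac{|T_j(g + h)(x)|^p + |T_j(g - h)(x)|^p}{2}\right)^{2/p}d\mu_{G_j}(x)$$
by the two-point lemma. Now, put $X := G_j$ and $Y := \{0, 1\}$ and define k on $X \times Y$ by
$$k(x, y) := \begin{cases}|T_j(g + h)(x)|^p & \text{if } y = 0\\ |T_j(g - h)(x)|^p & \text{if } y = 1,\end{cases}$$
so that
$$\int\left(\frac{|T_j(g + h)(x)|^p + |T_j(g - h)(x)|^p}{2}\right)^{2/p}d\mu_{G_j}(x) = \int\left(\int|k(x, y)| \, d\mu_Y(y)\right)^{2/p}d\mu_X(x).$$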
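By the integral triangle inequality (see Appendix A for a statement and proof) with $q = 2/p$
we get that
$$\begin{aligned}\int\left(\int|k(x, y)| \, d\mu_Y(y)\right)^{2/p}d\mu_X(x) &\leq \left(\int\left(\int|k(x, y)|^{2/p} \, d\mu_X(x)\right)^{p/2}d\mu_Y(y)\right)^{2/p}\\ &= \left(\frac{1}{2}\left(\left(\int|T_j(g + h)(x)|^2 \, d\mu_{G_j}(x)\right)^{p/2} + \left(\int|T_j(g - h)(x)|^2 \, d\mu_{G_j}(x)\right)^{p/2}\right)\right)^{2/p}.\end{aligned}$$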
Again the name here is not quite right. Theorem 4.2 is more properly attributed to
Bonami [Bon70], although some point to Nelson [Nel73]. The additive combinatorial literature
often refers to it as Beckner’s inequality and much of the computer science literature
to it as the Bonami-Beckner inequality.
4.4. Influence. Given a Boolean function f, the influence of voter i is denoted $\sigma_i(f)$ and
is defined to be the probability that i changing his vote affects the outcome if all voters
vote uniformly at random:
$$\sigma_i(f) := \int|f_i(x)|^2 \, d\mu_G(x) \text{ where } f_i(x) = f(x) - f(x + e_i).$$
One of the central questions we want to ask is whether there is always a voter with large
influence. In some sense this is obviously not the case: if f has very small variance then
clearly no voter has a large influence. However, if we insist that the voting scheme is ‘fair’
in the sense that
$$\int f \, d\mu_G \sim 1/2$$
then we might hope to find an influential voter. A trivial estimate follows from (4.2): the
total influence is
$$I(f) := \sum_{i=1}^n\sigma_i(f) = 4\sum_{\gamma \in \widehat{G}}|\gamma||\widehat{f}(\gamma)|^2 \geq 4\sum_{\gamma \neq 0_{\widehat{G}}}|\widehat{f}(\gamma)|^2 = 4\operatorname{Var}(f).$$
If f is ‘fair’ then this is asymptotically 1 and hence, by averaging, there is a voter with
influence at least Ω(1/n). It turns out that there are examples where no voter has much
more influence than this.
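The spectral formula for the total influence is easy to verify by brute force. In our sketch below, f is majority on five voters, written as a 0/1-valued function on $\{0,1\}^5$ with points encoded as 5-bit integers; each $\sigma_i(f)$ is computed from its definition and compared with $4\sum_\gamma|\gamma||\widehat{f}(\gamma)|^2$ and with $4\operatorname{Var}(f)$. For this f each voter has influence 6/16 = 0.375 (the other four must split 2 to 2), so the total is 1.875, while $4\operatorname{Var}(f) = 1$.

```python
n = 5
N = 1 << n


def chi(g, x):
    return -1 if bin(g & x).count("1") % 2 else 1


def fourier(f):
    return [sum(f[x] * chi(g, x) for x in range(N)) / N for g in range(N)]


def majority(x):
    return 1 if bin(x).count("1") > n // 2 else 0


def influence(f, i):
    """sigma_i(f) = E_x |f(x) - f(x + e_i)|^2."""
    return sum((f[x] - f[x ^ (1 << i)]) ** 2 for x in range(N)) / N


f = [majority(x) for x in range(N)]
fh = fourier(f)
total = sum(influence(f, i) for i in range(n))
spectral = 4 * sum(bin(g).count("1") * fh[g] ** 2 for g in range(N))
variance = sum(v ** 2 for v in fh[1:])   # Var(f) = sum over non-trivial characters
print(total, spectral, 4 * variance)
```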
Example (The tribes example). Suppose that
$$f(x) := x_1 \dots x_k \vee x_{k+1} \dots x_{2k} \vee \dots \vee x_{(r-1)k+1} \dots x_{rk}$$
where ∨ denotes logical OR. First we need to determine a relationship between r and k
that will make this function ‘fair’. Specifically, then, we put
$$\frac{1}{2} \sim \int f \, d\mu_G = 1 - (1 - 2^{-k})^r.$$
This tells us that we want to take $r \sim 2^k\log 2$, and of course putting $n := rk$ we find that
$$k = \log_2 n - \log_2\log n + O(1) \text{ and } r \sim \frac{n}{\log_2 n}.$$
Now, by symmetry all voters have the same influence: so, for example, $x_1$ influences the
outcome iff $x_{(i-1)k+1} \dots x_{ik} = 0$ for all $i \in \{2, \dots, r\}$ and $x_i = 1$ for all $i \in \{2, \dots, k\}$.
The probability that $x_{(i-1)k+1} \dots x_{ik} = 0$ for a given i is $1 - 2^{-k}$, so the probability that $x_1$
influences the outcome is
$$2^{1-k}(1 - 2^{-k})^{r-1} \leq 2^{1-k}\exp(-2^{-k}(r - 1)) = O\Big(\frac{\log n}{n}\Big).$$
That is to say, none of the voters has very much influence.
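The tribes calculation can likewise be confirmed by enumeration for small parameters. This sketch (ours) takes k = 3 and r = 5, so n = 15, stores each tribe as a bit mask, and compares the exact influence of the first voter and the exact mean with $2^{1-k}(1 - 2^{-k})^{r-1}$ and $1 - (1 - 2^{-k})^r$. Such small parameters cannot illustrate the asymptotics; the point is only to check the counting.

```python
k, r = 3, 5
n = k * r
N = 1 << n
TRIBES = [((1 << k) - 1) << (k * t) for t in range(r)]   # bit masks of the r tribes


def tribes(x):
    """1 if some tribe votes unanimously for 1, else 0."""
    return 1 if any((x & m) == m for m in TRIBES) else 0


values = [tribes(x) for x in range(N)]
mean = sum(values) / N
influence_first = sum(values[x] != values[x ^ 1] for x in range(N)) / N

predicted_mean = 1 - (1 - 2 ** -k) ** r
predicted_influence = 2 ** (1 - k) * (1 - 2 ** -k) ** (r - 1)
print(mean, predicted_mean)
print(influence_first, predicted_influence)
```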
Interestingly, Beckner’s inequality was used by Kahn, Kalai and Linial in [KKL88] to
establish that the tribes upper bound has a matching lower bound, although the argument
is more complicated than the trivial averaging earlier.
Theorem 4.5 (KKL). Suppose that f is a Boolean function with $\operatorname{Var}(f) = \Omega(1)$. Then
there is some i such that
$$\sigma_i(f) = \Omega\Big(\frac{\log n}{n}\Big).$$
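$$= 4^{d-1}\|p_{1/2} \ast f_i\|_{L^2(G)}^2 \leq 4^{d-1}\|f_i\|_{L^{5/4}(G)}^2 = 4^{d-1}\sigma_i(f)^{8/5}.$$
It follows that if $\sigma_i(f) \leq n^{-5/6}$ for all i (which we may as well assume since otherwise we’d
be done) then
$$\sum_{|\gamma| \leq d}|\gamma||\widehat{f}(\gamma)|^2 \leq 4^{d-1}n \cdot (n^{-5/6})^{8/5} \leq 4^{d-1}n^{-1/3}.$$
Now pick $d = \Omega(\log n)$ such that this is at most $\operatorname{Var}(f)/2$ for sufficiently large n. Then
$$\operatorname{Var}(f) = \sum_{\gamma \neq 0_{\widehat{G}}}|\widehat{f}(\gamma)|^2 \leq \sum_{\gamma : |\gamma| > d}|\widehat{f}(\gamma)|^2 + \operatorname{Var}(f)/2,$$
and so
$$\sum_{\gamma \notin V^\perp}|\widehat{f}(\gamma)|^2 \leq 4^d\sum_{\gamma \notin V^\perp : |\gamma| \leq d}2^{-2|\gamma|}|\widehat{f}(\gamma)|^2 + \frac{C}{d}.$$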
where the $f_i$s are as in the previous proof. Combining all this we conclude that
$$\|f - g\|_{L^2(G)}^2 \leq C4^d\tau^{3/5} + \frac{C}{d}.$$
Putting $d = \lceil 2C\eta^{-1}\rceil$ we see that we can take $\tau = \exp(-O(C\eta^{-1}))$ and ensure that the
difference is at most η. The result is proved since g is clearly an $O(\tau^{-1}C)$-junta.
Note that in both proofs we were only interested in using Beckner’s inequality for some
$\epsilon < 1 - \Omega(1)$, rather than a very small value as we did in Chapter 3. On the other hand
we needed the additional strength of Beckner to estimate the $\ell^2$-mass of $\widehat{f}$ on the sets
$\{\gamma : |\gamma| = d\}$.
$$1_{2A}^{(n+1)}(s_1 + \dots + s_n + a + a') = \frac{1}{|G|^n}\sum_{z_1 + \dots + z_{n+1} = s_1 + \dots + s_n + a + a'}1_{2A}(z_1) \dots 1_{2A}(z_{n+1}).$$
then
$$z_1 + \dots + z_{n+1} = s_1 + \dots + s_n + a + a'$$
and each vector $b \in G^n$ determines a unique vector $z \in G^{n+1}$. It follows that
$$\begin{aligned}1_{2A}^{(n+1)}(s_1 + \dots + s_n + a + a') &\geq \frac{1}{|G|^n}\sum_{b_1, \dots, b_n}1_{2A}(a + b_1) \dots 1_{2A}(b_n + s_n + a')\\ &\geq \frac{1}{|G|^n}\sum_{b_1, \dots, b_n}1_A(a)1_A(b_1) \dots 1_A(b_n + s_n)1_A(a')\\ &= \prod_{i=1}^n 1_A \ast 1_A(s_i) \geq (c\mu_G(A))^n.\end{aligned}$$
Thus
$$\mu_G(n\operatorname{Sym}_c(A) + A + A)(c\mu_G(A))^n \leq \int 1_{2A}^{(n+1)} \, d\mu_G = K^{n+1}\mu_G(A)^{n+1}$$
Proof of Theorem 5.4. By Lemma 5.3 we see that $S := \operatorname{Sym}_{1/2K}(A)$ has $|S| \geq |A|/2K$.
Then by Proposition 5.7 we have that
It follows by Corollary 5.6 that the group generated by S has size at most
$$|\langle S\rangle| \leq \exp(O(K^{O(1)}))|S| \leq \exp(O(K^{O(1)}))|4S + 2A| \leq \exp(O(K^{O(1)}))|A|.$$
Finally, by Ruzsa’s covering lemma we have a set X of size $O(K^{O(1)})$ such that
$$A \subset X + S + S \subset \langle X\rangle + \langle S\rangle$$
and in words it is the number of additive quadruples in A, that is the number of quadruples
$(x, y, z, w) \in A^4$ such that $x + y = z + w$. It is easy to see that
$$E(A) \leq \sup_u\sum_{x + y = u}1_A(x)1_A(y) \cdot \sum_u\sum_{x + y = u}1_A(x)1_A(y) \leq |A| \cdot |A|^2 = |A|^3. \qquad (6.1)$$
which is to say it is close to the maximum value in (6.1). Of course this maximum in (6.1)
is achieved by affine subspaces, since another way of thinking of the quantity E(A) is as
the number of triples $(x, y, z) \in A^3$ which have $x + y - z \in A$; if a non-empty set A has
$x + y - z \in A$ for all $x, y, z \in A$, it is well known that it is a coset of a subgroup.
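Additive energy is easy to compute from the representation function $r(u) = \#\{(x, y) \in A^2 : x + y = u\}$, since $E(A) = \sum_u r(u)^2$. The sketch below (ours) does this in $\mathbb{F}_2^5$ with vectors as bit masks, and checks that a coset of a subgroup attains the maximum $|A|^3$ in (6.1) while a set that is not a coset falls short.

```python
from collections import Counter


def energy(A):
    """Number of quadruples (x, y, z, w) in A^4 with x + y = z + w (here + is XOR)."""
    reps = Counter(x ^ y for x in A for y in A)
    return sum(c * c for c in reps.values())


subgroup = {0b00000, 0b00011, 0b01100, 0b01111}
coset = {0b10000 ^ v for v in subgroup}
not_a_coset = {0b00000, 0b00001, 0b00010, 0b00100}

print(energy(coset), len(coset) ** 3)   # 64 64
print(energy(not_a_coset))              # 40
```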
The additive energy is a particularly useful quantity because it is very stable under small
perturbations of the underlying set. Indeed, if we add or remove $o(|A|)$ elements from A
then the additive energy changes by $o(|A|^3)$, which is not much if $E(A) = \Omega(|A|^3)$. On the
other hand, because of this stability under small perturbations, large additive energy does
not imply small doubling.
To see this concretely, consider the example of A as the union of a subspace V and η|A| independent
elements of G whose span intersects V only in the zero vector. It is easy to see that
$$E(A) \geq (1 - O(\eta))|A|^3 \text{ and } |A + A| = \Omega(\eta|A|^2).$$
In this example, A nevertheless has a large structured part and fortunately this can be
recovered.
Theorem 6.1 (Balog-Szemerédi-Gowers). Suppose that $A \subset G$ satisfies $E(A) \geq c|A|^3$.
Then there is a subset $A' \subset A$ with $|A'| \geq c^{O(1)}|A|$ such that $|A' + A'| \leq c^{-O(1)}|A'|$.
We shall prove this using symmetry sets because in this regard it does turn out that sets
with large additive energy behave in a similar way to sets with small sumset. Indeed, the
reader may wish to compare the next lemma with Lemma 5.3.
and so, by the Cauchy-Schwarz inequality, we have that $\mathbb{E}|A'|^2 \geq c^2|A|^2$.
It follows that
$$\mathbb{E}(|A'|^2 - \epsilon^{-1}|B|) \geq \frac{1}{2}\mathbb{E}|A'|^2 \geq \frac{1}{2}c^2|A|^2$$
and we may pick X such that $|A'|^2 - \epsilon^{-1}|B| \geq c^2|A|^2/2$, and hence $|A'| \geq c|A|/\sqrt{2}$ and
in words, there are at least $|A''|/3$ elements $z \in A''$ such that $x + z \in \operatorname{Sym}_{c^2/12}(A)$ and
$y + z \in \operatorname{Sym}_{c^2/12}(A)$. We conclude that if $x, y \in A'$ then
$$\begin{aligned}1_A \ast 1_A \ast 1_A \ast 1_A(x + y) &= \int 1_A \ast 1_A(x + y + z)1_A \ast 1_A(z) \, d\mu_G(z)\\ &= \int 1_A \ast 1_A(x + z)1_A \ast 1_A(y + z) \, d\mu_G(z)\\ &\geq \int 1_A \ast 1_A(x + z)1_A \ast 1_A(y + z)1_{A''}(z) \, d\mu_G(z)\end{aligned}$$
By the Balog-Szemerédi-Gowers lemma there is a set $A' \subset A$ with $|A'| \geq c^{O(1)}|A|$ and
$|A' + A'| \leq c^{-O(1)}|A'|$.
It follows that there is some coset $W = (w, z) + H$ with $(w, z) \in \langle A'\rangle$ such that
$$|((w, z) + H) \cap A'| \geq \exp(-O(c^{-O(1)}))|A'| = \exp(-O(c^{-O(1)}))|G|.$$
Since $\pi(H) = \pi(\langle A'\rangle)$ we see that $w + \pi(H) = \pi(H)$, so we may assume that $w = 0_G$ and
hence
$$|((0_G, z) + H) \cap A| \geq |((0_G, z) + H) \cap A'| \geq \exp(-O(c^{-O(1)}))|G|.$$
If z were also to be $0_G$ we would be done, as we could define θ to be the linear extension of
$\theta(b_i) = v_i$; unfortunately this need not be the case, but we can correct the situation.
Claim. We may assume that there is some j with $1 \leq j \leq m$ such that for at least 1/4 of
the elements $a \in \pi(((0_G, z) + H) \cap A)$ we have $a \cdot b_j = 1$.
Proof. Write $P := \pi(((0_G, z) + H) \cap A)$, which has $\mu_G(P) \geq \exp(-O(c^{-O(1)}))$, and define
characters $\lambda_i(x) := (-1)^{x \cdot b_i}$. Suppose that $|\widehat{1_P}(\lambda_i)| \geq \mu_G(P)/2$ for all $1 \leq i \leq m$. Then by
Chang’s theorem, since $b_1, \dots, b_m$ are independent, we have
$$n - O(\log c^{-1}) \leq m = O(\log\mu_G(P)^{-1}) = O(c^{-O(1)}).$$
It follows that $n = O(c^{-O(1)})$ and we are trivially done, since the lower bound in the
conclusion simply asserts that $\phi(x) = \theta(x)$ for at least one x; defining such a θ is trivial. It
follows that we may assume there is some $1 \leq j \leq m$ such that $|\widehat{1_P}(\lambda_j)| < \mu_G(P)/2$. Now,
write
$$P^+ := \{x \in P : x \cdot b_j = 1\} \text{ and } P^- := \{x \in P : x \cdot b_j = 0\}.$$
4 The reader may care to compare this use of maximality with that in Ruzsa's covering lemma, Lemma 5.5.
Since $\widehat{1_P}(\lambda_j) = \mu_G(P^-) - \mu_G(P^+)$ by the definition of the Fourier transform, and $P$ is the disjoint union of $P^+$ and $P^-$, we have
$$|\mu_G(P^+) - \mu_G(P^-)| \leq \mu_G(P)/2 \quad\text{and}\quad \mu_G(P^+) + \mu_G(P^-) = \mu_G(P).$$
It follows that $\mu_G(P^+) \geq \mu_G(P)/4$, that is, $a \cdot b_j = 1$ for at least $1/4$ of the elements $a \in P$.
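To double-check the last step numerically, here is a small Python sketch. It assumes our own encoding (vectors in $\mathbb{F}_2^n$ as $n$-bit integers, with $x \cdot b$ the parity of the bitwise AND), and all helper names are hypothetical. It uses exact rational arithmetic to confirm, on random sets $P$, that $\widehat{1_P}(\lambda_j) = \mu_G(P^-) - \mu_G(P^+)$ and that the hypothesis $|\widehat{1_P}(\lambda_j)| \leq \mu_G(P)/2$ forces $\mu_G(P^+) \geq \mu_G(P)/4$.

```python
import random
from fractions import Fraction

def dot(a, x):
    # Inner product a . x over F_2, for vectors in F_2^n encoded as n-bit integers.
    return bin(a & x).count("1") & 1

def fourier_indicator(P, b, n):
    # Exact hat{1_P}(lambda) for lambda(x) = (-1)^{x . b}: the mean of 1_P(x)(-1)^{x . b} over G.
    return Fraction(sum((-1) ** dot(x, b) for x in P), 2 ** n)

def split_densities(P, b, n):
    # (mu_G(P^+), mu_G(P^-)) with P^+ = {x in P : x . b = 1} and P^- = {x in P : x . b = 0}.
    plus = sum(1 for x in P if dot(x, b) == 1)
    return Fraction(plus, 2 ** n), Fraction(len(P) - plus, 2 ** n)

def check_quarter_bound(n=6, trials=200, seed=0):
    # On random P and nonzero b: hat{1_P}(lambda) = mu(P^-) - mu(P^+), and
    # |hat{1_P}(lambda)| <= mu(P)/2 implies mu(P^+) >= mu(P)/4.
    rng = random.Random(seed)
    for _ in range(trials):
        P = {x for x in range(2 ** n) if rng.random() < 0.3}
        b = rng.randrange(1, 2 ** n)
        mu_P = Fraction(len(P), 2 ** n)
        mu_plus, mu_minus = split_densities(P, b, n)
        coeff = fourier_indicator(P, b, n)
        if coeff != mu_minus - mu_plus:
            return False
        if abs(coeff) <= mu_P / 2 and mu_plus < mu_P / 4:
            return False
    return True

print(check_quarter_bound())  # True
```

Exact fractions are used so that the boundary case $|\widehat{1_P}(\lambda_j)| = \mu_G(P)/2$ is not blurred by rounding.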
Finally, we extend the vectors $b_1, \dots, b_m$ by $b_{m+1}, \dots, b_n$ such that $b_1, \dots, b_n$ is a basis for $G$, and define $\theta$ by linear extension from its definition on the basis $(b_i)_i$:
$$\theta(b_i) = \begin{cases} v_i & \text{if } 1 \leq i \leq m,\ i \neq j,\\ v_j + z & \text{if } i = j,\\ 0_G & \text{otherwise.} \end{cases}$$
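Now, suppose that $a \in \pi(((0_G,z) + H) \cap A)$ has $a \cdot b_j = 1$. Then
$$a = \sum_{i : a \cdot b_i = 1} b_i \quad\text{and}\quad \phi(a) = z + \sum_{i : a \cdot b_i = 1} v_i,$$
so that $\phi(a) = \theta(a)$; there are many such $a$ by the claim and the lower bound on the size of $\pi(((0_G,z) + H) \cap A)$.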
7. Polynomial testing
In this section we shall consider polynomials $p : \mathbb{F}_2^n \to \mathbb{F}_2$. These are, of course, the same as Boolean functions, in the sense that if we are given $A \subset \mathbb{F}_2^n$ then there is a polynomial $p : \mathbb{F}_2^n \to \mathbb{F}_2$ such that $\{x : p(x) = 1\} = A$. Indeed, we simply define
$$p(x) = \sum_{a \in A} \prod_{i : a \cdot e_i = 1} x_i \prod_{i : a \cdot e_i = 0} (1 - x_i).$$
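The construction of $p$ can be checked directly. The following Python sketch is a hypothetical illustration: points of $\mathbb{F}_2^n$ are tuples of bits (our choice), so that $a \cdot e_i$ is just the coordinate $a_i$. It evaluates the polynomial above term by term and confirms on a small example that its support is exactly $A$.

```python
from itertools import product

def indicator_polynomial(A, n):
    # Build p : F_2^n -> F_2 as in the text,
    #   p(x) = sum_{a in A} prod_{i : a_i = 1} x_i * prod_{i : a_i = 0} (1 - x_i)  (mod 2).
    # Each product equals 1 precisely when x = a, so at most one summand is nonzero.
    def p(x):
        total = 0
        for a in A:
            term = 1
            for i in range(n):
                term *= x[i] if a[i] == 1 else 1 - x[i]
            total += term
        return total % 2
    return p

n = 4
A = {(1, 0, 1, 1), (0, 0, 0, 0), (1, 1, 0, 0)}
p = indicator_polynomial(A, n)
support = {x for x in product((0, 1), repeat=n) if p(x) == 1}
print(support == A)  # True
```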
7.1. Correlation with linear polynomials. In the first instance we should like to see if $f$ has a large (affine) linear part, meaning whether it correlates with a function of the form
$$x \mapsto a \cdot x + b,$$
where $a \in \mathbb{F}_2^n$, $b \in \mathbb{F}_2$ and $a \cdot x = a_1x_1 + \dots + a_nx_n$. If $f$ has this form then it is easy to see (by linearity) that
$$f(x) + f(x+y) + f(x+z) + f(x+y+z) = 0_{\mathbb{F}_2} \tag{7.1}$$
for all $x, y, z \in \mathbb{F}_2^n$. Bearing in mind the Rough Morphism Theorem (Theorem 6.4), we shall be interested in what we can say if this equality is satisfied an unusually large amount of the time. We define
$$\|g\|_{U^2}^4 := \int g(x)g(x+y)g(x+z)g(x+y+z)\,d\mu_G(x)\,d\mu_G(y)\,d\mu_G(z),$$
$$\|g\|_{U^2}^4 = \langle g * g, g * g \rangle_{L^2(G)} = \|\widehat{g}\|_{\ell^4(\widehat{G})}^4.$$
Now, the proportion of triples for which (7.1) holds is
$$P(f) := \int \tfrac{1}{2}\big(1 + (-1)^{f(x)+f(x+y)+f(x+z)+f(x+y+z)}\big)\,d\mu_G(x)\,d\mu_G(y)\,d\mu_G(z) = \tfrac{1}{2} + \tfrac{1}{2}\|g\|_{U^2}^4,$$
where $g = (-1)^f$. Since $\|\cdot\|_{U^2}$ is a norm, and in particular $\|g\|_{U^2}^4 \geq 0$, we see that (7.1) holds at least half the time for any function $f : \mathbb{F}_2^n \to \mathbb{F}_2$; it turns out that if it holds an absolute proportion more often than this, then $f$ correlates with a linear polynomial.
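For small $n$ both sides of the identity for $\|g\|_{U^2}^4$, and the formula for $P(f)$, can be computed by brute force. The sketch below is a hypothetical Python illustration under our own conventions: points of $\mathbb{F}_2^n$ are $n$-bit integers, addition is XOR, integrals against $\mu_G$ are averages, and $\widehat{g}(r) = \mathbb{E}_x g(x)(-1)^{r \cdot x}$ under the identification of $G$ with $\widehat{G}$.

```python
import random
from itertools import product

def dot(a, x):
    # Inner product a . x over F_2, vectors encoded as n-bit integers.
    return bin(a & x).count("1") & 1

def u2_fourth_power(g, n):
    # ||g||_{U^2}^4 = E_{x,y,z} g(x) g(x+y) g(x+z) g(x+y+z); addition in F_2^n is XOR.
    N = 2 ** n
    total = sum(g[x] * g[x ^ y] * g[x ^ z] * g[x ^ y ^ z]
                for x, y, z in product(range(N), repeat=3))
    return total / N ** 3

def fourier(g, n):
    # hat{g}(r) = E_x g(x) (-1)^{r . x}, listed for r = 0, ..., 2^n - 1.
    N = 2 ** n
    return [sum(g[x] * (-1) ** dot(r, x) for x in range(N)) / N for r in range(N)]

def proportion_7_1(f, n):
    # P(f): the proportion of triples (x, y, z) for which (7.1) holds.
    N = 2 ** n
    good = sum(1 for x, y, z in product(range(N), repeat=3)
               if (f[x] ^ f[x ^ y] ^ f[x ^ z] ^ f[x ^ y ^ z]) == 0)
    return good / N ** 3

n = 4
rng = random.Random(1)
f = [rng.randrange(2) for _ in range(2 ** n)]
g = [(-1) ** v for v in f]
u2 = u2_fourth_power(g, n)
print(abs(u2 - sum(c ** 4 for c in fourier(g, n))) < 1e-12)   # True
print(abs(proportion_7_1(f, n) - (0.5 + 0.5 * u2)) < 1e-12)   # True
```

The cost is of order $2^{3n}$, so this is only a way to sanity-check the identities, not a practical test.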
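Theorem 7.3 ($U^2$-inverse theorem). Suppose that $f : \mathbb{F}_2^n \to \mathbb{F}_2$ has $\|(-1)^f\|_{U^2} \geq \epsilon$. Then there is a linear polynomial $p : \mathbb{F}_2^n \to \mathbb{F}_2$ such that
$$\langle (-1)^f, (-1)^p \rangle_{L^2(G)} \geq \epsilon^{O(1)}.$$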
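Proof. We put $g := (-1)^f$ and note that by hypothesis, the previous lemma and Parseval's theorem we have
$$\epsilon^4 \leq \|g\|_{U^2}^4 = \|\widehat{g}\|_{\ell^4(\widehat{G})}^4 \leq \sup_{\gamma \in \widehat{G}} |\widehat{g}(\gamma)|^2 \|\widehat{g}\|_{\ell^2(\widehat{G})}^2 = \sup_{\gamma \in \widehat{G}} |\widehat{g}(\gamma)|^2 \|g\|_{L^2(G)}^2.$$
Since $\|g\|_{L^2(G)} = 1$ and $\widehat{g}$ is real-valued, there is some $\gamma \in \widehat{G}$ and some $\sigma \in \{-1,1\}$ such that $\sigma\widehat{g}(\gamma) \geq \epsilon^2$.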
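Now, since $\gamma$ is a character there is some $a \in \mathbb{F}_2^n$ such that $\gamma(x) = (-1)^{a \cdot x}$; since $\sigma \in \{-1,1\}$ there is some $b \in \mathbb{F}_2$ such that $\sigma = (-1)^b$. It follows that the linear polynomial $p$ defined by $p(x) = a \cdot x + b$ satisfies
$$\langle g, (-1)^p \rangle_{L^2(G)} = \langle g, \gamma \cdot (-1)^b \rangle_{L^2(G)} = \sigma\widehat{g}(\gamma) \geq \epsilon^2,$$
and the result is proved.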
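The proof is constructive: $a$ is a frequency at which $|\widehat{g}|$ is largest, and $b$ records the sign of $\widehat{g}$ there. A minimal Python sketch of that procedure follows, assuming $f$ is given as a table of $2^n$ bits indexed by $n$-bit integers. The fast Walsh-Hadamard transform used to compute all of $\widehat{g}$ at once is a standard technique we have swapped in, not something taken from the text, and the function names are hypothetical.

```python
def dot(a, x):
    # Inner product a . x over F_2, vectors encoded as n-bit integers.
    return bin(a & x).count("1") & 1

def walsh_hadamard(values):
    # Fast Walsh-Hadamard transform: entry r of the result is sum_x values[x] * (-1)^{r . x}.
    a = list(values)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def best_linear_polynomial(f, n):
    # As in the proof of Theorem 7.3: take a maximising |hat{g}(a)| for g = (-1)^f,
    # then b with (-1)^b equal to the sign of hat{g}(a). Returns (a, b, correlation).
    N = 2 ** n
    g = [(-1) ** v for v in f]
    coeffs = [c / N for c in walsh_hadamard(g)]
    a = max(range(N), key=lambda r: abs(coeffs[r]))
    b = 0 if coeffs[a] >= 0 else 1
    return a, b, abs(coeffs[a])

n = 5
a0, b0 = 0b10110, 1
f = [dot(a0, x) ^ b0 for x in range(2 ** n)]
for x in (0, 7, 19):          # corrupt three values of the linear polynomial
    f[x] ^= 1
print(best_linear_polynomial(f, n))  # (22, 1, 0.8125)
```

Since $\|\widehat{g}\|_{\ell^4}^4 \leq \sup|\widehat{g}|^2$, the returned correlation is always at least $\|g\|_{U^2}^2$, matching the $\epsilon^2$ in the proof.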
The point is that if we are given a black box into which we can input values of $x$ and which outputs $f(x)$, then by sampling random triples $(x,y,z)$ and checking (7.1) we can determine, in a number of steps independent of the size of the underlying group and with (say) 99% reliability, whether $\|(-1)^f\|_{U^2}$ is large or not, and hence whether $f$ correlates with a linear polynomial.
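A sketch of such a black-box test in Python follows. It rests on assumptions not in the text: $f$ is any callable on $n$-bit integers, triples are sampled uniformly, and the sample size comes from Hoeffding's inequality, so it depends only on the desired accuracy and failure probability and not on $n$. The names are hypothetical.

```python
import math
import random

def estimate_proportion(f, n, accuracy=0.05, failure=0.01, rng=None):
    # Estimate P(f), the proportion of triples (x, y, z) satisfying (7.1), using only
    # black-box queries to f; vectors in F_2^n are n-bit integers and + is XOR.
    # Hoeffding's inequality: this many samples give |estimate - P(f)| < accuracy with
    # probability at least 1 - failure, and the number does not depend on n.
    if rng is None:
        rng = random.Random()
    samples = math.ceil(math.log(2 / failure) / (2 * accuracy ** 2))
    hits = 0
    for _ in range(samples):
        x, y, z = rng.getrandbits(n), rng.getrandbits(n), rng.getrandbits(n)
        if (f(x) ^ f(x ^ y) ^ f(x ^ z) ^ f(x ^ y ^ z)) == 0:
            hits += 1
    return hits / samples

def estimate_u2(f, n, **kwargs):
    # Invert P(f) = 1/2 + ||(-1)^f||_{U^2}^4 / 2, clamping at 0 to absorb sampling error.
    p = estimate_proportion(f, n, **kwargs)
    return max(2 * p - 1, 0.0) ** 0.25

n = 40                                  # |G| = 2^40: far too large to enumerate
a = random.Random(2).getrandbits(n)

def linear(x):
    # The linear polynomial x -> a . x, queried as a black box.
    return bin(a & x).count("1") & 1

print(estimate_u2(linear, n, rng=random.Random(3)))  # 1.0

rng = random.Random(5)
table = [rng.randrange(2) for _ in range(2 ** 12)]
print(estimate_proportion(lambda x: table[x], 12, rng=rng))  # near 1/2 for a typical table
```

With the defaults this uses 1060 samples, that is $4240$ queries, whatever the value of $n$.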
It turns out that $\|\cdot\|_{U^k}$ is a norm for $k \geq 2$, and to prove this we need an analogue of the Cauchy-Schwarz inequality. Given a family of functions $(f_\omega)_{\omega \in \{0,1\}^k}$ we define the Gowers inner product to be
$$\langle (f_\omega)_{\omega \in \{0,1\}^k} \rangle_{U^k} := \int \prod_{\omega \in \{0,1\}^k} f_\omega(x + \omega \cdot h)\,d\mu_G(x) \prod_{i=1}^k d\mu_G(h_i).$$
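For very small groups the Gowers inner product can be evaluated straight from the definition, which is a convenient way to experiment with the statements that follow. The Python sketch below is our own brute-force illustration (roughly $2^{n(k+1)} \cdot 2^k$ operations), taking real-valued functions as lists indexed by $n$-bit integers. We assume the usual normalisation $\|f\|_{U^k} := \langle (f)_{\omega \in \{0,1\}^k} \rangle_{U^k}^{1/2^k}$ for real $f$.

```python
import math
import random
from itertools import product

def gowers_inner(fs, n, k):
    # Brute-force Gowers inner product <(f_omega)>_{U^k} on G = F_2^n (uniform measure).
    # fs maps each omega in {0,1}^k (a tuple of bits) to a list of 2^n reals indexed by
    # n-bit integers; the point x + omega . h is computed with XOR.
    N = 2 ** n
    cube = list(product((0, 1), repeat=k))
    total = 0.0
    for x in range(N):
        for h in product(range(N), repeat=k):
            term = 1.0
            for omega in cube:
                point = x
                for bit, h_i in zip(omega, h):
                    if bit:
                        point ^= h_i
                term *= fs[omega][point]
            total += term
    return total / N ** (k + 1)

def gowers_norm(f, n, k):
    # Assumed normalisation for real-valued f: ||f||_{U^k} = <(f)_omega>_{U^k}^{1/2^k};
    # the inner product is non-negative, so clamping only absorbs rounding error.
    value = gowers_inner({omega: f for omega in product((0, 1), repeat=k)}, n, k)
    return max(value, 0.0) ** (1 / 2 ** k)

rng = random.Random(0)
n = 3
f = [rng.uniform(-1, 1) for _ in range(2 ** n)]
print([round(gowers_norm(f, n, k), 4) for k in (1, 2, 3)])  # non-decreasing (Lemma 7.6)

family = {omega: [rng.uniform(-1, 1) for _ in range(2 ** n)] for omega in product((0, 1), repeat=2)}
print(abs(gowers_inner(family, n, 2)), math.prod(gowers_norm(h, n, 2) for h in family.values()))
```

The last line prints the two sides of the Gowers-Cauchy-Schwarz inequality (Lemma 7.5) for a random family with $k = 2$.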
Lemma 7.5 (Gowers-Cauchy-Schwarz inequality). For all families of functions $(f_\omega)_{\omega \in \{0,1\}^k}$ we have
$$|\langle (f_\omega)_{\omega \in \{0,1\}^k} \rangle_{U^k}| \leq \prod_{\omega \in \{0,1\}^k} \|f_\omega\|_{U^k}.$$
The proof here is notationally heavy; readers interested in an alternative source may
wish to consult [TV06, p419] or [Gow01, Lemma 3.8].
Proof. Note that the inner product is equal to
$$\int \int \prod_{\omega \in \{0,1\}^{k-1}} f_{\omega,0}(x + \omega \cdot h)\,f_{\omega,1}(x + \omega \cdot h + h_k)\,d\mu_G(x)\,d\mu_G(h_k)\,d\mu_{G^{k-1}}(h).$$
where $(f_\rho)_{\omega \in \{0,1\}^k}$ just means that all $2^k$ functions in the vector are the same function, $f_\rho$.
Lemma 7.6. We have the nesting property
$$\Big|\int f\,d\mu_G\Big| = \|f\|_{U^1} \leq \|f\|_{U^2} \leq \dots \leq \|f\|_{U^k} \leq \dots.$$
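Proof. The nesting follows immediately from the Gowers-Cauchy-Schwarz inequality: given $f \in L^\infty(G)$ write $f_\omega = f$ if $\omega_k = 1$ and $f_\omega \equiv 1$ if $\omega_k = 0$. Then
$$\|f\|_{U^{k-1}}^{2^{k-1}} = \langle (f_\omega)_{\omega \in \{0,1\}^k} \rangle_{U^k} \leq \|f\|_{U^k}^{2^{k-1}}\|1\|_{U^k}^{2^{k-1}} = \|f\|_{U^k}^{2^{k-1}},$$
and we are done.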
It is immediate that $\|\cdot\|_{U^k}$ is homogeneous and zero on the zero function. Since $\|\cdot\|_{U^2}$ is a norm by Lemma 7.2, and $\|f\|_{U^2} \leq \|f\|_{U^k}$ by Lemma 7.6, we see that $\|f\|_{U^k} = 0$ must imply $f = 0$ for $k \geq 2$. It remains to check the triangle inequality. First note that
$$\|f_0 + f_1\|_{U^k}^{2^k} = \langle (f_0 + f_1)_{\omega \in \{0,1\}^k} \rangle_{U^k} = \sum_{I \subset \{0,1\}^k} \langle (f_{1_{\omega \in I}})_{\omega \in \{0,1\}^k} \rangle_{U^k},$$
where we are using the multilinearity of the Gowers inner product. In particular, the Gowers inner product takes $2^k$ arguments indexed by $\omega \in \{0,1\}^k$, and the term $\langle (f_{1_{\omega \in I}})_{\omega \in \{0,1\}^k} \rangle_{U^k}$ simply denotes the Gowers inner product of some copies of $f_0$ and $f_1$, with $f_0$ in the position indexed by $\omega$ if $\omega \in \{0,1\}^k \setminus I$, and $f_1$ in the position indexed by $\omega$ if $\omega \in I$.
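Now, by the Gowers-Cauchy-Schwarz inequality we see that
$$|\langle (f_{1_{\omega \in I}})_{\omega \in \{0,1\}^k} \rangle_{U^k}| \leq \|f_0\|_{U^k}^{2^k - |I|}\|f_1\|_{U^k}^{|I|},$$
and hence
$$\|f_0 + f_1\|_{U^k}^{2^k} \leq \sum_{I \subset \{0,1\}^k} \|f_0\|_{U^k}^{2^k - |I|}\|f_1\|_{U^k}^{|I|} = (\|f_0\|_{U^k} + \|f_1\|_{U^k})^{2^k},$$
which gives the triangle inequality on taking $2^k$th roots.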
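The multilinear expansion and the resulting bound can also be checked numerically for tiny $n$ and $k$. The sketch below is a hypothetical Python illustration (brute force, real-valued functions as lists indexed by $n$-bit integers, and the assumed normalisation $\|f\|_{U^k} = \langle (f)_\omega \rangle_{U^k}^{1/2^k}$). It enumerates all $2^{2^k}$ subsets $I \subset \{0,1\}^k$, confirms that the terms sum to $\|f_0 + f_1\|_{U^k}^{2^k}$, and compares each term with $\|f_0\|_{U^k}^{2^k - |I|}\|f_1\|_{U^k}^{|I|}$.

```python
import random
from itertools import product

def gowers_inner(fs, n, k):
    # Brute-force <(f_omega)>_{U^k} on F_2^n (uniform measure, + is XOR);
    # fs[omega] is a list of 2^n reals for each omega in {0,1}^k.
    N = 2 ** n
    cube = list(product((0, 1), repeat=k))
    total = 0.0
    for x in range(N):
        for h in product(range(N), repeat=k):
            term = 1.0
            for omega in cube:
                point = x
                for bit, h_i in zip(omega, h):
                    if bit:
                        point ^= h_i
                term *= fs[omega][point]
            total += term
    return total / N ** (k + 1)

def norm(f, n, k):
    # ||f||_{U^k} for real-valued f, assuming the normalisation <(f)_omega>^{1/2^k}.
    value = gowers_inner({omega: f for omega in product((0, 1), repeat=k)}, n, k)
    return max(value, 0.0) ** (1 / 2 ** k)

def expansion(f0, f1, n, k):
    # Multilinear expansion of <(f0 + f1)_omega>_{U^k}: one term per I subset of {0,1}^k,
    # with f1 in the positions omega in I and f0 elsewhere. Returns pairs (|I|, term).
    cube = list(product((0, 1), repeat=k))
    terms = []
    for mask in range(2 ** len(cube)):
        I = {omega for j, omega in enumerate(cube) if (mask >> j) & 1}
        fs = {omega: (f1 if omega in I else f0) for omega in cube}
        terms.append((len(I), gowers_inner(fs, n, k)))
    return terms

rng = random.Random(4)
n, k = 3, 2
f0 = [rng.uniform(-1, 1) for _ in range(2 ** n)]
f1 = [rng.uniform(-1, 1) for _ in range(2 ** n)]
total = norm([u + v for u, v in zip(f0, f1)], n, k) ** (2 ** k)
terms = expansion(f0, f1, n, k)
a0, a1 = norm(f0, n, k), norm(f1, n, k)
print(abs(total - sum(t for _, t in terms)) < 1e-9)                          # True
print(all(abs(t) <= a0 ** (2 ** k - s) * a1 ** s + 1e-9 for s, t in terms))  # True
```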
7.7. Correlating with quadratic polynomials. Our object here is to prove the following theorem due to Samorodnitsky [Sam07]. In other groups the $U^3$-inverse theorem is more complicated to state and is due to Green and Tao [GT08].
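Theorem 7.8 ($U^3$-inverse theorem). Suppose that $f : \mathbb{F}_2^n \to \mathbb{F}_2$ has $\|(-1)^f\|_{U^3} \geq \epsilon$. Then there is a quadratic polynomial $p : \mathbb{F}_2^n \to \mathbb{F}_2$ such that
$$\langle (-1)^f, (-1)^p \rangle_{L^2(G)} \geq \exp(-O(\epsilon^{-O(1)})).$$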
If $f(x)$ is a quadratic polynomial then it has the form
$$x \mapsto \langle Ax, x \rangle + \langle a, x \rangle + b,$$
where $A$ is an $\mathbb{F}_2$-valued matrix, $a \in \mathbb{F}_2^n$ and $b \in \mathbb{F}_2$. By symmetry, and since the linear term $\langle a, x \rangle$ can be absorbed into the diagonal of $A$ by replacing $A_{ii}$ by $A_{ii} + a_i$, we may assume that $a = 0_G$ and $A$ is upper triangular. (Note that $x_i^2 = x_i$ in $\mathbb{F}_2$.)
Differencing once, we have
$$f(x+y) - f(x) = \langle (A + A^t)y, x \rangle + \langle Ay, y \rangle,$$
which is a linear polynomial in $x$ for fixed $y$. It follows that
$$(f(x+y+z) - f(x+z)) - (f(x+y) - f(x)) = \langle (A + A^t)y, z \rangle,$$
which is constant in $x$ for fixed $y$ and $z$. Crucially, $A + A^t$ is symmetric and has zero diagonal; we shall now set about establishing a converse.
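Both differencing identities hold for any $\mathbb{F}_2$-valued matrix $A$ (upper triangular or not), and they are easy to confirm exhaustively for small $n$. The following Python sketch is a hypothetical illustration, with vectors as lists of bits; it checks the identities for every $x, y, z$ and returns $S = A + A^t$ so that its symmetry and zero diagonal can be inspected. Over $\mathbb{F}_2$ subtraction is addition, so the differences are computed as sums mod 2.

```python
import random
from itertools import product

def dot(u, v):
    # <u, v> over F_2 for vectors given as lists of bits.
    return sum(a * b for a, b in zip(u, v)) % 2

def matvec(A, x):
    # The vector Ax over F_2.
    return [dot(row, x) for row in A]

def add(u, v):
    # Addition in F_2^n (which coincides with subtraction).
    return [(a + b) % 2 for a, b in zip(u, v)]

def quad(A, x):
    # The quadratic polynomial f(x) = <Ax, x> over F_2.
    return dot(matvec(A, x), x)

def differencing_holds(A):
    # With S = A + A^t, check for all x, y, z in F_2^n that
    #   f(x+y) - f(x) = <Sy, x> + <Ay, y>   and
    #   (f(x+y+z) - f(x+z)) - (f(x+y) - f(x)) = <Sy, z>   (independent of x).
    n = len(A)
    S = [[(A[i][j] + A[j][i]) % 2 for j in range(n)] for i in range(n)]
    vectors = [list(v) for v in product((0, 1), repeat=n)]
    for x, y in product(vectors, repeat=2):
        if (quad(A, add(x, y)) + quad(A, x)) % 2 != (dot(matvec(S, y), x) + quad(A, y)) % 2:
            return S, False
        for z in vectors:
            second = (quad(A, add(add(x, y), z)) + quad(A, add(x, z))
                      + quad(A, add(x, y)) + quad(A, x)) % 2
            if second != dot(matvec(S, y), z):
                return S, False
    return S, True

rng = random.Random(5)
A = [[rng.randrange(2) for _ in range(4)] for _ in range(4)]
S, ok = differencing_holds(A)
print(ok)  # True
print(all(S[i][i] == 0 and S[i][j] == S[j][i] for i in range(4) for j in range(4)))  # True
```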
For the remainder of this section it will be convenient to identify $G$ with $\widehat{G}$ via the map
$$r \mapsto (x \mapsto (-1)^{r \cdot x}).$$
Lemma 7.9. Suppose that $f : \mathbb{F}_2^n \to \mathbb{F}_2$, $S$ is a symmetric matrix with zero diagonal and $g := (-1)^f$ has
$$\int |\widehat{\partial_y g}(Sy)|^2\,d\mu_G(y) \geq \delta.$$
Then there is a quadratic polynomial $p$ such that $\langle (-1)^f, (-1)^p \rangle_{L^2(G)} \geq \delta^{O(1)}$.
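Proof. Let $A$ be a matrix such that $A + A^t = S$ (for instance, take $A$ to be the strictly upper triangular part of $S$; this works because $S$ is symmetric with zero diagonal), and consider $h(x) := (-1)^{\langle Ax, x \rangle}$. By our earlier calculation we have
$$\partial_y h(x) = (-1)^{\langle (A + A^t)y, x \rangle + \langle Ay, y \rangle},$$
and so
$$\langle \partial_y g, \partial_y h \rangle_{L^2(G)} = (-1)^{\langle Ay, y \rangle}\widehat{\partial_y g}((A + A^t)y) = (-1)^{\langle Ay, y \rangle}\widehat{\partial_y g}(Sy).$$
It follows that
$$\int \langle \partial_y g, \partial_y h \rangle_{L^2(G)}^2\,d\mu_G(y) \geq \delta.$$
On the other hand the integral is equal to
$$\int g(x)g(x+y)g(z)g(z+y)h(x)h(x+y)h(z)h(z+y)\,d\mu_G(x)\,d\mu_G(z)\,d\mu_G(y),$$
that is, to $\|gh\|_{U^2}^4$. It follows by the $U^2$-inverse theorem that there is some $a \in \mathbb{F}_2^n$ and $b \in \mathbb{F}_2$ such that $l(x) := \langle a, x \rangle + b$ has
$$\langle gh, (-1)^{l} \rangle_{L^2(G)} \geq \delta^{O(1)}.$$
The result follows on putting $p(x) = \langle Ax, x \rangle + \langle a, x \rangle + b$.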
Our problem now is to find a suitable linear map, and to do that we shall find a roughly
linear choice function.
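Lemma 7.10. Suppose that $g : G \to \{-1,1\}$ has $\|g\|_{U^3} \geq \epsilon$. Then there is a function $\phi : G \to \widehat{G}$ and a set $A$ of density at least $\epsilon^{O(1)}$ such that
$$|\widehat{\partial_x g}(\phi(x))|^2 \geq \epsilon^{O(1)} \text{ for all } x \in A$$
and
$$\mu_{G^2}(\{(x,y) \in G^2 : \phi(x) + \phi(y) = \phi(x+y),\ x, y, x+y \in A\}) \geq \epsilon^{O(1)}.$$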
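Proof. Pick a function $\phi : G \to \widehat{G}$ randomly with
$$\mathbb{P}(\phi(x) = \gamma) = |\widehat{\partial_x g}(\gamma)|^2,$$
such that the choices are independent for distinct $x$s. Note that
$$\sum_{\gamma \in \widehat{G}} \mathbb{P}(\phi(x) = \gamma) = \sum_{\gamma \in \widehat{G}} |\widehat{\partial_x g}(\gamma)|^2 = |g|^2 * |g|^2(x) = 1$$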
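Sampling $\phi$ is straightforward once the spectrum of each $\partial_x g$ is known, where we take $(\partial_x g)(y) = g(y)g(y+x)$ (consistent with the calculation of $\partial_y h$ in the proof of Lemma 7.9). The Python sketch below is illustrative only: it identifies $\widehat{G}$ with $G$ as above, computes the spectrum with a fast Walsh-Hadamard transform (our choice of method), and draws each $\phi(x)$ independently with `random.Random.choices`; the function names are hypothetical.

```python
import random

def walsh_hadamard(values):
    # Fast Walsh-Hadamard transform: entry r is sum_y values[y] * (-1)^{r . y}.
    a = list(values)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def choice_distribution(g, x):
    # P(phi(x) = gamma) = |hat{partial_x g}(gamma)|^2 with (partial_x g)(y) = g(y) g(y + x),
    # gamma being identified with r in F_2^n via r -> (y -> (-1)^{r . y}).
    N = len(g)
    derivative = [g[y] * g[y ^ x] for y in range(N)]
    return [(c / N) ** 2 for c in walsh_hadamard(derivative)]

def random_choice_function(g, rng):
    # Draw phi(x) independently for each x from the distribution above.
    N = len(g)
    return [rng.choices(range(N), weights=choice_distribution(g, x))[0] for x in range(N)]

rng = random.Random(6)
n = 5
g = [rng.choice((-1, 1)) for _ in range(2 ** n)]
print(all(abs(sum(choice_distribution(g, x)) - 1) < 1e-12 for x in range(2 ** n)))  # True
phi = random_choice_function(g, rng)
print(phi[0])  # 0, since partial_0 g is identically 1
```

The first check is exactly the normalisation displayed above: by Parseval each distribution has total mass $1$.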
16
Z X
2c 0 2
−3. ∂
d x g(γ) ∂y g(γ ) dµG (x)dµG (y).
6 x,y γ,γ 0
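whence
$$\mathbb{E}L(\phi) \geq \int \sum_{\gamma,\gamma'} |\widehat{\partial_x g}(\gamma)|^2 |\widehat{\partial_y g}(\gamma')|^2 |\widehat{\partial_{x+y} g}(\gamma + \gamma')|^2\,d\mu_G(x)\,d\mu_G(y) - \epsilon^{16}/2.$$
and
$$\mu_{G^2}(\{(x,y) \in G^2 : \phi(x) + \phi(y) = \phi(x+y),\ x, y, x+y \in A\}) \geq \epsilon^{16}/2.$$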
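It will follow that we are done, and it remains to establish (7.5). By the definition of the Fourier transform
$$|\widehat{\partial_w g}(\lambda)|^2 = \int (\partial_w g) * (\partial_w g)(z)\lambda(z)\,d\mu_G(z) \text{ for all } w \in G,\ \lambda \in \widehat{G},$$
an identity called Samorodnitsky's identity. By Cauchy-Schwarz applied to $|\widehat{\partial_z g}(\gamma)| \times |\widehat{\partial_z g}(\gamma)|^3$ (or log-convexity of $L^p$-norms) we have
$$\Big(\int \sum_{\gamma \in \widehat{G}} |\widehat{\partial_z g}(\gamma)|^2\,d\mu_G(z)\Big)^{1/2} \Big(\int \sum_{\gamma \in \widehat{G}} |\widehat{\partial_z g}(\gamma)|^6\,d\mu_G(z)\Big)^{1/2} \geq \int \sum_{\gamma \in \widehat{G}} |\widehat{\partial_z g}(\gamma)|^4\,d\mu_G(z).$$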
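Of course, the term on the right is equal to $\|g\|_{U^3}^8 \geq \epsilon^8$ by the inductive definition of the $U^3$-norm and Lemma 7.2. Since the factor on the left involving squares is equal to $1$ by Parseval's theorem, inserting (7.4) into the factor involving sixth powers we get
$$\int \sum_{\gamma,\gamma'} |\widehat{\partial_x g}(\gamma)|^2 |\widehat{\partial_y g}(\gamma')|^2 |\widehat{\partial_{x+y} g}(\gamma + \gamma')|^2\,d\mu_G(x)\,d\mu_G(y) \geq \epsilon^{16}.$$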
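The facts used in this step, namely $\int \sum_\gamma |\widehat{\partial_z g}(\gamma)|^2\,d\mu_G(z) = 1$ and $\|g\|_{U^3}^8 = \int \sum_\gamma |\widehat{\partial_z g}(\gamma)|^4\,d\mu_G(z)$, together with the Cauchy-Schwarz inequality between the second, fourth and sixth moments, can be tested by brute force on a tiny group. The Python sketch below does this under hypothetical conventions ($n$-bit integers, XOR for addition, $(\partial_z g)(y) = g(y)g(y+z)$); it does not test identity (7.4) itself.

```python
import random
from itertools import product

def walsh_hadamard(values):
    # Fast Walsh-Hadamard transform: entry r is sum_y values[y] * (-1)^{r . y}.
    a = list(values)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def derivative_spectrum(g, z):
    # hat{partial_z g}(r) for every r, where (partial_z g)(y) = g(y) g(y + z).
    N = len(g)
    return [c / N for c in walsh_hadamard([g[y] * g[y ^ z] for y in range(N)])]

def moment(g, p):
    # The quantity E_z sum_r |hat{partial_z g}(r)|^p.
    N = len(g)
    return sum(sum(abs(c) ** p for c in derivative_spectrum(g, z)) for z in range(N)) / N

def u3_eighth_power(g):
    # ||g||_{U^3}^8 straight from the definition: the average over x, h1, h2, h3 of
    # the product of g(x + omega . h) over omega in {0,1}^3.
    N = len(g)
    total = 0
    for x, h1, h2, h3 in product(range(N), repeat=4):
        term = 1
        for w1, w2, w3 in product((0, 1), repeat=3):
            term *= g[x ^ (h1 if w1 else 0) ^ (h2 if w2 else 0) ^ (h3 if w3 else 0)]
        total += term
    return total / N ** 4

rng = random.Random(7)
g = [rng.choice((-1, 1)) for _ in range(2 ** 4)]
m2, m4, m6 = moment(g, 2), moment(g, 4), moment(g, 6)
u3 = u3_eighth_power(g)
print(abs(m2 - 1) < 1e-12, abs(m4 - u3) < 1e-12, m6 >= m4 ** 2 - 1e-12)  # True True True
```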
Proof. Apply Lemma 7.10 to get a function $\phi$ and set $A$, and let $\tilde{\phi}$ be equal to $\phi$ on $A$ and equal to $\nu$ (from Lemma 7.11) on $A^c$. By the Rough Morphism theorem there is a morphism $\theta$ such that
$$\mu_G(\{x \in G : \theta(x) = \tilde{\phi}(x)\}) \geq \exp(-O(\epsilon^{-O(1)})).$$
It follows that either $\exp(-O(\epsilon^{-O(1)})) = 2^{-n} \cdot O(n^2)$, or else
$$\mu_G(\{x \in A : \theta(x) = \phi(x)\}) \geq \exp(-O(\epsilon^{-O(1)})).$$
In the first case the conclusion is trivial by taking $S \equiv 0$, since
$$\widehat{\partial_{0_G} g}(S0_G) = \widehat{\partial_{0_G} g}(0_{\widehat{G}}) = g * g(0_G) = \|g\|_{L^2(G)}^2 = 1,$$
and so the integral is at least $2^{-n}$, which has the required size in this case. Thus we assume we are in the second case.
We write $M$ for the matrix corresponding to $\theta$ under the identification $r \mapsto (x \mapsto (-1)^{r \cdot x})$ of $G$ and $\widehat{G}$, so that
$$\int |\widehat{\partial_x g}(Mx)|^2\,d\mu_G(x) \geq \exp(-O(\epsilon^{-O(1)})).$$
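Proof. This is a calculation following from the Fourier inversion formula for $k$ and (7.6). The left hand side is equal to
$$\begin{aligned}
&\int \sum_r \widehat{k}(r)(-1)^{\langle r, x \rangle} \int (\partial_x g) * (\partial_x g)(y)(-1)^{\langle Mx, y \rangle}\,d\mu_G(y)\,d\mu_G(x)\\
&= \int \sum_r \widehat{k}(r) \int (\partial_y g) * (\partial_y g)(x)(-1)^{\langle Mx, y \rangle}(-1)^{\langle r, x \rangle}\,d\mu_G(y)\,d\mu_G(x)\\
&= \int \sum_r \widehat{k}(r) \int (\partial_y g) * (\partial_y g)(x)(-1)^{\langle x, M^t y + r \rangle}\,d\mu_G(y)\,d\mu_G(x)\\
&= \sum_r \widehat{k}(r) \int |\widehat{\partial_y g}(M^t y + r)|^2\,d\mu_G(y).
\end{aligned}$$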
Now, apply the claim with $k = h * h$ to get
$$\int h * h(x)|\widehat{\partial_x g}(Mx)|^2\,d\mu_G(x) = \sum_r \widehat{h}(r)^2 \int |\widehat{\partial_x g}(M^t x + r)|^2\,d\mu_G(x). \tag{7.7}$$
To lower bound the right hand side we have the following claim.
Claim.
$$\widehat{\partial_x g}(r) = 0 \text{ unless } \langle r, x \rangle = 0_{\mathbb{F}_2}. \tag{7.8}$$
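The claim has a one-line reason: $\partial_x g$ is invariant under translation by $x$, since $(\partial_x g)(y + x) = g(y+x)g(y) = (\partial_x g)(y)$, so $\widehat{\partial_x g}(r) = (-1)^{\langle r, x \rangle}\widehat{\partial_x g}(r)$. A quick exhaustive check in Python might look like the following; it uses our own integer encoding of $\mathbb{F}_2^n$ and a fast Walsh-Hadamard transform in integer arithmetic, so the zeros are exact, and the names are hypothetical.

```python
import random

def dot(a, x):
    # <a, x> over F_2 for vectors encoded as n-bit integers.
    return bin(a & x).count("1") & 1

def walsh_hadamard(values):
    # Fast Walsh-Hadamard transform: entry r is sum_y values[y] * (-1)^{r . y}.
    a = list(values)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def claim_7_8_holds(g):
    # True if hat{partial_x g}(r) = 0 for every x and every r with <r, x> = 1,
    # where (partial_x g)(y) = g(y) g(y + x).
    N = len(g)
    for x in range(N):
        spectrum = walsh_hadamard([g[y] * g[y ^ x] for y in range(N)])
        if any(spectrum[r] != 0 for r in range(N) if dot(r, x) == 1):
            return False
    return True

rng = random.Random(8)
g = [rng.choice((-1, 1)) for _ in range(2 ** 6)]
print(claim_7_8_holds(g))  # True
```

The argument does not use $g(y) \in \{-1,1\}$, so the check passes for arbitrary integer-valued $g$ as well.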
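Finally, note that $\widehat{\partial_x g}(Bx) = 0$ unless $\langle Bx, x \rangle = 0_{\mathbb{F}_2}$ by (7.8), but since $B$ is symmetric, $\langle Bx, x \rangle = \langle r, x \rangle$, where $r$ is the vector on the diagonal of $B$, whence
$$\int_{\langle r, x \rangle = 0_{\mathbb{F}_2}} |\widehat{\partial_x g}(Bx)|^2\,d\mu_G(x) \geq \exp(-O(\epsilon^{-O(1)})).$$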
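We now define $Sx := Bx + \langle x, r \rangle r$, which is symmetric and has $Sx = Bx$ if $\langle r, x \rangle = 0_{\mathbb{F}_2}$. Finally, $S$ has zero diagonal: write $r'$ for the diagonal of $S$, so that $\langle r', x \rangle = \langle Sx, x \rangle$ for all $x$. So
$$\langle r', x \rangle = \langle Sx, x \rangle = \langle Bx, x \rangle + \langle x, r \rangle^2 = \langle x, r \rangle + \langle x, r \rangle^2 = 0_{\mathbb{F}_2} \text{ for all } x \in G,$$
and hence $r' = 0_G$. The result is proved.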
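The symmetrisation step amounts to replacing $B$ by $S = B + rr^t$. The Python sketch below is a hypothetical illustration (vectors as lists of bits, helper names our own); it checks on random symmetric $B$ that $S$ is symmetric with zero diagonal, that $\langle Bx, x \rangle = \langle r, x \rangle$, and that $Sx = Bx$ whenever $\langle r, x \rangle = 0_{\mathbb{F}_2}$.

```python
import random
from itertools import product

def dot(u, v):
    # <u, v> over F_2 for vectors given as lists of bits.
    return sum(a * b for a, b in zip(u, v)) % 2

def matvec(B, x):
    # The vector Bx over F_2.
    return [dot(row, x) for row in B]

def random_symmetric(n, rng):
    # A random symmetric n x n matrix over F_2 (diagonal included).
    B = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            B[i][j] = B[j][i] = rng.randrange(2)
    return B

def symmetrise(B):
    # With r the diagonal of B, return (S, r) where S = B + r r^t, i.e. Sx = Bx + <x, r> r.
    n = len(B)
    r = [B[i][i] for i in range(n)]
    S = [[(B[i][j] + r[i] * r[j]) % 2 for j in range(n)] for i in range(n)]
    return S, r

def check_symmetrisation(B):
    # Verify the properties of S used in the text, for one symmetric B.
    S, r = symmetrise(B)
    n = len(B)
    if any(S[i][j] != S[j][i] for i in range(n) for j in range(n)):
        return False                                # S should be symmetric
    if any(S[i][i] != 0 for i in range(n)):
        return False                                # S should have zero diagonal
    for bits in product((0, 1), repeat=n):
        x = list(bits)
        if dot(matvec(B, x), x) != dot(r, x):
            return False                            # <Bx, x> = <r, x> for symmetric B
        if dot(r, x) == 0 and matvec(S, x) != matvec(B, x):
            return False                            # Sx = Bx when <r, x> = 0
    return True

rng = random.Random(9)
print(all(check_symmetrisation(random_symmetric(5, rng)) for _ in range(20)))  # True
```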
The argument for converting the matrix M into the symmetric matrix S with zero
diagonal is, for obvious reasons, called the symmetrisation argument.
Proof of Theorem 7.8. This follows immediately on combining Lemma 7.12 with Lemma
7.9.
Acknowledgement
The author should like to thank all those who have provided feedback, comments and
corrections, and particularly the students who took the course in the academic year 2010–
2011.
Proof. Define auxiliary functions $g_y$ by $g_y(x) := |f(x,y)|$, and then note that the inequality is simply the statement
$$\Big\| \int g_y\,d\mu_Y(y) \Big\|_{L^q(X)} \leq \int \|g_y\|_{L^q(X)}\,d\mu_Y(y).$$
On the other hand, since $Y$ is finite this is just a weighted sum, and hence follows from the usual triangle inequality for $L^q(X)$.
References
[Bon70] A. Bonami. Étude des coefficients de Fourier des fonctions de L^p(G). Ann. Inst. Fourier (Grenoble), 20(fasc. 2):335–402 (1971), 1970.
[BS94] A. Balog and E. Szemerédi. A statistical theorem of set addition. Combinatorica, 14(3):263–268,
1994.
[Cha02] M.-C. Chang. A polynomial bound in Freı̆man’s theorem. Duke Math. J., 113(3):399–419, 2002.
[Fox11] J. Fox. A new proof of the graph removal lemma. Ann. of Math. (2), 174(1):561–579, 2011,
arXiv:1006.1300.
[Fre73] G. A. Freı̆man. Foundations of a structural theory of set addition. American Mathematical Society, Providence, R. I., 1973. Translated from the Russian, Translations of Mathematical Monographs, Vol 37.
[Gow98] W. T. Gowers. A new proof of Szemerédi’s theorem for arithmetic progressions of length four.
Geom. Funct. Anal., 8(3):529–551, 1998.
[Gow01] W. T. Gowers. A new proof of Szemerédi’s theorem. Geom. Funct. Anal., 11(3):465–588, 2001.
[Gre02a] B. J. Green. Arithmetic progressions in sumsets. Geom. Funct. Anal., 12(3):584–597, 2002.
[Gre02b] B. J. Green. Restriction and Kakeya phenomena. Available at www.dpmms.cam.ac.uk/~bjg23,
2002.
[Gre05] B. J. Green. A Szemerédi-type regularity lemma in abelian groups, with applications. Geom.
Funct. Anal., 15(2):340–376, 2005.
[GT08] B. J. Green and T. C. Tao. An inverse theorem for the Gowers U^3(G) norm. Proc. Edinb. Math. Soc. (2), 51(1):73–153, 2008.
[GT09] B. J. Green and T. C. Tao. Freı̆man’s theorem in finite fields via extremal set theory. Combin.
Probab. Comput., 18(3):335–355, 2009.
[HB87] D. R. Heath-Brown. Integer sets containing no arithmetic progressions. J. London Math. Soc.
(2), 35(3):385–394, 1987.
[KKL88] J. Kahn, G. Kalai, and N. Linial. The influence of variables on Boolean functions. In Proceedings
of the 29th Annual Symposium on Foundations of Computer Science, pages 68–80, Washington,
DC, USA, 1988. IEEE Computer Society.
[Nel73] E. Nelson. The free Markoff field. J. Functional Analysis, 12:211–227, 1973.
[Rot52] K. F. Roth. Sur quelques ensembles d’entiers. C. R. Acad. Sci. Paris, 234:388–390, 1952.
[Rot53] K. F. Roth. On certain sets of integers. J. London Math. Soc., 28:104–109, 1953.
[Rud90] W. Rudin. Fourier analysis on groups. Wiley Classics Library. John Wiley & Sons Inc., New
York, 1990. Reprint of the 1962 original, A Wiley-Interscience Publication.
[Ruz91] I. Z. Ruzsa. Arithmetic progressions in sumsets. Acta Arith., 60(2):191–202, 1991.
[Ruz99] I. Z. Ruzsa. An analog of Freı̆man's theorem in groups. Astérisque, (258):xv, 323–326, 1999. Structure theory of set addition.
[Sam07] A. Samorodnitsky. Low-degree tests at large distances. In STOC’07—Proceedings of the 39th
Annual ACM Symposium on Theory of Computing, pages 506–515. ACM, New York, 2007.
[SSV05] B. Sudakov, E. Szemerédi, and V. H. Vu. On a question of Erdős and Moser. Duke Math. J.,
129(1):129–155, 2005.
[Sze90] E. Szemerédi. Integer sets containing no arithmetic progressions. Acta Math. Hungar., 56(1-2):155–158, 1990.
[Tao08] T. C. Tao. Product set estimates for non-commutative groups. Combinatorica, 28(5):547–594,
2008.
[TV06] T. C. Tao and H. V. Vu. Additive combinatorics, volume 105 of Cambridge Studies in Advanced
Mathematics. Cambridge University Press, Cambridge, 2006.
Mathematical Institute, University of Oxford, 24-29 St. Giles’, Oxford OX1 3LB,
England
E-mail address: tom.sanders@maths.ox.ac.uk