Clifford-Steerable Convolutional Neural Networks

Maksim Zhdanov¹  David Ruhe*¹²³  Maurice Weiler*¹  Ana Lucic⁴  Johannes Brandstetter⁵⁶  Patrick Forré¹²

arXiv:2402.14730v2, 11 Jun 2024

Abstract

We present Clifford-Steerable Convolutional Neural Networks (CS-CNNs), a novel class of E(p, q)-equivariant CNNs. CS-CNNs process multivector fields on pseudo-Euclidean spaces R^{p,q}. They cover, for instance, E(3)-equivariance on R^3 and Poincaré-equivariance on Minkowski spacetime R^{1,3}. Our approach is based on an implicit parametrization of O(p, q)-steerable kernels via Clifford group equivariant neural networks. We significantly and consistently outperform baseline methods on fluid dynamics as well as relativistic electrodynamics forecasting tasks.

Figure 1. CS-CNNs process multivector fields while respecting E(p, q)-equivariance. Shown here is a Lorentz boost O(1, 1) of electromagnetic data on 1+1-dimensional spacetime R^{1,1}.
1. Introduction

Physical systems are often described by fields on (pseudo-)Euclidean spaces. Their equations of motion obey various symmetries, such as isometries E(3) of Euclidean space R^3 or relativistic Poincaré transformations E(1, 3) of Minkowski spacetime R^{1,3}. PDE solvers should respect these symmetries. In the case of deep learning based surrogates, this property is ensured by making the neural networks equivariant (commutative) w.r.t. the transformations of interest.

A fairly general class of equivariant CNNs covering arbitrary spaces and field types is described by the theory of steerable CNNs (Weiler et al., 2023). The central result there is that equivariance requires a "G-steerability" constraint on convolution kernels, where G = O(n) or O(p, q) for E(n)- or E(p, q)-equivariant CNNs, respectively. This constraint was solved and implemented for O(n) (Lang & Weiler, 2021; Cesa et al., 2022); O(p, q)-steerable kernels, however, are so far still missing.

This work proposes Clifford-steerable CNNs (CS-CNNs), which process multivector fields on pseudo-Euclidean spaces R^{p,q} and are equivariant to the pseudo-Euclidean group E(p, q): the isometries of R^{p,q}. Multivectors are elements of the Clifford (or geometric) algebra Cl(R^{p,q}) of R^{p,q}. Neural networks based on Clifford algebras have recently surged in popularity in deep learning and were used to build both non-equivariant (Brandstetter et al., 2023; Ruhe et al., 2023b) and equivariant (Ruhe et al., 2023a; Brehmer et al., 2023) models. While multivectors do not cover all possible field types, e.g. general tensor fields, they include those most relevant in physics. For instance, the Maxwell or Dirac equation and General Relativity can be formulated using the spacetime algebra Cl(R^{1,3}).

The steerability constraint on convolution kernels is usually solved either analytically or numerically; however, such solutions are not yet known for O(p, q). Observing that the G-steerability constraint is just a G-equivariance constraint, Zhdanov et al. (2023) propose to implement G-steerable kernels implicitly via G-equivariant MLPs. Our CS-CNNs follow this approach, implementing implicit O(p, q)-steerable kernels via the O(p, q)-equivariant neural networks for multivectors developed by Ruhe et al. (2023a).

We demonstrate the efficacy of our approach by predicting the evolution of several physical systems. In particular, we consider a fluid dynamics forecasting task on R^2, as well as relativistic electrodynamics simulations on both R^3 and R^{1,2}.

*Equal contribution. ¹AMLab, Informatics Institute, University of Amsterdam. ²AI4Science Lab, Informatics Institute, University of Amsterdam. ³Anton Pannekoek Institute for Astronomy, University of Amsterdam. ⁴AI4Science, Microsoft Research. ⁵ELLIS Unit Linz, Institute for Machine Learning, JKU Linz, Austria. ⁶NXAI GmbH. Correspondence to: Maksim Zhdanov <[Link]@[Link]>.

Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).


CS-CNNs are the first models respecting the full spacetime symmetries of these problems. They significantly outperform competitive baselines, including conventional steerable CNNs and non-equivariant Clifford CNNs. This result remains consistent over dataset sizes. When evaluating the empirical equivariance error of our approach for E(2) symmetries, we find that we perform on par with the analytical solutions of Weiler & Cesa (2019).

The main contributions of this work are the following:

• While prior work considered only individual multivectors, CS-CNNs process full multivector fields on pseudo-Euclidean spaces or manifolds.
• We investigate the representation theory of O(p, q)-steerable kernels for multivector fields and develop an implicit implementation via O(p, q)-equivariant MLPs.
• The resulting E(p, q)-equivariant CNNs are evaluated on various PDE simulation tasks, where they consistently outperform strong baselines.

This paper is organized as follows: Section 2 introduces the theoretical background underlying our method. CS-CNNs are then developed in Section 3 and empirically evaluated in Section 4. A generalization from flat spaces to general pseudo-Riemannian manifolds is presented in Appendix F.

2. Theoretical Background

The core contribution of this work is to provide a framework for the construction of steerable CNNs for processing multivector fields on general pseudo-Euclidean spaces. We provide background on pseudo-Euclidean spaces and their symmetries in Section 2.1, on equivariant (steerable) CNNs in Section 2.2, and on multivectors and the Clifford algebra formed by them in Section 2.3.

2.1. Pseudo-Euclidean spaces and groups

Conventional Euclidean spaces are metric spaces, i.e. they are equipped with a metric that assigns positive distances to any pair of distinct points. Pseudo-Euclidean spaces allow for more general indefinite metrics, which relax the positivity requirement on distances. Pseudo-Euclidean spaces appear in our theory in two distinct settings: First, the (affine) base spaces on which feature vector fields are supported, e.g. Minkowski spacetime, are pseudo-Euclidean. Second, the feature vectors attached to each point of spacetime are themselves elements of pseudo-Euclidean vector spaces. We introduce these spaces and their symmetries in the following.

Figure 2. Examples of pseudo-Euclidean spaces R^{2,0} and R^{1,1}. Colors depict O(p, q)-orbits, given by sets of all points v ∈ R^{p,q} with the same squared distance η^{p,q}(v, v) from the origin.

2.1.1. Pseudo-Euclidean Vector Spaces

Definition 2.1 (Pseudo-Euclidean vector space). A pseudo-Euclidean vector space (inner product space) (V, η) of signature (p, q) is a (p + q)-dimensional vector space V over R equipped with an inner product η, which we define as a non-degenerate¹ symmetric bilinear form

    η : V × V → R,   (v1, v2) ↦ η(v1, v2)    (1)

with p and q positive and negative eigenvalues, respectively.

¹ Note that we explicitly refrain from imposing positive-definiteness onto the definition of inner product, in order to include typical Minkowski spacetime inner products, etc.

If q = 0, η becomes positive-definite, and (V, η) is a conventional Euclidean inner product space. For q ≥ 1, η(v, v) can be negative, rendering (V, η) pseudo-Euclidean.

Since every inner product space (V, η) of signature (p, q) has an orthonormal basis, we can always find a linear isometry with the standard pseudo-Euclidean space R^{p,q} ≅ (V, η), to which we will mostly restrict our attention in this paper.

Definition 2.2 (Standard pseudo-Euclidean vector spaces). Let e1, ..., e_{p+q} be the standard basis of R^{p+q}. Define an inner product of signature (p, q)

    η^{p,q}(v1, v2) := v1^T Δ^{p,q} v2    (2)

in this basis via its matrix representation

    Δ^{p,q} := diag(1, ..., 1, −1, ..., −1),    (3)

with p entries +1 followed by q entries −1. We call the inner product space R^{p,q} := (R^{p+q}, η^{p,q}) the standard pseudo-Euclidean vector space of signature (p, q).

Example 2.3. R^{3,0} ≡ R^3 recovers the 3-dimensional Euclidean vector space with its standard positive-definite inner product Δ^{3,0} = diag(1, 1, 1). The signature (p, q) = (1, 3) corresponds, instead, to Minkowski spacetime R^{1,3} with Minkowski inner product Δ^{1,3} = diag(1, −1, −1, −1).²

² There exist different conventions regarding whether time or space components are assigned the negative sign.
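For illustration, Definition 2.2 translates directly into a few lines of NumPy (a toy sketch accompanying this text, not part of our released implementation; function names are ours):

    import numpy as np

    def metric(p, q):
        # Matrix representation Delta^{p,q} = diag(1,...,1,-1,...,-1) of Eq. (3).
        return np.diag(np.concatenate([np.ones(p), -np.ones(q)]))

    def eta(p, q, v1, v2):
        # Inner product eta^{p,q}(v1, v2) = v1^T Delta^{p,q} v2 of Eq. (2).
        return v1 @ metric(p, q) @ v2

    # Minkowski spacetime R^{1,3}: time-like, light-like and space-like vectors
    # have positive, zero and negative squared "distance", respectively.
    print(eta(1, 3, np.array([1., 0, 0, 0]), np.array([1., 0, 0, 0])))   #  1.0
    print(eta(1, 3, np.array([1., 1, 0, 0]), np.array([1., 1, 0, 0])))   #  0.0
    print(eta(1, 3, np.array([0., 1, 0, 0]), np.array([0., 1, 0, 0])))   # -1.0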
2.1.2. Pseudo-Euclidean Groups

We are interested in neural networks that respect (i.e., commute with, or are equivariant to) the symmetries of pseudo-Euclidean spaces, which we define here. For concreteness, we give these definitions for the standard pseudo-Euclidean vector spaces R^{p,q}. Let us start with the two cornerstone groups that define such symmetries:

Definition 2.4 (Translation groups). The translation group (R^{p,q}, +) associated with R^{p,q} is formed by its set of vectors and its (canonical) vector addition.


Definition 2.5 (Pseudo-orthogonal groups). The pseudo-orthogonal group O(p, q) associated to R^{p,q} is formed by all invertible linear maps that preserve its inner product,

    O(p, q) := { g ∈ GL(R^{p,q}) | g^T Δ^{p,q} g = Δ^{p,q} },    (4)

together with matrix multiplication. O(p, q) is compact for p = 0 or q = 0, and non-compact for mixed signatures.

Example 2.6. For (p, q) = (3, 0), we obtain the usual orthogonal group O(3), i.e. rotations and reflections, while (p, q) = (1, 3) corresponds to the relativistic Lorentz group O(1, 3), which also includes boosts between inertial frames.

Taken together, translations and pseudo-orthogonal transformations of R^{p,q} form its pseudo-Euclidean group, which is the group of all metric preserving symmetries (isometries).³

³ As the translations contained in E(p, q) move the origin of R^{p,q}, they do not preserve the vector space structure of R^{p,q}, but only its structure as affine space.

Definition 2.7 (Pseudo-Euclidean groups). The pseudo-Euclidean group for R^{p,q} is defined as the semidirect product

    E(p, q) := (R^{p,q}, +) ⋊ O(p, q)    (5)

with group multiplication defined by (t̃, g̃) · (t, g) = (t̃ + g̃t, g̃g). Its canonical action on R^{p,q} is given by

    E(p, q) × R^{p,q} → R^{p,q},   ((t, g), x) ↦ gx + t.    (6)

Example 2.8. The usual Euclidean group E(3) is reproduced for (p, q) = (3, 0). For Minkowski spacetime, (p, q) = (1, 3), we obtain the Poincaré group E(1, 3).
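As a numerical sanity check of Definition 2.5 and Example 2.6 (again a toy NumPy sketch under the conventions of Eq. (4), not our actual code), a Lorentz boost in O(1, 1) preserves η^{1,1} but not the Euclidean norm:

    import numpy as np

    Delta = np.diag([1.0, -1.0])                     # Delta^{1,1}
    phi = 0.7                                        # rapidity of the boost
    g = np.array([[np.cosh(phi), np.sinh(phi)],
                  [np.sinh(phi), np.cosh(phi)]])     # element of O(1, 1)

    assert np.allclose(g.T @ Delta @ g, Delta)       # Eq. (4): g^T Delta g = Delta

    v = np.array([2.0, 0.5])
    gv = g @ v
    print(v @ Delta @ v, gv @ Delta @ gv)            # equal squared "distance" 3.75
    print(v @ v, gv @ gv)                            # Euclidean norms differ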
2.2. Feature vector fields & Steerable CNNs

Convolutional neural networks operate on spatial signals, formalized as fields of feature vectors on a base space R^{p,q}. Transformations of the base space imply corresponding transformations of the feature vector fields defined on them, see Fig. 1 (left column). The specific transformation laws depend thereby on their geometric "field type" (e.g., scalar, vector, or tensor fields). Equivariant CNNs commute with such transformations of feature fields. The theory of steerable CNNs shows that this requires a G-equivariance constraint on convolution kernels (Weiler et al., 2023). We briefly review the definitions and basic results of feature fields and steerable CNNs in Sections 2.2.1 and 2.2.2 below.

For generality, this section considers topologically closed matrix groups G ≤ GL(R^{p,q}) and affine groups Aff(G) = (R^{p,q}, +) ⋊ G, and allows for any field type. Section 3 will more specifically focus on pseudo-orthogonal groups G = O(p, q), pseudo-Euclidean groups Aff(O(p, q)) = E(p, q), and multivector fields. For a detailed review of Euclidean steerable CNNs and their generalization to Riemannian manifolds we refer to Weiler et al. (2023).

2.2.1. Feature Vector Fields

Feature vector fields are functions f : R^{p,q} → W that assign to each point x ∈ R^{p,q} a feature f(x) in some feature vector space W. They are additionally equipped with an Aff(G)-action determined by a G-representation ρ on W.

The specific choice of (W, ρ) fixes the geometric "type" of feature vectors. For instance, W = R and trivial ρ(g) = 1 corresponds to scalars, while W = R^{p,q} and ρ(g) = g describes tangent vectors. Higher order tensor spaces and representations give rise to tensor fields. Later on, W = Cl(R^{p,q}) will be the Clifford algebra and feature vectors will be multivectors with a natural O(p, q)-representation ρ_Cl.

Definition 2.9 (Feature vector field). Consider a pseudo-Euclidean "base space" R^{p,q}. Fix any G ≤ GL(R^{p,q}) and consider a G-representation (W, ρ), called "field type". Let Γ(R^{p,q}, W) := {f : R^{p,q} → W} denote the vector space of W-feature fields. Define an Aff(G)-action

    ▷_ρ : Aff(G) × Γ(R^{p,q}, W) → Γ(R^{p,q}, W)    (7)

by setting, for all (t, g) ∈ Aff(G), f ∈ Γ(R^{p,q}, W), x ∈ R^{p,q}:

    [(t, g) ▷_ρ f](x) := ρ(g) f((t, g)^{-1} x) = ρ(g) f(g^{-1}(x − t)).

Since Γ(R^{p,q}, W) is a vector space and ▷_ρ is linear, the tuple (Γ(R^{p,q}, W), ▷_ρ) forms the Aff(G)-representation of feature vector fields of type (W, ρ).⁴

⁴ (Γ(R^{p,q}, W), ▷_ρ) is called the induced representation Ind_G^{Aff(G)} ρ (Cohen et al., 2019b). From a differential geometry perspective, it can be viewed as the space of bundle sections of a G-associated feature vector bundle; see Defs. F.6, F.7 and (Weiler et al., 2023).

Remark 2.10. Intuitively, (t, g) acts on f by
1. moving feature vectors across the base space, from points g^{-1}(x − t) to new locations x, and
2. G-transforming individual feature vectors f(x) ∈ W themselves by means of the G-representation ρ(g).
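As a toy illustration of the two steps in Remark 2.10 (a hedged sketch; grid-axis and orientation conventions are chosen for convenience and are not prescribed by the theory), consider a 90° rotation (t = 0) acting on a vector field sampled on a square grid:

    import numpy as np

    f = np.random.default_rng(0).normal(size=(8, 8, 2))   # f[i, j] in R^2

    g = np.array([[0.0, -1.0],
                  [1.0,  0.0]])                            # rho(g) = g on tangent vectors

    def act(f):
        moved = np.rot90(f, axes=(0, 1))   # step 1: move features across the grid, x -> g x
        return moved @ g.T                 # step 2: transform each feature vector by rho(g)

The pairing of the grid permutation with the matrix g is what makes a discretized field transform consistently; for a scalar field (trivial ρ), only step 1 remains.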
Besides the field types mentioned above, equivariant neural networks often rely on irreducible, regular or quotient representations. More choices of field types are discussed and benchmarked in Weiler & Cesa (2019).

2.2.2. Steerable CNNs

Steerable convolutional neural networks are composed of layers that are Aff(G)-equivariant, that is, which commute with affine group actions on feature fields:

Definition 2.11 (Aff(G)-equivariance). Consider any two G-representations (W_in, ρ_in) and (W_out, ρ_out). Let L : Γ(R^{p,q}, W_in) → Γ(R^{p,q}, W_out) be a function ("layer") between the corresponding spaces of feature fields. This layer is said to be Aff(G)-equivariant iff it satisfies

    L((t, g) ▷_{ρ_in} f) = (t, g) ▷_{ρ_out} L(f)    (8)


for any (t, g) ∈ Aff(G) and any f ∈ Γ(R^{p,q}, W_in). Equivalently, the following diagram should commute:

    [Diagram (9): horizontal maps L : Γ(R^{p,q}, W_in) → Γ(R^{p,q}, W_out), vertical maps (t, g) ▷_{ρ_in} and (t, g) ▷_{ρ_out}; the square commutes.]

The most basic operations used in neural networks are parameterized linear layers. If one demands translation equivariance, these layers are necessarily convolutions (see Theorem 3.2.1 in (Weiler et al., 2023)). Similarly, linearity and Aff(G)-equivariance requires steerable convolutions, that is, convolutions with G-steerable kernels:

Theorem 2.12 (Steerable convolution). Consider a layer L : Γ(R^{p,q}, W_in) → Γ(R^{p,q}, W_out) mapping between feature fields of types (W_in, ρ_in) and (W_out, ρ_out), respectively. If L is demanded to be linear and Aff(G)-equivariant, then:

1. L needs to be a convolution integral⁵

    [L(f_in)](u) = [K ∗ f_in](u) := ∫_{R^{p,q}} K(v) f_in(u − v) dv,

parameterized by a convolution kernel

    K : R^{p,q} → Hom_Vec(W_in, W_out).    (10)

The kernel is operator-valued since it aggregates input features in W_in linearly into output features in W_out.⁶ ⁷

2. The kernel is required to be G-steerable, that is, it needs to satisfy the G-equivariance constraint⁸

    K(gx) = (1 / |det(g)|) ρ_out(g) K(x) ρ_in(g)^{-1} =: ρ_Hom(g)(K(x))    (11)

for any g ∈ G and x ∈ R^{p,q}. This constraint is diagrammatically visualized by the commutativity of:

    [Diagram (12): horizontal maps K : R^{p,q} → Hom_Vec(W_in, W_out), vertical maps g· and ρ_Hom(g); the square commutes.]

Proof. See Theorem 4.3.1 in (Weiler et al., 2023).

⁵ dv is the usual Lebesgue measure on R^{p+q}. For the integral to exist, we assume f to be bounded and have compact support.
⁶ Hom_Vec(W_in, W_out), the space of vector space homomorphisms, consists of all linear maps W_in → W_out. When putting W_in = R^{C_in} and W_out = R^{C_out}, this space can be identified with the space R^{C_out × C_in} of C_out × C_in matrices.
⁷ K : R^{p,q} → Hom_Vec(W_in, W_out) itself need not be linear.
⁸ This is in particular not demanding K(v) to be (equivariant) homomorphisms of G-representations in Hom_G(W_in, W_out), despite (W_in, ρ_in) and (W_out, ρ_out) being G-representations. Only K itself is G-equivariant as map R^{p,q} → Hom_Vec(W_in, W_out).

Remark 2.13 (Discretized kernels). In practice, kernels are often discretized as arrays of shape

    (X1, ..., X_{p+q}, C_out, C_in)

with C_out = dim(W_out) and C_in = dim(W_in). The first p + q axes index a pixel grid on the domain R^{p,q}, while the last two axes represent the linear operators in the codomain by C_out × C_in matrices.
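The steerability constraint of Eq. (11) is easy to probe numerically. A minimal sketch (illustrative only, with a hand-picked kernel): take scalar input fields (ρ_in trivial) and vector output fields (ρ_out(g) = g) on R^2, and the kernel K(v) = v viewed as a 2×1 matrix; for rotations |det g| = 1, and Eq. (11) holds exactly:

    import numpy as np

    def K(v):
        # Kernel mapping scalar input features to vector output features.
        return v.reshape(2, 1)

    g = np.array([[0.0, -1.0], [1.0, 0.0]])        # 90-degree rotation, |det g| = 1
    rho_out, rho_in = g, np.eye(1)

    v = np.random.default_rng(0).normal(size=2)
    lhs = K(g @ v)                                  # K(g v)
    rhs = rho_out @ K(v) @ np.linalg.inv(rho_in)    # rho_out(g) K(v) rho_in(g)^{-1}
    assert np.allclose(lhs, rhs)                    # Eq. (11)

On a pixel grid, this kernel would be stored as an array of shape (X1, X2, C_out, C_in) = (X1, X2, 2, 1), as described in Remark 2.13.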
The main takeaway of this section is that one needs to implement G-steerable kernels in order to implement Aff(G)-equivariant CNNs. This is a notoriously difficult problem, requiring specialized approaches for different categories of groups G and field types (W, ρ). Unfortunately, the usual approaches do not immediately apply to our goal of implementing O(p, q)-steerable kernels for multivector fields. These include the following cases:

Analytical: Most commonly, steerable kernels are parameterized in analytically derived steerable kernel bases.⁹ Solutions are known for SO(3) (Weiler et al., 2018a), O(3) (Geiger et al., 2020) and any G ≤ O(2) (Weiler & Cesa, 2019). Lang & Weiler (2021) and Cesa et al. (2022) generalized this to any compact groups G ≤ U(d). However, their solutions still require knowledge of irreducible representations, Clebsch-Gordan coefficients and harmonic basis functions, which need to be derived and implemented for each single group individually. Furthermore, these solutions do not cover pseudo-orthogonal groups O(p, q) of mixed signature, since these are non-compact.

⁹ Unconstrained kernels, Eq. (10), can be linearly combined, and therefore form a vector space. The steerability constraint, Eq. (11), is linear. Steerable kernels hence span a linear subspace and can be parameterized in terms of a basis of steerable kernels.

Regular: For regular and quotient representations, steerable kernels can be implemented via channel permutations in the matrix dimensions. This is, for instance, done in regular group convolutions (Cohen & Welling, 2016; Weiler et al., 2018b; Bekkers et al., 2018; Cohen et al., 2019a; Finzi et al., 2020). However, these approaches require finite G or rely on sampling compact G, again ruling out general (non-compact) O(p, q).

Numerical: Cohen & Welling (2017) solved the kernel constraint for finite G numerically. For SO(2), Haan et al. (2021) derived numerical solutions based on Lie-algebra representation theory. The numerical routine by Shutty & Wierzynski (2022) solves for Lie-algebra irreps given their structure constants. Corresponding Lie group irreps follow via the matrix exponential, however, only on connected groups like the subgroups SO⁺(p, q) of O(p, q).

Implicit: Steerable kernels are merely G-equivariant maps between vector spaces R^{p,q} and Hom_Vec(W_in, W_out). Based on this insight, Zhdanov et al. (2023) parameterize them implicitly via G-equivariant MLPs. However, to implement these MLPs, one usually requires irreps, irrep endomorphisms and Clebsch-Gordan coefficients for each G of interest.


Our approach presented in Section 3 is based on the implicit kernel parametrization via neural networks by Zhdanov et al. (2023), which requires us to implement O(p, q)-equivariant neural networks. Fortunately, the Clifford group equivariant neural networks by Ruhe et al. (2023a) establish O(p, q)-equivariance for the practically relevant case of Clifford-algebra representations ρ_Cl, i.e., O(p, q)-actions on multivectors. The Clifford algebra, and Clifford group equivariant neural networks, are introduced in the next section.

2.3. The Clifford Algebra & Clifford Group Equivariant Neural Networks

This section introduces multivector features, a specific type of geometric feature vectors with O(p, q)-action. Multivectors are the elements of a Clifford algebra Cl(V, η) corresponding to a pseudo-Euclidean R-vector space (V, η). The most relevant properties of Clifford algebras in relation to applications in geometric deep learning are the following:

• Cl(V, η) is, in itself, an R-vector space of dimension 2^d with d := dim(V) = p + q. This allows to use multivectors as feature vectors of neural networks (Brandstetter et al., 2023; Ruhe et al., 2023b; Brehmer et al., 2023).
• As an algebra, Cl(V, η) comes with an R-bilinear operation
    • : Cl(V, η) × Cl(V, η) → Cl(V, η),
called the geometric product.¹⁰ We can therefore multiply multivectors with each other, which will be a key aspect in various neural network operations.
• Cl(V, η) is furthermore a representation space of the pseudo-orthogonal group O(V, η) via ρ_Cl, defined in Eq. (19) below. This allows to use multivectors as features of O(V, η)-equivariant networks (Ruhe et al., 2023a).

¹⁰ The geometric product is unital, associative, non-commutative, and O(V, η)-equivariant. Its main defining property is highlighted in Eq. (14). A proper definition is given in Definition D.2, Eq. (73).

A formal definition of Clifford algebras can be found in Appendix D. Section 2.3.1 offers a less technical introduction, highlighting basic constructions and results. Sections 2.3.2 and 2.3.3 focus on the natural O(p, q)-action on multivectors, and on Clifford group equivariant neural networks. While we will later mostly be interested in (V, η) = R^{p,q} and O(V, η) = O(p, q), we keep the discussion here general.

2.3.1. Introduction to the Clifford Algebra

Multivectors are constructed by multiplying and summing vectors. Specifically, l vectors v1, ..., vl ∈ V multiply to v1 • ... • vl ∈ Cl(V, η). A general multivector arises as a linear combination of such products,

    x = Σ_{i∈I} c_i · v_{i,1} • ... • v_{i,l_i},    (13)

with some finite index set I and v_{i,k} ∈ V and c_i ∈ R.

The main algebraic property of the Clifford algebra is that it relates the geometric product of vectors v ∈ V to the inner product η on V by requiring:

    v • v = η(v, v) · 1_{Cl(V,η)}   ∀ v ∈ V ⊂ Cl(V, η)    (14)

Intuitively, this means that the product of a vector with itself collapses to a scalar value η(v, v) ∈ R ⊆ Cl(V, η), from which all other properties of the algebra follow by bilinearity. This leads in particular to the fundamental relation¹¹:

    v2 • v1 = −v1 • v2 + 2η(v1, v2) · 1_{Cl(V,η)}   ∀ v1, v2 ∈ V.

¹¹ To see this, use v := v1 + v2 in Eq. (14) and expand.

For the standard orthonormal basis [e1, ..., e_{p+q}] of R^{p,q} this reduces to the following simple rules:

    ei • ej = −ej • ei                    for i ≠ j        (15a)
    ei • ej = η(ei, ei) = +1              for i = j ≤ p    (15b)
    ei • ej = η(ei, ei) = −1              for i = j > p    (15c)

An (orthonormal) basis of Cl(V, η) is constructed by repeatedly taking geometric products of any basis vectors ei ∈ V. Note that, up to sign flip, (1) the ordering of elements in any product is irrelevant due to Eq. (15a), and (2) any elements occurring twice cancel out due to Eqs. (15b, 15c).

The basis elements constructed this way can be identified with (and labeled by) subsets A ⊆ [d] := {1, ..., d}, where the presence or absence of an index i ∈ A signifies whether the corresponding ei appears in the product. Agreeing furthermore on an ordering to disambiguate signs, we define

    eA := e_{i1} • e_{i2} • ... • e_{ik}   for A = {i1 < ... < ik} ≠ ∅

and e_∅ := 1_{Cl(V,η)}. From this, it is clear that dim Cl(V, η) = 2^d. Table 1 gives a specific example for (V, η) = R^{1,2}.

Table 1. Orthonormal basis for Cl(R^{p,q}) with (p, q) = (1, 2). "Norm" refers to η̄(eA, eA) = ηA; see Eq. (18).

    name          grade k   dim (d choose k)   basis k-vectors   norm
    scalar        0         1                  1                 +1
    vector        1         3                  e1                +1
                                               e2, e3            −1
    pseudovector  2         3                  e12, e13          −1
                                               e23               +1
    pseudoscalar  3         1                  e123              +1

Any multivector x ∈ Cl(V, η) can be uniquely expanded in this basis,

    x = Σ_{A⊆[d]} xA · eA,    (16)

where xA ∈ R are coefficients.
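The multiplication rules of Eqs. (15a-c) admit a compact implementation when basis blades eA are encoded as bitmasks over e1, ..., ed. The following sketch (our own illustration, not the paper's released implementation) returns the sign and resulting blade of a product eA • eB; extending it bilinearly over all blade pairs yields the full geometric product, i.e. the multiplication (Cayley) table of the algebra:

    def blade_product(a, b, signature):
        # Geometric product e_A . e_B of two basis blades, following Eqs. (15a-c).
        # a, b are bitmasks over e_1, ..., e_d; signature[i] = eta(e_{i+1}, e_{i+1}).
        sign = 1
        shifted = a >> 1
        while shifted:                              # reordering sign, Eq. (15a)
            if bin(shifted & b).count("1") % 2:
                sign = -sign
            shifted >>= 1
        common = a & b                              # repeated e_i contract, Eqs. (15b, 15c)
        for i, s in enumerate(signature):
            if common >> i & 1:
                sign *= s
        return sign, a ^ b

    sig = (1, -1)                                   # Cl(R^{1,1})
    print(blade_product(0b01, 0b10, sig))           # e1 . e2  ->  (+1, 0b11) = +e12
    print(blade_product(0b10, 0b01, sig))           # e2 . e1  ->  (-1, 0b11) = -e12
    print(blade_product(0b10, 0b10, sig))           # e2 . e2  ->  (-1, 0b00) = eta(e2, e2)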


Note that there are (d choose k) basis elements eA of "grade" |A| = k, i.e., which are composed from k out of the d distinct ei ∈ V. These span d + 1 linear subspaces Cl^(k)(V, η), the elements of which are called k-vectors. They include scalars (k = 0), vectors (k = 1), bivectors (k = 2), etc. The full Clifford algebra thus decomposes into a direct sum over grades:

    Cl(V, η) = ⊕_{k=0}^{d} Cl^(k)(V, η),   dim Cl^(k)(V, η) = (d choose k).

Given any multivector x, expanded as in Eq. (16), we can define its k-th grade projection onto Cl^(k)(V, η) as:

    x^(k) = Σ_{A⊆[d], |A|=k} xA · eA.    (17)

Finally, the inner product η on V is naturally extended to Cl(V, η) by defining η̄ : Cl(V, η) × Cl(V, η) → R as

    η̄(x, y) := Σ_{A⊆[d]} ηA · xA · yA,    (18)

where ηA := Π_{i∈A} η(ei, ei) ∈ {±1} are sign factors. The tuple (eA)_{A⊆[d]} is an orthonormal basis of Cl(V, η) w.r.t. η̄.

All of these constructions and statements are more formally defined and proven in the appendix of (Ruhe et al., 2023b).
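Grade projection (Eq. (17)) and the extended inner product η̄ (Eq. (18)) are equally direct once multivectors are stored as coefficient vectors indexed by blade bitmasks (a sketch under that storage assumption, continuing the illustration above):

    import numpy as np

    def grade_project(x, k, d):
        # x^(k) of Eq. (17): keep coefficients whose blade index has popcount k.
        keep = np.array([bin(A).count("1") == k for A in range(2 ** d)])
        return np.where(keep, x, 0.0)

    def eta_bar(x, y, signature):
        # Extended inner product eta_bar(x, y) of Eq. (18).
        d = len(signature)
        eta_A = np.array([np.prod([signature[i] for i in range(d) if A >> i & 1])
                          for A in range(2 ** d)])
        return float(np.sum(eta_A * x * y))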
2.3.2. Clifford Grades as O(p, q)-Representations

The individual grades Cl^(k)(V, η) turn out to be representation spaces of the (abstract) pseudo-orthogonal group

    O(V, η) := { g ∈ GL(V) | ∀ v ∈ V : η(gv, gv) = η(v, v) },    (19)

which coincides for (V, η) = R^{p,q} with O(p, q) in Def. 2.5. O(V, η) acts thereby on multivectors by individually multiplying each 1-vector from which they are constructed with g.

Definition/Theorem 2.14 (O(V, η)-action on Cl(V, η)). Let (V, η) be a pseudo-Euclidean space, g, gi ∈ O(V, η), ci ∈ R, v_{i,j} ∈ V, x, xi ∈ Cl(V, η), and I a finite index set. Define the orthogonal algebra representation

    ρ_Cl : O(V, η) → O_Alg(Cl(V, η), η̄)¹²    (20)

of O(V, η) via the canonical O(V, η)-action on each of the contained 1-vectors:

    ρ_Cl(g)(Σ_{i∈I} ci · v_{i1} • ... • v_{i j_i}) := Σ_{i∈I} ci · (g v_{i1}) • ... • (g v_{i j_i}).    (21)

ρ_Cl is well-defined as an orthogonal representation:

    linear:      ρ_Cl(g)(c1 · x1 + c2 · x2) = c1 · ρ_Cl(g)(x1) + c2 · ρ_Cl(g)(x2)
    composing:   ρ_Cl(g2)(ρ_Cl(g1)(x)) = ρ_Cl(g2 g1)(x)
    invertible:  ρ_Cl(g)^{-1}(x) = ρ_Cl(g^{-1})(x)
    orthogonal:  η̄(ρ_Cl(g)(x1), ρ_Cl(g)(x2)) = η̄(x1, x2)

¹² O_Alg(Cl(V, η), η̄) is the group of all linear orthogonal transformations of Cl(V, η) that are also multiplicative w.r.t. •.

Moreover, the geometric product is O(V, η)-equivariant, making ρ_Cl an (orthogonal) algebra representation:

    ρ_Cl(g)(x1) • ρ_Cl(g)(x2) = ρ_Cl(g)(x1 • x2).    (22)

    [Diagram (23): • : Cl(V, η) × Cl(V, η) → Cl(V, η) commutes with ρ_Cl(g) × ρ_Cl(g) on the left and ρ_Cl(g) on the right.]

This representation ρ_Cl reduces furthermore to independent sub-representations on individual k-vectors.

Theorem 2.15 (O(V, η)-action on grades Cl^(k)(V, η)). Let g ∈ O(V, η), x ∈ Cl(V, η) and k ∈ {0, ..., d} a grade. The grade projection (·)^(k) is O(V, η)-equivariant:

    (ρ_Cl(g) x)^(k) = ρ_Cl(g) x^(k)    (24)

    [Diagram (25): (·)^(k) : Cl(V, η) → Cl^(k)(V, η) commutes with ρ_Cl(g) on both sides.]

This implies in particular that Cl(V, η) is reducible to sub-representations Cl^(k)(V, η), i.e. ρ_Cl(g) does not mix grades.

Proof. Both theorems are proven in (Ruhe et al., 2023a).

2.3.3. O(p, q)-Equivariant Clifford Neural Nets

Based on those properties, Ruhe et al. (2023a) proposed Clifford group equivariant neural networks (CGENNs). Due to a group isomorphism, this is equivalent to the networks' O(V, η)-equivariance.

Definition/Theorem 2.16 (Clifford Group Equivariant NN). Consider a grade k = 0, ..., d and weights w^k_{mn} ∈ R. A Clifford group equivariant neural network (CGENN) is constructed from the following functions, operating on one or more multivectors xi ∈ Cl(V, η).

Linear layers: mix k-vectors. For each 1 ≤ m ≤ c_out:

    L^(k)_m(x1, ..., x_{c_in}) := Σ_{n=1}^{c_in} w^k_{mn} · x_n^(k)    (26)

Such weighted linear mixing within sub-representations Cl^(k)(V, η) is common in equivariant MLPs.

Geometric product layers: compute weighted geometric products with grade-dependent weights:

    P^(k)(x1, x2) := Σ_{m=0}^{d} Σ_{n=0}^{d} w^k_{mn} · (x1^(m) • x2^(n))^(k)    (27)
wmn · x1 • x2
formations of Cl(V, η) that are also multiplicative w.r.t. • . m=0 n=0


Figure 3. Implicit Clifford-steerable kernel with light-cone structure for (p, q) = (1, 1) and c_in = c_out = 1. It is parameterized by a kernel network 𝒦, producing a field of (c_in × c_out) multivector-valued outputs. These are convolved with multivector fields by taking their weighted geometric product at each location in a convolutional manner. This is equivalent to a conventional steerable convolution after expansion to an O(1, 1)-steerable kernel via a kernel head operation H. For more details and equivariance properties see the commutative diagram in Fig. 4. A more detailed variant for R^{2,0} and O(2) which additionally visualizes weighting parameters is shown in Fig. 8.

This is similar to the irrep-feature tensor products in MACE (Batatia et al., 2022).

Nonlinearity: As activations, we use A(x) := x · Φ(x^(0)), where Φ is the CDF of the Gaussian distribution. This is inspired by GatedGELU from Brehmer et al. (2023).

All of these operations are, by Theorems 2.14 and 2.15, O(V, η)-equivariant.
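To make the two CGENN layer types concrete, here is a minimal NumPy sketch of Eqs. (26) and (27). It assumes multivectors are stored as coefficient vectors of length 2^d indexed by blade bitmask, and that a callable geom_product(x1, x2) implements the geometric product on such coefficients (e.g. built from the blade product sketched in Section 2.3.1); shapes and names are illustrative, not our released API:

    import numpy as np

    def grades(d):
        return np.array([bin(A).count("1") for A in range(2 ** d)])

    def linear_layer(xs, w, d):
        # Eq. (26): out_m^(k) = sum_n w[k, m, n] * x_n^(k);  xs: (c_in, 2^d).
        per_blade_w = w[grades(d)]                       # (2^d, c_out, c_in)
        return np.einsum("amn,na->ma", per_blade_w, xs)  # (c_out, 2^d)

    def geometric_product_layer(x1, x2, w, d, geom_product):
        # Eq. (27): P^(k)(x1, x2) = sum_{m,n} w[k, m, n] * (x1^(m) . x2^(n))^(k).
        g = grades(d)
        out = np.zeros(2 ** d)
        for m in range(d + 1):
            for n in range(d + 1):
                prod = geom_product(np.where(g == m, x1, 0.0),
                                    np.where(g == n, x2, 0.0))
                out += w[g, m, n] * prod                 # weight by the output grade k
        return out

Because both layers only mix coefficients within grades, or weight grade-wise products per output grade, their equivariance follows directly from Theorems 2.14 and 2.15.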
3. Clifford-Steerable CNNs

This section presents Clifford-Steerable Convolutional Neural Networks (CS-CNNs), which operate on multivector fields on R^{p,q} and are equivariant to the isometry group E(p, q) of R^{p,q}. To achieve E(p, q)-equivariance, we need to find a way to implement O(p, q)-steerable kernels (Section 2.2), which we do by leveraging the connection between Cl(R^{p,q}) and O(p, q) presented in Section 2.3.

CS-CNNs process (multi-channel) multivector fields

    f : R^{p,q} → Cl(R^{p,q})^c    (28)

of type (W, ρ) = (Cl(R^{p,q})^c, ρ^c_Cl) with c ≥ 1 channels. The representation

    ρ^c_Cl = ⊕_{i=1}^{c} ρ_Cl : O(p, q) → GL(Cl(R^{p,q})^c)    (29)

is given by the action ρ_Cl from Definition/Theorem 2.14, however, applied to each of the c components individually.

Following Theorem 2.12, our main goal is the construction of a convolution operator

    L : Γ(R^{p,q}, Cl(R^{p,q})^{c_in}) → Γ(R^{p,q}, Cl(R^{p,q})^{c_out}),
    L(f_in)(u) := ∫_{R^{p,q}} K(v) f_in(u − v) dv,    (30)

parameterized by a convolution kernel

    K : R^{p,q} → Hom_Vec(Cl(R^{p,q})^{c_in}, Cl(R^{p,q})^{c_out})    (31)

that satisfies the following O(p, q)-steerability (equivariance) constraint for every g ∈ O(p, q) and v ∈ R^{p,q}:¹³

    K(gv) = ρ^{c_out}_Cl(g) K(v) ρ^{c_in}_Cl(g^{-1}) =: ρ_Hom(g)(K(v)).    (32)

¹³ The volume factor |det g| = 1 drops out for g ∈ O(p, q).

As mentioned in Section 2.2.2, constructing such O(p, q)-steerable kernels is typically difficult. To overcome this challenge, we follow Zhdanov et al. (2023) and implement the kernels implicitly. Specifically, they are based on O(p, q)-equivariant "kernel networks"¹⁴

    𝒦 : R^{p,q} → Cl(R^{p,q})^{c_out × c_in},    (33)

implemented as CGENNs (Section 2.3.3).

¹⁴ The kernel network's output Cl(R^{p,q})^{c_out · c_in} is here reshaped to matrix form Cl(R^{p,q})^{c_out × c_in}.

Unfortunately, the codomain of 𝒦 is Cl(R^{p,q})^{c_out × c_in} instead of Hom_Vec(Cl(R^{p,q})^{c_in}, Cl(R^{p,q})^{c_out}), as required by steerable kernels, Eq. (31). To bridge the gap between these spaces, we introduce an O(p, q)-equivariant linear layer, called the kernel head H. Its purpose is to transform the kernel network's output k := 𝒦(v) ∈ Cl(R^{p,q})^{c_out × c_in} into the desired R-linear map between multivector channels H(k) ∈ Hom_Vec(Cl(R^{p,q})^{c_in}, Cl(R^{p,q})^{c_out}). The relation between kernel network 𝒦, kernel head H, and the resulting steerable kernel K := H ∘ 𝒦 is visualized in Figs. 3 and 4.

To achieve O(p, q)-equivariance (steerability) of K = H ∘ 𝒦, we have to make the kernel head H of a specific form:

Definition 3.1 (Kernel head). A kernel head is a map

    H : Cl(R^{p,q})^{c_out × c_in} → Hom_Vec(Cl(R^{p,q})^{c_in}, Cl(R^{p,q})^{c_out}),   k ↦ H(k),    (34)

where the R-linear operator

    H(k) : Cl(R^{p,q})^{c_in} → Cl(R^{p,q})^{c_out},   f ↦ H(k)[f],

is defined on each output channel i ∈ [c_out] and grade
(31) to matrix form Cl(Rp,q )cout ×cin .


Figure 4. Construction and O(p, q)-equivariance of implicit steerable kernels K = H ∘ 𝒦, which are composed from a kernel network 𝒦 with c_out × c_in multivector outputs and a kernel head H. The whole diagram commutes: R^{p,q} maps via 𝒦 to Cl(R^{p,q})^{c_out × c_in} and via H to Hom_Vec(Cl(R^{p,q})^{c_in}, Cl(R^{p,q})^{c_out}); the vertical maps are g·, ρ^{c_out × c_in}_Cl(g) and ρ_Hom(g), and the composite horizontal maps are K. The two inner squares show the individual equivariance of 𝒦 and H, from which the kernel's overall equivariance follows.

component k = 0, ..., d, by:

    H(k)[f]_i^(k) := Σ_{j∈[c_in]} Σ_{m,n=0,...,d} w^k_{mn,ij} · (k_{ij}^(m) • f_j^(n))^(k),    (35)

where m, n = 0, ..., d label grades and j ∈ [c_in] input channels. The w^k_{mn,ij} ∈ R are parameters that allow for weighted mixing between grades and channels.
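A direct (unoptimized) sketch of Eq. (35) at a single kernel location, under the same coefficient-vector storage assumption as before and with an injected geom_product; array shapes and names are illustrative only:

    import numpy as np

    def kernel_head(k_mat, f_in, w, d, geom_product):
        # k_mat: (c_out, c_in, 2^d) kernel network outputs, f_in: (c_in, 2^d) features,
        # w: (d+1, d+1, d+1, c_out, c_in) grade/channel mixing weights w^k_{mn,ij}.
        g = np.array([bin(A).count("1") for A in range(2 ** d)])
        c_out, c_in, _ = k_mat.shape
        out = np.zeros((c_out, 2 ** d))
        for i in range(c_out):
            for j in range(c_in):
                for m in range(d + 1):
                    for n in range(d + 1):
                        prod = geom_product(np.where(g == m, k_mat[i, j], 0.0),
                                            np.where(g == n, f_in[j], 0.0))
                        out[i] += w[g, m, n, i, j] * prod
        return out

In practice this contraction is precomputed: expanding H(k) into a (c_out · 2^d) × (c_in · 2^d) matrix at every kernel location recovers the conventional steerable-kernel form of Remark 2.13 (cf. Fig. 3 and Appendix A).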
Our implementation of the kernel head is discussed in Appendix A.5. Note that the kernel head H can be seen as a linear combination of partially evaluated geometric product layers P^(k)(k_{ij}, ·) from (27), which mixes input channels to get the output channels. The specific form of the kernel head H comes from the following, most important property:

Proposition 3.2 (Equivariance of the kernel head). The kernel head H is O(p, q)-equivariant w.r.t. ρ^{c_out × c_in}_Cl and ρ_Hom, i.e. for g ∈ O(p, q) and k ∈ Cl(R^{p,q})^{c_out × c_in} we have:

    H(ρ^{c_out × c_in}_Cl(g)(k)) = ρ_Hom(g)(H(k)).    (36)

Proof. The proof relies on the O(p, q)-equivariance of the geometric product and of linear combinations within grades. It can be found in the Appendix in Proof E.1.

With these obstructions out of the way, we can now give the core definition of this paper:

Definition 3.3 (Clifford-steerable kernel). A Clifford-steerable kernel K is a map as in Eq. (31) that factorizes as K = H ∘ 𝒦 with a kernel head H from Eq. (35) and a kernel network 𝒦 given by a Clifford group equivariant neural network (CGENN)¹⁵ from Definition/Theorem 2.16:

    𝒦 = [𝒦_{ij}]_{i∈[c_out], j∈[c_in]} : R^{p,q} → Cl(R^{p,q})^{c_out × c_in}.    (37)

¹⁵ More generally we could employ any O(p, q)-equivariant neural network 𝒦 w.r.t. the standard action ρ(g) = g and ρ^{c_out × c_in}_Cl.

The main theoretical result of this paper is that Clifford-steerable kernels are always O(p, q)-steerable:

Theorem 3.4 (Equivariance of Clifford-steerable kernels). Every Clifford-steerable kernel K = H ∘ 𝒦 is O(p, q)-steerable w.r.t. the standard action ρ(g) = g and ρ_Hom:

    K(gv) = ρ_Hom(g)(K(v))   ∀ g ∈ O(p, q), v ∈ R^{p,q}.

Proof. 𝒦 and H are O(p, q)-equivariant by Definition/Theorem 2.16 and Proposition 3.2, respectively. The O(p, q)-equivariance of the composition K = H ∘ 𝒦 then follows from Fig. 4 or by direct calculation:

    K(gv) = H(𝒦(gv))    (38)
          = H(ρ^{c_out × c_in}_Cl(g)(𝒦(v)))
          = ρ_Hom(g)(H(𝒦(v)))
          = ρ_Hom(g)(K(v)).

A direct Corollary of Theorem 3.4 and Theorem 2.12 is now the following desired result.

Corollary 3.5. Let K = H ∘ 𝒦 be a Clifford-steerable kernel. The corresponding convolution operator L (Eq. (30)) is then E(p, q)-equivariant, i.e. for all f_in ∈ Γ(R^{p,q}, Cl(R^{p,q})^{c_in}):

    (t, g) ▷ L(f_in) = L((t, g) ▷ f_in)   ∀ (t, g) ∈ E(p, q).

Definition 3.6 (Clifford-steerable CNN). We call a convolutional network (that operates on multivector fields and is) based on Clifford-steerable kernels a Clifford-Steerable Convolutional Neural Network (CS-CNN).

Remark 3.7. Brandstetter et al. (2023) use a similar kernel head H as ours, Eq. (35). However, their kernel network 𝒦 is not O(p, q)-equivariant, making their overall architecture merely equivariant to translations instead of E(p, q).

Remark 3.8. The vast majority of parameters of CS-CNNs reside in their kernel networks 𝒦. Further parameters are found in the kernel heads' weighted geometric product operation and the summation of steerable biases to scalar grades.

Remark 3.9. While CS-CNNs are formalized in continuous space, they are in practice typically applied to discretized fields. Our implementation allows for any sampling points, thus covering both pixel grids and point clouds.

Appendix F generalizes CS-CNNs from flat spacetimes to general curved pseudo-Riemannian manifolds. Appendix A provides details on our implementation of CS-CNNs, available at [Link]rd-group-equivariant-cnns.


Figure 5. (Legend: Clifford-Steerable ResNet (Ours), Basic ResNet, Steerable ResNet, Clifford ResNet, FNO, G-FNO; vertical axis: MSE; horizontal axes: No. Training Simulations, Training Step, O(2) Equivariance Error.) Plots 1 & 2: Mean squared errors (MSEs) on the Navier-Stokes R^2 and Maxwell R^3 forecasting tasks (one-step loss) as a function of the number of training simulations. Plot 3: MSE test loss convergence of our model vs. a basic ResNet on the relativistic Maxwell R^{1,2} task. The ResNet does not match the performance of CS-CNNs even for vastly larger training datasets. Plot 4: Relative O(2)-equivariance errors of models trained on Navier-Stokes R^2. G-FNOs fail as they cannot correctly ingest multivector data.

4. Experimental Results

To assess CS-CNNs, we investigate how well they can learn to simulate dynamical systems by testing their ability to predict future states given a history of recent states (Gupta & Brandstetter, 2022). We consider three tasks:
(1) Fluid dynamics on R^2 (incompressible Navier-Stokes)
(2) Electrodynamics on R^3 (Maxwell's Eqs.)
(3) Electrodynamics on R^{1,2} (Maxwell's Eqs., relativistic)

Only the last setting properly incorporates time into 1+2-dimensional spacetime, while the former two treat time steps improperly as feature channels. The improper setting allows us to compare our method with prior work, which was not able to incorporate the full spacetime symmetries E(1, n), but only the spatial subgroup E(n) (which is also covered by CS-CNNs).

Data & Tasks: For both tasks (1) and (2), the goal is to predict the next state given the previous 4 time steps. In (1), the inputs are scalar pressure and vector velocity fields. In (2) the inputs are vector electric and bivector magnetic fields. For task (3), the goal is to predict 16 future states given the previous 16 time steps. In this case, the entire electromagnetic field forms a bivector (Orbán & Mira, 2021). Individual training samples are randomly sliced from long simulations. More details on the datasets are found in Appendix C.3.

Architectures: We evaluate six network architectures:

    architecture                   matrix group G    isometry group
    Conventional ResNet            {e}               translations
    Clifford ResNet                {e}               translations
    Fourier Neural Operators       {e}               translations
    G-Fourier Neural Operators     D4 < O(2)         ≈ E(2)
    Steerable ResNet               O(n)              E(n)
    Clifford-Steerable ResNet      O(p, q)           E(p, q)

The basic ResNet model is described in Apx. C. Clifford, Steerable, and our CS-ResNets are variations of it that substitute vanilla convolutions with their Clifford (Brandstetter et al., 2023), O(n)-steerable (Weiler & Cesa, 2019; Cesa et al., 2022), and Clifford-Steerable counterparts, respectively. We also test Fourier Neural Operators (FNO) (Li et al., 2021) and G-FNO (Helwig et al., 2023). The latter adds equivariance to the Dihedral group D4 < O(2). Assuming scalar or regular representations, FNOs and G-FNOs are incapable of digesting multivector-valued data. We address this by replacing the initial lifting and final projection with unconstrained operations that are able to learn a geometrically correct mapping from/to multivectors. All models scale their number of channels to match the parameter count of the basic ResNet.

Results: To evaluate the models, we report mean-squared error losses (MSE) on test sets. As shown in Fig. 5, our CS-ResNets outperform all baselines on all tasks, especially in higher dimensional space(time)s R^3 and R^{1,2}. CS-ResNets are extremely sample-efficient: for the Navier-Stokes experiment, they require only 64 training simulations to outperform the basic ResNet and FNOs trained on 80× more data. On Maxwell R^{1,2} the basic ResNet does not manage to come close to the CS-ResNet's performance when supplied with 16× more data.

Plot 1 proves CS-CNNs to be a good alternative to classical O(2)-steerable CNNs in the nonrelativistic case. We did not run O(3)-steerable CNNs on Maxwell R^3 due to resource constraints, and not on R^{1,2} as they are not Lorentz-equivariant. G-FNO does not support either of these symmetries.


Figure 6. Visual comparison of target and predicted fields. Left: Our CS-ResNet clearly produces better results than the basic ResNet on Navier-Stokes R^2, despite only being trained on 64 instead of 5120 simulations. Right: On the relativistic Maxwell simulation task on R^{1,2}, CS-ResNets capture crisp details like wavefronts more accurately. This is since they generalize over any isometries of space and any boosted frames of reference.

The Maxwell data on spacetime R^{1,2} is naturally modeled by the spacetime algebra Cl(R^{1,2}) (Hestenes, 2015). Contrary to tasks (1) and (2), time appears here as a proper grid dimension, not as a feature channel. The light cone structure of CS-CNN kernels (Fig. 3) ensures the models' consistency across different inertial frames of reference. This is relevant as the simulated electromagnetic fields are induced by particles moving at relativistic velocities. We see in Plot 3 that CS-CNNs converge significantly faster and are more sample efficient than basic ResNets.

Fig. 6 visualizes predictions of CS-ResNets and basic ResNets on Navier-Stokes R^2 and Maxwell R^{1,2}. Our model captures fine details much more accurately, despite being trained on less data.

Equivariance error: To assess the models' E(2)-equivariance, we measure the relative error |f(g.x) − g.f(x)| / |f(g.x) + g.f(x)| between (1) the output computed from a transformed input, and (2) the transformed output, given the original input. As shown in Fig. 5 (right), both steerable models are equivariant up to numerical artefacts. Despite training, the other models did not become equivariant at all. This holds in particular for G-FNO, which covers only a subgroup of discrete rotations.
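The reported metric is straightforward to compute for any model (a sketch; act_in and act_out stand for the respective field actions on inputs and outputs and are placeholders, not our API):

    import numpy as np

    def relative_equivariance_error(model, act_in, act_out, x, g):
        # |f(g.x) - g.f(x)| / |f(g.x) + g.f(x)|, cf. Fig. 5 (right).
        f_gx = model(act_in(g, x))      # output computed from the transformed input
        g_fx = act_out(g, model(x))     # transformed output of the original input
        return np.linalg.norm(f_gx - g_fx) / np.linalg.norm(f_gx + g_fx)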
Despite these limitations, CS-CNNs excel in our experi-
5. Conclusions ments. A major advantage of CGENNs and CS-CNNs is
that they allow for a simple, unified implementation for
We presented Clifford-Steerable CNNs, a new theoretical arbitrary signatures (p,q). This is remarkable, since steer-
framework for E(p,q)-equivariant convolutions on pseudo- able kernels usually need to be derived for each symmetry
Euclidean spaces such as Minkowski-spacetime. CS-CNNs group individually. Furthermore, our implementation ap-
process fields of multivectors – geometric features which plies both to multivector fields sampled on pixel grids and
naturally occur in many areas of physics. The required point clouds.
O(p,q)-steerable convolution kernels are implemented im-
plicitly via Clifford group equivariant neural networks. This CS-CNNs are, to the best of our knowledge, the first con-
makes so far unknown analytic solutions for the steerability volutional networks that respect the full symmetries E(p,q)
constraint unnecessary. CS-CNNs significantly outperform of Minkowski spacetime or any other pseudo-Euclidean
baselines on a variety of physical dynamics tasks. spaces. Even more generally, CS-CNNs are readily ex-
tended to arbitrary curved pseudo-Riemannian manifolds,
From the viewpoint of general steerable CNNs, there are and such convolutions will necessarily rely on O(p,q)-
some limitations: steerable kernels. For more details see Appendix F and
• There exist more general field types (O(p,q)-rep- (Weiler et al., 2023). They could furthermore be adapted
resentations) than multivectors, for which CS-CNNs do to steerable PDOs (partial differential operators) (Jenner &
not provide steerable kernels. For connected Lie groups, Weiler, 2022), which would connect them to the multivec-
e.g. the subgroups SO+(p,q), these types can in principle tor calculus used in mathematical physics (Hestenes, 1968;
be computed numerically (Shutty & Wierzynski, 2022). Hitzer, 2002; Lasenby et al., 1993).


Impact Statement

The broader implications of our work are primarily in the improved modeling of PDEs, other physical systems, or multivector-based applications in computational geometry. Being able to model such systems more accurately can lead to a better understanding of the physical systems governing our world, while being able to model such systems more efficiently could greatly improve the ecological footprint of training ML models for modeling physical systems.

Acknowledgements

This research was supported by Microsoft Research AI4Science. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers/sponsors.

References

Batatia, I., Kovács, D. P., Simm, G. N. C., Ortner, C., and Csányi, G. MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields. In Conference on Neural Information Processing Systems (NeurIPS), 2022.

Bekkers, E. B-spline CNNs on Lie groups. In International Conference on Learning Representations (ICLR), 2020.

Bekkers, E. J., Lafarge, M. W., Veta, M., Eppenhof, K. A. J., Pluim, J. P. W., and Duits, R. Roto-Translation Covariant Convolutional Networks for Medical Image Analysis. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2018.

Brandstetter, J., Berg, R. v. d., Welling, M., and Gupta, J. K. Clifford Neural Layers for PDE Modeling. In International Conference on Learning Representations (ICLR), 2023.

Brehmer, J., Haan, P. d., Behrends, S., and Cohen, T. S. Geometric Algebra Transformer. In Conference on Neural Information Processing Systems (NeurIPS), 2023.

Cesa, G., Lang, L., and Weiler, M. A Program to Build E(N)-Equivariant Steerable CNNs. In International Conference on Learning Representations (ICLR), 2022.

Cohen, T. and Welling, M. Group Equivariant Convolutional Networks. In International Conference on Machine Learning (ICML), pp. 2990–2999, 2016.

Cohen, T., Weiler, M., Kicanaoglu, B., and Welling, M. Gauge Equivariant Convolutional Networks and the Icosahedral CNN. In International Conference on Machine Learning (ICML), pp. 1321–1330, 2019a.

Cohen, T. S. and Welling, M. Steerable CNNs. In International Conference on Learning Representations (ICLR), 2017.

Cohen, T. S., Geiger, M., and Weiler, M. A General Theory of Equivariant CNNs on Homogeneous Spaces. In Conference on Neural Information Processing Systems (NeurIPS), 2019b.

Filipovich, M. J. and Hughes, S. PyCharge: an open-source Python package for self-consistent electrodynamics simulations of Lorentz oscillators and moving point charges. Computer Physics Communications, 274:108291, 2022.

Finzi, M., Stanton, S., Izmailov, P., and Wilson, A. G. Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data. In International Conference on Machine Learning (ICML), pp. 3165–3176, 2020.

Finzi, M., Welling, M., and Wilson, A. G. A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups. In International Conference on Machine Learning (ICML), 2021.

Geiger, M., Smidt, T., Alby, M., Miller, B. K., Boomsma, W., Dice, B., Lapchevskyi, K., Weiler, M., Tyszkiewicz, M., Batzner, S., et al. Euclidean neural networks: e3nn. Zenodo. [Link] org/10.5281/zenodo, 2020.

Ghosh, R. and Gupta, A. Scale Steerable Filters for Locally Scale-Invariant Convolutional Neural Networks. ArXiv, abs/1906.03861, 2019.

Gupta, J. K. and Brandstetter, J. Towards Multi-spatiotemporal-scale Generalized PDE Modeling. ArXiv, abs/2209.15616, 2022.

Haan, P. d., Weiler, M., Cohen, T., and Welling, M. Gauge Equivariant Mesh CNNs: Anisotropic convolutions on geometric graphs. In International Conference on Learning Representations (ICLR), 2021.

Helwig, J., Zhang, X., Fu, C., Kurtin, J., Wojtowytsch, S., and Ji, S. Group Equivariant Fourier Neural Operators for Partial Differential Equations. In International Conference on Machine Learning (ICML), 2023.

Hendrycks, D. and Gimpel, K. Gaussian Error Linear Units (GELUs). arXiv: Learning, 2016.

Hestenes, D. Multivector calculus. J. Math. Anal. Appl., 24(2):313–325, 1968.

Hestenes, D. Space-time algebra. Springer, 2015.

Hitzer, E. M. Multivector differential calculus. Advances in Applied Clifford Algebras, 12:135–182, 2002.


Holl, P., Thuerey, N., and Koltun, V. Learning to Control PDEs with Differentiable Physics. In International Conference on Learning Representations (ICLR), 2020.

Jenner, E. and Weiler, M. Steerable Partial Differential Operators for Equivariant Neural Networks. In International Conference on Learning Representations (ICLR), 2022.

Kingma, D. P. and Ba, J. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations (ICLR), volume abs/1412.6980, 2015.

Lang, L. and Weiler, M. A Wigner-Eckart Theorem for Group Equivariant Convolution Kernels. In International Conference on Learning Representations (ICLR), 2021.

Lasenby, A., Doran, C., and Gull, S. A multivector derivative approach to Lagrangian field theory. Foundations of Physics, 23(10):1295–1327, 1993.

Li, Z., Kovachki, N. B., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A. M., and Anandkumar, A. Fourier Neural Operator for Parametric Partial Differential Equations. In International Conference on Learning Representations (ICLR), 2021.

Lindeberg, T. Scale-space. 2009.

Loshchilov, I. and Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. In International Conference on Learning Representations (ICLR), 2017.

Marcos, D., Kellenberger, B., Lobry, S., and Tuia, D. Scale equivariance in CNNs with vector fields. arXiv preprint arXiv:1807.11783, 2018.

Orbán, X. P. and Mira, J. Dimensional scaffolding of electromagnetism using geometric algebra. European Journal of Physics, 42(1):015204, 2021.

Romero, D. W., Bekkers, E., Tomczak, J. M., and Hoogendoorn, M. Wavelet Networks: Scale-Translation Equivariant Learning From Raw Time-Series. Transactions on Machine Learning Research, 2024.

Ruhe, D., Brandstetter, J., and Forré, P. Clifford Group Equivariant Neural Networks. In Conference on Neural Information Processing Systems (NeurIPS), volume abs/2305.11141, 2023a.

Ruhe, D., Gupta, J. K., Keninck, S. D., Welling, M., and Brandstetter, J. Geometric Clifford Algebra Networks. In International Conference on Machine Learning (ICML), pp. 29306–29337, 2023b.

Shutty, N. and Wierzynski, C. Computing Representations for Lie Algebraic Networks. NeurIPS 2022 Workshop on Symmetry and Geometry in Neural Representations, 2022.

Sosnovik, I., Szmaja, M., and Smeulders, A. W. M. Scale-Equivariant Steerable Networks. In International Conference on Learning Representations (ICLR), 2020.

Wang, R., Walters, R., and Yu, R. Incorporating Symmetry into Deep Dynamics Models for Improved Generalization. In International Conference on Learning Representations (ICLR), 2021.

Wang, S. Extensions to the Navier–Stokes equations. Physics of Fluids, 34(5), 2022.

Weiler, M. and Cesa, G. General E(2)-Equivariant Steerable CNNs. In Conference on Neural Information Processing Systems (NeurIPS), pp. 14334–14345, 2019.

Weiler, M., Geiger, M., Welling, M., Boomsma, W., and Cohen, T. 3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data. In Conference on Neural Information Processing Systems (NeurIPS), pp. 10402–10413, 2018a.

Weiler, M., Hamprecht, F. A., and Storath, M. Learning Steerable Filters for Rotation Equivariant CNNs. In Computer Vision and Pattern Recognition (CVPR), 2018b.

Weiler, M., Forré, P., Verlinde, E., and Welling, M. Coordinate Independent Convolutional Networks – Isometry and Gauge Equivariant Convolutions on Riemannian Manifolds. arXiv preprint arXiv:2106.06020, 2021.

Weiler, M., Forré, P., Verlinde, E., and Welling, M. Equivariant and Coordinate Independent Convolutional Networks. 2023. URL [Link].io/cnn_book/EquivariantAndCoordinat[Link].

Worrall, D. E. and Welling, M. Deep Scale-spaces: Equivariance Over Scale. In Conference on Neural Information Processing Systems (NeurIPS), pp. 7364–7376, 2019.

Wu, Y. and He, K. Group Normalization. In European Conference on Computer Vision (ECCV), pp. 3–19, 2018.

Zhang, X. and Williams, L. R. Similarity equivariant linear transformation of joint orientation-scale space representations. arXiv preprint arXiv:2203.06786, 2022.

Zhdanov, M., Hoffmann, N., and Cesa, G. Implicit Convolutional Kernels for Steerable CNNs. In Conference on Neural Information Processing Systems (NeurIPS), 2023.

Zhu, W., Qiu, Q., Calderbank, A. R., Sapiro, G., and Cheng, X. Scaling-Translation-Equivariant Networks with Decomposed Convolutional Filters. Journal of Machine Learning Research (JMLR), [Link]–68:45, 2022.


Appendix

A. Implementation details

This appendix provides details on the implementation of CS-CNNs.¹⁶

¹⁶ [Link]up-equivariant-cnns

Before detailing the Clifford-steerable kernels and convolutions, we first define the following "kernel shell" operation, which is used twice in the final kernel computation. Recall that given the base space R^{p,q} equipped with the inner product η^{p,q}, we have a Clifford algebra Cl(R^{p,q}). We want to compute a kernel that maps from c_in multivector input channels to c_out multivector output channels, i.e.,

    K : R^{p,q} → Hom_Vec(Cl(R^{p,q})^{c_in}, Cl(R^{p,q})^{c_out}).    (39)

K is defined on any v ∈ R^{p,q}, which allows to model point clouds. In this work, however, we sample it on a grid of shape X1, ..., X_{p+q}, analogously to typical CNNs.

A.1. Clifford Embedding

We briefly discuss how one is able to embed scalars and vectors into the Clifford algebra. This extends to other grades such as bivectors.

Let s ∈ R and v ∈ R^{p,q}. Using the natural isomorphisms E^(0) : R → Cl(R^{p,q})^(0) and E^(1) : R^{p,q} → Cl(R^{p,q})^(1), we embed the scalar and vector components into a multivector as

    m := E^(0)(s) + E^(1)(v) ∈ Cl(R^{p,q}).    (40)

This is a standard operation in Clifford algebra computations, where we leave the other components of the multivector zero. We denote such embeddings in the algorithms provided below jointly as "CLEMBED([s, v])".
vector zero. We denote such embeddings in the algorithms
provided below jointly as “CL EMBED([s, v])”.
A.3. Kernel Network
A.2. Scalar Orbital Parameterizations Recall from Section 3 that the kernel K is parameterized by
a kernel network, which is a map
Note that the O(p, q)-steerability constraint
K : Rp,q → Cl(Rp,q )cout ×cin (43)
(g) K(v) ρcClin (g −1 )
! cout
K(gv) = ρCl =: ρHom (g)(K(v)) implemented as an O(p, q)-equivariant CGENN. It consists
∀ v ∈ Rp,q , g ∈ O(p, q) of (linearly weighted) geometric product layers followed by
multivector activations.
couples kernel values within but not across different O(p, q)-
orbits Let {vn }N n=1 be a set of sampling points, where N :=
O(p, q).v := {gv | g ∈ O(p, q)} (41) X1 · . . . · Xp+q . In the remainder, we leave iteration over n
implicit and assume that the operations are performed for
= {w | η(w, w) = η(v, v)} . each n. We obtain a sequence of scalars using the kernel
The first line here is the usual definition of group orbits, shell
while the second line makes use of the Def. 2.5 of pseudo- sn := sσ (vn ) . (44)
orthogonal groups as metric-preserving linear maps.
The input to the kernel network is a batch of multivectors
16
[Link]
up-equivariant-cnns xn := CL EMBED([sn , vn ]) . (45)
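To make Eqs. (40), (42) and (44)-(45) concrete, the following is a minimal NumPy sketch of the scalar orbital shell and the Clifford embedding. It assumes a basis ordering of Cl(R^{p,q}) in which the scalar component sits at index 0 and the grade-1 components at indices 1, . . . , p+q; all names are illustrative and not part of the reference implementation.

    import numpy as np

    def eta(v, p, q):
        # eta^{p,q}(v, v) = v_1^2 + ... + v_p^2 - v_{p+1}^2 - ... - v_{p+q}^2
        sig = np.concatenate([np.ones(p), -np.ones(q)])
        return np.sum(sig * v * v, axis=-1)

    def scalar_shell(v, sigma, p, q):
        # Eq. (42): sgn(eta(v, v)) * exp(-|eta(v, v)| / (2 sigma^2))
        e = eta(v, p, q)
        return np.sign(e) * np.exp(-np.abs(e) / (2.0 * sigma**2))

    def cl_embed(s, v, p, q):
        # Eq. (40): scalar into the grade-0 slot, vector into the grade-1 slots, rest zero
        d = p + q
        m = np.zeros(v.shape[:-1] + (2**d,))
        m[..., 0] = s
        m[..., 1:1 + d] = v
        return m

    # Eqs. (44)-(45): embed every sampling point of a small grid on R^{1,1}
    p, q = 1, 1
    axes = [np.linspace(-1.0, 1.0, 7)] * (p + q)
    v_n = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, p + q)
    x_n = cl_embed(scalar_shell(v_n, sigma=0.5, p=p, q=q), v_n, p, q)   # shape (49, 4)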


I.e., taking s and v together, they form the scalar and vector components of the CGENN's input multivector. We found including the scalar component crucial for the correct scaling of the kernel to the range of the grid.

Let i = 1, . . . , c_in and o = 1, . . . , c_out index the input and output channels. We then have the kernel network output

k_noi := K(v_n)_oi := CGENN(x_n)_oi,   (46)

where k_noi ∈ Cl(R^{p,q}) is the output of the kernel network for the input multivector x_n (embedded from the scalar s_n and vector v_n). Once the output stack of multivectors is computed, we reshape it from shape (N, c_out · c_in) to shape (N, c_out, c_in), resulting in the kernel matrix

k ← RESHAPE(k, (N, c_out, c_in)),   (47)

where now k ∈ Cl(R^{p,q})^{N × c_out × c_in}. Note that k_n ∈ Cl(R^{p,q})^{c_out × c_in} is a matrix of multivectors, as desired.

A.4. Masking

We compute a second set of scalars which will act as a mask for the kernel. This is inspired by steerable CNNs to ensure that the (e.g., radial) orbits of compact groups are fully represented in the kernel, as shown in Figure 8. However, note that for O(p, q)-steerable kernels with both p, q ≠ 0 this is never fully possible since O(p, q) is in general not compact, and all orbits except for the origin extend to infinity. This can e.g. be seen in the hyperbolic-shaped kernels in Figure 4.

For equivariance to hold in practice, whole orbits would need to be present in the kernel, which is not possible if the kernel is sampled on a grid with finite support. This is not specific to our architecture, but is a consequence of the orbits' non-compactness. The same issue arises e.g. in scale-equivariant CNNs (Romero et al., 2024; Worrall & Welling, 2019; Ghosh & Gupta, 2019; Sosnovik et al., 2020; Bekkers, 2020; Zhu et al., 2022; Marcos et al., 2018; Zhang & Williams, 2022). Further experimentation is needed to understand the impact of truncating the kernel on the final performance of the model.

We invoke the kernel shell function again to compute a mask for each k = 0, . . . , p + q, i = 1, . . . , c_in, o = 1, . . . , c_out. That is, we have a weight array σ_kio, initialized identically as earlier, which is reused for each position in the grid:

s_knoi := s_{σ_kio}(v_n).   (48)

We then mask the kernel by scalar multiplication with the shell, i.e.,

k^(k)_noi ← k^(k)_noi · s_knoi.   (49)

Function 1 SCALARSHELL
  input η^{p,q}, v ∈ R^{p,q}, σ
  s ← sgn( η^{p,q}(v, v) ) · exp( −|η^{p,q}(v, v)| / (2σ^2) )
  return s

Function 2 CLIFFORDSTEERABLEKERNEL
  input p, q, Λ, c_in, c_out, (v_n)_{n=1}^N ∈ R^{p,q}, CGENN
  output k ∈ R^{(c_out·2^d) × (c_in·2^d) × X_1 × ··· × X_{p+q}}
  # Weighted Cayley table.
  for i = 1 . . . c_in, o = 1 . . . c_out, a, b, c = 1 . . . 2^d do
    w^c_{oiab} ∼ N(0, 1/√(c_in · N))   # Weight init.
    W^c_{oiab} ← Λ^c_{ab} · w^c_{oiab}
  end for
  σ ∼ U(0.4, 0.6)   # Init if needed.
  # Compute scalars.
  s_n ← SCALARSHELL(η^{p,q}, v_n, σ)
  # Embed s and v into a multivector.
  x_n ← CL EMBED([s_n, v_n])
  # Evaluate kernel network.
  k_noi := CGENN(x_n)
  # Reshape to kernel matrix.
  k ← RESHAPE(k, (N, c_out, c_in))
  # Compute kernel mask.
  for i = 1 . . . c_in, o = 1 . . . c_out, k = 0 . . . p + q do
    σ_kio ∼ U(0.4, 0.6)   # Init if needed.
    s_knoi ← SCALARSHELL(η^{p,q}, v_n, σ_kio)
  end for
  k^(k)_noi ← k^(k)_noi · s_knoi   # Mask kernel.
  # Kernel head.
  k^c_{noib} ← Σ_{a=1}^{2^d} k^a_{noi} · W^c_{oiab}   # Partial weighted geometric product.
  # Reshape to final kernel.
  k ← RESHAPE(k, (c_out·2^d, c_in·2^d, X_1, . . . , X_{p+q}))
  return k

Function 3 CLIFFORDSTEERABLECONVOLUTION
  input F_in, (v_n)_{n=1}^N, ARGS
  output F_out
  F_in ← RESHAPE(F_in, (B, c_in·2^d, Y_1, . . . , Y_{p+q}))
  k ← CLIFFORDSTEERABLEKERNEL((v_n)_{n=1}^N, ARGS)
  F_out ← CONV(F_in, k)
  F_out ← RESHAPE(F_out, (B, c_out, Y_1, . . . , Y_{p+q}, 2^d))
  return F_out
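As a sketch of the masking step used in Function 2 (Eqs. (48)-(49)): assuming the kernel-network output is stored as an array of shape (N, c_out, c_in, 2^d) with a known grade for each of the 2^d basis blades, one independent shell width per grade and channel pair can be applied as follows (NumPy; names and blade ordering are illustrative assumptions).

    import numpy as np
    from itertools import combinations

    def blade_grades(d):
        # grade of each of the 2^d basis blades, ordered by grade: 1, e_i, e_ij, ...
        return np.array([k for k in range(d + 1) for _ in combinations(range(d), k)])

    def mask_kernel(k, v, sigmas, p, q):
        # k:      (N, c_out, c_in, 2^d)  multivector-valued kernel matrix
        # v:      (N, p+q)               sampling points
        # sigmas: (d+1, c_out, c_in)     one width per grade and channel pair, Eq. (48)
        d = p + q
        sig = np.concatenate([np.ones(p), -np.ones(q)])
        e = np.sum(sig * v * v, axis=-1)                       # eta^{p,q}(v, v), shape (N,)
        width = np.moveaxis(sigmas[blade_grades(d)], 0, -1)    # (c_out, c_in, 2^d)
        shell = (np.sign(e)[:, None, None, None]
                 * np.exp(-np.abs(e)[:, None, None, None] / (2.0 * width[None]**2)))
        return k * shell                                       # Eq. (49), applied grade-wise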


A.5. Kernel Head

Finally, the kernel head turns the "multivector matrices" into a kernel that can be used by, for example, [Link] or [Link]. This is done by a partial evaluation of a (weighted) geometric product. Let µ, ν ∈ Cl(R^{p,q}) be two multivectors. Recall that dim Cl(R^{p,q}) = 2^{p+q} = 2^d. We have

(µ • ν)^C = Σ_A Σ_B µ^A · ν^B · Λ^C_{AB},   (50)

where A, B, C ⊆ [d] are multi-indices running over the 2^d basis elements of Cl(R^{p,q}). Here, Λ ∈ R^{2^d × 2^d × 2^d} is the Clifford multiplication table of Cl(R^{p,q}), also sometimes called a Cayley table. It is defined as

Λ^C_{A,B} = { 0                                    if A △ B ≠ C
            { sgn_{A,B} · η̄(e_{A∩B}, e_{A∩B})     if A △ B = C.   (51)

Here, △ denotes the symmetric difference of sets, i.e., A △ B = (A \ B) ∪ (B \ A). Further,

sgn_{A,B} := (−1)^{n_{A,B}},   (52)

where n_{A,B} is the number of adjacent "swaps" one needs to fully sort the tuple (i_1, . . . , i_s, j_1, . . . , j_t), where A = {i_1, . . . , i_s} and B = {j_1, . . . , j_t}. In the following, we identify the multi-indices A, B, and C with a relabeling a, b, and c that run from 1 to 2^d.
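A small Python sketch of Eqs. (51)-(52), building the Cayley table Λ for Cl(R^{p,q}) by brute force; the blade ordering (by grade, lexicographic within each grade) is an assumption made purely for illustration.

    import numpy as np
    from itertools import combinations

    def cayley_table(p, q):
        d = p + q
        eta = [1.0] * p + [-1.0] * q
        blades = [frozenset(c) for k in range(d + 1) for c in combinations(range(d), k)]
        index = {b: i for i, b in enumerate(blades)}
        table = np.zeros((2**d, 2**d, 2**d))                   # Lambda^c_{ab}
        for a, A in enumerate(blades):
            for b, B in enumerate(blades):
                seq = sorted(A) + sorted(B)
                # n_{A,B}: adjacent swaps needed to sort the concatenated index tuple, Eq. (52)
                swaps = sum(seq[i] > seq[j] for i in range(len(seq)) for j in range(i + 1, len(seq)))
                sign = (-1.0) ** swaps
                for i in A & B:                                # repeated indices contract to eta(e_i, e_i)
                    sign *= eta[i]
                table[index[A ^ B], a, b] = sign               # nonzero only where A triangle B = C
        return table

    # e.g. in Cl(R^{2,0}): e1*e2 = e12 and e2*e1 = -e12
    L = cayley_table(2, 0)
    assert L[3, 1, 2] == 1.0 and L[3, 2, 1] == -1.0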
Altogether, Λ defines a multivector-valued bilinear form which represents the geometric product relative to the chosen multivector basis. We can weight its entries with parameters w^c_{oiab} ∈ R, initialized as w^c_{oiab} ∼ N(0, 1/√(c_in · N)). These weightings can be redone for each input channel and output channel, such that we obtain a weighted Cayley table W ∈ R^{2^d × 2^d × 2^d × c_in × c_out} with entries

W^c_{oiab} := Λ^c_{ab} w^c_{oiab}.   (53)

An ablation study in Appendix C.4 demonstrates the great relevance of the weighting parameters empirically.

Given the kernel matrix k, we compute the kernel by partial (weighted) geometric product evaluation, i.e.,

k^c_{noib} ← Σ_{a=1}^{2^d} k^a_{noi} · W^c_{oiab}.   (54)

Finally, we reshape and permute k^c_{noib} from shape (N, c_out, c_in, 2^d, 2^d) to its final shape, i.e.,

k ← RESHAPE(k, (c_out·2^d, c_in·2^d, X_1, . . . , X_{p+q})).

This is the final kernel that can be used in a convolutional layer, and can be interpreted (at each sample coordinate) as an element of HomVec( Cl(R^{p,q})^{c_in}, Cl(R^{p,q})^{c_out} ). The pseudocode for the Clifford-steerable kernel (CLIFFORDSTEERABLEKERNEL) is given in Function 2.
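The partial weighted geometric product of Eqs. (53)-(54) and the final reshape reduce to a single einsum. The sketch below (NumPy) assumes the masked kernel matrix k of shape (N, c_out, c_in, 2^d), weights w of shape (c_out, c_in, 2^d, 2^d, 2^d) with index order w[o, i, c, a, b], and the Cayley table from the sketch above; all names are illustrative.

    import numpy as np

    def kernel_head(k, w, cayley, grid_shape):
        # k: (N, c_out, c_in, 2^d), w: (c_out, c_in, 2^d, 2^d, 2^d), cayley: (2^d, 2^d, 2^d)
        W = cayley[None, None] * w                      # Eq. (53): weighted Cayley table W^c_{oiab}
        # Eq. (54): contract the kernel's blade index a against W, leaving output blade c and input blade b
        out = np.einsum("noia,oicab->nocib", k, W)      # (N, c_out, 2^d, c_in, 2^d)
        N, c_out, D, c_in, _ = out.shape
        out = out.reshape(N, c_out * D, c_in * D)       # group (o, c) and (i, b)
        return np.moveaxis(out, 0, -1).reshape(c_out * D, c_in * D, *grid_shape)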
A.6. Clifford-steerable convolution

As defined in Section 3, Clifford-steerable convolutions can be efficiently implemented with conventional convolutional machinery such as [Link] or [Link] (see Function 3 (CLIFFORDSTEERABLECONVOLUTION) for pseudocode). We now have a kernel k ∈ R^{(c_out·2^d) × (c_in·2^d) × X_1 × ··· × X_{p+q}} that can be used in a convolutional layer. Given batch size B, we reshape the input stack of multivector fields of shape (B, c_in, Y_1, . . . , Y_{p+q}, 2^d) into (B, c_in·2^d, Y_1, . . . , Y_{p+q}). The output array of shape (B, c_out·2^d, Y_1, . . . , Y_{p+q}) is obtained by convolving the input with the kernel; it is then reshaped to (B, c_out, Y_1, . . . , Y_{p+q}, 2^d), which can be interpreted as a stack of multivector fields again.
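A hedged PyTorch sketch of this wrapping for p + q = 2, assuming a kernel array in PyTorch's (out_channels, in_channels, X_1, X_2) convention as produced by the kernel head above, and the standard torch.nn.functional.conv2d; this illustrates only the reshaping logic, not the reference implementation.

    import torch
    import torch.nn.functional as F

    def clifford_steerable_conv2d(f_in, kernel):
        # f_in:   (B, c_in, Y1, Y2, 2^d)          stack of multivector fields
        # kernel: (c_out*2^d, c_in*2^d, X1, X2)   output of the kernel head
        B, c_in, Y1, Y2, D = f_in.shape
        x = f_in.permute(0, 1, 4, 2, 3).reshape(B, c_in * D, Y1, Y2)    # flatten blades into channels
        y = F.conv2d(x, kernel, padding=kernel.shape[-1] // 2)
        c_out = kernel.shape[0] // D
        return y.reshape(B, c_out, D, Y1, Y2).permute(0, 1, 3, 4, 2)    # back to multivector fields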


B. Completeness of kernel spaces

In order to not over-constrain the model, it is essential to parameterize a complete basis of O(p, q)-steerable kernels. Comparing our implicit O(2, 0) = O(2)-steerable kernels with the analytical solution by (Weiler & Cesa, 2019), we find that certain degrees of freedom are missing; see Fig. 8. However, while these degrees of freedom are missing in a single convolution operation, they can be fully recovered by applying two consecutive convolutions. This suggests that the overall expressiveness of CS-CNNs is (at least for O(2)) not diminished. Moreover, two convolutions with kernels K̂ and K can always be expressed as a single convolution with a composed kernel K̂ ∗ K. As visualized in Fig. 7, this composed kernel recovers the full degrees of freedom reported in (Weiler & Cesa, 2019).

Figure 7. [Visualization of the composed kernel K̂ ∗ K; graphic not reproduced in this text version.]

The following two sections discuss the initial differences in kernel parametrizations and how they are resolved by adding a second linear or convolution operation. Unless stated otherwise, we focus here on c_in = c_out = 1 channels to reduce clutter.
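The claim that two successive convolutions are equivalent to a single convolution with the composed kernel is just associativity of convolution; a one-dimensional NumPy check (scalar signals only, purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    f  = rng.normal(size=32)                          # signal
    K  = rng.normal(size=5)                           # first kernel
    Kb = rng.normal(size=5)                           # second kernel

    two_convs = np.convolve(Kb, np.convolve(K, f))    # Kb * (K * f)
    one_conv  = np.convolve(np.convolve(Kb, K), f)    # (Kb * K) * f
    assert np.allclose(two_convs, one_conv)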


[Figure 8: two tables, not reproduced in this text version. The top/middle table ("CS-CNN parametrization") gives the CS-CNN parametrization of O(2)-steerable kernels between scalar (1), vector (e1, e2) and pseudoscalar (e12) field types, with radial profiles R_s(r), R_v(r) shared across entries and scaled by weights w_mn. The bottom table ("complete e2cnn parametrization (Weiler & Cesa, 2019)") gives the analytically complete basis with an independent radial profile R_kn(r) per entry and additional angular frequency-2 solutions for vector-to-vector kernels.]

Figure 8. Comparison of the parametrization of O(2)-steerable kernels in CS-CNNs (top and middle) and e2cnn (bottom). While the e2cnn solutions are proven to be complete, CS-CNN seems to miss certain degrees of freedom: (1) Their radial parts are coupled in the components highlighted in blue and green, while e2cnn allows for independent radial parts. By "coupled" we mean that they are merely scaled relative to each other with weights w^k_{mn} from the weighted geometric product operation in the kernel head H, where m labels the grade K^{(m)} of the kernel network output while n, k label input and output grades of the expanded kernel in HomVec( Cl(R^{p,q}), Cl(R^{p,q}) ); (2) CS-CNN is missing kernels of angular frequency 2 that are admissible for mapping between vector fields; highlighted in red. As explained in Appendix B, these missing degrees of freedom are recovered when composing two convolution layers. A kernel corresponding to the composition of two convolutions in a single one is visualized in Fig. 7.


B.1. Coupled radial dependencies in CS-CNN kernels

The first issue is that the CS-CNN parametrization implies a coupling of radial degrees of freedom. To make this precise, note that the O(2)-steerability constraint

K(gv) = ρ_Cl^{c_out}(g) K(v) ρ_Cl^{c_in}(g^{-1})   ∀ v ∈ R^2, g ∈ O(2)

decouples into independent constraints on individual O(2)-orbits on R^2, which are rings at different radii (and the origin); visualized in Fig. 2 (left). Weiler et al. (2018a) and Weiler & Cesa (2019) therefore parameterize the kernel in (hyper)spherical coordinates. In our case these are polar coordinates of R^2, i.e. a radius r ∈ R_{≥0} and angle ϕ ∈ S^1:

K(r, ϕ) := R(r) κ(ϕ)   (55)

The O(2)-steerability constraint affects only the angular part and leaves the radial part entirely free, such that it can be parameterized in an arbitrary basis or via an MLP.

e2cnn: Weiler & Cesa (2019) solved analytically for complete bases of the angular parts. Specifically, they derive solutions

K^{kn}(r, ϕ) = R^{kn}(r) κ^{kn}(ϕ)   (56)

for any pair of input and output field types (irreps or grades) n and k, respectively. This complete basis of O(2)-steerable kernels is shown in the bottom table of Fig. 8.

CS-CNNs: CS-CNNs parameterize the kernel in terms of a kernel network K : R^{p,q} → Cl(R^{p,q})^{c_out × c_in}, visualized in Fig. 8 (top). Expressed in polar coordinates, assuming c_in = c_out = 1, and considering the independence of K on different orbits due to its O(2)-equivariance, we get the factorization

K(r, ϕ)^{(m)} = R_m(r) κ_m(ϕ),   (57)

where m is the grade of the multivector-valued output. As described in Appendix A.5 (Eq. (53)), the kernel head operation H expands this output by multiplying it with weights W^k_{mn} = Λ^k_{mn} w^k_{mn}, where w^k_{mn} ∈ R are parameters and Λ^k_{mn} ∈ {−1, 0, 1} represents the geometric product relative to the standard basis of R^{p,q}. Note that we do not consider multiple input or output channels here. The final expanded kernel for CS-CNNs is hence given by

K^{kn}(r, ϕ) = Σ_m W^k_{mn} K(r, ϕ)^{(m)}   (58)
             = Σ_m Λ^k_{mn} w^k_{mn} R_m(r) κ_m(ϕ).

These solutions are listed in the top table in Fig. 8, and visualized in the graphics above.17

17 The parameter Λ^k_{mn} appears in the table as selecting to which entry k, n of the table the grade K(r, ϕ)^{(m)} is added (optionally with minus signs).

Comparison: Note that the complete solutions by (Weiler & Cesa, 2019) allow for a different radial part R^{kn} for each pair of input and output type (grade/irrep). In contrast, the CS-CNN parametrization expands coupled radial parts R_m, additionally multiplying them with weights w^k_{mn} (highlighted in the table in blue and green). The CS-CNN parametrization is therefore clearly less general (incomplete).

Solutions: One idea to resolve this shortcoming is to make the weighted geometric product parameters themselves radially dependent,

w^k_{mn} : R_{≥0} → R,   r ↦ w^k_{mn}(r),   (59)

for instance by parameterizing the weights with a neural network. This would fully resolve the under-parametrization, and would preserve equivariance, since O(2)-steerability depends only on the angular variable.

However, doing this is actually not necessary, since the missing flexibility of radial parts can always be resolved by running a convolution followed by a linear layer (or a second convolution) when c_out > 1. The reason for this is that different channels i = 1, . . . , c_out of a kernel network K : R → Cl(R)^{c_out × c_in} do have independent radial parts. Their convolution responses in different channels can be mixed by a subsequent linear layer with grade-dependent weights. By linearity, this is equivalent to immediately mixing the channels' radial parts with grade-dependent weights, resulting in effectively decoupled radial parts.

B.2. Circular harmonics order 2 kernels

A second issue is that the CS-CNN parametrization is missing a basis kernel of angular frequency 2 that maps between vector fields; highlighted in red in the bottom table of Fig. 8. However, it turns out that this degree of freedom is reproduced as the difference of two consecutive convolutions (∗), one mapping vectors to pseudoscalars and back to vectors, and the other one mapping vectors to scalars and back to vectors, as suggested in the (non-commutative!) computation flow diagram below:

[Diagram: a vector field is convolved (∗) into a pseudoscalar field and back into a vector field, and, in a parallel branch, convolved (∗) into a scalar field and back into a vector field.]

As background on the angular frequency 2 kernel, note that O(2)-steerable kernels between irreducible field types of angular frequencies j and l contain angular frequencies |j − l| and j + l; this is a consequence of the Clebsch-Gordan decomposition of O(2)-irrep tensor products (Lang & Weiler, 2021).


We identify multivector grades Cl(R^{2,0})^{(k)} with the following O(2)-irreps:18,19

scalars ∈ Cl(R^{2,0})^{(0)} ↔ trivial irrep (j = 0)
vectors ∈ Cl(R^{2,0})^{(1)} ↔ defining irrep (j = 1)
pseudo-scalars ∈ Cl(R^{2,0})^{(2)} ↔ sign-flip irrep (j = 0)

18 As mentioned earlier, multivector grades may in general not be irreducible, however, for (p, q) = (2, 0) they are.
19 There are two different O(2)-irreps corresponding to j = 0 (trivial and sign-flip); see (Weiler et al., 2023), Section 5.3.4.

Kernels that map vector fields (j = 1) to vector fields (l = 1) should hence contain angular frequencies |j − l| = 0 and j + l = 2. The latter is missing since O(2)-irreps of order 2 are not represented by any grade of Cl(R^{2,0}).

To solve this issue, it seems like one would have to replace the CGENNs underlying the kernel network K with a more general O(2)-equivariant MLP, e.g. (Finzi et al., 2021). However, it can as well be implemented as a succession of two convolution operations. To make this claim plausible, observe first that convolutions are associative, that is, two consecutive convolutions with kernels K and K̂ are equivalent to a single convolution with kernel K̂ ∗ K:

K̂ ∗ (K ∗ f) = (K̂ ∗ K) ∗ f   (60)

Secondly, convolutions are linear, such that

α(K̂ ∗ f) + β(K ∗ f) = (αK̂ + βK) ∗ f   (61)

for any α, β ∈ R.

Using associativity, we can express two consecutive convolutions, first going from vector to scalar fields via

K^{sv}(r, ϕ) = R^{sv}(r) ( −sin(ϕ)  cos(ϕ) )   (62)

then going back from scalars to vectors via

K^{vs}(r, ϕ) = R^{vs}(r) ( −sin(ϕ), cos(ϕ) )^T   (63)

as a single convolution between vector fields, where the combined kernel is given by:

Σ^{vv} := K^{vs} ∗ K^{sv}   (64)

[Kernel visualizations not reproduced in this text version.]

We can similarly define a convolution going from vector to pseudoscalar fields via

K^{pv}(r, ϕ) = R^{pv}(r) ( cos(ϕ)  sin(ϕ) )   (65)

and back to vector fields via

K^{vp}(r, ϕ) = R^{vp}(r) ( cos(ϕ), sin(ϕ) )^T   (66)

as a single convolution with combined kernel:

Π^{vv} := K^{pv} ∗ K^{vp}   (67)

[Kernel visualizations not reproduced in this text version.]

By linearity, we can define yet another convolution between vector fields by taking the difference of these kernels, which results in:

Π^{vv} − Σ^{vv} = [angular frequency-2 kernel, visualization not reproduced]   (68)

Such kernels parameterize exactly the missing O(2)-steerable kernels of angular frequency 2; highlighted in red in the bottom table in Fig. 8. This shows that the missing kernels can be recovered by two convolutions, if required.

The "visual proof" by convolving kernels is clearly only suggestive. To make it precise, it would be required to compute the convolutions of two kernels analytically. This is easily done by identifying circular harmonics with derivatives of Gaussian kernels, a relation that is well known in classical computer vision (Lindeberg, 2009).

C. Experimental details

C.1. Model details

For ResNets, we follow the setup of Wang et al. (2021); Brandstetter et al. (2023); Gupta & Brandstetter (2022): the ResNet baselines consist of 8 residual blocks, each comprising two convolution layers with 7 × 7 (or 7 × 7 × 7 for 3D) kernels, shortcut connections, group normalization (Wu & He, 2018), and GeLU activation functions (Hendrycks & Gimpel, 2016). We use two embedding and two output layers, i.e., the overall architectures could be classified as Res-20 networks. Following (Gupta & Brandstetter, 2022; Brandstetter et al., 2023), we abstain from employing down-projection techniques and instead maintain a consistent spatial resolution throughout the networks. The best models have approx. 7M parameters for Navier-Stokes and 1.5M parameters for Maxwell's equations, in both 2D and 3D.

C.2. Optimization

For each experiment and each model, we tuned the learning rate to find the optimal value. Each model was trained until convergence.


For optimization, we used the Adam optimizer (Kingma & Ba, 2015) together with a cosine learning rate scheduler (Loshchilov & Hutter, 2017) that reduces the initial learning rate by a factor of 0.01; no additional learning rate decay was applied. Training was done on a single node with 4 NVIDIA GeForce RTX 2080 Ti GPUs.
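For concreteness, a minimal PyTorch sketch of this training setup (Adam plus cosine annealing of the learning rate down to 1% of its initial value); the model, learning rate, data, and step count are placeholders, not the values used in the experiments.

    import torch

    model = torch.nn.Conv2d(4, 4, kernel_size=7, padding=3)        # stand-in for any model above
    base_lr, total_steps = 1e-3, 1000                               # illustrative values
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=total_steps, eta_min=0.01 * base_lr)       # anneal to 1% of the initial LR

    for step in range(total_steps):
        x, y = torch.randn(8, 4, 32, 32), torch.randn(8, 4, 32, 32)
        loss = torch.nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()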
C.3. Datasets
Navier-Stokes: We use the Navier-Stokes data from Gupta & Brandstetter (2022), which is based on ΦFlow (Holl et al., 2020). It is simulated on a grid with a spatial resolution of 128 × 128 pixels of size ∆x = ∆y = 0.25m and a temporal resolution of ∆t = 1.5s. For validation and testing, we randomly selected 1024 trajectories from the corresponding partitions.

Maxwell 3D: Simulations of the 3D Maxwell equations are taken from Brandstetter et al. (2023). This data is discretized on a grid with a spatial resolution of 32 × 32 × 32 voxels with ∆x = ∆y = ∆z = 5 · 10^{−7} m and was reported to have a temporal resolution of ∆t = 50s. In the non-relativistically modeled setting Cl(R^{3,0}), E is treated as a vector field and B as a bivector field. Validation and test sets comprise 128 simulations.

Maxwell 2D: We simulate data for Maxwell's equations on spacetime R^{2,1} using PyCharge (Filipovich & Hughes, 2022). Electromagnetic fields are emitted by point sources that move, orbit and oscillate at relativistic speeds. The spacetime grid has a resolution of 128 points in both the spatial and the temporal dimensions. Its spatial extent is 50nm and the temporal extent is 3.77 · 10^{−14} s.

Sampled simulations contain between 2 to 4 oscillating charges and 1 to 2 orbiting charges. The sources have charges sampled uniformly as integer values between −3e and 3e. Their positions are sampled uniformly on the grid, with a predefined minimum initial distance between them. Each charge has a random linear velocity and either oscillates in a random direction or orbits with a random radius. Oscillation and rotation frequencies, as well as velocities, are sampled such that the overall particle velocity does not exceed 0.85c, which is necessary since the PyCharge simulation becomes unstable beyond this limit.

As the field strengths span many orders of magnitude, we normalize the generated fields by dividing bivectors by their Minkowski norm and multiplying them by the logarithm of this norm. This step is non-trivial since Minkowski norms can be zero or negative; however, we found that they are always positive in the generated data. We filter out numerical artifacts by removing outliers with a standard deviation greater than 20. The final dataset comprises 2048 training, 256 validation and 256 test simulations.

[Figure 9: bar plot of test MSE for CS-CNNs with learned versus fixed kernel head weights; plot not reproduced in this text version.]
Figure 9. Performance of CS-CNNs with freely learned weights in the kernel head and with the weights ablated to fixed values w^k_{mn,ij} = 1.

Dataset symmetries: The classical Navier-Stokes equations are Galilean invariant (Wang, 2022). Our CS-CNN for Cl(R^2) is E(2)-equivariant, capturing the subgroup of isometries without boosts. Maxwell's equations are Poincaré invariant. Similar to the case of Navier-Stokes, our model for Cl(R^3) is E(3)-equivariant. The relativistic spacetime model for Cl(R^{1,2}) is fully equivariant w.r.t. the Poincaré group E(1, 2).

The invariance of a system's equations of motion implies an equivariant system dynamics. This statement assumes that the system is transformed as a whole, i.e. together with boundary conditions or background fields. It obviously does not hold when fixed symmetry-breaking boundary conditions or background fields are given. However, implicit kernels may in this case be informed about the symmetry-breaking geometric structure by providing it in the form of additional inputs to the kernel network, as described in (Zhdanov et al., 2023).

C.4. Kernel head weight ablation

As discussed in Def. 3.1 and Appendix A.5, the kernel head is essentially a partially evaluated geometric product operation with additional weighting parameters that are learned during training. To check how relevant this weighting is in practice, we ran an ablation study that fixed all kernel head weights to w^k_{mn,ij} = 1. It turns out that the weighting is quite relevant: our fully weighted CS-CNN achieved a test MSE of 2.53 · 10^{−3} on the Navier-Stokes forecasting task, while the MSE for the fixed-weight CS-CNN increased to 4.30 · 10^{−3}; see Fig. 9. This drastic loss in performance is explained by the fact that these weights allow scaling different kernel channels relative to each other, as visualized in Fig. 8, which is essential to parameterize the complete space of steerable kernels.


D. The Clifford Algebra

For completeness purposes and to complement Section 2.3, in this section we give a short and formal definition of the Clifford algebra. For this, we first need to introduce the tensor algebra of a vector space.

Definition D.1 (The tensor algebra). Let V be a finite dimensional R-vector space of dimension d. Then the tensor algebra of V is defined as follows:

Tens(V) := ⊕_{m=0}^{∞} V^{⊗m}   (69)
         = span{ v_1 ⊗ · · · ⊗ v_m | m ≥ 0, v_i ∈ V },

where we used the following abbreviations for the m-fold tensor product of V for m ≥ 0:

V^{⊗m} := V ⊗ · · · ⊗ V (m times),   V^{⊗0} := R.   (70)

Note that the above definition turns (Tens(V), ⊗) into a (non-commutative, infinite dimensional, unital, associative) algebra over R. In fact, the tensor algebra (Tens(V), ⊗) is, in some sense, the biggest algebra generated by V.

We now have the tools to give a proper definition of the Clifford algebra:

Definition D.2 (The Clifford algebra). Let (V, η) be a finite dimensional inner product space over R of dimension d. The Clifford algebra of (V, η) is then defined as the following quotient algebra:

Cl(V, η) := Tens(V)/I(η),   (71)
I(η) := ⟨ v ⊗ v − η(v, v) · 1_{Tens(V)} | v ∈ V ⟩   (72)
      := span{ x ⊗ (v ⊗ v − η(v, v) · 1_{Tens(V)}) ⊗ y | v ∈ V, x, y ∈ Tens(V) },

where I(η) denotes the two-sided ideal of Tens(V) generated by the relations v ⊗ v ∼ η(v, v) · 1_{Tens(V)} for all v ∈ V.

The product on Cl(V, η) that is induced by the tensor product ⊗ is called the geometric product • and will be denoted as follows:

x_1 • x_2 := [z_1 ⊗ z_2],   (73)

with the equivalence classes x_i = [z_i] ∈ Cl(V, η), i = 1, 2.

Note that, since I(η) is a two-sided ideal, the geometric product is well-defined. The above construction turns (Cl(V, η), •) into a (non-commutative, unital, associative) algebra over R.

In some sense, (Cl(V, η), •) is the biggest (non-commutative, unital, associative) algebra (A, •) over R that is generated by V and satisfies the relations v • v = η(v, v) · 1_A for all v ∈ V.

It turns out that (Cl(V, η), •) is of the finite dimension 2^d and carries a parity grading of algebras and a multivector grading of vector spaces, see (Ruhe et al., 2023b), Appendix D. More properties are also explained in Section 2.3.

From an abstract, theoretical point of view, the most important property of the Clifford algebra is its universal property, which fully characterizes it:

Theorem D.3 (The universal property of the Clifford algebra). Let (V, η) be a finite dimensional inner product space over R of dimension d. For every (non-commutative, unital, associative) algebra (A, ∗) over R and every R-linear map f : V → A such that for all v ∈ V we have:

f(v) ∗ f(v) = η(v, v) · 1_A,   (74)

there exists a unique algebra homomorphism (over R):

f̄ : (Cl(V, η), •) → (A, ∗),   (75)

such that f̄(v) = f(v) for all v ∈ V.

Proof. The map f : V → A uniquely extends to an algebra homomorphism on the tensor algebra:

f^⊗ : Tens(V) → A,   (76)

given by:

f^⊗( Σ_{i∈I} c_i · v_{i,1} ⊗ · · · ⊗ v_{i,l_i} ) := Σ_{i∈I} c_i · f(v_{i,1}) ∗ · · · ∗ f(v_{i,l_i}).   (77)

Because of Equation (74) we have for every v ∈ V:

f^⊗( v ⊗ v − η(v, v) · 1_{Tens(V)} ) = f(v) ∗ f(v) − η(v, v) · 1_A   (78)
                                     = 0,   (79)

and thus:

f^⊗(I(η)) = 0.   (80)

This shows that f^⊗ factors through the thus well-defined induced quotient map of algebras:

f̄ : Cl(V, η) = Tens(V)/I(η) → A,   (81)
f̄([z]) := f^⊗(z).   (82)

This shows the claim.


Remark D.4 (The universal property of the Clifford algebra). The universal property of the Clifford algebra can be stated more explicitly as follows:

If f satisfies Equation (74) and x ∈ Cl(V, η), then we can take any representation of x of the following form:

x = Σ_{i∈I} c_i · v_{i,1} • · · · • v_{i,l_i},   (83)

with any finite index set I, any l_i ∈ N, any coefficients c_0, c_i ∈ R and any vectors v_{i,j} ∈ V, j = 1, . . . , l_i, i ∈ I, and then we can compute f̄(x) by the following formula:

f̄(x) = Σ_{i∈I} c_i · f(v_{i,1}) ∗ · · · ∗ f(v_{i,l_i}),   (84)

and no ambiguity can occur for f̄(x) if one uses a different such representation for x.

Example D.5. The universal property of the Clifford algebra can, for instance, be used to show that the action of the (pseudo-)orthogonal group:

O(V, η) × Cl(V, η) → Cl(V, η),   (85)
(g, x) ↦ ρ_Cl(g)(x),   (86)

given by:

ρ_Cl(g)( Σ_{i∈I} c_i · v_{i,1} • · · · • v_{i,l_i} ) := Σ_{i∈I} c_i · (g v_{i,1}) • · · · • (g v_{i,l_i}),   (87)

is well-defined. For this one only needs to check Equation (74) for v ∈ V:

(gv) • (gv) = η(gv, gv) · 1_{Cl(V,η)}   (88)
            = η(v, v) · 1_{Cl(V,η)},   (89)

where the first equality holds by the fundamental relation of the Clifford algebra and where the last equality holds by definition of O(V, η) ∋ g. So the linear map g : V → Cl(V, η), by the universal property of the Clifford algebra, uniquely extends to the algebra homomorphism:

ρ_Cl(g) : Cl(V, η) → Cl(V, η),   (90)

as defined in Equation (87). One can then check the remaining rules for a group action in a straightforward way.

More details can be found in (Ruhe et al., 2023b), Appendix D and E.

E. Proofs

Proof E.1 for Proposition 3.2 (Equivariance of the kernel head). Recall the definition of the kernel head:

H : Cl(R^{p,q})^{c_out × c_in} → HomVec( Cl(R^{p,q})^{c_in}, Cl(R^{p,q})^{c_out} ),
k ↦ H(k) = ( f ↦ H(k)[f] ),   (91)

which on each output channel i ∈ [c_out] and grade component k = 0, . . . , d was given by:

H(k)[f]_i^{(k)} := Σ_{j∈[c_in], m,n=0,...,d} w^k_{mn,ij} · ( k_{ij}^{(m)} • f_j^{(n)} )^{(k)},

with:

w^k_{mn,ij} ∈ R,
k = [k_{i,j}]_{i∈[c_out], j∈[c_in]} ∈ Cl(R^{p,q})^{c_out × c_in},
f = [f_1, . . . , f_{c_in}] ∈ Cl(R^{p,q})^{c_in}.

Clearly, H(k) is an R-linear map (in f). Now let g ∈ O(p, q). We are left to check the following equivariance formula:

H( ρ_Cl^{c_out × c_in}(g)(k) ) =? ρ_Hom(g)( H(k) )   (92)
                              := ρ_Cl^{c_out}(g) ∘ H(k) ∘ ρ_Cl^{c_in}(g^{−1}).

We abbreviate

s := ρ_Cl^{c_in}(g^{−1})(f) ∈ Cl(R^{p,q})^{c_in},
Q := ρ_Cl^{c_out × c_in}(g)(k) ∈ Cl(R^{p,q})^{c_out × c_in}.

First note that we have for j ∈ [c_in]:

ρ_Cl(g)(s_j) = f_j.   (93)

We then get:

[ ρ_Hom(g)(H(k))[f] ]_i^{(k)}
= [ ρ_Cl^{c_out}(g)( H(k)[ ρ_Cl^{c_in}(g^{−1})(f) ] ) ]_i^{(k)}
= [ ρ_Cl^{c_out}(g)( H(k)[s] ) ]_i^{(k)}
= ρ_Cl(g)( ( H(k)[s] )_i^{(k)} )
= ρ_Cl(g)( Σ_{j∈[c_in], m,n=0,...,d} w^k_{mn,ij} · ( k_{ij}^{(m)} • s_j^{(n)} )^{(k)} )
= Σ_{j∈[c_in], m,n=0,...,d} w^k_{mn,ij} · ( ρ_Cl(g)(k_{ij}^{(m)}) • ρ_Cl(g)(s_j^{(n)}) )^{(k)}
= Σ_{j∈[c_in], m,n=0,...,d} w^k_{mn,ij} · ( Q_{ij}^{(m)} • f_j^{(n)} )^{(k)}
= [ H(Q)[f] ]_i^{(k)}
= [ H( ρ_Cl^{c_out × c_in}(g)(k) )[f] ]_i^{(k)}.

Note that we repeatedly made use of the rules in Definition/Theorem 2.14 and Theorem 2.15, i.e. the linearity, composition, multiplicativity and grade preservation of ρ_Cl(g). As this holds for all i, k and f, we get the desired equation

ρ_Hom(g)(H(k)) = H( ρ_Cl^{c_out × c_in}(g)(k) ),   (94)

which shows the claim.


F. Clifford-steerable CNNs on eral pseudo-Riemannian manifolds with multi-vector fea-


pseudo-Riemannian manifolds ture fields in Appendix F.2, we first recall the general theory
of G-steerable CNNs on G-structured pseudo-Riemannian
In this section we will assume that the reader is already fa- manifolds in total analogy to Weiler et al. (2023) in the next
miliar with the general definitions of differential geometry, section, Appendix F.1.
which can also be found in Weiler et al. (2021; 2023). We
will in this section state the most important results for deep F.1. General G-steerable CNNs on G-structured
neural networks that process feature fields on G-structured pseudo-Riemannian manifolds
pseudo-Riemannian manifolds. These results are direct
generalizations from those in Weiler et al. (2023), where For the convenience of the reader, we will now recall the
they were stated for (G-structured) Riemannian manifolds, most important needed concepts from pseudo-Riemannian
but which verbatim generalize to (G-structured) pseudo- geometry in some more generality, but refer to Weiler et al.
Riemannian manifolds if one replaces O(d) with O(p, q) (2023) for further details and proofs.
everywhere. We will assume that the curved space M will carry a (non-
Recall, that in this geometric setting a signal f on the degenerate, possibly indefinite) metric tensor η of signature
manifold M is typically represented by a feature field (p, q), d = p + q, and will also come with “internal symme-
f : M → A of a certain “type”, like a scalar field, vector tries” encoded by a closed subgroup G ⊆ GL(d).
field, tensor field, multi-vector field, etc. Here f assigns to Definition F.1 (G-structure). Let (M, η) be pseudo-
each point z an n-dimensional feature f (z) ∈ Az ∼ = Rn . Riemannian manifold of signature (p, q), d = p + q, and
Formally, f is a global section of a G-associated vector G ≤ GL(d) a closed subgroup. A G-structure on (M, η)
bundle A with typical fibre Rn , i.e. f ∈ Γ(A), see Weiler is a principle G-subbundle ι : GM ,→ FM of the frame
et al. (2023) for details. We can consider Γ(A) as the vector bundle FM over M . Note that GM is supposed to carry
space of all vector fields of type A. A deep neural network the right G-action induced from FM :
F on M with N layers can then, as before, be considered  
as a composition: X
◁ : GM × G → GM, [ei ]i∈[d] ◁ g :=  ej gj,i  ,
L L L L
F : Γ(A0 ) →1 Γ(A1 ) →2 Γ(A2 ) →3 · · · →
N
Γ(AN ), (95) j∈[d]
i∈[d]
(96)
where L1 , . . . , LN are maps between the vector spaces of
vector fields Γ(Aℓ ), which are typically linear maps or sim- which thus makes the embedding ι a G-equivariant embed-
ple fixed non-linear maps. ding.
For the sake of analysis we can focus on one such linear Definition F.2 (G-structured pseudo-Riemannian manifold).
layer: L : Γ(Ain ) → Γ(Aout ). Let G ≤ GL(d) be closed subgroup. A G-structured pseudo-
Riemannian manifold (M, G, η) of signature (p, q) - per def-
Our goal is to describe the case, where L is an integral inition - consists of a pseudo-Riemannian manifold (M, η)
operator with an convolution kernel20 such that: i.) it is of dimension d = p + q with a metric tensor η of signature
well-defined, i.e. independent of the choice of (allowed) (p, q), and, a fixed choice of a G-structure ι : GM ,→ FM
local coordinate systems (covariance), ii.) we can use the on M .
same kernel K (not just corresponding ones) in any (al-
lowed) local coordinate system (gauge equivariance), iii.) it We will denote the G-structured pseudo-Riemannian mani-
can do weight sharing between different locations, meaning fold with the triple (M, G, η) and keep the fixed G-structure
that the same kernel K will be applied at every location, ι : GM ,→ FM implicit in the notation, as well as the cor-
iv.) input and output transform correspondingly under global responding G-atlas of local tangent bundle trivializations:
transformations (isometry equivariance).  
−1 ∼
The isometry equivariance here is the most important prop- AG = (Ψ , U ) πTM (UA ) −−→ UA × R
A A d
ΨA A∈I
erty. Our main results in this Appendix will be that isometry (97)
equivariance will in fact follow from the first points, see
Theorem F.27 and Theorem F.33. where I is an index set and U A ⊆ M are certain open
Before we introduce our Clifford-steerable CNNs on gen- subsets of M .
20 Remark F.3. Note that for any given G ≤ GL(d) there
Note that a convolution operator L(f )(u) =
R
K(u, v)f (v) dv can be seen as a continuous analogon to might not exists a corresponding G-structure GM on (M, η)
a matrix multiplication. In our theory K will need to depend on in general. Furthermore, even if it existed it might not be
only one argument, corresponding to a circulant matrix. unique. So, when we talk about such a G-structure in the


following we always make the implicit assumption of its Remark F.8 (Isometry action). For a G-associated vector
existence and we also fix a specific choice. bundle A = GM ×ρ Rn and ϕ ∈ Isom(M, G, η) we can de-
Definition F.4 (Isometry group of a G-structured pseu- fine the induced G-associated vector bundle automorphism
do-Riemannian manifold). Let (M, G, η) be a G-structured ϕ∗,A on A as follows:
pseudo-Riemannian manifold. Its (G-structure preserving) ϕ∗,A : A → A, (104)
isometry group is defined to be:
ϕ∗,A (e, v) := (ϕ∗,GM (e), v) . (105)
Isom(M, G, η) With this we can define a left action of the group
 ∼
:= ϕ : M − → M diffeo | ∀z ∈ M, v ∈ Tz M. Isom(M, G, η) on the corresponding space of feature fields
ηϕ(z) (ϕ∗,TM (v), ϕ∗,TM (v)) = ηz (v, v), Γ(A) as follows:
ϕ∗,FM (Gz M ) = Gϕ(z) M . (98) ▷ : Isom(M, G, η) × Γ(A) → Γ(A), (106)
−1
ϕ ▷ f := ϕ∗,A ◦ f ◦ ϕ : M → A. (107)
The intuition here is that the first condition constrains ϕ to
be an isometry w.r.t. the metric η. The second condition To construct a well-behaved convolution operator on M we
constrains ϕ to be a symmetry of the G-structure, i.e. it first need to introduce the idea of a transporter of feature
maps G-frames to G-frames. fields along a curve γ : I → M .
Remark F.5 (Isometry group). Recall that the (usual/full) Remark F.9 (Transporter). A transporter TA on the vector
isometry group of a pseudo-Riemannian manifold (M, η) is bundle A over M takes any (sufficiently smooth) curve
defined as: γ : I → M with I ⊆ R some interval and two points
s, t ∈ I, s ≤ t, and provides an invertible linear map:
Isom(M, η) ∼
 ∼ Ts,t
A,γ : Aγ(s) −
→ Aγ(t) , v 7→ Ts,t
A,γ (v). (108)
:= ϕ : M − → M diffeo | ∀z ∈ M, v ∈ Tz M.
ηϕ(z) (ϕ∗,TM (v), ϕ∗,TM (v)) = ηz (v, v) . (99) TA is thought to transport the vector v ∈ Aγ(s) at location
γ(s) ∈ M along the curve γ to the location γ(t) ∈ M and
Also note that for a G-structured pseudo-Riemannian man- outputs a vector ṽ = Ts,t
A,γ (v) in Aγ(t) .
ifold (M, G, η) of signature (p, q) such that O(p, q) ≤ G For consistency we require that TA satisfies the following
we have: points for such γ:
Isom(M, G, η) = Isom(M, η). (100) ! ∼
1. For s ∈ I we get: Ts,s
A,γ = idAγ(s) : Aγ(s) −
→ Aγ(s) ,
Definition F.6 (G-associated vector bundle). Let (M, G, η) 2. For s ≤ t ≤ u we have:
be a G-structured pseudo-Riemannian manifold and let
ρ : G → GL(n) be a left linear representation of G. A ! ∼
Tt,u s,t s,u
A,γ ◦ TA,γ = TA,γ : Aγ(s) −
→ Aγ(u) . (109)
vector bundle A over M is called a G-associated vector
bundle (with typical fibre (Rn , ρ)) if there exists a vector Furthermore, the dependence on s, t and γ shall be “suffi-
bundle isomorphism over M of the form: ciently smooth” in a certain sense.
∼ We call a transporter TTM on the tangent bundle TM a
→ (GM × Rn ) /∼ρ =: GM ×ρ Rn ,
A− (101)
metric transporter if the map:
where the equivalence relation is given as follows: ∼
Ts,t
TM,γ : (Tγ(s) M, ηγ(s) ) −
→ (Tγ(t) M, ηγ(t) ) (110)
′ ′
(e , v ) ∼ρ (e, v) is always an isometry.
: ⇐⇒ ∃g ∈ G. (e′ , v ′ ) = (e ◁ g, ρ(g −1 )v). (102)
To construct transporters we need to introduce the notion
Definition F.7 (Global sections of a fibre bundle). Let πA : of a connection on a vector bundle, which formalized how
A → M be a fibre bundle over M . We denote the set of vector fields change when moving from one point to the
global sections of A as: next.
Definition F.10 (Connection). A connection on a vector
Γ(A) := {f : M → A | ∀z ∈ M. f (z) ∈ Az } , (103) bundle A over M is an R-linear map:

−1
where Az := πA (z) denotes the fibre of A over z ∈ M . ∇ : Γ(A) → Γ(T∗ M ⊗ A), (111)


such that for all c : M → R and f ∈ Γ(A) we have: 2. torsion-free:

∇(c · f ) = dc ⊗ f + c · ∇(f ), (112) ∇X Y − ∇Y X = [X, Y ], (118)


where dc ∈ Γ(T∗ M ) is the differential of c. where [X, Y ] is the Lie bracket of vector fields.
A special form of a connection are affine connections, which
This affine connection is called the Levi-Cevita connection
live on the tangent space.
of (M, η) and is denoted as ∇LC .
Definition F.11 (Affine connection). An affine connection
Remark F.15 (Levi-Civita transporter). Let (M, G, η) be a
on M (or more precisely, on TM ) is an R-bilinear map:
pseudo-Riemannian manifold with Levi-Cevita connection
∇ : Γ(TM ) × Γ(TM ) → Γ(TM ), (113) ∇LC .
(X, Y ) 7→ ∇X Y, (114)
1. The corresponding Levi-Cevita transporter TTM on
such that for all c : M → R and X, Y ∈ Γ(TM ) we have: TM is always a metric transporter, i.e. it always in-
duces (linear) isometries of vector spaces:
1. ∇c·X Y = c · ∇X Y , ∼
Ts,t
TM,γ : (Tγ(s) M, ηγ(s) ) −
→ (Tγ(t) M, ηγ(t) ).
2. ∇X (c · Y ) = (∂X c) · Y + c · ∇X Y , (119)

where ∂X c denotes the directional derivative of c along X. 2. Furthermore, the Levi-Cevita transporter extends to
Remark F.12. Certainly, an affine connection can also be every G-associated vector bundle A as TA .
re-written in the usual connection form:
3. For every G-associated vector bundle A, every curve
∇ : Γ(TM ) → Γ(T∗ M ⊗ TM ). (115) γ : I → M and ϕ ∈ Isom(M, G, η), the Levi-Cevita
transporter TA,γ always satisfies:
Every connection defines a (parallel) transporter TA .
ϕ∗,A ◦ TA,γ = TA,ϕ◦γ ◦ ϕ∗,A . (120)
Definition/Lemma F.13 (Parallel transporter of a connec-
tion). Let ∇ be a connection on the vector bundle A Definition F.16 (Geodesics). Let M be a manifold with
over M . Then ∇ defines a (parallel) transporter TA for affine connection ∇ and γ : I → M a curve. We call γ a
γ : I = [s, t] → M as follows: geodesic of (M, ∇) if for all t ∈ I we have:

Ts,t
A,γ : Aγ(s) −
→ Aγ(t) , v 7→ f (t), (116) ∇γ̇(t) γ̇(t) = 0, (121)

where f is the unique vector field f ∈ Γ(γ ∗ A) with: i.e. if γ runs parallel to itself.
For pseudo-Riemannian manifolds (M, η) we will typically
1. (γ ∗ ∇)(f ) = 0,
use the Levi-Cevita connection ∇LC to define geodesics.
2. f (s) = v, Definition/Lemma F.17 (Pseudo-Riemannian exponential
map). For a manifold M with affine connection ∇, z ∈ M
which always exists. Here γ ∗ denotes the corresponding and v ∈ Tz M there exists a unique geodesic γz,v : I =
pullback from M to I. (−s, s) → M of (M, ∇) with maximal domain I such that:

For pseudo-Riemannian manifolds there is a “canonical” γz,v (0) = z, γ̇z,v (0) = v. (122)
choice of a metric connection, the Levi-Cevita connection,
which always exists and is uniquely characterized by its two The ∇-exponential map at z ∈ M is then the map:
main properties.
expz : T◦z M → M, expz (v) := γz,v (1), (123)
Definition/Theorem F.14 (Fundamental theorem of pseu-
do-Riemannian geometry: the Levi-Civita connection). Let with domain:
(M, η) be a pseudo-Riemannian manifold. Then there ex-
ists a unique affine connection ∇ on (M, η) such that the T◦z M := {v ∈ Tz M | γz,v (1) is defined} . (124)
following two conditions hold for all X, Y, Z ∈ Γ(TM );
For pseudo-Riemannian manifolds (M, η) we will call the
1. metric preservation: exponential map expz defined via the Levi-Cevita con-
nection ∇LC the pseudo-Riemannian exponential map of
∂Z (η(X, Y )) = η(∇Z X, Y ) + η(X, ∇Z Y ). (117) (M, η) at z ∈ M .

24
Clifford-Steerable Convolutional Neural Networks

Remark F.18. For a pseudo-Riemannian manifold (M, η) (M, G, η) be a G-structured pseudo-Riemannian manifold
the differential d expz |v : Tv Tz M → Texpz (v) M is the of signature (p, q), d = p + q, and Ain and Aout two G-
identity map on Tz M at v = 0 ∈ Tz M : d expz |v=0 =
! associated vector bundles with typical fibre (Win , ρin ) and
idTz M : Tz M = T0 Tz M → Texpz (0) M = Tz M . (Wout , ρout ), resp. A template convolution kernel K for
(M, Ain , Aout ):
Furthermore, there exist an open subset Uz ⊆ Tz M such
that 0 ∈ Uz and expz : Uz → expz (Uz ) ⊆ M is a diffeo- K : Rd → HomVec (Win , Wout ), (130)
morphism and expz (Uz ) ⊆ M is an open subset. will be called G-steerable if for all g ∈ G and v ∈ Rd we
Notation F.19. For a transporter TA for a vector bundle have:
on (M, ∇) we abbreviate for z ∈ M and v ∈ T◦z M : 1
K(gv) = ρout (g) K(v) ρin (g)−1 (131)
∼ | det g|
Tz,v := TA,γz,v
− : Aexpz (v) −
→ Az , (125)
=: ρHom (g)(K(v)). (132)
− −
where γz,v : [0, 1] → M is given by γz,v (t) := expz ((1 −
Remark F.24. Note that the G-steerability of K is ex-
t) · v).
pressed through Equation (131), while the G-gauge equiv-
Definition F.20 (Transporter pullback, see Weiler et al. ariance of K will, more closely, be expressed through the
(2023) Def. 12.2.4). Let (M, η) be a pseudo-Riemannian re-interpretation in Equation (132).
manifold and A a vector bundle over M . Furthermore, Definition F.25 (Convolution operator, see Weiler et al.
let expz denote the pseudo-Riemannian exponential map (2023) Thm. 12.2.9). Let (M, G, η) be a G-structured
(based on the Levi-Civita connection) and TA any trans- pseudo-Riemannian manifold and Ain and Aout two G-
porter on A. We then define the transporter pullback: associated vector bundles over M with typical fibres
Exp∗z : Γ(A) → C(T◦z M, Az ), (126) (Win , ρin ) and (Wout , ρout ) and K a G-steerable template
convolution kernel, see Equation (131). Let fin ∈ Γ(Ain )
Exp∗z (f )(v)

:= Tz,v f (expz (v)) ∈ Az . (127) and consider a local trivialization (ΨC , U C ) ∈ AG around
| {z }
∈Aexpz (v) z ∈ U C ⊆ M (which locally trivializes Ain and Aout ).
Then we have a well-defined convolution operator:
Lemma F.21 (See Weiler et al. (2023) Thm. 13.1.4). For
G-structured pseudo-Riemannian manifold (M, G, η) and L : Γ(Ain ) → Γ(Aout ), fin 7→ L(fin ) := fout , (133)
G-associated vector bundle A, z ∈ M , ϕ ∈ Isom(M, G, η) given by the local formula:
and f ∈ Γ(A) we have: Z
C
K(v C ) [Exp∗z fin ]C (v C ) dv C ,
 
fout (z) := (134)
Exp∗z (ϕ ▷ f ) = ϕ∗,A ◦ [Exp∗ϕ−1 (z) (f )] ◦ ϕ−1
∗,TM , (128) Rd

provided the transporter map TA satisfies Equation (120). where Exp∗z is the transporter pullback from Definition F.20,
where expz denotes the pseudo-Riemannian exponential
Weight sharing for the convolution operator I boils down to map (based on the Levi-Cevita connection ∇LC ) and TAin
the use of a template convolution kernel K, which is then any transporter satisfying Equation (120) (e.g. parallel
applied/re-used at every location z ∈ M . transport based on ∇LC ).
Definition F.22 (Template convolution kernel). Let M be Remark F.26 (Coordinate independence of the convolution
a manifold of dimension d and Ain and Aout two vector operator). The coordinate independence of the convolution
bundles over M with typical fibres Win and Wout , resp. A operator L : Γ(Ain ) → Γ(Aout ) comes from the following
template convolution kernel for (M, Ain , Aout ) is then a covariance relations and Equation (131).
(sufficiently smooth, non-linear) map: If we use a different local trivialization (ΨB , U B ) ∈ AG
K : Rd → HomVec (Win , Wout ), (129) in Equation (134) with z ∈ U B ∩ U C then there exists a
g ∈ G such that:
that is sufficiently decaying when moving away from the ori-
v C = g v B ∈ Rd , (135)
gin 0 ∈ Rd (to make all later constructions, like convolution
C B
operations, etc., well-defined). dv = | det g| · dv , (136)
[Exp∗z fin ]C (v C ) = ρin (g) [Exp∗z fin ]B (v B ) ∈ Win ,
The G-gauge equivariance of a convolution operator I is (137)
encoded by the following G-steerability of the template
C B
convolution kernel. fout (z) = ρout (g)fout (z) ∈ Wout . (138)
Definition F.23 (G-steerability convolution kernel con- So, fout : M → Aout is a well-defined global section in
straints). Let G ≤ GL(d) be a closed subgroup and Γ(Aout ).


We are finally in the place to state the main theorem of this Definition F.29 (Othonormal frame bundle of signature
section, stating that every G-steerable template convolution (p, q).). Let (M, η) be a pseudo-Riemannian manifold of
kernel leads to an isometry equivariant convolution operator. signature (p, q) and dimension d = p + q. Abbreviate for
Theorem F.27 (Isometry equivariance of convolution op- indices i, j ∈ [d]:
erator, see Weiler et al. (2023) Thm. 13.2.6). Let G ≤ 
GL(d) be closed subgroup and (M, G, η) be a G-structured 0
 if i ̸= j,
p,q
pseudo-Riemannian manifold of signature (p, q) with d = δi,j := +1 if i = j ∈ [1, p], (141)
p + q. Let Ain and Aout be two G-associated vector bun- 

−1 if i = j ∈ [p + 1, d].
dles with typical fibres (Win , ρin ) and (Wout , ρout ). Let
K be a G-steerable template convolution kernel, see Equa- Then the orthonormal frame bundle of signature (p, q) is
tion (131). Consider the corresponding convolution op- defined as:
erator L : Γ(Ain ) → Γ(Aout ) given by Equation (134),
where expz denotes the pseudo-Riemannian exponential G
OM := Oz M, (142)
map (based on the Levi-Cevita connection ∇LC ) and TAin
z∈M
any transporter satisfying Equation (120) (e.g. parallel
transport based on ∇LC ). where we put:
Then the convolution operator L : Γ(Ain ) → Γ(Aout ) is n
equivariant w.r.t. the G-structure preserving isometry group Oz M := [e1 , . . . , ed ] ∀j ∈ [d]. ej ∈ Tz M, (143)
Isom(M, G, η): for every ϕ ∈ Isom(M, G, η) and fin ∈ p,q
o
Γ(Ain ) we have: ∀i, j ∈ [d]. ηz (ei , ej ) = δi,j . (144)

L(ϕ ▷ fin ) = ϕ ▷ L(fin ). (139)


Then OM becomes an O(p, q)-structure for (M, η) together
with the standard constructions of local trivialization, bun-
So the main obstruction for constructing a well-behaved dle projection and right group action:
convolution operator L are thus the kernel constraints Equa-
tion (131), which are generally notoriously difficult to ◁ : OM × O(p, q) → OM, (145)
solve, especially for continuous non-compact groups G like  
O(p, q). X
[ei ]i∈[d] ◁ g :=  ej gj,i  . (146)
j∈[d]
F.2. Clifford-steerable CNNs on pseudo-Riemannian i∈[d]
manifolds
Lemma F.30. Let (M, η) be a pseudo-Riemannian manifold
Let (M, η) be a pseudo-Riemannian manifold of signature of signature (p, q) and dimension d = p + q. We have an
(p, q) and dimension d = p + q. algebra bundle isomorphism over M :
Then (M, η) carries a unique O(p, q)-structure OM in-
duced by η. The intuition is that OM consists of all or- Cl(TM, η) ∼
= OM ×ρCl Cl(Rp,q ), (147)
thonormal frames w.r.t. η. In fact, the choice of an O(p, q)-
structure on M is equivalent to the choice of a metric η of where ρCl : O(p, q) → OAlg (Cl(Rp,q ), η̄ p,q ) is the usual
signature (p, q) on M . That said, we will now restrict to the action of the orthogonal group O(p, q) on Cl(Rp,q ) by ro-
structure group G = O(p, q) everywhere in the following. tating all vector components individually. In particular, the
Clifford algebra bundle Cl(TM, η) is an O(p, q)-associated
We will further restrict to multi-vector feature fields Ain := algebra bundle over M with typical fibre Cl(Rp,q ).
Cl(TM, η)cin and Aout := Cl(TM, η)cout , which we first
need to formalize properly. Definition F.31 (Multivector fields). A multivector field
on M is a global section f ∈ Γ(Cl(TM, η)c ) for some
Definition F.28 (Clifford algebra bundle). Let (M, η) be c ∈ N, i.e. a map f : M → Cl(TM, η)c that assigns
a pseudo-Riemannian manifold. Then the Clifford algebra to every point z ∈ M a tuple of multivectors: f (z) =
bundle over M is defined (as a set) as the disjoint union of [f1 (z), . . . , fc (z)] ∈ Cl(Tz M, ηz )c .
the Clifford algebras of the corresponding tangent spaces:
G Remark F.32 (The action of the isometry group on multivec-
Cl(TM, η) := Cl(Tz M, ηz ). (140) tor fields). Let ϕ ∈ Isom(M, η) then ϕ is a diffeomorphic

z∈M map ϕ : M − → M such that for every z ∈ M the differen-
Cl(T M, η) becomes an algebra bundle over M with the tial map is an isometry:
standard constructions of local trivialization and bundle ∼
projections. ϕ∗,TM,z : (Tz M, ηz ) −
→ (Tϕ(z) , ηϕ(z) ). (148)


We can now describe the induced map ϕ∗,Cl(TM,η) via the Furthermore, the corresponding convolution operator L :
general construction on associated vector fields, see Re- Γ(Ain ) → Γ(Aout ), given by Equation (134), is equivariant
mark F.8, with help of the identification Equation (147): w.r.t. the full isometry group Isom(M, η): for every ϕ ∈
Isom(M, η) and fin ∈ Γ(Ain ) we have:
ϕ∗,Cl(TM,η) : OM ×ρCl Cl(Rp,q ) → OM ×ρCl Cl(Rp,q ),
ϕ∗,Cl(TM,η) (e, x) = (ϕ∗,FM (e), x), L(ϕ ▷ fin ) = ϕ ▷ L(fin ). (156)
(149)
Remark F.34. A similar theorem to Theorem F.33 can be
or we can look at the fibres directly, z ∈ M : stated for orientable pseudo-Riemannian manifolds (M, η)
and structure group G = SO(p, q), if one reduces the Clif-
ϕ∗,Cl(TM,η),z : Cl(Tz M, ηz ) → Cl(Tϕ(z) M, ηϕ(z) ),
! ford group equivariant neural network parameterizing the
X kernel network K to be (only) SO(p, q)-equivariant.
ϕ∗,Cl(TM,η),z ci · vi,1 • · · · • vi,ki
i∈I
X
= ci · ϕ∗,TM,z (vi,1 ) • · · · • ϕ∗,TM,z (vi,ki ). (150)
i∈I

With this we can define a left action of the isometry group


Isom(M, η) on the corresponding space of multivector
fields Γ(Cl(TM, η)c ) as follows:
▷ : Isom(M, η) × Γ(Cl(TM, η)c ) → Γ(Cl(TM, η)c ),
(151)
ϕ ▷ f := ϕ∗,Cl(TM,η)c ◦ f ◦ ϕ−1 : M → Cl(TM, η)c .
(152)

We are now in the position to state the main theorem of this


section.
Theorem F.33 (Clifford-steerable CNNs on pseudo-Rie-
mannian manifolds are gauge and isometry equivariant).
Let (M, η) be a pseudo-Riemannian manifold of signature
(p, q) and dimension d = p + q. We consider (M, η) to
be endowed with the structure group G = O(p, q). Con-
sider multi-vector feature fields Ain = Cl(TM, η)cin and
Aout = Cl(TM, η)cout over M .
Let K = H ◦ K be a Clifford-steerable kernel, the same
template convolution kernel as presented in the main paper
in Section 3:
K : Rp,q → HomVec (Cl(Rp,q )cin , Cl(Rp,q )cout ) , (153)
where K : Rp,q → Cl(Rp,q )cout ×cin is the kernel network,
a Clifford group equivariant neural network with (cin · cout )
number of Clifford algebra outputs, and, where H is the
O(p, q)-equivariant kernel head:
H : Cl(Rp,q )cout ×cin (154)
p,q cin p,q cout

→ HomVec Cl(R ) , Cl(R ) .
Then K is automatically O(p, q)-steerable, i.e. for g ∈
O(p, q), v ∈ Rp,q we have21 :
K(gv) = ρcClout (g) K(v) ρcClin (g)−1 . (155)
21
Note that the factor | det g|−1 does not appear here, in contrast
to the general formula in Equation (131), because | det g| = 1
anyways for all g ∈ O(p, q).
