Clifford-Steerable Convolutional Neural Networks

Maksim Zhdanov 1 David Ruhe * 1 2 3 Maurice Weiler * 1 Ana Lucic 4 Johannes Brandstetter 5 6 Patrick Forré 1 2

arXiv:2402.14730v2, 11 Jun 2024

Abstract

We present Clifford-Steerable Convolutional Neural Networks (CS-CNNs), a novel class of E(p, q)-
Definition 2.5 (Pseudo-orthogonal groups). The pseudo-orthogonal group O(p, q) associated to Rp,q is formed by all invertible linear maps that preserve its inner product,

O(p, q) := { g ∈ GL(Rp,q) | g⊤∆p,q g = ∆p,q },   (4)

together with matrix multiplication. O(p, q) is compact for p = 0 or q = 0, and non-compact for mixed signatures.

Example 2.6. For (p, q) = (3, 0), we obtain the usual orthogonal group O(3), i.e. rotations and reflections, while (p, q) = (1, 3) corresponds to the relativistic Lorentz group O(1, 3), which also includes boosts between inertial frames.

Taken together, translations and pseudo-orthogonal transformations of Rp,q form its pseudo-Euclidean group, which is the group of all metric preserving symmetries (isometries).³

Definition 2.7 (Pseudo-Euclidean groups). The pseudo-Euclidean group for Rp,q is defined as the semidirect product

E(p, q) := (Rp,q, +) ⋊ O(p, q)   (5)

with group multiplication defined by (t̃, g̃) · (t, g) = (t̃ + g̃t, g̃g). Its canonical action on Rp,q is given by

E(p, q) × Rp,q → Rp,q,   ((t, g), x) ↦ gx + t.   (6)

Example 2.8. The usual Euclidean group E(3) is reproduced for (p, q) = (3, 0). For Minkowski spacetime, (p, q) = (1, 3), we obtain the Poincaré group E(1, 3).

2.2. Feature Vector Fields & Steerable CNNs

Convolutional neural networks operate on spatial signals, formalized as fields of feature vectors on a base space Rp,q. Transformations of the base space imply corresponding transformations of the feature vector fields defined on them, see Fig. 1 (left column). The specific transformation laws thereby depend on their geometric “field type” (e.g., scalar, vector, or tensor fields). Equivariant CNNs commute with such transformations of feature fields. The theory of steerable CNNs shows that this requires a G-equivariance constraint on convolution kernels (Weiler et al., 2023). We briefly review the definitions and basic results of feature fields and steerable CNNs in Sections 2.2.1 and 2.2.2 below.

For generality, this section considers topologically closed matrix groups G ≤ GL(Rp,q) and affine groups Aff(G) = (Rp,q, +) ⋊ G, and allows for any field type. Section 3 will more specifically focus on pseudo-orthogonal groups G = O(p, q), pseudo-Euclidean groups Aff(O(p, q)) = E(p, q), and multivector fields. For a detailed review of Euclidean steerable CNNs and their generalization to Riemannian manifolds we refer to Weiler et al. (2023).

2.2.1. Feature Vector Fields

Feature vector fields are functions f : Rp,q → W that assign to each point x ∈ Rp,q a feature f(x) in some feature vector space W. They are additionally equipped with an Aff(G)-action determined by a G-representation ρ on W.

The specific choice of (W, ρ) fixes the geometric “type” of feature vectors. For instance, W = R and trivial ρ(g) = 1 corresponds to scalars, while W = Rp,q and ρ(g) = g describes tangent vectors. Higher order tensor spaces and representations give rise to tensor fields. Later on, W = Cl(Rp,q) will be the Clifford algebra and feature vectors will be multivectors with a natural O(p, q)-representation ρCl.

Definition 2.9 (Feature vector field). Consider a pseudo-Euclidean “base space” Rp,q. Fix any G ≤ GL(Rp,q) and consider a G-representation (W, ρ), called “field type”. Let Γ(Rp,q, W) := {f : Rp,q → W} denote the vector space of W-feature fields. Define an Aff(G)-action

▷ρ : Aff(G) × Γ(Rp,q, W) → Γ(Rp,q, W)   (7)

by setting ∀ (t, g) ∈ Aff(G), f ∈ Γ(Rp,q, W), x ∈ Rp,q:

((t, g) ▷ρ f)(x) := ρ(g) f((t, g)⁻¹x) = ρ(g) f(g⁻¹(x − t)).

Since Γ(Rp,q, W) is a vector space and ▷ρ is linear, the tuple (Γ(Rp,q, W), ▷ρ) forms the Aff(G)-representation of feature vector fields of type (W, ρ).⁴

Remark 2.10. Intuitively, (t, g) acts on f by
1. moving feature vectors across the base space, from points g⁻¹(x − t) to new locations x, and
2. G-transforming individual feature vectors f(x) ∈ W themselves by means of the G-representation ρ(g).

Besides the field types mentioned above, equivariant neural networks often rely on irreducible, regular or quotient representations. More choices of field types are discussed and benchmarked in Weiler & Cesa (2019).

2.2.2. Steerable CNNs

Steerable convolutional neural networks are composed of layers that are Aff(G)-equivariant, that is, which commute with affine group actions on feature fields:

Definition 2.11 (Aff(G)-equivariance). Consider any two G-representations (Win, ρin) and (Wout, ρout). Let L : Γ(Rp,q, Win) → Γ(Rp,q, Wout) be a function (“layer”) between the corresponding spaces of feature fields. This layer is said to be Aff(G)-equivariant iff it satisfies

L((t, g) ▷ρin f) = (t, g) ▷ρout L(f)   (8)

³ As the translations contained in E(p, q) move the origin of Rp,q, they do not preserve the vector space structure of Rp,q, but only its structure as affine space.
⁴ (Γ(Rp,q, W), ▷ρ) is called the induced representation Ind_G^{Aff(G)} ρ (Cohen et al., 2019b). From a differential geometry perspective, it can be viewed as the space of bundle sections of a G-associated feature vector bundle; see Defs. F.6, F.7 and (Weiler et al., 2023).
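As a concrete illustration of the Aff(G)-action of Eq. (7), the following minimal NumPy sketch (our own illustration, not code from the paper; all function names are hypothetical) applies (t, g) ▷ρ to a field given as a callable:

```python
import numpy as np

def act_on_field(t, g, rho_g, f):
    """Return the transformed field (t, g) |>_rho f.

    f     : callable R^{p+q} -> W, the feature field (W as a NumPy vector)
    t     : translation vector in R^{p+q}
    g     : matrix of a group element in G <= GL(R^{p,q})
    rho_g : matrix of the representation rho(g) acting on W

    Implements ((t, g) |>_rho f)(x) = rho(g) f(g^{-1}(x - t)) from Eq. (7).
    """
    g_inv = np.linalg.inv(g)
    return lambda x: rho_g @ f(g_inv @ (x - t))

# Example: a tangent-vector field on R^2, for which rho(g) = g.
f = lambda x: np.array([x[1], -x[0]])           # a simple vector field
theta = np.pi / 2
g = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
t = np.array([1.0, 0.0])
f_new = act_on_field(t, g, g, f)                # vector fields: rho(g) = g
```

The returned closure evaluates the moved and G-transformed field lazily at any query point, mirroring how the action is defined pointwise in Definition 2.9.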
for any (t, g) ∈ Aff(G) and any f ∈ Γ(Rp,q, Win). Equivalently, the following diagram should commute:

Γ(Rp,q, Win) ──L──> Γ(Rp,q, Wout)
     │                    │
(t,g) ▷ρin           (t,g) ▷ρout        (9)
     ↓                    ↓
Γ(Rp,q, Win) ──L──> Γ(Rp,q, Wout)

The most basic operations used in neural networks are parameterized linear layers. If one demands translation equivariance, these layers are necessarily convolutions (see Theorem 3.2.1 in (Weiler et al., 2023)). Similarly, linearity and Aff(G)-equivariance require steerable convolutions, that is, convolutions with G-steerable kernels:

Theorem 2.12 (Steerable convolution). Consider a layer L : Γ(Rp,q, Win) → Γ(Rp,q, Wout) mapping between feature fields of types (Win, ρin) and (Wout, ρout), respectively. If L is demanded to be linear and Aff(G)-equivariant, then:

1. L needs to be a convolution integral⁵

   L(fin)(u) = (K ∗ fin)(u) := ∫_{Rp,q} K(v) fin(u − v) dv,

   parameterized by a convolution kernel

   K : Rp,q → HomVec(Win, Wout).   (10)

   The kernel is operator-valued since it aggregates input features in Win linearly into output features in Wout.⁶ ⁷

2. The kernel is required to be G-steerable, that is, it needs to satisfy the G-equivariance constraint⁸

   K(gx) = (1 / |det(g)|) ρout(g) K(x) ρin(g)⁻¹ =: ρHom(g)(K(x))   (11)

   for any g ∈ G and x ∈ Rp,q. This constraint is diagrammatically visualized by the commutativity of:

   Rp,q ──K──> HomVec(Win, Wout)
    │                │
    g·            ρHom(g)              (12)
    ↓                ↓
   Rp,q ──K──> HomVec(Win, Wout)

Proof. See Theorem 4.3.1 in (Weiler et al., 2023).

Remark 2.13 (Discretized kernels). In practice, kernels are often discretized as arrays of shape (X1, ..., Xp+q, Cout, Cin) with Cout = dim(Wout) and Cin = dim(Win). The first p + q axes index a pixel grid on the domain Rp,q, while the last two axes represent the linear operators in the codomain by Cout × Cin matrices.

The main takeaway of this section is that one needs to implement G-steerable kernels in order to implement Aff(G)-equivariant CNNs. This is a notoriously difficult problem, requiring specialized approaches for different categories of groups G and field types (W, ρ). Unfortunately, the usual approaches do not immediately apply to our goal of implementing O(p, q)-steerable kernels for multivector fields. These include the following cases:

Analytical: Most commonly, steerable kernels are parameterized in analytically derived steerable kernel bases.⁹ Solutions are known for SO(3) (Weiler et al., 2018a), O(3) (Geiger et al., 2020) and any G ≤ O(2) (Weiler & Cesa, 2019). Lang & Weiler (2021) and Cesa et al. (2022) generalized this to any compact group G ≤ U(d). However, their solutions still require knowledge of irreducible representations, Clebsch-Gordan coefficients and harmonic basis functions, which need to be derived and implemented for each single group individually. Furthermore, these solutions do not cover pseudo-orthogonal groups O(p, q) of mixed signature, since these are non-compact.

Regular: For regular and quotient representations, steerable kernels can be implemented via channel permutations in the matrix dimensions. This is, for instance, done in regular group convolutions (Cohen & Welling, 2016; Weiler et al., 2018b; Bekkers et al., 2018; Cohen et al., 2019a; Finzi et al., 2020). However, these approaches require finite G or rely on sampling compact G, again ruling out general (non-compact) O(p, q).

Numerical: Cohen & Welling (2017) solved the kernel constraint for finite G numerically. For SO(2), Haan et al. (2021) derived numerical solutions based on Lie-algebra representation theory. The numerical routine by Shutty & Wierzynski (2022) solves for Lie-algebra irreps given their structure constants. Corresponding Lie group irreps follow via the matrix exponential, however, only on connected groups like the subgroups SO⁺(p, q) of O(p, q).

Implicit: Steerable kernels are merely G-equivariant maps between the vector spaces Rp,q and HomVec(Win, Wout). Based on this insight, Zhdanov et al. (2023) parameterize them implicitly via G-equivariant MLPs. However, to …

⁵ dv is the usual Lebesgue measure on Rp+q. For the integral to exist, we assume f to be bounded and have compact support.
⁶ HomVec(Win, Wout), the space of vector space homomorphisms, consists of all linear maps Win → Wout. When putting Win = RCin and Wout = RCout, this space can be identified with the space RCout×Cin of Cout × Cin matrices.
⁷ K : Rp,q → HomVec(Win, Wout) itself need not be linear.
⁸ This is in particular not demanding K(v) to be (equivariant) homomorphisms of G-representations in HomG(Win, Wout), despite (Win, ρin) and (Wout, ρout) being G-representations. Only K itself is G-equivariant as a map Rp,q → HomVec(Win, Wout).
⁹ Unconstrained kernels, Eq. (10), can be linearly combined, and therefore form a vector space. The steerability constraint, Eq. (11), is linear. Steerable kernels hence span a linear subspace and can be parameterized in terms of a basis of steerable kernels.
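The steerability constraint of Eq. (11) can be verified numerically on sample points. The following sketch is our own minimal illustration (the helper `steerability_residual` and the example kernel are hypothetical, not from the paper):

```python
import numpy as np

def steerability_residual(K, g, rho_in_g, rho_out_g, xs):
    """Maximum violation of the G-steerability constraint of Eq. (11),
    K(g x) = rho_out(g) K(x) rho_in(g)^{-1} / |det g|,
    over the sample points xs. Vanishes for a G-steerable kernel."""
    rho_in_inv = np.linalg.inv(rho_in_g)
    scale = 1.0 / abs(np.linalg.det(g))
    return max(
        np.abs(K(g @ x) - scale * rho_out_g @ K(x) @ rho_in_inv).max()
        for x in xs
    )

# Example: K : R^2 -> Hom(R, R^2), K(x) = x as a 2x1 matrix. With trivial
# rho_in = 1 and rho_out(g) = g, this kernel is O(2)-steerable.
K = lambda x: x.reshape(2, 1)
theta = 0.7
g = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
xs = [np.array([1.0, 0.0]), np.array([0.3, -1.2])]
res = steerability_residual(K, g, np.eye(1), g, xs)
```

Such a check only samples the constraint at finitely many points and group elements, so it can falsify but not prove steerability.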
2.3. The Clifford Algebra & Clifford Group Equivariant Neural Networks

This section introduces multivector features, a specific type of geometric feature vectors with O(p, q)-action. Multivectors are the elements of a Clifford algebra Cl(V, η) corresponding to a pseudo-Euclidean R-vector space (V, η). The most relevant properties of Clifford algebras in relation to applications in geometric deep learning are the following:

• Cl(V, η) is, in itself, an R-vector space of dimension 2^d with d := dim(V) = p + q. This allows using multivectors as feature vectors of neural networks (Brandstetter et al., 2023; Ruhe et al., 2023b; Brehmer et al., 2023).

• As an algebra, Cl(V, η) comes with an R-bilinear operation • : Cl(V, η) × Cl(V, η) → Cl(V, η), called the geometric product.¹⁰ We can therefore multiply multivectors with each other, which will be a key aspect in various neural network operations.

• Cl(V, η) is furthermore a representation space of the pseudo-orthogonal group O(V, η) via ρCl, defined in Eq. (20) below. This allows using multivectors as features of O(V, η)-equivariant networks (Ruhe et al., 2023a).

A formal definition of Clifford algebras can be found in Appendix D. Section 2.3.1 offers a less technical introduction, highlighting basic constructions and results. Sections 2.3.2 and 2.3.3 focus on the natural O(p, q)-action on multivectors, and on Clifford group equivariant neural networks. While we will later mostly be interested in (V, η) = Rp,q and O(V, η) = O(p, q), we keep the discussion here general.

2.3.1. Introduction to the Clifford Algebra

Multivectors are constructed by multiplying and summing vectors. Specifically, l vectors v1, ..., vl ∈ V multiply to v1 • ... • vl ∈ Cl(V, η). A general multivector arises as a linear combination of such products,

x = Σ_{i∈I} ci · vi,1 • ... • vi,ki   (13)

with some finite index set I and vi,k ∈ V and ci ∈ R.

The main algebraic property of the Clifford algebra is that it relates the geometric product of vectors v ∈ V to the inner product η on V by requiring:

v • v = η(v, v) · 1_{Cl(V,η)}   ∀ v ∈ V ⊂ Cl(V, η).   (14)

Intuitively, this means that the product of a vector with itself collapses to a scalar value η(v, v) ∈ R ⊆ Cl(V, η), from which all other properties of the algebra follow by bilinearity. This leads in particular to the fundamental relation¹¹

v2 • v1 = −v1 • v2 + 2η(v1, v2) · 1_{Cl(V,η)}   ∀ v1, v2 ∈ V.

For the standard orthonormal basis [e1, ..., ep+q] of Rp,q this reduces to the following simple rules:

ei • ej = −ej • ei                  for i ≠ j        (15a)
ei • ej = η(ei, ei) = +1            for i = j ≤ p    (15b)
ei • ej = η(ei, ei) = −1            for i = j > p    (15c)

An (orthonormal) basis of Cl(V, η) is constructed by repeatedly taking geometric products of basis vectors ei ∈ V. Note that, up to sign flip, (1) the ordering of elements in any product is irrelevant due to Eq. (15a), and (2) any elements occurring twice cancel out due to Eqs. (15b, 15c).

The basis elements constructed this way can be identified with (and labeled by) subsets A ⊆ [d] := {1, ..., d}, where the presence or absence of an index i ∈ A signifies whether the corresponding ei appears in the product. Agreeing furthermore on an ordering to disambiguate signs, we define

eA := ei1 • ei2 • ... • eik   for A = {i1 < ... < ik} ≠ ∅

and e∅ := 1_{Cl(V,η)}. From this, it is clear that dim Cl(V, η) = 2^d. Table 1 gives a specific example for (V, η) = R1,2.

Any multivector x ∈ Cl(V, η) can be uniquely expanded in this basis,

x = Σ_{A⊆[d]} xA · eA,   (16)

where xA ∈ R are coefficients.

¹⁰ The geometric product is unital, associative, non-commutative, and O(V, η)-equivariant. Its main defining property is highlighted in Eq. (14). A proper definition is given in Definition D.2, Eq. (73).
¹¹ To see this, use v := v1 + v2 in Eq. (14) and expand.
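The sign rules (15a-15c) directly yield an algorithm for multiplying basis blades: sort the concatenated index sequence while flipping the sign per transposition, and contract repeated generators to their metric sign. A minimal sketch under our own conventions (blades as sorted index tuples; not the paper's implementation):

```python
def blade_product(A, B, signature):
    """Geometric product e_A * e_B of two basis blades of Cl(p, q).

    A, B      : sorted tuples of 1-based indices (subsets of [d])
    signature : list of +1/-1 metric signs eta(e_i, e_i), p ones
                followed by q minus-ones
    Returns (sign, C) such that e_A * e_B = sign * e_C, using
    Eqs. (15a-15c): swapping distinct generators flips the sign,
    and e_i * e_i collapses to eta(e_i, e_i).
    """
    coeffs = list(A) + list(B)
    sign = 1
    i = 0
    while i < len(coeffs) - 1:
        if coeffs[i] > coeffs[i + 1]:
            # transposition of distinct generators, Eq. (15a)
            coeffs[i], coeffs[i + 1] = coeffs[i + 1], coeffs[i]
            sign = -sign
            i = max(i - 1, 0)
        elif coeffs[i] == coeffs[i + 1]:
            # repeated generator squares to its metric sign, Eqs. (15b, 15c)
            sign *= signature[coeffs[i] - 1]
            del coeffs[i:i + 2]
            i = max(i - 1, 0)
        else:
            i += 1
    return sign, tuple(coeffs)

# Cl(1, 2), as in Table 1: eta = diag(+1, -1, -1)
sig = [1, -1, -1]
```

For example, e12 • e1 = −e1e1e2 = −e2 in Cl(1, 2), which the routine reproduces as `(-1, (2,))`.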
Note that there are (d choose k) basis elements eA of “grade” |A| = k. They span the subspace Cl(k)(V, η) of k-vectors, and the corresponding grade projection is denoted ( · )(k) : Cl(V, η) → Cl(k)(V, η).

Finally, the inner product η on V is naturally extended to Cl(V, η) by defining η̄ : Cl(V, η) × Cl(V, η) → R as

η̄(x, y) := Σ_{A⊆[d]} ηA · xA · yA,   (18)

where ηA := Π_{i∈A} η(ei, ei) ∈ {±1} are sign factors. The tuple (eA)_{A⊆[d]} is an orthonormal basis of Cl(V, η) w.r.t. η̄.

All of these constructions and statements are more formally defined and proven in the appendix of (Ruhe et al., 2023b).

2.3.2. Clifford Grades as O(p,q)-Representations

The individual grades Cl(k)(V, η) turn out to be representation spaces of the (abstract) pseudo-orthogonal group

O(V, η) := { g ∈ GL(V) | ∀v ∈ V : η(gv, gv) = η(v, v) },   (19)

which coincides for (V, η) = Rp,q with O(p, q) in Def. 2.5. O(V, η) acts thereby on multivectors by individually multiplying each 1-vector from which they are constructed with g.

Definition/Theorem 2.14 (O(V, η)-action on Cl(V, η)). Let (V, η) be a pseudo-Euclidean space, g, gi ∈ O(V, η), ci ∈ R, vi,j ∈ V, x, xi ∈ Cl(V, η), and I a finite index set. Define the orthogonal algebra representation

ρCl : O(V, η) → OAlg(Cl(V, η), η̄)¹²   (20)

of O(V, η) via the canonical O(V, η)-action on each of the contained 1-vectors:

ρCl(g)(Σ_{i∈I} ci · vi1 • ... • viji) := Σ_{i∈I} ci · (gvi1) • ... • (gviji).   (21)

ρCl is well-defined as an orthogonal representation:

linear:     ρCl(g)(c1 · x1 + c2 · x2) = c1 · ρCl(g)(x1) + c2 · ρCl(g)(x2),
composing:  ρCl(g2)(ρCl(g1)(x)) = ρCl(g2 g1)(x),
invertible: ρCl(g)⁻¹(x) = ρCl(g⁻¹)(x).

This representation ρCl reduces furthermore to independent sub-representations on individual k-vectors.

Theorem 2.15 (O(V, η)-action on grades Cl(k)(V, η)). Let g ∈ O(V, η), x ∈ Cl(V, η) and k ∈ {0, ..., d} a grade. The grade projection ( · )(k) is O(V, η)-equivariant:

(ρCl(g) x)(k) = ρCl(g)(x(k))   (24)

Cl(V, η) ──( · )(k)──> Cl(k)(V, η)
   │                       │
 ρCl(g)                 ρCl(g)          (25)
   ↓                       ↓
Cl(V, η) ──( · )(k)──> Cl(k)(V, η)

This implies in particular that Cl(V, η) is reducible to sub-representations Cl(k)(V, η), i.e. ρCl(g) does not mix grades.

Proof. Both theorems are proven in (Ruhe et al., 2023a).

2.3.3. O(p,q)-Equivariant Clifford Neural Nets

Based on those properties, Ruhe et al. (2023a) proposed Clifford group equivariant neural networks (CGENNs). Due to a group isomorphism, this is equivalent to the networks’ O(V, η)-equivariance.

Definition/Theorem 2.16 (Clifford Group Equivariant NN). Consider a grade k = 0, ..., d and weights w^k_{mn} ∈ R. A Clifford group equivariant neural network (CGENN) is constructed from the following functions, operating on one or more multivectors xi ∈ Cl(V, η).

Linear layers: mix k-vectors. For each 1 ≤ m ≤ cout:

L(k)_m(x1, ..., xcin) := Σ_{n=1}^{cin} w^k_{mn} · xn(k)   (26)

Such weighted linear mixing within sub-representations Cl(k)(V, η) is common in equivariant MLPs.

Geometric product layers: compute weighted geometric products with grade-dependent weights:

P(k)(x1, x2) := Σ_{m=0}^{d} Σ_{n=0}^{d} w^k_{mn} · (x1(m) • x2(n))(k)   (27)

¹² OAlg(Cl(V, η), η̄) is the group of all linear orthogonal transformations of Cl(V, η) that are also multiplicative w.r.t. •.
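The grade-wise linear layer of Eq. (26) can be sketched for the toy algebra Cl(R2,0), whose four basis blades have grades 0, 1, 1, 2. This is our own minimal illustration under an assumed basis ordering (e∅, e1, e2, e12), not the paper's implementation:

```python
import numpy as np

# Multivectors of Cl(R^{2,0}) as coefficient vectors over the basis
# (e_{}, e_1, e_2, e_{12}); grades of the four basis blades:
GRADES = np.array([0, 1, 1, 2])

def grade_project(x, k):
    """Grade projection ( . )^(k): zero all coefficients of grade != k."""
    return np.where(GRADES == k, x, 0.0)

def linear_layer(xs, w):
    """CGENN linear layer, Eq. (26): for each output channel m and grade
    k, mix the k-vector parts of the c_in input multivectors with weights
    w[k, m, n].  xs: (c_in, 4) array, w: (3, c_out, c_in) array."""
    c_out = w.shape[1]
    out = np.zeros((c_out, xs.shape[1]))
    for k in range(3):                   # grades 0, 1, 2 of Cl(R^{2,0})
        parts = np.stack([grade_project(x, k) for x in xs])   # (c_in, 4)
        out += np.einsum("mn,nd->md", w[k], parts)
    return out
```

Because each grade is mixed with its own weight matrix, the layer commutes with ρCl by Theorem 2.15: it only rescales and recombines whole sub-representations Cl(k)(V, η).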
This is similar to the irrep-feature tensor products in MACE (Batatia et al., 2022).

Nonlinearity: As activations, we use A(x) := x · Φ(x(0)), where Φ is the CDF of the Gaussian distribution. This is inspired by GatedGELU from Brehmer et al. (2023).

All of these operations are, by Theorems 2.14 and 2.15, O(V, η)-equivariant.

3. Clifford-Steerable CNNs

This section presents Clifford-Steerable Convolutional Neural Networks (CS-CNNs), which operate on multivector fields on Rp,q, and are equivariant to the isometry group E(p, q) of Rp,q. To achieve E(p, q)-equivariance, we need to find a way to implement O(p, q)-steerable kernels (Section 2.2), which we do by leveraging the connection between Cl(Rp,q) and O(p, q) presented in Section 2.3.

CS-CNNs process (multi-channel) multivector fields

f : Rp,q → Cl(Rp,q)^c   (28)

of type (W, ρ) = (Cl(Rp,q)^c, ρCl^c) with c ≥ 1 channels. The representation

ρCl^c := ⊕_{i=1}^{c} ρCl : O(p, q) → GL(Cl(Rp,q)^c)   (29)

is given by the action ρCl from Definition/Theorem 2.14, however, applied to each of the c components individually.

Following Theorem 2.12, our main goal is the construction of a convolution operator L : Γ(Rp,q, Cl(Rp,q)^cin) → Γ(Rp,q, Cl(Rp,q)^cout),

L(fin)(u) := ∫_{Rp,q} K(v) fin(u − v) dv,   (30)

parameterized by a convolution kernel

K : Rp,q → HomVec(Cl(Rp,q)^cin, Cl(Rp,q)^cout)   (31)

that satisfies the following O(p, q)-steerability (equivariance) constraint for every g ∈ O(p, q) and v ∈ Rp,q:¹³

K(gv) = ρCl^cout(g) K(v) ρCl^cin(g⁻¹) =: ρHom(g)(K(v)).   (32)

As mentioned in Section 2.2.2, constructing such O(p, q)-steerable kernels is typically difficult. To overcome this challenge, we follow Zhdanov et al. (2023) and implement the kernels implicitly. Specifically, they are based on O(p, q)-equivariant “kernel networks”¹⁴

𝒦 : Rp,q → Cl(Rp,q)^{cout×cin},   (33)

implemented as CGENNs (Section 2.3.3).

Unfortunately, the codomain of 𝒦 is Cl(Rp,q)^{cout×cin} instead of HomVec(Cl(Rp,q)^cin, Cl(Rp,q)^cout), as required by steerable kernels, Eq. (31). To bridge the gap between these spaces, we introduce an O(p,q)-equivariant linear layer, called kernel head H. Its purpose is to transform the kernel network’s output k := 𝒦(v) ∈ Cl(Rp,q)^{cout×cin} into the desired R-linear map between multivector channels, H(k) ∈ HomVec(Cl(Rp,q)^cin, Cl(Rp,q)^cout). The relation between kernel network 𝒦, kernel head H, and the resulting steerable kernel K := H ∘ 𝒦 is visualized in Figs. 3 and 4.

To achieve O(p,q)-equivariance (steerability) of K = H ∘ 𝒦, we have to make the kernel head H of a specific form:

Definition 3.1 (Kernel head). A kernel head is a map

H : Cl(Rp,q)^{cout×cin} → HomVec(Cl(Rp,q)^cin, Cl(Rp,q)^cout),   k ↦ H(k),   (34)

where the R-linear operator

H(k) : Cl(Rp,q)^cin → Cl(Rp,q)^cout,   f ↦ H(k)[f],

is defined on each output channel i ∈ [cout] and grade …

¹³ The volume factor |det g| = 1 drops out for g ∈ O(p, q).
¹⁴ The kernel network’s output Cl(Rp,q)^{cout·cin} is here reshaped to matrix form Cl(Rp,q)^{cout×cin}.
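One simple instance of an equivariant kernel head is left geometric multiplication by the kernel network's output: since ρCl is an algebra homomorphism, mapping a multivector k to the linear map x ↦ k • x satisfies H(ρCl(g)k) = ρCl(g) H(k) ρCl(g)⁻¹. The paper's actual head (Definition 3.1, whose full form continues beyond this excerpt) additionally mixes grades with learned weights; the following is purely our own simplified toy in Cl(R0,1), where the geometric product is complex-style multiplication because e1 • e1 = −1:

```python
import numpy as np

# Toy instance in Cl(R^{0,1}): a multivector is a pair
# (scalar, e_1-coefficient), and e_1 * e_1 = -1.

def left_mult_matrix(k):
    """2x2 matrix of the linear map x -> k * x (geometric product)."""
    a, b = k
    return np.array([[a, -b],
                     [b,  a]])

def kernel_head(k_mv):
    """Assemble H(k): Cl^{c_in} -> Cl^{c_out} from a (c_out, c_in, 2)
    array of kernel-network output multivectors, as a (2 c_out, 2 c_in)
    block matrix whose (i, j) block is left multiplication by
    k_mv[i, j]. Because rho_Cl(g) is multiplicative w.r.t. the
    geometric product, this H intertwines the group actions as a
    kernel head must."""
    c_out, c_in, _ = k_mv.shape
    return np.block([[left_mult_matrix(k_mv[i, j]) for j in range(c_in)]
                     for i in range(c_out)])

# Single channel pair: k = 1 + e_1 applied to f = e_1 gives e_1 - 1.
H = kernel_head(np.array([[[1.0, 1.0]]]))
```

This toy keeps the shape bookkeeping of the construction: multivector-valued network outputs become C_out × C_in blocks of ordinary matrices, exactly the discretized kernel format of Remark 2.13.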
Rp,q ──𝒦──> Cl(Rp,q)^{cout×cin} ──H──> HomVec(Cl(Rp,q)^cin, Cl(Rp,q)^cout)
 │                  │                              │
 g·          ρCl^{cout×cin}(g)                  ρHom(g)
 ↓                  ↓                              ↓
Rp,q ──𝒦──> Cl(Rp,q)^{cout×cin} ──H──> HomVec(Cl(Rp,q)^cin, Cl(Rp,q)^cout)

Figure 4. Construction and O(p,q)-equivariance of implicit steerable kernels K = H ∘ 𝒦, which are composed from a kernel network 𝒦 with cout×cin multivector outputs and a kernel head H. The whole diagram commutes. The two inner squares show the individual equivariance of 𝒦 and H, from which the kernel’s overall equivariance follows.
[Figure 5; compared models: Clifford-steerable ResNet (Ours), Basic ResNet, Steerable ResNet, Clifford ResNet, FNO, G-FNO]

Figure 5. Plots 1 & 2: Mean squared errors (MSEs) on the Navier-Stokes R2 and Maxwell R3 forecasting tasks (one-step loss) as a function of the number of training simulations. Plot 3: MSE test loss convergence of our model vs. a basic ResNet on the relativistic Maxwell R1,2 task. The ResNet does not match the performance of CS-CNNs even for vastly larger training datasets. Plot 4: Relative O(2)-equivariance errors of models trained on Navier-Stokes R2. G-FNOs fail as they cannot correctly ingest multivector data.
across different inertial frames of reference. This is relevant as the simulated electromagnetic fields are induced by particles moving at relativistic velocities. We see in Plot 3 that CS-CNNs converge significantly faster and are more sample efficient than basic ResNets.

Fig. 6 visualizes predictions of CS-ResNets and basic ResNets on Navier-Stokes R2 and Maxwell R1,2. Our model captures fine details much more accurately, despite being trained on less data.

Equivariance error: To assess the models’ E(2)-equivariance, we measure the relative error |f(g.x) − g.f(x)| / |f(g.x) + g.f(x)| between (1) the output computed from a transformed input; and (2) the transformed output, given the original input. As shown in Fig. 5 (right), both steerable models are equivariant up to numerical artefacts. Despite training, the other models did not become equivariant at all. This holds in particular for G-FNO, which covers only a subgroup of discrete rotations.

5. Conclusions

We presented Clifford-Steerable CNNs, a new theoretical framework for E(p,q)-equivariant convolutions on pseudo-Euclidean spaces such as Minkowski spacetime. CS-CNNs process fields of multivectors, geometric features which naturally occur in many areas of physics. The required O(p,q)-steerable convolution kernels are implemented implicitly via Clifford group equivariant neural networks. This makes so far unknown analytic solutions for the steerability constraint unnecessary. CS-CNNs significantly outperform baselines on a variety of physical dynamics tasks.

From the viewpoint of general steerable CNNs, there are some limitations:

• There exist more general field types (O(p,q)-representations) than multivectors, for which CS-CNNs do not provide steerable kernels. For connected Lie groups, e.g. the subgroups SO⁺(p,q), these types can in principle be computed numerically (Shutty & Wierzynski, 2022).

• CGENNs and CS-CNNs rely on equivariant operations that treat multivector grades Cl(k)(V, η) as “atomic” features. However, it is not clear whether grades are always irreducible representations, that is, there might be further equivariant degrees of freedom which would treat irreducible sub-representations independently.

• We observed that the steerable kernel spaces of CS-CNNs are not necessarily complete, i.e., certain degrees of freedom might be missing. However, we show in Apx. B how they are recovered by composing multiple convolutions.

• O(p, q) and their group orbits on Rp,q are for p, q ≠ 0 non-compact; for instance, the hyperbolas in spacetimes R1,q extend to infinity. In practice, we sample convolution kernels on a finite-sized grid as shown in Fig. 3. This introduces a cutoff, breaking equivariance for large transformations. Note that this is an issue not specific to CS-CNNs, but it applies e.g. to scale-equivariant CNNs as well (Bekkers, 2020; Romero et al., 2024).

Despite these limitations, CS-CNNs excel in our experiments. A major advantage of CGENNs and CS-CNNs is that they allow for a simple, unified implementation for arbitrary signatures (p, q). This is remarkable, since steerable kernels usually need to be derived for each symmetry group individually. Furthermore, our implementation applies both to multivector fields sampled on pixel grids and point clouds.

CS-CNNs are, to the best of our knowledge, the first convolutional networks that respect the full symmetries E(p,q) of Minkowski spacetime or any other pseudo-Euclidean space. Even more generally, CS-CNNs are readily extended to arbitrary curved pseudo-Riemannian manifolds, and such convolutions will necessarily rely on O(p,q)-steerable kernels. For more details see Appendix F and (Weiler et al., 2023). They could furthermore be adapted to steerable PDOs (partial differential operators) (Jenner & Weiler, 2022), which would connect them to the multivector calculus used in mathematical physics (Hestenes, 1968; Hitzer, 2002; Lasenby et al., 1993).
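The relative equivariance error used above translates directly into code. The following is our own minimal sketch; the toy model and 90° grid rotation are hypothetical stand-ins for the trained networks and group actions evaluated in the paper:

```python
import numpy as np

def relative_equivariance_error(model, act_in, act_out, x):
    """Relative equivariance error |f(g.x) - g.f(x)| / |f(g.x) + g.f(x)|
    (norms over all entries), comparing the output on a transformed input
    with the transformed output on the original input."""
    a = model(act_in(x))      # output computed from the transformed input
    b = act_out(model(x))     # transformed output on the original input
    return np.linalg.norm(a - b) / np.linalg.norm(a + b)

# Sanity check on a toy "model" and 90-degree grid rotations of a scalar
# field: pointwise squaring commutes with rotation, so the error vanishes.
rot = lambda x: np.rot90(x)
model = lambda x: x ** 2
x = np.random.default_rng(0).normal(size=(8, 8))
err = relative_equivariance_error(model, rot, rot, x)
```

Input and output actions are passed separately since, for steerable networks, input and output fields generally transform under different representations.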
References

Holl, P., Thuerey, N., and Koltun, V. Learning to Control PDEs with Differentiable Physics. In International Conference on Learning Representations (ICLR), 2020.

Jenner, E. and Weiler, M. Steerable Partial Differential Operators for Equivariant Neural Networks. In International Conference on Learning Representations (ICLR), 2022.

Kingma, D. P. and Ba, J. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations (ICLR), 2015.

Lang, L. and Weiler, M. A Wigner-Eckart Theorem for Group Equivariant Convolution Kernels. In International Conference on Learning Representations (ICLR), 2021.

Lasenby, A., Doran, C., and Gull, S. A multivector derivative approach to Lagrangian field theory. Foundations of Physics, 23(10):1295–1327, 1993.

Li, Z., Kovachki, N. B., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A. M., and Anandkumar, A. Fourier Neural Operator for Parametric Partial Differential Equations. In International Conference on Learning Representations (ICLR), 2021.

Lindeberg, T. Scale-space. 2009.

Loshchilov, I. and Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. In International Conference on Learning Representations (ICLR), 2017.

Marcos, D., Kellenberger, B., Lobry, S., and Tuia, D. Scale equivariance in CNNs with vector fields. arXiv preprint arXiv:1807.11783, 2018.

Orbán, X. P. and Mira, J. Dimensional scaffolding of electromagnetism using geometric algebra. European Journal of Physics, 42(1):015204, 2021.

Romero, D. W., Bekkers, E., Tomczak, J. M., and Hoogendoorn, M. Wavelet Networks: Scale-Translation Equivariant Learning From Raw Time-Series. Transactions on Machine Learning Research, 2024.

Ruhe, D., Brandstetter, J., and Forré, P. Clifford Group Equivariant Neural Networks. In Conference on Neural Information Processing Systems (NeurIPS), 2023a.

Ruhe, D., Gupta, J. K., Keninck, S. D., Welling, M., and Brandstetter, J. Geometric Clifford Algebra Networks. In International Conference on Machine Learning (ICML), pp. 29306–29337, 2023b.

Shutty, N. and Wierzynski, C. Computing Representations for Lie Algebraic Networks. NeurIPS 2022 Workshop on Symmetry and Geometry in Neural Representations, 2022.

Sosnovik, I., Szmaja, M., and Smeulders, A. W. M. Scale-Equivariant Steerable Networks. In International Conference on Learning Representations (ICLR), 2020.

Wang, R., Walters, R., and Yu, R. Incorporating Symmetry into Deep Dynamics Models for Improved Generalization. In International Conference on Learning Representations (ICLR), 2021.

Wang, S. Extensions to the Navier–Stokes equations. Physics of Fluids, 34(5), 2022.

Weiler, M. and Cesa, G. General E(2)-Equivariant Steerable CNNs. In Conference on Neural Information Processing Systems (NeurIPS), pp. 14334–14345, 2019.

Weiler, M., Geiger, M., Welling, M., Boomsma, W., and Cohen, T. 3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data. In Conference on Neural Information Processing Systems (NeurIPS), pp. 10402–10413, 2018a.

Weiler, M., Hamprecht, F. A., and Storath, M. Learning Steerable Filters for Rotation Equivariant CNNs. In Computer Vision and Pattern Recognition (CVPR), 2018b.

Weiler, M., Forré, P., Verlinde, E., and Welling, M. Coordinate Independent Convolutional Networks – Isometry and Gauge Equivariant Convolutions on Riemannian Manifolds. arXiv preprint arXiv:2106.06020, 2021.

Weiler, M., Forré, P., Verlinde, E., and Welling, M. Equivariant and Coordinate Independent Convolutional Networks. 2023. URL [Link].

Worrall, D. E. and Welling, M. Deep Scale-Spaces: Equivariance Over Scale. In Conference on Neural Information Processing Systems (NeurIPS), pp. 7364–7376, 2019.

Wu, Y. and He, K. Group Normalization. In European Conference on Computer Vision (ECCV), pp. 3–19, 2018.

Zhang, X. and Williams, L. R. Similarity equivariant linear transformation of joint orientation-scale space representations. arXiv preprint arXiv:2203.06786, 2022.

Zhdanov, M., Hoffmann, N., and Cesa, G. Implicit Convolutional Kernels for Steerable CNNs. In Conference on Neural Information Processing Systems (NeurIPS), 2023.

Zhu, W., Qiu, Q., Calderbank, A. R., Sapiro, G., and Cheng, X. Scaling-Translation-Equivariant Networks with Decomposed Convolutional Filters. Journal of Machine Learning Research (JMLR), 2022.
Appendix
A. Implementation details

This appendix provides details on the implementation of CS-CNNs.16

Before detailing the Clifford-steerable kernels and convolutions, we first define the following "kernel shell" operation, which is used twice in the final kernel computation. Recall that, given the base space R^{p,q} equipped with the inner product η^{p,q}, we have a Clifford algebra Cl(R^{p,q}). We want to compute a kernel that maps from cin multivector input channels to cout multivector output channels, i.e.,

    K : R^{p,q} → Hom_Vec( Cl(R^{p,q})^{cin}, Cl(R^{p,q})^{cout} ).    (39)

K is defined on any v ∈ R^{p,q}, which allows modeling point clouds. In this work, however, we sample it on a grid of shape X_1, ..., X_{p+q}, analogously to typical CNNs.

A.1. Clifford Embedding

We briefly discuss how to embed scalars and vectors into the Clifford algebra; this extends to other grades such as bivectors. Let s ∈ R and v ∈ R^{p,q}. Using the natural isomorphisms E^{(0)} : R → Cl(R^{p,q})^{(0)} and E^{(1)} : R^{p,q} → Cl(R^{p,q})^{(1)}, we embed the scalar and vector components into a multivector as

    m := E^{(0)}(s) + E^{(1)}(v) ∈ Cl(R^{p,q}).    (40)

This is a standard operation in Clifford algebra computations, where we leave the other components of the multivector zero. We jointly denote such embeddings in the algorithms provided below as "CLEMBED([s, v])".

A.2. Scalar Orbital Parameterizations

Note that the O(p,q)-steerability constraint

    K(gv) = ρ_Cl^{cout}(g) K(v) ρ_Cl^{cin}(g^{-1}) =: ρ_Hom(g)(K(v))    ∀ v ∈ R^{p,q}, g ∈ O(p,q)

couples kernel values within, but not across, different O(p,q)-orbits

    O(p,q).v := { gv | g ∈ O(p,q) }    (41)
             = { w | η(w,w) = η(v,v) }.

The first line here is the usual definition of group orbits, while the second line makes use of Def. 2.5 of pseudo-orthogonal groups as metric-preserving linear maps. In the positive-definite case of O(n), this means that the only degree of freedom is the radial distance from the origin, resulting in (hyper)spherical orbits. Examples of such kernels can be seen in Fig. 8. Other radial kernels are typically obtained through e.g. Gaussian shells, Bessel functions, etc. In the non-definite case of O(p,q), the orbits are hyperboloids, resulting in hyperboloid shells, e.g. for the Lorentz group O(1,3) as in Fig. 3. In this case, we extend the input to the kernel with a scalar component that relates to the hyperbolic (squared) distance from the origin.

Specifically, we define an exponentially decaying, η^{p,q}-induced (parameterized) scalar orbital shell (analogous to the radial shell of typical Steerable CNNs) in the following way. We parameterize a kernel width σ and compute the shell value as

    s_σ(v) = sgn( η^{p,q}(v,v) ) · exp( −|η^{p,q}(v,v)| / (2σ²) ).    (42)

The width σ ∼ U(0.4, 0.6) is initialized with a uniform distribution, inspired by (Cesa et al., 2022). Since η^{p,q}(v,v) can be negative in the non-definite case, we take the absolute value and multiply the result by the sign of η^{p,q}(v,v). Computation of the kernel shell (SCALARSHELL) is outlined in Function 1. Intuitively, we obtain exponential decay for points far from the origin, while the sign of the inner product ensures that we clearly disambiguate between "light-like" and "space-like" points, i.e., points that are close in Euclidean distance but far in the η^{p,q}-induced distance. Note that this choice of parameterizing scalar parts of the kernel is not unique and can be experimented with.

A.3. Kernel Network

Recall from Section 3 that the kernel K is parameterized by a kernel network, which is a map

    K : R^{p,q} → Cl(R^{p,q})^{cout × cin}    (43)

implemented as an O(p,q)-equivariant CGENN. It consists of (linearly weighted) geometric product layers followed by multivector activations.

Let {v_n}_{n=1}^N be a set of sampling points, where N := X_1 · ... · X_{p+q}. In the remainder, we leave iteration over n implicit and assume that the operations are performed for each n. We obtain a sequence of scalars using the kernel shell

    s_n := s_σ(v_n).    (44)

The input to the kernel network is a batch of multivectors

    x_n := CLEMBED([s_n, v_n]).    (45)

16 [Link] up-equivariant-cnns
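As a concrete illustration of Eqs. (40) and (42), the following is a minimal sketch for Cl(R^{1,1}) (the coefficient layout and the function names are our own assumptions, not the paper's code):

```python
import math

# Minimal sketch of the Clifford embedding CLEMBED, Eq. (40), and the scalar
# orbital shell SCALARSHELL, Eq. (42), for Cl(R^{1,1}) with multivector
# coefficients ordered as (1, e1, e2, e12). Layout and names are our own.

def eta(v, w):
    # eta^{1,1} inner product: signature (+, -)
    return v[0] * w[0] - v[1] * w[1]

def cl_embed(s, v):
    # place s in the grade-0 slot and v in the grade-1 slots; all other
    # multivector components stay zero
    return [s, v[0], v[1], 0.0]

def sign(x):
    return (x > 0) - (x < 0)

def scalar_shell(v, sigma):
    # signed, exponentially decaying function of the orbit invariant eta(v, v)
    q = eta(v, v)
    return sign(q) * math.exp(-abs(q) / (2 * sigma**2))

print(scalar_shell((1.0, 0.0), 0.5))   # space-like: positive
print(scalar_shell((0.0, 1.0), 0.5))   # time-like: negative
print(scalar_shell((1.0, 1.0), 0.5))   # on the light cone: exactly 0
```

Note how sgn(0) = 0 sends the entire light cone to shell value zero, which is what disambiguates light-like from space-like points of equal Euclidean distance.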
Function 1 SCALARSHELL
  input: η^{p,q}, v ∈ R^{p,q}, σ
  s ← sgn( η^{p,q}(v,v) ) · exp( −|η^{p,q}(v,v)| / (2σ²) )
  return s

I.e., taking s and v together, they form the scalar and vector components of the CGENN's input multivector. We found including the scalar component crucial for the correct scaling of the kernel to the range of the grid.

Let i = 1, ..., cin and o = 1, ..., cout index the input and output channels. We then have the kernel network output

    k_{noi} := K(v_n)_{oi} := CGENN(x_n)_{oi},    (46)

where k_{noi} ∈ Cl(R^{p,q}) is the output of the kernel network for the input multivector x_n (embedded from the scalar s_n and vector v_n). Once the output stack of multivectors is computed, we reshape it from shape (N, cout · cin) to shape (N, cout, cin), resulting in the kernel matrix

    k ← RESHAPE(k, (N, cout, cin)),    (47)

where now k ∈ Cl(R^{p,q})^{N × cout × cin}. Note that k_n ∈ Cl(R^{p,q})^{cout × cin} is a matrix of multivectors, as desired.

A.4. Masking

We compute a second set of scalars which will act as a mask for the kernel. This is inspired by Steerable CNNs to ensure that the (e.g., radial) orbits of compact groups are fully represented in the kernel, as shown in Figure 8. However, note that for O(p,q)-steerable kernels with both p, q ≠ 0 this is never fully possible since O(p,q) is in general not compact, and all orbits except for the origin extend to infinity. This can e.g. be seen in the hyperbolic-shaped kernels in Figure 4.

For equivariance to hold in practice, whole orbits would need to be present in the kernel, which is not possible if the kernel is sampled on a grid with finite support. This is not specific to our architecture, but is a consequence of the orbits' non-compactness. The same issue arises e.g. in scale-equivariant CNNs (Romero et al., 2024; Worrall & Welling, 2019; Ghosh & Gupta, 2019; Sosnovik et al., 2020; Bekkers, 2020; Zhu et al., 2022; Marcos et al., 2018; Zhang & Williams, 2022). Further experimentation is needed to understand the impact of truncating the kernel on the final performance of the model.

We invoke the kernel shell function again to compute a mask for each k = 0, ..., p+q, i = 1, ..., cin, o = 1, ..., cout. That is, we have a weight array σ_{kio}, initialized identically as earlier, which is reused for each position in the grid:

    s_{knoi} := s_{σ_{kio}}(v_n).    (48)

We then mask the kernel by scalar multiplication with the shell, i.e.,

    k_{noi}^{(k)} ← k_{noi}^{(k)} · s_{knoi}.    (49)

Function 2 CLIFFORDSTEERABLEKERNEL
  input: p, q, Λ, cin, cout, (v_n)_{n=1}^N ∈ R^{p,q}
  output: k ∈ R^{(cout·2^d) × (cin·2^d) × X_1 × ··· × X_{p+q}}
  # Weighted Cayley.
  for i = 1...cin, o = 1...cout, a, b, c = 1...2^d do
    w_{oiab}^c ∼ N(0, 1/√(cin · N))   # Weight init.
    W_{oiab}^c ← Λ_{ab}^c · w_{oiab}^c
  end for
  σ ∼ U(0.4, 0.6)   # Init if needed.
  # Compute scalars.
  s_n ← SCALARSHELL(η^{p,q}, v_n, σ)
  # Embed s and v into a multivector.
  x_n ← CLEMBED([s_n, v_n])
  # Evaluate kernel network.
  k_{noi} ← CGENN(x_n)
  # Reshape to kernel matrix.
  k ← RESHAPE(k, (N, cout, cin))
  # Compute kernel mask.
  for i = 1...cin, o = 1...cout, k = 0...p+q do
    σ_{kio} ∼ U(0.4, 0.6)   # Init if needed.
    s_{knoi} ← SCALARSHELL(η^{p,q}, v_n, σ_{kio})
  end for
  k_{noi}^{(k)} ← k_{noi}^{(k)} · s_{knoi}   # Mask kernel.
  # Kernel head.
  k_{noib}^c ← Σ_{a=1}^{2^d} k_{noi}^a · W_{oiab}^c   # Partial weighted geometric product.
  # Reshape to final kernel.
  k ← RESHAPE(k, (cout·2^d, cin·2^d, X_1, ..., X_{p+q}))
  return k

Function 3 CLIFFORDSTEERABLECONVOLUTION
  input: F_in, (v_n)_{n=1}^N, ARGS
  output: F_out
  F_in ← RESHAPE(F_in, (B, cin·2^d, Y_1, ..., Y_{p+q}))
  k ← CLIFFORDSTEERABLEKERNEL((v_n)_{n=1}^N, ARGS)
  F_out ← CONV(F_in, k)
  F_out ← RESHAPE(F_out, (B, cout, Y_1, ..., Y_{p+q}, 2^d))
  return F_out
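The reshaping in Function 3 can be sketched in a few lines of numpy. This is our own sketch with assumed shapes for the 2D case d = 2; the 1×1 channel-mixing matrix is a stand-in for the actual CONV:

```python
import numpy as np

# Sketch of the reshaping in Function 3 (assumed shapes, 2D case d = 2): a stack
# of multivector fields (B, c_in, Y1, Y2, 2^d) is flattened to an ordinary
# feature map (B, c_in * 2^d, Y1, Y2), convolved, and unflattened again.

B, c_in, c_out, Y1, Y2, d = 2, 3, 4, 8, 8, 2
F_in = np.random.randn(B, c_in, Y1, Y2, 2**d)

# multivector axis becomes part of the channel axis
F_flat = F_in.transpose(0, 1, 4, 2, 3).reshape(B, c_in * 2**d, Y1, Y2)

# stand-in for CONV: a 1x1 convolution, i.e. a channel-mixing matrix
k = np.random.randn(c_out * 2**d, c_in * 2**d)
F_out_flat = np.einsum("oc,bcyx->boyx", k, F_flat)

# back to a stack of multivector fields
F_out = F_out_flat.reshape(B, c_out, 2**d, Y1, Y2).transpose(0, 1, 3, 4, 2)
assert F_out.shape == (B, c_out, Y1, Y2, 2**d)
```

The transpose before the reshape matters: it keeps each channel's 2^d multivector coefficients contiguous, so that the flattened channel axis can later be split back into (channel, multivector) without scrambling components.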
where A, B, C ⊆ [d] are multi-indices running over the 2^d basis elements of Cl(R^{p,q}). Here, Λ ∈ R^{2^d × 2^d × 2^d} is the Clifford multiplication table of Cl(R^{p,q}), also sometimes called a Cayley table. It is defined as

    Λ_{A,B}^C = { 0                                  if A△B ≠ C,
                { sgn_{A,B} · η̄(e_{A∩B}, e_{A∩B})   if A△B = C.    (51)

Here, △ denotes the symmetric difference of sets, i.e., A△B = (A \ B) ∪ (B \ A). Further,

    sgn_{A,B} := (−1)^{n_{A,B}},    (52)

where n_{A,B} is the number of adjacent "swaps" one needs to fully sort the tuple (i_1, ..., i_s, j_1, ..., j_t), where A = {i_1, ..., i_s} and B = {j_1, ..., j_t}. In the following, we identify the multi-indices A, B, and C with a relabeling a, b, and c that run from 1 to 2^d.

Altogether, Λ defines a multivector-valued bilinear form which represents the geometric product relative to the chosen multivector basis. We can weight its entries with parameters w_{oiab}^c ∈ R, initialized as w_{oiab}^c ∼ N(0, 1/√(cin · N)). These weightings can be redone for each input and output channel, such that we have a weighted Cayley table W ∈ R^{2^d × 2^d × 2^d × cin × cout} with entries

    W_{oiab}^c := Λ_{ab}^c w_{oiab}^c.    (53)

An ablation study in Appendix C.4 demonstrates the great relevance of the weighting parameters empirically.

Given the kernel matrix k, we compute the kernel by partial (weighted) geometric product evaluation, i.e.,

    k_{noib}^c ← Σ_{a=1}^{2^d} k_{noi}^a · W_{oiab}^c.    (54)

Finally, we reshape and permute k_{noib}^c from shape (N, cout, cin, 2^d, 2^d) to its final shape, i.e.,

    k ← RESHAPE(k, (cout·2^d, cin·2^d, X_1, ..., X_{p+q})).

This is the final kernel that can be used in a convolutional layer, and can be interpreted (at each sample coordinate) as an element of Hom_Vec( Cl(R^{p,q})^{cin}, Cl(R^{p,q})^{cout} ). The pseudocode for the Clifford-steerable kernel (CLIFFORDSTEERABLEKERNEL) is given in Function 2.

To apply the convolution, a stack of multivector fields of shape (B, cin, Y_1, ..., Y_{p+q}, 2^d) is reshaped into (B, cin·2^d, Y_1, ..., Y_{p+q}). The output array of shape (B, cout·2^d, Y_1, ..., Y_{p+q}) is obtained by convolving the input with the kernel, and is then reshaped to (B, cout, Y_1, ..., Y_{p+q}, 2^d), which can again be interpreted as a stack of multivector fields.

B. Completeness of kernel spaces

In order not to over-constrain the model, it is essential to parameterize a complete basis of O(p,q)-steerable kernels. Comparing our implicit O(2,0) = O(2)-steerable kernels with the analytical solution by (Weiler & Cesa, 2019), we find that certain degrees of freedom are missing; see Fig. 8. However, while these degrees of freedom are missing in a single convolution operation, they can be fully recovered by applying two consecutive convolutions. This suggests that the overall expressiveness of CS-CNNs is (at least for O(2)) not diminished. Moreover, two convolutions with kernels K̂ and K can always be expressed as a single convolution with a composed kernel K̂ ∗ K. As visualized below, this composed kernel recovers the full degrees of freedom reported in (Weiler & Cesa, 2019):

Figure 7. [Visualization of the composed kernel K̂ ∗ K.]

The following two sections discuss the initial differences in kernel parametrizations and how they are resolved by adding a second linear or convolution operation. Unless stated otherwise, we focus here on cin = cout = 1 channels to reduce clutter.
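Eqs. (51)–(52) can be implemented directly. The following is a hypothetical minimal construction (the basis ordering, the inversion-counting shortcut for n_{A,B}, and the function name are our own choices, not the paper's code):

```python
import numpy as np
from itertools import combinations

def cayley_table(p, q):
    """Clifford multiplication table Lambda of Cl(R^{p,q}), Eqs. (51)-(52):
    Lam[c, a, b] is the coefficient of basis element e_C in e_A * e_B."""
    d = p + q
    metric = [1.0] * p + [-1.0] * q                       # eta(e_i, e_i)
    basis = [frozenset(c) for m in range(d + 1)
             for c in combinations(range(d), m)]          # multi-indices A, by grade
    idx = {A: a for a, A in enumerate(basis)}             # relabeling A -> a
    Lam = np.zeros((2**d, 2**d, 2**d))
    for A in basis:
        for B in basis:
            C = A ^ B                                     # symmetric difference
            seq = sorted(A) + sorted(B)
            # adjacent swaps needed to sort the concatenated tuple = inversions
            n_swaps = sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
                          if seq[i] > seq[j])
            eta_factor = 1.0
            for i in A & B:                               # eta-bar(e_{A∩B}, e_{A∩B})
                eta_factor *= metric[i]
            Lam[idx[C], idx[A], idx[B]] = (-1.0) ** n_swaps * eta_factor
    return Lam

Lam = cayley_table(2, 0)   # Cl(R^{2,0}): basis ordered (1, e1, e2, e12)
```

For instance, `Lam[3, 2, 1] = -1` encodes e2 • e1 = −e12, and `Lam[0, 3, 3] = -1` encodes e12 • e12 = −1 in Cl(R^{2,0}); contracting stacks of multivectors against a weighted version of this table, as in Eq. (54), gives the kernel head.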
Figure 8. Comparison of the parametrization of O(2)-steerable kernels in CS-CNNs (top and middle) and e2cnn (bottom). [The figure tabulates, for input/output types 1, (e_1, e_2), and e_12, kernel entries of the form w R(r) times angular factors such as (cos ϕ, sin ϕ) and (−sin ϕ, cos ϕ).] While the e2cnn solutions are proven to be complete, CS-CNN seems to miss certain degrees of freedom:
(1) Their radial parts are coupled in the components highlighted in blue and green, while e2cnn allows for independent radial parts. By "coupled" we mean that they are merely scaled relative to each other with weights w_{mn}^k from the weighted geometric product operation in the kernel head H, where m labels the grade K(r, ϕ)^{(m)} of the kernel network output while n, k label input and output grades of the expanded kernel in Hom_Vec( Cl(R^{p,q}), Cl(R^{p,q}) );
(2) CS-CNN is missing kernels of angular frequency 2 that are admissible for mapping between vector fields; highlighted in red.
As explained in Appendix B, these missing degrees of freedom are recovered when composing two convolution layers. A kernel corresponding to the composition of two convolutions into a single one is visualized in Fig. 7.
B.1. Coupled radial dependencies in CS-CNN kernels

The first issue is that the CS-CNN parametrization implies a coupling of radial degrees of freedom. To make this precise, note that the O(2)-steerability constraint

    K(gv) = ρ_Cl^{cout}(g) K(v) ρ_Cl^{cin}(g^{-1})    ∀ v ∈ R², g ∈ O(2)

decouples into independent constraints on individual O(2)-orbits of R², which are rings at different radii (and the origin); visualized in Fig. 2 (left). (Weiler et al., 2018a; Weiler & Cesa, 2019) therefore parameterize the kernel in (hyper)spherical coordinates. In our case these are polar coordinates of R², i.e. a radius r ∈ R_{≥0} and angle ϕ ∈ S¹:

    K(r, ϕ) := R(r) κ(ϕ).    (55)

The O(2)-steerability constraint affects only the angular part and leaves the radial part entirely free, such that it can be parameterized in an arbitrary basis or via an MLP.

e2cnn: Weiler & Cesa (2019) solved analytically for complete bases of the angular parts. Specifically, they derive solutions

    K^{kn}(r, ϕ) = R^{kn}(r) κ^{kn}(ϕ)    (56)

for any pair of input and output field types (irreps of grades) n and k, respectively. This complete basis of O(2)-steerable kernels is shown in the bottom table of Fig. 8.

CS-CNNs: CS-CNNs parameterize the kernel in terms of a kernel network K : R^{p,q} → Cl(R^{p,q})^{cout × cin}, visualized in Fig. 8 (top). Expressed in polar coordinates, assuming cin = cout = 1, and considering the independence of K on different orbits due to its O(2)-equivariance, we get the factorization

    K(r, ϕ)^{(m)} = R^m(r) κ^m(ϕ),    (57)

where m is the grade of the multivector-valued output. As described in Appendix A.5 (Eq. (53)), the kernel head operation H expands this output by multiplying it with weights W_{mn}^k = Λ_{mn}^k w_{mn}^k, where w_{mn}^k ∈ R are parameters and Λ_{mn}^k ∈ {−1, 0, 1} represents the geometric product relative to the standard basis of R^{p,q}. Note that we do not consider multiple input or output channels here. The final expanded kernel for CS-CNNs is hence given by

    K^{kn}(r, ϕ) = Σ_m W_{mn}^k K(r, ϕ)^{(m)}    (58)
                 = Σ_m Λ_{mn}^k w_{mn}^k R^m(r) κ^m(ϕ).

These solutions are listed in the top table in Fig. 8, and visualized in the graphics above.17

Comparison: Note that the complete solutions by (Weiler & Cesa, 2019) allow for a different radial part R^{kn} for each pair of input and output type (grade/irrep). In contrast, the CS-CNN parametrization expands coupled radial parts R^m, additionally multiplying them with weights w_{mn}^k (highlighted in the table in blue and green). The CS-CNN parametrization is therefore clearly less general (incomplete).

Solutions: One idea to resolve this shortcoming is to make the weighted geometric product parameters themselves radially dependent,

    w_{mn}^k : R_{≥0} → R,    r ↦ w_{mn}^k(r),    (59)

for instance by parameterizing the weights with a neural network. This would fully resolve the under-parametrization and would preserve equivariance, since O(2)-steerability depends only on the angular variable.

However, doing this is actually not necessary, since the missing flexibility of radial parts can always be recovered by running a convolution followed by a linear layer (or a second convolution) when cout > 1. The reason for this is that different channels i = 1, ..., cout of a kernel network K : R² → Cl(R^{2,0})^{cout × cin} do have independent radial parts. Their convolution responses in different channels can be mixed by a subsequent linear layer with grade-dependent weights. By linearity, this is equivalent to immediately mixing the channels' radial parts with grade-dependent weights, resulting in effectively decoupled radial parts.

B.2. Circular harmonics order 2 kernels

A second issue is that the CS-CNN parametrization is missing a basis kernel of angular frequency 2 that maps between vector fields; highlighted in red in the bottom table of Fig. 8. However, it turns out that this degree of freedom is reproduced as the difference of two consecutive convolutions (∗): one mapping vectors to pseudoscalars and back to vectors, the other mapping vectors to scalars and back to vectors, as suggested in the (non-commutative!) computation flow diagram:

    vector --∗--> pseudoscalar --∗--> vector
                                              ⊖  (difference)
    vector --∗--> scalar       --∗--> vector

As background on the angular frequency 2 kernel, note that O(2)-steerable kernels between irreducible field types of angular frequencies j and l contain the angular frequencies |j − l| and j + l – this is a consequence of the Clebsch-Gordan decomposition of O(2)-irrep tensor products (Lang & Weiler, 2021). We identify multivector grades Cl(R^{2,0})^{(k)} with the

17 The parameter Λ_{mn}^k appears in the table as selecting to which entry k, n of the table the grade K(r, ϕ)^{(m)} is added (optionally with minus signs).
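This recovery can be checked numerically. Below is a small sketch (our own code, with an assumed radial profile R(r) = r·exp(−r²/2)) that convolves the vector-to-scalar kernel of Eq. (62) below with the scalar-to-vector kernel of Eq. (63) and verifies that the composed vector-to-vector kernel contains angular frequency 2 but no frequency 1:

```python
import numpy as np

# Build the two frequency-1 kernels on a grid. With R(r) = r * exp(-r^2 / 2),
# the entries R(r) * (-sin(phi), cos(phi)) become simply (-y, x) * exp(-r^2 / 2).
n = 128
x = np.linspace(-4, 4, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="xy")   # array element [i, j] = (y_i, x_j)
G = np.exp(-(X**2 + Y**2) / 2)
K_sv = np.stack([-Y * G, X * G])          # vector -> scalar, cf. Eq. (62)
K_vs = np.stack([-Y * G, X * G])          # scalar -> vector, cf. Eq. (63)

def conv2(f, g):
    # periodic FFT convolution; both kernels decay fast, so wrap-around is
    # negligible. Both origins sit at the grid center, so we re-center the result.
    return np.fft.fftshift(np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(g))))

Sigma00 = conv2(K_vs[0], K_sv[0])         # (0, 0) entry of Sigma^{vv}, cf. Eq. (64)

# angular Fourier coefficients of Sigma00 on a ring of radius r0
r0, m = 1.5, 256
ang = np.linspace(0, 2 * np.pi, m, endpoint=False)
ii = np.round((r0 * np.sin(ang) + 4) * n / 8).astype(int) % n   # row = y index
jj = np.round((r0 * np.cos(ang) + 4) * n / 8).astype(int) % n   # col = x index
vals = Sigma00[ii, jj]
coeff = lambda freq: abs(np.sum(vals * np.exp(-1j * freq * ang))) / m
# frequency 2 dominates; frequency 1 is absent up to ring-sampling error
```

The choice of a Gaussian-derivative radial profile makes the composed kernel analytically tractable (it is a second y-derivative of a wider Gaussian), matching the remark below that circular harmonics can be identified with derivatives of Gaussian kernels.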
for any α, β ∈ R.

Using associativity, we can express two consecutive convolutions, first going from vector to scalar fields via

    K^{sv}(r, ϕ) = R^{sv}(r) [ −sin(ϕ)  cos(ϕ) ],    (62)

then going back from scalars to vectors via

    K^{vs}(r, ϕ) = R^{vs}(r) [ −sin(ϕ)  cos(ϕ) ]^⊤,    (63)

as a single convolution between vector fields, where the combined kernel is given by:

    Σ^{vv} := K^{vs} ∗ K^{sv}.    (64)

We can similarly define a convolution going from vector to pseudoscalar fields via

    K^{pv}(r, ϕ) = R^{pv}(r) [ cos(ϕ)  sin(ϕ) ].    (65)

The "visual proof" by convolving kernels is clearly only suggestive. To make it precise, one would be required to compute the convolutions of two kernels analytically. This is easily done by identifying circular harmonics with derivatives of Gaussian kernels, a relation that is well known in classical computer vision (Lindeberg, 2009).

18 As mentioned earlier, multivector grades may in general not be irreducible; however, for (p, q) = (2, 0) they are.
19 There are two different O(2)-irreps corresponding to j = 0 (trivial and sign-flip); see (Weiler et al., 2023), Section 5.3.4.

C. Experimental details

C.1. Model details:

For ResNets, we follow the setup of Wang et al. (2021); Brandstetter et al. (2023); Gupta & Brandstetter (2022): the ResNet baselines consist of 8 residual blocks, each comprising two convolution layers with 7 × 7 (or 7 × 7 × 7 for 3D) kernels, shortcut connections, group normalization (Wu & He, 2018), and GeLU activation functions (Hendrycks & Gimpel, 2016). We use two embedding and two output layers, i.e., the overall architectures could be classified as Res-20 networks. Following (Gupta & Brandstetter, 2022; Brandstetter et al., 2023), we abstain from employing down-projection techniques and instead maintain a consistent spatial resolution throughout the networks. The best models have approx. 7M parameters for Navier-Stokes and 1.5M parameters for Maxwell's equations, in both 2D and 3D.

C.2. Optimization:

For each experiment and each model, we tuned the learning rate to find the optimal value. Each model was trained until convergence.
Training was done on a single node with 4 NVIDIA GeForce RTX 2080 Ti GPUs.

Figure 9. Test MSE (lower is better) of CS-CNNs with freely learned weights in the kernel head, compared to an ablation with fixed weights w_{mn,ij}^k = 1.

C.3. Datasets

Navier-Stokes: We use the Navier-Stokes data from Gupta & Brandstetter (2022), which is based on ΦFlow (Holl et al., 2020). It is simulated on a grid with a spatial resolution of 128 × 128 pixels of size ∆x = ∆y = 0.25 m and a temporal resolution of ∆t = 1.5 s. For validation and testing, we randomly selected 1024 trajectories from the corresponding partitions.
D. The Clifford Algebra

For completeness and to complement Section 2.3, we give in this section a short and formal definition of the Clifford algebra. For this, we first need to introduce the tensor algebra of a vector space.

Definition D.1 (The tensor algebra). Let V be a finite-dimensional R-vector space of dimension d. Then the tensor algebra of V is defined as follows:

    Tens(V) := ⊕_{m=0}^∞ V^{⊗m}    (69)
            = span{ v_1 ⊗ ··· ⊗ v_m | m ≥ 0, v_i ∈ V },

where we used the following abbreviations for the m-fold tensor product of V for m ≥ 0:

    V^{⊗m} := V ⊗ ··· ⊗ V  (m times),    V^{⊗0} := R.    (70)

Note that the above definition turns (Tens(V), ⊗) into a (non-commutative, infinite-dimensional, unital, associative) algebra over R. In fact, the tensor algebra (Tens(V), ⊗) is, in some sense, the biggest algebra generated by V.

We now have the tools to give a proper definition of the Clifford algebra:

Definition D.2 (The Clifford algebra). Let (V, η) be a finite-dimensional inner product space over R of dimension d. The Clifford algebra of (V, η) is then defined as the following quotient algebra:

    Cl(V, η) := Tens(V)/I(η),    (71)

    I(η) := ⟨ v ⊗ v − η(v,v) · 1_{Tens(V)} | v ∈ V ⟩    (72)
         := span{ x ⊗ ( v ⊗ v − η(v,v) · 1_{Tens(V)} ) ⊗ y | v ∈ V, x, y ∈ Tens(V) },

where I(η) denotes the two-sided ideal of Tens(V) generated by the relations v ⊗ v ∼ η(v,v) · 1_{Tens(V)} for all v ∈ V.

The product on Cl(V, η) that is induced by the tensor product ⊗ is called the geometric product • and will be denoted as follows:

    x_1 • x_2 := [z_1 ⊗ z_2],    (73)

with the equivalence classes x_i = [z_i] ∈ Cl(V, η), i = 1, 2.

Note that, since I(η) is a two-sided ideal, the geometric product is well-defined. The above construction turns (Cl(V, η), •) into a (non-commutative, unital, associative) algebra over R. In some sense, (Cl(V, η), •) is the biggest (non-commutative, unital, associative) algebra (A, •) over R that is generated by V and satisfies the relations v • v = η(v,v) · 1_A for all v ∈ V.

It turns out that (Cl(V, η), •) is of the finite dimension 2^d and carries a parity grading of algebras and a multivector grading of vector spaces; see (Ruhe et al., 2023b), Appendix D. More properties are explained in Section 2.3.

From an abstract, theoretical point of view, the most important property of the Clifford algebra is its universal property, which fully characterizes it:

Theorem D.3 (The universal property of the Clifford algebra). Let (V, η) be a finite-dimensional inner product space over R of dimension d. For every (non-commutative, unital, associative) algebra (A, ∗) over R and every R-linear map f : V → A such that for all v ∈ V we have:

    f(v) ∗ f(v) = η(v,v) · 1_A,    (74)

there exists a unique algebra homomorphism (over R):

    f̄ : (Cl(V, η), •) → (A, ∗),    (75)

such that f̄(v) = f(v) for all v ∈ V.

Proof. The map f : V → A uniquely extends to an algebra homomorphism on the tensor algebra:

    f^⊗ : Tens(V) → A,    (76)

given by:

    f^⊗( Σ_{i∈I} c_i · v_{i,1} ⊗ ··· ⊗ v_{i,l_i} ) := Σ_{i∈I} c_i · f(v_{i,1}) ∗ ··· ∗ f(v_{i,l_i}).    (77)

Because of Equation (74) we have for every v ∈ V:

    f^⊗( v ⊗ v − η(v,v) · 1_{Tens(V)} ) = f(v) ∗ f(v) − η(v,v) · 1_A    (78)
                                       = 0,    (79)

and thus:

    f^⊗(I(η)) = 0.    (80)

This shows that f^⊗ factors through the thus well-defined induced quotient map of algebras:

    f̄ : Cl(V, η) = Tens(V)/I(η) → A,    (81)
    f̄([z]) := f^⊗(z).    (82)

This shows the claim.
Remark D.4 (The universal property of the Clifford algebra). The universal property of the Clifford algebra can be stated more explicitly as follows:

If f satisfies Equation (74) and x ∈ Cl(V, η), then we can take any representation of x of the following form:

    x = Σ_{i∈I} c_i · v_{i,1} • ··· • v_{i,l_i},    (83)

with any finite index set I, any l_i ∈ N, any coefficients c_0, c_i ∈ R and any vectors v_{i,j} ∈ V, j = 1, ..., l_i, i ∈ I, and then we can compute f̄(x) by the following formula:

    f̄(x) = Σ_{i∈I} c_i · f(v_{i,1}) ∗ ··· ∗ f(v_{i,l_i}),    (84)

and no ambiguity can occur for f̄(x) if one uses a different such representation for x.

Example D.5. The universal property of the Clifford algebra can, for instance, be used to show that the action of the (pseudo-)orthogonal group:

    O(V, η) × Cl(V, η) → Cl(V, η),    (85)
    (g, x) ↦ ρ_Cl(g)(x),    (86)

given by:

    ρ_Cl(g)( Σ_{i∈I} c_i · v_{i,1} • ··· • v_{i,l_i} ) := Σ_{i∈I} c_i · (g v_{i,1}) • ··· • (g v_{i,l_i}),    (87)

is well-defined. For this one only needs to check Equation (74) for v ∈ V:

    (gv) • (gv) = η(gv, gv) · 1_{Cl(V,η)}    (88)
               = η(v, v) · 1_{Cl(V,η)},    (89)

where the first equality holds by the fundamental relation of the Clifford algebra and the last equality holds by the definition of O(V, η) ∋ g. So the linear map g : V → Cl(V, η), by the universal property of the Clifford algebra, thus uniquely extends to the algebra homomorphism:

Recall the kernel head operation H, which on each output channel i ∈ [cout] and grade component k = 0, ..., d, was given by:

    H(k)[f]_i^{(k)} := Σ_{j∈[cin], m,n=0,...,d} w_{mn,ij}^k · ( k_{ij}^{(m)} • f_j^{(n)} )^{(k)},

with:

    w_{mn,ij}^k ∈ R,
    k = [k_{ij}]_{i∈[cout], j∈[cin]} ∈ Cl(R^{p,q})^{cout × cin},
    f = [f_1, ..., f_{cin}] ∈ Cl(R^{p,q})^{cin}.

Clearly, H(k) is an R-linear map (in f). Now let g ∈ O(p,q). We are left to check the following equivariance formula:

    H( ρ_Cl^{cout×cin}(g)(k) ) ≟ ρ_Hom(g)( H(k) )    (92)
                               := ρ_Cl^{cout}(g) ∘ H(k) ∘ ρ_Cl^{cin}(g^{-1}).

We abbreviate

    s := ρ_Cl^{cin}(g^{-1})(f) ∈ Cl(R^{p,q})^{cin},
    Q := ρ_Cl^{cout×cin}(g)(k) ∈ Cl(R^{p,q})^{cout×cin}.

First note that we have for j ∈ [cin]:

    ρ_Cl(g)(s_j) = f_j.    (93)

We then get:

    [ ρ_Hom(g)(H(k))[f] ]_i^{(k)}
    = [ ρ_Cl^{cout}(g)( H(k)( ρ_Cl^{cin}(g^{-1})(f) ) ) ]_i^{(k)}
    = [ ρ_Cl^{cout}(g)( H(k)[s] ) ]_i^{(k)}
    = ( ρ_Cl(g)( H(k)[s]_i ) )^{(k)}
    = ( ρ_Cl(g)( Σ_{j∈[cin], m,n=0,...,d} w_{mn,ij}^k · ( k_{ij}^{(m)} • s_j^{(n)} )^{(k)} ) )
    = Σ_{j∈[cin], m,n=0,...,d} w_{mn,ij}^k · ( ρ_Cl(g)(k_{ij}^{(m)}) • ρ_Cl(g)(s_j^{(n)}) )^{(k)}
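The equivariance formula (92) being derived here can also be spot-checked numerically. The following is our own sketch (not the paper's code), for cin = cout = 1 in Cl(R^{1,1}) and the reflection g = diag(1, −1) ∈ O(1, 1), realizing the geometric product through the matrix representation 1, e1, e2, e12 ↦ I, E1, E2, E1E2:

```python
import numpy as np

# Spot-check of Eq. (92) for the kernel head H in Cl(R^{1,1}), with multivector
# coefficients ordered as (1, e1, e2, e12), grades (0, 1, 1, 2).

E1 = np.array([[1.0, 0.0], [0.0, -1.0]])
E2 = np.array([[0.0, 1.0], [-1.0, 0.0]])

def to_mat(c):    # coefficients (1, e1, e2, e12) -> 2x2 matrix
    return c[0] * np.eye(2) + c[1] * E1 + c[2] * E2 + c[3] * (E1 @ E2)

def to_vec(M):    # inverse of to_mat
    return np.array([(M[0, 0] + M[1, 1]) / 2, (M[0, 0] - M[1, 1]) / 2,
                     (M[0, 1] - M[1, 0]) / 2, (M[0, 1] + M[1, 0]) / 2])

def gp(x, y):     # geometric product in coefficient form
    return to_vec(to_mat(x) @ to_mat(y))

grade = np.array([0, 1, 1, 2])
P = [np.diag((grade == k).astype(float)) for k in range(3)]   # grade projections
rho = np.diag([1.0, 1.0, -1.0, -1.0])   # rho_Cl(g) for g = diag(1, -1): e2 -> -e2

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3, 3))          # weights w^k_{mn}

def H(kv, f):
    # H(k)[f]^{(k)} = sum_{m,n} w^k_{mn} (k^{(m)} • f^{(n)})^{(k)}
    return sum(w[k, m, n] * (P[k] @ gp(P[m] @ kv, P[n] @ f))
               for k in range(3) for m in range(3) for n in range(3))

kv, f = rng.normal(size=4), rng.normal(size=4)
lhs = H(rho @ kv, f)          # H(rho_Cl(g)(k))[f]
rhs = rho @ H(kv, rho @ f)    # rho_Cl(g)(H(k)[rho_Cl(g^{-1})(f)]); here g^{-1} = g
assert np.allclose(lhs, rhs)  # Eq. (92) holds for this g
```

The check works because ρ_Cl(g) is a grade-preserving algebra automorphism, so it commutes with the grade projections and distributes over the geometric product, exactly the two facts used in the derivation above.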
following we always make the implicit assumption of its existence and we also fix a specific choice.

Definition F.4 (Isometry group of a G-structured pseudo-Riemannian manifold). Let (M, G, η) be a G-structured pseudo-Riemannian manifold. Its (G-structure preserving) isometry group is defined to be:

    Isom(M, G, η) := { ϕ : M → M diffeo | ∀ z ∈ M, v ∈ T_z M :
        η_{ϕ(z)}( ϕ_{*,TM}(v), ϕ_{*,TM}(v) ) = η_z(v, v),
        ϕ_{*,FM}(G_z M) = G_{ϕ(z)} M }.    (98)

The intuition here is that the first condition constrains ϕ to be an isometry w.r.t. the metric η. The second condition constrains ϕ to be a symmetry of the G-structure, i.e. it maps G-frames to G-frames.

Remark F.5 (Isometry group). Recall that the (usual/full) isometry group of a pseudo-Riemannian manifold (M, η) is defined as:

    Isom(M, η) := { ϕ : M → M diffeo | ∀ z ∈ M, v ∈ T_z M :
        η_{ϕ(z)}( ϕ_{*,TM}(v), ϕ_{*,TM}(v) ) = η_z(v, v) }.    (99)

Also note that for a G-structured pseudo-Riemannian manifold (M, G, η) of signature (p, q) such that O(p, q) ≤ G we have:

    Isom(M, G, η) = Isom(M, η).    (100)

Definition F.6 (G-associated vector bundle). Let (M, G, η) be a G-structured pseudo-Riemannian manifold and let ρ : G → GL(n) be a left linear representation of G. A vector bundle A over M is called a G-associated vector bundle (with typical fibre (R^n, ρ)) if there exists a vector bundle isomorphism over M of the form:

    A → (GM × R^n)/∼_ρ =: GM ×_ρ R^n,    (101)

where the equivalence relation is given as follows:

    (e′, v′) ∼_ρ (e, v)  :⟺  ∃ g ∈ G. (e′, v′) = (e ◁ g, ρ(g^{-1}) v).    (102)

Definition F.7 (Global sections of a fibre bundle). Let π_A : A → M be a fibre bundle over M. We denote the set of global sections of A as:

    Γ(A) := { f : M → A | ∀ z ∈ M. f(z) ∈ A_z },    (103)

where A_z := π_A^{-1}(z) denotes the fibre of A over z ∈ M.

Remark F.8 (Isometry action). For a G-associated vector bundle A = GM ×_ρ R^n and ϕ ∈ Isom(M, G, η) we can define the induced G-associated vector bundle automorphism ϕ_{*,A} on A as follows:

    ϕ_{*,A} : A → A,    (104)
    ϕ_{*,A}(e, v) := ( ϕ_{*,GM}(e), v ).    (105)

With this we can define a left action of the group Isom(M, G, η) on the corresponding space of feature fields Γ(A) as follows:

    ▷ : Isom(M, G, η) × Γ(A) → Γ(A),    (106)
    ϕ ▷ f := ϕ_{*,A} ∘ f ∘ ϕ^{-1} : M → A.    (107)

To construct a well-behaved convolution operator on M we first need to introduce the idea of a transporter of feature fields along a curve γ : I → M.

Remark F.9 (Transporter). A transporter T_A on the vector bundle A over M takes any (sufficiently smooth) curve γ : I → M, with I ⊆ R some interval, and two points s, t ∈ I, s ≤ t, and provides an invertible linear map:

    T_{A,γ}^{s,t} : A_{γ(s)} → A_{γ(t)},    v ↦ T_{A,γ}^{s,t}(v).    (108)

T_A is thought to transport the vector v ∈ A_{γ(s)} at location γ(s) ∈ M along the curve γ to the location γ(t) ∈ M, outputting a vector ṽ = T_{A,γ}^{s,t}(v) in A_{γ(t)}. For consistency we require that T_A satisfies the following conditions for such γ:

1. For s ∈ I we get: T_{A,γ}^{s,s} = id_{A_{γ(s)}} : A_{γ(s)} → A_{γ(s)}.

2. For s ≤ t ≤ u we have:

    T_{A,γ}^{t,u} ∘ T_{A,γ}^{s,t} = T_{A,γ}^{s,u} : A_{γ(s)} → A_{γ(u)}.    (109)

Furthermore, the dependence on s, t and γ shall be "sufficiently smooth" in a certain sense.

We call a transporter T_{TM} on the tangent bundle TM a metric transporter if the map:

    T_{TM,γ}^{s,t} : (T_{γ(s)} M, η_{γ(s)}) → (T_{γ(t)} M, η_{γ(t)})    (110)

is always an isometry.

To construct transporters we need to introduce the notion of a connection on a vector bundle, which formalizes how vector fields change when moving from one point to the next.

Definition F.10 (Connection). A connection on a vector bundle A over M is an R-linear map:

    ∇ : Γ(A) → Γ(T*M ⊗ A),    (111)
where ∂_X c denotes the directional derivative of c along X.

Remark F.12. Certainly, an affine connection can also be re-written in the usual connection form:

    ∇ : Γ(TM) → Γ(T*M ⊗ TM).    (115)

Every connection defines a (parallel) transporter T_A.

Definition/Lemma F.13 (Parallel transporter of a connection). Let ∇ be a connection on the vector bundle A over M. Then ∇ defines a (parallel) transporter T_A for γ : I = [s, t] → M as follows:

    T_{A,γ}^{s,t} : A_{γ(s)} → A_{γ(t)},    v ↦ f(t),    (116)

where f is the unique vector field f ∈ Γ(γ*A) with:

1. (γ*∇)(f) = 0,

2. f(s) = v,

which always exists. Here γ* denotes the corresponding pullback from M to I.

For pseudo-Riemannian manifolds there is a "canonical" choice of a metric connection, the Levi-Civita connection, which always exists and is uniquely characterized by its two main properties.

Definition/Theorem F.14 (Fundamental theorem of pseudo-Riemannian geometry: the Levi-Civita connection). Let (M, η) be a pseudo-Riemannian manifold. Then there exists a unique affine connection ∇ on (M, η) such that the following two conditions hold for all X, Y, Z ∈ Γ(TM):

1. metric preservation:

    ∂_Z( η(X, Y) ) = η(∇_Z X, Y) + η(X, ∇_Z Y).    (117)

2. Furthermore, the Levi-Civita transporter extends to every G-associated vector bundle A as T_A.

3. For every G-associated vector bundle A, every curve γ : I → M and ϕ ∈ Isom(M, G, η), the Levi-Civita transporter T_{A,γ} always satisfies:

    ϕ_{*,A} ∘ T_{A,γ} = T_{A,ϕ∘γ} ∘ ϕ_{*,A}.    (120)

Definition F.16 (Geodesics). Let M be a manifold with affine connection ∇ and γ : I → M a curve. We call γ a geodesic of (M, ∇) if for all t ∈ I we have:

    ∇_{γ̇(t)} γ̇(t) = 0,    (121)

i.e. if γ runs parallel to itself. For pseudo-Riemannian manifolds (M, η) we will typically use the Levi-Civita connection ∇^LC to define geodesics.

Definition/Lemma F.17 (Pseudo-Riemannian exponential map). For a manifold M with affine connection ∇, z ∈ M and v ∈ T_z M there exists a unique geodesic γ_{z,v} : I = (−s, s) → M of (M, ∇) with maximal domain I such that:

    γ_{z,v}(0) = z,    γ̇_{z,v}(0) = v.    (122)

The ∇-exponential map at z ∈ M is then the map:

    exp_z : T°_z M → M,    exp_z(v) := γ_{z,v}(1),    (123)

with domain:

    T°_z M := { v ∈ T_z M | γ_{z,v}(1) is defined }.    (124)

For pseudo-Riemannian manifolds (M, η) we will call the exponential map exp_z defined via the Levi-Civita connection ∇^LC the pseudo-Riemannian exponential map of (M, η) at z ∈ M.
Remark F.18. For a pseudo-Riemannian manifold (M, η) the differential d exp_z|_v : T_v T_z M → T_{exp_z(v)} M is the identity map on T_z M at v = 0 ∈ T_z M:

   d exp_z|_{v=0} = id_{T_z M} : T_z M = T_0 T_z M → T_{exp_z(0)} M = T_z M.

Furthermore, there exists an open subset U_z ⊆ T_z M such that 0 ∈ U_z, exp_z : U_z → exp_z(U_z) ⊆ M is a diffeomorphism, and exp_z(U_z) ⊆ M is an open subset.

Notation F.19. For a transporter T^A for a vector bundle on (M, ∇) we abbreviate for z ∈ M and v ∈ T°_z M:

   T_{z,v} := T^A_{γ⁻_{z,v}} : A_{exp_z(v)} →~ A_z,    (125)

where γ⁻_{z,v} : [0, 1] → M is given by γ⁻_{z,v}(t) := exp_z((1 − t) · v).

Definition F.20 (Transporter pullback, see Weiler et al. (2023) Def. 12.2.4). Let (M, η) be a pseudo-Riemannian manifold and A a vector bundle over M. Furthermore, let exp_z denote the pseudo-Riemannian exponential map (based on the Levi-Civita connection) and T^A any transporter on A. We then define the transporter pullback:

   Exp*_z : Γ(A) → C(T°_z M, A_z),    (126)

   Exp*_z(f)(v) := T_{z,v}( f(exp_z(v)) ) ∈ A_z,  where f(exp_z(v)) ∈ A_{exp_z(v)}.    (127)

Lemma F.21 (see Weiler et al. (2023) Thm. 13.1.4). For a G-structured pseudo-Riemannian manifold (M, G, η) and G-associated vector bundle A, z ∈ M, ϕ ∈ Isom(M, G, η) and f ∈ Γ(A) we have:

   Exp*_z(ϕ ▷ f) = ϕ_{*,A} ∘ [Exp*_{ϕ⁻¹(z)}(f)] ∘ ϕ⁻¹_{*,TM},    (128)

provided the transporter map T^A satisfies Equation (120).

Weight sharing for the convolution operator I boils down to the use of a template convolution kernel K, which is then applied/re-used at every location z ∈ M.

Definition F.22 (Template convolution kernel). Let M be a manifold of dimension d and A_in and A_out two vector bundles over M with typical fibres W_in and W_out, resp. A template convolution kernel for (M, A_in, A_out) is then a (sufficiently smooth, non-linear) map:

   K : R^d → Hom_Vec(W_in, W_out),    (129)

that is sufficiently decaying when moving away from the origin 0 ∈ R^d (to make all later constructions, like convolution operations, etc., well-defined).

The G-gauge equivariance of a convolution operator I is encoded by the following G-steerability of the template convolution kernel.

Definition F.23 (G-steerability convolution kernel constraints). Let G ≤ GL(d) be a closed subgroup and (M, G, η) be a G-structured pseudo-Riemannian manifold of signature (p, q), d = p + q, and A_in and A_out two G-associated vector bundles with typical fibres (W_in, ρ_in) and (W_out, ρ_out), resp. A template convolution kernel K for (M, A_in, A_out):

   K : R^d → Hom_Vec(W_in, W_out),    (130)

will be called G-steerable if for all g ∈ G and v ∈ R^d we have:

   K(g v) = (1 / |det g|) · ρ_out(g) K(v) ρ_in(g)⁻¹    (131)
          =: ρ_Hom(g)(K(v)).    (132)

Remark F.24. Note that the G-steerability of K is expressed through Equation (131), while the G-gauge equivariance of K will, more closely, be expressed through the re-interpretation in Equation (132).

Definition F.25 (Convolution operator, see Weiler et al. (2023) Thm. 12.2.9). Let (M, G, η) be a G-structured pseudo-Riemannian manifold, A_in and A_out two G-associated vector bundles over M with typical fibres (W_in, ρ_in) and (W_out, ρ_out), and K a G-steerable template convolution kernel, see Equation (131). Let f_in ∈ Γ(A_in) and consider a local trivialization (Ψ^C, U^C) ∈ A^G around z ∈ U^C ⊆ M (which locally trivializes A_in and A_out). Then we have a well-defined convolution operator:

   L : Γ(A_in) → Γ(A_out),  f_in ↦ L(f_in) := f_out,    (133)

given by the local formula:

   f^C_out(z) := ∫_{R^d} K(v^C) [Exp*_z f_in]^C(v^C) dv^C,    (134)

where Exp*_z is the transporter pullback from Definition F.20, exp_z denotes the pseudo-Riemannian exponential map (based on the Levi-Civita connection ∇^LC), and T^{A_in} is any transporter satisfying Equation (120) (e.g. parallel transport based on ∇^LC).

Remark F.26 (Coordinate independence of the convolution operator). The coordinate independence of the convolution operator L : Γ(A_in) → Γ(A_out) comes from the following covariance relations and Equation (131). If we use a different local trivialization (Ψ^B, U^B) ∈ A^G in Equation (134) with z ∈ U^B ∩ U^C, then there exists a g ∈ G such that:

   v^C = g v^B ∈ R^d,    (135)
   dv^C = |det g| · dv^B,    (136)
   [Exp*_z f_in]^C(v^C) = ρ_in(g) [Exp*_z f_in]^B(v^B) ∈ W_in,    (137)
   f^C_out(z) = ρ_out(g) f^B_out(z) ∈ W_out.    (138)

So, f_out : M → A_out is a well-defined global section in Γ(A_out).
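To make the constraint concrete, the following minimal sketch (using NumPy; the Gaussian-envelope kernel and the choice of representations are illustrative toys, not the Clifford-algebra-valued kernels of the paper) numerically verifies the G-steerability condition of Equation (131) for G = SO(2) acting on a scalar-input, vector-output kernel:

```python
import numpy as np

def rot(theta):
    """Rotation matrix in SO(2), so |det g| = 1."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def K(v):
    """Toy steerable kernel R^2 -> Hom_Vec(R, R^2): scalar in, tangent vector out.
    K(v) = v * exp(-|v|^2), stored as a (2, 1) matrix."""
    return (v * np.exp(-v @ v)).reshape(2, 1)

rng = np.random.default_rng(0)
g = rot(rng.uniform(0.0, 2.0 * np.pi))
v = rng.normal(size=2)

# Equation (131): K(g v) = (1/|det g|) rho_out(g) K(v) rho_in(g)^{-1}.
# Here rho_in is trivial (= 1) and rho_out(g) = g, so it reduces to K(g v) = g K(v).
lhs = K(g @ v)
rhs = (1.0 / abs(np.linalg.det(g))) * g @ K(v)
assert np.allclose(lhs, rhs)
print("steerability constraint (131) holds:", np.allclose(lhs, rhs))
```

Since this K depends on v only through the rotation-invariant norm |v| and the vector v itself, the same check also passes for reflections in O(2); the |det g| factor only becomes essential for groups with |det g| ≠ 1.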
We are finally in a position to state the main theorem of this section: every G-steerable template convolution kernel leads to an isometry equivariant convolution operator.

Theorem F.27 (Isometry equivariance of convolution operator, see Weiler et al. (2023) Thm. 13.2.6). Let G ≤ GL(d) be a closed subgroup and (M, G, η) be a G-structured pseudo-Riemannian manifold of signature (p, q) with d = p + q. Let A_in and A_out be two G-associated vector bundles with typical fibres (W_in, ρ_in) and (W_out, ρ_out). Let K be a G-steerable template convolution kernel, see Equation (131). Consider the corresponding convolution operator L : Γ(A_in) → Γ(A_out) given by Equation (134), where exp_z denotes the pseudo-Riemannian exponential map (based on the Levi-Civita connection ∇^LC) and T^{A_in} is any transporter satisfying Equation (120) (e.g. parallel transport based on ∇^LC).

Then the convolution operator L : Γ(A_in) → Γ(A_out) is equivariant w.r.t. the G-structure preserving isometry group Isom(M, G, η): for every ϕ ∈ Isom(M, G, η) and f_in ∈ Γ(A_in) we have:

   L(ϕ ▷ f_in) = ϕ ▷ L(f_in).

Definition F.29 (Orthonormal frame bundle of signature (p, q)). Let (M, η) be a pseudo-Riemannian manifold of signature (p, q) and dimension d = p + q. Abbreviate for indices i, j ∈ [d]:

   δ^{p,q}_{i,j} := 0 if i ≠ j,  +1 if i = j ∈ [1, p],  −1 if i = j ∈ [p + 1, d].    (141)

Then the orthonormal frame bundle of signature (p, q) is defined as:

   OM := ⨆_{z ∈ M} O_z M,    (142)

where we put:

   O_z M := { [e_1, …, e_d] | ∀ j ∈ [d]: e_j ∈ T_z M,    (143)
              ∀ i, j ∈ [d]: η_z(e_i, e_j) = δ^{p,q}_{i,j} }.    (144)
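As an illustration of Definition F.29 (a minimal hypothetical example, not from the paper): on flat R^{1,1} with η = diag(+1, −1), the columns of any Lorentz boost form an η-orthonormal frame in the sense of Equations (143)-(144):

```python
import numpy as np

# Minkowski metric of signature (p, q) = (1, 1): eta = diag(+1, -1),
# i.e. eta(e_i, e_j) should equal delta^{p,q}_{i,j} from Equation (141).
eta = np.diag([1.0, -1.0])

def boost(a):
    """Lorentz boost with rapidity a, an element of O(1, 1)."""
    return np.array([[np.cosh(a), np.sinh(a)],
                     [np.sinh(a), np.cosh(a)]])

B = boost(0.7)
e1, e2 = B[:, 0], B[:, 1]  # candidate frame: the columns of the boost

# Conditions (143)-(144): the Gram matrix of the frame w.r.t. eta equals eta itself,
# using cosh^2 - sinh^2 = 1.
gram = np.array([[ei @ eta @ ej for ej in (e1, e2)] for ei in (e1, e2)])
assert np.allclose(gram, eta)
print("boosted frame is eta-orthonormal:", np.allclose(gram, eta))
```

Equivalently, the check B.T @ eta @ B == eta restates the defining condition of O(p, q) from Equation (4), so O(1, 1) acts on the fibres O_z M of the frame bundle.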
We can now describe the induced map ϕ_{*,Cl(TM,η)} via the general construction on associated vector fields, see Remark F.8, with help of the identification Equation (147):

   ϕ_{*,Cl(TM,η)} : OM ×_{ρ_Cl} Cl(R^{p,q}) → OM ×_{ρ_Cl} Cl(R^{p,q}),
   ϕ_{*,Cl(TM,η)}(e, x) = (ϕ_{*,FM}(e), x),    (149)

or we can look at the fibres directly, z ∈ M:

   ϕ_{*,Cl(TM,η),z} : Cl(T_z M, η_z) → Cl(T_{ϕ(z)} M, η_{ϕ(z)}),
   ϕ_{*,Cl(TM,η),z}( Σ_{i ∈ I} c_i · v_{i,1} • ⋯ • v_{i,k_i} )
      = Σ_{i ∈ I} c_i · ϕ_{*,TM,z}(v_{i,1}) • ⋯ • ϕ_{*,TM,z}(v_{i,k_i}).    (150)

Furthermore, the corresponding convolution operator L : Γ(A_in) → Γ(A_out), given by Equation (134), is equivariant w.r.t. the full isometry group Isom(M, η): for every ϕ ∈ Isom(M, η) and f_in ∈ Γ(A_in) we have:

   L(ϕ ▷ f_in) = ϕ ▷ L(f_in).    (156)

Remark F.34. A similar theorem to Theorem F.33 can be stated for orientable pseudo-Riemannian manifolds (M, η) and structure group G = SO(p, q), if one reduces the Clifford group equivariant neural network parameterizing the kernel network K to be (only) SO(p, q)-equivariant.
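The multiplicative extension in Equation (150) can be checked in coordinates on a toy example (assumed NumPy; the two-dimensional Euclidean Clifford algebra Cl(2, 0) is an illustrative choice, not the general setting). The geometric product of two tangent vectors, u • v = ⟨u, v⟩ + (u₁v₂ − u₂v₁) e₁₂, has a scalar part that is rotation invariant and a bivector part that scales by det g, which equals 1 for g ∈ SO(2); pushing the factors forward and multiplying then agrees with the product of the original vectors componentwise:

```python
import numpy as np

def geo_product(u, v):
    """Geometric product of two vectors in Cl(2, 0), Euclidean metric:
    u . v = <u, v> * 1 + (u_1 v_2 - u_2 v_1) * e_12,
    returned as [scalar part, bivector part]."""
    return np.array([u @ v, u[0] * v[1] - u[1] * v[0]])

def rot(theta):
    """Rotation matrix in SO(2) (det = 1)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(1)
g = rot(rng.uniform(0.0, 2.0 * np.pi))
u, v = rng.normal(size=2), rng.normal(size=2)

# Equation (150) extends g multiplicatively: Cl(g)(u . v) := (g u) . (g v).
# rho_Cl(g) is the identity on scalars and multiplies the bivector by det g = 1,
# so both components of (g u) . (g v) must coincide with those of u . v.
assert np.allclose(geo_product(g @ u, g @ v), geo_product(u, v))
print("Cl(g)(u . v) = (g u) . (g v):", np.allclose(geo_product(g @ u, g @ v), geo_product(u, v)))
```

For a reflection (det g = −1) the bivector part would flip sign instead, matching ρ_Cl acting by det g on the top grade.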