
arXiv:2506.11732v1 [math.NA] 13 Jun 2025

CIME 2023

Machine Learning: From Data to Mathematical Understanding

Data-driven approaches to inverse problems

Carola-Bibiane Schönlieb and Zakhar Shumaylov

June 16, 2025

University of Cambridge
Abstract
Inverse problems are concerned with the reconstruction of unknown physical quantities
using indirect measurements and are fundamental across diverse fields such as medical
imaging (MRI, CT), remote sensing (Radar), and material sciences (electron microscopy).
These problems serve as critical tools for visualizing internal structures beyond what is
visible to the naked eye, enabling quantification, diagnosis, prediction, and discovery.
However, most inverse problems are ill-posed, necessitating robust mathematical treat-
ment to yield meaningful solutions. While classical approaches provide mathematically
rigorous and computationally stable solutions, they are constrained by the ability to ac-
curately model solution properties and implement them efficiently.

A more recent paradigm considers deriving solutions to inverse problems in a data-driven
manner. Instead of relying on classical mathematical modeling, this approach utilizes
highly over-parameterized models, typically deep neural networks, which are adapted to
specific inverse problems using carefully selected training data. Current approaches that
follow this new paradigm distinguish themselves through solution accuracy paired with
computational efficiency that was previously inconceivable.

These notes offer an introduction to this data-driven paradigm for inverse problems, cov-
ering methods such as data-driven variational models, plug-and-play approaches, learned
iterative schemes (also known as learned unrolling), and learned post-processing. The
first part of these notes will provide an introduction to inverse problems, discuss classical
solution strategies, and present some applications. The second part will delve into modern
data-driven approaches, with a particular focus on adversarial regularization and provably
convergent linear plug-and-play denoisers. Throughout the presentation of these method-
ologies, their theoretical properties will be discussed, and numerical examples will be
provided for image denoising, deconvolution, and computed tomography reconstruction.
The lecture series will conclude with a discussion of open problems and future perspectives
in the field.

Contents

1 Introduction to Inverse Problems
  1.0.1 Examples of Inverse problems
  1.1 Well-posed and ill-posed problems
  1.2 Overcoming the ill-posedness
    1.2.1 Regularization in functional analysis
    1.2.2 Variational Regularization
    1.2.3 Regularization as maximum a-posteriori estimation

2 Variational Models and PDEs for Inverse Imaging
  2.1 Variational Models
  2.2 Total Variation (TV) regularization
  2.3 From Total Variation Regularization to Nonlinear PDEs
  2.4 Numerical Aspects
    2.4.1 Dual Problem
    2.4.2 Proximal descent
    2.4.3 Forward-Backward Splitting
    2.4.4 Primal-Dual Hybrid Gradient descent
  2.5 Regularizer Zoo
    2.5.1 Regularizer Zoo: Electric boogaloo
  2.6 Limitations and move towards data-driven

3 Data-Driven Approaches to Inverse Problems
  3.0.1 Knowledge-Driven vs. Data-Driven Models
  3.0.2 The Black Box of Deep Learning
  3.1 Learned Iterative Reconstruction Schemes
    3.1.1 Limitations and Challenges
    3.1.2 Deep Equilibrium Networks
  3.2 Learned Variational Models
    3.2.1 Learning the regularizer
    3.2.2 In-Depth: Adversarial regularization
  3.3 Plug-and-Play (PnP) Methods
    3.3.1 Theoretical Properties
    3.3.2 In-depth: Linear Denoiser Plug and Play
  3.4 Outlooks

4 Perspectives
  4.1 On Task Adaptation
  4.2 The Data Driven - Knowledge Informed Paradigm

Bibliography


List of Figures

1.1 An overview of various fundamental image processing tasks.
1.2 Overview of various biological, biomedical and clinical research applications using image analysis.
1.3 Applications in conservation, sustainability, and digital humanities, showcasing various remote sensing and image analysis techniques.
1.4 Applications in Physical Sciences, including materials science, computational fluid dynamics, astrophysics, and geophysics.
1.5 Illustration of non-uniqueness in CT reconstruction (left) and a description of ill-posedness in inverse problems (right). Courtesy of Samuli Siltanen.
1.6 Regularization visualized.

2.1 Examples of different noise models and corresponding data fidelity terms, with example images. See works by Werner and Hohage [2012], Hohage and Werner [2014].
2.2 Example of TV denoised image of rectangles. The total variation penalizes small irregularities/oscillations while respecting intrinsic image features such as edges.
2.3 Properties of Total Variation Smoothing.
2.4 Comparison of regularization methods, showcasing that the convex relaxation of the ℓ0 term of TV with ℓ1 successfully achieves sparsity and is a more natural prior for denoising.
2.5 MRI reconstruction example: (a) Ground truth Shepp-Logan phantom. (b) Undersampled k-space (Fourier) data. (c) Reconstruction via zero-filling the undersampled k-space and inverse Fourier transform. (d) Reconstruction using a Total Variation (TV) regularized approach.
2.6 Example of binary Chan–Vese segmentation compared to Mumford–Shah segmentation [Mumford and Shah, 1989, Pock et al., 2009, Getreuer, 2012].

3.1 Limited angle CT reconstruction: a heavily ill-posed problem. Deep learning cannot do magic and also hits the boundaries of what is mathematically possible. A fully learned method, LPD (Section 3.1), in (d) begins hallucinating, as highlighted in red boxes, despite resulting in better performance metrics (here PSNR and SSIM).
3.2 Sparse view CT reconstruction: top row is based on mathematical/hand-crafted models; bottom row uses novel deep learning based models. For this problem, deep learning methods result in both improved metrics (here PSNR and SSIM) and visually better reconstructions. Photo courtesy of Subhadip Mukherjee [Mukherjee et al., 2023].
3.3 Learned Iterative Schemes Schematic.
3.4 Learned Primal Dual Schematic.
3.5 Illustration of artifacts appearing whenever learned operators are applied repeatedly without convergence guarantees. Example borrowed from [Gilton et al., 2021a].
3.6 Comparison of CT reconstructions: (a) a good quality reconstruction, (b) the corresponding sinogram data, and (c) a poor quality reconstruction.
3.7 Diagram illustrating the concept of spectral filtering. From [Hauptmann et al., 2024].
3.8 Spectral Filtering for Convergent Regularisation.

4.1 Biomedical imaging pathway: the path from imaging data acquisition to prediction, diagnosis, and treatment planning features several processing and analysis steps which usually are performed sequentially. CT data and segmentation are courtesy of Evis Sala and Ramona Woitek.
4.2 Task-adapted reconstruction, with CNN-based MRI reconstruction (task X) and CNN-based MRI segmentation (task D). Both are trained jointly with combined loss CℓX + (1 − C)ℓD for varying C ∈ [0, 1]. All figures from [Adler et al., 2022].
Chapter 1

Introduction to Inverse Problems


Inverse problems arise in a wide variety of scientific fields, from medical imaging and geo-
physics to finance and astronomy, where one is faced with the task of inferring information
about an unknown object of interest from observed indirect measurements. Common fea-
tures of inverse problems include the need to understand indirect measurements, and to
overcome extreme sensitivity to noise and inaccuracies arising due to imperfect modeling.
Knowledge-driven approaches traditionally dominated the field, relying on first-principles
to derive physical models, leading to mathematically grounded reconstruction methods.
However, the past decade has witnessed a paradigm shift towards data-driven methods,
particularly with the advent of deep learning. While these data-driven approaches have
achieved remarkable empirical success in image reconstruction, they often lack rigorous
theoretical guarantees. In these notes we will examine the data-driven paradigm through
a mathematical lens, presenting some state of the art methods, covering their provable
properties and theoretical limitations.

(a) Image Denoising: given a noisy image y = u + n, the task is to compute a denoised version u∗ [Ke and Schönlieb, 2021]. (b) Image Segmentation: given an image u on domain Ω, compute the characteristic function χS of a region of interest S ⊂ Ω [Grah et al., 2017]. (c) Image Reconstruction: compute u from indirect, noisy measurements y = A(u) + n, with A a known operator [Benning and Burger, 2018, Arridge et al., 2019]. (d) Image Classification: given a set of images ui, the task is to assign appropriate labels yi to each image [Aviles-Rivero et al., 2019].

Fig. 1.1: An overview of various fundamental image processing tasks.

In this chapter, we begin by exploring the concept of well-posedness and its significance in
the context of inverse problems. This will lead us to the notion of ill-posedness of inverse
problems. These are often characterized by high sensitivity to noise, meaning even small
errors or perturbations in the input data can lead to large variations in the solution. We
will then investigate a range of knowledge-driven regularization techniques designed to
mitigate the effects of ill-posedness and stabilize the solution process.


1.0.1 Examples of Inverse problems


Before delving into the mathematical details, it’s important to stress that the concepts
discussed here are deeply rooted in real-world applications. Inverse problems form the fun-
damental language of numerous scientific and engineering disciplines, with some examples
illustrated in Figures 1.2 to 1.4. To illustrate their complexity and wide-ranging applica-
bility, let’s consider several notable examples, mostly focusing on imaging problems, which
often focus on four fundamental tasks illustrated in Figure 1.1.

• Bio-Medical Imaging (Computed Tomography, MRI): Techniques such as Computed


Tomography (CT) [Natterer and Wübbeling, 2001] and Magnetic Resonance Imag-
ing (MRI) [Lustig et al., 2008, Fessler, 2008] serve as fundamental examples of in-
verse problems in action. These methods reconstruct detailed internal images of the
human body from externally collected data, such as X-ray attenuation profiles or
magnetic resonance signals, see e.g. Figure 1.1c. The challenge lies in accurately de-
termining internal structures from these measurements, which are often incomplete
or corrupted by noise due to various factors like patient movement or limitations
on radiation exposure. Beyond these foundational examples, the scope of inverse
problems in biomedical and biological imaging is vast, with various modalities and
objectives, as illustrated in Figure 1.2. This diversity includes other primary imag-
ing modalities like Positron Emission Tomography (PET) (Figure 1.2a), as well as
more complex imaging scenarios such as spatio-temporal MRI, which incorporates
additional dynamical information to capture changes over time (Figure 1.2d). Fur-
thermore, solutions of inverse problems are often used for subsequent quantitative
analysis and clinical decision-making. Examples of such downstream applications in-
clude the detailed analysis of cellular processes, such as mitosis analysis (Figure 1.2c)
and the estimation of cell dynamics (Figure 1.2b), as well as clinical support tools
like automated tumor segmentation (Figure 1.2e) and systems for aiding diagnosis
or prognosis from X-ray data (Figure 1.2f). The applications of inverse problems
extend even further into broader biological research, including studies in zoology
(Figure 1.2g) and molecular biology (Figure 1.2h), underscoring their wide-ranging
impact.

• General Image Processing: Image processing more generally involves solving mul-
tiple intertwined inverse problems, with many applications spanning environmental
conservation, remote sensing, and digital humanities, as illustrated in Figure 1.3. For
instance, in conservation and environmental science, LiDAR data can be used for
detailed tree monitoring and forest assessment (Figure 1.3a), or multispectral and
hyperspectral imagery for landcover analysis (Figure 1.3b). Remote sensing data,
coupled with image processing, also aids in understanding dynamical systems, such
as analyzing traffic flow for urban planning and infrastructure management (Fig-
ure 1.3c). Multi-modal image fusion (Figure 1.3d), with data from different sensors,
can be used for improving data representations, applicable in many fields ranging
from remote sensing to medical imaging. In the realm of digital humanities, compu-
tational image processing plays a vital role for virtual art restoration and interpreta-
tion (Figure 1.3e), where imaging can help unveil hidden details, analyze materials,
or digitally restore damaged cultural heritage artifacts. These varied applications
all rely on extracting meaningful information from image data, often necessitating
a range of image processing steps such as image reconstruction, enhancement, seg-
mentation, feature extraction, deblurring, denoising and registration, many of which
can be formulated as inverse problems.

Fig. 1.2: Overview of various biological, biomedical and clinical research applications using image analysis. (a) PET imaging [Ehrhardt et al., 2019, Chambolle et al., 2018]. (b) Estimating cell dynamics [Drechsler et al., 2020]. (c) Mitosis analysis [Grah et al., 2017]. (d) Spatio-temporal MRI [Aviles-Rivero et al., 2018, 2021]. (e) Tumour segmentation [Buddenkotte et al., 2023]. (f) X-Ray diagnosis and prognosis [Aviles-Rivero et al., 2019]. (g) Zoology [Calatroni et al., 2017b]. (h) Molecular biology [Diepeveen et al., 2023, Esteve-Yagüe et al., 2023].

• Physical Sciences: Inverse problems are foundational across the physical sciences,
enabling researchers to probe and understand phenomena from the vastness of cos-
mic structures down to the intricacies of material microstructures, as illustrated in
Figure 1.4. In astrophysics, for instance, inverse problems arise when reconstructing
images of distant celestial objects from data collected by telescopes [Starck, 1998].
This process is complicated by vast distances, interference from various light sources,
faint signals, and atmospheric disturbances. A particularly famous example is the
first imaging of a black hole (Figure 1.4c) [Akiyama et al., 2019].

Similarly, geophysics utilizes seismic tomography (Figure 1.4d) to create images of


the Earth’s subsurface by analyzing seismic waves from earthquakes or controlled
sources [Biegler et al., 2011, 2003, Haber, 2014]. The inherent ill-posedness of this
problem, due to complex geological layers, noisy and limited view data, necessitates
regularization methods. The applications of inverse problem methodologies also ex-
tend to material sciences, where they are used to characterize material properties
or analyze microstructures from indirect measurements (Figure 1.4a) [Tovey et al.,
2019], and to computational fluid dynamics, for estimating flow parameters or re-
constructing complex flow fields from limited sensor data (Figure 1.4b) [Benning
et al., 2014].
Fig. 1.3: Applications in conservation, sustainability, and digital humanities, showcasing various remote sensing and image analysis techniques. (a) Tree monitoring with LiDAR [Lee et al., 2015, 2016]. (b) Landcover analysis [Sellars et al., 2019]. (c) Analysing traffic (EPSRC project). (d) Multi-modal image fusion [Bungert et al., 2018]. (e) Virtual art restoration and interpretation: Mathematics for Applications in Cultural Heritage (MACH), https://mach.maths.cam.ac.uk, funded by the Leverhulme Trust [Calatroni et al., 2018, Parisotto et al., 2019, 2020].
Fig. 1.4: Applications in Physical Sciences, including materials science, computational fluid dynamics, astrophysics, and geophysics. (a) Material sciences [Tovey et al., 2019]. (b) Computational fluid dynamics [Benning et al., 2014]. (c) Black hole imaging [Akiyama et al., 2019]. (d) Seismic tomography (EarthScope Consortium).

1.1 Well-posed and ill-posed problems


An inverse problem is the problem of finding u satisfying the equation

y = Au,

where y ∈ Rm is given. We refer to y as the observed data or measurement, and to u as the
unknown. The physical phenomenon relating the unknown and the measurement is
modeled by a linear operator A. The function spaces for u and A will be specified later
when particular inverse problems or reconstruction approaches are being discussed. In
reality, the perfect data is perturbed by noise. For an additive noise model, we observe
measurements of the form
yn = Au + n, (1.1)
where n ∈ Rm represents the observational noise. The main question that we concern
ourselves with is: “Can we always compute a reliable answer u?” We are often interested
in ill-posed inverse problems, where the inverse problem is more difficult to solve than the
forward problem of finding yn when u is given. To explain what we mean by ill-posed, we
first need to introduce well-posedness as defined by Hadamard [1902]:

Definition 1.1.1 (Well-posed problem). A problem is called well-posed if:

• There exists at least one solution. (Existence)


• There is at most one solution. (Uniqueness)
• The solution depends continuously on data. (Stability)

While the forward problem is generally assumed to be well-posed, inverse problems are
typically ill-posed, meaning they violate at least one of the conditions for well-posedness.
This focus on ill-posedness arises from physical motivations rather than from an abstract
concern; it stems directly from the fact that a vast majority of practical problems in science
and engineering are indeed ill-posed. The following simple examples illustrate common
issues that arise when these conditions are not met.

Example 1.1.2. • (Non-existence) Assume that n < m and A : Rn → ran(A) ⊊
Rm , where the range of A is a proper subset of Rm . Because of the noise in the
measurement, it may happen that yn ∉ ran(A), so that inverting A is not possible.
Note that usually only the statistical properties of the noise n are known, so we
cannot simply subtract it.

• (Non-uniqueness) Assume next that n > m and A : Rn → Rm , in which case the
system is under-determined. We then have more unknowns than equations, which
means that there may be infinitely many solutions.

• (Instability) Finally suppose that n = m, and that there exists an inverse A−1 :
Rm → Rn . Suppose further that the condition number κ = λ1 /λm is very large,
where λ1 and λm are the biggest and smallest eigenvalues of A. Such a matrix is
said to be ill-conditioned. In this case, the problem is sensitive to even the smallest
errors in the measurement. The naive reconstruction ũ = A−1 yn = u + A−1 n would
not produce a meaningful solution, but would instead be dominated by the noise
component A−1 n.
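The effect of ill-conditioning is easy to reproduce numerically. The following minimal NumPy sketch (an illustration added here, not part of the original formulation) constructs a synthetic symmetric matrix with rapidly decaying eigenvalues and compares the size of the data perturbation with the size of the resulting error in the naive reconstruction ũ = A⁻¹yn; the dimensions, spectrum and noise level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a synthetic symmetric matrix with rapidly decaying eigenvalues,
# so that the condition number kappa = lambda_1 / lambda_m is very large.
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigenvalues = np.logspace(0, -8, n)          # from 1 down to 1e-8
A = Q @ np.diag(eigenvalues) @ Q.T

u_true = rng.standard_normal(n)
y_clean = A @ u_true
noise = 1e-6 * rng.standard_normal(n)        # tiny perturbation of the data
y_noisy = y_clean + noise

u_naive = np.linalg.solve(A, y_noisy)        # naive reconstruction A^{-1} y_n

print("condition number        :", np.linalg.cond(A))
print("relative data error     :", np.linalg.norm(noise) / np.linalg.norm(y_clean))
print("relative solution error :", np.linalg.norm(u_naive - u_true) / np.linalg.norm(u_true))
# The solution error is many orders of magnitude larger than the data error:
# the reconstruction is dominated by the amplified noise component A^{-1} n.
```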

The last example illustrates one of the key questions of inverse problem theory:

How to stabilize the reconstruction while maintaining accuracy?

Example 1.1.3 (Blurring in the continuum). One common and illustrative inverse prob-
lem encountered in image processing is deblurring. Imagine we have an image that has
been blurred, perhaps by camera motion or an out-of-focus lens. Our goal is to recover the
original, sharp image. This seemingly straightforward task quickly reveals the challenges
inherent in many inverse problems.

Let us consider this in a continuous one-dimensional setting. Suppose our observed blurred
function, y(x) : R → R, results from the convolution of an original sharp function, u(x),
with a blurring kernel, i.e. y(x) = (Gσ ∗ u) (x), where

Gσ(x) := 1/(2πσ²) · exp(−|x|²/(2σ²))     (the Gaussian kernel)
with standard deviation σ, dictating the extent of the Gaussian blur. Our objective is
to reconstruct the original sharp function u from the observed blurred function y. This
turns out to be equivalent to inverting the heat equation. To be precise, the blurred
measurement y(x) can be seen as a solution to the heat equation at a specific time t = σ 2 /2,
with u(x) being the initial condition. Therefore, attempting to retrieve u(x) from y(x) is
analogous to solving the heat equation backward in time. Ill-posedness in this example
arises from a lack of continuous dependence of the solution on the data: small errors in
the measurement y can lead to very large errors in the reconstructed u. From Fourier
theory, we can write the following, where F and F −1 denote the Fourier transform and
inverse Fourier transform respectively:

y = √(2π) F⁻¹(FGσ · Fu),
u = (1/√(2π)) F⁻¹(Fy / FGσ).

Instead of measuring a blurry y, suppose now that we measure a blurry and noisy yδ =
y + nδ , with deblurred solution uδ . Then,

√(2π) |u − uδ| = |F⁻¹(F(y − yδ) / FGσ)| = |F⁻¹(Fnδ / FGσ)|.

Now, for high-frequencies, F (nδ ) will be large while FGσ will be small (since Gσ is a
compact operator). Hence, the high frequency components in the error are amplified!
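The same amplification can be observed in a discrete sketch of the deblurring example, in which the discrete Fourier transform stands in for F; the kernel width and noise level below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

n, sigma = 256, 0.007
x = np.linspace(0.0, 1.0, n, endpoint=False)
u = (np.abs(x - 0.5) < 0.2).astype(float)            # sharp 1D "image": an indicator

# Periodic Gaussian blur, implemented as multiplication in Fourier space.
freqs = np.fft.fftfreq(n, d=1.0 / n)
kernel_hat = np.exp(-2.0 * (np.pi * sigma * freqs) ** 2)   # FT of a unit-mass Gaussian
y = np.fft.ifft(kernel_hat * np.fft.fft(u)).real
y_noisy = y + 1e-3 * rng.standard_normal(n)          # small additive noise

# Naive deconvolution: divide by the kernel in Fourier space.
u_from_clean = np.fft.ifft(np.fft.fft(y) / kernel_hat).real
u_from_noisy = np.fft.ifft(np.fft.fft(y_noisy) / kernel_hat).real

print("max error, noiseless data:", np.max(np.abs(u_from_clean - u)))
print("max error, noisy data    :", np.max(np.abs(u_from_noisy - u)))
# At high frequencies kernel_hat is tiny, so the noise is amplified enormously,
# while the noiseless data is inverted almost exactly.
```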

In many practical linear inverse problems, the condition of continuity is the first to break
down. This failure of continuous dependence of the solution on the data leads to extreme
amplification of noise. In addition, the uniqueness condition often fails in under-sampled
inverse problems. While under-sampling has physical advantages, such as faster data
acquisition or reduced exposure (e.g., in medical imaging), the trade-off is significant
noise amplification, thereby making the ill-posedness even more severe.

Example 1.1.4 (Computed Tomography). In almost any tomography application, the


underlying inverse problem is either the inversion of the Radon transform or of the X-ray

transform. Here, we primarily follow Sherry [2025]. For u ∈ C0∞(R²), s ∈ R, and θ ∈ S¹,
the Radon transform R : C0∞(R²) → C∞(S¹ × R) can be defined as the integral operator

f(θ, s) = (Ru)(θ, s) = ∫_{x·θ=s} u(x) dx = ∫_{θ⊥} u(sθ + y) dy

for θ ∈ S1 and θ⊥ being the vector orthogonal to θ. Effectively the Radon transform
in two dimensions integrates the function u over lines in R2 . Since S 1 is the unit circle
S¹ = {θ ∈ R² | ∥θ∥ = 1}, we can choose for instance θ = (cos(φ), sin(φ))⊤, for φ ∈ [0, 2π),
and parameterize the Radon transform in terms of φ and s, i.e.

f(φ, s) = (Ru)(φ, s) = ∫_R u(s cos(φ) − t sin(φ), s sin(φ) + t cos(φ)) dt.

Note that with respect to the origin of the reference coordinate system, φ determines the
angle of the line along which one wants to integrate, while s is the offset of that line from the
center of the coordinate system. It can be shown that the Radon transform is linear and
continuous, and even compact. Visually, CT simply turns images into sinograms:

R(image) + n = (sinogram)
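As a hands-on illustration (assuming scikit-image as an additional dependency, with the radon, iradon and shepp_logan_phantom routines of recent versions), the sketch below computes a sparse-angle sinogram and a naive filtered back-projection of noisy data; the number of angles and the noise level are arbitrary.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, rescale

rng = np.random.default_rng(2)

u = rescale(shepp_logan_phantom(), scale=0.5)            # 200 x 200 phantom
angles = np.linspace(0.0, 180.0, 60, endpoint=False)     # sparse set of projection angles

sinogram = radon(u, theta=angles, circle=False)          # forward Radon transform R u
noisy = sinogram + 0.05 * sinogram.max() * rng.standard_normal(sinogram.shape)

# Naive (unregularized) filtered back-projection with the default ramp filter.
u_fbp = iradon(noisy, theta=angles, circle=False)

print("sinogram shape:", sinogram.shape)
print("relative reconstruction error:",
      np.linalg.norm(u_fbp - u) / np.linalg.norm(u))
```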

Radon Transform Inversion is Ill-Posed While enabling many applications, in prac-


tice inversion of the Radon transform can be shown to be ill-posed [Hertle, 1981], by con-
sidering singular values of the operator to show unboundedness. A practical illustration
of this ill-posedness is shown in Figure 1.5.

Informally, following [Epstein, 2007, Candès, 2021a], we can construct eigenfunctions of the
operator R∗ R, where R∗ is the adjoint of the Radon transform. For a formal presentation
see [Candès, 2021b]. For g(x) = ei⟨k,x⟩ , we have:

R∗R[g](x) = (1/∥k∥) e^(i⟨k,x⟩).

Thus, g is an eigenfunction of R∗R with eigenvalue 1/∥k∥, meaning that the singular values
of R are 1/√∥k∥. As these tend to 0, the inverse is unbounded and the problem is ill-posed.

1.2 Overcoming the ill-posedness


The primary strategy to overcome the ill-posedness is not to solve the original ill-posed
problem directly, but rather to formulate and solve a related well-posed problem whose
solution is a good approximation of the true, underlying solution we seek. This is generally
called regularization.
Fig. 1.5: Illustration of non-uniqueness in CT reconstruction (left) and a description of ill-posedness in inverse problems (right): A⁻¹ is not continuously invertible (unbounded or discontinuous); typical reasons are noise, undersampling, nonlinearity, etc. Courtesy of Samuli Siltanen.

1.2.1 Regularization in functional analysis


Considering more generally a bounded linear operator A ∈ L(U, V) between two normed
spaces U and V, the natural choice for the inverse is the Moore–Penrose inverse A† (see
e.g. [Sherry, 2025]); however, when im(A) is not closed, the pseudo-inverse is not bounded:
given noisy data yδ such that ∥yδ − y∥ ⩽ δ, we cannot expect convergence A†yδ → A†y as
δ → 0. To achieve convergence, A† is replaced with a family of well-posed reconstruction
operators Rα with α = α(δ, yδ), and we require that Rα(δ,yδ)(yδ) → A†y for all
y ∈ dom(A†) and all yδ ∈ V s.t. ∥y − yδ∥ ⩽ δ as δ → 0.

Definition 1.2.1. Let A ∈ L(U, V) be a bounded operator. A family {Rα }α>0 of contin-
uous operators is called a regularization (or regularization operator) of A† if

Rα y → A† y = u†

for all y ∈ dom(A†) as α → 0. In other words, a regularization is a pointwise approxima-
tion of the Moore–Penrose inverse with continuous operators.

1.2.2 Variational Regularization


An influential technique to overcome the ill-posedness of Equation (1.1), ensuring that
reconstructions are regularization operators, has been the variational approach pioneered
by Tikhonov [1963] and Phillips [1962]. Here, the inverse problem is re-framed as an
optimization task of minimizing a carefully constructed “variational energy functional”.
The energy functional typically has two components, taking the form:

min_{u∈U} ∥Au − yn∥² + αR(u).

Here, α > 0 acts as a tuning parameter balancing the effect of the data fidelity term
∥Au − yn ∥2 , which ensures consistency with the observed measurements. The regulariza-
tion term R(u) aims to incorporate prior knowledge about the reconstruction, penalizing
u if it is not “realistic”.

The selection of an appropriate regularizer R and the tuning of the parameter α are
critical for designing effective regularization methods. By introducing regularization, we
aim to achieve more than just any solution: we seek a problem that is well-posed and
Fig. 1.6: Regularization visualized.

whose solution is a good approximation of the true solution. This is normally captured
through the properties of existence, uniqueness, stability and convergence. Here, we will
present this for the simple setting of finite dimensions and strongly convex regularizers,
and a proof for this setting can be found in Mukherjee et al. [2024].

Theorem 1.2.2. For a forward operator A : Rn → Rm and noisy measurement y δ =


Au∗ + n, assume that the noise is bounded ∥n∥ < δ, and let R be a strongly convex
regularization term. Let
u(yδ, α) = argmin_{u∈Rn} ∥Au − yδ∥² + αR(u),     (1.2)

then the following hold:

• Existence and Uniqueness: For all δ ≥ 0, α > 0, y ∈ Rm , the minimizer of


Equation (1.2) exists and is unique.
• Stability: For all sequences yk → y, u(yk , α) → u(y, α), i.e. u is continuous in y.
• Convergent Regularization: For a certain parameter rule α(δ) satisfying (δ, α(δ)) →
0, we have u(y δ , α(δ)) → u† , where

u† = argmin_{u∈Rn : Au=Au∗} R(u)

is the R-minimizing solution, generalizing Definition 1.2.1 to general inverses.

This naturally extends to more complex settings [Shumaylov et al., 2024, Pöschl, 2008,
Scherzer et al., 2009, Mukherjee et al., 2023]. In words, Existence, Uniqueness and Stability
ensure that Definition 1.1.1 is satisfied and the regularized inverse problem is well-posed.
Convergent Regularization on the other hand shows that “solution is close to the orig-
inal”, and thus that formulating the inverse problem using a variational formulation is
reasonable.
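For the prototypical strongly convex choice R(u) = ∥u∥², the minimizer of Equation (1.2) is available in closed form, u(yδ, α) = (AᵀA + αI)⁻¹Aᵀyδ, which makes the role of the parameter α easy to explore numerically. The sketch below is an added illustration of this special case, using an arbitrary random forward operator and ground truth.

```python
import numpy as np

rng = np.random.default_rng(3)

m, n = 80, 100                                     # fewer measurements than unknowns
A = rng.standard_normal((m, n)) / np.sqrt(m)
u_star = np.zeros(n); u_star[::10] = 1.0           # ground truth
delta = 1e-2
y_delta = A @ u_star + delta * rng.standard_normal(m)

def tikhonov(y, alpha):
    """Minimizer of ||Au - y||^2 + alpha*||u||^2, i.e. (A^T A + alpha I)^{-1} A^T y."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)

for alpha in [1e-4, 1e-2, 1e0]:
    u_alpha = tikhonov(y_delta, alpha)
    print(f"alpha = {alpha:.0e}   ||u_alpha - u*|| = {np.linalg.norm(u_alpha - u_star):.3f}")
# Too small an alpha lets the noise dominate, too large an alpha over-smooths;
# a parameter choice rule alpha(delta) -> 0 as delta -> 0 yields convergent regularization.
```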

1.2.3 Regularization as maximum a-posteriori estimation


Another way of tackling problems arising from ill-posedness is by adopting the Bayesian
perspective for inversion [Kaipio and Somersalo, 2006, Stuart, 2010]. The idea of statistical
inversion methods is to rephrase the inverse problem as a question of statistical inference.
12 Introduction to Inverse Problems

We consider the problem


Y = AU + N,
where the measurement Y , unknown U , and noise N are now modeled as random variables.
This approach allows us to model the noise through its statistical properties. Through this
lens, we can encode a-priori knowledge of the unknown in the form of a prior probability
distribution, assigning higher probability to those values of u we expect to see. The desired
solution then is no longer a vector; instead, it is a so-called posterior distribution, which is
the conditional probability distribution of u given measurement y. This distribution can
then be used to obtain estimates that are most likely in some sense. The main distinction
from the functional analytic approach is that in the former case, we attain only a single
solution; in the Bayesian viewpoint, a distribution of solutions is attained, from which
one can sample, consider point estimates such as means or modes, or achieve
uncertainty quantification by observing the variance of the posterior reconstructions.

Recall the Bayes theorem, which provides us with a way to statistically invert: for
y ∈ Rm , u ∈ Rn
p(u | y) = p(y | u) p(u) / p(y).
The likelihood p(y | u) is determined by the forward model and the statistics of the
measurement error, and p(u) encodes our prior knowledge on u. In practice, p(y) is
usually a normalizing factor which may be ignored. The maximum a-posteriori (MAP)
estimate is the maximizer u∗ of the posterior distribution:

p(u∗ | y) = max_u p(u | y) = max_u {p(y | u) p(u)}.

Regularization parameter and noise level This interpretation provides a useful connection between the
variational formulation and noise statistics. To be precise, the likelihood can be interpreted
as a fidelity term, and the prior as regularization. See e.g. [Pereyra, 2019, Tan et al.,
2024a]. For Gaussian noise, for example, the regularization parameter is given by the
variance of the noise model as shown below.

Example 1.2.3 (MAP for Gaussian noise). Consider an inverse problem y = Au + n,


where n is modeled as a zero-mean Gaussian n ∼ N (0, σ 2 Im ). Suppose we have a prior
distribution on the unknowns p(u) = e−R(u) . The posterior distribution for u knowing y
is given by Bayes:
p(u | y) = p(y | u) p(u) / p(y) ∝ exp(−(1/(2σ²)) ∥Au − y∥² − R(u)).
In this case, the MAP image reconstruction is the one that maximizes this probability, or
equivalently solves the variational regularization minimization problem
min_u { R(u) + (1/(2σ²)) ∥y − Au∥² }.
Chapter 2

Variational Models and PDEs for Inverse Imaging

In this chapter we will explore the relationship between variational models and the theory
of Partial Differential Equations (PDEs). This connection will equip us with powerful
analytical and computational tools for analyzing and solving inverse problems. We begin
by investigating the impact of various regularization choices, with a particular emphasis
on Total Variation (TV) regularization due to its efficacy in preserving edges - a property
fundamental in image processing tasks. Because regularizers like TV often result in non-
smooth optimization problems, we will then introduce key concepts from classical convex
analysis. This provides the essential numerical tools for minimizing such variational ener-
gies. These tools are fundamental to many established methods and have led to a diverse
“zoo” of regularizers designed for various imaging tasks. Finally, we will finish the sec-
tion by highlighting the inherent limitations of purely model-driven approaches, thereby
motivating the exploration of data-driven and hybrid techniques in subsequent sections.

In the introduction, we primarily considered finite-dimensional Euclidean spaces. How-


ever, as we shift our attention to modeling inverse problems in imaging, it becomes nec-
essary to return to the continuous setting. In this framework, we typically consider the
unknown image u as a function in L2 (Ω), where Ω is an open and bounded domain with
a Lipschitz boundary (often a rectangle in R2 ). The transformation A, which maps the
true image to the observed data, is taken to be a bounded linear operator mapping from
L2 (Ω) into itself.

2.1 Variational Models


The reconstruction process of recovering the unknown image u from the observed data
y in many inverse imaging problems is formulated as a minimization problem. This is
typically formulated as (see e.g. Engl et al. [1996], Benning and Burger [2018]):

min_u {αR(u) + D(Au, y)}.     (2.1)

Here, the data fidelity D(Au, y) enforces alignment between the forward model applied to u
and the observed data y. A common choice for D is a least-squares distance measure, such
as ½∥Au − y∥², however in general the choice of D depends on the data statistics (see Section 1.2.3),
and some examples are shown in Figure 2.1. The second term R(u) is a functional that
incorporates a-priori information about the image, acting as a regularizer, and α > 0 is
a weighting parameter that balances the influence of this prior information against fidelity
to the data. Some basic examples of forward operators include:

• A = Id for image denoising,


• A = 1Ω\D for image inpainting,
• A = k ∗ · (convolution with a kernel k) for image deconvolution,
• A is (a possibly undersampled) Radon/Fourier transform for tomography.

13
Fig. 2.1: Examples of different noise models and corresponding data fidelity terms, with example images. Gaussian noise (e.g. MRI): D(Au, f) = ∥Au − f∥²₂. Poisson noise (e.g. PET): D(Au, f) = ∫ Au − f log(Au) dx. Impulse noise (e.g. sparse noise): D(Au, f) = ∥Au − f∥₁. See works by Werner and Hohage [2012], Hohage and Werner [2014].

A critical question then arises: how do we choose an appropriate regularizer R(u)


for images? From a Bayesian perspective, the regularizer should be the data prior,
penalizing characteristics likely introduced by noise. Beyond this, the choice of R(u) is
guided by the specific properties we wish to enforce on the reconstructed image, such as
smoothness or preservation of edges. The following examples illustrate this fundamental
concept.

Example 2.1.1 (1D Tikhonov). Classical Tikhonov regularization often employs simple
quadratic regularizers like R(u) = ½ ∫Ω u² dx or, more commonly for images, R(u) =
½ ∫Ω |∇u|² dx. The latter term penalizes large gradients, encouraging smoothness in the

solution. However, this choice implies that the reconstructed image u possesses a certain
degree of regularity. Most crucially for imaging, the reconstruction cannot exhibit
sharp discontinuities like object boundaries or fine edges within an image.

To see this, consider a one-dimensional scenario where u : [0, 1] → R and u ∈ H 1 (0, 1),
i.e. is L2 with L2 derivative. For any 0 < s < t < 1, we have:
u(t) − u(s) = ∫_s^t u′(r) dr ≤ √(t − s) ( ∫_s^t |u′(r)|² dr )^(1/2) ≤ √(t − s) ∥u∥H¹(0,1).

This inequality shows that u must be Hölder continuous with exponent 1/2 (i.e., u ∈
C 1/2 (0, 1)), precluding jump discontinuities.

Example 2.1.2 (2D Tikhonov). Extending this to a two-dimensional image u ∈ H 1 ((0, 1)2 ),
Fig. 2.2: Example of TV denoised image of rectangles. The total variation penalizes small irregularities/oscillations while respecting intrinsic image features such as edges.

one can show that for almost every y ∈ (0, 1), the function x ↦ u(x, y) (a horizontal slice
of the image) belongs to H¹(0, 1). This is because

∫₀¹ ( ∫₀¹ |∂u(x, y)/∂x|² dx ) dy ≤ ∥u∥²H¹ < ∞.

This implies that u cannot have jumps across vertical lines in the image (and similarly for
horizontal lines).

2.2 Total Variation (TV) regularization


The examples above demonstrate that traditional Sobolev spaces like H 1 are often too
restrictive for image processing because they penalize the very edges that define objects.
To overcome this, we seek to relax the smoothness constraint by employing a weaker
notion of the derivative, leading us to Total Variation (TV) regularization.

Example 2.2.1 (Total Variation regularization). In TV regularization, the functional


used is R(u) = |Du|(Ω), which represents the total variation of u over the domain Ω. For
a locally integrable function u ∈ L1loc (Ω), its variation is defined as:
V(u, Ω) := sup { ∫Ω u ∇·φ dx : φ ∈ C¹c(Ω; Rⁿ), ∥φ∥∞ ≤ 1 }.

A function u belongs to the space of functions of Bounded Variation, denoted BV (Ω),


if and only if its variation V (u, Ω) is finite. For such functions, the total variation is given
by:
|Du|(Ω) = V (u, Ω)
where |Du|(Ω) is the total mass of the Radon measure Du, which is the derivative of u in
the sense of distributions.

The space BV (Ω) is particularly well-suited for images because, unlike H 1 (Ω), BV
functions can have jump discontinuities (edges). Minimizing the total variation
penalizes small irregularities and oscillations while respecting intrinsic image features
such as edges. See Figure 2.3 for a visualisation of properties of TV. Heuristically, the total
variation of a function quantifies the “amount of jumps” or oscillations it contains; thus,
Fig. 2.3: Properties of Total Variation (TV) smoothing: (a) large TV, (b) small TV, (c) examples of functions with equal TV. TV penalizes small irregularities and oscillations, and tends to preserve edges; the total variation measures the size of the jump discontinuity. Overall, the total variation penalizes small irregularities/oscillations while respecting intrinsic image features such as edges [Rudin et al., 1992].

noisy images, which typically have many rapid oscillations, have a large TV value. Owing
to these desirable properties, TV regularization has become a widely used technique in
image processing and inverse problems. It promotes solutions that are piecewise smooth
yet retain sharp edges, a property that is crucial in image processing, see e.g. Figures 2.3
and 2.4d. The non-differentiability of the TV term, however, necessitates specialized
optimization algorithms, such as primal-dual methods or general splittings [Lions and
Mercier, 1979, Combettes and Wajs, 2005, Hintermüller and Stadler, 2003].

Example 2.2.2 (Compressed sensing). In compressed sensing [Candes et al., 2006, Poon,
2015], TV regularization plays a vital role. Images often exhibit sparse gradients (large
areas of constant intensity), a key assumption in compressed sensing. For u ∈ W 1,1 (Ω),
the total variation coincides with the L1 norm of its gradient:

|Du|(Ω) = ∥∇u∥L1 (Ω)

The L1 norm is well-known for promoting sparsity. While the L0 norm (counting non-
zero gradient values) would be ideal for enforcing sparse gradients, it leads to compu-
tationally intractable (NP-hard) problems. The L1 norm serves as a convex relaxation,
making optimization feasible while still encouraging solutions with few non-zero gradient
values, characteristic of piecewise constant regions. Remarkably, if the underlying data
is indeed sparse, TV regularization enables near-perfect reconstruction from significantly
undersampled data, for example Figures 2.4 and 2.5.

Example 2.2.3 (MRI). Magnetic Resonance Imaging ( [Lustig, 2008, Fessler, 2008]) is
a medical imaging technique that measures the response of atomic nuclei in a strong
magnetic field. The measured data in MRI is essentially a sampled Fourier transform
of the object being imaged. In many MRI applications, acquiring a full set of Fourier
measurements is time-consuming and can be uncomfortable for the patient. Compressed
sensing offers a way to speed up the process by acquiring only a subset of the Fourier
data. This is known as undersampled Fourier acquisition, see Figure 2.5b.

y = (Fu)|Λ + n,

where F denotes the Fourier transform operator, Λ is the set of (undersampled) Fourier
coefficients and n represents noise in the measurement process. We often seek a piecewise
constant image, which means the image has distinct regions with constant intensities (like
different tissues in the body). The goal, then, is to identify a piecewise constant function
Fig. 2.4: Comparison of regularization methods: (a) original, (b) noisy, (c) ∥∇u∥²₂ regularization, (d) ∥∇u∥₁ regularization. This showcases that the convex relaxation of the ℓ0 term of TV with ℓ1 successfully achieves sparsity and is a more natural prior for denoising.

u consistent with the data y. As before, minimizing ∥∇u∥0 under data consistency is NP-
hard, and ℓ1 can be used as convex relaxation, leading to total variation minimization:

min_u α∥∇u∥₁ + ½ ∥(Fu)|Λ − y∥².
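The forward model and the zero-filling reconstruction of Figure 2.5c are easy to sketch with the FFT; the random sampling mask and noise level below are illustrative choices, and the TV-regularized solve itself would use one of the algorithms of Section 2.4.

```python
import numpy as np

rng = np.random.default_rng(5)

# Piecewise constant test image and an undersampled Fourier (k-space) measurement.
u = np.zeros((128, 128)); u[32:96, 48:80] = 1.0; u[50:70, 20:40] = 0.5
mask = rng.random(u.shape) < 0.3                  # keep ~30% of k-space at random
k_full = np.fft.fft2(u, norm="ortho")
y = mask * k_full + 0.01 * mask * (rng.standard_normal(u.shape)
                                   + 1j * rng.standard_normal(u.shape))

# Zero-filling reconstruction: inverse FFT of the (zero-filled) measured coefficients.
u_zero_fill = np.fft.ifft2(y, norm="ortho").real

print("relative error of zero-filling:",
      np.linalg.norm(u_zero_fill - u) / np.linalg.norm(u))
# A TV-regularized solve, min_u alpha*||grad u||_1 + 0.5*||(Fu)|_Lambda - y||^2,
# would suppress the aliasing artifacts while keeping the edges.
```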

Fig. 2.5: MRI reconstruction example: (a) ground truth Shepp-Logan phantom; (b) undersampled k-space (Fourier) data; (c) reconstruction via zero-filling the undersampled k-space and inverse Fourier transform; (d) reconstruction using a Total Variation (TV) regularized approach.

Another key insight is that the TV measure can be interpreted as an accumulation


of the perimeters of all level sets. Thus, penalizing the TV encourages the stretching or
smoothing of these level sets, leading to a reduction in their overall length. This property
makes TV regularization particularly well-suited for segmentation problems, where the
goal is to partition an image into distinct regions with well-defined boundaries.

Example 2.2.4 (Sets of Finite Perimeter and the Co-area Formula). For instance, if
Ω ⊂ R2 is an open set and D is a subset with a C 1,1 boundary, the total variation of its
characteristic function u = χD (1 inside D, 0 outside) is simply the perimeter of D within
Ω: |Du|(Ω) = H1 (∂D ∩ Ω). See Ambrosio et al. [2000]. More generally, the co-area
formula states that for any u ∈ BV (Ω):
|Du|(Ω) = ∫_{−∞}^{+∞} Per({u > s}; Ω) ds

where Per({u > s}; Ω) = ∥Dχ{u>s} ∥(Ω) is the perimeter of the superlevel set of u at level
s. This formula reveals that the total variation of u is the integral of the perimeters of all
its level sets.

In the context of image segmentation, minimizing the TV of an image encourages


the boundaries between different regions to be smooth and well-defined. This is because
minimizing the perimeters of the level sets leads to a reduction in the overall length
and complexity of the boundaries. This has made TV a fundamental tool in numerous
segmentation algorithms, for example through the Chan–Vese model.

Example 2.2.5 (Chan–Vese Segmentation). The Chan–Vese model [Chan and Vese, 2001]
is a popular variational approach for image segmentation that leverages the TV regular-
ization. It stems from the Mumford–Shah functional [Mumford and Shah, 1989], which
aims to find an optimal piecewise smooth approximation of a given image. The Chan–
Vese model simplifies this by assuming that the image can be segmented into regions with
constant intensities.

Let Ω ⊂ R2 represent the image domain, and let y : Ω → R denote the given image. The
Chan–Vese model seeks to partition Ω into two regions, represented by a binary function
χ : Ω → {0, 1}. The objective functional to be minimized is:
min_{χ,c1,c2} α|Dχ|(Ω) + ∫Ω (y − c1)² χ + ∫Ω (y − c2)² (1 − χ),

where χ is the binary segmentation function, c1 and c2 are the average intensities within the
regions where χ = 1 and χ = 0, respectively. Solving the original Chan–Vese formulation

Fig. 2.6: Example of binary Chan–Vese segmentation compared to Mumford–Shah segmentation [Mumford and Shah, 1989, Pock et al., 2009, Getreuer, 2012].

with the binary constraint is computationally challenging. A common approach [Cai et al.,

2013, 2019] to address this is to relax the binary constraint. This leads to the following
convex optimization problem (with given c1 and c2 ):
min_v α|Dv|(Ω) + ∫Ω (y − c1)² v + ∫Ω (y − c2)² (1 − v),

with the relaxed segmentation function v ∈ [0, 1]. The final binary segmentation is then
typically obtained by thresholding the resulting v. See Figure 2.6 for a visual example.

2.3 From Total Variation Regularization to Nonlinear PDEs


The study of variational methods such as TV regularization is deeply enriched by its
connection to the theory of Partial Differential Equations (PDEs). PDEs frequently arise
as mathematical descriptions of the gradient flow minimizing some energy functional. This
dynamic perspective is instrumental not only for characterizing important properties of
solutions, such as scale-space [Perona and Malik, 1990, Burger et al., 2006, Florack and
Kuijper, 2000] and image decomposition [Rudin et al., 1992, Ambrosio et al., 2001, Caselles
et al., 1993, Aujol et al., 2005, Alvarez et al., 1992, Caselles and Morel, 1998, Chambolle
et al., 2010], but also for understanding the continuous analog of iterative optimization
algorithms, thereby informing the analysis of their discrete counterparts. Moreover, the
PDE framework itself can motivate the development of novel regularizers or reconstruction
paradigms, occasionally leading to methods like Cahn–Hilliard inpainting [Burger et al.,
2009] that may not possess an explicit variational formulation. The extensive analytical
toolkit of PDE theory further allows for rigorous investigation into the qualitative aspects
of solutions, such as their regularity. This PDE-centric view enriches the understanding
of TV-based methods and is pivotal in guiding the design of efficient numerical solvers, a
concept we will illustrate with the Rudin–Osher–Fatemi (ROF) model [Rudin et al., 1992].

Example 2.3.1 (Nonlinear Image Smoothing: The ROF Model). The ROF model seeks
to determine an image u that remains close to a noisy observation y while also possessing
a minimal total variation. The corresponding optimization problem is:
min_u { α|Du|(Ω) + ½ ∥u − y∥² }.
To understand the process by which u evolves to minimize this energy functional, we can
examine its (sub)gradient flow. This flow describes the path of steepest descent for the
functional. It is given by the differential inclusion [Bellettini et al., 2002a]:

ut = −αp − (u − y), where p ∈ ∂(|Du|), in Ω.

In this expression, ut denotes the derivative of u with respect to an artificial time variable
t (representing the evolution of the flow), and p is an element from the subdifferential of
the TV term.

In regions of the image where the gradient is component-wise non-zero, the subdifferential
reduces to a singleton, and the gradient flow can be expressed more explicitly as the following
nonlinear PDE:
ut = α div(Du/|Du|) − (u − y), in Ω.
This equation is a nonlinear diffusion equation. Its key characteristic is that the effective
diffusion coefficient is inversely proportional to the magnitude of the image gradient, |Du|.

This property leads to a highly desirable selective smoothing behavior: in relatively flat
regions of the image where |Du| is small (often dominated by noise), the diffusion is strong,
leading to significant smoothing. Conversely, near sharp edges where |Du| is large, the
diffusion is weak, which helps to preserve these important structural features of the image
while reducing noise elsewhere.

2.4 Numerical Aspects


Having explored various regularization functionals and their impact on reconstructed im-
age properties, we now turn to the crucial aspect of their practical implementation: nu-
merical optimization. This section discusses established algorithms for finding minimizers
of the energy functionals that arise in variational image reconstruction, largely drawing
from the framework presented in Chambolle and Pock [2016]. We typically consider opti-
mization problems in a finite-dimensional setting, u ∈ Rn = X. The general form of the
minimization problem is:
min_{u∈X} J(u) + H(u),

where J and H are proper and convex functions, and one or both may be Lipschitz
differentiable. Note that for this section the notation has changed from the usual, as will
become clear from Example 2.4.1 - both J and H can take on the role of the data fidelity.

Example 2.4.1 (ROF problem). For a given noisy image y ∈ Rn , the ROF model from
Example 2.3.1 seeks to find an image u by solving:

min_u α∥∇u∥2,1 + ½ ∥u − y∥²₂,

where the TV term is ∥∇u∥2,1 = Σij |(∇u)ij|₂ = Σij √((ux)²ij + (uy)²ij).

Several algorithms have been developed to compute minimizers of such functionals. One
approach involves regularising the TV term to make it differentiable. For instance, one
might consider instead solving the following regularized ROF problem:

min_u { α Σ √(u²x + u²y + ϵ) + ½ ∥u − y∥²₂ }

for a small 0 < ϵ ≪ 1. The regularized TV is differentiable in the classical sense, therefore
we can apply classical numerical algorithms to compute a minimizer, e.g. gradient descent,
conjugate gradient methods etc.
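A minimal sketch of such a gradient descent on the smoothed ROF energy (here with ε² placed inside the square root, which is an equivalent smoothing, and with step size, smoothing level and iteration count chosen ad hoc for illustration):

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with replicated boundary values."""
    ux = np.diff(u, axis=1, append=u[:, -1:])
    uy = np.diff(u, axis=0, append=u[-1:, :])
    return ux, uy

def div(px, py):
    """Backward-difference divergence (approximately the negative adjoint of grad)."""
    return (np.diff(px, axis=1, prepend=px[:, :1])
            + np.diff(py, axis=0, prepend=py[:1, :]))

def smoothed_rof_gd(y, alpha=0.2, eps=0.1, tau=0.05, n_iter=400):
    """Gradient descent on  alpha * sum sqrt(|grad u|^2 + eps^2) + 0.5*||u - y||^2."""
    u = y.copy()
    for _ in range(n_iter):
        ux, uy = grad(u)
        norm = np.sqrt(ux ** 2 + uy ** 2 + eps ** 2)
        # gradient of the smoothed TV term is -div(grad u / norm)
        g = -alpha * div(ux / norm, uy / norm) + (u - y)
        u = u - tau * g
    return u

rng = np.random.default_rng(7)
clean = np.zeros((64, 64)); clean[16:48, 16:48] = 1.0
noisy = clean + 0.2 * rng.standard_normal(clean.shape)
print("error of noisy image   :", np.linalg.norm(noisy - clean))
print("error after denoising  :", np.linalg.norm(smoothed_rof_gd(noisy) - clean))
```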

However, to address the original, unregularized problem, we need to employ tech-


niques from convex analysis, specifically the concept of subgradients. We will consider the
following basic properties [Chambolle and Pock, 2016].

Definition 2.4.2 (Lower Semi-Continuity (lsc.)). A function f : X → R ∪ {+∞} is


lower semi-continuous (lsc) at a point x0 ∈ X if for any sequence {xk}∞k=1 in X such that
xk → x0, we have:

lim inf_{k→∞} f(xk) ≥ f(x0).

A function f is said to be lsc on X if it is lsc at every point in X.



Definition 2.4.3 (Proper). An extended real-valued convex function f : X → R ∪ {+∞}


is called proper if dom(f) := {x ∈ X : f(x) < +∞} is non-empty and f(x) > −∞ for all x ∈ X.

Definition 2.4.4 (Subdifferential). For a convex function J : X → R, we define the


subdifferential of J at x ∈ X, as ∂J(x) = ∅ if J(x) = ∞, otherwise

∂J(x) := { p ∈ X′ : ⟨p, y − x⟩ + J(x) ≤ J(y)  ∀y ∈ X },

where X ′ denotes the dual space of X. It is obvious from this definition that 0 ∈ ∂J(x)
if and only if x is a minimizer of J.

Example 2.4.5 (Subdifferential of the ℓ1 norm). To illustrate the concept of the sub-
differential for a common non-smooth function in imaging, consider the ℓ1 -norm. Let
X = ℓ1 (Λ) and J(x) := ∥x∥1 , with Λ = [1, . . . , n] or N. The subdifferential is given by:

∂∥ · ∥1 (x) = {ξ ∈ ℓ∞ (Λ) : ξλ ∈ ∂| · | (xλ ) , λ ∈ Λ} ,

where ∂| · |(z) = {sign(z)} if z ̸= 0, and ∂| · |(0) = [−1, 1].

The Legendre–Fenchel transform In order to approach the problem in Example 2.4.1,


it will turn out to be useful to write down the dual formulation, for which we need to define
the Legendre–Fenchel transform.

Definition 2.4.6 (Legendre-Fenchel Transform). The Legendre-Fenchel transform (also


known as the convex conjugate) of a function f : X → R ∪ {±∞} is a function f ∗ : X ′ →
R ∪ {−∞, +∞} defined as f ∗ (y) := supx∈X {⟨y, x⟩ − f (x)} where ⟨x, y⟩ denotes the inner
product of x and y. In particular, f ∗ is convex and lsc.

What is more, when f is a proper, convex, lsc function, applying the Legendre-Fenchel
transform twice returns the original function: ∀x ∈ Rn : f ∗∗ (x) = f (x).

Example 2.4.7 (One homogeneous functions). For example, for a function J that is one-
homogeneous (i.e., J(λu) = λJ(u) for every u and λ > 0), its Legendre–Fenchel transform
is the characteristic function of a closed convex set K:
J∗(v) = χK(v) = { 0 if v ∈ K;  +∞ otherwise }.

Since J ∗∗ = J (as J is proper, convex, and lsc), we can recover J(u) from its transform:

J(u) = sup_{v∈K} ⟨u, v⟩X.

Proximal Map To effectively handle non-differentiable terms in optimization problems


and to construct iterative algorithms, the concept of the proximal map is essential. For
a convex, proper, and lower semi-continuous function J : X → R ∪ {+∞}, its proximal
map at a point y ∈ X, with parameter τ > 0, is defined as the unique minimizer of the
following problem:
prox_{τJ}(y) = argmin_{u∈X} { J(u) + (1/(2τ)) ∥u − y∥²₂ }.

Let u∗ = proxτ J (y). The optimality condition for this minimization is:
0 ∈ ∂J(u∗) + (u∗ − y)/τ,
which can be rewritten as u∗ = (I + τ ∂J)−1 y. Furthermore, Moreau’s identity provides a
relationship between the proximal map of J and its convex conjugate J ∗ :
y = prox_{τJ}(y) + τ prox_{(1/τ)J∗}(y/τ).
This identity implies that if prox_{τJ} is known, prox_{(1/τ)J∗} can also be computed.
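For J = ∥·∥₁ the proximal map is the soft-thresholding operator, and J∗ is the indicator of the unit ℓ∞ ball, whose proximal map is the projection onto that ball for any parameter value. The short sketch below verifies Moreau's identity numerically on random data.

```python
import numpy as np

def prox_l1(y, tau):
    """Proximal map of tau*||.||_1: componentwise soft-thresholding."""
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

def proj_linf_ball(z):
    """Proximal map of sigma*J* for any sigma > 0, where J = ||.||_1: J* is the
    indicator of the unit l_inf ball, and the prox of an indicator is the projection."""
    return np.clip(z, -1.0, 1.0)

rng = np.random.default_rng(8)
y = rng.standard_normal(6)
tau = 0.7

# Moreau's identity: y = prox_{tau J}(y) + tau * prox_{(1/tau) J*}(y / tau)
lhs = y
rhs = prox_l1(y, tau) + tau * proj_linf_ball(y / tau)
print(np.allclose(lhs, rhs))   # True
```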

2.4.1 Dual Problem


Transforming the original (primal) optimization problem into its dual counterpart can
often lead to more tractable problems or enable the development of efficient algorithms,
especially when dealing with complex coupling terms. Consider the primal problem:

min_{u∈X} J(Au) + H(u),

where J : Y → (−∞, +∞] and H : X → (−∞, +∞] are convex, lower semi-continuous
(l.s.c.) functions, and A : X → Y is a bounded linear operator. Using the definition of
the convex conjugate, we have J (Au) = supp∈Y (⟨p, Au⟩ − J ∗ (p)). Substituting this into
the primal problem leads to:

min_{u∈X} sup_{p∈Y} { ⟨p, Au⟩ − J∗(p) + H(u) }.

Then, under certain mild assumptions on J , H, e.g. in finite dimensions it is sufficient


to have a point with Ax in the relative interior of dom J and x in the relative interior
of dom H, see e.g. [Chambolle and Pock, 2016, Section 3.1] or [Sherry, 2025, Section 5.1],
one can swap the min and sup to have:
min_{u∈X} J(Au) + H(u) = min_{u∈X} sup_{p∈Y} { ⟨p, Au⟩ − J∗(p) + H(u) }     (using J∗∗ = J)
                       = max_p inf_u { ⟨p, Au⟩ − J∗(p) + H(u) }
                       = max_p { −J∗(p) − H∗(−A∗p) },

where the latter is the dual problem, H∗ is the convex conjugate of H, and A∗ is the
adjoint of A. Under the above assumptions, there exists at least one solution p∗ (see, e.g.,
Ekeland and Temam [1999], Borwein and Luke [2015]). If u∗ solves the primal problem
and p∗ solves the dual problem, then (u∗ , p∗ ) is a saddle-point of the Lagrangian L(u, p)
defined as follows, which provides a link between primal and dual solutions:

L(u, p) := ⟨p, Au⟩ − J ∗ (p) + H(u),

such that for all (u, p) ∈ X × Y, we have L(u∗, p) ≤ L(u∗, p∗) ≤ L(u, p∗). Moreover, the primal-dual gap, a measure of suboptimality, is defined as

G(u, p) := (J (Au) + H(u)) + (J ∗ (p) + H∗ (−A∗ p)) ,

which vanishes if and only if (u, p) is a saddle point of L.



Example 2.4.8 (Dual ROF). To show how duality can simplify or offer new perspectives,
we can derive the dual of the ROF problem from Example 2.4.1. Let A = ∇, J (z) =
α∥z∥_{2,1}, and H(u) = (1/2)∥u − y∥₂². The convex conjugate of J is

J∗(p) = χ_{∥·∥_{2,∞} ≤ α}(p) = { 0 if ∥p_{i,j}∥₂ ≤ α for all i, j;  +∞ otherwise }.

The conjugate of H is H∗(q) = (1/2)∥q + y∥₂² − (1/2)∥y∥₂². Substituting into the dual formulation, we get:

max_p ( −J∗(p) − (1/2)∥∇∗p∥₂² + ⟨∇∗p, y⟩ )
  = −min_p ( J∗(p) + (1/2)∥∇∗p − y∥₂² ) + (1/2)∥y∥₂².
So the dual ROF problem is equivalent to solving:

min_p { (1/2)∥∇∗p − y∥₂² : ∥p_{i,j}∥₂ ≤ α for all i, j }.
This dual problem is a constrained least-squares problem, which can be easier to solve
than the primal non-smooth problem. From the optimality conditions of the saddle-point
problem, we also have the relationship u = y − ∇∗ p connecting the primal and dual
solutions.
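As a minimal numerical sketch (ours, not from the source; it assumes the standard forward-difference discretization of ∇ with Neumann boundary conditions, for which ∥∇∥² ≤ 8, and the helper names grad, div, dual_rof as well as the iteration count are illustrative), projected gradient descent on this constrained dual problem can be implemented as follows.

import numpy as np

def grad(u):
    # forward differences with Neumann boundary conditions
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    # discrete divergence, the negative adjoint of grad (div = -grad^*)
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[0, :], dx[1:-1, :], dx[-1, :] = px[0, :], px[1:-1, :] - px[:-2, :], -px[-2, :]
    dy[:, 0], dy[:, 1:-1], dy[:, -1] = py[:, 0], py[:, 1:-1] - py[:, :-2], -py[:, -2]
    return dx + dy

def dual_rof(y, alpha, n_iter=200, tau=0.125):
    # projected gradient descent on min_p 0.5*||grad^* p - y||^2 s.t. ||p_ij||_2 <= alpha,
    # with step size tau <= 1/||grad||^2 = 1/8
    px, py = np.zeros_like(y), np.zeros_like(y)
    for _ in range(n_iter):
        gx, gy = grad(-div(px, py) - y)              # gradient of the smooth objective
        px, py = px - tau * gx, py - tau * gy
        norm = np.maximum(1.0, np.sqrt(px**2 + py**2) / alpha)
        px, py = px / norm, py / norm                # projection onto {||p_ij||_2 <= alpha}
    return y + div(px, py)                           # primal solution u = y - grad^* p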

With these tools, we can now introduce several iterative algorithms designed for non-
smooth convex optimization.

2.4.2 Proximal descent


The proximal descent algorithm is a foundational iterative method for minimizing non-
smooth convex functions, generalizing the idea of gradient descent by leveraging the prox-
imal operator. Starting from an initial guess u0 , it generates a sequence of iterates u(k)
according to:
uk+1 = proxτ J (uk ) = (I + τ ∂J)−1 (uk ).
If J is differentiable, this corresponds to an implicit gradient step
 
uk+1 = uk − τ ∇J uk+1 .

The iterate u^{k+1} is the unique minimizer of the proximal subproblem J(v) + (1/(2τ)) ∥v − u^k∥₂².
This algorithm can be interpreted as an explicit gradient descent step on the Moreau–
Yosida regularization of J. The Moreau–Yosida regularization of J (or envelope) with
parameter τ > 0 is:

Jτ(ū) := min_v { J(v) + (1/(2τ)) ∥v − ū∥₂² }.
It can be shown that Jτ is continuously differentiable (even if J is not) with gradient:

∇Jτ(ū) = (ū − prox_{τJ}(ū)) / τ.
Thus, the proximal descent update uk+1 = proxτ J (uk ) can be rewritten as uk+1 = uk −
τ ∇Jτ (uk ), which is an explicit gradient descent step on the smoothed function Jτ .
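For intuition, the following small sketch (ours) evaluates the Moreau–Yosida envelope of the absolute value, which is the Huber function, and compares the analytic gradient formula above with a finite-difference approximation.

import numpy as np

def prox_abs(y, tau):
    # prox of tau*|.|: scalar soft-thresholding
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

def moreau_envelope_abs(u, tau):
    # J_tau(u) = min_v { |v| + (v - u)^2 / (2*tau) }, i.e. the Huber function
    v = prox_abs(u, tau)
    return np.abs(v) + (v - u) ** 2 / (2.0 * tau)

# compare grad J_tau(u) = (u - prox_{tau J}(u)) / tau against a central finite difference
u, tau, h = 0.3, 0.5, 1e-6
fd = (moreau_envelope_abs(u + h, tau) - moreau_envelope_abs(u - h, tau)) / (2 * h)
print(fd, (u - prox_abs(u, tau)) / tau)  # both ~ 0.6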

2.4.3 Forward-Backward Splitting


To solve optimization problems structured as the sum of a smooth function and a non-
smooth (but proximally tractable) function, the Forward-Backward Splitting algorithm
offers an effective iterative approach. Consider the problem of minimizing the sum of two
convex, proper, and lsc. functions:

min_u J(u) + H(u),

where J is “simple” (its prox is easily computable) and H is differentiable with a Lipschitz
continuous gradient (Lipschitz constant LH ). The Forward-Backward splitting algorithm,
also known as the proximal gradient algorithm, combines an explicit gradient descent step
on H (forward step) and an implicit proximal step on J (backward step):

uk+1 = proxτ J (uk − τ ∇H(uk )).

A point u is a minimizer of the composite objective if and only if it is a fixed point of this
iteration, which corresponds to the optimality condition 0 ∈ ∇H(u) + ∂J (u). If the step
size τ satisfies 0 < τ ≤ 1/LH , the iterates uk converge to a minimizer.
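As a hedged sketch (ours; it assumes a matrix forward operator A, H(u) = (1/2)∥Au − y∥₂² and J(u) = α∥u∥₁, so that the backward step is soft-thresholding), the resulting proximal gradient iteration, often called ISTA in this setting, reads:

import numpy as np

def ista(A, y, alpha, n_iter=500):
    # forward-backward splitting for min_u 0.5*||A u - y||_2^2 + alpha*||u||_1
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of grad H
    tau = 1.0 / L
    u = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = u - tau * (A.T @ (A @ u - y))                          # forward (gradient) step on H
        u = np.sign(z) * np.maximum(np.abs(z) - tau * alpha, 0.0)  # backward (prox) step on J
    return u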

2.4.4 Primal-Dual Hybrid Gradient descent


For more complex structured problems, particularly those involving a linear operator A
coupling terms (e.g., J (Au)), primal-dual algorithms iteratively seek a saddle point of an
associated Lagrangian, which corresponds to a solution of the original problem. Consider
now problems of the form
min_u J(Au) + H(u),

where J, H are convex, lsc and simple, and A is bounded and linear. The primal-dual hybrid gradient (PDHG) method alternates between proximal descent in the primal variable u and ascent in the dual variable p for the corresponding saddle-point problem

max_p inf_u ⟨p, Au⟩ − J∗(p) + H(u)

via updating the primal and dual variables in an alternating fashion:


 
u^{k+1} = prox_{τH}(u^k − τA∗p^k),
p^{k+1} = prox_{σJ∗}(p^k + σA(2u^{k+1} − u^k)).

This algorithm is closely related to other optimization techniques like the augmented
Lagrangian method and the alternating direction method of multipliers (ADMM) [Arrow
et al., 1958, Pock et al., 2009, Esser et al., 2010].
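A minimal sketch of these updates for the ROF problem (J(z) = α∥z∥_{2,1}, H(u) = (1/2)∥u − y∥₂², A = ∇) is given below; it is ours rather than from the source, repeats the same discrete grad/div pair as in the dual-ROF sketch above for self-containedness, and uses step sizes with τσ∥∇∥² ≤ 1.

import numpy as np

def grad(u):
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[0, :], dx[1:-1, :], dx[-1, :] = px[0, :], px[1:-1, :] - px[:-2, :], -px[-2, :]
    dy[:, 0], dy[:, 1:-1], dy[:, -1] = py[:, 0], py[:, 1:-1] - py[:, :-2], -py[:, -2]
    return dx + dy

def pdhg_rof(y, alpha, n_iter=300, tau=0.25, sigma=0.25):
    # PDHG for min_u alpha*||grad u||_{2,1} + 0.5*||u - y||_2^2  (A = grad, A^* = -div)
    u = y.copy()
    px, py = np.zeros_like(y), np.zeros_like(y)
    for _ in range(n_iter):
        # primal step: prox of tau*H at u^k - tau*A^* p^k
        u_old = u
        u = (u + tau * div(px, py) + tau * y) / (1.0 + tau)
        # dual step: prox of sigma*J^* (a projection) at p^k + sigma*A(2u^{k+1} - u^k)
        gx, gy = grad(2 * u - u_old)
        px, py = px + sigma * gx, py + sigma * gy
        norm = np.maximum(1.0, np.sqrt(px**2 + py**2) / alpha)
        px, py = px / norm, py / norm
    return u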

2.5 Regularizer Zoo


Beyond the ROF model, TV regularization has found applications in a multitude of image
processing tasks such as deblurring, inpainting, and segmentation. Each distinct appli-
cation typically gives rise to a unique PDE, thereby presenting specific analytical and
numerical challenges. Since the inception of these TV-based approaches, considerable
research effort has been dedicated to the comprehensive analysis of these models, encom-
passing both their numerical and analytical properties. The analytical and numerical

challenges posed by these problems have driven significant advancements in optimization


and PDE theory, leading to the development of sophisticated algorithms and a deeper un-
derstanding of the underlying mathematical structures. Here we summarize key aspects
of this extensive body of work, though it represents only a selection from the broader
literature.

Analytical Properties:

• Function Space: The natural function space for TV regularization is the space of
functions of bounded variation, BV (Ω). This space is non-reflexive, necessitating
the use of specialized compactness properties for analysis [Ambrosio et al., 2000,
Ambrosio, 1990, De Giorgi and Ambrosio, 1988, De Giorgi, 1992].
• Stability: Novel metrics like Bregman divergences can be employed for deriving
stability estimates in TV-regularized problems [Burger et al., 2007, Hofmann et al.,
2007, Schönlieb et al., 2009].
• Non-differentiability: The non-differentiability of the TV term requires tools from
convex analysis, such as subgradients, and leads to the study of TV flow via dif-
ferential inclusions or viscosity solutions [Chen et al., 1999, Ambrosio and Soner,
1996, Caselles et al., 2007, Alter et al., 2005, Caselles and Chambolle, 2006, Bellet-
tini et al., 2002a,b, Paolini, 2003, Novaga and Paolini, 2005, Bellettini et al., 2006].
Analysis often draws upon geometric measure theory [Federer, 1996, 2014, Allard
and Almgren, 1986, Allard, 2008].

Numerical Properties:

• Non-smooth Optimization: Standard optimization frameworks for smooth, strictly


convex problems are not directly applicable to TV regularization. New analysis is
based on algorithms involving generalized Lagrange multipliers, Douglas–Rachford
splitting [Lions and Mercier, 1979], iterative thresholding algorithms [Chambolle and
Dossal, 2015, Chambolle, 2004, Combettes and Pesquet, 2008, Combettes and Wajs,
2005, Daubechies et al., 2004] and semi-smooth Newton methods [Hintermüller,
2010, Hintermüller and Stadler, 2003].
• Scalability: Large-scale non-smooth convex minimization problems involving TV
regularization often scale poorly. Accelerations are achieved through precondition-
ing, splitting approaches, partial smoothness, and stochastic optimization [Chan and
Chan, 1990, Chan and Strang, 1989, Fornasier and Schönlieb, 2009, Fornasier et al.,
2010, Afonso et al., 2010, Bioucas-Dias and Figueiredo, 2007, Figueiredo et al., 2007,
Beck and Teboulle, 2009, Cevher et al., 2014, Liang et al., 2014, Bredies and Sun,
2015, Chambolle et al., 2018].
• Non-smooth and Nonlinear Problems: Further challenges arise when using non-
smooth regularization for nonlinear inverse problems. From an optimization per-
spective, the combination of non-smoothness and non-convexity opens yet another
chapter in numerical analysis and optimization with only partly satisfying results.
Seminal contributions include the work of Kaltenbacher et al. [2008] and Bachmayr
and Burger [2009] on computational solutions for nonlinear inverse problems, and
several works on non-smooth and non-convex optimization [Attouch et al., 2010,
Valkonen, 2014, Bolte et al., 2014, Chizat and Bach, 2018, Driggs et al., 2021, Ben-
ning et al., 2021].

2.5.1 Regularizer Zoo: Electric boogaloo


Real-world inverse problems often present unique challenges, such as diverse noise, in-
tricate structures, and specific modeling requirements, necessitating tailored regularizers
for optimal reconstruction. Classically, this has been addressed by designing and refining
handcrafted models encouraging desirable properties in the reconstructed image, such as
smoothness, sparsity, or adherence to specific geometric features. Over time, this has led
to a diverse array of specialized regularizers, based on:

• Multi-resolution analysis, wavelets [Mallat, 1999, Daubechies, 1992, Vonesch et al.,


2007, Unser and Blu, 2000, Dragotti and Vetterli, 2003, Kutyniok and Labate, 2012,
Foucart and Rauhut, 2013, Fornasier et al., 2012];
• Other Banach-space norms, e.g. Sobolev norms, Besov norms, etc. [Saksman et al.,
2009, Lassas Eero Saksman and Siltanen, 2009];
• Higher-order total variation regularization [Osher et al., 2003, Chambolle and Lions,
1997, Setzer et al., 2011, Bredies et al., 2010], as well as higher-order PDEs, Euler
elastica [Masnou and Morel, 1998, Shen et al., 2003, Bertozzi et al., 2006];
• Non-local regularization [Gilboa and Osher, 2009, Buades et al., 2011];
• Anisotropic regularization [Weickert et al., 1998];
• Free-discontinuity problems [Mumford and Shah, 1989, Carriero et al., 1992];
• and mixtures of the above and others. . .

This “regularizer zoo” highlights the diversity of approaches developed to address the
specific challenges posed by different image reconstruction problems. The selection of an
appropriate regularizer requires careful consideration of the image properties, the degra-
dation process, and the desired characteristics of the reconstruction.

2.6 Limitations and move towards data-driven


While knowledge-driven regularization has significantly advanced image reconstruction, its
effectiveness is inherently limited by our ability to accurately model the complex structures
present in real-world images. Even with sophisticated mathematical models and physical
principles from non-linear PDEs, variational formulations, and multi-resolution or sparsity
constraints, our understanding of image formation and degradation remains incomplete.
Consider the remarkable ability of the human brain to denoise and interpret images under
challenging conditions — a feat far surpassing current knowledge-driven approaches.

This limitation highlights the need for a paradigm shift towards data-driven recon-
struction methods, which leverage the power of overparameterized models like support
vector machines and neural networks [Goodfellow et al., 2016]. These models, trained on
vast amounts of data, can learn intricate patterns and relationships that may be diffi-
cult to capture through explicit mathematical formulations. The next section will delve
into existing paradigms and challenges of data-driven reconstruction, exploring how it
can complement and even surpass knowledge-driven methods in the quest for accurate
and robust image recovery. Furthermore, we will investigate the emerging field of hybrid
approaches that combine the strengths of both data-driven and knowledge-driven tech-
niques, potentially leading to a new generation of image reconstruction algorithms that
push the boundaries of performance and applicability.
Chapter 3

Data-Driven Approaches to Inverse Problems

This chapter explores the paradigm shift from knowledge-driven to data-driven approaches
for solving inverse problems in imaging. As we established in previous chapters, traditional
knowledge-driven methods, while powerful, are fundamentally limited by our ability to
accurately model the complexities of real-world images.

It is important to preface this section by acknowledging the exceptionally rapid evolution


of the field of deep learning, particularly in imaging. The content presented herein largely
reflects the understanding and prominent methods as of 2023, the time of the original
CIME lectures. Consequently, while the foundational concepts discussed remain relevant,
the field has likely seen further advancements since.

3.0.1 Knowledge-Driven vs. Data-Driven Models

(a) Ground-truth (b) FBP: 21.61 dB, 0.17 (c) TV: 25.74 dB, 0.80 (d) LPD: 29.51 dB, 0.85

Fig. 3.1: Limited angle CT reconstruction: Heavily ill-posed problem. Deep Learning cannot do magic
and also hits boundaries of what is mathematically possible. A fully learned method LPD (Section 3.1) in
(d) begins hallucinating, as highlighted in red boxes, despite resulting in better performance metrics (here
PSNR and SSIM).

Knowledge-Driven Models Knowledge-driven models are rooted in mathematical


principles and domain expertise. These models are often interpretable and offer theo-
retical guarantees, but their performance is fundamentally limited by two critical factors:
the accuracy of the underlying mathematical model, and the designer’s ability to cap-
ture complex system behaviors. Such models often struggle with complex or unknown
noise patterns and complex image structures that defy straightforward mathematical rep-
resentation. Even if such complexities could be modeled, practical limitations arise due
to computational constraints: while we might develop highly detailed models, the sheer
complexity may render them computationally infeasible to use effectively.

The deep learning revolution of the 2010s fundamentally transformed our approach
to complex imaging tasks by challenging traditional modeling paradigms. As computa-
tional power and data availability dramatically increased, neural networks demonstrated
their ability to learn intricate representations directly from massive datasets, often out-
performing carefully crafted mathematical models. Consequently, given the abundance
of image data available today, a natural question emerges: why meticulously handcraft


Fig. 3.2: Sparse view CT reconstruction: top row is based on mathematical/handcrafted models; bottom
row is using novel deep learning based models. For this problem, deep learning methods result in both
improved metrics (here PSNR and SSIM) and visually better reconstructions. Photo courtesy of Subhadip
Mukherjee [Mukherjee et al., 2023].

models when we can potentially derive effective priors simply by providing sufficient data
to overparameterized models?

Data-Driven Models Data-driven models, in contrast to knowledge driven approaches,


directly extract information from data. While traditional knowledge-driven methods often
involve some degree of learning (e.g., parameter estimation), they typically rely on mod-
els with a limited number of parameters. Unlike knowledge-driven models, data-driven
approaches have a significant number of parameters and leverage large datasets to learn
complex patterns and relationships without explicit mathematical modeling.

Deep learning exemplifies this paradigm, using extensive computational resources to train
highly flexible, over-parameterized neural networks that can adapt to diverse imaging
tasks and datasets, especially in high dimensions, while remaining computationally effi-
cient. An example in the context of inverse problems is shown in Figure 3.2 – learned
methods consistently and significantly outperform knowledge driven methods like TV reg-
ularization. Despite their power, these models often sacrifice interpretability and demand
substantial training data to achieve good performance. An example of this in the context
of inverse problems is shown in Figure 3.1 - for a significantly ill-posed problem, fully
learned methods begin hallucinating, despite resulting in better performance metrics.

We note however, that derivation of models directly from data is by no means exclusive
to deep neural networks; in fact, such approaches predate and extend beyond them, con-
stituting a rich methodological landscape within machine learning and signal processing.
To illustrate, classical learning techniques have long been employed to explore data-driven
regularization models, for example:

• Sparse Coding and Dictionary Learning: These methods aim to find sparse representations of signals as linear combinations of a few elementary atoms from a dictionary, which itself can be learned from data. Approaches optimize the signal representation through a minimization problem that balances data fidelity and sparsity:

min_{γ,ϕ} ∥ A( Σ_i γ_i ϕ_i ) − y ∥₂² + ∥γ∥₁.

Some examples include Elad and Aharon [2006], Aharon et al. [2006], Mairal et al.
[2009], Rubinstein et al. [2010], Moreau and Bruna [2016], Chandrasekaran et al.
[2011], DeVore [2007], Fadili et al. [2009], Mallat and Zhang [1993], Elad and Aharon
[2006], Rubinstein et al. [2009], Papyan et al. [2017], Peyré [2009].

• Black-Box Denoiser Methods: These techniques integrate powerful, often pre-


existing, denoising algorithms as implicit priors within iterative reconstruction schemes,
without requiring explicit knowledge of the denoiser’s internal structure. Some ex-
amples include the Plug-and-Play Prior method Venkatakrishnan et al. [2013], Wei
et al. [2020] and Regularization by Denoising Romano et al. [2017], Terris et al.
[2020]:

min_u D(A(u), y) + αR(u), with R(u) = ⟨u, u − Λ(u)⟩, Λ : X → X a denoiser.

• Bilevel Optimization (since early 2000s): This class of methods addresses


the challenge of learning model parameters by formulating a nested optimization
problem, where an outer problem optimizes parameters used in an inner image
reconstruction or processing task.

min_λ F(u_λ) s.t. u_λ = argmin_u R(λ, u) + D(A(u), y).

Some examples include Calatroni et al. [2017a], Kunisch and Pock [2013], De los
Reyes et al. [2016], Haber et al. [2009], Langer [2017], Horesh et al. [2010].

The key distinctions between the paradigms emerge not just in methodology, but in their
philosophical approach: knowledge-driven models seek to understand through explicit
modeling assumptions, while data-driven models pursue understanding through statistical
learning and pattern recognition.

The rest of this section discusses deep learning more generally and presents a number of recent approaches within the data-driven paradigm, progressively advancing towards methodological frameworks that intersect the two paradigms, exemplified by methods using deep neural networks as regularizers.

3.0.2 The Black Box of Deep Learning


Deep learning has shown remarkable success in various fields, but comes with significant challenges and limitations. We refer the interested reader to the review by Grohs and Kutyniok [2022]. State-of-the-art deep neural networks have ‘too many’ degrees of freedom:

• millions of free parameters, i.e., the parameter space Θ is super high-dimensional;



• complex concatenation of diverse mathematical constructs (convolutions, activa-


tions, attention, skip connections, normalization, dropout, ...);
• high-dimensional and non-convex optimization problems.

The usefulness of the resulting model is influenced by all of these model design parameters, as well as by the quality of the training set and the optimization approach used. This, in turn, makes it difficult to understand the internal workings of such models and to interpret their outputs, rendering deep learning models effectively a “black box”. More precisely, the resulting issues are:

• Lack of Interpretability: It is difficult to understand why a deep learning model


produces a particular output. This can make it challenging to identify biases, errors,
or limitations in the model.
• Safety Concerns: In applications where safety is critical, such as medical imaging,
or autonomous driving, the lack of interpretability can raise concerns about the
reliability and trustworthiness of deep learning models.
Particularly, for the problem of CT reconstruction: how can we reliably say that
a deep learning model did not introduce or obscure cancerous tissue during recon-
struction?
• Limited Design Principles: Due to their complexity, there is no systematic way
to design deep learning models. Their development often involves trial and error,
making it difficult to guarantee optimal performance or generalize to new tasks.

Despite these issues, deep learning offers interesting opportunities for inverse problems thanks to its ability to produce highly accurate and computationally fast solutions. However, to fully leverage the potential of deep learning, it is essential to integrate these techniques with established mathematical modeling principles. This synergy is crucial to ensure predictable and reliable (in a certain sense) behavior of the resulting solutions: interpretability often goes hand in hand with mathematical guarantees, but typically comes at the expense of reduced performance or computational intractability. The current main goal of the field is thus to find the sweet spot between computational power and mathematical guarantees.

Example 3.0.1 (Neural Networks in a Nutshell). A neural network can be formally


defined as a mapping:
Ψ:X ×Θ→Y
(x, θ) 7→ z K ,
where in above, X is the input space, Y is the output space, x ∈ X is the input data,
z K ∈ Y is the output of the network, Θ = (Θ0 , ..., ΘK−1 ) represents the parameter space,
and Θk denoting the parameter space of the k-th layer. The network’s internal operations
are characterized by a sequence of layer-wise transformations:

z^0 = x ∈ X
z^{k+1} = f^k(z^k, θ^k),   k = 0, . . . , K − 1

where z k ∈ X k represents the feature vector at the k-th layer, with X k being the corre-
sponding feature space, and f k : X k × Θk → X k+1 is the non-linear transformation at
the k-th layer, parameterized by θk . A common choice for f k is an affine transformation


Fig. 3.3: Learned Iterative Schemes Schematic.

followed by an element-wise non-linear activation function:


 
f^k(z) = σ(W^k z + b^k),

where W k is a weight matrix (for imaging tasks often represented by a convolution oper-
ator), bk is a bias vector, σ is an element-wise non-linear activation function (e.g., ReLU,
tanh).

The training process aims to optimize the network parameters θ by minimizing a loss
function Ln over a given dataset {(xn , cn )}n , often with an added regularization term
R(θ) (this time to regularize the training):

min_{θ∈Θ} (1/N) Σ_{n=1}^{N} L_n(Ψ(x_n, θ), c_n) + R(θ)

This generic framework can be adapted and applied to various mathematical imaging
tasks, such as image classification, segmentation, and reconstruction, by appropriately
defining the network architecture, loss function, and training data.
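A minimal NumPy sketch of such a layered mapping (purely illustrative: random weights, ReLU activations, and an affine final layer; no training is performed here) is given below.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def network(x, params):
    # params = [(W^0, b^0), ..., (W^{K-1}, b^{K-1})]; each layer applies z -> sigma(W z + b),
    # with the final layer left affine
    z = x
    for k, (W, b) in enumerate(params):
        z = W @ z + b
        if k < len(params) - 1:
            z = relu(z)
    return z

rng = np.random.default_rng(0)
params = [(rng.normal(size=(8, 4)), np.zeros(8)),   # layer 0: R^4 -> R^8
          (rng.normal(size=(1, 8)), np.zeros(1))]   # layer 1: R^8 -> R^1
print(network(rng.normal(size=4), params))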

3.1 Learned Iterative Reconstruction Schemes


This section explores learned iterative reconstruction schemes, a class of end-to-end deep
learning methods for solving inverse problems. Some good reviews on the topic include
Arridge et al. [2019], McCann et al. [2017], with many works in the literature proposing
similar approaches [Gregor and LeCun, 2010, Sun et al., 2016, Meinhardt et al., 2017,
Putzky and Welling, 2017, Adler and Öktem, 2017, 2018, Hammernik et al., 2018, Haupt-
mann et al., 2018, de Hoop et al., 2022, Gilton et al., 2019, Bubba et al., 2021]. These
methods draw inspiration from classical iterative algorithms, where individual iterative
steps are replaced or augmented with neural networks. The core idea is to “unroll” a fixed
number of iterations of an optimization algorithm and learn parts of this unrolled scheme
from data. This approach often begins by considering a gradient descent update rule for
a variational problem and then introducing parameterized blocks within these iterations.
The parameters of these blocks are subsequently optimized over a fixed number of steps
using supervised data. The general concept can be illustrated by comparing it with stan-
dard gradient descent. Given an initial guess u0 ∈ X, gradient descent follows a sequence
of steps as
u := (Id −η∇D(A(·), y))N u0 .

Learned iterative schemes generalize this by parameterizing these steps individually:


 
u := (ΛΘN ◦ · · · ◦ ΛΘ1 ) u0 .

Each ΛΘk can be viewed as a residual layer in a neural network ΨΘ (y) with N layers, which
reconstructs u from y. These schemes are typically derived from an iterative method
designed to solve a variational regularization problem and several variations of learned
iterative schemes exist, each with a different formulation for the layers ΛΘk . Writing the
steps as   
u^{k+1} = Λθk(u^k, A∗(Au^k − y)) for k = 0, . . . , N − 1,
for some neural networks Λθk : X × X → X, the main examples (described in more detail
below) are

Λθ (u, h) := u + Γθ (h) (original learned gradient) (3.1)


Λθ (u, h) := u − h + Γθ (u) (variational networks) (3.2)
Λθ (u, h) := Γθ (u − h) (plug-and-play; learned proximal) (3.3)

for some neural network Γθ : X → X with an architecture that does not involve data or the
forward operator (or its adjoint), which only enter into the evaluation of h = A∗(Au^k − y).
An illustration is shown in Figure 3.3. Parameters are then learned from supervised data.
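Schematically, and only as an illustration of the parameterization in Equation (3.1) (in practice the blocks Γθk are small CNNs trained end-to-end on supervised pairs; the names below are ours), an unrolled learned gradient scheme can be written as follows.

import numpy as np

def unrolled_reconstruction(A, y, blocks, u0):
    # learned gradient scheme (3.1): u^{k+1} = u^k + Gamma_{theta_k}(A^*(A u^k - y)),
    # where `blocks` is a list of learned mappings Gamma_{theta_k}: X -> X
    u = u0.copy()
    for gamma in blocks:
        h = A.T @ (A @ u - y)
        u = u + gamma(h)
    return u

# with Gamma_{theta_k}(h) = -eta*h for every k, this reduces to plain gradient descent on the data fit
eta = 0.1
blocks = [lambda h: -eta * h] * 10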

Variational Networks. First proposed by Hammernik et al. [2018], Kobler et al. [2017],
they represent one of the earliest learned iterative schemes and are notable for their ex-
plicit connection to variational regularization. These networks are inspired by variational
regularization models where the regularization functional incorporates parameterizations
that extend beyond traditional handcrafted regularisers like TV. More specifically, their
development draws from the Fields of Experts (FoE) model and, by extension, conditional
shrinkage fields [Schmidt and Roth, 2014], which allow for adaptive parameters across it-
erations. Variational Networks are defined by unrolling an iterative scheme designed to
minimize a functional comprising both a data discrepancy component and a regularizer.
Such networks can be interpreted as performing block incremental gradient descent on a
learned variational energy or as learned non-linear diffusion.

Learned Gradient, Proximal, Primal Dual. These are all further extensions of the
idea introducing increasingly more freedom to parameters [Adler and Öktem, 2017], with
Adler and Öktem [2018] generalizing the steps even further to include steps in both primal
(image) and dual (measurement) spaces, inspired by the primal-dual hybrid gradient from
Section 2.4.4. This can be summarized with a more general parameterization of the gradient steps, potentially including information from a regularizer R, e.g. TV [Kiss et al., 2025]. An illustration is shown in Figure 3.4.

ΛΘk := ΓΘk(u, y, A∗y, Au, ∇R(u)) for (u, y) ∈ X × Y,

Empirical evidence suggests such approaches result in models that are easier to train
and demonstrate excellent reconstruction quality for mildly ill-posed inverse problems and
offer considerable versatility, for instance in task-adapted reconstruction [Adler et al.,
2022, Lunz et al., 2018]. While intuitively appealing, this approach can lose connection
to the original variational problem, leading to a lack of theoretical guarantees. Despite
this, this generic recipe has been used to propose numerous methods, leveraging various
iterative algorithms such as gradient descent, proximal descent, and primal-dual methods.


Fig. 3.4: Learned Primal Dual Schematic.

3.1.1 Limitations and Challenges


Despite their promising performance, learned iterative schemes are still often perceived as
“black boxes” due to several outstanding challenges.

• Limited Theoretical Understanding: There is a general lack of rigorous analysis


regarding their well-posedness and regularization properties, with few exceptions
[Hertrich et al., 2021, Sun et al., 2021, Gilton et al., 2021b]. This theoretical gap
contributes to an ongoing debate within the field concerning the applicability of
deep learning to inverse problems arising in critical fields due to a lack of robustness
[Genzel et al., 2022].
• Interpretability: The learned operators within these schemes frequently lack a
clear mechanistic explanation, making it difficult to fully understand their behavior.
While some asymptotic properties are known, such as the convergence of learned it-
erative schemes with an ℓ2 loss to the conditional mean under infinite data conditions
[Adler and Öktem, 2017], a deeper, more general understanding remains elusive.
• Data Consistency with the measurements is not always guaranteed in the final
reconstructions, although specific approaches attempting to address this limitation exist, e.g. [Schwab et al., 2019].
• Supervised Training: These methods typically require large amounts of super-
vised data, which can be challenging to obtain in practice.
• Convergence: Iterating beyond the number of training steps may not guarantee
convergence. Learned iterative schemes are trained for a fixed number of iterations
(typically ≤ 20) due to computational constraints, and the reconstruction deterio-
rates if more iterations are performed at test time. An example of such deterioration
can be seen in bottom row of Figure 3.5.
• Computational Cost: Evaluating the forward and adjoint operators in each layer
can be computationally memory expensive, hindering scalability.

While this may seem gloomy, many current research efforts are focused on addressing
these limitations, including

• Theoretical Analyses: Investigating the convergence properties and approxima-


tion capabilities of learned iterative schemes, a particular example of which is the
usage of equilibrium models discussed in the next subsection.

Fig. 3.5: Illustration of the artifacts that appear when learned operators are applied repeatedly without convergence guarantees. Example borrowed from [Gilton et al., 2021a].

• Efficient Implementations: Exploring techniques like invertible neural networks


[Rudzusika et al., 2021, 2024], stochastic subsampling of the forward operator [Tang
et al., 2025, 2021], and greedy training [Hauptmann et al., 2018] to reduce compu-
tational cost.

3.1.2 Deep Equilibrium Networks


One promising avenue is the use of learned fixed point iterations, exemplified by deep
equilibrium networks (DEQs). These networks are designed such that the desired recon-
struction is a fixed point of a learned operator

u = ΓΘ (u; y).

This formulation naturally leads to iterative schemes that provably converge (under certain
assumptions) to a fixed point as the number of iterations approaches infinity. For instance,
consider a deep equilibrium gradient descent scheme where:

ΓΘ (u; y) = u + ηA∗ (y − Au) − ηRΘ (u).

Here, A is the forward operator, A∗ is its adjoint, η is a step size, and RΘ is a trainable
neural network representing a gradient of a learned regularizer. Note that these models
are more general than gradient descent, as convergence can be ensured even when RΘ
is not a gradient of a function. To ensure convergence, ΓΘ can be constrained to be
a contraction mapping. This, once again, is not simply an academic exercise and has
a significant effect in practice, guaranteeing convergence to a fixed point as showcased
in Figure 3.5 on top row, compared to iterate divergence for a non-constrained model,
showcased on the bottom row. The following theorem provides sufficient conditions for
convergence in the context of deep equilibrium gradient descent:

Theorem 3.1.1 ([Gilton et al., 2021a]). Assume RΘ − Id is ϵ-Lipschitz continuous and


let L = λmax (A∗ A) and µ = λmin (A∗ A). If 0 < η < 1/(L + 1), then ΓΘ fulfills

∥ΓΘ(u; y) − ΓΘ(ũ; y)∥ ≤ (1 − η(1 + µ) + ηϵ) ∥u − ũ∥,   ∀u, ũ ∈ X.

Therefore, ΓΘ is a contraction if ϵ < 1 + µ, and hence the iterates converge.



Remark 3.1.2. An interesting additional avenue with convergent learned iterative schemes is the ability to accelerate their convergence by increasing the memory of the iterations, i.e. by letting each iteration depend not only on the previous iterate but on a few previous iterates, for instance via Anderson acceleration as in Gilton et al. [2021b]. While convergence to a fixed point is a desirable property, further investigation is needed to characterize the properties of this fixed point and its relationship to the solution of the underlying inverse problem, e.g. in analogy with Obmann and Haltmeier [2023].
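A minimal sketch of the resulting fixed-point iteration (ours; R_theta stands for the trained network RΘ, assumed to satisfy the Lipschitz condition of Theorem 3.1.1, and the step size and tolerances are illustrative):

import numpy as np

def deq_reconstruct(A, y, R_theta, eta, max_iter=500, tol=1e-8):
    # iterate u <- Gamma_Theta(u; y) = u + eta*A^T(y - A u) - eta*R_theta(u);
    # if Gamma_Theta is a contraction (cf. Theorem 3.1.1), the iterates converge
    u = np.zeros(A.shape[1])
    for _ in range(max_iter):
        u_next = u + eta * (A.T @ (y - A @ u)) - eta * R_theta(u)
        if np.linalg.norm(u_next - u) < tol:
            break
        u = u_next
    return u

# toy example: R_theta(u) = 0.1*u, so that R_theta - Id is 0.9-Lipschitz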

3.2 Learned Variational Models


This section delves into learned variational models for inverse problems. While such models have a history in signal processing, as discussed in Section 3.0.1, the application of over-parameterized models represents a more recent development.

The central idea is to leverage the well-established mathematical framework of variational


methods, integrating deep learning while retaining theoretical guarantees and enforcing
desirable structural properties on the neural networks. This approach allows us to combine
the expressive power of deep learning with the stability and interpretability of variational
methods.

3.2.1 Learning the regularizer


A particularly appealing strategy within the learned variational model paradigm focuses
on learning the regularizer itself, while maintaining the classical variational framework.
Consider the general form of a variational problem:

arg min_{u∈X} ∥Au − y∥₂² + αR(u)    (3.4)

where y ∈ Y is the measured data, A is a linear and bounded forward operator, X and Y
are Banach spaces, α > 0 is a regularization parameter, and R(u) is the regularizer.

Instead of learning the entire reconstruction mapping from y to u as in Section 3.1, this
approach concentrates on learning a data-adaptive regularizer R(u). The goal is for
R(u) to effectively capture prior knowledge about the desired solution, promoting re-
constructions with desirable characteristics (e.g., “good-looking” images) while penalizing
undesirable features. Learning the regulariser offers several advantages:

• Interpretability: The learned regulariser provides an explicit prior on the solution


space.
• Stability and Convergence: Existing variational theory can be applied to analyze
stability and convergence, e.g. through Theorem 1.2.2.
• Adaptability: The regularization parameter α can be adjusted to accommodate
different noise levels.
• Incorporation of Forward Model and Noise Statistics: The variational frame-
work explicitly incorporates the forward model and noise statistics, e.g. ensuring
data consistency.

In essence, this approach seeks to learn an “image prior” that is both data-driven and
amenable to mathematical analysis.

How to learn? Given some parametric model Rθ for the regularizer, its parameters θ
still need to be learned from data. Over the past decades, a variety of paradigms have
been introduced for learning the regularizer given an image distribution. Analogous to
Section 3.1, the direct approach is to train parameters such that the optimal solution in
Equation (3.4) minimizes the ℓ2 loss over training data. This results in the so called bilevel
learning discussed in Section 3.0.1. While parameter hypergradients can be computed via
implicit differentiation or unrolling, they can quickly become computationally infeasible,
necessitating approximations. Alternatively, interpreting the problem as maximum-a-
posteriori estimation (Section 1.2.3), the prior can be learned directly from data. See
Habring and Holler [2024] or Dimakis et al. [2022] for an overview.

In what follows, we consider one specific approach for learning Rθ , which relies neither
on the bilevel structure, nor on learning the whole prior. Instead, we view the regulariser
as a “classifier” that distinguishes between desirable and undesirable solutions. This de-
couples the problem of learning the regulariser from the underlying variational problem.

3.2.2 In-Depth: Adversarial regularization


This section delves into the concept of adversarial regularization for learning priors in inverse problems.
The core idea is to train the regularizer R such that it assigns low values to samples
from a target distribution of “good” images, denoted PU , and high values to samples
from a distribution of “bad” or undesirable images, Pn . Figure 3.6 conceptually illustrates
this distinction, where clean ground truth images are “good”, while noisy corruptions are
“bad”.

(a) Good (b) Sinogram (c) Bad

Fig. 3.6: Comparison of CT reconstructions: (a) a good quality reconstruction, (b) the corresponding
sinogram data, and (c) a poor quality reconstruction.

Inspired by the Wasserstein GAN framework [Arjovsky et al., 2017], the 1-Wasserstein
distance between the clean and noisy distributions is employed as a weakly supervised
loss for the regulariser. The 1-Wasserstein distance between Pn and PU is given by:
Wass₁(Pn, PU) = sup_{R∈1-Lip} { E_{U∼Pn}[R(U)] − E_{U∼PU}[R(U)] },

where the supremum is taken over all 1-Lipschitz functions R. By finding an appropriate R, this formulation allows the regulariser to be trained to effectively capture image statistics without requiring paired examples of “good” and “bad” images. In practice, the regulariser is parameterized as:

RΘ(u) = ΨΘ(u) + ρ₀∥u∥₂²,

where ΨΘ(u) is a (potentially convex [Mukherjee et al., 2024] or weakly convex [Shumaylov et al., 2024]) convolutional neural network (CNN) and ρ₀∥u∥₂² is an additional ℓ₂ regularization term

that enhances analysis and ensures coercivity. Ensuring exact 1-Lipschitzness turns out
to be rather complicated, and the network is trained by minimizing the following loss
function [Lunz et al., 2018]:
min_Θ E_{U∼PU}[ΨΘ(U)] − E_{U∼Pn}[ΨΘ(U)] + µ · E[ (∥∇_u ΨΘ(U)∥ − 1)₊² ]

The first two terms encourage the regularizer to distinguish between “good” and “bad”
images, while the third term softly enforces a Lipschitz constraint on the regularizer.
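A hedged PyTorch sketch of this training loss is given below (ours, not the reference implementation); the gradient penalty is evaluated on random interpolates between clean and noisy samples, which is one common WGAN-GP-style choice rather than a prescription of the text, psi denotes the network ΨΘ, and image batches of shape (B, C, H, W) are assumed.

import torch

def adversarial_regulariser_loss(psi, u_clean, u_noisy, mu=10.0):
    # E[psi(clean)] - E[psi(noisy)] + mu * E[(||grad_u psi(u)|| - 1)_+^2]
    t = torch.rand(u_clean.shape[0], 1, 1, 1, device=u_clean.device)
    u_mix = (t * u_clean + (1 - t) * u_noisy).requires_grad_(True)
    grads = torch.autograd.grad(psi(u_mix).sum(), u_mix, create_graph=True)[0]
    penalty = torch.clamp(grads.flatten(1).norm(dim=1) - 1.0, min=0.0).pow(2).mean()
    return psi(u_clean).mean() - psi(u_noisy).mean() + mu * penalty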

Analysis. Once trained (with parameters Θ∗ ), the learned (convex/weakly-convex) ad-


versarial regulariser (AR/ACR/AWCR) is incorporated into a variational problem:
arg min_u (1/2)∥Au − y∥₂² + α( ΨΘ∗(u) + ρ₀∥u∥₂² ).
This variational problem can be solved using standard optimization techniques such as
(sub)gradient methods, or proximal descent. The resulting model benefits from the the-
oretical properties of variational methods, including well-posedness and stability [Shu-
maylov et al., 2023, Lunz et al., 2018, Mukherjee et al., 2024, Shumaylov et al., 2024].

A theoretical justification for using such a loss can be seen by analyzing the gradient
descent flow of the regularizer with respect to the Wasserstein distance. Under certain
assumptions, it can be shown that this flow leads to the fastest decrease in Wasserstein
distance for any regularization functional with normalized gradients.

Theorem 3.2.1 (Theorem 1 in Lunz et al. [2018]). Let

gη(u) := u − η · ∇_u ΨΘ∗(u),    Pη := (gη)#Pn,

and assume that η ↦ Wass(Pu, Pη) admits a left and a right derivative at η = 0, and that they are equal. Then,

(d/dη) Wass(Pu, Pη) |_{η=0} = −E_{U∼Pn}[ ∥∇_u ΨΘ∗(U)∥² ] = −1.

This is the fastest decrease in Wasserstein distance for any regularization functional with
normalized gradients!

Furthermore, under a data manifold assumption (DMA) and a low-noise assump-


tion, the distance function to the data manifold is a maximizer of the Wasserstein loss.
This provides further intuition for the effectiveness of adversarial training in learning a
regularizer that captures the underlying data distribution.

Assumption 3.2.2 (Data Manifold Assumption (DMA)). The measure Pu is supported


on the weakly compact set M, i.e. Pu (Mc ) = 0.

Denote by PM : X → M, u 7→ arg minv∈M ∥u − v∥ the projection onto the data manifold.

Assumption 3.2.3 (Low Noise Assumption (LNA)). The pushforward of the noisy dis-
tribution under the projection equals the clean distribution, (PM )# (Pn ) = Pr . This
corresponds to an assumption that the noise level is low in comparison to manifold cur-
vature.

Theorem 3.2.4. Assume DMA and LNA. Then, the distance function to the data manifold

u ↦ min_{v∈M} ∥u − v∥₂

is a maximizer of the Wasserstein loss

sup_{R∈1-Lip} E_{U∼Pn}[R(U)] − E_{U∼Pr}[R(U)].    (3.5)

Remark 3.2.5. The functional in Equation (3.5) does not necessarily have a unique maxi-
mizer. However, in certain settings it can be shown to be unique almost everywhere, see
Staudt et al. [2022], Milne et al. [2022].

It is worth mentioning that while theoretically appealing, recent work [Stanczuk et al.,
2021] has shown that the practical success of Wasserstein GANs may not be solely at-
tributed to their ability to approximate the Wasserstein distance.

3.2.2.1 Extensions
The development of learned regularizers, particularly adversarial ones, is an active research
area with several important possible extensions:

• Generalization: The generalization capabilities of machine-learned regularisers


to out-of-distribution data remain a subject of ongoing empirical investigation, as
explored in, e.g. [Lunz, 2022].
• Stronger Constraints: Imposing stronger structural constraints on the learned
regulariser, such as those related to source conditions or optimization landscapes,
can lead to improved theoretical guarantees [Mukherjee et al., 2021] and practical
implementations [Shumaylov et al., 2023, 2024], but these need to be designed with
a problem in mind [Shumaylov et al., 2025].
• Qualitative Properties: Ensuring that learned regularisers fulfill desired qualita-
tive properties, such as invariance or equivariance to certain transformations (e.g.,
affine transformations), can be achieved by designing specialized network architec-
tures like equivariant NNs [Celledoni et al., 2021a,b].
• Choice of Optimality Criteria: The definition of optimality for a regulariser
should be task-dependent. Task-adapted inversion strategies aim to learn recon-
structions that are optimal for a specific end-goal or metric [Escudero Sanchez et al.,
2023, Adler et al., 2022].
• Scalability: Training regularisers for large-scale and high-dimensional inverse prob-
lems could necessitate efficient network architectures, such as invertible networks,
to manage computational complexity [Etmann et al., 2020].
• Uncertainty Quantification: Learned convex regularisers could be integrated
into frameworks for uncertainty quantification, for example, using proximal Markov
Chain Monte Carlo (MCMC) methods [Pereyra, 2016].

3.3 Plug-and-Play (PnP) Methods


This section explores Plug-and-Play Prior (PnP) methods, a class of iterative reconstruc-
tion algorithms that utilize black-box denoisers, which can range from traditional algo-
rithms to powerful deep learning-based denoisers. The core idea behind PnP is to consider

operator splitting techniques for optimizing the variational objective in Equation (2.1),
decoupling the regularization step from the data fidelity term, and replacing regularization
steps with sophisticated denoisers. PnP methods are rooted in operator splitting tech-
niques, such as the Alternating Direction Method of Multipliers (ADMM) [Setzer, 2011].
Consider a reformulation of Equation (2.1), introducing an auxiliary variable v:
min_{u,v} { D(Au, y) + αR(v) }  s.t.  u = v.

The augmented Lagrangian associated with this constrained problem is:


Lλ(u, v, h) = D(Au, y) + αR(v) + (λ/2)∥u − v + h∥₂² − (λ/2)∥h∥₂²,
where h is the Lagrange dual variable and λ > 0 a penalty parameter. ADMM consists
of approximating a solution to the saddle point problem for Lλ by iterating
u^{k+1} = arg min_u Lλ(u, v^k, h^k),
v^{k+1} = arg min_v Lλ(u^{k+1}, v, h^k),
h^{k+1} = h^k + (u^{k+1} − v^{k+1}).

In particular, the updates in u and v read


u^{k+1} = arg min_u { D(Au, y) + (λ/2)∥u − v^k + h^k∥₂² },    (3.6)
v^{k+1} = prox_{(α/λ)R}(u^{k+1} + h^k).    (3.7)

The crucial insight for PnP methods is that this decouples the measurement fidelity step
from the reconstruction regularization, done by denoising v. The (regularizing) v-update
step in Equation (3.7) can be recognized as the proximal operator of the regularizer R
scaled by α/λ. This allows for the replacement of the proximal operator with any effective
denoising algorithm D, such as BM3D [Dabov et al., 2009], non-local means [Buades et al.,
2005], or deep learning-based denoisers. This flexibility gives rise to the name “Plug-and-
Play”.
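A minimal sketch of the resulting PnP-ADMM loop for a matrix forward operator and a squared-error data fit (ours; the denoiser is an arbitrary callable, the regularization strength is implicit in the denoiser, and the direct solve in the u-update is purely illustrative):

import numpy as np

def pnp_admm(A, y, denoiser, lam=1.0, n_iter=50):
    n = A.shape[1]
    v, h = np.zeros(n), np.zeros(n)
    M = A.T @ A + lam * np.eye(n)      # normal-equations matrix for the quadratic u-update
    Aty = A.T @ y
    for _ in range(n_iter):
        u = np.linalg.solve(M, Aty + lam * (v - h))   # data-fidelity step, cf. (3.6)
        v = denoiser(u + h)                           # regularisation step: denoiser replaces the prox, cf. (3.7)
        h = h + (u - v)                               # dual update
    return v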

3.3.1 Theoretical Properties


Despite their widespread empirical success and practical utility, PnP methods present
several theoretical challenges that have been the subject of ongoing research. A primary
concern is that, due to the black-box nature of the denoiser D, it is often unclear to what
objective function, if any, the PnP iterations converge, or even if they converge at all
without restrictive assumptions. Worse still, even under significant
restrictions, it remains unclear what properties the limit points satisfy, unless there exists
a corresponding regularizer.

• Convergence: In general, PnP is not provably convergent. Early results typically


required strong conditions, such as non-expansive denoisers, or the data-fidelity term
D(Au, y) being strongly convex in u. Without access to the explicit form of the
regularizer (implicitly defined by the denoiser), characterizing the fixed points of
the iteration or the properties of the limit can be challenging.
More recent variants, like the gradient step denoisers have shown promise in achiev-
ing convergence under milder conditions [Hurault et al., 2021]. These approaches
often involve specific parameterizations or interpretations of the denoising operator.

• Regularization: A fundamental characteristic of many PnP schemes is the lack


of an explicit representation for the regularizer R(u) that the denoiser D implicitly
implements. This poses difficulties for a direct Bayesian interpretation, where the
regularizer would typically correspond to a prior probability distribution. Without
an explicit R(u), it is hard to ascertain the precise nature of the prior being enforced
or to analyze its properties.
Some works exist, e.g. Regularization by Denoising (RED), proposed by Ro-
mano et al. [2017]. RED defines an explicit variational regularizer based on a given
denoiser D(·):
R_RED(u) = (1/2)⟨u, u − D(u)⟩.

The iterative schemes based on this R_RED can be shown to seek stationary points of an explicit objective function (1/2)∥Au − y∥² + αR_RED(u). However, a critical and
restrictive condition for this gradient formulation to hold is that the Jacobian of
the denoiser D(u) must be symmetric [Reehorst and Schniter, 2018], a property not
generally satisfied by many advanced denoisers, especially deep neural networks.
• Gradient-step (GS) denoisers [Hurault et al., 2021] model the denoiser Dθ as an
explicit gradient step of a potential function Rθ :

Dθ (u) = u − ∇Rθ (u).

A common choice for the potential is Rθ(u) = (1/2)∥u − Ψθ(u)∥₂², where Ψθ is any differentiable (deep) network. This formulation similarly provides an explicit representation of the regulariser Rθ(u) being enforced. What is more, the denoiser in this case is exactly a proximal operator, Dθ(x) = prox_{ϕθ}(x), where ϕθ is defined by

ϕθ(x) = Rθ(Dθ^{−1}(x)) − (1/2)∥Dθ^{−1}(x) − x∥₂²,

admitting various desirable properties like Lipschitz smoothness and weak convexity [Hurault et al., 2022, Tan et al., 2024b], often resulting in provable convergence; a minimal autograd sketch is given below.
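The sketch below (ours; psi stands for any differentiable network Ψθ) uses automatic differentiation to evaluate Dθ(u) = u − ∇Rθ(u) for the potential above; it is an illustration of the parameterization, not the reference implementation.

import torch

def gradient_step_denoiser(psi, u):
    # D_theta(u) = u - grad R_theta(u), with R_theta(u) = 0.5*||u - Psi_theta(u)||_2^2
    u = u.clone().requires_grad_(True)
    R = 0.5 * (u - psi(u)).pow(2).sum()
    grad_R = torch.autograd.grad(R, u, create_graph=True)[0]
    return u - grad_R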

3.3.2 In-depth: Linear Denoiser Plug and Play


A critical aspect of robust inverse problem solving is ensuring that the chosen regular-
ization strategy is convergent as in Theorem 1.2.2. While PnP schemes using learned
denoisers can achieve convergence of iterates under certain conditions, explicitly control-
ling the regularization strength to ensure convergent regularization (i.e., convergence to a
true solution as data noise δ → 0) is crucial. This subsection details a principled approach
for such control when the denoiser is linear, based on spectral filtering, as introduced by
Hauptmann et al. [2024]. Insights from the linear case may also inform strategies for
nonlinear denoisers [Khelifa et al., 2025].

A significant challenge in this case, particularly when utilizing learned denoisers, is the
adjustment of regularization strength. These denoisers are often trained for a specific noise
level σ, yet the effective noise within PnP iterations can vary, and the overall regularization
must be adapted to the noise present in the measurements y δ . Empirically, this has been
approached by tuning regularization strength by denoiser scaling [Xu et al., 2020].

Consider a linear denoiser Dσ : X → X. For Dσ to be the proximal operator proxJ


of some convex functional J : X → R ∪ {∞}, Dσ must satisfy specific conditions: it must

be symmetric and positive semi-definite [Moreau, 1965, Gribonval and Nikolova, 2020].
For the resulting functional to be convex as well, Dσ must be non-expansive, i.e. we will assume that its eigenvalues lie in the interval [0, 1]. Lastly, we will assume that the eigenvalues of Dσ are also bounded away from zero, such that the inverse Dσ^{−1} is well defined and is a bounded operator. If these conditions hold, the functional J is
uniquely determined by Dσ (up to an additive constant). The objective then becomes
controlling the regularization strength by effectively scaling this underlying functional
J. The difficulty is that one typically only has access to the denoiser Dσ , not J itself.
However, when the denoiser is linear, it turns out to be possible to appropriately modify
the denoiser based on the following observations. By definition of a proximal operator
Dσ = proxJ = (id + ∂J)−1 .
On the other hand, since Dσ is linear, Dσ^{−1} is linear, and by the above ∂J =: W is also linear. As a result, J(x) = (1/2)⟨x, Wx⟩ up to an additive constant. Inverting, we have W = Dσ^{−1} − Id. Therefore,

J(x) = (1/2)⟨x, (Dσ^{−1} − Id)x⟩.
We can control the regularization strength by scaling the regularization functional J:

τJ(x) = (1/2)⟨x, ([τDσ^{−1} − (τ − 1) Id] − Id)x⟩,
resulting in

prox_{τJ} = (τDσ^{−1} − (τ − 1) Id)^{−1} = gτ(Dσ).

Here gτ : R → R, given by gτ(λ) = λ/(τ − λ(τ − 1)), is applied to Dσ using the functional
calculus. This implies that applying this filter (as illustrated conceptually in Figure 3.7)
effectively transforms the original denoiser Dσ = proxJ into gτ (Dσ ) = proxτ J . Further-
more Figure 3.8a illustrates the effect on eigenvalues of the resulting linear denoiser as a
function of original eigenvalues.


Fig. 3.7: Diagram illustrating the concept of spectral filtering. From [Hauptmann et al., 2024].

This approach differs from traditional spectral regularization methods (e.g., Tikhonov
regularization, Landweber iteration) [Engl et al., 1996], where filtering is typically applied
to the singular values of the forward operator A. In contrast, here the denoiser (and thus
the implicit prior) is modified.
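For a finite-dimensional symmetric positive semi-definite denoiser given as a matrix, the filtered denoiser gτ(Dσ) can be formed explicitly via an eigendecomposition; a minimal sketch (ours, purely illustrative) follows.

import numpy as np

def filtered_denoiser(D, tau):
    # apply g_tau(lambda) = lambda / (tau - lambda*(tau - 1)) to a symmetric PSD linear
    # denoiser D = prox_J via its eigendecomposition, yielding g_tau(D) = prox_{tau J}
    eigval, eigvec = np.linalg.eigh(D)
    filtered = eigval / (tau - eigval * (tau - 1.0))
    return (eigvec * filtered) @ eigvec.T

# tau = 1 returns D unchanged; tau > 1 shrinks the eigenvalues further, i.e. regularizes more strongly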

It turns out to be possible to show convergent regularization in general for other spectral
filters satisfying technical conditions that (1 − gτ (λ)) / (τ gτ (λ)) is bounded above and
below by positive values and converges as τ → 0. Under such conditions one achieves
convergent PnP regularization by Theorem 5 in Hauptmann et al. [2024], and an example
of this is illustrated in Figures 3.8b and 3.8c.


Fig. 3.8: Further concepts in spectral filtering and application to CT reconstruction. (a) Eigenvalue
spectral filtering. (b) Spectral filtering to control regularisation strength for convergent regularisation. (c)
Resulting images from the CT reconstruction. The linear denoiser filter is gτ (λ) = λ/(τ − λ(τ − 1)). All
illustrations from [Hauptmann et al., 2024].

3.4 Outlooks
The preceding sections have largely focused on methodologies that adapt existing math-
ematical frameworks to incorporate deep learning techniques. Examples include learned
iterative schemes, where neural networks replace components of classical algorithms, and
Plug-and-Play (PnP) methods, where denoisers replace splitting steps. While these ap-
proaches have demonstrated considerable empirical success, they often represent incremen-
tal adaptations rather than fundamental redesigns. Consequently, they can sometimes lack
a deep theoretical grounding or may appear as ad-hoc integrations rather than solutions
derived from first principles tailored to the unique characteristics of deep learning.

A central challenge and a key direction for future research is to move beyond merely
“plugging in” deep learning components into pre-existing structures. To fully harness the
potential of high-capacity, overparameterized models, the development of new frameworks
that are fundamentally designed with deep learning in mind is essential. Such a paradigm
shift would likely involve concerted efforts in several interconnected areas:

• Rethinking optimization: Can we design optimization algorithms specifically


tailored to the properties of deep neural networks, moving beyond simple gradient
descent?
• Embracing inductive biases: How can we incorporate domain-specific knowledge
and structure into the architecture and training of deep networks, moving beyond
generic black boxes?
• Developing new theoretical tools: Can we create new mathematical tools and
theoretical frameworks that can better explain the generalization capabilities of deep
learning in the context of inverse problems, and provide useful guarantees
on stability and convergence for learned solution maps?

A persistent theme in this endeavor is navigating the trade-off between capacity and
guarantees. Constraints are necessary for interpretability and reliability, but excessive
constraints limit the expressive power of deep learning. Finding the sweet spot is key.
This might involve:

• Developing more flexible constraints: Can we design constraints that are less
restrictive but still ensure desirable properties like stability and convergence?
• Learning constraints from data: Can we use data to learn optimal constraints
that balance capacity and guarantees?

Ultimately, the future of data-driven inverse problems lies in integrating deep learning
more deeply with the underlying mathematical principles. This will require both theoret-
ical and practical innovations, but the potential rewards are immense.
Chapter 4

Perspectives

4.1 On Task Adaptation


The field of inverse problems is increasingly benefiting from the integration of deep learn-
ing methodologies. This chapter explores the evolving perspective that moves beyond
addressing inverse problems as isolated reconstruction tasks, instead considering their
embedding within broader, interconnected workflows and multi-tasking scenarios. Tradi-
tionally, inverse problems that can be modeled mathematically comprise only a part of
the overall problem. Oftentimes, the full problem involves a sequential pipeline of individ-
ual tasks, such as data acquisition, reconstruction, segmentation, and classification, see
Figure 4.1 for example. These stages, while often tackled sequentially and independently,
are inherently intertwined. The output quality and characteristics of one stage directly
influence the performance and feasibility of subsequent ones. Treating them in isolation
can therefore lead to suboptimal overall performance.

Fig. 4.1: Biomedical imaging pathway: The path from imaging data acquisition to prediction, diagnosis,
and treatment planning features several processing and analysis steps, which are usually performed
sequentially. CT data and segmentation are courtesy of Evis Sala and Ramona Woitek.

A key observation driving current research is that the metrics used to evaluate the quality
of a reconstruction should be intrinsically linked to the ultimate objective of the entire
workflow. For instance, in clinical medical imaging, the primary goal is rarely the re-
construction of a visually appealing image, but rather to enable accurate diagnosis, guide
treatment planning, or monitor therapeutic response. This motivates the concept of task-
adapted reconstruction [Adler et al., 2022, Wu et al., 2018], wherein the reconstruction
process is explicitly tailored to optimize performance on a specific downstream task, such
as segmentation, classification, or quantitative parameter estimation.

Within the classical, purely model-driven (or knowledge-driven) paradigm of Chapter 2,


designing such task-adapted reconstruction algorithms can rapidly become intractable due
to the complexity of formulating and solving the coupled optimization problems. Deep
learning, however, offers a powerful and flexible framework for realizing task-adapted
reconstruction.

Training neural networks often involves highly non-convex optimization, and joint train-
ing for multiple tasks does not fundamentally alter this characteristic. In fact, combining
tasks within a unified learning pipeline can create opportunities for synergy, where the
optimization process for one task can provide beneficial regularization or feature represen-
tations for another. The inherent non-convexity of end-to-end learned systems means that
extending them to incorporate downstream tasks does not necessarily introduce greater
optimization challenges than those already present in learning the reconstruction alone.

Consider, for example, the task of detecting ovarian cancer from medical images (ra-
diomics). This process typically involves reconstructing an image from sensor measure-
ments, followed by segmentation of potential tumorous regions, and then extraction of
quantitative imaging features for statistical analysis and classification. By jointly opti-
mizing the reconstruction and segmentation processes within a single deep learning model,
it is possible to guide the reconstruction to produce images that are not only faithful to
the measured data but are also more amenable to accurate segmentation by the learned
segmentation module.

Example 4.1.1 (Joint Reconstruction and Segmentation, Adler et al. [2022]). In
tomographic reconstruction, we can jointly optimize the reconstruction and segmentation
processes using a combined loss function. Consider a CNN-based MRI reconstruction (X)
and a CNN-based MRI segmentation (D).

Fig. 4.2: Task-adapted reconstruction, with CNN-based MRI reconstruction (task X) and CNN-based
MRI segmentation (task D). Both are trained jointly with combined loss CℓX + (1 − C)ℓD for varying
C ∈ [0, 1]. (a) Minimal loss values for various C values, showing that jointly training for reconstruction
and segmentation is better. (b) CNN-based reconstructions (top row) and segmentations (bottom row).
All figures from [Adler et al., 2022].

Both tasks are trained jointly by solving
$$
(\theta^*, \vartheta^*) \in \operatorname*{arg\,min}_{(\theta,\vartheta)\in\Theta\times\Xi} \; \frac{1}{m}\sum_{i=1}^{m} \ell_{\mathrm{joint}}\Big( \big(x_i,\, \tau(z_i)\big),\; \big(A^{\dagger}_{\theta}(y_i),\; T_{\vartheta}\circ A^{\dagger}_{\theta}(y_i)\big) \Big),
$$
$$
\ell_{\mathrm{joint}}\big((x, d), (x', d')\big) := (1 - C)\,\ell_X(x, x') + C\,\ell_D(d, d') \quad \text{for fixed } C \in [0, 1].
$$

This loss function balances the reconstruction error (ℓX ) and the segmentation error (ℓD ),
allowing for a trade-off between the two tasks. Figure 4.2a illustrates that whenever
segmentation performance is the ultimate task to be solved, training primarily for recon-
struction results in poor segmentations, while training primarily for segmentation results
in poor reconstructions. Most significantly, training jointly for both tasks with equal
weighting actually results in better segmentations than training for segmentation alone!
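
As a rough illustration of how such joint training can be set up in practice, the following is
a minimal PyTorch-style sketch. It is an assumption for exposition only, not the implementation
of Adler et al. [2022]: recon_net and seg_net are hypothetical stand-ins for A†θ and Tϑ, and the
loss mirrors the definition of ℓjoint above.

import torch
import torch.nn.functional as F

def joint_loss(recon_net, seg_net, y, x_true, d_true, C=0.5):
    """Task-adapted loss (1 - C) * l_X + C * l_D, as in the display above (sketch)."""
    x_hat = recon_net(y)                     # learned reconstruction A_theta^dagger(y)
    d_hat = seg_net(x_hat)                   # downstream segmentation T_vartheta(x_hat)
    loss_X = F.mse_loss(x_hat, x_true)       # reconstruction error l_X
    loss_D = F.cross_entropy(d_hat, d_true)  # segmentation error l_D
    return (1 - C) * loss_X + C * loss_D

# One joint gradient step over both parameter sets (theta, vartheta):
# params = list(recon_net.parameters()) + list(seg_net.parameters())
# opt = torch.optim.Adam(params, lr=1e-4)
# loss = joint_loss(recon_net, seg_net, y, x_true, d_true, C=0.5)
# opt.zero_grad(); loss.backward(); opt.step()

Setting C = 0 recovers training for reconstruction only, C = 1 trains purely for the downstream
task, and intermediate values interpolate between the two, which is precisely the sweep shown
in Figure 4.2a.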

This co-adaptation can lead to improved accuracy and robustness in both the reconstruc-
tion and the segmentation, ultimately enhancing the reliability of the radiomic analysis
and the diagnostic outcome. The degrees of freedom inherent in solving an ill-posed in-
verse problem can be strategically utilized to favor solutions that, while consistent with
the data, also possess features conducive to the success of the subsequent task.

This task-adapted approach not only enhances the efficiency and accuracy of medical im-
age analysis but also has broader implications for healthcare accessibility. By automating
or streamlining certain tasks, such as segmentation, it reduces the reliance on special-
ized expertise, potentially making advanced imaging techniques more widely available in
settings with limited resources.

4.2 The Data Driven - Knowledge Informed Paradigm


The recent trajectory of research in inverse problems, as explored throughout these notes,
signifies a notable paradigm shift. The ascent of deep learning has significantly trans-
formed the field, primarily through its capacity to learn efficient data-driven priors, often
surpassing traditional handcrafted models. While empowering, this shift introduces sig-
nificant challenges, most critically a potential decline in interpretability, especially as
model complexity increases, e.g. for multimodal applications. Open questions remain re-
garding the generalization of these models across diverse datasets and the crucial balance
between empirical performance and robust theoretical guarantees.

It is important to acknowledge the limitations of classical, purely model-based approaches


and recognize the advancements that deep learning has brought to the field. However,
we must also emphasize the necessity of grounding these powerful data-driven techniques
within rigorous mathematical frameworks. Such integration is not merely an academic
exercise but a necessary step to ensure interpretability, to provide provable assurances
beyond empirical validation, and ultimately to foster trust in safety-critical applications.

The Imperative for Guarantees via Structured Learning. A central challenge


in this new paradigm lies in reconciling the expressive power of deep learning with the
need for verifiable guarantees. In many inverse problems, particularly within medical
imaging, the concept of an absolute “ground truth” does not exist. This ambiguity el-
evates the importance of model interpretability and reliability. Consequently, while the
allure of purely data-driven solutions is strong, a wholesale abandonment of mathemat-
ical formalism is untenable. The pursuit of guarantees inherently compels us to impose
specific structural or functional properties onto neural network architectures. The perti-
nent research questions then become: What are the most effective properties to instill for
achieving meaningful guarantees (e.g., stability, robustness, fairness)? And, how can these
properties be integrated into network design and training in a manner that is both prin-
cipled and computationally tractable, without unduly sacrificing performance? Exploring
deeper connections with established mathematical fields, such as the theory of PDEs or
optimal transport, continues to be a promising avenue for discovering and formalizing such
beneficial structural biases.

Towards more useful theoretical tools: The paradigm shift driven by deep learning
also necessitates a corresponding evolution in our theoretical approaches. Much of the
traditional analysis in inverse problems has focused on model properties, optimization
landscapes, and convergence proofs, often treating the model in relative isolation from
the data that was used to create it. However, deep learning models are fundamentally
data-centric; their behavior, efficacy, and potential failure modes are inextricably linked
to the characteristics of the training dataset. Therefore, future analytical efforts must
pivot to more explicitly account for this data dependency. It is no longer sufficient to
analyze convex regularizers in abstraction. Rigorous analysis must now encompass the
training dataset itself: its size, diversity, representativeness, potential inherent biases,
and the precise manner in which these factors influence the learned model’s generalization
capabilities, and its robustness to distributional shifts.

As deep learning systems become more complex and their decision-making processes more
opaque, explainability emerges as a critical concern. If a model produces a reconstruc-
tion specifically optimized for a downstream task, understanding why the reconstruction
appears as it does, and how specific features (or apparent artifacts) contribute to the
downstream decision, is crucial for validation, debugging, and building trust, especially
in safety-critical applications like medicine. Future research must focus on developing
methods that can provide insights into these complex, end-to-end trained systems.

Beyond Theory: Alongside theory, we need to continue working on convincing
use-cases. Ultimately, the successful integration of deep learning and mathematics holds
the potential to transform medical imaging into a more accessible, efficient and widespread
clinical screening tool, benefiting both clinicians and patients. The transformative poten-
tial of this research is immense, and we envision a future where its impact is widely
recognized, with headlines declaring:

“Deep Learning & Maths turn CT/MRI into
a clinical screening tool!”
Bibliography

J. Adler and O. Öktem. Solving ill-posed inverse problems using iterative deep neural
networks. Inverse Problems, 33(12):124007, 2017.

J. Adler and O. Öktem. Learned primal-dual reconstruction. IEEE transactions on medical


imaging, 37(6):1322–1332, 2018.

J. Adler, S. Lunz, O. Verdier, C.-B. Schönlieb, and O. Öktem. Task adapted reconstruction
for inverse problems. Inverse Problems, 38(7):075006, 2022.

M. V. Afonso, J. M. Bioucas-Dias, and M. A. Figueiredo. Fast image recovery using


variable splitting and constrained optimization. IEEE transactions on image processing,
19(9):2345–2356, 2010.

M. Aharon, M. Elad, and A. Bruckstein. K-svd: An algorithm for designing overcomplete


dictionaries for sparse representation. IEEE Transactions on signal processing, 54(11):
4311–4322, 2006.

K. Akiyama, A. Alberdi, W. Alef, K. Asada, R. Azulay, A.-K. Baczko, D. Ball,


M. Baloković, J. Barrett, D. Bintley, et al. First m87 event horizon telescope results.
iv. imaging the central supermassive black hole. The Astrophysical Journal Letters, 875
(1):L4, 2019.

W. K. Allard. Total variation regularization for image denoising, i. geometric theory.


SIAM Journal on Mathematical Analysis, 39(4):1150–1190, 2008.

W. K. Allard and F. J. Almgren. Geometric measure theory and the calculus of variations,
volume 44. American Mathematical Soc., 1986.

F. Alter, V. Caselles, and A. Chambolle. Evolution of characteristic functions of convex


sets in the plane by the minimizing total variation flow. Interfaces Free Bound, 7(1):
29–53, 2005.

L. Alvarez, P.-L. Lions, and J.-M. Morel. Image selective smoothing and edge detection
by nonlinear diffusion. ii. SIAM Journal on numerical analysis, 29(3):845–866, 1992.

L. Ambrosio. Metric space valued functions of bounded variation. Annali della Scuola
Normale Superiore di Pisa-Classe di Scienze, 17(3):439–478, 1990.

L. Ambrosio and H. M. Soner. Level set approach to mean curvature flow in arbitrary
codimension. Journal of differential geometry, 43(4):693–737, 1996.

L. Ambrosio, N. Fusco, and D. Pallara. Functions of bounded variation and free disconti-
nuity problems. Oxford university press, 2000.

L. Ambrosio, V. Caselles, S. Masnou, and J.-M. Morel. Connected components of sets of


finite perimeter and applications to image processing. Journal of the European Mathe-
matical Society, 3(1):39–92, 2001.

M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In


International conference on machine learning, pages 214–223. PMLR, 2017.


S. Arridge, P. Maass, O. Öktem, and C.-B. Schönlieb. Solving inverse problems using
data-driven models. Acta Numerica, 28:1–174, 2019.

K. J. Arrow, L. Hurwicz, H. Uzawa, H. B. Chenery, S. Johnson, and S. Karlin. Studies


in linear and non-linear programming, volume 2. Stanford University Press Stanford,
1958.

H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. Proximal alternating minimization


and projection methods for nonconvex problems: An approach based on the kurdyka-
łojasiewicz inequality. Mathematics of operations research, 35(2):438–457, 2010.

J.-F. Aujol, G. Aubert, L. Blanc-Féraud, and A. Chambolle. Image decomposition into a


bounded variation component and an oscillating component. Journal of Mathematical
Imaging and Vision, 22:71–88, 2005.

A. I. Aviles-Rivero, G. Williams, M. Graves, and C. Schonlieb. Cs+ m: a simultane-


ous reconstruction and motion estimation approach for improving undersampled mri
reconstruction. In Proc. 26th Annual Meeting ISMRM, 2018.

A. I. Aviles-Rivero, N. Papadakis, R. Li, P. Sellars, Q. Fan, R. T. Tan, and C.-B. Schönlieb.


GraphX-NET - chest x-ray classification under extreme minimal supervision. In
Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd
International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part VI
22, pages 504–512. Springer, 2019.

A. I. Aviles-Rivero, N. Debroux, G. Williams, M. J. Graves, and C.-B. Schönlieb. Com-


pressed sensing plus motion (cs+ m): A new perspective for improving undersampled
mr image reconstruction. Medical Image Analysis, 68:101933, 2021.

M. Bachmayr and M. Burger. Iterative total variation schemes for nonlinear inverse
problems. Inverse Problems, 25(10):105004, 2009.

A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear


inverse problems. SIAM journal on imaging sciences, 2(1):183–202, 2009.

G. Bellettini, V. Caselles, and M. Novaga. The total variation flow in rn. Journal of
Differential Equations, 184(2):475–525, 2002a.

G. Bellettini, V. Caselles, and M. Novaga. The total variation flow in rn. Journal of
Differential Equations, 184(2):475–525, 2002b.

G. Bellettini, M. Novaga, and E. Paolini. Global solutions to the gradient flow equation
of a nonconvex functional. SIAM journal on mathematical analysis, 37(5):1657–1687,
2006.

M. Benning and M. Burger. Modern regularization methods for inverse problems. Acta
numerica, 27:1–111, 2018.

M. Benning, L. Gladden, D. Holland, C.-B. Schönlieb, and T. Valkonen. Phase reconstruc-


tion from velocity-encoded mri measurements–a survey of sparsity-promoting variational
approaches. Journal of Magnetic Resonance, 238:26–43, 2014.

M. Benning, M. M. Betcke, M. J. Ehrhardt, and C.-B. Schönlieb. Choose your path


wisely: gradient descent in a bregman distance framework. SIAM Journal on Imaging
Sciences, 14(2):814–843, 2021.

A. L. Bertozzi, S. Esedoglu, and A. Gillette. Inpainting of binary images using the cahn–
hilliard equation. IEEE Transactions on image processing, 16(1):285–291, 2006.

L. Biegler, G. Biros, O. Ghattas, M. Heinkenschloss, D. Keyes, B. Mallick, Y. Marzouk,


L. Tenorio, B. van Bloemen Waanders, and K. Willcox. Large-scale inverse problems
and quantification of uncertainty. 2011.

L. T. Biegler, O. Ghattas, M. Heinkenschloss, and B. van Bloemen Waanders. Large-


scale pde-constrained optimization: an introduction. In Large-scale PDE-constrained
optimization, pages 3–13. Springer, 2003.

J. M. Bioucas-Dias and M. A. Figueiredo. A new twist: Two-step iterative shrink-


age/thresholding algorithms for image restoration. IEEE Transactions on Image pro-
cessing, 16(12):2992–3004, 2007.

J. Bolte, S. Sabach, and M. Teboulle. Proximal alternating linearized minimization


for nonconvex and nonsmooth problems. Mathematical Programming, 146(1):459–494,
2014.

J. M. Borwein and D. R. Luke. Duality and convex programming. Handbook of Mathe-


matical Methods in Imaging, 2015:257–304, 2015.

K. Bredies and H. Sun. Preconditioned douglas–rachford splitting methods for convex-


concave saddle-point problems. SIAM Journal on Numerical Analysis, 53(1):421–444,
2015.

K. Bredies, K. Kunisch, and T. Pock. Total generalized variation. SIAM Journal on


Imaging Sciences, 3(3):492–526, 2010.

A. Buades, B. Coll, and J.-M. Morel. A non-local algorithm for image denoising. In
2005 IEEE computer society conference on computer vision and pattern recognition
(CVPR’05), volume 2, pages 60–65. Ieee, 2005.

A. Buades, B. Coll, and J.-M. Morel. Non-local means denoising. Image Processing On
Line, 1:208–212, 2011.

T. A. Bubba, M. Galinier, M. Lassas, M. Prato, L. Ratti, and S. Siltanen. Deep neu-


ral networks for inverse problems with pseudodifferential operators: An application to
limited-angle tomography. 2021.

T. Buddenkotte, L. E. Sanchez, M. Crispin-Ortuzar, R. Woitek, C. McCague, J. D. Bren-


ton, O. Öktem, E. Sala, and L. Rundo. Calibrating ensembles for scalable uncertainty
quantification in deep learning-based medical image segmentation. Computers in Biol-
ogy and Medicine, 163:107096, 2023.

L. Bungert, D. A. Coomes, M. J. Ehrhardt, J. Rasch, R. Reisenhofer, and C.-B. Schönlieb.


Blind image fusion for hyperspectral imaging with the directional total variation. Inverse
Problems, 34(4):044003, 2018.

M. Burger, G. Gilboa, S. Osher, and J. Xu. Nonlinear inverse scale space methods. 2006.

M. Burger, E. Resmerita, and L. He. Error estimation for bregman iterations and inverse
scale space methods in image restoration. Computing, 81:109–135, 2007.

M. Burger, L. He, and C.-B. Schönlieb. Cahn–hilliard inpainting and a generalization


for grayvalue images. SIAM Journal on Imaging Sciences, 2(4):1129–1167, 2009. doi:
10.1137/080728548. URL https://doi.org/10.1137/080728548.

X. Cai, R. Chan, and T. Zeng. A two-stage image segmentation method using a con-
vex variant of the mumford–shah model and thresholding. SIAM Journal on Imaging
Sciences, 6(1):368–390, 2013.

X. Cai, R. Chan, C.-B. Schonlieb, G. Steidl, and T. Zeng. Linkage between piecewise
constant mumford–shah model and rudin–osher–fatemi model and its virtue in image
segmentation. SIAM Journal on Scientific Computing, 41(6):B1310–B1340, 2019.

L. Calatroni, C. Cao, J. C. De Los Reyes, C.-B. Schönlieb, and T. Valkonen. Bilevel


approaches for learning of variational imaging models. Variational Methods: In Imaging
and Geometric Control, 18(252):2, 2017a.

L. Calatroni, Y. van Gennip, C.-B. Schönlieb, H. M. Rowland, and A. Flenner. Graph


clustering, variational image segmentation methods and hough transform scale detection
for object measurement in images. Journal of Mathematical Imaging and Vision, 57:
269–291, 2017b.

L. Calatroni, M. d’Autume, R. Hocking, S. Panayotova, S. Parisotto, P. Ricciardi, and


C.-B. Schönlieb. Unveiling the invisible: mathematical methods for restoring and inter-
preting illuminated manuscripts. Heritage science, 6:1–21, 2018.

E. J. Candès. Lecture 10. Course Notes for MATH 262/CME 372: Applied Fourier
Analysis and Elements of Modern Signal Processing, 2021a. URL https://candes.
su.domains/teaching/math262/Lectures/Lecture10.pdf.

E. J. Candès. Lecture 11. Course Notes for MATH 262/CME 372: Applied Fourier
Analysis and Elements of Modern Signal Processing, 2021b. URL https://candes.
su.domains/teaching/math262/Lectures/Lecture11.pdf.

E. J. Candes, J. K. Romberg, and T. Tao. Stable signal recovery from incomplete and in-
accurate measurements. Communications on Pure and Applied Mathematics: A Journal
Issued by the Courant Institute of Mathematical Sciences, 59(8):1207–1223, 2006.

M. Carriero, A. Leaci, and F. Tomarelli. Plastic free discontinuities and special bounded
hessian. Comptes rendus de l’Académie des sciences. Série 1, Mathématique, 314(8):
595–600, 1992.

V. Caselles and A. Chambolle. Anisotropic curvature-driven flow of convex sets. Nonlinear


Analysis: Theory, Methods & Applications, 65(8):1547–1577, 2006.

V. Caselles and J. Morel. Introduction to the special issue on partial differential equations
and geometry-driven diffusion in image processing and analysis. IEEE transactions on
image processing, 7(3):269–273, 1998.

V. Caselles, F. Catté, T. Coll, and F. Dibos. A geometric model for active contours in
image processing. Numerische mathematik, 66:1–31, 1993.

V. Caselles, A. Chambolle, and M. Novaga. The discontinuity set of solutions of the


tv denoising problem and some extensions. Multiscale modeling & simulation, 6(3):
879–894, 2007.

E. Celledoni, M. J. Ehrhardt, C. Etmann, R. I. McLachlan, B. Owren, C.-B. Schonlieb, and


F. Sherry. Structure-preserving deep learning. European journal of applied mathematics,
32(5):888–936, 2021a.

E. Celledoni, M. J. Ehrhardt, C. Etmann, B. Owren, C.-B. Schönlieb, and F. Sherry.


Equivariant neural networks for inverse problems. Inverse Problems, 37(8):085006,
2021b.

V. Cevher, S. Becker, and M. Schmidt. Convex optimization for big data: Scalable,
randomized, and parallel algorithms for big data analytics. IEEE Signal Processing
Magazine, 31(5):32–43, 2014.

A. Chambolle. An algorithm for total variation minimization and applications. Journal


of Mathematical imaging and vision, 20:89–97, 2004.

A. Chambolle and C. Dossal. On the convergence of the iterates of the “fast iterative
shrinkage/thresholding algorithm”. Journal of Optimization theory and Applications,
166:968–982, 2015.

A. Chambolle and P.-L. Lions. Image recovery via total variation minimization and related
problems. Numerische Mathematik, 76:167–188, 1997.

A. Chambolle and T. Pock. An introduction to continuous optimization for imaging. Acta


Numerica, 25:161–319, 2016.

A. Chambolle, V. Caselles, D. Cremers, M. Novaga, T. Pock, et al. An introduction to


total variation for image analysis. Theoretical foundations and numerical methods for
sparse recovery, 9(263-340):227, 2010.

A. Chambolle, M. J. Ehrhardt, P. Richtárik, and C.-B. Schonlieb. Stochastic primal-dual


hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM
Journal on Optimization, 28(4):2783–2808, 2018.

R. H. Chan and T. F. Chan. Circulant preconditioners for elliptic problems. Department


of Mathematics, University of California, Los Angeles, 1990.

R. H. Chan and G. Strang. Toeplitz equations by conjugate gradients with circulant


preconditioner. SIAM Journal on Scientific and Statistical Computing, 10(1):104–119,
1989.

T. F. Chan and L. A. Vese. Active contours without edges. IEEE Transactions on image
processing, 10(2):266–277, 2001.

V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky. Rank-sparsity in-


coherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596,
2011.

Y.-G. Chen, Y. Giga, and S. Goto. Uniqueness and existence of viscosity solutions of
generalized mean curvature flow equations. In Fundamental Contributions to the Con-
tinuum Theory of Evolving Phase Interfaces in Solids: A Collection of Reprints of 14
Seminal Papers, pages 375–412. Springer, 1999.

L. Chizat and F. Bach. On the global convergence of gradient descent for over-
parameterized models using optimal transport. Advances in neural information pro-
cessing systems, 31, 2018.

P. L. Combettes and J.-C. Pesquet. Proximal thresholding algorithm for minimization


over orthonormal bases. SIAM Journal on Optimization, 18(4):1351–1376, 2008.

P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting.


Multiscale modeling & simulation, 4(4):1168–1200, 2005.

K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Bm3d image denoising with shape-
adaptive principal component analysis. In SPARS’09-Signal Processing with Adaptive
Sparse Structured Representations, 2009.

I. Daubechies. Ten lectures on wavelets. SIAM, 1992.

I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear


inverse problems with a sparsity constraint. Communications on Pure and Applied
Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 57
(11):1413–1457, 2004.

E. De Giorgi. Variational free-discontinuity problems. In International Conference in


Memory of Vito Volterra (Italian) (Rome, 1990), volume 92 of Atti Convegni Lincei,
pages 133–150. Accad. Naz. Lincei, Rome, 1992.

E. De Giorgi and L. Ambrosio. Un nuovo tipo di funzionale del calcolo delle variazioni.
Atti della Accademia Nazionale dei Lincei. Classe di Scienze Fisiche, Matematiche e
Naturali. Rendiconti Lincei. Matematica e Applicazioni, 82(2):199–210, 1988.

M. V. de Hoop, M. Lassas, and C. A. Wong. Deep learning architectures for nonlinear op-
erator functions and nonlinear inverse problems. Mathematical Statistics and Learning,
4(1):1–86, 2022.

J. C. De los Reyes, C.-B. Schönlieb, and T. Valkonen. The structure of optimal parameters
for image restoration problems. Journal of Mathematical Analysis and Applications, 434
(1):464–500, 2016.

R. A. DeVore. Deterministic constructions of compressed sensing matrices. Journal of


complexity, 23(4-6):918–925, 2007.

W. Diepeveen, J. Lellmann, O. Öktem, and C.-B. Schönlieb. Regularizing orientation


estimation in cryogenic electron microscopy three-dimensional map refinement through
measure-based lifting over riemannian manifolds. SIAM Journal on Imaging Sciences,
16(3):1440–1490, 2023.

A. G. Dimakis, A. Bora, D. Van Veen, A. Jalal, S. Vishwanath, and E. Price. Deep


generative models and inverse problems. Mathematical Aspects of Deep Learning, pages
400–421, 2022.

P. L. Dragotti and M. Vetterli. Wavelet footprints: theory, algorithms, and applications.


IEEE Transactions on Signal Processing, 51(5):1306–1323, 2003.

M. Drechsler, L. F. Lang, L. Al-Khatib, H. Dirks, M. Burger, C.-B. Schönlieb, and I. M.


Palacios. Optical flow analysis reveals that kinesin-mediated advection impacts the
orientation of microtubules in the drosophila oocyte. Molecular Biology of the Cell, 31
(12):1246–1258, 2020.

D. Driggs, J. Tang, J. Liang, M. Davies, and C.-B. Schonlieb. A stochastic proximal


alternating minimization for nonsmooth and nonconvex optimization. SIAM Journal
on Imaging Sciences, 14(4):1932–1970, 2021.

M. J. Ehrhardt, P. Markiewicz, and C.-B. Schönlieb. Faster pet reconstruction with non-
smooth priors by randomization and preconditioning. Physics in Medicine & Biology,
64(22):225019, 2019.

I. Ekeland and R. Temam. Convex analysis and variational problems. SIAM, 1999.

M. Elad and M. Aharon. Image denoising via sparse and redundant representations over
learned dictionaries. IEEE Transactions on Image processing, 15(12):3736–3745, 2006.

H. W. Engl, M. Hanke, and A. Neubauer. Regularization of inverse problems, volume 375.


Springer Science & Business Media, 1996.

C. L. Epstein. Introduction to the mathematics of medical imaging. SIAM, 2007.

L. Escudero Sanchez, T. Buddenkotte, M. Al Sa’d, C. McCague, J. Darcy, L. Rundo,


A. Samoshkin, M. J. Graves, V. Hollamby, P. Browne, et al. Integrating artificial intel-
ligence tools in the clinical research setting: the ovarian cancer use case. Diagnostics,
13(17):2813, 2023.

E. Esser, X. Zhang, and T. F. Chan. A general framework for a class of first order primal-
dual algorithms for convex optimization in imaging science. SIAM Journal on Imaging
Sciences, 3(4):1015–1046, 2010.

C. Esteve-Yagüe, W. Diepeveen, O. Öktem, and C.-B. Schönlieb. Spectral decomposition


of atomic structures in heterogeneous cryo-em. Inverse Problems, 39(3):034003, 2023.

C. Etmann, R. Ke, and C.-B. Schönlieb. iunets: learnable invertible up-and downsampling
for large-scale inverse problems. In 2020 IEEE 30th International Workshop on Machine
Learning for Signal Processing (MLSP), pages 1–6. IEEE, 2020.

M. J. Fadili, J.-L. Starck, J. Bobin, and Y. Moudden. Image decomposition and separation
using sparse representations: An overview. Proceedings of the IEEE, 98(6):983–994,
2009.

H. Federer. Applications to the calculus of variations. In Geometric Measure Theory,


pages 513–654. Springer, 1996.

H. Federer. Geometric measure theory. Springer, 2014.

J. A. Fessler. Image reconstruction: Algorithms and analysis. Under preparation, 2008.

M. A. Figueiredo, R. D. Nowak, and S. J. Wright. Gradient projection for sparse recon-


struction: Application to compressed sensing and other inverse problems. IEEE Journal
of selected topics in signal processing, 1(4):586–597, 2007.

L. Florack and A. Kuijper. The topological structure of scale-space images. Journal of


Mathematical Imaging and Vision, 12:65–79, 2000.

M. Fornasier and C.-B. Schönlieb. Subspace correction methods for total variation and
ℓ1-minimization. SIAM Journal on Numerical Analysis, 47(5):3397–3428, 2009.

M. Fornasier, A. Langer, and C.-B. Schönlieb. A convergent overlapping domain decompo-


sition method for total variation minimization. Numerische Mathematik, 116:645–685,
2010.

M. Fornasier, Y. Kim, A. Langer, and C.-B. Schönlieb. Wavelet decomposition method


for L2/TV-image deblurring. SIAM Journal on Imaging Sciences, 5(3):857–885, 2012.

S. Foucart and H. Rauhut. An invitation to compressive sensing. Springer, 2013.

M. Genzel, J. Macdonald, and M. März. Solving inverse problems with deep neural
networks–robustness included? IEEE transactions on pattern analysis and machine
intelligence, 45(1):1119–1134, 2022.

P. Getreuer. Chan-vese segmentation. Image Processing On Line, 2:214–224, 2012.

G. Gilboa and S. Osher. Nonlocal operators with applications to image processing. Mul-
tiscale Modeling & Simulation, 7(3):1005–1028, 2009.

D. Gilton, G. Ongie, and R. Willett. Neumann networks for linear inverse problems in
imaging. IEEE Transactions on Computational Imaging, 6:328–343, 2019.

D. Gilton, G. Ongie, and R. Willett. Deep equilibrium architectures for inverse problems
in imaging. IEEE Transactions on Computational Imaging, 7:1123–1133, 2021a.

D. Gilton, G. Ongie, and R. Willett. Model adaptation for inverse problems in imaging.
IEEE Transactions on Computational Imaging, 7:661–674, 2021b.

I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. URL
http://www.deeplearningbook.org.

J. S. Grah, J. A. Harrington, S. B. Koh, J. A. Pike, A. Schreiner, M. Burger, C.-B.


Schönlieb, and S. Reichelt. Mathematical imaging methods for mitosis analysis in live-
cell phase contrast microscopy. Methods, 115:91–99, 2017.

K. Gregor and Y. LeCun. Learning fast approximations of sparse coding. In Proceedings


of the 27th international conference on international conference on machine learning,
pages 399–406, 2010.

R. Gribonval and M. Nikolova. A characterization of proximity operators. Journal of


Mathematical Imaging and Vision, 62(6):773–789, 2020.

P. Grohs and G. Kutyniok. Mathematical aspects of deep learning. Cambridge University


Press, 2022.

E. Haber. Computational methods in geophysical electromagnetics. SIAM, 2014.

E. Haber, L. Horesh, and L. Tenorio. Numerical methods for the design of large-scale
nonlinear discrete ill-posed inverse problems. Inverse Problems, 26(2):025002, 2009.

A. Habring and M. Holler. Neural-network-based regularization methods for inverse prob-


lems in imaging. GAMM-Mitteilungen, page e202470004, 2024.

J. Hadamard. Sur les problèmes aux dérivées partielles et leur signification physique.
Princeton university bulletin, pages 49–52, 1902.

K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and


F. Knoll. Learning a variational network for reconstruction of accelerated mri data.
Magnetic resonance in medicine, 79(6):3055–3071, 2018.

A. Hauptmann, F. Lucka, M. Betcke, N. Huynh, J. Adler, B. Cox, P. Beard, S. Ourselin,


and S. Arridge. Model-based learning for accelerated, limited-view 3-d photoacoustic
tomography. IEEE transactions on medical imaging, 37(6):1382–1393, 2018.

A. Hauptmann, S. Mukherjee, C.-B. Schönlieb, and F. Sherry. Convergent regularization


in inverse problems and linear plug-and-play denoisers. Foundations of Computational
Mathematics, pages 1–34, 2024.

A. Hertle. On the problem of well-posedness for the radon transform. In Mathematical


Aspects of Computerized Tomography: Proceedings, Oberwolfach, February 10–16, 1980,
pages 36–44. Springer, 1981.

J. Hertrich, S. Neumayer, and G. Steidl. Convolutional proximal neural networks and


plug-and-play algorithms. Linear Algebra and its Applications, 631:203–234, 2021.

M. Hintermüller. Semismooth newton methods and applications. Department of Mathe-


matics, Humboldt-University of Berlin, 2010.

M. Hintermüller and G. Stadler. A semi-smooth newton method for constrained linear-


quadratic control problems. ZAMM-Journal of Applied Mathematics and Mechanic-
s/Zeitschrift für Angewandte Mathematik und Mechanik: Applied Mathematics and Me-
chanics, 83(4):219–237, 2003.

B. Hofmann, B. Kaltenbacher, C. Poeschl, and O. Scherzer. A convergence rates result for


tikhonov regularization in banach spaces with non-smooth operators. Inverse Problems,
23(3):987, 2007.

T. Hohage and F. Werner. Convergence rates for inverse problems with impulsive noise.
SIAM Journal on Numerical Analysis, 52(3):1203–1221, 2014.

L. Horesh, E. Haber, and L. Tenorio. Optimal experimental design for the large-scale
nonlinear ill-posed problem of impedance imaging. Large-Scale Inverse Problems and
Quantification of Uncertainty, pages 273–290, 2010.

S. Hurault, A. Leclaire, and N. Papadakis. Gradient step denoiser for convergent plug-
and-play. arXiv preprint arXiv:2110.03220, 2021.

S. Hurault, A. Leclaire, and N. Papadakis. Proximal denoiser for convergent plug-and-play


optimization with nonconvex regularization. In International Conference on Machine
Learning, pages 9483–9505. PMLR, 2022.

J. Kaipio and E. Somersalo. Statistical and computational inverse problems, volume 160.
Springer Science & Business Media, 2006.

B. Kaltenbacher, A. Neubauer, and O. Scherzer. Iterative regularization methods for


nonlinear ill-posed problems. Walter de Gruyter, 2008.

R. Ke and C.-B. Schönlieb. Unsupervised image restoration using partially linear denoisers.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):5796–5812,
2021.

N. Khelifa, F. Sherry, and C.-B. Schönlieb. Enhanced denoising and convergent regulari-
sation using tweedie scaling. arXiv preprint arXiv:2503.05956, 2025.

M. B. Kiss, A. Biguri, Z. Shumaylov, F. Sherry, K. J. Batenburg, C.-B. Schönlieb,


and F. Lucka. Benchmarking learned algorithms for computed tomography im-
age reconstruction tasks. Applied Mathematics for Modern Challenges, 3(0):1–43,
2025. doi: 10.3934/ammc.2025001. URL https://www.aimsciences.org/article/
id/67ab0b783942fc6603063e6e.

E. Kobler, T. Klatzer, K. Hammernik, and T. Pock. Variational networks: connecting


variational methods and deep learning. In Pattern Recognition: 39th German Confer-
ence, GCPR 2017, Basel, Switzerland, September 12–15, 2017, Proceedings 39, pages
281–293. Springer, 2017.

K. Kunisch and T. Pock. A bilevel optimization approach for parameter learning in


variational models. SIAM Journal on Imaging Sciences, 6(2):938–983, 2013.

G. Kutyniok and D. Labate. Introduction to shearlets. Shearlets: Multiscale analysis for


multivariate data, pages 1–38, 2012.

A. Langer. Automated parameter selection for total variation minimization in image


restoration. Journal of Mathematical Imaging and Vision, 57:239–268, 2017.

M. Lassas, E. Saksman, and S. Siltanen. Discretization-invariant bayesian inversion and
besov space priors. arXiv preprint arXiv:0901.4220, 2009.

J. Lee, X. Cai, C.-B. Schönlieb, and D. A. Coomes. Nonparametric image registration of


airborne lidar, hyperspectral and photographic imagery of wooded landscapes. IEEE
Transactions on Geoscience and Remote Sensing, 53(11):6073–6084, 2015.

J. Lee, X. Cai, J. Lellmann, M. Dalponte, Y. Malhi, N. Butt, M. Morecroft, C.-B. Schön-


lieb, and D. A. Coomes. Individual tree species classification from airborne multisensor
imagery using robust pca. IEEE Journal of Selected Topics in Applied Earth Observa-
tions and Remote Sensing, 9(6):2554–2567, 2016.

J. Liang, J. M. Fadili, and G. Peyré. Local linear convergence of forward–backward under


partial smoothness. Advances in neural information processing systems, 27, 2014.

P.-L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators.
SIAM Journal on Numerical Analysis, 16(6):964–979, 1979.

S. Lunz. Machine Learning in Inverse Problems-Learning Regularisation Functionals and


Operator Corrections. PhD thesis, 2022.

S. Lunz, O. Öktem, and C.-B. Schönlieb. Adversarial regularizers in inverse problems.


Advances in neural information processing systems, 31, 2018.

M. Lustig. Sparse mri. Stanford University, 2008.

M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly. Compressed sensing mri. IEEE


signal processing magazine, 25(2):72–82, 2008.

J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding.
In Proceedings of the 26th annual international conference on machine learning, pages
689–696, 2009.

S. Mallat. A wavelet tour of signal processing. Elsevier, 1999.

S. G. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE


Transactions on signal processing, 41(12):3397–3415, 1993.

S. Masnou and J.-M. Morel. Level lines based disocclusion. In Proceedings 1998 Interna-
tional Conference on Image Processing. ICIP98 (Cat. No. 98CB36269), pages 259–263.
IEEE, 1998.

M. T. McCann, K. H. Jin, and M. Unser. Convolutional neural networks for inverse


problems in imaging: A review. IEEE Signal Processing Magazine, 34(6):85–95, 2017.

T. Meinhardt, M. Moller, C. Hazirbas, and D. Cremers. Learning proximal operators:


Using denoising networks for regularizing inverse imaging problems. In Proceedings of
the IEEE International Conference on Computer Vision, pages 1781–1790, 2017.

T. Milne, É. Bilocq, and A. Nachman. A new method for determining Wasserstein 1
optimal transport maps from Kantorovich potentials, with deep learning applications.
arXiv preprint arXiv:2211.00820, 2022.

J.-J. Moreau. Proximité et dualité dans un espace hilbertien. Bulletin de la Société


mathématique de France, 93:273–299, 1965.

T. Moreau and J. Bruna. Understanding trainable sparse coding via matrix factorization.
arXiv preprint arXiv:1609.00285, 2016.

S. Mukherjee, C.-B. Schönlieb, and M. Burger. Learning convex regularizers satisfying


the variational source condition for inverse problems. arXiv preprint arXiv:2110.12520,
2021.

S. Mukherjee, A. Hauptmann, O. Öktem, M. Pereyra, and C.-B. Schönlieb. Learned


reconstruction methods with convergence guarantees: A survey of concepts and appli-
cations. IEEE Signal Processing Magazine, 40(1):164–182, 2023. doi: 10.1109/MSP.
2022.3207451. URL https://ieeexplore.ieee.org/abstract/document/10004773.

S. Mukherjee, S. Dittmer, Z. Shumaylov, S. Lunz, O. Öktem, and C.-B. Schönlieb. Data-


driven convex regularizers for inverse problems. In ICASSP 2024-2024 IEEE Interna-
tional Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 13386–
13390. IEEE, 2024.

D. B. Mumford and J. Shah. Optimal approximations by piecewise smooth functions and


associated variational problems. Communications on pure and applied mathematics,
1989.

F. Natterer and F. Wübbeling. Mathematical methods in image reconstruction. SIAM,


2001.

M. Novaga and E. Paolini. Regularity results for boundaries in r2 with prescribed


anisotropic curvature. Annali di Matematica Pura ed Applicata (1923-), 2(184):239–
261, 2005.

D. Obmann and M. Haltmeier. Convergence analysis of equilibrium methods for inverse


problems. arXiv preprint arXiv:2306.01421, 2023.

S. Osher, A. Solé, and L. Vese. Image decomposition and restoration using total variation
minimization and the H^{-1} norm. Multiscale Modeling & Simulation, 1(3):349–370, 2003.

E. Paolini. On the relaxed total variation of singular maps. manuscripta mathematica,


111(4):499–512, 2003.

V. Papyan, Y. Romano, J. Sulam, and M. Elad. Convolutional dictionary learning via


local processing. In Proceedings of the IEEE International Conference on Computer
Vision, pages 5296–5304, 2017.

S. Parisotto, L. Calatroni, M. Caliari, C.-B. Schönlieb, and J. Weickert. Anisotropic


osmosis filtering for shadow removal in images. Inverse Problems, 35(5):054001, 2019.

S. Parisotto, L. Calatroni, A. Bugeau, N. Papadakis, and C.-B. Schönlieb. Variational


osmosis for non-linear image fusion. IEEE Transactions on Image Processing, 29:5507–
5516, 2020.

M. Pereyra. Proximal markov chain monte carlo algorithms. Statistics and Computing,
26:745–760, 2016.

M. Pereyra. Revisiting maximum-a-posteriori estimation in log-concave models. SIAM


Journal on Imaging Sciences, 12(1):650–670, 2019.

P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE
Transactions on pattern analysis and machine intelligence, 12(7):629–639, 1990.

G. Peyré. Sparse modeling of textures. Journal of mathematical imaging and vision, 34:
17–31, 2009.

D. L. Phillips. A technique for the numerical solution of certain integral equations of the
first kind. Journal of the ACM (JACM), 9(1):84–97, 1962.

T. Pock, D. Cremers, H. Bischof, and A. Chambolle. An algorithm for minimizing the


mumford-shah functional. In 2009 IEEE 12th international conference on computer
vision, pages 1133–1140. IEEE, 2009.

C. Poon. On the role of total variation in compressed sensing. SIAM Journal on Imaging
Sciences, 8(1):682–720, 2015.

C. Pöschl. Tikhonov regularization with general residual term. PhD thesis, Leopold
Franzens Universität Innsbruck, 2008.

P. Putzky and M. Welling. Recurrent inference machines for solving inverse problems.
arXiv preprint arXiv:1706.04008, 2017.

E. T. Reehorst and P. Schniter. Regularization by denoising: Clarifications and new


interpretations. IEEE transactions on computational imaging, 5(1):52–67, 2018.

Y. Romano, M. Elad, and P. Milanfar. The little engine that could: Regularization by
denoising (red). SIAM Journal on Imaging Sciences, 10(4):1804–1844, 2017.

R. Rubinstein, M. Zibulevsky, and M. Elad. Double sparsity: Learning sparse dictionaries


for sparse signal approximation. IEEE Transactions on signal processing, 58(3):1553–
1564, 2009.

R. Rubinstein, A. M. Bruckstein, and M. Elad. Dictionaries for sparse representation


modeling. Proceedings of the IEEE, 98(6):1045–1057, 2010.

L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal
algorithms. Physica D: nonlinear phenomena, 60(1-4):259–268, 1992.

J. Rudzusika, B. Bajić, O. Öktem, C.-B. Schönlieb, and C. Etmann. Invertible learned


primal-dual. In NeurIPS 2021 Workshop on Deep Learning and Inverse Problems, on-
line, 2021.

J. Rudzusika, B. Bajić, T. Koehler, and O. Öktem. 3d helical ct reconstruction with a


memory efficient learned primal-dual architecture. IEEE Transactions on Computa-
tional Imaging, 2024.

M. Lassas, E. Saksman, and S. Siltanen. Discretization-invariant bayesian inversion and besov


space priors. arXiv preprint arXiv:0901.4220, 2009.

O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen. Variational meth-


ods in imaging, volume 167 of Applied Mathematical Sciences. Springer-Verlag, New
York, 2009.

U. Schmidt and S. Roth. Shrinkage fields for effective image restoration. In Proceedings
of the IEEE conference on computer vision and pattern recognition, pages 2774–2781,
2014.

C.-B. Schönlieb, A. Bertozzi, M. Burger, and L. He. Image inpainting using a fourth-order
total variation flow. In SAMPTA’09, pages Special–session, 2009.

J. Schwab, S. Antholzer, and M. Haltmeier. Deep null space learning for inverse problems:
convergence analysis and rates. Inverse Problems, 35(2):025008, 2019.

P. Sellars, A. I. Aviles-Rivero, N. Papadakis, D. Coomes, A. Faul, and C.-B. Schönlieb.


Semi-supervised learning with graphs: Covariance based superpixels for hyperspectral
image classification. In IGARSS 2019-2019 IEEE International Geoscience and Remote
Sensing Symposium, pages 592–595. IEEE, 2019.

S. Setzer. Operator splittings, bregman methods and frame shrinkage in image processing.
International Journal of Computer Vision, 92:265–280, 2011.

S. Setzer, G. Steidl, and T. Teuber. Infimal convolution regularizations with discrete


l1-type functionals. Commun. Math. Sci, 9(3):797–827, 2011.

J. Shen, S. H. Kang, and T. F. Chan. Euler’s elastica and curvature-based inpainting.


SIAM journal on Applied Mathematics, 63(2):564–592, 2003.

F. Sherry. Part III: Inverse problems. 2025.

Z. Shumaylov, J. Budd, S. Mukherjee, and C.-B. Schönlieb. Provably convergent data-


driven convex-nonconvex regularization. In NeurIPS 2023 Workshop on Deep Learning
and Inverse Problems, 2023. URL https://openreview.net/forum?id=yavtWi6ew9.

Z. Shumaylov, J. Budd, S. Mukherjee, and C.-B. Schönlieb. Weakly convex regularisers for
inverse problems: Convergence of critical points and primal-dual optimisation. In Forty-
first International Conference on Machine Learning, 2024. URL https://openreview.
net/forum?id=E8FpcUyPuS.

Z. Shumaylov, P. Zaika, J. Rowbottom, F. Sherry, M. Weber, and C.-B. Schönlieb. Lie al-
gebra canonicalization: Equivariant neural operators under arbitrary lie groups. In
The Thirteenth International Conference on Learning Representations, 2025. URL
https://openreview.net/forum?id=7PLpiVdnUC.

J. Stanczuk, C. Etmann, L. M. Kreusser, and C.-B. Schönlieb. Wasserstein gans


work because they fail (to approximate the wasserstein distance). arXiv preprint
arXiv:2103.01678, 2021.

J. Starck. Image processing and data analysis, the multi-scale approach. Cambridge Uni-
versity Press, 1998.

T. Staudt, S. Hundrieser, and A. Munk. On the uniqueness of Kantorovich potentials.


arXiv preprint arXiv:2201.08316, 2022.

A. M. Stuart. Inverse problems: a bayesian perspective. Acta numerica, 19:451–559, 2010.

J. Sun, H. Li, Z. Xu, et al. Deep admm-net for compressive sensing mri. Advances in
neural information processing systems, 29, 2016.

Y. Sun, Z. Wu, X. Xu, B. Wohlberg, and U. S. Kamilov. Scalable plug-and-play admm


with convergence guarantees. IEEE Transactions on Computational Imaging, 7:849–
863, 2021.

H. Y. Tan, Z. Cai, M. Pereyra, S. Mukherjee, J. Tang, and C.-B. Schönlieb. Unsupervised


training of convex regularizers using maximum likelihood estimation. Transactions on
Machine Learning Research, 2024a.

H. Y. Tan, S. Mukherjee, J. Tang, and C.-B. Schönlieb. Provably convergent plug-and-play


quasi-newton methods. SIAM Journal on Imaging Sciences, 17(2):785–819, 2024b.

J. Tang, S. Mukherjee, and C.-B. Schönlieb. Stochastic primal-dual deep unrolling. arXiv
preprint arXiv:2110.10093, 2021.

J. Tang, S. Mukherjee, and C.-B. Schönlieb. Iterative operator sketching framework for
large-scale imaging inverse problems. In ICASSP 2025-2025 IEEE International Con-
ference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025.

M. Terris, A. Repetti, J.-C. Pesquet, and Y. Wiaux. Building firmly nonexpansive con-
volutional neural networks. In ICASSP 2020-2020 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pages 8658–8662. IEEE, 2020.

A. N. Tikhonov. Solution of incorrectly formulated problems and the regularization


method. Soviet Math. Dokl., 4:1035–1038, 1963.

R. Tovey, M. Benning, C. Brune, M. J. Lagerwerf, S. M. Collins, R. K. Leary, P. A. Midg-


ley, and C.-B. Schönlieb. Directional sinogram inpainting for limited angle tomography.
Inverse problems, 35(2):024004, 2019.

M. Unser and T. Blu. Fractional splines and wavelets. SIAM review, 42(1):43–67, 2000.

T. Valkonen. A primal–dual hybrid gradient method for nonlinear operators with appli-
cations to mri. Inverse Problems, 30(5):055012, 2014.

S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg. Plug-and-play priors for model


based reconstruction. In 2013 IEEE global conference on signal and information pro-
cessing, pages 945–948. IEEE, 2013.

C. Vonesch, T. Blu, and M. Unser. Generalized daubechies wavelet families. IEEE trans-
actions on signal processing, 55(9):4415–4429, 2007.

K. Wei, A. Aviles-Rivero, J. Liang, Y. Fu, C.-B. Schönlieb, and H. Huang. Tuning-


free plug-and-play proximal algorithm for inverse imaging problems. In International
Conference on Machine Learning, pages 10158–10169. PMLR, 2020.

J. Weickert et al. Anisotropic diffusion in image processing, volume 1. Teubner Stuttgart,


1998.

F. Werner and T. Hohage. Convergence rates in expectation for tikhonov-type regular-


ization of inverse problems with poisson data. Inverse Problems, 28(10):104004, 2012.

D. Wu, K. Kim, B. Dong, G. E. Fakhri, and Q. Li. End-to-end lung nodule detection
in computed tomography. In Machine Learning in Medical Imaging: 9th International
Workshop, MLMI 2018, Held in Conjunction with MICCAI 2018, Granada, Spain,
September 16, 2018, Proceedings 9, pages 37–45. Springer, 2018.

X. Xu, J. Liu, Y. Sun, B. Wohlberg, and U. S. Kamilov. Boosting the performance of plug-
and-play priors via denoiser scaling, 2020. URL https://arxiv.org/abs/2002.11546.
