Data-Driven and Physics-Informed Deep Learning Operators For Solution of Heat Conduction Equation
Article history: Received 11 October 2022; Revised 12 December 2022; Accepted 23 December 2022; Available online 31 December 2022

Keywords: DeepONet; Heat (Poisson's) equation; Multi-dimensional parameter; Deep learning

Abstract

Deep neural networks as universal approximators of partial differential equations (PDEs) have attracted attention in numerous scientific and technical circles with the introduction of Physics-informed Neural Networks (PINNs). However, in most existing approaches, PINN can only provide solutions for defined input parameters, such as source terms, loads, boundaries, and initial conditions. Any modification in such parameters necessitates retraining or transfer learning. Classical numerical techniques are no exception, as each new input parameter value necessitates a new independent simulation. Unlike PINNs, which approximate solution functions, DeepONet approximates linear and nonlinear PDE solution operators by using parametric functions (infinite-dimensional objects) as inputs and mapping them to different PDE solution function output spaces. We devise, apply, and compare data-driven and physics-informed DeepONet models to solve the heat conduction (Poisson's) equation, one of the most common PDEs in science and engineering, using the variable and spatially multi-dimensional source term as its parameter. We provide novel computational insights into the DeepONet learning process of PDE solution with spatially multi-dimensional parametric input functions. We also show that, after being adequately trained, the proposed frameworks can reliably and almost instantly predict the parametric solution while being orders of magnitude faster than classical numerical solvers and without any additional training.
© 2022 Elsevier Ltd. All rights reserved.
https://doi.org/10.1016/j.ijheatmasstransfer.2022.123809
… universal approximation theorem for operators [14]. DeepONet effectively mapped between unseen parametric functions and solution spaces for a few linear and nonlinear PDEs in that seminal work, in addition to learning explicit operators such as integrals. This provided an effective new technique for solving parametric and stochastic PDEs. Wang et al. [15] enhanced the DeepONet formulation with information from the governing PDE in the so-called physics-informed DeepONet and reported increased prediction accuracy and data handling efficiency, but at a higher computational cost for training. Both of these works solved mostly spatially one-dimensional PDEs. We extend and compare these DeepONet formulations to solve the heat conduction equation with a spatially 2D parametric source term, one of the most frequently solved PDEs with traditional numerical approaches in science and engineering research.

2. Formulations

2.1. Heat conduction (Poisson's) equation

The Poisson's equation governs physical phenomena such as heat conduction with a moving heat source (laser head) in additive manufacturing, potential flow and pressure solvers in Computational Fluid Dynamics (CFD), electrostatics, gravity in astronomy, and molecular dynamics, to name a few. The Poisson's equation is an inhomogeneous elliptic PDE, with the inhomogeneous part u(x,y) representing the source of the field. In many engineering and scientific applications, a Poisson's equation with the same boundary conditions but different source terms is frequently solved, which may consume a significant fraction of the entire application solution time, even with state-of-the-art numerical methods and computer technology. Moreover, many iterative computational processes used in thermal parametric studies, design, sensitivity analysis, uncertainty quantification, and optimization of classical and advanced manufacturing processes require a vast number of forward functional evaluations by sampling the parametric space and calculating temperature solutions to obtain converging statistics. These evaluations are traditionally performed by classical numerical methods such as finite elements. For high-fidelity thermo-mechanical models, these simulations are often prohibitively computationally expensive in the design and optimization loops, particularly with traditional sampling methods such as Monte Carlo or Latin Hypercube. Instead of numerical forward evaluations, the Deep Operator Neural Network, which in addition to the solution can also learn its parameters, is a natural choice as a surrogate model for instant functional evaluation. In the rest of this work, we investigate how the data-driven and physics-informed DeepONet formulations can learn the solution of a parametric heat conduction equation. We also validate the DeepONet predictions and compare both approaches' computational performance and accuracy on the latest high-performance computing resources.

We employ the heat conduction (Poisson's) equation, Eq. (1), on a unit square domain with zero Dirichlet boundary conditions as a basic reference equation to solve with DeepONet throughout this paper.

$$k_x \frac{\partial^2 s}{\partial x^2} + k_y \frac{\partial^2 s}{\partial y^2} + u(x, y) = 0, \quad (x, y) \in [0, 1] \times [0, 1]$$
$$\mathrm{BC:}\; s(x, 0) = s(x, 1) = s(0, y) = s(1, y) = 0 \tag{1}$$

where s(x,y) is an unknown function of two independent variables x and y, u(x,y) is a source term function, and k_x = k_y = 0.01 is the diffusion coefficient. It is important to note that the data-driven and physics-informed DeepONet methodologies developed in this work for parametric heat conduction are not limited to 2D conditions and regular rectangular geometries. With further code modification beyond the scope of the current work, they can be applied to various loadings, boundary conditions, material properties, and even irregular 2D and 3D geometries.

2.2. Data-driven DeepONet

DeepONet was proposed in [13] as a method for learning nonlinear operators by mapping input functions to matching output functions. We illustrate how DeepONet can be used to tackle the challenge of learning the parametric PDE solution operator for the heat conduction Eq. (1), with the source term u being a parametric function that can take on a wide range of values. In an infinite-dimensional function space U, u ∈ U represents the source term parameters (i.e., input functions), and s ∈ S refers to the PDE's unknown solutions in the function space S. We assume that there is a single solution s = s(u) in S to Eq. (1) for every u in U, which is also subject to the boundary conditions BC. As a result, the solution operator G : U → S may be defined as:

$$G(u) = s(u) \tag{2}$$

Instead of only a collection of points ȳ on a domain, DeepONet considers both u and ȳ and predicts G(u) by combining them, as shown in Fig. 1. Because it can accept a source term function u as an input variable, this network is significantly more capable than other PINN networks. Every evaluation of G(u) at a point ȳ generates a new solution point, which can be written as G(u)(ȳ). Note that, in this case, ȳ represents coordinate points in the 2D domain (a unit square) where the network predicts the solution of the parametric heat conduction equation.

Every input function u is specified on m discrete points in the domain known as input sensors, and the output solution can be assessed at P output sensor locations. The unstacked DeepONet is employed, which is made up of two independent fully connected neural networks termed branch and trunk. According to Lu et al. [13], the unstacked DeepONet achieves better outcomes than the stacked DeepONet in balancing training between the u and ȳ inputs while consuming fewer computing resources. The branch and trunk networks employ the hyperbolic tangent activation function (Tanh) to model intricate functional linkages between inputs and outputs. The branch network receives u at each of its m locations and outputs q intermediate outputs b_k. The trunk network receives ȳ as an input and outputs q intermediate outputs t_k. In Eq. (3), a dot product is used to combine the intermediate outputs, resulting in the DeepONet solution operator prediction Ĝ(u)(ȳ):

$$\hat{G}(u)(\bar{y}) = \sum_{i=1}^{q} b_i t_i \tag{3}$$
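To make Eq. (3) concrete, the following minimal JAX sketch evaluates an unstacked DeepONet prediction for a single input-function sample and a single coordinate point. The function names (init_mlp, mlp_apply, deeponet_predict), the weight initialization, and the choice q = 50 are illustrative assumptions rather than the exact implementation of this work; only the Tanh activations, the m = 121 branch input, the 2D trunk input, and the five hidden layers of 50 neurons follow the setup described in the text.

```python
import jax.numpy as jnp
from jax import random

def init_mlp(key, sizes):
    # Simple random initialization of a fully connected network (illustrative).
    keys = random.split(key, len(sizes) - 1)
    return [(random.normal(k, (m, n)) * jnp.sqrt(2.0 / m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def mlp_apply(params, x):
    # Tanh activations on hidden layers, linear output layer with q units.
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def deeponet_predict(branch_params, trunk_params, u, y):
    # u: (m,) source term sampled at the m input sensors.
    # y: (2,) coordinate point (x, y) in the unit square.
    b = mlp_apply(branch_params, u)   # branch outputs b_k, shape (q,)
    t = mlp_apply(trunk_params, y)    # trunk outputs t_k, shape (q,)
    return jnp.dot(b, t)              # Eq. (3): sum over k of b_k * t_k

# Nominal setup: five hidden layers of 50 neurons; m = 121; assumed q = 50.
key_b, key_t = random.split(random.PRNGKey(0))
branch_params = init_mlp(key_b, [121, 50, 50, 50, 50, 50, 50])
trunk_params = init_mlp(key_t, [2, 50, 50, 50, 50, 50, 50])
```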
Eq. (4) gives the loss function L for a data-driven DeepONet based on the mean squared error:

$$L = \frac{1}{N \cdot P} \sum_{i=1}^{N} \sum_{j=1}^{P} \left( \hat{G}(u_i)\big(\bar{y}_j^{u_i}\big) - s(u_i)\big(\bar{y}_j^{u_i}\big) \right)^2 \tag{4}$$

N is the number of sample functions u. In a training step, the network predicts Ĝ(u)(ȳ) for each sample function u_i, which is assessed at P output sensor locations ȳ_j and compared to the associated target solutions s(u)(ȳ) calculated by a classical second-order finite difference solution in an offline (data-generation) stage. The gradient of the loss function with respect to the weights in both networks is then calculated as part of a backpropagation process, and the Adam optimizer minimizes the loss value by modifying the weights. DeepONet learns the solution operator of the heat conduction equation after a sufficient number of feedforward and backpropagation iterations and can then infer a discrete point solution for an unknown source term parametric function almost instantly.
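A minimal sketch of one data-driven training step built on Eq. (4) is given below; it reuses the illustrative deeponet_predict function from the previous sketch and the Adam optimizer from jax.example_libraries. The batching strategy, learning rate, and function names are assumptions for illustration, not the exact training code of this work.

```python
import jax.numpy as jnp
from jax import jit, grad, vmap
from jax.example_libraries import optimizers

def mse_loss(params, u_batch, y_batch, s_batch):
    # u_batch: (N, m) input functions; y_batch: (N, P, 2) output sensor
    # coordinates; s_batch: (N, P) finite-difference target solutions.
    branch_params, trunk_params = params
    predict = vmap(vmap(deeponet_predict, in_axes=(None, None, None, 0)),
                   in_axes=(None, None, 0, 0))
    pred = predict(branch_params, trunk_params, u_batch, y_batch)  # (N, P)
    return jnp.mean((pred - s_batch) ** 2)                         # Eq. (4)

opt_init, opt_update, get_params = optimizers.adam(1e-3)
opt_state = opt_init((branch_params, trunk_params))

@jit
def train_step(step, opt_state, u_batch, y_batch, s_batch):
    # One feedforward/backpropagation iteration with the Adam optimizer.
    params = get_params(opt_state)
    grads = grad(mse_loss)(params, u_batch, y_batch, s_batch)
    return opt_update(step, grads, opt_state)
```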
2.3. Physics-informed DeepONet

The original, or purely data-driven, DeepONet architecture from Fig. 1 requires a large number of outputs, also known as labels or targets, s(u)(ȳ), which are used to calculate the loss function in Eq. (4) and thus properly train the network. As stated before, the generation of such data often requires repeated evaluation with classical numerical methods such as higher-order finite difference, finite volume, or finite element methods. This can be particularly time-consuming and computationally expensive with governing PDEs defined on large multi-dimensional domains, even on high-performance computing platforms. It is even more difficult to obtain a sufficient number of labels from an experimental approach. We have devised a DeepONet model that can be trained without any generated or observable data at all, given only knowledge of the heat conduction PDE and its corresponding BCs. The so-called physics-informed DeepONet model architecture is given in Fig. 2. The major difference is that there are two contributions to the loss function. One is the operator loss, similar to the data-driven DeepONet loss in Eq. (4) but applied only to points on the boundary conditions where the targets (solutions) are already defined, zero-valued in the case of our Dirichlet BCs. The other is the physics loss, calculated at the Q collocation points in the domain's interior, where the estimated solution operator G is differentiated with respect to the input coordinates by means of automatic differentiation. For each collocation point, the residual in Eq. (5) is calculated, which enforces the governing heat conduction PDE and thus provides a physics-based regularization contribution to the overall loss that constrains the space of admissible deep learning solutions.

$$L_{phys} = \frac{1}{N \cdot Q} \sum_{i=1}^{N} \sum_{j=1}^{Q} \left( k_x \frac{\partial^2 G(u_i)\big(x_j^i, y_j^i\big)}{\partial \big(x_j^i\big)^2} + k_y \frac{\partial^2 G(u_i)\big(x_j^i, y_j^i\big)}{\partial \big(y_j^i\big)^2} + u_i\big(x_j^i, y_j^i\big) \right)^2 \tag{5}$$

N is again the number of sample functions u. Since u is originally defined on m points, whose coordinates do not necessarily coincide with the collocation points, 2D interpolation is used to provide discrete u values at the collocation point coordinates.
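To illustrate how automatic differentiation enters Eq. (5), the sketch below evaluates the PDE residual of Eq. (1) at a single collocation point by differentiating the illustrative deeponet_predict function from Section 2.2 twice with respect to each coordinate, and then combines a zero-valued boundary (operator) loss with the physics loss. Extracting the second derivatives via jax.hessian, the equal weighting of the two loss terms, and the function names are our own assumptions, not necessarily how the loss is assembled in the actual code.

```python
import jax
import jax.numpy as jnp

kx = ky = 0.01  # diffusion coefficients from Eq. (1)

def pde_residual(branch_params, trunk_params, u_sensors, y, u_at_y):
    # y: (2,) collocation point; u_at_y: source term interpolated to y.
    s_fn = lambda yy: deeponet_predict(branch_params, trunk_params, u_sensors, yy)
    H = jax.hessian(s_fn)(y)                     # (2, 2) second derivatives of s
    return kx * H[0, 0] + ky * H[1, 1] + u_at_y  # residual of Eq. (1)

def physics_loss(branch_params, trunk_params, u_sensors, y_colloc, u_colloc):
    # y_colloc: (Q, 2) collocation points; u_colloc: (Q,) interpolated u values.
    res = jax.vmap(pde_residual, in_axes=(None, None, None, 0, 0))(
        branch_params, trunk_params, u_sensors, y_colloc, u_colloc)
    return jnp.mean(res ** 2)                    # Eq. (5) for one sample u

def total_loss(branch_params, trunk_params, u_sensors, y_bc, y_colloc, u_colloc):
    # Operator (boundary) loss: the Dirichlet targets are zero on the boundary.
    s_bc = jax.vmap(deeponet_predict, in_axes=(None, None, None, 0))(
        branch_params, trunk_params, u_sensors, y_bc)
    return jnp.mean(s_bc ** 2) + physics_loss(branch_params, trunk_params,
                                              u_sensors, y_colloc, u_colloc)
```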
2.4. Data sampling and computing environment

We use a 2D correlated and scale-invariant Gaussian random field to generate random input functions u(x,y). We utilized a Python implementation described in [16], in which the correlations are described by a scale-free spectrum P(k) ∼ 1/|k|^(α/2) (with α = 4). The smoothness of the sampled function is determined by the length-scale coefficient, and a larger value means a smoother u.
ary conditions where the targets (solutions) are already defined, u. Because the grid where the finite difference solution for s tar-
zero-valued in the case of our Dirichlet BCs. The other loss is the gets is calculated is generally not the same as the P grid for out-
physics loss calculated at the Q collocation points in the domain’s put sensors, and similarly, the grid where u is generated is gener-
interior where the estimated solution operator G is differentiated ally different than the input sensor grid m, bilinear 2D interpola-
with respect to input coordinates by means of automatic differen- tion is used to provide discrete values for s and u across the dif-
tiation. For each collocation point, residual in Eq. (5) is calculated, ferent grids. Bilinear 2D interpolation is used to provide discrete
3
S. Koric and D.W. Abueidda International Journal of Heat and Mass Transfer 203 (2023) 123809
values for s and u across the different grids since the grid where Table 1
Solution times for classical non-optimized and highly opti-
the finite difference solution for s targets is calculated is gener-
mized Poisson’s solvers and DeepONet inference.
ally not the same as the P grid for output sensors, and similarly,
the grid where u is generated is generally different than the in- FD Iterative (Jacobi) FD Implicit DeepONet Inference
put sensor grid m. In this novel work, we have provided an exten- 2.1 sec. 0.05 sec. 9 x 10-4 sec.
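This regridding step can be illustrated with SciPy's RegularGridInterpolator using linear (bilinear in 2D) interpolation; the grid sizes and the 11x11 sensor layout below are illustrative assumptions (the text only fixes m = 121).

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def regrid(field, points):
    # field: values on a regular n x n grid over the unit square.
    # points: (P, 2) array of (x, y) coordinates to interpolate to.
    n = field.shape[0]
    axis = np.linspace(0.0, 1.0, n)
    interp = RegularGridInterpolator((axis, axis), field, method="linear")
    return interp(points)

# Example: map a u sample generated on a 121x121 grid onto an assumed
# 11x11 input sensor layout (m = 121 sensors).
u_fine = np.random.rand(121, 121)
xs = np.linspace(0.0, 1.0, 11)
sensors = np.array([(x, y) for x in xs for y in xs])  # (121, 2)
u_sensors = regrid(u_fine, sensors)                   # (121,)
```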
In this novel work, we have provided an extension of the data-driven and physics-informed DeepONet formulations to a 2D heat conduction domain under a variable spatial distribution of the heat source. The code is written in JAX [17], a relatively new Python-based toolkit built for high-performance machine learning research and created by Google. JAX's sophisticated capabilities include advanced automatic differentiation (grad), just-in-time compilation (jit), and cross-device compute replication (pmap). JIT was utilized to offload some of the computationally heavy kernels to the GPU, and pmap was very helpful in speeding up target creation. The Adam optimizer, also available in JAX as a high-level function, employed automatic differentiation for the gradient computations. Computing was done on a compute node of the HPC cluster Delta [18], housed at the National Center for Supercomputing Applications (NCSA), equipped with four Nvidia A100 GPU cards.

3. Results

3.1. Data-driven DeepONet

The data-driven DeepONet network was trained for 80,000 epochs with 1,000-8,000 training u samples (m=121). For each u data sample, target solutions were supplied at 200-600 output sensor points (P=200-600), whose coordinates were picked at random in the 2D domain. During the offline data-generation stage, the second-order finite-difference (FD) solution on the 121x121 grid is used to derive the target solutions at those points. We tested both a simple explicit Jacobi iterative and an implicit FD solver scheme, Özişik et al. [19]. A vector-style mapping of computing across devices (pmap) in JAX is used to significantly accelerate data sample generation. It took the Jacobi FD iterative solver 4-6 min to generate all training data samples with pmap, while, using highly optimized Python libraries, the implicit solver needed only 30-35 sec.
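For reference, a minimal JAX sketch of the explicit Jacobi finite-difference iteration for Eq. (1) on an n x n grid is shown below; the fixed iteration count is an illustrative assumption (a convergence check would normally terminate the loop), and the actual data-generation code is more elaborate. Mapping this solver over a leading sample axis with pmap is one way to distribute independent source samples across the available GPUs, as described above.

```python
from functools import partial
import jax.numpy as jnp
from jax import jit, lax

@partial(jit, static_argnames=("n_iter",))
def jacobi_poisson(u, k=0.01, n_iter=20000):
    # Solve k*(s_xx + s_yy) + u = 0 on the unit square with s = 0 on the
    # boundary, using Jacobi sweeps of the 5-point stencil on an n x n grid.
    n = u.shape[0]
    h = 1.0 / (n - 1)

    def sweep(_, s):
        interior = 0.25 * (s[:-2, 1:-1] + s[2:, 1:-1] + s[1:-1, :-2]
                           + s[1:-1, 2:] + h ** 2 * u[1:-1, 1:-1] / k)
        return s.at[1:-1, 1:-1].set(interior)

    return lax.fori_loop(0, n_iter, sweep, jnp.zeros_like(u))
```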
Fig. 3 depicts three randomly picked test u samples, which the network has never seen before, as well as the matching classical numerical heat conduction equation solutions (targets) and data-driven DeepONet forecasts. The network was trained using 5,000 u samples and 400 output sensors (P=400) in this scenario. For both the branch and trunk networks, we start with the nominal setup of Wang et al. [15], which had five hidden layers, each with 50 neurons, trained for 80,000 epochs. Even though the peaks and valleys in the parametric source distributions u differ significantly between the test samples, the data-driven DeepONet correctly predicted their 2D diffusive solution in the interior, governed by the heat conduction equation, and on the zero-valued Dirichlet boundaries. Visual assessment of numerous additional test samples corroborated the quantitative correctness of the data-driven DeepONet predictions.

The mean of the relative L2 prediction error over the N_p examples in the test data set, each defined on ȳ with P training point coordinates in the domain (output sensors), provided a more quantitative test error analysis of a trained data-driven DeepONet in Eq. (6).

$$\bar{L} = \frac{1}{N_p} \sum_{i=1}^{N_p} \frac{\left\| \hat{G}(u_i)(\bar{y}) - s(u_i)(\bar{y}) \right\|_2}{\left\| \hat{G}(u_i)(\bar{y}) \right\|_2} \tag{6}$$
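Computationally, Eq. (6) amounts to the short sketch below; the array names are illustrative, and the denominator follows Eq. (6) as printed, i.e., the norm of the prediction.

```python
import jax.numpy as jnp

def mean_relative_l2(pred, target):
    # pred, target: (Np, P) DeepONet predictions and reference solutions
    # evaluated at the P output sensors for each of the Np test samples.
    num = jnp.linalg.norm(pred - target, axis=1)
    den = jnp.linalg.norm(pred, axis=1)
    return jnp.mean(num / den)   # Eq. (6)
```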
For a variable number of input parametric functions #u (or N) and output sensors P utilized for training, the test error is shown in Fig. 4 for N_p=100. Because the output sensors are sparsely spread at random in the 2-dimensional domain where the predicted solution is compared to targets, P has less impact on the testing error than the size of the training data set #u. Nevertheless, the error was as low as 3% for #u=8000 and P=600. The diffusion-reaction PDE across its space-time grid has a smaller L2 test error, according to the supplement of [15]. While its u parametric function is defined on a one-dimensional spatial domain (a line), our u input function is defined on a two-dimensional spatial domain. To achieve the same closeness of the discrete points defining u as in [15], we would need the square of m (the number of input sensors), which would surpass the device memory of our present GPU hardware.

If the branch and trunk network sizes are varied, i.e., the number of neurons per hidden layer (network width) in Fig. 5 and the number of hidden layers (network depth) in Fig. 6, a similar pattern emerges. Increasing the network's width or depth, in particular, tends to enhance prediction accuracy. Surprisingly, as the networks get larger by increasing their width and depth, the computational training time rises very little (2 min at most), owing to JAX's excellent handling of deep learning training kernels on GPUs. Finally, if we had naively utilized a single fully connected feedforward neural network instead of DeepONet, we would have needed 14,641 outputs to provide predictions on the 121x121 grid used for DeepONet data generation and prediction validation. This would call for a significantly larger neural network, and it is debatable whether it would be feasible to correctly train such a network even on the most recent and powerful high-performance computing platforms.

Table 1
Solution times for classical non-optimized and highly optimized Poisson's solvers and DeepONet inference.

FD Iterative (Jacobi)    FD Implicit    DeepONet Inference
2.1 sec.                 0.05 sec.      9 x 10^-4 sec.

Table 1 shows the computational cost of inferencing with a trained data-driven DeepONet model, as well as the solution times of a heat conduction PDE using the second-order finite difference iterative (Jacobi) and highly optimized implicit solution schemes on the GPU with the numpy and scipy implementations in JAX [17]. Because JAX employs asynchronous dispatch to hide Python overheads, we explicitly waited for the JAX computation to finish before recording the measurements.
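In practice, this means timing inference as in the sketch below, where block_until_ready() forces the asynchronously dispatched JAX computation to complete before the clock is stopped; predict_on_grid is an illustrative stand-in for the batched DeepONet forward pass.

```python
import time

def timed_inference(predict_on_grid, u_sample, grid_points):
    # Returns the prediction and a wall-clock time that includes the full
    # device computation, not just the asynchronous dispatch.
    start = time.perf_counter()
    result = predict_on_grid(u_sample, grid_points)
    result.block_until_ready()
    return result, time.perf_counter() - start
```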
In particular, once training data is generated and a data-driven DeepONet model is adequately trained (which combined takes about 22 min on an A100 GPU), the DeepONet can predict the solution of a heat conduction PDE in a fraction of a second on a modern GPU, which is two to three orders of magnitude faster than traditional PDE solvers. The inferencing involves a single forward pass consisting of dense matrix-vector multiplication kernels, which are exceedingly well optimized not only on GPUs but also on CPUs, allowing inferencing on low-end computers, too, with the trained parameters transferred from high-end computers with GPUs. This is also comparable to traditional PINNs, but with the significant difference that trained DeepONets can produce solutions almost instantaneously when a new and unknown spatially distributed parametric input is supplied, whereas PINNs must be retrained. Since DeepONets can generally be trained to solve other PDEs with their parameters, the trained surrogate DeepONet models could possibly replace traditional and often computationally expensive PDE solver kernels, requiring just a forward pass of the network (inferencing) for each new input such as source term, material properties, boundary conditions, loads, and other parameters. This can considerably speed up high-fidelity scientific and engineering applications governed by parametric PDEs, particularly those that solve large problems [20,21] or repeatedly solve a large number of parametric PDEs.

Fig. 3. Target (numerical) solution versus the prediction of a trained data-driven DeepONet for 3 random u data samples, represented in 3 rows, from the test dataset.

Fig. 4. Effect of training data set size #u and number of output sensors P on test error.

Fig. 5. Effect of number of neurons on test error and corresponding training time with 5-hidden-layer branch and trunk networks (#u=5000, P=400).

3.2. Comparison of data-driven and physics-informed DeepONets

Solution predictions from the data-driven and physics-informed networks are compared in Fig. 7 for three randomly chosen source distributions from the test datasets. The nominal neural networks are used, consisting of 5 hidden layers with 50 neurons each, 5,000 u training samples trained with 80,000 epochs, and a variable number of input sensors m and collocation points Q. The number of output sensors P where the data-driven DeepONet is evaluated matches the number of random points enforcing zero-valued Dirichlet boundary conditions in the physics-informed DeepONet. The test error analysis in Fig. 8 compares prediction errors across 100 test samples from the data-driven and physics-informed nominal-size DeepONets. While the number of input sensors m is set to be equal to the number of collocation points Q in both physics-informed cases, in the first case, their spatial coordinates do not …
Fig. 7. Data-driven and Physics-informed DeepONet predictions for 3 random u data samples, represented in 3 rows, from the test dataset.
[10] D.W. Abueidda, Q. Lu, S. Koric, Meshless physics-informed deep learning method for three-dimensional solid mechanics, Int. J. Numer. Methods Eng. 122 (23) (2021) 7182–7201, doi:10.1002/nme.6828.
[11] J.N. Fuhg, N. Bouklas, The mixed deep energy method for resolving concentration features in finite strain hyperelasticity, J. Comput. Phys. 451 (2022) 110839, doi:10.1016/j.jcp.2021.110839.
[12] S. Cai, Z. Wang, F. Fuest, Y.J. Jeon, C. Gray, G.E. Karniadakis, Flow over an espresso cup: inferring 3-D velocity and pressure fields from tomographic background oriented Schlieren via physics-informed neural networks, J. Fluid Mech. 915 (2021) A102, doi:10.1017/jfm.2021.135.
[13] L. Lu, P. Jin, G. Pang, Z. Zhang, G.E. Karniadakis, Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators, Nat. Mach. Intell. 3 (2021) 218–229, doi:10.1038/s42256-021-00302-5.
[14] T. Chen, H. Chen, Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems, IEEE Trans. Neural Netw. 6 (1995) 911–917, doi:10.1109/72.392253.
[15] S. Wang, H. Wang, P. Perdikaris, Learning the solution operator of parametric partial differential equations with physics-informed DeepONets, Sci. Adv. 7 (40) (2021) 1–9, doi:10.1126/sciadv.abi8605.
[16] B. Sciolla, Generator of 2D Gaussian random fields, https://github.com/bsciolla/gaussian-random-fields.
[17] J. Bradbury, R. Frostig, P. Hawkins, M.J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, Q. Zhang, JAX: composable transformations of Python+NumPy programs, 2018.
[18] Delta HPC system at NCSA, https://www.ncsa.illinois.edu/research/project-highlights/delta/.
[19] M.N. Özişik, H.R.B. Orlande, M.J. Colaço, R.M. Cotta, Finite Difference Methods in Heat Transfer, 2nd ed., CRC Press, 2017, doi:10.1201/9781315121475.
[20] S. Koric, A. Gupta, Sparse matrix factorization in the implicit finite element method on petascale architecture, Comput. Methods Appl. Mech. Eng. 302 (2016) 281–292, doi:10.1016/j.cma.2016.01.011.
[21] M. Vázquez, G. Houzeaux, S. Koric, et al., Alya: multiphysics engineering simulation toward exascale, J. Comput. Sci. 14 (2016) 15–27, doi:10.1016/j.jocs.2015.12.007.
[22] Y. Zhu, N. Zabaras, P.S. Koutsourelakis, P. Perdikaris, Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data, J. Comput. Phys. 394 (2019) 56–61, doi:10.1016/j.jcp.2019.05.024.
[23] J.N. Fuhg, A. Karmarkar, T. Kadeethum, H. Yoon, N. Bouklas, Deep convolutional Ritz method: parametric PDE surrogates without labeled data, arXiv:2206.04675v1 [cs.CE], Jun 2022.