Hybrid modeling design patterns

Rudolph, Maja; Kurz, Stefan; Rakitsch, Barbara

doi:10.1186/s13362-024-00141-0

Survey
Open access
Published: 19 March 2024


Journal of Mathematics in Industry volume 14, Article number: 3 (2024)


Abstract

Design patterns provide a systematic way to convey solutions to recurring modeling challenges. This paper introduces design patterns for hybrid modeling, an approach that combines modeling based on first principles with data-driven modeling techniques. While both approaches have complementary advantages, there are often multiple ways to combine them into a hybrid model, and the appropriate solution depends on the problem at hand. In this paper, we provide four base patterns that can serve as blueprints for combining data-driven components with domain knowledge into a hybrid approach. In addition, we present two composition patterns that govern the combination of the base patterns into more complex hybrid models. Each design pattern is illustrated by typical use cases from application areas such as climate modeling, engineering, and physics.

1 Introduction

Models play a crucial role in the scientific process by providing a representation of complex systems, processes, and phenomena. Models help scientists to make predictions, test hypotheses, and gain a deeper understanding of the behavior of these systems [1, 2]. By using mathematical models, such as physical, statistical, or simulation models, scientists can study the relationships between variables, estimate uncertainties, and explore scenarios without having to perform expensive or dangerous experiments [3, Ch. 1]. In this way, models serve as a powerful tool for advancing our knowledge and understanding of the world, and for solving real-world problems in fields such as medicine, engineering, and environmental science.

Traditionally, models are derived from first principles and encode domain knowledge such as physical laws or physical constraints. Such models emerge from the scientific process through a combination of observation, experimentation, and theoretical analysis. After careful observation of natural phenomena, scientists form hypotheses and theories to explain the observed behavior. These theories are then tested through experiments and compared with existing knowledge and models. If a theory withstands experimental scrutiny and provides accurate predictions, it may become accepted as a law or constraint. Models based on first principles are data-efficient and causal, and they lead to explainable predictions. They are often more reliable than data-driven models because the underlying theory has been validated, and their predictions generalize to other deployment regimes as long as the underlying assumptions of the model still hold.

Data-driven models, on the other hand, are a type of modeling approach that relies on large data sets to identify patterns and correlations in the data that can be used to make predictions or classifications [4, 5]. These models are often used in fields where the underlying physical processes are too complex to model from first principles. Data-driven models are typically developed using machine learning techniques such as neural networks [6]. These models can be trained on large data sets of labeled (and in some cases unlabeled) data and can then be used to make predictions or classifications on new data. Data-driven models have shown promise in a wide range of applications, including image and speech recognition, natural language processing, and predictive modeling in finance and healthcare.

Hybrid models combine the strengths of both data-driven and first-principle based models, and can be useful in situations where neither approach alone is sufficient [7–10]. For example, mechanistic models are based on first principles and describe a hypothesized causal process between variables [11]. While they can provide a deep understanding of the underlying physics or biology of a system, they may not always capture all of the relevant details or interactions, leading to inaccuracies. On the other hand, data-driven models can accurately capture complex relationships in large data sets, but may not be able to explain the underlying mechanisms or provide insight into how the system behaves under new conditions. Hybrid models can combine the strengths of both approaches, allowing for more accurate and interpretable predictions even in complex systems with incomplete understanding of the underlying mechanisms.

Hybrid modeling is challenging because it requires expertise in both first-principle-based modeling and data-driven modeling, as well as knowledge of how to integrate the two approaches effectively. It can be difficult to determine the appropriate level of complexity for each component of the hybrid model and to ensure that the different components are compatible with each other. In particular, hybrid modeling requires careful consideration of the trade-offs between accuracy, complexity, interpretability, and scalability, which can be difficult to optimize.

Validating and verifying a hybrid model presents another challenge. Its data-driven and physics-based components may contribute different sources of uncertainty and error which need to be handled with care. For these reasons, designing and implementing a hybrid model requires careful consideration of the strengths and weaknesses of each modeling approach and a thorough understanding of the system being modeled.

The applications of hybrid modeling are highly diverse, spanning a wide range of fields and industries. From molecular modeling in drug discovery [12], to simulation tasks in climate [13], earth science [14], and engineering, to modeling sensor data, hybrid modeling is used in many domains to address unique and complex challenges.

This diversity of applications means that there is a need for solutions that can be applied broadly rather than being specific to one particular domain. Developing such approaches requires a focus on abstraction and generalization, so that solutions are formulated at a level that applies across multiple domains. While literature surveys of hybrid modeling have introduced taxonomies of modeling approaches [8, 9], the contribution of this paper is to present different design patterns for composing data-driven and first-principle based models. The design patterns address recurring modeling challenges and distill useful solution approaches that generalize across applications.

Formalizing solutions to recurring modeling challenges into hybrid modeling design patterns provides several benefits. First, it allows for the sharing of knowledge and expertise across application domains, which can lead to faster progress and innovation. Second, it facilitates the development of standardized tools and techniques for hybrid modeling, which can improve the efficiency and reliability of the modeling process. Third, it can help identify common challenges and limitations in hybrid modeling, which can guide future research directions and advance the field as a whole. Overall, the use of hybrid modeling design patterns can improve the accessibility, efficiency, and effectiveness of hybrid modeling across a wide range of applications.

2 Background

In this background section, we introduce computational models and then review both the first-principles-based and the data-driven perspectives on modeling.

2.1 Computational models

The goal of hybrid modeling is to build a computational model for a system of interest. A computational model is a set of computations that are applied to an input to produce an output. The model of a system can be used to make predictions about how the system would react to certain inputs or to study how the system behaves under certain conditions. Alternatively, the model can be used to simulate the system. Models typically approximate the behavior of the underlying system, which might be too complex to model more accurately.

A computational model is of the form

$$ y = u(x). $$

(1)

The inputs x are manipulated by a function u to produce the outputs y. The functional form of u depends on the model type. We distinguish between two model types. The first type comprises models based on first principles, for example from physics; these are sometimes called scientific models, and we often refer to them as physical models. The second type of model is data-driven: one uses data to find the function within a class of functions that best explains the data, and this function is then used as the model, e.g. to make predictions.

2.2 Modeling from first principles

When modeling from first principles, the choice of u is derived using scientific reasoning. There is a justification both for the functional form of u and for the choice of its parameters. For this reason, these models are often called first-principles-based models, mechanistic models, physics-based models, or science-based models.

For example, laws of physics, such as Newton’s laws of motion and the law of conservation of energy, emerged from centuries of observation and experimentation in the field of mechanics. These laws provide a mathematical framework for understanding and predicting the behavior of physical systems, and have been tested and confirmed through numerous experiments. Similarly, in chemistry, conservation laws, such as the law of conservation of mass, emerged from the study of chemical reactions and provide a fundamental understanding of the behavior of chemical systems.

From a mathematical point of view, scientific models frequently take the form of algebraic models, ordinary differential equations (ODEs), partial differential equations (PDEs), or a combination of those.

2.2.1 Algebraic models

An algebraic mathematical model is a type of mathematical model that uses algebraic equations or functions to represent a real-world situation or system. In an algebraic model, the relationships between the variables are often represented using equations that involve elementary mathematical operations and functions.

One example is the equation for the trajectory of a stone thrown vertically into the air, with air resistance neglected. The height $u(t)$ above ground as a function of time $t\ge 0$ is

$$ u(t) = -0.5gt^{2} + v_{0}t + h_{0} , $$

(2)

where $h_{0}$ is the initial height, $v_{0}$ the initial velocity, and g the gravitational acceleration.

From a computational perspective, this model could be used to compute, for a given instant $t_{1}$, the height at this instant, $h_{1}=u(t_{1})$.
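As a minimal illustration, Eq. (2) translates directly into a function that maps an input time to an output height, i.e. a computational model in the sense of Eq. (1). The sketch assumes SI units and the standard value g = 9.81 m/s²; the function name `stone_height` and the example values are hypothetical.

```python
def stone_height(t, h0, v0, g=9.81):
    """Height above ground at time t >= 0 according to Eq. (2), air resistance neglected."""
    if t < 0:
        raise ValueError("the model is only defined for t >= 0")
    return -0.5 * g * t**2 + v0 * t + h0

# h1 = u(t1) at t1 = 1 s for a stone thrown upward at 10 m/s from a height of 2 m
h1 = stone_height(1.0, h0=2.0, v0=10.0)
```

Here h1 is 2 + 10 - 4.905 = 7.095 m. Note that the formula keeps returning values after the stone hits the ground, so in practice it is valid only up to the time of impact.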

2.2.2 Ordinary differential equations (ODEs)

Differential equations form a more involved model class. An ODE is a differential equation that involves only one independent variable, usually time t, and derivatives with respect to it.

ODE models are particularly useful for systems that involve dynamic behavior, where the behavior of the system changes over time in response to internal or external factors. In an ODE model, the behavior of a system is represented using one or more ODEs that describe the rates of change of the system’s variables. The ODEs can be used to predict how the system will evolve over time, based on its initial conditions and the values of its parameters.

Solving an ODE involves finding a mathematical expression that describes the behavior of the system as a function of the independent variable, usually as a function of time. This can be done using various analytical or numerical methods, depending on the complexity of the system and the accuracy of the desired solution. A closed form solution of an ODE yields an algebraic model. For example, the algebraic model (2) is a solution to the ODE

$$ \frac{\mathrm{d}^{2}u(t)}{\mathrm{d}t^{2}}=-g , $$

subject to the initial conditions $u(0)=h_{0}$ and $\frac{\mathrm{d}u}{\mathrm{d}t}(0)=v_{0}$. This is Newton's second law, the first-principle based model that underlies the mechanistic model (2).
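The claim that (2) solves this ODE can be checked numerically. The sketch below, with hypothetical helper names and example values h0 = 2, v0 = 10, g = 9.81, approximates the second derivative of u by a central finite difference; for a quadratic the difference quotient equals -g up to rounding error.

```python
def stone_height(t, h0=2.0, v0=10.0, g=9.81):
    # Algebraic model (2)
    return -0.5 * g * t**2 + v0 * t + h0

def second_derivative(func, t, dt=1e-2):
    """Central finite-difference approximation of d^2 func / dt^2 at t."""
    return (func(t + dt) - 2.0 * func(t) + func(t - dt)) / dt**2

# For the quadratic model (2) the result should be -g at every t
checks = [second_derivative(stone_height, t) for t in (0.5, 1.0, 1.5)]
```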

Once a solution has been obtained, it can be used to predict the behavior of the system under different conditions or to design interventions to achieve a desired outcome.

In the following, we will consider three additional ODE models that will serve as recurring examples throughout the remainder of the paper.

1.
Let us start with the ODE of a harmonic oscillator
$$ \frac{\mathrm{d}^{2}u(t)}{\mathrm{d}t^{2}} = -u(t) , $$
(3)
where $u(t)$ is the normalized displacement at normalized time t. The normalization is with respect to some reference displacement $s_{0}$ and the characteristic time T, respectively. For a spring-mass system with mass m and spring constant k, the characteristic time is $T=\sqrt{m/k}$, so that one oscillation period corresponds to $2\pi $ in normalized time. The model gets more interesting if a nonlinear damping term is added,
$$ \frac{\mathrm{d}^{2}u(t)}{\mathrm{d}t^{2}} = -u(t)+\mu \frac{\mathrm{d}u(t)}{\mathrm{d}t} \bigl(1-u(t)^{2} \bigr) , $$
(4)
where the positive real parameter μ determines the amount of nonlinear damping. Equation (4) is the Van der Pol equation [15, Sect. 5.7], which exhibits a number of interesting nonlinear phenomena, such as relaxation oscillations [16].
2.
The Lotka-Volterra equations [17, Sect. 4.1] are used to model the population dynamics of two interacting species, a predator and its prey. The population density of the prey is $u(t)$ and that of the predators is $w(t)$. The population dynamics are modeled by the nonlinear system of ODEs
$$ \frac{\mathrm{d}u(t)}{\mathrm{d}t} = \alpha u(t) - \beta u(t)w(t) , \qquad \frac{\mathrm{d}w(t)}{\mathrm{d}t} = \delta u(t)w(t) - \gamma w(t) , $$
(5)
with positive real parameters α, β, γ, and δ determining the self and mutual interactions of the two species.
3.
The simplest standard model for a dynamical system with several degrees of freedom is a system of first-order ODEs of the form
$$ \frac{\mathrm{d}u(t)}{\mathrm{d}t}=f \bigl(u(t),t;\theta \bigr) , $$
(6)
where $u(t)\in \mathbb{R}^{n}$ describes the state of the system at time t, a point in an n-dimensional state space. Herein, $\theta \in \mathbb{R}^{p}$ is a p-dimensional parameter vector that is used to calibrate the model. Given an initial condition $u(t_{0})$ at time $t_{0}$, the dynamics of the system can be obtained by integrating the ODE system. At time $t_{1}>t_{0}$ we obtain
$$ u(t_{1})=u(t_{0})+ \int _{t_{0}}^{t_{1}}f \bigl(u(t),t;\theta \bigr) \, \mathrm{d}t . $$
(7)
This representation shows that the dynamics of the system are entirely encoded in the function f, which assigns to each state $u(t)$ and time t the rate of change of this state. The structure of the function f is often dictated by physics, and the values of the parameters can be obtained from domain knowledge.
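To connect the three examples with the generic form (6), each can be written as a right-hand side f(u, t; θ). The second-order Eqs. (3) and (4) become first-order systems by using the state (u, du/dt). This is a minimal sketch; the function names and the default parameter values (standing in for θ) are illustrative assumptions, not values from the text.

```python
import numpy as np

def harmonic_rhs(state, t):
    """Eq. (3) as a first-order system with state (u, du/dt)."""
    u, v = state
    return np.array([v, -u])

def van_der_pol_rhs(state, t, mu=1.0):
    """Eq. (4): d^2u/dt^2 = -u + mu * du/dt * (1 - u^2), state (u, du/dt)."""
    u, v = state
    return np.array([v, -u + mu * v * (1.0 - u**2)])

def lotka_volterra_rhs(state, t, alpha=1.0, beta=0.5, gamma=1.0, delta=0.2):
    """Eq. (5) with state (prey u, predator w); parameter values are illustrative."""
    u, w = state
    return np.array([alpha * u - beta * u * w, delta * u * w - gamma * w])
```

As a quick sanity check, the Lotka-Volterra right-hand side vanishes at the coexistence equilibrium (u, w) = (γ/δ, α/β), which is (5, 2) for the parameters above.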

Moreover, given a numerical implementation of the function f, several numerical methods, such as Runge-Kutta methods [3, Ch. 4 & 6], are available to integrate ODE systems. Only together with an integration method does an ODE system yield a computational model (Eq. (1)) for predicting future states.
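The pairing of f with an integrator can be sketched with the classical fourth-order Runge-Kutta method, which approximates the integral in Eq. (7) step by step. This is a minimal fixed-step sketch, assuming a uniform step size; the helper names are hypothetical. It is tested on the harmonic oscillator (3), whose state returns to its initial value after one period 2π in normalized time.

```python
import numpy as np

def rk4_step(f, u, t, dt):
    """One step of the classical fourth-order Runge-Kutta method."""
    k1 = f(u, t)
    k2 = f(u + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(u + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(u + dt * k3, t + dt)
    return u + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(f, u0, t0, t1, n_steps):
    """Approximate u(t1) in Eq. (7) with n_steps RK4 steps starting from u(t0) = u0."""
    u = np.asarray(u0, dtype=float)
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        u = rk4_step(f, u, t, dt)
        t += dt
    return u

def harmonic_rhs(u, t):
    # Eq. (3) with state (displacement, velocity)
    return np.array([u[1], -u[0]])

# After one period (2*pi in normalized time) the state should return to (1, 0)
u_end = integrate(harmonic_rhs, [1.0, 0.0], 0.0, 2.0 * np.pi, 200)
```

Production solvers typically add adaptive step-size control; the fixed step here keeps the structure of Eq. (7) visible.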

2.2.3 Partial differential equations (PDEs)

A PDE is an equation for a function which depends on more than one independent variable. The equation involves the independent variables, the function, and partial derivatives of the function, with respect to the independent variables. PDEs are ubiquitous in mathematical physics and foundational in several fields, such as acoustics, elasticity, electrodynamics, fluid dynamics, thermodynamics, general relativity, and quantum mechanics. The independent variables are often space-time coordinates, like $(x,y,z,t)$.

As a simple example, we consider a scalar function u, which depends on the spatial coordinates $(x,y,z)$, and the PDE

$$ \frac{\partial ^{2}u(x,y,z)}{\partial x^{2}}+ \frac{\partial ^{2}u(x,y,z)}{\partial y^{2}}+ \frac{\partial ^{2}u(x,y,z)}{\partial z^{2}}=0 . $$

(8)

This is the Laplace equation in three dimensions. For example, if u denotes the scalar electric potential, (8) is the governing equation in electrostatics for domains that are free of electric charges.

To obtain a computational model (Eq. (1)) that predicts the state of the system, the PDE needs to be solved either analytically or numerically. Here the finite element method (FEM) is a popular choice [18], but many other methods exist [19].
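For illustration, the sketch below uses one of these other methods rather than FEM: a finite-difference discretization of Eq. (8) on a uniform grid in the unit cube, solved by Jacobi iteration with Dirichlet boundary values. The grid size, iteration count, and test function are assumptions chosen so that the result can be checked: u = x + 2y - z satisfies (8), so its boundary values alone should reproduce it in the interior.

```python
import numpy as np

def solve_laplace_3d(u_boundary, n_iter=2000):
    """Jacobi iteration for the 7-point finite-difference form of Eq. (8).

    u_boundary: 3D array whose outer faces hold Dirichlet values; its interior
    values serve as the initial guess.
    """
    u = np.array(u_boundary, dtype=float)
    for _ in range(n_iter):
        # Each interior value becomes the mean of its six neighbors. NumPy evaluates
        # the right-hand side completely before assigning, so this is a Jacobi update.
        u[1:-1, 1:-1, 1:-1] = (
            u[2:, 1:-1, 1:-1] + u[:-2, 1:-1, 1:-1]
            + u[1:-1, 2:, 1:-1] + u[1:-1, :-2, 1:-1]
            + u[1:-1, 1:-1, 2:] + u[1:-1, 1:-1, :-2]
        ) / 6.0
    return u

n = 12
grid = np.linspace(0.0, 1.0, n)
x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
exact = x + 2.0 * y - z          # a harmonic function
guess = exact.copy()
guess[1:-1, 1:-1, 1:-1] = 0.0    # discard the interior, keep only boundary data
u = solve_laplace_3d(guess)
```

Jacobi iteration is chosen for brevity; it converges slowly on fine grids, where multigrid or conjugate-gradient solvers are the usual choice.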

2.3 Data-driven modeling

An alternative path for developing a model is data-centric. Given data in the form of observations, a model is developed to be consistent with the observations, for example by reproducing the data as accurately as possible. There are many different data-driven approaches. Unlike scientific models, which are chosen based on deductive reasoning, data-driven models are chosen based on their statistical and computational properties and their match to the requirements of the modeling problem at hand.

2.3.1 Data-driven calibration

Data-driven calibration is a methodological approach that leverages observed data in order to optimize the parameters of a given model. Consider, for example, the Lotka-Volterra equations, Eq. (5). In the context of data-driven calibration, the goal is to optimize the parameters α, β, γ, and δ based on observed data, to accurately capture the dynamics of the predator-prey system.

Traditionally, these parameters might be adjusted by specialists through a process of trial-and-error until the desired behavior is achieved. However, more systematic and efficient approaches to parameter identification are available [20]. Data-driven calibration can employ optimization algorithms, often utilizing a specific loss function (e.g., the mean-squared error) to guide the optimization process. For straightforward scenarios, standard least squares approaches can be effective [21], while for complex or non-differentiable problems, derivative-free optimization methods such as genetic algorithms [22], particle swarm optimization [23], and Bayesian optimization [24] offer valuable alternatives. Moreover, data-driven calibration is not limited to refining existing models; it can also facilitate the identification of physical systems from scratch [25, 26].
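A least-squares calibration can be sketched compactly for the algebraic model (2), which is used here instead of the Lotka-Volterra example because it is linear in its parameters (g, v0, h0), so ordinary linear least squares applies directly. The "true" parameter values, noise level, and sampling times are assumptions for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic noisy observations of Eq. (2); the "true" values are assumed for the demo
g_true, v0_true, h0_true = 9.81, 10.0, 2.0
t = np.linspace(0.0, 2.0, 50)
h_obs = -0.5 * g_true * t**2 + v0_true * t + h0_true + rng.normal(0.0, 0.05, t.size)

# Eq. (2) is linear in (g, v0, h0): h = A @ [g, v0, h0] with columns (-t^2/2, t, 1)
A = np.column_stack([-0.5 * t**2, t, np.ones_like(t)])
(g_hat, v0_hat, h0_hat), *_ = np.linalg.lstsq(A, h_obs, rcond=None)
```

For models that are nonlinear in the parameters, such as Eq. (5), the same mean-squared-error objective would instead be minimized iteratively, with the model evaluated through an ODE solver.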

When uncertainty in the data is taken into account, more sophisticated techniques, termed Bayesian calibration or simulation-based inference, come into play [27, 28]. These methods do not merely estimate point values for the parameters but learn their posterior distribution, accounting for both aleatoric (inherent randomness) and epistemic (model uncertainty) factors. Furthermore, there are specialized methods for ODEs that improve algorithmic efficiency by exploiting their mathematical structure [29, 30].
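Bayesian calibration in general requires sampling or simulation-based inference. A special case with a closed-form posterior arises when the model is linear in its parameters, the prior is Gaussian, and the observation noise is Gaussian with known variance. The sketch below applies this conjugate update to Eq. (2); the prior, the noise level, and the synthetic data are all assumptions. The posterior standard deviations express the remaining epistemic uncertainty about (g, v0, h0), while `sigma` represents the aleatoric noise.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0, 50)
sigma = 0.05  # assumed known observation noise (aleatoric)
h_obs = -0.5 * 9.81 * t**2 + 10.0 * t + 2.0 + rng.normal(0.0, sigma, t.size)

A = np.column_stack([-0.5 * t**2, t, np.ones_like(t)])  # parameters (g, v0, h0)
prior_mean = np.array([10.0, 8.0, 0.0])                 # assumed Gaussian prior
prior_cov = np.diag([4.0, 25.0, 25.0])

# Conjugate Gaussian update: posterior precision = prior precision + A^T A / sigma^2
prior_prec = np.linalg.inv(prior_cov)
post_cov = np.linalg.inv(prior_prec + A.T @ A / sigma**2)
post_mean = post_cov @ (prior_prec @ prior_mean + A.T @ h_obs / sigma**2)
post_std = np.sqrt(np.diag(post_cov))
```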

While calibration focuses on refining model parameters to align predictions with observed data, standard machine learning techniques, discussed next, aim to learn patterns directly from data without providing any physical interpretation.

2.3.2 Machine learning

Machine learning presents an approach for learning model parameters from data [4, 5]. While non-parametric approaches exist, a machine learning model often consists of a parameterized function $u(\cdot ;\theta )$ with parameters θ that can predict a response y from inputs x. Different parameter settings correspond to different functional relationships between the predictions $\hat{y}=u(x;\theta )$ and the inputs. The quality of a prediction, i.e. how closely a prediction ŷ resembles a desired output y, can be measured by a loss function $l(x, y, \theta )$. In the supervised learning setting [5, Ch. 1.3], given a data set $\mathcal{D}$ of examples of x and y pairs, the optimal parameter setting is found by minimizing the loss, averaged over the training examples,

$$\begin{aligned} \theta ^{*}= \arg\min_{\theta } \frac{1}{ \vert \mathcal{D} \vert }\sum_{(x,y) \in \mathcal{D}}l(x, y, \theta ) . \end{aligned}$$

(9)
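The minimization in Eq. (9) can be sketched for the simplest case: a linear model u(x; θ) = θ0 + θ1 x with the squared loss l = (u(x; θ) - y)², minimized by plain gradient descent. The synthetic data set, step size, and iteration count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = 0.5 + 2.0 * x + rng.normal(0.0, 0.1, x.size)  # synthetic data set D

def model(x, theta):
    return theta[0] + theta[1] * x

def empirical_risk(theta):
    # Average squared loss over D: the objective in Eq. (9)
    return np.mean((model(x, theta) - y) ** 2)

def gradient(theta):
    # Gradient of the empirical risk with respect to (theta_0, theta_1)
    residual = model(x, theta) - y
    return np.array([2.0 * np.mean(residual), 2.0 * np.mean(residual * x)])

theta = np.zeros(2)
for _ in range(2000):
    theta -= 0.1 * gradient(theta)  # gradient descent with step size 0.1
```

For this convex problem, the iterates converge to the ordinary least-squares solution; for neural networks the same loop is applied to a non-convex objective.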

Machine learning approaches are also applicable in the unsupervised setting [5, Ch. 1.3], where the training data only contain input samples x but no labels. Common unsupervised modeling tasks include clustering, where the target label y would be the cluster assignment of an input, and anomaly detection, where the unknown label represents the likelihood that the input sample is an anomaly. For an overview of common machine learning tasks, see Ch. 5.1.1 of [6].

Probabilistic modeling

Probabilistic modeling [4, 5] refers to a class of machine learning methods where data points are treated as observations of random variables. Modeling consists of making assumptions about the underlying distributions from which these data points are drawn. The primary aim is to infer the parameters that characterize these distributions from the available data. Once the model is learned, it can be used to predict future observations, evaluate the likelihood of observed data, or provide uncertainty estimates regarding the outcomes.

In probabilistic modeling, the uncertainty inherent in predictions is modeled explicitly, allowing for more robust decision-making in many scenarios. There are numerous techniques and models in this category, including Bayesian networks [4, Ch. 8.1], Gaussian processes [31], Markov and Hidden Markov Models [5, Ch. 17], and Markov random fields [5, Ch. 19], among others. Each of these models has its own strengths and applications, depending on the nature of the data and the problem at hand. One model class is particularly useful in some hybrid modeling scenarios, namely Gaussian processes, which are introduced next.

Gaussian processes

Gaussian processes (GPs) define a distribution over functions. They provide a principled, non-parametric methodology to infer underlying patterns in data [31]. A Gaussian process is defined by its mean function $m(x)$ and its covariance or kernel function $k(x,x')$. At a high level, the mean function describes the expected value of the process, and the kernel function dictates how data points influence each other based on their separation in the input space.

Formally, a Gaussian process can be represented as:

$$ u(x) \sim \mathrm{GP}\bigl(m(x), k\bigl(x,x' \bigr)\bigr), $$

(10)

where $u(x)$ is the output of the GP for input x, $m(x)$ is the mean function, and $k(x,x')$ is the kernel function.

Because GPs define a distribution over functions, they can represent infinitely many possible explanations of the observed data. The function values at any finite set of inputs are jointly Gaussian, with mean vector and covariance matrix given by the mean and kernel functions evaluated at those inputs. This is particularly powerful because it provides not only a prediction for unseen inputs but also an associated uncertainty, which can be crucial for decision-making in uncertain environments.

Kernel functions play an integral role in shaping the GP, with the choice of kernel determining the nature of functions the GP can represent. For instance, the Radial Basis Function (RBF) kernel assumes that points closer in input space are more correlated, leading to smooth function approximations. On the other hand, periodic kernels can capture cyclical patterns in the data.

Training a GP typically involves maximizing the marginal likelihood of the observed data under the GP prior with respect to the kernel hyperparameters. Once trained, predictions with GPs involve conditioning the GP on the observed data to infer values (and uncertainties) at unseen input points.
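The conditioning step can be sketched with NumPy for a zero-mean GP with an RBF kernel on one-dimensional inputs. The hyperparameters (lengthscale, variance, noise variance) are fixed by assumption here rather than learned by maximizing the marginal likelihood, and the training data (samples of sin x) are illustrative. The Cholesky factorization is a standard way to solve the required linear systems.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """RBF kernel k(x, x') = variance * exp(-(x - x')^2 / (2 lengthscale^2)) for 1D inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-4, **kernel_args):
    """Posterior mean and variance of a zero-mean GP conditioned on noisy observations."""
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise_var * np.eye(x_train.size)
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    prior_var = np.full(x_test.size, kernel_args.get("variance", 1.0))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = prior_var - np.sum(v**2, axis=0)  # diagonal of the posterior covariance
    return mean, var

x_train = np.linspace(0.0, 5.0, 8)
y_train = np.sin(x_train)
x_test = np.array([2.5, 10.0])  # one input inside the data range, one far outside
mean, var = gp_posterior(x_train, y_train, x_test)
```

The posterior variance is small at 2.5, between training points, and returns to roughly the prior variance at 10.0, far from the data. This is the uncertainty behavior described above.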

However, while GPs offer many advantages, including uncertainty estimates and flexibility in modeling, they become computationally expensive for large data sets, because exact inference requires solving linear systems with the kernel matrix, at a cost that grows cubically with the number of data points. Recent advances and approximations, such as inducing points or sparse GPs [32–34], allow for more scalable implementations. If GPs are combined with universal kernels, such as the RBF kernel, their data requirements grow very quickly with the number of input features, an effect also known as the “curse of dimensionality”. Here, it often helps to build customized kernels that take properties of the data into account, e.g. convolutional kernels for images [35] or kernels tailored to linear ODE and PDE systems [36, 37].

Altogether, Gaussian processes are a versatile tool for machine learning and allow hybrid modeling at scale [28, 38, 39].

Neural networks

Neural networks [6] are computational models consisting of interconnected nodes, or “neurons” (a term borrowed from how the brain processes information), organized into layers: input, hidden, and output layers. Each connection between neurons has an associated weight, which is adjusted during training to minimize the difference between the predicted and actual output. Each layer of a neural network can be represented as $\sigma (Wx + b)$, where W is a matrix of weights, x is the input vector from the previous layer, b is the bias vector, and σ represents an activation function, such as the sigmoid or ReLU (Rectified Linear Unit) [40], which is applied element-wise.
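The layer formula σ(Wx + b) and the stacking of layers can be made concrete with a small forward pass. The weights below are chosen by hand purely for illustration, and the helper names are hypothetical. The output layer is linear, which is a common choice for regression.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dense_layer(x, W, b, activation=relu):
    """One layer sigma(W x + b), with the activation applied element-wise."""
    return activation(W @ x + b)

# Two stacked layers: 3 inputs -> 2 hidden units -> 1 output (hand-picked weights)
x = np.array([1.0, -2.0, 0.5])
W1 = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 0.0]])
b1 = np.array([0.0, 1.0])
W2 = np.array([[1.0, 1.0]])
b2 = np.array([0.5])

hidden = dense_layer(x, W1, b1)
output = dense_layer(hidden, W2, b2, activation=lambda z: z)  # linear output layer
```

The hidden pre-activations are (2, -1); ReLU clips the negative one, giving `hidden = [2, 0]` and `output = [2.5]`.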

The power of neural networks lies in their capacity to approximate complex, non-linear functions. By stacking multiple layers and using non-linear activation functions, neural networks can capture intricate patterns and relationships in data. The training process involves iteratively adjusting the weights using optimization algorithms like gradient descent to reduce the error between the network’s predictions and the ground truth.
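The training loop can be sketched for a network with one hidden layer of 16 tanh units fitted to samples of sin x by full-batch gradient descent, with the gradients of the mean-squared error computed by backpropagation. The architecture, initialization, learning rate, and number of steps are assumptions chosen for a small, quick example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 64).reshape(1, -1)  # inputs as a 1 x N row
y = np.sin(x)                                      # targets

# One hidden layer of 16 tanh units and a linear output unit
W1 = rng.normal(0.0, 1.0, (16, 1))
b1 = np.zeros((16, 1))
W2 = rng.normal(0.0, 0.1, (1, 16))
b2 = np.zeros((1, 1))

def forward(x):
    h = np.tanh(W1 @ x + b1)
    return h, W2 @ h + b2

def mse():
    return float(np.mean((forward(x)[1] - y) ** 2))

initial_loss = mse()
lr = 0.02
for _ in range(5000):
    h, y_hat = forward(x)
    # Backpropagation of the mean-squared error through both layers
    d_out = 2.0 * (y_hat - y) / x.shape[1]
    dW2 = d_out @ h.T
    db2 = d_out.sum(axis=1, keepdims=True)
    d_hidden = (W2.T @ d_out) * (1.0 - h**2)  # tanh'(z) = 1 - tanh(z)^2
    dW1 = d_hidden @ x.T
    db1 = d_hidden.sum(axis=1, keepdims=True)
    # Gradient descent update (in place, so forward() sees the new weights)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
final_loss = mse()
```

In practice, automatic differentiation frameworks compute these gradients, and optimizers such as Adam replace the plain update.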

Deep learning, a sub-field of machine learning, refers to neural networks with many layers, enabling the capture of even more complex representations. For instance, convolutional neural networks (CNNs) [6, Ch. 9] are adept at processing image data, while recurrent neural networks (RNNs) [6, Ch. 10] excel in handling sequential data.

However, while neural networks have achieved remarkable success in various applications, they come with challenges. For example, neural networks require an amount of data that is appropriate for the size of the network to avoid a phenomenon called overfitting. When a network has many parameters but is trained on too little data, it can use its modeling capacity to model irrelevant details, including noise. Its predictions are then close to perfect on the training data but poor on new test cases. Since model behavior is determined by the training data, out-of-sample and out-of-distribution generalization cannot be assumed. In addition, the “black-box” nature of neural networks usually limits the interpretability of the model and its predictions. Finally, hyperparameter tuning is another area of concern: finding good settings for parameters such as the learning rate, batch size, and network depth requires extensive experimentation, which can be both time-consuming and resource-intensive.

Regularization of machine learning methods

Regularization techniques serve as foundational tools in machine learning, designed to prevent models from overfitting to their training data. By introducing a penalty on the model’s complexity, regularization helps models generalize to unseen data [4]. $L_{1}$ (Lasso) and $L_{2}$ (Ridge) regularization, which penalize the magnitude of model parameters, can be viewed as implicit modeling methods. They do not dictate the model’s structure directly but influence it by penalizing certain parameter configurations. In neural networks, techniques like dropout, which randomly deactivates certain neurons during training, help improve generalization. Other methods, such as early stopping and batch normalization (which normalizes neuron activations), further contribute to model robustness. While regularization provides a shield against overfitting, it introduces the challenge of selecting the right regularization strength, which requires careful tuning and validation.
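The effect of the regularization strength can be sketched with $L_{2}$ (ridge) regression on a linear model, which has the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy. The synthetic data, in which only two of ten features matter, and the values of λ are assumptions. The norm of the fitted coefficient vector shrinks as λ grows.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||X w - y||^2 + lam * ||w||^2 using its closed-form solution."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.5, 30)  # only two features matter

# Larger penalty strength lam shrinks the coefficient vector toward zero
lams = (0.0, 1.0, 10.0, 100.0)
norms = [np.linalg.norm(ridge_fit(X, y, lam)) for lam in lams]
```

Choosing λ in practice usually means comparing such fits on held-out data, which is the tuning and validation burden mentioned above.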

2.4 Explicit versus implicit models

In Sect. 2.1 we introduced computational models but so far avoided the distinction between explicit models, which directly provide computational representations like Eq. (1), and implicit models, which on their own are not enough to obtain a computational model. While an explicit model prescribes a direct mapping from input x to output y, implicit models often require a solver or an optimization procedure to yield a computational model akin to Eq. (1). Regularization is a fitting example of this distinction. It introduces constraints or penalties to the learning process but does not directly specify the functional form of the model. Instead, the model emerges as the result of an optimization process that balances fitting the data with the imposed regularization constraints.
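A toy example shows how a solver turns an implicit model into a computational one. The relation g(x, y) = y³ + y - x = 0 is a hypothetical implicit model chosen for illustration. Because g is strictly increasing in y, the relation has exactly one real solution y for every x, but it does not state how to compute that solution. Newton's method supplies the missing step and yields an explicit map x ↦ y as in Eq. (1).

```python
def implicit_model(x, y0=0.0, tol=1e-12, max_iter=50):
    """Computational model u(x) for the implicit relation y**3 + y - x = 0."""
    y = y0
    for _ in range(max_iter):
        g = y**3 + y - x
        dg_dy = 3.0 * y**2 + 1.0  # always >= 1, so the Newton step is well defined
        step = g / dg_dy
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton's method did not converge")
```

For example, `implicit_model(2.0)` returns 1.0 and `implicit_model(10.0)` returns 2.0 (up to the tolerance), since 1 + 1 = 2 and 8 + 2 = 10.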

Similarly, differential equations provide the dynamics or laws governing a system but don’t directly offer a computational model for predicting states. Only when combined with a solver, often numerical, do they yield a method to predict the state at subsequent time points. Partial differential equations (PDEs), such as Maxwell’s equations, also epitomize this concept. While they describe the fundamental relationships between electric and magnetic fields, a computational model that predicts field values at specific spatial and temporal points necessitates the application of a solver. The allure of implicit models lies in their ability to capture complex behaviors and constraints. However, they also demand a deeper understanding and careful selection of solvers or optimization techniques to ensure accurate and meaningful predictions.

2.5 Model composition

A computational model, as defined in Sect. 2.1, can itself be a composition of multiple sub-models. The generic function u that we have used so far can be composed of other functions representing the sub-models in various ways. The sub-models can be implicit or explicit and can be data-driven or first-principles based. The design patterns presented in this paper describe such compositions of data-driven and first-principle based models.

2.5.1 Model composition in machine learning

An example of model composition is deep kernel learning [41]. In deep kernel learning, the kernel function of a GP is parameterized using a deep neural network. This means that instead of applying a traditional kernel function like the RBF or Matérn kernel directly to the inputs, a base kernel is applied to the outputs of a neural network. Formally, given two input vectors x and $x'$, the kernel function can be represented as $k_{\theta _{k}}(f_{\theta _{f}}(x),f_{\theta _{f}}(x')) $, where $f_{\theta _{f}}$ is the neural network with parameters $\theta _{f}$, and $k_{\theta _{k}}$ is a base kernel with parameters $\theta _{k}$.
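The composition $k_{\theta _{k}}(f_{\theta _{f}}(x),f_{\theta _{f}}(x'))$ can be sketched with a small feature network and an RBF base kernel. In this sketch the network weights are random rather than learned. In deep kernel learning they would be trained jointly with $\theta _{k}$, for example by maximizing the GP marginal likelihood. The network size and all names are illustrative assumptions. Because the base kernel is positive semi-definite, so is the composed kernel, and the test checks this numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature network f_theta_f: R^2 -> R^3 with one tanh hidden layer
W1, b1 = rng.normal(size=(8, 2)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

def feature_net(x):
    """Apply the feature map row-wise to an (n, 2) array of inputs."""
    return np.tanh(x @ W1.T + b1) @ W2.T + b2

def rbf(a, b, lengthscale=1.0):
    """Base kernel k_theta_k evaluated between the rows of a and b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def deep_kernel(x1, x2):
    # k(f(x), f(x')): the base kernel operates on network features, not raw inputs
    return rbf(feature_net(x1), feature_net(x2))

X = rng.normal(size=(5, 2))
K = deep_kernel(X, X)
```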

This composition allows the model to learn intricate patterns and relationships in the data that might not be captured by a standard GP kernel. By mapping the input data into a new representation space using the neural network, the kernel can operate on features that are potentially more informative and better suited to the problem at hand.

Another illustrative example of model composition is the concept of model stacking or stacked generalization [42]. Here, individual models, often referred to as base learners, make predictions which are then used as input features for another model, typically called the meta-learner or the stacking model. The meta-learner then makes the final prediction. This composition technique aims to combine the strengths of multiple models, thereby improving generalization performance.
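Stacking can be sketched with polynomial regressions of different degrees as base learners and a linear regression on their predictions as the meta-learner. The target function, the split sizes, and the degrees are assumptions. For brevity, the meta-learner is fitted on a single holdout split; stacking is often done with out-of-fold (cross-validated) predictions instead.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, 120)
y = np.exp(-x) * np.cos(3.0 * x) + rng.normal(0.0, 0.05, x.size)

# Base learners are fitted on the first split, the meta-learner on the second
x_base, y_base = x[:80], y[:80]
x_meta, y_meta = x[80:], y[80:]

degrees = (1, 3, 7)
base_models = [np.polyfit(x_base, y_base, d) for d in degrees]

def base_predictions(x_new):
    # One column of predictions per base learner
    return np.column_stack([np.polyval(c, x_new) for c in base_models])

# Meta-learner: linear regression (with intercept) on the base predictions
P = np.column_stack([base_predictions(x_meta), np.ones_like(x_meta)])
meta_weights, *_ = np.linalg.lstsq(P, y_meta, rcond=None)

def stacked_predict(x_new):
    return np.column_stack([base_predictions(x_new), np.ones_like(x_new)]) @ meta_weights
```

On the split used to fit it, the stacked model cannot do worse than any single base learner, because selecting one base learner is a special case of the meta-learner's weights.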

A different perspective on model composition can be found in ensemble methods like bagging [43] and boosting [44]. In bagging, multiple models are trained on different subsets of the data and then averaged (for regression) or voted upon (for classification) to make predictions. Boosting, on the other hand, iteratively trains models by giving more weight to instances that previous models got wrong, aiming to correct mistakes made by earlier learners.
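Bagging for regression can be sketched as follows. Each base model is fitted to a bootstrap resample (drawn with replacement) of the training data, and the predictions are averaged. Degree-5 polynomials serve as an assumed base learner, and the noisy sine data and ensemble size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(x) + rng.normal(0.0, 0.1, x.size)

def fit_bagged_polynomials(x, y, rng, degree=5, n_models=25):
    """Fit one polynomial per bootstrap resample of (x, y)."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, x.size, x.size)  # sample indices with replacement
        models.append(np.polyfit(x[idx], y[idx], degree))
    return models

def bagged_predict(models, x_new):
    # Regression: average the base-model predictions
    return np.mean([np.polyval(c, x_new) for c in models], axis=0)

models = fit_bagged_polynomials(x, y, rng)
x_test = np.linspace(0.5, 5.8, 20)
pred = bagged_predict(models, x_test)
```

For classification, the average would be replaced by a majority vote over the base models.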

2.5.2 Model composition of models based on first principles

Another example of model composition can be found in classical electrodynamics. An electromagnetic field is defined as a four-tuple of space- and time-dependent vector fields $(\vec{E},\vec{D},\vec{H},\vec{B})$: the electric field $\vec{E}$, the electric displacement $\vec{D}$, the magnetic field $\vec{H}$, and the magnetic flux density $\vec{B}$. Electromagnetic fields are governed by Maxwell’s equations, a set of four PDEs. Two of the equations are dynamic equations, since they contain time derivatives. We collect them in a sub-model $U_{1}$,

$$ \frac{\partial}{\partial t} \begin{pmatrix} \vec{D} \\ \vec{B} \end{pmatrix} = \begin{pmatrix} 0&+\operatorname{curl}\\ -\operatorname{curl}&0 \end{pmatrix} \begin{pmatrix} \vec{E} \\ \vec{H}\end{pmatrix} - \begin{pmatrix} \vec{\jmath} \\ 0 \end{pmatrix} , $$

(11)

with the electric current density $\vec{\jmath}$. The first equation in (11) is Ampère’s law and the second is Faraday’s law. The remaining two equations have the form of PDE constraints. We collect them in the sub-model $U_{2}$,

$$ \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \operatorname{div}&0 \\ 0&\operatorname{div}\end{pmatrix} \begin{pmatrix} \vec{D} \\ \vec{B}\end{pmatrix} - \begin{pmatrix} \rho \\ 0 \end{pmatrix} , $$

(12)

with the electric charge density $\rho$. These are the electric and magnetic Gauss’ laws, respectively. Maxwell’s equations $(U_{1},U_{2})$ need to be complemented by constitutive relations that encode the material properties. For simple media at rest, the additional sub-model $U_{3}$ takes the algebraic form

$$ \begin{pmatrix} \vec{D} \\ \vec{B} \end{pmatrix} = \begin{pmatrix} \boldsymbol{\varepsilon}&0 \\ 0&\boldsymbol{\mu}\end{pmatrix} \begin{pmatrix} \vec{E} \\ \vec{H} \end{pmatrix} , $$

(13)

with the dielectric tensor $\boldsymbol{\varepsilon}$ and the permeability tensor $\boldsymbol{\mu}$. All three sub-models can be written in implicit form $U_{i}(\vec{E},\vec{D},\vec{H},\vec{B})=0$, $i=1,2,3$, and aggregate to the composed model $U=(U_{1},U_{2},U_{3})$, which yields a predictive model of electrodynamics.
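For concreteness, the implicit forms follow from moving all terms of (11), (12), and (13) to one side, with the sources $\vec{\jmath}$ and $\rho$ treated as given data. A field tuple $(\vec{E},\vec{D},\vec{H},\vec{B})$ solves the composed model exactly when all three residuals vanish:

$$ U_{1} = \frac{\partial}{\partial t} \begin{pmatrix} \vec{D} \\ \vec{B} \end{pmatrix} - \begin{pmatrix} 0&+\operatorname{curl}\\ -\operatorname{curl}&0 \end{pmatrix} \begin{pmatrix} \vec{E} \\ \vec{H}\end{pmatrix} + \begin{pmatrix} \vec{\jmath} \\ 0 \end{pmatrix} , $$

$$ U_{2} = \begin{pmatrix} \operatorname{div}&0 \\ 0&\operatorname{div}\end{pmatrix} \begin{pmatrix} \vec{D} \\ \vec{B}\end{pmatrix} - \begin{pmatrix} \rho \\ 0 \end{pmatrix} , \qquad U_{3} = \begin{pmatrix} \vec{D} \\ \vec{B} \end{pmatrix} - \begin{pmatrix} \boldsymbol{\varepsilon}&0 \\ 0&\boldsymbol{\mu}\end{pmatrix} \begin{pmatrix} \vec{E} \\ \vec{H} \end{pmatrix} . $$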

3 Hybrid modeling design patterns

Hybrid modeling is diverse, with applications ranging from molecular modeling in drug discovery [45], through simulation tasks in climate science [46] and various engineering disciplines [47], to modeling sensor data for virtual sensing. Solutions for individual use cases are usually application-specific. New hybrid modeling challenges often seem so unique that interdisciplinary teams come together to develop a custom solution from scratch. While this leads to progress in individual disciplines, solutions are often not accessible to other application domains.

To make progress in hybrid modeling research, it is necessary to abstract recurring modeling challenges and to distill useful solution approaches that generalize across applications. The goal of this paper is to introduce hybrid modeling design patterns that formalize these solution approaches at an abstraction level beyond individual applications. We adopt the following definition of a design pattern.

Definition 1

A hybrid modeling design pattern is a reusable blueprint for a building block of a general solution to recurring hybrid modeling challenges.

Per our definition, a design pattern should address recurring challenges beyond individual application domains. For this reason, the solution approach encoded in the design pattern should be general, meaning that application-specific aspects are abstracted away. Further, hybrid modeling design patterns are modular, and solving a modeling challenge will typically involve the composition of multiple design patterns. Finally, a design pattern is a blueprint rather than an implementation; blueprints are reusable and useful for developing a solution and guiding its implementation.

In this section, we first introduce the block diagram notation we propose to communicate the design patterns. We then discuss the motivation behind working at this level of abstraction and list properties of useful design patterns. Finally, we provide some guidance on how the design patterns can be used for new hybrid modeling use cases as well as meta-level research.

3.1 The block diagram notation for hybrid modeling design patterns

We propose a simple block diagram notation for working with the hybrid modeling design patterns. The general question in recurring hybrid modeling challenges is typically how to best combine the available domain knowledge with the available data. The data is processed by a data-driven model, which we denote by $\mathcal{D}$, while the chosen first-principles-based model is denoted by $\mathcal{P}$. Both models $\mathcal{D}$ and $\mathcal{P}$ are computational blocks, which receive inputs and perform computations to produce an output. For example, a data-driven model component will receive observations as an input which it will process to either produce a prediction, a lower dimensional representation of the input, or another quantity that is needed for the modeling challenge at hand. The inputs to $\mathcal{P}$ will depend on the type of domain knowledge available. In the case of a differential equation for example, the inputs might consist of the initial conditions and the time interval over which the dynamics are to be integrated. The desired output could be the simulated dynamics, or the final state.

In the block diagram notation, a computational block (typically $\mathcal{P}$ or $\mathcal{D}$) is represented by a box. Directed arrows indicate the flow of information. For example, a directed arrow between two blocks indicates that the output (i.e. the result of the computation) of the first block is used as one of the inputs to the second block. A computational block can have multiple incoming arrows, meaning that its inputs come from various sources, and it can have multiple outgoing arrows, meaning that its computational results are further processed in different ways.

In summary, a block diagram for describing a design pattern consists of rectangular boxes representing computational blocks and of directed arrows, which indicate the flow of inputs and outputs between the boxes. Actual examples of design patterns will be presented in Sect. 4.
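To make the notation concrete, a block diagram can be encoded as a directed acyclic graph and evaluated in topological order. The sketch below is one possible encoding and not part of the proposed notation: blocks are Python callables, arrows are lists of upstream names, and the stand-in blocks `P`, `D`, and `combine` are hypothetical. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+) and does not handle diagrams with feedback loops.

```python
from graphlib import TopologicalSorter

def evaluate_diagram(blocks, arrows, sources):
    """Evaluate a block diagram.

    blocks:  name -> callable; its positional arguments are the outputs of
             the names listed in arrows[name], in that order
    arrows:  name -> list of upstream names (the incoming arrows)
    sources: name -> value for external inputs such as x
    """
    values = dict(sources)
    for name in TopologicalSorter(arrows).static_order():
        if name in values:  # external input, nothing to compute
            continue
        values[name] = blocks[name](*(values[src] for src in arrows[name]))
    return values

# Hypothetical diagram: x has two outgoing arrows (to P and D),
# and the combination block has two incoming arrows.
blocks = {
    "P": lambda x: 2.0 * x,       # stand-in first-principles block
    "D": lambda x: 0.1 * x ** 2,  # stand-in data-driven block
    "combine": lambda d, p: d + p,
}
arrows = {"P": ["x"], "D": ["x"], "combine": ["D", "P"]}
values = evaluate_diagram(blocks, arrows, {"x": 3.0})
print(values["combine"])
```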

3.2 Properties of useful design patterns

Before diving into the specific design patterns introduced in Sect. 4 and utilizing the block diagram notation to generate patterns that satisfy Definition 1, it is crucial to discuss the properties that make a design pattern useful. Some of these properties are essential and have already been explicitly stated in our definition of hybrid modeling design patterns.

Design pattern versus architecture

We prefer the term “design pattern” over “architecture” because, in a specific model architecture, several design patterns might be combined or nested. Additionally, we emphasize that the design patterns were collected by analyzing actual applications. Since there is no comprehensive theory of hybrid modeling from which these patterns could be derived, our collection is not exhaustive and is intended to grow as new design patterns are developed or gain importance.

Abstract and general

An essential step in creating design patterns is abstracting useful concepts that are applicable across various applications and formulating them in a way that makes them easily applicable in a general reusable context. A good design pattern is not a finished design, but rather a blueprint that can be adapted to specific problems.

Design patterns should be abstract and general rather than application-specific, allowing them to be applied across a wide range of problems. This flexibility enables researchers and practitioners to adapt and customize the design pattern for their specific needs, promoting innovation and problem-solving in diverse fields.

Broad applicability

A useful design pattern should have the potential to address various challenges and applications, enabling researchers and practitioners to benefit from its adoption. By offering solutions that can be adapted to different contexts, a design pattern with broad applicability can contribute to the development and improvement of numerous models, fostering progress across multiple domains.

Modularity and composability

Design patterns should be modular, allowing for easy integration with other patterns, and promoting composability for constructing more complex models. This property enables the combination of multiple design patterns, leading to the creation of more sophisticated and powerful hybrid models that can tackle complex challenges.

Tractability and ease of communication

A good design pattern should be tractable, facilitating implementation, and easy to communicate, promoting understanding and collaboration among researchers and practitioners. Clear and understandable design patterns encourage adoption and facilitate the sharing of ideas, contributing to the overall growth and development of hybrid modeling methodologies.

Clear interface between physics-based and data-driven components

An effective design pattern should provide a clear interface between the physics-based and data-driven components, enabling seamless integration and interaction between the two modeling paradigms. By defining how these two aspects interact, a design pattern can help create a cohesive and well-structured model that effectively leverages the strengths of both approaches.

4 Examples of design patterns

We now delve into the key design patterns for hybrid modeling. There will be two types of patterns, base patterns and composition patterns. The base patterns establish systematic approaches for combining a first-principles-based model $\mathcal{P}$ with a data-driven model $\mathcal{D}$, capitalizing on the strengths of both modeling techniques. In Sect. 4.1, each of the base design patterns is described in detail, elucidating the principles and methodologies underlying their application. Furthermore, we provide illustrative examples to enhance comprehension and demonstrate the practical utility of these design patterns in various scenarios. In Sect. 4.2, we present patterns for the composition of base patterns. These composition patterns facilitate building more elaborate hybrid modeling solutions for complex modeling tasks.

4.1 Base patterns for hybrid modeling

The base patterns are the basic building blocks for the development of hybrid modeling solutions. Each design pattern takes two computational models, typically a first-principles-based model $\mathcal{P}$ and a data-driven model $\mathcal{D}$, and combines their computation steps into a hybrid model. The order in which the computation is executed, and the flow of inputs and outputs between computational blocks, will differ between the design patterns.

In the following sections, we present a total of four base patterns, with the first three having previously been introduced by von Stosch et al. [48] within the context of process systems engineering.

4.1.1 The delta model

The delta model serves as a fundamental design pattern in hybrid modeling, providing an effective method to combine the strengths of both first-principles-based and data-driven models. This design pattern is particularly useful when the first-principles-based model captures the primary underlying physical, chemical, or biological processes but may lack the precision or comprehensiveness required for specific applications. By introducing a data-driven component that accounts for discrepancies or unmodeled phenomena, the delta model can significantly enhance the accuracy and predictive capabilities of the overall hybrid model.

The delta model is formulated by additively combining a first-principles-based model $\mathcal{P}$ with a data-driven model $\mathcal{D}$, resulting in a hybrid model $\mathcal{H}$ as follows:

$$ \mathcal{H}(x) = \mathcal{D}(x) + \mathcal{P}(x) . $$

(14)

The block diagram is given in Fig. 2. In the equation, x represents the input variables, and $\mathcal{H}(x)$, $\mathcal{P}(x)$, and $\mathcal{D}(x)$ are the output predictions for the hybrid, first-principles-based, and data-driven models, respectively. The first-principles-based model, $\mathcal{P}(x)$, encapsulates the primary knowledge of the underlying processes, while the data-driven model, $\mathcal{D}(x)$, is trained to capture the discrepancies between $\mathcal{P}(x)$ and the observed data. The data-driven component, therefore, accounts for the unmodeled or inaccurately modeled phenomena, refining the overall predictions made by the hybrid model.
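Eq. (14) and the residual-fitting step can be sketched as follows. Everything here is a hypothetical toy: `physics_model` plays the role of $\mathcal{P}$, and a Nadaraya–Watson kernel smoother, an arbitrary choice of regressor, plays the role of $\mathcal{D}$ and is fitted to the residuals $y - \mathcal{P}(x)$.

```python
import math

def physics_model(x):
    """Hypothetical first-principles model P: a simplified linear law."""
    return 2.0 * x

def fit_residual_model(xs, residuals, bandwidth=0.1):
    """Data-driven model D: Nadaraya-Watson smoother fitted to y - P(x)."""
    def predict(x):
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed: no correction
            return 0.0
        return sum(w * r for w, r in zip(weights, residuals)) / total
    return predict

def fit_delta_model(xs, ys):
    residuals = [y - physics_model(x) for x, y in zip(xs, ys)]
    d = fit_residual_model(xs, residuals)
    return lambda x: d(x) + physics_model(x)  # H(x) = D(x) + P(x), Eq. (14)

def true_system(x):
    """Hypothetical ground truth containing an effect missing from P."""
    return 2.0 * x + 0.5 * math.sin(3.0 * x)

xs = [0.1 * i for i in range(31)]
ys = [true_system(x) for x in xs]
hybrid = fit_delta_model(xs, ys)

x_test = [0.05 + 0.1 * i for i in range(30)]
mse_p = sum((physics_model(x) - true_system(x)) ** 2 for x in x_test) / len(x_test)
mse_h = sum((hybrid(x) - true_system(x)) ** 2 for x in x_test) / len(x_test)
print(mse_p, mse_h)
```

Far from the training inputs, all smoother weights underflow to zero and the sketch returns a zero correction, so $\mathcal{H}$ falls back to $\mathcal{P}$; other choices of $\mathcal{D}$ need not behave this way.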

Typical use cases

The delta model is applicable in a variety of scenarios, including but not limited to:

Chemical process modeling: Thompson and Kramer [49] suggest compensating for the inaccuracies of first-principles-based equations, such as mass and component balances, by building a hybrid model that additively combines these simple process models with a neural network. For a survey of more recent approaches, we refer the reader to Zendehboudi et al. [50].
Groundwater modeling in geoscience: Xu and Valocchi [51] show that various data-driven models are effective at correcting the bias of physics-based groundwater flow models and can additionally produce well-calibrated error bars.
Computational fluid dynamics: Reynolds-averaged Navier–Stokes (RANS) equation solvers are an important computational tool for modeling turbulent flows. Unfortunately, RANS predictions are often inaccurate due to large discrepancies in the predicted Reynolds stress. Wang et al. [52] propose to mitigate these discrepancies with a data-driven correction term.
Dynamics modeling: Levine and Stuart [53] present a unified framework for learning the modeling error in dynamical systems when $\mathcal{P}$ is described by differential equations.

Example

To study the delta model in action, we consider data from an accelerometer. The long-term effects can be described by a harmonic oscillator with non-linear damping, while the short-term effects lack a physical interpretation. We compare the delta model with its physics-based component $\mathcal{P}$ alone and with its data-driven component $\mathcal{D}$ alone. We assume that the underlying dynamics of the system resemble the Van der Pol equation (Eq. (4)) and that the short-term behavior can be simulated by a Gaussian process (GP).

We generate data according to the model

$$ y(t) = u_{\text{vdp}}(t) + u_{\text{loc}}(t) + \epsilon , $$

(15)

where $u_{\text{vdp}}(t)$ are the predictions obtained from the Van der Pol equation, $u_{\text{loc}}(t) \sim \mathrm{GP}(0, k(t,t'))$ are simulated local effects according to a GP with squared exponential kernel with variance 0.2 and length scale 0.5 and $\epsilon \sim \mathcal{N}(0, \sigma ^{2}_{n})$ is Gaussian noise with variance $\sigma _{n}^{2}=0.05$.
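The local effects $u_{\text{loc}}$ and the noise $\epsilon$ in Eq. (15) can be sampled as sketched below, drawing the GP sample through a Cholesky factorization of the kernel matrix. The small diagonal jitter is an assumption that keeps the factorization numerically stable, and the sketch uses the shorter interval $[0,10]$ so that the pure-Python factorization stays fast; adding the Van der Pol term $u_{\text{vdp}}(t)$ then gives $y(t)$.

```python
import math
import random

def se_kernel(t, t_prime, variance=0.2, lengthscale=0.5):
    """Squared exponential kernel with the variance and length scale of Eq. (15)."""
    return variance * math.exp(-0.5 * (t - t_prime) ** 2 / lengthscale ** 2)

def cholesky(a):
    """Lower-triangular L with L L^T = a for a symmetric positive definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def sample_local_effects(ts, rng, jitter=1e-6):
    """Draw u_loc ~ GP(0, k) on the grid ts as L z with z ~ N(0, I)."""
    K = [[se_kernel(a, c) + (jitter if i == j else 0.0) for j, c in enumerate(ts)]
         for i, a in enumerate(ts)]
    L = cholesky(K)
    z = [rng.gauss(0.0, 1.0) for _ in ts]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(ts))]

def add_noise(signal, rng, noise_var=0.05):
    """Add i.i.d. Gaussian noise epsilon with variance noise_var."""
    return [s + rng.gauss(0.0, math.sqrt(noise_var)) for s in signal]

rng = random.Random(0)
ts = [0.1 * k for k in range(101)]        # [0, 10] at resolution 0.1
u_loc = sample_local_effects(ts, rng)
local_plus_noise = add_noise(u_loc, rng)  # y(t) = u_vdp(t) + this, per Eq. (15)
```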

To simulate the Van der Pol equation (Eq. (4)), we define the vector field $f_{\text{ODE}}:\mathbb{R}^{2} \rightarrow \mathbb{R}^{2}:(s_{t}, v_{t}) \mapsto (v_{t},- s_{t}+\mu v_{t}(1 - s_{t}^{2}) )$ on the state space $h_{t}=(s_{t}, v_{t})=(u_{t}, \frac{du_{t}}{dt})$, where, for ease of readability, we denote a function evaluated at time point t with the subindex t, e.g. $u_{t} \equiv u(t)$. We use an order 5(4) Runge-Kutta method to simulate $\frac{dh_{t}}{dt} = f_{\text{ODE}} (h_{t}; \mu )$ over the time interval $[0,50]$ (at a resolution of 0.1 units) with $\mu =5$ and initial state $h_{0}=(1,0)$.
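The simulation step can be sketched with the standard library alone. As a stated substitution, the sketch uses the classical fixed-step fourth-order Runge-Kutta method with small substeps instead of the adaptive order 5(4) method (which is available, for instance, as `method="RK45"` in SciPy's `solve_ivp`). The substep count of 20 is an assumption; with $\mu = 5$ the resulting step of 0.005 should keep the fixed-step scheme stable.

```python
def f_ode(h, mu=5.0):
    """Van der Pol vector field on the state space h = (s, v) = (u, du/dt)."""
    s, v = h
    return (v, -s + mu * v * (1.0 - s * s))

def rk4_step(f, h, dt):
    """One classical fourth-order Runge-Kutta step for dh/dt = f(h)."""
    k1 = f(h)
    k2 = f(tuple(x + 0.5 * dt * k for x, k in zip(h, k1)))
    k3 = f(tuple(x + 0.5 * dt * k for x, k in zip(h, k2)))
    k4 = f(tuple(x + dt * k for x, k in zip(h, k3)))
    return tuple(x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for x, a, b, c, d in zip(h, k1, k2, k3, k4))

def simulate_vdp(t_end=50.0, resolution=0.1, substeps=20, h0=(1.0, 0.0), mu=5.0):
    """Integrate from t = 0 to t_end and record the state every `resolution` units."""
    n_out = int(round(t_end / resolution))
    dt = resolution / substeps
    ts, states, h = [0.0], [h0], h0
    for k in range(1, n_out + 1):
        for _ in range(substeps):
            h = rk4_step(lambda state: f_ode(state, mu), h, dt)
        ts.append(k * resolution)
        states.append(h)
    return ts, states

ts, states = simulate_vdp()
u_vdp = [s for s, _ in states]  # displacement u(t), the u_vdp term of Eq. (15)
```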

The generated time series $\mathcal{D}=(t_{k}, y_{k})_{k=1,\ldots ,K}$, where $y_{k}$ is the measured dynamic response at time $t_{k}$, is depicted in Fig. 3, with training data shown as blue points and test data as red points. The generated data mostly follow the Van der Pol equation, which covers the majority of the underlying physical processes, but also contain localized phenomena and short-term dynamics that the equation does not account for. To make the modeling task more challenging, we further assume that the measurement system had a blackout between 5 and 15 time units, during which no training data is available.

Figure 3 provides a qualitative comparison of a purely first-principles-based approach based on Eq. (4), a purely data-driven approach based on Eq. (10), and a hybrid model using the delta pattern.

Figure 3a shows the dynamic response according to the Van der Pol equation. While this model accurately captures the long-term behavior of the system, it falls short in capturing the finer details and short-term effects.

The GP predictions are shown in Fig. 3b. Where abundant training data is available, the GP performs well. However, where training data is missing (between 5 and 15 time units), the predictions revert to the prior mean (which is zero) and are accompanied by high uncertainty.
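The reversion to the prior can be reproduced with a minimal zero-mean GP regression sketch. It assumes the squared exponential kernel and noise variance of Eq. (15), fixed rather than fitted, and a hypothetical smooth signal $\sin(t)$ observed only on $[0,5]$ and $[15,20]$. Deep inside the gap, the posterior mean is essentially zero and the posterior variance essentially equals the prior variance 0.2.

```python
import math

def se_kernel(a, b, variance=0.2, lengthscale=0.5):
    return variance * math.exp(-0.5 * (a - b) ** 2 / lengthscale ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_lower_transposed(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(t_train, y_train, t_test, noise_var=0.05):
    """Zero-mean GP regression: posterior mean and variance at t_test."""
    K = [[se_kernel(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(t_train)] for i, a in enumerate(t_train)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y_train))  # (K + noise I)^-1 y
    means, variances = [], []
    for t in t_test:
        k_star = [se_kernel(t, a) for a in t_train]
        v = solve_lower(L, k_star)
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        variances.append(se_kernel(t, t) - sum(x * x for x in v))
    return means, variances

# Hypothetical observations with a gap between t = 5 and t = 15.
t_train = [0.5 * k for k in range(11)] + [15.0 + 0.5 * k for k in range(11)]
y_train = [math.sin(t) for t in t_train]
means, variances = gp_posterior(t_train, y_train, [2.25, 10.0])
print(means, variances)
```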

Finally, we combine the Van der Pol oscillator with the GP. The data-driven model learns the discrepancies between the predictions of the first-principles-based model and the observed data, thereby accounting for unmodeled or inaccurately modeled phenomena. The results in Fig. 3c demonstrate that the hybrid model combines the best of both worlds: where training data is available, the GP significantly improves the predictions compared with the physics-based model alone, capturing effects not considered in the Van der Pol equation. Where training data is missing, the physics-based model takes over, as the GP correction reverts to its zero prior mean.
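A corresponding sketch of the hybrid fits a GP to the residuals $y - \mathcal{P}(t)$ and adds its posterior mean back to $\mathcal{P}(t)$. To keep the block short and self-contained, `math.sin` stands in for the simulated Van der Pol response and $0.3\sin(4t)$ is a hypothetical local effect; both are assumptions and not the setup behind Fig. 3.

```python
import math

def se_kernel(a, b, variance=0.2, lengthscale=0.5):
    return variance * math.exp(-0.5 * (a - b) ** 2 / lengthscale ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_lower_transposed(L, b):
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(t_train, y_train, t_test, noise_var=0.05):
    """Zero-mean GP regression: posterior mean and variance at t_test."""
    K = [[se_kernel(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(t_train)] for i, a in enumerate(t_train)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y_train))
    means, variances = [], []
    for t in t_test:
        k_star = [se_kernel(t, a) for a in t_train]
        v = solve_lower(L, k_star)
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        variances.append(se_kernel(t, t) - sum(x * x for x in v))
    return means, variances

def physics(t):
    """Hypothetical stand-in for P (the Van der Pol solution in the accelerometer example)."""
    return math.sin(t)

t_train = [0.25 * k for k in range(21)] + [15.0 + 0.25 * k for k in range(21)]
y_train = [physics(t) + 0.3 * math.sin(4.0 * t) for t in t_train]  # hypothetical local effect
residuals = [y - physics(t) for t, y in zip(t_train, y_train)]

t_test = t_train + [10.0]  # training inputs plus one point inside the gap
res_mean, res_var = gp_posterior(t_train, residuals, t_test)
hybrid = [physics(t) + m for t, m in zip(t_test, res_mean)]  # H = D + P, Eq. (14)
```

At $t = 10$ the GP correction is essentially zero, so the hybrid prediction coincides with $\mathcal{P}$, mirroring the behavior described for the blackout interval.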

Employing the delta model combines the first-principles-based and data-driven components, resulting in an improved hybrid model. Our results confirm that this model provides more accurate and reliable predictions by accounting for both the strengths and the limitations of the individual models in different data scenarios.

Discussion

The delta model offers several compelling advantages that underscore its utility in hybrid modeling. One of its primary strengths is the facilitation of fast prototyping. With the availability of a first-principles-based model $\mathcal{P}$, researchers and practitioners can swiftly initiate their modeling efforts. As more data becomes available or as the need for enhanced precision arises, the data-driven component $\mathcal{D}$ can be incrementally introduced, refining the model without necessitating a complete overhaul.

Moreover, the delta model inherently promotes higher accuracy and robustness. While the physical model $\mathcal{P}$ provides a foundational understanding, it might occasionally fall short due to assumption mismatches or its inability to encapsulate the stochasticity inherent in many real-world processes. For instance, $\mathcal{P}$ might be predicated on idealized assumptions, such as negligible noise levels or presumed linearity, which might not hold true in practical scenarios. The data-driven component $\mathcal{D}$ serves as a corrective mechanism in such instances, adeptly learning to account for complex non-linearities, stochastic effects, and other intricate real-world phenomena that the physical model might overlook.

Another salient advantage of the delta model is its data efficiency. Learning the deviations or discrepancies from an existing model $\mathcal{P}$ is often more data-efficient than attempting to learn the entire function from scratch solely through $\mathcal{D}$. This efficiency is particularly pronounced when training data is sparse. By incorporating the physical model, the delta model introduces a beneficial inductive bias, ensuring that even in low-data regimes, plausible estimates can be generated.

Lastly, the delta model’s design inherently supports specialization. In many scenarios, it might be infeasible to obtain training data that spans the entirety of the input domain, perhaps due to safety concerns, prohibitive measurement costs, or other constraints. The delta model elegantly addresses this challenge. For test points that lie outside the domain covered by the training data, the physics-based model $\mathcal{P}$ takes precedence, provided that the correction $\mathcal{D}$ reverts to zero away from the data (as the posterior mean of a zero-mean GP does); the hybrid then relies on the capability of $\mathcal{P}$ to extrapolate reliably. Conversely, for inputs that are well represented in the training data, the data-driven model $\mathcal{D}$ offers its specialized insights, ensuring predictions that are both accurate and nuanced.

The advantages described above make the delta model a popular design pattern for hybrid modeling. However, it also has limitations. Due to the additive nature of the pattern, its modeling flexibility is limited. Specifically, it does not explicitly model higher-order interactions between the physics-based model and the data-driven component.

4.1.2 Physics-based preprocessing

Physics-based preprocessing is another crucial design pattern in hybrid modeling that leverages domain knowledge to enhance the performance of data-driven models. By incorporating transformations derived from physical laws or other domain-specific knowledge, this design pattern preprocesses the input data before feeding it into a data-driven model. The preprocessing step can introduce useful inductive biases, reduce the dimensionality of the data, and improve the overall efficiency and interpretability of the resulting model.

In the physics-based preprocessing design pattern, a transformation model $\mathcal{P}$ is applied to the input variables x before they are fed into a data-driven model $\mathcal{D}$. The transformation function incorporates domain knowledge, such as physical laws or constraints, to preprocess the data. The output prediction of the hybrid model $\mathcal{H}(x)$ can be expressed as:

$$\begin{aligned} \mathcal{H}(x) = \mathcal{D}\bigl(\mathcal{P}(x)\bigr) . \end{aligned}$$

(16)

Here, $\mathcal{P}(x)$ denotes the preprocessed input variables, and $\mathcal{H}(x) = \mathcal{D}(\mathcal{P}(x))$ is the output prediction of the hybrid model, obtained by feeding the preprocessed inputs to the data-driven model. The transformation $\mathcal{P}$ is designed based on domain knowledge to enhance the representation of the data or to simplify the task of the data-driven model, leading to improved performance and interpretability. The block diagram for physics-based preprocessing is given in Fig. 4.
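A minimal sketch of Eq. (16), using a hypothetical example: the raw inputs are mass and speed, $\mathcal{P}$ converts them to kinetic energy, and $\mathcal{D}$ is an ordinary least-squares line on that single feature. The target (an affine function of kinetic energy) is invented for illustration.

```python
def physics_preprocess(x):
    """P: hypothetical transform, (mass m in kg, speed v in m/s) -> kinetic energy in J."""
    m, v = x
    return 0.5 * m * v ** 2

def fit_line(zs, ys):
    """D: ordinary least squares on one scalar feature; returns (intercept, slope)."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    szz = sum((z - mz) ** 2 for z in zs)
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    slope = szy / szz
    return my - slope * mz, slope

def fit_preprocessing_hybrid(raw_inputs, targets):
    zs = [physics_preprocess(x) for x in raw_inputs]
    intercept, slope = fit_line(zs, targets)
    return lambda x: intercept + slope * physics_preprocess(x)  # H(x) = D(P(x)), Eq. (16)

raw = [(m, v) for m in (1.0, 2.0, 3.0, 4.0) for v in (1.0, 2.0, 3.0)]
# Hypothetical target: an affine function of kinetic energy.
targets = [1.0 + 0.8 * physics_preprocess(x) for x in raw]
hybrid = fit_preprocessing_hybrid(raw, targets)
print(hybrid((2.5, 1.5)))
```

A linear model applied directly to the raw pair $(m, v)$ could not represent this target, whereas after the physics-based preprocessing a one-dimensional linear fit suffices.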