Hybrid modeling: towards the next level of scientific computing in engineering

The integration of machine learning (Keplerian paradigm) and more general artificial intelligence technologies with physical modeling based on first principles (Newtonian paradigm) will impact scientific computing in engineering in fundamental ways. Such hybrid models combine first principle-based models with data-based models into a joint architecture. This paper will give some background, explain trends and showcase recent achievements from an applied mathematics and industrial perspective. Examples include characterization of superconducting accelerator magnets by blending data with physics, data-driven magnetostatic field simulation without an explicit model of the constitutive law, and Bayesian free-shape optimization of a trace pair with bend on a printed circuit board.

In modern terms, we call this complementary approach hybrid modeling.
Definition Hybrid models combine first principle-based models with data-based models into a joint architecture, supporting enhanced model qualities, such as robustness and explainability.
First principles express formalized domain knowledge. For the purpose of this paper the domain knowledge results from physics. But there are other possibilities, such as statistics (e.g., probabilistic graphical models [3,Ch. 8]) or discourse (e.g., ontologies [16]). Data may be obtained from any source, in particular from observation or simulation. We find also the somewhat narrower terms scientific machine learning [29], physics-based machine learning [39] and predictive data science [45], respectively.
Consider a high-dimensional manifold that contains some big data. It might be that a submanifold can be identified, which is dictated to us by the laws of physics, e.g. regarding admissible system dynamics. Learning algorithms can then be used to project the data into this submanifold. In other words, the structure of submanifold embeds physical constraints. A classical example is Kálmán filtering [23]. Kálmán filtering was actually an enabling technology for the moon landing in 1969, where the goal was landing within ≈500 m after ≈400,000 km of travel. More preference is given to physics or data, depending on the level of uncertainty. Kálmán filtering can be recognized as recursive Bayesian estimation. A more sophisticated example is presented in Sect. 2. Ensemble Kálmán filtering is used in weather forecasting centers worldwide [37]. They have to deal with about 10 6 incoming data points per hour, and mathematical models with about 10 9 states. This use case is similar to digital twinning, since data from the field is acquired and used to update the models. Citing [45, p. 39]: "Learning from data through the lens of models is a way to exploit structure in an otherwise intractable problem. " Looking closer into engineering, we notice that a large class of physics models can be decomposed into conservation laws and constitutive laws [40,Ch. 1.3], [41]. The conservation laws are of topological nature and can therefore be discretized easily, leaving little room for data-driven techniques. The situation is different for the constitutive relations, which are of metric nature, and encode phenomenological material properties. Except for simple media (local, linear) there are many potential complications (non-local, hysteretic, non-linear, multi-scale, multi-physics, etc.). Here, data-driven models can be useful, provided that the models fulfil certain admissibility criteria, which can often be expressed in terms of invariance with respect to symmetry groups (orthogonal group, Lorentz group, etc.). This is demonstrated in Sect. 3.
In physics-based modeling, one often aims at sparse models, by exploiting the principle of locality [18], 2 or at least at low-rank representations. Some popular hybrid approaches model physics by partial differential equations plus boundary conditions, represent the solution space by a deep neural network, and learn the solution in a data-driven fashion (Deep Ritz [11], Physics-Informed Neural Networks, PINNs [32]). In general, the notion of sparsity is then lost, though. Recently, Sparse, Physics-based, and partially Interpretable Neural Networks (SPINNs) were proposed, where sparsity is recovered by exploiting concepts from traditional mesh-free numerical methods [33].
To sum up, hybrid modeling has the potential to improve the Pareto tradeoff between simulation accuracy and simulation cost significantly, and therefore bring scientific computing in engineering to the next level. In the remainder of the paper we will showcase this by some recent achievements.

Blending data with physics [30]
This section deals with the characterization of normal and superconducting magnets of CERN's accelerator complex. Due to the complex interplay of iron saturation, eddy currents and magnetic hysteresis, numerical field simulation alone fails to achieve the required accuracy for particle tracking applications. However, after a well-defined excitation current cycling, the physical state of the magnet is reproducible. The magnets are therefore precycled and the static magnetic field is characterized by measurements. It is beneficial to model the magnetostatic field by the Boundary Element Method (BEM). In this way, the reconstructed field is locally an exact magnetostatic solution, no domain discretization inside the air gap is required and high convergence rates are achieved for field evaluations [30].
In the sequel, the field is represented by a double layer potential. The dipole layer is located on the boundary of the domain of interest, which follows the particle trajectory, such as the curved domain illustrated in Fig. 1. Following an isogeometric approach, the domain itself is described by NURBS (Non-Rational Uniform B-Splines) surfaces and the double layer density is discretized by higher oder basis splines [8]. This gives rise to a state vector ν ∈ R N , with N degrees of freedom. The measurement data y ∈ R M relates to the state vector ν, by means of the observation operator H ∈ R M×N : ν → y.
Spatial field sampling is achieved by using alignment machines and magnetic field sensors, such as Hall probes or induction coils, which are supported on stages or beams with finite stiffness. To achieve a reasonable total measurement duration the data is almost always acquired on the fly. This means that the machine is performing moves along lines and a measurement is triggered whenever a certain distance is passed. As field gradients are large in the fringe field region of a magnet, most of the measurement uncertainty is due to positioning errors and vibrations. Such errors are perturbing the observation operator H. To quantify the impacts, the perturbations d ∈ R 6M may be introduced, encoding the de-

(ν)+∂ d H(ν)d is justified. The operator ∂ d H encodes the derivatives of H with respect to d.
A hybrid model is established, by estimating the state vectors ν and d from measurement data. Considering both ν and d as unknown random variables, Bayesian inference provides the framework to utilize all the available knowledge. To this end, Bayes rule is applied to the joint probability density function p(ν, d|y) ∝ p(y|ν, d)p(ν)p(d).
The distributions p(ν) and p(d) are called prior distributions as they encode knowledge about ν and d, which has been available before the data y was collected. Due to the dependency of y on ν and d the explicit computation of the statistical moments of p(ν, d|y) is infeasible. However, under normal distribution assumptions, the conditionals p(ν|d, y) and p(d|ν, y) take on the least squares forms, With the conditionals at hand, samples from the joint distribution p(ν, d|y) can be computed by Gibbs sampling [35], according to Algorithm 1. As the mechanical degrees of freedom can be considered as uncorrelated between the moves, samples are drawn move-by-move in a blockwise fashion. In Fig. 2 the sampled mean as well as the maximum and minimum values for the vertical sensor position over a single move are drawn. For verification, an optical measurement was taken. Differences are within measurement precision. In case of a linear observation operator H, sampling ν k from the high dimensional p(ν|d, y) is efficient, when implemented as the solution of the stochastic linear equation system [31] where a Gaussian prior p(ν) ∼ N (ν 0 , L -1 ν ) was selected. The vector ν 0 ∈ R N is the prior expected value of the BEM state vector and L ν ∈ R N×N is the prior precision matrix. The matrix R ∈ R M×M is a sparse, symmetric and positive definite covariance matrix of the sensor noise and H k ∈ R M×N is the perturbed observation matrix, which is constructed sample BEM state vector, see (2) for k = 1, . . . , number of samples K do for j = 1, . . . , number of moves J do d k,j ∼ p(d j |ν k-1 , y j ) sample mechanical state vector for move j end for sample BEM state vector, see (2) end for Equation (2) follows from the conditional p(ν|d, y) and the linearity of the expected value function [2, 1.17]. The structure of H k depends on the underlying field representation. Boundary element formulations give rise to dense discrete operators. Fast multipole methods can be applied to compress the products H T k y and H k x if required. A solution to (2) can then be obtained by conjugate gradient iterations.
The matrix H T k R -1 H k + L ν corresponds to the inverse of the posterior covariance matrix. It therefore needs to be symmetric and positive definite. This can generally be achieved with the selection of a symmetric and positive definite prior precision matrix L ν , (see e.g. [2, 5.1.1]). In the results presented in Fig. 2, dense operators were used and the solution of (2) was computed by using the Cholesky solver of the Eigen C++ library [17].

Data-driven field simulation [7, 13, 14]
This section is about data-driven field simulation for magnetostatic problems. Here, datadriven simulation is meant in the context of simulations directly on the material data and was first introduced in [26]. Consider Maxwell's equations for magnetostatics, where H denotes the magnetic field strength, j the imposed source current density, B the magnetic flux density, and the considered domain. The problem formulation (3) is completed with appropriate boundary conditions. The phase space of the system is denoted by Z := {z(x) := (B(x), H(x))}, x ∈ . The set of all states that fulfill (3) and the boundary conditions is denoted by M ⊂ Z, the set of Maxwell-conforming fields. 3 Yet, to uniquely solve equations (3), a relation between B and H is necessary. Following the conventional approach, first a material law is assumed. Secondly, various fitting techniques are employed to compute the coefficients of the considered material law such that it fits best to the available measurement data. However, this approach suffers from approximation errors and additionally introduces epistemic uncertainties, i.e. the uncertainty is in the actual choice of the material model. Note that in contrast Maxwell's equations are derived from first principles and accepted to be known exactly.
The hybrid solver acts directly on material data and circumvents the demand for a fixed material model. The measurement data is collected in a set D * : where N is the number of measurement points. This gives rise to a set of discrete material states D := {z ∈ Z|z(x) ∈ D * ∀x ∈ }. "The material response is not known exactly and, instead, it is imperfectly characterized" [26, p. 95] by the set D.
The solution is given by the states M ∩ D that fulfill Maxwell's equations, while being compatible with the material states. However, for a finite number of data points, this set is very likely to be empty, M ∩ D = ∅. Therefore, we define the solution S by the relaxed condition where the distance function is defined in terms of the auxiliary norms · ν,μ , which are defined as whereν,μ are diagonal tensors carrying the so-called weighting factorsν,μ. Note that ν,μ are of computational nature and do not represent physical material properties. Yet, their choice affects the convergence rate of the numerical scheme [13]. The optimization problem (4) is essentially a double minimization problem, where we accept a solution z that conforms to Maxwell's equations, while at the same time being closest to available measurement data. The solution of (4) is organized as a fixed point iteration, see Fig. 3 right. Letd(z 1 , z 2 ) denote the distance function between the two states z 1 and z 2 defined over the auxiliary norms (6). For any given state z ∈ Z a modified finite element solver is used to compute the state z ∈ M such thatd(z, z ) = min. The solver is based on a variational principle discussed in [34]. The idea is to solve Ampère's and Faraday's laws exactly and shift the discretization error entirely into the constitutive relation. 4 Given z ∈ M, a discrete optimization selects the closest measurement data points, see  left. These so-called active measurement data are associated with a state z × ∈ D such that d(z × , z ) = min. State z = z × is the starting point for the next iteration. Under convexity assumptions this algorithm converges to the solution of (4). Furthermore, it has been shown in [4] that for linear elasticity, the conventional solution is recovered with measurement data sets of increasing size. The hybrid solver was applied to the model of a two-dimensional quadrupole magnet in [7] and [13]. The first work [7] extended the hybrid solver to cope with problems that feature domains where only measurement data is at hand and domains where the material law is known, e.g. in the case of vacuum. Such known material laws can be naturally included into the data-driven iterations. To analyze the performance of the hybrid solver, a strongly nonlinear soft-magnetic material was considered in the iron part of the quadrupole magnet. As a baseline, a standard magnetostatic finite element solution was considered, by discretizing 1/8 of the geometry with 6k piecewise linear elements. For the novel method, data was created by an equidistant sampling of the given non-linear B(H) characteristic, without adding noise. Global weighting factorsν,μ were employed throughout. The results show that the relative error to the reference solution, the so-called energy-mismatch, decreases as more measurement data is employed, see the dashed line in Fig. 4. However, depending on N = 10 2 . . . 10 4 the proposed method required 50 . . . 1300 outer iterations for convergence. In [13] local weighting factorsν(x),μ(x) have been introduced and defined as to be the local tangent on the current operation point z × . Particularly in the case of unbalanced and/or data-starved sets D, local weighting factors yield significant improvement in accuracy, see the solid line in Fig. 4. Moreover, approximately 55 iterations are needed irrespective of the cardinality of D.
A computationally demanding 3D inductor model was considered in [14], where realworld measurement data [1] was employed for the iron part. For comparison, an extended Brauer model [22] was constructed on the measurement data and utilized in a standard Newton solver. Figure 4, right shows the absolute value of the x-component of the magnetic fields for the data-driven solution as well as the solution computed with the extended Brauer model. Furthermore, the available measurement data is depicted in red. The solution of the hybrid solver clusters around the measurement data, which is expected due to the sparse data set. Here, the data-driven solution is next to the available measurement data in all regimes of the BH-curve. That does not hold for the extended Brauer model which provides a good approximation in the linear part, but fails to properly model the Rayleigh part as well as the saturation part of the BH-curve. Both approaches lead to an acceptable solution, yet the employed material is well understood and significant effort has already been spent in the material model. If a more complex material is utilized, the hybrid solver should outperform the conventional approach.
In summary it can be said that so far hybrid solvers fail to be as computationally efficient as standard methods, albeit lots of effort is spent to improve the performance [24,27]. Yet, hybrid solvers are advantageous if complex or novel materials are considered, since the effort for matching a material model to the measurement data is abolished. Furthermore, by circumventing the demand for a fixed material model, no epistemic uncertainties arise.

Bayesian free-shape optimization [36]
Bayesian optimization (BO) is an optimization method to optimize a given function which is expensive to evaluate [12]. It is built upon a hybrid architecture that blends intricate physical models with a Bayesian machine learning technique, such as Gaussian Process (GP) regression. The resulting surrogate models are cheap to evaluate, including derivatives, and keep track of their interpolation uncertainty. The core idea of BO is to successively refine those surrogates in regions of design space that are close to optimal, which are however not known beforehand. Regions with high surrogate uncertainty might be optimal even though the mean interpolation says otherwise. Thus, surrogate refinement requires balancing exploration against exploitation during sampling, the so-called bandit problem. 5 There are different strategies for achieving a good balance. One strategy considers the best value achieved so far and computes the Expected Improvement (EI). The next sample is taken at the point with the largest EI; this yields yet another optimization problem. The BO algorithm stops if the EI drops below some threshold. The BO approach can be generalized in various ways, such as BO with noise, BO in several dimensions, and BO for several objectives. As an industrial example we consider BO of a differential trace pair on a printed circuit board. Differential signalling benefits from high immunity against electromagnetic interference and low crosstalk. However, bend discontinuities in transmission lines introduce (i) reflection and (ii) differential-to-common-mode conversion. An optimal design hence requires multi-objective optimization of the geometry. A parametric case was studied in [15], while we aim at free-shape optimization. Figure 5 shows a schematic for system simulation. The trace pair with ports 1,1' and 2,2' , respectively, is described by an equivalent electrical circuit (EEC). Mode converters admit a separation of differential mode (D) and common mode (C) signal components. The optimization objectives can be stated in terms of S-parameters: 6 (i) reflection |S DD11 | ! = min; (ii) mixed mode conversion |S CD21 | ! = min.
The geometric setting is depicted in Fig. 6 left. The outer trace is fixed, while the inner trace has a free interior surface. The geometry is described by a finite element mesh, and the free surface can be re-shaped by mesh morphing. This corresponds to a highdimensional design space with ≈200 dimensions. This should be put in contrast to the six-dimensional design space that was considered in [15]. The optimization problem is: Find the Pareto front for the shape of the free surface that minimizes the objectives.
The ingredients for solving the optimization problem are: finite element electromagnetic field solver, EEC extraction, and adjoint sensitivity analysis. Figure 6 right shows the shape gradient vectors at the free surface for the two objectives (i) and (ii). The two gradient vector fields point in opposite directions, so the objectives are conflicting. However, the gradient fields are not exactly negatives of each other, so there is still subtle room for improvement.
The BO is extended to the multi-objective case as follows. The Pareto front is approached via a sequence of auxiliary optimization problems, each with respect to a certain  2D affine subspace of the high-dimensional design space. This particular affine subspace is spanned by the adjoint-based gradients; it is the subspace of maximum objective variance. For each optimization problem of the sequence, BO learns and optimizes GP surrogate models for the objective functions, restricted to this subspace. Once the intermediate Pareto front is converged in this subspace, new subspaces may be chosen on the intermediate Pareto front. Figure 7 shows the result of this algorithm, after only ≈100 design evaluations. Note that even subtle improvement potentials will be exploited by the hybrid free-shape optimizer.

Summary and outlook
We highlighted three examples of hybrid modeling, cf. Table 1. Further examples stem from other applied mathematics and industrial domains, and involve artificial neural networks, such as Physics Informed Neural Networks (PINNs) [25], Deep Learning (DL) with embedded physics simulation [6], or Field Inversion and Machine Learning (FIML) [9]. They will soon find their way into scientific computing in electrical engineering, too.

Funding
The work of AG has been supported by the DFG, Research Training Group 2128 "Accelerator Science and Technology for Energy Recovery Linacs". The work of AK, MS and SK was funded by the Robert Bosch GmbH. The work of DL is further supported by the BMBF via the research contract 05K19RDB.

Availability of data and materials
The datasets analysed during the study in Sect. 2 are available in the repository https://github.com/MelvinLie/BEMMM.git. For Sect. 3, the measured material data can be found in [1]. Further information on the simulated geometries is given in [7,13,14]. The CAD model and network simulation setup explained in Sect. 4 are property of the Robert Bosch GmbH. Details are available on reasonable request, subject to permission by Robert Bosch GmbH.