A blackbox yield estimation workflow with Gaussian process regression applied to the design of electromagnetic devices

In this paper an efficient and reliable method for stochastic yield estimation is presented. Since computational feasibility is one of the main challenges in uncertainty quantification, we propose a hybrid approach in which most of the Monte Carlo sample points are evaluated with a surrogate model and only a few sample points are reevaluated with the original high fidelity model. Gaussian process regression is a non-intrusive method that is used to build the surrogate model. Without many prerequisites, it provides not only an approximation of the function value, but also an error indicator that we can use to decide whether a sample point should be reevaluated or not. For two benchmark problems, a dielectrical waveguide and a lowpass filter, the proposed method outperforms classic approaches.


Introduction
In mass production of electrical devices, e.g. antennas or filters, one often has to deal with uncertainties in the manufacturing process. These uncertainties may lead to deviations in important parameters, e.g. geometrical or material parameters, which in turn may lead to rejections due to malfunction. In this context, the quantification of uncertainty and its impact plays an important role, also with regard to later optimization. According to Graeb [1, Chap. 2] we define the yield as the percentage of realizations in a manufacturing process that fulfill the performance feature specifications. When dealing with electromagnetism, the performance feature specifications are requirements involving partial differential equations (PDEs) describing the electromagnetic fields, i.e., Maxwell's equations. These can be solved numerically, e.g. with the finite element method (FEM). The most straightforward approach for yield estimation is the Monte Carlo (MC) analysis [2, Chap. 5]. The space of the uncertain parameters is sampled and the performance feature specifications are tested for each sample point. This typically requires many evaluations of the underlying PDEs. Thus, the computational effort is one main challenge of yield estimation.
Over the last decades, various methods have been developed with the aim of reducing the computational effort of MC. One approach is to reduce the number of sample points, e.g. by Importance Sampling (IS) [2, Chap. 5.4]. Another approach is to reduce the effort for each sample point, e.g. by using surrogate based approaches [3][4][5]. The cost for building most surrogate models increases rapidly with the number of uncertain parameters. Furthermore, there are counterexamples where the yield estimator fails drastically, even though the surrogate model seems highly accurate, measured by classical norms or pointwise [6]. Therefore, the authors of [6] propose a hybrid approach. They focus their attention on critical sample points that are close to the limit state function, which is the boundary between sample points fulfilling and not fulfilling the performance feature specifications. Critical sample points are evaluated on the original high fidelity model, while the other sample points, which are far from the limit state function, are evaluated only on a surrogate model. Because of this distinction, which leads to a combination of different model types, it is called a hybrid approach. Here, the choice of the surrogate model and the definition of close and far are crucial. In [7], a hybrid approach is proposed, using radial basis functions (RBF) for the surrogate model and an adjoint error indicator to choose the critical sample points. In [8] a similar hybrid approach is proposed, using stochastic collocation with polynomial basis functions and also an adjoint error indicator. In this paper we combine these ideas and propose a hybrid approach using Gaussian process regression (GPR) for both building the surrogate model and obtaining an error indicator in the form of the prediction standard deviation provided by the GPR. The critical sample points are used to improve the GPR model adaptively during the estimation process.
Further, we investigate whether sorting the sample points can increase the efficiency.
Other research related to GPR based surrogate models for yield or failure probability estimation is conducted in [9][10][11][12]. In [9] various sorting strategies for GPR model training data are described and compared. In [10], the authors concentrate on the calculation of small failure probabilities with a limited number of function evaluations on the high fidelity model. They also use an adaptive GPR surrogate model, but do not combine it with a hybrid approach and therefore have no critical sample points that could be used to improve the GPR model. Instead, they distinguish between the sample points generated by Subset Simulation (Sequential MC) for failure probability estimation and those generated as training data using a Stepwise Uncertainty Reduction technique to refine the GPR model adaptively. In [11] and [12], a GPR based surrogate model approach is combined with IS. Again, no hybrid approach is used. Adaptively, the GPR model and the IS density are improved by adding one or more sample points from the MC sample of the last iteration to the training data set, which are selected by a learning function and then calculated on the high fidelity model. By contrast, in practice it is often assumed that the design parameter deviations are small enough that a linearization is valid [13, Online Help: Yield Analysis Overview]. This approach is obviously very efficient, but it is very difficult to determine beforehand whether the assumption is valid.
The paper is structured as follows. After setting up the problem, in Sect. 3 existing approaches and the concept of GPR are briefly described. Then the use of GPR for yield estimation, also in combination with a hybrid approach, is discussed. In Sect. 4, numerical results are presented using a benchmark problem, a simple waveguide, and a practical example, a lowpass filter calculated with CST, before the paper is concluded in Sect. 5.

Problem setting
Even though the proposed ideas are generally applicable, in the following we will focus on problems from electrical engineering where electromagnetic field simulations are necessary. This is the case, for example, when designing antennas or filters. Depending on the frequency, e.g. the electric field can be calculated to retrieve information about the performance of the device. Then the performance can be optimized by adapting the design. In order to calculate the electric field on a simply connected bounded domain D, we start from Maxwell's formulation in the frequency domain,

$$\nabla \times \left(\mu^{-1} \nabla \times \mathbf{E}_\omega\right) - \omega^2 \varepsilon\, \mathbf{E}_\omega = -\mathrm{j}\omega \mathbf{J} \quad \text{on } D,$$

where E_ω = E_ω(x, p) denotes the electric field phasor, ω the angular frequency, μ = μ_r μ_0 the dispersive complex magnetic permeability, ε = ε_r ε_0 the dispersive complex electric permittivity and J = J(x, p) the phasor of the current density. The vacuum and relative permeability are denoted by μ_0 and μ_r = μ_r(x, p), the vacuum and relative permittivity respectively by ε_0 and ε_r = ε_r(x, p). Assuming suitable boundary conditions, building the weak formulation and discretizing with (high-order) Nédélec basis functions, we derive the linear system

$$A_\omega(p)\, \mathbf{e}_\omega(p) = \mathbf{f}_\omega, \qquad (1)$$

with system matrix A_ω(p), discrete solution e_ω(p) and discretized right-hand side f_ω, all depending on the design parameter p and the frequency ω. For further details we refer to [14][15][16]. As quantity of interest (QoI) Q_ω(p) = q(e_ω(p)), we consider the scattering parameter (S-parameter), cf. [17, Chap. 3] and [8], i.e., Q_ω(p) := S_ω(p). In this case, q is an affine linear function, but this is no requirement for the following yield estimation methods. If there are uncertainties in the manufacturing process, the design parameters may be subject to random deviations. Therefore we model the uncertain parameter vector p as a multidimensional random variable. We assume p to be (truncated) Gaussian distributed (cf. [18]), i.e., p ∼ N_T(p̄, Σ, lb, ub) with mean value p̄, covariance matrix Σ, lower and upper bounds lb and ub and the corresponding probability density function [19]

$$\mathrm{pdf}(p) \propto \exp\!\left(-\tfrac{1}{2}(p-\bar p)^{\mathsf T} \Sigma^{-1} (p-\bar p)\right) \ \ \text{for } lb \le p \le ub, \qquad \mathrm{pdf}(p) = 0 \ \text{otherwise}.$$

Note that the definition of the yield and the proposed estimation method are independent of the chosen probability density function pdf(p). The Gaussian distribution is a typical choice for modeling design parameters as uncertain. Here, we truncate it to avoid nonphysical realizations of p, e.g. negative distances. Following [1] we define the performance feature specifications as a restriction of our QoI to a specific interval, i.e.,

$$|S_\omega(p)| \le c \quad \forall \omega \in T_\omega, \qquad (2)$$

where c is a constant and ω a range parameter from an interval T_ω. Here, we identify ω with the frequency and T_ω with the frequency domain of interest. The safe domain is defined as the set containing all parameters fulfilling the performance feature specifications, i.e.,

$$\Omega_s := \left\{ p : |S_\omega(p)| \le c \;\; \forall \omega \in T_\omega \right\}. \qquad (3)$$

Then, the yield can be written as [1, Chap. 4]

$$Y(\bar p) := \mathbb{E}\left[\mathbf{1}_{\Omega_s}(p)\right] = \int \mathbf{1}_{\Omega_s}(p)\, \mathrm{pdf}(p)\, \mathrm{d}p,$$

where E denotes the expected value and 1_{Ω_s} the indicator function with value 1 if the parameter p lies inside the safe domain and value 0 otherwise.
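The yield definition above can be illustrated with a small Monte Carlo sketch. The two-dimensional safe domain, the diagonal covariance and all numerical values below are hypothetical toy choices, not the paper's benchmark; they only show how the indicator-function expectation turns into a sample fraction.

```python
import numpy as np

def sample_truncated_gaussian(mean, std, lb, ub, n, rng):
    """Rejection-sample a truncated Gaussian (componentwise, diagonal covariance)."""
    out = np.empty((n, len(mean)))
    filled = 0
    while filled < n:
        cand = rng.normal(mean, std, size=(n, len(mean)))
        ok = np.all((cand >= lb) & (cand <= ub), axis=1)
        take = min(n - filled, int(ok.sum()))
        out[filled:filled + take] = cand[ok][:take]
        filled += take
    return out

def estimate_yield(samples, in_safe_domain):
    """Yield = fraction of sample points inside the safe domain (MC estimator)."""
    return np.mean([in_safe_domain(p) for p in samples])

rng = np.random.default_rng(0)
# toy safe domain: |p1| <= 1 and |p2| <= 1 stands in for |S_omega(p)| <= c
samples = sample_truncated_gaussian(np.zeros(2), np.ones(2), -3.0, 3.0, 2500, rng)
Y = estimate_yield(samples, lambda p: np.all(np.abs(p) <= 1.0))
```

For a real device, the lambda would be replaced by the performance feature specification test on the (surrogate or high fidelity) S-parameter.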

A GPR-hybrid approach for yield estimation
In a MC analysis a large number of sample points is generated, according to the truncated normal distribution of the uncertain parameters, and evaluated in order to obtain the QoI. The fraction of sample points lying inside the safe domain is an estimator for the yield. Since the accuracy depends directly on the size of the sample, a classic MC analysis comes with high computational costs [20]. In the past, various surrogate based approaches have been proposed. The idea is to approximate the QoI, i.e., to find a mapping from the design parameter p to S̃_ω, where S̃_ω is an approximation of S_ω. This allows us to evaluate the performance feature specifications (2) and thus the safe domain (3) without solving a PDE for each sample point. The stochastic collocation hybrid approach proposed by [8] showed that the computational effort can be reduced significantly while ensuring the same accuracy and robustness as with a classic MC method. Nevertheless, there are a few drawbacks. First, since a polynomial collocation approach was used, the training data for the surrogate model must come from a tensorial grid and cannot be chosen arbitrarily. As a consequence, the surrogate model cannot be updated easily, e.g. with the information from the evaluation of critical sample points. This could be handled by using regression, but the second disadvantage would still remain: in order to distinguish between critical and non-critical sample points, an adjoint error indicator was used. This requires the system matrices and the solution of the primal and the dual problem, which are not always available when using proprietary software, and can become very costly in case of non-linear QoIs. The GPR-Hybrid approach we propose in this paper overcomes these issues.

Gaussian process regression
Following Rasmussen and Williams [4, Chap. 2.2], the technique of Gaussian process regression can be divided into four mandatory steps and one optional step.

Prior:
We make some prior assumptions about the functions we expect to observe. We write

$$S \sim \mathcal{GP}(m, k)$$

if we expect the S-parameter to follow a Gaussian process (GP) with mean function m and kernel function k. In the following we use the constant zero function as a starting value for the mean function. When the GP is trained, the mean value of the training data evaluations will be used. As kernel function we choose the squared exponential kernel, which is also known as RBF, i.e.,

$$k(p, p') = \zeta^2 \exp\!\left(-\frac{\|p - p'\|^2}{2 l^2}\right),$$

with the two hyperparameters ζ ∈ ℝ and l > 0. At this point we refer to Sect. 4 to see how we set the hyperparameters. For more information about hyperparameters in general, please refer to [4, Chap. 5].
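As a minimal illustration of this prior, the squared exponential kernel and the resulting prior covariance matrix over a set of parameter points can be sketched as follows; the hyperparameter values ζ = 0.1 and l = 1 and the three toy points are only example settings (cf. Sect. 4).

```python
import numpy as np

def squared_exponential(p1, p2, zeta=0.1, l=1.0):
    """Squared exponential (RBF) kernel: k(p, p') = zeta^2 * exp(-||p - p'||^2 / (2 l^2))."""
    d2 = np.sum((np.asarray(p1, float) - np.asarray(p2, float)) ** 2)
    return zeta**2 * np.exp(-d2 / (2.0 * l**2))

# prior covariance matrix of the GP over three hypothetical parameter points
P = np.array([[0.0, 0.0], [0.5, 0.0], [2.0, 1.0]])
K = np.array([[squared_exponential(a, b) for b in P] for a in P])
```

The kernel value on the diagonal equals ζ², and nearby points are more strongly correlated than distant ones, which is exactly the smoothness assumption the prior encodes.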

Training data:
We collect data by evaluating sample points on the high fidelity FE model. The so-called training data set will be used to train the GP. In the following, we generate these sample points according to the distribution of the uncertain parameters.

Posterior:
In this step the information from the prior and the training data is combined in order to obtain a new GP with updated mean and kernel function. We write T = {(p_i, S(p_i))}_{i=1,...,n} for the training data set, P = [p_1, ..., p_n]^T for the training inputs and S_P = [S(p_1), ..., S(p_n)]^T for the corresponding evaluations, and define the kernel matrix and mean vector

$$K := \big(k(p_i, p_l)\big)_{i,l=1}^{n}, \qquad m := \left[m(p_1), \ldots, m(p_n)\right]^{\mathsf T}. \qquad (4)$$

Then the posterior of the GP, conditioned on the training data set T, has the updated mean and kernel function

$$m_{\text{post}}(p) = m(p) + k(p, P)\, K^{-1} (S_P - m), \qquad k_{\text{post}}(p, p') = k(p, p') - k(p, P)\, K^{-1} k(P, p').$$

Predictions:
For an arbitrary test data point p*, the predicted distribution of the output S_{p*}, depending on the training data set T and the test data point, is given by

$$S_{p^*} \,\big|\, T \sim \mathcal{N}\Big( m(p^*) + k(p^*, P)\, K^{-1} (S_P - m),\; k(p^*, p^*) - k(p^*, P)\, K^{-1} k(P, p^*) \Big), \qquad (5)$$

with k(p*, P) = [k(p*, p_1), ..., k(p*, p_n)] and k(P, p*) = [k(p_1, p*), ..., k(p_n, p*)]^T.
Thus, GPR predictions of the function value S̃_GPR(p*) and the standard deviation σ_GPR(p*) can be obtained. Please note that σ_GPR(p*) is the standard deviation of the surrogate model and is not related to the design uncertainty, i.e., to the covariance matrix Σ of p.
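A minimal sketch of these prediction formulas, assuming a zero prior mean and a unit-amplitude RBF kernel (both toy choices), could look like this; the Cholesky factorization replaces the explicit inverse K⁻¹ for numerical stability.

```python
import numpy as np

def gpr_predict(P, y, p_star, kernel, noise=1e-5):
    """Posterior mean and standard deviation at p_star, cf. Eq. (5), zero prior mean."""
    K = np.array([[kernel(a, b) for b in P] for a in P]) + noise * np.eye(len(P))
    k_star = np.array([kernel(p_star, b) for b in P])
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K^{-1} y
    mean = k_star @ alpha
    v = np.linalg.solve(L, k_star)
    var = kernel(p_star, p_star) - v @ v                   # k(p*,p*) - k(p*,P) K^{-1} k(P,p*)
    return mean, np.sqrt(max(var, 0.0))

# hypothetical scalar example: three training points of a smooth function
k = lambda a, b: np.exp(-np.sum((np.atleast_1d(a) - np.atleast_1d(b)) ** 2) / 2.0)
P_train = [np.array([0.0]), np.array([1.0]), np.array([2.0])]
y_train = np.sin(np.array([0.0, 1.0, 2.0]))
m_near, s_near = gpr_predict(P_train, y_train, np.array([1.0]), k)
m_far,  s_far  = gpr_predict(P_train, y_train, np.array([10.0]), k)
```

Near a training point the predicted standard deviation is small, while far away it reverts to the prior amplitude; this behavior is exactly what the hybrid approach exploits as an error indicator.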

Model update (optional):
A new data point (p_add, S(p_add)) can be used to update an existing GPR model. Therefore the training data set is updated to

$$T_{\text{new}} = T \cup \left\{\left(p_{\text{add}}, S(p_{\text{add}})\right)\right\}, \qquad (6)$$

and (4) has to be updated according to

$$K_{\text{new}} = \begin{pmatrix} K & k(P, p_{\text{add}}) \\ k(p_{\text{add}}, P) & k(p_{\text{add}}, p_{\text{add}}) \end{pmatrix} \qquad \text{and} \qquad m_{\text{new}} = \begin{pmatrix} m \\ m(p_{\text{add}}) \end{pmatrix}. \qquad (7)$$
Then, predictions for a new test data point p can be obtained by (5) using the updated data from (6) and (7). This update involves the factorization of the matrix K new which is in the worst case of cubic complexity, i.e., O(n 3 ), and can be reduced to O(r 2 n) by using low-rank approximations, where n is the number of training data points and r the rank of the low-rank approximation [4,Chap. 8].
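The block structure of the updated kernel matrix can be sketched as follows; the kernel and the data points are hypothetical, and for brevity the sketch recomputes nothing but the new row and column.

```python
import numpy as np

def extend_kernel_matrix(P, K, p_add, kernel):
    """Append one training point: K stays as the upper-left block, cf. Eqs. (6)-(7)."""
    k_col = np.array([kernel(p, p_add) for p in P])
    K_new = np.block([
        [K,              k_col[:, None]],
        [k_col[None, :], np.array([[kernel(p_add, p_add)]])],
    ])
    return K_new

# hypothetical one-dimensional example
k = lambda a, b: np.exp(-0.5 * np.sum((np.atleast_1d(a) - np.atleast_1d(b)) ** 2))
P = [0.0, 1.0]
K = np.array([[k(a, b) for b in P] for a in P])
K_new = extend_kernel_matrix(P, K, 0.5, k)
```

Only the factorization of K_new needs to be redone after such an update, which motivates the low-rank strategies mentioned above.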
In conclusion, we assume that this effort is negligible in comparison to the high fidelity evaluations, i.e., to solving (1).

Combining GPR and the hybrid approach
The idea of the hybrid approach is to save computing time by evaluating most of the MC sample points on a cheap-to-evaluate surrogate model and only a small subset of the sample on the original high fidelity (e.g. FE) model. The critical sample points, i.e., sample points close to the limit state function, are those which are evaluated on the high fidelity model. As mentioned before, the choice of the critical sample points is crucial for the efficiency and accuracy of this approach. In [7] and [8], adjoint error indicators are used. Here, we take advantage of the fact that the GPR provides an error indicator in the point p in the form of the standard deviation σ_GPR(p). The performance feature specification expects the inequality (2) to hold in the whole frequency interval T_ω. However, we define a discrete subset T_d ⊂ T_ω and enforce only that the inequality holds for all ω_j ∈ T_d. This means that for each frequency point ω_j a separate surrogate model is built; otherwise, rational interpolation could be used, e.g. [21]. Thus, the GPR models and the resulting prediction values and standard deviations depend on the frequency and are denoted by S̃_GPR,ω_j(p) and σ_GPR,ω_j(p) with j = 1, ..., |T_d|. We apply a short circuit strategy, i.e., a sample point is not evaluated on the remaining frequency points once it has been rejected for a previous one. This saves computing time and does not affect the estimation result, except in the case that a sample point has been rejected erroneously based on an underestimated standard deviation prediction. Further, we build separate surrogate models for the real part and the imaginary part of the S-parameter and later combine them for the prediction. This guarantees (affine) linearity of the QoI by avoiding the square root. Algorithm 1 shows the classification procedure for one sample point p_i. Once the GPR models are constructed, a MC analysis is carried out on the surrogates.
For each sample point a predicted S-parameter value and a predicted standard deviation are obtained. Following the concept of sigma levels [22], the predicted standard deviation multiplied with a safety factor γ is considered as an error indicator for the surrogate model. The value of γ is problem dependent and can be derived by evaluating some test data points on the high fidelity model and on the GPR model and considering the ratio of the true error and the predicted standard deviation.

Algorithm 1 (classification of one MC sample point p_i):
1: for j = 1, ..., |T_d| do
2:   Evaluate the GPR model and obtain S̃_GPR,ω_j(p_i) and σ_GPR,ω_j(p_i)
3:   if |S̃_GPR,ω_j(p_i)| + γ σ_GPR,ω_j(p_i) ≤ c then
4:     Performance feature specifications fulfilled for ω_j
5:   else if |S̃_GPR,ω_j(p_i)| − γ σ_GPR,ω_j(p_i) > c then
6:     Classify p_i ∉ Ω_s (not accepted) and stop
7:   else
8:     p_i is a critical sample point
9:     Evaluate the high fidelity model and obtain S_ω_j(p_i)
10:    if |S_ω_j(p_i)| ≤ c then
11:      Performance feature specifications fulfilled for ω_j
12:    else
13:      Classify p_i ∉ Ω_s (not accepted) and stop
14:    end if
15:  end if
16:  if j = |T_d| then
17:    Classify p_i ∈ Ω_s (accepted)
18:  else
19:    Continue with j = j + 1
20:  end if
21: end for

The predicted standard deviation multiplied with the safety factor γ serves as a buffer zone. If the performance feature specification (2) is (not) fulfilled for the predicted S-parameter value and all values in the range plus/minus this buffer zone, the considered sample point is classified as (not) accepted; otherwise it is classified as critical and reevaluated on the high fidelity model. Then, the yield will be estimated by

$$\tilde Y(\bar p) = \frac{1}{N_{\text{MC}}} \sum_{i=1}^{N_{\text{MC}}} \mathbf{1}_{\Omega_s}(p_i),$$

where N_MC is the size of the MC sample. A significant advantage of GPR is that the model can easily be updated on the fly. Algorithm 2 shows the process of yield estimation including updating the GPR models. Typically, the computational effort of a surrogate based approach lies in the offline evaluation of the training data. Therefore we start with a small initial training data set.
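The decision rule of Algorithm 1 can be sketched in a few lines; the model functions, the threshold c = 1 and the toy sample points below are hypothetical stand-ins for the GPR surrogates and the high fidelity solver.

```python
def classify_sample(p, freq_points, gpr_models, high_fidelity, c, gamma=2.0):
    """Classify one MC sample point (sketch of Algorithm 1).

    gpr_models[j](p) is assumed to return (S_tilde, sigma) for frequency point j;
    high_fidelity(p, j) returns the exact S-parameter value.  Returns (accepted,
    n_critical), where n_critical counts high fidelity reevaluations.
    """
    n_critical = 0
    for j in freq_points:
        s_pred, sigma = gpr_models[j](p)
        if abs(s_pred) + gamma * sigma <= c:
            continue                      # safely accepted for this frequency
        if abs(s_pred) - gamma * sigma > c:
            return False, n_critical      # safely rejected: short circuit
        n_critical += 1                   # critical: reevaluate on high fidelity model
        if abs(high_fidelity(p, j)) > c:
            return False, n_critical
    return True, n_critical

# toy setup: one frequency point, threshold c = 1, buffer gamma * sigma = 0.2
models = {0: lambda p: (0.5 * p, 0.1)}    # hypothetical GPR prediction and std
hf = lambda p, j: 0.5 * p                 # hypothetical exact model
acc_safe, n1 = classify_sample(1.0, [0], models, hf, c=1.0)  # 0.5 + 0.2 <= 1: accepted
acc_crit, n2 = classify_sample(1.9, [0], models, hf, c=1.0)  # 0.95 +/- 0.2 straddles 1: critical
rej, n3 = classify_sample(3.0, [0], models, hf, c=1.0)       # 1.5 - 0.2 > 1: rejected
```

The three toy calls exercise exactly the three branches of the buffer-zone rule: safely accepted, critical with high fidelity reevaluation, and safely rejected with short circuit.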
The resulting less accurate GPR model does not pose a problem in terms of yield estimation accuracy, because the hybrid method still classifies all MC sample points correctly as accepted or not accepted. The only difference is that there might be more critical sample points in the beginning if the initial GPR surrogate has been built with a smaller training data set. Then, during the estimation process (online), we use critical sample points to improve our GPR model. This update requires almost no additional computational effort, since these sample points were calculated in the hybrid method anyway. In order to enable parallel computing even with model updates, we introduce batches: model updates are only considered after each batch of N_B high fidelity evaluations.

Algorithm 2 (yield estimation with adaptive GPR model updates):
1: Build initial GPR models for all ω_j from the initial training data set
2: for each MC sample point p_i do
3:   Classify p_i with Algorithm 1
4:   Update the counter of high fidelity evaluations |HF^online_GPR-H|
5:   for j = 1, ..., |T_d| do
6:     Define C_j = {p_i : p_i classified as critical for ω_j in last N_B high fidelity evaluations}
7:   end for
8:   if |HF^online_GPR-H| is an integer multiple of N_B then
9:     for j = 1, ..., |T_d| do
10:      Initialize ε = ∞
11:      while ε > ε_t do
12:        Update GPR model for ω_j with sample point p_add = argmax_{p∈C_j} |S̃_GPR,ω_j(p) − S_ω_j(p)|
13:        Remove p_add from C_j
14:        Evaluate updated GPR model and obtain updated S̃_GPR,ω_j(p_i) for all p_i ∈ C_j
15:        Set ε = max_{p_i∈C_j} |S̃_GPR,ω_j(p_i) − S_ω_j(p_i)|
16:      end while
17:    end for
18:  end if
19: end for

The error ε measures the maximum difference between the predicted value and the real value of the S-parameter over the critical sample points. The sample point for which this difference is maximal is included in the training data set (cf. line 12 in Algorithm 2). The GPR model is updated with the additional training data point, and all sample points in C_j are evaluated on the new GPR surrogate model in order to obtain a new prediction. This procedure is repeated until the error is below a tolerance ε_t. Using the updated GPR model, the next MC sample points are evaluated until again N_B sample points have been evaluated on the high fidelity model (in parallel) and GPR model updates are considered. Without much extra cost it is also possible to reevaluate all already considered, non-critical sample points after each GPR model update.
At this point it can also be decided whether all critical sample points are added to the training data set or only a part of them. Especially when solving in batches, it can be advisable not to include all critical sample points in order to avoid adding too many closely neighboring sample points. If ε_t = 0, all critical sample points are used to update the GPR model. The proposed updating strategy can be modified by sorting the sample points at negligible cost. The idea is to start with the most promising sample points, i.e., those that contribute most to the improvement of the GPR model. This shall lead to more sample points being classified correctly without being evaluated on the high fidelity model. In [9], different sorting criteria are described and compared. Here, we will focus on two criteria: the criterion proposed by Echard, Gayton and Lemaire in [23], which we will call the EGL criterion in the following, and a criterion based on our hybrid decision rule, which we will call the Hybrid criterion. The EGL criterion is given by

$$C_{\text{EGL}}(p_i) = \frac{\big|\, c - |\tilde S_{\text{GPR}}(p_i)| \,\big|}{\sigma_{\text{GPR}}(p_i)},$$

where c denotes the upper bound for the performance feature specification from (2). Then, the sample points are sorted such that we start with the smallest value, i.e., min_{p_i} C_EGL(p_i) [23]. The Hybrid criterion is defined by

$$C_{\text{H}}(p_i) = \gamma\, \sigma_{\text{GPR}}(p_i) - \big|\, c - |\tilde S_{\text{GPR}}(p_i)| \,\big|,$$

i.e., a sample point is critical in the sense of the hybrid decision rule if C_H(p_i) > 0 holds. Using this criterion, the sample points are sorted such that we start with the largest value, i.e., max_{p_i} C_H(p_i). Algorithm 3 is a modification of Algorithm 2 including the sorting strategy. Before the classification of each sample point is started, all sample points are evaluated on the GPR models and sorted according to the chosen sorting criterion, e.g. the EGL criterion or the Hybrid criterion. Nevertheless, in the sampling strategy proposed in Algorithm 3 the sorting criterion can be replaced by any other criterion. For the use of batches, for example, a sorting criterion avoiding closely lying sample points within a batch could be even more efficient.
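A small sketch of sorting by the EGL criterion, with hypothetical predictions and standard deviations; sample points whose prediction is close to the threshold relative to their uncertainty come first.

```python
import numpy as np

# hypothetical predicted |S|-values, GPR standard deviations and threshold
s_pred = np.array([0.2, 0.98, 1.5, 1.02])
sigma  = np.array([0.05, 0.05, 0.05, 0.01])
c = 1.0

def egl_criterion(s_pred, sigma, c):
    """EGL criterion: distance to the limit state in units of the predicted std."""
    return np.abs(c - np.abs(s_pred)) / sigma

order = np.argsort(egl_criterion(s_pred, sigma, c))  # smallest value first
```

Here the second point (prediction 0.98, large uncertainty) is most ambiguous and is handled first, while the clearly accepted first point comes last.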
Algorithm 3 (yield estimation with sorting):
1: Evaluate all MC sample points on the GPR models
2: Sort the sample points according to the chosen sorting criterion
3: for each sample point p_i (in sorted order) do
4:   Run lines 3-7 from Algorithm 2 (i.e., classify p_i and define C_j)
5:   if |HF^online_GPR-H| is an integer multiple of N_B then
6:     Run lines 9-17 from Algorithm 2 (i.e., update GPR models)
7:   end if
8: end for

After updating the GPR models for one batch of MC sample points, the remaining sample points are classified using the updated models.

Numerical results
In the following we perform numerical tests on two examples, a dielectrical waveguide and a stripline lowpass filter. The results of the waveguide are also compared with the estimates resulting from a linearization, which is common in industry. The computations have been carried out with the following configuration: Intel i7-8550U processor with four cores, 1.80 GHz and 16 GB RAM. For solving the corresponding PDEs (1) with FEM, the frequency domain solver of CST Studio Suite® 2018 [13] has been used. The yield estimation has been carried out in Python 3.7, using the scikit-learn package version 0.21.3 [24] for GPR. Solving our simple models takes only about 15 seconds in CST, while a factorization for the GPR model update always takes less than one second.

Dielectrical waveguide
The benchmark problem, an academic example, on which we perform the numerical tests is a simple dielectrical waveguide, cf. [8]. We consider two uncertain geometrical parameters, the length of the dielectrical inlay p_1 and the length of the offset p_2 (see Fig. 1), and two uncertain material parameters p_3 and p_4 with the following effect on the relative permittivity and permeability of the inlay:

$$\varepsilon_r = 1 + p_3 + (1 - p_3)\left(1 + \frac{\mathrm{j}\omega}{2\pi \cdot 5 \cdot 10^{9}}\right)^{-1}, \qquad \mu_r = 1 + p_4 + (2 - p_4)\left(1 + \frac{\mathrm{j}\omega}{1.1 \cdot 2\pi \cdot 20 \cdot 10^{9}}\right)^{-1}.$$

In the frequency range of interest we consider eleven equidistant frequency points ω_j ∈ T_d. A commonly used error indicator for MC estimation is given by [20]

$$\tilde\sigma_Y = \sqrt{\frac{Y(\bar p)\left(1 - Y(\bar p)\right)}{N_{\text{MC}}}}, \qquad (8)$$

where σ̃_Y denotes the standard deviation of the yield estimator. Since the size of the yield is not known beforehand, we estimate its upper bound by Y(p̄) = 0.5. We allow a standard deviation of σ̃_Y = 0.01. According to (8) this leads to a sample size of N_MC = 2500. Figure 2 shows values of MC yield estimators of the waveguide for different sample sizes. The black line indicates the most accurate solution we have calculated, i.e., Y_MC for N_MC = 10,000. The gray shaded area indicates the σ̃_Y level for the yield estimator of the corresponding sample size. The number of high fidelity evaluations before a possible update of the GPR model (the batch size) can be set to the number of parallel processors available, since these evaluations can be carried out in parallel. However, this value also has another effect: a small number leads to more frequent model updates than a larger number. In general, more frequent model updates imply fewer critical sample points. We present tests with N_B = 50, N_B = 20 and N_B = 1. The latter implies that no calculation in batches is used. The error tolerance is set to ε_t = 0, since this leads to the best results for the waveguide example, i.e., all critical sample points are added to the training data set. Further, we set the safety factor γ = 2.
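The sample size used here follows directly from (8); a one-line helper makes the computation explicit, using the worst case Y = 0.5 as in the text.

```python
import math

def mc_sample_size(sigma_target, y_bound=0.5):
    """Smallest N_MC such that sqrt(Y(1-Y)/N_MC) <= sigma_target, worst case Y = 0.5 (Eq. (8))."""
    return math.ceil(y_bound * (1.0 - y_bound) / sigma_target**2)

# sigma = 0.01 and Y = 0.5 reproduce the paper's choice N_MC = 2500
```

The quadratic dependence on the target standard deviation is what makes plain MC expensive: halving σ̃_Y quadruples the sample size.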
This is a rather conservative choice, which may result in too many sample points classified as critical and evaluated on the high fidelity model. Thus, this may increase the computing effort, but it also leads to higher accuracy, since misclassification of sample points is avoided.
For the GPR, the applied kernel is the product of a constant kernel representing ζ and an RBF kernel representing the exponential function with hyperparameter l. In scikit-learn, the hyperparameters have a starting point, in our case ζ_0 = 0.1 and l_0 = 1, and then they are optimized within given bounds, in our case b_ζ = (10⁻⁵, 10⁻¹) and b_l = (10⁻⁵, 10⁵), respectively. We allowed the hyperparameters to be tuned within 10 iteration steps in order to find the most suitable values for our data. Due to this optimization, the initial setting does not affect the results of the yield estimation significantly. Further, we set the noise parameter α = 10⁻⁵. This parameter defines the allowed deviation from the training data in the interpolation and is recommended to avoid numerical issues, e.g. due to mesh noise. For more information about setting the hyperparameters we refer to [4, Chap. 2.3] and [24]. Once we have evaluated the first training data points, the training data's mean is set as the mean function of the GP.
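In scikit-learn, this kernel and noise setting can be sketched as below; the mapping of the "10 iteration steps" onto a specific optimizer option is not shown, and the tiny data set at the end is purely illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# constant kernel (amplitude, start 0.1) times RBF (length scale, start 1.0),
# with the optimization bounds stated in the text
kernel = ConstantKernel(constant_value=0.1, constant_value_bounds=(1e-5, 1e-1)) * \
         RBF(length_scale=1.0, length_scale_bounds=(1e-5, 1e5))
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-5)  # alpha: noise parameter

# tiny illustrative fit: mean and std prediction as used by the hybrid decision rule
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.0])
gpr.fit(X, y)
mean, std = gpr.predict(np.array([[1.0]]), return_std=True)
```

The `return_std=True` flag yields exactly the pair (S̃_GPR, σ_GPR) needed for the buffer-zone classification.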
For the simple waveguide a closed form solution of (1) exists, cf. [25]. However, we will refer to this solution as the high fidelity solution in the following, since in practice a computationally expensive FEM evaluation would be necessary at this point. In the following we denote the number of high fidelity evaluations for a specific method by |HF_method|. Please note that the short circuit strategy mentioned in Sect. 3.2 has been applied, i.e., a sample point is not tested for a frequency point if it has been rejected for a previous one. Without this short circuit strategy, the number of high fidelity evaluations would be the product of the number of frequency points and the size of the MC sample, i.e., |T_d| · N_MC = 11 · 2500 = 27,500.
In order to build the GPR models, an initial training data set is needed. It consists of random data points generated according to the truncated Gaussian distribution N_T(p̄, Σ, lb, ub) of the uncertain parameters. The size of the initial training data set is chosen such that the total costs, i.e., the sum of offline (initial training data) and online (critical sample points) costs, are minimal. Using batch size N_B = 50, we tested different sizes of the initial training data set, see Table 1. We proceed with the best performing number of ten training data points. For smaller initial training data sets the offline costs decrease, but the online costs increase. For larger initial training data sets it is the opposite. Only this initial training data set is the same for all GPR models. Then, the estimation procedure with Algorithm 2 is started. After a batch of N_B critical sample points, the GPR models are updated individually if there were critical sample points for them. Table 2 shows the online high fidelity costs |HF^online_GPR-H| and effective evaluations |EE^online_GPR-H| for yield estimation with different updating strategies. In order to obtain the total costs, the costs for the initial training data set, |HF^offline_GPR-H| = 110 and |EE^offline_GPR-H| = 110/N_B respectively, need to be added. In all cases, the yield estimator is Ỹ_GPR-H = 95.44%, so we obtain the same accuracy as with pure MC.

Table 1: Comparison of the number of high fidelity evaluations for the GPR-Hybrid approach |HF_GPR-H| for different sizes of the initial training data set |T_I|, using batch size N_B = 50.

With all updating strategies, the number of high fidelity evaluations can be reduced at least by a factor of 78, in the best case by a factor of 111, compared to classic MC. In the first setting, N_B = 1, there are no batches (i.e., batches of size 1). Without sorting and with both sorting criteria, this setting has the lowest number of high fidelity evaluations.
However, parallel computing is not possible without batches, so the number of effective evaluations equals the number of high fidelity evaluations. Using batches, the GPR models are not updated immediately, but only after evaluating the complete batch. This leads to an increase in the number of high fidelity evaluations. However, batches allow parallel computing (on N_B parallel computers), i.e., the number of effective evaluations is much lower.
Further, we see that the number of high fidelity evaluations decreases when applying a sorting strategy. The GPR model is improved after the evaluation of a critical sample point. Due to the sorting, we start with the most critical sample points, so the GPR model improves quickly and fewer sample points are categorized as critical. The larger the batches are, the smaller the effect of sorting. Figure 3 shows the number of high fidelity evaluations |HF^total_GPR-H| over the number of MC sample points which have been considered for classification. For the 0-th considered MC sample point the offline costs are plotted, then the total costs. The different sorting strategies from Table 2 are compared for batch size N_B = 50. The marks indicate the positions where one batch is completed. We see that, using sorting strategies, first all critical sample points are evaluated on the high fidelity model, then the non-critical sample points on the GPR model, i.e., the number of high fidelity evaluations increases early and then remains constant. The batches are filled within the first 250 MC sample points. Without sorting, the increase is also a bit steeper in the beginning, but in general the batches are spread over the whole MC sample. In the end, the total number of high fidelity evaluations is similar for all strategies.
In the following, we compare these results to the results of the stochastic collocation hybrid approach proposed in [8]. The hybrid method is the same; the difference lies in the choice of the surrogate model and the error indicator for defining the critical sample points. In [8], the surrogate model is built using an adaptive stochastic collocation approach with Leja nodes, which led to a maximum polynomial degree of three. Once the polynomial surrogate is built, it is not straightforward to update it during the estimation procedure. Thus, higher accuracy in the initial model is required. An adjoint error indicator is used to estimate the error of the surrogate model. Analogously to the standard deviation of the GPR model, this error indicator, multiplied with a safety factor, is used to identify the critical sample points. Using this stochastic collocation hybrid approach, the same accuracy, i.e., the same yield estimator, was reached using |HF^total_SC-H| = |HF^offline_SC-H| + |HF^online_SC-H| = 330 + 165 = 495 high fidelity evaluations. The number of training data points was chosen such that the method performs best.

Comparison with a linearization approach
In practice, often a simple linearization of the QoI is used for the MC analysis, assuming that the design parameter deviations are small enough to obtain valid results [13]. Therefore we compare the proposed GPR-Hybrid approach with a linearization in the following. Linearization here means that we use a surrogate model built by linear interpolation with two points in each dimension, i.e., in addition to p⁰ = [p⁰_1, p⁰_2, p⁰_3, p⁰_4] we consider the four nodes

$$p^k = p^0 + \delta_p\, e_k, \qquad k = 1, \ldots, 4,$$

where e_k is the k-th unit vector and δ_p > 0 the step size (if interpreted in the context of finite differences). Alternatively, derivative information could be used if available. These five nodes are used to create a linear approximation according to

$$\tilde S_{\text{lin}}(p) = a_0 + \sum_{l=1}^{|p|} a_l\, p_l,$$

where |p| is the length of the vector p and the a_l are the coefficients of the linearization. This model is set up for each frequency point ω_j and for the real and the imaginary part of the S-parameter separately. Then a MC analysis on the linear surrogate models is performed. In Fig. 4 we see the results of the yield estimation for different values of δ_p. We compare this to the MC solution on the high fidelity model as reference solution and the GPR-Hybrid solution from Sect. 4.1. We introduce υ ∈ [0, 1] as a measure of the magnitude of deviation. The covariance matrix is multiplied with this factor υ in order to obtain problem settings with varying magnitude of uncertainty, i.e., we consider p ∼ N_T(p̄, υΣ, lb, ub) with different values for υ. For υ = 1 we obtain the results of Sect. 4.1; for υ < 1 the scaled variance decreases and the yield estimator increases, until for υ = 0 there is no uncertainty at all and the yield is Y = 1, since p̄ is in the safe domain. While the GPR-Hybrid solution exactly matches the reference solution for all magnitudes
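The linear surrogate construction can be sketched as follows; the toy QoI f below is hypothetical, and by construction the surrogate reproduces any affine function exactly, which is precisely the assumption the linearization approach relies on.

```python
import numpy as np

def linear_surrogate(f, p0, delta):
    """Affine surrogate from f(p0) and the nodes p0 + delta * e_k (one per dimension)."""
    p0 = np.asarray(p0, float)
    f0 = f(p0)
    coeffs = np.array([(f(p0 + delta * np.eye(len(p0))[k]) - f0) / delta
                       for k in range(len(p0))])
    return lambda p: f0 + coeffs @ (np.asarray(p, float) - p0)

# hypothetical affine toy QoI: the surrogate is exact for this f
f = lambda p: 2.0 + 3.0 * p[0] - 1.0 * p[1] + 0.5 * p[2]
s = linear_surrogate(f, np.zeros(3), delta=0.1)
```

For a nonlinear QoI, the surrogate error grows with the parameter deviation, which is why the comparison below varies the magnitude of uncertainty υ.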

Lowpass filter
We consider as industrial example a stripline lowpass filter, see Fig. 5, taken from the examples library of CST Studio Suite® [13]. We consider six uncertain geometrical parameters g = [L_1, L_2, L_3, W_1, W_2, W_3] describing the length and width of the individual blocks. Again, we assume the uncertain parameters to follow a truncated Gaussian distribution with given mean and covariance (in mm). The distribution of L_1, L_2 and L_3 is truncated at L_i ± 3 mm (i = 1, 2, 3), the distribution of W_1, W_2 and W_3 at W_i ± 0.3 mm (i = 1, 2, 3). Since the requirement for a lowpass filter is to allow low frequency signals to pass through while filtering out high frequency signals, in this example we have two performance feature specifications, given by I.) S_ω(g) ≥ −1 dB ∀ω ∈ T_ω,1 = 2π·[0, 4] GHz,
As in the previous example, we set σ̃_Y = 0.01, which leads to a sample size of N_MC = 2500, according to (8). Again, we show test results for N_B = 50, N_B = 20 and N_B = 1 and ε_t = 0. Also, the kernel function and the hyperparameter settings are as in the previous example. The safety factor is set to γ = 3. Further, we consider eight equidistant frequency points ω_j ∈ T_d, i.e., eight GPR surrogate models are built. The evaluation of the high fidelity model is implemented in CST, using the default parameters of the frequency domain solver. The mathematical model is described in [26]. An evaluation within CST calculates the S-parameter in a whole frequency range, i.e., for all considered frequency points ω_j ∈ T_d. Therefore, with respect to this example, we look at the number of CST calls |CC_method|

Figure 5: Lowpass filter from the examples library of CST Studio Suite® [13]