Model order reduction for parametric high dimensional models in the analysis of financial risk

This paper presents a model order reduction (MOR) approach for high dimensional problems in the analysis of financial risk. To understand the financial risks and possible outcomes, we have to perform several thousand simulations of the underlying product. These simulations are expensive and create a need for efficient computational performance. Thus, to tackle this problem, we establish a MOR approach based on a proper orthogonal decomposition (POD) method. The study involves the computation of high dimensional parametric convection-diffusion-reaction partial differential equations (PDEs). POD requires solving the high dimensional model at some parameter values to generate a reduced-order basis. We propose an adaptive greedy sampling technique based on surrogate modeling for the selection of the sample parameter set, which is analyzed, implemented, and tested on industrial data. The results obtained for the numerical example of a floater with cap and floor under the Hull-White model indicate that the MOR approach works well for short-rate models.

which will generate a fairly accurate ROM. The calculation of the relative error between the ROM and the HDM is expensive, so instead, we use error estimators like the residual error associated with the ROM [10,50]. The greedy sampling algorithm picks the optimal parameters that yield the highest values of the error estimator. Furthermore, we use these parameters to construct a snapshot matrix and, consequently, to obtain the desired ROB. However, it is not feasible to compute an error estimator for the entire parameter space. The error estimator is based on the norm of the residual, which scales with the size of the HDM. This problem forces us to select a pre-defined parameter set as a subset of the high dimensional parameter space to train the greedy sampling algorithm. We usually select this pre-defined subset randomly. However, a random selection may neglect the crucial parameters within the parameter space. Thus, to surmount this problem, we implement an adaptive greedy sampling approach. We choose the most suitable parameters adaptively at each greedy iteration using an optimized search based on surrogate modeling. We construct a surrogate model for the error estimator and use it to find the best suitable parameters. The use of the surrogate model avoids the expensive computation of the error estimator over the entire parameter space. The adaptive greedy sampling approach associated with surrogate modeling has been introduced before in [41,3]. There are several approaches to design a surrogate model, like regression analysis techniques [37], response surface models, or Kriging models [36]. The authors of [41] have designed a Kriging-based surrogate model to select the most relevant parameters adaptively. However, in our case, due to the high dimensional parameter space, we may face the multicollinearity problem, as some variables in the model can be written as linear combinations of other variables in the model [34].
Also, we need to construct a surrogate model considering the fact that the model parameters are time-dependent. Thus, in this work, we construct a surrogate model based on the principal component regression (PCR) technique [37]. The PCR approach is a dimension reduction technique in which the explanatory variables are replaced by a few uncorrelated variables known as principal components. It replaces the multivariate problem with a more straightforward low dimensional problem and avoids overfitting. In the classical greedy sampling approach, the convergence of the algorithm is monitored using the error estimator. However, we can use the norm of the residual to estimate the exact error between the HDM and the ROM. In this work, we establish an error model for the exact error as a function of the error estimator, based on the idea presented in [41]. Furthermore, we use this exact error model to monitor the convergence of the greedy sampling algorithm. To summarize, this paper presents an approach to select the most prominent parameters or scenarios for which we solve the HDM and obtain the required ROB. Thus, instead of performing 10,000 expensive computations, we perform very few expensive computations and solve the remaining scenarios with the help of the ROM. The paper illustrates the implementation of the numerical algorithms and methods in detail. It is necessary to note that the choice of a short-rate model depends on the underlying financial instrument. In this work, we focus on one-factor short-rate models only. We implement the developed algorithms for the one-factor Hull-White model [31] and present the results with a numerical example of a floater with cap and floor [20]. The current research findings indicate that the MOR approach works well for short-rate models. The paper is organized as follows. Section 2 presents a model hierarchy to construct a short-rate model along with some of the well-known one-factor models. Section 3 is branched into two parts.
In subsection 3.1, we present a detailed simulation procedure for yield curves and, subsequently, subsection 3.2 describes the calibration of model parameters based on these simulated yield curves. In section 4, subsection 4.1 illustrates the FDM implemented for the Hull-White model, and subsection 4.2 introduces the projection-based MOR technique for the HDM. The selection of the best suitable parameters to obtain the ROB based on the classical greedy approach is presented in subsection 4.3. To overcome the drawbacks associated with the classical greedy approach, subsection 4.4 explains the adaptive greedy method, with the surrogate modeling technique illustrated in sub-subsection 4.4.1. Furthermore, sub-subsection 4.4.2 presents a detailed algorithm and its description for the adaptive greedy approach. Finally, we test our algorithms on a numerical example of a floater, and the obtained results are presented in section 5.

Model Hierarchy
The management of interest rate risks, i.e., the control of changes in future cash flows due to fluctuations in interest rates, is of great importance. In particular, the pricing of products based on the stochastic nature of the interest rate creates the necessity for mathematical models. In this section, we present some basic definitions and the model hierarchy to construct a financial model for any underlying instrument.

Bank Account and Short-Rate
First we introduce the definition of a bank account, also called a money-market account. When investing a certain amount in a bank account, we expect it to grow at some rate over time. A money-market account represents a risk-less investment with a continuous profit at a risk-free rate.
Definition 2.1. Bank account (money-market account). Let B(t) be the value of a bank account at time t ≥ 0. We assume that the bank account evolves according to the differential equation

    dB(t) = r_t B(t) dt,   B(0) = 1,                                (1)

where r_t is the short-rate. This gives

    B(t) = exp( ∫_0^t r_s ds ).                                     (2)

According to the above definition, investing a unit amount at time t = 0 yields the value in (2) at time t, and r_t is the short-rate at which the bank account grows, i.e., the growth rate of B within the small time interval (t, t + dt). When working with interest-rate products, the study of the variability of interest rates is essential. Therefore, it is necessary to consider the short-rate as a stochastic process, which prompts us to construct a stochastic model describing its dynamics. Let S_n be the price of a stock at the end of the nth trading day. The daily return from day n to day n + 1 is given by (S_{n+1} − S_n)/S_n. In general, it is common to work with log returns, since the log return over k days can be easily computed by adding up the daily log returns: log(S_k/S_0) = log(S_1/S_0) + · · · + log(S_k/S_{k−1}).
Based on the assumption that the log returns over disjoint time intervals are stochastically independent, and are equally distributed, the central limit theorem [16] of probability theory implies that the log returns are normally distributed [13]. So, it is necessary to define a stochastic model in continuous time in which log returns over arbitrary time intervals are normally distributed. The Brownian motion provides these properties [2].
Definition 2.2. Brownian motion. A standard Brownian motion is a stochastic process W(t), i.e., a family of random variables W(t) indexed by non-negative real numbers t, with the following properties:
• With probability 1, the function t ↦ W(t) is continuous and W(0) = 0.
• For t, s ≥ 0, the increment W(t + s) − W(s) is normally distributed with mean 0 and variance t, i.e., W(t + s) − W(s) ∼ N(0, t).
• For all n and times t_0 < t_1 < · · · < t_{n−1} < t_n, the increments W(t_j) − W(t_{j−1}) are stochastically independent.
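The defining properties above are easy to verify numerically. The following sketch (in Python with NumPy; all variable names are ours) samples discretized Brownian paths from independent normal increments and checks empirically that W(T) has mean 0 and variance T:

```python
import numpy as np

# Empirical check of the defining properties of Brownian motion: sample many
# discretized paths on [0, T] and inspect the distribution of W(T).
rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 500, 20_000
dt = T / n_steps

# Each path is the cumulative sum of independent N(0, dt) increments.
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(increments, axis=1)

endpoints = W[:, -1]          # samples of W(T)
mean_WT = endpoints.mean()    # should be close to 0
var_WT = endpoints.var()      # should be close to T = 1
```

The sample mean and variance converge to 0 and T at the Monte Carlo rate as the number of paths grows.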
Based on the definition of the Brownian motion, we can establish a stochastic differential equation (SDE). Consider an ordinary differential equation (ODE)

    dX(t)/dt = q(t) X(t),                                           (3)

with an initial condition X(0) = X_0. When we consider the ODE (3) with the assumption that the parameter q(t) is not deterministic but rather stochastic, we get a stochastic differential equation. In our case, the stochastic parameter q(t) is given as [26]

    q(t) = b(t) + σ(t) w(t),

where w(t) is a white noise process. Thus, we get

    dX(t)/dt = b(t) X(t) + σ(t) X(t) w(t).                          (4)

This equation is known as the Langevin equation [39]. Here X(t) is a stochastic variable with the initial condition X(0) = X_0 with probability one. The Langevin force w(t) = dW(t)/dt is a fluctuating quantity with a Gaussian distribution. Substituting dW(t) = w(t)dt in (4), we get

    dX(t) = b(t) X(t) dt + σ(t) X(t) dW(t).                         (5)

In its general form, a stochastic differential equation is given by

    dX(t) = f(t, X(t)) dt + g(t, X(t)) dW(t),                       (6)

where f(t, X(t)) ∈ R and g(t, X(t)) ∈ R are sufficiently smooth functions. Based on (6), we can define a stochastic differential equation with the short-rate r_t as the stochastic variable,

    dr_t = f(t, r_t) dt + g(t, r_t) dW(t).                          (7)

Furthermore, based on Ito's lemma [53], we can derive a general PDE for any underlying instrument depending on the short-rate r.
Theorem 2.1. Ito's Lemma. Let ξ(X(t), t) be a sufficiently smooth function and let the stochastic process X(t) be given by (6); then, with probability one,

    dξ = ( ∂ξ/∂t + f ∂ξ/∂x + (1/2) g² ∂²ξ/∂x² ) dt + g (∂ξ/∂x) dW(t).

Consider a risk-neutral portfolio Π_t that depends on the short-rate r_t and consists of two interest rate instruments V_1 and V_2 with different maturities T_1 and T_2, respectively. Let there be ∆ units of the instrument V_2. For an infinitesimal time interval, the value change of the portfolio is dΠ_t = ∆ dV_2 − dV_1. To avoid arbitrage, the risk-less portfolio must earn the risk-free rate [1], i.e., dΠ_t = r_t Π_t dt. Based on Ito's lemma,

    dV_i = ( ∂V_i/∂t + f ∂V_i/∂r + (1/2) g² ∂²V_i/∂r² ) dt + g (∂V_i/∂r) dW(t),   i = 1, 2.

Eliminating the stochastic term by choosing ∆ = (∂V_1/∂r)/(∂V_2/∂r), rearranging the terms, and using the notation r = r_t, we obtain

    ( ∂V_1/∂t + (1/2) g² ∂²V_1/∂r² − rV_1 ) / (∂V_1/∂r) = ( ∂V_2/∂t + (1/2) g² ∂²V_2/∂r² − rV_2 ) / (∂V_2/∂r).

Since the left side depends only on the instrument V_1 and the right side only on V_2, both sides must equal a function u(r, t) that is independent of the particular instrument. Denoting the right side as u(r, t) and using the notation V = V_1, we obtain the following PDE for any financial instrument V depending on r:

    ∂V/∂t − u(r, t) (∂V/∂r) + (1/2) g(r, t)² (∂²V/∂r²) − rV = 0.    (9)

We introduce some well-known one-factor short-rate models in the following subsections.

Vasicek and Cox-Ingersoll-Ross Models
The following SDE represents two different models depending on the choice of a parameter λ:

    dr_t = (a − b r_t) dt + σ r_t^λ dW(t),

where a, b, σ, and λ are positive constants. The model proposed in [49] considers λ = 0 and is well known as the Vasicek model, while the Cox-Ingersoll-Ross model introduced in [14] considers λ = 0.5. One of the drawbacks of the Vasicek model is that the short-rate can become negative. In the case of the Cox-Ingersoll-Ross model, on the other hand, the square root term does not allow negative interest rates. However, the major drawback of both models is that the model parameters are constant, so we cannot fit the models to market structures like yield curves.
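The behavior of the two models can be illustrated with a simple Monte Carlo sketch. The discretization scheme (Euler–Maruyama) and all names below are our choices for illustration, not prescribed by the paper; the clipping of r inside the square root is a common safeguard for the λ = 0.5 case:

```python
import numpy as np

def simulate_short_rate(a, b, sigma, lam, r0, T, n_steps, n_paths, seed=0):
    """Euler-Maruyama discretization of dr = (a - b r) dt + sigma * r^lam dW.
    lam = 0 gives the Vasicek model, lam = 0.5 the Cox-Ingersoll-Ross model."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    r = np.full(n_paths, r0, dtype=float)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        # max(r, 0) guards the fractional power in the CIR case.
        r = r + (a - b * r) * dt + sigma * np.maximum(r, 0.0) ** lam * dW
    return r

# Vasicek paths revert towards the mean level a / b.
r_T = simulate_short_rate(a=0.06, b=1.0, sigma=0.01, lam=0.0,
                          r0=0.01, T=5.0, n_steps=500, n_paths=10_000)
# r_T.mean() is close to the mean level a / b = 0.06
```

The sample mean of r(T) illustrates the mean reversion of the drift term (a − b r): for bT ≫ 1 the distribution of the short-rate concentrates around a/b regardless of r_0.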

Hull-White Model
The Hull-White model [31,32] is an extension of the Vasicek model which can be fitted to market structures like yield curves. The SDE is given as

    dr_t = (a(t) − b(t) r_t) dt + σ(t) dW(t),

where the parameters a(t), b(t), and σ(t) are now time-dependent. The term (a(t) − b(t) r_t) is a drift term, and a(t) is known as the deterministic drift. We can define a PDE for any underlying instrument based on the Hull-White model depending on r. In the case of the Hull-White model in (9), g(r, t) = σ(t) and −u(r, t) = a(t) − b(t)r. Substituting g(r, t) and u(r, t), we get

    ∂V/∂t + (a(t) − b(t)r) (∂V/∂r) + (σ(t)²/2) (∂²V/∂r²) − rV = 0.

In the following, we consider the parameters b(t) and σ(t) as constants to obtain a robust model. Hull and White themselves argued for making the model parameters b and σ independent of time [33]. The problem is that the volatility term structure in the future could otherwise become nonstationary, in the sense that the future term structure implied by the model can be quite different from today's. Also, the author of [15] noted that the volatility in the future may reach zero, which could ultimately result in implausible option prices; thus, we consider the parameters b and σ as independent of time. Figure 1 shows the model hierarchy for the Hull-White model.
Figure 1: Model hierarchy for the Hull-White model: the SDE for a short-rate model leads, via Ito's lemma, to the Hull-White PDE for any financial instrument V.

The following section 3 presents the simulation procedure for yield curves and the calibration of the model parameter a(t) based on these simulated yield curves.

Yield Curve Simulation
The problem of determining the model parameters is relatively complex. The time-dependent parameter a(t) is derived from yield curves, which determine the average direction in which the short-rate r moves. The PRIIP regulation demands that yield curve simulations be performed at least 10,000 times [19]. We explain the detailed yield curve simulation procedure in the following steps.

1. Collect historical data for the interest rates.
The data set must contain at least 2 years of daily interest rates for an underlying instrument, or 4 years of weekly interest rates, or 5 years of monthly interest rates. Further, we construct a data matrix D ∈ R^{n×m} of the collected historical interest rate data, where each row of the matrix forms a yield curve, and each column corresponds to one of the m tenor points. The tenor points are the different contract lengths of an underlying instrument. For example, suppose we have collected daily interest rate data at 20-30 tenor points in time over the past five years. Each year has approximately 260 working days, also known as observation periods. Thus, there are n ≈ 1306 observation periods and m ≈ 20 tenor points in time.
2. Calculate the log return over each period. We take the natural logarithm of the ratio between the interest rate at each observation period and the interest rate at the preceding period. To avoid problems while taking the natural logarithm, we have to ensure that all elements of the data matrix D are positive, which is done by adding a correction term γ:

    D ← D + γ W,

where W ∈ R^{n×m} is a matrix having all entries equal to 1, and the correction factor γ ensures that all elements of the matrix D are positive. The selection of γ does not affect the final simulated yield curves, as we compensate for this shift at the bootstrapping stage by subtracting it from the simulated rates. Then we calculate the log returns over each period and store them in a new matrix D̂ = (d̂_ij) ∈ R^{n×m},

    d̂_ij = ln( d_{i,j} / d_{i−1,j} ).

3. Correct the returns observed at each tenor so that the resulting set of returns at each tenor point has a zero mean. We calculate the arithmetic mean µ_j of each column of the matrix D̂ and subtract this arithmetic mean µ_j from each element of the corresponding jth column of the matrix D̂, storing the obtained results in the matrix D̃ ∈ R^{n×m},

    d̃_ij = d̂_ij − µ_j.

4. Compute the singular value decomposition [22] of the matrix D̃,

    D̃ = Φ Σ Ψ^T,

where Σ is a diagonal matrix having the singular values Σ_i arranged in descending order. The columns of Φ are the normalized left singular vectors φ ∈ Φ, and the columns of ΦΣ are known as principal components. The right singular vectors ψ ∈ Ψ are the eigenvectors, also called principal directions, of the covariance matrix C = D̃^T D̃. A detailed relation between the singular value decomposition and the principal component analysis is presented in Appendix A.
5. Select the principal singular vectors corresponding to the maximum energy. The relative importance of the ith principal singular vector is determined by the relative energy Ξ_i of that component, defined as

    Ξ_i = Σ_i² / Σ_{j=1}^m Σ_j²,

so that the total energy satisfies Σ_{i=1}^m Ξ_i = 1. We select the p right singular vectors corresponding to the maximum energies from the matrix Ψ and construct a matrix Ψ̂ ∈ R^{m×p} composed of these selected singular vectors.

6. Calculate the matrix of returns to be used for the simulation of yield curves. We project the matrix D̃ onto the matrix of selected singular vectors Ψ̂,

    X = D̃ Ψ̂ ∈ R^{n×p}.

Furthermore, we calculate the matrix of returns M_R ∈ R^{n×m} by multiplying the matrix X with the transpose of the matrix of selected singular vectors,

    M_R = X Ψ̂^T.
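Steps 2–6 can be sketched compactly with NumPy. The function and variable names below are ours; the energy is computed from squared singular values as is standard for PCA, and the first row of the data is consumed by the log-return differences:

```python
import numpy as np

def matrix_of_returns(D, p, gamma=None):
    """Steps 2-6 of the yield curve simulation: shift the data positive,
    take log returns, demean each tenor column, and project onto the p
    leading principal directions.  D holds one yield curve per row."""
    if gamma is None:
        # correction term making all entries positive (step 2)
        gamma = max(0.0, -D.min()) + 1e-4
    Dpos = D + gamma                        # D + gamma * W, W all ones
    Dhat = np.log(Dpos[1:] / Dpos[:-1])     # log returns over each period
    Dtil = Dhat - Dhat.mean(axis=0)         # zero mean per tenor (step 3)
    # SVD: the rows of PsiT are the principal directions (step 4)
    Phi, Sig, PsiT = np.linalg.svd(Dtil, full_matrices=False)
    energy = Sig**2 / np.sum(Sig**2)        # relative energy of each mode
    Psi_p = PsiT[:p].T                      # p leading right singular vectors
    X = Dtil @ Psi_p                        # projection (step 6)
    M_R = X @ Psi_p.T                       # matrix of returns
    return M_R, energy, gamma
```

With p = m all components are retained and M_R reproduces the demeaned log returns exactly; smaller p filters out the low-energy directions.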
This process simplifies the statistical data D̃ by transforming the m correlated tenor points into p uncorrelated principal components. It allows reproducing essentially the same data while reducing the total size of the model.

7. Bootstrapping. Bootstrapping is a type of resampling where large numbers of small samples of the same size are drawn repeatedly from the original data set. It follows the law of large numbers, which states that if samples are drawn over and over again, the resulting set resembles the actual data set. In short, bootstrapping creates different scenarios based on simulated samples that resemble the actual data. These scenarios can further be used to obtain the values under favorable, moderate, and unfavorable conditions for an underlying instrument. According to the PRIIP regulations, we have to perform the bootstrapping procedure for the yield curve simulation at least 10,000 times. The regulations also state that a standardized key information document shall include the minimum recommended holding period (RHP).

Remark. The recommended holding period gives an investor an idea of how long to hold the product in order to minimize the risk. Generally, the RHP is given in years.
The time step in the simulation of yield curves is typically one observation period. Let h be the RHP in days (e.g., h ≈ 2600 days or 10 years). So, there are h observation periods in the RHP. For each observation period in the RHP, we select a row at random from the matrix M R , i.e., we select h random rows from the matrix M R .
We construct a matrix M̂_R = (χ_ij) ∈ R^{h×m} from these selected random rows. Further, for each tenor point j, we sum over the selected rows,

    χ̂_j = Σ_{i=1}^h χ_ij.

In this way, we obtain a row vector χ̂ = [χ̂_1 · · · χ̂_m] ∈ R^{1×m}. The final simulated yield rate y_j at tenor point j is the rate at the last observation period, d_{nj}, at the corresponding tenor point j,

1. multiplied by the exponential of χ̂_j,
2. adjusted for any shift γ used to ensure positive values for all tenor points, and
3. adjusted for the forward rate so that the expected mean matches current expectations.
The forward rate between two time points t_1 and t_2 is given as

    F(t_1, t_2) = ( r(t_0, t_2) t_2 − r(t_0, t_1) t_1 ) / (t_2 − t_1),

where t_1 and t_2 are measured in years, and r(t_0, t_1) and r(t_0, t_2) are the interest rates available from the data matrix for the time periods (t_0, t_1) and (t_0, t_2). Thus, the final simulated rate at tenor point j is

    y_j = (d_{nj} + γ) exp(χ̂_j) − γ,

adjusted for the forward rate as described above. Finally, we get the simulated yield curve from the calculated simulated rates y_j as y = [y_1 y_2 · · · y_m], j = 1, · · · , m.
We perform the bootstrapping procedure at least s = 10,000 times and collect the resulting curves row-wise into a simulated yield curve matrix Y ∈ R^{s×m}. Subsection 3.2 explains the parameter calibration based on these simulated yield curves.
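A minimal sketch of the bootstrapping loop (step 7) is given below. All names are illustrative, and the forward-rate adjustment from the list above is omitted for brevity:

```python
import numpy as np

def bootstrap_yield_curves(M_R, last_curve, gamma, h, s, seed=0):
    """For each of the s scenarios, draw h rows (observation periods in the
    RHP) at random from the matrix of returns M_R, sum them per tenor, and
    apply the summed return to the last observed curve, undoing the
    positivity shift gamma.  The forward-rate adjustment is omitted here."""
    rng = np.random.default_rng(seed)
    n, m = M_R.shape
    Y = np.empty((s, m))
    for k in range(s):
        rows = rng.integers(0, n, size=h)    # h random rows of M_R
        chi = M_R[rows].sum(axis=0)          # summed return per tenor j
        Y[k] = (last_curve + gamma) * np.exp(chi) - gamma
    return Y
```

As a sanity check, a matrix of returns consisting entirely of zeros reproduces the last observed curve in every scenario, which confirms that the shift γ cancels out.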

Parameter Calibration
The model parameter a(t) is calibrated based on the simulated yield curves Y. According to [45], we can write a closed-form solution for a zero-coupon bond B(t, T) maturing at time T based on the Hull-White model as

    B(t, T) = exp( Γ(t, T) − κ(t, T) r ),

where r is the short rate at time t, and the functions κ(t, T) and Γ(t, T) are determined by the model parameters b, σ, and a(t). We take the following input data for the calibration:

1. The zero-coupon bond prices for all maturities T_m, 0 ≤ T_m ≤ T, where T_m is the maturity at the mth tenor point.
2. The initial value of a(t) at t = 0, a(0).
3. The volatility σ of the short-rate r_t, assumed constant for all maturities 0 ≤ T_m ≤ T.
4. The parameter b, known and constant for all maturities 0 ≤ T_m ≤ T.
We then determine the value of Γ(0, T) as follows: since we know the value of b from the given initial conditions, we can compute κ(t), and subsequently, using κ(t), we compute Γ(t).
We can use Γ(0, T) to determine a(t) for 0 ≤ T_m ≤ T in the following way. The yield y(T) at time T is given by

    y(T) = − ln B(0, T) / T.

From this relation, we obtain an ordinary differential equation (ODE) for a(t), where y(T) is the simulated yield rate at tenor point T. We can solve this ODE numerically with the given initial conditions and yield rates for 0 ≤ T_m ≤ T. For all T_m ∈ [0, T], we know b, σ, and Γ(0, T) from the given initial conditions. If we assume a(t) to be piecewise constant with values a(i) on the intervals (i·∆T, (i + 1)·∆T), then we can calculate the values a(i) iteratively for 0 ≤ T_m ≤ T. This leads to a triangular system of linear equations for the vector of values a(i) with non-zero diagonal elements.
We write this system as an operator equation E a = F, where E maps the parameter a(t) of the Hull-White model to the data F obtained from the simulated yield curves based on the market data. The authors of [18] noticed that a small change in the market data used to obtain the yield curves leads to large disturbances in the model parameter a(t). This makes the problem of solving the system of linear equations ill-posed. A well-posed problem is characterized by the following three properties.

Definition 3.2. A well-posed problem [28].
A given problem is said to be well-posed if it has the following three properties:

1. for all suitable data, a solution exists,
2. the solution is unique, and
3. the solution depends continuously on the data.
However, in our case, a small change in the market data may lead to large perturbations in the model parameter, which violates the third property. Hence, the use of naive approaches may result in large numerical errors. To regularize the ill-posed problem, we implement Tikhonov regularization,

    a_µ^δ = argmin_a ( ||E a − F^δ||² + µ ||a||² ),                  (23)

where a_µ^δ is an estimate for a, µ is the regularization parameter, δ = ||F − F^δ|| is the noise level, and µ||a||² is a penalty term. We solve the optimization problem (23) to obtain the parameter a(t). In this work, we use the commercially available software UnRisk PRICING ENGINE for the parameter calibration [40]. The UnRisk engine implements a classical Tikhonov regularization approach to regularize ill-posed problems. Given a simulated yield curve, the UnRisk pricing function returns the calibrated parameter a(t) for that yield curve. Based on the 10,000 different simulated yield curves, we obtain s = 10,000 different parameter vectors a(t). In matrix form, we write these as a matrix (a_{i,j}) ∈ R^{s×m}, where m is the number of tenor points and the ith row contains the parameter vector calibrated from the ith simulated yield curve. All parameters a_{i,j} are assumed to be piecewise constant, changing their values only at tenor points, i.e., m tenor points mean m values for a single parameter vector. Figure 2 shows the model hierarchy to obtain a reduced-order model for the Hull-White model.
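The effect of (23) can be illustrated with a small numerical sketch (a toy stand-in; the actual calibration is performed by the UnRisk engine). The regularized solution follows from the normal equations of the penalized least squares problem:

```python
import numpy as np

def tikhonov_solve(E, F, mu):
    """Tikhonov regularization: minimize ||E a - F||^2 + mu * ||a||^2,
    solved via the normal equations (E^T E + mu I) a = E^T F."""
    m = E.shape[1]
    return np.linalg.solve(E.T @ E + mu * np.eye(m), E.T @ F)

# On a well-conditioned toy problem a tiny mu reproduces the least squares
# solution; on ill-posed problems a larger mu trades data fit for stability.
rng = np.random.default_rng(2)
E = rng.normal(size=(20, 5))
a_true = rng.normal(size=5)
a_hat = tikhonov_solve(E, E @ a_true, mu=1e-10)
```

The choice of µ balances the residual against the norm of the solution: increasing µ suppresses the amplification of data noise at the cost of a small systematic bias.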

Model Order Reduction

Figure 2: Model hierarchy to obtain a reduced-order model for the Hull-White model. Starting from the Hull-White model, the classical greedy sampling algorithm selects the training parameters for the model order reduction; if the resulting ROM quality is satisfactory the procedure stops, otherwise the adaptive greedy sampling algorithm is invoked.

We discretize the Hull-White PDE using a finite difference method as presented in subsection 4.1.3. The discretization of the PDE creates a parameter-dependent high dimensional model. We need to solve the high dimensional model for at least 10,000 parameters calibrated in the previous subsection 3.2. Solving the high dimensional model for such a large parameter space is computationally costly. Thus, we incorporate the parametric model order reduction approach based on the proper orthogonal decomposition. The POD approach relies on the method of snapshots. The snapshots are nothing but the solutions of the high dimensional model at some parameter values.
The idea is to solve the high dimensional model for only a certain number of training parameters to obtain a reduced-order basis. This reduced-order basis is then used to construct a reduced-order model. Subsection 4.2 illustrates the proper orthogonal decomposition approach, along with the construction of snapshots, and Algorithm 1 presents the methodology to obtain the reduced-order basis. Finally, we can solve the reduced-order model cheaply for the large parameter space. The selection of the training parameters is of utmost importance to obtain the optimal reduced-order model. In subsection 4.3, we introduce a greedy sampling technique for the selection of training parameters. The greedy sampling technique selects the parameters at which the error between the reduced-order model and the high dimensional model is maximal. However, the greedy sampling approach exhibits some drawbacks (see subsection 4.3.1). To avoid these drawbacks, in subsection 4.4, we implement an adaptive greedy approach. These methods are interlinked with each other and necessary to obtain the most efficient reduced-order model.

Finite Difference Method
The PDE obtained for the Hull-White model is of convection-diffusion-reaction type [4]. Consider the Hull-White PDE derived in section 2. In this work, we apply a finite difference method to solve the Hull-White PDE. The convection term in this equation may lead to numerically unstable results. Thus, we implement the so-called upwind scheme [30] to obtain a stable solution. For the time discretization, we incorporate the semi-implicit scheme called the Crank-Nicolson method [47].

Spatial Discretization
Consider the following one-dimensional linear advection equation for a field ζ(x, t), describing a wave propagating along the x-axis with a velocity U:

    ∂ζ/∂t + U ∂ζ/∂x = 0.

We define a discretization of the computational domain [u, v] × [0, T], where u and v are the cut-off limits of the spatial domain and T denotes the final time of the computation. The corresponding indices are i ∈ {1, . . . , M} for the spatial discretization and n ∈ {1, . . . , N} for the time discretization. The first-order upwind scheme of order O(∆x) is given by

    ∂ζ/∂x |_i ≈ (ζ_i − ζ_{i−1}) / ∆x   for U > 0,
    ∂ζ/∂x |_i ≈ (ζ_{i+1} − ζ_i) / ∆x   for U < 0,

i.e., the one-sided difference is taken against the direction of propagation. We have implemented the above defined upwind scheme for the convection term. The diffusion term is discretized using the second-order central difference scheme of order O(∆x²), given by

    ∂²ζ/∂x² |_i ≈ (ζ_{i+1} − 2ζ_i + ζ_{i−1}) / ∆x².

Time Discretization
Consider a time-dependent PDE for a quantity ζ,

    ∂ζ/∂t = L(ζ),

where L is the differential operator containing all spatial derivatives. Using a Taylor series expansion in time and neglecting terms of order higher than one, we obtain a first-order approximation of the time derivative. Let us introduce a parameter Θ ∈ [0, 1] and write

    (ζ^{n+1} − ζ^n) / ∆t = Θ L(ζ^{n+1}) + (1 − Θ) L(ζ^n).

We can construct different time discretization schemes for different values of Θ. Setting Θ = 0, we obtain a fully explicit scheme known as the forward difference method, while Θ = 1 gives a fully implicit scheme known as the backward difference method. Here we set Θ = 1/2 and obtain the semi-implicit scheme known as the Crank-Nicolson method [4].

Finite Difference Method for a Hull-White Model
The computational domain for the spatial dimension, the rate r, is [u, v]. According to [1], the cut-off values u and v can be chosen relative to r_sp, the yield at the maturity T, also known as the spot rate. We divide the spatial domain into M equidistant grid points, which generates a set of points {r_1, r_2, . . . , r_M}. The time interval [0, T] is divided into N points in time, measured in days starting from t = 0. Applying the upwind scheme to the convection term, the central difference scheme to the diffusion term, and the Θ-scheme with Θ = 1/2 in time, the Hull-White PDE yields a system of linear equations at each time step,

    A(ρ_s) V^{n+1}(ρ_s) = B(ρ_s) V^n(ρ_s),                          (39)

where the matrices A(ρ_s) ∈ R^{M×M} and B(ρ_s) ∈ R^{M×M} are parameter dependent, depending on a(t), b, and σ. M is the total number of spatial discretization points, and t ∈ [0, T], where T is the final term date. For simplicity of notation, we denote by ρ_s = {(a_{s1}, . . . , a_{sm}), b, σ} the sth group of model parameters, where s = 1, . . . , 10000 and m is the total number of tenor points. We need to solve this system at each time step n with appropriate boundary conditions and a known initial value of the underlying instrument. The above discretization of the PDE thus generates a parametric high dimensional model of the form (39), which we need to solve for at least 10,000 parameter groups ρ, generating a parameter space P of size 10000 × m.
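The assembly of the discretized system can be sketched as follows. All grids and coefficients here are illustrative, and the boundary rows are left trivial, whereas in practice they encode the boundary conditions of the priced instrument. Writing the PDE as V_τ = L V in the backward time τ = T − t, one Θ-scheme step reads (I − Θ∆t L) V^{n+1} = (I + (1 − Θ)∆t L) V^n:

```python
import numpy as np

def hull_white_operator(r, a_n, b, sigma):
    """Spatial operator L(n) for the Hull-White PDE written as V_tau = L V,
    with L V = (a(t) - b r) V_r + 0.5 sigma^2 V_rr - r V, using an upwind
    difference for the convection term and a central difference for the
    diffusion term.  Boundary rows are left zero in this sketch."""
    M = r.size
    dr = r[1] - r[0]
    L = np.zeros((M, M))
    d = 0.5 * sigma**2 / dr**2
    for i in range(1, M - 1):
        c = a_n - b * r[i]                 # convection coefficient
        if c >= 0:                         # one-sided difference, upwinded
            L[i, i + 1] += c / dr
            L[i, i] += -c / dr
        else:
            L[i, i] += c / dr
            L[i, i - 1] += -c / dr
        L[i, i - 1] += d                   # central second difference
        L[i, i] += -2.0 * d
        L[i, i + 1] += d
        L[i, i] += -r[i]                   # reaction term
    return L

def theta_step(V, L, dt, theta=0.5):
    """One Theta-scheme step: (I - theta dt L) V_new = (I + (1-theta) dt L) V."""
    I = np.eye(L.shape[0])
    return np.linalg.solve(I - theta * dt * L, (I + (1 - theta) * dt * L) @ V)
```

With Θ = 1/2 this is the Crank-Nicolson step, and the two matrices I − Θ∆t L and I + (1 − Θ)∆t L play the roles of A(ρ_s) and B(ρ_s) in (39).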

Parametric Model Order Reduction
We employ the projection-based model order reduction (MOR) technique to reduce the high dimensional model (39). The reduced-order model is obtained using a Galerkin projection onto a low dimensional subspace. We approximate the high dimensional state V^n by the Galerkin ansatz

    V^n ≈ V̄^n = Q V_d^n,                                           (40)

where Q ∈ R^{M×d} is the reduced-order basis with d ≪ M, V_d^n is the vector of reduced coordinates, and V̄^n ∈ R^M is the solution obtained using the reduced-order model. Substituting (40) into the system of equations (39) gives the residual of the reduced state,

    R(V_d, ρ_s) = A(ρ_s) Q V_d^{n+1} − B(ρ_s) Q V_d^n.              (41)

In the case of the Galerkin projection, the residual R(V_d, ρ_s) is orthogonal to the ROB Q. Multiplying (41) by Q^T and enforcing Q^T R = 0, we get

    A_d(ρ_s) V_d^{n+1} = B_d(ρ_s) V_d^n,   A_d(ρ_s) = Q^T A(ρ_s) Q,   B_d(ρ_s) = Q^T B(ρ_s) Q,      (43)

where A_d(ρ_s) ∈ R^{d×d} and B_d(ρ_s) ∈ R^{d×d} are the parameter-dependent reduced matrices. In short, instead of solving a linear system of equations of order M, the MOR approach solves a linear system of order d, where d ≪ M. We obtain the Galerkin projection matrix Q in (43) based on a proper orthogonal decomposition (POD) method. POD generates an orthonormal basis Q that is optimal in the least squares sense and serves as the reduced-order basis for a given set of computational data. We aim to obtain the subspace Q independently of the parameter space P. In this work, we obtain the reduced-order basis by the method of snapshots. The snapshots are nothing but the state solutions obtained by solving the high dimensional models for selected parameter groups. We assume that we have a training set of parameter groups ρ_1, . . . , ρ_l ∈ P. We compute the solutions of the high dimensional models for the training set and combine them to form a snapshot matrix V̄ = [V(ρ_1), V(ρ_2), . . . , V(ρ_l)]. Now, to obtain the desired reduced-order basis, the POD method solves the minimization problem

    min_{Q ∈ R^{M×d}, Q^T Q = I} Σ_{i=1}^l ||V(ρ_i) − Q Q^T V(ρ_i)||²,      (44)

i.e., it seeks the orthonormal basis minimizing the least squares reconstruction error of the snapshots. We can obtain the reduced-order basis Q fulfilling the condition (44) by computing the SVD of the matrix V̄.
We perform a truncated SVD [22] of the matrixV to obtain the reduced-order basis QV where φ i and ψ i are the left and right singular vectors of the matrixV respectively, and Σ i are the singular values.V The truncated (economy-size) SVD computes only the first k columns of the matrix Φ. The optimal projection subspace Q consists of d left singular vectors φ i known as POD modes. Here d is the dimension of the reduced order model. The Algorithm 1 shows the steps to construct a reduced-order basis using the proper orthogonal decomposition approach. We have to choose the dimension d of the subspace Q such that we get a good approximation of the snapshot matrix. According to [44], large singular values correspond to the main characteristics of the system, while small singular values give only small perturbations of the overall dynamics. The relative importance of 18 the ith POD mode of the matrixV is determined by the relative energy Ξ i of that mode If the sum of the energies of the generated modes is unity, then these modes can be used to reconstruct a snapshot matrix completely [52]. Generally, the number of modes required to generate the complete data set is significantly less that the total number of POD modes [42]. Thus, a matrixV can be accurately approximated by using POD modes whose corresponding energies sum to almost all of the total energy. Thus, we choose only d out of k POD modes to construct Q = [φ 1 · · · φ d ] which is a parameter independent projection subspace based on (46). It is evident that the quality of the reduced-order model mainly depends on the selection of parameter groups ρ 1 , ..., ρ l use to compute snapshots. Thus, it necessitates defining an efficient sampling technique for the high dimensional parameter space. We can consider the standard sampling techniques like uniform sampling or random sampling [34]. However, these techniques may neglect a vital region within the parameter space. 
Alternatively, the greedy sampling method has proven to be an efficient way to sample a high dimensional parameter space in the framework of model order reduction [25,41,3].

Greedy Sampling Method
The greedy sampling technique selects the parameter groups at which the error between the reduced-order model and the high dimensional model is maximal. We then compute the snapshots using these parameter groups so that we obtain the best suitable reduced-order basis Q. The selection criterion reads

    ρ_I = argmax_{ρ∈P} e(ρ),    (47)

where e is the relative error between the reduced-order model and the high dimensional model. Thus, at each greedy iteration i = 1, ..., I_max, the greedy sampling algorithm selects the optimal parameter group ρ_I which maximizes the relative error e. However, the computation of the relative error e is costly, as it entails the solution of the high dimensional model. Therefore, the relative error is usually replaced by error bounds or by the residual error R. In some cases, however, error bounds cannot be defined or do not exist; in such cases, the norm of the residual is a good alternative [10,50,41]. For simplicity of notation, we write ε for the error estimator, i.e., in our case the norm of the residual. The greedy sampling algorithm runs for I_max iterations. At each iteration i = 1, ..., I_max, we choose the parameter group as the maximizer

    ρ_I = argmax_{ρ∈P} ε(ρ).    (48)
Algorithm 2 describes the classical greedy sampling approach. It initiates by selecting a parameter group ρ_1 from the parameter set P and computes a reduced-order basis Q_1, as described in subsection 4.2.
It is important to note that the choice of the first parameter group to obtain Q_1 does not affect the final result; for simplicity of computation, we select the first parameter group ρ_1 from the parameter space P. The algorithm then randomly chooses a pre-defined parameter set P̂ of cardinality C as a subset of P. At each point within the parameter set P̂, the algorithm determines a reduced-order model using the reduced-order basis Q_1 and computes the error estimator values {ε(ρ_j)}_{j=1}^{C}. The parameter group in P̂ at which the error estimator is maximal is then selected as the optimal parameter group ρ_I. Next, the algorithm solves the high dimensional model for the optimal parameter group and updates the snapshot matrix V̄. Finally, a new reduced-order basis is obtained by computing a truncated singular value decomposition of the updated snapshot matrix, as described in Algorithm 1. These steps are repeated for I_max iterations or until the maximum value of the error estimator falls below the specified tolerance ε_tol. The notation is as follows: ε_tol is the tolerance for the error estimator (the greedy iteration terminates if ε < ε_tol), V(ρ_i) is the solution obtained by solving the high dimensional model for ρ_i, and V̄ is the snapshot matrix.

Algorithm 2 The classical greedy sampling algorithm
Input: Maximum number of iterations I_max, maximum number of parameter groups C, parameter space P, tolerance ε_tol
Output: Q
1: Choose the first parameter group ρ_1 = [(a_11, ..., a_1m), b, σ] from P
2: Solve the HDM for the parameter group ρ_1 and store the result in V_1
3: Compute a truncated SVD of the matrix V_1 and construct Q_1
4: Randomly select a set of parameter groups P̂ = {ρ_1, ρ_2, ..., ρ_C} ⊂ P
5: for i = 2 to I_max do
6:   for j = 1 to C do
7:     Solve the ROM for the parameter group ρ_j using Q_{i−1}
8:     Compute the error estimator ε(ρ_j)
9:   end for
10:  if max_j ε(ρ_j) < ε_tol then stop
11:  Find the optimal parameter group ρ_I = argmax_{ρ∈P̂} ε(ρ)
12:  Solve the HDM for the parameter group ρ_I and store the result in V_i
13:  Construct the snapshot matrix V̄ = [V_1, ..., V_i]
14:  Compute an SVD of the matrix V̄ and construct Q_i
15: end for
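The structure of the classical greedy loop can be sketched in numpy as follows. The functions solve_hdm, estimator, and pod_basis are toy stand-ins (a random well-conditioned system with a parameter-dependent right-hand side), not the Hull-White discretization; sizes and the parameter range are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for the high dimensional system: a well-conditioned matrix
# with a parameter-dependent right-hand side.
M = 200
A = np.eye(M) + 0.01 * rng.standard_normal((M, M))

def rhs(rho):
    return np.sin(rho * np.arange(1, M + 1) / M)

def solve_hdm(rho):
    # full-order solve (plays the role of solving (39))
    return np.linalg.solve(A, rhs(rho))

def estimator(rho, Q):
    # error estimator: residual norm ||A Q V_d - b|| of the Galerkin solution
    b = rhs(rho)
    V_d = np.linalg.solve(Q.T @ A @ Q, Q.T @ b)
    return np.linalg.norm(A @ (Q @ V_d) - b)

def pod_basis(snapshots, tol=1e-10):
    # stand-in for Algorithm 1: keep left singular vectors above a tolerance
    Phi, Sigma, _ = np.linalg.svd(snapshots, full_matrices=False)
    d = int(np.sum(Sigma / Sigma[0] > tol))
    return Phi[:, :d]

P_hat = rng.uniform(0.5, 5.0, size=40)    # randomly chosen training set
snapshots = solve_hdm(P_hat[0])[:, None]  # first parameter group
Q = pod_basis(snapshots)
history = []
for i in range(1, 10):                    # greedy iterations
    eps = np.array([estimator(rho, Q) for rho in P_hat])
    history.append(eps.max())
    rho_I = P_hat[int(eps.argmax())]      # maximizer of the error estimator
    snapshots = np.column_stack([snapshots, solve_hdm(rho_I)])
    Q = pod_basis(snapshots)              # updated reduced-order basis
```

Each iteration drives the estimator at the selected parameter to (numerically) zero, so the maximum residual over the training set decreases as the basis grows.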

Drawbacks
The greedy sampling method computes an inexpensive a posteriori error estimator for the reduced-order model. However, it is not feasible to calculate the error estimator values for the entire parameter space P. The error estimator is based on the norm of the residual, which scales with the dimension M of the high dimensional model. With increasing dimension, it is not computationally reasonable to calculate the residual for 10,000 parameter groups, i.e., to solve the reduced-order model for the entire parameter space. Hence, the classical greedy sampling technique chooses the pre-defined parameter set P̂ randomly as a subset of P. Random sampling is designed to represent the whole parameter space P, but there is no guarantee that P̂ will reflect the complete space P, since a random selection may neglect the parameter groups corresponding to the most significant error. These observations motivate us to design a new criterion for the selection of the subset P̂. Another drawback of the classical greedy sampling technique is that we have to specify the maximum error estimator tolerance ε_tol. The error estimator usually depends on some error bound, which is either not tight or may not exist. To overcome this drawback, we establish a strategy that constructs a model for the exact error as a function of the error estimator, based on the idea presented in [41]. We then use this exact error model, instead of the error estimator, to monitor the convergence of the greedy sampling algorithm.

Adaptive Greedy Sampling Method
To avoid the drawbacks associated with the classical greedy sampling technique, we have derived an adaptive greedy sampling approach which selects the parameter groups adaptively at each iteration of the greedy procedure, using an optimized search based on surrogate modeling. We construct a surrogate model ε̄ of the error estimator to approximate the error estimator ε over the entire parameter space. We then use this surrogate model to locate the parameter groups P̂_k = {ρ_1, ..., ρ_{C_k}} with C_k < C, where the values of the error estimator ε are highest. For each parameter group within the parameter set P̂_k, we determine a reduced-order model and compute the error estimator values. The algorithm builds a new surrogate model based on these error estimator values, and the process repeats itself until the total number of parameter groups reaches C, resulting in the desired parameter set P̂.

Surrogate Modeling of the Error Estimator
At each greedy iteration, the algorithm constructs a surrogate model of the error estimator to locate the parameters adaptively.

Algorithm 3 Surrogate model using the principal component regression technique.
Input: The response vector ε̂ = [ε_1, ..., ε_{C_k}], the parameter matrix P̂_k = [ρ_1, ..., ρ_{C_k}] ∈ R^{C_k×m}, number of principal components p
Output: Vector of regression coefficients η
1: Standardize P̂_k and ε̂ to zero mean and unit variance
2: Compute the singular value decomposition of the matrix P̂_k: P̂_k = Φ̂ Σ̂ Ψ̂^T
3: Construct the matrix Z = P̂_k Ψ̂_p = [P̂_k ψ̂_1, ..., P̂_k ψ̂_p] composed of the first p principal components
4: Compute the least squares regression using the principal components as independent variables: Ω̂ = argmin_Ω ‖ε̂ − ZΩ‖²₂
5: Compute the PCR estimate η_PCR of the regression coefficients η: η_PCR = Ψ̂_p Ω̂
The detailed adaptive greedy sampling algorithm along with its description is presented in subsection 4.4.2.
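The steps of Algorithm 3 can be sketched with numpy on synthetic data. Note that the sketch uses C_k < m, which is exactly the regime where ordinary least squares has no unique solution but PCR still produces a usable surrogate; all sizes and the noisy linear response are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: C_k parameter groups with m tenor points each, and the
# corresponding error-estimator values (here a noisy linear response).
C_k, m, p = 12, 26, 3
P_k = rng.standard_normal((C_k, m))
eta_true = rng.standard_normal(m)
eps = P_k @ eta_true + 0.01 * rng.standard_normal(C_k)

# Step 1: standardize predictors and response.
P_std = (P_k - P_k.mean(axis=0)) / P_k.std(axis=0)
eps_std = (eps - eps.mean()) / eps.std()

# Step 2: SVD of the standardized predictor matrix.
Phi, Sigma, PsiT = np.linalg.svd(P_std, full_matrices=False)
Psi_p = PsiT.T[:, :p]

# Step 3: first p principal components Z = P_std Psi_p.
Z = P_std @ Psi_p

# Step 4: least-squares regression of the response on the p components.
Omega, *_ = np.linalg.lstsq(Z, eps_std, rcond=None)

# Step 5: map back to coefficients in the original variables.
eta_pcr = Psi_p @ Omega

# Surrogate prediction for a new parameter vector (standardized the same way).
a_new = rng.standard_normal(m)
a_std = (a_new - P_k.mean(axis=0)) / P_k.std(axis=0)
eps_surrogate = float(a_std @ eta_pcr)
```

The regression is solved in the p-dimensional component space even though only 12 samples of 26 variables are available.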
The first stage of the adaptive greedy sampling algorithm computes the error estimator over a randomly selected parameter set P̂_0 of cardinality C_0. The algorithm uses these error estimator values to build a surrogate model ε̄_0 and locates the C_k parameter groups corresponding to the C_k maximum values of the surrogate model. This process repeats itself for k = 1, ..., K iterations until the total number of parameter groups reaches C. Finally, the optimal parameter group ρ_I is the one that maximizes the error estimator within the parameter set P̂.
P̂ = P̂_0 ∪ P̂_1 ∪ P̂_2 ∪ · · · ∪ P̂_K, k = 1, ..., K.

Thus, at each kth iteration, we construct a surrogate model ε̄_k which approximates the error estimator over the entire parameter space P. There are different choices for building a surrogate model [34]. In this paper, we use the principal component regression (PCR) technique. Suppose the vector ε̂ = (ε_1, ..., ε_{C_k}) ∈ R^{C_k×1} is the response vector containing the error estimator values at the kth iteration. Since we have considered the parameters b and σ as constants, we build the surrogate model with the parameter a(t) only. Let P̂_k = [ρ_1, ..., ρ_{C_k}] ∈ R^{C_k×m} be the matrix composed of C_k parameter groups at the kth iteration. The rows of the matrix P̂_k represent the C_k parameter vectors, while the m columns represent the m tenor points of the parameter vector a(t). We can fit a simple multiple regression model as

    ε̂ = P̂_k η + err,

where η = (η_1, ..., η_m) is the vector of regression coefficients and err is the vector of residuals. The least squares estimate of η is obtained as

    η̂ = argmin_η ‖ε̂ − P̂_k η‖²₂.

However, if C_k is not much larger than m, then the model might give weak predictions due to the risk of overfitting for the parameter groups which are not used in model training. Moreover, if C_k is smaller than m, then the least squares approach cannot produce a unique solution, restricting the use of the simple linear regression model. We might face this problem during the first few iterations of the adaptive greedy sampling algorithm, as we will have fewer error estimator values to build a reasonably accurate model. Hence, to overcome this drawback, we implement the principal component regression technique. The principal components are simply the columns of the matrix P̂_k Ψ̂. For dimension reduction, we select only p columns of the matrix Ψ̂, which are enough to construct a fairly accurate model. The author of [7] reported that the first three or four principal components are enough to analyze the yield curve changes. Let Z = P̂_k Ψ̂_p = [P̂_k ψ̂_1, ..., P̂_k ψ̂_p] be the matrix containing the first p principal components.
We regress ε̂ on these principal components as ε̂ = ZΩ + err, where Ω = [ω_1, ..., ω_p] is the vector of regression coefficients obtained using the principal components. The least squares estimate of Ω is given as

    Ω̂ = argmin_Ω ‖ε̂ − ZΩ‖²₂.

We obtain the PCR estimate η_PCR ∈ R^m of the regression coefficients η as

    η_PCR = Ψ̂_p Ω̂.

Finally, we can evaluate the surrogate model for any parameter vector a_s = (a_s1, ..., a_sm) as

    ε̄(ρ_s) = η_1 a_s1 + · · · + η_m a_sm,

where η_1, ..., η_m are the final regression coefficients η_PCR used to construct the surrogate model.

Adaptive Greedy Sampling Algorithm
The adaptive greedy sampling algorithm utilizes the designed surrogate model to locate the optimal parameter groups adaptively at each greedy iteration i = 1, ..., I_max. The first few steps of the algorithm resemble the classical greedy sampling approach. It selects the first parameter group ρ_1 from the parameter space P and computes the reduced-order basis Q_1. The algorithm then randomly selects C_0 parameter groups and constructs a temporary parameter set P̂_0 = {ρ_1, ..., ρ_{C_0}}. For each parameter group in the parameter set P̂_0, the algorithm determines a reduced-order model and computes the residual errors {ε(ρ_j)}_{j=1}^{C_0}. Let ε_0 = {ε(ρ_1), ..., ε(ρ_{C_0})} be the array containing the error estimator values obtained for the parameter set P̂_0. The adaptive parameter sampling starts by constructing a surrogate model ε̄ for the error estimator based on these C_0 error estimator values, as discussed in subsection 4.4.1. The obtained surrogate model is then evaluated over the entire parameter space P. We locate the C_k parameter groups corresponding to the first C_k maximum values of the surrogate model and construct a new parameter set P̂_k = {ρ_1, ..., ρ_{C_k}} composed of these C_k parameter groups. The algorithm determines a reduced-order model for each parameter group within the parameter set P̂_k and obtains the analogous error estimator values {ε(ρ_j)}_{j=1}^{C_k}. Let ε_k = {ε(ρ_1), ..., ε(ρ_{C_k})} be the array containing the error estimator values obtained for the parameter set P̂_k. Furthermore, we concatenate the set P̂_k and the set P̂_0 to form a new parameter set P̂ = P̂_k ∪ P̂_0. Let E_sg = ε_0 ∪ · · · ∪ ε_k be the set composed of all the error estimator values available at the kth iteration. The algorithm then uses this error estimator set E_sg to build a new surrogate model. The quality of the surrogate model increases with each successive iteration, as we get more error estimator values to build a fairly accurate model.
This process repeats itself until the cardinality of the set P̂ reaches C,

    P̂ = P̂_0 ∪ P̂_1 ∪ P̂_2 ∪ · · · ∪ P̂_K.
Finally, the optimal parameter group ρ_I, which maximizes the error estimator (48), is extracted from the parameter set P̂. In this work, we build a computationally cheap surrogate model based on principal component regression. Note that a highly accurate sampling via the designed surrogate model is typically not necessary: sampling the high dimensional model in the neighborhood of the parameter group with maximum error is sufficient to obtain good results. In the classical greedy sampling approach, we use the residual error ε to observe the convergence of the algorithm, which stands in for the exact error between the high dimensional model and the reduced-order model (47). In the adaptive greedy POD algorithm, however, we use an approximate model ē for the exact error e as a function of the error estimator. To build this approximate error model, we need to solve one high dimensional model at each greedy iteration. The algorithm solves the high dimensional model for the optimal parameter group ρ_I and updates the snapshot matrix V̄. A new reduced-order basis Q is then obtained by computing the truncated singular value decomposition of the updated snapshot matrix, as explained in subsection 4.2. Furthermore, we solve the reduced-order model for the optimal parameter group before and after updating the reduced-order basis and obtain the respective error estimator values ε^bf(ρ_I) and ε^af(ρ_I), where the superscripts bf and af denote before and after updating the reduced-order basis. We also obtain the relative errors e^bf and e^af between the high dimensional model and the reduced-order models constructed before and after updating the reduced-order basis. In this way, at each greedy iteration, we obtain a set of error values E_p that we can use to construct an approximate error model for the exact error e based on the error estimator ε.
We construct a linear model for the exact error based on the error estimator as follows. We assume

    ē = τ ε^γ.

Setting Y = log(ē), X = log(ε), and τ̂ = log(τ), we get the linear model

    Y = γ X + τ̂,

where γ is the slope of the linear model and τ̂ is the intersection with the logarithmic axis log(y). After each greedy iteration, we get more data points in the error set E_p, which increases the accuracy of the error model. In subsection 5.3, we validate with the obtained results that, in our case, the linear model is sufficient to achieve an accurate model for the exact error. Algorithm 4 describes the adaptive sampling method.

Algorithm 4 The adaptive greedy sampling algorithm
Input: Maximum number of iterations I_max, cardinalities C_0, C_k, and C, parameter space P
Output: Q
1: Choose the first parameter group ρ_1 = [(a_11, ..., a_1m), b, σ] from P
2: Solve the HDM for the parameter group ρ_1 and store the result in V_1
3: Compute a truncated SVD of the matrix V_1 and construct Q_1
4: for i = 2 to I_max do
5:   Randomly select a set of parameter groups P̂_0 = {ρ_1, ρ_2, ..., ρ_{C_0}} ⊂ P
6:   for j = 1 to C_0 do
7:     Solve the ROM for the parameter group ρ_j using Q_{i−1} and compute ε(ρ_j)
8:   end for
9:   Let ε_0 = {ε(ρ_1), ..., ε(ρ_{C_0})}; set E_sg = ε_0, P̂ = P̂_0, k = 1
10:  while the cardinality of P̂ is smaller than C do
11:    Build a surrogate model ε̄ using the error estimator set E_sg (Algorithm 3)
12:    Compute the values of the surrogate model over P: ε̄(ρ) for all ρ ∈ P
13:    Determine the first C_k maximum values of ε̄(ρ) and the corresponding parameter groups P̂_k = {ρ_1, ..., ρ_{C_k}}
14:    for x = 1 to n(P̂_k) do
15:      Solve the ROM for the parameter group ρ_x using Q_{i−1} and compute ε(ρ_x)
16:    end for
17:    Let ε_k = {ε(ρ_1), ..., ε(ρ_{C_k})} be the error estimator values obtained for the parameter set P̂_k
18:    Update E_sg = E_sg ∪ ε_k and P̂ = P̂ ∪ P̂_k; set k = k + 1
19:  end while
20:  Find the optimal parameter group ρ_I = argmax_{ρ∈P̂} ε(ρ)
21:  Solve the HDM for the parameter group ρ_I and store the result in V_i
22:  Solve the ROM for the parameter group ρ_I using Q_{i−1} and store the result in V̄_i
23:  Compute the relative error e_i^bf and the error estimator ε_i^bf using the ROM obtained with Q_{i−1} (before updating the ROB)
24:  Construct the snapshot matrix V̄ by concatenating the solutions V_s for s = 1, ..., i
25:  Compute an SVD of the matrix V̄ and construct Q_i
26:  Solve the ROM for the parameter group ρ_I using Q_i and store the result in V̄_{i+1}
27:  Compute the relative error e_i^af and the error estimator ε_i^af using the ROM obtained with Q_i (after updating the ROB)
28:  Update the error set E_p
29:  Construct an approximate model for the exact error ē using the error set E_p: log(ē_i) = γ_i log(ε) + log(τ_i)
30: end for
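The log-log error model fit can be sketched as follows, on synthetic (ε, e) pairs generated from an assumed power law ē = τ ε^γ with multiplicative noise; the true values of γ and τ here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical error data: pairs (estimator eps, exact error e) collected in
# the set E_p over the greedy iterations, following e = tau * eps^gamma.
gamma_true, tau_true = 1.1, 0.05
eps = 10.0 ** rng.uniform(-6, -1, size=12)
e = tau_true * eps**gamma_true * np.exp(0.05 * rng.standard_normal(12))

# Linear model in log-log coordinates: log(e) = gamma * log(eps) + log(tau).
X = np.log(eps)
Y = np.log(e)
gamma, log_tau = np.polyfit(X, Y, 1)

def error_model(eps_val):
    # approximate exact error predicted from an estimator value
    return np.exp(gamma * np.log(eps_val) + log_tau)
```

A new pair of (estimator, exact error) values is appended after every greedy iteration, so the fit tightens as the algorithm proceeds.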

Numerical Example
A numerical example of a floater with cap and floor [20] is used to test the developed algorithms and methods. We solved the floater instrument using the Hull-White model. We obtained the high dimensional model by discretizing the Hull-White PDE as discussed in subsection 4.1.3 and compared the results with the reduced-order model. The reduced-order model is generated by implementing the proper orthogonal decomposition method along with the classical and the adaptive greedy sampling techniques. The characteristics of the floater instrument are shown in Table 6.
Note that the coupon rate c(n) at time t_n is set in advance by the coupon rate at t_{n−1}. All computations are carried out on a PC with 4 cores and 8 logical processors at 2.90 GHz (Intel i7 7th generation). We used MATLAB R2018a for the yield curve simulations. The numerical method for the yield curve simulations is tested with real market-based historical data. We have collected the daily interest rate data at 26 tenor points in time over the past five years. Each year has 260 working days, so there are 1300 observation periods.
We have retrieved this data from the State-of-the-art stock exchange information system, "Thomson Reuters EIKON [48]". We have used the inbuilt UnRisk tool for the parameter calibration, which is well integrated with Mathematica (version used: Mathematica 11.3). Further, we used calibrated parameters for the construction of a Hull-White model. We have designed the finite difference method and the model order reduction approach for the solution of the Hull-White model in MATLAB R2018a.

Model Parameters
We computed the model parameters as explained in subsection 3.2. The yield curve simulation is the first step to compute the model parameters. Based on the procedure described in subsection 3.1, we performed the bootstrapping process for the recommended holding period of 10 years, i.e., for the maturity of the floater.

Finite Difference Method
The computational domain for the spatial dimension r is restricted to r ∈ [u, v], as described in subsection 4.1.3. Here, u = −0.1 and v = 0.1. We applied homogeneous Neumann boundary conditions of the form

    ∂V/∂r (u, t) = 0,   ∂V/∂r (v, t) = 0.    (56)

We divided the spatial domain into M = 600 equidistant grid points, generating a set of points {r_1, r_2, ..., r_M}. The time interval [0, T] is divided into N points in time, measured in days, starting from t = 0 until the maturity T; in our case, the number of days until maturity is assumed to be 3600 with an interval ∆t = 1 (10 years ≈ 3600 days). Rewriting (39), we obtain the system of linear equations to be solved at each time step. We can apply the first boundary condition in (56) by updating the first and the last rows (A_1 and A_M) of the matrix A(ρ_s). Using the finite difference approach, the discretization of (56) yields A_1 = (−1, 1, 0, ..., 0) and A_M = (0, ..., 0, 1, −1).
The second Neumann boundary condition can be applied by changing the last entry of the vector BV^n to zero. Starting at t = 0 with the known initial condition V(0) given by the principal amount, we solve the system of linear equations (39) at each time step. Note that we need to update the value at the grid points r_i every three months, as the coupon frequency is quarterly, by adding the coupon f_n based on the coupon rate given by (55).
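A minimal sketch of this implicit finite-difference step is given below. The interior stencil coefficients alpha and beta are hypothetical placeholders (the actual Hull-White coefficients are not reproduced here); only the grid sizes and the Neumann boundary rows A_1 = (−1, 1, 0, ..., 0) and A_M = (0, ..., 0, 1, −1) follow the text.

```python
import numpy as np

M = 600
u, v = -0.1, 0.1
r = np.linspace(u, v, M)
dr = r[1] - r[0]
dt = 1.0  # one day

# Interior rows: a generic tridiagonal convection-diffusion stencil with
# hypothetical coefficients alpha (diffusion) and beta (convection).
alpha, beta = 1e-4, 1e-2
A = np.zeros((M, M))
for i in range(1, M - 1):
    A[i, i - 1] = -alpha / dr**2 + beta / (2 * dr)
    A[i, i] = 1.0 / dt + 2 * alpha / dr**2
    A[i, i + 1] = -alpha / dr**2 - beta / (2 * dr)

# Neumann boundary conditions via one-sided differences:
# A_1 = (-1, 1, 0, ..., 0) and A_M = (0, ..., 0, 1, -1).
A[0, :2] = [-1.0, 1.0]
A[-1, -2:] = [1.0, -1.0]

B = np.eye(M) / dt

# One time step: solve A V^{n+1} = B V^n with the boundary entries of the
# right-hand side set to zero.
V_n = np.full(M, 100.0)  # principal amount as initial condition
rhs = B @ V_n
rhs[0] = 0.0
rhs[-1] = 0.0
V_np1 = np.linalg.solve(A, rhs)
```

A quick sanity check: with a constant initial state and homogeneous Neumann conditions, the step reproduces the constant, since the stencil rows sum to 1/∆t.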

Model Order Reduction
We have implemented the parametric model order reduction approach for the floater example, as discussed in subsection 4.2. The quality of the reduced-order model depends on the parameter groups selected for the construction of the reduced-order basis Q. The reduced-order basis is obtained using both classical and adaptive greedy sampling algorithms.

Classical Greedy Sampling Approach
At each iteration of the classical greedy sampling approach, the algorithm constructs a reduced-order basis as presented in Algorithm 1. We set the maximum number of pre-defined candidates used to construct the set P̂ to 40 and the maximum number of iterations I_max to 10. The progression of the maximum and average residuals over the iterations of the greedy algorithm is presented in Fig. 5. It is observed that the maximum residual error predominantly decreases with increasing iterations. Thus, we can say that the proposed greedy algorithm efficiently locates the optimal parameter groups and constructs the desired reduced-order basis Q. Furthermore, we tested the effect of a change in the cardinality of the set P̂. The proposed algorithm is applied with three different cardinalities of P̂: |P̂_1| = 20, |P̂_2| = 30, |P̂_3| = 40. Note that we have constructed P̂ by randomly selecting the parameter groups from the parameter space P. Figure 6 shows the plot of the maximum residual against the number of iterations for the three different cardinalities. It is evident that with an increasing number of candidates, the maximum residual error decreases. However, the decrease is not significant, even when the cardinality of P̂ is increased by 20. Thus, we can say that 20 randomly selected parameter groups are enough to obtain the reduced-order basis Q.

Figure 7: The relative error between the HDM and the ROM for two different parameter groups.

We noticed that there are some parameter groups (e.g., ρ_238) for which the reduced-order model gives unsatisfactory results. Figure 7 illustrates the relative error between the high dimensional model and the reduced-order model for two different parameter groups. One can observe that the reduced-order model built for the parameter group ρ_238 (dashed line) shows inferior results compared to the reduced-order model for the parameter group ρ_786 (solid line).
Even an increase in the reduced dimension d does not improve the quality of the result substantially. This observation reveals that the selection of trial candidates by random sampling neglects the parameter groups corresponding to the most significant error.

Adaptive Greedy Sampling Approach
To overcome the drawbacks associated with the classical greedy algorithm, we implemented the adaptive greedy sampling approach for the floater example. At each greedy iteration, the algorithm locates C_k = 10 parameter groups adaptively using the surrogate modeling technique, as described in subsection 4.4. We fixed the maximum number of elements within the parameter set P̂ to 40. The adaptively obtained parameter set P̂ is then used to locate the optimal parameter group ρ_I. These steps are repeated for a maximum of I_max = 10 iterations or until convergence. The algorithm is initiated by selecting C_0 = 20 random parameter groups. The optimal parameter group updates the snapshot matrix, and the algorithm subsequently generates a new reduced-order basis at each greedy iteration. Figure 8 shows the evolution of the maximum and average residual errors over the iterations of the adaptive greedy algorithm. The residual error decreases with each iteration, so the algorithm succeeds in locating the optimal parameter groups efficiently. To monitor the convergence of the adaptive greedy algorithm, we designed an error model ē for the relative error e as a function of the residual error ε. Figure 9 shows the designed error model based on the available error set E_p for four different greedy iterations. The error plot exhibits a strong correlation between the relative error and the residual error. The results indicate that the linear error model is satisfactory to capture the overall behavior of the exact error as a function of the residual error. We used the reduced-order basis obtained from the adaptive greedy sampling procedure to design the reduced-order model. Figure 10 presents the relative error plot for the parameter groups ρ_238 and ρ_786. We see that the adaptive greedy approach gives better results than the classical greedy method.
With a reduced dimension of d = 6, we obtained an excellent result, as the relative error is less than 10⁻³.

Computational cost
In the case of the classical greedy sampling approach, the algorithm solves C reduced-order models and one high dimensional model at each greedy iteration. It also computes a truncated SVD of the updated snapshot matrix in each iteration. Let t_ROM be the time required to solve one reduced-order model, t_HDM the computational time required for one high dimensional model, and t_SVD the time required to obtain a truncated SVD of the snapshot matrix. The total computational time T_Q^CG required to obtain the reduced-order basis in the case of the classical greedy sampling approach is then

    T_Q^CG = I_max (C t_ROM + t_HDM + t_SVD).

Similarly, in the case of the adaptive greedy approach, the total computational time T_Q^AG is

    T_Q^AG = I_max (C t_ROM + t_HDM + t_SVD + K (t_SM + t_SM^ev) + t_EM + 2 t_ROM^{af,bf}),

where t_SM and t_SM^ev denote the computational times required to build a surrogate model and to evaluate it over the entire parameter space, respectively, and t_EM is the time required to build an error model. The term 2 t_ROM^{af,bf} is the computational time needed to solve the reduced-order model before and after updating the ROB. Table 7 compares the computational times required to generate the reduced-order basis Q for different sets P̂ in the case of the classical greedy sampling approach with the computational time needed to obtain the reduced-order basis in the case of the adaptive greedy sampling approach. The computational time t_{ρ_I} required to locate the optimal parameter group by constructing a surrogate model within one greedy iteration is approximately 8 seconds; t_{ρ_I} is simply the time required to complete the while loop outlined in Algorithm 4 for a single greedy iteration. Thus, the total time spent locating optimal parameter groups via surrogate modeling is I_max × t_{ρ_I} = 78.56 s, assuming the adaptive greedy algorithm runs for I_max iterations. Nonetheless, Fig. 10 shows that we can truncate the algorithm after 4 or 5 iterations, as the residual error falls below 10⁻⁴.
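The cost formulas can be illustrated with a short calculation; all timings below are hypothetical placeholders, not measured values from Table 7.

```python
# Hypothetical per-step timings in seconds (illustration only).
t_rom, t_hdm, t_svd = 0.02, 1.5, 0.3   # ROM solve, HDM solve, truncated SVD
t_sm, t_ev_sm, t_em = 0.5, 0.2, 0.1    # surrogate build/evaluate, error model
C, K, I_max = 40, 2, 10

# Classical greedy: C ROM solves + 1 HDM solve + 1 SVD per iteration.
T_cg = I_max * (C * t_rom + t_hdm + t_svd)

# Adaptive greedy: additionally K surrogate builds/evaluations, one error
# model, and the two ROM solves before/after updating the basis.
T_ag = I_max * (C * t_rom + t_hdm + t_svd
                + K * (t_sm + t_ev_sm) + t_em + 2 * t_rom)
```

With these placeholder numbers, T_cg comes out to 26.0 s and T_ag to 41.4 s, reflecting the extra offline cost the adaptive variant invests in surrogate and error models.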

The computational times required to simulate the reduced-order models and the high dimensional models are presented in Table 8. The evaluation columns give the time needed to solve the systems generated for both the high dimensional and the reduced-order models. The time required to solve the complete system with a parameter space of 10000 × m, for both the high dimensional model and the reduced-order model, is given in the total time column. We can see that the evaluation time required for the reduced-order model is at least 18-20 times less than that of the high dimensional model. However, there is a slight increase in total time due to the addition of the reduction time T_Q. One can also observe that with an increase in the dimension d of the reduced-order model, the evaluation time increases significantly. The reduced-order model with the classical greedy sampling approach is at least 10-11 times faster than the high dimensional model. The time required to simulate the reduced-order model with the adaptive greedy sampling approach is somewhat higher than its counterpart, due to the time invested in building the surrogate and error models. Despite that, the reduced-order model is at least 8-9 times faster than the high dimensional model. The computational times presented in both tables assume that the greedy algorithms run for the maximum number of iterations I_max. However, we can truncate the algorithms after 4 or 5 iterations, i.e., in practice we can achieve even more speedup than reported here.

Floater Scenario Values
To design a key information document, we need the values of the floater at different spot rates. The spot rate r_sp is the yield rate at the first tenor point of the simulated yield curve. The value of the floater at the spot rate r_sp is simply its value at the short rate r = r_sp. For 10,000 simulated yield curves, we get 10,000 different spot rates and the corresponding floater values. These values are further used to calculate three different scenarios: (i) a favorable scenario, (ii) a moderate scenario, and (iii) an unfavorable scenario. The favorable, moderate, and unfavorable scenario values are the values at the 90th, 50th, and 10th percentiles of the 10,000 values, respectively.
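The scenario extraction is a direct percentile computation; the simulated floater values below are synthetic stand-ins for the 10,000 values produced by the model.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical floater values for 10,000 simulated spot rates.
values = rng.normal(loc=100.0, scale=5.0, size=10_000)

# Scenario values as percentiles of the simulated value distribution.
favorable = np.percentile(values, 90)
moderate = np.percentile(values, 50)
unfavorable = np.percentile(values, 10)
```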

Conclusion
This paper presents a parametric model order reduction approach for a high dimensional convection-diffusion-reaction PDE that arises in the analysis of financial risk. The high dimensional parameter space with time-dependent parameters is generated via the calibration of financial models based on market structures. A finite difference method has been implemented to solve the high dimensional PDE. The selection of parameters to obtain the reduced-order basis is of utmost importance. We established a greedy approach for parameter sampling and noticed that there are some parameter groups for which the classical greedy sampling approach gives unsatisfactory results. To overcome these drawbacks, we applied an adaptive greedy sampling method using a surrogate model for the error estimator, which is constructed over the entire parameter space and then used to locate the parameters that most likely maximize the error estimator. The surrogate model is built using the principal component regression technique.
We tested the designed algorithms on a numerical example of a floater with cap and floor solved using the Hull-White model. The results indicate the computational advantages of the parametric model order reduction technique for short-rate models. A reduced-order model of dimension d = 6 was enough to reach an accuracy of 0.01%. The reduced-order model was at least 10-12 times faster than the high dimensional model. The developed model order reduction approach also shows potential for historical or Monte Carlo value-at-risk calculations, where a large number of simulations needs to be performed for the underlying instrument.

A. Relation between a singular value decomposition and a principal component analysis
Consider a data matrix X of size n × m, where n is the number of samples and m is the number of variables, and assume that X is centered. We can decompose the matrix X using the singular value decomposition (SVD) as X = ΦΣΨ^T, where the matrices Φ and Ψ contain the left and right singular vectors of X, respectively, and Σ is a diagonal matrix with the singular values Σ_i arranged in descending order along the diagonal. Consider the m × m covariance matrix

    C = X^T X.

We now compute an eigendecomposition, also known as a spectral decomposition, of the covariance matrix,

    C = Ψ Σ̃ Ψ^T,

where Ψ is the matrix of eigenvectors and Σ̃ is a diagonal matrix with the eigenvalues λ_i arranged in descending order along the diagonal. In principal component analysis (PCA), the columns of the matrix XΨ are known as the principal components, while the columns of the matrix Ψ are known as the principal directions or PCA loadings. One can now easily see the similarity between the SVD and the PCA. We can write the covariance matrix as

    X^T X = (ΦΣΨ^T)^T ΦΣΨ^T = ΨΣΦ^T ΦΣΨ^T = ΨΣ²Ψ^T,

where Φ^T Φ = I. Thus, the right singular vectors of the matrix X are simply the eigenvectors of the matrix C, and the singular values of X are related to the eigenvalues of C as λ_i = Σ_i².
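This relation is easy to verify numerically; the data matrix below is a random centered matrix with assumed sizes.

```python
import numpy as np

rng = np.random.default_rng(6)

# Centered data matrix X with n samples and m variables.
n, m = 200, 5
X = rng.standard_normal((n, m))
X -= X.mean(axis=0)

# SVD of X: X = Phi Sigma Psi^T.
Phi, Sigma, PsiT = np.linalg.svd(X, full_matrices=False)

# Eigendecomposition of C = X^T X. eigh returns eigenvalues in ascending
# order, so reverse them to match the descending singular values.
lam, Psi = np.linalg.eigh(X.T @ X)
lam = lam[::-1]

# lambda_i = Sigma_i^2: the eigenvalues of C equal the squared singular values.
diff = np.max(np.abs(lam - Sigma**2))

# The principal components are the columns of X @ Psi (up to column signs).
```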