# Model order reduction for the simulation of parametric interest rate models in financial risk analysis

## Abstract

This paper presents a model order reduction approach for large-scale, high-dimensional parametric models arising in the analysis of financial risk. To understand the risks associated with a financial product, one has to perform several thousand computationally demanding simulations of the model, which requires efficient algorithms. We establish a model reduction approach based on a variant of the proper orthogonal decomposition method to generate small model approximations for the high-dimensional parametric convection-diffusion-reaction partial differential equations. This approach requires solving the full model at selected parameter values to generate a reduced basis. We propose an adaptive greedy sampling technique based on surrogate modeling for the selection of the sample parameter set. The new technique is analyzed, implemented, and tested on industrial data of a floater with cap and floor under the Hull–White model. The results illustrate that the reduced model approach works well for short-rate models.

## Introduction

It is essential to be aware of the financial risk associated with an invested product. The risk analysis of financial instruments often requires the valuation of such instruments under a wide range of future market scenarios. The market scenarios (e.g., values of equity prices, stock indices, foreign exchange rates, interest rates) are then input parameters in a valuation function that delivers the fair value of such financial instruments. Examples of such risk analysis tasks are the calculation of Value-at-Risk (VaR) or the expected shortfall to estimate worst-case scenarios of financial holdings. Three different VaR approaches are widely used: (1) The parametric VaR assumes a given distribution of underlying risk factors (e.g., equity prices or processes of stock indices). The future parametric value of a financial instrument can then be estimated via its sensitivities with respect to the underlying risk factors. (2) The historical VaR approach applies historic day-to-day changes of the underlying prices to the current status (today) to obtain an empirical distribution of tomorrow’s underlying prices. When a data range of one year (approx. 250 business days) is used, the value of the financial holding has to be calculated 250 times. (3) Monte Carlo VaR applies a Monte Carlo simulation to obtain a wide range of underlying values. Again, to calculate the values of the instrument, a valuation function has to be used. To obtain reasonably accurate results, Monte Carlo VaR typically needs thousands or tens of thousands of valuations.

Credit and debt value adjustments (CVA, DVA) are estimates of possible losses when a counterparty in a financial portfolio defaults. The difference between the credit-risk-free portfolio value and the true portfolio value in the presence of credit risk depends on the future liabilities of the counterparty and its default risk. For the generation of exposure scenarios, Monte Carlo simulation is the method of choice. Frequently, the positions between two market participants (often banks) contain many different risk factors (e.g., instruments in different currencies). In such cases, exposure calculation is a high-dimensional problem.

Packaged retail and insurance-based investment products (PRIIPs) are financial instruments that are packaged and offered to retail investors. We will come back to the definition of packaged products and their categorization in Sect. 2. In order to make PRIIPs from different manufacturers more comparable concerning their risks and returns, the European regulation (EU) 1286/2014 requires manufacturers of PRIIPs to supply key information documents (KIDs) to possible retail investors that are easy to read and to understand. The commission delegated regulation (EU) 2017/653 formulates the details of how the risk and the possible returns of a PRIIP have to be calculated. There are four different categories of PRIIPs. In this paper, we concentrate on Category 3 instruments, for which at least $$10{,}000$$ market data scenarios have to be generated, and the PRIIP has to be valuated under these scenarios. While the time horizon (at which the distribution of future values is to be simulated) of VaR calculations is typically quite short (one business day or ten business days, sometimes until the end of the fiscal quarter), the time horizon of PRIIPs is the recommended holding period (RHP) of the PRIIP. This is typically in the range of years or even (e.g., in the case of structured life insurance contracts) decades. As a consequence, the calculation of PRIIPs key figures is computationally much more challenging than a VaR calculation of the same instrument. The commission delegated regulation for PRIIPs does not prescribe which valuation functions should be used. Therefore, one can use any appropriate valuation function, namely plain discounting of cash flows, closed-form solutions of Black–Scholes or Black 76, numerical schemes for finance-related partial differential equations (PDEs), or Monte Carlo and quasi-Monte Carlo schemes for stochastic differential equations arising from, e.g., Libor market models [8, 9].
When a financial instrument is equipped with optionalities like callabilities (for the issuer) or putabilities (for the investor), or when the instrument is path-dependent (e.g., autocallable instruments, target redemption notes), the valuation functions involve more complex PDEs.

In this paper, the financial instruments are evaluated via the dynamics of short-rate models, based on convection-diffusion-reaction partial differential equations. The choice of the short-rate model depends on the underlying financial instrument. Some of the prominent financial models are the one-factor Hull–White model, the shifted Black–Karasinski model, and the two-factor Hull–White model. These models are calibrated based on market data like yield curves, which generate a high dimensional parameter space. In short, to perform the risk analysis, the financial model needs to be solved for such a high dimensional parameter space, and this requires efficient algorithms. For instruments with high complexity and long time horizons, computing times of minutes for a single valuation are not unusual.

Thus, in this paper, we establish a parametric model order reduction (MOR) approach based on a variant of the proper orthogonal decomposition (POD) approach, which significantly reduces the overall computation time [17, 18]; the POD method is also known as the Karhunen–Loève decomposition or principal component analysis in statistics. This nonlinear MOR approach is computationally feasible as it determines low dimensional linear (or affine) subspaces [22, 23] via a truncated singular value decomposition of a snapshot matrix, which is computed by simulating the high dimensional parametric model for a small number of pre-selected training parameter values. The question of how to select these parameters is often the most challenging part of the process. Typically, one uses a fixed or uniform sampling, but this may neglect vital regions in the case of high dimensional parameter spaces.

We present a greedy sampling algorithm to determine the most suitable parameter set. The basic idea is to select the parameters at which the error between the reduced and the high dimensional model is maximal. We compute the snapshots using these parameters and thus obtain the best suited reduced basis to generate a fairly accurate reduced model. Since the calculation of the relative error between the full and reduced model is expensive, we also develop error estimators similar to [29, 30]. The classical greedy sampling algorithm then picks optimal parameters which yield the highest values of the error estimator. However, it is not computationally feasible to compute the error estimator for the entire parameter space, so we locate the parameters using an adaptive method by constructing a surrogate model for the error estimator, similar to [26, 28]. We construct this surrogate model based on the principal component regression (PCR) technique.

To summarize, this paper presents an approach to select the most prominent parameters (or scenarios) for which we solve the high dimensional model to obtain a reduced order model. Thus, instead of performing thousands of costly simulations, we perform a few expensive computations and simulate the remaining scenarios with the help of the reduced model. In this work, we focus on the one-factor Hull–White model, and we illustrate the results with a numerical example of a floater with cap and floor. For more details, see the technical report.

The paper is organized as follows. Section 2 discusses European regulations of PRIIPs and presents a flowchart explaining the calculation steps for a Category 3 PRIIP. Section 3 explains the construction of a short-rate model. The projection-based model order reduction technique is presented in Sect. 4. The selection of optimal sampling parameters based on the classical greedy approach is presented in Sect. 4.1 and based on the adaptive greedy method in Sects. 4.2 and 4.3. Numerical results for the example of a floater are presented in Sect. 5. Appendix B addresses the simulation procedure for yield curves and the calibration of model parameters based on these simulated yield curves is described in Appendix C. In Appendix D a finite difference method for the Hull–White model is presented.

## The European regulation of PRIIPs

The keywords of PRIIPs are packaged and retail investment. When a financial instrument manufacturer has to decide whether a key information document (KID) has to be provided, the following two questions have to be answered affirmatively: Is the instrument offered to retail investors, and is it packaged? If the instrument is only offered to professional investors, e.g., banks, insurance firms, or capital management firms, then no KID has to be provided. Examples are fixed-to-floating swaps or caps between banks. If the same fixed-to-floating swap is offered to a retail investor, a KID has to be supplied. Likewise, if an instrument is offered to a retail investor but is not packaged, no KID is necessary. Examples of such instruments are fixed-rate bonds and shares of companies traded on exchanges.

A KID must contain the following points on not more than three A4 pages:

1. General information on the product: name of the product, name of the manufacturer and its web address, date of the KID, name of the legal authorities for supervision.

2. What is this product?

3. What are the risks and what could I get in return?

4. What happens if [the name of the PRIIP manufacturer] is unable to pay out?

5. What are the costs?

6. How long should I hold it and can I take my money out early?

7. How can I complain?

8. Other relevant information.

To answer question no. 3, one has to perform several thousand costly numerical simulations of the underlying instrument using an appropriate financial model. The commission delegated regulation (EU) 2017/653 formulates the details of how the risk and the possible returns of a PRIIP have to be calculated. The delegated regulation organizes PRIIPs into four categories.

1. Category 1: PRIIPs where the investor can lose more than the amount invested, and also options, futures, and swaps for which prices are available on a less than monthly basis. A Libor cap is one example.

2. Category 2: These instruments are either direct or synthetic linear investments for which at least two years of daily prices are available. (When prices are quoted less than daily, the time series has to be longer.) An example would be a synthetic exchange-traded fund based on the DAX or the S&P 500 index.

3. Category 3: These instruments are nonlinear investments, again with a time series of prices as for Category 2 instruments. Examples are many structured financial instruments, e.g., a floater with caps and floors.

4. Category 4: The values of these instruments depend on factors not observed in the market. Examples are life insurance instruments.

In this paper, we deal with Category 3 instruments. The calculation for a Category 3 instrument starts by determining a recommended holding period. In most cases, manufacturers recommend holding the instrument until maturity. At the recommended holding period, a VaR-equivalent volatility (VEV) has to be determined to calculate a market risk indicator (1 is very low risk, 7 is very high risk). The market risk measure (MRM) classes are shown in Table 1. For the market risk indicator, $$10{,}000$$ different simulations of a market risk factor (e.g., the yield curve) have to be performed. For each of those scenarios, the instrument has to be valuated. From the distribution of the resulting values, the VaR can be calculated. The VaR shall be the value of the PRIIP at a confidence level of 97.5% at the end of the recommended holding period, discounted to the present date using the expected risk-free discount factor (DF) from the present date to the end of the recommended holding period. Let $$P_{97.5}$$ be the value of the PRIIP at a confidence level of 97.5% at the end of the recommended holding period. Then we calculate $$\operatorname{VaR}_{\mathrm{price\ space}}$$ by multiplying $$P_{97.5}$$ by the discount factor

\begin{aligned} \mathrm{DF} = \operatorname{exp}\bigl(-r(t_{0},t_{k}) (t_{k} - t_{0})\bigr), \end{aligned}

where $$r(t_{0},t_{k})$$ is the yield at the tenor point $$t_{k}$$, $$t_{0}$$ is today, and $$t_{k}$$ is the maturity, i.e., the recommended holding period T. If the PRIIP prices are lognormal, then the distribution of their logarithm is $$\mathcal{N}((-\mathrm{VEV}^{2}/2)T, \mathrm{VEV}^{2}T)$$. Using this lognormal assumption, the 97.5 percentile of VaR in price space would be

\begin{aligned} \begin{aligned}& \operatorname{VaR}_{\mathrm{price\ space}} = \operatorname{exp}\biggl(- \frac{T}{2} \mathrm{VEV}^{2} + z_{\alpha =0.025}\mathrm{VEV}\sqrt{T}\biggr), \\ &\operatorname{ln}( \operatorname{VaR}_{\mathrm{price\ space}}) = - \frac{T}{2} \mathrm{VEV}^{2} - 1.96 \times \mathrm{VEV}\sqrt{T}, \\ &\mathrm{VEV}^{2} \cdot T + 3.92 \times \mathrm{VEV}\sqrt{T} + 2 \times \operatorname{ln}(\operatorname{VaR}_{\mathrm{price\ space}}) = 0, \end{aligned} \end{aligned}

where $$z_{\alpha =0.025} = -1.96$$. We solve this quadratic equation for VEV

\begin{aligned} \begin{aligned} \mathrm{VEV} &= \frac{-3.92 \sqrt{T} \pm \sqrt{(3.92)^{2} T - 8 T \times \operatorname{ln}(\operatorname{VaR}_{\mathrm{price\ space}})}}{2T} \\ &= \frac{-1.96 \pm \sqrt{3.842 - 2 \times \operatorname{ln}(\operatorname{VaR}_{\mathrm{price\ space}})}}{\sqrt{T}}, \end{aligned} \end{aligned}

and consider only the positive value

\begin{aligned} \mathrm{VEV} = \frac{\sqrt{(3.842 - 2 \times \operatorname{ln}(\operatorname{VaR}_{\mathrm{price\ space}}))} - 1.96}{\sqrt{T}}. \end{aligned}
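As a numerical sanity check, the closed-form VEV above can be evaluated directly; the VaR value and holding period below are hypothetical, and the small residual stems from the regulation's rounding of $$1.96^{2}$$ to 3.842:

```python
import math

def vev(var_price_space, T):
    """VaR-equivalent volatility from the closed-form solution of the
    quadratic equation; var_price_space is the discounted 2.5% quantile
    of the PRIIP value, T the recommended holding period in years."""
    return (math.sqrt(3.842 - 2.0 * math.log(var_price_space)) - 1.96) / math.sqrt(T)

# Hypothetical inputs for illustration.
T = 5.0
var_ps = 0.85
v = vev(var_ps, T)

# Plugging the VEV back into the lognormal quantile formula should
# approximately reproduce the original VaR in price space.
reconstructed = math.exp(-0.5 * T * v**2 - 1.96 * v * math.sqrt(T))
```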

Then the instrument is simulated for $$10{,}000$$ different market scenarios, and the $$10{,}000$$ values are sorted into favorable, moderate, and unfavorable performance scenarios, which are the values at the 90th, 50th, and 10th percentile, respectively. If the recommended holding period is less than one year, then the performance scenarios are calculated only at the RHP. However, if the recommended holding period is longer than one year, then the performance scenarios at the intermediate holding periods (1 year, RHP/2, RHP) have to be calculated. To be consistent with Category 2 instruments, the risk-free measure is used after the valuation date and the physical measure before it. Flowchart 1 shows the calculation steps for Category 3 instruments. For more details, see the flowcharts of the joint committee of the European Supervisory Authorities for the risk and reward calculations in the PRIIPs KID.
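The sorting into performance scenarios amounts to taking percentiles of the simulated value distribution. A minimal sketch, with synthetic lognormal values standing in for the 10,000 simulated instrument values:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the 10,000 simulated instrument values at the RHP.
values = rng.lognormal(mean=0.02, sigma=0.1, size=10_000)

# Unfavorable, moderate, and favorable performance scenarios are the
# 10th, 50th, and 90th percentiles of the simulated value distribution.
unfavorable, moderate, favorable = np.percentile(values, [10, 50, 90])
```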

The calibration of the financial model generates a high dimensional parameter space, which can be computationally costly when no analytic solutions are available, e.g., in the case of the Black–Karasinski model. Since we need to valuate the financial instrument for such a high dimensional parameter space, we establish a parametric model order reduction approach.

## The mathematical model

The management of interest rate risks, i.e., the control of changes in future cash flows due to fluctuations in interest rates, is of great importance. In particular, the pricing of products based on the stochastic nature of the interest rate creates the necessity for mathematical models of an underlying financial instrument. In Appendix A we describe the derivation of such a model and consider in particular the Hull–White model [32, 36], which is given by a convection-diffusion-reaction type partial differential equation (PDE) of the form

\begin{aligned} \frac{\partial V}{\partial t} + \underbrace{\bigl(a(t) - b(t)r(t)\bigr) \frac{\partial V}{\partial r}}_{\text{convection}} + \underbrace{\frac{1}{2} \sigma (t)^{2} \frac{\partial ^{2} V}{\partial r^{2}}}_{\text{diffusion}} - \underbrace{r(t)V}_{\text{reaction}} = 0, \end{aligned}
(1)

where V is an interest rate instrument with maturity T and $$r(t)$$ is the short rate satisfying a stochastic differential equation (SDE)

\begin{aligned} dr(t) = \bigl(a(t) - b(t)r(t)\bigr)\,dt + \sigma (t)\,dW(t), \end{aligned}

with time-dependent parameters $$a(t)$$, $$b(t)$$, and $$\sigma (t)$$ and a Wiener process $$W(t)$$. We solve this SDE for an underlying instrument V by converting it into the PDE defined by (1); the detailed derivation of the PDE is given in Appendix A. The regulation does not prescribe how to obtain $$b(t)$$ and $$\sigma (t)$$, and treating them as constants should lead to more robust results than time-dependent parameters. Hence, in our approach we consider the resulting robust Hull–White model with constant $$b(t)=b$$ and $$\sigma (t)=\sigma$$ and a time-dependent parameter $$a(t)$$, obtained from yield curve simulation and parameter calibration, which determines the average direction in which the short rate $$r(t)$$ moves. Our results can, however, be extended to the more general case. According to the PRIIPs regulation, we have to perform at least $$s=10{,}000$$ yield curve simulations. We construct a simulated yield curve matrix (15)

\begin{aligned} Y = \begin{bmatrix} y_{11} & \cdots & y_{1m} \\ \vdots & \ddots & \vdots \\ y_{s1} & \cdots & y_{sm} \end{bmatrix} \in \mathbb{R}^{s\times m}, \end{aligned}

which is then used to calibrate the parameter $$a(t)$$. We obtain s different piecewise constant parameters $$a_{\ell }(t)$$, which change their values $$\alpha _{\ell,i}$$ only at the m tenor points. We incorporate these in a matrix (19)

\begin{aligned} {\mathcal{A}}= \begin{bmatrix} \alpha _{11} & \cdots & \alpha _{1m} \\ \vdots & \ddots & \vdots \\ \alpha _{s1} & \cdots & \alpha _{sm} \end{bmatrix}. \end{aligned}

Appendix B presents a detailed procedure for the yield curve simulation and the calibration of $$a(t)$$ based on simulated yield curves is described in Appendix C.
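For illustration, the short-rate dynamics above, with constant b and σ and a piecewise-constant $$a(t)$$, can be simulated directly by an Euler–Maruyama scheme; this is only a sketch of the SDE dynamics (the paper itself works with the PDE formulation), and all parameter values below are hypothetical:

```python
import numpy as np

def simulate_short_rate(a, b, sigma, r0, dt, n_steps, rng):
    """Euler-Maruyama discretization of dr = (a(t) - b r) dt + sigma dW,
    with a piecewise-constant drift value a[n] on each time step."""
    r = np.empty(n_steps + 1)
    r[0] = r0
    dW = rng.normal(scale=np.sqrt(dt), size=n_steps)  # Wiener increments
    for n in range(n_steps):
        r[n + 1] = r[n] + (a[n] - b * r[n]) * dt + sigma * dW[n]
    return r

rng = np.random.default_rng(42)
path = simulate_short_rate(a=np.full(250, 0.002), b=0.05, sigma=0.01,
                           r0=0.01, dt=1 / 250, n_steps=250, rng=rng)
```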

To solve (1) numerically, we apply an upwind scheme for the convection term , and a standard second order finite difference scheme for the diffusion term, combined with a trapezoidal rule  for the time discretization, for more details see the technical report  and Appendix D. This discretization is a parametric high dimensional model of the form

\begin{aligned} A\bigl(\rho _{\ell }(t)\bigr)V^{n-1} = B\bigl(\rho _{\ell }(t)\bigr)V^{n}, \end{aligned}
(2)

with a given terminal vector and matrices $$A(\rho _{\ell }) \in \mathbb{R}^{M\times M}$$ and $$B(\rho _{\ell }) \in \mathbb{R}^{M\times M}$$. We call this the full model for the model reduction procedure in the next section. We solve (2) by propagating backward in time. Here again $$\ell = 1,\dots,s=10{,}000$$, m is the total number of tenor points, and we need to solve this system at each time step n with an appropriate boundary condition and a known terminal value for the underlying instrument. Altogether we have a parameter space $$\mathcal{P}$$ of size $$10{,}000 \times m$$ to which we now apply model reduction.
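A minimal sketch of the backward propagation of (2); the small random system below is a stand-in for the actual finite-difference matrices $$A(\rho_{\ell})$$ and $$B(\rho_{\ell})$$:

```python
import numpy as np

def solve_backward(A, B, V_terminal, n_steps):
    """Propagate the discretized system A V^{n-1} = B V^n backward in
    time, starting from the known terminal value of the instrument."""
    V = V_terminal.copy()
    for _ in range(n_steps):
        V = np.linalg.solve(A, B @ V)
    return V

# Toy illustration on a small random system (hypothetical stand-in).
rng = np.random.default_rng(1)
M = 50
A = np.eye(M) + 0.01 * rng.standard_normal((M, M))
B = np.eye(M)
V0 = solve_backward(A, B, np.ones(M), n_steps=10)
```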

## Parametric model order reduction

To perform the parametric model reduction for system (2), we employ Galerkin projection onto a low dimensional subspace via

\begin{aligned} \bar{V}^{n} = QV^{n}_{d}, \end{aligned}
(3)

where the columns of $$Q \in \mathbb{R}^{M\times d}$$ represent the reduced-order basis with $$d \ll M$$, $$V_{d}^{n}$$ is a vector of reduced coordinates, and $$\bar{V}^{n} \in \mathbb{R}^{M}$$ is the solution in the nth time step obtained using the reduced order model. For the Galerkin projection we require that the residual of the reduced state

\begin{aligned} p^{n}\bigl(V_{d}^{n},\rho _{\ell }\bigr) = A(\rho _{\ell })QV^{n-1}_{d} - B(\rho _{\ell })QV^{n}_{d} \end{aligned}

is orthogonal to the reduced basis matrix Q, i.e.,

\begin{aligned} Q^{T}p^{n}\bigl(V^{n}_{d},\rho _{\ell }\bigr) = 0, \end{aligned}
(4)

so that by multiplying $$p^{n}(V_{d}^{n},\rho _{\ell })$$ with $$Q^{T}$$, we get

\begin{aligned} \begin{aligned}& Q^{T}A(\rho _{\ell })QV^{n-1}_{d} = Q^{T}B(\rho _{\ell })QV^{n}_{d}, \\ &A_{d}(\rho _{\ell })V^{n-1}_{d}= B_{d}(\rho _{\ell })V^{n}_{d}, \end{aligned} \end{aligned}
(5)

where $$A_{d}(\rho _{\ell }) \in \mathbb{R}^{d\times d}$$ and $$B_{d}(\rho _{\ell }) \in \mathbb{R}^{d \times d}$$ are the parameter dependent reduced matrices.
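The projection step (5) is a few lines of NumPy; the matrices and the orthonormal basis below are random stand-ins for the finite-difference matrices and a POD basis:

```python
import numpy as np

def reduce_system(A, B, Q):
    """Galerkin projection of Eq. (5): A_d = Q^T A Q, B_d = Q^T B Q."""
    return Q.T @ A @ Q, Q.T @ B @ Q

rng = np.random.default_rng(2)
M, d = 200, 10
A = np.eye(M) + 0.01 * rng.standard_normal((M, M))
B = np.eye(M)
Q, _ = np.linalg.qr(rng.standard_normal((M, d)))  # orthonormal test basis
A_d, B_d = reduce_system(A, B, Q)

# One reduced time step: solve A_d V_d^{n-1} = B_d V_d^n and lift back.
V_d = Q.T @ np.ones(M)              # project the state onto the subspace
V_d_prev = np.linalg.solve(A_d, B_d @ V_d)
V_bar = Q @ V_d_prev                # approximate full-order state
```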

We obtain the Galerkin projection matrix Q in (3) by a proper orthogonal decomposition (POD) approach, which generates an orthonormal basis Q that is optimal in the least-squares sense and independent of the parameter space $$\mathcal{P}$$; we do this by the method of snapshots. The snapshots are the state solutions obtained by simulating the full model for selected parameter groups. We assume that we have a training set of parameter groups $$\rho _{1}, \ldots,\rho _{k}\in [\rho _{1},\rho _{s}]$$. We compute the solutions of the full model for this training set and combine them in a snapshot matrix $$\hat{V} = [V(\rho _{1}),V(\rho _{2}),\ldots,V(\rho _{k})]$$. The POD method computes

\begin{aligned} \mathrm{POD}(\hat{V}):= \mathop{\operatorname{argmin}}_{Q} \frac{1}{k} \sum_{i=1}^{k} \bigl\Vert V_{i} - QQ^{T}V_{i} \bigr\Vert ^{2}, \end{aligned}

for an orthogonal matrix $$Q \in \mathbb{R}^{M \times d}$$ via a truncated SVD

\begin{aligned} \hat{V} = \Phi \Sigma \Psi ^{T}=\sum_{i=1}^{k} \Sigma _{i}\phi _{i} \psi _{i}^{T}, \end{aligned}

where $$\phi _{i}$$ and $$\psi _{i}$$ are the left and right singular vectors of the matrix, respectively, and $$\Sigma _{i}$$ are the singular values. The truncated SVD computes only the first k columns of the matrix Φ. The optimal projection subspace Q then consists of d left singular vectors $$\phi _{i}$$, known as POD modes. The dimension d of the subspace is chosen such that we get a good approximation of the snapshot matrix. Large singular values correspond to the main characteristics of the system, while small singular values give only small perturbations of the overall dynamics. The relative importance of the ith POD mode of the matrix is determined by the relative energy $$\Xi _{i}$$ of that mode

\begin{aligned} \Xi _{i} = \frac{\Sigma _{i}}{\sum_{i=1}^{k} \Sigma _{i}}. \end{aligned}
(6)

If the sum of the energies of the generated modes is 1, then these modes can be used to reconstruct the snapshot matrix completely. In general, the number of modes required to capture the complete data set is significantly smaller than the total number of POD modes. Thus, the matrix can be accurately approximated using only those POD modes whose corresponding energies sum to almost the total energy, and based on (6) we choose d out of k POD modes to construct the parameter-independent projection space $$Q = [\phi _{1} \cdots \phi _{d}]$$.

We summarize the procedure in Algorithm 1, where we denote by $$V(\rho _{i})$$ the solution of the full model for a parameter set $$\rho _{i}$$, by $$\hat{V}$$ the snapshot matrix, by Φ (Ψ) the matrix of left (right) singular vectors associated with the diagonal matrix of singular values Σ, and by $$\Xi _{j}$$ the relative energy of the jth POD mode.
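The core of this POD procedure can be sketched in a few lines; the synthetic low-rank snapshot matrix below stands in for the full-model solutions:

```python
import numpy as np

def pod_basis(snapshots, energy_tol=0.999):
    """POD basis from a snapshot matrix: keep the smallest d such that
    the cumulative relative energy (Eq. (6)) exceeds energy_tol."""
    Phi, S, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(S) / np.sum(S)   # cumulative relative energy
    d = int(np.searchsorted(energy, energy_tol)) + 1
    return Phi[:, :d]                   # POD modes Q = [phi_1 ... phi_d]

# Snapshots that are (nearly) rank-2 should yield a tiny basis.
rng = np.random.default_rng(3)
u1, u2 = rng.standard_normal((2, 100))
V_hat = np.outer(u1, rng.standard_normal(20)) + np.outer(u2, rng.standard_normal(20))
Q = pod_basis(V_hat)
```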

It is evident that the quality of the reduced model strongly depends on the selection of the parameter groups $$\rho _{1},\ldots,\rho _{k}$$ that are used to compute the snapshots. Hence it is essential to introduce an efficient sampling technique for the parameter space. One could consider standard sampling techniques, like uniform sampling or random sampling. However, these techniques may neglect vital regions within the parameter space. As an alternative, a greedy sampling method has been suggested in the framework of model order reduction.

### Greedy sampling method

The greedy sampling technique selects the parameter groups at which the error between the reduced model and the full model is maximal. We compute the snapshots using these parameter groups in such a way that we obtain an optimal reduced basis Q. Let $$\|e\| = \frac{\|{V - \bar{V}}\|}{\| V\|}$$ be the relative error between the reduced and full model and set

\begin{aligned} \rho _{I} = \mathop{\operatorname{argmax}}_{\rho \in \mathcal{P}} \Vert e \Vert . \end{aligned}

At each greedy iteration $$i = 1,\ldots,I_{\mathrm{max}}$$, the greedy sampling algorithm selects the optimal parameter group $$\rho _{I}$$ that maximizes the relative error $$\|e\|$$. However, the computation of the relative error $$\|{e}\|$$ is costly, as it entails the solution of the full model. Thus, the relative error is usually replaced by error bounds or the residual $$\|p\|$$; in some cases, however, it is not possible to obtain error bounds. Let ε be the error estimator, i.e., in our case the norm of the residual. At each iteration $$i = 1,\ldots,I_{\mathrm{max}}$$, the classical greedy sampling algorithm chooses the parameter group as the maximizer

\begin{aligned} \rho _{I} = \mathop{\operatorname{argmax}}_{\rho \in \mathcal{P}} \varepsilon (\rho ). \end{aligned}
(7)

The algorithm is initiated by selecting a parameter group $$\rho _{1}$$ from the parameter set $$\mathcal{P}$$ and computing a reduced basis $$Q_{1}$$ as in Sect. 4. Choosing a pre-defined parameter set $$\hat{\mathcal{P}}$$ of cardinality c randomly from the set $$\mathcal{P}$$, at each point of $$\hat{\mathcal{P}}$$ the algorithm determines a reduced order model with reduced basis $$Q_{1}$$ and then computes the error estimator values $$\{\varepsilon (\rho _{j})\}_{j=1}^{c}$$. The parameter group in $$\hat{\mathcal{P}}$$ at which the error estimator is maximal is then selected as the optimal parameter group $$\rho _{I}$$. Then the full order model is simulated for this parameter group and the snapshot matrix is updated. Finally, a new reduced basis is obtained by computing a truncated singular value decomposition of the updated snapshot matrix, as in Algorithm 1. These steps are repeated for $$I_{\mathrm{max}}$$ iterations or until the maximal value of the error estimator is lower than the specified tolerance $$\varepsilon _{\mathrm{tol}}$$. The procedure for the classical greedy approach is summarized in Algorithm 2.
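The loop structure of this classical greedy procedure can be sketched as follows; the full-model solver, the projection-error estimator, and the tolerance below are hypothetical stand-ins, not the finite-difference model of Sect. 3:

```python
import numpy as np

def classical_greedy(P_hat, solve_full, estimator, max_iter, tol):
    """Classical greedy loop: at each iteration pick the parameter group
    maximizing the error estimator, solve the full model there, and
    enlarge the snapshot matrix; stop when the maximal estimate is small.

    solve_full(rho) -> full-model solution vector,
    estimator(Q, rho) -> error-estimator value for reduced basis Q."""
    snapshots = [solve_full(P_hat[0])]
    for _ in range(max_iter):
        Q, _, _ = np.linalg.svd(np.column_stack(snapshots), full_matrices=False)
        errors = [estimator(Q, rho) for rho in P_hat]
        if max(errors) < tol:
            break
        snapshots.append(solve_full(P_hat[int(np.argmax(errors))]))
    return np.column_stack(snapshots)

# Toy stand-ins: solutions lie on a cubic curve in R^4, so four greedy
# snapshots at distinct parameters span the whole solution set.
rho_set = np.linspace(0.0, 1.0, 8)
solve = lambda rho: np.array([1.0, rho, rho**2, rho**3])
est = lambda Q, rho: np.linalg.norm(solve(rho) - Q @ (Q.T @ solve(rho)))
S = classical_greedy(rho_set, solve, est, max_iter=5, tol=1e-10)
```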

The classical greedy sampling method computes an inexpensive a posteriori error estimator for the reduced model. However, it is not feasible to calculate the error estimator values for the entire parameter space $$\mathcal{P}$$. To address this, the classical greedy sampling technique chooses the pre-defined parameter set $$\hat{\mathcal{P}}$$ randomly as a subset of $$\mathcal{P}$$. Random sampling is designed to represent the whole parameter space $$\mathcal{P}$$, but there is no guarantee that $$\hat{\mathcal{P}}$$ will reflect the complete space $$\mathcal{P}$$, since the random selection may neglect parameter groups corresponding to the most significant error. These observations motivate us to design a new criterion for the selection of the subset $$\hat{\mathcal{P}}$$.

Another drawback of the classical greedy sampling technique is that we have to specify the maximum error estimator tolerance $$\varepsilon _{\mathrm{tol}}$$. The error estimator usually depends on some error bound, which may not be tight or may not even exist. To overcome this drawback, we establish a strategy to construct a surrogate model for the error as a function of the error estimator, and we use this error model to control the convergence of the greedy sampling algorithm.

To overcome the drawbacks of the classical greedy sampling approach, we implement an adaptive sampling approach which selects the parameter groups adaptively at each iteration of the greedy procedure, using an optimized search based on surrogate modeling. We construct a surrogate model ε̄ of the error estimator to approximate the error estimator ε over the entire parameter space. Further, we use this surrogate model to select the parameter groups $$\hat{\mathcal{P}_{k}} = \{\rho _{1},\ldots,\rho _{c_{k}}\}$$ with $$c_{k} < c$$ at which the values of the error estimator are highest. For each parameter group within the parameter set $$\hat{\mathcal{P}_{k}}$$, we determine a reduced model and compute the values of the error estimator. This process repeats until the total number of parameter groups reaches c, resulting in the desired parameter set $$\hat{\mathcal{P}}$$.

The first stage of the adaptive greedy sampling algorithm computes the error estimator over a randomly selected parameter set $$\hat{\mathcal{P}_{0}}$$ of cardinality $$c_{0}$$. The algorithm uses these error estimator values $$\{\varepsilon _{i} \}_{i=1}^{c_{0}}$$ to build a surrogate model $$\bar{\varepsilon }_{0}$$ and locates the $$c_{k}$$ parameter groups corresponding to the $$c_{k}$$ maximal values of the surrogate model. This process repeats itself for $$k = 1,\ldots,K$$ iterations until the total number of parameter groups reaches c. Finally, the optimal parameter group $$\rho _{I}$$ is the one that maximizes the error estimator within the parameter set

\begin{aligned} \hat{\mathcal{P}} = \hat{\mathcal{P}}_{0} \cup \hat{ \mathcal{P}}_{1} \cup \hat{\mathcal{P}}_{2} \cup \cdots \cup \hat{\mathcal{P}}_{K}. \end{aligned}
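The adaptive construction of this parameter set can be sketched as follows; the estimator, the surrogate fit (a plain least-squares fit here, where the paper uses PCR), and all numbers are hypothetical stand-ins:

```python
import numpy as np

def adaptive_parameter_set(P, estimator, surrogate_fit, c0, ck, c, rng):
    """Start from c0 random parameter groups, fit a surrogate to their
    error-estimator values, and repeatedly add the ck groups where the
    surrogate predicts the largest error, until c groups are collected.

    estimator(rho) -> error-estimator value,
    surrogate_fit(X, y) -> prediction function for new parameter groups."""
    idx = list(rng.choice(len(P), size=c0, replace=False))
    while len(idx) < c:
        y = np.array([estimator(P[i]) for i in idx])
        predict = surrogate_fit(P[idx], y)
        scores = np.asarray(predict(P), dtype=float)
        scores[np.array(idx)] = -np.inf      # do not re-select chosen groups
        idx.extend(int(j) for j in np.argsort(scores)[-ck:])
    return np.array(idx)

def lin_fit(X, y):
    # simple least-squares surrogate as in Eq. (8); PCR would replace this
    eta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda Xq: Xq @ eta

rng = np.random.default_rng(4)
P = rng.standard_normal((100, 3))                 # 100 parameter groups, m = 3
true_err = lambda rho: float(np.abs(rho).sum())   # stand-in error estimator
chosen = adaptive_parameter_set(P, true_err, lin_fit, c0=10, ck=5, c=30, rng=rng)
```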

Thus, at the kth iteration, a surrogate model $$\bar{\varepsilon }_{k}$$ is constructed that approximates the error estimator over the entire parameter space $$\mathcal{P}$$. There are different choices for building a surrogate model. In this paper, we use the principal component regression (PCR) technique. Suppose $${\hat{\varepsilon }} = (\varepsilon _{1},\ldots,\varepsilon _{c_{k}}) \in \mathbb{R}^{c_{k} \times 1}$$ is the vector of error estimator values at the kth iteration. Since the parameters b and σ are assumed constant, we build a surrogate model with the parameter $$a(t)$$ only. Let $$\hat{\mathcal{P}_{k}} = [\rho _{1},\ldots,\rho _{c_{k}}] \in \mathbb{R}^{c_{k} \times {m}}$$ be the matrix composed of $$c_{k}$$ parameter groups at the kth iteration. The rows of the matrix $$\hat{\mathcal{P}_{k}}$$ represent $$c_{k}$$ parameter vectors, while the m columns represent m tenor points for the parameter vector $$a(t)$$. We can fit a simple multiple regression model as

\begin{aligned} {\hat{\varepsilon }} = \hat{\mathcal{P}_{k}} \cdot \eta + \mathrm{err}, \end{aligned}
(8)

where $$\eta = [\eta _{1},\ldots,\eta _{m}]$$ is an array containing regression coefficients and err is an array of residuals. The least square estimate of η is obtained as

\begin{aligned} \hat{\eta } = \mathop{\operatorname{argmin}}_{\eta } \Vert {\hat{\varepsilon }} - \hat{\mathcal{P}_{k}} \cdot \eta \Vert _{2}^{2} = \mathop{\operatorname{argmin}}_{\eta } \Biggl\Vert {\hat{\varepsilon }} - \sum _{i=1}^{m} \rho _{i}\eta _{i} \Biggr\Vert _{2}^{2}. \end{aligned}

If $$c_{k}$$ is not much larger than m, the model may generalize poorly to parameter groups not used in the training due to over-fitting. Moreover, if $$c_{k}$$ is smaller than m, the least squares approach does not have a unique solution, which restricts the use of the simple linear regression model. We may face this problem during the first few iterations of the adaptive greedy sampling algorithm, as only few error estimator values are then available to build a reasonably accurate model.

The principal component regression (PCR) technique is a dimension reduction technique in which the m explanatory variables are replaced by p linearly uncorrelated variables called principal components. The dimension reduction is achieved by considering only a few relevant principal components. The PCR approach reduces the problem of estimating m coefficients to the simpler problem of determining p coefficients. In the following, we describe in detail the method to construct a surrogate model at the kth iteration.

Before performing a principal component analysis (PCA), we center both the response vector ε̂ and the data matrix $$\hat{\mathcal{P}_{k}}$$. The PCR starts by performing a PCA of the matrix $$\hat{\mathcal{P}_{k}}$$. For this, we compute an SVD

\begin{aligned} \hat{\mathcal{P}_{k}} = \hat{\Phi }\hat{\Sigma }\hat{\Psi }^{T}, \end{aligned}

and then the principal components are the columns of the matrix $$\hat{\mathcal{P}_{k}} \hat{\Psi }$$. For dimension reduction, we select only p columns of the matrix Ψ̂ to construct a fairly accurate reduced model. It has been suggested that the first three or four principal components are enough to analyze the yield curve changes. Let $$Z = \hat{\mathcal{P}_{k}}\hat{\Psi }_{p} = [\hat{\mathcal{P}_{k}} \hat{\psi }_{1},\ldots,\hat{\mathcal{P}_{k}}\hat{\psi }_{p}]$$ be the matrix containing the first p principal components. We regress ε̂ on these principal components via

\begin{aligned} {\hat{\varepsilon }} = Z \Omega + \mathrm{err}, \end{aligned}

where $$\Omega = [\omega _{1},\ldots,\omega _{p}]$$ is the vector of regression coefficients obtained using the principal components. The least squares estimate of Ω is given as

\begin{aligned} \hat{\Omega } = \mathop{\operatorname{argmin}}_{\Omega } \Vert {\hat{\varepsilon }} - Z \Omega \Vert _{2}^{2} = \mathop{\operatorname{argmin}}_{\Omega } \Biggl\Vert { \hat{\varepsilon }} - \sum_{i=1}^{p} z_{i}\omega _{i} \Biggr\Vert _{2}^{2}. \end{aligned}

We obtain the PCR estimate $$\eta _{\mathrm{PCR}} \in \mathbb{R}^{m}$$ of the regression coefficients η as

\begin{aligned} \eta _{\mathrm{PCR}} = \hat{\Psi }_{p}\hat{\Omega }. \end{aligned}

Finally, the value of the surrogate model for any parameter vector $$a_{\ell }= [a_{\ell,1},\ldots,a_{\ell, m}]$$ is

\begin{aligned} \bar{\varepsilon }(\rho _{\ell }) = \eta _{1} a_{\ell,1} + \cdots + \eta _{m} a_{\ell,m}. \end{aligned}

Algorithm 3 presents this surrogate error model, which we will use in the construction of an adaptive greedy sampling method.
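The PCR construction above can be condensed into a short sketch. The following NumPy version is illustrative only (the paper's experiments used MATLAB); `pcr_surrogate` and its argument names are hypothetical:

```python
import numpy as np

def pcr_surrogate(P_hat, eps_hat, p=3):
    """Fit a PCR surrogate for the error estimator.

    P_hat   : (c_k, m) matrix of sampled parameter groups (rows are a(t) vectors)
    eps_hat : (c_k,) vector of error estimator values at those groups
    p       : number of principal components kept (3 or 4 suffice for yield curves)
    Returns a callable evaluating the surrogate at rows of a parameter matrix.
    """
    mu_P, mu_e = P_hat.mean(axis=0), eps_hat.mean()
    Xc, yc = P_hat - mu_P, eps_hat - mu_e            # center data and response
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Psi_p = Vt[:p].T                                 # first p right singular vectors
    Z = Xc @ Psi_p                                   # principal component scores
    Omega, *_ = np.linalg.lstsq(Z, yc, rcond=None)   # regress on the p components
    eta_pcr = Psi_p @ Omega                          # map back to m coefficients
    return lambda a: (np.atleast_2d(a) - mu_P) @ eta_pcr + mu_e
```

With $$p = m$$ the sketch reduces to ordinary least squares; choosing $$p < m$$ stabilizes the fit during the first greedy iterations, when only few estimator values are available.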

The adaptive greedy sampling algorithm utilizes the designed surrogate model to locate optimal parameter groups adaptively at each greedy iteration $$i = 1,\ldots,I_{\operatorname{max}}$$. The first few steps of the algorithm resemble the classical greedy sampling approach. First, the parameter group $$\rho _{1}$$ is selected from the parameter space $$\mathcal{P}$$ and the reduced basis $$Q_{1}$$ is constructed. Furthermore, the algorithm randomly selects $$c_{0}$$ parameter groups and constructs a temporary parameter set $$\hat{\mathcal{P}}_{0} = \{\rho _{1},\ldots,\rho _{c_{0}}\}$$. For each parameter group in the parameter set $$\hat{\mathcal{P}}_{0}$$, the algorithm determines a reduced order model and computes an array of corresponding residual errors $$\varepsilon ^{0} = \{\varepsilon (\rho _{1}),\ldots,\varepsilon (\rho _{c_{0}}) \}$$. Then a surrogate model for the error estimator ε̄ is constructed based on the estimator values $$\{\varepsilon (\rho _{j})\}_{j=1}^{c_{0}}$$, as discussed in Sect. 4.2. The obtained surrogate model is then evaluated over the entire parameter space $$\mathcal{P}$$. Furthermore, we locate the $$c_{k}$$ parameter groups corresponding to the first $$c_{k}$$ maximal values of the surrogate model. We then construct a new parameter set $$\hat{\mathcal{P}}_{k} = \{\rho _{1},\ldots,\rho _{c_{k}}\}$$ composed of these $$c_{k}$$ parameter groups.

The algorithm determines a reduced model for each parameter group within the parameter set $$\hat{\mathcal{P}}_{k}$$ and obtains an array of error estimator values $$\varepsilon ^{k} = \{\varepsilon (\rho _{1}),\ldots,\varepsilon (\rho _{c_{k}}) \}$$ for the parameter set $$\hat{\mathcal{P}}_{k}$$. Furthermore, we concatenate the set $$\hat{\mathcal{P}}_{k}$$ and the set $$\hat{\mathcal{P}}_{0}$$ to form a new parameter set $$\hat{\mathcal{P}} = \hat{\mathcal{P}}_{k} \cup \hat{\mathcal{P}}_{0}$$. Let $$e_{\mathrm{sg}} = \varepsilon ^{0} \cup \cdots \cup \varepsilon ^{k}$$ be the set composed of all the error estimator values available at the kth iteration. The algorithm then uses this error estimator set $$e_{\mathrm{sg}}$$ to build a new surrogate model. The quality of the surrogate model increases with each iteration as we get more error estimator values. This process is repeated until the cardinality of the set $$\hat{\mathcal{P}}$$ reaches c, giving

\begin{aligned} \hat{\mathcal{P}} = \hat{\mathcal{P}}_{0} \cup \hat{ \mathcal{P}}_{1} \cup \hat{\mathcal{P}}_{2} \cup \cdots \cup \hat{\mathcal{P}}_{K}. \end{aligned}

Finally, the optimal parameter group $$\rho _{I}$$ which maximizes the error estimator (7) is extracted from the parameter set $$\hat{\mathcal{P}}$$. Note that typically it is not necessary to obtain a very accurate sampling using the designed surrogate model. Sampling the full model in the neighborhood of the parameter group with maximal error is sufficient to obtain good results. To monitor the convergence of the adaptive greedy algorithm, we have designed an error model ē for the relative error e as a function of the residual error ε.
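The candidate-selection loop described above can be sketched as follows, assuming a hypothetical `estimator` callable that solves the reduced model for a parameter group and returns the residual error estimator (7), and a `fit_surrogate` callable such as the PCR fit of Sect. 4.2; all names are illustrative, not the paper's implementation:

```python
import numpy as np

def adaptive_candidates(P, estimator, fit_surrogate, c0=20, ck=10, c_max=40, seed=0):
    """Adaptively assemble the candidate set P_hat (inner loop of Algorithm 4).

    P             : (n, m) parameter space, one parameter group per row
    estimator     : callable rho -> residual error estimator value
    fit_surrogate : callable (P_hat, eps) -> callable evaluating the surrogate
    Returns the index of the optimal parameter group and all sampled indices.
    """
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(P), size=c0, replace=False))   # random start P_0
    eps = [estimator(P[i]) for i in idx]                     # true estimator there
    while len(idx) < c_max:
        surrogate = fit_surrogate(P[idx], np.asarray(eps))   # rebuild surrogate
        scores = np.ravel(surrogate(P))                      # evaluate on all of P
        scores[idx] = -np.inf                                # skip sampled groups
        new = list(np.argsort(scores)[::-1][:ck])            # ck maximal values
        idx += new
        eps += [estimator(P[i]) for i in new]
    return idx[int(np.argmax(eps))], idx
```

Each pass enlarges the candidate set with the groups the surrogate flags as worst, so the expensive estimator is only evaluated where large errors are expected.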

To build an approximate error model, we simulate one full model at each greedy iteration for the optimal parameter group $$\rho _{I}$$ and update the snapshot matrix. A new reduced basis Q is then obtained by computing the truncated singular value decomposition of the updated snapshot matrix, as explained in Sect. 4. Furthermore, we solve the reduced model for the optimal parameter group before and after updating the reduced basis and obtain the respective error estimator values $$\varepsilon ^{\mathrm{bef}}(\rho _{I})$$ and $$\varepsilon ^{\mathrm{aft}}(\rho _{I})$$. Then we compute the relative errors $$e^{\mathrm{bef}},e^{\mathrm{aft}}$$ between the full and the reduced model constructed before and after updating the reduced basis.

In this way, at each greedy iteration, we obtain a set of error values

\begin{aligned} E_{p} = \bigl\{ \bigl(e^{\mathrm{bef}}_{1}, {\varepsilon }^{\mathrm{bef}}_{1}\bigr), \bigl(e^{\mathrm{aft}}_{1},{\varepsilon }^{\mathrm{aft}}_{1}\bigr),\ldots,\bigl(e^{\mathrm{bef}}_{i}, {\varepsilon }^{\mathrm{bef}}_{i}\bigr), \bigl(e^{\mathrm{aft}}_{i},{\varepsilon }^{\mathrm{aft}}_{i}\bigr)\bigr\} , \end{aligned}
(9)

which we use to construct a linear approximate error model ē for the exact error e based on the error estimator ε,

\begin{aligned} \operatorname{log}(\bar{e}) = \gamma _{e} \operatorname{log}({\varepsilon }) + \operatorname{log}\tau. \end{aligned}
(10)

Setting $$\mathcal{Y} = \operatorname{log}(\bar{e}), \mathcal{X} = \operatorname{log}( \varepsilon )$$ and $$\hat{\tau } = \operatorname{log}(\tau )$$ we get

\begin{aligned} \mathcal{Y} = \gamma _{e} \mathcal{X} + \hat{\tau }, \end{aligned}

where $$\gamma _{e}$$ is the slope of the linear model and τ̂ is its intercept on the logarithmic axis $$\operatorname{log}(y)$$.
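Fitting (10) amounts to ordinary least squares in log-log coordinates; a minimal sketch (function name hypothetical):

```python
import numpy as np

def fit_error_model(E_p):
    """Fit log(e) = gamma_e * log(eps) + log(tau) to the pairs collected in E_p.

    E_p : sequence of (relative error, residual error estimator) pairs.
    Returns gamma_e, tau, and a callable evaluating the error model e_bar(eps).
    """
    e, eps = np.asarray(E_p, dtype=float).T
    gamma_e, log_tau = np.polyfit(np.log(eps), np.log(e), deg=1)
    tau = np.exp(log_tau)
    return gamma_e, tau, lambda x: tau * np.asarray(x) ** gamma_e
```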

After each greedy iteration, we get more data points in the error set $$E_{p}$$, which increases the accuracy of the error model. In Sect. 5, we illustrate that this linear model is sufficient to achieve an accurate error model. The adaptive greedy sampling approach is summarized in Algorithm 4. The list of symbols used in Algorithm 4 is presented in Table 2.

## Numerical example

A numerical example of a floater with cap and floor is used to test the developed algorithms and methods. We model the floater instrument using the Hull–White model and compare the results of the PDE model, discretized as in Appendix D, with the results of the reduced model. The reduced model is generated by implementing the POD method along with the classical and the adaptive greedy sampling techniques. The characteristics of the floater instrument are shown in Table 3. The interest rates are capped at $$C_{R} = 2.25 \%$$ p.a. and floored at $$C_{F} = 0.5 \%$$ p.a. with Euribor3M as the reference rate. The coupon rate can be written as

\begin{aligned} \mathrm{coup} = \operatorname{min}\bigl(2.25 \%,\operatorname{max}(0.5 \%,\mathrm{Euribor3M})\bigr). \end{aligned}
(11)
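In code, the cap/floor rule (11) is a simple clamp; the following sketch uses hypothetical names and rates expressed as decimals:

```python
def coupon_rate(euribor3m, cap=0.0225, floor=0.005):
    """Coupon rate of the floater: Euribor3M capped at 2.25% and floored at 0.5% p.a."""
    return min(cap, max(floor, euribor3m))
```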

Note that the coupon rate $$\mathrm{coup}^{(n)}$$ at time $$t_{n}$$ is set in advance as the coupon rate at $$t_{n-1}$$. All computations are carried out on a PC with 4 cores and 8 logical processors at 2.90 GHz (Intel i7 7th generation). We used MATLAB R2018a for the yield curve simulations and the model reduction. The numerical method for the yield curve simulations is tested with real market-based historical data. Market data are available from market data providers like Thomson Reuters, Bloomberg, and several others. We obtained this data from MathConsult within their UnRisk Omega datasets. The daily interest rate data are collected at 21 tenor points over the past 5 years, where each year has approximately 260 working days, giving roughly 1300 observation periods. We have used the built-in UnRisk tool for the parameter calibration, which is well integrated with Mathematica (version used: Mathematica 11.3). Further, we used the calibrated parameters for the construction of a Hull–White model.

We have computed the model parameters as explained in Appendix C. The yield curve simulation is the first step to compute the model parameters. Based on the procedure described in Appendix B, we have performed the bootstrapping process for the recommended holding period of 10 years, i.e., for the maturity of the floater. The collected historical data has 21 tenor points and 1306 observation periods as follows (D: Day, M: Month, Y: Year):

\begin{aligned} \begin{aligned} &m := \{1D, 3M, 6M, 1Y, 2Y, 3Y, \ldots, 10Y, 12Y, 15Y, 20Y, 25Y, 30Y, 40Y, 50Y\}, \\ &n := \{\text{1306 daily interest rates at each tenor point}\}. \end{aligned} \end{aligned}

The $$10{,}000$$ simulated yield curves for 10 years into the future are presented in Fig. 2. For the floater example, we need parameter values only up to the 10Y tenor point (maturity of the floater). Henceforth, we consider the simulated yield curves with only the first 13 tenor points. The calibration generates the real parameter space of dimension $$\mathbb{R}^{10{,}000\times 13}$$ for the parameter $$a(t)$$. We considered the constant volatility $$\sigma =0.006$$ and the constant mean reversion $$b=0.015$$. All variable parameters are assumed to be piecewise constant between the tenor points $$(0-3M,3M-6M,6M-1Y,1Y-2Y,2Y-3Y,\ldots,9Y-10Y)$$. Figure 3 shows $$10{,}000$$ different piecewise constant parameters $$a(t)$$. The computational domain for the spatial dimension r is restricted as described in Appendix D. Here, $$r_{\mathrm{low}} = -0.1$$ and $$r_{\mathrm{up}}= 0.1$$. We applied homogeneous Neumann boundary conditions of the form

\begin{aligned} \frac{\partial V}{\partial r}\bigg|_{r_{\mathrm{low}}} = 0, \qquad\frac{\partial V}{\partial r}\bigg|_{r_{\mathrm{up}}} = 0. \end{aligned}
(12)

We divided the spatial domain into $$M = 600$$ equidistant grid points $$\{r(1),r(2), \dots,r(M)\}$$ and used N time points (measured in days) starting from $$t=0$$ until maturity T; in our case, the number of days until maturity is assumed to be $$3600\approx 10Y$$ with an interval $$\tau =1$$. Rewriting (2), for a given final value of V at the last tenor point, we obtain

\begin{aligned} A\bigl(\rho _{\ell }(t)\bigr)V^{n-1} = B\bigl(\rho _{\ell }(t)\bigr)V^{n}. \end{aligned}

We can apply the first boundary condition in (12) by updating the first and the last rows ($$A_{1}$$ and $$A_{M}$$) of the matrix $$A(\rho _{\ell })$$ which yields

\begin{aligned} A_{1} = (-1, 1, 0, \dots, 0) \quad\mathrm{and} \quad A_{M} = (0, \dots,0,1,-1). \end{aligned}

The second Neumann boundary condition can be applied by changing the last entry of the vector $$BV^{n}$$ to zero. Starting at $$t=T$$ with the known terminal condition $$V(T)$$ as the principal amount, at each time step, we solve the system of linear equations (2).
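The boundary treatment and one backward time step can be sketched as follows. The assembly of A and B from (2) is omitted, `neumann_step` is a hypothetical helper, and both first and last right-hand side entries are zeroed so that the one-sided differences enforce zero flux on both ends; a dense solve is used for brevity, although the matrices are tridiagonal and a banded solver would be preferable:

```python
import numpy as np

def neumann_step(A, B, V_n):
    """One backward step A V^{n-1} = B V^n with homogeneous Neumann conditions (12).

    The first and last rows of A are replaced by one-sided differences
    A_1 = (-1, 1, 0, ..., 0) and A_M = (0, ..., 0, 1, -1), and the matching
    entries of the right-hand side B V^n are set to zero.
    """
    A = A.copy()
    A[0, :] = 0.0
    A[0, 0], A[0, 1] = -1.0, 1.0
    A[-1, :] = 0.0
    A[-1, -2], A[-1, -1] = 1.0, -1.0
    rhs = B @ V_n
    rhs[0] = 0.0                     # zero flux at r_low
    rhs[-1] = 0.0                    # zero flux at r_up
    # dense solve for brevity; A is tridiagonal, so a banded solver is preferable
    return np.linalg.solve(A, rhs)
```

Marching this step from $$t=T$$ down to $$t=0$$, with coupons added every quarter, reproduces the valuation sweep described above.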

Note that we need to update the value of the grid point $$r(i)$$ every three months as the coupon frequency is quarterly by adding coupon $$f^{n}$$ based on the coupon rate given by (11). To calculate the value at the intermediate holding period, we calculate the fair value of the floater and add it to the accreted coupons.

We have implemented the parametric model reduction approach for the floater example, as discussed in Sect. 4 using both the classical and the adaptive greedy sampling algorithms.

At each iteration of the classical greedy sampling approach, the algorithm constructs a reduced basis via Algorithm 1. We have specified a maximum number of 40 pre-defined candidates to construct a set $$\hat{\mathcal{P}}$$ and a maximum number of iterations $$I_{\mathrm{max}}=10$$. The progression of the maximal and average residuals with each iteration of the greedy algorithm is presented in Fig. 4. It is observed that the maximal residual error typically decreases in the process and the proposed greedy algorithm efficiently locates the optimal parameter groups and constructs the desired reduced basis Q. Furthermore, we tested the effect of change in the cardinalities of the set $$\hat{\mathcal{P}}$$. The proposed algorithm is applied with three different cardinalities of $$\hat{\mathcal{P}}$$: $$|\hat{\mathcal{P}_{1}}| = 20$$, $$|\hat{\mathcal{P}_{2}}|= 30$$, $$|\hat{\mathcal{P}_{3}}| = 40$$. Note that we have constructed $$\hat{\mathcal{P}}$$ by randomly selecting the parameter groups from the parameter space $$\mathcal{P}$$. Figure 5 shows the plot of the maximal residual against the number of iterations for three different cardinalities.

It is evident that with an increasing number of candidates the maximal residual error decreases, and already a cardinality of 20 for $$\hat{\mathcal{P}}$$ yields a sufficient decrease. Figure 6 illustrates the relative error between the full model and the reduced-order model for two different parameter groups. During our tests we noticed that there are some parameter groups (e.g., $$\rho _{238}$$) for which the reduced model gives unsatisfactory results.

One can observe that the reduced model for the parameter group $$\rho _{238}$$ (dashed line) shows inferior results as compared to the reduced model for the parameter group $$\rho _{786}$$ (solid line). Even an increase in the reduced dimension d does not improve the quality of the result substantially.

This reveals that the selection of trial candidates by random sampling may neglect parameter groups corresponding to the significant error.

To overcome this drawback, we have also implemented the adaptive greedy sampling approach for the floater example. At each greedy iteration, the algorithm locates $$c_{k} = 10$$ parameter groups adaptively using the surrogate modeling technique, as described in Sect. 4.2. We have fixed the maximum number of elements within the parameter set to 40. Furthermore, the adaptively obtained parameter set $$\hat{\mathcal{P}}$$ has been used to locate the optimal parameter group $$\rho _{I}$$. These steps are repeated for a maximum of $$I_{\mathrm{max}} = 10$$ iterations or until convergence. The algorithm has been initiated by selecting $$c_{0} = 20$$ random parameter groups.

The optimal parameter group updates the snapshot matrix, and consecutively the algorithm generates a new reduced basis at each greedy iteration. Figure 7 shows the evolution of maximum and average residual errors with each iteration of the adaptive greedy algorithm.

The residual error decreases with each iteration, and hence the algorithm succeeds in locating the optimal parameter groups efficiently. To monitor the convergence of the adaptive greedy algorithm, we have designed an error model ē for the relative error e as a function of the residual error ε. Figure 8 shows the designed error model compared to the available error set $$E_{p}$$ for four different greedy iterations. The error plot exhibits a strong correlation between the relative error and the residual error. The results indicate that a linear error model is satisfactory to capture the overall behavior of the exact error as a function of the residual error. We have used the reduced basis obtained from the adaptive greedy sampling procedure to design the reduced model. Figure 9 presents the relative error plot for the parameter groups $$\rho _{238}$$ and $$\rho _{786}$$. We see that the adaptive greedy approach gives better results than the classical greedy method. With a reduced dimension of $$d=6$$, we obtained an excellent result, as the relative error is less than $$10^{-3}$$.

### Computational cost

In the case of the classical greedy sampling approach, the algorithm solves c reduced models and one full model at each greedy iteration. It also computes a truncated SVD of the updated snapshot matrix at each iteration. Let $$t_{\mathrm{RM}}$$ be the time required to solve one reduced model, $$t_{\mathrm{FM}}$$ the computational time required for one full model, and $$t_{\mathrm{SVD}}$$ the time required to obtain a truncated SVD of the snapshot matrix. The total computational time $$T_{Q}^{CG}$$ required to obtain the reduced basis after i steps of the classical greedy sampling algorithm can be given as

\begin{aligned} T_{Q}^{CG} \approx \bigl[ c \times t_{\mathrm{RM}} + (t_{\mathrm{FM}} + t_{\mathrm{SVD}}) \bigr] \times i. \end{aligned}

Similarly, in the case of the adaptive greedy approach, the total computational time $$T_{Q}^{AG}$$ can be given as

\begin{aligned} T_{Q}^{AG} \approx \bigl[ c_{0}\times t_{\mathrm{RM}} + \underbrace{k\bigl(c_{k} \times t_{\mathrm{RM}} + t_{\mathrm{SM}} + t_{\mathrm{SM}}^{ev}\bigr)}_{t_{\rho _{I}}} + t_{\mathrm{FM}} + t_{\mathrm{SVD}} + 2t_{\mathrm{RM}}^{\mathrm{aft},\mathrm{bef}} + t_{\mathrm{EM}} \bigr] \times i, \end{aligned}

where $$t_{\mathrm{SM}}$$ and $$t_{\mathrm{SM}}^{ev}$$ denote the computational times required to build and to evaluate a surrogate model for the entire parameter space, respectively, and $$t_{\mathrm{EM}}$$ is the time required to build an error model. The term $$2t_{\mathrm{RM}}^{\mathrm{aft},\mathrm{bef}}$$ is the computational time needed to solve the reduced model before and after updating the reduced basis.
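The two timing estimates translate directly into code; the following sketch (function and argument names hypothetical) mirrors the formulas for $$T_{Q}^{CG}$$ and $$T_{Q}^{AG}$$:

```python
def classical_greedy_time(t_rm, t_fm, t_svd, c, iters):
    """T_Q^CG: c reduced solves, one full solve and one SVD per greedy iteration."""
    return (c * t_rm + t_fm + t_svd) * iters

def adaptive_greedy_time(t_rm, t_fm, t_svd, t_sm, t_sm_ev, t_em, c0, ck, k, iters):
    """T_Q^AG: the k surrogate rounds replace the fixed candidate set."""
    t_rho = k * (ck * t_rm + t_sm + t_sm_ev)      # cost of locating rho_I
    return (c0 * t_rm + t_rho + t_fm + t_svd + 2 * t_rm + t_em) * iters
```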

Table 4 compares the computing times required to generate the reduced basis Q for different sets $$\hat{\mathcal{P}}$$ in the case of the classical greedy sampling approach with the time needed to obtain the reduced basis in the case of the adaptive greedy sampling approach. The computing time $$t_{\rho _{I}}$$ required to locate the optimal parameter group by constructing a surrogate model for one greedy iteration is approximately 8 seconds; $$t_{\rho _{I}}$$ is simply the time required to complete the while loop outlined in Algorithm 4 for a single greedy iteration. Thus, the total time spent on generating the reduced basis via surrogate modeling is $$I_{\mathrm{max}} \times t_{\rho _{I}} = 78.56$$ s, assuming the adaptive greedy algorithm runs for $$I_{\mathrm{max}}$$ iterations. Figure 8 shows that we can truncate the algorithm after 4 or 5 iterations, as the residual error falls below $$10^{-4}$$. The computing times required to simulate reduced models and full models are presented in Table 5. The time required to solve the complete system with a parameter space of $$10{,}000 \times m$$ for both a full and a reduced model is given in the total time column.

We can see that the evaluation time required for the reduced model is at least 18–20 times lower than that of the full model. However, there is a slight increase in total time due to the addition of the reduction time $$T_{Q}$$. One can also observe that with an increase in the dimension d of the reduced model, the evaluation time increases significantly. The reduced model with the classical greedy sampling approach is at least 10–11 times faster than the full model. The time required to simulate the reduced model with the adaptive greedy sampling approach is somewhat higher due to the time invested in building surrogate and error models. Despite that, the reduced model is at least 8–9 times faster than the full model. The computational times presented in both tables assume that the greedy algorithms run for the maximal number of iterations $$I_{\mathrm{max}}$$. However, we can truncate the algorithms after 4 or 5 iterations, i.e., in practice we can achieve an even higher speedup than discussed here.

### Floater performance scenario values

To design a key information document, we need the values of the floater at different spot rates. The spot rate $$r_{sp}$$ is the yield at the first tenor point of the simulated yield curve. The value of the floater at the spot rate $$r_{sp}$$ is simply its value at the short rate $$r=r_{sp}$$. For $$10{,}000$$ simulated yield curves, we get $$10{,}000$$ different spot rates and the corresponding values of the floater. At the recommended holding period, we determine a VaR equivalent volatility to calculate a market risk indicator. For the floater example, the calculated VEV is 0.3989%, which yields a market risk indicator of 1.

The $$10{,}000$$ values are further used to calculate three different scenarios: (i) favorable scenario, (ii) moderate scenario, (iii) unfavorable scenario, which are the values at the 90th, 50th, and 10th percentiles of the $$10{,}000$$ values, respectively. Table 6 shows the floater performance scenario results obtained using the reduced-order model and, for comparison, the commercially available UnRisk software. Figure 10 shows the distribution of fair values of the floater, plus the coupons accrued along the respective path, after five years and ten years. At five years, the main contribution to price variability arises from discounting the future cash flows (between five and ten years) with a discount factor arising from log-normally distributed rates. Therefore, we have (in our example) a left-skewed distribution at five years and a right-skewed distribution at ten years.
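The scenario extraction itself is a percentile computation over the simulated values; a minimal sketch with hypothetical names:

```python
import numpy as np

def performance_scenarios(values):
    """Favorable / moderate / unfavorable scenario values from the simulated
    distribution: the 90th, 50th, and 10th percentiles, respectively."""
    fav, mod, unf = np.percentile(values, [90, 50, 10])
    return {"favorable": fav, "moderate": mod, "unfavorable": unf}
```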

## Conclusion

This paper presents a parametric model reduction approach for a discretized convection-diffusion-reaction PDE that arises in the analysis of financial risk. The parameter space with time-dependent parameters is generated via the calibration of financial models based on market structures. A finite difference method has been implemented to solve this PDE. The selection of parameters to obtain the reduced basis is of utmost importance. We have established a greedy approach for parameter sampling, and we noticed that there are some parameter groups for which the classical greedy sampling approach gave unsatisfactory results. To overcome this drawback, we have applied an adaptive greedy sampling method using a surrogate model for the error estimator that is constructed for the entire parameter space and further used to locate the parameters which most likely maximize the error estimator. The surrogate model is built using the principal component regression technique. We tested the designed algorithms for a numerical example of a floater with cap and floor solved using the Hull–White model. The results indicate the computational advantage of the parametric model reduction technique for the short-rate models. A reduced model of dimension $$d=6$$ was enough to reach an accuracy of 0.01%. The reduced model was at least 10–12 times faster than the full model. The developed model order reduction approach shows potential applications in the historical or Monte Carlo value at risk calculations as well, where a large number of simulations need to be performed for the underlying instrument.

## Outlook

In this paper, we solved the numerical example of the floater with cap and floor based on the Hull–White model, and the obtained results show that the model order reduction approach works well. However, there are more complicated financial instruments under Category 3 of the PRIIPs regulation, e.g., instruments with Bermudan or American callabilities, callable/puttable steepeners, target redemption notes, and snowballs. The coupons of a steepener depend on the difference between two rates, while in the case of a snowball, a new coupon depends on the previous coupon. To price such instruments, we need to consider multi-factor models like the two-factor Hull–White model. These challenges demand exploring the developed model order reduction algorithms for more complicated financial models. In the future, we will implement the developed approaches for financial instruments such as the steepener and the snowball.

## Availability of data and materials

The interest rate data that support the findings of this study are available from MathConsult within their UnRisk Omega datasets but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available.

## Abbreviations

VaR:

Value-at-Risk

CVA:

credit value adjustment

DVA:

debit value adjustment

PRIIPs:

packaged retail and insurance-based investment products

EU:

European Union

KID:

Key information document

RHP:

recommended holding period

PDE:

partial differential equation

POD:

proper orthogonal decomposition

MOR:

model order reduction

PCR:

principal component regression

DAX:

Deutscher Aktienindex

VEV:

VaR equivalent volatility

DF:

discount factor

MRM:

market risk indicator

SVD:

Singular value decomposition

PCA:

principal component analysis

CG:

classical greedy

AG:

adaptive greedy

## References

1. Aichinger M, Binder A. A workout in computational finance. 1st ed. West Sussex: Wiley; 2013.

2. Jorion P. Financial risk manager handbook. 2nd ed. New York: Wiley; 2007.

3. Wilmott P, Howison S, Dewynne J. The mathematics of financial derivatives: a student introduction. 1st ed. London: Cambridge University Press; 2002.

4. Jorion P. Counterparty credit risk and credit value adjustment: a continuing challenge for global financial markets. 2nd ed. New York: Wiley; 2012.

5. European Commission. Commission delegated regulation (EU) 1286/2014. Off J EU. 2014;1:1–23.

6. European Commission. Commission delegated regulation (EU) 2017/653. Off J EU. 2017;1:1–52.

7. Heston S. A closed-form solution for options with stochastic volatility with applications to bond and currency options. Rev Financ Stud. 1993;6(2):327–43.

8. Longstaff FA, Schwartz ES. Valuing American options by simulation: a simple least-squares approach. Rev Financ Stud. 2001;14(1):113–47.

9. Rebonato R, McKay K, White R. The SABR/LIBOR market model: pricing, calibration and hedging for complex interest-rate derivatives. 1st ed. New York: Wiley; 2009.

10. Briani M, Caramellino L, Zanette A. A hybrid tree/finite-difference approach for Heston–Hull–White-type models. J Comput Finance. 2017;21(3).

11. Falcó A, Navarro L, Cendón CV. Finite difference methods for Hull–White pricing of interest rate derivatives with dynamical consistent curves. SSRN Electron J. 2014.

12. Haentjens T, In’t Hout K. Alternating direction implicit finite difference schemes for the Heston–Hull–White partial differential equation. J Comput Finance. 2012;16:83–110.

13. Brigo D, Mercurio F. Interest rate models—theory and practice. 2nd ed. Berlin: Springer; 2006.

14. Hull J, White A. Numerical procedures for implementing term structure models I: single-factor models. J Deriv. 1994;2:7–16.

15. Black F, Karasinski P. Bond and option pricing when short rates are lognormal. Financ Anal J. 1991;47(4):52–9.

16. Hull J, White A. The general Hull–White model and supercalibration. Financ Anal J. 2001;57(6):34–43.

17. Berkooz G, Holmes P, Lumley JL. The proper orthogonal decomposition in the analysis of turbulent flows. Annu Rev Fluid Mech. 1993;25(1):539–75.

18. Chatterjee A. An introduction to the proper orthogonal decomposition. Curr Sci. 2000;78:808–17.

19. Graham MD, Kevrekidis IG. Alternative approaches to the Karhunen–Loève decomposition for model reduction and data analysis. Comput Chem Eng. 1996;20(5):495–506.

20. Jolliffe I. Principal component analysis. 1st ed. Berlin: Springer; 2014.

21. Graham MD, Kevrekidis IG. Proper orthogonal decomposition and its applications part I: theory. J Sound Vib. 2002;252(3):527–44.

22. Rathinam M, Petzold LR. A new look at proper orthogonal decomposition. SIAM J Numer Anal. 2003;41(5):1893–925.

23. Vidal A, Sakrison D. On the optimality of the Karhunen–Loève expansion (corresp.). IEEE Trans Inf Theory. 1969;15(2):319–21.

24. Sirovich L. Turbulence and the dynamics of coherent structures. Part I: coherent structures. Q Appl Math. 1987;45(3):561–71.

25. Lucia JD, Beran PS, Silva WA. Reduced order modeling: new approaches for computational physics. Prog Aerosp Sci. 2004;40(1–2):51–117.

26. Amsallem D, Zahr M, Choi Y. Design optimization using hyper-reduced-order models. Struct Multidiscip Optim. 2015;51(4):919–40.

27. Grepl M, Patera A. A posteriori error bounds for reduced-basis approximations of parametrized parabolic partial differential equations. M2AN, Math Model Numer Anal. 2005;39:157–81.

28. Paul-Dubois-Taine A, Amsallem D. An adaptive and efficient greedy procedure for the optimal training of parametric reduced-order models. Int J Numer Methods Eng. 2014;102:1262–92.

29. Bui-Thanh T, Willcox K, Ghattas O. Model reduction for large-scale systems with high-dimensional parametric input space. SIAM J Sci Comput. 2008;30(6):3270–88.

30. Veroy K, Patera A. Certified real-time solution of the parametrized steady incompressible Navier–Stokes equations: rigorous reduced-basis a posteriori error bounds. Int J Numer Methods Fluids. 2005;47:773–88.

31. Lee H, Park Y, Lee S. Principal component regression by principal component selection. Commun Stat Appl Methods. 2015;22(2):173–80.

32. Hull J, White A. Pricing interest-rate-derivative securities. Rev Financ Stud. 1990;3(4):573–92.

33. Fabozzi F. Valuation of fixed income securities and derivatives. 3rd ed. New York: Wiley; 1998.

34. Binder A, Jadhav O, Mehrmann V. Model order reduction for parametric high dimensional models in the analysis of financial risk [Technical Report]. 2020. arXiv.

35. European Supervisory Authorities [pdf]; 2017. Available from: https://esas-joint-committee.europa.eu/Publications/Technical

36. Hull J, White A. One-factor interest-rate models and the valuation of interest-rate derivative securities. J Financ Quant Anal. 1993;28(2):235–54.

37. Anderson D, Tannehill JC, Pletcher RH. Computational fluid mechanics and heat transfer. 3rd ed. London: CRC Press; 2013.

38. Darbellay GA. A note on the volatility term structure in short rate models. J Appl Math Mech. 1998;78:885–6.

39. Heinrich JC, Huyakorn PS, Zienkiewicz OC, Mitchell AR. An ‘upwind’ finite element scheme for two-dimensional convective transport equation. Int J Numer Methods Eng. 1977;11:131–43.

40. Sun G, Trueman CW. Efficient implementations of the Crank–Nicolson scheme for the finite-difference time-domain method. IEEE Trans Microw Theory Tech. 2006;54(5):2275–84.

41. Golub GH, Reinsch C. Singular value decomposition and least squares solutions. Numer Math. 1970;14:403–20.

42. Williams MO, Schmid PJ, Kutz JN. Hybrid reduced-order integration with proper orthogonal decomposition and dynamic mode decomposition. SIAM J Multiscale Model Simul. 2013;11(2):522–44.

43. Pinnau R. Model reduction via proper orthogonal decomposition. In: Schilders W, van der Vorst H, Rommes J, editors. Model order reduction: theory, research aspects and applications. Berlin: Springer; 2008. p. 95–109.

44. James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning. 1st ed. New York: Springer; 2013.

45. Binder A. A clever handful is enough. Wilmott. 2007. 10–14.

46. MathConsult. Calibration of interest rate models. Linz: MathConsult GmbH; 2009.

47. Dudley RM. Uniform central limit theorems. 1st ed. New York: Cambridge University Press; 2010.

48. Corhay A, Rad AT. Statistical properties of daily returns: evidence from European stock markets. J Bus Finance Account. 1994;21(2):271–82.

49. Albrecher H, Binder A, Lautscham V, Mayer P. Introduction to quantitative methods for financial markets. 1st ed. Berlin: Springer; 2010.

50. Mahnke R, Kaupuzs J, Lubashevsky I. Physics of stochastic processes: how randomness acts in time. 1st ed. Weinheim: Wiley-VCH Verlag GmbH and Co.; 2008.

51. Grigoriu M. Stochastic calculus: applications in science and engineering. 1st ed. Basel: Birkhäuser; 2002.

52. Vasicek O. An equilibrium characterization of the term structure. J Financ Econ. 1977;5(2):177–88.

53. Cox JC, Ingersoll Jr JE, Ross SA. A theory of the term structure of interest rates. Econometrica. 1985;53(2):385–407.

54. Shreve SE. Stochastic calculus and finance. 1st ed. New York: Springer; 2004.

55. Engl H. Calibration problems–An inverse problems view. Wilmott. 2007. 16–20.

## Acknowledgements

We thank our colleagues Michael Schwaiger and Diana Hufnagl from MathConsult who provided insight and expertise that greatly assisted the research.

## Funding

This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement No. 765374. Open Access funding enabled and organized by Projekt DEAL.

## Author information


### Contributions

OJ and VM conceived the presented idea of the model order reduction. OJ developed the theoretical formalism and performed the numerical simulations. AB wrote the introduction and provided valuable insights for the financial part. VM encouraged OJ to investigate surrogate modeling and supervised the findings of the work. AB and VM supervised the project. All authors discussed the results and contributed to the final manuscript. All authors read and approved the final manuscript.

## Ethics declarations

Not applicable.

### Competing interests

The authors declare that they have no competing interests.


## Appendices

### Appendix A: Derivation of the model

In this appendix, we derive the model for a risk-less investment with a continuous profit at a low risk-free rate, following a standard derivation. It is based on the same principle of constructing a risk-free portfolio as the seminal Black–Scholes theory.

Let $$B(t)$$ be the value of a bank account at time $$t \geq 0$$. We assume that the bank account evolves according to the differential equation

\begin{aligned} dB(t) = B(t) r(t) \,dt, \end{aligned}

with initial value $$B(0)=1$$, where $$r(t)$$ is the short-rate, i.e. the growth rate of the bank account B within a small time interval $$(t,t+dt)$$. This leads to the formula

\begin{aligned} B(t) = \operatorname{exp} \biggl( \int _{0}^{t} r(\tau ) \,d\tau \biggr). \end{aligned}

When working with interest-rate products, the study of the variability of interest rates is essential. Therefore, the short-rate is modelled as a stochastic process. Let $$S_{n}$$ be the price of a stock at the end of the nth trading day. The daily return from day n to $$n+1$$ is given by $$(S_{n+1}-S_{n})/S_{n}$$. In general, it is common to work with log returns, since the log return over k days can be easily computed by adding up the daily log returns,

\begin{aligned} \operatorname{log}(S_{k}/S_{0}) = \operatorname{log}(S_{1}/S_{0}) + \cdots + \operatorname{log}(S_{k}/S_{k-1}). \end{aligned}

Based on the assumption that the log returns over disjoint time intervals are stochastically independent and identically distributed, the central limit theorem implies that the log returns are normally distributed. These properties can be realized by a standard Brownian motion $$W(t)$$, i.e., a family of random variables $$W(t)$$, indexed by $$t\geq 0$$, with the properties

• $$W(0)=0$$.

• With probability 1, the function $$W(t)$$ is continuous in t.

• For $$t\geq 0$$, the increment $$W(t+\tau )-W(\tau )$$ is normally distributed with mean 0 and variance t, i.e.,

\begin{aligned} W(t+\tau ) - W(\tau ) \sim N(0,t). \end{aligned}
• For all N and times $$t_{0} < t_{1} < \cdots < t_{N-1} < t_{N}$$, the increments $$W({t_{j}})-W({t_{j-1}})$$ are stochastically independent.
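As a side illustration, these four properties translate directly into a discrete sampling scheme: a path of $$W(t)$$ is the cumulative sum of independent $$N(0,dt)$$ increments. A minimal sketch (the grid, seed, and horizon are arbitrary choices, not part of the derivation):

```python
import numpy as np

def brownian_path(T=1.0, n_steps=1000, seed=0):
    """Sample W on [0, T] from independent N(0, dt) increments."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Increments W(t + dt) - W(t) ~ N(0, dt), mutually independent.
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    W = np.concatenate([[0.0], np.cumsum(dW)])  # enforces W(0) = 0
    return W

W = brownian_path()
```

By construction, disjoint increments of the sampled path are independent Gaussians, matching the listed properties.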

These properties lead to the Langevin equation, a stochastic differential equation

\begin{aligned} \frac{dX(t)}{dt} = q(t)X(t), \end{aligned}

with initial condition $$X(0) = X_{0}$$ with probability 1, where the stochastic parameter $$q(t)$$ is given by

\begin{aligned} q(t) = f\bigl(r(t),t\bigr) + g\bigl(r(t),t\bigr)w (t). \end{aligned}

Here $$w(t)$$ is a white noise process, and $$f,g$$ are given functions of the interest rate $$r(t)$$. This leads to

\begin{aligned} \frac{dX(t)}{dt} = f\bigl(r(t),t\bigr)X(t) + g\bigl(r(t),t\bigr)X(t)w (t). \end{aligned}
(13)

The force $$w(t)=dW(t)/dt$$ is a fluctuating quantity with Gaussian distribution. Substituting $$dW(t) = w(t) \,dt$$ in (13), we get

\begin{aligned} dX(t) = f\bigl(r(t),t\bigr)X(t)\,dt + g\bigl(r(t),t\bigr)X(t)\,dW(t) \end{aligned}

and we obtain an SDE for the short-rate $$r(t)$$ via

\begin{aligned} dr(t) = f\bigl(t,r(t)\bigr)\,dt + g\bigl(t,r(t)\bigr)\,dW(t). \end{aligned}

Based on the Itô lemma, we can derive a general PDE for any instrument depending on the short-rate. Consider a risk-neutral portfolio $$\Pi (t)$$ that depends on the short-rate $$r(t)$$ and consists of two interest rate instruments $$V_{1}$$ and $$V_{2}$$ with different maturities $$T_{1}$$ and $$T_{2}$$, respectively. Suppose that there are $$\Delta = (\frac{\partial V_{1}}{\partial r(t)} / \frac{\partial V_{2}}{\partial r(t)} )$$ units of the instrument $$V_{2}$$. For an infinitesimal time interval, the value change of the portfolio is $$d\Pi (t) = \Delta \,dV_{2} - dV_{1}$$. To avoid arbitrage, we consider a risk-free rate, which gives

\begin{aligned} d\Pi (t) = \Delta\, dV_{2} + (V_{1} - \Delta V_{2})r(t) \,dt - dV_{1}, \end{aligned}

and, expanding $$dV_{1}$$ and $$dV_{2}$$ via the Itô lemma, we obtain

\begin{aligned} \begin{aligned} d\Pi (t) = {}&(V_{1} - \Delta V_{2})r(t)\,dt \\ &{}- \biggl[ \biggl(\frac{\partial V_{1}}{\partial r(t)}f\bigl(r(t),t\bigr) + \frac{\partial V_{1}}{\partial t} + \frac{1}{2} \frac{\partial ^{2} V_{1}}{\partial r^{2}_{t}} g^{2}\bigl(r(t),t\bigr) \biggr) \,dt + \frac{\partial V_{1}}{\partial r(t)} g\bigl(r(t),t\bigr)\,dW(t) \biggr] \\ &{}+ \Delta \biggl[ \biggl(\frac{\partial V_{2}}{\partial r(t)}f\bigl(r(t),t\bigr) + \frac{\partial V_{2}}{\partial t} + \frac{1}{2} \frac{\partial ^{2} V_{2}}{\partial r^{2}_{t}} g^{2} \bigl(r(t),t\bigr) \biggr) \,dt + \frac{\partial V_{2}}{\partial r(t)} g\bigl(r(t),t\bigr)\,dW(t) \biggr]. \end{aligned} \end{aligned}

Assuming a zero net investment requirement, i.e., $$d\Pi (t) = 0$$, we obtain

\begin{aligned} \begin{aligned} 0 ={}& \biggl[V_{1} - \biggl( \frac{\partial V_{1}}{\partial r(t)} /\frac{\partial V_{2}}{\partial r(t)} \biggr) V_{2} \biggr] r(t)\,dt \\ &{}- \biggl[ \biggl(\frac{\partial V_{1}}{\partial r(t)}f\bigl(r(t),t\bigr) + \frac{\partial V_{1}}{\partial t} + \frac{1}{2} \frac{\partial ^{2} V_{1}}{\partial r^{2}_{t}} g^{2}\bigl(r(t),t\bigr) \biggr) \,dt + \frac{\partial V_{1}}{\partial r(t)} g\bigl(r(t),t\bigr)\,dW(t) \biggr] \\ &{}+ \biggl(\frac{\partial V_{1}}{\partial r(t)} / \frac{\partial V_{2}}{\partial r(t)} \biggr) \biggl[ \biggl( \frac{\partial V_{2}}{\partial r(t)}f\bigl(r(t),t\bigr) + \frac{\partial V_{2}}{\partial t} + \frac{1}{2} \frac{\partial ^{2} V_{2}}{\partial r^{2}_{t}} g^{2}\bigl(r(t),t\bigr) \biggr) \,dt\\ &{} + \frac{\partial V_{2}}{\partial r(t)} g\bigl(r(t),t\bigr)\,dW(t) \biggr]. \end{aligned} \end{aligned}

Eliminating the stochastic term, we obtain

\begin{aligned} \begin{aligned} & \biggl[V_{1} - \biggl(\frac{\partial V_{1}}{\partial r(t)} / \frac{\partial V_{2}}{\partial r(t)} \biggr) V_{2} \biggr] r(t)\,dt \\ &\quad= \biggl[ \frac{\partial V_{1}}{\partial t} + \frac{1}{2} \frac{\partial ^{2} V_{1}}{\partial r^{2}_{t}} g^{2}\bigl(r(t),t\bigr) - \biggl(\frac{\partial V_{1}}{\partial r(t)} / \frac{\partial V_{2}}{\partial r(t)} \biggr) \biggl( \frac{\partial V_{2}}{\partial t} + \frac{1}{2} \frac{\partial ^{2} V_{2}}{\partial r^{2}_{t}} g^{2}\bigl(r(t),t\bigr) \biggr) \biggr]\,dt. \end{aligned} \end{aligned}

Rearranging the terms, we get

\begin{aligned} \frac{\frac{\partial V_{1}}{\partial t} + \frac{1}{2} \frac{\partial ^{2} V_{1}}{\partial r(t)^{2}} g^{2}(r(t),t) - r(t)V_{1}}{\frac{\partial V_{1}}{\partial r(t)}} = \frac{\frac{\partial V_{2}}{\partial t} + \frac{1}{2} \frac{\partial ^{2} V_{2}}{\partial r(t)^{2}} g^{2}(r(t),t) - r(t)V_{2}}{\frac{\partial V_{2}}{\partial r(t)}}=:u\bigl(r(t),t\bigr), \end{aligned}

and we obtain a PDE for the financial instrument $$V_{1}$$ depending on $$r(t)$$ given by

\begin{aligned} \frac{\partial V_{1}}{\partial t} + \frac{1}{2} g^{2}\bigl(r(t),t\bigr) \frac{\partial ^{2} V_{1}}{\partial r(t)^{2}} - u\bigl(r(t),t\bigr) \frac{\partial V_{1}}{\partial r(t)} - r(t)V_{1} = 0. \end{aligned}

There exist several well-known one-factor short-rate models, such as the Vasicek model , the Cox–Ingersoll–Ross model , or the Hull–White model [32, 36] which is an extension of the Vasicek model. The stochastic differential equation in the Hull–White model is given as

\begin{aligned} dr(t) = \bigl(a(t) - b(t)r(t)\bigr)\,dt + \sigma (t)\,dW(t), \end{aligned}

with time-dependent parameters $$a(t)$$, $$b(t)$$, and $$\sigma (t)$$. The term $$(a(t) - b(t)r(t))$$ is the drift term, and $$a(t)$$ is known as the deterministic drift. Setting $$g(r(t),t) = \sigma (t)$$ and $$-u(r(t),t) = a(t) - b(t)r(t)$$ casts the Hull–White model into the general PDE derived above.

The Hull–White model is calibrated based on today’s ($$t_{0}$$) market data for bond prices $$B(t,T)$$. To project the parameters $$b(t)$$ and $$\sigma (t)$$ into the future ($$t_{1}$$), one could use either $$B(t_{1},T+(t_{1} - t_{0}))$$ or $$B(t_{1},T)$$. In the first case, the shape of the parameters remains unchanged and does not cover seasonalities or expected changes such as monetary policy, while in the second case, we lose the information concentrated on the short end. When b and σ are constants, both approaches deliver the same parameters.
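As a toy illustration (not the calibration procedure used in this paper), the Hull–White short-rate dynamics can be simulated with an Euler–Maruyama sketch; the constant parameter values and the toy drift function below are illustrative assumptions:

```python
import numpy as np

def hull_white_paths(a, b=0.05, sigma=0.01, r0=0.02, T=5.0,
                     n_steps=500, n_paths=1000, seed=1):
    """Euler-Maruyama sketch of dr = (a(t) - b r) dt + sigma dW.

    `a` is a callable t -> a(t); all parameter values are illustrative.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    r = np.full(n_paths, r0)
    for n in range(n_steps):
        t = n * dt
        # Independent Gaussian increments dW ~ N(0, dt) for every path.
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        r = r + (a(t) - b * r) * dt + sigma * dW
    return r

# Constant deterministic drift a(t) = 0.002 as a toy choice.
r_T = hull_white_paths(lambda t: 0.002)
```

The sample mean of `r_T` drifts toward the mean-reversion level $$a/b$$, illustrating the role of the drift term $$(a(t)-b r(t))$$.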

### Appendix B: Yield curve simulation

The PRIIP regulation demands that the yield curve simulation be performed at least $$10{,}000$$ times, and that the data set contain at least 2 years of daily interest rates for an underlying instrument, 4 years of weekly interest rates, or 5 years of monthly interest rates. We construct a data matrix $$D \in \mathbb{R}^{n\times m}$$ of the collected historical interest rate data, where each row of the matrix forms a yield curve and the columns represent the m tenor points, which are the different contract lengths of an underlying instrument. For example, if we have collected daily interest rate data at $$m\approx 20$$ tenor points over the past five years, then, since a year has approximately 260 working days, one obtains $$n \approx 1306$$ observation periods.

The regulations demand to take the natural logarithm of the ratio between the interest rate at each observation period and the interest rate at the preceding period. To ensure that we can form the natural logarithm, all elements of the data matrix D must be positive, which is achieved by adding a correction term. With $$\mathcal{W}$$ the matrix of all ones, we set $$\bar{D} = D + \gamma \mathcal{W}$$, where γ is chosen so that all elements of the matrix $$\bar{D}$$ are positive. The shift γ is compensated at the bootstrapping stage by subtracting it from the simulated rates. Then we calculate the log returns over each period and store them in a new matrix $$\hat{D} = [\hat{d}_{ij}] \in \mathbb{R}^{n\times m}$$ as

\begin{aligned} \hat{d}_{ij} = \operatorname{ln} \biggl( \frac{\bar{d}_{ij}}{\bar{d}_{i-1,j}} \biggr). \end{aligned}
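A minimal sketch of this preprocessing step, on toy data (the matrix D, its dimensions, and the choice of γ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1306, 20                      # observation periods x tenor points
D = rng.normal(0.01, 0.005, (n, m))  # toy yield data (may contain negatives)

# Shift so that every entry is positive before taking logarithms.
gamma = max(0.0, 1e-6 - D.min())
D_bar = D + gamma

# Log return of each observation period relative to the preceding one
# (one row fewer than D, since the first period has no predecessor).
D_hat = np.log(D_bar[1:] / D_bar[:-1])
```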

We calculate the arithmetic mean $$\mu _{j}$$ of each column of the matrix $$\hat{D}$$,

\begin{aligned} \mu _{j} = \frac{1}{n} \sum_{i=1}^{n} \hat{d}_{ij}, \end{aligned}

subtract $$\mu _{j}$$ from each element of the corresponding jth column of $$\hat{D}$$, and store the obtained results in a matrix $$\bar{\bar{D}}$$ with entries $$\bar{\bar{d}}_{ij} = \hat{d}_{ij} - \mu _{j}$$. We then compute the singular value decomposition (SVD),

\begin{aligned} \bar{\bar{D}} = \Phi \Sigma \Psi ^{T}, \end{aligned}

where Σ is a diagonal matrix with singular values $$\Sigma _{i}$$ arranged in descending order. The columns of Φ are the normalized left singular vectors, and the columns of ΦΣ are known as principal components. The columns of Ψ are the right singular vectors or principal directions of the covariance matrix $$\mathcal{C} = \bar{\bar{D}}^{T}\bar{\bar{D}}$$.

The relative importance of the ith singular value is determined by the relative energy

\begin{aligned} \Xi _{i} = \frac{\Sigma _{i}}{\sum_{i=1}^{m} \Sigma _{i}}, \end{aligned}

where the total energy is given by $$\sum_{i=1}^{m} \Xi _{i} =1$$. We then select the p right singular vectors corresponding to the maximal p energies and construct a matrix

\begin{aligned} \bar{\Psi } = \begin{bmatrix} \psi _{11} & \cdots & \psi _{1p} \\ \vdots & \vdots & \vdots \\ \psi _{m1} & \cdots & \psi _{mp} \end{bmatrix} , \end{aligned}

project the matrix $$\bar{\bar{D}}$$ onto the columns of $$\bar{\Psi }$$ via

\begin{aligned} M_{p} = \bar{\bar{D}} \cdot \bar{\Psi }\in \mathbb{R}^{n\times p}, \end{aligned}

and then calculate the matrix of returns $$M_{R} = M_{p}\bar{\Psi }^{T}\in \mathbb{R}^{n\times m}$$. The regulations suggest selecting the first three ($$p=3$$) right singular vectors. This process compresses the statistical data $$\bar{\bar{D}}$$: the m correlated tenor points are transformed into p uncorrelated principal components, which approximately reproduce the data at a much smaller model size.
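The SVD-based truncation described above can be sketched as follows; the centred return matrix here is random toy data, and $$p=3$$ follows the regulatory suggestion:

```python
import numpy as np

rng = np.random.default_rng(0)
D_hat = rng.normal(0.0, 1e-3, (1305, 20))   # toy log-return matrix

# Centre each tenor column by its arithmetic mean.
D_bb = D_hat - D_hat.mean(axis=0)

# SVD: D_bb = Phi @ diag(Sigma) @ Psi^T, singular values descending.
Phi, Sigma, PsiT = np.linalg.svd(D_bb, full_matrices=False)

# Relative energies; by construction they sum to one.
Xi = Sigma / Sigma.sum()

# Keep the p = 3 most energetic right singular vectors.
p = 3
Psi_bar = PsiT[:p].T                        # m x p

# Projection and reconstructed matrix of returns.
M_p = D_bb @ Psi_bar                        # n x p
M_R = M_p @ Psi_bar.T                       # n x m, rank-p approximation
```

Note that `np.linalg.svd` returns $$\Psi ^{T}$$ directly, so the truncated basis is obtained by slicing its first p rows.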

We then perform bootstrapping, where a large number of small samples of the same size are drawn repeatedly from the original data set. According to the PRIIP regulations, the bootstrapping procedure for the yield curve simulation has to be performed at least $$10{,}000$$ times. The standardized KID also has to include the recommended holding period, i.e., the period between the acquisition of an asset and its sale. The time step in the simulation of yield curves is typically one observation period. If H is the recommended holding period in days, e.g., $$H \approx 2600$$ days, then there are H observation periods in the recommended holding period.

For each such observation period, we select a random row from the matrix $$M_{R}$$, i.e., altogether H random rows, and construct a matrix $$[\mathfrak{\chi }_{ij}] \in \mathbb{R}^{H\times m}$$ from these selected rows. Then, for each tenor point j, we sum the entries of the selected rows columnwise, i.e.,

\begin{aligned} \bar{\chi }_{j} = \sum_{i=1}^{H} \mathfrak{\chi }_{ij},\quad j = 1, \ldots, m. \end{aligned}

In this way, we obtain a row vector $$\bar{\chi }=[\bar{\chi }_{1} \bar{\chi }_{2} \cdots \bar{\chi }_{m}]\in \mathbb{R}^{1\times m}$$. The final simulated yield rate $$y_{j}$$ at tenor point j is then the rate $$\bar{d}_{nj}$$ of the last observation period at the corresponding tenor point j, multiplied by the exponential of $$\bar{\chi }_{j}$$, adjusted for any shift γ used to ensure positive values for all tenor points, and adjusted for the forward rate so that the expected mean matches current expectations.

The forward rate between time points $$t_{k}$$ and $$t_{\ell }$$ starting from a time point $$t_{0}$$ is given as

\begin{aligned} r_{k,\ell } = \frac{R(t_{0},t_{\ell })(t_{\ell } - t_{0}) - R(t_{0},t_{k})(t_{k} - t_{0})}{t_{\ell }- t_{k}}, \end{aligned}

where $$t_{k}$$ and $$t_{\ell }$$ are measured in years and $$R(t_{0},t_{k})$$ and $$R(t_{0},t_{\ell })$$ are the interest rates available from the data matrix for the time periods $$(t_{0},t_{k})$$ and $$(t_{0},t_{\ell })$$, respectively. Thus, the final simulated yield curve between time points $$t_{k}$$ and $$t_{\ell }$$ is given by

\begin{aligned} y(t_{\ell }) = \bar{d}_{k,\ell } \operatorname{exp}({\bar{\chi }_{\ell }}) - \gamma + {r_{k,\ell }}, \quad\ell = 1,\ldots,m, \end{aligned}
(14)

and the simulated yield curve from the calculated simulated returns is given by

\begin{aligned} y = [y_{1} y_{2} \cdots y_{m}]. \end{aligned}

We then perform the bootstrapping procedure for at least $$s = 10{,}000$$ times and construct a simulated yield curve matrix

\begin{aligned} Y = \begin{bmatrix} y_{11} & \cdots & y_{1m} \\ \vdots & \vdots & \vdots \\ y_{s1} & \cdots & y_{sm} \end{bmatrix} \in \mathbb{R}^{s\times m}, \end{aligned}
(15)

which is then used to calibrate the parameter $$a(t)$$.
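The bootstrapping step can be sketched as follows. This is a toy illustration: the matrix of returns is random, the shift γ and the forward-rate adjustment of eq. (14) are set to zero for brevity, and s is reduced from 10,000 to keep the example fast:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, H, s = 1305, 20, 2600, 100       # s = 10,000 in the regulation
M_R = rng.normal(0.0, 1e-3, (n, m))    # toy matrix of returns
d_last = np.full(m, 0.02)              # rates of the last observation period
gamma = 0.0                            # shift (omitted in this toy example)
fwd = np.zeros(m)                      # forward-rate adjustment (omitted)

Y = np.empty((s, m))
for l in range(s):
    rows = rng.integers(0, n, size=H)  # H random observation periods
    chi_bar = M_R[rows].sum(axis=0)    # summed returns per tenor point
    # Simulated yield: last observed rate times exp of the summed returns,
    # corrected by the shift gamma and the forward rate, cf. eq. (14).
    Y[l] = d_last * np.exp(chi_bar) - gamma + fwd
```

Each row of `Y` is one simulated yield curve, so `Y` plays the role of the matrix in (15).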

### Appendix C: Parameter calibration

For a zero-coupon bond $$B(t,T)$$ maturing at time T, one obtains from the Hull–White model the closed-form solution

\begin{aligned} B(t,T) = \operatorname{exp}\bigl\{ -r(t)\Gamma (t,T) - \Lambda (t,T)\bigr\} , \end{aligned}
(16)

where $$\kappa (t) = \int _{0}^{t} b(s)\,ds=bt$$, since b is assumed constant,

\begin{aligned} \begin{aligned} &\Gamma (t,T)= \int _{t}^{T} e^{-\kappa (z)} \,dz, \\ &\Lambda (t,T)= \int _{t}^{T} \biggl[ e^{\kappa (v)}a(v) \biggl( \int _{v}^{T} e^{-\kappa (z)}\,dz \biggr) - \frac{1}{2} e^{2\kappa (v)} \sigma ^{2} \biggl( \int _{v}^{T} e^{-\kappa (z)} \,dz \biggr)^{2} \biggr] \,dv. \end{aligned} \end{aligned}

Here we have again used that σ is constant.

To perform the calibration, we use as input data i) the initial value of $$a(0)$$ at $$t=0$$, ii) the zero-coupon bond prices, iii) the constant value of the volatility σ of the short-rate $$r(t)$$, and iv) the constant value b each for all maturities $$T_{m}$$, $$0 \leq T_{m} \leq T$$, where $$T_{m}$$ is the maturity at the mth tenor point. Then we compute $$\kappa (t)$$ from $$\frac{\partial }{\partial T} \kappa (T) = \frac{\partial }{\partial T} \int _{0}^{T} b(s) \,ds = b$$ and use

\begin{aligned} \frac{\partial }{\partial T} \Gamma (0,T)= e^{-\kappa (T)} \end{aligned}

to compute $$\Gamma (0,T)$$.

Then, for $$0 \leq T_{m} \leq T$$, we get

\begin{aligned} \begin{aligned} &\frac{\partial }{\partial T} \Lambda (0,T)= \int _{0}^{T} \biggl[e^{ \kappa (v)} a(v) e^{-\kappa (T)} - e^{2\kappa (v)}\sigma ^{2}e^{-\kappa (T)} \biggl( \int _{v}^{T} e^{- \kappa (z)} \,dz \biggr) \biggr] \,dv, \\ &e^{\kappa (T)} \frac{\partial }{\partial T} \Lambda (0,T) = \int _{0}^{T} \biggl[e^{\kappa (v)} a(v) - e^{2\kappa (v)} \sigma ^{2} \biggl( \int _{v}^{T} e^{-\kappa (z)} \,dz \biggr) \biggr] \,dv, \\ &\frac{\partial }{\partial T} \biggl[ e^{\kappa (T)} \frac{\partial }{\partial T} \Lambda (0,T) \biggr]= e^{\kappa (T)} a(T) - \int _{0}^{T} e^{2\kappa (v)}\sigma ^{2}e^{-\kappa (T)} \,dv, \\ &e^{\kappa (T)} \frac{\partial }{\partial T} \biggl[ e^{\kappa (T)} \frac{\partial }{\partial T} \Lambda (0,T) \biggr] = e^{2\kappa (T)} a(T) - \int _{0}^{T} e^{2 \kappa (v)}\sigma ^{2} \,dv, \\ &\frac{\partial }{\partial T} \biggl[ e^{\kappa (T)} \frac{\partial }{\partial T} \biggl[ e^{\kappa (T)} \frac{\partial }{\partial T} \Lambda (0,T) \biggr] \biggr] = \frac{\partial a(T)}{\partial T} e^{2\kappa (T)} + 2a(T) e^{2\kappa (T)} \frac{\partial }{\partial T}\kappa (T) - e^{2\kappa (T)}\sigma ^{2}, \\ &\frac{\partial }{\partial T} \biggl[ e^{\kappa (T)} \frac{\partial }{\partial T} \biggl[ e^{\kappa (T)} \frac{\partial }{\partial T} \Lambda (0,T) \biggr] \biggr] = \frac{\partial a(T)}{\partial T} e^{2\kappa (T)} + 2a(T)e^{2\kappa (T)}b(T) - e^{2\kappa (T)}\sigma ^{2}. \end{aligned} \end{aligned}

The simulated yield $$y(T)$$ at the tenor point T is then given by

\begin{aligned} y(T) = -\operatorname{ln}B(0,T), \end{aligned}
(17)

and from (16) and (17) we obtain $$\Lambda (0,T) = y(T) -r(0)\Gamma (0,T)$$. In this way, for $$a(t)$$ we obtain the ordinary differential equation (ODE)

\begin{aligned} \begin{aligned} \frac{\partial }{\partial t}a(t) e^{2\kappa (t)} + 2a(t) \cdot b \cdot e^{2\kappa (t)} - e^{2\kappa (t)}\sigma ^{2} = \frac{\partial }{\partial t} \biggl[ e^{\kappa (t)} \biggl[ e^{\kappa (t)} \frac{\partial }{\partial t}\bigl(y(t) - r(0)\Gamma (0,t)\bigr) \biggr] \biggr], \end{aligned} \end{aligned}

which we solve numerically with the given initial conditions. If we approximate $$a(t)$$ by a piecewise constant function with values $$a(i)$$ which change at the tenor point i, then we obtain a linear system

\begin{aligned} L\alpha = F, \end{aligned}

for the vector $$\alpha =[a(i)]$$, where L is lower triangular with non-zero diagonal elements. It has been noted that the integral equation for Λ is of the first kind with an $$L^{2}$$ kernel, and that a small perturbation (noise) in the market data used to obtain the yield curves leads to large changes in the model parameter $$a(t)$$. This means that the problem of computing $$a(t)$$ from the data is ill-posed, and for this reason we determine the vector α via Tikhonov regularization as

\begin{aligned} \alpha ^{\delta }_{\mu }= \mathop{\operatorname{argmin}} \bigl\Vert L \alpha - F^{\delta } \bigr\Vert ^{2} + \mu \Vert \alpha \Vert ^{2}, \end{aligned}
(18)

where $$\alpha ^{\delta }_{\mu }$$ is an approximation to α, μ is the regularization parameter, $$\delta = \| F - F^{\delta }\|$$ is the noise level, and $$\mu \|\alpha \|^{2}$$ is the regularization term. We solve the optimization problem (18) with the commercial software UnRisk PRICING ENGINE to obtain an approximation to the parameter $$a(t)$$. By providing the simulated yield curve, the UnRisk pricing function returns the calibrated parameter $$a(t)$$ for that yield curve. Based on $$s=10{,}000$$ different simulated yield curves, we obtain s different piecewise constant parameters $$a_{\ell }(t)$$, which change their values $$\alpha _{\ell,i}$$ only at the m tenor points. We incorporate these in a matrix

\begin{aligned} {\mathcal{A}}= \begin{bmatrix} \alpha _{11} & \cdots & \alpha _{1m} \\ \vdots & \vdots & \vdots \\ \alpha _{s1} & \cdots & \alpha _{sm} \end{bmatrix}. \end{aligned}
(19)
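The Tikhonov problem (18) has the closed-form solution $$\alpha ^{\delta }_{\mu } = (L^{T}L + \mu I)^{-1}L^{T}F^{\delta }$$ via the normal equations. A minimal sketch on a toy lower triangular system (all data and the value of μ are illustrative; the actual calibration uses UnRisk):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 20
# Toy lower triangular system with a noisy right-hand side.
L = np.tril(rng.normal(1.0, 0.1, (m, m))) + np.eye(m)
alpha_true = rng.normal(0.0, 1.0, m)
F_delta = L @ alpha_true + rng.normal(0.0, 1e-3, m)

def tikhonov(L, F, mu):
    """Minimize ||L a - F||^2 + mu ||a||^2 via the normal equations."""
    return np.linalg.solve(L.T @ L + mu * np.eye(L.shape[1]), L.T @ F)

alpha = tikhonov(L, F_delta, mu=1e-6)
```

In practice, μ is chosen in relation to the noise level δ; here a small fixed value suffices for illustration.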

### Appendix D: Numerical methods

The Hull–White model (3) is discretized by applying a finite difference method. As computational domain for the interest rate $$r(t)$$ we use an interval $$[r_{\mathrm{low}},r_{\mathrm{up}}]$$ given by

\begin{aligned} r_{\mathrm{low}} = r(T) - 7\sigma \sqrt{T},\qquad r_{\mathrm{up}} = r(T) + 7\sigma \sqrt{T}, \end{aligned}

where $$r(T)$$ is the yield at the maturity T, also known as the spot rate. We divide the spatial domain into M equidistant grid points $$\{r(1),r(2),\dots,r(M) \}$$ with $$r(i)=r(i-1)+h$$ and spatial step size h, and the time interval $$[0,T]$$ into $$N+1$$ points $$t_{0}=0,t_{1},\ldots, t_{N}=T$$, $$t_{n}=n \tau$$, with time step τ. Using the spatial discretization operator $$\mathcal{L}(n)$$ at time point n, we obtain for the vector $$V=[V_{i}]=[V(r(i))]$$ of values at the spatial grid points the time-discretized system

\begin{aligned} \frac{V(t) - V(t-\tau )}{\tau } = (1-\Theta ) \bigl(\mathcal{L}(t) V(t)\bigr) + \Theta \bigl( \mathcal{L}(t- \tau ) V(t - \tau )\bigr), \end{aligned}

which is given componentwise by

\begin{aligned} \begin{aligned} &\text{for } \bigl(a(n) - br(i)\bigr) > 0 \\ &\quad {\mathcal{L}}(n)V_{i}^{n} := \frac{1}{2}\sigma ^{2} \frac{V_{i+1}^{n} - 2V_{i}^{n} + V_{i-1}^{n}}{h^{2}} + \bigl(a(n) - br(i)\bigr) \frac{V_{i}^{n} - V_{i-1}^{n}}{h} - r(i)V_{i}^{n}, \\ &\text{for } \bigl(a(n) - br(i)\bigr) < 0 \\ &\quad {\mathcal{L}}(n)V_{i}^{n} := \frac{1}{2}\sigma ^{2} \frac{V_{i+1}^{n} - 2V_{i}^{n} + V_{i-1}^{n}}{h^{2}} + \bigl(a(n) - br(i)\bigr) \frac{V_{i+1}^{n} - V_{i}^{n}}{h} - r(i)V_{i}^{n}, \end{aligned} \end{aligned}

so that the Crank–Nicolson scheme in time gives a linear system

\begin{aligned} \underbrace{ \biggl( I - \frac{1}{2}\tau \mathcal{L}(t-\tau ) \biggr)}_{A( \rho _{\ell }(t)) \in \mathbb{R}^{M\times M}} V(t-\tau ) = \underbrace{ \biggl( I + \frac{1}{2} \tau \mathcal{L}(t) \biggr)}_{B( \rho _{\ell }(t)) \in \mathbb{R}^{M\times M}} V(t), \end{aligned}

where the matrices $$A(\rho _{\ell }(t))$$ and $$B(\rho _{\ell }(t))$$ depend on $$\rho _{\ell }(t) = \{a(t),b,\sigma \}$$ for the $$\ell $$th group of these parameters and are given by

\begin{aligned} A\bigl(\rho _{\ell }(t)\bigr) = I - \frac{\sigma ^{2}\tau }{2h^{2}}J - \frac{\tau }{2h}\bigl(H^{+}G + H^{-}G^{T}\bigr) + \frac{\tau }{2}R_{o}, \end{aligned}

and

\begin{aligned} B\bigl(\rho _{\ell }(t)\bigr) = I + \frac{\sigma ^{2}\tau }{2h^{2}}J + \frac{\tau }{2h}\bigl(H^{+}G + H^{-}G^{T}\bigr) - \frac{\tau }{2}R_{o}, \end{aligned}

where $$R_{o} = \operatorname{diag}(r(1),\ldots,r(M))$$, $$H^{+}= \operatorname{diag} (\operatorname{max}(a(n) - br(1),0),\ldots, \operatorname{max}(a(n) - br(M),0) )$$, $$H^{-}= \operatorname{diag} (\operatorname{min}(a(n) - br(1),0),\ldots, \operatorname{min}(a(n) - br(M),0) )$$, and

\begin{aligned} J = \begin{bmatrix} -2 & 1 & 0 & \cdots & 0 \\ 1 & -2 & 1 & \ddots & \vdots \\ 0 & 1 & \ddots & \ddots & 0 \\ \vdots &\ddots & \ddots & \ddots & 1 \\ 0 & \cdots & 0 & 1 & -2 \end{bmatrix},\qquad G = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -1 & 1 & 0 & \ddots & \vdots \\ 0 & -1 & \ddots & \ddots & 0 \\ \vdots &\ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & -1 & 1 \end{bmatrix}. \end{aligned}

This discretization is a parametric high dimensional model of the form

\begin{aligned} A\bigl(\rho _{\ell }(t)\bigr)V^{n-1} = B\bigl(\rho _{\ell }(t)\bigr)V^{n}, \end{aligned}

with given terminal vector $$V^{T}$$ and matrices $$A(\rho _{\ell }) \in \mathbb{R}^{M\times M}$$ and $$B(\rho _{\ell }) \in \mathbb{R}^{M\times M}$$. We solve this model by propagating backward in time. Here again $$\ell = 1,\dots,s=10{,}000$$, m is the total number of tenor points, and we need to solve this system at each time step n with an appropriate boundary condition and a known terminal value for the underlying instrument. Altogether, we have a parameter space $$\mathcal{P}$$ of size $$10{,}000 \times m$$ to which we now apply model reduction.
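The assembly of $$A(\rho _{\ell })$$ and $$B(\rho _{\ell })$$ and one backward step can be sketched as follows. The grid and parameter values are toy choices, and the reaction term is scaled by $$\tau /2$$, consistent with $$A = I - \frac{\tau }{2}\mathcal{L}$$:

```python
import numpy as np

def cn_matrices(r, a_n, b, sigma, tau, h):
    """Assemble A(rho) and B(rho) for one Crank-Nicolson step (sketch)."""
    M = len(r)
    I = np.eye(M)
    # Difference stencils: J (second difference), G (first difference).
    J = -2.0 * np.eye(M) + np.eye(M, k=1) + np.eye(M, k=-1)
    G = np.eye(M) - np.eye(M, k=-1)
    drift = a_n - b * r
    Hp = np.diag(np.maximum(drift, 0.0))   # upwind split of the drift
    Hm = np.diag(np.minimum(drift, 0.0))
    Ro = np.diag(r)
    conv = Hp @ G + Hm @ G.T
    A = I - sigma**2 * tau / (2 * h**2) * J - tau / (2 * h) * conv + tau / 2 * Ro
    B = I + sigma**2 * tau / (2 * h**2) * J + tau / (2 * h) * conv - tau / 2 * Ro
    return A, B

# Toy grid and parameters (illustrative values only).
M, tau, b, sigma = 50, 0.01, 0.05, 0.01
r = np.linspace(-0.05, 0.15, M)
h = r[1] - r[0]
A, B = cn_matrices(r, a_n=0.002, b=b, sigma=sigma, tau=tau, h=h)

# One backward step: solve A V^{n-1} = B V^n for V^{n-1}.
V_n = np.ones(M)                 # toy terminal values
V_prev = np.linalg.solve(A, B @ V_n)
```

In a full valuation, this solve is repeated for every time step and every parameter group $$\rho _{\ell }$$, which is exactly the computational burden the model reduction targets.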
