Pharmaceutical portfolio optimization under cost uncertainty via chance constrained-type method

Correspondence: mahboubeh.farid@Captario.com
Captario AB, Gothenburg, Sweden

Abstract
Project selection for a portfolio is a pivotal decision in the pharmaceutical industry. In this paper, we study a portfolio optimization problem for pharmaceutical companies considering the uncertainty of the cost of each phase of drug development and the specific value of the annual budget. The presented optimization model is suitable for making investment decisions for multi-phase drug development projects, and a stochastic approach is applied to handle the uncertainty in the model. Post-optimality analysis for the annual budget is studied. An illustrative example is included to demonstrate the presented approach.


Introduction
Selecting projects for a portfolio is one of the major decision-making problems in portfolio management [1] and decision-making is a core function of drug development in the pharmaceutical industry. Drug development in a pharmaceutical pipeline is a well-defined and long-term process which must proceed through several crucial phases and is connected with timing. Drug development can be described as a sequence of phases (Fig. 1).
In project management, decisions must be made at key points throughout the development process, and whether to continue or terminate a particular project is a pivotal decision. The cost of development is not recovered if the project fails to reach the market, and all phases must be completed in sequence before revenue is realized. Therefore, allocating funds to projects with greater potential revenue and higher probabilities of success in each phase is a major priority for portfolio managers. According to a study from 2013, the cost of developing a new medicine was estimated at $2.5 billion [2], and on average only 5%-10% of potential drugs successfully pass through all phases of drug development [3]. Given that, any strategy for successfully achieving portfolio optimization in the pharmaceutical industry should consider long-term survival the fundamental driving force [4]. There are, however, other factors to consider when selecting a drug candidate for a portfolio, for example success risk, expected revenue, development costs, and budget allocation.
Figure 1 Schematic illustration of the phases of a drug development process
Since the execution of a drug development project is exceptionally costly, the company may not have sufficient R&D resources to pursue all available projects. Consequently, a selection needs to be made of which projects to fund. Optimization techniques have been widely used to improve decision-making for portfolios and an appropriate optimization model can efficiently improve the procedure of selecting projects. Several mathematical models have been developed in the form of linear, nonlinear, multi-objective optimization, etc. for solving portfolio optimization problems [5,6]. A binary multi-objective linear programming model was presented by Ghasemzadeh et al. [7], and different approaches have been presented which take into account the uncertainty of data [8-10].
A key to achieving successful portfolio optimization is to capture the uncertainty of important parameters and also the factors that affect them. As such, mathematical models built on deterministic optimization are not suitable approaches for portfolio optimization since they cannot account for uncertainty. An obvious source of uncertainty is the cost of running the projects through the development process. To account for this uncertainty, stochastic programming may be applied as an approach for modeling optimization problems that involve random parameters. The stochastic framework provides a roadmap for decisions when information about the future is subject to significant uncertainty. Stochastic programs are among the most difficult problems in the field of mathematical modeling, and finding an exact solution to these problems is quite challenging. Chance constrained (CC) programming was presented by Charnes and Cooper [11]. It is a well-established technique for handling the uncertainty inherent in portfolio optimization: the constraints in the model may be violated with at most a predefined level of probability.
In this study, our aim is to develop an approach in the framework of chance constrained programming that enables us to select projects for a portfolio that maximizes expected revenue and addresses uncertainty while remaining computationally efficient. We will formulate the problem as an abstract chance constrained program and present an approach to convert it into a form that can be tractably handled by modern optimization solvers. Furthermore, we will employ parametric programming for our mathematical modeling to investigate how the optimal value of the expected revenue and optimal project selection depend on the annual budget.
The rest of the paper is organized as follows: Sect. 2 presents the background of mathematical optimization and in Sect. 3, we outline the structure of the data underlying the optimization problem. In subsequent sections we introduce the mathematical formulation of the problem and the approaches employed for solving the problem. An illustrative example of the proposed approach is included, and a discussion section concludes the paper.

Mathematical background
The general mathematical model of an optimization problem is
\[ \max_{x} \; f(x) \quad \text{subject to} \quad G(x) \le 0, \tag{1} \]
where $f$ is the objective function, $G$ is the vector of inequality constraints and $x$ is the vector of decision variables.
Stochastic programming has been developed to model optimization problems where coefficients of decision variables, in the objective function and constraints, are under uncertainty. To handle uncertainty in the optimization problem, the stochastic information is combined with the model of the optimization problem (1). The general mathematical model of the optimization problem with uncertain coefficients can be written as
\[ \max_{x} \; f(x, \zeta) \quad \text{subject to} \quad G(x, \zeta) \le 0, \tag{2} \]
where $\zeta$ is the vector of coefficients' uncertainties.
Several approaches have been proposed for solving (2). In 1959, Charnes and Cooper [11] proposed chance constrained programming, and it has received a great deal of attention for solving optimization problems with uncertain coefficients in the constraints $G$. In chance constrained programming, the inequality constraints $G$ in (2) must be satisfied with a predefined confidence level; such constraints are called chance constraints. Furthermore, an optimization problem which employs chance constrained programming is called a chance constrained optimization problem. The general mathematical model of a chance constrained optimization problem is
\[ \max_{x} \; f(x) \quad \text{subject to} \quad \mathbb{P}\{ G(x, \zeta) \le 0 \} \ge 1 - \alpha, \tag{3} \]
where $\alpha \in (0, 1)$. The model (3) seeks the optimal decision vector that maximizes the objective function $f$ subject to the constraints being satisfied with probability at least $(1 - \alpha)$.
In chance constrained programming, the inequalities $G_i(x)$, $i = 1, \dots, m$, can be handled in two ways: individually and jointly. In individual chance constrained programming, each constraint is guaranteed to be satisfied with its own confidence level. In joint chance constrained programming, all constraints are satisfied simultaneously with a certain confidence level. Joint chance constrained programming is more robust than the individual version but it is significantly more difficult to solve as the problem may become non-convex (see [12-16]). Generally, chance constrained programming is powerful when modeling optimization problems under uncertainty but solving the problems can be challenging in real applications. Chance constrained programming is widely used in practice, including for water management [17], chemical processes [18], energy management [19], and supply chain planning [20].
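The difference between the individual and the joint variant can be illustrated on a finite sample. The following minimal Python sketch assumes that, for a fixed decision $x$, the values $G_i(x, \zeta^k)$ have already been evaluated in $K$ equally likely scenarios; the function name and the numbers are hypothetical and serve only as an illustration.

```python
def individual_and_joint_levels(constraint_values):
    """Empirical satisfaction levels from scenario data.

    constraint_values[k][i] holds G_i(x, zeta^k); a value <= 0 means that
    constraint i is satisfied in scenario k.
    """
    n_scenarios = len(constraint_values)
    n_constraints = len(constraint_values[0])
    # Individual level: each constraint is counted on its own.
    individual = [
        sum(row[i] <= 0 for row in constraint_values) / n_scenarios
        for i in range(n_constraints)
    ]
    # Joint level: a scenario counts only if all constraints hold together.
    joint = sum(all(g <= 0 for g in row) for row in constraint_values) / n_scenarios
    return individual, joint


# Hypothetical data: 4 scenarios, 2 constraints.
values = [[-1, -1], [1, -1], [-1, 1], [-1, -1]]
individual, joint = individual_and_joint_levels(values)
print(individual, joint)  # [0.75, 0.75] 0.5
```

In this toy sample each constraint holds in 75% of the scenarios, yet both hold together in only 50%, which is why a joint requirement at a given confidence level is the stricter one.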
If we assume the sample space is finite and that $G$ is bounded, we can change (3) into an integer program [21]. By employing binary control variables $z_k$, $k = 1, 2, \dots, N$, and "big-M" coefficients, the joint chance constrained program can be reformulated with a set of linear constraints as follows:
\[ \begin{aligned} \max_{x,\,z} \quad & f(x) \\ \text{subject to} \quad & G(x, \zeta^k) \le M z_k, && k = 1, \dots, N, \\ & \sum_{k=1}^{N} \pi_k z_k \le \alpha, \\ & z_k \in \{0, 1\}, && k = 1, \dots, N, \end{aligned} \tag{4} \]
where $M > 0$ is a large constant and $\alpha \in (0, 1)$. There are $N$ scenarios and $\pi_k$, $k = 1, 2, \dots, N$, is the probability of scenario $k$. When $z_k = 0$, the related constraint must be satisfied; when $z_k = 1$, the constraint can be violated, and a sufficiently large value of $M$ ensures that it becomes an inactive constraint. The constraints in (4) are called Big-M constraints. Even when we do not have full knowledge of the sample space, (4) can still be a useful proxy for (3). When combined with Monte Carlo sampling, this is known as the sample average approximation (SAA) and its statistical properties are well studied [22,23].
For simplicity, we call (4), CC-MBP. The CC-MBP (4) with binary decision variables is an integer programming problem that can be solved by integer programming solvers, as will be further described in later parts of this article.
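The role of the binary variables $z_k$ and of the constant $M$ can be checked on a small example. The sketch below is an illustration only: it assumes a single linear constraint per scenario of the form $a^k \cdot x - b \le M z_k$, and all function names and numbers are hypothetical.

```python
def violation_indicators(x, scenario_coeffs, rhs):
    """Smallest admissible z: z_k = 0 if scenario k satisfies a_k . x <= rhs, else 1."""
    z = []
    for coeffs in scenario_coeffs:
        lhs = sum(a * xi for a, xi in zip(coeffs, x))
        z.append(0 if lhs <= rhs else 1)
    return z


def big_m_feasible(x, z, scenario_coeffs, probs, rhs, alpha, big_m):
    """Check the Big-M rows and the probability budget for given x and z."""
    rows_ok = all(
        sum(a * xi for a, xi in zip(coeffs, x)) - rhs <= big_m * zk
        for coeffs, zk in zip(scenario_coeffs, z)
    )
    # Small tolerance guards against floating-point noise in the sum.
    risk_ok = sum(p * zk for p, zk in zip(probs, z)) <= alpha + 1e-12
    return rows_ok and risk_ok


# Hypothetical data: 4 equally likely scenarios, 2 decision variables.
coeffs = [(3, 4), (5, 4), (2, 2), (6, 5)]
probs = [0.25] * 4
x = (1, 1)
z = violation_indicators(x, coeffs, rhs=8)
print(z)  # [0, 1, 0, 1]
print(big_m_feasible(x, z, coeffs, probs, rhs=8, alpha=0.5, big_m=100))   # True
print(big_m_feasible(x, z, coeffs, probs, rhs=8, alpha=0.25, big_m=100))  # False
```

Two scenarios are violated, so the decision is acceptable at $\alpha = 0.5$ but not at $\alpha = 0.25$. Choosing $M$ too small (for example `big_m=2` here) would wrongly reject the relaxed rows, which is why $M$ must dominate the largest possible constraint violation.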

Input data structure
The drug development process was described schematically in Fig. 1. There are many parameters that impact the outcome of such a process. For the specific decision problem at hand, i.e. the selection of projects to form an optimal portfolio, the following parameters are of particular importance:
• The cost of a project in each phase,
• The expected revenue of a project,
• The probability that a project proceeds to the next phase of development.
For the optimization problem, we presume that some assumptions have been made regarding these parameters for each of the available projects. We assume that the cost of running a phase is distributed over the duration of that phase, and let $G^*_{ij}$ denote the cost associated with running project $i$ in year $j$. Further, let $R^*_i$ denote the revenue from project $i$ if it is successfully launched to the market.
In the problem described in this paper, the values of $G^*_{ij}$ and $R^*_i$ are outcomes from a Monte Carlo simulation. The simulations are made to represent the drug development process as illustrated in Fig. 1, and they follow the framework for modelling of drug development processes outlined by [24]. The simulations account for the fact that many drug projects are terminated before completion, as illustrated in Fig. 1 by the possibility that the project is stopped after each phase. Terminated projects consequently incur zero costs for the phases after termination, and in these cases they also generate zero revenue.
Following an often-used terminology in the optimization literature, we will refer to the iterations of the Monte Carlo simulations as scenarios. Assume that simulations have generated $K$ scenarios. The actually realized cost of project $i$ in year $j$ in scenario $k$ is then
\[ C_{ijk} = \begin{cases} G^*_{ij}, & j \le \tau, \\ 0, & j > \tau, \end{cases} \]
where $\tau$ is the time of termination of the project. Similarly, the revenue is
\[ R_{ik} = \begin{cases} R^*_i, & \tau \ge L, \\ 0, & \tau < L, \end{cases} \]
where $L$ is the time at which the project might be launched to the market and start generating revenue. From the simulated scenarios we then estimate the expected revenue,
\[ \bar{R}_i = \frac{1}{K} \sum_{k=1}^{K} R_{ik}. \]
The values of $C_{ijk}$ and $\bar{R}_i$ are key inputs to the optimization problem to be described in the subsequent sections. A numerical illustration of the optimization problem, using these variables as input, is given in a later section.
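To make the scenario construction concrete, the following Python sketch generates scenarios for a single project. It is a simplified illustration, not the simulation framework of [24]: the triangular cost distribution, the even spreading of a phase cost over its years, and all names and numbers are assumptions introduced here.

```python
import random


def simulate_project(phase_costs, phase_years, success_probs,
                     launch_revenue, horizon, rng):
    """Simulate one scenario; returns (yearly_costs, revenue, tau).

    tau is the first year index from which no further cost is incurred.
    """
    yearly_costs = [0.0] * horizon
    year = 0
    for cost, years, p_success in zip(phase_costs, phase_years, success_probs):
        # Assumption: phase cost is triangular around its nominal value and
        # spread evenly over the years of the phase.
        drawn = rng.triangular(0.8 * cost, 1.5 * cost, cost)
        for _ in range(years):
            if year < horizon:
                yearly_costs[year] = drawn / years
            year += 1
        if rng.random() >= p_success:
            # Phase failed: project terminated, no later costs, no revenue.
            return yearly_costs, 0.0, year
    return yearly_costs, launch_revenue, year


def estimate_expected_revenue(n_scenarios, seed=0, **project):
    """Sample mean of the realized revenue over n_scenarios scenarios."""
    rng = random.Random(seed)
    revenues = [simulate_project(rng=rng, **project)[1] for _ in range(n_scenarios)]
    return sum(revenues) / n_scenarios


# Hypothetical two-phase project with a 50% success probability per phase.
project = dict(phase_costs=[10.0, 40.0], phase_years=[1, 2],
               success_probs=[0.5, 0.5], launch_revenue=1000.0, horizon=5)
print(estimate_expected_revenue(1000, **project))
```

With two phases that each succeed with probability 0.5, the estimate should lie near 0.25 times the launch revenue, which mirrors how early termination reduces the expected revenue relative to $R^*_i$.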

Mathematical modeling of the problem
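In this section we present the mathematical model of the project selection problem by adopting the general mathematical model of the optimization problem under uncertainty (2). The aim is to maximize the expected revenue, $\bar{R}_i$, subject to the constraint that the cost under uncertainty, $C_{ij}$, may not exceed a defined annual budget, $B_j$. Let $x_i$ denote the binary decision variables, defined as
\[ x_i = \begin{cases} 1, & \text{if project } i \text{ is selected}, \\ 0, & \text{otherwise}. \end{cases} \]
We then have the problem
\[ \begin{aligned} \max_{x} \quad & \sum_{i=1}^{n} \bar{R}_i x_i \\ \text{subject to} \quad & \sum_{i=1}^{n} C_{ij} x_i \le B_j, && j = 1, \dots, T, \\ & x_i \in \{0, 1\}, && i = 1, \dots, n. \end{aligned} \tag{5} \]
The objective function vector, coefficient matrix and right-hand side values are defined as $\bar{R}_i$, $C_{ij}$ and $B_j$, respectively. Since the cost of each project in each year is a random variable and we want to ensure that the probability of staying within the annual budget $B_j$ is above a defined confidence level, we employ chance constrained programming. We replace the constraint in (5) with a chance constraint and present the mathematical model as follows:
\[ \begin{aligned} \max_{x} \quad & \sum_{i=1}^{n} \bar{R}_i x_i \\ \text{subject to} \quad & \mathbb{P}\Big\{ \sum_{i=1}^{n} C_{ij} x_i \le B_j, \; j = 1, \dots, T \Big\} \ge 1 - \alpha, \\ & x_i \in \{0, 1\}, && i = 1, \dots, n, \end{aligned} \tag{6} \]
where $\alpha \in (0, 1)$ is chosen by the decision makers. The chance constraint in (6) guarantees that the probability of exceeding the annual budget is at most the predefined risk level $\alpha$.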

The big-M constraint
We next adapt the general form of the CC-MBP (4) to our project selection problem, as defined in (6). In our problem, we sample the cost $K$ times in the Monte Carlo simulation, and since $K$ is typically a large number, this results in a large number of stochastic scenarios for cost. All scenarios from the simulation have the same probability, $p_k = 1/K$.
Let $z_k$ be a binary variable for each scenario controlling the enforcement of the constraints of scenario $k$. If $z_k = 0$, the budgetary constraint is enforced; otherwise it can be violated.
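Model (6) can then be reformulated with a set of linear constraints as follows:
\[ \begin{aligned} \max_{x,\,z} \quad & \sum_{i=1}^{n} \bar{R}_i x_i \\ \text{subject to} \quad & \sum_{i=1}^{n} C_{ijk} x_i - B_j \le M z_k, && j = 1, \dots, T, \; k = 1, \dots, K, \\ & \sum_{k=1}^{K} p_k z_k \le \alpha, \\ & x_i \in \{0, 1\}, \; z_k \in \{0, 1\}, && i = 1, \dots, n, \; k = 1, \dots, K, \end{aligned} \tag{7} \]
where $\alpha \in (0, 1)$, $\sum_{k=1}^{K} p_k = 1$ and $M$ is a sufficiently large number which absorbs the violation when $z_k = 1$. If the budgetary constraints are satisfied in a scenario, it is referred to as a responsive scenario; otherwise it is a non-responsive scenario and its constraints are ignored. Our chance constrained optimization problem is thereby converted into an integer programming problem that can be solved by integer programming solvers, as will be further described in later parts of this article.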
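For very small instances, the solution of model (7) can be verified without an integer programming solver. The sketch below replaces the solver with exhaustive enumeration of all portfolios, which is only practical for a handful of projects and is not the method used in this paper; it assumes equally likely scenarios, and the function name and toy data are hypothetical.

```python
from itertools import product


def solve_cc_by_enumeration(exp_revenue, costs, budget, alpha):
    """Best portfolio whose share of budget-violating scenarios is <= alpha.

    costs[k][i][j] is the cost of project i in year j under scenario k.
    A scenario counts as violated (z_k = 1) if any year exceeds its budget.
    """
    n, n_scen, n_years = len(exp_revenue), len(costs), len(budget)
    best_x, best_value = None, float("-inf")
    for x in product((0, 1), repeat=n):
        violated = sum(
            any(
                sum(costs[k][i][j] * x[i] for i in range(n)) > budget[j]
                for j in range(n_years)
            )
            for k in range(n_scen)
        )
        if violated / n_scen <= alpha + 1e-12:
            value = sum(r * xi for r, xi in zip(exp_revenue, x))
            if value > best_value:
                best_x, best_value = x, value
    return best_x, best_value


# Hypothetical data: 3 projects, 2 years, 4 scenarios.
exp_revenue = [100, 60, 40]
costs = [
    [[5, 5], [3, 3], [2, 2]],
    [[7, 5], [3, 4], [2, 2]],
    [[5, 5], [6, 3], [2, 3]],
    [[9, 9], [3, 3], [2, 2]],
]
budget = [10, 10]
print(solve_cc_by_enumeration(exp_revenue, costs, budget, alpha=0.25))  # ((1, 0, 1), 140)
print(solve_cc_by_enumeration(exp_revenue, costs, budget, alpha=0.5))   # ((1, 1, 0), 160)
```

Raising the risk level from 0.25 to 0.5 admits a portfolio with higher expected revenue, because one more scenario is allowed to exceed the budget.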

Related approaches
In the sections above we presented the CC-MBP for solving the optimization problem at hand with uncertain coefficients. While this is a computationally challenging approach, there are simpler approaches that could also have been applied. Two such approaches are briefly described below, namely Deterministic optimization and the Quantile chance constraint. In the Numerical illustration section, we also give some numerical results comparing the performance of these two approaches with the proposed CC-MBP.

Deterministic optimization
In deterministic optimization, the stochastic nature of the cost is ignored. Instead, the cost of a project is represented by a fixed number. In our application, this corresponds to replacing the distribution of costs generated from the Monte Carlo simulation with the expected cost of each project in each year. Letting the mean, $\bar{C}_{ij} = \sum_{k} C_{ijk} / K$, be the estimate of the expected cost, we have the following model for the deterministic optimization problem:
\[ \begin{aligned} \max_{x} \quad & \sum_{i=1}^{n} \bar{R}_i x_i \\ \text{subject to} \quad & \sum_{i=1}^{n} \bar{C}_{ij} x_i \le B_j, && j = 1, \dots, T, \\ & x_i \in \{0, 1\}, && i = 1, \dots, n. \end{aligned} \tag{8} \]

Quantile chance constraint
With the Quantile chance constraint, Q-CC, the stochastic nature of the cost is handled by replacing the actual distributions with the $(1 - \alpha)$ quantile of the underlying distribution. Let $\tilde{C}_{ij}$ be the $(1 - \alpha)$ quantile, for each project in each year, of the empirical distribution of the $K$ scenarios from the Monte Carlo simulation. We then have the following deterministic formulation of the optimization problem:
\[ \begin{aligned} \max_{x} \quad & \sum_{i=1}^{n} \bar{R}_i x_i \\ \text{subject to} \quad & \sum_{i=1}^{n} \tilde{C}_{ij} x_i \le B_j, && j = 1, \dots, T, \\ & x_i \in \{0, 1\}, && i = 1, \dots, n. \end{aligned} \tag{9} \]
It can be noted that the simplification in Deterministic optimization and Q-CC reduces the number of constraints of the problem. While there are $(K \cdot T + n + 1)$ constraints in the CC-MBP, the other approaches only involve $(T + n)$ constraints. Despite the large number of constraints in CC-MBP, it is possible to solve the portfolio problem in a very reasonable amount of time.
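How the two simplified cost inputs behave relative to the scenario-based risk can be seen on a toy example. The Python sketch below is illustrative only: it uses a single budget year, the nearest-rank definition of the empirical quantile (one of several conventions, chosen here as an assumption), and hypothetical names and data.

```python
import math


def mean_costs(costs):
    """Per-project mean cost over scenarios; costs[k][i] for scenario k, project i."""
    n_scen, n = len(costs), len(costs[0])
    return [sum(costs[k][i] for k in range(n_scen)) / n_scen for i in range(n)]


def quantile_costs(costs, q):
    """Per-project empirical q-quantile using the nearest-rank rule."""
    n_scen, n = len(costs), len(costs[0])
    rank = max(1, math.ceil(q * n_scen))
    return [sorted(costs[k][i] for k in range(n_scen))[rank - 1] for i in range(n)]


def violation_probability(x, costs, budget):
    """Share of scenarios in which the selected projects exceed the budget."""
    over = sum(
        sum(c * xi for c, xi in zip(scenario, x)) > budget for scenario in costs
    )
    return over / len(costs)


# Hypothetical data: projects 0 and 2 move together, project 1 moves against them.
costs = [[2, 8, 2], [4, 6, 4], [6, 4, 6], [8, 2, 8]]
budget, alpha = 10, 0.25

print(mean_costs(costs))                 # [5.0, 5.0, 5.0]
print(quantile_costs(costs, 1 - alpha))  # [6, 6, 6]
print(violation_probability((1, 0, 1), costs, budget))  # 0.5
print(violation_probability((1, 1, 0), costs, budget))  # 0.0
```

In this toy data the mean-based constraint accepts the portfolio (1, 0, 1), whose mean cost is exactly 10, although it exceeds the budget in half of the scenarios. The quantile-based constraint rejects (1, 1, 0), whose quantile cost is 12, although it never exceeds the budget. Both effects arise because per-project summaries discard the dependence between costs within a scenario.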

Perturbation analysis of the annual budget
In the previous section, we developed the CC-MBP to deal with the presence of cost uncertainty and to enable us to solve the optimization problem at hand. In this section, we employ a parametric linear programming approach for investigating how the optimal portfolio selection changes as the annual budget limitation varies.
The parametric approach was developed in parallel with sensitivity analysis [25]. Both approaches use post-optimality information to investigate how the optimal solution changes with uncertain parameters, but they differ in scope. Sensitivity analysis examines only the current situation: it determines how much the current annual budget may change such that the current optimal value of expected revenue and the optimal project selection remain optimal. Parametric programming, by contrast, can handle global uncertainty: the optimization problem is solved as a function of one parameter. Information can be provided about the dependence of the optimal solution on the uncertain parameter for the entire range from the minimum to the maximum value of expected revenue without the need to re-solve the model. Let $\tilde{B}$ be a vector of the same size as $B$ and consider the parametrized linear program
\[ z^*(\lambda) = \max \{ E^T x : Cx = B + \lambda \tilde{B}, \; x \in \{0, 1\}^n \}. \]
We seek the range of $\lambda$ over which the project selection and expected revenue that are optimal for the current budget remain optimal. Replacing the budget constraint in (7), the parametric program for our model is
\[ \begin{aligned} \max_{x,\,z} \quad & \sum_{i=1}^{n} \bar{R}_i x_i \\ \text{subject to} \quad & \sum_{i=1}^{n} C_{ijk} x_i - (B_j + \lambda \tilde{B}_j) \le M z_k, && j = 1, \dots, T, \; k = 1, \dots, K, \\ & \sum_{k=1}^{K} p_k z_k \le \alpha, \\ & x_i \in \{0, 1\}, \; z_k \in \{0, 1\}, && i = 1, \dots, n, \; k = 1, \dots, K, \end{aligned} \tag{10} \]
where $\alpha \in (0, 1)$, $\sum_{k=1}^{K} p_k = 1$, $\lambda$ is a scalar and $\tilde{B}_j$ is the perturbation of the annual budget. First, the mathematical problem (10) is solved based on the current budgetary constraint. Then parametric linear programming is employed to investigate the effect of budget variation; for more details, see [26]. This provides the critical region, i.e. the subset of parameter values $\lambda$ for which the optimal project selection remains unchanged, and the break points of $\lambda$, i.e. the points at which the optimally selected projects and the expected revenue change.
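The notions of break point and critical region can be illustrated by a brute-force sweep. The sketch below does not implement parametric programming: it simply re-solves a tiny single-year instance by enumeration on a grid of $\lambda$ values with a perturbation of one budget unit, and records where the optimal portfolio changes. All names and data are hypothetical.

```python
from itertools import product


def best_portfolio(revenue, costs, budget, alpha):
    """Enumerate portfolios; costs[k][i] is the cost of project i in scenario k."""
    best = (float("-inf"), None)
    for x in product((0, 1), repeat=len(revenue)):
        violated = sum(
            sum(c * xi for c, xi in zip(scenario, x)) > budget
            for scenario in costs
        )
        if violated / len(costs) <= alpha + 1e-12:
            value = sum(r * xi for r, xi in zip(revenue, x))
            best = max(best, (value, x))
    return best


def budget_break_points(revenue, costs, base_budget, alpha, lambdas, perturbation=1):
    """Budgets on the grid at which the optimal portfolio changes."""
    points, last_x = [], None
    for lam in lambdas:
        budget = base_budget + lam * perturbation
        value, x = best_portfolio(revenue, costs, budget, alpha)
        if x != last_x:
            points.append((budget, value, x))
            last_x = x
    return points


# Hypothetical data: 3 projects, 4 equally likely scenarios, one budget year.
revenue = [50, 30, 20]
costs = [[4, 3, 2], [5, 3, 2], [6, 4, 2], [9, 3, 3]]
points = budget_break_points(revenue, costs, base_budget=5, alpha=0.25,
                             lambdas=range(0, 11))
for budget, value, x in points:
    print(budget, value, x)
```

For this data the recorded budgets are 5, 6, 8, 10 and 12, with expected revenues 30, 50, 70, 80 and 100. Between two successive recorded budgets the optimal portfolio does not change, which corresponds to a critical region on this grid.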

Software
In the previous sections we described the mathematical model of the optimization problem at hand (6), as well as the approach that we propose for solving the model, i.e. the CC-MBP (7). The final step in the analysis is to employ appropriate software to perform the actual numerical optimization. Since our variables are restricted to be binary, we use the integer optimizer provided through the packages PuLP and GLPK. The code for solving models (7), (8) and (9) was written in Python 3. The calculations were performed on an Intel(R) i5 CPU, 2.5 GHz, with 8 GB of RAM.

Numerical illustration
In this section, we analyze the performance of the proposed approaches by applying each to a sample portfolio in the pharmaceutical industry. The sample portfolio contains a total of 15 independent candidate drugs in different stages of their development process. The number of simulated scenarios, K, is 1000. At the time of the modelled decision-making, five projects are in the pre-clinical phase (PC), two projects are in phase 1 (PH1), six projects are in phase 2 (PH2), and two projects are in phase 3 (PH3). Each phase of drug development may take several years and the planning horizon, T, is set to 23 years. The timelines for the projects in the portfolio are illustrated in Fig. 2. The annual budget is limited and therefore not all projects in the portfolio can be developed. Decision makers must decide which projects should be selected to maximize the expected revenue, while also considering the uncertainty in the related cost.
In Table 1, we summarize the assumptions regarding average costs to begin subsequent phases of each drug development project and expected revenue of each project. These costs are then distributed over the duration of the actual phase, to provide a cost per year as described in the Input data structure section. While the expected costs are summarized in Table 1, the underlying distributions are illustrated in Fig. 3. The costs presented in Table 1 and Fig. 3 are the costs conditional on the actual phase of development being conducted, not adjusted for the early termination of projects, and hence form the basis for $G^*_{ij}$ as defined for the input data. It may be noted from Table 1 that both the cost and the expected revenue vary substantially between projects.

Table 1 Average costs to begin subsequent phases of each project and expected revenue of each project

Project       Average cost by successive phase      Expected revenue (MUSD)
Project 1                                  7        191
Project 2       1      7     27    101     7        235
Project 3       -      5     27    127     7        14,205
Project 4       -      1     47     63     7        8100
Project 5       -      -     22    164     7        955
Project 6       -      -     11     75     5        744
Project 7       -      -      -     27     7        6043
Project 8       1      7     27    123     6        179
Project 9       -      -      -     26     6        5677
Project 10      -      -      -     27     8        13,530
Project 11      -      -     20    155     8        668
Project 12      -      -     21    164     7        1036
Project 13      -      -     27    132     9        1347
Project 14      -      -     12    140     9        1285
Project 15      3      7     32    146     7        259

Each of the three approaches, CC-MBP, Deterministic optimization and Q-CC, was employed for portfolio optimization. The results are shown in Table 2, where the expected revenue of the selected portfolios, the risk of exceeding the specified annual budget and the computation time are reported. The risk level was set to α = 0.1 and the budgetary constraint was $B_j$ = 210 MUSD. The results in Table 2 show that Deterministic optimization leads to a portfolio selection with a high expected revenue, but with a very large risk of exceeding the budget. Q-CC is conservative in that it has a much lower risk than the nominal 10%, and consequently selects a portfolio with lower expected revenue.
The CC-MBP is successful in selecting an optimal portfolio close to the specified nominal risk level. It may also be noted that despite the large number of constraints in CC-MBP, the applied approach is able to solve the problem in a reasonable amount of time. In Table 3, we report the result of employing the CC-MBP with different risk levels and the annual budget set to 220 MUSD.
Corresponding results are also visualized in Fig. 4. From these results it can be seen that the expected revenue changes significantly between α = 0.0 and α = 0.01, and after that the expected revenue is monotonically increasing. This type of data allows the decision maker to strike a balance between higher expected revenue and an increased risk of exceeding the allocated budget. Table 4 shows the numerical result of post-optimality analysis for a range of annual budgetary restrictions with risk level α = 0.2. The effect of budget perturbation on expected revenue and optimal project selection is studied globally. In this study, the budget of all years is perturbed equally and the perturbation vector $\tilde{B}$ is taken to be the vector of ones, $\vec{1}$. The resulting values of λ are converted into budget levels, so that the decision maker can read off the break points of the annual budget. The optimal selected projects, and the corresponding expected revenue at each break point of the annual budgetary constraint, are listed in Table 4. A critical region is the interval between two successive budget break points; increasing the annual budget within such a region does not change the optimal selected projects or the maximum expected revenue. As an example, Table 4 demonstrates that by allocating 200 million dollars each year, all projects except project 11 can be in the portfolio, yielding an expected revenue of 53,787 MUSD. An increase of the budget to 210 MUSD is required to change the selected portfolio and hence increase the expected revenue.

Discussion
The aim of this paper was to develop a model for stochastic portfolio optimization in the pharmaceutical industry. We have developed a chance constrained model with Big-M coefficients and binary decision variables, CC-MBP, which accounts for the uncertainty in the cost of drug development phases and maximizes the expected revenue of the portfolio while satisfying an annual budgetary limitation. A parametric programming technique was applied to systematically study the effect of annual budget variation on the optimal solution. The presented model can be a very valuable tool for decision makers, where adjustments can be made to cost distributions, expected revenues, risk level and budgetary limitations to intelligently select projects for a portfolio without dealing with mathematical formulae. Even though the representation of pharmaceutical assets in these models is simplified, the results will provide valuable insights for portfolio decision makers, not in the sense that the optimal solution is the one that a decision maker should necessarily implement, but in the sense that the solutions provide more knowledge about the portfolio and its dynamics and constraints. The perturbation analysis provides insights about the relative value of assets in the portfolio in a very straightforward way; it shows the thresholds at which the optimal solution changes. Furthermore, it prioritizes the assets based on risk, cost and return from a portfolio perspective, and thereby gives a well-rounded evaluation of the relative value of each asset. An addition to the perturbation analysis would be to add sensitivity to the risk appetite of the decision maker. If a greater risk of exceeding the budget is tolerated, will the composition of the optimal portfolio change? The answer will give yet another level of insight to the decision maker and will enable optimality across several portfolios that share budget constraints.
This study addressed the strategic objective of maximizing expected revenue subject to budgetary limitation. The cost for each project has uncertainty that we manage in the suggested models. Further additions to the model would be to introduce uncertainty in the target function (revenue), i.e., just as we are using uncertainty for project costs, we should do the same for project revenue. Rather than using a risk-adjusted expected revenue, it would be interesting to see if taking into account the volatility of the project revenue (see Fig. 5 for an example) may influence the selection of the optimal set of projects. We have throughout the methodology section given a brief illustration of the general version of the methodology, alongside the adaptation of the methods to the current situation of our application. This serves to indicate that the developed approaches should be generally applicable to a wide range of stochastic optimization problems, not confined to the current pharmaceutical portfolio selection problem that was the focus of our investigation. As an example, the current data did not take into account the uncertainties and complexities involved in the pharmaceutical portfolio such as the dependency among drug development projects, the uncertainty in the lengths of phases, the budget limitations varying over time, as well as non-mathematical aspects such as the decision maker's preferences. However, the general approach should also allow the handling of these types of situations.