A nonmonotone flexible filter method for nonlinear constrained optimization
 Ke Su^{1},
 Xiaochuan Li^{1} and
 Ruyue Hou^{1}
Received: 1 June 2016
Accepted: 28 September 2016
Published: 12 October 2016
Abstract
In this paper, we present a flexible nonmonotone filter method for solving nonlinear constrained optimization problems, which are common models in industry. This new method is more flexible in accepting the trial step than the traditional filter methods and requires less computation than the monotone-type methods. Moreover, we use a self-adaptive parameter to adjust the acceptance criteria, so that the Maratos effect can be avoided to a certain degree. Under reasonable assumptions, the proposed algorithm is globally convergent. Numerical tests are presented that confirm the efficiency of the approach.
Keywords
nonmonotone, filter, self-adaptive, global convergence, trust region
MSC
90C30, 65K05
1 Introduction
Various methods exist for solving the inequality constrained nonlinear optimization problem (P), for example, sequential quadratic programming methods, trust region approaches [1], penalty methods and interior point methods [2]. In these works, however, a penalty or Lagrange function is always used to test the acceptability of the iterates, and it is well known that the use of a penalty function brings several difficulties, in particular the choice of the penalty parameter. In 2002, Fletcher and Leyffer [3] proposed a class of filter methods, which do not require any penalty parameter and show promising numerical results. Consequently, the filter technique has been applied to many approaches, for instance, SLP methods [4], SQP methods [5, 6], interior point approaches [7] and derivative-free optimization [8, 9]. Furthermore, Fletcher et al. [5] proved the global convergence of the filter-SQP method, and Ulbrich and Ulbrich [10] showed its superlinear local convergence. Filter methods, however, can also suffer from the Maratos effect. The Maratos effect, observed by Maratos in his PhD thesis in 1978, means that some steps that make good progress toward a solution are rejected by the merit function. To overcome this drawback in filter methods, Ulbrich [11] introduced a new filter method that uses the Lagrangian function instead of the objective function in the acceptance criterion. After that, Nie and Ma [12] used a fixed scalar to combine the objective function and the constraint violation function into one measure in the filter entry. Both of them, however, used a fixed criterion to decide whether to accept a trial point, which means that the criterion does not vary no matter what improvement the trial point makes. If the criterion is instead changed according to the improvement made by the current trial point, the Maratos effect can be avoided to a certain degree and the computational cost can be reduced as well.
On the other hand, the promising numerical results of filter methods are due, to a certain degree, to their nonmonotonicity. Based on this property, some other nonmonotone-type filter methods have been proposed [13–15]. Gould and Toint [16] also introduced a new nonmonotone filter method that uses the area of a region in the \(h\)-\(f\) plane as the criterion to decide whether a trial point is acceptable, where \(h=h(x)\) is the constraint violation function and \(f=f(x)\) is the objective function at the current point x.
Motivated by the ideas and methods above, we propose a class of nonmonotone filter trust region methods with a self-adaptive parameter for solving problem (P). Our method improves on previous nonmonotone filter methods. Unlike Ulbrich [11], we do not use a Lagrangian function in the filter but use a function of a similar type to that in Nie and Ma [12]. Moreover, in contrast to Nie and Ma [12], the parameter in our method is not fixed but variable, which means that the criterion is adjusted according to the improvement made. To prevent the trial point from falling into a ‘valley’, we also add the nonmonotone technique to the criterion. Unlike existing SQP-filter methods, we use a quadratic subproblem that is always feasible, which avoids the feasibility restoration phase and hence reduces the amount of computation to a certain degree.
This paper is organized as follows. In Section 2, we introduce the feasible SQP subproblem and the nonmonotone flexible filter. We propose the nonmonotone filter method with a self-adaptive parameter in Section 3. Section 4 presents the global convergence properties, and some numerical results are reported in Section 5. We end with a short conclusion in Section 6.
2 The modified SQP subproblem and the nonmonotone flexible filter method
2.1 The modified SQP subproblem
Lemma 1
[17]
If \(d_{k}=0\) is the solution to \(Q(x_{k},H_{k},\rho_{k})\), then \(x_{k}\) is a KKT point of the problem (P).
Proof
The proof is similar to that of Lemma 4.1 in [16]. □
2.2 The nonmonotone flexible filter with a self-adaptive parameter
In the traditional filter method, originally proposed by Fletcher and Leyffer [3], the acceptability of iterates is determined by comparing the values of the constraint violation and the objective function with those of previous iterates collected in a filter. Define the violation function \(h(x)\) by \(h(x)=\|c(x)^{+}\|_{\infty}\), where \(c_{i}(x)^{+}=\max\{c_{i}(x),0\}\), \(i\in I\). Obviously, \(h(x)=0\) if and only if x is a feasible point. So a trial point should reduce either the value of the constraint violation or the objective function f.
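As an illustration of the violation function, the following minimal Python sketch evaluates \(h(x)=\|c(x)^{+}\|_{\infty}\) from a list of constraint values. It assumes the constraints are written as \(c_{i}(x)\le 0\); the function name `violation` and the sample values are hypothetical and not part of the paper.

```python
def violation(c_values):
    """Return h(x) = ||c(x)^+||_inf, assuming constraints of the form c_i(x) <= 0.

    c_values holds the numbers c_i(x) at one point x (illustrative input).
    """
    # c_i(x)^+ = max{c_i(x), 0}; the infinity norm is the largest such value.
    return max((max(ci, 0.0) for ci in c_values), default=0.0)


# A point that violates the second and third constraints.
print(violation([-1.0, 0.5, 0.2]))
# A feasible point: every c_i(x) <= 0, so h(x) = 0.
print(violation([-1.0, -2.0]))
```

The value is zero exactly when no constraint is violated, which matches the remark that \(h(x)=0\) if and only if x is feasible.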
The definition of the filter set is based on the notion of dominance, as follows.
Definition 1
A pair \((h_{k},f_{k})\) dominates another pair \((h_{j}, f_{j})\), \(j\neq k\), if and only if \(h_{k}\leq h_{j}\) and \(f_{k}\leq f_{j}\).
Definition 2
A filter set \(\mathcal{F}\) is a set of pairs \((h,f)\) such that no pair dominates any other.
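The two definitions can be sketched in a few lines of Python. The sketch uses the usual Fletcher and Leyffer convention that a pair dominates another when it is no worse in both h and f; the helper names `dominates` and `add_to_filter` and the sample pairs are illustrative assumptions, not the authors' code.

```python
def dominates(p, q):
    """True if the pair p = (h, f) dominates q: p is no worse in both measures."""
    return p[0] <= q[0] and p[1] <= q[1]


def add_to_filter(filter_set, pair):
    """Return a filter containing `pair`, unless an existing entry dominates it.

    Entries dominated by the new pair are dropped, so that no pair in the
    result dominates any other (Definition 2).
    """
    if any(dominates(entry, pair) for entry in filter_set):
        return filter_set  # the new pair brings no improvement
    kept = [entry for entry in filter_set if not dominates(pair, entry)]
    kept.append(pair)
    return kept


F = []
F = add_to_filter(F, (1.0, 5.0))   # first entry
F = add_to_filter(F, (0.5, 6.0))   # smaller h, larger f: both pairs stay
F = add_to_filter(F, (2.0, 7.0))   # dominated by (1.0, 5.0): ignored
F = add_to_filter(F, (0.4, 4.0))   # dominates both earlier entries
print(F)
```

This shows pure dominance only; the acceptance rules of the paper add margins and the nonmonotone relaxation on top of it.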
To ensure convergence, some additional conditions are required to decide whether or not to accept a trial point into the filter. The traditional acceptance criterion is as follows.
Definition 3
If \(x_{k}\) moves into region I, we also accept it, because the value of the constraint violation does decrease and an increase of \(l(x_{k})\) can be tolerated for a finite number of steps. In this case the value of \(\delta _{k}\) is not changed. If \(x_{k}\) moves into region IV, the trial point is rejected, and \(\delta_{k}\) is also kept unchanged for the next iteration.
It is well known that the good numerical results of the filter method are due, to a certain degree, to its nonmonotone properties. Su and Pu [13] also proposed a modified nonmonotone filter method that exploits a further nonmonotone technique. Motivated by this, we loosen the acceptance criterion by a nonmonotone technique and give the following criterion.
Definition 4
As in the traditional filter methods, we also need to update the filter set \(\mathcal{F}\) at each successful iteration. The technique is the same as in the traditional method, but with the modified acceptance rule (10).
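To make the nonmonotone relaxation concrete, here is a hedged Python sketch. It does not reproduce rule (10); instead it checks the two conditions that define the index sets \(K_{1}\) and \(K_{2}\) in the proof of Lemma 4, and it assumes that satisfying either one is enough. The function name, the argument layout (most recent value first) and the sample numbers are all illustrative assumptions.

```python
def nonmonotone_acceptable(h_trial, l_trial, h_hist, l_hist, weights, beta, gamma):
    """Check the two nonmonotone conditions used in the proof of Lemma 4.

    h_hist and l_hist hold the last m(k) values, most recent first, so
    h_hist[0] = h_k and l_hist[0] = l_k. weights plays the role of the
    lambda_{kr}. Treating the test as "either condition" is an assumption.
    """
    # Condition of K_1: sufficient decrease against the worst recent violation.
    reduces_violation = h_trial <= beta * max(h_hist)
    # Condition of K_2: decrease against max[l_k, weighted sum of recent l].
    reference = max(l_hist[0], sum(w * l for w, l in zip(weights, l_hist)))
    reduces_measure = l_trial <= reference - gamma * h_hist[0]
    return reduces_violation or reduces_measure


# Memory of two iterates: the trial point passes thanks to the weighted reference.
print(nonmonotone_acceptable(3.0, 5.8, [1.0, 2.0], [5.0, 7.0], [0.5, 0.5], 0.9, 0.1))
# Memory of one iterate (monotone test): the same trial point is rejected.
print(nonmonotone_acceptable(3.0, 5.8, [1.0], [5.0], [1.0], 0.9, 0.1))
```

With a longer memory the reference value can exceed \(l_{k}\), which is how the nonmonotone test accepts points that a monotone test would reject.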
To control the infeasibility, an upper bound condition of violation function is needed, namely \(h(x)\leq u\), where u is a positive scalar, which can be implemented in the algorithm by initiating the filter with the pair \((u, \infty)\).
3 A nonmonotone flexible filter algorithm
A formal description of the algorithm is given as follows.
Algorithm A
Step 0. Let \(0<\rho_{0}<1\), \(0<\gamma<\beta<1\), \(0<\lambda\le 1\), \(0<\gamma_{0}<\gamma_{1}\le1<\gamma_{2}\), \(M\geq1\), \(u>0\), \(\alpha_{1}=\alpha _{2}=0.5\). Choose an initial point \(x_{0}\in R^{n}\), a symmetric matrix \(H_{0}\in R^{n\times n}\) and an initial trust region radius \(\Delta_{0}\geq\Delta_{\min}>0\), and let \({\mathcal {F}}_{0}=\{(u,\infty)\}\). Set \(k=0\), \(m(k)=0\).
Step 1. Solve the subproblem \(Q(x_{k},H_{k},\rho_{k})\). If \(\|d_{k}\|=0\), stop.
Step 2. Let \(x_{k}^{+}=x_{k}+d_{k}\) and compute \(h_{k}^{+}\), \(l_{k}^{+}\).
Step 3. If \(x_{k}^{+}\) is acceptable to the filter \({\mathcal {F}}_{k}\), go to step 4; otherwise go to step 5.
Step 4. If \(x_{k}^{+}\) is located in region I or region IV, let \(\delta_{k+1}=\delta_{k}\); if \(x_{k}^{+}\) is located in region II, update \(\delta_{k+1}\) by (9); if \(x_{k}^{+}\) is located in region III, update \(\delta_{k+1}\) by (8).
Step 5. If \(\mathrm{rared}_{k}^{l}\leq\eta\, \mathrm{pred}_{k}^{f}\) and \(h_{l}(k)\leq\alpha_{1}\|d_{k}\|_{\infty}^{\alpha_{2}}\), then go to step 6; otherwise go to step 7.
Step 6. Choose a new \(\rho_{k}\in[\gamma_{0}\rho_{k},\gamma_{1}\rho_{k}]\) and go to step 1.
Step 7. Let \(x_{k+1}=x_{k}^{+}\) and update the filter set. Choose \(\rho_{k+1}\in[\rho_{k},\gamma_{2}\rho_{k}]\) with \(\rho_{k+1}\geq\rho_{\min}\), update \(H_{k}\) to \(H_{k+1}\), set \(m(k+1)=\min\{m(k)+1,M\}\) and \(k=k+1\), and go to step 1.
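The bookkeeping in steps 6 and 7 can be sketched as follows. The algorithm only requires the new radius to lie in the stated intervals, so the specific choices below (the midpoint of the shrink interval, the largest value of the expansion interval) are assumptions made for illustration, as are the function names.

```python
def shrink_radius(rho, gamma0, gamma1):
    """Step 6: pick a new radius in [gamma0*rho, gamma1*rho] (here the midpoint)."""
    return 0.5 * (gamma0 + gamma1) * rho


def expand_radius(rho, gamma2, rho_min):
    """Step 7: pick a radius in [rho, gamma2*rho], kept at or above rho_min."""
    return max(gamma2 * rho, rho_min)


def next_memory(m, M):
    """Step 7: m(k+1) = min{m(k) + 1, M}, the nonmonotone memory length."""
    return min(m + 1, M)


print(shrink_radius(1.0, 0.25, 0.5))
print(expand_radius(1.0, 2.0, 1e-6))
print(next_memory(2, 3), next_memory(3, 3))
```

The memory length grows by one per successful iteration until it reaches M, so M = 1 corresponds to the monotone variant.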
Remark 1
At the beginning of each iteration, we always set \(\rho_{k}\geq\rho_{\min}\), which prevents the trust region radius from becoming too small.
Remark 2
4 The convergence properties
In this section, to prove the global convergence of the algorithm, we always assume that the following conditions hold.
Assumptions
A1. The objective function f and the constraint functions \(c_{i}\) (\(i\in I=\{1,2,\ldots,m\}\)) are twice continuously differentiable.
A2. For all k, \(x_{k}\) and \(x_{k}+d_{k}\) all remain in a closed, bounded convex subset \(S\subset R^{n}\).
A3. The matrix sequence \(\{H_{k}\}\) is uniformly bounded.
A4. The functions \(A=\nabla c\) are uniformly bounded on S.
By the above assumptions, we can suppose that there exist constants \(v_{1}\), \(v_{2}\), \(v_{3}\) such that \(|f(x)|\le v_{1}\), \(\|\nabla f(x)\|\le v_{1}\), \(\|\nabla^{2} f(x)\|\le v_{1}\), \(\|c(x)\|\le v_{2}\), \(\|\nabla c(x)\|\le v_{2}\), \(\|\nabla^{2} c(x)\|\le v_{2}\).
Definition 5
[2]
Lemma 2
[12]
If \(h_{k}>0\) and \(\rho_{k}\leq\sqrt{\frac{2\beta h_{k}}{n^{2}v_{2}}}\) then \(h(x_{k}^{+})\leq\beta h_{k}\).
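For a feel of the size of the radius in Lemma 2, the bound can be evaluated directly. The sketch below only computes \(\sqrt{2\beta h_{k}/(n^{2}v_{2})}\); the function name and the sample values of \(h_{k}\), \(\beta\), n and \(v_{2}\) are made up for illustration.

```python
import math


def radius_bound(h_k, beta, n, v2):
    """Largest radius for which Lemma 2 guarantees h(x_k^+) <= beta * h_k."""
    return math.sqrt(2.0 * beta * h_k / (n ** 2 * v2))


# Illustrative values only: h_k = 2, beta = 0.5, n = 2, v2 = 0.5.
print(radius_bound(2.0, 0.5, 2, 0.5))
```

The bound shrinks like \(\sqrt{h_{k}}\), so the closer the iterate is to feasibility, the smaller the radius the lemma requires.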
Lemma 3
Suppose that the Assumptions hold. Then Algorithm A is well defined.
Proof
We will show that the trial point \(x_{k}^{+}\) is acceptable to the filter when \(\rho_{k}\) is small enough. We consider the following two cases.
Case 1. \(h_{k}= 0\).
To show that Algorithm A can be carried out, we have to show that \(\mathrm{rared}_{k}^{l}\geq\eta\, \mathrm {pred}_{k}^{f}\) holds for all k such that \(\rho_{k}\leq\delta\). We know \(\mathrm{ared}_{k}^{l}=l(x_{k})-l(x_{k}^{+})\).
Case 2. \(h_{k}>0\).
There exist a constant \(\delta>0\) and an index \(k_{0}\) such that \(\rho_{k}\leq\delta \) when \(k< k_{0}\). Let \(\delta=\sqrt{\frac{2\beta h_{k}}{n^{2}v_{2}}}\); then, by Lemma 2, we have \(h_{k}^{+}\leq\beta h_{k}\), that is, \(h_{k}^{+}\leq\beta\max_{0\leq j\leq m(k)-1}\{h_{k-j}\}\). So \(x_{k}^{+}\) must be acceptable to the filter by the definition.
Lemma 4
Suppose that the Assumptions hold and Algorithm A does not terminate finitely. Then \(\lim_{k\rightarrow\infty}h_{k}=0\).
Proof
(i) \(K_{1}=\{k\mid h_{k}^{+}\le\beta\max_{0\le r\le m(k)-1}h_{k-r}\}\) is an infinite set.
(ii) \(K_{2}=\{k\mid l_{k}^{+}\le\max[l_{k},\sum_{r=0}^{m(k)-1}\lambda_{kr} l_{k-r}]-\gamma h_{k}\}\) is an infinite set.
If \(k=1\), we have \(l_{1}\le l_{0}-\gamma h_{0}\le l_{0}-\lambda\gamma h_{0}\).
Assume that (27) holds for \(1,2,\ldots,k\); we then show that (27) holds for \(k+1\) in the following two cases.
Case 2. \(\max[l_{k},\sum_{r=0}^{m(k)-1}\lambda_{kr} l_{k-r}]=\sum_{r=0}^{m(k)-1}\lambda_{kr} l_{k-r}\).
Lemma 5
Suppose that the Assumptions hold. If Algorithm A does not terminate finitely, then \(\lim_{k\rightarrow \infty}\|d_{k}\|=0\).
Proof
Suppose by contradiction that there exist constants \(\epsilon>0 \) and \(\bar{k}>0\) such that \(\|d_{k}\|>\epsilon\) for all \(k>\bar{k}\).
Then by Lemma 2, \(\mathrm{pred}_{k}^{f}>\frac{1}{3}\xi_{1}\rho_{k}>\frac {1}{3}\xi_{1}\|d_{k}\|>\frac{1}{3}\xi_{1}\epsilon>0\). Because \(\mathrm{rared}_{k}^{l}\geq\eta\, \mathrm{pred}_{k}^{f}\), we have \(\max[l_{k},\sum_{r=0}^{m(k)-1}\lambda_{kr} l_{k-r}]-l_{k+1}\geq\eta\, \mathrm{pred}_{k}^{f}\). Summing both sides and using the fact that the sequence \(\{l_{k}\}\) is bounded below, we obtain \(\eta\sum\mathrm{pred}_{k}^{f}<\infty\). It follows that \(\mathrm{pred}_{k}^{f}\rightarrow0\) as \(k\rightarrow\infty\), which contradicts the positive lower bound on \(\mathrm{pred}_{k}^{f}\) derived above. Hence the conclusion follows. □
Theorem 1
Suppose \(\{x_{k}\}\) is an infinite sequence generated by Algorithm A. Then every cluster point of \(\{x_{k}\}\) is a KKT point of problem (P).
5 Numerical results
(1) Updating of \(H_{k}\) is done by [2]
$$H_{k+1}=H_{k}+ \frac{y_{k}y_{k}^{T}}{y_{k}^{T}s_{k}}-\frac{H_{k}s_{k}s_{k}^{T}H_{k}}{s_{k}^{T}H_{k}s_{k}}, $$
where \(y_{k}=\theta_{k}\hat{y}_{k}+(1-\theta_{k})H_{k}s_{k}\),
$$ \theta_{k}=\left \{ \begin{array}{ll} 1,& s_{k}^{T}\hat{y}_{k}\geq0.2s_{k}^{T}H_{k}s_{k}, \\ \frac{0.8s_{k}^{T}H_{k}s_{k}}{s_{k}^{T}H_{k}s_{k}-s_{k}^{T}\hat{y}_{k}}, &\text{otherwise,} \end{array} \right . $$ (31)
and \(\hat{y}_{k}=g_{k+1}-g_{k}\), \(s_{k}=x_{k+1}-x_{k}\).
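The update of \(H_{k}\) is the damped BFGS formula described in [2]. The following dependency-free Python sketch is one way to write it, assuming that \(H_{k}\) is symmetric positive definite and \(s_{k}\neq0\); the authors' program is in Matlab, so the function names and the small test data here are illustrative only.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(x, y))


def damped_bfgs_update(H, s, y_hat):
    """Return H_{k+1} from H_k, s_k and y_hat_k using the damping factor theta_k.

    Assumes H is symmetric positive definite and s is nonzero, so that
    s^T H s > 0 and H s s^T H equals the outer product of H s with itself.
    """
    Hs = matvec(H, s)
    sHs = dot(s, Hs)
    sy = dot(s, y_hat)
    # theta_k = 1 when the curvature s^T y_hat is large enough, otherwise damp.
    theta = 1.0 if sy >= 0.2 * sHs else 0.8 * sHs / (sHs - sy)
    y = [theta * yi + (1.0 - theta) * hi for yi, hi in zip(y_hat, Hs)]
    ys = dot(y, s)
    n = len(s)
    return [[H[i][j] + y[i] * y[j] / ys - Hs[i] * Hs[j] / sHs
             for j in range(n)] for i in range(n)]


identity = [[1.0, 0.0], [0.0, 1.0]]
# Enough curvature: theta_k = 1 and the plain BFGS update is applied.
print(damped_bfgs_update(identity, [1.0, 0.0], [2.0, 0.0]))
# Negative curvature: damping keeps the updated matrix positive definite.
print(damped_bfgs_update(identity, [1.0, 0.0], [-1.0, 0.0]))
```

In the damped case \(y_{k}^{T}s_{k}=0.2\,s_{k}^{T}H_{k}s_{k}>0\), which is what preserves positive definiteness of \(H_{k+1}\).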
(2) We assume the error tolerance is 10^{−6}.
(3) The algorithm parameters were set as follows: \(H_{0}=I\in R^{n\times n}\), \(\beta=0.9\), \(\gamma=0.1\), \(\rho=0.5\), \(\alpha_{1}=\alpha_{2}=0.5\), \(\sigma_{0}=0.1\), \(\Delta_{\min}=10^{-6}\), \(\Delta_{0}=1\). The program is written in Matlab.
The numerical results of the different algorithms
Problem  Algorithm A (NG-NF)  Filter (NG-NF)  Matlab (NF)
HS2      2736                 1932            28
HS6      99                   3741            23
HS11     5982                 -               32
HS13     102102               -               203
HS14     66                   66              23
HS15     102192               2446            50
HS16     1717                 2234            23
HS17     4444                 4444            15
HS18     3847                 3643            40
HS19     88                   88              27
HS20     1717                 2134            63
HS21     88                   88              15
HS22     22                   22              19
HS23     77                   77              31
HS41     1515                 1515            41
HS45     22                   22              20
HS59     1040                 1346            53
HS64     5462                 5786            301
HS65     2828                 4040            44
HS72     5272                 3850            101
HS73     122                  122             35
HS106    1755                 -               509
HS108    77                   1429            182
S216     413                  313             21
S235     3638                 3638            110
S252     1834                 5858            139
S265     22                   22              17
S269     99                   1431            48
The results of different M in our algorithm (i.e., using different degrees of nonmonotonicity)
Problem  M = 1 (NG-NF)  M = 3 (NG-NF)  M = 10 (NF)
HS2      2840           2736           -
HS6      2636           99             99
HS11     2958           5982           3998
HS13     102102         102102         102102
HS14     66             66             66
HS15     1834           102192         2760
HS16     6370           1717           1717
HS17     4444           4444           4444
HS18     3037           3847           -
HS19     2323           88             88
HS20     8592           1717           1717
These first numerical results show that the nonmonotone algorithm is more effective than the monotone one for most test examples, and that our algorithm is effective and satisfactory.
6 Conclusions
In our method, the criterion used to test the trial points is flexible: the rejection region varies according to the improvement made by the previous trial point, whereas in the traditional filter methods the elements of the filter structure are fixed. The numerical results also indicate that the new method is more effective and requires less computation than both the traditional filter method and the Matlab algorithm. Moreover, the use and adjustment of the self-adaptive parameter is a good way to balance the value of the objective function against the constraint violation function. In addition, applying the nonmonotone technique in the criterion avoids the Maratos effect to a certain degree, because more trial points are accepted by the filter. We also compared the results for different degrees of nonmonotonicity. Although we cannot decide which value of M is the best, the results with the nonmonotone technique are at least better than those with the monotone technique.
Declarations
Acknowledgements
This research is supported by the National Natural Science Foundation of China (No. 11101115), the Natural Science Foundation of Hebei Province (No. 2014201033) and the Key Research Foundation of Educational Bureau of Hebei Province (No. ZD2015069). In addition, we would like to show our deepest gratitude to the editor and the anonymous reviewers who have helped to improve the paper.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
1. Zhang J. A robust trust region method for nonlinear optimization with inequality constraint. Appl Math Comput. 2006;176:688-99.
2. Nocedal J, Wright S. Numerical optimization. New York: Springer; 1999.
3. Fletcher R, Leyffer S. Nonlinear programming without a penalty function. Math Program. 2002;91:239-69.
4. Chin C, Fletcher R. On the global convergence of an SLP-filter algorithm that takes EQP steps. Math Program. 2003;96:161-77.
5. Fletcher R, Gould N, Leyffer S, Toint P, Wächter A. Global convergence of a trust-region SQP-filter algorithm for general nonlinear programming. SIAM J Optim. 2002;13:635-60.
6. Fletcher R, Leyffer S, Toint P. On the global convergence of a filter-SQP algorithm. SIAM J Optim. 2002;13:44-59.
7. Ulbrich M, Ulbrich S, Vicente L. A globally convergent primal-dual interior-point method for nonconvex nonlinear programming. Math Program. 2004;100:379-410.
8. Audet C, Dennis J. A pattern search filter method for nonlinear programming without derivatives. SIAM J Optim. 2004;14:980-1010.
9. Karas E, Ribeiro A, Sagastizábal C, Solodov M. A bundle filter method for nonsmooth convex constrained optimization. Math Program. 2009;116:297-320.
10. Ulbrich M, Ulbrich S. Nonmonotone trust region methods for nonlinear equality constrained optimization without a penalty function. Math Program. 2003;95:103-35.
11. Ulbrich S. On the superlinear local convergence of a filter-SQP method. Math Program. 2004;100:217-45.
12. Nie P, Ma C. A trust region filter method for general nonlinear programming. Appl Math Comput. 2006;172:1000-17.
13. Su K, Pu D. A nonmonotone filter trust region method for nonlinear constrained optimization. J Comput Appl Math. 2009;223:230-9.
14. Chen Z. A penalty-free-type nonmonotone trust region method for nonlinear constrained optimization. Appl Math Comput. 2006;173:1014-46.
15. Chen Z, Zhang X. A nonmonotone trust region algorithm with nonmonotone penalty parameters for constrained optimization. J Comput Appl Math. 2004;172:7-39.
16. Gould N, Toint P. Global convergence of a nonmonotone trust-region SQP-filter algorithm for nonlinear programming. Nonconvex Optim Appl. 2006;82:125-50.
17. Zhou G. A modified SQP method and its global convergence. J Glob Optim. 1997;11:193-205.
18. Hock W, Schittkowski K. Test examples for nonlinear programming codes. Lecture notes in economics and mathematical systems. New York: Springer; 1981.
19. Schittkowski K. More test examples for nonlinear mathematical programming codes. New York: Springer; 1987.