A nonmonotone flexible filter method for nonlinear constrained optimization

In this paper, we present a flexible nonmonotone filter method for solving nonlinear constrained optimization problems, which are common models in industry. Compared with traditional filter methods, the new method is more flexible in accepting trial steps, and it requires less computational cost than monotone-type methods. Moreover, we use a self-adaptive parameter to adjust the acceptance criterion, so that the Maratos effect can be avoided to a certain degree. Under reasonable assumptions, the proposed algorithm is globally convergent. Numerical tests confirm the efficiency of the approach.


Introduction
We consider the following inequality constrained nonlinear optimization problem

(P)  $\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad c_i(x) \le 0, \; i \in I = \{1, 2, \dots, m\},$

where the functions $f: \mathbb{R}^n \to \mathbb{R}$ and $c_i: \mathbb{R}^n \to \mathbb{R}$ ($i \in I$) are all twice continuously differentiable. For convenience, let $g(x) = \nabla f(x)$, $c(x) = (c_1(x), c_2(x), \dots, c_m(x))^T$ and $A(x) = (\nabla c_1(x), \nabla c_2(x), \dots, \nabla c_m(x))$. We write $f_k = f(x_k)$, $c_k = c(x_k)$, $g_k = g(x_k)$, $A_k = A(x_k)$, and so on.
There are various methods for solving the inequality constrained nonlinear optimization problem (P), for example, sequential quadratic programming methods, trust region approaches [], penalty methods and interior point methods []. In these works, a penalty function or a Lagrangian function is always used to test the acceptability of the iterates. However, it is well known that the use of a penalty function entails several difficulties, in particular the choice of the penalty parameter. Fletcher and Leyffer [] proposed a class of filter methods, which do not require any penalty parameter and show promising numerical results. Consequently, the filter technique has been employed in many approaches, for instance, SLP methods [], SQP methods [, ], interior point approaches [] and derivative-free optimization [, ]. Furthermore, Fletcher et al. [] proved the global convergence of the filter-SQP method, and Ulbrich and Ulbrich [] then showed its superlinear local convergence. However, filter methods also suffer from the Maratos effect. The Maratos effect, first observed by Maratos in his PhD thesis, means that some steps that make good progress toward a solution are rejected by the merit function. To overcome this drawback of filter methods, Ulbrich [] introduced a filter method that uses the Lagrangian function instead of the objective function in the acceptance criterion. After that, Nie and Ma [] used a fixed scalar to combine the objective function and the constraint violation function into a single measure in the filter entries. Both methods, however, use a fixed criterion to decide whether to accept a trial point; that is, the criterion does not change no matter what improvement the trial point makes. If the criterion is instead adjusted according to the improvement made by the current trial point, the Maratos effect can be avoided to a certain degree and the computational cost can be decreased as well.
On the other hand, the promising numerical results of filter methods are owed, to a certain degree, to their nonmonotonicity. Based on this property, several other nonmonotone filter methods have been proposed [-]. Gould and Toint [] also introduced a nonmonotone filter method that uses the area of a region in the h-f plane as the criterion for deciding whether a trial point is acceptable, where h = h(x) is the constraint violation function and f = f(x) is the objective function at the current point x.
Motivated by the ideas and methods above, we propose a class of nonmonotone filter trust region methods with a self-adaptive parameter for solving problem (P). Our method improves on previous nonmonotone filter methods. Unlike Ulbrich [], we do not use a Lagrangian function in the filter; instead, we use a function of a similar type to that in Nie and Ma []. Moreover, unlike Nie and Ma [], the parameter in our method is not fixed but variable, so that the criterion is adjusted according to the improvement made. To prevent the trial point from falling into a 'valley', we also add a nonmonotone technique to the criterion. Different from existing SQP-filter methods, we use a quadratic subproblem that is always feasible, which avoids the feasibility restoration phase and hence reduces the amount of computation to a certain degree. The remainder of this paper is organized as follows. We first introduce the feasible SQP subproblem and the nonmonotone flexible filter, and then propose the nonmonotone filter method with a self-adaptive parameter. Next, we present the global convergence properties, report some numerical results, and end with a short conclusion.

The modified SQP subproblem
Our algorithm is an SQP method. To avoid infeasibility of the quadratic subproblem, we adopt the quadratic program presented by Zhou []. At the kth iteration, we compute a trial step by solving the following quadratic problem, where and (x_k; d_k) is the first-order approximation to ( and $\rho_k > 0$. We note that these convex programs have the following properties. We can condense the above definitions into the following form. Proof The proof is similar to that of Lemma . in [].

The non-monotone flexible filter with a self-adaptive parameter
In the traditional filter method, originally proposed by Fletcher and Leyffer [], the acceptability of iterates is determined by comparing the values of the constraint violation and the objective function with those of previous iterates collected in a filter. Define the constraint violation function h(x), where h(x) = 0 if and only if x is a feasible point. A trial point should therefore reduce either the constraint violation h or the objective function f. The definition of the filter set is based on the following notion of dominance: a pair $(h_k, f_k)$ is said to dominate another pair $(h_j, f_j)$ if and only if $h_k \le h_j$ and $f_k \le f_j$.

Definition  A filter set F is a set of pairs (h, f) such that no pair dominates any other.
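To make the dominance relation concrete, the following Python sketch evaluates a constraint violation and checks whether a list of (h, f) pairs forms a filter set. It assumes constraints of the form c_i(x) ≤ 0 and takes h(x) as the largest positive part of c(x), i.e. the infinity norm of max{c(x), 0}; both choices, the toy constraints and the helper names `violation`, `dominates` and `is_filter_set` are illustrative assumptions rather than the paper's definitions.

```python
from typing import Callable, Sequence, Tuple

Pair = Tuple[float, float]  # (h, f)


def violation(c: Callable[[Sequence[float]], Sequence[float]],
              x: Sequence[float]) -> float:
    """Assumed h(x) = ||max(c(x), 0)||_inf for constraints c_i(x) <= 0."""
    return max((max(ci, 0.0) for ci in c(x)), default=0.0)


def dominates(p: Pair, q: Pair) -> bool:
    """(h_p, f_p) dominates (h_q, f_q) iff h_p <= h_q and f_p <= f_q."""
    return p[0] <= q[0] and p[1] <= q[1]


def is_filter_set(pairs: Sequence[Pair]) -> bool:
    """True if no pair in the list dominates any other pair."""
    return not any(dominates(p, q)
                   for i, p in enumerate(pairs)
                   for j, q in enumerate(pairs) if i != j)


def toy_constraints(x: Sequence[float]) -> list:
    """Hypothetical example: x0 + x1 - 1 <= 0 and -x0 <= 0."""
    return [x[0] + x[1] - 1.0, -x[0]]


print(violation(toy_constraints, [1.0, 1.0]))  # 1.0
print(violation(toy_constraints, [0.2, 0.3]))  # 0.0 (feasible point)
```

Any other norm of the positive part would serve equally well here; the only property used by the filter is that h(x) = 0 exactly at feasible points.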
To ensure convergence, some additional conditions are required to decide whether a trial point is acceptable to the filter. The traditional acceptance criterion is as follows.

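Definition  A trial point x is called acceptable to the filter if and only if

$h(x) \le \beta h_j \quad \text{or} \quad f(x) \le f_j - \gamma h(x) \quad \text{for all } (h_j, f_j) \in F,$

where $0 < \gamma < \beta < 1$ are constants. In practice, β is close to 1 and γ is close to 0.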
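The acceptance test of the Definition above can be sketched in Python as follows. The helper name `acceptable` and the default values β = 0.99 and γ = 10^-4 are illustrative assumptions, chosen only to respect 0 < γ < β < 1 with β close to 1 and γ close to 0.

```python
from typing import Iterable, Tuple


def acceptable(h: float, f: float, filter_set: Iterable[Tuple[float, float]],
               beta: float = 0.99, gamma: float = 1e-4) -> bool:
    """Traditional filter test: for every (h_j, f_j) in F require
    h <= beta * h_j  or  f <= f_j - gamma * h,  with 0 < gamma < beta < 1."""
    if not (0.0 < gamma < beta < 1.0):
        raise ValueError("need 0 < gamma < beta < 1")
    return all(h <= beta * hj or f <= fj - gamma * h for hj, fj in filter_set)


F = [(1.0, 5.0), (0.5, 6.0)]
print(acceptable(0.6, 5.5, F))  # True: enough decrease in f relative to (0.5, 6.0)
print(acceptable(0.6, 6.5, F))  # False: no sufficient decrease in h or f vs (0.5, 6.0)
print(acceptable(1.0, 5.0, F))  # False: a stored entry is not acceptable to itself
```

The margins β and γ create a small envelope around each entry, so a point that merely ties with an existing entry is rejected.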
In fact, in the traditional filter method, some good points, such as superlinearly convergent steps, may be rejected because both the objective function value and the constraint violation increase compared with other entries in the filter. This is why the Maratos effect occurs. Motivated by [], we therefore replace the objective function f(x_k) at the kth iterate with a function l(x_k) that combines the objective function and the constraint violation through a parameter δ_k. Here δ_k is a self-adaptive parameter at the kth iterate, which is changed according to the improvement made by the current trial point. Note that the traditional filter methods are the special case δ_k = 0, and we hope to overcome the Maratos effect with a suitable δ_k. We aim to reduce the values of both h(x) and l(x). By the original criterion, with f replaced by l, the trial point is acceptable if and only if the acceptance condition of the preceding Definition holds. Nie and Ma [] proposed a trust region filter method with a given penalty parameter which is negative; in this paper, by contrast, the parameter δ_k is a variable scalar that is changed according to the improvement made by the trial point. Specifically, at the beginning we let δ_0 = 0, which is exactly what the traditional filter method does, so that f(x_k) = l(x_k) (see Figure ).
The right half-plane is divided into four regions, I, II, III and IV. At the current iterate k, if the trial point $x_k^+$ moves into region IV, that is, the pair $(h(x_k^+), l(x_k^+))$ lies in region IV, the trial point is rejected by our criterion. If $x_k^+$ moves into region I, II or III, we accept it, but the parameter δ_k in the criterion needs to be adjusted. For region III, we say that the algorithm does not make a good improvement, since we do not want to accept points with larger constraint violation. We therefore impose a stricter acceptance criterion by increasing δ_k, which enlarges the rejection area and shrinks the acceptance area (see Figure ). If $x_k^+$ moves into region II, the algorithm makes a good improvement, since it reduces not only the function l(x_k) but also the constraint violation h(x_k); we therefore loosen the acceptance criterion in the hope of further improvement, that is, we decrease δ_k so that the rejection area becomes smaller and the acceptance area larger (see Figure ). If $x_k^+$ moves into region I, we also accept it, because the constraint violation does decrease, and we can tolerate an increase of l(x_k) in finitely many steps; in this case δ_k is not changed. If $x_k^+$ moves into region IV, the trial point is rejected and δ_k is kept unchanged for the next iteration.
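The region logic described above can be sketched in Python as follows. Since the exact form of l and the update formulas for δ_k are not reproduced here, this sketch assumes, purely for illustration, l(x) = f(x) + δ h(x) (which reduces to f when δ = 0) and a hypothetical additive update δ ± τ clipped to [-δ_max, δ_max]. Ties on a region boundary are assigned to the region with the larger h or l; all names and constants are hypothetical.

```python
from enum import Enum


class Region(Enum):
    I = "h decreases, l does not"
    II = "h and l both decrease"
    III = "h does not decrease, l decreases"
    IV = "neither h nor l decreases"


def combined_l(f: float, h: float, delta: float) -> float:
    """Hypothetical l = f + delta * h; delta = 0 gives back f."""
    return f + delta * h


def classify(h_new: float, l_new: float, h_cur: float, l_cur: float) -> Region:
    """Locate the trial pair relative to the current pair (h_cur, l_cur)."""
    if h_new < h_cur:
        return Region.II if l_new < l_cur else Region.I
    return Region.III if l_new < l_cur else Region.IV


def update_delta(delta: float, region: Region,
                 tau: float = 0.1, delta_max: float = 1.0) -> float:
    """Hypothetical additive update of the self-adaptive parameter."""
    if region is Region.III:    # larger violation: tighten by increasing delta
        delta += tau
    elif region is Region.II:   # both measures improved: loosen by decreasing delta
        delta -= tau
    # Regions I and IV leave delta unchanged (IV means the step is rejected).
    return max(-delta_max, min(delta_max, delta))


region = classify(h_new=1.5, l_new=1.0, h_cur=1.0, l_cur=2.0)
print(region.name, update_delta(0.0, region))  # III 0.1
```

Clipping δ keeps the combined measure from drifting arbitrarily far from the objective; any bounded update with the same signs would illustrate the idea equally well.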
It is well known that the good numerical results of filter methods stem, to a certain degree, from their nonmonotone behavior. Su and Pu [] also proposed a modified nonmonotone filter method that incorporates a further nonmonotone technique. Motivated by this, we loosen the acceptance criterion with a nonmonotone technique and give the following criterion.

Definition  A point x is acceptable to the filter if and only if
where $(h_{k-r}, l_{k-r}) \in F$ for $0 \le r \le m(k) - 1$, $1 \le m(k) \le \min\{m(k-1) + 1, M\}$, $M \ge 1$ is a given positive constant, $\sum_{r=0}^{m(k)-1} \lambda_{kr} = 1$, $\lambda_{kr} \in (0, 1]$, and there exists a positive constant λ such that $\lambda_{kr} \ge \lambda$.
As in traditional filter methods, we also need to update the filter set F at each successful iteration; this is done in the same way as in the traditional method, but with the modified acceptance rule above.
To control infeasibility, an upper bound on the constraint violation is needed, namely h(x) ≤ u, where u is a positive scalar; this can be implemented in the algorithm by initializing the filter with the pair (u, -∞).
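A minimal Python sketch of a filter that enforces the upper bound h ≤ u through an initial pair (u, -∞) and applies a nonmonotone test over the m(k) most recent pairs is given below. The nonmonotone condition used here, h ≤ β max_r h_{k-r} or l ≤ Σ_r λ_kr l_{k-r} - γ h, is an assumed form for illustration only, as are the class name `NonmonotoneFilter` and its default constants.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (h, l)


class NonmonotoneFilter:
    """Hypothetical filter initialized with (u, -inf) to enforce h(x) <= u."""

    def __init__(self, u: float, beta: float = 0.99, gamma: float = 1e-4):
        if not (0.0 < gamma < beta < 1.0):
            raise ValueError("need 0 < gamma < beta < 1")
        self.u, self.beta, self.gamma = u, beta, gamma
        self.entries: List[Pair] = [(u, -math.inf)]

    def acceptable(self, h: float, l: float, recent: Sequence[Pair],
                   weights: Sequence[float]) -> bool:
        """Assumed nonmonotone test against the m(k) most recent pairs."""
        if not recent or len(recent) != len(weights):
            raise ValueError("need one weight per recent pair")
        if not math.isclose(sum(weights), 1.0):
            raise ValueError("weights must sum to 1")
        # Against the entry (u, -inf) the test reduces to h <= beta * u.
        if h > self.beta * self.u:
            return False
        h_max = max(hr for hr, _ in recent)
        l_avg = sum(w * lr for w, (_, lr) in zip(weights, recent))
        return h <= self.beta * h_max or l <= l_avg - self.gamma * h

    def add(self, h: float, l: float) -> None:
        """Drop stored pairs dominated by (h, l), then store it.
        The (u, -inf) entry is never dominated by a pair with finite l."""
        self.entries = [(hj, lj) for hj, lj in self.entries
                        if not (h <= hj and l <= lj)]
        self.entries.append((h, l))


flt = NonmonotoneFilter(u=10.0, beta=0.9, gamma=0.01)
recent, weights = [(1.0, 5.0), (2.0, 4.0)], [0.5, 0.5]
print(flt.acceptable(1.5, 6.0, recent, weights))    # True: h below 0.9 * max h
print(flt.acceptable(9.5, -100.0, recent, weights))  # False: violates h <= 0.9 * u
```

Comparing against the maximum violation and a weighted average of recent l values, rather than against every single entry, is what lets the iterates increase h or l temporarily.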

A nonmonotone flexible filter algorithm
At the kth iteration, the trial point $x_k^+$ is accepted by our algorithm if it satisfies two conditions: first, it is acceptable to the filter; second, it yields a sufficient reduction. The sufficient reduction condition is defined as follows, where $\alpha_1, \alpha_2$ are constants and $\mathrm{rared}^l_k$ and $\mathrm{pred}^f_k$ denote the relaxed actual reduction and the predicted reduction, respectively; in these definitions, the matrix $H_k$ is the Hessian $\nabla^2 f(x_k)$ or an approximation to it, $\sum_{r=0}^{m(k)-1} \lambda_{kr} = 1$, $\lambda_{kr} \in (0, 1]$, $1 \le m(k) \le \min\{m(k-1) + 1, M\}$, and $M \ge 1$ is a given positive constant.
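The predicted reduction and the sufficient reduction test can be sketched in Python as below. We assume the standard quadratic-model decrease pred^f_k = -(g_k^T d_k + ½ d_k^T H_k d_k) and, since the exact definition of rared^l_k is not reproduced here, a hypothetical relaxed actual reduction Σ_r λ_kr l_{k-r} - l(x_k^+) that compares the trial value with a weighted average of recent values. The constants η, α_1, α_2 and all function names are illustrative placeholders.

```python
from typing import Sequence


def predicted_reduction(g: Sequence[float], H: Sequence[Sequence[float]],
                        d: Sequence[float]) -> float:
    """Assumed pred = -(g^T d + 0.5 * d^T H d) for the quadratic model."""
    n = len(d)
    gd = sum(g[i] * d[i] for i in range(n))
    dHd = sum(d[i] * H[i][j] * d[j] for i in range(n) for j in range(n))
    return -(gd + 0.5 * dHd)


def relaxed_actual_reduction(recent_l: Sequence[float],
                             weights: Sequence[float], l_trial: float) -> float:
    """Hypothetical rared: weighted average of recent l values minus l(x_k^+)."""
    return sum(w * l for w, l in zip(weights, recent_l)) - l_trial


def sufficient_reduction(rared: float, pred: float, eta: float = 0.1) -> bool:
    """Ratio-type test rared >= eta * pred, meaningful only when pred > 0."""
    return pred > 0.0 and rared >= eta * pred


def violation_is_small(h: float, d: Sequence[float],
                       alpha1: float = 1.0, alpha2: float = 2.0) -> bool:
    """Test h <= alpha1 * ||d||_inf ** alpha2 (alpha values are placeholders)."""
    return h <= alpha1 * max(abs(di) for di in d) ** alpha2


pred = predicted_reduction([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]], [-0.5, 0.0])
rared = relaxed_actual_reduction([3.0, 2.0], [0.5, 0.5], 2.3)
print(pred, sufficient_reduction(rared, pred))  # 0.25 True
```

Using a weighted average in place of l(x_k) makes the reduction test nonmonotone in the same spirit as the filter criterion.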
A formal description of the algorithm is given as follows.

Algorithm A
Step . Choose an initial point $x_0 \in \mathbb{R}^n$, a symmetric matrix $H_0 \in \mathbb{R}^{n \times n}$ and an initial trust region radius $\rho_0 \ge \rho_{\min} > 0$.
Step . If $x_k^+$ is acceptable to the filter $F_k$, go to step , otherwise go to step .
Step . If x + k is located in the region I or region IV, let δ k+ = δ k , if x + k is located in the region II, let δ k+ is updated by (), if x + k is in the region III, let δ k+ is updated by ().
Step . If rared l k ≤ η pred f k and h l (k) ≤ α  d k α  ∞ , then go to step , otherwise go to step .
Remark  At the beginning of each iteration, we always set $\rho_k \ge \rho_{\min}$, which prevents the trust region radius from becoming too small.
Remark  In the above algorithm, let M be a positive integer. For each k, let m(k) satisfy $1 \le m(k) \le \min\{m(k-1) + 1, M\}$. In fact, if M = 1, the algorithm is actually a monotone method; nonmonotonicity appears when M > 1.
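The memory length m(k) and the weights λ_kr can be generated as in the following Python sketch. It assumes, for illustration only, the choices m(k) = min{m(k-1) + 1, M} starting from m(0) = 1, and uniform weights λ_kr = 1/m(k); these satisfy the stated conditions with λ = 1/M. With M = 1 each step uses a single weight of 1 on the current pair, i.e. the monotone case. The function names are hypothetical.

```python
from typing import List


def next_memory(m_prev: int, M: int) -> int:
    """Choose m(k) = min(m(k-1) + 1, M), the largest value the rule allows."""
    if M < 1:
        raise ValueError("M must be a positive integer")
    return min(m_prev + 1, M)


def uniform_weights(m: int) -> List[float]:
    """lambda_{k,r} = 1/m(k) for r = 0, ..., m(k)-1; each weight is >= 1/M."""
    return [1.0 / m] * m


def memory_schedule(num_iters: int, M: int) -> List[int]:
    """m(0), m(1), ... starting from m(-1) = 0, so that m(0) = 1."""
    m, schedule = 0, []
    for _ in range(num_iters):
        m = next_memory(m, M)
        schedule.append(m)
    return schedule


print(memory_schedule(5, 3))  # [1, 2, 3, 3, 3]
print(memory_schedule(4, 1))  # [1, 1, 1, 1] (monotone case)
```

Larger M lets the comparison reach further back, which is the sense in which the degree of nonmonotonicity grows with M.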

Convergence properties
In this section, to prove the global convergence of the algorithm, we assume throughout that the following conditions hold. By the above assumptions, we can suppose that there exist constants

Assumptions
Definition  [] The Mangasarian-Fromovitz constraint qualification (MFCQ) is said to be satisfied at a point $x \in \mathbb{R}^n$ with respect to the underlying constraint system $g(x) \le 0$ if there is a $z \in \mathbb{R}^n$ such that $\nabla g_i(x)^T z < 0$ for all i with $g_i(x) = 0$.

Lemma  []
Let the Assumptions hold, and let $\bar{x}$ be a feasible point of problem (P) at which MFCQ holds but which is not a KKT point. Then there exist a neighborhood N of $\bar{x}$ and positive constants $\xi_1, \xi_2, \xi_3$ such that for all $x_k \in N \cap S$ and all $\rho_k$ for which it follows that the SQP subproblem has a feasible solution $d_k$, and the predicted reduction satisfies

Lemma  Suppose that Assumptions hold, then Algorithm A is well defined.
Proof We show that the trial point $x_k^+$ is acceptable to the filter when $\rho_k$ is small enough. We consider the following two cases.
To prove that Algorithm A is well defined, we have to show that $\mathrm{rared}^l_k \ge \eta\, \mathrm{pred}^f_k$ holds for all k such that $\rho_k \le \delta$. We know that $\mathrm{ared}^l_k = l(x_k) - l(x_k^+)$. In fact, where $y_k = x_k + \xi d_k$, $\xi \in (0, 1)$, denotes some point on the line segment from $x_k$ to $x_k^+$. By the update of $\delta_k$ and the definition of $h(x_k^+)$, we know that $|\delta_{k+1}| \le \rho_k$, where $s_k$ denotes some point on the line segment from $x_k$ to $x_k^+$. Hence we obtain that We have The conclusion follows.

Lemma  Suppose that the Assumptions hold and Algorithm A does not terminate finitely. Then $\lim_{k \to \infty} h_k = 0$.
Proof If Algorithm A does not terminate finitely, then infinitely many points are accepted by the filter. We prove the result in two cases according to the definition of the filter.
which implies that $\{h(x_{l(k)})\}$ converges. Since $\beta \in (0, 1)$, we deduce that $h(x_{l(k)}) \to 0$ as $k \to \infty$. Therefore (ii) We first show that for all k ∈ S, it holds We prove this by induction. If k = 1, we have $l_1 \le l_0 - \gamma h_1 \le l_0 - \lambda\gamma h_1$. Assume that the inequality holds for 1, 2, ..., k; we then show that it holds for k + 1 by considering the following two cases.

Numerical results

In Table , the problems are numbered in the same way as in Hock and Schittkowski [] and Schittkowski []. For example, 'HS' is problem  in Hock and Schittkowski [] and 'S' is problem  in Schittkowski []. Some equality constrained problems, such as S, S and S, are also included in our test set. NF and NG denote the numbers of function and gradient evaluations, respectively. In Table , the first column gives the results of Algorithm A, the second column those of the traditional filter method reported in [], and the third column those of the Matlab function 'fmincon'. Compared with the other two methods, our algorithm requires fewer function and gradient evaluations.
To show the effect of the nonmonotone technique, we also list the numerical results in Table , where the tests are run for M = , M =  and M =  respectively, i.e., with an increasing degree of nonmonotonicity.
The numerical results show that the nonmonotone algorithm is more effective than the monotone one for most test examples, and that our algorithm is effective and satisfactory.

Conclusions
In our method, the criterion used to test the trial points is flexible: the rejection region varies according to the improvement made by the previous trial point, while in