Open Access

A nonmonotone flexible filter method for nonlinear constrained optimization

Journal of Mathematics in Industry 2016, 6:8

DOI: 10.1186/s13362-016-0029-1

Received: 1 June 2016

Accepted: 28 September 2016

Published: 12 October 2016


In this paper, we present a flexible nonmonotone filter method for solving nonlinear constrained optimization problems, which are common models in industry. The new method is more flexible in accepting trial steps than traditional filter methods and requires less computational cost than monotone-type methods. Moreover, we use a self-adaptive parameter to adjust the acceptance criterion, so that the Maratos effect can be avoided to a certain degree. Under reasonable assumptions, the proposed algorithm is globally convergent. Numerical tests are presented that confirm the efficiency of the approach.


nonmonotone filter self-adaptive global convergence trust region


90C30 65K05

1 Introduction

We consider the following inequality constrained nonlinear optimization problem
$$\begin{aligned}& (\mathrm{P})\quad \min f(x) \\& \hphantom{(\mathrm{P})\quad}\quad \text{s.t. } c_{i}(x)\leq0,\quad i\in I=\{1,2, \ldots,m\}, \end{aligned}$$
where \(x\in\mathbb{R}^{n}\), the functions \(f:\mathbb{R}^{n}\to\mathbb{R}\) and \(c_{i}\ (i\in I):\mathbb {R}^{n}\to\mathbb{R}\) are all twice continuously differentiable. For convenience, let \(g(x)=\nabla f(x)\), \(c(x)=(c_{1}(x),c_{2}(x),\ldots,c_{m}(x))^{T}\) and \(A(x)=(\nabla c_{1}(x),\nabla c_{2}(x),\ldots,\nabla c_{m}(x))\). And \(f_{k}\) refers to \(f(x_{k})\), \(c_{k}\) to \(c(x_{k})\), \(g_{k}\) to \(g(x_{k})\) and \(A_{k}\) to \(A(x_{k})\), etc.

There are various methods for solving the inequality constrained nonlinear optimization problem (P), for example, sequential quadratic programming methods, trust region approaches [1], penalty methods and interior point methods [2]. In these works, however, a penalty or Lagrange function is always used to test the acceptability of the iterates, and it is well known that the use of a penalty function brings several difficulties, in particular the choice of the penalty parameter. In 2002, Fletcher and Leyffer [3] proposed a class of filter methods, which do not require any penalty parameter and have promising numerical results. Consequently, the filter technique has been applied to many approaches, for instance, SLP methods [4], SQP methods [5, 6], interior point approaches [7] and derivative-free optimization [8, 9]. Furthermore, Fletcher et al. [5] proved the global convergence of the filter-SQP method, and Ulbrich and Ulbrich [10] then showed its superlinear local convergence. Filter methods, however, also encounter the Maratos effect. The Maratos effect, observed by Maratos in his PhD thesis in 1978, means that some steps that make good progress toward a solution are rejected by the merit function. To overcome this drawback in filter methods, Ulbrich [11] introduced a new filter method that uses the Lagrangian function instead of the objective function in the acceptance criterion. After that, Nie and Ma [12] used a fixed scalar to combine the objective function and the constraint violation function into one measure in the filter entry. Both of them, however, used a fixed criterion to decide whether to accept a trial point, which means that the criterion does not vary no matter what improvement the trial point makes. If the criterion is instead changed according to the improvement made by the current trial point, the Maratos effect can be avoided to a certain degree and the computational cost can be decreased as well.

On the other hand, the promising numerical results of filter methods are due, to a certain degree, to their non-monotonicity. Based on this property, some other non-monotone-type filter methods have been proposed [13-15]. Gould and Toint [16] also introduced a new non-monotone filter method that uses the area of a region in the \(h-f\) plane as the criterion to decide whether a trial point is acceptable, where \(h=h(x)\) is the constraint violation function and \(f=f(x)\) is the objective function at the current point x.

Motivated by the ideas and methods above, we propose a class of nonmonotone filter trust region methods with a self-adaptive parameter for solving problem (P). Our method improves on previous non-monotone filter methods. Unlike Ulbrich [11], we do not use a Lagrangian function in the filter but a function of the same type as that in Nie and Ma [12]. Moreover, in contrast to Nie and Ma [12], the parameter in our method is not fixed but variable, which means that the criterion is adjusted according to the improvement achieved. To prevent the trial point from falling into a ‘valley’, we also add a non-monotone technique to the criterion. Unlike existing SQP-filter methods, we use a quadratic subproblem that is always feasible, which avoids the feasibility restoration phase and hence reduces the amount of computation to a certain degree.

This paper is organized as follows. In Section 2, we introduce the feasible SQP subproblem and the non-monotone flexible filter. We propose the non-monotone filter method with a self-adaptive parameter in Section 3. Section 4 presents the global convergence properties, and some numerical results are reported in Section 5. We end with a short conclusion in Section 6.

2 The modified SQP subproblem and the non-monotone flexible filter method

2.1 The modified SQP subproblem

Our algorithm is an SQP method. To avoid infeasibility of the quadratic subproblem, we choose the quadratic program presented by Zhou [17]. At the kth iterate, we compute a trial step by solving the following quadratic problem
$$ \begin{aligned} &Q(x_{k},H_{k},\rho_{k})\mbox{:} \quad \min g_{k}^{T}d+\frac{1}{2}d^{T}H_{k}d \\ &\hphantom{Q(x_{k},H_{k},\rho_{k})\mbox{:}\quad}\quad \text{s.t. } c_{j}(x_{k})+\nabla c_{j}(x_{k})^{T}d\le \Psi^{+}(x_{k}, \rho_{k}),\quad j\in I, \\ &\hphantom{Q(x_{k},H_{k},\rho_{k})\mbox{:}\quad\quad\text{s.t. }}\|d\|\le\rho_{k}, \end{aligned} $$
where
$$\begin{aligned}& \Psi^{+}(x_{k},\rho_{k})=\max\bigl\{ \Psi(x_{k}, \rho_{k}),0\bigr\} , \end{aligned}$$
$$\begin{aligned}& \Psi(x_{k},\rho_{k})=\min\bigl\{ \overline{ \Psi}(x_{k};d_{k}):\|d_{k}\| \le\rho_{k} \bigr\} \end{aligned}$$
and \(\overline{\Psi}(x_{k};d_{k})\) is the first order approximation to \(\Psi(x_{k}+d_{k})=\max\{c_{j}(x_{k}+d_{k}):j\in I\}\), namely
$$ \overline{\Psi}(x_{k};d_{k})=\max\bigl\{ c_{j}(x_{k})+ \nabla c_{j}(x_{k})^{T}d_{k} :j\in I\bigr\} $$
and \(\rho_{k}>0\). The above definitions can be condensed into the form
$$ \Psi^{+}(x_{k},\rho_{k})=\max\Bigl\{ \min_{\|d_{k}\|\le\rho_{k}} \Bigl\{ \max_{j\in I}\bigl\{ c_{j}(x_{k})+ \nabla c_{j}(x_{k})^{T}d_{k}\bigr\} \Bigr\} ,0\Bigr\} . $$
This convex program has the following property.

Lemma 1


If \(d_{k}=0\) is the solution to \(Q(x_{k},H_{k},\rho_{k})\), then \(x_{k}\) is a KKT point of the problem (P).


The proof is similar to that of Lemma 4.1 in [16]. □
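The relaxation term \(\Psi^{+}(x_{k},\rho_{k})\) of Section 2.1 can be illustrated on a toy case. The following is a minimal sketch, assuming a scalar step d (so that \(\|d\|\le\rho\) becomes \(|d|\le\rho\)); the function name `psi_plus_1d` and the sample data are hypothetical and not part of the original method. In one dimension the linearized maximum is convex and piecewise linear, so its minimum over the interval is found by checking the endpoints and the crossing points of the lines.

```python
from itertools import combinations


def psi_plus_1d(c, a, rho):
    """Return max(min_{|d| <= rho} max_j (c[j] + a[j] * d), 0) for scalar d.

    c[j] plays the role of c_j(x_k) and a[j] the role of the derivative
    of c_j at x_k (illustrative one-dimensional setting only).
    """
    def linearized_max(d):
        return max(cj + aj * d for cj, aj in zip(c, a))

    # A convex piecewise-linear function attains its minimum over an
    # interval at an endpoint or at a point where two of its lines cross.
    candidates = [-rho, rho]
    for (c1, a1), (c2, a2) in combinations(zip(c, a), 2):
        if a1 != a2:
            d = (c2 - c1) / (a1 - a2)
            if -rho <= d <= rho:
                candidates.append(d)
    psi = min(linearized_max(d) for d in candidates)
    return max(psi, 0.0)


# Small radius: the linearized constraints cannot all be satisfied,
# so the right-hand side of the subproblem is relaxed to a positive value.
small = psi_plus_1d([1.0, -2.0], [1.0, -1.0], rho=0.5)
# Larger radius: the linearized constraints are consistent and no
# relaxation is needed.
large = psi_plus_1d([1.0, -2.0], [1.0, -1.0], rho=2.0)
```

With this right-hand side, the step that attains the inner minimum always satisfies the linearized constraints, which is why the subproblem never becomes infeasible.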

2.2 The non-monotone flexible filter with a self-adaptive parameter

In the traditional filter method, originally proposed by Fletcher and Leyffer [3], the acceptability of iterates is determined by comparing the values of the constraint violation and the objective function with those of previous iterates collected in a filter. Define the violation function \(h(x)\) by \(h(x)=\|c(x)^{+}\|_{\infty}\), where \(c_{i}(x)^{+}=\max\{c_{i}(x),0\}\), \(i\in I\). Obviously, \(h(x)=0\) if and only if x is a feasible point. So a trial point should reduce either the value of the constraint violation or that of the objective function f.

The definition of the filter set is based on the following definition of dominance.

Definition 1

A pair \((h_{k},f_{k})\) is said to dominate a pair \((h_{j}, f_{j})\), \(j\neq k\), if and only if \(h_{k}\leq h_{j}\) and \(f_{k}\leq f_{j}\).

Definition 2

A filter set \(\mathcal{F}\) is a set of pairs \((h,f)\) such that no pair dominates any other.

To ensure convergence, some additional conditions are required to decide whether to accept a trial point into the filter. The traditional acceptance criterion is as follows.

Definition 3

A trial point x is called acceptable to the filter if and only if
$$ \text{either} \quad h(x)\le\beta h_{j} \quad \text{or}\quad f(x)\le f_{j}-\gamma h_{j} \quad \text{for all } (h_{j},f_{j}) \in\mathcal{F}, $$
where \(0<\gamma<\beta<1\) are constants. In practice, β is close to 1 and γ close to 0.
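The violation function and the traditional acceptance test of Definition 3 can be sketched as follows. This is only an illustration: the function names and the default values `beta=0.99` and `gamma=0.01` are assumptions chosen to respect \(0<\gamma<\beta<1\) with β close to 1 and γ close to 0, and the filter is represented as a plain list of \((h_{j},f_{j})\) pairs.

```python
def violation(c_values):
    """h(x) = ||c(x)^+||_inf for constraints written as c_i(x) <= 0."""
    return max([0.0] + [max(ci, 0.0) for ci in c_values])


def acceptable_traditional(h_trial, f_trial, filter_pairs,
                           beta=0.99, gamma=0.01):
    """Definition 3: the trial pair must pass the test against every
    (h_j, f_j) currently stored in the filter."""
    return all(
        h_trial <= beta * h_j or f_trial <= f_j - gamma * h_j
        for h_j, f_j in filter_pairs
    )


example_filter = [(1.0, 5.0), (0.5, 6.0)]
# Smaller violation than every entry: accepted although f increases.
accepted_by_h = acceptable_traditional(0.4, 7.0, example_filter)
# Neither h nor f improves on the entry (1.0, 5.0): rejected.
rejected = acceptable_traditional(1.0, 5.0, example_filter)
```

The second call shows the rigidity of the test: a point that matches a filter entry in both measures is rejected regardless of how good the step is otherwise.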
In the traditional filter method, however, a good point, such as one produced by a superlinearly convergent step, may be rejected because both the objective function value and the constraint violation increase compared with other entries in the filter. This is why the Maratos effect occurs. Motivated by [12], we therefore replace the original objective function \(f(x_{k})\) at the kth iterate by the following function
$$ l(x_{k})=f(x_{k})+\delta_{k} h(x_{k})=f(x_{k})+ \delta_{k}\bigl\Vert c(x_{k})^{+}\bigr\Vert _{\infty}, $$
where \(c_{i}(x_{k})^{+}=\max\{c_{i}(x_{k}),0\}\) for \(i=1,2,\ldots,m\). Here \(\delta_{k}\) is a self-adaptive parameter at the kth iterate, which is changed according to the improvement made by the current trial point. Note that the traditional filter methods are the special case \(\delta_{k}=0\), and we hope to overcome the Maratos effect with a suitable \(\delta_{k}\).
We aim to reduce the values of both \(h(x)\) and \(l(x)\). By the original criterion, the trial point is acceptable if and only if the condition in Definition 3 holds. Nie and Ma [12] proposed a trust region filter method with a given negative penalty parameter. In this paper, in contrast to [12], the parameter \(\delta_{k}\) is a variable scalar that is changed according to the improvement achieved by the trial point. Specifically, at the beginning we let \(\delta_{0}=0\), which is what the traditional filter method does, so that \(f(x_{k})=l(x_{k})\) (see Figure 1).
Figure 1 Traditional filter with \(\delta_{k}=0\).

The right half-plane is divided into four regions I, II, III and IV. At the current iterate k, if the trial point \(x_{k}\) moves into region IV, that is, the pair \((h_{k},l_{k})\) is located in region IV, the trial point is rejected according to our criterion. If \(x_{k}\) moves into region I, II or III, we accept it but may need to adjust the parameter \(\delta_{k}\) in the criterion. In region III, the algorithm does not make a good improvement, since we do not want to accept points with larger constraint violation. Thus we intend to impose a stricter acceptance criterion, that is, to increase the value of \(\delta_{k}\), which results in a larger rejection area and a smaller acceptable area (see Figure 2). So \(\delta_{k}\) is updated as follows:
$$ \delta_{k+1}=\min\biggl\{ \rho_{k},\delta_{k}+\biggl\vert \frac {l_{k}-l_{k}^{+}}{h_{k}-h_{k}^{+}}\biggr\vert \biggr\} . $$
Figure 2 Increase \(\delta_{k}\) to impose a stricter acceptance criterion.

If \(x_{k}\) moves into region II, the algorithm makes a good improvement, since it reduces not only the objective function \(l(x_{k})\) but also the constraint violation \(h(x_{k})\). We therefore intend to loosen the acceptance criterion in the hope of further improvement, that is, to decrease the value of \(\delta_{k}\) so that the rejection area becomes smaller and the acceptable area larger (see Figure 3). So \(\delta _{k}\) is updated as follows:
$$ \delta_{k+1}=\max\biggl\{ -\rho_{k},\delta_{k}- \biggl\vert \frac {l_{k}-l_{k}^{+}}{h_{k}-h_{k}^{+}}\biggr\vert \biggr\} . $$
Figure 3 Decrease \(\delta_{k}\) to loosen the acceptance criterion.

If \(x_{k}\) moves into region I, we also accept it, because the constraint violation does decrease and an increase of \(l(x_{k})\) can be tolerated for finitely many steps. In this case the value of \(\delta _{k}\) is not changed. If \(x_{k}\) moves into region IV, the trial point is rejected, and \(\delta_{k}\) is also kept unchanged in the next iterate.
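The update of the self-adaptive parameter described above can be sketched as a small function. This is a hedged illustration rather than the authors' code: the region label is passed in by the caller because the regions are defined by the figures, the function name is hypothetical, and the guard for \(h_{k}=h_{k}^{+}\) is an added assumption to avoid a division by zero.

```python
def update_delta(region, delta, rho, l_k, l_plus, h_k, h_plus):
    """Return delta_{k+1} for a trial point lying in the given region.

    region is one of "I", "II", "III", "IV"; rho is the current radius,
    which bounds |delta_{k+1}| in the two update formulas.
    """
    if region not in ("I", "II", "III", "IV"):
        raise ValueError(f"unknown region: {region!r}")
    if region in ("I", "IV") or h_k == h_plus:
        # Regions I and IV keep delta; the equality guard is an assumption.
        return delta
    slope = abs((l_k - l_plus) / (h_k - h_plus))
    if region == "III":
        return min(rho, delta + slope)   # stricter acceptance criterion
    return max(-rho, delta - slope)      # region II: looser criterion


stricter = update_delta("III", 0.1, 0.5, l_k=2.0, l_plus=1.9, h_k=1.0, h_plus=2.0)
looser = update_delta("II", 0.0, 0.5, l_k=2.0, l_plus=1.5, h_k=1.0, h_plus=0.5)
unchanged = update_delta("I", 0.2, 0.5, l_k=2.0, l_plus=2.5, h_k=1.0, h_plus=0.5)
```

In the second call the raw decrease would be 1, but the bound \(-\rho_{k}\) caps the result, so the parameter always stays in \([-\rho_{k},\rho_{k}]\) after an update.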

It is well known that the good numerical results of the filter method are due, to a certain degree, to its non-monotone properties. Su and Pu [13] also proposed a modified non-monotone filter method that exploits a further non-monotone technique. Motivated by this, we loosen the acceptance criterion by a non-monotone technique and give the following criterion.

Definition 4

A point x is acceptable to the filter if and only if
$$ h(x)\le\beta\max_{0\le r\le m(k)-1} h_{k-r} \quad \text{or} \quad l(x)\le \max\Biggl[l_{k},\sum_{r=0}^{m(k)-1} \lambda_{kr}l_{k-r}\Biggr]-\gamma h(x), $$
where \((h_{k-r},l_{k-r})\in\mathcal{F}\) for \(0\le r\le m(k)-1\), and \(0\le m(k)\le\min\{m(k-1)+1,M\}\), \(M\geq1\) is a given positive constant, \(\sum_{r=0}^{m(k)-1}\lambda_{kr}=1\), \(\lambda_{kr}\in(0,1)\) and there exists a positive constant λ such that \(\lambda_{kr}\geq\lambda\).
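The acceptance test of Definition 4 can be sketched as follows, assuming the last \(m(k)\) pairs are stored newest first in two lists. The function name is hypothetical, and the defaults `beta=0.9` and `gamma=0.1` are simply the values reported in the numerical section.

```python
def acceptable_nonmonotone(h_trial, l_trial, h_hist, l_hist, lambdas,
                           beta=0.9, gamma=0.1):
    """Definition 4 with h_hist[r] = h_{k-r}, l_hist[r] = l_{k-r},
    lambdas[r] = lambda_{kr} for r = 0, ..., m(k)-1."""
    if not h_hist or not (len(h_hist) == len(l_hist) == len(lambdas)):
        raise ValueError("need m(k) >= 1 entries of equal length")
    if abs(sum(lambdas) - 1.0) > 1e-12:
        raise ValueError("the weights lambda_{kr} must sum to 1")
    h_ref = max(h_hist)
    l_ref = max(l_hist[0], sum(w * l for w, l in zip(lambdas, l_hist)))
    return h_trial <= beta * h_ref or l_trial <= l_ref - gamma * h_trial


# With a memory of two iterates the reference value for l is 2.0,
# so a trial point with l = 1.8 is accepted ...
with_memory = acceptable_nonmonotone(0.95, 1.8, [0.2, 1.0], [1.0, 3.0], [0.5, 0.5])
# ... whereas with a memory of one iterate the same point is rejected.
without_memory = acceptable_nonmonotone(0.95, 1.8, [0.2], [1.0], [1.0])
```

The two calls use the same trial point; only the amount of history differs, which is the sense in which the criterion is loosened.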

Similar to the traditional filter methods, we also need to update the filter set \(\mathcal{F}\) at each successful iteration. The technique is the same as in the traditional method, with the acceptance rule replaced by that of Definition 4.

To control the infeasibility, an upper bound condition of violation function is needed, namely \(h(x)\leq u\), where u is a positive scalar, which can be implemented in the algorithm by initiating the filter with the pair \((u, -\infty)\).

3 A nonmonotone flexible filter algorithm

At the current kth iterate, the trial point is accepted by our algorithm if it satisfies two conditions: it is acceptable to the filter, and it yields a sufficient reduction. We define the sufficient reduction condition as follows:
$$ \mathrm{rared}_{k}^{l}\geq\eta\, \mathrm{pred}_{k}^{f} \quad \text{and}\quad h_{l(k)}\leq\alpha_{1} \|d_{k} \|_{\infty}^{\alpha_{2}}, $$
where \(\alpha_{1}\), \(\alpha_{2}\) are constants, the relaxed actual reduction \(\mathrm{rared}_{k}^{l}\) and the predicted reduction \(\mathrm{pred}_{k}^{f}\) are defined as
$$\begin{aligned}& \mathrm{rared}_{k}^{l}=\max\Biggl\{ l(x_{k}),\sum_{r=0}^{m(k)-1} \lambda_{kr}l_{k-r}\Biggr\} -l(x_{k}+d_{k}), \end{aligned}$$
$$\begin{aligned}& \mathrm{pred}_{k}^{f}=-g_{k}^{T}d_{k}- \frac{1}{2}d_{k}^{T}H_{k}d_{k}, \end{aligned}$$
$$\begin{aligned}& h_{l(k)}=\max_{0\le r\le m(k)-1} h_{k-r} \end{aligned}$$
and the matrix \(H_{k}\) is the Hessian matrix \(\nabla^{2} f(x_{k})\) or an approximation to it, \(\sum_{r=0}^{m(k)-1}\lambda_{kr}=1\), \(\lambda_{kr}\in (0,1)\), \(0\le m(k)\le\min\{m(k-1)+1,M\}\), and \(M\geq1\) is a given positive constant.
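The sufficient reduction test can be sketched with plain Python lists. This is an illustration under stated assumptions: the relaxed actual reduction is taken as reference value minus trial value, so that it is positive when the trial point improves (the convention used in the convergence analysis), and the default `eta=0.1` is hypothetical because no value of η is stated in the text.

```python
def predicted_reduction(g, H, d):
    """pred = -g^T d - 0.5 * d^T H d for list-based g, H and d."""
    n = len(d)
    gd = sum(g[i] * d[i] for i in range(n))
    dHd = sum(d[i] * H[i][j] * d[j] for i in range(n) for j in range(n))
    return -gd - 0.5 * dHd


def sufficient_reduction(l_trial, l_hist, lambdas, h_hist, pred, d,
                         eta=0.1, alpha1=0.5, alpha2=0.5):
    """Check rared >= eta * pred and h_{l(k)} <= alpha1 * ||d||_inf ** alpha2.

    l_hist, h_hist and lambdas are ordered newest first (r = 0 is iterate k).
    """
    l_ref = max(l_hist[0], sum(w * l for w, l in zip(lambdas, l_hist)))
    rared = l_ref - l_trial
    h_lk = max(h_hist)
    d_inf = max(abs(di) for di in d)
    return rared >= eta * pred and h_lk <= alpha1 * d_inf ** alpha2


step = [-0.5, 0.0]
pred = predicted_reduction([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], step)
enough = sufficient_reduction(1.9, [2.0], [1.0], [0.1], pred, step)
too_little = sufficient_reduction(1.99, [2.0], [1.0], [0.1], pred, step)
```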

A formal description of the algorithm is given as follows.

Algorithm A

Step 0.: 

Let \(0<\rho_{0}<1\), \(0<\gamma<\beta<1\), \(0<\lambda\le 1\), \(0<\gamma_{0}<\gamma_{1}\le1<\gamma_{2}\), \(M\geq1\), \(u>0\), \(\alpha_{1}=\alpha _{2}=0.5\). Choose an initial point \(x_{0}\in R^{n}\), a symmetric matrix \(H_{0}\in R^{n\times n}\) and an initial region radius \(\Delta_{0}\geq\Delta_{\min}>0\), \({\mathcal {F}}_{0}=\{(u,-\infty)\}\). Set \(k=0\), \(m(k)=0\).

Step 1.: 

Solve the subproblem \(Q(x_{k},H_{k},\rho_{k})\), if \(\|d_{k}\| =0\), stop.

Step 2.: 

Let \(x_{k}^{+}=x_{k}+d_{k}\), compute \(h_{k}^{+}\), \(l_{k}^{+}\).

Step 3.: 

If \(x_{k}^{+}\) is acceptable to the filter \({\mathcal {F}}_{k}\), go to step 4, otherwise go to step 5.

Step 4.: 

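If \(x_{k}^{+}\) is located in region I or region IV, let \(\delta_{k+1}=\delta_{k}\); if \(x_{k}^{+}\) is located in region II, decrease the parameter by \(\delta_{k+1}=\max\{-\rho_{k},\delta_{k}-\vert (l_{k}-l_{k}^{+})/(h_{k}-h_{k}^{+})\vert \}\); if \(x_{k}^{+}\) is located in region III, increase it by \(\delta_{k+1}=\min\{\rho_{k},\delta_{k}+\vert (l_{k}-l_{k}^{+})/(h_{k}-h_{k}^{+})\vert \}\).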

Step 5.: 

If \(\mathrm{rared}_{k}^{l}\leq\eta\, \mathrm{pred}_{k}^{f}\) and \(h_{l(k)}\leq\alpha_{1}\|d_{k}\|_{\infty}^{\alpha_{2}}\), then go to step 6, otherwise go to step 7.

Step 6.: 

Reduce the radius by replacing \(\rho_{k}\) with a value in \([\gamma_{0}\rho_{k},\gamma_{1}\rho_{k}]\), and go to step 1.

Step 7.: 

Let \(x_{k+1}=x_{k}^{+}\) and update the filter set. Choose \(\rho_{k+1}\in[\rho_{k},\gamma_{2}\rho_{k}]\) with \(\rho_{k+1}\geq\rho_{\min}\), update \(H_{k}\) to \(H_{k+1}\), set \(m(k+1)=\min\{m(k)+1,M\}\) and \(k=k+1\), and go to step 1.
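Steps 6 and 7 only prescribe intervals for the new radius, so any implementation has to pick a point in them. The sketch below shows one possible choice; the function names and the constants `gamma0=0.25`, `gamma1=0.5`, `gamma2=2.0` are assumptions that merely satisfy \(0<\gamma_{0}<\gamma_{1}\le1<\gamma_{2}\), and `rho_min=1e-6` mirrors the minimal radius reported in the numerical section.

```python
def shrink_radius(rho, gamma0=0.25, gamma1=0.5):
    """Step 6: pick the midpoint of [gamma0 * rho, gamma1 * rho]."""
    return 0.5 * (gamma0 + gamma1) * rho


def enlarge_radius(rho, gamma2=2.0, rho_min=1e-6):
    """Step 7: pick the upper end of [rho, gamma2 * rho] and keep the
    result at least rho_min, as required at the start of an iteration."""
    return max(rho_min, gamma2 * rho)


after_rejection = shrink_radius(1.0)
after_success = enlarge_radius(0.5)
safeguarded = enlarge_radius(1e-9)
```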

Remark 1

At the beginning of each iteration, we always set \(\rho_{k}\geq\rho_{\min}\), which prevents the trust region radius from becoming too small.

Remark 2

In the above algorithm, let M be a nonnegative integer. For each k, let \(m(k)\) satisfy
$$m(0)=0,\qquad 0\le m(k)\le \min\bigl\{ m(k-1)+1,M\bigr\} \quad \text{for } k \geq1. $$
In fact, if \(M=1\), the algorithm is actually a monotone method; the nonmonotonicity appears when \(M>1\).
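The memory rule of Remark 2 is easy to tabulate. The sketch below assumes the largest value the rule allows, \(m(k)=\min\{m(k-1)+1,M\}\), as in Step 7 of Algorithm A; the function name is hypothetical.

```python
def memory_lengths(M, iterations):
    """Return [m(0), m(1), ..., m(iterations)] with m(0) = 0 and
    m(k) = min(m(k-1) + 1, M)."""
    m = [0]
    for _ in range(iterations):
        m.append(min(m[-1] + 1, M))
    return m


nonmonotone_window = memory_lengths(3, 5)
monotone_window = memory_lengths(1, 3)
```

For \(M=1\) the window never contains more than the current iterate, so the reference value in the acceptance test reduces to \(l_{k}\).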

4 The convergent properties

In this section, to prove the global convergence of the algorithm, we always assume that the following conditions hold.


  A1. The objective function f and the constraint functions \(c_{i}\) (\(i\in I=\{1,2,\ldots,m\}\)) are twice continuously differentiable.

  A2. For all k, \(x_{k}\) and \(x_{k}+d_{k}\) all remain in a closed, bounded convex subset \(S\subset R^{n}\).

  A3. The matrix sequence \(\{H_{k}\}\) is uniformly bounded.

  A4. The functions \(A=\nabla c\) are uniformly bounded on S.


By the above assumptions, we can suppose that there exist constants \(v_{1}\), \(v_{2}\), \(v_{3}\) such that \(\|f(x)\|\le v_{1}\), \(\|\nabla f(x)\|\le v_{1}\), \(\|\nabla^{2} f(x)\|\le v_{1}\), \(\|c(x)\|\le v_{2}\), \(\|\nabla c(x)\|\le v_{2}\), \(\|\nabla^{2} c(x)\|\le v_{2}\).

Definition 5


The Mangasarian-Fromowitz constraint qualification (MFCQ) is said to be satisfied at a point \(x\in{\mathbb{R}}^{n}\) with respect to the underlying constraint system \(c(x)\leq0\), if there is a \(z\in\mathbb{R}^{n}\) such that
$$ \nabla c_{i}(x)^{T}z< 0, \quad i\in\bigl\{ i: c_{i}(x)\geq0, i\in I\bigr\} . $$
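Checking Definition 5 for a given candidate direction is a direct computation. The following sketch only verifies a supplied z and does not search for one; the function name and the example constraints are hypothetical.

```python
def is_mfcq_direction(c_values, gradients, z):
    """Return True if grad c_i(x)^T z < 0 for every i with c_i(x) >= 0.

    c_values[i] is c_i(x) and gradients[i] is the gradient of c_i at x.
    """
    for ci, grad in zip(c_values, gradients):
        if ci >= 0.0:
            if sum(gj * zj for gj, zj in zip(grad, z)) >= 0.0:
                return False
    return True


# Example at x = (1, 0): c_1(x) = x_1^2 + x_2^2 - 1 and c_2(x) = -x_2
# are both active, with gradients (2, 0) and (0, -1).
c_at_x = [0.0, 0.0]
grads_at_x = [[2.0, 0.0], [0.0, -1.0]]
good_direction = is_mfcq_direction(c_at_x, grads_at_x, [-1.0, 1.0])
bad_direction = is_mfcq_direction(c_at_x, grads_at_x, [1.0, 1.0])
```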

Lemma 2


Let Assumptions A1-A4 hold, and consider a feasible point of problem (P) at which MFCQ holds but which is not a KKT point. Then there exist a neighborhood N of this point and positive constants \(\xi_{1}\), \(\xi_{2}\), \(\xi_{3}\) such that for all \(x_{k}\in N\cap S\) and all \(\rho_{k}\) for which
$$ \xi_{2} h_{k}\leq\rho_{k}\leq\xi_{3} $$
the SQP subproblem has a feasible solution \(d_{k}\), and the predicted reduction satisfies
$$ \mathrm{pred}_{k}^{f}\geq\frac{1}{3}\rho_{k} \xi_{1}. $$
If \(\rho_{k}\leq(1-\eta_{3})\xi_{1}/3nv_{2}\), then
$$ f(x_{k})-f\bigl(x_{k}^{+}\bigr)\geq\eta\, \mathrm{pred}_{k}^{f}, $$
where \(\eta<\eta_{3}\).

If \(h_{k}>0\) and \(\rho_{k}\leq\sqrt{\frac{2\beta h_{k}}{n^{2}v_{2}}}\) then \(h(x_{k}^{+})\leq\beta h_{k}\).

Lemma 3

Suppose that Assumptions A1-A4 hold. Then Algorithm A is well defined.


We will show that the trial point \(x_{k}^{+}\) is acceptable to the filter when \(\rho_{k}\) is small enough. We consider the following two cases.

Case 1. \(h_{k}= 0\).

To show that Algorithm A is well defined, we have to show that \(\mathrm{rared}_{k}^{l}\geq\eta\, \mathrm {pred}_{k}^{f}\) holds for all k such that \(\rho_{k}\leq\delta\). Recall that \(\mathrm{ared}_{k}^{l}=l(x_{k})-l(x_{k}^{+})\).

In fact,
$$\begin{aligned} \bigl\vert \mathrm{ared}_{k}^{l}- \mathrm{pred}_{k}^{f}\bigr\vert =& \biggl\vert l(x_{k})-l\bigl(x_{k}^{+}\bigr)+g_{k}^{T}d_{k}+ \frac{1}{2}d_{k}^{T}H_{k}d_{k}\biggr\vert \\ =& \biggl\vert f(x_{k})+\delta_{k}h(x_{k})-f \bigl(x_{k}^{+}\bigr)-\delta _{k+1}h(x_{k}+d_{k})+g_{k}^{T}d_{k}+ \frac{1}{2}d_{k}^{T}H_{k}d_{k}\biggr\vert \\ \le&\biggl\vert \frac{1}{2}d_{k}^{T}\bigl( \nabla^{2} f(y_{k})-H_{k}\bigr)d_{k}+ \delta_{k}h(x_{k})-\delta _{k+1}h \bigl(x_{k}^{+}\bigr)\biggr\vert \\ \le& \rho_{k}^{2}\frac{1}{2}\bigl\Vert \nabla^{2} f(y_{k})-H_{k}\bigr\Vert +\vert \delta_{k+1}\vert h\bigl(x_{k}^{+}\bigr), \end{aligned}$$
where \(y_{k}=x_{k}+\xi d_{k}\), \(\xi\in(0,1)\) denotes some point on the line segment from \(x_{k}\) to \(x_{k}^{+}\). By the update of \(\delta_{k}\) and the definition of \(h(x_{k}^{+})\), we know \(|\delta_{k+1}|\leq\rho_{k}\),
$$h\bigl(x_{k}^{+}\bigr)=\bigl\Vert c^{+}(x_{k}+d_{k}) \bigr\Vert =\biggl\Vert c^{+}(x_{k})+A(x_{k})d_{k}+ \frac{1}{2}d_{k}^{T}\nabla ^{2}c(s_{k})d_{k} \biggr\Vert \leq\nu_{2}\rho_{k}+\frac{1}{2} \nu_{2}\rho_{k}^{2}, $$
where \(s_{k}\) denotes some point in the line from \(x_{k}\) to \(x_{k}^{+}\).
Hence we obtain that
$$\begin{aligned} \bigl\vert \mathrm{ared}_{k}^{l}-\mathrm{pred}_{k}^{f} \bigr\vert \le& \rho_{k}^{2}b+\rho_{k}\biggl( \nu_{2}\rho_{k}+\frac{1}{2}\nu_{2} \rho_{k}^{2}\biggr) \\ \leq&\biggl(b+\nu_{2}\biggl(1+\frac{1}{2}\delta\biggr)\biggr) \rho_{k}^{2}, \end{aligned}$$
where \(b=\frac{1}{2}(\sup \|H_{k}\|+\max_{x\in S} \|\nabla^{2}f(x)\|)\), together with Lemma 2
$$\bigl\vert \mathrm{pred}_{k}^{f}\bigr\vert =\biggl\vert -g_{k}^{T}d_{k}-\frac{1}{2}d_{k}^{T}H_{k}d_{k} \biggr\vert \geq\frac{1}{3}\xi _{1}\rho_{k}. $$
We have
$$ \biggl\vert \frac{\mathrm{ared}_{k}^{l}-\mathrm{pred}_{k}^{f}}{\mathrm{pred}_{k}^{f}}\biggr\vert \le\frac{(b+\nu_{2}(1+\frac{1}{2}\delta))\rho_{k}^{2}}{ \frac{1}{3}\xi_{1}\rho_{k}}\rightarrow 0 \quad \text{as } \rho_{k}\rightarrow0. $$
We deduce that \(\mathrm{rared}_{k}^{l}\geq\mathrm{ared}_{k}^{l}\geq\eta\, \mathrm{pred}_{k}^{f}\) for some \(\eta\in(0,1)\), since
$$ \mathrm{rared}_{k}^{l}=\max\Biggl\{ l(x_{k}),\sum _{r=0}^{m(k)-1}\lambda_{kr}l_{k-r} \Biggr\} -l\bigl(x_{k}^{+}\bigr)\geq l(x_{k})-l \bigl(x_{k}^{+}\bigr)=\mathrm {ared}_{k}^{l}. $$
By \(\max\{l(x_{k}),\sum_{r=0}^{m(k)-1}\lambda_{kr}l_{k-r}\}-l(x_{k}^{+})\geq \eta\, \mathrm{pred}_{k}^{f}>\gamma h(x_{k})\), we can see
$$l\bigl(x_{k}^{+}\bigr)\leq\max\Biggl\{ l(x_{k}),\sum _{r=0}^{m(k)-1}\lambda_{kr}l_{k-r}\Biggr\} -\gamma h(x_{k}), $$
so \(x_{k}^{+}\) is acceptable to the filter.

Case 2. \(h_{k}>0\).

There exist a constant \(\delta>0\) and an index \(k_{0}\) such that \(\rho_{k}\leq\delta \) when \(k< k_{0}\). Let \(\delta=\sqrt{\frac{2\beta h_{k}}{n^{2}v_{2}}}\). Then, by Lemma 2, we have \(h_{k}^{+}\leq\beta h_{k}\), that is, \(h_{k}^{+}\leq\beta\max_{0\leq j\leq m(k)-1}\{h_{k-j}\}\). So \(x_{k}^{+}\) must be acceptable to the filter by the definition.

With an analysis similar to that of Case 1, and using \(\vert \delta_{k}\vert \le\rho_{k}\), \(\vert \delta_{k+1}\vert \le\rho_{k}\), \(h(x_{k}^{+})\le\beta h(x_{k})\) and \(\xi_{2}h(x_{k})\le\rho_{k}\), we have
$$\begin{aligned} \bigl\vert \mathrm{ared}_{k}^{l}-\mathrm{pred}_{k}^{f} \bigr\vert \le&\biggl\vert \frac{1}{2}d_{k}^{T} \bigl(\nabla^{2} f(y_{k})-H_{k}\bigr)d_{k}+ \delta_{k}h(x_{k})-\delta _{k+1}h \bigl(x_{k}^{+}\bigr)\biggr\vert \\ \le& \rho_{k}^{2}\frac{1}{2}\bigl\Vert \nabla^{2} f(y_{k})-H_{k}\bigr\Vert +\vert \delta _{k}\vert h(x_{k})+\vert \delta_{k+1} \vert h\bigl(x_{k}^{+}\bigr) \\ \le& \rho_{k}^{2}b+\rho_{k}(1+\beta) h(x_{k}) \\ \le& \biggl(b+\frac{1+\beta}{\xi_{2}}\biggr)\rho_{k}^{2}. \end{aligned}$$
Then it holds
$$ \biggl\vert \frac{\mathrm{ared}_{k}^{l}-\mathrm{pred}_{k}^{f}}{\mathrm{pred}_{k}^{f}}\biggr\vert \le\frac{(b+\frac{1+\beta}{\xi_{2}})\rho_{k}^{2}}{ \frac{1}{3}\xi_{1}\rho_{k}}\rightarrow 0 \quad \text{as } \rho_{k}\rightarrow0.$$
The conclusion follows. □

Lemma 4

Suppose that Assumptions A1-A4 hold and Algorithm A does not terminate finitely. Then \(\lim_{k\rightarrow\infty}h_{k}=0\).


If Algorithm A does not terminate finitely, then infinitely many points are accepted by the filter. We prove the result in two cases according to the definition of the filter.

  (i) \(K_{1}=\{k|h_{k}^{+}\le\beta\max_{0\le r\le m(k)-1}h_{k-r}\}\) is an infinite set.

  (ii) \(K_{2}=\{k|l_{k}^{+}\le\max[l_{k},\sum_{r=0}^{m(k)-1}\lambda_{kr} l_{k-r}]-\gamma h_{k}\}\) is an infinite set.

For convenience, let
$$h(x_{l(k)})=\max_{0\le r\le m(k)-1}h_{k-r}, $$
where \(k-m(k)+1\le l(k)\le k\).
(i) Since \(m(k+1)\le m(k)+1\), we have
$$\begin{aligned} h(x_{l(k+1)}) =&\max_{0\le r\le m(k+1)-1}\bigl[h(x_{k+1-r}) \bigr] \\ \le& \max_{0\le r\le m(k)}\bigl[h(x_{k+1-r})\bigr] \\ =&\max\bigl\{ h(x_{l(k)}),h(x_{k+1})\bigr\} \\ =&h(x_{l(k)}) \end{aligned}$$
which implies that \(\{h(x_{l(k)})\}\) converges. Then by \(h(x_{k+1})\le \beta\max_{0\le r\le m(k)-1}[h(x_{k-r})]\), we have
$$ h(x_{l(k)})\le\beta h(x_{l(l(k)-1)}). $$
Since \(\beta\in(0,1)\), we deduce that \(h(x_{l(k)})\rightarrow 0\) (\(k\rightarrow\infty\)). Hence, by Algorithm A,
$$h(x_{k+1})\le\beta h(x_{l(k)})\rightarrow0, $$
that is, \(\lim_{k\rightarrow\infty}h(x_{k})=0\).
(ii) We first show that for all \(k\in S\), it holds
$$ l_{k}\le l_{0}-\lambda\gamma\sum _{r=0}^{k-2}h_{r}-\gamma h_{k-1}\le l_{0}-\lambda\gamma\sum_{r=0}^{k-1}h_{r}. $$
We prove this by induction.

If \(k=1\), we have \(l_{1}\le l_{0}-\gamma h_{0}\le l_{0}-\lambda\gamma h_{0}\).

Assume that (27) holds for \(1,2,\ldots,k\). We show that (27) then holds for \(k+1\) in the following two cases.

Case 1. \(\max[l_{k},\sum_{r=0}^{m(k)-1}\lambda_{kr} l_{k-r}]=l_{k}\),
$$ l_{k}^{+}\le l_{k}-\gamma h_{k}\le l_{0}- \lambda\gamma\sum_{r=0}^{k-1}h_{r}- \gamma h_{k}\le l_{0}-\lambda\gamma\sum _{r=0}^{k}h_{r}. $$

Case 2. \(\max[l_{k},\sum_{r=0}^{m(k)-1}\lambda_{kr} l_{k-r}]=\sum_{r=0}^{m(k)-1}\lambda_{kr} l_{k-r}\).

Let \(p=m(k)-1\), then
$$\begin{aligned} l_{k+1} \le&\sum_{t=0}^{p} \lambda_{kt}l_{k-t}-\gamma h_{k} \\ \le& \sum_{t=0}^{p}\lambda_{kt} \Biggl(l_{0}-\lambda\gamma\sum_{r=0}^{k-t-2}h_{r}- \gamma h_{k-t-1}\Biggr) -\gamma h_{k} \\ =&\lambda_{k0}\Biggl(l_{0}-\lambda\gamma\sum _{r=0}^{k-p-2}h_{r}-\lambda\gamma \sum _{r=k-p-1}^{k-2} h_{r}-\gamma h_{k-1}\Biggr)-\gamma h_{k} \\ &{}+\lambda_{k1}\Biggl(l_{0}-\lambda\gamma\sum _{r=0}^{k-p-2}h_{r}-\lambda\gamma \sum _{r=k-p-1}^{k-3} h_{r}-\gamma h_{k-2}\Biggr) \\ &{}+\cdots +\lambda_{kp}\Biggl(l_{0}-\lambda\gamma\sum _{r=0}^{k-p-2}h_{r}-\gamma h_{k-p-1}\Biggr) \\ \le&\sum_{t=0}^{p}\lambda_{kt}l_{0}- \lambda\gamma\sum_{r=0}^{k-p-2} \Biggl(\sum _{t=0}^{p}\lambda_{kt} \Biggr)h_{r} -\sum_{t=0}^{p} \lambda_{kt}\gamma h_{k-t-1}-\gamma h_{k}. \end{aligned}$$
By the fact that \(\sum_{t=0}^{p}\lambda_{kt}=1\), \(\lambda_{kt}\geq\lambda\), and \(h_{r}\geq0\), we have
$$\begin{aligned} l_{k+1} \le&l_{0}-\lambda\gamma\sum _{r=0}^{k-p-2} h_{r} -\lambda\gamma\sum _{r=k-p-1}^{k-1} h_{r}-\gamma h_{k} \\ =&l_{0}-\lambda\gamma\sum_{r=0}^{k-1}h_{r}- \gamma h_{k} \\ \le&l_{0}-\lambda\gamma\sum_{r=0}^{k}h_{r}. \end{aligned}$$
Then for all \(k\in S\), (27) holds.
Moreover, since \(\{l_{k}\}\) is bounded below, letting \(k\rightarrow\infty\) we obtain
$$\lambda\gamma\sum_{r=0}^{\infty}h_{r}< \infty. $$
It follows that \(h_{k}\rightarrow0\) (\(k\rightarrow\infty\)). □
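The last step of case (ii) can be written out explicitly. Let \(l_{\min}\) denote a lower bound of \(\{l_{k}\}\) (a symbol introduced here only for illustration). The bound (27) proved by induction gives, for every k,
$$\lambda\gamma\sum_{r=0}^{k-1}h_{r}\le l_{0}-l_{k}\le l_{0}-l_{\min}, $$
so the partial sums of the nonnegative series \(\sum_{r}h_{r}\) are bounded by \((l_{0}-l_{\min})/(\lambda\gamma)\). The series therefore converges, and its terms \(h_{k}\) must tend to zero.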

Lemma 5

Suppose that Assumptions A1-A4 hold. If Algorithm A does not terminate finitely, then \(\lim_{k\rightarrow \infty}\|d_{k}\|=0\).


Suppose by contradiction that there exist constants \(\epsilon>0 \) and \(\bar{k}>0\) such that \(\|d_{k}\|>\epsilon\) for all \(k>\bar{k}\).

Then by Lemma 2, \(\mathrm{pred}_{k}^{f}\geq\frac{1}{3}\xi_{1}\rho_{k}\geq\frac {1}{3}\xi_{1}\|d_{k}\|>\frac{1}{3}\xi_{1}\epsilon>0\). Because \(\mathrm{rared}_{k}^{l}\geq\eta\, \mathrm{pred}_{k}^{f}\), we have \(\max[l_{k},\sum_{r=0}^{m(k)-1}\lambda_{kr} l_{k-r}]-l_{k+1}\geq\eta\, \mathrm{pred}_{k}^{f}\). Summing both sides over k and using the fact that the sequence \(\{l_{k}\}\) is bounded below, we obtain \(\eta\sum\mathrm{pred}_{k}^{f}<\infty\), which implies \(\mathrm{pred}_{k}^{f}\rightarrow0\) as \(k\rightarrow\infty\). This contradicts \(\mathrm{pred}_{k}^{f}>\frac{1}{3}\xi_{1}\epsilon>0\). Hence the conclusion follows. □

Theorem 1

Suppose \(\{x_{k}\}\) is an infinite sequence generated by Algorithm A. Then every cluster point of \(\{x_{k}\}\) is a KKT point of problem (P).

5 Numerical results

In this section, we give some numerical experiments to show the success of our proposed method. All examples are chosen from [18] and [19].
  (1) The matrix \(H_{k}\) is updated by the damped BFGS formula [2]
    $$H_{k+1}=H_{k}+ \frac{y_{k}y_{k}^{T}}{y_{k}^{T}s_{k}}-\frac{H_{k}s_{k}s_{k}^{T}H_{k}}{s_{k}^{T}H_{k}s_{k}}, $$
    where \(y_{k}=\theta_{k}\hat{y}_{k}+(1-\theta_{k})H_{k}s_{k}\),
    $$ \theta_{k}=\left \{ \textstyle\begin{array}{l@{\quad}l} 1,& s_{k}^{T}\hat{y}_{k}\geq0.2s_{k}^{T}H_{k}s_{k}, \\ \frac{0.8s_{k}^{T}H_{k}s_{k}}{s_{k}^{T}H_{k}s_{k}-s_{k}^{T}\hat{y}_{k}}, &\text{otherwise} \end{array}\displaystyle \right . $$
    and \(\hat{y}_{k}=g_{k+1}-g_{k}\), \(s_{k}=x_{k+1}-x_{k}\).

  (2) The error tolerance is \(10^{-6}\).

  (3) The algorithm parameters were set as follows: \(H_{0}=I\in R^{n\times n}\), \(\beta=0.9\), \(\gamma=0.1\), \(\rho=0.5\), \(\alpha_{1}=\alpha_{2}=0.5\), \(\sigma_{0}=-0.1\), \(\Delta_{\min}=10^{-6}\), \(\Delta_{0}=1\). The program is written in Matlab.
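The update of \(H_{k}\) in item (1) can be sketched in plain Python. The experiments themselves were run in Matlab, so this is only an illustration: the function name is hypothetical, \(H_{k}\) is assumed symmetric with \(s_{k}^{T}H_{k}s_{k}>0\), and the rank-one term is taken as the outer product \(y_{k}y_{k}^{T}\), which is what makes the result a matrix.

```python
def damped_bfgs_update(H, s, y_hat):
    """Return H_{k+1} from H_k, s_k = x_{k+1} - x_k and
    y_hat = g_{k+1} - g_k, using the damping factor theta_k."""
    n = len(s)
    Hs = [sum(H[i][j] * s[j] for j in range(n)) for i in range(n)]
    sHs = sum(s[i] * Hs[i] for i in range(n))
    if sHs <= 0.0:
        raise ValueError("s^T H s must be positive")
    sy_hat = sum(s[i] * y_hat[i] for i in range(n))
    theta = 1.0 if sy_hat >= 0.2 * sHs else 0.8 * sHs / (sHs - sy_hat)
    # Damped difference y_k; by construction y_k^T s_k >= 0.2 * s^T H s > 0.
    y = [theta * y_hat[i] + (1.0 - theta) * Hs[i] for i in range(n)]
    ys = sum(y[i] * s[i] for i in range(n))
    # For symmetric H, H s s^T H equals the outer product of Hs with itself.
    return [
        [H[i][j] + y[i] * y[j] / ys - Hs[i] * Hs[j] / sHs for j in range(n)]
        for i in range(n)
    ]


identity = [[1.0, 0.0], [0.0, 1.0]]
# Negative curvature along s: the undamped update would lose positive
# definiteness, the damped one does not.
H_next = damped_bfgs_update(identity, [1.0, 0.0], [-1.0, 0.0])
```

In this example \(\theta_{k}=0.4\) and the updated matrix is diagonal with entries 0.2 and 1, so it remains positive definite even though \(s_{k}^{T}\hat{y}_{k}<0\).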

In Table 1, the problems are numbered in the same way as in Hock and Schittkowski [18] and Schittkowski [19]. For example, ‘HS2’ is problem 2 in Hock and Schittkowski [18] and ‘S216’ is problem 216 in Schittkowski [19]. Some equality constrained problems, such as S216, S235 and S252, are also included in our test problems. NF and NG denote the numbers of function and gradient evaluations, respectively. In Table 1, the results in the first column are computed by Algorithm A, those in the second column by the traditional filter method as reported in [3], and those in the third column by the Matlab function ‘fmincon’. Comparing the three methods, our algorithm requires fewer function and gradient evaluations.
Table 1 The numerical results of the different algorithms (columns: Algorithm A (NG-NF), Filter (NG-NF), Matlab (NF))

To show the effect of the non-monotone technique, we also list numerical results in Table 2. These tests were done for \(M=1\), \(M=3\) and \(M=10\), respectively, that is, with an increasing degree of non-monotonicity.
Table 2 The results for different M in our algorithm, i.e. with different degrees of nonmonotonicity (columns: M = 1 (NG-NF), M = 3 (NG-NF), M = 10 (NF))

These first numerical results show that the nonmonotone algorithm is more effective than the monotone one for most test examples, and that our algorithm is effective and satisfactory.

6 Conclusions

In our method, the criterion used to test the trial points is flexible: the rejection region varies according to the improvement made by the previous trial point, whereas in traditional filter methods the elements of the filter structure are fixed. The numerical results also indicate that the new method is more effective and has lower computational cost than both the traditional methods and the Matlab algorithm. Moreover, the use and adjustment of the self-adaptive parameter is a good way to balance the value of the objective function against the constraint violation function. In addition, the nonmonotone technique in the criterion avoids the Maratos effect to a certain degree, because more trial points are accepted by the filter. We also compared the results for different degrees of nonmonotonicity. Although we cannot decide which value of M is best, the results with the nonmonotone technique are at least better than those with the monotone technique.



This research is supported by the National Natural Science Foundation of China (No. 11101115), the Natural Science Foundation of Hebei Province (No. 2014201033) and the Key Research Foundation of Educational Bureau of Hebei Province (No. ZD2015069). In addition, we would like to show our deepest gratitude to the editor and the anonymous reviewers who have helped to improve the paper.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

College of Mathematics and Computer Science, Hebei University


  1. Zhang J. A robust trust region method for nonlinear optimization with inequality constraint. Appl Math Comput. 2006;176:688-99.
  2. Nocedal J, Wright S. Numerical optimization. New York: Springer; 1999.
  3. Fletcher R, Leyffer S. Nonlinear programming without a penalty function. Math Program. 2002;91:239-69.
  4. Chin C, Fletcher R. On the global convergence of an SLP-filter algorithm that takes EQP steps. Math Program. 2003;96:161-77.
  5. Fletcher R, Gould N, Leyffer S, Toint P, Wachter A. A global convergence of a trust region SQP-filter algorithm for general nonlinear programming. SIAM J Optim. 2002;13:635-60.
  6. Fletcher R, Leyffer S, Toint P. On the global convergence of a filter-SQP algorithm. SIAM J Optim. 2002;13:44-59.
  7. Ulbrich M, Ulbrich S, Vicente L. A global convergent primal-dual interior-point method for nonconvex nonlinear programming. Math Program. 2004;100:379-410.
  8. Audet C, Dennis J. A pattern search filter method for nonlinear programming without derivatives. SIAM J Optim. 2004;14:980-1010.
  9. Karas E, Ribeiro A, Sagastizábal C, Solodov M. A bundle filter method for nonsmooth convex constrained optimization. Math Program. 2009;116:297-320.
  10. Ulbrich M, Ulbrich S. Nonmonotone trust region methods for nonlinear equality constrained optimization without a penalty function. Math Program. 2003;95:103-35.
  11. Ulbrich S. On the superlinear local convergence of a filter-SQP method. Math Program. 2004;100:217-45.
  12. Nie P, Ma C. A trust region filter method for general nonlinear programming. Appl Math Comput. 2006;172:1000-17.
  13. Su K, Pu D. A nonmonotone filter trust region method for nonlinear constrained optimization. J Comput Appl Math. 2009;223:230-9.
  14. Chen Z. A penalty-free-type nonmonotone trust region method for nonlinear constrained optimization. Appl Math Comput. 2006;173:1014-46.
  15. Chen Z, Zhang X. A nonmonotone trust region algorithm with nonmonotone penalty parameters for constrained optimization. J Comput Appl Math. 2004;172:7-39.
  16. Gould N, Toint P. Global convergence of a non-monotone trust-region SQP-filter algorithm for nonlinear programming. Nonconvex Optim Appl. 2006;82:125-50.
  17. Zhou G. A modified SQP method and its global convergence. J Glob Optim. 1997;11:193-205.
  18. Hock W, Schittkowski K. Test examples for nonlinear programming codes. Lecture notes in economics and mathematical systems. New York: Springer; 1981.
  19. Schittkowski K. More test examples for nonlinear mathematical programming codes. New York: Springer; 1987.


© Su et al. 2016