CONVERGENCE OF QUASI-NEWTON METHODS FOR SOLVING CONSTRAINED GENERALIZED EQUATIONS

Abstract. In this paper, we focus on quasi-Newton methods for solving constrained generalized equations. As is well known, this problem was first studied by Robinson and Josephy in the 1970s. Since then, it has been extensively studied by many other researchers, especially Dontchev and Rockafellar. Here, we propose two Broyden-type quasi-Newton approaches to dealing with constrained generalized equations: one that requires the exact resolution of the subproblems, and another that allows inexactness, which is closer to numerical reality. In both cases, projections onto the feasible set are also inexact. The local convergence of general quasi-Newton approaches is established under a bounded deterioration property of the update matrix and Lipschitz continuity hypotheses. In particular, we prove that a general scheme converges linearly to the solution under suitable assumptions. Furthermore, when a Broyden-type update rule is used, the convergence is superlinear. Some numerical examples illustrate the applicability of the proposed methods.


Introduction
Our study and contributions focus on the problem known as the Constrained Generalized Equation.
Basically, it consists of finding x ∈ X such that

0 ∈ f(x) + F(x), x ∈ C, (1.1)

where f : Ω → Y is a continuously differentiable function, X and Y are Banach spaces, Ω ⊆ X is an open set, C ⊂ Ω is a nonempty closed convex set, and F : Ω ⇒ Y is a multifunction with a closed nonempty graph.
Generalized equations were first proposed by Robinson [36]. In that work, the author deals with the unconstrained problem, which aims to find x ∈ X such that

0 ∈ f(x) + F(x), (1.2)

where f and F are essentially the same as in problem (1.1). This problem differs from (1.1) by the absence of the constraint x ∈ C. Josephy [30] shows that several problems can be rewritten in the form (1.2), namely general nonlinear optimization, variational inequality, and equilibrium problems. In the last ten years, many researchers have devoted their efforts to studying the application of Newton's method and its variants to solve (1.2); see for instance [6, 7, 10, 19, 23-27, 30, 35, 37, 38]. In particular, we highlight the important contributions of Dontchev [6, 7, 24], Adly [2, 4], Bonnans [10], Ferreira [14, 25, 27], and their collaborators.
The problem addressed in this paper appeared in a recent work by Oliveira et al. [14], where Newton's method for solving (1.1) was considered. The presence of the constraint set C allows us to write, in addition to the problems already mentioned, others in the form (1.1). For instance, the Constrained Variational Inequality Problem (CVIP),

find x ∈ U ∩ V such that ⟨f(x), y − x⟩ ≥ 0 for all y ∈ U,

with U, V ⊂ X closed convex sets, can be stated as

0 ∈ f(x) + N_U(x), x ∈ V, (1.3)

where N_U is the normal cone associated with U. Problem (1.3) has been extensively studied over the past ten years; see for instance [11, 29]. Another important equivalence to constrained generalized equations is the Split Variational Inequality Problem (SVIP), stated as follows: let U ⊂ X and V ⊂ Y be nonempty closed convex sets, and A : X → Y be a linear operator. Let f : X → X and g : Y → Y be functions. Then SVIP consists of finding x* ∈ U such that ⟨f(x*), x − x*⟩ ≥ 0 for all x ∈ U, and such that y* = Ax* ∈ V satisfies ⟨g(y*), y − y*⟩ ≥ 0 for all y ∈ V.
Taking D := U × V and V := {w = (x, y) ∈ X × Y | Ax = y}, SVIP is equivalent to the following CVIP ([11], Lem. 5.1): find w* ∈ D ∩ V such that ⟨h(w*), w − w*⟩ ≥ 0 for all w ∈ D, where w = (x, y) and h(x, y) := (f(x), g(y)). In turn, this CVIP is equivalent to the following constrained generalized equation:

0 ∈ h(w) + N_D(w), w ∈ V.

It is known that SVIP includes several optimization problems, for instance the Split Minimization Problem and the Common Solutions to Variational Inequalities Problem. For more details about these problems, see [1, 11, 12, 29, 34]. Artacho et al. [6] studied a quasi-Newton method for the unconstrained problem (1.2). The authors considered the following iterative scheme:

0 ∈ f(x_k) + B_k(x_{k+1} − x_k) + F(x_{k+1}), (1.4)

where {B_k} is a sequence of bounded linear mappings between the Banach spaces X and Y satisfying the classical Broyden update rule. They proved that if the multifunction f + F is metrically regular at x* for 0 and the derivative mapping f′ is Lipschitz continuous, then the sequence {x_k} generated by (1.4) is linearly convergent to x*; see Theorem 4.3 of [6]. More generally, Adly and Huynh [3] introduced quasi-Newton schemes like (1.4) for solving (1.2), allowing f to be possibly nondifferentiable. In this case, the authors assume the metric regularity condition with respect to a kind of semismooth regularization of f + F. They proved that if B_k satisfies a suitable modified Broyden update, the sequence {x_k} generated by (1.4) is linearly convergent to a solution x* of (1.2) ([3], Thm. 4.3). A similar approach was employed in [8, 33].
In this paper, we propose two quasi-Newton schemes to solve the constrained generalized equation (1.1). The first one is based on the following idea: given x_0 ∈ C and an initial B_0 close to f′(x*), we compute, at each iteration, an intermediate point y_k such that

0 ∈ f(x_k) + B_k(y_k − x_k) + F(y_k). (1.5)

Since y_k can be infeasible, that is, y_k ∉ C, we project it onto C by an inexact projection procedure, obtaining a new iterate x_{k+1} that is almost feasible. We then prove that, under suitable assumptions, the main sequence {x_k} converges linearly to a solution of (1.1). The second method follows an analogous idea. The difference is that (1.5) is allowed to be solved inexactly. Specifically, y_k must only lie in a suitable open ball around an exact solution of the subproblem. This strategy is more suitable for implementation, since solving (1.5) exactly can be practically impossible even in simple problems. Naturally, allowing inexact solutions leads to a more complicated convergence theory, which is addressed in Section 4. This paper is divided into two parts. In the first one, we use the quasi-Newton approach (1.5) to find a solution of problem (1.1) with B_k satisfying the classical Broyden update rule. By assuming metric regularity of the multifunction f + F at x* for 0, we show that the sequence {x_k} generated by (1.5) is linearly convergent to x*. Firstly, we suppose that B_k satisfies a bounded deterioration condition to obtain a general convergence result. As a particular case, we show that the Broyden update satisfies this bounded deterioration. It is worth mentioning that the bounded deterioration condition used on B_k was previously considered in [3, 6]. Furthermore, we use the fact that Lipschitz properties of the multifunction F^{−1} are inherited by the multifunction (f + F)^{−1}, as demonstrated by Dontchev and Hager in [21]. In the second part, we address problem (1.1) using an inexact approach and ideas similar to those proposed by Dontchev and Rockafellar in [24]. The proposed inexact quasi-Newton method is described by

0 ∈ f(x_k) + B_k(y_k − x_k) + F(y_k) + R_k(x_k, y_k), (1.6)

where R_k : X × X ⇒ Y is a
sequence of multifunctions with closed graphs representing the inexactness. It is not difficult to see that if F ≡ {0} and R_k is the open ball centered at 0 with radius η_k‖f(x_k)‖, then the iterative scheme (1.6) reduces to

‖f(x_k) + B_k(x_{k+1} − x_k)‖ < η_k‖f(x_k)‖,

which can be seen as an inexact quasi-Newton method for solving f(x) = 0, x ∈ C. Then, assuming the multifunction f + F metrically regular at x* for 0, R_k partially Aubin continuous, and d(0, R_k(u, x*)) fulfilling a suitable boundedness property, we show that the sequence {x_k} generated by (1.6) is linearly convergent to x*, with B_k satisfying the Broyden update. It should be mentioned that inexact quasi-Newton methods for solving the unconstrained problem (1.2) are considered in [13]. This work is organized as follows. In Section 2, we present the notation and basic necessary concepts. In Section 3, we present the first quasi-Newton algorithm, where subproblems are solved exactly, together with its local convergence analysis. Section 4 is devoted to the inexact quasi-Newton algorithm and its convergence. In Section 5, we discuss the important particular case of Broyden-type methods. In Section 6, we establish superlinear convergence under the Dennis-Moré condition. Numerical experiments are presented in Section 7, illustrating the theory. Finally, Section 8 brings our conclusions.
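For the special case F ≡ {0}, the reduction above is the classical residual test of inexact Newton-type methods. The following minimal Python sketch checks this forcing-term condition; the function and variable names are illustrative and not taken from the paper's code.

```python
import numpy as np

def inexact_step_ok(f_xk, B_k, step, eta_k):
    """Check the inexact quasi-Newton residual condition for f(x) = 0:
    ||f(x_k) + B_k * step|| <= eta_k * ||f(x_k)||,
    where step = x_{k+1} - x_k and eta_k > 0 is the forcing term."""
    residual = np.linalg.norm(f_xk + B_k @ step)
    return residual <= eta_k * np.linalg.norm(f_xk)
```

An exact quasi-Newton step (B_k step = −f(x_k)) always passes the test, while a null step fails whenever f(x_k) ≠ 0 and η_k < 1.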

Preliminaries
In this section, we briefly present the basic concepts that we will use throughout the work. A detailed presentation can be found in [23].
Firstly, we establish some notation. Unless otherwise stated, X and Y are Banach spaces. A generic norm will be denoted by ‖·‖. The sets B_δ(x) and B_δ[x] will denote the open and closed balls of radius δ > 0 centered at x, respectively. The set R_+ is the set of all non-negative real numbers. Given nonempty sets A, B ⊂ X, the excess of A over B is

e(A, B) := sup_{x∈A} d(x, B), (2.1)

with the convention e(∅, B) = 0, where d(x, B) := inf_{y∈B} ‖x − y‖. The vector space of all continuous linear mappings A : X → Y will be denoted by L(X, Y), and the norm of A ∈ L(X, Y) is defined by ‖A‖ := sup{‖Ax‖ | ‖x‖ ≤ 1}. Let Ω ⊆ X be an open set and f : Ω → Y be Fréchet differentiable at all x ∈ Ω (the Fréchet derivative of f at x is the continuous linear mapping f′(x)). In the context of generalized equations, it is common to consider some regularity condition on F. Here, we will use the following notions. Definition 2.1. Let Ω ⊆ X be an open and nonempty set. We say that the multifunction G : Ω ⇒ Y is metrically regular at x̄ ∈ Ω for ū ∈ Y with modulus λ > 0 when ū ∈ G(x̄), and there exist a > 0 and b > 0 such that B_a[x̄] ⊂ Ω and

d(x, G^{−1}(u)) ≤ λ d(u, G(x)) for all x ∈ B_a[x̄] and u ∈ B_b[ū]; (2.2)

and strongly metrically subregular at x̄ ∈ Ω for ū ∈ Y with modulus λ > 0 when ū ∈ G(x̄), and there exists a > 0 such that

‖x − x̄‖ ≤ λ d(ū, G(x)) for all x ∈ B_a[x̄].

It is easy to see that strong metric subregularity of G at x̄ for ū implies that x̄ is an isolated point in G^{−1}(ū). Another intermediate concept is metric subregularity, which consists in relaxing metric regularity by requiring (2.2) with u = ū fixed. Remark 2.2. It is known that a multifunction Γ : X ⇒ Y is metrically regular at x̄ ∈ X for ȳ ∈ Y with modulus λ > 0 if and only if Γ^{−1} : Y ⇒ X has the Aubin property at ȳ for x̄ with the same constant λ, i.e., e(Γ^{−1}(y) ∩ X, Γ^{−1}(y′)) ≤ λ‖y − y′‖ for all y, y′ ∈ Y, where X and Y are neighborhoods of x̄ and ȳ, respectively. See Theorem 5A.3, p. 255 of [23].
The next result establishes a connection between the metric regularity of f + F and the Aubin property of an associated map, whose proof is analogous to that presented in [24]. Proposition 2.3. Let ζ > 0 and assume that the multifunction f + F is metrically regular at x̄ for 0 with modulus λ > 0, where λζ < 1. Let u ∈ X, B_u some approximation to f′(u), and consider the multifunction

G_u(x) := f(u) + B_u(x − u) + F(x),

where the operator B_u is such that ‖B_u − f′(x̄)‖ ≤ ζ. Then, for every κ > λ/(1 − λζ), there exist positive numbers a and b such that

e(G_u^{−1}(y) ∩ B_a[x̄], G_u^{−1}(y′)) ≤ κ‖y − y′‖ for all y, y′ ∈ B_b[0]. (2.4)

Another important result is a generalization of the contraction mapping principle for set-valued mappings, stated below. It will be useful to prove the convergence of the quasi-Newton method in the next section. Its proof can be found in Theorem 5E.2, p. 313 of [23].

Theorem 2.4. Let (X, ρ) be a complete metric space, Φ : X ⇒ X be a multifunction, x̄ ∈ X, and let r > 0 and µ ∈ (0, 1) be such that the set Φ(x) ∩ B_r[x̄] is closed for all x ∈ B_r[x̄], d(x̄, Φ(x̄)) < r(1 − µ), and e(Φ(u) ∩ B_r[x̄], Φ(v)) ≤ µ ρ(u, v) for all u, v ∈ B_r[x̄]. Then Φ has a fixed point in B_r[x̄], that is, there exists x ∈ B_r[x̄] such that x ∈ Φ(x).
In the sequel, we present the feasible inexact projection used in our proposed algorithms, as well as some of its properties of interest. This type of projection was used in [14] within a Newton method for constrained generalized equations over Euclidean spaces. See also [28]. Definition 2.5. Let θ ≥ 0, C ⊂ X be a closed convex set, and x ∈ C. The feasible inexact projection mapping relative to x with error tolerance θ, denoted by P_C(·, x, θ) : X ⇒ C, is the multifunction

P_C(y, x, θ) := {w ∈ C | ⟨y − w, z − w⟩ ≤ θ‖y − x‖² for all z ∈ C}. (2.5)

We say that w ∈ P_C(y, x, θ) is a feasible inexact projection of y onto C with respect to x and with error tolerance θ.
Remark 2.6.It follows from Proposition 2.1.3,p. 201 of [9] that, for each y ∈ X, the exact projection P C (y) is a vector in P C (y, x, θ).Hence, P C (y, x, θ) is nonempty for all y ∈ X and x ∈ C.
Finally, we state a useful version of the Aubin property suitable for multifunctions with two blocks of variables. We say that a multifunction T : V × W ⇒ S is partially Aubin continuous at (v̄, w̄) ∈ V × W with respect to w uniformly in v for s̄ ∈ S with modulus λ > 0 [23] (or simply, partially Aubin continuous at (v̄, w̄) w.r.t. w for s̄ with modulus λ > 0) if s̄ ∈ T(v̄, w̄) and there are neighborhoods V of v̄, W of w̄ and S of s̄ such that

e(T(v, w) ∩ S, T(v, w′)) ≤ λ‖w − w′‖ for all v ∈ V and w, w′ ∈ W.

The quasi-Newton method and its local convergence analysis
In this section, we propose the first quasi-Newton method for solving (1.1). Here, it is required that the auxiliary iterate y_k be an exact solution of the corresponding unconstrained subproblem 0 ∈ f(x_k) + B_k(y − x_k) + F(y). As already mentioned, y_k may be infeasible. So, an inexact projection onto C is employed to achieve feasibility in the limit, in the spirit of Definition 2.5. Algorithm 1 below formalizes this idea.
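To fix ideas, the following Python sketch instantiates this scheme in the simplest setting F ≡ {0} with C a box: the subproblem reduces to a linear solve, the projection onto C is exact (θ_k = 0), and B_k follows the classical Broyden secant update. All names are illustrative assumptions; this is not the paper's implementation.

```python
import numpy as np

def qn_inexp_box(f, x0, B0, lo, hi, tol=1e-10, max_iter=100):
    """Sketch of the quasi-Newton scheme for 0 = f(x), x in C = [lo, hi]^n.

    Each iteration: solve B_k (y - x_k) = -f(x_k) for the auxiliary
    point y_k, project y_k onto the box C, and update B_k by the
    Broyden secant formula along s_k = y_k - x_k."""
    x, B = np.asarray(x0, float), np.asarray(B0, float)
    for _ in range(max_iter):
        y = x + np.linalg.solve(B, -f(x))   # exact subproblem solve
        x_new = np.clip(y, lo, hi)          # exact projection onto the box
        s = y - x
        if np.dot(s, s) > 0:                # Broyden secant update
            B = B + np.outer(f(y) - f(x) - B @ s, s) / np.dot(s, s)
        x = x_new
        if np.linalg.norm(f(x)) <= tol:
            break
    return x
```

For a linear map f(x) = Ax − b with B_0 = A and an interior solution, the sketch terminates in one iteration, as expected from the Newton case.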
Remark 3.1.The projection in Step 3 can be computed as an approximate feasible solution of the problem In [5], a Frank-Wolfe algorithm is design to compute such a projection.The choice of θ k will be detailed in Section 7.
Next, we state the local convergence of the QN-InexP method. This is the main result of this section. The point x* will always refer to a solution of (1.1). Theorem 3.2. As in (1.1), let Ω ⊂ X be an open set, f : Ω → Y be a Fréchet differentiable function, F : Ω ⇒ Y be a multifunction with closed graph, and C ⊂ Ω be a nonempty closed convex set. Furthermore, let x* be such that 0 ∈ f(x*) + F(x*), x* ∈ C. Suppose the following conditions hold: (i) f + F is metrically regular at x* for 0 with modulus λ > 0; (ii) there exist ℓ > 0 and a neighborhood X of x* such that ‖f′(x) − f′(x*)‖ ≤ ℓ‖x − x*‖ for all x ∈ X; (iii) the initial operator B_0 satisfies

‖B_0 − f′(x*)‖ ≤ δ, with λδ < 1; (3.2)

(iv) θ_k ≥ 0 for all k ≥ 0 and θ := sup θ_k < 1/2; (v) there exists a constant c > 0 such that, for each k ≥ 0, B_{k+1} satisfies the bounded deterioration condition

‖B_{k+1} − f′(x*)‖ ≤ ‖B_k − f′(x*)‖ + c(‖x_k − x*‖ + ‖y_k − x*‖). (3.3)

Then there exists a neighborhood U of x* such that, starting from any x_0 ∈ C ∩ U \ {x*}, there is a sequence {x_k} ⊂ C ∩ U generated by the QN-InexP method that converges linearly to x*.
Proof. Let us consider the radii a > 0 and b > 0 associated with the metric regularity of f + F (see Def. 2.1). Taking λ̃ > λ, we can assume without loss of generality that a is small enough that B_a(x*) ⊂ X, where δ is as in (3.2). We define the constant γ as in (3.4); it is immediate from (3.4) that 0 < γ < 1. In the sequel, we use induction to prove that, starting from any x_0 close enough to x*, it is possible to generate a sequence {x_k} linearly convergent to x*. Take x_0 ∈ C ∩ B_{r*}(x*) \ {x*}.
To construct the next iterate x_1, let us verify the conditions in Theorem 2.4, defining the auxiliary multifunction Φ_{x_0}. By using (3.1) and the definition of r*, we obtain from (3.6) and Definition 2.1 the required excess estimate. Taking into account (3.4) and x_0 ∈ B_{r*}(x*) \ {x*}, we can verify that ρ < r*. Therefore, for s = p or s = q we obtain the corresponding bound, where the second inequality holds since p, q ∈ B_ρ[x*] and, by (3.4), ρ < ‖x_0 − x*‖. As e(∅, Φ_{x_0}(q)) = 0 and λ̃δ < 1 (see (3.2)), we can apply Theorem 2.4 with Φ = Φ_{x_0}, x̄ = x* and µ = λ̃δ to conclude that there exists y_0 ∈ Φ_{x_0}(y_0). At this point, we have constructed y_0. The next iterate x_1 is obtained according to Step 2, that is, x_1 ∈ P_C(y_0, x_0, θ_0). To complete the induction process we proceed analogously to the first step. As in (3.6), we need an estimate on the subproblem residual. On the other hand, from (3.3) we obtain a bound on ‖B_k − f′(x*)‖; by (3.7), the last inequality becomes sharper. Hence, from (3.8), (3.9) and the condition on r*, we obtain the desired estimate. By Definition 2.1 and taking into account that 0 ∈ G_{x*}(x*), we obtain a bound via metric regularity. Combining the two last inequalities, taking the supremum with respect to z ∈ Φ_{x_k}(p) ∩ B_a[x*] and using the definition of excess given in (2.1), we bound e(Φ_{x_k}(p) ∩ B_a[x*], Φ_{x_k}(q)). Hence, from the last inequality and the properties of the norm, we obtain the contraction estimate. By (3.4) we have λ̃(δ + 2ca/(1 − γ)) < 1, so we can apply Theorem 2.4 with Φ = Φ_{x_k}, x̄ = x* and µ = λ̃(δ + 2ca/(1 − γ)) to conclude that there exists y_k ∈ Φ_{x_k}(y_k). As before, we take x_{k+1} ∈ P_C(y_k, x_k, θ_k). Therefore, we conclude the induction process.
Remark 3.3.In [3], Adly and Van Ngai considered a quasi-Newton method similar to (1.5).The main difference between their method and (1.5) is that B k is a multifunction from X to Y. Also, in [3], the authors introduced a generalization of the semismooth to functions, see Definition 2.3 of [3].Another difference between these results consist in the assumption of regularity.But, applying Proposition 2.3 combined with Theorem 3E.7 of [23] and Remark 3.3 of [3], we conclude that these assumptions are equivalent.Hence, after some adjustments, assuming that f is Fréchet differentiable and C ⊂ X, we have that Theorem 3.2 extends ([3], Thm.3.2).Evidently, the power of our work relies on the case C = X.Nevertheless, Theorem 3.2 encompasses ([6], Thm.3.1) in the unconstrained case C = X.
Remark 3.4.Although we do not known the solution x * a priori, the bounded deterioration condition (3.3) involving the derivative at x * acts as a theoretical expectation for the convergence.In the same way is the requirement (3.2) on the initial B 0 .Specifically, the bounded deterioration as we stated is used in other related works, e.g., Theorem 3.1 of [6].It is worth mentioning that here we deal with general quasi-Newton schemes, in which case (3.3) plays an important role even for quasi-Newton methods for standard nonlinear programming [17].On the other hand, the particular Broyden update rule considered in Section 5 satisfies the bounded deterioration (see Prop. 5.1), in accordance with standard nonlinear programming.

The inexact quasi-Newton approach
In this section, we propose a version of QN-InexP where subproblems need not be solved exactly. Specifically, they become

0 ∈ f(x_k) + B_k(y_k − x_k) + F(y_k) + R_k(x_k, y_k), (4.1)

where {B_k} is a sequence of linear operators and R_k : X × X ⇒ Y is a sequence of multifunctions with closed graphs representing the inexactness. To illustrate the flexibility of condition (4.1), we observe that when F ≡ {0} and R_k(x_k, x_{k+1}) := B_{η_k‖f(x_k)‖}(0), η_k > 0, we recover the inexact quasi-Newton method developed in [15] for nonlinear systems of equations. Also, for a suitable choice of R_k induced by a sequence {r_k} of functions representing the inexactness, our method reduces to an instance of the inexact quasi-Newton method considered in [13]. Similarly to the previous section, we formally state our inexact quasi-Newton scheme with inexact projections in Algorithm 2.
As we did for QN-InexP, we will establish the local convergence of IQN-InexP. One may think this can be done simply by adapting the proof of Theorem 3.2 by introducing inexactness, but this is not totally true. Here, differently from the previous theorem, which uses the contraction principle, the Coincidence Theorem will serve as support ([24], Thm. 1). Theorem 4.1 (Coincidence Theorem). Let X and Y be two metric spaces, ρ : X × X → R_+ be a metric in X, and consider the multifunctions Φ : X ⇒ Y and Γ : Y ⇒ X. Let x̄ ∈ X and ȳ ∈ Y. Also, let η, κ and µ be positive scalars such that µκ < 1. Suppose that one of the sets is closed while the other is complete, or that both sets are complete, and that conditions (i) to (iv) of ([24], Thm. 1) hold. Then Φ and Γ admit a coincidence point, that is, a pair (x, y) with y ∈ Φ(x) and x ∈ Γ(y).

Theorem 4.2. Suppose conditions (i) to (v) of Theorem 3.2 are valid. Also, suppose that the following additional conditions hold: (vi) for each k ≥ 0, the mapping (u, x) → R_k(u, x) is partially Aubin continuous at (x*, x*) w.r.t. x for 0 with modulus µ > 0; (vii) there are scalars t ∈ (0, 1), 0 < γ < t(1 − λµ)/(2µ) and β > 0 satisfying the corresponding boundedness conditions. Then there exists a neighborhood U of x* such that, starting from any x_0 ∈ C ∩ U \ {x*}, there is a sequence {x_k} ⊂ C ∩ U generated by the IQN-InexP method that converges linearly to x*.
Proof. By using similar arguments as in Theorem 3.2 (see also Rem. 2.8), we obtain the base case of the induction. By induction, we suppose that there exist an integer k > 1 and points x_1, x_2, ..., x_k ∈ C ∩ B_{r*}[x*] and y_1, y_2, ..., y_k ∈ B_{r*}[x*] satisfying the required estimates. Without loss of generality, we assume that y_{j−1}, x_{j−1} and x* are distinct from each other. Note that, as x_k ∈ B_{r*}[x*] and r* ≤ β, we can repeat the same argument from (4.9) with x_0 replaced by x_k. To apply Proposition 2.3 in the induction step, firstly we need to show that ‖B_k − f′(x*)‖ ≤ ζ for some positive scalar ζ such that ζλ < 1; this follows by combining (3.3), (4.11) and (4.12), for all k ≥ 1. The resulting inequality combined with (4.4) implies the required bound. Then, using (4.5), (4.6), (4.13) and the last inequality, we can apply Proposition 2.3. Thus conditions (vi) and (vii) in Theorem 4.1 are satisfied with u = x_k. Now, since x_k ∈ B_{r*}[x*] and η = t‖x_k − x*‖, condition (4.7) ensures that η ≤ r* ≤ a and η/µ ≤ b. So, (4.3) implies condition (iii) of Theorem 4.1. Furthermore, from (2.4) we conclude that condition (iv) in Theorem 4.1 holds for Γ = G_{x_j}^{−1} with j = k. Thus, the assumptions of Theorem 4.1 are satisfied with η = t‖x_k − x*‖, and hence the next iterate exists. As t < 1, we obtain x_{k+1} ∈ B_{r*}[x*], concluding the proof.
Remark 4.3.In Theorem 4 of [24], a related result to Theorem 4.2 for the unconstrained case, C = X, was provided.But the authors suppose that B k = f (x k ) for all k.So, even for C = X, Theorem 4.2 generalizes such previous result since we only require that B k approximates f (x k ).

Linear convergence of the Broyden-type quasi-Newton methods
In this section, we consider both the QN-InexP and IQN-InexP methods when B_{k+1} in Step 3 is computed by the classical Broyden update scheme. Let us denote by ⟨·, ·⟩ the scalar product in R^n. The Broyden update rule is defined as

B_{k+1} := B_k + (f(y_k) − f(x_k) − B_k s_k) ⟨s_k, ·⟩ / ⟨s_k, s_k⟩, (5.1)

where s_k := y_k − x_k. The aim of this section is to show that the Broyden update rule satisfies the bounded deterioration property (3.3), a crucial condition in Theorems 3.2 and 4.2. Therefore, (5.1) is one practical choice that fulfils the general framework of our quasi-Newton methods.
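In finite dimensions, the Broyden update (5.1) is the familiar rank-one secant correction. A hedged Python sketch (the names are illustrative; the paper's experiments use Matlab):

```python
import numpy as np

def broyden_update(B, f, x, y):
    """Classical Broyden update (5.1): rank-one correction of B along
    s = y - x, so that the new operator satisfies the secant equation
    B_new @ s = f(y) - f(x)."""
    s = y - x
    denom = np.dot(s, s)
    if denom == 0.0:            # y == x: no new secant information
        return B
    r = f(y) - f(x) - B @ s     # residual of the secant equation
    return B + np.outer(r, s) / denom
```

By construction, the updated operator satisfies the secant equation B_{k+1} s_k = f(y_k) − f(x_k), which can be checked numerically.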
Proposition 5.1.Let X and Y be Hilbert spaces.Suppose that the Fréchet derivative mapping f is Lipschitz continuous with constant L in a convex neighborhood X of a point x * .Given B k ∈ L(X, Y) and x k , y k ∈ X , y k = x k , the operator B k+1 defined as in (5.1) satisfies Proof.The proof is straightforward from the proof of Proposition 4. Suppose f + F is metrically regular at x * for 0 with modulus λ > 0, and that f has Fréchet derivative Lipschitz continuous locally around x * .Consider the QN-InexP method with θ k ≥ 0 for all k and assume that Then, if we choose B k+1 as in (5.1), there exists a neighborhood U of x * such that, starting from any x 0 ∈ C ∩ U \{x * }, there is a sequence {x k } ⊂ C ∩ U generated by the QN-InexP method that converges linearly to x * .If, in addition, items (vi) and (vii) of Theorem 4.2 are fulfilled, then the same conclusion is valid for the IQN-InexP method with B k+1 as in (5.1).

Superlinear convergence under the Dennis-Moré condition
We dedicate this section to showing under what conditions we can obtain superlinear convergence of the inexact quasi-Newton method (4.1) by using the Broyden update (5.1). For this purpose, we will use the Hilbert-Schmidt norm of an operator A ∈ L(X, Y) between Hilbert spaces X and Y, defined as

‖A‖_HS := ( Σ_{i∈I} ‖A e_i‖² )^{1/2},

where {e_i, i ∈ I} is an orthonormal basis of X. We denote by T(X, Y) := {A ∈ L(X, Y) | ‖A‖_HS < +∞} the set of Hilbert-Schmidt operators.
It is straightforward to show, as done in [6], that Proposition 5.1 is also valid for the Hilbert-Schmidt norm. To show the desired superlinear convergence we will use the Dennis-Moré theorem. This theorem gives a characterization of superlinear convergence in quasi-Newton methods, and these results can be adapted to generalized equations; see [6, 18]. Briefly, if a quasi-Newton method generates a sequence {x_k} which stays near x* and x_{k+1} ≠ x_k for all k, then {x_k} converges superlinearly to x* if, and only if, it is convergent and

lim_{k→∞} ‖(B_k − f′(x*)) s_k‖ / ‖s_k‖ = 0, (6.1)

where s_k is taken as s_k := x_{k+1} − x_k. It is well known that the Broyden update (5.1) with y_k = x_{k+1}, applied to a smooth equation in finite dimensions with a nonsingular Jacobian at the reference solution x*, satisfies (6.1); see e.g. Theorem 7.2.4 of [31]. There are extensions of this claim to infinite dimensional Hilbert spaces [32, 39]. In [20], the Dennis-Moré condition (6.1) was applied to generalized equations in Banach spaces. In our context, we have the following statement. Proposition 6.1. Let Ω ⊂ X be an open set, f : Ω → Y be a Fréchet differentiable function, F : Ω ⇒ Y be a multifunction with closed graph, and C ⊂ Ω be a nonempty closed convex set. Furthermore, let x* be a solution of (1.1). Suppose that f + F is strongly metrically subregular at x* for 0 with modulus λ > 0. Let {x_k} be a sequence generated by the IQN-InexP method with θ_k → 0, and assume that

‖u‖ ≤ γ_k ‖x − x*‖ for all u ∈ R_k(x, x*), (6.2)

for all x in some neighborhood of x*, k ≥ 0, and γ_k → 0. Then, if {x_k} and {y_k} both converge to x* and (6.1) holds, the convergence of {x_k} to x* is superlinear. Proof. The strong subregularity of f + F at x* for 0 with modulus λ > 0 implies that (see Def.
2.1), for all k large enough, say k ≥ k_0, a bound on ‖x_k − x*‖ in terms of the residual. In turn, Step 1 of IQN-InexP (expression (4.1)) ensures, for each k ≥ 0, the existence of points realizing the inclusion. Thus, combining (6.3) and (6.4) we obtain the key estimate for all k ≥ k_0. From the Fréchet differentiability of f, there is an index k_1 ≥ k_0 such that the corresponding bound holds, where X_k is a neighborhood of x* containing x_k and y_k (remember that y_k → x* by hypothesis). Also, from (6.1), we can increase if necessary the terms of the sequence {ε_k} to satisfy the corresponding bound for all k ≥ k_1. Due to condition (6.2), x_k → x* and u_k ∈ R_k(x_k, x*) for all k, the analogous bound holds for all k ≥ k_1. Using (6.6) and the fact that x_k, y_k ∈ X_k, the second term of the sum in (6.5) can be bounded as well. Thus, we use the previous inequality, (6.7) and (6.8) in (6.5) to obtain the desired estimate for all k ≥ k_1. As ε_k → 0, we can suppose without loss of generality that k_1 is such that 2λε_k < 1 for all k ≥ k_1, and hence we conclude the corresponding bound for all k ≥ k_1. Now, from Remark 2.8 we have a bound valid for all k. So, combining the previous two inequalities we arrive at an estimate of the form ‖x_{k+1} − x*‖ ≤ r_k ‖x_k − x*‖. Since ε_k → 0 and θ_k → 0, we have r_k → 0, and thus {x_k} converges to x* superlinearly. This concludes the proof.
The next result, whose proof can be found in Theorem 4.8 of [7], establishes a condition for the Broyden update to satisfy the Dennis-Moré condition (6.1) with s_k = s̄_k := y_k − x_k. Proposition 6.2. Consider a function f : X → Y and a point x* ∈ X such that the derivative mapping f′ is Lipschitz continuous around x* with respect to the Hilbert-Schmidt norm. Also, consider the Broyden update (5.1) in which B_0 − f′(x*) is a Hilbert-Schmidt operator. If the sequences {x_k} and {y_k} are linearly convergent to x*, then they satisfy the Dennis-Moré condition (6.1).
We finalize this section with the superlinear convergence of the QN-InexP method (Algorithm 1), which is a consequence of Propositions 6.1 and 6.2. Theorem 6.3. Consider the constrained generalized equation (1.1) with a solution x* and suppose that the derivative mapping f′ is Lipschitz continuous around x* with respect to the Hilbert-Schmidt norm. Consider the QN-InexP method applied to (1.1) with the Broyden update (5.1), where B_0 is chosen to satisfy (3.2) and such that B_0 − f′(x*) is a Hilbert-Schmidt operator. If f + F is strongly metrically subregular at x* for 0, then every sequence {x_k} generated by QN-InexP that converges to x* is superlinearly convergent.

Numerical experiments
In this section we consider the quasi-Newton scheme with matrices B_k following (5.1). For simplicity, we suppose Ω = R^n. In order to write the inclusion z ∈ F(x) in a tractable way, we assume that the set F(x) can be described by equality and inequality constraints, that is,

F(x) = {z ∈ R^n | h(x, z) = 0, g(x, z) ≤ 0}, (7.1)

where h and g are continuously differentiable functions. Thus z ∈ F(x) ⇔ h(x, z) = 0, g(x, z) ≤ 0.
Therefore, determining a solution y_k of the subproblem 0 ∈ f(x_k) + B_k(y − x_k) + F(y) amounts to finding a pair (y_k, z_k) such that f(x_k) + B_k(y_k − x_k) + z_k = 0, h(y_k, z_k) = 0 and g(y_k, z_k) ≤ 0. In turn, a sufficient condition for the above expressions to be satisfied is that (y_k, z_k) be an optimal solution of

min_{y,z} ‖f(x_k) + B_k(y − x_k) + z‖² (7.2a)
subject to h(y, z) = 0, (7.2b)
g(y, z) ≤ 0, (7.2c)

with null objective value. This problem can be solved by standard nonlinear optimization methods. So, we have a practical way to test whether z ∈ F(x) or not, at least approximately, since (7.2) is considered solved when an optimality accuracy is achieved. This is in agreement with the inexactness allowed in the IQN-InexP variant (Algorithm 2). Taking into account (7.1), the stopping criterion 0 ∈ f(x_k) + F(x_k) cannot be verified exactly. Thus, the following (approximate) criterion is natural to declare convergence: the optimal value of the problem analogous to (7.2) with y fixed at x_k is below a given tolerance. Note that x_k ∈ C by construction in Algorithms 1 and 2, since the approximate projection in the sense of Definition 2.5 maintains feasibility.
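A hedged SciPy sketch of subproblem (7.2), mirroring the fmincon approach with SLSQP; the functions h, g and the starting point w0 are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy.optimize import minimize

def solve_subproblem(f_xk, B_k, xk, h, g, w0):
    """Solve (7.2): min_{y,z} ||f(x_k) + B_k (y - x_k) + z||^2
    subject to h(y, z) = 0 and g(y, z) <= 0, with w = (y, z) stacked.
    A (near) null optimal value certifies 0 in f(x_k) + B_k(y - x_k) + F(y)."""
    n = xk.size

    def obj(w):
        y, z = w[:n], w[n:]
        r = f_xk + B_k @ (y - xk) + z
        return r @ r

    cons = [{'type': 'eq', 'fun': lambda w: h(w[:n], w[n:])},
            # SciPy's 'ineq' means fun(w) >= 0, so pass -g to encode g <= 0
            {'type': 'ineq', 'fun': lambda w: -g(w[:n], w[n:])}]
    res = minimize(obj, w0, method='SLSQP', constraints=cons)
    return res.x[:n], res.x[n:], res.fun
```

For instance, with F(y) = {y} (encoded by h(y, z) = z − y and an inactive g), the subproblem recovers the solution of f(x_k) + B_k(y − x_k) + y = 0.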
In order to compute the iterate x_{k+1} ∈ P_C(y_k, x_k, θ_k) satisfying condition (2.5), we employ the Frank-Wolfe algorithm provided in [5], which is applied to the projection problem

min_{w∈C} ‖w − y_k‖². (7.3)

We added large box constraints ℓ ≤ x ≤ u to the above problem, since the Frank-Wolfe algorithm is designed to deal with compact convex feasible sets.
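For intuition, here is a hedged Python sketch of a Frank-Wolfe iteration for the projection problem over a box, where the linear subproblem has a closed-form vertex solution. It only illustrates the type of scheme used in [5]; for a box, np.clip already gives the exact projection, which keeps the example verifiable.

```python
import numpy as np

def fw_project_box(y, lo, hi, max_iter=50):
    """Frank-Wolfe iteration for min_{w in C} ||w - y||^2 with the
    box C = [lo, hi]^n: linearize at w, minimize the linear model at a
    vertex of the box, then take an exact line-search step."""
    y = np.asarray(y, float)
    w = np.full_like(y, lo)                  # feasible starting point in C
    for _ in range(max_iter):
        grad = 2.0 * (w - y)                 # gradient of ||w - y||^2
        v = np.where(grad < 0, hi, lo)       # vertex minimizing <grad, v>
        gap = grad @ (w - v)                 # FW duality gap, >= 0
        if gap <= 1e-12:
            break
        # exact line search for the quadratic objective along v - w
        t = min(1.0, gap / (2.0 * np.dot(w - v, w - v)))
        w = w + t * (v - w)
    return w
```

Every iterate stays in C, so stopping early still returns a feasible (inexact) projection, in the spirit of Definition 2.5.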
In the next sections, we present three illustrative numerical examples of our theory. Our implementation is made in Matlab© R2016b using double precision. All tests were run on GNU/Linux Ubuntu 20.04. To solve problem (7.2) we use the interior point method implemented in the fmincon routine, with maximum number of iterations equal to 1000 and optimality tolerance 10^{−6}. Regarding the projection step, the sequence {θ_k} was chosen as θ_0 = 10^{−2} and θ_k = max{0.9 θ_{k−1}, 10^{−8}}, k ≥ 1. This choice is purely empirical and tries to reflect the inexact computation allowed by the theory (note that even θ_k → 0 is not required in Thms. 3.2 and 4.2). A more accurate choice might be supported by the inequality in Remark 2.8, but more research is necessary on this topic. The gradients of h_i and g_j are given in each case. Finally, since it is required that x_0 ∈ C, we project the given initial point onto C if necessary by solving (7.3) once with Matlab's interior point method (the Frank-Wolfe algorithm of [5] also needs to be initialized within C).

Problem 1
Similarly to [6], let us consider f : R^2 → R^2, F : R^2 ⇒ R^2 and C given by the data of this example; the solution set of the generalized equation can be computed explicitly. There are different ways to write F(x) in the form (7.1). Each of them affects the numerical resolution of the problem by a previously selected algorithm (in our case, the interior point method of Matlab©); for this example, we choose one such formulation. Table 1 shows the execution starting from x_0 = (0.7, 0). Columns "iter" and "conv. rate" stand for "iteration" and ‖x_k − x*‖_∞ / ‖x_{k−1} − x*‖_∞, respectively, where x* = (0, 0.5) is the point the algorithm approaches. From the third column, we can observe the linear convergence rate of {x_k} to x* predicted by the theory (e.g. Thm. 3.2). We ran the algorithm from several distinct initial points, and in all cases it converges to (0, min{1, max{0.5, (x_0)_2}}). That is, the sequence {(x_2)_k} converges to the extreme of the interval [1/2, 1] if the initial value is not in this interval, while otherwise it is constant. This behaviour is expected. The algorithm has a preference for solutions with x_1 = 0. This probably occurs because the term 3x_1^3 of f tends to be minimized in the objective function of (7.2). Evidently, the algorithm reaches a valid solution of the problem.

Problem 2
The next problem is a modification of Problem 1 that imposes a stricter relationship between the variables x_1 and x_2. The solutions are (0, x_2), x_2 ∈ [0, 2], and (1, 0). The algorithm is attracted to (0, 1) for different initial points, including the solution (0, 0). Evidently this depends on the implementation, and can be justified by the way we initialize z when solving problem (7.2). Table 2 shows the execution for x_0 = (0.9, 0.5). As in Problem 1, we can observe a linear rate of convergence to (0, 1).

Problem 3
Here we consider a bilevel problem, where x ∈ R^n, y ∈ Y and c ∈ R^p. We assume that D ≠ ∅ is closed and convex, and that q(x, ·) is a continuously differentiable convex function for all x.
It is well known that computing a feasible point (x, y) of the above problem is NP-hard in general, since y must be a global minimizer of an optimization problem. This difficulty increases if the upper level constraints (x, y) ∈ D contain lower level variables. For details on bilevel problems, see [16]. One way to deal with bilevel problems is to rewrite the lower level problem through its Karush-Kuhn-Tucker (KKT) conditions. Since q(x, ·) is convex, these conditions are sufficient for optimality of the lower level problem. Defining f and F accordingly, a point (x, y) is feasible for the bilevel problem if, and only if, there is a λ such that 0 ∈ f(x, y, λ) + F(x, y, λ), (x, y, λ) ∈ C.
Note that the above generalized equation does not involve inequalities: they are entirely encapsulated by the projection step.
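For concreteness, the standard KKT system of the lower-level problem min_y q(x, y) subject to Y y ≤ c − X x reads as follows; this is a hedged reconstruction, since how these conditions are distributed among f, F and C is a design choice of the paper that is not reproduced here:

```latex
\begin{aligned}
&\nabla_y q(x,y) + Y^{\top}\lambda = 0,\\
&0 \le \lambda \ \perp\ c - Xx - Yy \ge 0.
\end{aligned}
```

The sign and complementarity constraints on (y, λ) are the ones absorbed into the set C, which is why the resulting generalized equation is inequality-free.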
To illustrate the functionality of our algorithm, let us consider the following particular instance. In general, our implementation converges to different points when we vary the initial point. This is in agreement with the fact that, for each x, the lower-level problem possibly admits a different minimizer. Starting from (x_0, y_0, λ_0) = (0, 0, 0, 0, 0, 0), the algorithm reaches the point reported in Table 3. We do not calculate the "rate of convergence" for Problem 3, since it requires knowing a priori the solution set of the generalized equation, i.e., the feasible set of the bilevel problem. Hence, it is hard to decide numerically whether the iterate (x_k, y_k, λ_k) is close to the solution set, or even close to a particular solution. On the other hand, it is not reasonable to use the last iterate (x*, y*, λ*) for this computation, because this point carries approximation errors, leading to a false estimate of ‖x_k − x*‖_∞ / ‖x_{k−1} − x*‖_∞.

Conclusions
In this paper, we deal with constrained generalized equations. This is a very general class of problems, encompassing several other contexts, such as standard nonlinear optimization, variational inequalities and equilibrium problems. We presented two general quasi-Newton frameworks for solving constrained generalized equations that employ an inexact projection step. Firstly, we discussed a quasi-Newton scheme where subproblems are solved exactly. Its local convergence is established under a bounded-deterioration condition on the quasi-Newton update operator/matrix. Secondly, we extended the proposed method by allowing subproblems to be solved inexactly. The resulting (inexact quasi-Newton) method is closer to numerical practice, where inexactness is naturally present. We also proved that the inexact scheme converges locally under mild assumptions.
We analysed the particular case in which the classical Broyden update rule is employed. For both the exact and inexact quasi-Newton methods, we showed that, in this case, the deterioration condition is satisfied directly. Illustrative numerical experiments with the Broyden variant were carried out to align theory with practice.
As future work, we intend to investigate whether the bounded-deterioration condition can be replaced by something weaker, or whether, under additional assumptions, update rules other than Broyden's can be obtained. Another line of research is to study the variational inequality problem 0 ∈ f(x) + N_C(x) (8.1), by considering the inexact quasi-Newton method with B_k satisfying the Broyden update, B an n × n matrix, {η_k} a sequence of positive numbers converging to zero, and ψ a Lipschitz function. As in [22], we intend to consider a Newton-Kantorovich theorem for (8.1). Moreover, we propose to apply the quasi-Newton iteration

∇h(x_k) + B_k(x_{k+1} − x_k) + N_C(x_{k+1}) ∋ 0    (8.2)

to the first-order necessary optimality condition ∇h(x) + N_C(x) ∋ 0 of the optimization problem: minimize h(x) subject to x ∈ C, where h is a twice continuously differentiable function and C is a closed and convex set. We propose to update B_k in (8.2) firstly by the Broyden update (5.1) with z_k = ∇h(x_{k+1}) − ∇h(x_k) and s_k = x_{k+1} − x_k, and secondly by the BFGS update with y_k := ∇h(x_{k+1}) − ∇h(x_k) and s_k := x_{k+1} − x_k.
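The two update rules mentioned above can be sketched with generic linear algebra; this is not the paper's implementation, only an illustration of the formulas. Both rules enforce the secant equation B_{k+1} s_k = z_k (respectively, B_{k+1} s_k = y_k).

```python
# Hedged sketch of the Broyden update (5.1) and the BFGS update.
import numpy as np

def broyden_update(B, s, z):
    # rank-one correction; well defined whenever s != 0
    return B + np.outer(z - B @ s, s) / (s @ s)

def bfgs_update(B, s, y):
    # symmetric rank-two correction; keeps B symmetric positive definite
    # provided B is SPD and the curvature condition y @ s > 0 holds
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
```

In the setting of (8.2) one would take s_k = x_{k+1} − x_k and z_k = y_k = ∇h(x_{k+1}) − ∇h(x_k).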
It is worth noting that, as reported in [22], the scheme (8.2) can be used for solving control-constrained optimal control problems. Hence, we also propose to carry out numerical experiments with the quasi-Newton method (8.2) and to compare the results with those obtained in [22].
During the review process, an anonymous referee asked about the possibility of weakening the assumption that f + F is metrically regular in our convergence results. This is a very interesting issue. In fact, as pointed out recently in [40], metric regularity fails to hold in important situations. Hence, a topic for further research is to establish the convergence of quasi-Newton schemes under, for instance, Hölder-type hypotheses. Also, besides Definition 2.5, other types of projections should be considered in the convergence analysis, especially those with numerical appeal.
d(x, D) := inf_{y∈D} ‖x − y‖,  e(C, D) := sup_{x∈C} d(x, D)   (2.1)

are the distance from x to D and the excess of C beyond D, respectively. The outer distance from a point x ∈ X to a subset C ⊂ X, denoted by d^+(x, C), is defined as d^+(x, C) := sup{‖x − y‖ | y ∈ C}. The following conventions are adopted: d(x, ∅) = +∞, e(∅, D) = 0 when D ≠ ∅, and e(∅, ∅) = +∞.
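For finite point sets, the distance and excess of (2.1) can be computed directly; a minimal sketch, with the empty-set conventions above handled explicitly (the outer distance d^+ would be the analogous max over D):

```python
# Hedged sketch: distance d(x, D) and excess e(C, D) for finite sets of points.
import math

def dist(x, D):
    # d(x, D) = inf over y in D of ||x - y||; by convention d(x, empty) = +inf
    return min((math.dist(x, y) for y in D), default=math.inf)

def excess(C, D):
    # e(C, D) = sup over x in C of d(x, D);
    # conventions: e(empty, D) = 0 for nonempty D, and e(empty, empty) = +inf
    if not C:
        return math.inf if not D else 0.0
    return max(dist(x, D) for x in C)
```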
are single-valued, then the points x̂ and ŷ are unique in B_η[x̄] and B_{η/µ}[ȳ], respectively. Now, we apply Theorem 4.1 to obtain the desired convergence result for Algorithm 2.

Theorem 4.2.
Let Ω ⊂ X be an open set, f : Ω → Y be a Fréchet differentiable function, F : Ω ⇒ Y be a multifunction with closed graph, and C ⊂ Ω be a nonempty closed convex set. Furthermore, let x* be such that 0 ∈ f(x*) + F(x*), x* ∈ C.

Table 3. Resolution of Problem 3.