NULL SPACE GRADIENT FLOWS FOR CONSTRAINED OPTIMIZATION WITH APPLICATIONS TO SHAPE OPTIMIZATION

Abstract. The purpose of this article is to introduce a gradient-flow algorithm for solving equality and inequality constrained optimization problems, which is particularly suited for shape optimization applications. We rely on a variant of the Ordinary Differential Equation (ODE) approach proposed by Yamashita [65] for equality constrained problems: the search direction is a combination of a null space step and a range space step, aiming to decrease the value of the minimized objective function and the violation of the constraints, respectively. Our first contribution is to propose an extension of this ODE approach to optimization problems featuring both equality and inequality constraints. In the literature, a common practice consists in reducing inequality constraints to equality constraints by the introduction of additional slack variables. Here, we rather resolve their local combinatorial character by computing the projection of the gradient of the objective function onto the cone of feasible directions. This is achieved by solving a dual quadratic programming subproblem whose size equals the number of active or violated constraints. The solution to this problem allows us to identify the inequality constraints to which the optimization trajectory should remain tangent. Our second contribution is a formulation of our gradient flow in the context of (infinite-dimensional) Hilbert spaces, and of even more general optimization sets such as sets of shapes, as occurs in shape optimization within the framework of Hadamard's boundary variation method. The cornerstone of this formulation is the classical operation of extension and regularization of shape derivatives. The numerical efficiency and ease of implementation of our algorithm are demonstrated on realistic shape optimization problems.


Introduction
The increasing popularity encountered by shape and topology optimization algorithms in industry calls for their use in more and more realistic physical contexts. In such applications, the optimized designs are often subjected to a large number of complex engineering constraints. To name a few of them, it is often required that the stresses developed inside mechanical structures do not exceed a prescribed safety threshold [6,28,40]; it is also customary to impose constraints on the geometry of the optimized shape (e.g. on the thickness of its structural members, on its curvature radius, etc.) [8,55], or in keeping with manufacturability issues; see for instance [7,4,63], or [42] for an overview. This raises the need for advanced constrained optimization algorithms, adapted to the specificities of shape optimization.
Over the past decades, many iterative algorithms have been devised to solve generic constrained optimization problems of the form:

(1.1)    min_{x∈V} J(x)  such that  g(x) = 0  and  h(x) ≤ 0,

where V is typically a Hilbert space, J : V → R is a differentiable objective function, and g : V → R^p and h : V → R^q are differentiable functions accounting for p equality and q inequality constraints, respectively. 'Classical' gradient-based algorithms for the numerical resolution of (1.1) include, e.g., Penalty, Lagrangian, Interior Point and Trust Region methods, Sequential Quadratic or Linear Programming (SQP or SLP) [17,48,66,34,61], the Method of Moving Asymptotes (MMA) [58], and the Method of Feasible Directions (MFD) [68,60].
As a matter of fact, advanced mathematical programming methods are not frequently described in the literature devoted to shape optimization based on Hadamard's method (see [35,56] for an introduction). In most contributions, where, usually, only one constraint is considered, standard Penalty and Augmented Lagrangian methods are used for the sake of implementation simplicity [9,23]. Morin et al. introduced a variant of SQP in [46], but the volume constraint featured in the optimization problem is treated with a Lagrange multiplier method. When it comes to more complex applications, some authors have introduced adapted variants of Sequential Linear Programming [27] or of the Method of Feasible Directions [30]. However, a major difficulty related to the practical use of these algorithms in topology optimization lies in that the aforementioned techniques require fine tuning of the algorithm parameters in order to actually solve the minimization problem. These parameters are, e.g., the penalty coefficients in the Augmented Lagrangian and Interior Point methods, the size of the trust region in SLP algorithms, the strategy for approximating the Hessian matrix in SQP, the bounds on the asymptotes in MMA, and the Topkis parameters in MFD. The correct determination of these parameters is strongly case-dependent and often unintuitive: for instance, the penalty coefficients must be neither 'too large' nor 'too small' in Lagrangian methods, and the SLP trust region size, which acts as a step length, cannot be chosen too small (otherwise the involved quadratic subproblems may not have a solution). In shape and topology optimization practice, a fair amount of trial and error is often required in order to obtain satisfying minimizing sequences of shapes. Since every optimization step depends on the resolution of partial differential equations, such tunings are very tedious and time-consuming for 2-d cases, and downright unaffordable for realistic 3-d applications.

Figure 1. Optimization trajectories obtained by solving (1.2) (labeled 'NLSPACE'). Trajectories travel tangentially to an optimal subset Î(x) ⊂ Ĩ(x) of the active constraints Ĩ(x), which is determined by a dual problem (see Section 3). A less optimal trajectory (labeled 'NLSPACE (no dual)') is obtained if the set Î(x) is not identified, because it is unable to escape the tangent space to the constraints labeled by Ĩ(x).
The first main contribution of this article is to propose a novel algorithm for constrained optimization which is rather easy to implement and reliable, in the sense that it allows one to solve (1.1) without the need for tuning non-physical parameters; it is therefore particularly well adapted to the specificities of shape and topology optimization applications. The essence of our method is a modification of the celebrated gradient flow which enables it to 'see the constraints': optimization trajectories x(t) are obtained by solving an Ordinary Differential Equation (ODE):

(1.2)    ẋ(t) = −α_J ξ_J(x(t)) − α_C ξ_C(x(t)),

where the descent direction ẋ is a combination of a so-called 'null space' direction ξ_J(x) and a 'range space' direction ξ_C(x), lying respectively in the null space of the constraints and in its orthogonal complement (for this reason, we call the ODE (1.2) a 'null space' gradient flow). The null space direction ξ_J(x) is the projection of the gradient ∇J(x) onto the cone of feasible directions (see Figure 1). The range space direction ξ_C(x) is a Gauss-Newton direction, aimed to smoothly drive the optimization path toward the feasible domain. Finally, α_J, α_C > 0 are two (optional) parameters scaling the imposed decrease rates of the objective function and of the violation of the constraints; we shall see in particular that the latter quantity decreases along trajectories x(t) at least as fast as e^{−α_C t}. In numerical practice, (1.2) is solved by using a suitable discretization scheme; one which works well for a number of shape optimization examples or in R^d is proposed in Algorithm 1 below. The cornerstone of our method is the computation of the null space direction ξ_J(x); it relies on the resolution of a dual program to identify an 'optimal' subset Î(x) of the set of active inequality constraints Ĩ(x) ⊂ {1, ..., q} to which the optimization path must remain tangent. The remaining constraints Ĩ(x)\Î(x) become inactive, allowing the trajectory to naturally re-enter the feasible domain. More specifically, for a given subset of indices I ⊂ {1, ..., q}, let us denote by h_I(x) := (h_i(x))_{i∈I} the collection of those inequality constraints indexed by I, and by C_I(x) the vector

(1.3)    C_I(x) := ( g(x), h_I(x) ).

Then, for inequality constrained problems, the directions ξ_J(x) and ξ_C(x) in (1.2) are defined as follows:

(1.4)    ξ_J(x) = (I − DC_Î(x)^𝒯 (DC_Î(x) DC_Î(x)^𝒯)^{−1} DC_Î(x)) ∇J(x),
(1.5)    ξ_C(x) = DC_Ĩ(x)^𝒯 (DC_Ĩ(x) DC_Ĩ(x)^𝒯)^{−1} C_Ĩ(x)(x),

where I is the identity matrix and (DC(x))_{ij} = ∂_j C_i(x) denotes the Jacobian matrix of a vector function C(x) = (C_i(x))_i (the dependence with respect to x is omitted when the context is clear). The symbol 𝒯 denotes the transposition operator; it may differ from the usual transpose T if the optimization set V is infinite-dimensional (see below and Section 2). Formulas (1.4) and (1.5) involve two different subsets Î(x) ⊂ Ĩ(x) ⊂ {1, ..., q} of indices of inequality constraints: the first one, Ĩ(x), is the set of all saturated or violated constraints, defined by

(1.6)    Ĩ(x) = {i ∈ {1, ..., q} | h_i(x) ≥ 0}.

The set Î(x) ⊂ Ĩ(x) is characterized thanks to the introduction of the following 'dual' quadratic optimization subproblem:

(1.7)    (λ*(x), μ*(x)) := argmin_{λ∈R^p, μ∈R^{q̃(x)}, μ≥0} ||∇J(x) + Dg(x)^𝒯 λ + Dh_Ĩ(x)(x)^𝒯 μ||_V.

The problem (1.7) amounts to computing the projection of ∇J(x) onto the cone of feasible directions. The set of indices Î(x) involved in the definition (1.4) of ξ_J(x) is inferred from the positive entries of the optimal Lagrange multiplier μ*(x):

(1.8)    Î(x) := {i ∈ Ĩ(x) | μ*_i(x) > 0}.

Proposition 4 shows that this definition of Î(x) is somehow 'optimal', since it ensures that −ξ_J(x) is the 'best' descent direction for J(x) respecting locally the equality and inequality constraints. This approach turns out to be very efficient when dealing with a large number of (possibly violated) inequality constraints (see Figure 1).
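To fix ideas, the construction above can be sketched in a few lines in the finite-dimensional Euclidean setting V = R^n. The following Python fragment is an illustration of ours (not the article's implementation): the function name, the tolerance used to detect positive multipliers, and the treatment of empty index sets are all assumptions; the dual problem is solved with SciPy's bounded least-squares routine.

```python
import numpy as np
from scipy.optimize import lsq_linear

def null_space_directions(gradJ, g, Dg, h, Dh, tol=1e-10):
    """Sketch of (1.4)-(1.8) for V = R^n with the Euclidean inner product."""
    p = Dg.shape[0]
    I_tilde = np.flatnonzero(h >= 0)                 # saturated/violated set (1.6)
    B = np.vstack([Dg, Dh[I_tilde]]).T               # n x (p + q(x)) matrix
    if B.shape[1] == 0:                              # no constraint at all: (1.9)
        return gradJ.copy(), np.zeros_like(gradJ)
    # Dual problem (1.7): min ||gradJ + Dg^T lam + Dh^T mu||, lam free, mu >= 0.
    lb = np.r_[np.full(p, -np.inf), np.zeros(len(I_tilde))]
    mu = lsq_linear(B, -gradJ, bounds=(lb, np.full(B.shape[1], np.inf))).x[p:]
    I_hat = I_tilde[mu > tol]                        # optimal subset (1.8)
    DC = np.vstack([Dg, Dh[I_hat]])                  # null space step (1.4)
    xiJ = gradJ.copy()
    if DC.size:
        xiJ -= DC.T @ np.linalg.solve(DC @ DC.T, DC @ gradJ)
    DC_t = np.vstack([Dg, Dh[I_tilde]])              # range space step (1.5)
    xiC = DC_t.T @ np.linalg.solve(DC_t @ DC_t.T, np.r_[g, h[I_tilde]])
    return xiJ, xiC
```

For instance, with ∇J = (−1, −1) and a single active constraint with Jacobian row (1, 0), the dual multiplier is positive, so the constraint is retained and ξ_J = (0, −1): the flow slides along the constraint while still decreasing the objective; with ∇J = (1, 1) the multiplier vanishes and ξ_J = ∇J, letting the trajectory re-enter the feasible domain.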
Our ODE (1.2) is a generalization of rather classical dynamical system approaches to nonlinear constrained optimization [44,59,65,51], which seem to be little known within the topology optimization community. The flexibility of these methods lies in that, in principle, an admissible local minimizer of (1.1) is reached as the stationary point x* of the continuous trajectory x(t), for almost any (feasible or not) initialization x(0), regardless of the value of the coefficients α_J, α_C > 0 (a formal proof for the ODE (1.2) is beyond the scope of this work, but we refer to [39,41,50,51] for related results when inequality constraints are not present). The descent properties of (1.2) are expected to be preserved at the discrete level provided it is discretized with a sufficiently small Euler step size Δt (see Remark 13 below). As a result, the success of our method truly depends only on the selection of a sufficiently small step Δt, and to a lesser extent on the physically interpretable, dimensionless parameters α_J, α_C, whose tuning by the user is quite intuitive.
When the problem (1.1) features no constraint, (1.4) and (1.5) read simply ξ_J(x) = ∇J(x) and ξ_C(x) = 0, so that the ODE (1.2) reduces to the standard gradient flow

(1.9)    ẋ(t) = −∇J(x(t)).

When (1.1) features only equality constraints g(x) = 0, but no inequality constraint (in that case Ĩ(x) = ∅), the same ODE (1.2) as ours was previously derived and studied in the early 1980s by Tanabe [59] (without the use of the Gauss-Newton direction ξ_C(x(t))) and by Yamashita [65] (where a combination of both directions ξ_J(x(t)) and ξ_C(x(t)) is brought into play), in the finite-dimensional setting V = R^k. In this particular case, the solution to our dual problem (1.7) has a closed-form expression and (1.2) reads, with our notation:

(1.10)    ẋ(t) = −α_J (I − Dg^𝒯(DgDg^𝒯)^{−1}Dg) ∇J(x(t)) − α_C Dg^𝒯(DgDg^𝒯)^{−1} g(x(t)).

In the general situation where (1.1) features both inequality and equality constraints, variants of the ODE (1.10) have been considered by various authors, with a different method from ours, however [51,37,38,54].
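As an illustration (a toy example of ours, not from the article), the equality-constrained flow above can be integrated by an explicit Euler scheme on the problem of minimizing J(x) = x_1 + x_2 over the unit circle g(x) = |x|² − 1 = 0, starting from an infeasible point:

```python
import numpy as np

alphaJ, alphaC, dt = 1.0, 1.0, 0.01
x = np.array([1.5, 0.5])                          # infeasible initialization
for _ in range(4000):                             # integrate (1.10) up to t = 40
    gradJ = np.array([1.0, 1.0])                  # gradient of J(x) = x1 + x2
    g = np.array([x @ x - 1.0])                   # constraint value
    Dg = 2.0 * x[None, :]                         # 1 x 2 Jacobian of g
    M = Dg @ Dg.T                                 # the matrix Dg Dg^T
    xiJ = gradJ - Dg.T @ np.linalg.solve(M, Dg @ gradJ)   # null space step (2.7)
    xiC = Dg.T @ np.linalg.solve(M, g)                    # range space step (2.8)
    x = x - dt * (alphaJ * xiJ + alphaC * xiC)
print(x)   # converges to the constrained minimizer (-1/sqrt(2), -1/sqrt(2))
```

The trajectory first approaches the circle (driven by ξ_C), then slides along it toward the constrained minimizer (driven by ξ_J), without any penalty parameter to tune.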
The most common approach in the literature consists in introducing q slack variables {z_i}_{1≤i≤q} ∈ R^q to transform the q inequalities h_i(x) ≤ 0 for 1 ≤ i ≤ q into as many equality constraints h_i(x) + z_i² = 0, before solving the ODE (1.2) in the augmented space (x, z) ∈ V × R^q. This approach offers convergence guarantees [51] and could also be beneficial for shape optimization; however, this is not the strategy we have retained. Indeed, our method does not need to resort to slack variables for handling inequality constraints, and it presents additional advantages described in Section 3.5.
Our second main contribution is the exposition of our dynamical system strategy in a setting compatible with the inherently infinite-dimensional nature of shape optimization based on the method of Hadamard. In such a context, a clear distinction between the Fréchet derivative DJ(x(t)) (which is an element of the dual space V') and the gradient ∇J(x(t)) (which is an element of V) is in order: the gradient ∇J(x(t)) is obtained by identifying DJ(x(t)) with an element of V thanks to the Riesz representation theorem. The same distinction is also needed between the differential of a vector-valued function DC(x(t)) and its transpose DC(x(t))^𝒯; see Definition 1.
For shape optimization applications, the minimization set in (1.1) is the set of all open Lipschitz domains Ω enclosed in some 'hold-all' domain D ⊂ R^d. This set is not a vector space, but it can be locally parameterized by the Sobolev space W^{1,∞}(D, R^d). More precisely, for a given Lipschitz open set Ω ⊂ D, one can restrict the minimization to the space O of variations of Ω parametrized by vector fields θ; this space is a Banach space (but not a Hilbert space), after being identified with W^{1,∞}(D, R^d). The vector field θ can be interpreted as a displacement field, and the minimized shape functional Ω → J(Ω) (like the constraint functionals) is restricted to a functional θ → J((I + θ)(Ω)) defined on O, whose derivative can be computed in the sense of Fréchet [47,56,35]. The space W^{1,∞}(D, R^d) can thus be interpreted as a 'tangent space' for the 'manifold' of all open Lipschitz domains Ω ⊂ D. In practice, the identification of the aforementioned Fréchet derivative with a gradient is achieved by solving an extension and regularization problem, which has major consequences in numerical practice, see e.g. [21,24]. This step is naturally and consistently included in our algorithm thanks to the suitable definition of the transposition operator 𝒯. So far, this matter does not seem to have been clearly addressed in the literature concerned with constrained shape optimization: common approaches rather compute a descent direction first, before performing a regularization, see e.g. [27,30].
Several contributions in the field of shape and topology optimization can be related to ours. In fact, our method is very close in spirit to the recent work of Barbarosie et al. [16], where an iterative algorithm for equality constrained optimization is devised, which turns out to be a discretization of (1.2) with a variable scaling of the parameter α_C. When dealing with inequality constraints, the authors propose an active set strategy which is based, like ours, on the extraction of an appropriate subset of the active constraints (without convergence guarantee, however). This strategy relies on a different algorithm from ours, which generally yields a different (suboptimal) set than Î(x), whose mathematical properties are a little unclear; see Remark 4 below for more details. Finally, Yulin and Xiaoming also suggested in [67] to project the gradient of the objective function onto the cone of feasible directions; nevertheless, they remained elusive regarding how the projection problem is solved or how violated constraints are tackled.
An open-source implementation of our null space algorithm is made freely available online at https://gitlab.com/safrandrti/null-space-project-optimizer. The repository includes demonstrative pedagogical examples for the resolution of optimization problems in R^d, but can also serve as a basis for the resolution of more complicated applications; it has been used as is for the realisation of the shape optimization test cases of this article and of our other related works [32,31].
The present article is organized as follows. In Section 2, we review the definition and the properties of the gradient flow (1.2) for equality constrained optimization, in the case where the minimization set V is a Hilbert space. We then detail in Section 3 the necessary adaptations to account for inequality constraints, and in particular the introduction of the dual subproblem allowing us to determine the null space direction ξ_J(x). Under some technical assumptions, we prove in Proposition 5 the convergence of our 'null space' gradient flow (1.2) towards points satisfying the full Karush-Kuhn-Tucker (KKT) first-order necessary conditions for optimality. Several algorithmic implementation aspects of our method are discussed in Section 4. From Section 5 on, we then concentrate on shape optimization applications. After clarifying the necessary adaptations required to extend the discretization of (1.2) when the optimization set V is only locally a Banach space such as W^{1,∞}(D, R^d), we explain how our algorithm can be integrated within the level set method for shape optimization [9,62]. Numerical examples are eventually proposed in Section 6: we first solve an optimal design problem in thermoelasticity inspired by [64], for which the optimized solutions feature active and inactive inequality constraints; this example is meant to emphasize the key role of our dual problem in the identification of the optimal subset of inequality constraints which need to be enforced at each stage of the optimization process. Then, we consider the shape optimization of a bridge structure subject to multiple loads, which involves up to ten constraints. Many more shape optimization applications built upon this algorithm, including 3-d multiphysics test cases, are provided in the PhD dissertation [31]. The article ends with a technical appendix containing the proofs of two theoretical results stated in the main text.

Gradient flows for equality-constrained optimization in Hilbert spaces
Before turning to the general shape optimization setting in Section 5, we first consider in this section (and the next ones) the case where the optimization takes place in a Hilbert space V with inner product ⟨·, ·⟩_V. The first focus of our study is the minimization problem (1.1) in a simplified version where only equality constraints are present, namely:

(2.1)    min_{x∈V} J(x)  such that  g(x) = 0,

where J : V → R and g : V → R^p are Fréchet differentiable functions. Our purpose is to recall how the ODE approach (1.2) works in the present Hilbertian setting. Let us emphasize that, although this section is quite elementary and not new per se, its content is not easily found as is in the literature. Since it is key to understanding our method for handling inequality constraints in Section 3, the present context is thoroughly detailed for the reader's convenience.

Notation and first-order optimality conditions
We start by setting notation about differentiability and gradients in Hilbert spaces that we use throughout this article. Indeed, as we have mentioned, a clear distinction between gradients and Fréchet derivatives proves crucial in our shape optimization applications in Section 5.

Definition 1.
(1) A vector-valued function g : V → R^p is differentiable at a point x ∈ V if there exists a continuous linear mapping Dg(x) : V → R^p such that

(2.2)    g(x + ξ) = g(x) + Dg(x)ξ + o(||ξ||_V);

Dg(x) is called the Fréchet derivative of g at x.
(2) If g : V → R^p is differentiable, for any μ ∈ R^p, the Riesz representation theorem [18] ensures the existence of a unique element Dg(x)^𝒯 μ ∈ V satisfying

(2.3)    ∀ξ ∈ V, ⟨Dg(x)^𝒯 μ, ξ⟩_V = μ^T Dg(x)ξ,

where the superscript T stands for the usual transpose of a vector in the Euclidean space R^p. The linear operator Dg(x)^𝒯 : R^p → V thus defined is called the transpose of Dg(x).
(3) If J : V → R is a scalar function which is differentiable at x ∈ V, the Riesz representation theorem ensures the existence of a unique vector ∇J(x) ∈ V satisfying

(2.4)    ∀ξ ∈ V, ⟨∇J(x), ξ⟩_V = DJ(x)ξ.

This vector ∇J(x) is called the gradient of J at x.
Throughout the following, we shall sometimes omit the explicit mention of x in the notation for differentials or gradients when the considered point x ∈ V is clear, so as to keep expressions as light as possible.
The following examples illustrate Definition 1.
(1) If V is the (finite-dimensional) Euclidean space R^k, and ⟨·, ·⟩_V is the usual inner product, the Fréchet derivative (resp. its transpose) of a vector function g : R^k → R^p is given by the Jacobian matrix (Dg(x))_{ij} = ∂_j g_i(x) (resp. by its usual matrix transpose). In the literature, the differential matrix Dg is often denoted with the nabla notation ∇g. For the sake of clarity, we reserve the ∇ symbol for the gradient of scalar functions J : V → R. The calligraphic transpose notation 𝒯 appearing in the objects DJ(x)^𝒯 or Dg(x)^𝒯 encodes at the same time the operator transposition (reversing the input and range spaces) and the Riesz identifications. In particular, it holds that ∇J(x) = DJ(x)^𝒯 1.
(2) Still in the case where V = R^k is finite-dimensional, but the inner product ⟨·, ·⟩_V is given by a symmetric positive definite matrix A (that is, ⟨ξ, ξ'⟩_V = ξ^T Aξ'), the calligraphic transpose of a p × k matrix M : R^k → R^p with respect to ⟨·, ·⟩_V now reads M^𝒯 = A^{−1} M^T. In shape optimization applications, the inner product ⟨·, ·⟩_V often stands for the bilinear form associated with an elliptic operator, and the calligraphic transpose 𝒯 encompasses the extension and regularization steps of the shape derivative, see Section 5.2 below.
(3) If V is replaced by the tangent space to a Riemannian manifold, the bilinear form ⟨·, ·⟩_V can be interpreted as a metric and ∇J(x), as given by (2.4), is the covariant gradient with respect to this metric.
(4) When V is a general Hilbert space, for a vector-valued function g : V → R^p with components g(x) = (g_i(x))_{1≤i≤p}, Dg : V → R^p is the 'row' matrix whose entries are the p linear forms Dg_i(x) : V → R.
The transpose Dg(x)^𝒯 is the 'column' matrix gathering the p vectors (∇g_i(x))_{1≤i≤p} obtained by solving the p identification problems, similar to (2.4), ⟨∇g_i(x), ξ⟩_V = Dg_i(x)ξ for any ξ ∈ V. More precisely, Dg(x)^𝒯 μ = Σ_{i=1}^p μ_i ∇g_i(x) for any μ ∈ R^p. In particular, the p × p matrix DgDg^𝒯 has entries (DgDg^𝒯)_{ij} = ⟨∇g_i, ∇g_j⟩_V = Dg_i(x)(∇g_j(x)).
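A quick numerical illustration of point (2) above, with toy data of our own: when the metric is ⟨u, v⟩_V = u^T A v, the matrix A^{−1} M^T is indeed the transpose of M in the sense of the identification (2.3).

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.5, 1.0]])      # symmetric positive definite metric
M = rng.standard_normal((3, 2))             # a 3 x 2 'Jacobian' M : R^2 -> R^3
MT = np.linalg.solve(A, M.T)                # calligraphic transpose A^{-1} M^T
y, xi = rng.standard_normal(3), rng.standard_normal(2)
lhs = (MT @ y) @ (A @ xi)                   # <M^T y, xi>_V, cf. (2.3)
rhs = y @ (M @ xi)                          # mu^T (M xi) with mu = y
print(np.isclose(lhs, rhs))                 # True
```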
In the present section, the equality constraints are said to satisfy the "Linear Independence Constraint Qualification" (LICQ) condition at a point x ∈ V if

(2.5)    rank(Dg(x)) = p, or equivalently, Dg(x)Dg(x)^𝒯 is an invertible p × p matrix.

Note that (2.5) makes sense even at points x ∈ V where g(x) ≠ 0, an essential fact in the sequel: as we have mentioned in the introduction, realistic shape optimization problems are often initialized with an unfeasible design. Under the above notation, let us recall the classical first-order necessary optimality conditions (KKT) for the equality-constrained problem (2.1) at some point x* ∈ V where the constraints are satisfied and qualified: there exists λ(x*) ∈ R^p such that

(2.6)    ∇J(x*) + Dg(x*)^𝒯 λ(x*) = 0;

see for instance [17,48].
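In R^k, the LICQ condition is straightforward to test numerically; the snippet below (an illustration of ours) contrasts independent and dependent constraint rows.

```python
import numpy as np

def licq(Dg):
    """Check (2.5): Dg Dg^T is invertible iff the rows of Dg are independent."""
    return np.linalg.matrix_rank(Dg @ Dg.T) == Dg.shape[0]

Dg_ok = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # independent rows
Dg_bad = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])  # dependent rows
print(licq(Dg_ok), licq(Dg_bad))   # True False
```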
Definitions and properties of the null space and range space steps ξ_J and ξ_C

We are now in a position to define the null space and range space steps ξ_J(x) and ξ_C(x) featured in the ODE (1.2) for equality constrained problems in the present Hilbert space setting.
Definition 2. Consider the optimization problem (2.1). For any point x ∈ V satisfying the LICQ condition (2.5), we define the null space and range space directions ξ_J(x) and ξ_C(x) by, respectively:

(2.7)    ξ_J(x) := (I − Dg^𝒯(DgDg^𝒯)^{−1}Dg)∇J(x),
(2.8)    ξ_C(x) := Dg^𝒯(DgDg^𝒯)^{−1} g(x).

In the finite-dimensional case where V = R^k, it is well-known that the null space step ξ_J(x) in (2.7) is the orthogonal projection of the gradient ∇J(x) onto the null space of the constraints, which is also the tangent space at x to the manifold {y ∈ V | g(y) = g(x)}. This is still true when V is a Hilbert space, as we recall in the next lemma.

Lemma 1. Let x ∈ V be a point where the LICQ condition (2.5) is satisfied. Then:
(1) The space V has the following orthogonal decomposition:

V = Ker(Dg(x)) ⊕ Ran(Dg(x)^𝒯),

where the range is defined as Ran(Dg(x)^𝒯) := {Dg(x)^𝒯 λ | λ ∈ R^p}. Moreover, the operator Π_{g(x)} : V → V defined by Π_{g(x)} = I − Dg^𝒯(DgDg^𝒯)^{−1}Dg is the orthogonal projection onto Ker(Dg(x)).
(2) If Π_{g(x)}(∇J(x)) ≠ 0, the minimization problem

(2.9)    min_{ξ∈Ker(Dg(x)), ||ξ||_V ≤ 1} ⟨∇J(x), ξ⟩_V

admits the global minimizer ξ = −Π_{g(x)}(∇J(x))/||Π_{g(x)}(∇J(x))||_V.
(3) The projection Π_{g(x)}(∇J(x)) admits the expression

(2.10)    Π_{g(x)}(∇J(x)) = ∇J(x) + Dg(x)^𝒯 λ*(x),

where λ*(x) ∈ R^p is the unique solution of the 'dual' problem

(2.11)    min_{λ∈R^p} ||∇J(x) + Dg(x)^𝒯 λ||_V.
Proof. (1) Any ξ ∈ V may be decomposed as ξ = Π_{g(x)}(ξ) + (I − Π_{g(x)})(ξ), where it is straightforward to verify that Π_{g(x)}(ξ) ∈ Ker(Dg(x)) and (I − Π_{g(x)})(ξ) ∈ Ran(Dg(x)^𝒯). In addition, Ker(Dg(x)) and Ran(Dg(x)^𝒯) are orthogonal for the inner product ⟨·, ·⟩_V since, from (2.3), one has, for any λ ∈ R^p and any ξ ∈ Ker(Dg(x)),

⟨Dg(x)^𝒯 λ, ξ⟩_V = λ^T Dg(x)ξ = 0.

(2) It follows from the first point that, for any ξ ∈ Ker(Dg(x)) such that ||ξ||_V ≤ 1,

⟨∇J(x), ξ⟩_V = ⟨Π_{g(x)}(∇J(x)), ξ⟩_V ≥ −||Π_{g(x)}(∇J(x))||_V,

whence we easily infer that ξ := −Π_{g(x)}(∇J(x))/||Π_{g(x)}(∇J(x))||_V is the global minimizer of (2.9).
(3) The Pythagorean identity yields, for any ξ ∈ Ker(Dg(x)),

||∇J(x) − ξ||²_V = ||∇J(x) − Π_{g(x)}(∇J(x))||²_V + ||Π_{g(x)}(∇J(x)) − ξ||²_V.

Hence the orthogonal projection Π_{g(x)}(∇J(x)) is the best approximation of ∇J(x) within Ker(Dg(x)). On the other hand, recalling from the first point that Ran(Dg(x)^𝒯) is the orthogonal complement of Ker(Dg(x)), we also obtain, for any λ ∈ R^p,

||∇J(x) + Dg(x)^𝒯 λ||²_V = ||Π_{g(x)}(∇J(x))||²_V + ||(I − Π_{g(x)})(∇J(x)) + Dg(x)^𝒯 λ||²_V,

whence the expression (2.10) and the minimization property (2.11) follow. Note that the uniqueness of the solution λ*(x) to (2.11) results from the LICQ condition (2.5). Finally, the optimization problem (2.9) can be rewritten as

min_{ξ∈V, ||ξ||_V ≤ 1} max_{λ∈R^p} ( ⟨∇J(x), ξ⟩_V + λ^T Dg(x)ξ ).

Hence the (formal) dual problem of (2.9) reads:

max_{λ∈R^p} min_{ξ∈V, ||ξ||_V ≤ 1} ( ⟨∇J(x), ξ⟩_V + λ^T Dg(x)ξ ).

According to the definitions (2.3) and (2.4) of the gradient and of the Hilbertian transpose, the latter problem rewrites:

(2.12)    max_{λ∈R^p} min_{ξ∈V, ||ξ||_V ≤ 1} ⟨∇J(x) + Dg(x)^𝒯 λ, ξ⟩_V = max_{λ∈R^p} ( −||∇J(x) + Dg(x)^𝒯 λ||_V ),

where, for given λ ∈ R^p, the value ξ = −(∇J(x) + Dg(x)^𝒯 λ)/||∇J(x) + Dg(x)^𝒯 λ||_V is that achieving the minimum in the inner minimization problem on the left-hand side of (2.12). This shows that (2.11) is the dual problem of (2.9).
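The characterization of Π_{g(x)} in Lemma 1 is easy to check numerically in R^k; the snippet below (toy data of ours) verifies idempotence, self-adjointness, and the fact that the projected vectors lie in Ker(Dg).

```python
import numpy as np

rng = np.random.default_rng(1)
Dg = rng.standard_normal((2, 4))            # full-rank 2 x 4 Jacobian (LICQ holds)
Pi = np.eye(4) - Dg.T @ np.linalg.solve(Dg @ Dg.T, Dg)
print(np.allclose(Pi @ Pi, Pi),             # idempotent
      np.allclose(Pi, Pi.T),                # self-adjoint
      np.allclose(Dg @ Pi, 0))              # range contained in Ker(Dg)
```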
The next lemma is also fairly classical in the literature, at least in the finite-dimensional case. It characterizes the range space step ξ_C(x), defined by (2.8), as the unique Gauss-Newton direction for the cancellation of the constraint function g(x) which is orthogonal to the (linearized) set of constraints.

Lemma 2. Let x ∈ V be a point where the LICQ condition (2.5) is satisfied. Then:
(1) The range space step ξ_C(x) = Dg^𝒯(DgDg^𝒯)^{−1} g(x) is orthogonal to Ker(Dg(x)).
(2) −ξ_C(x) is a descent direction for the violation of the constraints:

(2.13)    Dg(x)ξ_C(x) = g(x).

(3) The set of solutions to the Gauss-Newton program

(2.14)    min_{ξ∈V} ||g(x) + Dg(x)ξ||²

is {−ξ_C(x) + ζ | ζ ∈ Ker(Dg(x))}.

Proof. Point (1) is an immediate consequence of point (1) in Lemma 1. Point (2) is obvious from the definition (2.8) of ξ_C(x). Note that (2.13) means that −ξ_C(x) is a descent direction for the violation of the constraints in the sense that it ensures that every coordinate g_i(x) is driven toward 0 at a rate equal to its current value. To prove point (3), since (2.14) is a convex optimization problem, a necessary and sufficient condition for ξ ∈ V to be one solution is given by the usual first-order condition:

Dg(x)^𝒯 (g(x) + Dg(x)ξ) = 0.

Since the matrix DgDg^𝒯 is invertible, this is in turn equivalent to Dg(x)ξ = −g(x). Finally, we know from point (2) that −ξ_C(x) is one particular solution to this last equation. Thus, point (3) follows from the fact that any two solutions of this problem differ up to an element ζ ∈ V such that Dg(x)ζ = 0.
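Lemma 2 can likewise be checked numerically (toy data of ours): ξ_C solves the linearized equation Dg ξ_C = g(x) and is orthogonal to Ker(Dg(x)), so that −ξ_C is the minimum-norm Gauss-Newton direction.

```python
import numpy as np

rng = np.random.default_rng(2)
Dg = rng.standard_normal((2, 5))                       # full-rank Jacobian
g = rng.standard_normal(2)                             # current constraint values
xiC = Dg.T @ np.linalg.solve(Dg @ Dg.T, g)             # range space step (2.8)
z = rng.standard_normal(5)
ker = z - Dg.T @ np.linalg.solve(Dg @ Dg.T, Dg @ z)    # some element of Ker(Dg)
print(np.allclose(Dg @ xiC, g),                        # (2.13): Dg xiC = g
      abs(xiC @ ker) < 1e-10)                          # orthogonal to Ker(Dg)
```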

Decrease properties of the equality constrained gradient flow
The main features of the definitions of ξ_J(x) and ξ_C(x) are the facts that ξ_J(x) is orthogonal to the gradients of the constraints, i.e. Dg(x)ξ_J(x) = 0, and that −ξ_C(x) decreases the violation of the constraints while being orthogonal to ξ_J(x). These ensure that the entries of the constraint functional g(x(t)) tend to 0 along the trajectories of the ODE (1.2), independently of the value of ξ_J(x). Then, as soon as the violation of the constraints becomes sufficiently small, the objective function J(x(t)) decreases without compromising the asymptotic vanishing of g(x(t)). We review these properties in the next proposition, which was also observed in [65] in the finite-dimensional context; see Appendix A for the proof.
Proposition 1. Assume that the trajectories x(t) of the flow

(2.15)    ẋ(t) = −α_J ξ_J(x(t)) − α_C ξ_C(x(t)),  x(0) = x_0,

exist on some time interval [0, T] for T > 0, and that the LICQ condition (2.5) holds at any point x(t), t ∈ [0, T]. Then the following properties hold true:
(1) The violation of the constraints vanishes at an exponential rate:

(2.16)    ∀t ∈ [0, T], g(x(t)) = e^{−α_C t} g(x(0)).

(2) The value J(x(t)) of the objective decreases 'as soon as the violation (2.16) of the constraints is sufficiently small' in the following sense: assume that rank(Dg(x(t))) = p for all t ∈ [0, T], and denote by σ_p(x) the smallest singular value of Dg(x). Then there exists a constant C > 0 such that

(2.17)    d/dt J(x(t)) ≤ −α_J ||ξ_J(x(t))||²_V + C e^{−α_C t}.
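The first point can be observed on the toy circle example J(x) = x_1 + x_2, g(x) = |x|² − 1 (an illustration of ours): with a small Euler step, the discretized violation closely follows the predicted decay e^{−α_C t}.

```python
import numpy as np

alphaJ, alphaC, dt = 1.0, 1.0, 0.001
x = np.array([1.5, 0.5])
g0 = abs(x @ x - 1.0)                         # initial violation of g(x) = |x|^2 - 1
for _ in range(2000):                         # integrate the flow up to t = 2
    gradJ = np.array([1.0, 1.0])
    g = np.array([x @ x - 1.0])
    Dg = 2.0 * x[None, :]
    M = Dg @ Dg.T
    xiJ = gradJ - Dg.T @ np.linalg.solve(M, Dg @ gradJ)
    xiC = Dg.T @ np.linalg.solve(M, g)
    x = x - dt * (alphaJ * xiJ + alphaC * xiC)
gT = abs(x @ x - 1.0)
print(gT <= 1.1 * g0 * np.exp(-alphaC * 2.0))   # True: exponential decay of (2.16)
```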

Extension to equality and inequality constraints
We now proceed to extend the dynamical system (1.2) or (2.15) so as to handle inequality constraints as well.
We return to the full optimization problem (1.1), still posed in a Hilbert space V with inner product ⟨·, ·⟩_V, and where the objective J : V → R, the equality constraints g : V → R^p and the inequality constraints h : V → R^q are differentiable functions.
Inspired by the methodology developed in Section 2, we still propose to solve the equality and inequality constrained problem (1.1) by means of a dynamical system of the form:

(3.1)    ẋ(t) = −α_J ξ_J(x(t)) − α_C ξ_C(x(t)),

whose forward Euler discretized version reads:

(3.2)    x_{n+1} = x_n − Δt (α_J ξ_J(x_n) + α_C ξ_C(x_n)).

After introducing notation conventions related to inequality constraints in Section 3.1, the range space step ξ_C(x) is defined in Section 3.2 from a formula analogous to (1.5). The definition of the null space step ξ_J(x) is examined in detail in Section 3.3; it involves a procedure for identifying a relevant subset Î(x) ⊂ Ĩ(x) of the saturated or violated constraints, which relies on the dual problem (1.7). Finally, the properties of the resulting flow (3.1) are outlined in Section 3.4.

Notation and preliminaries
Let us recall the definition (1.6) of the set Ĩ(x) of saturated or violated inequality constraints at x ∈ V. We denote by q̃(x) := Card(Ĩ(x)) the number of such constraints. For a given subset I ⊂ {1, ..., q}, the vector h_I(x) = (h_i(x))_{i∈I} collects the inequality constraints indexed by I. The vector C_I(x), defined by (1.3), collects all the equality constraints g(x) and those selected inequality constraints h_I(x).
In the present context, the constraints are said to satisfy the "Linear Independence Constraint Qualification" (LICQ) condition at x ∈ V if the linearized saturated or violated constraints are independent, namely if

(3.3)    rank(DC_Ĩ(x)(x)) = p + q̃(x).

If the point x satisfies the constraints, (3.3) is one usual LICQ condition (of course, there are other possible qualification conditions, see [17,48]), but (3.3) also applies to points x where the constraints are not satisfied. For a subset I ⊂ {1, ..., q}, define Π_{C_I} : V → V, the orthogonal projection operator onto Ker(DC_I(x)), by

(3.4)    Π_{C_I} = I − DC_I^𝒯 (DC_I DC_I^𝒯)^{−1} DC_I,

and let (λ_I(x), μ_I(x)) ∈ R^p × R^{Card(I)} be the corresponding Lagrange multipliers:

(3.5)    (λ_I(x), μ_I(x)) := −(DC_I DC_I^𝒯)^{−1} DC_I ∇J(x),

so that Π_{C_I}(∇J(x)) = ∇J(x) + DC_I(x)^𝒯 (λ_I(x), μ_I(x)). Last but not least, let us recall that in the present context of the equality and inequality constrained problem (1.1), the necessary first-order optimality conditions (the KKT conditions) at a given point x* ∈ V satisfying the LICQ condition (3.3) read as follows: there exist λ(x*) ∈ R^p and μ(x*) ∈ R^q such that

(3.6)    ∇J(x*) + Dg(x*)^𝒯 λ(x*) + Dh(x*)^𝒯 μ(x*) = 0,
(3.7)    μ(x*) ≥ 0 and μ_i(x*) h_i(x*) = 0 for all 1 ≤ i ≤ q;

see again [17,48].

Definition of the range step direction
Definition 3 (range space step). For the optimization problem (1.1), the range step ξ_C(x) is defined at any x ∈ V satisfying the LICQ condition (3.3) by

ξ_C(x) := DC_Ĩ(x)^𝒯 (DC_Ĩ(x) DC_Ĩ(x)^𝒯)^{−1} C_Ĩ(x)(x),

where Ĩ(x) is the set of all saturated or violated constraints, defined by (1.6).
The purpose of the range space step ξ_C(x) is to decrease the violation of the constraints, as we shall see in Proposition 5 below. The exact counterpart of Lemma 2 holds in this context, in particular:
(1) ξ_C(x) is orthogonal to Ker(DC_Ĩ(x)).
(2) −ξ_C(x) is a Gauss-Newton direction for the violation of the constraints:

DC_Ĩ(x)(x) ξ_C(x) = C_Ĩ(x)(x).

Definition and properties of the null space step
The definition of the null space direction ξ_J(x) is slightly more involved in the present context, as it is not obtained by simply replacing Dg(x) by DC_Ĩ(x) in (2.7). It requires the introduction of a subset Î(x) ⊂ Ĩ(x) of the saturated or violated inequality constraints, as we now discuss.
As in the equality-constrained case discussed in Section 2, the rationale behind the definition of the null space step ξ_J(x) is that it is sought, up to a change of sign, as a best normalized descent direction diminishing the violated or saturated inequality constraints: −ξ_J(x) is set positively proportional to the solution ξ*(x) of the following minimization problem (compare with Lemma 1):

(3.8)    min_{ξ∈V, ||ξ||_V ≤ 1} ⟨∇J(x), ξ⟩_V  such that  Dg(x)ξ = 0 and Dh_Ĩ(x)(x)ξ ≤ 0.

The problem (3.8) could be solved directly with standard quadratic programming algorithms. However, it is convenient to characterize explicitly the minimizer ξ*(x) of (3.8) by examining the dual problem. This will allow us to obtain in Definition 4 an explicit formula for the null space direction ξ_J(x), under the form (1.4).
We now introduce the dual optimization problem to (3.8), which is analogous to the problem (2.11) of the previous section.
Proposition 2. Let x ∈ V satisfy the LICQ condition (3.3). There exists a unique couple of multipliers λ*(x) ∈ R^p and μ*(x) ∈ R^{q(x)}_+ solving the following quadratic optimization problem, which is the dual of (3.8):
(λ*(x), μ*(x)) := argmin_{λ∈R^p, μ∈R^{q(x)}_+} ||∇J(x) + Dg(x)^T λ + Dh_{I(x)}(x)^T μ||_V².   (3.9)
Proof. At first, (3.8) is equivalent to the following min-max formulation:
min_{||ξ||_V ≤ 1} max_{λ∈R^p, μ∈R^{q(x)}_+} ( ⟨∇J(x), ξ⟩_V + λ^T Dg(x)ξ + μ^T Dh_{I(x)}(x)ξ ).
Exchanging formally the min and the max and solving explicitly the inner minimization problem with respect to ξ (see the proof of Lemma 1) yields that (3.9) is the dual problem of (3.8) up to a change of signs (the duality gap between (3.9) and (3.8) will be shown to vanish in Proposition 3). The program (3.9) brings into play the closed convex set R^p × R^{q(x)}_+ and the least-squares functional
(λ, μ) ↦ ||∇J(x) + Dg(x)^T λ + Dh_{I(x)}(x)^T μ||_V².
The latter is strictly convex over R^p × R^{q(x)}_+ by virtue of (3.3). Hence, (3.9) has a unique solution.
The optimization problem (3.9) belongs to the class of nonnegative least-squares problems; it can be solved efficiently with a number of dedicated solvers, such as cvxopt [12] or IPOPT [61]. One nice feature of (3.9) is that its dimension is the number p + q(x) of saturated or violated constraints (and not the total number p + q of constraints), which can be small in many practical cases, as e.g. in our shape optimization applications of Section 5. It is also possible to exploit the sparsity of the constraints if p + q(x) is large, see Remark 5 below.
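For instance, in a discretized setting, (3.9) can be handed to any bound-constrained least-squares solver; the sketch below uses SciPy's lsq_linear as one possible off-the-shelf alternative to the solvers mentioned above (the function and variable names are ours, and the Euclidean inner product is assumed):

```python
import numpy as np
from scipy.optimize import lsq_linear

def solve_dual(gradJ, Dg, Dh_active):
    """Solve min_{lam, mu >= 0} ||gradJ + Dg^T lam + Dh_active^T mu||^2:
    lam is left free, mu is constrained to be nonnegative.
    The entries with mu_i > 0 indicate the constraints 'not aligned'
    with gradJ, which the null space step should keep tangent."""
    p, m = Dg.shape[0], Dh_active.shape[0]
    A = np.vstack([Dg, Dh_active]).T                 # columns = constraint gradients
    lb = np.concatenate([np.full(p, -np.inf), np.zeros(m)])
    res = lsq_linear(A, -gradJ, bounds=(lb, np.full(p + m, np.inf)))
    return res.x[:p], res.x[p:]

# One saturated inequality constraint with gradient (1, 0).
# gradJ = (-1, 1) 'pushes against' the constraint: mu* = 1 > 0.
# gradJ = ( 1, 1) is 'aligned' with it: mu* = 0.
lam, mu = solve_dual(np.array([-1.0, 1.0]), np.zeros((0, 2)), np.array([[1.0, 0.0]]))
```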
The next proposition relates the optimal values and the solutions ξ*(x) and (λ*(x), μ*(x)) of the primal and dual problems (3.8) and (3.9). Roughly speaking, it claims that the optimal feasible descent direction ξ*(x) of (3.11) is the projection of the gradient ∇J(x) onto the cone of feasible directions. The proof follows classical arguments of duality theory in linear programming and is detailed for the convenience of the reader.
(2) For any subset I ⊂ I(x) satisfying Dh_{I(x)}(x) Π_{C_I}(∇J(x)) ≥ 0, the direction ξ := −Π_{C_I}(∇J(x)) / ||Π_{C_I}(∇J(x))||_V is feasible for the primal problem (3.8), and we obtain by definition of ξ*(x) that ⟨∇J(x), ξ*(x)⟩_V ≤ ⟨∇J(x), ξ⟩_V, whence the maximization property (3.19).
For I ⊂ I(x) satisfying μ_I(x) ≥ 0, we obtain feasible multipliers (λ, μ) for the dual problem (3.9) by taking the components of μ to coincide with those of μ_I on the indices of I, and assigning them the value 0 on the complementary subset I(x) \ I. Then the optimality of (λ*(x), μ*(x)) for (3.9) reads
||∇J(x) + Dg(x)^T λ*(x) + Dh_{I(x)}(x)^T μ*(x)||_V ≤ ||∇J(x) + Dg(x)^T λ + Dh_{I(x)}(x)^T μ||_V,
whence the minimization property (3.20).
Remark 2. In view of (3.16), the optimal multiplier μ*(x) can be interpreted as an indicator specifying which constraints of I(x) are 'not aligned' with the gradient ∇J(x) and should be kept in the subset Ĩ(x). The best descent direction (in the sense of (3.8)) is obtained by projecting the gradient ∇J(x) onto the tangent space of the constraint subset Ĩ(x) rather than onto that of the full set of violated or saturated constraints I(x). Indeed, the descent direction ξ = −Π_{C_{I(x)}}∇J(x) that would be obtained by projecting ∇J(x) on the whole set I(x) would only keep the constraints constant at first order, i.e. Dh_i(x)ξ = 0 (see Remark 6 for more details), whereas considering ξ = −Π_{C_{Ĩ(x)}}∇J(x) instead allows the inequality constraints associated with i ∈ I(x) \ Ĩ(x) to decrease, which is much more efficient.
Remark 3. Note that the use of a dual problem such as (3.9) in order to obtain information about which constraints should remain active is quite classical in active set methods, see e.g. [19,36,48].
In principle, the subset Ĩ(x) could be found by solving the discrete problems (3.19) or (3.20). However, we expect that, in practice, it is more efficient to rely on iterative solvers based on gradient descent strategies for the dual problem (3.9), e.g. a cone programming solver or a nonnegative least-squares algorithm such as that of [19]. This is what we do in the sequel.
Having introduced the subset Ĩ(x) (defined in (3.16)), we are now in a position to define the null space direction ξ_J(x) in the present context: −ξ_J(x) is set to be a positive multiple of the optimal descent direction ξ*(x) supplied by (3.18).
Definition 4. For any point x ∈ V satisfying the LICQ condition (3.3), the null space direction ξ_J(x) at x for the optimization problem (1.1) is defined by:
ξ_J(x) := ∇J(x) − DC_{Ĩ(x)}(x)^T (DC_{Ĩ(x)}(x) DC_{Ĩ(x)}(x)^T)^{−1} DC_{Ĩ(x)}(x) ∇J(x),   (3.24)
where Ĩ(x) is the set defined by (3.16).
Observe that all violated and saturated constraints are taken into account in the Gauss-Newton direction ξ_C(x) defined by (3.7), while only those constraints in Ĩ(x), not aligned with the gradient ∇J(x), occur in the definition of ξ_J(x).
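In finite dimensions, the null space direction of Definition 4 is a plain orthogonal projection; a minimal sketch (hypothetical names, Euclidean inner product, constraint Jacobian of full row rank) reads:

```python
import numpy as np

def null_step(gradJ, DC_keep):
    """Project gradJ onto Ker(DC_keep), in the spirit of formula (3.24).
    DC_keep stacks the gradients of the equality constraints and of the
    retained inequality constraints (those with positive dual multiplier)."""
    if DC_keep.shape[0] == 0:
        return gradJ.copy()            # no constraint retained: plain gradient
    M = DC_keep @ DC_keep.T
    return gradJ - DC_keep.T @ np.linalg.solve(M, DC_keep @ gradJ)

gradJ = np.array([1.0, 2.0, 3.0])
DC_keep = np.array([[1.0, 0.0, 0.0]])  # one retained constraint
xi_J = null_step(gradJ, DC_keep)       # tangential part of the gradient
```

By construction DC_keep @ xi_J = 0, so moving along −ξ_J leaves the retained constraints unchanged at first order.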
Remark 4. With our notation conventions, in the discrete optimization scheme proposed by Barbarosie et al. [15,16], the set Ĩ(x_n) is obtained by removing indices from I(x_n) one by one, starting from the index i_0 associated with the most negative multiplier ν_{n,i_0} < 0, until all of them become nonnegative. Therefore, the set Ĩ(x_n) used in the latter strategy and the set Ĩ(x_n) featured in ours (given by (3.16)) do not coincide in general; one could think of configurations where the procedure of [16] would fail to find the optimal set Ĩ(x_n) (for example if i_0 ∈ Ĩ(x_n)) and would project the gradient on a less optimal subset of constraints. We note that no convergence result is given by the authors of [15,16] about their procedure.
Remark 5. Let us discuss two extreme cases related to the computational effort involved in the numerical implementation of (3.24). Upon discretization, we may assume that V = R^k is a finite-dimensional space.
(1) If the number p + q(x) of saturated or violated constraints is small compared to the dimension k of V, it is best, for numerical efficiency, to assemble the small square matrix (DC_{Ĩ(x)} DC_{Ĩ(x)}^T) and to invert it by a direct method.
(2) If V = R^k is equipped with an inner product encoded by a matrix A, and if p + q is of the order of k or larger, the computation of the inverse of (DC_{Ĩ(x)} DC_{Ĩ(x)}^T) can be expensive. However, it is still tractable if both DC and A are sparse matrices (i.e. matrices with many zero entries). For instance, this occurs in the case of bound constraints on the optimization variable x = (x_1, ..., x_k). Recalling from Remark 1 that, in this setting, the V-transpose is given by DC_{Ĩ(x)}^T = A^{−1} DC_{Ĩ(x)}^⊤ (with ⊤ the Euclidean transpose), the calculation of ξ_J(x) can be carried out thanks to the following procedure: at first, the vector X is obtained by solving the sparse linear system
A X + DC_{Ĩ(x)}^⊤ Z = 0,   DC_{Ĩ(x)} X = DC_{Ĩ(x)} ∇J(x),
where Z ∈ R^{p+Card(Ĩ(x))} is an extra slack variable. Then, the desired null space direction is obtained as ξ_J(x) = ∇J(x) − X. A similar strategy, exploiting the sparsity of A and DC_{Ĩ(x)}, can be used to compute the range space direction ξ_C(x) of (3.7), or to solve the dual quadratic subproblem (3.9).
Remark 6. As we have already mentioned, the Lagrange multiplier μ*(x) given by (3.17) may be understood as an indicator of which inequality constraints are aligned with the gradient of J at x. This insight is especially intuitive in the particular situation where the gradients of the constraint functions are orthogonal to one another, i.e.:
⟨∇g_i(x), ∇g_j(x)⟩_V = 0 for i, j = 1, ..., p, i ≠ j,
⟨∇h_i(x), ∇h_j(x)⟩_V = 0 for i, j = 1, ..., q, i ≠ j,
⟨∇g_i(x), ∇h_j(x)⟩_V = 0 for i = 1, ..., p, j = 1, ..., q.
Indeed, in this case, it easily follows from the Pythagorean theorem that, for any (λ, μ) ∈ R^p × R^{q(x)}_+,
||∇J(x) + Σ_{i=1}^p λ_i ∇g_i(x) + Σ_{i∈I(x)} μ_i ∇h_i(x)||_V² = ||∇J(x)||_V² + Σ_{i=1}^p (2λ_i⟨∇J(x), ∇g_i(x)⟩_V + λ_i²||∇g_i(x)||_V²) + Σ_{i∈I(x)} (2μ_i⟨∇J(x), ∇h_i(x)⟩_V + μ_i²||∇h_i(x)||_V²).
Therefore the minimization problem (3.9) is separable with respect to the components of (λ, μ) ∈ R^p × R^{q(x)}_+: (λ*_i(x))_{1≤i≤p} and (μ*_i(x))_{i∈I(x)} are the respective solutions to the one-dimensional minimization problems
min_{λ_i∈R} (2λ_i⟨∇J(x), ∇g_i(x)⟩_V + λ_i²||∇g_i(x)||_V²),   min_{μ_i≥0} (2μ_i⟨∇J(x), ∇h_i(x)⟩_V + μ_i²||∇h_i(x)||_V²),
which yields eventually:
λ*_i(x) = −⟨∇J(x), ∇g_i(x)⟩_V / ||∇g_i(x)||_V²,   μ*_i(x) = max(0, −⟨∇J(x), ∇h_i(x)⟩_V / ||∇h_i(x)||_V²).
Hence, μ*_i(x) is positive if and only if following the descent direction −∇J(x) leads to an increase (i.e. violation) of the i-th inequality constraint.
In the general case, where the constraint gradients are not mutually orthogonal, the interpretation of μ*(x) is similar, up to the additional complication that (3.9) accounts for the possible alignments between different constraint gradients. In the following, with a slight abuse of language, we shall nevertheless refer to the indices i ∈ I(x) \ Ĩ(x) as those associated with constraints which are 'aligned' with ∇J(x), in the sense that −Dh_i(x)ξ_J(x) ≤ 0, i.e. the violation h_i(x) decreases along −ξ_J(x) (or, at least, stays constant at first order).
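The closed-form multipliers of the orthogonal case can be checked numerically. A small sketch, assuming the Euclidean inner product, no equality constraints, and the formula μ*_i(x) = max(0, −⟨∇J(x), ∇h_i(x)⟩ / ||∇h_i(x)||²) derived above (all names are illustrative):

```python
import numpy as np

def multipliers_orthogonal(gradJ, grad_h):
    """Dual multipliers when the constraint gradients are mutually
    orthogonal: problem (3.9) separates, and each mu_i is obtained by a
    one-dimensional projection onto the half-line mu_i >= 0."""
    mu = np.array([-(gradJ @ gh) / (gh @ gh) for gh in grad_h])
    return np.maximum(mu, 0.0)

# -gradJ increases h_1 (its multiplier is positive) but decreases h_2
# (its multiplier vanishes: the constraint is 'aligned' with gradJ).
gradJ = np.array([-2.0, 3.0])
grad_h = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
mu = multipliers_orthogonal(gradJ, grad_h)
```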

Decrease properties of the trajectories of the null space ODE
The final result of this section is the counterpart of Proposition 1 in the case of the equality and inequality constrained optimization problem (1.1); see Appendix A for the proof and further remarks.
Proposition 5. Assume that the trajectories x(t) of the flow
x'(t) = −α_J ξ_J(x(t)) − α_C ξ_C(x(t)),   x(0) = x_0,   (3.26)
are well-defined for all times t ≥ 0. Then the following properties hold true:
(1) The violation of the constraints decreases exponentially:
||C_{I(x(t))}(x(t))|| ≤ e^{−α_C t} ||C_{I(x_0)}(x_0)||.   (3.27)
(2) J(x(t)) decreases 'as soon as the violation (3.27) of the constraints is sufficiently small', in the following sense. Assume that rank(DC_{I(x(t))}(x(t))) remains equal to p + q(x(t)) along the trajectory, and denote by σ_p(x) the smallest singular value of DC_{I(x)}(x). Then there exists a constant C > 0 such that J(x(t)) decreases at any time t for which ||C_{I(x(t))}(x(t))|| ≤ C σ_p(x(t)).
(3) Any stationary point x* of the flow (3.26) satisfies the KKT optimality conditions (3.6), which equivalently rewrite ξ_J(x*) = 0 and ξ_C(x*) = 0, where the multipliers λ*(x*) and μ*(x*) are defined in (3.9) or (3.17).

Comparison with the method of slack variables for inequality constraints
One popular method in the literature to address inequality constraints in problems such as (1.1), which is significantly different from ours, is to introduce slack variables so as to turn them into equality constraints of an augmented problem. In this section, we briefly review this idea, which was investigated by [51] in the context of dynamical system approaches to constrained optimization, and we compare it with our method based on the dual problem (3.9).
The method of slack variables consists in replacing (1.1) with the following equivalent equality-constrained problem, involving as many extra variables (z_1, ..., z_q) ∈ R^q as there are inequality constraints in (1.1):
min_{(x,z) ∈ V×R^q} J(x)  such that  C(x, z) = 0,   (3.31)
where the augmented vector of constraints C(x, z) ∈ R^{p+q} reads:
C_i(x, z) = g_i(x) for 1 ≤ i ≤ p,   C_{p+i}(x, z) = h_i(x) + z_i²/2 for 1 ≤ i ≤ q.
Problem (3.31) is an equality constrained optimization problem of the form (2.1), set over the Hilbert space Ṽ = V × R^q with inner product ⟨(x, z), (x', z')⟩_Ṽ = ⟨x, x'⟩_V + z^T z'. It can be solved thanks to the algorithm proposed in Section 2; the associated gradient flow for (3.31) reads:
(x'(t), z'(t)) = −α_J ξ_J(x(t), z(t)) − α_C ξ_C(x(t), z(t)),   x(0) = x_0 and z(0) = z_0,   (3.32)
where x_0 ∈ V is the considered initial point in the resolution of (1.1), and the variable z is initialized with a value z_0 ∈ R^q in such a way that the inequality constraints of (1.1) which are inactive at x_0 (i.e. h_i(x_0) < 0) are associated with satisfied equality constraints C_{p+i}(x_0, z_0) = 0 in (3.31), namely z_{0,i} = √(2|h_i(x_0)|). In the finite-dimensional setting V = R^k, and when J, g and h are C² functions, Schropp and Singer proved in [51] that: (i) stationary points of the extended flow (3.32) are exactly critical points of (1.1), that is, points x* satisfying (3.6) but whose multiplier μ(x*) ∈ R^q may have negative entries; (ii) among all possible critical points of (1.1), only KKT points (fulfilling all three conditions (3.6) with μ(x*) ∈ R^q_+) are asymptotically stable equilibria. As a consequence, the solution vector x(t) to (3.32) converges in practice to a KKT point for problem (1.1).
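A minimal sketch of this augmentation (illustrative names; the quadratic slack term h_i(x) + z_i²/2 and the initialization z_{0,i} = √(2|h_i(x_0)|) are as recalled above):

```python
import numpy as np

def augment(g, h):
    """Augmented equality constraints of (3.31):
    C_i(x, z) = g_i(x) for i <= p, C_{p+i}(x, z) = h_i(x) + z_i^2 / 2."""
    return lambda x, z: np.concatenate([g(x), h(x) + 0.5 * z**2])

def init_slack(h, x0):
    """z_{0,i} = sqrt(2 |h_i(x0)|): an inequality constraint inactive at x0
    becomes a satisfied equality constraint of the augmented problem."""
    return np.sqrt(2.0 * np.abs(h(x0)))

g = lambda x: np.array([x[0] + x[1] - 1.0])   # one equality constraint
h = lambda x: np.array([x[0] - 2.0])          # one (inactive) inequality constraint
C = augment(g, h)
x0 = np.array([0.5, 0.5])
z0 = init_slack(h, x0)                        # C(x0, z0) vanishes
```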
The main differences between this slack variable approach and our new approach in Section 3.2 for dealing with equality and inequality constrained problems can be summarized as follows.
(1) Any point x_crit satisfying the constraints (C_{I(x_crit)}(x_crit) = 0) and Π_{C_{I(x_crit)}}(∇J(x_crit)) = 0 is a stationary point of the slack variable dynamical system (3.32), although it might violate the full KKT condition (because (3.5) may yield negative values of the multiplier μ_{I(x_crit)}(x_crit)). By contrast, only true, feasible KKT points are stationary points of our flow (3.26), see Proposition 5.
(2) The computation of the directions ξ_J(x) and ξ_C(x) involved in our flow (3.26) requires the inversion of a matrix of size at most (p + q(x))-by-(p + q(x)), where q(x) is the number of active or violated constraints at x. The process of equalizing inequality constraints as in [51,54] rather requires the inversion of the full (p + q)-by-(p + q) matrix DC(x, z)DC(x, z)^T. Our method is therefore more efficient if q(x) ≪ q, that is, if many inequality constraints are inactive.
(3) At feasible points, our ODE (3.26) follows the best locally admissible descent direction (with respect to the norm of V), which is not the case for the extended ODE (3.32). Therefore, from a common feasible point x, the ODE (3.26) always decreases the objective function with a 'steeper slope'. All in all, our numerical observations (not reported in the present article, but extensively discussed in [31]) tend to illustrate that both flows (3.26) and (3.32) may have equivalent performances for solving the nonlinear optimization problem (1.1), this performance being measured in terms of the total length covered by the optimization path to reach the optimum. However, the two ODEs (3.26) and (3.32) yield optimization paths of essentially different natures. Our null space flow (3.26) ignores inactive constraints and those aligned with the gradient of the objective function. As a result, it produces nonsmooth paths that are more likely to reach quickly the saturation of the constraints. The extended flow (3.32) yields smoother trajectories that more likely stay away from the constraints, at the cost of inverting at every step the full matrix DC(x, z)DC(x, z)^T, whose size equals the total number of constraints (active and inactive).

Numerical discretization and time-stepping schemes for the null space ODE
This short section describes practical implementation details for the discretization of the ODE (1.2) by an explicit Euler scheme. Two important issues are discussed in Sections 4.1 and 4.2, respectively. First, we propose small adaptations in the computation of ξ_J(x) and ξ_C(x) in order to account for the discontinuous changes of the right-hand side −(α_J ξ_J + α_C ξ_C). Then, a merit function is proposed for adapting the time step Δt. The complete implementation of the algorithm is summarized in Section 4.3 below.

Accounting for discontinuities near the inequality constraint barriers
A potential issue when implementing the above Algorithm 1 comes from the fact that the vector fields ξ_J and ξ_C given by (3.7) and (3.24) suffer from the same discontinuities as the discrete index mapping x ↦ Ĩ(x). As a result, abrupt oscillations of the discrete optimization path (x_n) may occur near the boundary of the feasible set: if h_i(x_n) = 0 and i ∈ Ĩ(x_n) for some index i ∈ {1, ..., q}, then, in the definition (3.24) of ξ_J(x_n), the gradient ∇J(x_n) is projected tangentially to the constraint h_i, but it is no longer projected after any slight deviation (e.g. due to the discretization) making this constraint inactive (h_i(x_{n+1}) < 0). This kind of issue is very classical in the discretization of ODEs with discontinuous vector fields and can be tackled by various methods, see e.g. [25] for a review.
In this section, we suggest a simple alternative: constraints are felt from a short distance by replacing the set I(x_n) in (1.6) with the set I^ε(x_n) of inequality constraints violated 'up to ε_i':
I^ε(x_n) := {i ∈ {1, ..., q} | h_i(x_n) ≥ −ε_i}.   (4.1)
The tolerances ε_i > 0 can be estimated in an automatic fashion, independently of an arbitrary rescaling of the constraints, thanks to an a posteriori bound which we now detail. Let h̄ be a user-defined parameter, representing the distance from a point x to the boundary of the feasible set at which we desire that the constraints should be 'felt'. This characteristic length h̄ should be defined in accordance with the typical distance ||x_{n+1} − x_n||_V between two successive iterates of the algorithm. For our shape optimization applications in Section 5, h̄ is typically of the order of the size of the mesh discretizing the shape, see Section 5.3.2 below. Assume now that the current point x_n satisfies the constraint h_i up to the uncertainty h̄ on its location: by this we mean that there exists some unknown point x ∈ V with ||x − x_n||_V ≤ h̄ and h_i(x) ≤ 0. Then the error h̄ on the location of x_n propagates to the constraint value h_i(x_n) according to the following inequality:
h_i(x_n) ≤ ||∇h_i(x_n)||_V h̄ + o(h̄),   (4.2)
which suggests the choice
ε_i := ||∇h_i(x_n)||_V h̄   (4.3)
for the value of ε_i in (4.1).
Note that, more generally, the a posteriori bound (4.2) makes it possible to assess whether a constraint C_i(x_n) can be considered as satisfied or not with respect to the numerical discretization.
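In a discretized setting, the tolerances (4.3) and the enlarged set (4.1) take a few lines to compute; a minimal sketch under the Euclidean inner product (function names are ours):

```python
import numpy as np

def tolerances(grad_h, h_bar):
    """eps_i = ||grad h_i(x_n)|| * h_bar, following the a posteriori bound
    (4.2): constraint h_i may be violated by a point within distance h_bar
    of x_n whenever h_i(x_n) >= -eps_i."""
    return np.array([np.linalg.norm(gh) for gh in grad_h]) * h_bar

def extended_active_set(h_values, eps):
    """Indices of the inequality constraints violated 'up to eps_i'."""
    return [i for i, (hi, ei) in enumerate(zip(h_values, eps)) if hi >= -ei]

grad_h = [np.array([3.0, 4.0]), np.array([1.0, 0.0])]
eps = tolerances(grad_h, h_bar=0.1)               # -> [0.5, 0.1]
active = extended_active_set([-0.2, -0.2], eps)   # only constraint 0 is 'felt'
```

Note that the tolerances scale with the constraint gradients, which is what makes the procedure invariant under a rescaling of the constraints.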
The dual problem (3.9) is then solved with I^ε(x_n) instead of I(x_n), in order to obtain a new subset of indices Ĩ^ε(x_n), which indicates which constraints are likely to be not aligned with the gradient ∇J(x_n) when approaching the barrier {h_i = 0, i ∈ {1, ..., q} \ I(x_n)}. The null space and range space directions ξ_J(x_n) and ξ_C(x_n) of Definitions 3 and 4 are finally replaced with ξ_{J,ε}(x_n) and ξ_{C,ε}(x_n), computed as follows:
ξ_{J,ε}(x_n) := the direction of formula (3.24), with Ĩ^ε(x_n) in place of Ĩ(x_n),   (4.4)
ξ_{C,ε}(x_n) := the direction of formula (3.7), with Î^ε(x_n) := I(x_n) ∪ Ĩ^ε(x_n) in place of I(x_n),   (4.5)
where Î^ε(x_n) is the set of constraints that are either violated, saturated, or not aligned with the gradient ∇J(x_n). The use of Ĩ^ε(x_n) in the definition of ξ_{J,ε}(x_n) ensures that the gradient ∇J(x_n) is projected tangentially to the constraints in a small neighborhood of the boundary of the feasible set. The use of Î^ε(x_n) in (4.5) (instead of I(x_n)) helps ensuring that the optimization trajectory eventually stabilizes on the constraint barrier {h_i = 0 | i ∈ Ĩ^ε(x_n) \ I(x_n)} (and not on the −ε_i level set). As a result, no abrupt discontinuity occurs for ξ_{J,ε} and ξ_{C,ε} when crossing the boundary of the feasible domain, as long as the optimization path stays in this neighborhood. Including inequality constraints indexed by i ∈ Ĩ^ε(x_n) in the Gauss-Newton direction ξ_{C,ε}(x_n) even if they are satisfied (i.e. if −ε_i ≤ h_i(x_n) ≤ 0) further allows to stabilize the values of these constraints closer to zero.

Time step adaptation based on a merit function
The ODE (1.2) is discretized by an explicit scheme of the form:
x_{n+1} = x_n − Δt_n (α_J ξ_J(x_n) + α_C ξ_C(x_n)),   (4.6)
with a variable time step Δt_n > 0. The practical implementation of such a strategy is often guided by a merit function, i.e. an indicator allowing to detect when a step has been too large, a situation where a choice has to be made regarding whether the step should be reduced or accepted. For our null space algorithm, a merit function which very much resembles that of the Augmented Lagrangian Method is readily available, however with a specific choice of multipliers.
Lemma 3. For a given x_n ∈ V, let merit_{x_n} : V → R be the function defined by
merit_{x_n}(x) := α_J (J(x) + λ*(x_n)^T g(x) + μ*(x_n)^T h_{I(x_n)}(x)) + (α_C/2) ⟨C_{I(x_n)}(x), (DC_{I(x_n)}(x_n) DC_{I(x_n)}(x_n)^T)^{−1} C_{I(x_n)}(x)⟩,   (4.7)
where (λ*(x_n), μ*(x_n)) is the vector of multipliers defined as the solution to the dual problem (3.9) (see (3.17)). Then −(α_J ξ_J(x_n) + α_C ξ_C(x_n)) is a descent direction for merit_{x_n} at x = x_n.
Proof. It is a straightforward computation of the gradient of (4.7).
One possible implementation of an optimization strategy of the form (4.6) based on this merit function is summarized in Algorithm 1, which requires the introduction of a few extra parameters:
• ∆t: choose a fixed time step ∆t > 0.
• maxtrials: the optimization time step is decreased up to maxtrials times until the value of the merit function has decreased. If the merit function has not decreased after all maxtrials steps, the smallest step is accepted.
• tolLag: a small threshold for the values of the Lagrange multipliers μ*_i, under which these are considered to be 0 (in our examples, we took tolLag=1e-8). This value should be set in accordance with the machine precision and that of the quadratic programming solver for the dual problem (3.9).
Let us emphasize that these parameters have a quite intuitive and physical meaning, so that the task of assigning their values does not involve fine tuning in practice.
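The resulting step control can be sketched as a simple backtracking loop (a schematic rendering of the rules above; the merit callable and all names are placeholders):

```python
def backtrack_step(x, direction, merit, dt, maxtrials=5):
    """Try x + dt * direction and halve dt until the merit function
    decreases; if it never does within maxtrials attempts, the smallest
    trial step is accepted anyway, as prescribed above."""
    m0 = merit(x)
    for _ in range(maxtrials):
        x_new = [xi + dt * di for xi, di in zip(x, direction)]
        if merit(x_new) < m0:
            return x_new, dt
        dt *= 0.5
    return x_new, dt  # accept the smallest trial step

# Toy quadratic merit: the full step overshoots, the halved step is accepted.
merit = lambda x: (x[0] - 1.0) ** 2
x_new, dt = backtrack_step([0.0], [2.0], merit, dt=1.0)
```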
Importantly, the rescaling induced by the inverse of the correlation matrix (DC_{I(x_n)} DC_{I(x_n)}^T)^{−1} normalizes all the constraints; in particular, the whole Algorithm 1 as outlined below is invariant under multiplication of the constraints by arbitrary positive constants (up to the machine precision for step 3), and no preliminary rescaling of the constraints is required from the user.

Overall algorithm pseudo code
The resulting algorithmic implementation of our null space gradient flow, taking into account both adaptations of Sections 4.1 and 4.2, is summarized in Algorithm 1 below.
Algorithm 1 Discretization of the null space gradient flow (3.26).
2. For all inequality constraints 1 ≤ i ≤ q, compute the tolerance ε_i according to (4.3).
3. Determine the set I(x_n) of active or violated constraints, and the set I^ε(x_n) of constraints violated 'up to ε_i'.
4. Denoting by q̃ := Card(I^ε), solve the dual problem (3.9) to obtain the optimal Lagrange multiplier μ*(x_n). Infer the subset Ĩ^ε(x_n) ⊂ I^ε(x_n) indicating which constraints must remain active (Proposition 4):
Ĩ^ε(x_n) := {i ∈ I^ε(x_n) | μ*_i(x_n) > tolLag}.   (4.9)
5. for k = 1, ..., maxtrials do
   Compute the step
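To fix ideas, here is a self-contained toy transcription of the whole loop in the Euclidean plane, with a single inequality constraint and a fixed time step (it omits the tolerances ε_i and the merit-based step control; all names and parameter values are illustrative). It minimizes ||x − a||² over the unit disk, whose solution is the radial projection of a onto the disk:

```python
import numpy as np

def null_space_flow(a, x0, dt=0.01, alpha_J=1.0, alpha_C=1.0, iters=2000):
    """Explicit Euler discretization of x' = -(alpha_J xi_J + alpha_C xi_C)
    for min ||x - a||^2 subject to h(x) = |x|^2 - 1 <= 0."""
    a, x = np.asarray(a, float), np.array(x0, dtype=float)
    for _ in range(iters):
        gradJ = 2.0 * (x - a)
        h, grad_h = x @ x - 1.0, 2.0 * x
        xi_J, xi_C = gradJ, np.zeros_like(x)
        if h >= 0.0:                                   # saturated or violated
            # dual multiplier of (3.9) for a single active constraint
            mu = max(0.0, -(gradJ @ grad_h) / (grad_h @ grad_h))
            if mu > 0.0:                               # constraint retained:
                xi_J = gradJ - grad_h * (grad_h @ gradJ) / (grad_h @ grad_h)
            xi_C = grad_h * h / (grad_h @ grad_h)      # Gauss-Newton pull-back
        x -= dt * (alpha_J * xi_J + alpha_C * xi_C)
    return x

x_star = null_space_flow(a=(2.0, 0.0), x0=(0.5, 0.5))  # expect x* near (1, 0)
```

Starting inside the disk, the trajectory first follows −∇J, exits the disk, then slides along its boundary until it reaches the projected optimum, while the range space term drives the small residual violation back to zero.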

Application to shape optimization
With the previous material at hand, we are now in a position to present our optimization strategy dedicated to shape and topology optimization problems. In such applications, the optimization takes place within a set X of shapes in R^d (d = 2 or 3), defined in (5.1), where D ⊂ R^d is an enclosing 'hold-all' domain. Since X is not even a vector space, the present context does not fall into the optimization framework described in Sections 2 and 3. However, X can be locally replaced by a subset which is naturally identified with the (infinite-dimensional) Banach space W^{1,∞}(D, R^d) (see below). This is achieved in the framework of Hadamard's method of boundary variations, as explained in Section 5.1. Introducing a suitable Hilbert space V ⊂ W^{1,∞}(D, R^d), the set X can be endowed with a manifold structure, where V plays the role of a tangent space, as we outline in Section 5.2. These facts make it possible to extend our dynamical system (3.1) to this context, up to small adaptations, as described below. Section 5.3 explains several implementation details of Algorithm 1 that are specific to shape optimization.
In particular, we highlight how the classical extension and regularization procedures of shape derivatives are naturally included in our method when using the definition (2.3) of the Hilbertian transposition T .

Hadamard's framework for gradient-based shape optimization
The essence of Hadamard's method (see for instance [3,35,47,57]) is to replace the complicated set of shapes X in (5.1) by the set O defined in (1.11), which has the simpler structure of a Banach space. The induced parametrization of shapes by vector fields θ ∈ W^{1,∞}(D, R^d) gives rise to the following definition of the shape derivative.
Definition 5. A function F : O → R is said to be shape differentiable at Ω ∈ O if the mapping θ ↦ F((I + θ)(Ω)) is Fréchet differentiable at θ = 0. The corresponding differential DF(Ω) : W^{1,∞}(D, R^d) → R is called the shape derivative of F at Ω, and the following expansion holds in the vicinity of θ = 0:
F((I + θ)(Ω)) = F(Ω) + DF(Ω)(θ) + o(||θ||_{W^{1,∞}(D,R^d)}).   (5.2)
When dealing with shape optimization problems of the form (1.1), we consider objective and constraint functions J : O → R, g : O → R^p and h : O → R^q which are shape differentiable in the sense of Definition 5.
Since W^{1,∞}(D, R^d) is not a Hilbert space, the shape derivative DJ(Ω) of J at Ω (and those of g and h) cannot be readily identified with a gradient vector ξ ∈ W^{1,∞}(D, R^d). To circumvent this drawback, we introduce a Hilbert space of vector fields V ⊂ W^{1,∞}(D, R^d), with inner product ⟨•, •⟩_V, and where the inclusion is continuous. This ensures that DJ(Ω), Dg(Ω) and Dh(Ω) are also continuous linear operators on V, hence the definitions of the gradient ∇J(Ω) ∈ V and of the transposed operators Dg^T(Ω) : R^p → V, Dh^T(Ω) : R^q → V with respect to the inner product ⟨•, •⟩_V make sense; see Definition 1. For instance, the gradient ∇J(Ω) ∈ V is obtained by solving the so-called identification problem:
∀θ ∈ V, ⟨∇J(Ω), θ⟩_V = DJ(Ω)(θ).   (5.3)
A typical choice for the Hilbert space V is H^m(D, R^d) with m sufficiently large, equipped with its standard inner product (the inclusion H^m(D, R^d) ⊂ W^{1,∞}(D, R^d) being a consequence of the Sobolev embedding theorem, see [18]). In this case, the identification problem (5.3) boils down to a linear elliptic problem of order 2m. Let us recall that, under mild regularity assumptions on the objective function J(Ω), the shape derivative of J(Ω) can be written in the form of a boundary integral involving only the normal component of the deformation θ (this is the so-called Hadamard structure theorem [35,47,56]). In practice, in all the applications considered hereafter, there exists v_J(Ω) ∈ L¹(∂Ω) such that:
DJ(Ω)(θ) = ∫_{∂Ω} v_J(Ω) θ·n ds.   (5.4)
A common strategy in the literature (see for instance [10,14,21,32,24,45]) consists in simply taking H¹(D, R^d) for the Hilbert space V, equipped with the inner product
⟨θ, θ'⟩_V := ∫_D (γ² ∇θ : ∇θ' + θ·θ') dx,   (5.5)
where γ > 0 is a user-defined parameter which can physically be interpreted as a length scale for the regularity of the deformations θ (typically, γ = 3 hmin, where hmin is the minimum edge length of the mesh discretization). Note that this choice of V is an abuse of the above framework, since H¹(D, R^d) is not included in W^{1,∞}(D, R^d). However, under the very mild assumption v_J(Ω) ∈ L²(∂Ω) (which is for instance satisfied in the situations considered in Section 6), the identification problem (5.3) is still well-posed, because (5.4) defines a continuous linear form on H¹(D, R^d). In such a situation, the identification problem (5.3) with the inner product (5.5) is interpreted as an extension and regularization of the normal velocity v_J(Ω) to the whole domain D. This practice and its consistency with respect to optimization are very classical issues in shape optimization, see [10,14,21,24,45]. In particular, variants can be considered for tuning more finely the smoothness of such extensions, or for prescribing non optimizable boundaries by imposing a zero Dirichlet boundary condition in (5.4). Eventually, the choice V = H¹(D, R^d) is quite convenient, because this space is easily discretized with P1 finite elements. Since this leads to very good results in practice, we shall rely on this strategy in the following.
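The extension-regularization mechanism behind (5.3)-(5.5) can be caricatured in one dimension: a derivative concentrated at a single node is turned into a smooth gradient over the whole domain by solving (γ²K + M)∇J = b, with K and M finite-difference stiffness and (lumped) mass matrices (a schematic sketch, not the paper's finite element implementation):

```python
import numpy as np

def extend_regularize(b, gamma, dx):
    """Solve the discrete identification problem <gradJ, theta>_V = DJ(theta)
    for an H1-type inner product like (5.5): (gamma^2 K + M) gradJ = b."""
    n = len(b)
    K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / dx  # stiffness
    M = dx * np.eye(n)                                             # lumped mass
    return np.linalg.solve(gamma**2 * K + M, b)

n, dx = 101, 0.01
b = np.zeros(n)
b[n // 2] = 1.0                       # 'shape derivative' seen only at one node
gradJ = extend_regularize(b, gamma=3 * dx, dx=dx)
# gradJ is a smooth bump peaked at the source, decaying over the length gamma
```

The parameter γ plays exactly the role of the regularization length scale described above: the larger γ, the wider the region of D over which the boundary information is diffused.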
In light of the previous discussion, the proposed dynamical system (3.1) for tackling shape optimization problems of the form (1.1) is extended and discretized as follows.
(1) The null space and range space directions ξ_J(Ω) and ξ_C(Ω) are computed as elements of V = H¹(D, R^d) thanks to the formulas (3.7) and (3.24). This requires the computation of the gradient ∇J(Ω) and of the transposes Dg^T(Ω), Dh^T(Ω), via the resolution of identification problems such as (5.3). In particular, steps 1 to 4 of Algorithm 1, including the resolution of the dual problem (3.9), are achieved from the knowledge of the Fréchet derivatives and of their transposes.
(2) The update (3.2) of the design from one iteration to the next is performed by
Ω_{n+1} = (I − Δt_n (α_J ξ_J(Ω_n) + α_C ξ_C(Ω_n)))(Ω_n),   (5.6)
and step 5 of Algorithm 1 is adapted accordingly. The numerical procedure to account for the deformation from Ω_n to Ω_{n+1} is presented in Section 5.3.

Manifold structures for shape optimization
Another interpretation of the Hadamard framework of Section 5.1 can be made in terms of manifold structures, see e.g. [13,52]. This makes it possible to extend the material of Sections 2 and 3 to shape optimization purposes by using 'classical' optimization strategies on smooth embedded manifolds M ⊂ R^k. In such a context, a descent direction at a point x_n ∈ M for some objective functional is typically sought as an element ξ_n ∈ T_{x_n}M of the tangent space T_{x_n}M to M at x_n; see e.g. [29,1]. Then one relies on a retraction, i.e. a mapping ρ_{x_n} : T_{x_n}M → M satisfying the following two consistency conditions:
ρ_{x_n}(0) = x_n, and Dρ_{x_n}(0) = Id_{T_{x_n}M}.
The mapping ρ_{x_n} then allows to convert ξ_n into a practical update on M:
x_{n+1} = ρ_{x_n}(−Δt ξ_n),
where Δt > 0 is the descent step; see [2] and Figure 2. Since the new point x_{n+1} belongs to M, this procedure can be repeated iteratively.
In the context of Section 5.1, the set of shapes X plays the role of the manifold M and, by Hadamard's method, in view of (1.11), the tangent space to X at Ω can be identified with W^{1,∞}(D, R^d). The corresponding retraction is, for any θ ∈ W^{1,∞}(D, R^d),
ρ_Ω(θ) := (I + θ)(Ω).   (5.8)
Formally, the set W^{1,∞}(D, R^d) may be interpreted as the tangent space to X at Ω, and the mapping ρ_Ω, which is defined by (1.11) on a neighborhood of θ = 0 in W^{1,∞}(D, R^d), plays the role of a retraction. Finally, the bilinear form ⟨•, •⟩_V introduced in (5.3) can be interpreted as a metric on the 'manifold of shapes' X; see e.g. [52,53] about this idea.

Implementation of the constrained gradient flow for level set based shape optimization
The employed level set framework for numerical shape and topology optimization is recalled in Section 5.3.1.
Further technical details about the practical implementation of Algorithm 1 are then presented in Section 5.3.2.
5.3.1. Numerical shape optimization using the level set method and a mesh evolution strategy
Our numerical representation of shapes and their deformations relies on the level set method, pioneered in [49], then introduced in the shape optimization context in [9,62]. A given shape Ω inside the fixed 'hold-all' domain D is represented by means of a scalar level set function φ : D → R such that:
φ(x) < 0 if x ∈ Ω,   φ(x) = 0 if x ∈ ∂Ω,   φ(x) > 0 otherwise.   (5.9)
The motion of a domain Ω(t) in D evolving over a period of time (0, T), starting from a known shape Ω(0) = Ω, according to a velocity field θ : D → R^d, translates in terms of an associated level set function φ(t, x) (i.e. (5.9) holds at every time t ∈ (0, T)) into the following advection equation:
∂φ/∂t(t, x) + θ(x)·∇φ(t, x) = 0 for (t, x) ∈ (0, T) × D,   φ(0, x) = φ_0(x),   (5.10)
where φ_0 is one level set function for Ω.
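A one-dimensional caricature of the advection update (5.10): the level set function of the 'shape' Ω = {x < 0.3} is transported with a first-order upwind scheme, and its zero level set moves at the prescribed velocity (illustrative sketch, unrelated to the unstructured solver used in practice):

```python
import numpy as np

def advect(phi, v, dx, dt, steps):
    """First-order upwind discretization of phi_t + v phi_x = 0 (v > 0);
    stability requires the CFL condition v * dt / dx <= 1."""
    phi = phi.copy()
    for _ in range(steps):
        phi[1:] -= v * dt / dx * (phi[1:] - phi[:-1])
    return phi

x = np.linspace(0.0, 1.0, 201)
phi0 = x - 0.3                        # negative inside Omega = {x < 0.3}
phi = advect(phi0, v=1.0, dx=x[1] - x[0], dt=0.0025, steps=80)
# the boundary {phi = 0} moves from x = 0.3 to x = 0.3 + v*steps*dt = 0.5
```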
In the implementation of the discretized optimization flow (1.2), passing from the current iteration, indexed by n, to the next one, n + 1, implies the motion of the corresponding shape Ω_n along the descent direction θ_n for a small time step Δt_n. Here, the coefficients α_J and α_C of the update procedure (5.6) may vary from one iteration to the next, as reflected by the n subscript (this slight modification of Algorithm 1 is detailed in Section 5.3.2 below). The level set function φ_n, corresponding to the current shape Ω_n, is updated by solving equation (5.10) on the current mesh T_n of D with θ = θ_n and φ_0 = φ_n. After a time step Δt_n, the new shape Ω_{n+1} is defined as {x ∈ D | φ(Δt_n, x) < 0}.
In the implementation, we use the mesh evolution technique of our previous works [5,32]. In a few words, at every iteration n, the current shape Ω_n is explicitly discretized as a submesh of a triangulated mesh T_n of the whole domain D (see e.g. Figure 10 below). After solving equation (5.10) thanks to an adapted solver (in practice, we use that of our previous work [20]), T_n is remeshed adaptively into a new mesh T_{n+1} featuring a discretization of Ω_{n+1} as a submesh, by using the open-source library Mmg [22].
Remark 7. In our method, the velocity θ ∈ H¹(D, R^d) is a vector field, in contrast with more classical level set methods [9,62], which rather rely on a nonlinear Hamilton-Jacobi equation that, contrary to (5.10), involves only the normal component of θ. In the latter case, it is enough to regularize only the normal component θ·n (a scalar field) of the shape derivative; see [31] for more details.

Adaptive normalizations for the null space and range space directions
A few comments are in order regarding the appropriate scaling of the null and range space steps with respect to the size of the mesh discretization in the practice of Algorithm 1, and the choice of the variable coefficients α_{J,n} and α_{C,n} in the descent direction θ_n given by (5.11).
For stability reasons, the vertices of the current mesh T_n accounting for Ω_n should move by a distance of at most a few mesh elements in order to produce the subsequent shape Ω_{n+1}. Hence, the minimum edge length hmin of the computational mesh is a natural candidate for the limiting step size value h̄ in the discussion of Section 4.1. In our practical implementation, we set Δt = 1 and consider the descent direction
θ_n = −(α_{J,n} ξ_J(Ω_n) + α_{C,n} ξ_C(Ω_n)),   (5.11)
where the coefficients α_J and α_C of the update (5.6) have been replaced by dynamic coefficients α_{J,n} and α_{C,n}.
The parameters α_{J,n} and α_{C,n} scaling the null space and range space steps ξ_J(Ω_n) and ξ_C(Ω_n) are updated dynamically in order to control the step size ||θ_n||_{L∞(D)}; this norm is considered because all values of the displacement θ_n should be of the order of the mesh size. We consider two user-defined parameters A_J and A_C, which are expressed in terms of the minimum edge length hmin for a clearer intuitive meaning. The coefficients α_{J,n} and α_{C,n} are updated at every iteration according to the following rules:
α_{J,n} := A_J hmin / ||ξ_J(Ω_n)||_{L∞(D)} if n < n_0,   α_{J,n} := min(α_{J,n−1}, A_J hmin / ||ξ_J(Ω_n)||_{L∞(D)}) if n ≥ n_0,   (5.13)
α_{C,n} := min(0.9, A_C hmin / max(||ξ_C(Ω_n)||_{L∞(D)}, 1e-9)).
These normalizations ensure that the null space and range space steps always remain smaller than the multiples A_J and A_C of the mesh size:
α_{J,n} ||ξ_J(Ω_n)||_{L∞(D)} ≤ A_J hmin,   α_{C,n} ||ξ_C(Ω_n)||_{L∞(D)} ≤ A_C hmin.
Actually, the null space contribution α_{J,n} ξ_J(Ω_n) of θ_n is scaled so that its size is exactly A_J hmin during the first n_0 iterations. The range space coefficient α_{C,n} is also set to remain smaller than the constant 0.9, in view of the stability condition 0 < α_C Δt < 2 (see Remark 13). The role of the constant 1e-9 is only to avoid a division by 0 when no constraint is active.
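The parts of these rules that are stated explicitly above can be sketched as follows (a schematic transcription: the caps A_J, A_C, the 0.9 bound and the 1e-9 safeguard are as described, while the special treatment of the first n_0 iterations is simplified away):

```python
import numpy as np

def step_coefficients(xi_J, xi_C, hmin, A_J=1.0, A_C=1.0):
    """Scale the null and range space steps against the mesh size:
    alpha_J enforces ||alpha_J xi_J||_inf = A_J * hmin, while alpha_C is
    additionally capped at 0.9 for stability; 1e-9 avoids a division by
    zero when no constraint is active (xi_C = 0)."""
    alpha_J = A_J * hmin / max(np.abs(xi_J).max(), 1e-9)
    alpha_C = min(0.9, A_C * hmin / max(np.abs(xi_C).max(), 1e-9))
    return alpha_J, alpha_C

xi_J = np.array([0.0, 4.0, -2.0])
xi_C = np.array([10.0, 0.0, 0.0])
a_J, a_C = step_coefficients(xi_J, xi_C, hmin=0.02)
# a_J * ||xi_J||_inf = A_J * hmin and a_C * ||xi_C||_inf <= A_C * hmin
```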
Remark 8. Since we measure step sizes with the norm ||θ_n||_{L^∞(D,R^d)} rather than with the Hilbertian norm ||θ_n||_V, the tolerance bounds (4.3) need to be updated with respect to this norm, where it is assumed that the shape derivative of each constraint functional C_i(Ω_n) can be written as a boundary integral, as in (5.4), featuring the scalar field v_{C_i}(Ω_n).

Applications to shape optimization in the design of mechanical structures
In this final section, we illustrate the efficiency of our optimization strategy on practical structural design examples. We first treat in Section 6.1 a structural design problem in thermoelasticity, which was tackled in [64] with an Augmented Lagrangian method. This case study is interesting because it features a situation where an initially violated inequality constraint is not saturated by the final design. Then, we treat in Section 6.2 a multiple load bridge test case in order to show the ability of the method to handle multiple objective criteria or multiple constraint functions.

Minimum compliance problem in thermoelasticity: detection of unsaturated constraints
In this section, we reproduce a test case from Xia and Wang [64] concerned with compliance minimization in thermoelasticity. The objective of our study is to illustrate (i) the efficiency of our null space optimization scheme in contrast with the classical Augmented Lagrangian strategy used in [64], and (ii) the importance of the use of the dual problem (3.9) for discriminating the inequality constraints which must remain saturated in the course of the optimization process. A uniform temperature elevation ∆T is imposed inside the structure, which induces thermal expansion of the material. The boundary of the considered shape is divided into three parts, ∂Ω = Γ_D ∪ Γ_g ∪ Γ, where:
• Γ_D is the union of the (non-optimizable) left and right-hand boundaries of D, on which the structure Ω is clamped;
• Γ_g is a (non-optimizable) small portion of the middle of the bottom boundary, on which a vertical traction load g = (0, −1/|Γ_g|) is applied. In our implementation, we set |Γ_g| = 0.0125;
• Γ is the remaining part of ∂Ω, which is traction-free; it is the only part of ∂Ω subject to optimization.
The setting is reproduced on Figure 3. Both the thermal and traction loads ∆T and g induce an elastic displacement u ∈ H^1(Ω, R^d), given as the solution of the linearized thermoelasticity system featuring the thermoelastic stress tensor σ_s(u, ∆T). The shape derivatives of J and Vol are classically given as boundary integrals (see [64,31]), where n denotes the unit normal vector to ∂Ω, pointing outward from Ω.
The upper bound for the volume of the structure is set to V_target = 0.4. This problem is particularly interesting when it comes to illustrating the relevance of our dual problem strategy in determining whether an active constraint is aligned with the minimization of the objective function or not. Indeed, as we shall see below, the optimized design may not saturate the volume constraint Vol(Ω) ≤ V_target, depending on the considered value of the parameter ∆T.
Following [64], the optimization problem is solved for four values of ∆T (∆T = 0, 5, 10 or 20). In order to illustrate the importance of the resolution of the dual problem (3.9), we also consider a strategy (labeled 'no-dual') obtained by ignoring the corresponding step 4 in Algorithm 1 and by 'naively' setting Ĩ(x_n) = I(x_n) in the subsequent steps.
Results are shown on Figure 4, where the convergence histories for the objective function and the volume constraint are plotted for each test case. The numerical values of the final objective and constraint functionals are provided in Table 1, while the initial and optimized shapes are shown on Figs. 5 and 6. Note that our numerical values do not coincide exactly with those in [64], because their original physical parameters were multiplied by nondimensionalization constants more compatible with our setting. However, we clearly retrieve very similar optimized shapes.
Notice the smooth and rather fast convergence of our optimization strategy, whereas in the original paper (Figure 8 of [64]), convergence is obtained in about 1000 iterations, after many oscillations of the constraint function.
Interestingly, and as observed in [64], we retrieve the fact that the volume constraint Vol(Ω) ≤ V_target is saturated for the first two test cases ∆T = 0 and ∆T = 5, and is not saturated otherwise. In the first two cases, the sets I(Ω_n) and Ĩ(Ω_n) remain identical, hence no difference is observed between the strict application of Algorithm 1 and its 'no-dual' variant (the corresponding convergence histories for the latter strategy are therefore not represented on Figure 4 in these cases). However, significant differences are observed in the two situations ∆T = 10 and ∆T = 20: the strategy labeled 'no-dual' proceeds by enforcing all saturated or violated constraints in I(Ω_n), and not only those in Ĩ(Ω_n) (which is empty in the present case). As a result, it is not able to detect that a better descent direction could be obtained by allowing the constraint to become unsaturated (and, as a matter of fact, 'better' satisfied): the constraint remains saturated and a worse final design is obtained at convergence.

Shape optimization of a bridge structure subjected to multiple loads
Let us now consider the shape optimization of a bridge-like structure Ω contained in a two-dimensional rectangular hold-all domain D ⊂ R² of size 10 × 2. The purpose of this part is to show that the null space gradient flow is able to handle multiple constraints.
The boundary ∂Ω of the bridge is divided into disjoint regions as ∂Ω = Γ_D ∪ Γ_0 ∪ · · · ∪ Γ_8 ∪ Γ, where:
• Γ_D is a non-optimizable part of the boundary on which the structure Ω is clamped, made of two segments of unit length at the lower extremities of D;
• for i = 0, ..., 8, Γ_i is a non-optimizable subset of the upper side of D with respective abscissa [10i/9, 10(i + 1)/9]; Γ_i is subjected to a unit, vertical, downward traction load g_i = (0, −1);
• the remaining region Γ is traction-free, and it is the only region of ∂Ω subject to optimization.

Non-optimizable material layers of width 0.1 are additionally imposed on the upper part of the domain D and above each component of Γ_D, and we impose that the structure does not infringe on a thin layer of void at the bottom of ∂D; see Figs. 7 and 8, where Figure 7 depicts the geometric setting of this multiple load test case. We consider nine different load cases, obtained by applying successively and exclusively each of the loads g_i on the region Γ_i. In each situation, the corresponding elastic displacement u_i is the unique solution in H^1(Ω, R^d) to the linearized elasticity system (6.3). The Young's modulus and the Poisson's ratio are set to E = 15 and ν = 0.35, which corresponds to the Lamé parameters λ = 12.96 and µ = 5.56. Let us emphasize once again (see Section 5.3.1) that the shape is exactly meshed at each iteration (see Figure 10 below), so that each state equation (6.3) is solved by means of a standard finite element method on the meshed subdomain Ω_n (without resorting to ersatz material approaches as in e.g. [9]).
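The stated Lamé coefficients follow from E and ν through the usual conversion formulas; a small self-contained check (the function name is ours):

```python
def lame_parameters(E, nu):
    """Lame coefficients lambda and mu from Young's modulus E and
    Poisson's ratio nu (standard linearized elasticity conversion)."""
    lam = E * nu / ((1 + nu) * (1 - 2 * nu))
    mu = E / (2 * (1 + nu))
    return lam, mu

lam, mu = lame_parameters(E=15.0, nu=0.35)
print(round(lam, 2), round(mu, 2))  # 12.96 5.56
```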
Starting from the initial structure Ω_0 depicted in Figure 8, we consider as objective and constraint functionals the volume Vol(Ω) of the structure Ω and the compliances C_i(Ω) associated with each load case g_i; their shape derivatives are classical (see e.g. [9,35]). In what follows, two possible configurations are investigated for the shape optimization of the bridge structure, featuring either multiple constraint functions (in Section 6.2.1) or multiple objective criteria (in Section 6.2.2).

Volume minimization with maximum compliance constraint
At first, the volume Vol(Ω) is minimized, and we require that each individual compliance C_i(Ω) does not exceed a given threshold C:

(6.6) min_{Ω∈X} Vol(Ω) s.t. C_i(Ω) ≤ C for all i ∈ I,

where I ⊂ {0, 1, ..., 8} is a set of indices selecting the considered load cases. We solve (6.6) in the following three configurations:
(1) Case 1: single load case, I = {4} (only the central load g_4 is applied);
(2) Case 2: three load cases, I = {0, 4, 8} (the central load g_4 and the two extreme loads g_0 and g_8 are applied);
(3) Case 3: all load cases, I = {0, 1, ..., 8} (all nine loads are considered).
The value of C in (6.6) is set to a fraction of the maximum of the compliances C_i(Ω_0) = ∫_{Ω_0} Ae(u_i) : e(u_i) dx of the initial design Ω_0 (reported on Figure 8) (6.7). Let us emphasize that for this example (and the next ones), no fine tuning of the algorithm parameters A_J and A_C of Section 5.3.2 (determining the update of the values of α_{J,n} and α_{C,n} in (5.12)) is required. The only intuition guiding our choice for this particular test case is that the value of A_J should be set lower than A_C. Indeed, a too large value of A_J might entail a too fast decrease of the volume, which would incur dramatic topological changes violating the rigidity constraints. These parameters are therefore set to A_J = 1 and A_C = 2 for this test case. The minimum mesh size is h_min = 0.03. The optimized shapes obtained in the three aforementioned situations are represented on Figure 9. The meshes of the initial and final designs, as well as several intermediate shapes corresponding to the nine load test case, are shown on Figs. 10 and 11. The convergence histories in the three situations are reported on Figs. 12 to 14. They allow to verify the decrease of the objective function even after the saturation of the constraints. Note that for this example and the one to follow, the sets I(Ω_n) and Ĩ(Ω_n) (see Algorithm 1) happen to coincide at every iteration. As expected, the optimum value found for the volume of the solid distribution increases with the number of constraints. The major structural change between the different situations is the addition of extra vertical bars of material near the extremities of the structure.

For the min/max problem (6.8) of Section 6.2.2, a classical reformulation introduces an additional slack variable m ∈ R:

(6.9) min_{Ω∈X, m∈R} m s.t. C_i(Ω) ≤ m for all i ∈ I, Vol(Ω) = ρ_0 Vol(D).

The optimization is now performed with respect to both the slack variable m and the domain geometry Ω, which demands minor adaptations of our optimization algorithm (similar e.g. to those in Section 3.5): the optimization set X × R is equipped with the tensorized tangent space Ṽ = V × R, and differentials are identified to gradients thanks to the inner product ⟨(θ, s), (θ′, s′)⟩_Ṽ := ⟨θ, θ′⟩_V + s s′, where ⟨·, ·⟩_V is the scalar product of (5.5). The slack variable m is initialized with the maximum value of the compliance of the initial structure Ω_0 over all the considered loads, m_0 := max_{i∈I} C_i(Ω_0), and its values m_n are then updated along with the shape Ω_n according to Algorithm 1.
The resulting optimized structures are shown on Figure 15 for each of the three considered configurations, and the associated convergence histories are displayed on Figs. 16 to 18 for the single, triple and nine load cases, respectively. Note that the sudden, abrupt peaks on the constraint curves correspond to topological changes (e.g. at iteration 38 for the nine load case), to which the displacements corresponding to the extremal loads g_0 and g_8 are especially sensitive. We observe the decrease of all the functionals C_i(Ω) even after all the inequality constraints have been saturated, which occurs as soon as the compliances reach a common value. As expected, the optimized design found for the nine load minimum compliance case (Figure 9c) is similar (up to a few bars) to the corresponding one found for the volume minimization (Figure 15c): indeed, both cases reach at convergence a volume fraction Vol(Ω) = 0.5 Vol(D) and a maximum compliance max_i C_i(Ω) ≈ 1.30.

This is sufficient for optimization purposes. One can indeed verify that, in the latter context:
(1) At first order, the constraints vanish at a geometric rate: g(x_{n+1}) = (1 − α_C ∆t) g(x_n) + o(∆t).
(2) If x* is an accumulation point of the sequence (x_n)_{n∈N}, then g(x*) = 0 and x* is a KKT point of the problem (2.1), satisfying (2.6).
Remark 10. In our design of the update rule (2.15) with (2.7) and (2.8), it is possible to control more accurately the pace at which each of the constraints decreases: let us indeed introduce a diagonal matrix of positive coefficients K = diag(κ_i)_{1≤i≤p} and replace the definition (2.8) of ξ_C(x) by ξ_C(x) := Dg^T (Dg Dg^T)^{−1} K g(x).
Then it can be shown, along the lines of the previous discussion, that each constraint function g_i decreases at its own rate κ_i α_C along the solution x(t) of (2.15): ∀t ∈ [0, T], g_i(x(t)) = e^{−κ_i α_C t} g_i(x_0).

We now prove Proposition 5.

Proof.
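This rate control can be checked numerically on a toy affine model (our own illustration, not taken from the paper; A, b and the initial point are arbitrary). For g(x) = Ax − b, the flow ẋ = −α_C ξ_C(x) with the modified ξ_C satisfies d/dt g = −α_C K g, so each component decays at its own exponential rate:

```python
import numpy as np

# Explicit Euler integration of x' = -alpha_C * A^T (A A^T)^{-1} K g(x)
# for affine constraints g(x) = A x - b: each g_i should follow
# g_i(t) = exp(-kappa_i * alpha_C * t) * g_i(0), as in Remark 10.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))
b = rng.standard_normal(2)
K = np.diag([1.0, 3.0])                  # kappa_1 = 1, kappa_2 = 3
alpha_C, dt, T = 1.0, 1e-3, 1.0
x = rng.standard_normal(4)
g0 = A @ x - b
AAT_inv = np.linalg.inv(A @ A.T)
for _ in range(int(T / dt)):
    g = A @ x - b
    x = x - dt * alpha_C * (A.T @ (AAT_inv @ (K @ g)))
g_T = A @ x - b
expected = np.exp(-np.diag(K) * alpha_C * T) * g0
ok = np.allclose(g_T, expected, rtol=1e-2)
print(ok)  # True
```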
Remark 11. The assumption (a) in Proposition 5, whereby the index set Ĩ(x(t)) remains constant, is essentially made to ensure that the right-hand side of the flow (3.26) is continuous. Indeed, in such a case, the range space direction ξ_C(x(t)) is continuous by its definition (3.7), while the null space step ξ_J(x(t)) is continuous because it can be shown that the multipliers (λ*(x(t)), µ*(x(t))) defined by (3.9) are continuous functions. When the sets I(x) or Ĩ(x) are subject to change (corresponding to inequality constraints becoming active or inactive), the ODE (3.26) has a discontinuous right-hand side and is only defined formally; we conjecture that a rigorous mathematical meaning could still be provided in a weaker sense within the theory of nonsmooth ODEs, see e.g. [25,33] or [11]. At a time T corresponding to a sudden change of the index set Ĩ(x(t)), we assume that the solution x(t) can be extended by restarting the ODE (3.26) with the new index set Ĩ(x(T)). Under this circumstance and by construction of ξ_J(x(t)) and ξ_C(x(t)), the bound h_{Ĩ(x_0)}(x(t)) ≤ e^{−α_C t} h_{Ĩ(x_0)}(x(0)) still holds, while the other constraints remain saturated or satisfied for t ≥ T: h_{Ĩ(x(t))\Ĩ(x_0)} = 0. Hence, assuming this procedure can be repeated for all times, all constraints are asymptotically satisfied. Properties (2) and (3) then remain true, up to an adjustment of the constant C in (3.29) (which can be taken global since there are finitely many possible sets Ĩ(x(t))). There might exist situations where the set of saturated constraints Ĩ(x(t)) oscillates indefinitely. However, (2) states that x(t) always keeps improving (in the sense of (3.29)), and (3) states that if x(t) eventually converges, it is necessarily towards a KKT point.
Remark 12. Assuming that the index set Ĩ(x(t)) eventually becomes stationary (which is expected to be the case under suitable regularity assumptions on the admissible set), the ODE (3.26) reduces to the one considered in [51,65], involving only equality constraints. In such circumstances, it is possible to prove that all limit points of x(t) are stationary points (satisfying, in our case, the KKT condition (3.30)); see Theorem 2.3 of [51].
Remark 13. In practice, the analysis of Proposition 5 is sufficient because, similarly to the conclusions of Remark 9, analogous properties hold for the discrete scheme x_{n+1} = x_n − ∆t (α_J ξ_J(x_n) + α_C ξ_C(x_n)). Indeed, one can easily check that:
(1) Up to first order, the violation of the constraints vanishes at a geometric rate of factor (1 − α_C ∆t) per iteration. This suggests that, in order to obtain a stable scheme, one must a priori select α_C and ∆t such that 0 < α_C ∆t < 2.
(2) If x* is an accumulation point of the sequence (x_n)_{n∈N}, then x* is feasible, i.e. C_{Ĩ(x*)}(x*) = 0, and it is a KKT point for (1.1).
A flexibility of this ODE approach is that, at the continuous level, the results of Proposition 5 do not depend on the values of the parameters α_J > 0 and α_C > 0. Therefore, the convergence of the discrete scheme towards the continuous trajectory should hold as soon as the discretization step size ∆t > 0 is sufficiently small. These arguments however remain formal: a rigorous statement is beyond the scope of this work and would require a careful analysis under suitable regularity assumptions; see e.g. [43] for an instance of work discussing the relations between the properties of continuous ODEs and those of their associated discretization schemes.
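The stability requirement 0 < α_C ∆t < 2 can be observed on a toy single-constraint example (again our own sketch, not the paper's algorithm; the function name is hypothetical). For an affine constraint g(x) = a · x − b, the discrete range space update multiplies the violation by exactly (1 − α_C ∆t) at every iteration:

```python
import numpy as np

def violation_after(n_steps, alpha_C, dt=1.0):
    """Apply n_steps of the discrete range space update for a single
    affine constraint g(x) = a.x - b; the violation is multiplied by
    (1 - alpha_C * dt) at each step, hence decays iff |1 - alpha_C*dt| < 1."""
    a, b = np.array([1.0, 2.0]), 1.0
    x = np.array([3.0, 0.0])                      # g(x0) = 2
    for _ in range(n_steps):
        g = a @ x - b
        x = x - dt * alpha_C * g * a / (a @ a)    # xi_C = a (a.a)^-1 g
    return abs(a @ x - b)

print(violation_after(20, alpha_C=0.9) < 1e-6)    # True: decays
print(violation_after(20, alpha_C=2.5) > 1.0)     # True: blows up
```

With α_C ∆t = 0.9 the violation contracts by a factor 0.1 per step; with α_C ∆t = 2.5 the factor is −1.5 and the iterates oscillate with growing amplitude.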

Figure 1. An example of an optimization trajectory produced by our null space gradient flow (1.2) (labeled 'NLSPACE'). Trajectories travel tangentially to an optimal subset Ĩ(x) ⊂ I(x) of the active constraints I(x), which is determined by a dual problem (see Section 3). A less optimal trajectory (labeled 'NLSPACE (no dual)') is obtained if the set Ĩ(x) is not identified, because it is unable to escape the tangent space to the constraints labeled by I(x).

with ξ_J and ξ_C given by (3.7) and (3.24), exist on some interval [0, T] for T > 0, and are such that:
(a) the set Ĩ(x(t)) defined in (1.6) is constant over [0, T]: ∀t ∈ [0, T], Ĩ(x(t)) = Ĩ(x_0);
(b) the constraints remain qualified along the flow x(t), in the sense of the LICQ condition (3.3).
and S(x_n) = (DC_{I(x_n)}(x_n) DC_{I(x_n)}(x_n)^T)^{−1} is symmetric positive definite. Then (4.6) is a gradient step for the decrease of the merit function merit_{x_n}, namely:

Figure 2. Optimization on a manifold M: a retraction map ρ_{x_n} is used to project a tangential motion ∆t ξ_n ∈ T_{x_n}M from x_n ∈ M back onto the optimization domain M.

Figure 4. Convergence histories for the thermoelasticity test case of Section 6.1.

Figure 5. Initial design considered for each of the test cases of Section 6.1.

Figure 6. Final designs computed for each of the test cases of Section 6.1.

Figure 8. Initialization Ω_0 (solid in black) for the shape optimization examples of Section 6.2. The thin white layer at the bottom is a non-optimizable part of the domain.

Min/max compliance optimization with a volume constraint

Now, the maximum value of the compliances C_i(Ω) is minimized under an equality volume constraint:

(6.8) min_{Ω∈X} max_{i∈I} C_i(Ω) s.t. Vol(Ω) = ρ_0 Vol(D).

(c) All nine loads.

Figure 9. Optimized shapes for the three possible configurations of the volume minimization problem subject to maximum compliance constraints (Section 6.2.1).

Figure 10. Meshes of the initial and final shapes for the nine load case of Figure 9c (Section 6.2.1).

Figure 12. Convergence histories for the single load case of Section 6.2.1.

Figure 13. Convergence history curves for the three load case of Section 6.2.1.

Figure 14. Convergence history curves for the nine load case of Section 6.2.1.

(c) All nine loads.

Figure 15. Optimized shapes for the three possible configurations of the min/max optimization problem (6.9) of Section 6.2.2.

Figure 16. Convergence history curves for the single load case of Section 6.2.2. Inequality constraint values: H_i = C_i − m.

Figure 17. Convergence history curves for the three load case of Section 6.2.2.

Figure 18. Convergence history curves for the nine load case of Section 6.2.2.

Table 1. Optimized compliance and volume values for the thermoelasticity test case of Section 6.1. The results are analogous to those of [64].