Approximation of null controls for semilinear heat equations using a least-squares approach

The null distributed controllability of the semilinear heat equation $y_t-\Delta y + g(y)=f \,1_{\omega}$, assuming that $g$ satisfies the growth condition $g(s)/(\vert s\vert \log^{3/2}(1+\vert s\vert))\rightarrow 0$ as $\vert s\vert \rightarrow \infty$ and that $g^\prime\in L^\infty_{loc}(\mathbb{R})$, has been obtained by Fern\'andez-Cara and Zuazua in 2000. The proof, based on a fixed point argument, makes use of precise estimates of the observability constant for a linearized heat equation. It does not, however, provide an explicit construction of a null control. Assuming that $g^\prime\in W^{s,\infty}(\mathbb{R})$ for some $s\in (0,1]$, we construct an explicit sequence converging strongly to a null control for the solution of the semilinear equation. The method, based on a least-squares approach, generalizes Newton type methods and guarantees convergence whatever the initial element of the sequence may be. In particular, after a finite number of iterations, the convergence is super-linear with a rate equal to $1+s$. Numerical experiments in the one dimensional setting support our analysis.

The system (1) is said to be controllable at time $T$ if, for any $u_0 \in L^2(\Omega)$ and any globally defined bounded trajectory $y^\star \in C^0([0,T];L^2(\Omega))$ (corresponding to the data $u_0^\star \in L^2(\Omega)$ and $f^\star \in L^2(q_T)$), there exist controls $f \in L^2(q_T)$ and associated states $y$ that are again globally defined in $[0,T]$ and satisfy (4). We refer to [5] for an overview of control problems in nonlinear situations. The uniform controllability strongly depends on the nonlinearity $g$. Fernández-Cara and Zuazua proved in [13] that if $g$ is too "super-linear" at infinity, then, for some initial data, the control cannot compensate the blow-up phenomenon occurring in $\Omega\setminus\omega$:

Theorem 1 ([13]) There exist locally Lipschitz-continuous functions $g$ with $g(0)=0$ and $|g(s)| \sim |s|\log^p(1+|s|)$ as $|s|\to\infty$, $p>2$, such that (1) fails to be controllable for all $T>0$.
On the other hand, Fernández-Cara and Zuazua also proved that if p is small enough, then the controllability holds true uniformly.
Theorem 2 ([13]) Assume that $g$ satisfies the growth condition (6). Then (1) is controllable at time $T$.
Therefore, if $|g(s)|$ does not grow at infinity faster than $|s|\log^p(1+|s|)$ for some $p < 3/2$, then (1) is controllable. This result extends [9], which obtained uniform controllability for any $p < 1$. We also mention [1], which gives the same result assuming an additional sign condition on $g$, namely $g(s)s \ge -C(1+s^2)$ for all $s \in \mathbb{R}$ and some $C > 0$. The problem remains open when $g$ behaves at infinity like $|s|\log^p(1+|s|)$ with $3/2 \le p \le 2$. We mention however the recent work of Le Balc'h [16], where uniform controllability results are obtained for $p \le 2$ assuming additional sign conditions on $g$, notably that $g(s) > 0$ for $s > 0$ and $g(s) < 0$ for $s < 0$. This condition is not satisfied for $g(s) = -s\log^p(1+|s|)$. Let us also mention [6], in the context of Theorem 1, where a positive boundary controllability result is proved for a specific class of initial and final data and $T$ large enough.
In the sequel, for simplicity, we shall assume that $g(0) = 0$ and that $f^\star \equiv 0$, $u_0^\star \equiv 0$, so that $y^\star$ is the null trajectory. The proof given in [13] is based on a fixed point method. Precisely, it is shown that the operator $\Lambda : L^\infty(Q_T) \to L^\infty(Q_T)$, where $y_z := \Lambda z$ is a null controlled solution of the linear boundary value problem

$y_{z,t} - \Delta y_z + y_z\,\tilde g(z) = f_z 1_\omega$ in $Q_T$, $\quad y_z = 0$ on $\Sigma_T$, $\quad y_z(\cdot,0) = u_0$ in $\Omega$,

with $\tilde g(s) := g(s)/s$ for $s \neq 0$, maps a closed ball $B(0,M) \subset L^\infty(Q_T)$ into itself, for some $M > 0$. Kakutani's fixed point theorem then provides the existence of at least one fixed point of the operator $\Lambda$, which is also a controlled solution of (1).
The main goal of this work is to construct an approximation of the controllability problem associated with (1), that is, to construct an explicit sequence $(f_k)_{k\in\mathbb{N}}$ converging strongly toward a null control for (1). A natural strategy is to take advantage of the method used in [16,13] and consider the Picard iterates associated with the operator $\Lambda$: $y_{k+1} = \Lambda(y_k)$, $k \ge 0$, initialized with any element $y_0 \in B(0,M)$. The corresponding sequence of controls is $(f_k)_{k\in\mathbb{N}}$, where $f_k \in L^2(q_T)$ is a null control for $y_k$, solution of

$y_{k,t} - \Delta y_k + y_k\,\tilde g(y_{k-1}) = f_k 1_\omega$ in $Q_T$, $\quad y_k = 0$ on $\Sigma_T$, $\quad y_k(\cdot,0) = u_0$ in $\Omega$. (8)

Numerical experiments for $d = 1$ reported in [11] exhibit the non-convergence of the sequences $(y_k)_{k\in\mathbb{N}}$ and $(f_k)_{k\in\mathbb{N}}$ for some initial conditions large enough. This phenomenon is related to the fact that the operator $\Lambda$ is a priori not contracting. We also refer to [2] where this strategy is implemented. Still in the one dimensional case, a least-squares type approach, based on the minimization over $L^2(Q_T)$ of a functional $R$, is introduced and analyzed in [11]. Assuming that $\tilde g \in C^1(\mathbb{R})$ and $g' \in L^\infty(\mathbb{R})$, it is proved first that $R \in C^1(L^2(Q_T);\mathbb{R}^+)$ and secondly that, if $\|u_0\|_{L^\infty(\Omega)}$ is small enough, then any critical point of $R$ is a fixed point of $\Lambda$. Under this smallness assumption on the data, numerical experiments reported in [11] display the convergence of minimizing sequences for $R$ (based on a gradient method) and a better behavior than the Picard iterates. The analysis of convergence is however not performed there. As is usual for nonlinear problems, and as considered in [11], we may also employ a Newton type method to find a zero of a mapping $F : Y \to W$, for some appropriate Hilbert spaces $Y$ and $W$ (see below).
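The possible failure of such Picard iterates when $\Lambda$ is not a contraction can be pictured on a scalar analogue. The sketch below (Python; the logistic map is a purely illustrative stand-in for $\Lambda$ and none of these objects appear in the paper) runs the same fixed-point iteration on a map that contracts near its fixed point and on one that does not.

```python
def picard(Lam, y0, n_iter=200):
    """Fixed-point (Picard) iterates y_{k+1} = Lam(y_k), k = 0..n_iter-1."""
    ys = [y0]
    for _ in range(n_iter):
        ys.append(Lam(ys[-1]))
    return ys

# Contractive near its fixed point 0.6 (|Lam'(0.6)| = 0.5): iterates converge.
ys = picard(lambda y: 2.5 * y * (1.0 - y), 0.3)

# No stable fixed point (chaotic regime): iterates keep oscillating,
# mirroring the observed divergence of (8) for large initial data.
zs = picard(lambda y: 3.8 * y * (1.0 - y), 0.3)
```

The first sequence settles at the fixed point, while the second wanders indefinitely; this is the qualitative obstruction that motivates replacing the plain fixed-point iteration by a minimizing sequence.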
It is shown for $d = 1$ in [11] that, if $g \in C^1(\mathbb{R})$ and $g' \in L^\infty(\mathbb{R})$, then $F \in C^1(Y;W)$, allowing one to derive the Newton iterative sequence: given $(y_0,f_0)$ in $Y$, the sequence $(y_k,f_k)_{k\in\mathbb{N}}$ is defined iteratively, the correction at each step solving a linearized controllability problem with final condition $Y_k(\cdot,T) = -y_k(\cdot,T)$ in $\Omega$. Once again, numerical experiments for $d = 1$ in [11] exhibit the lack of convergence of the Newton method for large enough initial conditions, for which the solution $y$ is not close enough to the zero trajectory. As far as we know, the construction of a convergent approximation $(f_k)_{k\in\mathbb{N}}$ in the general case where the initial data to be controlled is arbitrary in $L^2(\Omega)$ remains an open issue. Still assuming that $g' \in L^\infty(\mathbb{R})$, and in addition that there exists some $s \in (0,1]$ such that $\sup_{a,b\in\mathbb{R},\, a\neq b} |g'(a)-g'(b)|/|a-b|^s < \infty$, we construct, for any initial data $u_0 \in L^2(\Omega)$, a sequence $(f_k)_{k\in\mathbb{N}}$ converging strongly toward a control for (1). Moreover, after a finite number of iterates related to the norm $\|g'\|_{L^\infty(\mathbb{R})}$, the convergence is super-linear, with a rate equal to $1+s$. This is done (following and improving [20], devoted to a linear case) by introducing a quadratic functional which measures how close a pair $(y,f) \in Y$ is to a controlled solution of (1), and then by determining a particular minimizing sequence enjoying the announced property. A natural example of such an error (or least-squares) functional is given by $E(y,f) := \frac{1}{2}\|F(y,f)\|_W^2$, to be minimized over $Y$. In view of the controllability results for (1), the non-negative functional $E$ achieves its global minimum, equal to zero, at any controlled pair $(y,f) \in Y$ of (1).
The paper is organized as follows. In Section 2, we first derive a controllability result for a linearized heat equation with potential in $L^\infty(Q_T)$ and source term in $L^2(0,T;H^{-1}(\Omega))$. Then, in Section 3, we define the least-squares functional $E$ and the corresponding optimization problem (26) over the Hilbert space $A$. We show that $E$ is Gâteaux-differentiable over $A$ and that any critical point $(y,f)$ of $E$ for which $g'(y)$ belongs to $L^\infty(Q_T)$ is also a zero of $E$ (see Proposition 4). This is done by introducing a descent direction $(Y^1,F^1)$ for $E(y,f)$ for which $E'(y,f)\cdot(Y^1,F^1)$ is proportional to $E(y,f)$. Then, assuming that the nonlinear function $g$ is such that $g'$ belongs to $W^{s,\infty}(\mathbb{R})$ for some $s \in (0,1]$, we determine a minimizing sequence based on $(Y^1,F^1)$ which converges strongly to a controlled pair for the semilinear heat equation (1). Moreover, we prove that, after a finite number of iterates, the convergence enjoys a rate equal to $1+s$ (see Theorem 3 for $s = 1$ and Theorem 4 for $s \in (0,1)$). We also emphasize that this least-squares approach coincides with the damped Newton method one may use to find a zero of a mapping similar to $F$ mentioned above; we refer to Remark 7. This explains the convergence of our approach with a super-linear rate. Section 4 gives some numerical illustrations of our result in the one dimensional case, for a nonlinear function $g$ with $g' \in W^{1,\infty}(\mathbb{R})$. We conclude in Section 5 with some perspectives. As far as we know, the analysis of convergence presented in this work, though under some restrictive hypotheses on the nonlinear function $g$, is the first one in the context of controllability for partial differential equations.
Throughout the text, we shall denote by $\|\cdot\|_\infty$ the usual norm in $L^\infty(\mathbb{R})$, by $(\cdot,\cdot)_X$ the scalar product of $X$ (if $X$ is a Hilbert space) and by $\langle\cdot,\cdot\rangle_{X,Y}$ the duality product between the spaces $X$ and $Y$.

A controllability result for a linearized heat equation with potential in $L^\infty(Q_T)$
We give in this section a controllability result for a linear heat equation with potential in $L^\infty(Q_T)$ and right-hand side in $L^2(0,T;H^{-1}(\Omega))$. As this work concerns the null controllability of parabolic equations, we shall make use of Carleman type weights introduced in this context, notably in [14] (we also refer to [10] for a review). Here, we assume that such weights $\rho$, $\rho_0$, $\rho_1$ and $\rho_2$ blow up as $t \to T^-$ and satisfy (12), where $m > 1$ and $\eta_0 \in C(\overline\Omega)$ satisfies $\eta_0 > 0$ in $\Omega$, $\eta_0 = 0$ on $\partial\Omega$ and $|\nabla\eta_0| > 0$ in $\overline\Omega\setminus\omega$ (see [10], Lemma 1.2, p. 1401).
In the next section, we shall make use of the following controllability result.
The controlled solution also satisfies, for some constant $C = C(\Omega,\omega,T,\|A\|_\infty)$, the estimate (16).

Proof. Let us first introduce the bilinear form associated with the adjoint operator $L^\star_A q := -q_t - \Delta q + Aq$; this form is a scalar product on $P_0$ (see [12]). The completion $P$ of $P_0$ for the norm $\|\cdot\|_P$ associated with this scalar product is a Hilbert space, and the following result, proved in [14], holds.
Lemma 1 There exists $C = C(\Omega,\omega,T,\|A\|_\infty) > 0$ such that the following Carleman estimate (17) holds for all $p \in P$.

Remark 1 We denote by $P$ (instead of $P_A$) the completion of $P_0$ for the norm $\|\cdot\|_P$, since $P$ does not depend on $A$ (see [11]).
Lemma 2 There exists $C = C(\Omega,\omega,T,\|A\|_\infty) > 0$ such that the following observability inequality holds for all $p \in P$.

Proof. From the definition of $\rho_0$, $\rho_1$ and $\rho_2$, each imbedding is continuous; the result then follows from Lemma 1. ✷

Lemma 3 There exists a unique $p \in P$ solution of (19), and this solution satisfies the following estimate.

Proof. Since $\rho_1 \le T^{1/2}\rho_2$ a.e. in $(0,T)$, we deduce from the Carleman estimate (17) that the linear form $L_1$ is continuous. From (18) we easily deduce that the linear form $L_2 : P \to \mathbb{R}$, $q \mapsto \int_\Omega z_0\, q(0)$, is also continuous. Using Riesz's theorem, we conclude that there exists exactly one solution $p \in P$ of (19). ✷

Let us now introduce the convex set $C(z_0,T)$: for any element $(z,v)$ of this set, $z$ must coincide with the unique weak solution of (13) associated with $v$.
We can now claim that $C(z_0,T)$ is non-empty. Indeed, the pair $(z,v)$ constructed from the solution $p$ of (19) belongs to $C(z_0,T)$ and satisfies the following estimate, where $C = C(\Omega,\omega,T,\|A\|_\infty) > 0$.
Proof. Let us prove that $(z,v)$ belongs to $C(z_0,T)$. From the definition of $P$, $\rho z \in L^2(Q_T)$ and $\rho_0 v \in L^2(q_T)$, and from the definition of $\rho$, $\rho_0$, $\rho_2$, $z \in L^2(Q_T)$ and $v \in L^2(q_T)$. In view of (19), and since, from the definition of $\rho_2$, $B \in L^2(0,T;H^{-1}(\Omega))$, $z$ is the solution of (13) associated with $v$ in the transposition sense. Thus $C(z_0,T) \neq \emptyset$. ✷

Let us now consider the extremal problem (23), introduced by Fursikov and Imanuvilov [14]. The problem (23) possesses at most one solution in $C(z_0,T)$. More precisely, $y$ being the solution of (13) associated with $w$ in the transposition sense, the pair $(z,v)$ solves (23).
To finish the proof of Proposition 1, it suffices to prove that $(z,v)$ satisfies the estimate (16). Since $z$ is a weak solution of (13) associated with $v$, $z \in L^2(0,T;H^1_0(\Omega))$ and $z_t \in L^2(0,T;H^{-1}(\Omega))$. Multiplying (13) by $\rho_1^2 z$ and integrating by parts, we obtain, a.e. $t$ in $(0,T)$, the corresponding energy estimates; we then easily deduce, using (21), a bound valid for all $t \in [0,T]$, which gives (16) and concludes the proof of Proposition 1. ✷

The least-squares method and its analysis
For any $s \in [0,1]$, we define the space $W_s := \{g \in C^1(\mathbb{R}) :\ g' \in L^\infty(\mathbb{R}),\ \sup_{a,b\in\mathbb{R},\, a\neq b} |g'(a)-g'(b)|/|a-b|^s < \infty\}$. In the sequel, we shall assume that there exists some $s \in (0,1]$ for which the nonlinear function $g$ belongs to $W_s$. Remark that any $g \in W_s$, for some $s \in [0,1]$, satisfies hypotheses (2) and (6). We shall also assume that $u_0 \in L^2(\Omega)$.

The least-squares method
We introduce the vector space $A_0$, where $\rho$, $\rho_2$, $\rho_1$ and $\rho_0$ are defined in (12). Since $L^2(0,T;H^{-1}(\Omega))$ is a Hilbert space, $A_0$ endowed with the following scalar product is a Hilbert space; the corresponding norm is $\|(y,f)\|_{A_0} = \sqrt{((y,f),(y,f))_{A_0}}$. We also consider the convex set $A$, so that we can write $A = (y,f) + A_0$ for any element $(y,f) \in A$. We endow $A$ with the same norm. Clearly, if $(y,f) \in A$, then $y \in C([0,T];L^2(\Omega))$ and, since $\rho$ blows up as $t \to T^-$ while $\rho\, y \in L^2(Q_T)$, $y(\cdot,T) = 0$. The null controllability requirement is therefore incorporated in the spaces $A_0$ and $A$.
For any fixed $(y,f) \in A$, we can now consider the following extremal problem (26), justifying the least-squares terminology we have used.
Let us remark that, if $g \in W_s$ for some $s \ge 0$, then $g$ is Lipschitz and thus, since $g(0) = 0$, there exists $K > 0$ such that $|g(\xi)| \le K|\xi|$ for all $\xi \in \mathbb{R}$. Consequently, $\rho_2\, g(y) \in L^2(Q_T)$ (and then $\rho_2\, g(y) \in L^2(0,T;H^{-1}(\Omega))$). Since any $g \in W_s$ satisfies hypotheses (2) and (6), the controllability result of Theorem 2 given in [13] implies the existence of at least one pair $(y,f) \in A$ such that $E(y,f) = 0$. The extremal problem (26) therefore admits solutions. Conversely, any pair $(y,f) \in A$ for which $E(y,f)$ vanishes is a controlled pair for (1). In this sense, the functional $E$ is a so-called error functional, which measures the deviation of $(y,f)$ from being a solution of the underlying nonlinear equation. We emphasize that the $L^2(0,T;H^{-1}(\Omega))$ norm in $E$ indicates that we are looking for weak solutions of the parabolic equation (1). We refer to [18], where a similar so-called weak least-squares method is employed to approximate the solutions of the unsteady Navier-Stokes equation.
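On a crude space-time grid, a discrete analogue of the error functional can be evaluated directly. The sketch below (a hypothetical helper `discrete_E`, with a plain weighted $\ell^2$ norm in place of the $L^2(0,T;H^{-1}(\Omega))$ norm of the paper, forward differences in time and centered differences in space; grid sizes are arbitrary illustrative values) is only meant to illustrate that $E$ vanishes exactly at discrete controlled pairs and is positive otherwise.

```python
import math

def discrete_E(y, f, g, dx, dt, omega):
    """Discrete analogue of the error functional
    E = 0.5 * ||y_t - y_xx + g(y) - f*1_omega||^2,
    with forward differences in time, centered differences in space, and a
    plain weighted l2 norm instead of the L2(0,T;H^-1) norm of the paper.
    y, f: lists of time slices (each a list over the space grid);
    omega: 0/1 mask over the space grid marking the control region."""
    total = 0.0
    nt, nx = len(y), len(y[0])
    for n in range(nt - 1):
        for i in range(1, nx - 1):
            yt = (y[n + 1][i] - y[n][i]) / dt
            yxx = (y[n][i - 1] - 2.0 * y[n][i] + y[n][i + 1]) / dx**2
            res = yt - yxx + g(y[n][i]) - f[n][i] * omega[i]
            total += res * res * dx * dt
    return 0.5 * total

# A discrete controlled pair built by hand (control acting everywhere):
dx, dt, nt, nx = 0.1, 0.05, 6, 8
g = lambda s: s**3
y = [[(n * dt) * math.sin(math.pi * i * dx) for i in range(nx)] for n in range(nt)]
omega = [1] * nx
f = [[0.0] * nx for _ in range(nt)]
for n in range(nt - 1):
    for i in range(1, nx - 1):
        f[n][i] = ((y[n + 1][i] - y[n][i]) / dt
                   - (y[n][i - 1] - 2.0 * y[n][i] + y[n][i + 1]) / dx**2
                   + g(y[n][i]))
# discrete_E(y, f, g, dx, dt, omega) vanishes; it is positive for f = 0.
```

With this choice of $f$ the discrete residual is identically zero, so the discrete error vanishes, while setting $f = 0$ leaves a strictly positive error.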
A practical way of driving a functional to its minimum is through some clever use of descent directions, i.e. the use of its derivative. In doing so, the presence of local minima is always something that may dramatically spoil the whole scheme. The unique structural property that discards this possibility is the strict convexity of the functional $E$. However, for a nonlinear equation like (1), one cannot expect this property to hold for $E$. Nevertheless, we insist that one may construct a particular minimizing sequence which cannot converge except to a global minimizer, driving $E$ down to zero.
In order to construct such a minimizing sequence, we look, for any $(y,f) \in A$, for a pair $(Y^1,F^1) \in A_0$ solution of the following formulation (28). We have the following property.
Proposition 3 Let $(y,f) \in A$. There exists a pair $(Y^1,F^1) \in A_0$, solution of (28), which satisfies the estimate (29).

Proof. The existence of a null control $F^1$ is given by Proposition 1. Choosing the control $F^1$ which, together with the corresponding solution $Y^1$, minimizes the functional $J$ defined in Proposition 1, we get the announced estimate (since $Y^1(\cdot,0) = 0$). ✷

Remark 3
We emphasize that the presence of a right-hand side in (28), namely $y_t - \Delta y + g(y) - f 1_\omega$, forces us to introduce from the beginning the weights $\rho_0$, $\rho_1$, $\rho_2$ and $\rho$ in the spaces $A_0$ and $A$. This can be seen from the equality (19): since $\rho_2^{-1} q$ belongs to $L^2(0,T;H^1(\Omega))$ for all $q \in P$, we need to impose that $\rho_2 B \in L^2(0,T;H^{-1}(\Omega))$, with here $B = y_t - \Delta y + g(y) - f 1_\omega$. Working with the linearized equation (7) (introduced in [13]), which does not involve an additional right-hand side, we may avoid the introduction of Carleman type weights. Actually, the authors of [13] consider controls of minimal $L^\infty(q_T)$ norm. The introduction of weights allows however the characterization (19), which is very convenient at the practical level. We refer to [12], where this is discussed at length.
The interest of the pair (Y 1 , F 1 ) ∈ A 0 lies in the following result.
Proposition 4 Let $(y,f) \in A$ and let $(Y^1,F^1) \in A_0$ be a solution of (28). Then the derivative of $E$ at the point $(y,f) \in A$ along the direction $(Y^1,F^1)$ satisfies (34).

Proof. We first check that, for all $(Y,F) \in A_0$, $E$ is differentiable at the point $(y,f) \in A$ along the direction $(Y,F) \in A_0$. For all $\lambda \in \mathbb{R}$, simple computations lead to the equality (35), where $l(y,\lambda Y) = g(y + \lambda Y) - g(y) - \lambda g'(y) Y$. The mapping $(Y,F) \mapsto E'(y,f)\cdot(Y,F)$ is linear and continuous from $A_0$ to $\mathbb{R}$. Similarly, for all $\lambda \in \mathbb{R}^\star$, since $g' \in L^\infty(\mathbb{R})$, we obtain pointwise bounds for a.e. $(x,t) \in Q_T$, from which it is easy to see that the functional $E$ is differentiable at the point $(y,f) \in A$ along the direction $(Y,F) \in A_0$. Finally, the equality (34) follows from the definition of the pair $(Y^1,F^1)$ given in (28). ✷

Remark that, from the equality (35), the expression of the derivative $E'(y,f)$ is independent of $(Y,F)$. We can then define the norm of $E'(y,f)$ in $(A_0)'$, the set of linear and continuous mappings from $A_0$ to $\mathbb{R}$.
Combining the equality (34) and the inequality (29), we deduce the following estimates of $E(y,f)$ in terms of the norm of $E'(y,f)$.
Proposition 5 For any $(y,f) \in A$, the following inequalities hold.

Proof. The pair $(Y^1,F^1)$ is a solution of (28), and the left inequality therefore follows with (29). On the other hand, for all $(Y,F) \in A_0$ (see the proof of Proposition 4), the right inequality follows. ✷

In particular, any critical point $(y,f) \in A$ of $E$ (i.e. for which $E'(y,f)$ vanishes) is a zero of $E$, that is, a pair solving the controllability problem. In other words, any sequence $(y_k,f_k)_{k>0}$ satisfying $\|E'(y_k,f_k)\|_{(A_0)'} \to 0$ as $k \to \infty$ is such that $E(y_k,f_k) \to 0$ as $k \to \infty$. We insist that this property does not imply the convexity of the functional $E$ (and a fortiori not the strict convexity of $E$, which actually does not hold here in view of the multiple zeros of $E$), but it shows that a minimizing sequence for $E$ cannot get stuck in a local minimum. Far from the zeros of $E$, in particular when $\|(y,f)\|_A \to \infty$, the right-hand inequality indicates that $E$ tends to be convex. On the other side, the left inequality indicates that the functional $E$ is flat around its zero set. As a consequence, gradient based minimizing sequences may achieve a very low rate of convergence (we refer to [20], and also to [17], devoted to the Navier-Stokes equation, where this phenomenon is observed).
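The flatness of an error functional near its zero set, and the resulting slow decay of gradient methods, can be seen on the simplest model with a degenerate minimum, $E(x) = x^4$ (a toy function, unrelated to the actual functional): the gradient $E'(x) = 4x^3$ becomes tiny long before $E$ does, so fixed-step gradient descent stalls, while a Newton-type step of the kind analyzed below converges fast.

```python
# Toy illustration of the flatness phenomenon for E(x) = x**4.
E  = lambda x: x**4
dE = lambda x: 4.0 * x**3

x_gd = 1.0
for _ in range(1000):                 # plain gradient descent, fixed step
    x_gd -= 0.01 * dE(x_gd)

x_newton = 1.0
for _ in range(30):                   # Newton step on E' (uses E''(x) = 12 x**2)
    x_newton -= dE(x_newton) / (12.0 * x_newton**2)
```

After 1000 gradient steps the iterate is still of order $10^{-1}$, while 30 Newton steps reach roughly $5\times 10^{-6}$; this gap is the behavior that motivates the particular minimizing sequence constructed below.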

A strongly converging minimizing sequence for E
We now examine the convergence of an appropriate sequence $(y_k,f_k) \in A$. In this respect, we observe that the equality (34) shows that $-(Y^1,F^1)$, given by the solution of (28), is a descent direction for the functional $E$. Therefore, we can define, at least formally, for any $m \ge 1$, a minimizing sequence $(y_k,f_k)_{k\in\mathbb{N}}$ as in (36), where the step $\lambda_k$ is optimized over $[0,m]$ and $(Y^1_k,F^1_k)$ minimizes the functional $J$ defined in Proposition 1. The direction $Y^1_k$ vanishes when $E$ vanishes.
We first perform the analysis assuming that the nonlinear function $g$ belongs to $W_1$, so that notably $g'' \in L^\infty(\mathbb{R})$ (the derivatives here being understood in the sense of distributions). We first prove the following lemma.
Proof. We define the polynomial $p_k$ as follows. Lemma 5, applied with $(y,f) = (y_k,f_k)$, allows us to write that $c_1 E(y_{k+1},f_{k+1}) \le p_k(\lambda_k)$, with $p_k(\lambda_k) := \min_{\lambda\in[0,m]} p_k(\lambda)$. If $c_1 E(y_0,f_0) < 1$ (and thus $c_1 E(y_k,f_k) < 1$ for all $k \in \mathbb{N}$), then $c_1 E(y_k,f_k) \to 0$ as $k \to \infty$ with a quadratic rate. If now $c_1 E(y_0,f_0) \ge 1$, we check that $I := \{k \in \mathbb{N},\ c_1 E(y_k,f_k) \ge 1\}$ is a finite subset of $\mathbb{N}$. For all $k \in I$, since $c_1 E(y_k,f_k) \ge 1$, the minimization of $p_k$ yields an inequality implying that the sequence $(c_1 E(y_k,f_k))_{k\in\mathbb{N}}$ strictly decreases and that the sequence $(p_k(\lambda_k))_{k\in\mathbb{N}}$ decreases as well. Thus the sequence $(c_1 E(y_k,f_k))_{k\in\mathbb{N}}$ decreases to $0$ at least linearly, and there exists $k_0 \in \mathbb{N}$ such that $c_1 E(y_k,f_k) < 1$ for all $k \ge k_0$; that is, $I$ is a finite subset of $\mathbb{N}$. Arguing as in the first case, it follows that $c_1 E(y_k,f_k) \to 0$ as $k \to \infty$.
In both cases, remark that $p_k(\lambda_k)$ decreases with respect to $k$. ✷
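The mechanism of the proof can be simulated on a model recursion (a simplified scalar stand-in for the polynomial $p_k$ and the constant $c_1$; the exact polynomial of the lemma differs): minimizing $(1-\lambda)e_k + \lambda^2 e_k^2$ over $\lambda \in [0,1]$ produces the announced two regimes, a fixed linear decrease while the error is large and the quadratic regime $e_{k+1} = e_k^2$ once it is small.

```python
def model_error_sequence(e0, n=20):
    """Model of the damped recursion e_{k+1} = min_{l in [0,1]} (1-l)*e_k + l**2*e_k**2.
    The minimiser is l = min(1, 1/(2*e_k)): for e_k >= 1/2 the error drops by the
    fixed amount 1/4 (linear phase); once e_k < 1/2 the optimal step is l = 1
    and e_{k+1} = e_k**2 (quadratic phase)."""
    es = [e0]
    for _ in range(n):
        e = es[-1]
        l = 1.0 if e <= 0.5 else 1.0 / (2.0 * e)
        es.append((1.0 - l) * e + l * l * e * e)
    return es

errors = model_error_sequence(3.0)
```

Starting from $e_0 = 3$, the sequence loses $1/4$ per step for ten steps and then collapses quadratically, exactly the "at least linear, then quadratic" behavior established above.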

Remark 4 Writing the preceding bounds explicitly leads to an estimate involving $\lfloor x \rfloor$, the integer part of $x \in \mathbb{R}^+$.
We also have the following convergence of the sequence of optimal steps $(\lambda_k)_{k>0}$.
Proof. In view of (40), we have, as long as $E(y_k,f_k) > 0$ and since $\lambda_k \in [0,m]$, a first estimate. But from (39) and (41) we deduce a second one, and consequently the claim follows, since $E(y_k,f_k) \to 0$. ✷

We are now in a position to prove the following convergence result.
Theorem 3 Assume $g \in W_1$. Let $(y_k,f_k)_{k\in\mathbb{N}}$ be the sequence defined by (36). Then $(y_k,f_k)_{k\in\mathbb{N}} \to (y,f)$ in $A$, where $f$ is a null control for $y$, solution of (1). Moreover, the convergence is quadratic after a finite number of iterates.
But $(E(y_n,f_n))_{n\in\mathbb{N}}$ and $(p_k(\lambda_k))_{k\in\mathbb{N}}$ are decreasing sequences, so that the series $\sum_n \lambda_n (Y^1_n,F^1_n)$ is normally convergent, hence convergent. Consequently, there exists $(Y,F) \in A_0$ such that $(Y_k,F_k)_{k\in\mathbb{N}}$ converges to $(Y,F)$ in $A_0$.
Denoting y = y 0 + Y and f = f 0 + F , we then have that (y k , f k ) k∈N = (y 0 + Y k , f 0 + F k ) k∈N converges to (y, f ) in A.
It suffices now to verify that the limit $(y,f)$ satisfies $E(y,f) = 0$. Using that $(Y^1_k,F^1_k)$ goes to zero in $A_0$ as $k \to \infty$, we pass to the limit in (45) and get, since $g \in W_1$, that $(y,f) \in A$ solves (1), that is, $E(y,f) = 0$. ✷

In particular, along the sequence $(y_k,f_k)_k$ defined by (36), we have the following coercivity property for $E$, which confirms the strong convergence of the sequence $(y_k,f_k)_{k>0}$. In view of the non-uniqueness of the zeros of $E$, remark that this property is not true in general for all $(y,f)$ in $A$.
Proposition 7 Let $(y_k,f_k)_{k>0}$ be defined by (36) and let $(y,f)$ be its limit. Then there exists a positive constant $C$ such that (46) holds. ✷

We emphasize, in view of the non-uniqueness of the zeros of $E$, that an estimate similar to (46), of the form $\|(\overline y,\overline f) - (y,f)\|_{A_0} \le C\, E(\overline y,\overline f)$, does not hold for all $(\overline y,\overline f) \in A$. We also mention that the sequence $(y_k,f_k)_{k>0}$ and its limit $(y,f)$ are uniquely determined by the initial guess $(y_0,f_0)$ and by our criterion of selection of the control $F^1$. In other words, the solution $(y,f)$ is unique up to the choice of the element $(y_0,f_0)$ and of the functional $J$.

The case $g \in W_s$, $0 \le s < 1$, and additional remarks
The results of the previous subsection, devoted to the case $s = 1$, still hold if we assume only that $g \in W_s$ for some $s \in (0,1)$. For any $g \in W_s$, we introduce the notation $\|g'\|_{W^{s,\infty}(\mathbb{R})} := \sup_{a,b\in\mathbb{R},\, a\neq b} |g'(a)-g'(b)|/|a-b|^s$. We have the following result.
Theorem 4 Assume that there exists $s \in (0,1)$ such that $g \in W_s$. Let $(y_k,f_k)_{k\in\mathbb{N}}$ be the sequence defined by (36). Then $(y_k,f_k)_{k\in\mathbb{N}} \to (y,f)$ in $A$, where $f$ is a null control for $y$, solution of (1). Moreover, after a finite number of iterates, the rate of convergence is equal to $1+s$.
Proof. We briefly sketch the argument, which is close to the proof of Theorem 3 for the case $s = 1$.
- We then check that the sequence $(E(y_k,f_k))_{k\in\mathbb{N}}$ goes to zero as $k \to \infty$. We define $p_k$ as follows. If $c_2 E(y_0,f_0) < 1$ (and thus $c_2 E(y_k,f_k) < 1$ for all $k \in \mathbb{N}$), then the above inequality implies that $c_2 E(y_k,f_k) \to 0$ as $k \to \infty$. If $c_2 E(y_0,f_0) \ge 1$, let $I = \{k \in \mathbb{N},\ c_2 E(y_k,f_k) \ge 1\}$; then $I$ is a finite subset of $\mathbb{N}$. Indeed, for all $k \in I$, since $c_2 E(y_k,f_k) \ge 1$, the minimum $\min_{\lambda\in[0,m]} p_k(\lambda)$ yields an inequality implying that the sequence $(c_2 E(y_k,f_k))_{k\in\mathbb{N}}$ strictly decreases and that the sequence $(p_k(\lambda_k))_{k\in\mathbb{N}}$ decreases as well. Thus the sequence $(c_2 E(y_k,f_k))_{k\in\mathbb{N}}$ decreases to $0$ at least linearly, and there exists $k_0 \in \mathbb{N}$ such that $c_2 E(y_k,f_k) < 1$ for all $k \ge k_0$; that is, $I$ is a finite subset of $\mathbb{N}$. Similarly, the optimal parameter $\lambda_k$ goes to one as $k \to \infty$.
-Using that the sequence (E(y k , f k )) k∈N goes to zero, we conclude exactly as in the proof of Theorem 3.
✷ On the other hand, if we assume only that $g$ belongs to $W_0$, then we cannot expect the convergence of the sequence $(y_k,f_k)_{k>0}$ when $\|g'\|_\infty$ is too large.
Remark 5 Assume that $g \in W_0$. Let $(y,f) \in A$ and let $(Y^1,F^1)$ be the solution of (28) which minimizes $J$. The following inequality holds for all $\lambda \in \mathbb{R}$, where $C(\Omega,\omega,T,\|g'\|_\infty) \ge 0$ increases with $\|g'\|_\infty$. Indeed, this is a consequence of the following inequality, valid for all $(y,f) \in A$ and $(Y,F) \in A_0$. As a consequence, we get that the sequence $(E(y_k,f_k))_{k\ge 0}$ decreases to $0$ if $g$ satisfies $C(\Omega,\omega,T,\|g'\|_\infty)\, \|g'\|_\infty < 1$.
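The super-linear rate $1+s$ stated in Theorem 4 can be observed on a model error sequence $e_{k+1} = C e_k^{1+s}$ (the constant $C = 0.8$ and the initial error $0.1$ are arbitrary illustrative values): the empirical order $\log e_{k+1} / \log e_k$ approaches $1+s$.

```python
import math

def empirical_orders(e_seq):
    """Empirical convergence orders log(e_{k+1}) / log(e_k)."""
    return [math.log(e_seq[k + 1]) / math.log(e_seq[k])
            for k in range(len(e_seq) - 1)]

s, C = 0.5, 0.8                      # Hoelder exponent s and a model constant
e = [0.1]
for _ in range(6):
    e.append(C * e[-1]**(1.0 + s))   # model recursion e_{k+1} = C e_k^{1+s}
orders = empirical_orders(e)
```

Since $\log e_{k+1} = \log C + (1+s)\log e_k$, the ratio equals $(1+s) + \log C / \log e_k$ and tends to $1+s$ as $e_k \to 0$, which the computed `orders` reproduce.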

Remark 6
The estimate (29) is a key point in the convergence analysis and is independent of the choice of the functional $J(Y^1,F^1) = \frac{1}{2}\|\rho_0 F^1\|^2_{L^2(q_T)} + \frac{1}{2}\|\rho Y^1\|^2_{L^2(Q_T)}$ (see Proposition 1) used to select a pair $(Y^1,F^1)$ in $A_0$. Thus, we may consider other weighted functionals, for instance $J(Y^1,F^1) = \frac{1}{2}\|\rho_0 F^1\|^2_{L^2(q_T)}$, as discussed in [21].
Remark 7 Defining the mapping $F$ as above, we get that $E(y,f) = \frac{1}{2}\|F(y,f)\|^2_{L^2(0,T;H^{-1}(\Omega))}$ and observe that, for $\lambda_k = 1$, the algorithm (36) coincides with the Newton algorithm associated with the mapping $F$. This notably explains the quadratic convergence of Theorem 3 in the case $g \in W_1$, for which we have a control of $g''$ in $L^\infty(Q_T)$. The optimization of the parameter $\lambda_k$ allows us to achieve global convergence of the algorithm and leads to the so-called damped Newton method (for $F$). Under general hypotheses, global convergence for this kind of method is achieved with a linear rate (we refer for instance to [7, Theorem 8.7]). As far as we know, the analysis of damped Newton type methods for partial differential equations has received very little attention in the literature; we mention [18,22] in the context of fluid mechanics.
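This damped Newton interpretation can be illustrated on a scalar zero-finding problem ($F = \arctan$ is a classic toy example, not the paper's mapping): choosing the step by backtracking on $E = \frac{1}{2}F^2$ yields global convergence from an initial guess for which the undamped method ($\lambda_k = 1$) diverges.

```python
import math

def damped_newton(F, dF, x0, n_iter=50):
    """Damped Newton for F(x) = 0: x <- x - lam*F(x)/dF(x), the step
    lam in (0,1] being chosen by Armijo-type backtracking on the error
    E(x) = 0.5*F(x)**2 (a scalar analogue of the optimal lambda_k)."""
    x = x0
    for _ in range(n_iter):
        d = F(x) / dF(x)                       # Newton direction
        lam = 1.0
        while lam > 1e-8 and 0.5 * F(x - lam * d)**2 > (1 - 0.5 * lam) * 0.5 * F(x)**2:
            lam *= 0.5                         # backtrack until E decreases enough
        x -= lam * d
    return x

F, dF = math.atan, lambda x: 1.0 / (1.0 + x * x)
x_damped = damped_newton(F, dF, 10.0)          # converges globally to the root 0

x_plain = 10.0
for _ in range(5):
    x_plain -= F(x_plain) / dF(x_plain)        # undamped Newton diverges from 10
```

The damped iterates first take small accepted steps, then switch to full Newton steps ($\lambda = 1$) near the root, recovering the quadratic local rate, exactly the global-then-fast behavior described above.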
Remark 8 Suppose, to simplify, that $\lambda_k$ equals one (corresponding to the standard Newton method). Then, for each $k$, the optimal pair $(Y^1_k,F^1_k) \in A_0$ is such that the element $(y_{k+1},f_{k+1})$ minimizes over $A$ the functional $(z,v) \mapsto J(z - y_k, v - f_k)$. Instead, we may also select the pair $(Y^1_k,F^1_k)$ such that the element $(y_{k+1},f_{k+1})$ minimizes the functional $(z,v) \mapsto J(z,v)$. This leads to the sequence $(y_k,f_k)_k$ defined by (48), with $y_{k+1}(\cdot,0) = u_0$ in $\Omega$.

This is actually the formulation used in [11]. This formulation is different, and the analysis of its convergence (at least in the framework of our least-squares setting) is less direct, because one needs a control of the right-hand side term $g'(y_k)y_k - g(y_k)$.

Remark 9
We emphasize that the explicit construction used here allows us to recover the null controllability property of (1) for nonlinearities $g$ in $W_s$, for some $s \in (0,1]$. We do not use a fixed point argument as in [13]. On the other hand, the conditions we impose on $g$ are more restrictive than those in [13]. Eventually, it is also important to remark that these additional conditions on $g$ do not a priori imply a contraction property of the operator $\Lambda$ introduced in [13] and mentioned in the introduction. Assume $g \in W_1$. If $(y_{z_i},f_{z_i})$, $i = 1,2$, are controlled pairs for the system (7) minimizing the functional $J$, then the inequality (49) holds, where $C(\Omega,\omega,T,\|\tilde g\|_\infty)$ is the constant appearing in (15). In order to ensure a contraction property, we would a priori need to add a smallness assumption on the data $g$ and $u_0$.

Numerical illustrations
We illustrate in this section our convergence results. We first provide some practical details of the algorithm (36), and then discuss some experiments in the one dimensional case.

Approximation -Algorithm
Each iterate of the algorithm (36) requires the determination of a null control for (50), with $B_k := y_{k,t} - \Delta y_k + g(y_k) - f_k 1_\omega$. From Lemma 4, the pair $(F^1_k, Y^1_k)$ which minimizes the functional $J$ is given by (51). The numerical approximation of this variational formulation (of second order in time and fourth order in space) has been discussed at length in [12]. In order, first, to avoid numerical instabilities (due to the presence of exponential functions in the formulation) and, second, to make the controlled solution appear explicitly, we introduce the new variables (52). Since $\rho_2^{-1} p \in L^2(0,T;H^1_0(\Omega))$, we obtain notably the constraint $z_k = \rho^{-1} L^\star_{g'(y_k)}(\rho_0 m_k)$. This constraint leads to the following well-posed mixed formulation, in which the variable $\lambda_k \in L^2(Q_T)$ is a Lagrange multiplier. Moreover, from the unique solution $(m_k,z_k)$, we get the explicit form of the controlled pair $(Y^1_k,F^1_k)$. The algorithm associated with the sequence $(y_k,f_k)_{k>0}$ (see (36)) may be developed as follows: given $\epsilon > 0$ and $m \ge 1$, 1. We determine the controlled pair $(y_0,f_0)$ which minimizes the functional $J$ associated with the linear case (for which $g \equiv 0$ in (1)); $(y_0,f_0)$ is obtained from the pair $(z_0,m_0)$ solving the corresponding mixed formulation. In view of Proposition 1, we check that $(y_0,f_0)$ belongs to $A$.
2. Assume now that $(y_k,f_k)$ is computed for some $k \ge 0$. We then compute $c_k \in L^2(0,T;H^1_0(\Omega))$, the unique solution of the corresponding formulation. 3. If $E(y_k,f_k) < \epsilon$, the approximate controlled pair is given by $(y,f) = (y_k,f_k)$ and the algorithm stops. Otherwise, we determine the solution $(Y^1_k,F^1_k)$ and update $(y_{k+1},f_{k+1})$ according to (36), the minimization over $\lambda$ being performed using a line search method; we then return to step 2. We use the conformal space-time finite element method described in [12]. We consider a regular family $\mathcal{T} = \{\mathcal{T}_h;\ h > 0\}$ of triangulations of $Q_T$ such that $\overline{Q_T} = \cup_{K\in\mathcal{T}_h} K$. The family is indexed by $h = \max_{K\in\mathcal{T}_h} \mathrm{diam}(K)$. The variables $z_k$ and $\lambda_k$ are approximated in the space $P_h = \{p_h \in C(\overline{Q_T});\ p_h|_K \in \mathbb{P}_1(K)\ \forall K \in \mathcal{T}_h\} \subset L^2(Q_T)$, where $\mathbb{P}_1(K)$ denotes the space of functions affine in both $x$ and $t$. The variable $m_k$ is approximated in the space $V_h = \{v_h \in C^1(\overline{Q_T});\ v_h|_K \in \mathbb{P}(K)\ \forall K \in \mathcal{T}_h\} \subset M$, where $\mathbb{P}(K)$ denotes the Hsieh-Clough-Tocher $C^1$ element (we refer to [4], page 356). This conformal approximation leads to a strongly convergent approximation of the control and of the controlled solution with respect to the parameter $h$.
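The three steps above can be summarized in a generic loop. In the sketch below, `E` and `solve_correction` are placeholders (injected callables standing in for the error functional and the mixed-formulation solve; they are not the paper's actual solver), the $\lambda$-search over $[0,m]$ is done by crude sampling where the experiments use a proper line search, and the toy usage solves a scalar analogue.

```python
import math

def least_squares_loop(E, solve_correction, x0, eps=1e-6, m=1.0, max_iter=100):
    """Generic skeleton of algorithm (36): stop once E < eps, otherwise take
    the correction returned by `solve_correction` with a sampled step in (0, m]."""
    x = x0
    for _ in range(max_iter):
        if E(x) < eps:
            break
        d = solve_correction(x)
        lams = [m * i / 20.0 for i in range(1, 21)]
        lam = min(lams, key=lambda l: E(x - l * d))   # coarse search over (0, m]
        x = x - lam * d
    return x

# Toy scalar analogue: E(x) = 0.5*atan(x)**2, correction = Newton direction.
E_toy = lambda x: 0.5 * math.atan(x)**2
corr  = lambda x: math.atan(x) * (1.0 + x * x)
x_end = least_squares_loop(E_toy, corr, 10.0)
```

On this toy problem the loop first selects small steps, then full steps near the zero of $E$, and terminates through the $E < \epsilon$ criterion after a handful of iterations.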
As for the initial condition to be controlled, we simply take $u_0(x) = \beta \sin(\pi x)$, parametrized by $\beta > 0$.
The experiments are performed with the FreeFem++ package developed at Sorbonne Université (see [15]), which is very well adapted to the space-time formulation we employ. The algorithm is stopped when the value $E(y_k,f_k)$ is less than $\epsilon = 10^{-6}$. The optimal steps $\lambda_k$ are searched for in the interval $[0,1]$. Tables 1, 2 and 3 collect some norms of the sequence $(y_k,f_k)_{k\ge 0}$ defined by the algorithm (36), initialized with the linear controlled solution, for $\beta = 10$, $\beta = 10^2$ and $\beta = 10^3$ respectively. We use a structured mesh composed of 20 000 triangles and 10 201 vertices, for which $h \approx 1.11 \times 10^{-2}$. For $\beta = 10$, we observe convergence after 4 iterates. The optimal steps $\lambda_k$ are very close to one, since $\max_k |\lambda_k - 1| < 0.05$; consequently, the algorithm (36) provides results similar to those of the Newton algorithm (for which $\lambda_k = 1$ for all $k$). For $\beta = 10^2$, the convergence remains fast and is reached after 8 iterates. We observe that some optimal steps differ from one, since $\max_k |\lambda_k - 1| > 0.4$. Nevertheless, the Newton algorithm still converges, after 17 iterates. More interestingly, the value $\beta = 10^3$ illustrates the features and robustness of the algorithm: the convergence is achieved after 19 iterates. Far away from a zero of $E$, the variations of the error functional $E(y_k,f_k)$ are first quite slow, then increase and become very fast after 16 iterates, when $\lambda_k$ is close to one. In contrast, for $\beta = 10^3$, the Newton algorithm, still initialized with the linear solution, diverges (see Table 4). As discussed in [18], in that case a continuation method with respect to the parameter $\beta$ may be combined with the Newton algorithm.
In contrast, we mention that, with these data, the sequences obtained from the algorithm (8), based on the linearization introduced in [13], remain bounded but do not converge, even for the value $\beta = 10$. The convergence is observed, for instance, with a larger control domain $\omega$, for instance $\omega = (0.2,0.8)$ (see [11, Section 4.2]).

Table 4: $\beta = 10^3$; results for the algorithm (36) with $\lambda_k = 1$ for all $k$.

Conclusions and perspectives
We have constructed an explicit sequence of functions $(f_k)_k$ converging strongly in the $L^2(q_T)$ norm toward a null control for the semilinear heat equation $y_t - \Delta y + g(y) = f 1_\omega$. The construction of the sequence is based on the minimization of an $L^2(0,T;H^{-1}(\Omega))$ least-squares functional. The use of a specific descent direction allows us to achieve global convergence (uniform with respect to the data and to the initial guess), with a super-linear rate related to the regularity of the nonlinear function $g$. The experiments confirm the robustness of the approach. In this analysis, we have assumed in particular that the derivative $g'$ of $g$ is uniformly bounded in $\mathbb{R}$. This allows us to get a uniform bound on the constants of the form $C(\Omega,\omega,T,\|g'(y)\|_\infty)$ arising from the Carleman estimate (17). In order to remove this assumption and be able to consider super-linear functions $g$ (as in the seminal work [13] by Fernández-Cara and Zuazua, assuming that $g$ is locally Lipschitz-continuous and satisfies the asymptotic behavior (6)), we need to refine the analysis and exploit the structure of the constant $C(\Omega,\omega,T,\|g'(y)\|_{L^\infty})$ (as done in [8] for the observability constant). This may allow, under the above hypotheses of [13], not only to recover the null controllability of (1), but also to construct, within the algorithm (36), approximations of null controls.
We also emphasize that this least-squares approach is very general and may be used to address other PDEs. Following [18], devoted to the direct problem, one may notably study the applicability of the method to approximate controls for the Navier-Stokes system. We also mention the case of the nonlinear wave equation studied in [23], which makes use of a fixed point strategy.