PONTRYAGIN MAXIMUM PRINCIPLE FOR GENERAL CAPUTO FRACTIONAL OPTIMAL CONTROL PROBLEMS WITH BOLZA COST AND TERMINAL CONSTRAINTS

In this paper we focus on a general optimal control problem involving a dynamical system described by a nonlinear Caputo fractional differential equation of order 0 < α ≤ 1, associated to a general Bolza cost written as the sum of a standard Mayer cost and a Lagrange cost given by a Riemann-Liouville fractional integral of order β ≥ α. In addition the present work handles general control and mixed initial/final state constraints. Adapting the standard Filippov’s approach based on appropriate compactness assumptions and on the convexity of the set of augmented velocities, we give an existence result for at least one optimal solution. Then, the major contribution of this paper is the statement of a Pontryagin maximum principle which provides a first-order necessary optimality condition that can be applied to the fractional framework considered here. In particular, Hamiltonian maximization condition and transversality conditions on the adjoint vector are derived. Our proof is based on the sensitivity analysis of the Caputo fractional state equation with respect to needle-like control perturbations and on Ekeland’s variational principle. The paper is concluded with two illustrating examples and with a list of several perspectives for forthcoming works.

the PMP, have a wide field of applications in various domains. We refer the reader to textbooks such as [1,16,18,19,29,33,38,49,51,52,54] for theoretical results and/or practical applications, essentially for dynamical systems described by ordinary differential equations.
From the point of view of calculus of variations, the PMP corresponds to an extension of the Euler-Lagrange equation. Actually, for smooth and unconstrained optimal control problems of Lagrange form, a weak version of the PMP (in which the Hamiltonian maximization condition is replaced by a weaker null Hamiltonian gradient condition) can be derived from a variational approach (see, e.g., [40], Sect. 3.4). However, in the general case, obtaining a strong version of the PMP that handles constraints on the state and/or on the control requires more sophisticated mathematical tools, such as the sensitivity analysis of the state equation with respect to control perturbations (needle-like variations for instance) combined with a Brouwer fixed point argument (see, e.g., [29,38]) or with Ekeland's variational principle (see, e.g., [23,39]). Many other variants exist in the literature (based on an implicit function theorem [1], the Hahn-Banach separation theorem [16], or the Aubin mini-max theorem [54] for example).

Optimal control theory in a fractional context
The fractional calculus generalizes the classical notions of integral and derivative to any real order. Many famous mathematicians introduced several notions of fractional operators, as Leibniz (1690's), Euler (1730's), Fourier (1820's), Liouville (1830's), Riemann (1840's), Sonin (1860's), Grünwald (1860's), Letnikov (1860's), Caputo (1960's), etc. These a priori different notions are not disconnected. In many cases it can be proved that two different notions actually coincide or are correlated by an explicit formula. The fractional operators are extensively used in many applications. We refer for instance to [30] for a large panorama in physics. We refer to the monographs [37,48] for a deep insight on fractional calculus and fractional differential equations. In this paper, as commonly in the literature, we only consider the fractional operators of Riemann-Liouville and Caputo types. For the reader who is not familiar with these notions, we refer to Section 2 for basic recalls and notations.
In [47] Riewe initiates the fractional calculus of variations and derives the first fractional version of the Euler-Lagrange equation, using non-integer order derivatives in order to describe nonconservative systems in mechanics. From there, a large number of publications has been devoted to the minimization of integral functionals involving various fractional operators. Many issues have been addressed and solved, and numerous classical results (such as first- and second-order necessary and/or sufficient optimality conditions, transversality conditions, Noether's theorems, Tonelli's existence theorems, etc.) have been extended to the fractional framework. We refer for example to [2,6,7,9,12,21,41,43,57] and references therein.
Compared to the growing literature on fractional calculus of variations, fractional optimal control theory (where the dynamical system is driven by a fractional differential equation) developed only modestly at the beginning of the 21st century. We refer the reader to [3,4,26,28,32] and references therein for some initiating works. These articles constitute a first step in the field and essentially use fractional variational approaches to derive fractional versions of the weak PMP (in which a null Hamiltonian gradient condition is obtained, but no Hamiltonian maximization condition) for smooth and unconstrained fractional optimal control problems. As a second step in the field, we mention the works of Kamocki from 2014. Indeed, a first attempt to establish a strong version of the PMP (with Hamiltonian maximization condition) in the case of a general Riemann-Liouville fractional optimal control problem with a classical Lagrange cost and with control constraints can be found in Theorem 7 of [35]. However, several (quite restrictive) hypotheses are assumed, such as the compactness of the control constraint set, the convexity of the set of augmented velocities, the global Lipschitz continuity of the dynamic and some growth conditions on the dynamic, the Lagrange cost and its gradient. Moreover, a Riemann-Liouville fractional version of the initial condition is fixed, but no other state constraint can be handled. Hence, many challenging questions remain open in that field. Another attempt to derive a strong version of the PMP in a general Caputo fractional context can be found in [5]. Unfortunately we have serious doubts about the correctness of the main result of this paper. We refer to Remark 3.21 for details. Let us mention also that existence results are provided for linear-convex Riemann-Liouville (resp. linear-linear Caputo) fractional optimal control problems associated to a classical Lagrange cost in Theorem 17 of [34] (resp. in [36], Thm. 4.2).
We conclude this subsection by pointing out that a large number of publications has been dedicated to the numerical study of fractional optimal control problems. We refer for instance to [3,4,8,32,46].

Contributions of the present paper
In this paper, we deal with a fixed real interval $[a, b]$. Our first motivation is to provide a functional framework adapted to the description of fractional optimal control problems. For example, it is well-known that singularities arise at $t = a$ while using left Riemann-Liouville fractional operators. As a consequence, the class of $\mathrm{C}^1$-functions is too restrictive. In this paper, as in [15], we choose to use the appropriate set $\mathrm{AC}^{\alpha}_{a+}([a,b],\mathbb{R}^n)$ (resp. ${}^{c}\mathrm{AC}^{\alpha}_{a+}([a,b],\mathbb{R}^n)$) of functions possessing Riemann-Liouville (resp. Caputo) fractional derivatives, see Definition 2.5 (resp. Def. 2.10). In order to avoid unbounded states (which would deprive us of crucial estimates), we choose to work with the well-known Caputo fractional derivative ${}^{c}\mathrm{D}^{\alpha}_{a+}$ of order $0 < \alpha \leq 1$ whose corresponding trajectories are continuous. Hence, this paper is dedicated to the study of a general optimal control problem where the dynamical system is driven by the nonlinear Caputo fractional state equation of order $0 < \alpha \leq 1$.
Our second motivation is to consider a fractional optimal control problem that is sufficiently general to handle:
- Mayer and Lagrange costs (classical and fractional). Therefore, we consider the general Bolza cost given by $\varphi(x(a), x(b)) + \mathrm{I}^{\beta}_{a+}[F(x, u, \cdot)](b)$, where $\beta \geq \alpha$;
- General control constraint: $u(t) \in \mathrm{U}$, where $\mathrm{U}$ is a nonempty closed set. We refer to Remark 3.16 for a discussion on this closedness assumption.
- General mixed initial/final state constraint: $g(x(a), x(b)) \in \mathrm{C}$, where $\mathrm{C}$ is a nonempty closed convex set.
To the best of our knowledge, no endpoint constraint has been considered yet in the literature on fractional optimal control theory. Moreover, note that the above state constraint is very general and encompasses many typical situations such as fixed initial and/or final conditions, free initial and/or final conditions, equality and/or inequality constraints, etc. We refer to Remark 3.17 for more details.
Moreover, the regularity assumptions that we require in this paper on the functions f , F , ϕ and g are reduced as much as possible (as far as we know) to guarantee the applicability of our proofs. In particular, no growth condition and no global Lipschitz continuity are imposed. For the precise definition of the problem investigated in this paper and the corresponding assumptions, we refer the reader to Problem (OCP) and Hypothesis (H) in Section 3.1.
The main concern of this paper is to state a version of the PMP for Problem (OCP). Nevertheless we are eager to provide a consistent paper by proving in a first place that the existence of optimal controls in some standard settings is preserved at the fractional level. Hence looking for a version of the PMP for our problem is legitimate. Precisely we provide in Section 3.2 a Filippov's existence theorem (Thm. 3.1) for Problem (OCP). This result, as in the classical case, is based on some compactness and convexity assumptions.
The major contribution of this paper, which is a PMP for Problem (OCP), is stated in Section 3.4 (Thm. 3.12). The proof is based on the sensitivity analysis of the Caputo fractional state equation with respect to needle-like control perturbations and on Ekeland's variational principle. In contrast to the proof of Theorem 3.1, the one of Theorem 3.12 requires a lot of technical adjustments from the classical case. Indeed, the nonlocality of the fractional operators induces specific variation vectors (Props. 3.10 and 3.11). Moreover, the necessary optimality inequalities obtained on the variation vectors require the use of fractional Duhamel formulas derived in [13] to conclude. This (quite long) proof is moved to Appendix A. Two examples (including endpoint state and control constraints) illustrating the applicability of Theorem 3.12 are provided in Section 4.
Section 5 is dedicated to some perspectives for future works. Indeed many challenging problems are still open in fractional optimal control theory. We give a tentative list of questions to address in future works. In particular, in this paper, we chose to deal with a Caputo fractional state equation in order to guarantee the boundedness (even the continuity) of the corresponding trajectories. An important issue would be to adapt the whole framework of the present paper to a Riemann-Liouville fractional state equation.

Organization of the paper
The paper is organized as follows. Section 2 is devoted to basic recalls and notations from fractional calculus. The main results (Thms. 3.1 and 3.12) are stated in Section 3 with some comments. Two illustrating examples are detailed in Section 4. Section 5 is dedicated to some perspectives for future works. The proof of Theorem 3.12 is given in Appendix A. Finally Appendix B contains two technical results (in particular a new fractional version of the Gronwall lemma, see Prop. B.1) that are required in the paper.

Basics on fractional calculus
Throughout the paper the abbreviation RL stands for Riemann-Liouville. This section is devoted to some recalls about RL and Caputo fractional operators. All definitions and results below are standard and are mostly extracted from the monographs [37,48]. Let n ∈ N * be a positive integer and let I ⊂ R be a subinterval of R.
In the whole paper we denote by:
- $\mathrm{L}^r(I,\mathbb{R}^n)$ the Lebesgue space of $r$-integrable functions defined on $I$ with values in $\mathbb{R}^n$, endowed with its usual norm $\|\cdot\|_{\mathrm{L}^r}$, for any $1 \leq r < \infty$;
- $\mathrm{L}^{\infty}(I,\mathbb{R}^n)$ the Lebesgue space of essentially bounded functions defined on $I$ with values in $\mathbb{R}^n$, endowed with its usual norm $\|\cdot\|_{\mathrm{L}^{\infty}}$;
- $\mathrm{C}(I,\mathbb{R}^n)$ the space of continuous functions on $I$ with values in $\mathbb{R}^n$, endowed with the uniform norm $\|\cdot\|_{\mathrm{C}}$;
- $\mathrm{AC}(I,\mathbb{R}^n)$ the subspace of $\mathrm{C}(I,\mathbb{R}^n)$ of absolutely continuous functions.
For any E(I, R n ) one of the above spaces, we denote by E loc (I, R n ) the space of functions x : I → R n such that x ∈ E(J, R n ) for every compact subinterval J ⊂ I. In particular, if I is compact, then E loc (I, R n ) = E(I, R n ). Finally, for any a ∈ I, we denote by C a (I, R n ) the set of all functions x ∈ C(I, R n ) such that x(a) = 0 R n .

Left RL and Caputo fractional operators
Let us fix a ∈ R and I ⊂ R an interval such that {a} ⊊ I ⊂ [a, +∞). Note that I is not necessarily compact. Precisely I can be written either as I = [a, +∞), or I = [a, b) for some b > a, or I = [a, b] for some b > a. In the sequel Γ denotes the standard Gamma function.

Left RL fractional integral
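We focus in this section on left fractional integrals of RL type.
Definition 2.1 (Left RL fractional integral). The left Riemann-Liouville fractional integral $\mathrm{I}^{\alpha}_{a+}[x]$ of order $\alpha > 0$ and inferior limit $a$ of a function $x \in \mathrm{L}^1_{\mathrm{loc}}(I,\mathbb{R}^n)$ is defined on $I$ by
\[ \mathrm{I}^{\alpha}_{a+}[x](t) := \int_a^t \frac{(t-\tau)^{\alpha-1}}{\Gamma(\alpha)} \, x(\tau) \, d\tau, \]
provided that the right-hand side term exists. For $\alpha = 0$ we set $\mathrm{I}^{0}_{a+}[x] := x$.
Definition 2.5 (Left RL fractional derivative). We say that $x \in \mathrm{L}^1_{\mathrm{loc}}(I,\mathbb{R}^n)$ has a left RL fractional derivative $\mathrm{D}^{\alpha}_{a+}[x]$ of order $0 \leq \alpha \leq 1$ and inferior limit $a$ on $I$ if and only if $\mathrm{I}^{1-\alpha}_{a+}[x] \in \mathrm{AC}_{\mathrm{loc}}(I,\mathbb{R}^n)$. In that case $\mathrm{D}^{\alpha}_{a+}[x] := \frac{d}{dt} \mathrm{I}^{1-\alpha}_{a+}[x]$. We denote by $\mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n)$ the set of all functions $x \in \mathrm{L}^1_{\mathrm{loc}}(I,\mathbb{R}^n)$ possessing on $I$ a left RL fractional derivative $\mathrm{D}^{\alpha}_{a+}[x]$ of order $0 \leq \alpha \leq 1$ and inferior limit $a$.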
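To make the definition of the left RL fractional integral concrete, the following worked computation evaluates it on a power function. It is only a sketch: the exponent $\gamma > -1$ is introduced here for illustration, and the computation relies on the change of variable $\tau = a + \theta (t-a)$ together with the Euler Beta function $\mathrm{B}$.

```latex
% Worked example (illustration only): left RL fractional integral of a power function.
% Assumptions: \alpha > 0, \gamma > -1 (hypothetical exponent), t > a.
\begin{align*}
\mathrm{I}^{\alpha}_{a+}\big[(\cdot - a)^{\gamma}\big](t)
  &= \frac{1}{\Gamma(\alpha)} \int_a^t (t-\tau)^{\alpha-1} (\tau-a)^{\gamma} \, d\tau \\
  &= \frac{(t-a)^{\alpha+\gamma}}{\Gamma(\alpha)} \int_0^1 (1-\theta)^{\alpha-1} \theta^{\gamma} \, d\theta
     && (\tau = a + \theta (t-a)) \\
  &= \frac{(t-a)^{\alpha+\gamma}}{\Gamma(\alpha)} \, \mathrm{B}(\gamma+1, \alpha) \\
  &= \frac{\Gamma(\gamma+1)}{\Gamma(\gamma+\alpha+1)} \, (t-a)^{\gamma+\alpha}.
\end{align*}
```

For $\gamma = 0$ this gives $\mathrm{I}^{\alpha}_{a+}[1](t) = (t-a)^{\alpha}/\Gamma(\alpha+1)$. For $-1 < \gamma < 0$ the integrand is unbounded near $t = a$ but still locally integrable, so the formula remains valid.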
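Remark 2.6. If $\alpha = 1$, $\mathrm{AC}^{1}_{a+}(I,\mathbb{R}^n) = \mathrm{AC}_{\mathrm{loc}}(I,\mathbb{R}^n)$ and $\mathrm{D}^{1}_{a+}[x] = \dot{x}$ is the usual derivative of $x$ for any $x \in \mathrm{AC}_{\mathrm{loc}}(I,\mathbb{R}^n)$. If $\alpha = 0$, $\mathrm{AC}^{0}_{a+}(I,\mathbb{R}^n) = \mathrm{L}^1_{\mathrm{loc}}(I,\mathbb{R}^n)$ and $\mathrm{D}^{0}_{a+}[x] = x$ for any $x \in \mathrm{L}^1_{\mathrm{loc}}(I,\mathbb{R}^n)$.
Remark 2.7. Let $0 \leq \alpha \leq 1$. Since $\mathrm{I}^{1-\alpha}_{a+}[c](t) = \frac{(t-a)^{1-\alpha}}{\Gamma(2-\alpha)} \, c$ for all $t \in I$ and all $c \in \mathbb{R}^n$, we deduce that $\mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n)$ contains all constant functions.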
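Proposition 2.8 ([15], Prop. 5, p. 220). Let $0 \leq \alpha \leq 1$ and $x \in \mathrm{L}^1_{\mathrm{loc}}(I,\mathbb{R}^n)$. Then $x \in \mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n)$ if and only if there exist $y \in \mathrm{L}^1_{\mathrm{loc}}(I,\mathbb{R}^n)$ and $x_a \in \mathbb{R}^n$ such that
\[ x(t) = \frac{(t-a)^{\alpha-1}}{\Gamma(\alpha)} \, x_a + \mathrm{I}^{\alpha}_{a+}[y](t) \]
for almost every $t \in I$. In that case, it holds that $y = \mathrm{D}^{\alpha}_{a+}[x]$ and $x_a = \mathrm{I}^{1-\alpha}_{a+}[x](a)$.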
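Remark 2.9. Let $0 \leq \alpha \leq 1$. In general a function $x \in \mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n)$ admits a singularity at $t = a$.
Definition 2.10 (Left Caputo fractional derivative). We say that $x \in \mathrm{C}(I,\mathbb{R}^n)$ has a left Caputo fractional derivative ${}^{c}\mathrm{D}^{\alpha}_{a+}[x]$ of order $0 \leq \alpha \leq 1$ and inferior limit $a$ on $I$ if and only if $x - x(a) \in \mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n)$. In that case ${}^{c}\mathrm{D}^{\alpha}_{a+}[x] := \mathrm{D}^{\alpha}_{a+}[x - x(a)]$. We denote by ${}^{c}\mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n)$ the set of all functions $x \in \mathrm{C}(I,\mathbb{R}^n)$ possessing on $I$ a left Caputo fractional derivative ${}^{c}\mathrm{D}^{\alpha}_{a+}[x]$ of order $0 \leq \alpha \leq 1$ and inferior limit $a$.
Remark 2.11. If $\alpha = 1$, ${}^{c}\mathrm{AC}^{1}_{a+}(I,\mathbb{R}^n) = \mathrm{AC}_{\mathrm{loc}}(I,\mathbb{R}^n)$ and ${}^{c}\mathrm{D}^{1}_{a+}[x] = \dot{x}$ for any $x \in \mathrm{AC}_{\mathrm{loc}}(I,\mathbb{R}^n)$. If $\alpha = 0$, ${}^{c}\mathrm{AC}^{0}_{a+}(I,\mathbb{R}^n) = \mathrm{C}(I,\mathbb{R}^n)$ and ${}^{c}\mathrm{D}^{0}_{a+}[x] = x - x(a)$ for any $x \in \mathrm{C}(I,\mathbb{R}^n)$.
Remark 2.12. Let $0 \leq \alpha \leq 1$. Note that ${}^{c}\mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n) \subset \mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n)$ and, if $\alpha \neq 1$, it holds that
\[ {}^{c}\mathrm{D}^{\alpha}_{a+}[x](t) = \mathrm{D}^{\alpha}_{a+}[x](t) - \frac{(t-a)^{-\alpha}}{\Gamma(1-\alpha)} \, x(a) \]
for almost every $t \in I$ and all $x \in {}^{c}\mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n)$.
The proof of Proposition 2.8, that can be found in ([15], Prop. 5, p. 220), can be adapted to the Caputo case in order to derive the next proposition.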
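Proposition 2.13. Let $0 \leq \alpha \leq 1$ and $x \in \mathrm{C}(I,\mathbb{R}^n)$. Then $x \in {}^{c}\mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n)$ if and only if there exist $y \in \mathrm{L}^1_{\mathrm{loc}}(I,\mathbb{R}^n)$ and $x_a \in \mathbb{R}^n$ such that $x(t) = x_a + \mathrm{I}^{\alpha}_{a+}[y](t)$ for all $t \in I$. In that case, the above relation holds replacing $y$ by ${}^{c}\mathrm{D}^{\alpha}_{a+}[x]$ and $x_a$ by $x(a)$ (but it might be possible that $y \neq {}^{c}\mathrm{D}^{\alpha}_{a+}[x]$ and $x_a \neq x(a)$, see Rem. 2.15).
Proof. The proof is obvious in the case $\alpha \in \{0, 1\}$. Thus we only deal with the fractional case $0 < \alpha < 1$. Firstly let us assume that there exist $y \in \mathrm{L}^1_{\mathrm{loc}}(I,\mathbb{R}^n)$ and $x_a \in \mathbb{R}^n$ such that $x(t) = x_a + \mathrm{I}^{\alpha}_{a+}[y](t)$ for all $t \in I$. From Remark 2.7 and Proposition 2.3, it holds that $\mathrm{I}^{1-\alpha}_{a+}[x - x(a)] = \mathrm{I}^{1-\alpha}_{a+}[x_a - x(a)] + \mathrm{I}^{1}_{a+}[y] \in \mathrm{AC}_{\mathrm{loc}}(I,\mathbb{R}^n)$. We deduce that $x - x(a) \in \mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n)$, that is exactly $x \in {}^{c}\mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n)$. Conversely let us assume that $x \in {}^{c}\mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n)$. We introduce $y := {}^{c}\mathrm{D}^{\alpha}_{a+}[x] \in \mathrm{L}^1_{\mathrm{loc}}(I,\mathbb{R}^n)$ and $x_a := x(a) \in \mathbb{R}^n$. Since $x - x_a \in \mathrm{AC}^{\alpha}_{a+}(I,\mathbb{R}^n)$ with $\mathrm{D}^{\alpha}_{a+}[x - x_a] = y$ and $\mathrm{I}^{1-\alpha}_{a+}[x - x_a](a) = 0_{\mathbb{R}^n}$ (since $x \in \mathrm{C}(I,\mathbb{R}^n)$, see Prop. 2.4), we deduce from Proposition 2.8 that $x(t) - x_a = \mathrm{I}^{\alpha}_{a+}[y](t)$ for almost every $t \in I$, and then for every $t \in I$ from the continuity of $x$. The proof is complete.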
Remark 2.14. Let 0 ≤ α ≤ 1 and x(t) = x a + I α a+ [y](t) for almost every t ∈ I, for some y ∈ L 1 loc (I, R n ) and x a ∈ R n . It might be possible that x ∉ C(I, R n ) and then Proposition 2.13 cannot be applied. From Proposition 2.4, if α > 0 and y ∈ L ∞ loc (I, R n ), then x ∈ C(I, R n ) with x(a) = x a and Proposition 2.13 can be applied.
Remark 2.15. Let 0 ≤ α ≤ 1 and x ∈ C(I, R n ) such that x(t) = x a + I α a+ [y](t) for all t ∈ I, for some y ∈ L 1 loc (I, R n ) and x a ∈ R n . From Proposition 2.13, we know that x ∈ c AC α a+ (I, R n ) and that x(t) = x(a) + I α a+ [ c D α a+ [x]](t) for all t ∈ I. However, without any additional assumption, one cannot assert that y = c D α a+ [x] and x a = x(a). From Proposition 2.4, if α > 0 and y ∈ L ∞ loc (I, R n ), then we can conclude that y = c D α a+ [x] and x a = x(a).
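The caution expressed in Remarks 2.14 and 2.15 can be illustrated on a simple scalar example. The sketch below assumes $n = 1$, $0 < \alpha < 1$ and $I = [a,b]$, picks the unbounded function $y(\tau) = (\tau - a)^{-\alpha}$ (an illustrative choice, not taken from the text above), and identifies $\mathrm{I}^{\alpha}_{a+}[y]$ with its continuous representative. It relies on the power-function formula $\mathrm{I}^{\alpha}_{a+}[(\cdot - a)^{\gamma}](t) = \frac{\Gamma(\gamma+1)}{\Gamma(\gamma+\alpha+1)} (t-a)^{\gamma+\alpha}$, valid for $\gamma > -1$.

```latex
% Illustration of Remark 2.15 (scalar case, 0 < \alpha < 1); y is an illustrative choice.
% y belongs to L^1 but not to L^\infty, so Proposition 2.4 does not apply.
\begin{align*}
y(\tau) &:= (\tau - a)^{-\alpha}, \qquad
  y \in \mathrm{L}^1([a,b],\mathbb{R}) \setminus \mathrm{L}^{\infty}([a,b],\mathbb{R}), \\
\mathrm{I}^{\alpha}_{a+}[y](t)
  &= \frac{\Gamma(1-\alpha)}{\Gamma(1)} \, (t-a)^{0} = \Gamma(1-\alpha)
  \qquad \text{for all } t \in (a,b], \\
x(t) &:= x_a + \mathrm{I}^{\alpha}_{a+}[y](t) = x_a + \Gamma(1-\alpha)
  \quad \Longrightarrow \quad x \text{ is constant}, \\
x(a) &= x_a + \Gamma(1-\alpha) \neq x_a, \qquad
  {}^{c}\mathrm{D}^{\alpha}_{a+}[x] = 0 \neq y.
\end{align*}
```

In this example the representation $x = x(a) + \mathrm{I}^{\alpha}_{a+}[{}^{c}\mathrm{D}^{\alpha}_{a+}[x]]$ of Proposition 2.13 still holds (both sides are the same constant), but with data different from $(y, x_a)$.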
From the above definitions and propositions, one can recover the following well-known result.

Right RL and Caputo fractional operators
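This section is devoted to the right counterparts of the notions recalled in Section 2.1. For this purpose we fix $b \in \mathbb{R}$ and $I \subset \mathbb{R}$ an interval such that $\{b\} \subsetneq I \subset (-\infty, b]$. Precisely, the interval $I$ can be written either as $I = (-\infty, b]$, or $I = (a, b]$ for some $a < b$, or $I = [a, b]$ for some $a < b$.
Definition 2.17 (Right RL fractional integral). The right RL fractional integral $\mathrm{I}^{\alpha}_{b-}[x]$ of order $\alpha > 0$ and superior limit $b$ of $x \in \mathrm{L}^1_{\mathrm{loc}}(I,\mathbb{R}^n)$ is defined on $I$ by
\[ \mathrm{I}^{\alpha}_{b-}[x](t) := \int_t^b \frac{(\tau-t)^{\alpha-1}}{\Gamma(\alpha)} \, x(\tau) \, d\tau, \]
provided that the right-hand side term exists. For $\alpha = 0$ we define $\mathrm{I}^{0}_{b-}[x] := x$.
Definition 2.18 (Right RL fractional derivative). We say that $x \in \mathrm{L}^1_{\mathrm{loc}}(I,\mathbb{R}^n)$ has a right RL fractional derivative $\mathrm{D}^{\alpha}_{b-}[x]$ of order $0 \leq \alpha \leq 1$ and superior limit $b$ on $I$ if and only if $\mathrm{I}^{1-\alpha}_{b-}[x] \in \mathrm{AC}_{\mathrm{loc}}(I,\mathbb{R}^n)$. In that case $\mathrm{D}^{\alpha}_{b-}[x] := -\frac{d}{dt} \mathrm{I}^{1-\alpha}_{b-}[x]$. We denote by $\mathrm{AC}^{\alpha}_{b-}(I,\mathbb{R}^n)$ the set of all functions $x \in \mathrm{L}^1_{\mathrm{loc}}(I,\mathbb{R}^n)$ possessing on $I$ a right RL fractional derivative $\mathrm{D}^{\alpha}_{b-}[x]$ of order $0 \leq \alpha \leq 1$ and superior limit $b$.
Definition 2.19 (Right Caputo fractional derivative). We say that $x \in \mathrm{C}(I,\mathbb{R}^n)$ has a right Caputo fractional derivative ${}^{c}\mathrm{D}^{\alpha}_{b-}[x]$ of order $0 \leq \alpha \leq 1$ and superior limit $b$ on $I$ if and only if $x - x(b) \in \mathrm{AC}^{\alpha}_{b-}(I,\mathbb{R}^n)$. In that case ${}^{c}\mathrm{D}^{\alpha}_{b-}[x] := \mathrm{D}^{\alpha}_{b-}[x - x(b)]$. We denote by ${}^{c}\mathrm{AC}^{\alpha}_{b-}(I,\mathbb{R}^n)$ the set of all functions $x \in \mathrm{C}(I,\mathbb{R}^n)$ possessing on $I$ a right Caputo fractional derivative ${}^{c}\mathrm{D}^{\alpha}_{b-}[x]$ of order $0 \leq \alpha \leq 1$ and superior limit $b$.
All results recalled in Section 2.1 (for left fractional operators) have a right counterpart. We refer the reader to [37,48] for details.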
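The statement that left results have right counterparts can be made plausible by a reflection argument. The following sketch assumes $I = [a,b]$ and the usual formula for the right RL fractional integral, $\mathrm{I}^{\alpha}_{b-}[x](t) = \frac{1}{\Gamma(\alpha)} \int_t^b (\tau-t)^{\alpha-1} x(\tau) \, d\tau$; the reflected function $\tilde{x}$ is a notation introduced here for illustration only.

```latex
% Reflection argument (sketch): right RL integral as a left RL integral of the reflected function.
% Assumptions: I = [a,b], \alpha > 0, \tilde{x}(\sigma) := x(a+b-\sigma) (notation introduced here).
\begin{align*}
\mathrm{I}^{\alpha}_{b-}[x](t)
  &= \frac{1}{\Gamma(\alpha)} \int_t^b (\tau-t)^{\alpha-1} x(\tau) \, d\tau \\
  &= \frac{1}{\Gamma(\alpha)} \int_a^{a+b-t} \big( (a+b-t) - \sigma \big)^{\alpha-1} x(a+b-\sigma) \, d\sigma
     && (\tau = a+b-\sigma) \\
  &= \mathrm{I}^{\alpha}_{a+}[\tilde{x}](a+b-t).
\end{align*}
```

For instance, applying this identity to $x(\tau) = (b-\tau)^{\gamma}$ with $\gamma > -1$ gives $\mathrm{I}^{\alpha}_{b-}[(b-\cdot)^{\gamma}](t) = \frac{\Gamma(\gamma+1)}{\Gamma(\gamma+\alpha+1)} (b-t)^{\gamma+\alpha}$, the mirror image of the left power-function rule.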

Main results and comments
This section is devoted to the main results (Thms. 3.1 and 3.12) of the present paper.

Framework, terminology and assumptions
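Let $a < b$ be two real numbers. Let $m, n, j \in \mathbb{N}^*$, let $0 < \alpha \leq 1$ and $\beta \geq \alpha$ be fixed. We consider the general Caputo fractional optimal control problem of Bolza form given by
\[
\begin{array}{rl}
\text{minimize} & \varphi(x(a), x(b)) + \mathrm{I}^{\beta}_{a+}[F(x, u, \cdot)](b), \\[4pt]
\text{subject to} & x \in {}^{c}\mathrm{AC}^{\alpha}_{a+}([a,b],\mathbb{R}^n), \quad u \in \mathrm{L}^{\infty}([a,b],\mathbb{R}^m), \\[4pt]
& {}^{c}\mathrm{D}^{\alpha}_{a+}[x](t) = f(x(t), u(t), t), \quad \text{a.e. } t \in [a,b], \\[4pt]
& g(x(a), x(b)) \in \mathrm{C}, \\[4pt]
& u(t) \in \mathrm{U}, \quad \text{a.e. } t \in [a,b].
\end{array}
\tag{OCP}
\]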
A couple (x * , u * ) is said to be an optimal solution to Problem (OCP) if it satisfies all the above constraints and it minimizes the cost among all couples (x, u) satisfying these constraints. Our aim in this section is to fix the terminology and some assumptions associated to Problem (OCP). In Problem (OCP), u is the control function and x is the state function (also called trajectory). In the next box we gather some generic assumptions that will be made on the data of Problem (OCP).

Hypothesis (H)
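- the function $\varphi : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, that describes the Mayer cost $\varphi(x(a), x(b))$, is of class $\mathrm{C}^1$;
- the set $\mathrm{C} \subset \mathbb{R}^j$ is a nonempty closed convex subset of $\mathbb{R}^j$ and the function $g : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^j$, that describes the terminal state constraint $g(x(a), x(b)) \in \mathrm{C}$, is of class $\mathrm{C}^1$;
- the set $\mathrm{U} \subset \mathbb{R}^m$, that describes the control constraint $u(t) \in \mathrm{U}$, is a nonempty closed subset of $\mathbb{R}^m$;
- the dynamic $f : \mathbb{R}^n \times \mathbb{R}^m \times [a,b] \to \mathbb{R}^n$, that drives the Caputo fractional state equation ${}^{c}\mathrm{D}^{\alpha}_{a+}[x](t) = f(x(t), u(t), t)$, satisfies the following conditions:
  - $f$ is continuous;
  - $f$ is differentiable with respect to its first variable;
  - $\partial_1 f$ is continuous;
  - $f$ is Lipschitz continuous with respect to its first two variables on every compact subset (see Eq. (3.1) for precisions). In particular, for every compact subset $\mathrm{K} \subset \mathbb{R}^n \times \mathbb{R}^m$, there exists $L \geq 0$ such that
  \[ \| f(x_2, u_2, t) - f(x_1, u_1, t) \|_{\mathbb{R}^n} \leq L \big( \| x_2 - x_1 \|_{\mathbb{R}^n} + \| u_2 - u_1 \|_{\mathbb{R}^m} \big) \tag{3.1} \]
  for all $(x_1, u_1), (x_2, u_2) \in \mathrm{K}$ and all $t \in [a,b]$;
- the function $F : \mathbb{R}^n \times \mathbb{R}^m \times [a,b] \to \mathbb{R}$, that describes the Lagrange cost $\mathrm{I}^{\beta}_{a+}[F(x, u, \cdot)](b)$, is assumed to satisfy the same assumptions as the dynamic $f$.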
From now and in the whole paper (in particular in the statements of Thms. 3.1 and 3.12 and all propositions, lemmas, etc.), we will assume that Hypothesis (H) is satisfied.
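Usually the fractional Lagrange cost $\mathrm{I}^{\beta}_{a+}[F(x, u, \cdot)](b)$ is considered with $\beta = \alpha$ or $\beta = 1$ in the literature. Nevertheless, we can always get back to the case where $\beta = \alpha$ by noting that the fractional Lagrange cost can be rewritten as $\mathrm{I}^{\alpha}_{a+}[F_\beta(x, u, \cdot)](b)$, where $F_\beta(x, u, t) := \frac{\Gamma(\alpha)}{\Gamma(\beta)} (b-t)^{\beta-\alpha} F(x, u, t)$. Since $\beta \geq \alpha$, note that $F_\beta$ satisfies the same regularity assumptions as $F$ (and thus as $f$).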
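The reduction to the case $\beta = \alpha$ follows directly from the definition of the left RL fractional integral. The short computation below is a sketch in which $F_\beta$ denotes the rescaled Lagrange cost function $F_\beta(x,u,t) := \frac{\Gamma(\alpha)}{\Gamma(\beta)} (b-t)^{\beta-\alpha} F(x,u,t)$; this explicit expression is our reading of the rewriting mentioned above and should be taken as an assumption.

```latex
% Sketch: rewriting a Lagrange cost of order \beta as a Lagrange cost of order \alpha.
% Assumption: F_\beta(x,u,t) := \frac{\Gamma(\alpha)}{\Gamma(\beta)} (b-t)^{\beta-\alpha} F(x,u,t).
\begin{align*}
\mathrm{I}^{\beta}_{a+}[F(x,u,\cdot)](b)
  &= \frac{1}{\Gamma(\beta)} \int_a^b (b-\tau)^{\beta-1} F(x(\tau),u(\tau),\tau) \, d\tau \\
  &= \frac{1}{\Gamma(\alpha)} \int_a^b (b-\tau)^{\alpha-1}
     \, \frac{\Gamma(\alpha)}{\Gamma(\beta)} (b-\tau)^{\beta-\alpha} F(x(\tau),u(\tau),\tau) \, d\tau \\
  &= \mathrm{I}^{\alpha}_{a+}[F_\beta(x,u,\cdot)](b).
\end{align*}
```

The factor $(b-t)^{\beta-\alpha}$ is continuous and bounded on $[a,b]$ precisely because $\beta \geq \alpha$, which is why the regularity of $F$ carries over to $F_\beta$.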

Filippov's existence theorem
The main concern of this paper is to state a version of the PMP for Problem (OCP). Nevertheless we are eager to prove in a first place that the existence of optimal controls under some standard assumptions is preserved at the fractional level. Hence looking for a fractional version of the PMP for our problem is legitimate. In this section we provide a result stating the existence of at least one optimal solution to Problem (OCP) under some appropriate compactness and convexity assumptions. Precisely we follow the standard Filippov's approach (see [16,20,25,40] for example). For this purpose we introduce the usual set of augmented velocities defined by , R m ) such that the couple (x, u) satisfies all the constraints of Problem (OCP). Obviously, if T is empty, then Problem (OCP) has no solution. Otherwise, the following existence result holds true.
Theorem 3.1 (Filippov's existence theorem). Assume that U is compact, T is nonempty and bounded in C, . Then Problem (OCP) has at least one optimal solution.
) and thus, up to a subsequence, weakly* converges to some G ∈ L ∞ ([a, b], R n+1 ). Similarly, the sequence (x k (a)) k∈N is bounded in R n and thus, up to a subsequence, converges to some x * a ∈ R n . We deduce that ( Similarly, from the continuity of g and the closeness of C, we deduce that g(x * (a), x * (b)) ∈ C. To conclude the proof, we only need to prove that G ∈ W where Indeed, if G ∈ W, then there exists u * (τ ) ∈ U and γ * (τ ) ≥ 0 such that for almost every τ ∈ [a, b]. Moreover, u * and γ * can be selected measurable on [a, b] from implicit measurable function theorems (see, e.g., [55], Sect. 7). Since U is bounded, we get that u * ∈ L ∞ ([a, b], R m ). Thus x * ∈ T associated to the control u * and the associated cost is equal to which would prove that the couple (x * , u * ) is an optimal solution to Problem (OCP). Now, let us prove that G ∈ W (in two steps). First, one can easily deduce from the assumptions that W is a closed and convex subset of L 2 ([a, b], R n+1 ) with its usual topology, and thus with its weak topology as well. Now let us consider ( Next, we prove that G = H. From the boundedness of T and U and from the hypotheses on (f, F β ) (Eq. (3.1)), we get that for almost all τ ∈ [a, b] and all k ∈ N, for some constant L ≥ 0. Thanks to the pointwise convergence of (x k ) k∈N to x * , we deduce that On the other hand, we obtain from the weak star and weak convergences that Remark 3.2. The regularity assumptions on f , F , g and ϕ introduced in Hypothesis (H) can be weakened for Theorem 3.1. Indeed only the continuity of f , F , g and ϕ and inequality (3.1) (for f and F ) are required in the above proof. Similarly the convexity of C is a useless assumption that can be removed. Also other different approaches might be explored in order to establish existence results for Problem (OCP). However it is not our aim in the present work to provide a complete and detailed study on such existence results with the weakest assumptions as possible. 
Our main concern in the present work is to state a PMP for Problem (OCP).
Remark 3.3. Note that Theorem 3.1 can be extended to the case where we consider in Problem (OCP) some additional intermediate state constraints (not only at t = a and t = b, but also at some times c i ∈ (a, b)) or even running state constraints (that is, over the whole interval [a, b]). Indeed, in the above proof, one can easily see that the pointwise convergence of x k to x * allows one to preserve such state constraints if appropriate continuity and closedness properties are satisfied.
Remark 3.4. This remark is devoted to sufficient conditions ensuring the boundedness of T in C. If U is compact, if the terminal state constraint g(x(a), x(b)) ∈ C allows to bound the initial condition x(a) (if the initial condition is fixed for example) and if f satisfies a global Lipschitz condition of type , and some L ≥ 0, then the fractional version of Gronwall lemma given in Proposition B.1 allows to prove that T is bounded in C.

Sensitivity analysis of the Caputo fractional state equation
In this section, we perform the sensitivity analysis of the Caputo fractional state equation in order to get differentiability results on the trajectory $x$ with respect to perturbations on the control $u$ and on the initial condition $x_a$. For this purpose, throughout this section, we fix a couple $(u, x_a) \in \mathrm{L}^{\infty}([a,b],\mathbb{R}^m) \times \mathbb{R}^n$ and we focus on the nonlinear Caputo fractional Cauchy problem (CP) given by
\[ {}^{c}\mathrm{D}^{\alpha}_{a+}[x](t) = f(x(t), u(t), t) \quad \text{for almost every } t \in [a,b], \qquad x(a) = x_a. \tag{CP} \]
The results presented in this section are a crucial tool to prove Theorem 3.12 stated in Section 3.4.

Admissibility for globality
Let us recall the definition of solutions to (CP) and some fractional Cauchy-Lipschitz (or Picard-Lindelöf) results. We refer to ( [13], Sect. 3.2.2) for details.
or, equivalently, x ∈ C(I, R n ) and x satisfies the integral representation $x(t) = x_a + \mathrm{I}^{\alpha}_{a+}[f(x, u, \cdot)](t)$ for all t ∈ I.
In the sequel we denote by A G the set of all couples (u, x a ) ∈ L ∞ ([a, b], R m ) × R n that are admissible for globality.
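As an illustration of the integral representation, consider the scalar linear dynamic $f(x, u, t) = \lambda x$ with $\lambda \in \mathbb{R}$ (a hypothetical example chosen for illustration, with $n = 1$ and no dependence on the control). The sketch below runs the Picard iterations on the integral equation, using the power-function rule $\mathrm{I}^{\alpha}_{a+}[(\cdot - a)^{\gamma}](t) = \frac{\Gamma(\gamma+1)}{\Gamma(\gamma+\alpha+1)} (t-a)^{\gamma+\alpha}$, and recovers the classical Mittag-Leffler function $\mathrm{E}_{\alpha}$.

```latex
% Picard iterations for the scalar linear Caputo Cauchy problem (illustrative example):
%   {}^{c}D^{\alpha}_{a+}[x](t) = \lambda x(t), \quad x(a) = x_a, \quad 0 < \alpha \le 1.
\begin{align*}
x_0(t) &:= x_a, \qquad
x_{k+1}(t) := x_a + \mathrm{I}^{\alpha}_{a+}[\lambda x_k](t), \\
x_1(t) &= x_a \Big( 1 + \frac{\lambda (t-a)^{\alpha}}{\Gamma(\alpha+1)} \Big), \\
x_k(t) &= x_a \sum_{i=0}^{k} \frac{\lambda^{i} (t-a)^{i\alpha}}{\Gamma(i\alpha+1)}
  \;\xrightarrow[k \to \infty]{}\;
  x(t) = x_a \, \mathrm{E}_{\alpha}\big( \lambda (t-a)^{\alpha} \big),
  \qquad \mathrm{E}_{\alpha}(z) := \sum_{i=0}^{\infty} \frac{z^{i}}{\Gamma(i\alpha+1)}.
\end{align*}
```

Since the series converges for every $t \geq a$, the solution of this example is defined on the whole interval $[a,b]$. For $\alpha = 1$ one recovers the classical exponential $x(t) = x_a e^{\lambda (t-a)}$.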

Needle-like perturbation of the control
Next we assume that (u, x a ) ∈ A G . We look for differentiability of the state x(·, u, x a ) with respect to specific perturbations on the control u. For this purpose, we denote in the sequel by is the unique maximal solution, which is moreover global, to the linear left RL fractional Cauchy problem given by The function w (s,v) (·, u, x a ) is called the variation vector associated to (u, x a ) and (s, v).
Proof. The technical proof of Proposition 3.10 is detailed in Appendix A.1.2.
The existence, uniqueness and globality of [14], Thm. 6), where ρ α s+ : [a, b] → R stands for the weight function given by Finally, the fractional Duhamel formula given in Theorem 5 of [13] by is the left RL fractional state-transition matrix (see [13], Def. 18 for details) associated to the essentially bounded matrix function
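The singular behavior of the variation vector can be observed on the simplest scalar dynamic $f(x, u, t) = u$ (an illustrative choice, with $n = m = 1$). The sketch below assumes the usual needle-like perturbation $u_\delta$, equal to $v$ on $[s, s+\delta)$ and to $u$ elsewhere, and that $s \in [a,b)$ is a Lebesgue point of $u$; these conventions are assumptions made here and may differ in detail from those of Proposition 3.10.

```latex
% Needle-like perturbation for the scalar dynamic f(x,u,t) = u (illustrative sketch).
% Assumptions: u_\delta = v on [s, s+\delta), u_\delta = u elsewhere; s is a Lebesgue point of u.
\begin{align*}
x(t,u,x_a) &= x_a + \mathrm{I}^{\alpha}_{a+}[u](t), \\
\frac{x(t,u_\delta,x_a) - x(t,u,x_a)}{\delta}
  &= \frac{1}{\delta} \int_s^{s+\delta} \frac{(t-\tau)^{\alpha-1}}{\Gamma(\alpha)} \, \big( v - u(\tau) \big) \, d\tau
  \qquad (t \ge s + \delta) \\
  &\xrightarrow[\delta \to 0]{} \;
  w(t) := \frac{(t-s)^{\alpha-1}}{\Gamma(\alpha)} \, \big( v - u(s) \big)
  \qquad (t > s), \\
\mathrm{D}^{\alpha}_{s+}[w] &= 0, \qquad
\mathrm{I}^{1-\alpha}_{s+}[w] \equiv v - u(s), \qquad
\Gamma(\alpha) (t-s)^{1-\alpha} \, w(t) = v - u(s).
\end{align*}
```

For $0 < \alpha < 1$ the limit $w$ blows up as $t \to s$, in contrast to the classical case $\alpha = 1$ where $w$ is constant. This is consistent with the variation vector solving a left RL (rather than Caputo) fractional Cauchy problem, and the product of $w$ with $\Gamma(\alpha)(t-s)^{1-\alpha}$ is continuous, mirroring the weight $\rho^{\alpha}_{b-}(t) = \Gamma(\alpha)(b-t)^{1-\alpha}$ used for the adjoint vector.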

Perturbation of the initial condition
Still assuming that (u, x a ) ∈ A G , we now look for differentiability of the state x(·, u, x a ) with respect to perturbations of the initial condition x a .
Proposition 3.11. For all y ∈ R n , there exists δ̄ > 0 such that (u, x a + δy) ∈ A G for all 0 ≤ δ ≤ δ̄. Moreover: is the unique maximal solution, which is moreover global, to the linear left Caputo fractional Cauchy problem given by The function w y (·, u, x a ) is called the variation vector associated to (u, x a ) and y.
Proof. The technical proof of Proposition 3.11 is detailed in Appendix A.1.3.
The existence, uniqueness and globality of w y (·, u, x a ) ∈ C([a, b], R n ) follow from Theorem 3 of [13]. Considering the function ξ := w y (·, u, x a ) − y, one can easily see that ξ is the unique maximal solution, which is moreover global, of the (nonhomogeneous) linear left RL fractional Cauchy problem given by The existence, uniqueness and globality of ξ ∈ L 1 ([a, b], R n ) follow from Theorem 1 of [13]. Recall that the product ρ α a+ ξ ∈ C a ([a, b], R n ) (see [14], Thm. 6) and note that the fractional Duhamel formula given in Theorem 5 of [13] applied to ξ allows to get that where the notations ρ α a+ and Φ(·, ·) are introduced in Section 3.3.2.

Pontryagin Maximum Principle (PMP)
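Let us recall that the normal cone to $\mathrm{C}$ at a point $x \in \mathrm{C}$ is defined by $\mathrm{N}_{\mathrm{C}}[x] := \{ z \in \mathbb{R}^j \mid \forall x' \in \mathrm{C}, \ \langle z, x' - x \rangle_{\mathbb{R}^j} \leq 0 \}$. It is a nonempty closed convex cone containing $0_{\mathbb{R}^j}$. We recall basic convexity notions such as distance function, projection, etc. in Appendix A.2.2. Also recall that $g : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^j$ is said to be submersive at a point $(x_a, x_b) \in \mathbb{R}^n \times \mathbb{R}^n$ if its differential at this point is surjective. We may now formulate the main result of the present paper.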
is an optimal solution to Problem (OCP). Then there exists a nontrivial couple (p, p 0 ), where p ∈ AC α b− ([a, b], R n ) (called adjoint vector) and p 0 ≤ 0, such that the following conditions hold: (i) Fractional Hamiltonian system (or extremal equations): (ii) Hamiltonian maximization condition: for almost every t ∈ [a, b]; (iii) Transversality conditions on the adjoint vector: if in addition g is submersive at (x * (a), x * (b)), then the nontrivial couple (p, p 0 ) can be selected to satisfy The technical (and quite long) proof of Theorem 3.12 (in its general form) is detailed in Appendix A.2. From a standard change of variable, we reduce Problem (OCP) to the case where there is no Lagrange cost (that is, with F = 0). Then, the proof is based on the sensitivity analysis performed in Section 3.3 and, in order to take into account the terminal state constraint g(x(a), x(b)) ∈ C, from the application of Ekeland's variational principle (recalled in Prop. A.7) on a penalized functional. We feel that it is of interest to provide here a simple proof of Theorem 3.12 in two particular and simpler cases.
Proof of Theorem 3.12 in two particular and simpler cases. In this proof we will assume that there is no Lagrange cost (that is, F = 0) in Problem (OCP), that g is the identity function (in particular j = 2n and g is submersive at any point) and C = C 1 × R n where: (i) either C 1 = {x a } for some x a ∈ R n fixed (corresponding to the case where the initial condition is fixed and the final condition is free in Problem (OCP)); (ii) or C 1 = R n (corresponding to the case where the initial and final conditions are both left free in Problem (OCP)).
be an optimal solution to Problem (OCP). With the notations introduced in Section 3.3, note that (u * , x * (a)) ∈ A G and x * = x(·, u * , x * (a)). Let (s, v) ∈ L(f (x * , u * , ·)) × U. From optimality of (x * , u * ) and using Proposition 3.10, we know that ϕ(x(a, u δ , x * (a)), x(b, u δ , x * (a))) − ϕ(x * (a), x * (b)) δ ≥ 0, for all δ > 0 sufficiently small, where u δ stands for the needle-like perturbation of u * associated to (s, v). From Proposition 3.10 and letting δ → 0, we get From the Duhamel formula (3.2), we obtain Hence, defining for almost all s ∈ [a, b), we deduce from the above inequality the Hamiltonian maximization condition of Theorem 3.12. From the duality theorem given in Theorem 7 of [13], we deduce that p ∈ AC α b− ([a, b], R n ) is the unique maximal solution, which is moreover global, of the linear right RL fractional Cauchy problem given by The existence, uniqueness and globality of p ∈ L 1 ([a, b], R n ) follow from the right counterpart of [13,Theorem 1]. In addition, the product ρ α b− p belongs to C([a, b], R n ) thanks to the right counterpart of [14,Theorem 6], where ρ α b− : [a, b] → R stands for the weight function given by ρ α b− (t) := Γ(α)(b − t) 1−α for all t ∈ [a, b]. Then, from the above Cauchy problem, we get the fractional Hamiltonian system of Theorem 3.12. Next let us define p 0 := −1 (in particular, the nontriviality of (p, p 0 ) is guaranteed).
(i) If C 1 = {x a } for some x a ∈ R n fixed, the transversality conditions in Theorem 3.12 are both satisfied by (a) and Ψ 2 := 0 R n . Indeed the normal cone to the entire space R n (at any point) is reduced to the singleton {0 R n } and the normal cone to the singleton {x a } (at x a ) is the entire space R n . So the proof is complete in the case where C 1 = {x a } for some x a ∈ R n fixed.
(ii) Now, let us assume that C 1 = R n and let y ∈ R n . From the optimality of (x * , u * ) and using Proposition 3.11, we know that for all δ > 0 sufficiently small. From Proposition 3.11 and letting δ → 0, we get that From the Duhamel formula (3.3), we know that Thus we can rewrite the above inequality as Since the above inequality is true for all y ∈ R n , it is clear that the condition ) follows. Hence, the transversality conditions in Theorem 3.12 are both satisfied by considering Ψ = (Ψ 1 , Ψ 2 ) with Ψ 1 := 0 R n and Ψ 2 := 0 R n . This completes the proof in the case where C 1 = R n .
We end this section with some comments.
Remark 3.13. The nontrivial couple $(p, p^0)$ in Theorem 3.12, which is a Lagrange multiplier, is defined up to a positive multiplicative scalar. Defining as usual an extremal as a quadruple $(x, u, p, p^0)$ solution to the extremal equations, an extremal is said to be normal whenever $p^0 \neq 0$ and abnormal whenever $p^0 = 0$. In the normal case $p^0 \neq 0$, it is usual to normalize the Lagrange multiplier so that $p^0 = -1$.

Remark 3.16. Our strategy to prove Theorem 3.12 in its general form is based on Ekeland's variational principle (Prop. A.7) and thus requires the closedness of $U$ to define the corresponding penalized functional on a complete metric space (see details in Appendix A.2). In the two particular and simpler cases considered above, Ekeland's variational principle is not required and the closedness of $U$ is a superfluous assumption that can be removed.

function and $C = \{x_a\} \times \mathbb{R}^n$ where $x_a \in \mathbb{R}^n$ is the fixed initial point. In that case, the nontriviality of the couple $(p, p^0)$ and the transversality conditions in Theorem 3.12 imply that $p^0 \neq 0$ (which we normalize to $p^0 = -1$, see Rem. 3.13) and ).

-If the initial point is fixed and the final point is subject to inequality constraints
If G is of class C 1 and is submersive at any point x b ∈ G −1 ((R + ) k ), then the transversality conditions in Theorem 3.12 can be written as for some λ i ≥ 0, i = 1, . . . , k. -If there is no Mayer cost (that is, ϕ = 0) and the periodic condition x(a) = x(b) is considered in Problem (OCP), one may consider g : R n × R n → R n , g(x a , x b ) = x b − x a and C = {0 R n }. In that case, the transversality conditions in Theorem 3.12 yield that . We point out that, in all examples above, the function g is indeed a submersion.
Remark 3.18. Let us assume that the Hamiltonian $H$ considered in Theorem 3.12 is differentiable with respect to its second variable (for example, if $f$ and $F$ are so). In that case, and if $U$ is convex, the Hamiltonian maximization condition in Theorem 3.12 implies the (weaker) nonnegative Hamiltonian gradient condition given by $\langle \partial_2 H(x^*(t), u^*(t), p(t), p^0, t), u^*(t) - v \rangle_{\mathbb{R}^m} \geq 0$ for all $v \in U$ and for almost every $t \in [a, b]$. Similarly, if $U = \mathbb{R}^m$ (that is, no control constraint in Problem (OCP)), then the Hamiltonian maximization condition in Theorem 3.12 implies the (weaker) null Hamiltonian gradient condition given by $\partial_2 H(x^*(t), u^*(t), p(t), p^0, t) = 0$ for almost every $t \in [a, b]$.

Remark 3.19. Note that an extension of Theorem 3.12 for a parameterized version of Problem (OCP) (that is, depending on a vectorial parameter $z$ to be optimized) can be easily derived by adding the fractional state equation ${}^c D^\alpha_{a+}[z](t) = 0$.

Remark 3.20. If $\beta = 1$ in Problem (OCP), then we recover in Theorem 3.12 the standard Hamiltonian given by $H(x, u, p, p^0, t) := \langle p, f(x, u, t) \rangle_{\mathbb{R}^n} + p^0 F(x, u, t)$. On the other hand, if $\beta \neq 1$ in Problem (OCP), then the Hamiltonian is not standard any longer since it is given by $H(x, u, p, p^0, t) := \langle p, f(x, u, t) \rangle_{\mathbb{R}^n} + p^0 \frac{(b-t)^{\beta-1}}{\Gamma(\beta)} F(x, u, t)$. This phenomenon is due to the nonlocality of the fractional operator $I^\beta_{a+}$ but it is natural since the fractional Lagrange cost can be rewritten as $I^\beta_{a+}[F(x, u, \cdot)](b) = \int_a^b \frac{(b-\tau)^{\beta-1}}{\Gamma(\beta)} F(x(\tau), u(\tau), \tau) \, d\tau$. In particular, if $\beta \neq 1$, the Hamiltonian considered in Theorem 3.12 may be not autonomous, even if $f$ and $F$ are so.
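As a rough numerical illustration of Remark 3.20, the sketch below evaluates the time-dependent weight $(b-t)^{\beta-1}/\Gamma(\beta)$ and approximates the fractional Lagrange cost $I^\beta_{a+}[F](b)$ by a midpoint rule. The function names and the choice of quadrature are assumptions made here for illustration and are not taken from the paper; the midpoint rule converges slowly when $\beta < 1$ because the weight is singular at $t = b$.

```python
import math


def rl_weight(beta, b, t):
    """Weight (b - t)^(beta - 1) / Gamma(beta) multiplying F in the Hamiltonian."""
    return (b - t) ** (beta - 1.0) / math.gamma(beta)


def fractional_lagrange_cost(F, beta, a, b, n=20000):
    """Midpoint-rule approximation of I^beta_{a+}[F](b) for a scalar function F(t).

    Midpoints never touch t = b, so the weight stays finite even if beta < 1.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += rl_weight(beta, b, t) * F(t)
    return h * total
```

For $\beta = 1$ the weight is identically 1 and the routine reduces to an ordinary integral, which mirrors the statement that the standard Hamiltonian is recovered in that case.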

Remark 3.21.
A previous attempt to derive a version of the PMP in a general Caputo fractional context can be found in Theorem 3.1, p. 3644 of [5]. Precisely, the authors of [5] consider Problem (OCP) with no Lagrange cost, an initial condition fixed to some $x_a \in \mathbb{R}^n$ and a free final condition, that is (with our notations), $F = 0$, $g$ is the identity function and $C = \{x_a\} \times \mathbb{R}^n$. While the adjoint equations derived in Theorem 3.1, p. 3644 of [5] and in Theorem 3.12 coincide, the transversality conditions are fundamentally different. Indeed, it is given by $p(b) = -\partial_2 \varphi(x^*(a), x^*(b))$ in Theorem 3.1, p. 3644 of [5], while it is given by $I^{1-\alpha}_{b-}[p](b) = -\partial_2 \varphi(x^*(a), x^*(b))$ in Theorem 3.12. To see that the former condition cannot hold in general, let us consider in addition that $n = 1$, $f = 0$ and $\varphi(x_1, x_2) = x_2$ in Problem (OCP). In that trivial situation it is clear that any control is optimal. As a consequence, from Theorem 3.1, p. 3644 of [5], there would exist an adjoint vector $p$ such that $D^\alpha_{b-}[p] = 0$ over $[a, b]$ and $p(b) = -1$, which clearly raises a contradiction from the right counterpart of Proposition 2.8.
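The contradiction can be made explicit for $0 < \alpha < 1$. The sketch below assumes that the right counterpart of Proposition 2.8 characterizes the kernel of $D^\alpha_{b-}$ as the multiples of $(b-t)^{\alpha-1}$, which is the standard RL result; the constant $c$ is notation introduced here.

```latex
\begin{align*}
D^\alpha_{b-}[p] = 0 \text{ on } [a,b)
  &\iff p(t) = c\,\frac{(b-t)^{\alpha-1}}{\Gamma(\alpha)}, \quad t \in [a,b), \text{ for some } c \in \mathbb{R}, \\
c \neq 0 &\implies |p(t)| \to +\infty \text{ as } t \to b^-, \\
c = 0 &\implies p \equiv 0 .
\end{align*}
```

In neither case can $p$ take the finite value $-1$ at $t = b$. By contrast, $I^{1-\alpha}_{b-}[p](t) = c$ for every such $p$, so a terminal condition imposed on $I^{1-\alpha}_{b-}[p](b)$ remains meaningful (here with $c = -1$).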

Two examples
This section is devoted to the application of Theorem 3.12 to solve simple examples. We focus on the fractional versions (0 < α ≤ 1 and β ≥ α) of two basic problems.

A fractional linear-quadratic problem
In this section we consider the fractional linear-quadratic problem given by where $T > 0$ and $(a, b) \neq (0, 0)$. Problem (4.1) corresponds to a fractional version of the classical parking problem (or double integrator problem). Let us assume that Problem (4.1) admits an optimal solution denoted by $(x^*, u^*)$.
Then, there exists a nontrivial couple $(p, p^0) \in AC^\alpha_{T-}([0, T], \mathbb{R}^2) \times \mathbb{R}$ such that all necessary conditions provided in Theorem 3.12 are satisfied, where the Hamiltonian is given by The fractional Hamiltonian system gives $D^\alpha_{T-}[p_1] = 0$ and $D^\alpha_{T-}[p_2] = p_1$ leading to $p_1(t) = c_1 \frac{(T-t)^{\alpha-1}}{\Gamma(\alpha)}$ and $p_2(t) = c_1 \frac{(T-t)^{2\alpha-1}}{\Gamma(2\alpha)} + c_2 \frac{(T-t)^{\alpha-1}}{\Gamma(\alpha)}$ for every $t \in [0, T)$, for some constants $c_1, c_2 \in \mathbb{R}$. Since there is no control constraint in Problem (OCP) and from Remark 3.18, the null Hamiltonian gradient condition gives for almost all $t \in [0, T]$. From the nontriviality of the couple $(p, p^0)$, one can easily see that $p^0 \neq 0$ and normalize the Lagrange multiplier so that $p^0 = -1$ (Rem. 3.13). Thus

A fractional Zermelo problem
In this section we consider the fractional version of the classical Zermelo problem given by where T > 0. Let us assume that Problem (4.2) admits an optimal solution denoted by (x * , u * ). Then, there exists a nontrivial couple (p, p 0 ) ∈ AC α T − ([0, T ], R 2 ) × R such that all necessary conditions provided in Theorem 3.12 are satisfied, where the Hamiltonian is given by H(x 1 , x 2 , u, p 1 , p 2 , t) = p 1 (x 2 + cos(u)) + p 2 sin(u).
for almost all $t \in [0, T]$. With the classical Cauchy-Schwarz inequality we get that leading to $\tan(u^*(t)) = \frac{p_2(t)}{p_1(t)}$ which gives for almost all $t \in [0, T]$.
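The Cauchy-Schwarz step can be checked numerically on the control-dependent part $p_1 \cos(u) + p_2 \sin(u)$ of the Hamiltonian. The following sketch uses hypothetical function names and treats $(p_1, p_2)$ as a fixed nonzero pair rather than as the adjoint trajectory. It relies on `math.atan2`, which selects the branch of $\tan(u) = p_2 / p_1$ that maximizes (rather than minimizes) the expression.

```python
import math


def zermelo_optimal_heading(p1, p2):
    """Maximizer of u -> p1*cos(u) + p2*sin(u), assuming (p1, p2) != (0, 0)."""
    return math.atan2(p2, p1)


def control_part(p1, p2, u):
    """Control-dependent part of H = p1*(x2 + cos(u)) + p2*sin(u)."""
    return p1 * math.cos(u) + p2 * math.sin(u)
```

At the maximizer the value equals the Euclidean norm of $(p_1, p_2)$, which is exactly the equality case of the Cauchy-Schwarz inequality.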

Conclusion
As a conclusion we present perspectives and forthcoming works which may follow the present paper.

Works in progress
We first consider the framework of Theorem 3.12 in the classical case $\alpha = \beta = 1$ and we recall the notion of maximized Hamiltonian $\mathcal{H} : [a, b] \to \mathbb{R}$ given by $\mathcal{H}(t) := H(x^*(t), u^*(t), p(t), p^0, t)$ for almost every $t \in [a, b]$. If $H$ is differentiable with respect to $t$ with $\partial_t H$ continuous (for example if $f$ and $F$ are), it is well-known that $\mathcal{H}$ is equal almost everywhere on $[a, b]$ to an absolutely continuous function (denoted similarly) which satisfies $\dot{\mathcal{H}}(t) = \partial_t H(x^*(t), u^*(t), p(t), p^0, t)$ for almost every $t \in [a, b]$. This property is known as the Hamiltonian (absolute) continuity and we roughly say that the total derivative of the Hamiltonian is equal to its partial derivative. In particular, if the problem is autonomous, then $\mathcal{H}$ is constant. We refer for instance to Theorem 2.6.3, p. 73 of [24] for details in the classical theory $\alpha = \beta = 1$. This property provides an additional necessary optimality condition and is particularly useful for optimal control problems with free final time (which encompass minimal time problems for example). Indeed, it is well-known that a change of time variable allows one to convert a free final time problem into an autonomous fixed final time problem. Then, from the constancy of the corresponding maximized Hamiltonian, combined with a parameterized version of the classical PMP, the classical transversality condition on the optimal free final time is derived. We refer for instance to Chapter 14 of [31] for details in the classical theory $\alpha = \beta = 1$.
To the best of our knowledge, no Hamiltonian continuity in the fractional case 0 < α ≤ 1 and β ≥ α has been announced, proved or even refuted in the literature. The priority for the authors of the present paper is to deal with this issue. Preliminary results have been obtained and a complete study will be published in the near future. Our final objective is to establish a version of the PMP that handles Problem (OCP) with a free final time. Let us point out that some earlier works like [10,42,46,53] already deal with fractional optimal control problems with free final time.
We conclude by mentioning that, similarly to the classical case α = β = 1, the PMP stated in Theorem 3.12 only allows one to solve explicitly a small number of basic Caputo fractional optimal control problems (see Sect. 4 for two simple examples). Nevertheless, as in the classical theory α = β = 1, Theorem 3.12 induces a numerical way to solve them by adapting to the fractional framework the well-known shooting methods (which are indirect numerical methods reducing optimal control problems to two-point boundary value problems that can be solved by Newton's method for example). We refer for instance to Section 3.3 of [11] for details on shooting methods in the classical case. To extend these methods to the fractional framework considered in this paper, one may consider Grünwald-Letnikov discretizations of the RL and Caputo fractional derivatives (see, e.g., [44], p. 43 and p. 200 or [50] for details). This issue will also be addressed in a forthcoming paper. We refer to [3,4,8,32,46] for previous numerical studies on fractional optimal control problems.
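To give an idea of what a Grünwald-Letnikov discretization looks like, here is a minimal sketch of the standard first-order scheme on a uniform grid. It is not the scheme of the announced forthcoming paper; the function names, the step count `n` and the reduction of the Caputo derivative to the RL derivative of $f - f(a)$ (valid for $0 < \alpha \leq 1$) are choices made here for illustration.

```python
import math


def gl_weights(alpha, n):
    """Grünwald-Letnikov weights w_k = (-1)^k * binom(alpha, k) for k = 0..n."""
    w = [1.0]
    for k in range(1, n + 1):
        # Standard recurrence: w_k = w_{k-1} * (1 - (alpha + 1) / k)
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w


def gl_rl_derivative(f, alpha, a, t, n=2000):
    """Approximate the left RL derivative D^alpha_{a+}[f](t) with n uniform steps."""
    h = (t - a) / n
    w = gl_weights(alpha, n)
    total = sum(w[k] * f(t - k * h) for k in range(n + 1))
    return total / h ** alpha


def gl_caputo_derivative(f, alpha, a, t, n=2000):
    """Approximate the left Caputo derivative for 0 < alpha <= 1.

    Uses the identity cD^alpha[f] = D^alpha[f - f(a)].
    """
    fa = f(a)
    return gl_rl_derivative(lambda s: f(s) - fa, alpha, a, t, n)
```

A quick sanity check is the power function: the RL derivative of order $\alpha$ of $t \mapsto t$ on $[0, t]$ is $t^{1-\alpha}/\Gamma(2-\alpha)$, and for $\alpha = 1$ the scheme reduces to a backward difference.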

Other possible extensions
In Theorem 3.1 we provided an existence result for Problem (OCP) based on the classical Filippov's approach (see, e.g., [16,25,40]). Extensions and other approaches are well-known in the classical theory α = β = 1 (see, e.g., [20], Chap. 9). Fractional versions of these results can be considered as perspectives as well. Let us mention here the works of Kamocki in Theorem 17 of [34] and Theorem 4.2 of [36] that provide existence results for some linear RL and Caputo fractional optimal control problems.
Concerning the PMP, many extensions of Theorem 3.12 are possible. We may for instance:
-consider α > 1 and/or β < α;
-consider a fractional multi-order $\alpha = (\alpha_i)_{i=1,\ldots,n}$ as in [13];
-rule out the closedness assumption on U (with a different approach than Ekeland's variational principle, see Rem. 3.16);
-introduce time dependence U = U(t) on the control constraint set;
-consider intermediate state constraints (that is, not only at t = a or t = b, but also at some times $c_i \in (a, b)$) or running state constraints (over the whole interval [a, b]);
etc.
We conclude this section by pointing out that we have considered a general optimal control problem involving a Caputo fractional state equation in this paper. In our opinion, a relevant extension to Theorem 3.12 would be to consider a RL fractional state equation driven by the RL fractional operator D α a+ . A lot of difficulties are expected in that framework since the corresponding trajectories x ∈ AC α a+ ([a, b], R n ) are not bounded any longer due to singularities that occur at t = a. As a consequence, most of estimates used in this paper cannot be extended. In that framework, one should consider a Mayer cost and a state constraint that involve the RL initial and final conditions I 1−α a+ [x](a) and

A.1 Proofs of Section 3.3
This section is devoted to the proofs of Propositions 3.10 and 3.11 detailed respectively in Appendices A.1.2 and A.1.3. The notations introduced in the next preliminary Appendix A.1.1 will be required.

A.1.1 Preliminaries on stability and continuity results
We first prove that the set A G is open (Prop. A.1) and establish the continuity of the state x(·, u, x a ) with respect to the couple (u, x a ) ∈ A G (Prop. A.2). For this purpose, we fix (u, x a ) ∈ A G in the whole section and for every R ≥ u L ∞ , we introduce the set Note that any convex combination of two elements (y 1 , v 1 , t), (y 2 , v 2 , t) ∈ K R belongs to K R . Moreover, from the continuity of x(·, u, x a ) on [a, b], K R is a compact subset of R n × R m × [a, b]. From the assumptions on the dynamic f (see Hyp. (H)), there exists a nonnegative constant L R ≥ 0 such that for all (y, v, t) ∈ K R , and Proposition A.1. For every R ≥ u L ∞ , there exists η R > 0 such that the neighborhood of (u, x a ) given by is contained in A G . Moreover, it holds that (x(τ, u , x a ), u (τ ), τ ) ∈ K R for almost every τ ∈ [a, b] and all (u , x a ) ∈ N R .
Proof. Let R ≥ u L ∞ and let 0 < η R < 1 be such that where r α and M 1 α are defined at the beginning of Appendix A.1, and E α,1 denotes the classical Mittag-Leffler function (see Appendix B for details). Let (u , x a ) ∈ N R . Our aim is to prove that (u , x a ) ∈ A G , that is, b ∈ I(u , x a ). By contradiction, let us assume that the set is not empty and let t 0 := inf A. It holds that x(t 0 , u , x a ) − x(t 0 , u, x a ) R n ≥ 1 by continuity. Since x(a, u , x a ) − x(a, u, x a ) R n = x a − x a R n ≤ η R < 1, we get that t 0 > a. Moreover, one has x(t, u , x a ) − x(t, u, x a ) R n ≤ 1 for every t ∈ [a, t 0 ]. Therefore (x(τ, u , x a ), u (τ ), τ ) and (x(τ, u, x a ), u(τ ), τ ) belong to K R for almost every τ ∈ [a, t 0 ]. From integral representations we obtain for every t ∈ [a, t 0 ]. It follows from inequality (A.2) that for every t ∈ [a, t 0 ]. Then, from the classical Hölder inequality applied to the second right-hand side term and since (u , x a ) ∈ N R , we get for every t ∈ [a, t 0 ]. From the fractional version of the Gronwall lemma given in Proposition B.1, we obtain for every t ∈ [a, t 0 ], which raises a contradiction at t = t 0 . Therefore A is empty. We conclude that x(·, u , x a ) is bounded on I(u , x a ), and thus b ∈ I(u , x a ) from Proposition 3.9. Moreover the emptiness of A also implies that for every t ∈ [a, b], and thus (x(τ, u , x a ), u (τ ), τ ) ∈ K R for almost every τ ∈ [a, b].
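The bound produced by the fractional Gronwall lemma involves the Mittag-Leffler function $E_{\alpha,1}$. For readers who wish to evaluate such bounds numerically, the sketch below sums the defining power series $E_{\alpha,\beta}(z) = \sum_{k \geq 0} z^k / \Gamma(\alpha k + \beta)$. The function name, the truncation rule and the default number of terms are assumptions made here; the truncated series is only adequate for moderate values of $|z|$.

```python
import math


def mittag_leffler(alpha, z, beta=1.0, terms=200):
    """Truncated series for E_{alpha,beta}(z) = sum_k z^k / Gamma(alpha*k + beta)."""
    total = 0.0
    for k in range(terms):
        arg = alpha * k + beta
        if arg > 170.0:
            # math.gamma overflows a double slightly beyond this argument
            break
        total += z ** k / math.gamma(arg)
    return total
```

Two classical special cases serve as checks: $E_{1,1}(z) = e^z$ and $E_{2,1}(z) = \cosh(\sqrt{z})$ for $z \geq 0$.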
For every R ≥ u L ∞ , we endow the neighborhood N R of (u, x a ) with the basic distance for all (u , x a ), (u , x a ) ∈ N R . We conclude this section with the following continuity result.
for all (u , x a ), (u , x a ) ∈ N R . In particular, the mapping is continuous.
Proof. Let $(u', x_a'), (u'', x_a'') \in N_R$. We know that $(x(\tau, u', x_a'), u'(\tau), \tau)$ and $(x(\tau, u'', x_a''), u''(\tau), \tau)$ are elements of $K_R$ for almost every $\tau \in [a, b]$ (Prop. A.1). Following the same arguments as in the proof of Proposition A.1, it follows that

A.1.2 Proof of Proposition 3.10
The first item of Proposition 3.10 follows from Proposition A.2. Before proving the second item of Proposition 3.10, we need the following Lemmas A.3, A.4 and A.5. For ease of notation, in this section, we denote by $x_\delta := x(\cdot, u_\delta, x_a)$ for all $0 \leq \delta \leq \bar{\delta}$, by $x := x(\cdot, u, x_a)$ and by $w := w_{(s,v)}(\cdot, u, x_a)$.
Proof. Similarly to the proof of Lemma A.3, we get for all t ∈ (s + δ, b] and all 0 ≤ δ ≤δ. Using Lemma A.3 and the inequality s+δ s and thus, for all t ∈ (s + δ, b] and all 0 ≤ δ ≤δ. Once again, the fractional version of the Gronwall lemma given in Proposition B.1 concludes the proof by setting µ 2 Lemma A.5. Let us define for almost every τ ∈ [a, b] and every 0 ≤ δ ≤δ. Then ζ δ L rα tends to zero when δ → 0.

A.1.3 Proof of Proposition 3.11
Let y ∈ R n and let R = u L ∞ . Since x a + δy − x a R n ≤ δ y R n for all δ ≥ 0, we deduce from Proposition A.1 that there existsδ > 0 such that (u, x a + δy) ∈ N R ⊂ A G for all 0 ≤ δ ≤δ. Then the first item of Proposition 3.11 follows from Proposition A.2. Once again, before proving the second item of Proposition 3.11, we need a preliminary lemma. For the ease of notations, in this section, we denote by x δ := x(·, u, x a + δy) for all 0 ≤ δ ≤δ, by x := x(·, u, x a ) and by w := w y (·, u, x a ).
We may now prove the second item of Proposition 3.11. Let us define $z_\delta(t) := \frac{x_\delta(t) - x(t)}{\delta} - w(t)$ for all $t \in [a, b]$ and all $0 < \delta \leq \bar{\delta}$. We want to show that $z_\delta$ converges uniformly to zero on $[a, b]$ as $\delta \to 0$. The integral representations give for every $t \in [a, b]$ and all $0 < \delta \leq \bar{\delta}$. Recall that $\|x_\delta - x\|_C \leq \delta C_R \|y\|_{\mathbb{R}^n}$ for all $0 \leq \delta \leq \bar{\delta}$ thanks to Proposition A.2. With the classical Taylor expansion with integral remainder, we obtain for every $t \in [a, b]$ and all $0 < \delta \leq \bar{\delta}$. Using Hölder's inequality and the fact that $\partial_1 f$ is bounded by $L_R \geq 0$ on $K_R$, we get from the fractional version of the Gronwall lemma (Prop. B.1) that for every $t \in [a, b]$ and all $0 < \delta \leq \bar{\delta}$. The second item of Proposition 3.11 then follows from Lemma A.6.

A.2 Proof of Theorem 3.12
Our strategy to prove Theorem 3.12 consists of the following five steps:
(i) We first reduce Problem (OCP) to a Mayer problem, that is, with no Lagrange cost (F = 0). For this purpose we use a standard change of variable. We refer to Appendix A.2.1 for details. As a consequence, from this first step until the end of this appendix, we assume that there is no Lagrange cost (F = 0) in Problem (OCP).
(ii) We give some preliminaries and notations in Appendix A.2.2.
(iii) Using the sensitivity analysis of the fractional state equation performed in Section 3.3, we compute derivatives of the trajectory $x$ with respect to perturbations of the control $u$ and of the initial condition $x_a$.
(iv) We apply Ekeland's variational principle (Proposition A.7) to take into account the terminal state constraint $g(x(a), x(b)) \in C$. We derive crucial inequalities on variation vectors. We refer to Appendices A.2.3, A.2.4 and A.2.5 for details.
(v) We conclude the proof by introducing the adjoint vector $p$ in Appendix A.2.6. This adjoint vector allows us to derive the transversality conditions and the Hamiltonian maximization condition of Theorem 3.12 from the inequalities obtained in the previous step.
For the reader's convenience, we recall hereafter a simplified version (but sufficient for our purposes) of Ekeland's variational principle.
Proposition A.7 (Ekeland's variational principle [23]). Let (M, d M ) be a complete metric space and let J : M → R + be a nonnegative continuous mapping. Let ε > 0 and x * ∈ M such that J(x * ) ≤ ε. Then, there exists

A.2.1 Reduction to Mayer problem from a change of variable
Let us assume that Theorem 3.12 is already proved in the case of a Mayer problem, that is, with no Lagrange cost (F = 0). Our aim in this section is to derive Theorem 3.12 in the general Bolza case. Let $(x^*, u^*) \in {}^c AC^\alpha_{a+}([a, b], \mathbb{R}^n) \times L^\infty([a, b], \mathbb{R}^m)$ be an optimal solution to Problem (OCP) and let us introduce $z^* := I^\alpha_{a+}[F_\beta(x^*, u^*, \cdot)] \in {}^c AC^\alpha_{a+}([a, b], \mathbb{R})$. One can see that the couple $((x^*, z^*), u^*)$ is an optimal solution to the augmented Caputo fractional optimal control problem of Mayer form given by In both cases, note that $\tilde{g}$ is submersive at $((x^*, z^*)(a), (x^*, z^*)(b))$. We deduce the existence of a nontrivial augmented couple $((p, q), p^0) \in AC^\alpha_{b-}([a, b], \mathbb{R}^{n+1}) \times \mathbb{R}$ satisfying all necessary conditions listed in Theorem 3.12 adapted to the augmented Problem (OCPAM) of Mayer form. In particular, we get that $D^\alpha_{b-}[q] = 0$ with ∈ [a, b). The rest of the proof is straightforward from all necessary conditions provided in Theorem 3.12 (in the case of a Mayer problem).
Thanks to this section, from now on and until the end of the proof of Theorem 3.12 in its general form, we may assume that there is no Lagrange cost (F = 0) in Problem (OCP).

A.2.2 Some preliminaries and notations
We denote by $d_C$ the distance function to the nonempty closed convex subset $C \subset \mathbb{R}^j$ defined as usual by $d_C(x) := \inf_{x' \in C} \|x - x'\|_{\mathbb{R}^j}$ for all $x \in \mathbb{R}^j$. Recall that, for every $x \in \mathbb{R}^j$, there exists a unique element $P_C(x) \in C$ (projection of $x$ on $C$) such that $d_C(x) = \|x - P_C(x)\|_{\mathbb{R}^j}$. It is characterized by the property $\langle x - P_C(x), x' - P_C(x) \rangle_{\mathbb{R}^j} \leq 0$ for every $x' \in C$. In particular, $x - P_C(x) \in N_C[P_C(x)]$. The function $P_C : \mathbb{R}^j \to C$ is 1-Lipschitz continuous. We refer to p. 131 of [17] for details. The proof of Theorem 3.12 requires the two following lemmas.
Lemma A.9. Let (x k ) k∈N be a sequence of points in R j and let (ς k ) k∈N be a sequence of positive real numbers such that x k → x ∈ C and ς k ( The positive part function is defined on R by Pos : The proof of our main result also needs the following classical result.
Lemma A.10. The function $\mathrm{Pos}^2 : x \mapsto (x^+)^2$ is differentiable on $\mathbb{R}$, and it holds that $D\mathrm{Pos}^2(x)(x') = 2 x^+ x'$ for all $x, x' \in \mathbb{R}$.
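The two ingredients recalled in this appendix, the projection onto a closed convex set and the squared positive part, are easy to check numerically. The sketch below is only an illustration under assumptions made here: the convex set is taken to be a box (for which the projection is a componentwise clamp), and all function names are hypothetical.

```python
import math


def project_box(x, lower, upper):
    """Euclidean projection of x onto the box C = prod_i [lower_i, upper_i]."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def inner(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def distance(u, v):
    """Euclidean distance between two vectors given as lists."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def pos(x):
    """Positive part x^+ = max(x, 0)."""
    return max(x, 0.0)


def pos_sq(x):
    """Squared positive part (x^+)^2."""
    return pos(x) ** 2


def d_pos_sq(x, direction):
    """Differential of the squared positive part at x applied to a direction."""
    return 2.0 * pos(x) * direction
```

With these helpers, the characterization of the projection can be tested at the corners of the box, and the differential of the squared positive part can be compared with a one-sided finite difference, including at the kink $x = 0$.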
Let $(x^*, u^*) \in {}^c AC^\alpha_{a+}([a, b], \mathbb{R}^n) \times L^\infty([a, b], \mathbb{R}^m)$ be an optimal solution to Problem (OCP). With the notations introduced in Section 3.3, $(u^*, x^*(a)) \in \mathcal{AG}$ and $x^* = x(\cdot, u^*, x^*(a))$. In the sequel, for all $R \in \mathbb{N}$ such that $R \geq \|u^*\|_{L^\infty}$, we denote by $K^*_R$ the corresponding compact subset of $\mathbb{R}^n \times \mathbb{R}^m \times [a, b]$ defined as in Appendix 3.3.1 and by $L^*_R$ the corresponding nonnegative constant (Eq. (A.2)). Similarly, we denote by $N^*_R \subset \mathcal{AG}$ the neighborhood of $(u^*, x^*(a))$ defined as in Proposition A.1 and by $\eta^*_R > 0$ the corresponding positive radius. Finally we endow $N^*_R$ with the same basic distance $d_{L^1 \times \mathbb{R}^n}$ defined as in Equality (A.3). To take into account the control constraint set $U$ in Problem (OCP), we set $N^{*,U}_R := \{ (u, x_a) \in N^*_R : u(t) \in U \text{ for almost every } t \in [a, b] \}$. For example, we know that $(u^*, x^*(a)) \in N^{*,U}_R$. Since $U$ is a nonempty closed subset of $\mathbb{R}^m$, one gets from the (partial) converse of Lebesgue's dominated convergence theorem that $(N^{*,U}_R, d_{L^1 \times \mathbb{R}^n})$ is a complete metric space. Note that the closedness assumption on the control constraint set $U$ is crucial here in order to apply Ekeland's variational principle on a complete metric space.

A.2.3 A penalized functional with R ∈ N fixed
In the whole section we fix some R ∈ N such that R ≥ u * L ∞ . Let us consider a positive sequence (ε R k ) k∈N converging to zero as k → ∞ such that for all k ∈ N. Then, we define the penalized functional J R k : N * ,U R → R + by for all (u, x a ) ∈ N * ,U R and all k ∈ N.
-Fix k ∈ N -From Proposition A.1, we know that N * ,U R ⊂ N * R ⊂ A G and thus J R k (u, x a ) is well-defined for all (u, x a ) ∈ N * ,U R . From the optimality of the couple (x * , u * ), one gets by contradiction that J R k (u, x a ) > 0 for all (u, x a ) ∈ N * ,U R . As ϕ, g and d 2 C are continuous, it follows from the continuity result of Proposition A.2 that J R k is a positive continuous map, which satisfies moreover J R k (u * , x * (a)) = ε R k . Then Ekeland's variational principle (Prop. A.7) yields that there exists (u R k , x R a,k ) ∈ N * ,U R such that for all (u, x a ) ∈ N * ,U R . Finally we introduce the terms Note that Ψ 0R k ≤ 0 and that |Ψ 0R k | 2 + Ψ R k 2 R j = 1 from the definition of J R k (u R k , x R a,k ) and that −Ψ R k ∈ N C [P C (g(x R a,k , x(b, u R k , x R a,k )))].
-Compactness arguments -Recall that (ε R k ) k∈N converges to zero as k → ∞. Using inequality (A.11), some compactness arguments from the equality |Ψ 0R k | 2 + Ψ R k 2 R j = 1, the (partial) converse of Lebesgue's dominated convergence theorem and extracting some subsequences (denoted similarly), we have the following: -(x R a,k ) k∈N converges to x * (a) in R n ; -(u R k ) k∈N converges to u * in L 1 ([a, b], R m ); -(u R k (t)) k∈N converges to u * (t) in R m for almost every t ∈ [a, b]; -(Ψ 0R k ) k∈N converges to some Ψ 0R in R; -(Ψ R k ) k∈N converges to some Ψ R in R j .

A.2.4 Two crucial inequalities depending on R ∈ N fixed
In the whole section we fix some R ∈ N such that R ≥ u * L ∞ . We introduce Note that L R has a full Lebesgue measure equal to b − a. Let (s, v) ∈ L R × (U ∩ B R m (0, R)) and y ∈ R n .
-Perturbation of the control u R k with k ∈ N fixed -For all δ ∈ [0, b − s), we consider u R k,δ the needle-like perturbation of u R k associated to (s, v) defined as in Section 3.3.2. From inequality (A.11), since v ∈ B R m (0, R), we get that for all δ ∈ [0, b − s). With inequality (A.10) and since v ∈ U ∩ B R m (0, R), we deduce that inequality (A.12) can be applied to (u, x a ) = (u R k,δ , x R a,k ) ∈ N * ,U R for all δ ∈ [0, b − s) sufficiently small. We obtain for all t ∈ (s, b] and all k ∈ N. Since the product ρ α s+ w ∈ C([s, b], R n ) (see [14], Thm. 6), we get from Hölder inequality that 12. It holds that the sequence w y (·, u R k , x R a,k ) uniformly converges on [a, b] to w y (·, u * , x * (a)) when k → ∞.
Proof. For the ease of notations, we denote by x R k := x(·, u R k , x R a,k ), by w k := w y (·, u R k , x R a,k ), A k := ∂ 1 f (x R k , u R k , ·) for all k ∈ N, and w := w y (·, u * , x * (a)), A := ∂ 1 f (x * , u * , ·). Since (x R k ) k∈N and (u R k ) k∈N pointwise converge on [a, b] to x * and u * (Prop. A.2) when k → ∞ and ∂ 1 f is continuous, bounded by L * R on K * R , we get from Lebesgue's dominated convergence theorem that (A k ) k∈N converges to A in L rα ([a, b], R n×n ). Integral representations yield for all t ∈ [a, b] and all k ∈ N. With Hölder inequality, this gives for all t ∈ [a, b] and all k ∈ N. Using the fractional version of the Gronwall lemma given in Proposition B.1 implies that for every t ∈ [a, b] and all k ∈ N. The proof is achieved by letting k → ∞.
The above inequality constitutes the second crucial inequality depending on R ∈ N fixed.
A.2.5 Two crucial inequalities independent of R ∈ N
Inequalities (A.14) and (A.16) both depend on $R \in \mathbb{N}$ such that $R \geq \|u^*\|_{L^\infty}$. In particular, inequality (A.14) is satisfied only for all $(s, v) \in L_R \times (U \cap B_{\mathbb{R}^m}(0, R))$. Our goal in this section is to get rid of the dependence on $R$. Let us define the set which has a full Lebesgue measure equal to $b - a$. From the equality $|\Psi^{0R}|^2 + \|\Psi^R\|^2_{\mathbb{R}^j} = 1$ and extracting some subsequences (denoted similarly), $(\Psi^{0R})_{R \in \mathbb{N}}$ converges to some $\Psi^0$ in $\mathbb{R}$ and $(\Psi^R)_{R \in \mathbb{N}}$ converges to some $\Psi$ in $\mathbb{R}^j$ when $R \to \infty$. Moreover, it holds that $\Psi^0 \leq 0$, $|\Psi^0|^2 + \|\Psi\|^2_{\mathbb{R}^j} = 1$ and $-\Psi \in N_C[g(x^*(a), x^*(b))]$ (since $N_C[g(x^*(a), x^*(b))]$ is closed).
-If $g$ is not submersive at $(x^*(a), x^*(b))$ - If $g$ is not submersive at $(x^*(a), x^*(b))$, it suffices to observe that if a couple $(x^*, u^*)$ is an optimal solution to Problem (OCP), then $(x^*, u^*)$ is also an optimal solution to a similar problem where the function $g$ is replaced by the identity function $\tilde{g} : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n \times \mathbb{R}^n$, $\tilde{g}(x_a, x_b) := (x_a, x_b)$, which is submersive at any point, and where the closed convex set $C$ is replaced by the singleton $\tilde{C} := \{(x^*(a), x^*(b))\}$. So we get back to the submersive case, but with a different function $g$ and a different closed convex set $C$, which impacts only the transversality conditions in Theorem 3.12. With the use of $\tilde{g}$ and $\tilde{C}$, note that the transversality conditions do not provide any information.