PONTRYAGIN MAXIMUM PRINCIPLE FOR STATE CONSTRAINED OPTIMAL SAMPLED-DATA CONTROL PROBLEMS ON TIME SCALES

In this paper we consider optimal sampled-data control problems on time scales with inequality state constraints. A Pontryagin maximum principle is established, extending to the state constrained case existing results in the time scale literature. The proof is based on the Ekeland variational principle and on the concept of implicit spike variations adapted to the time scale setting. The main result is then applied to continuous-time min-max optimal sampled-data control problems, and a maximal velocity minimization problem for the harmonic oscillator with sampled-data control is numerically solved for illustration. Mathematics Subject Classification. 26E70; 34H05; 34K35; 34N05; 39A12; 49J15; 49J50; 93C10; 93C15; 93C57. Received March 14, 2020. Accepted April 22, 2021.


Optimal control problems
In mathematics a dynamical system describes the evolution of a point (usually called the state of the system) in an appropriate space (called the state space) following an evolution rule (called the dynamics of the system). Dynamical systems are of many different natures (continuous-time versus discrete-time systems, deterministic versus stochastic systems, etc.). A continuous-time system is a dynamical system in which the state evolves in a continuous way in time (for instance, ordinary differential equations, evolution partial differential equations, etc.), while a discrete-time system is a dynamical system in which the state evolves in a discrete way in time (for instance, difference equations, quantum differential equations, etc.). A control system is a dynamical system in which a control parameter intervenes in the dynamics and thus influences the evolution of the state. Finally, an optimal control problem consists in determining a control that steers the state of a control system from an initial condition to some desired target, while minimizing a given cost and satisfying some constraints.
A version of the PMP for continuous-time optimal sampled-data control problems was established in [22]. In that context, as in the PMP for discrete-time optimal permanent control problems, the usual Hamiltonian maximization condition does not hold in general and has to be replaced by the condition known as the nonpositive averaged Hamiltonian gradient condition (see [22], Thm. 2.6). More recently Bourdin and Dhar have extended in [18] the previous work to the state constrained case. In that context the authors have observed and studied a particular behavior of the trajectories with respect to the state constraints, called the bouncing trajectory phenomenon. Precisely, under some quite general hypotheses, the authors prove that an admissible trajectory necessarily "bounces" against the boundary of the restricted state space at most a finite number of times. Inherent to this behavior, the singular component of the costate vector vanishes and its discontinuities are reduced to a finite number of jumps, which turns out to be very useful for numerical simulations based on shooting methods. We refer to Sections 4 and 5 of [18] for details and discussion. We conclude this paragraph by mentioning the work [17] in which the optimization of the sampling times (in the state constraint-free case) has also been explored in view of a PMP formulation. In that context, it has been proved that the necessary condition for optimal sampling times coincides with the continuity of the maximized Hamiltonian function. This necessary condition turns out to be efficient for computing numerically the optimal sampling times using shooting methods. For example this approach has been employed in [5] in order to optimize instances of functional electrical stimulations of muscles in human force-fatigue muscular models.

Optimal (permanent) control problems on time scales
The time scale theory was initiated in 1988 by Hilger [42] in order to unify continuous and discrete analyses. By definition, a time scale T is an arbitrary nonempty closed subset of R, and a dynamical system is said to be posed on the time scale T whenever the time variable evolves along this set T. For example a continuous-time dynamical system corresponds to T = R+, while T = N is associated with a discrete-time dynamical system. But a time scale can be much more general (it can even be a Cantor set). Many notions of standard calculus (such as derivatives, integrals, etc.) have been extended to the time scale framework, and we refer the reader to [3,9,10] for details on this theory. We also refer to Section 2 of the present paper for some basics. The Cauchy-Lipschitz (or Picard-Lindelöf) theory has been extended in [19] to ordinary differential equations posed on general time scales. For T = R+ for example, one recovers the classical theory of (continuous-time) ordinary differential equations, while, for T = N for example, one recovers the theory of difference equations. This illustrates that the time scale theory makes it possible to close the gap between continuous and discrete analyses, and this is possible in any mathematical domain in which time scale calculus can be involved. Another example is provided in optimization with the calculus of variations on time scales, initiated in [8], and well studied in the literature (see, e.g., [7,15,43,46]). On the other hand, in [44,45], the authors establish a weak version of the PMP (with the nonpositive Hamiltonian gradient condition) for (permanent) control systems defined on general time scales.
In [20], Bourdin and Trélat derived a strong version of the PMP in the same setting, proving the Hamiltonian maximization condition at right-dense points of the time scale, and the nonpositive Hamiltonian gradient condition at right-scattered points of the time scale (see the beginning of Section 2 for the precise definitions of right-dense and right-scattered points of a time scale). This result thus encompasses the classical version of the PMP for continuous-time optimal (permanent) control problems (taking T = R+ for example), and also the one for discrete-time optimal (permanent) control problems (taking T = N for example). Furthermore the work [20] emphasizes the reasons why the Hamiltonian condition at right-dense points differs from the one at right-scattered points: at a right-dense point, L¹-needle-like variations of the control are possible, while, at a right-scattered point, (only) L∞-needle-like variations are possible. Nevertheless there is a price to pay in order to derive a PMP on a general time scale: as explained in Section 3.1 of [20], some standard approaches (based essentially on fixed-point theorems or Hahn-Banach separation arguments) fail due to the lack of convexity of a general time scale in the neighborhoods of its right-dense points. For example another time scale version of the PMP has been provided recently in Theorem 2.11 of [11] using necessary conditions for an extremum in a cone, but this approach requires a density condition on the time scale. In view of keeping a general time scale framework, the authors of [20] use the Ekeland variational principle ([34], Thm. 1.1), which turns out to be suitable in order to prove a time scale version of the PMP with no assumption on the time scale. We refer to [23] for a detailed discussion on the two papers [11,20].

Optimal sampled-data control problems on time scales
The papers [11,20,23,44,45] mentioned in the previous paragraph are concerned (only) with control systems defined on general time scales with permanent control. In the work [21], Bourdin and Trélat have introduced a new framework, referred to as sampled-data control systems on time scales, which handles control systems defined on general time scales with nonpermanent control. In that context it is assumed that the state and the control are allowed to evolve on different time scales, respectively denoted by T and T1 (the time scale T1 of the control being a subset of the time scale T of the state, that is, T1 ⊂ T). This framework is the natural extension of the classical continuous-time and sampled-data control setting, displayed above, considering T = R+ and T1 = N for example. A PMP for optimal sampled-data control problems on time scales is proved in Theorem 2.6 of [21] based on the Ekeland variational principle (as in [20]) and on L¹-needle-like variations (resp. L∞-needle-like variations) of the control at right-dense points (resp. right-scattered points) of the time scale T1. This leads to a Hamiltonian maximization condition at right-dense points of T1 and to a nonpositive averaged Hamiltonian gradient condition at right-scattered points of T1. In particular, the PMP in [21] encompasses the one in [20] (and thus the classical ones for continuous-time and discrete-time problems with permanent controls), but also the one established in [22] for continuous-time optimal sampled-data control problems. We refer to Sections 2.3 and 3.2 of [21] for a discussion and numerous remarks about the differences between optimal permanent controls and optimal sampled-data controls.

Contributions of the present paper
In this paper we consider optimal sampled-data control problems on time scales with inequality state constraints. A Pontryagin maximum principle is established (see Thm. 3.7 in Sect. 3), extending to the state constrained case the existing results provided in the time scale literature [11,20,21,44,45]. Our proof is based on the Ekeland variational principle which is, as mentioned above and in contrast to some other approaches, suitable in order to deal with general time scale versions of the PMP. Nevertheless the techniques employed in this paper differ from those used in [20,21] in several aspects. First, in order to take into account the presence of state constraints, we include an additional term in the Ekeland penalized functional which eventually gives rise to Borel measures on T and, as in the PMPs for continuous-time optimal permanent and sampled-data control problems, to a bounded variation costate vector. Second, the needle-like variations of the control used in [20,21] are not suitable to handle inequality state constraints such as the ones considered in the present work. Indeed the needle-like variations of the control provide a differentiability (with respect to the perturbation parameter) of the state (only) over a subinterval of T, while the analysis of the inequality state constraints requires differentiability over the whole interval of T. We refer to Remark 3.14 for details. In order to overcome this technical difficulty, our idea in this paper is to involve the sensitivity analysis of the control system under implicit spike variations of the control. This concept is used in [13,16,54] for continuous-time optimal permanent control problems and is based in particular on the fact that the usual Lebesgue measure is nonatomic. However, the adaptation of this concept to the time scale setting is not trivial because the measure on the time scale T1 is atomic (since the measure of any right-scattered point of T1 is positive, see Sect. 2 for details). Therefore, in the present work, we introduce a new and suitable time scale version of the concept of implicit spike variations, by distinguishing the perturbation at right-dense points and at right-scattered points of T1. In particular, at right-scattered points of T1, we involve an L∞-needle-like variation of the control as in [21]. We refer to Remark 3.14 and Section 5.1.2 for more details. The new tools introduced in this paper, and mentioned in this paragraph, turn out to be suitable and efficient in order to prove our main result (Thm. 3.7). To conclude the contributions of this paper, and as an illustration, we consider in Section 4.1 a general continuous-time min-max optimal sampled-data control problem which can be, using a well known idea (developed for example in [31], Rem. 6 or in [71], Prop. 9.5.4), reformulated as a state constrained continuous-time optimal sampled-data control problem to which Theorem 3.7 can be applied. Finally a maximal velocity minimization problem for the harmonic oscillator with sampled-data control is numerically solved in Section 4.2 using a shooting method based on the necessary conditions provided in Theorem 3.7. As in the recent work [18], a bouncing trajectory phenomenon is observed (see Figs. 1 and 2).

Organization of the paper
The paper is organized as follows. In Section 2 we display some basic notions and results in time scale theory which will be used all along the paper. In Section 3 we introduce the sampling procedure and the general optimal sampled-data control problem on time scales considered in the present work. We provide the invoked regularity and topology assumptions and then we state our major contribution (Thm. 3.7). Section 4 is dedicated to the application of our main result to continuous-time min-max optimal sampled-data control problems. The proof of Theorem 3.7 is built up in several stages and is displayed in Section 6, after collecting some crucial preliminary results in Section 5.

Basics on time scale theory
We start by recalling some basic definitions and results of time scale theory. The reader already familiar with this topic can skip this section and proceed directly to Section 3. Let T be a time scale, that is, an arbitrary nonempty closed subset of R. Without loss of generality, we will assume that T is bounded below, with a := min T, and unbounded above. 1 Throughout the paper, T will be the time scale on which the state of the control system evolves. The forward jump operator σ : T → T is defined by σ(t) := inf{τ ∈ T | τ > t} for every t ∈ T. A point t ∈ T is said to be right-scattered whenever σ(t) > t. A point t ∈ T is said to be right-dense whenever σ(t) = t. We denote by RS the set of all right-scattered points of T, and by RD the set of all right-dense points of T. Recall that RS is at most countable (see [26], Lem. 3.1) and that RD is the complement of RS in T. The graininess function µ : T → R+ is defined by µ(t) := σ(t) − t for every t ∈ T. For every subset A of R, we write A_T := A ∩ T. An interval of T is a set I_T where I is an interval of R.
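These notions are straightforward to compute on a purely discrete time scale. The following Python sketch (our own illustration, not part of the formal theory; the time scale chosen is hypothetical) encodes a finite discrete time scale as a sorted list and implements the forward jump operator σ and the graininess µ directly from their definitions, with the usual convention σ(max T) = max T for a bounded time scale:

```python
# Illustration: forward jump operator and graininess on a purely
# discrete time scale T = {0.0, 0.5, 1.0, 2.0, 4.0}.
# (On a general time scale containing intervals, sigma(t) = t at
# right-dense points; here every point except max T is right-scattered.)

T = [0.0, 0.5, 1.0, 2.0, 4.0]

def sigma(t):
    """Forward jump operator: inf of the points of T strictly greater
    than t (with the convention sigma(max T) = max T)."""
    later = [tau for tau in T if tau > t]
    return min(later) if later else t

def mu(t):
    """Graininess function: mu(t) = sigma(t) - t."""
    return sigma(t) - t

print([sigma(t) for t in T])  # [0.5, 1.0, 2.0, 4.0, 4.0]
print([mu(t) for t in T])     # [0.5, 0.5, 1.0, 2.0, 0.0]
```

Note that µ(t) > 0 exactly at the right-scattered points, which is the dichotomy exploited throughout the paper.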

∆-differentiability
Let n ∈ N*. A function x : T → R^n is said to be ∆-differentiable at t ∈ T if the limit
x^∆(t) := lim_{τ→t, τ∈T\{σ(t)}} (x(σ(t)) − x(τ))/(σ(t) − τ)
exists in R^n. Recall that, if s ∈ RD, then x is ∆-differentiable at s if and only if the limit of (x(τ) − x(s))/(τ − s), as τ → s with τ ∈ T, exists in R^n; in that case it is equal to x^∆(s). If r ∈ RS and x is continuous at r, then x is ∆-differentiable at r with x^∆(r) = (x(σ(r)) − x(r))/µ(r) (see, e.g., [9], Thm. 1.16). If two functions x, x' : T → R^n are both ∆-differentiable at t ∈ T, then the scalar product ⟨x, x'⟩ is ∆-differentiable at t with
⟨x, x'⟩^∆(t) = ⟨x^∆(t), x'(σ(t))⟩ + ⟨x(t), x'^∆(t)⟩ = ⟨x^∆(t), x'(t)⟩ + ⟨x(σ(t)), x'^∆(t)⟩.
These equalities are usually called Leibniz formulas (see, e.g., [9], Thm. 1.20).
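At a right-scattered point the ∆-derivative thus reduces to a plain forward difference quotient. The short sketch below (an illustration only; the discrete time scale and the test function are hypothetical) computes x^∆(r) = (x(σ(r)) − x(r))/µ(r):

```python
# Illustration: at a right-scattered point r of T, the Delta-derivative
# of a function x reduces to the forward difference quotient
#   x^Delta(r) = (x(sigma(r)) - x(r)) / mu(r).

T = [0.0, 1.0, 3.0, 6.0]  # hypothetical purely discrete time scale

def sigma(t):
    later = [tau for tau in T if tau > t]
    return min(later) if later else t

def delta_derivative(x, r):
    """Delta-derivative at a right-scattered point r of T."""
    mu = sigma(r) - r
    assert mu > 0, "this formula is valid at right-scattered points only"
    return (x(sigma(r)) - x(r)) / mu

x = lambda t: t * t
print(delta_derivative(x, 1.0))  # (9 - 1) / 2 = 4.0
```

On T = N this recovers the usual forward difference of sequences, whereas at right-dense points one recovers the classical derivative, illustrating how the ∆-derivative interpolates between the two.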
1 Indeed, in this paper, we will only work on a bounded subinterval of type [a, b] ∩ T with a, b ∈ T and a < b. It is not restrictive to assume that a = min T and that T is unbounded above. On the other hand, this last assumption will allow us to simplify the notation introduced in this section, avoiding a systematic distinction between points of T\{max T} and max T, which is not necessary in our context.

Lebesgue ∆-measure and Lebesgue ∆-integrability
Let µ∆ be the Lebesgue ∆-measure on T defined in terms of the Carathéodory extension (see [10], Chap. 5). We also refer the reader to [2,26,41] for more details. For all (c, d) ∈ T² with c ≤ d, we have µ∆([c, d)_T) = d − c. Recall that A ⊂ T is a µ∆-measurable set of T if and only if A is a µ_L-measurable set of R, where µ_L denotes the usual Lebesgue measure on R (see [26], Prop. 3.1), and in that case we have µ∆(A) = µ_L(A) + Σ_{r∈A∩RS} µ(r). Let A ⊂ T be a µ∆-measurable subset of T. A property is said to hold ∆-almost everywhere (in short, ∆-a.e.) on A if it holds for every t ∈ A\A', where A' ⊂ A is some µ∆-measurable set of T satisfying µ∆(A') = 0. In particular, since µ∆({r}) = µ(r) > 0 for every r ∈ RS, we conclude that, if a property holds ∆-a.e. on A, then it holds for every r ∈ A ∩ RS. Similarly, if µ∆(A) = 0, then A ⊂ RD.
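The atomicity of µ∆ at right-scattered points (each singleton {r} with r ∈ RS has positive measure µ(r)), which is the obstruction to classical implicit spike variations discussed later in the paper, can be made concrete on a purely discrete time scale. The sketch below is an illustration only, with a hypothetical time scale:

```python
# Illustration: on a purely discrete time scale, the Lebesgue
# Delta-measure of a set A of points of T (all right-scattered here)
# is the sum of the graininess over the points of A:
#   mu_Delta(A) = sum of mu(r) for r in A.

T = [0.0, 0.5, 1.0, 2.0, 4.0]  # hypothetical discrete time scale

def mu(t):
    later = [tau for tau in T if tau > t]
    return (min(later) - t) if later else 0.0

def mu_delta(A):
    return sum(mu(r) for r in A)

print(mu_delta([0.0, 0.5, 1.0]))  # 0.5 + 0.5 + 1.0 = 2.0
print(mu_delta([2.0]))            # mu(2.0) = 2.0 > 0: singletons are atoms
```

The second line shows that a single right-scattered point already carries positive measure, in contrast with the Lebesgue measure on R.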
The functional space L∞∆(A, R^n) is the set of all functions x defined ∆-a.e. on A, with values in R^n, that are µ∆-measurable on A and bounded ∆-almost everywhere. Endowed with the usual norm ‖x‖_{L∞∆(A,R^n)} := supess_{τ∈A} ‖x(τ)‖_{R^n}, it is a Banach space (see [2], Thm. 2.5). The functional space L¹∆(A, R^n) is the set of all functions x defined ∆-a.e. on A, with values in R^n, that are µ∆-measurable on A and such that ∫_A ‖x(τ)‖_{R^n} ∆τ < +∞. Endowed with the usual norm ‖x‖_{L¹∆(A,R^n)} := ∫_A ‖x(τ)‖_{R^n} ∆τ, it is a Banach space; see Theorems 5.1 and 5.2 of [26]. Note that, if A is bounded, then L∞∆(A, R^n) ⊂ L¹∆(A, R^n).

Absolutely continuous functions
Take c, d ∈ T with c < d. We denote by AC([c, d]_T, R^n) the space of absolutely continuous functions defined on [c, d]_T with values in R^n. Recall that x belongs to AC([c, d]_T, R^n) if and only if x is ∆-differentiable ∆-a.e. on [c, d)_T with x^∆ ∈ L¹∆([c, d)_T, R^n) and x(t) = x(c) + ∫_{[c,t)_T} x^∆(τ) ∆τ for every t ∈ [c, d]_T.

Functions of bounded variation
We denote by BV([c, d]_T, R^n) the space of functions of bounded variation defined on [c, d]_T taking values in R^n, that is, the space of functions x : [c, d]_T → R^n such that
sup Σ_k ‖x(t_{k+1}) − x(t_k)‖_{R^n} < +∞,
where the supremum is taken over all finite partitions {t_k}_k of [c, d]_T. As in the classical continuous-time literature (taking T = R+ for example), it can be proved that both the inclusions AC([c, d]_T, R^n) ⊂ BV([c, d]_T, R^n) and BV([c, d]_T, R^n) ⊂ L∞∆([c, d]_T, R^n) hold true.
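On a finite (purely discrete) time scale the supremum defining the total variation is attained by the finest partition, so it can be computed by a direct sum. The following sketch, with hypothetical sample values, is only meant to illustrate the definition:

```python
# Illustration: total variation of a function sampled on a finite
# discrete time scale, following the definition
#   V(x) = sup over partitions of  sum_k |x(t_{k+1}) - x(t_k)|.
# On a finite time scale the full partition realizes the supremum,
# by the triangle inequality.

T = [0.0, 1.0, 2.0, 3.0, 4.0]       # hypothetical time scale
x = [0.0, 2.0, 1.0, 1.0, 3.0]       # hypothetical values x(t_k)

def total_variation(values):
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

print(total_variation(x))  # |2-0| + |1-2| + |1-1| + |3-1| = 5.0
```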

Main result and comments
This section is dedicated to the statement of our main result (Thm. 3.7). In Section 3.1 we first give some reminders on a sampling procedure on time scales extracted from p. 60 of [21]. In Section 3.2 we introduce the general state constrained optimal sampled-data control problem on time scales considered in the present work, and we fix the terminology and assumptions used all along the paper. In Section 3.3 we state the corresponding Pontryagin maximum principle and a list of comments follows.

Sampling procedure
Let T 1 be a second time scale, possibly different from the reference one T introduced in Section 2. Throughout the paper, T 1 will be the time scale on which the control of the control system evolves. We assume that T 1 ⊂ T. 2 As for T, we assume that min T 1 = a and that T 1 is unbounded above. In accordance with the previous section, we use the notation σ 1 , RS 1 , RD 1 , ∆ 1 , etc., for the analytical tools relative to the time scale T 1 . Since T 1 ⊂ T, we have RS ∩ T 1 ⊂ RS 1 and RD 1 ⊂ RD.
A sample-and-hold procedure from T1 to T involves defining an operator that extends to T any function defined on T1, by freezing the values on T\T1 in the sense given by Definition 3.1 below. In order to introduce this sampling procedure, we define the map ⋆ : T → T1 by ⋆(t) := sup{τ ∈ T1 | τ ≤ t} for every t ∈ T. For every t ∈ T1, we have ⋆(t) = t. For every t ∈ T\T1, we have ⋆(t) ∈ RS1 and ⋆(t) < t < σ1(⋆(t)).
2 Indeed, it is not natural to consider controlling times t ∈ T1 at which the dynamics does not evolve, that is, at which t ∉ T. The value of the control at such times t ∈ T1\T would not influence the dynamics, or, maybe, only on [t*, +∞)_T where t* := inf{τ ∈ T | τ ≥ t}. In this last case, note that t* ∈ T and we can replace T1 by (T1 ∪ {t*})\{t} without loss of generality.

Definition 3.1 (Sampling procedure). Let m ∈ N* and let u : T1 → R^m be a given function. In this paper the sampled-data function associated with u is the function u^⋆ : T → R^m defined by the composition u^⋆ := u ∘ ⋆.
Example 3.2. Let m ∈ N* and consider T = R+ and T1 = N. If u : N → R^m is a given function, then the corresponding sampled-data function u^⋆ : R+ → R^m is the piecewise constant function given by u^⋆(t) = u(k) for all t ∈ [k, k + 1) and all k ∈ N.
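In this classical case the sampling map reduces to the floor function and the sampled-data function is a zero-order hold. The following sketch (the scalar control sequence is hypothetical, chosen only for illustration) makes Example 3.2 concrete:

```python
import math

# Illustration of Definition 3.1 with T = R+ and T1 = N: the sampling
# map star(t) = sup{ k in N | k <= t } is floor(t), and the
# sampled-data function is the zero-order hold u_star(t) = u(floor(t)).

def u(k):
    """Hypothetical control sequence defined on T1 = N."""
    return (-1.0) ** k

def u_star(t):
    return u(math.floor(t))

print(u_star(0.5), u_star(1.0), u_star(1.9))  # 1.0 -1.0 -1.0
```

The value of u_star is frozen on each interval [k, k + 1), exactly as stated in Example 3.2.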
We conclude this section with the following useful proposition that can be found in Proposition 1 of [21].

A general state constrained optimal sampled-data control problem on time scales
Let T1 ⊂ T be the two (possibly different) time scales introduced in Sections 2 and 3.1 (both unbounded above and both bounded below with a := min T = min T1). Let b ∈ T be such that a < b and let m, n, j and ℓ ∈ N* be four fixed positive integers. In this paper we focus on the general state constrained optimal sampled-data control problem on time scales given by

(P)  minimize g(x(a), x(b)),
     subject to x ∈ AC([a, b]_T, R^n), u ∈ L∞∆1([a, b)_{T1}, R^m),
     x^∆(t) = f(x(t), u^⋆(t), t) for ∆-a.e. t ∈ [a, b)_T,
     ψ(x(a), x(b)) ∈ S,
     h_i(x(t), t) ≤ 0 for all t ∈ [a, b]_T and all i = 1, . . . , j,
     u(t) ∈ U for ∆1-a.e. t ∈ [a, b)_{T1},

where f : R^n × R^m × [a, b]_T → R^n, g : R^n × R^n → R, ψ : R^n × R^n → R^ℓ and h = (h_i)_{i=1,...,j} : R^n × [a, b]_T → R^j are given functions, and where U ⊂ R^m and S ⊂ R^ℓ are given sets.
A couple (x, u) is said to be admissible for Problem (P) if it satisfies all its constraints. A solution to Problem (P) is an admissible couple (x, u) which minimizes the cost g(x(a), x(b)) among all admissible couples. In Problem (P), x is called the state function (also called the trajectory) and u is called the control function.
In the case where T1 = T, the control is said to be permanent in Problem (P) because its value in the dynamical system can be modified at any time t ∈ T. Otherwise, in the case where T1 ⊊ T, the control is said to be sampled-data in Problem (P) and its value in the dynamical system can be modified only at times t ∈ T1 and remains frozen elsewhere (see Sect. 3.1).
Throughout this paper we will make use of the following regularity and topology hypotheses:
(H1) the function f : R^n × R^m × [a, b]_T → R^n, which describes the dynamics x^∆(t) = f(x(t), u^⋆(t), t), is continuous and of class C¹ with respect to its first two variables;
(H2) the set U ⊂ R^m, which describes the control constraint u(t) ∈ U, is a nonempty closed convex subset of R^m;
(H3) the function g : R^n × R^n → R, which describes the Mayer cost g(x(a), x(b)), is of class C¹;
(H4) the function ψ : R^n × R^n → R^ℓ, which describes the terminal state constraint ψ(x(a), x(b)) ∈ S, is of class C¹;
(H5) the set S ⊂ R^ℓ, involved in the terminal state constraint ψ(x(a), x(b)) ∈ S, is a nonempty closed convex subset of R^ℓ;
(H6) the function h = (h_i)_{i=1,...,j} : R^n × [a, b]_T → R^j, which describes the inequality state constraints h_i(x(t), t) ≤ 0, is continuous and of class C¹ in its first variable.
Remark 3.4. The general time scale framework considered in the formulation of Problem (P) makes it possible to recover several typical situations, among which:
- continuous-time optimal permanent control problems (taking T = T1 = R+ for example);
- discrete-time optimal permanent control problems (taking T = T1 = N for example);
- general optimal permanent control problems on time scales (taking T1 = T with T general);
- continuous-time optimal sampled-data control problems (taking T = R+ and T1 = N for example);
- discrete-time optimal sampled-data control problems (taking T = N and T1 = 2N for example);
- general optimal sampled-data control problems on time scales (taking T1 ⊊ T both general).
Moreover the general terminal state constraint ψ(x(a), x(b)) ∈ S covers various cases, among which: fixed or free initial condition (resp. final condition), equality/inequality constraints on the initial condition (resp. final condition), mixed initial/final conditions such as the periodic condition x(a) = x(b) for example, etc. We refer to Remark 10 of [21] for more details. Finally, by considering j = 1 and h ≡ −1, note that the formulation of Problem (P) also covers the state constraint-free case.
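The discrete-time sampled-data case T = N and T1 = 2N of Remark 3.4 can be made concrete with a short sketch (the control sequence below is hypothetical, chosen only to illustrate the sample-and-hold mechanism; the sampling map is written here as a Python function star):

```python
# Illustration of the discrete-time sampled-data case T = N, T1 = 2N:
# the sampling map sends t in N to the largest even integer <= t, so
# the control is frozen on each pair of times {2k, 2k + 1}.

def star(t):
    """star(t) = sup{ tau in 2N | tau <= t } = largest even integer <= t."""
    return 2 * (t // 2)

def u(tau):
    """Hypothetical control defined on T1 = 2N."""
    return tau / 2.0

samples = [(t, u(star(t))) for t in range(5)]
print(samples)  # [(0, 0.0), (1, 0.0), (2, 1.0), (3, 1.0), (4, 2.0)]
```

The printed pairs show that the dynamics at times 2k and 2k + 1 sees the same control value u(2k), which is exactly the sampled-data behavior described above.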
Remark 3.5. Our objective in the present work is to establish necessary optimality conditions for Problem (P).
Regarding existence results, we refer to Theorem 2.1 of [21] in which, under some appropriate compactness and convexity assumptions, a Filippov-type existence result has been obtained in a very similar time scale and sampled-data control setting. Note, however, that this result was established in the state constraint-free case. Nevertheless we believe that the techniques can be adapted to the present context (that is, with inequality state constraints) and that a similar result can be derived, provided that Problem (P) is feasible (in the sense that there exists at least one admissible couple).
Remark 3.6. In order to deal with sampled-data controls in the general time scale setting, a possible alternative approach would be to consider controls u : T → R^m, defined on the time scale T (and not on T1), by adding the constraint that they are constant on [r, σ1(r))_T for all right-scattered points r of T1. Nevertheless, in the general optimization problem (P), note that only the values of u taken over T1 are of interest. As a consequence, from an optimization point of view, it is more natural and suitable to work with the Lebesgue functional space L¹∆1([a, b)_{T1}, R^m), associated with the time scale measure µ∆1 of T1, which is a well-studied Banach space in the literature (see, e.g., [2,26]), in order to enjoy its known mathematical properties all along the paper, in particular in order to define implicit spike perturbations of the controls with respect to the L¹∆1-norm (see Sect. 5.1.2). On the contrary, with the above mentioned alternative framework, one would be led to introduce a new set of admissible controls (that are defined on T and constant over [r, σ1(r))_T for all right-scattered points r of T1), endowed with the norm associated with the time scale measure µ∆ of T, which would not be well-suited to provide a good description of the situation, in particular of the implicit spike perturbations of the controls used in this paper.

Pontryagin maximum principle and comments
Before providing a Pontryagin maximum principle associated with Problem (P), we need to recall some basic notions employed in our statement. The normal cone to the closed convex set S at a point x ∈ S is the set defined by N_S[x] := {z ∈ R^ℓ | ⟨z, x' − x⟩ ≤ 0 for all x' ∈ S}. The map ψ : R^n × R^n → R^ℓ is said to be submersive at a point (x_a, x_b) ∈ R^n × R^n if its differential at (x_a, x_b) is surjective, i.e. if the Jacobian matrix ∇ψ(x_a, x_b) ∈ R^{ℓ×2n} has full rank. Finally the Hamiltonian H : R^n × R^m × R^n × [a, b]_T → R associated with Problem (P) is defined by H(x, u, p, t) := ⟨p, f(x, u, t)⟩. We are now ready to state the main result of the paper.
Theorem 3.7 (Pontryagin maximum principle). Let (x*, u*) be a solution to Problem (P) and assume that ψ is submersive at (x*(a), x*(b)). Then there exist a nonnegative real number λ ≥ 0, an AC-costate vector p ∈ AC([a, b]_T, R^n), a BV-costate vector q ∈ BV([a, b]_T, R^n) and finite nonnegative Borel measures dη1, . . . , dηj on [a, b]_T such that the following conditions are satisfied:
(i) Nontriviality condition: the tuple (λ, p, dη1, . . . , dηj) is not trivial;
(ii) Adjoint equation: −p^∆(t) = ∇_x H(x*(t), (u*)^⋆(t), q^σ(t), t) for ∆-a.e. t ∈ [a, b)_T, where q differs from (a shift of) p by integrals of the gradients ∇_x h_i(x*(·), ·) with respect to the measures dηi (see its precise expression in Sect. 6);
(iii) Transversality condition: there exists ξ ∈ N_S[ψ(x*(a), x*(b))] such that q(a) = λ∇_1 g(x*(a), x*(b)) + ∇_1 ψ(x*(a), x*(b))^T ξ and −q(b) = λ∇_2 g(x*(a), x*(b)) + ∇_2 ψ(x*(a), x*(b))^T ξ;
(iv-a) Hamiltonian maximization condition at right-dense points: u*(s) ∈ argmax_{v∈U} H(x*(s), v, q^σ(s), s) for ∆1-a.e. s ∈ [a, b)_{T1} ∩ RD1;
(iv-b) Nonpositive averaged Hamiltonian gradient condition at right-scattered points: ⟨∫_{[r, σ1(r))_T} ∇_u H(x*(τ), u*(r), q^σ(τ), τ) ∆τ, v − u*(r)⟩ ≤ 0 for all v ∈ U and all r ∈ [a, b)_{T1} ∩ RS1;
(v) Complementary slackness condition: supp(dηi) ⊂ {t ∈ [a, b]_T | h_i(x*(t), t) = 0} for every i = 1, . . . , j, where supp(dηi) stands for the classical notion of support of the measure dηi.
Proof. The proof of Theorem 3.7 is built up in several stages and will be displayed in Section 6, after collecting some crucial preliminary results in Section 5.
We now comment on our result.

Remark 3.8. As is well known in optimal control theory, the nontrivial tuple (λ, p, dη1, . . . , dηj) of Theorem 3.7, which is a Lagrange multiplier, is defined up to a positive multiplicative scalar. It is said to be normal whenever λ > 0, and abnormal whenever λ = 0. In the normal case λ > 0, it is usual to normalize the Lagrange multiplier so that λ = 1.

Remark 3.9. As is well known in optimal sampled-data control theory on time scales (see, e.g., [20-22]), the classical Hamiltonian maximization condition does not hold true in general at right-scattered points of T1, at which it is replaced by a nonpositive averaged Hamiltonian gradient condition. Note that, in the context of Theorem 3.7 and under some additional appropriate convexity assumptions such as the one introduced by Holtzman and Halkin in [48], it should be possible to obtain the averaged Hamiltonian maximization condition given by
∫_{[r, σ1(r))_T} H(x*(τ), u*(r), q^σ(τ), τ) ∆τ ≥ ∫_{[r, σ1(r))_T} H(x*(τ), v, q^σ(τ), τ) ∆τ for all v ∈ U.
Taking T = T1 = N for example, one would recover the work [48] in which the authors obtain the Hamiltonian maximization condition for discrete-time optimal permanent control problems.

Remark 3.10. As mentioned above, our proof of Theorem 3.7 is based on the Ekeland variational principle, which is suitable in order to deal with general time scale versions of the PMP (see [20], Sect. 3.1 and [23] for detailed discussions on that point). It requires the closedness of U in order to define the corresponding penalized functional on a complete metric set (see details in Sect. 6.1.1). The closedness of U is thus a crucial assumption in our approach. However, note that it is possible to slightly extend Theorem 3.7 to the case where U is not convex, by using the concept of stable U-dense directions. For a discussion on that technical point we refer the reader to [20,21].
Remark 3.11. As mentioned in Remark 3.4, the general terminal constraint ψ(x(a), x(b)) ∈ S in Problem (P) allows to recover various situations of terminal constraints. We refer to Remark 10 of [21] for more details, and also for the description of the corresponding transversality conditions of Theorem 3.7.
Remark 3.12. Observe that, if the map ψ is not submersive at (x*(a), x*(b)), then one may look to replace the transversality condition in Theorem 3.7 by a condition expressed in terms of the limiting normal cone of the closed set ψ⁻¹(S) at the point (x*(a), x*(b)) ∈ ψ⁻¹(S). We refer to Theorem 22.2 of [28] or Theorem 9.3.1 of [71] for similar statements in that direction.

Remark 3.13. As in continuous-time optimal permanent control problems (see, e.g., [47,71]), the vector p (resp. the vector q) provided in Theorem 3.7 is called the AC-costate vector (resp. the BV-costate vector). Note that the terminology adjoint vector is also frequently used in the literature instead of costate vector. Up to the presence of the shift σ, the AC-costate vector p corresponds to an absolutely continuous part of the BV-costate vector q, and the difference between them can be expressed in terms of the measures dη1, . . . , dηj and the gradients ∇_x h_i. From the complementary slackness condition, we deduce that this difference (possibly containing discontinuity jumps and singular parts) intervenes when the inequality state constraints h_i(x*(t), t) ≤ 0 are active, that is, when h_i(x*(t), t) = 0 for some i = 1, . . . , j. This behavior is well illustrated in Section 4, where Theorem 3.7 is applied to solve numerically a continuous-time min-max optimal sampled-data control problem.

Remark 3.14. As in continuous-time optimal permanent control problems, the present extension of the Pontryagin maximum principle for optimal sampled-data control problems on time scales (that can be found in [21], Thm. 2.6) to the state constrained case is not trivial. The authors of [21] involve the sensitivity analysis of the state equation under (explicit) needle-like variations of the control. This method is not applicable to handle inequality state constraints such as the ones considered in the present work.
Indeed, taking a needle-like variation of the control at a given time s ∈ [a, b)_{T1} ∩ RD1 leads to the differentiability (with respect to the perturbation parameter) of the state x (only) over the interval [s + δ, b]_T for some δ > 0 small (see [21], Prop. 4). However the analysis of the inequality state constraints h_i(x(t), t) ≤ 0 requires differentiability over the whole interval [a, b]_T. In order to overcome this technical difficulty, our idea in this paper is to involve the sensitivity analysis of the state equation under implicit spike variations of the control. This concept was used in [13,16,54] for continuous-time optimal permanent control problems and is based in particular on the fact that the Lebesgue measure µ_L is nonatomic. As a consequence, the adaptation of this concept to the time scale framework is not trivial because the Lebesgue ∆1-measure µ∆1 is atomic (since the ∆1-measure of any right-scattered point of T1 is positive). In the proof of Theorem 3.7 (precisely in Sect. 5.1.2), we introduce a suitable time scale version of the concept of implicit spike variations, by distinguishing the perturbation at right-dense points and at right-scattered points of T1. In particular, at right-scattered points of T1, we involve a convex L∞-variation of the control as in [21].
Remark 3.15. Consider the framework of Theorem 3.7 in the state constraint-free case (in particular q = p^σ in that context). In continuous-time optimal permanent control problems, that is, when T = T1 = R+ for example, it is well known that, under suitable assumptions, the maximized Hamiltonian function Ĥ : [a, b] → R defined by Ĥ(t) := max_{v∈U} H(x*(t), v, p(t), t) for almost every t ∈ [a, b] can be identified with an absolutely continuous function which satisfies Ĥ'(t) = ∇_t H(x*(t), u*(t), p(t), t) for almost every t ∈ [a, b] (see, e.g., [35], Thm. 2.6.3). This property is not true in general in optimal sampled-data control theory. We refer to [17] for a detailed discussion on that particular point.
Remark 3.16. In this paper our main result (Thm. 3.7) provides a Pontryagin maximum principle for a class of state constrained optimal control problems in which the cost is expressed in the Mayer form. However, employing a well known state augmentation technique (see, e.g., [71], Chap. 6), Theorem 3.7 also provides the necessary conditions for problems in the Bolza form, in which an integral term is added to the cost to be minimized, namely g(x(a), x(b)) + ∫_{[a,b)_T} L(x(τ), u^⋆(τ), τ) ∆τ, where L : R^n × R^m × [a, b]_T → R is a given continuous function which is of class C¹ with respect to its first two variables. Then the conclusions of Theorem 3.7 are still valid, replacing H by the augmented Hamiltonian H̃ defined by H̃(x, u, p, λ, t) := H(x, u, p, t) + λL(x, u, t). As a consequence, Theorem 3.7 unifies and extends several versions of the Pontryagin maximum principle established in the literature, among which those for:
- continuous-time optimal permanent control problems without inequality state constraint (see, e.g., [61]) and with inequality state constraints (see, e.g., [16,47,54,71]);
- discrete-time optimal permanent control problems without inequality state constraint (see, e.g., [12]) and with inequality state constraints (see, e.g., [62]);
- general optimal permanent control problems on time scales without inequality state constraint (see, e.g., [11,20]);
- continuous-time optimal sampled-data control problems without inequality state constraint (see, e.g., [22]) and with inequality state constraints (see, e.g., [18]);
- general optimal sampled-data control problems on time scales without inequality state constraint (see, e.g., [21]).
Furthermore, the time scale result provided by Theorem 3.7 extends the Pontryagin maximum principle to numerous hybrid situations (for example, when T and/or T_1 contain isolated points and disjoint intervals of positive length).

Application to continuous-time min-max optimal sampled-data control problems
This section is dedicated to an illustrative application of Theorem 3.7 in the continuous-time case T = R_+ with sampled-data control, where T_1 = (2/N)ℕ = {2k/N : k ∈ ℕ} for some N ≥ 2. Nonetheless, thanks to the general time scale setting, recall that our main result can be applied to several different situations described, for example, in Remark 3.4. Precisely, our objective in this section is to solve numerically a maximal velocity minimization problem for the continuous-time harmonic oscillator with sampled-data control, described by a second-order control system. This section is organized as follows. In Section 4.1 we first establish in Proposition 4.1 a Pontryagin maximum principle for a continuous-time min-max optimal sampled-data control problem. Indeed, following a well known idea (see, e.g., [31], Rem. 6 or [71], Chap. 9), such a problem can be reformulated as a state constrained continuous-time optimal sampled-data control problem, which corresponds to a particular case of Problem (P), and then Theorem 3.7 applies. Then, in Section 4.2, we reformulate the above maximal velocity minimization problem as a continuous-time min-max optimal sampled-data control problem and we apply Proposition 4.1. Finally, we provide numerical results based on the necessary optimality conditions of Proposition 4.1, which are solved using a standard shooting method (see, e.g., [69], pp. 170-171 for more details on shooting methods). This example exhibits a bouncing phenomenon of the optimal trajectory when it reaches the maximal velocity to be minimized. This is accompanied by a nontrivial Borel measure and by corresponding jumps of the costate vector.

Necessary conditions for min-max problems
Let P := {t_k}_{k=0,...,N} be a partition of the time interval [a, b], that is, a = t_0 < t_1 < ... < t_N = b with N ∈ ℕ*. In this section we consider the general continuous-time min-max optimal sampled-data control problem (MMP), whose cost is max_{t∈[a,b]} L(x(t), t), where x_a, x_b ∈ R^n are fixed and where L : R^n × [a, b] → R is a given real function. Its necessary optimality conditions include: (iv) Nonpositive averaged Hamiltonian gradient condition, for all v ∈ U and all k = 0, ..., N − 1; (v) Complementary slackness condition. Proof. Following a well known idea (see, e.g., [31], Rem. 6 or [71], Chap. 9), we reformulate Problem (MMP) as an augmented and state constrained continuous-time optimal sampled-data control problem. Note that this problem is a particular case of Problem (P), taking T = [a, +∞) and T_1 = P ∪ [b, +∞) (see Exam. 3.2 for a similar situation). As a consequence, Proposition 4.1 directly follows from the application of Theorem 3.7.
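The augmentation used in this proof can be sketched as follows (the auxiliary state z and the displayed dynamics are our illustrative notation, not taken verbatim from the paper):

```latex
\min_{(x,u)} \ \max_{t \in [a,b]} L(x(t),t)
\quad \rightsquigarrow \quad
\begin{cases}
\text{minimize } z(b), \\[2pt]
\dot{x}(t) = f(x(t), u(t_k), t), & t \in [t_k, t_{k+1}), \ k = 0,\dots,N-1, \\[2pt]
\dot{z}(t) = 0, & t \in [a,b], \\[2pt]
L(x(t),t) - z(t) \le 0, & t \in [a,b],
\end{cases}
```

so that the running maximum becomes an inequality state constraint on the augmented state (x, z), which is exactly the structure of Problem (P).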

Application to a maximal velocity minimization problem
In order to provide a numerical solution to the maximal velocity minimization problem presented at the beginning of Section 4, we take a = 0, b = 2 and the uniform N-partition P = {t_k = 2k/N}_{k=0,...,N} of the time interval [0, 2] with N ≥ 2. We reformulate the maximal velocity minimization problem as the continuous-time min-max optimal sampled-data control problem (4.1) of minimizing max_{t∈[0,2]} x_2(t). Assume that Problem (4.1) admits a solution (x*, u*), and denote by V* := max_{t∈[0,2]} x*_2(t) the corresponding maximal velocity. One can easily prove by contradiction (using the initial and final conditions) that V* > 0. Let us denote by λ ≥ 0, p = (p_1, p_2) ∈ AC([0, 2], R^2), q = (q_1, q_2) ∈ BV([0, 2], R^2) and dη the elements provided by Proposition 4.1.
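As a quick numerical sanity check of this sampled-data setting, the following sketch simulates a controlled harmonic oscillator ẍ + x = u under a zero-order-hold control on the uniform N-partition of [0, 2] and records the velocity at the sampling times. The dynamics ẍ + x = u, the function names and the chosen constant control are our illustrative assumptions, not the paper's exact data:

```python
import numpy as np

def simulate(u_samples, t_grid, x0, n_sub=100):
    """Simulate x1' = x2, x2' = -x1 + u with zero-order-hold control.

    u_samples[k] is held constant on [t_k, t_{k+1}) (sampled-data control).
    Each sampling interval is integrated with RK4 substeps.
    """
    def f(x, u):
        return np.array([x[1], -x[0] + u])

    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for k, u in enumerate(u_samples):
        h = (t_grid[k + 1] - t_grid[k]) / n_sub
        for _ in range(n_sub):
            k1 = f(x, u); k2 = f(x + h/2*k1, u)
            k3 = f(x + h/2*k2, u); k4 = f(x + h*k3, u)
            x = x + h/6 * (k1 + 2*k2 + 2*k3 + k4)
        traj.append(x.copy())
    return np.array(traj)

N = 10
t_grid = np.linspace(0.0, 2.0, N + 1)   # uniform N-partition of [0, 2]
u = np.ones(N)                          # a feasible control with |u| <= 1
traj = simulate(u, t_grid, x0=[0.0, 0.0])
V = np.abs(traj[:, 1]).max()            # largest velocity at sampling times
```

With u ≡ 1 and zero initial data, the exact solution of this toy model is x_1(t) = 1 − cos t, x_2(t) = sin t, which the integrator reproduces to high accuracy.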

Abnormal situation λ = 0
In the abnormal situation λ = 0, from the transversality condition in Proposition 4.1, we get that dη is the null Borel measure on [0, 2] and thus q = p. From the adjoint equation, an explicit expression of p follows for all t ∈ [0, 2]. From the nontriviality condition in Proposition 4.1, we deduce that (p_1(0), p_2(0)) ≠ (0, 0). On the other hand, if p_1(0) ≠ 0, an explicit expression also holds for all t ∈ [0, 2]. We conclude that, in both cases p_1(0) = 0 and p_1(0) ≠ 0, the function p_1 changes its (strict) monotonicity at most once over the interval [0, 2]. We deduce that p_1(t_{k+1}) − p_1(t_k) changes sign at most once. From the nonpositive averaged Hamiltonian gradient condition in Proposition 4.1, we deduce that the sequence u* is monotone and takes at most three different values, which can be 1, some ρ ∈ (−1, 1) and −1. Furthermore, note that the sequence u* can take an interior value ρ ∈ (−1, 1) at only one index k = 0, ..., N − 1.
In the sequel we denote by 0 =: τ_0 < τ_1 < τ_2 < τ_3 := 2, where τ_1 and τ_2 are the precise sampling times t_k at which the optimal control u* possibly changes its value (from 1 to ρ, or from 1 to −1, for example). Note that the times τ_i are not all of the sampling times t_k (but only four of them). Furthermore, note that one of the intervals [τ_i, τ_{i+1}) is of length 2/N. We denote by u*_i the value of u* on each interval [τ_i, τ_{i+1}). One can easily obtain an explicit equation (4.2) relating these quantities. We performed numerical simulations in order to solve this equation in the case N = 10, taking into account all the above conditions, in particular that the triplet (u*_0, u*_1, u*_2) belongs to one of the admissible sets. It appears numerically that Equation (4.2), taking into account our constraints, has no solution. In the next paragraph we deal with the normal case λ = 1.

Normal situation λ = 1
This paragraph is dedicated to an indirect numerical method used for the normal case λ = 1. It consists in providing a guess of the couple (p_1(0), p_2(0)), of the times s_k (only for k odd) and of the values η_k for all k = 0, ..., 2N, and in computing the corresponding state and costate variables (owing to the explicit expressions provided above). Then we use the Matlab function fsolve to find a solution which satisfies all the necessary conditions deduced from Proposition 4.1 (such as the transversality condition and the nonpositive averaged Hamiltonian gradient condition). The numerical results obtained with this method in the case N = 10 are displayed in Figure 1. Note that the optimal trajectory drawn in Figure 1 exhibits a bouncing trajectory phenomenon. This particular behavior has recently been highlighted and studied in [18], which deals with a class of state constrained continuous-time optimal sampled-data control problems. In that context, under some hypotheses (see [18], Sect. 4.2 for details), it was established that the optimal trajectories touch the state constraint boundary at most finitely many times. Figure 2 provides a zoom on the behavior of x*_2 with respect to the maximal velocity V* ≈ 0.6929. One can see that the maximal velocity is attained a finite number of times (exactly four), and these times exactly correspond to the discontinuity jumps of the BV-adjoint vector q_2 in Figure 1 (see Rem. 3.13). We refer to Section 4 of [18] for more details on the bouncing trajectory phenomenon in state constrained continuous-time optimal sampled-data control problems. The authors are thankful to Professor Emmanuel Trélat for the useful discussions on the illustrative example provided in this section.
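To illustrate the kind of indirect method used here, the following minimal sketch implements a shooting method (with a secant update in place of Matlab's fsolve) on a toy two-point boundary value problem ẍ = −x, x(0) = 0, x(2) = 1, where the shooting unknown is the initial velocity. The toy problem and all names are our assumptions, not the paper's actual shooting system:

```python
import numpy as np

def integrate(v0, T=2.0, n=2000):
    """Integrate x'' = -x with x(0) = 0, x'(0) = v0 via RK4; return x(T)."""
    f = lambda x: np.array([x[1], -x[0]])
    x = np.array([0.0, v0])
    h = T / n
    for _ in range(n):
        k1 = f(x); k2 = f(x + h/2*k1); k3 = f(x + h/2*k2); k4 = f(x + h*k3)
        x = x + h/6 * (k1 + 2*k2 + 2*k3 + k4)
    return x[0]

def shoot(target, v0=0.5, v1=1.5, tol=1e-10, max_iter=50):
    """Secant iteration on the shooting function S(v) = x(T; v) - target."""
    s0, s1 = integrate(v0) - target, integrate(v1) - target
    for _ in range(max_iter):
        if abs(s1) < tol:
            break
        v0, v1, s0 = v1, v1 - s1 * (v1 - v0) / (s1 - s0), s1
        s1 = integrate(v1) - target
    return v1

v_star = shoot(target=1.0)   # analytic answer for this toy problem: 1/sin(2)
```

Since the shooting function is linear in v0 for this toy problem, the secant iteration converges in a single step; in the paper's setting the unknowns are instead the costate initial data and the measure weights η_k.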

Preliminaries for the proof of Theorem 3.7
In this section we establish preliminary results needed for the proof of our main result (Thm. 3.7). Precisely, we first investigate in Section 5.1 the sensitivity analysis of the state equation in Problem (P) with respect to particular control variations (called implicit spike variations). The last part of the section is devoted to recalling some regularity properties of the distance function in both finite and infinite dimensional settings (see Sect. 5.2). In what follows, for a metric set (M, d_M), we denote by B_M(x, ν) the closed ball of M centered at x ∈ M of radius ν > 0.

Sensitivity analysis of the state equation
We first recall some Cauchy-Lipschitz (or Picard-Lindelöf) results. We refer to [19] for a detailed study of ∆-Cauchy problems with Carathéodory dynamics on time scales. To simplify notation we shall write L ∞ ∆ := L ∞ ∆ ([a, b) T , R m ) and L 1 ∆ := L 1 ∆ ([a, b) T , R m ). Accordingly, we will use the notation L ∞ ∆1 and L 1 ∆1 . All along this section we assume that (H1) is satisfied.
Owing to Theorem 1 of [19], for every control u ∈ L^∞_{∆1} and every initial condition x_a ∈ R^n, there exists a unique maximal solution x(·, u, x_a) to the forward nonlinear ∆-Cauchy problem (CP_{u,x_a}), defined on a maximal interval denoted by I_T(u, x_a). Moreover, from Lemma 1 of [19], a corresponding integral representation holds for all t ∈ I_T(u, x_a).
Definition 5.1 (Admissible for globality). A couple (u, x a ) ∈ L ∞ ∆1 × R n is said to be admissible for globality whenever b ∈ I T (u, x a ). In that case I T (u, x a ) = [a, b] T and we say that x(·, u, x a ) ∈ AC([a, b] T , R n ) is a global solution to (CP u,xa ).
We denote by AG the set of all couples (u, x_a) ∈ L^∞_{∆1} × R^n admissible for globality. It is endowed with the distance d_AG((u, x_a), (u′, x′_a)), defined for all (u, x_a), (u′, x′_a) ∈ AG.

Openness and continuity results
For every (u, x_a) ∈ AG and every R ≥ ‖u‖_{L^∞_{∆1}}, we introduce the set K_R(u, x_a). As a consequence of (H1), note that f, ∇_x f and ∇_u f are bounded on K_R(u, x_a) by some L_R(u, x_a) ≥ 0. The next propositions are both extracted from Lemmas 4.3 and 4.5 of [21]. The first proposition states that AG is an open set of L^1_{∆1} × R^n (up to an L^∞_{∆1}-bound), while the second establishes a continuous dependence result of the trajectory x(·, u, x_a) with respect to the pair (u, x_a).
is c R (u, x a )-Lipschitz continuous for some constant c R (u, x a ) ≥ 0.

Implicit spike variations and a differentiable dependence result
In the previous section we recalled a continuous dependence result. In order to establish differentiable dependence properties, the authors of [21] use the concept of (explicit) needle-like variations of the control. However, this technique is not suitable for handling inequality state constraints such as the ones considered in the present work. We refer to Remark 3.14 for details. Our idea is thus to employ an implicit spike variation technique (see, e.g., [13,16,54]), but we have to adapt it to the general time scale setting because the ∆_1-measure μ_{∆1} on the time scale T_1 is atomic (in contrast to the classical Lebesgue measure μ_L). The first crucial step towards this goal is provided in the following technical lemma (a detailed proof of which is displayed in Appendix A), in which we use the notation RD_1^b. Remark 5.5. The above result corresponds to an adaptation of the lemma provided in Paragraph 3.2, p. 143 of [54] to the time scale setting. In Lemma 5.4 note that we consider a general function z ∈ L^1(RD_1^b, R^n) defined (only) on the right-dense points of T_1 in order to work on a set on which the ∆_1-measure μ_{∆1} has no atom. Now, given (u, x_a) ∈ AG and (u′, x′_a) ∈ L^∞_{∆1} × R^n, we write π := (u, x_a, u′, x′_a), the associated function z_π being defined for ∆-a.e. τ ∈ [a, b)_T. For every ρ ∈ (0, 1), from Lemma 5.4 we can consider a set Q_ρ ⊂ RD_1^b associated with the restriction of z_π to RD_1^b. For ρ = 0, we set Q_ρ := ∅. Finally, for every ρ ∈ [0, 1), we introduce the implicit spike variation u_π(·, ρ) of u, defined for ∆_1-a.e. τ ∈ [a, b)_{T1}, by distinguishing the perturbations at right-dense points and at right-scattered points of T_1. In particular, at right-scattered points, we involve a convex L^∞-variation of the control as in [21]. Finally, we consider the corresponding variation vector w_π ∈ AC([a, b]_T, R^n) defined as the unique maximal solution, which is global (see [19], Thm. 3), to the forward linear ∆-Cauchy problem (5.1). We prove next the following differentiable dependence result.
This is immediate when ρ = 0. Otherwise, if ρ ∈ (0, 1), we use the fact that μ_L(Q_ρ) = ρμ_L(RD_1^b) (see Lem. 5.4). As a consequence, for sufficiently small ρ ≥ 0, we have (u_π(·, ρ), x_a + ρx′_a) ∈ N_R(u, x_a) ⊂ AG (see Prop. 5.3) and thus F_π(ρ) is well-defined. Moreover, it follows from Proposition 5.3 that x(·, u_π(·, ρ), x_a + ρx′_a) depends continuously on ρ. In particular, x(·, u_π(·, ρ), x_a + ρx′_a) converges uniformly to x(·, u, x_a) on [a, b]_T as ρ → 0. Now let us assume by contradiction that there exist ε > 0 and a sequence of positive real numbers (ρ_k)_k converging to zero such that the lower bound (5.2) holds for all k ∈ N. In this proof, for ease of notation, we write w := w_π, z := z_π, x := x(·, u, x_a), x_k := x(·, u_π(·, ρ_k), x_a + ρ_k x′_a) and u_k := u_π(·, ρ_k) for every k ∈ N. Since the sequence (u_k)_k converges to u in L^1_{∆1}, from Proposition 3.3 we obtain that the sequence (u_k)_k converges to u in L^1_∆. We deduce from the (partial) converse of the Lebesgue dominated convergence theorem that there exists a subsequence (that we do not relabel) such that (u_k)_k tends to u ∆-a.e. on [a, b)_T.
We define the functions φ_k(t) for every t ∈ [a, b]_T and every k ∈ N. The Taylor formula with remainder in integral form then yields an estimate for every t ∈ [a, b]_T and every k ∈ N, where β_k and γ_k denote the corresponding remainder terms. From the time scale version of the Gronwall lemma (see [9], Thm. 6.4), we obtain a bound for every t ∈ [a, b]_T and every k ∈ N. Here e_{L_R(u,x_a)}(b, a) stands for the time scale version of the exponential function (see [9], Chap. 2). Now our aim is to prove that (β_k)_k and (γ_k)_k tend to zero as k → +∞, which leads to a contradiction with the inequality ‖φ_k‖_∞ ≥ ε for all k ∈ N assumed in (5.2).
From the continuity and the boundedness of ∇ x f on K R (u, x a ), since (x k ) k converges uniformly to x on [a, b] T , since (u k ) k tends to u ∆-a.e. on [a, b) T and from the Lebesgue dominated convergence theorem, one can easily prove that (γ k ) k tends to zero as k → +∞.
On the other hand, starting from the corresponding equality on [a, b)_T and using, in particular, the Taylor formula with remainder in integral form, we obtain an expression for β_k. From the continuity and the boundedness of ∇_u f on K_R(u, x_a), one can easily prove that the second term tends to zero as k → +∞, and from Lemma 5.4 one can conclude that (β_k)_k tends to zero as well. The proof is complete.

Preliminaries on the distance function
Assume (H5). We denote by d_S the standard distance function to S, defined by d_S(x) := inf_{x′∈S} ‖x − x′‖ for every x. Since S is a nonempty closed convex set, recall that, for every x, there exists a unique element P_S(x) ∈ S (the projection of x onto S) such that d_S(x) = ‖x − P_S(x)‖. It is characterized by the property ⟨x − P_S(x), x′ − P_S(x)⟩ ≤ 0 for every x′ ∈ S. In particular, x − P_S(x) ∈ N_S(P_S(x)) for all x (the notion of normal cone is recalled in Sect. 3.3). The map P_S is 1-Lipschitz continuous. In the two following lemmas we summarize well known properties (see, e.g., [28,71] for proofs) which will be used in our analysis.
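For intuition, the properties recalled above can be checked numerically on a concrete closed convex set, the box S = [0, 1]^3, for which the projection P_S is coordinatewise clipping (the set and all names are our illustrative choices):

```python
import numpy as np

def proj_box(x, lo=0.0, hi=1.0):
    """Projection P_S onto the box S = [lo, hi]^n (a closed convex set)."""
    return np.clip(x, lo, hi)

def dist_box(x, lo=0.0, hi=1.0):
    """Distance function d_S(x) = ||x - P_S(x)||."""
    return np.linalg.norm(x - proj_box(x, lo, hi))

rng = np.random.default_rng(0)
x, y = rng.normal(size=3) * 3, rng.normal(size=3) * 3
# P_S is 1-Lipschitz continuous, hence so is d_S:
lip = np.linalg.norm(proj_box(x) - proj_box(y)) <= np.linalg.norm(x - y) + 1e-12
# variational characterization: <x - P_S(x), x' - P_S(x)> <= 0 for all x' in S
xp = rng.uniform(0.0, 1.0, size=3)
char = np.dot(x - proj_box(x), xp - proj_box(x)) <= 1e-12
```

Both boolean checks hold for every draw, since clipping is exactly the Euclidean projection onto the box.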
Lemma 5.7. Assume (H5). Let (x_k)_k be a sequence of points and (κ_k)_k be a sequence of nonnegative real numbers such that x_k → x ∈ S and κ_k(x_k − P_S(x_k)) → y. Then y ∈ N_S(x).

Proposition 6.1. Assume (H1)-(H6). Then there exists a nontrivial triplet (λ, ξ, ζ) ∈ R_+ × N_S(ψ(x*_a, x*_b)) × N_S(h(x*, ·)) satisfying Inequality (6.1), where w_π is the variation vector associated with π = (u*, x*_a, u′, x′_a) defined in Section 5.1.2.

A penalized functional
Let ν_R(u*, x*_a) > 0 be given by Proposition 5.2. Take a sequence (ε_k)_k of positive real numbers tending to zero as k → +∞, such that 0 < √ε_k < ν_R(u*, x*_a) for all k ∈ N, and consider the penalized functional J^R_k. Observe that, for every k ∈ N, J^R_k is well-defined because N^R_U(u*, x*_a) ⊂ AG (see Prop. 5.2) and that J^R_k is a strictly positive functional by the optimality of (u*, x*_a). Since U is a nonempty closed subset of R^m, from the (partial) converse of the Lebesgue dominated convergence theorem it follows that (N^R_U(u*, x*_a), d_AG) is a nonempty closed subset of L^1_{∆1} × R^n and thus (N^R_U(u*, x*_a), d_AG) is a complete metric set. Moreover, from the continuity of F^R(u*, x*_a) (see Prop. 5.3), g, ψ, h, d²_S and d²_S, we deduce that J^R_k is continuous on (N^R_U(u*, x*_a), d_AG) for all k ∈ N. Clearly we have J^R_k(u*, x*_a) = ε_k for all k ∈ N. As a consequence, from the Ekeland variational principle ([34], Thm. 1.1), there exists (u^R_k, x^R_{a,k}) ∈ N^R_U(u*, x*_a) satisfying Inequalities (6.2) and (6.3) for all (u, x_a) ∈ N^R_U(u*, x*_a) and all k ∈ N. For every k ∈ N, we introduce the notation x^R_k := x(·, u^R_k, x^R_{a,k}). From Proposition 5.3 and Inequality (6.2), (x^R_k)_k converges uniformly to x* on [a, b]_T as k → +∞. We then define the elements λ^R_k, ξ^R_k and ζ^R_k. Remark 6.2. In this remark (and in Rems 6.4 and 6.5), our aim is to provide two crucial inequalities satisfied by the elements ζ^R_k. Recall that, whenever h(x^R_k) ∉ S, Dd_S(h(x^R_k)) belongs to the subdifferential of d_S at the point h(x^R_k) (see [59], Thm. 3.54). As a consequence, in both cases h(x^R_k) ∉ S and h(x^R_k) ∈ S, a subgradient inequality holds for every ϕ ∈ S and all k ∈ N. Since S has a nonempty interior, there exist φ ∈ S and δ > 0 such that φ + δϕ ∈ S for every ϕ ∈ B(C^j, 1) and all k ∈ N. We deduce the desired estimate for all k ∈ N.

A crucial inequality depending on R
As in the previous subsection, using the (partial) converse of the Lebesgue dominated convergence theorem and compactness arguments, we infer the existence of subsequences (that we do not relabel) such that (u^R_k)_k converges to u* ∆_1-a.e. on [a, b)_{T1}, (λ^R_k)_k converges to some λ^R ≥ 0, (ξ^R_k)_k converges to some ξ^R and (ζ^R_k)_k weakly* converges to some ζ^R ∈ (C^j_T)* as k → +∞. In particular we obtain a corresponding nontriviality bound. For a given k ∈ N let us write π_k := (u^R_k, x^R_{a,k}, u′, x′_a). For every ρ ∈ [0, 1), let u_{π_k}(·, ρ) be the implicit spike variation of u^R_k associated with π_k (see Sect. 5.1.2). First of all, note that u_{π_k}(·, ρ) remains close to u* for ρ ≥ 0 small enough. Then, from the convexity of U, it follows that (u_{π_k}(·, ρ), x^R_{a,k} + ρx′_a) ∈ N^R_U(u*, x*_a) for sufficiently small ρ ≥ 0. Thus we apply Inequality (6.3) for ρ > 0 small enough. From the continuity of J^R_k, we get lim_{ρ→0} J^R_k(u_{π_k}(·, ρ), x^R_{a,k} + ρx′_a) + J^R_k(u^R_k, x^R_{a,k}) = 2J^R_k(u^R_k, x^R_{a,k}). From the differentiability of F_{π_k} (see Prop. 5.6), g, ψ, h, d²_S and d²_S, we deduce a first-order expansion, with the convention that the last term is zero if h(x^R_k) ∈ S. Finally we obtain Inequality (6.4). To conclude this section, we need the following result.
Proof. In this proof, to ease the notation, we set w := w_π, z := z_π and w_k := w_{π_k}, z_k := z_{π_k} for all k ∈ N. Recall that f and ∇_x f are bounded on K_R(u*, x*_a) by L_R(u*, x*_a) ≥ 0, that (x^R_k)_k converges uniformly to x* on [a, b]_T and that (u^R_k)_k tends to u* ∆-a.e. on [a, b)_T. One can conclude the proof using an argument similar to the one employed in the proof of Proposition 5.6.
Invoking the lemma above, the C¹-regularity of g, ψ and h, and letting k → +∞ in Inequality (6.4), we get Inequality (6.5).

Remark 6.4. Letting k → +∞ in the estimates obtained in Remark 6.2, we deduce two crucial inequalities: one valid for every ϕ ∈ S, and a second one.

6.1.3. End of the proof of Proposition 6.1

In the previous step, we obtained Inequality (6.5), which is valid for a fixed R ∈ N such that R ≥ ‖u*‖. In order to conclude the proof of Proposition 6.1, it remains to remove the dependence on R. From standard compactness arguments, we infer the existence of subsequences (that we do not relabel) such that (λ^R)_R converges to some λ ≥ 0, (ξ^R)_R converges to some ξ and (ζ^R)_R weakly* converges to some ζ ∈ (C^j_T)* as R → +∞. In particular we have |λ|² + ‖ξ‖² + ‖ζ‖²_{(C^j_T)*} ≤ 1 and, from the closedness of the normal cone, ξ ∈ N_S(ψ(x*_a, x*_b)). Notice that, at this stage, it is not guaranteed that the triplet (λ, ξ, ζ) is nontrivial. This is established in the next remark.

Complementary slackness condition
Since the third element ζ of our reference triplet belongs to N_S(h(x*, ·)) (see Prop. 6.1), one can easily derive the complementary slackness condition for all i = 1, ..., j.

Adjoint equation
Let p ∈ AC([a, b]_T, R^n) be the unique maximal solution, which is moreover global (see [19], Thm. 6), to the backward shifted ∆-Cauchy problem associated with the adjoint dynamics. Notice that p is well-defined since the map t → ∇_x f(x*(t), u*(t), t) is clearly bounded (and thus the dynamics of the above ∆-Cauchy problem satisfies the global Lipschitz condition of Theorem 6 in [19]). We also introduce the function q ∈ BV([a, b]_T, R^n), defined for all t ∈ [a, b]_T. In particular the adjoint equation holds true for ∆-a.e. t ∈ [a, b)_T.

Dualization
Take (u′, x′_a) ∈ L^∞_{∆1} × R^n such that u′(τ) ∈ U for ∆_1-a.e. τ ∈ [a, b)_{T1}, and set π = (u*, x*(a), u′, x′_a). From Proposition 6.1, we know that Inequality (6.1) is satisfied. Using the definition of q, applying the integration by parts formula (and the Leibniz formulas recalled in Sect. 2) to the first term on the right-hand side, the Fubini-Tonelli theorem (which is valid on any product of σ-finite measure spaces, see [64], Thm. 8.8) to the second term on the right-hand side, and the definition of w^∆_π (see Sect. 5.1.2), we deduce a key identity. Using this last relation in Inequality (6.1), from the definitions of w_π(a) and q(b), we obtain
0 ≤ ⟨λ∇_1 g(x*(a), x*(b)) + ∇_1 ψ(x*(a), x*(b)) × ξ − p(a), x′_a⟩_{R^n} − ∫_{[a,b)_T} ⟨q(τ), z_π(τ)⟩_{R^n} ∆τ. (6.6)

Hamiltonian maximization condition at right-dense points
If μ_{∆1}(RD_1^b) = μ_L(RD_1^b) = 0, there is nothing to prove. Hence we assume that μ_{∆1}(RD_1^b) = μ_L(RD_1^b) > 0. Once again, from Inequality (6.6) and the transversality condition, we have ∫_{[a,b)_T} ⟨q(τ), z_π(τ)⟩_{R^n} ∆τ ≤ 0, (6.7) for all u′ ∈ L^∞_{∆1} such that u′(τ) ∈ U for ∆_1-a.e. τ ∈ [a, b)_{T1}. Now we consider control functions defined for ∆_1-a.e. τ ∈ [a, b)_{T1}, where u′ ∈ L^∞(RD_1^b, R^m) is such that u′(τ) ∈ U for μ_L-a.e. τ ∈ RD_1^b. For this class of control functions, considering the associated term z_π and bearing in mind that RD_1^b ⊂ RD and μ_{∆1} = μ_∆ = μ_L on RD_1^b, a localized version of (6.7) follows. Fix any v ∈ U and let s ∈ RD_1^b be a μ_L-density point of RD_1^b with s > a. We can also assume that s is simultaneously a continuity point of q (restricted to RD_1^b) and a μ_L-Lebesgue point of the map τ ↦ ⟨q(τ), f(x*(τ), v, τ) − f(x*(τ), u*(τ), τ)⟩_{R^n}, which belongs to L^∞([a, b), R). Taking a particular choice of u′ for μ_L-a.e. τ ∈ RD_1^b and taking ε > 0 small enough, then letting ε → 0⁺, from the assumptions on s we obtain ⟨q(s), f(x*(s), v, s) − f(x*(s), u*(s), s)⟩_{R^n} ≤ 0.
Lemma A.2. There exists a μ_L-measurable set Q_ρ ⊂ RD_1^b such that μ_L(Q_ρ) = ρμ_L(RD_1^b) and the corresponding approximation estimate holds. Proof. Since Φ ∈ L^1(RD_1^b, (R^n)^N), there exists a simple function J : RD_1^b → (R^n)^N such that ∫_{RD_1^b} ‖Φ(τ) − J(τ)‖_{(R^n)^N} dτ ≤ ρ²/(2(ρ+1)). Set J := Σ_{k=1}^K c_k 1_{R_k}, where c_k ∈ (R^n)^N and the R_k ⊂ RD_1^b are μ_L-measurable sets such that ∪_{k=1}^K R_k = RD_1^b. Since μ_L is nonatomic, there exist μ_L-measurable sets R_k^ρ ⊂ R_k such that μ_L(R_k^ρ) = ρμ_L(R_k) for all k = 1, ..., K. Let us define Q_ρ := ∪_{k=1}^K R_k^ρ ⊂ RD_1^b. Note that μ_L(Q_ρ) = ρμ_L(RD_1^b). Moreover, we obtain a decomposition into two integrals. It is easy to see that the second integral on the right is bounded by ρ/2, and the first one is computed explicitly. The proof is complete.
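The measure-theoretic selection in Lemma A.2 can be mimicked in a discrete setting: approximate Φ by a simple function and, inside each level set R_k, carve out a subset of proportional measure ρμ_L(R_k), so that the union Q_ρ has total measure ρμ_L(RD_1^b). Below, sets are subintervals of [0, 1] discretized on a uniform grid; all names and the two-level-set example are our illustrative choices:

```python
import numpy as np

def proportional_subset(mask, rho):
    """Return a sub-mask covering a rho-fraction of the cells of mask.

    This mimics the nonatomicity argument: inside each level set R_k of the
    simple function we select a subset of measure rho * mu(R_k), here simply
    the first rho-fraction of the grid cells of R_k.
    """
    idx = np.flatnonzero(mask)
    m = int(round(rho * idx.size))
    sub = np.zeros_like(mask)
    sub[idx[:m]] = True
    return sub

n, rho = 10_000, 0.3
dx = 1.0 / n
grid = (np.arange(n) + 0.5) * dx
# a simple function with two level sets R_1 = [0, 1/2) and R_2 = [1/2, 1)
R1, R2 = grid < 0.5, grid >= 0.5
Q_rho = proportional_subset(R1, rho) | proportional_subset(R2, rho)
measure_Q = Q_rho.sum() * dx   # equals rho * mu([0, 1)) = 0.3
```

The key point, as in the lemma, is that the proportionality μ(Q_ρ) = ρμ(RD_1^b) is obtained level set by level set.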
We are now ready to conclude the proof of Lemma 5.4. Let t ∈ [a, b]. There exists k ∈ {0, ..., N − 1} such that t ∈ [t_k, t_{k+1}]. In particular we have |t − t_k| < δ and thus ‖φ(t, ·) − φ(t_k, ·)‖_{L^1(RD_1^b, R^n)} ≤ ρ²/(2(ρ+1)) (see the remark after the proof of Lem. A.1). The desired estimate on [a, t) ∩ RD_1^b follows: the second term on the right of the last expression can be bounded thanks to Lemma A.2, and clearly the first term is bounded by ρ/2. The proof of Lemma 5.4 is complete.