A Pontryagin Maximum Principle in Wasserstein Spaces for Constrained Optimal Control Problems

In this paper, we prove a Pontryagin Maximum Principle for constrained optimal control problems in the Wasserstein space of probability measures. The dynamics is described by a transport equation with non-local velocities and is subject to end-point and running state constraints. Building on our previous work, we combine the classical method of needle-variations from geometric control theory with the metric differential structure of Wasserstein spaces to obtain a maximum principle stated in the so-called Gamkrelidze form.


Introduction
Transport equations with non-local velocities have drawn a great amount of attention from several scientific communities for almost a century. They were first introduced in statistical physics to describe averaged Coulomb interactions within large assemblies of particles (see e.g. [55]), and are still to this day a widely studied topic in mathematical physics. More recently, a growing interest in the mathematical modelling of multi-agent systems has brought to light a whole new range of problems in which these equations play a central role. Starting from the seminal paper of Cucker and Smale [26] dealing with emergent behaviour in animal flocks, a large literature has been devoted to the fine mathematical analysis of kinetic cooperative systems, i.e. systems described by non-local dynamics with attractive velocities, see e.g. [39,32,12,4]. Besides, several prominent papers aimed at describing the emergence of patterns which were initially discovered for systems of ODEs in the context of kinetic models described by continuity equations [22,38]. Simultaneously, Lasry and Lions laid in [43] the foundations of the theory of mean-field games, which is today one of the most active communities working on variational problems involving continuity equations, see e.g. [19,18] and references therein.
Later on, the focus shifted partly to include control-theoretic problems such as reachability analysis, optimal control, or explicit design of sparse control strategies. For these purposes, the vast majority of the existing contributions have taken advantage of the recent developments in the theory of optimal transport. We refer the reader to [53,50] for a comprehensive introduction to this ever-expanding topic. In particular, the emergence of powerful tools of analysis in the so-called Wasserstein spaces has allowed for the establishment of a general existence theory for non-local transport equations (see e.g. [9,8]), which incorporates natural Lipschitz and metric estimates in the smooth cases (see [45]).
Apart from a few controllability results as in [30], most of the attention of the community has been devoted to optimal control problems in Wasserstein spaces. The existence of optimal solutions has been investigated with various degrees of generality in [2,1,34,35,33,48], mostly by means of Γ-convergence arguments. Besides, a few papers have been dealing with numerical methods either in the presence of diffusion terms, which considerably simplify the convergence analysis of the corresponding schemes (see e.g. [31]), or in the purely metric setting [5,17,47].
The derivation of Hamilton-Jacobi and Pontryagin optimality conditions has been an active topic in the community of Wasserstein optimal control in recent years. Starting from the seminal paper [36] on Hamilton-Jacobi equations in the Wasserstein space, several contributions such as [24,23] have aimed at refining a dynamic-programming principle for mean-field optimal control problems. Pontryagin-type optimality conditions, on the other hand, have received less interest. The first result derived in [13] focuses on a multi-scale ODE-PDE system in which the control only acts on the ODE part. In this setting, the Pontryagin Maximum Principle ("PMP" for short) is derived by combining Γ-convergence and mean-field limit arguments. Another approach, introduced in our previous work [14], studies the infinite-dimensional problem by means of the classical technique of needle-variations (see e.g. [3,15]) and makes extensive use of the theory of Wasserstein subdifferential calculus formalized in [9]. The corresponding maximum principle is formulated as a Hamiltonian flow in the space of measures in the spirit of [8], and is in a sense the most natural generalization to be expected of the usual finite-dimensional Pontryagin-type optimality conditions. We would also like to mention that in [17,48], the authors derived first-order necessary optimality conditions for special classes of optimal control problems on continuity equations, via methods which are quite distinct from those sketched above.
It is worth noticing that optimal control problems in Wasserstein spaces bear a lot of similarities with mean-field games. It was highlighted as early as [43] and further detailed e.g. in [21] that in the class of so-called potential mean-field games, the self-organization of an ensemble of agents could be equivalently reformulated as an optimal control problem in Wasserstein spaces involving adequately modified functionals. In particular in [20], a PMP was derived for controlled McKean-Vlasov dynamics describing such an optimal control problem from a probabilistic point of view. The analysis therein is carried out by leveraging the formalism of Lions derivatives in Wasserstein spaces (see e.g. [19]), which are one of the possible equivalent ways of looking at derivatives in the metric space of probability measures.
In this article, we study a constrained optimal control problem (P ) in the Wasserstein space, which can be written informally as

min over u(·) of [ ∫ 0 T L(t, µ(t), u(t)) dt + ϕ(µ(T )) ],

subject to the controlled non-local continuity equation

∂ t µ(t) + ∇ · ( (v[µ(t)](t, ·) + u(t, ·)) µ(t) ) = 0,   µ(0) = µ 0 ,

along with the state and end-point constraints

Λ(t, µ(t)) ≤ 0 for all t ∈ [0, T ], and Ψ I (µ(T )) ≤ 0, Ψ E (µ(T )) = 0.

Here, the functions (t, µ, ω) → L(t, µ, ω) and µ → ϕ(µ) describe running and final costs, while the maps (t, µ) → Λ(t, µ) and µ → Ψ I (µ), Ψ E (µ) are running and end-point constraints respectively. The velocity field (t, x, µ) → v[µ](t, x) is a general non-local drift, which can be given e.g. in the form of a convolution (see [26,46,34]). The control (t, x) → u(t, x) is a vector field which depends on both time and space, as is customary in the distributed control of partial differential equations (see e.g. [52]). The methodology that we follow relies on the technique of packages of needle-variations, combined with a Lagrange multiplier rule. In essence, this method allows one to recover the maximum principle from a family of finite-dimensional first-order optimality conditions by means of the introduction of a suitable costate. Even though classical in the unconstrained case, this direct approach does require some care to be translated to constrained problems. Indeed, the presence of constraints induces an unwanted dependency between the Lagrange multipliers and the needle parameters. This extra difficulty can be circumvented by considering N -dimensional perturbations of the optimal trajectory instead of a single one, and by performing a limiting procedure as N goes to infinity. Originally introduced in [11] for smooth optimal control problems with end-point constraints, this approach was extended in [51] to the case of non-smooth and state-constrained problems. When trying to further adapt this method to the setting of Wasserstein spaces, one is faced with an extra structural difficulty. In the classical statement of the maximum principle, the presence of state constraints implies a mere BV regularity in time for the covectors. However, a deep result of optimal transport theory states that solutions of continuity equations in Wasserstein spaces coincide exactly with absolutely continuous curves (see e.g. [9, Theorem 8.3.1]).
Whence, in order to write a well-defined Wasserstein Hamiltonian flow in the spirit of [8,14], we choose to formulate a maximum principle in the so-called Gamkrelidze form (see e.g. [10]), which allows one to recover a stronger absolute continuity in time of the costates at the price of an extra regularity assumption on the state constraints.
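In the finite-dimensional setting, the mechanism behind the Gamkrelidze form can be sketched as follows (our notation, not the paper's): for a scalar state constraint g(t, x(t)) ≤ 0 with non-negative multiplier measure ϖ, the costate q(·) of the classical state-constrained PMP is merely of bounded variation, but the shifted covector

```latex
\zeta(t) \;:=\; \varpi\big([0,t]\big),
\qquad
p(t) \;:=\; q(t) \;+\; \zeta(t)\,\nabla_x g\big(t, x(t)\big)
```

absorbs the jumps of q(·) into the explicit term ζ ∇ x g, and is absolutely continuous whenever g is sufficiently smooth. This is the trade-off alluded to above: extra regularity of the constraint is exchanged for absolute continuity of the costate.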
This article is structured as follows. In Section 2, we recall general results of analysis in measure spaces along with elements of subdifferential calculus in Wasserstein spaces and existence results for continuity equations. We also introduce several notions of non-smooth analysis, including a general Lagrange multiplier rule formulated in terms of Michel-Penot subdifferentials. In Section 3, we state and prove our main result, that is Theorem 3.1. The argument is split into four steps which loosely follow the methodology already introduced in [14]. We exhibit in Appendix A a series of examples of functionals satisfying the structural assumptions (H) of Theorem 3.1, and we provide in Appendix B the analytical expression of the Wasserstein gradient of a functional involved in the statement of Theorem 3.1.

Preliminary results
In this section, we recall several notions about analysis in the space of measures, optimal transport theory, Wasserstein spaces, subdifferential calculus in the space (P 2 (R d ), W 2 ), and continuity equations with non-local velocities. We also introduce some elementary notions of non-smooth calculus in Banach spaces. For a complete introduction to these topics, see [9,53,50] and [44,25] respectively.

Analysis in measure spaces and the optimal transport problem
In this section, we introduce some classical notations and results of measure theory, optimal transport and analysis in Wasserstein spaces. We denote by (M + (R d ), ‖·‖ T V ) the set of real-valued non-negative Borel measures defined over R d endowed with the total variation norm, and by L d the standard Lebesgue measure on R d . It is known by Riesz' Theorem (see e.g. [7, Theorem 1.54]) that this space can be identified with the topological dual of the Banach space (C 0 0 (R d ), ‖·‖ C 0 ), which is the completion of the space of continuous and compactly supported functions C 0 c (R d ) endowed with the C 0 -norm. We denote by P(R d ) ⊂ M + (R d ) the set of Borel probability measures, and for p ≥ 1, we denote by P p (R d ) ⊂ P(R d ) the set of measures with finite p-th moment, i.e.

P p (R d ) = { µ ∈ P(R d ) s.t. ∫ R d |x| p dµ(x) < +∞ }.
The support of a Borel measure µ ∈ M + (R d ) is defined as the closed set supp(µ) = {x ∈ R d s.t. µ(N ) > 0 for any neighbourhood N of x}. We denote by P c (R d ) ⊂ P(R d ) the set of probability measures with compact support.
We say that a sequence (µ n ) ⊂ P(R d ) of Borel probability measures converges narrowly towards µ ∈ P(R d ) - denoted by µ n ⇀ * µ as n → +∞ - provided that

∫ R d f (x) dµ n (x) → ∫ R d f (x) dµ(x) for every f ∈ C 0 b (R d ).

We recall in the following definitions the notion of pushforward of a Borel probability measure through a Borel map, along with that of transport plan.

Definition 2.1 (Pushforward of a measure through a Borel map). Given µ ∈ P(R d ) and a Borel map f : R d → R d , the pushforward f # µ of µ through f (·) is the Borel probability measure defined by f # µ(B) = µ(f −1 (B)) for any Borel set B ⊂ R d .

Definition 2.2 (Transport plan). Let µ, ν ∈ P(R d ). We say that γ ∈ P(R 2d ) is a transport plan between µ and ν - denoted by γ ∈ Γ(µ, ν) - provided that

γ(A × R d ) = µ(A) and γ(R d × B) = ν(B)

for any pair of Borel sets A, B ⊂ R d . This property can be equivalently formulated in terms of pushforwards by π 1 # γ = µ and π 2 # γ = ν, where the maps π 1 , π 2 : R 2d → R d denote the projection operators on the first and second factors respectively.
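For measures with finite support, both notions reduce to elementary matrix operations. The following sketch (toy data, our choice) illustrates Definitions 2.1 and 2.2: a pushforward only relocates the atoms of a discrete measure, and the product plan µ ⊗ ν is one particular element of Γ(µ, ν).

```python
import numpy as np

# mu = sum_i a_i delta_{x_i}: weights a over support points x (real line).
a = np.array([0.5, 0.3, 0.2])
x = np.array([0.0, 1.0, 2.0])

# Pushforward f#mu: (f#mu)(B) = mu(f^{-1}(B)). For a discrete measure the
# weights are unchanged and the support points are mapped through f.
f = lambda t: 2.0 * t + 1.0
fx = f(x)                        # new support points

# nu: a second discrete measure.
b = np.array([0.6, 0.4])
y = np.array([0.0, 4.0])

# The product plan gamma = mu (x) nu is a transport plan in Gamma(mu, nu):
gamma = np.outer(a, b)           # gamma[i, j] = a_i * b_j

# Its marginals pi^1#gamma and pi^2#gamma recover mu and nu respectively.
assert np.allclose(gamma.sum(axis=1), a)
assert np.allclose(gamma.sum(axis=0), b)
assert np.isclose(gamma.sum(), 1.0)   # gamma is a probability measure on R^2
```

The design choice here is deliberate: the product plan is never optimal in general, but it always certifies that Γ(µ, ν) is non-empty.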
In 1942, the Russian mathematician Leonid Kantorovich introduced in [42] the optimal mass transportation problem in its modern mathematical formulation. Given two probability measures µ, ν ∈ P(R d ) and a cost function c : R 2d → (−∞, +∞], one aims at finding a transport plan γ ∈ Γ(µ, ν) such that

∫ R 2d c(x, y) dγ(x, y) = min { ∫ R 2d c(x, y) dγ ′ (x, y) s.t. γ ′ ∈ Γ(µ, ν) }.

This problem has been extensively studied in very broad contexts (see e.g. [9,53]) with high levels of generality on the underlying spaces and cost functions. In the particular case where c(x, y) = |x − y| p for some real number p ≥ 1, the optimal transport problem can be used to define a distance over the subset P p (R d ) of P(R d ).
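For measures with finite support, the Kantorovich problem is a finite linear program over the polytope of plans with prescribed marginals. A minimal sketch using `scipy.optimize.linprog` (toy data, our choice; the optimal value 1.0 follows from the lower bound |E[ν] − E[µ]| = 1 for the cost |x − y|):

```python
import numpy as np
from scipy.optimize import linprog

# Discrete Kantorovich problem: minimize sum_{ij} c_ij gamma_ij over plans
# gamma >= 0 whose row sums are a and column sums are b.
a  = np.array([0.5, 0.5]);  xs = np.array([0.0, 1.0])
b  = np.array([0.5, 0.5]);  ys = np.array([1.0, 2.0])
c  = np.abs(xs[:, None] - ys[None, :])          # cost c(x, y) = |x - y|

m, n = c.shape
# Equality constraints encoding the two marginals of the flattened plan
# (row-major flattening: index i*n + j corresponds to gamma_ij).
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0            # row sums equal a_i
for j in range(n):
    A_eq[m + j, j::n] = 1.0                     # column sums equal b_j
b_eq = np.concatenate([a, b])

res = linprog(c.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
assert res.success
# Translating every unit of mass one step to the right is optimal here:
assert np.isclose(res.fun, 1.0)
```

One of the marginal constraints is redundant (both sets of sums fix the total mass), which the HiGHS solver behind `linprog` handles without difficulty.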

Definition 2.3 (Wasserstein distance and Wasserstein spaces).
Given two probability measures µ, ν ∈ P p (R d ), the p-Wasserstein distance between µ and ν is defined by

W p (µ, ν) = min { ( ∫ R 2d |x − y| p dγ(x, y) ) 1/p s.t. γ ∈ Γ(µ, ν) }.

The set of plans γ ∈ Γ(µ, ν) achieving this optimal value is denoted by Γ o (µ, ν) and is referred to as the set of optimal transport plans between µ and ν. The space (P p (R d ), W p ) of probability measures with finite p-th moment endowed with the p-Wasserstein metric is called the Wasserstein space of order p.
We recall some of the interesting properties of these spaces in the following proposition (see e.g. [9, Chapter 7] or [53, Chapter 6]). Given two measures µ, ν ∈ P(R d ), the Wasserstein distances are ordered, i.e. W p 1 (µ, ν) ≤ W p 2 (µ, ν) whenever p 1 ≤ p 2 . Moreover, when p = 1, the following Kantorovich-Rubinstein duality formula holds:

W 1 (µ, ν) = sup { ∫ R d ξ(x) d(µ − ν)(x) s.t. Lip(ξ; R d ) ≤ 1 }.   (2.2)

In what follows, we shall mainly restrict our considerations to the Wasserstein spaces of order 1 and 2 built over P c (R d ). We end these introductory paragraphs by recalling the concepts of disintegration and barycenter in the context of optimal transport.

Definition 2.4 (Disintegration and barycenter). Let µ, ν ∈ P p (R d ) and γ ∈ Γ(µ, ν) be a transport plan between µ and ν. We define the disintegration {γ x } x∈R d ⊂ P p (R d ) of γ on its first marginal µ, usually denoted by γ = ∫ γ x dµ(x), as the µ-almost uniquely determined Borel family of probability measures such that

∫ R 2d ξ(x, y) dγ(x, y) = ∫ R d ∫ R d ξ(x, y) dγ x (y) dµ(x)

for any ξ ∈ C 0 b (R 2d ). The barycenter of γ is then defined as the map γ̄ : x → ∫ R d y dγ x (y).

Then, it holds by Kantorovich duality (2.2), since the maps r → ξ(x, r) are 1-Lipschitz for all x ∈ R d , and taking now the supremum over ξ ∈ Lip(R 2d ) with Lip(ξ; R 2d ) ≤ 1 yields the desired estimate, again as a consequence of (2.2).
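In one spatial dimension the Wasserstein distances admit quantile formulas, which makes the ordering W 1 ≤ W 2 easy to check numerically. A sketch on empirical measures (toy samples, our choice), using `scipy.stats.wasserstein_distance` for W 1:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
u = rng.normal(0.0, 1.0, size=1000)   # samples of mu
v = rng.normal(2.0, 1.5, size=1000)   # samples of nu

# W_1 between the empirical measures; scipy implements the 1-d CDF formula,
# whose value agrees with the Kantorovich-Rubinstein dual problem.
w1 = wasserstein_distance(u, v)

# For equally weighted samples of the same size, the sorted (quantile)
# coupling is optimal for every p, so W_1 and W_2 reduce to sorting:
assert np.isclose(w1, np.mean(np.abs(np.sort(u) - np.sort(v))))
w2 = np.sqrt(np.mean((np.sort(u) - np.sort(v)) ** 2))

# Ordering of the Wasserstein distances (Jensen's inequality): W_1 <= W_2.
assert w1 <= w2 + 1e-12
```

The inequality here is just the Cauchy-Schwarz inequality applied to the optimal quantile coupling, mirroring the general ordering stated above.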

Subdifferential calculus in (P 2 (R d ), W 2 )
In this section, we recall some key notions of subdifferential calculus in Wasserstein spaces. We also prove in Proposition 2.4 a general chain rule formula along multi-dimensional families of perturbations for sufficiently regular functionals defined over P c (R d ). We refer the reader to [9, Chapters 9-11] for a thorough introduction to the theory of subdifferential calculus in Wasserstein spaces, as well as to [37] and [53, Chapter 15] for complementary material. Let φ : P 2 (R d ) → (−∞, +∞] be a lower semicontinuous and proper functional with effective domain D(φ) = {µ ∈ P 2 (R d ) s.t. φ(µ) < +∞}. We introduce in the following definition the concept of extended Fréchet subdifferential in (P 2 (R d ), W 2 ), following the terminology of [9, Chapter 10].

Definition 2.5 (Extended Wasserstein subdifferential). Let µ ∈ D(φ). We say that a transport plan γ ∈ P 2 (R 2d ) belongs to the extended subdifferential ∂φ(µ) of φ(·) at µ provided that (i) π 1 # γ = µ, and (ii) for any ν ∈ P 2 (R d ), it holds that the corresponding subdifferential inequality is satisfied. We furthermore say that a transport plan γ ∈ P 2 (R 2d ) belongs to the strong extended subdifferential ∂ s φ(µ) of φ(·) at µ provided that the corresponding inequality (2.4) holds for any ν ∈ P 2 (R d ) and along arbitrary, not necessarily optimal, transport plans.

We proceed by recalling the notions of regularity and metric slope that are instrumental in deriving a sufficient condition for the extended subdifferential of a functional to be non-empty. This result is stated in Theorem 2.1 and its proof can be found in [9, Theorem 10.3.10].

Definition 2.6 (Regular functionals over (P 2 (R d ), W 2 ) and metric slope). A proper and lower semicontinuous functional φ(·) is said to be regular provided that whenever (µ n ) ⊂ P 2 (R d ) and (γ n ) ⊂ P 2 (R 2d ) are taken such that γ n ∈ ∂φ(µ n ) for all n, with µ n → µ and γ n → γ in the W 2 -metric and φ(µ n ) → φ̄ as n → +∞, it implies that γ ∈ ∂φ(µ) and φ̄ = φ(µ). Furthermore, we define the metric slope |∂φ|(µ) of the functional φ(·) at µ ∈ D(φ) as

|∂φ|(µ) = limsup ν→µ (φ(µ) − φ(ν)) + / W 2 (µ, ν),

where (•) + denotes the positive part.
The main reason for resorting to the abstract notion of measure subdifferentials in the context of our argument is that the approximation property of the minimal selection by a sequence of strong subdifferentials plays a key role in the proof of the general Wasserstein chain rule of Proposition 2.4 below. In the sequel however, we will mainly use the simpler notion of classical Wasserstein differentials, which we introduce in the following definition.

Definition 2.7 (Classical Wasserstein subdifferentials and superdifferentials). Let µ ∈ D(φ). We say that a map ξ ∈ L 2 (R d , R d ; µ) belongs to the classical subdifferential ∂φ(µ) of φ(·) at µ provided that

φ(ν) − φ(µ) ≥ sup γ∈Γ o (µ,ν) ∫ R 2d ⟨ξ(x), y − x⟩ dγ(x, y) + o(W 2 (µ, ν))

for any ν ∈ P 2 (R d ), and that ξ belongs to the classical superdifferential ∂ + φ(µ) of φ(·) at µ provided that (−ξ) ∈ ∂(−φ)(µ).

It has been proven recently in [37] that this definition of classical Wasserstein subdifferential, involving a supremum taken over the set of optimal transport plans, is equivalent to the usual one introduced in [9] which involves an infimum. This allows for the elaboration of a convenient notion of differentiability in Wasserstein spaces, as detailed below.

Definition 2.8 (Wasserstein gradients). We say that φ(·) is Wasserstein-differentiable at µ ∈ D(φ) provided that ∂φ(µ) ∩ ∂ + φ(µ) ≠ ∅. In this case, there exists a unique element ∇ µ φ(µ) ∈ ∂φ(µ) ∩ ∂ + φ(µ), called the Wasserstein gradient of φ(·) at µ, which satisfies

φ(ν) − φ(µ) = ∫ R 2d ⟨∇ µ φ(µ)(x), y − x⟩ dγ(x, y) + o(W 2 (µ, ν))

for any ν ∈ P 2 (R d ) and any γ ∈ Γ o (µ, ν).
We conclude these recalls by stating in Proposition 2.4 below a new chain rule formula for Wasserstein-differentiable functionals along suitable multi-dimensional perturbations of a measure.

Proposition 2.4 (Chain rule along multi-dimensional perturbations by smooth vector fields). Let K ⊂ R d be a compact set and µ ∈ P(K). Suppose that φ : P(K) → R is Lipschitz in the W 2 -metric, regular in the sense of Definition 2.6 and Wasserstein-differentiable over P(K). Given N ≥ 1 and a small parameter ǫ > 0, suppose that G ∈ C 0 ([−ǫ, ǫ] N × K, R d ) is a map satisfying the following assumptions.
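For orientation, in the simplest case N = 1 with G(ε, x) = x + ε F(x) for a smooth vector field F, the conclusion of Proposition 2.4 should reduce to the standard Wasserstein chain rule (a hedged sketch, written with the gradient of Definition 2.8):

```latex
\frac{\mathrm{d}}{\mathrm{d}\varepsilon}\,
\phi\big((\mathrm{Id} + \varepsilon F)_{\#}\mu\big)\Big|_{\varepsilon = 0}
\;=\;
\int_{\mathbb{R}^d} \big\langle \nabla_\mu \phi(\mu)(x),\, F(x) \big\rangle \,\mathrm{d}\mu(x).
```

The multi-dimensional statement replaces the single field F by the partial derivatives of G with respect to the parameters e 1 , . . . , e N at e = 0.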
We list in Appendix A a series of commonly encountered functionals which are both regular in the sense of Definition 2.6 and differentiable in the sense of Definition 2.8, and we provide their Wasserstein gradients. This list was already presented along the same lines in our previous work [14], and we include it here for self-containedness.
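Two standard instances of the kind collected in Appendix A, stated here as a hedged illustration (for V, W ∈ C^1 with Lipschitz gradients and W even):

```latex
\phi(\mu) = \int_{\mathbb{R}^d} V(x)\,\mathrm{d}\mu(x)
\;\;\Longrightarrow\;\;
\nabla_\mu \phi(\mu) = \nabla V,
\qquad
\phi(\mu) = \frac{1}{2}\iint_{\mathbb{R}^{2d}} W(x - y)\,\mathrm{d}\mu(x)\,\mathrm{d}\mu(y)
\;\;\Longrightarrow\;\;
\nabla_\mu \phi(\mu) = \nabla W \ast \mu.
```

Both functionals are regular and Wasserstein-differentiable, and their gradients are precisely the velocity fields appearing in potential and interaction energies.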

The continuity equation with non-local velocities in R d
In this section, we introduce the continuity equation with non-local velocities in (P c (R d ), W 1 ). This equation is commonly written as

∂ t µ(t) + ∇ · (v[µ(t)](t, ·) µ(t)) = 0,   (2.9)

where t → µ(t) is a narrowly continuous family of probability measures on R d and (t, x) → v[µ(t)](t, x) is a Borel family of vector fields satisfying the condition

∫ 0 T ∫ R d |v[µ(t)](t, x)| dµ(t)(x) dt < +∞.

Equation (2.9) has to be understood in duality with smooth and compactly supported functions, i.e.

d/dt ∫ R d ξ(x) dµ(t)(x) = ∫ R d ⟨∇ξ(x), v[µ(t)](t, x)⟩ dµ(t)(x)

for any ξ ∈ C ∞ c (R d ) and almost every t ∈ [0, T ].
This definition can alternatively be written in distributional form as

∫ 0 T ∫ R d ( ∂ t ξ(t, x) + ⟨∇ x ξ(t, x), v[µ(t)](t, x)⟩ ) dµ(t)(x) dt = 0   (2.12)

for any ξ ∈ C ∞ c ((0, T ) × R d ). We recall in Theorem 2.2 the classical existence, uniqueness and representation formula for solutions of non-local PDEs. Although these results were first derived in [8], we state here a version explored in [45,46] which is better suited to our smoother control-theoretic framework.

Theorem 2.2 (Existence, uniqueness and representation of solutions). Consider a non-local velocity field µ → v[µ](·, ·) as in (2.13), satisfying the following assumptions.

(H')
(i) There exist positive constants L 1 and M such that for every µ, ν ∈ P c (R d ) and t ∈ R,

|v[µ](t, x) − v[µ](t, y)| ≤ L 1 |x − y|,   |v[µ](t, x)| ≤ M (1 + |x|),   ‖v[µ](t, ·) − v[ν](t, ·)‖ C 0 (R d , R d ) ≤ L 1 W 1 (µ, ν),

for all x, y ∈ R d .
Then, for every initial datum µ 0 ∈ P c (R d ), the Cauchy problem

∂ t µ(t) + ∇ · (v[µ(t)](t, ·) µ(t)) = 0,   µ(0) = µ 0 ,   (2.14)

admits a unique solution µ(·) ∈ Lip loc (R + , P c (R d )). If µ 0 is absolutely continuous with respect to L d , then µ(t) is absolutely continuous with respect to L d as well for all times t ∈ R + . Furthermore, for every T > 0 and every µ 0 , ν 0 ∈ P c (R d ), there exist positive constants R T , L T > 0 such that for all times t ∈ [0, T ],

supp(µ(t)) ⊂ B(0, R T ) and W 1 (µ(t), ν(t)) ≤ L T W 1 (µ 0 , ν 0 ),

where µ(·), ν(·) are the solutions of (2.14) with initial conditions µ 0 , ν 0 respectively. Let µ(·) be the unique solution of (2.14) and (Φ v (0,t) [µ 0 ](·)) t≥0 be the family of flows of diffeomorphisms generated by the non-autonomous velocity field (t, x) → v[µ(t)](t, x), i.e. the unique solution of

∂ t Φ v (0,t) [µ 0 ](x) = v[µ(t)](t, Φ v (0,t) [µ 0 ](x)),   Φ v (0,0) [µ 0 ](x) = x.   (2.15)

Then, the curve µ(·) is given explicitly by the pushforward formula

µ(t) = Φ v (0,t) [µ 0 ](·) # µ 0 .

In the following proposition, we recall a standard result which links the differential in space of the flow of diffeomorphisms of an ODE to the solution of the corresponding linearized Cauchy problem (see e.g. [15, Theorem 2.3.1]).
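The pushforward representation suggests a simple particle discretization: approximate µ 0 by an empirical measure and transport its atoms along the characteristics. The following sketch uses a toy attractive interaction kernel v[µ](x) = −∫ (x − y) dµ(y) = −(x − mean(µ)) of our choosing, with explicit Euler time-stepping; for this kernel the first moment is conserved and deviations from the mean contract like e^{−t}, which the simulation reproduces.

```python
import numpy as np

# Particle discretization of d/dt mu + div(v[mu] mu) = 0 with the toy kernel
# v[mu](x) = -(x - mean(mu)): N equally weighted particles moved by Euler.
rng = np.random.default_rng(1)
N, dt, steps = 2000, 1e-3, 1000           # integrate up to T = 1
x = rng.normal(0.0, 1.0, size=N)          # samples of mu_0
m0, s0 = x.mean(), x.std()

for _ in range(steps):
    x = x + dt * (-(x - x.mean()))        # v[mu](x_i) = -(x_i - mean)

# Mean conservation is exact for this scheme; the spread contracts like
# e^{-t} (up to the O(dt) error of explicit Euler).
assert np.isclose(x.mean(), m0, atol=1e-8)
assert np.isclose(x.std(), s0 * np.exp(-1.0), rtol=1e-2)
```

This is exactly the micro-macro correspondence of Theorem 2.2: the empirical measure of the particles is the pushforward of the initial empirical measure through the (discretized) flow.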

Proposition 2.5 (Differentials of non-local flows). Let (Φ v (s,t) [µ(s)](·)) s,t≥0 denote the flows of diffeomorphisms generated by a non-local velocity field v[·](·, ·) satisfying hypotheses (H'). Then, the flow map x → Φ v (s,t) [µ(s)](x) is differentiable in space, and its differential D x Φ v (s,t) [µ(s)](x) is the unique solution of the linearized Cauchy problem

∂ t D x Φ v (s,t) [µ(s)](x) = D x v[µ(t)](t, Φ v (s,t) [µ(s)](x)) D x Φ v (s,t) [µ(s)](x),   D x Φ v (s,s) [µ(s)](x) = Id.
In our previous work [14], we extended the classical result of Proposition 2.5 to the Wasserstein setting in order to compute derivatives of the flow maps Φ v (0,t) [µ 0 ](·) with respect to their initial measure µ 0 . In the following proposition, we state a further refinement of this result to the case in which the initial measure is perturbed by a multi-dimensional family of maps, in the spirit of the chain rule stated in Proposition 2.4.

Proposition 2.6 (Wasserstein differential of a non-local flow of diffeomorphisms). Let K ⊂ R d be a compact set, µ 0 ∈ P(K), and v[·](·, ·) be a non-local velocity field satisfying hypotheses (H') of Theorem 2.2. Then, the corresponding perturbed flow map is Fréchet-differentiable at e = 0, and its differential w σ (·, x) in an arbitrary direction σ ∈ [−ǫ, ǫ] N can be expressed as

w σ (·, x) = Σ N k=1 σ k w k (·, x),

where for any k ∈ {1, . . . , N }, the map w k (·, x) is the unique solution of the non-local Cauchy problem (2.18).

Proof. By Proposition 2.4 and as a consequence of our hypotheses on v[·](·, ·), we know that the perturbed flow map is Fréchet-differentiable at e = 0. Therefore, the action of its differential on a given direction σ ∈ [−ǫ, ǫ] N can be expressed in coordinates using partial derivatives, i.e.

w σ (·, x) = Σ N k=1 σ k ∂ e k w(·, x)| e=0 .
Moreover, it has been proven in [14, Proposition 5] that such one-dimensional variations can be characterized as the unique solutions of the linearized Cauchy problems (2.18).
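The link between the spatial differential of a flow and the linearized Cauchy problem can be checked numerically on a one-dimensional toy ODE (no non-local term; the dynamics and tolerances below are our choices, not the paper's):

```python
import numpy as np
from scipy.integrate import solve_ivp

# For x' = sin(x), the derivative of the flow map x0 -> Phi_{(0,T)}(x0)
# solves the linearized (variational) equation w' = cos(x(t)) w, w(0) = 1.
def rhs(t, y):
    x, w = y
    return [np.sin(x), np.cos(x) * w]

x0, T = 0.7, 2.0

def flow(x_init):
    sol = solve_ivp(lambda t, y: [np.sin(y[0])], (0.0, T), [x_init],
                    rtol=1e-10, atol=1e-12)
    return sol.y[0, -1]

# Integrate state and linearization jointly.
sol = solve_ivp(rhs, (0.0, T), [x0, 1.0], rtol=1e-10, atol=1e-12)
w_T = sol.y[1, -1]

# Compare with a centered finite difference of the flow map itself.
h = 1e-5
fd = (flow(x0 + h) - flow(x0 - h)) / (2.0 * h)
assert np.isclose(w_T, fd, rtol=1e-4)
```

The same mechanism, applied along every characteristic of the non-local field, underlies Propositions 2.5 and 2.6.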

Non-smooth multiplier rule and differentiable extension of functions
In this section, we recall some facts of non-smooth analysis as well as a non-smooth Lagrange multiplier rule which is instrumental in the proof of our main result. This multiplier rule is expressed in terms of the so-called Michel-Penot subdifferential, see e.g. [44,40]. In the sequel, we denote by (X, · X ) a separable Banach space and by X * its topological dual associated with the duality bracket ·, · X .
Given a map f : X → R which is locally Lipschitz in a neighbourhood of x ∈ X, the Michel-Penot subdifferential ∂ MP f (x) of f (·) at x is defined through the MP-directional derivatives of f (·) at x. The MP-subdifferential - smaller than the Clarke subdifferential - bears the nice property of shrinking to a singleton whenever the functional f (·) is merely Fréchet-differentiable. It also enjoys a summation rule and a chain rule for compositions of locally Lipschitz and Fréchet-differentiable maps. We list these properties in the following proposition.

Proposition 2.7 (Properties of the Michel-Penot subdifferentials). Let x ∈ X and f, g : X → R be locally Lipschitz in a neighbourhood of x.

(a) If f (·) is Fréchet-differentiable at x, then ∂ MP f (x) = {D f (x)}.

(b) It holds that ∂ MP (f + g)(x) ⊂ ∂ MP f (x) + ∂ MP g(x).

(c) If F : Y → X is Fréchet-differentiable at y ∈ Y and f (·) is locally Lipschitz around x = F (y), then ∂ MP (f ◦ F )(y) ⊂ D F (y) * ∂ MP f (x).

These properties can be proven easily by computing explicitly the Michel-Penot derivatives of the corresponding maps and using the definition of the set ∂ MP (•), see e.g. [51]. Another useful feature of this notion of subdifferential is that it allows one to write Lagrange multiplier rules for locally Lipschitz functions. This family of optimality conditions was initially derived in [40] and refined in [51], where the author extended the result to the class of so-called calm functions.
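The defining formulas can be sketched as follows, in the standard Michel-Penot terminology of [44,40]:

```latex
d_{\mathrm{MP}} f(x; h) \;:=\; \sup_{e \in X}\,\limsup_{t \to 0^+}
\frac{f\big(x + t(h + e)\big) - f\big(x + t e\big)}{t},
\qquad
\partial_{\mathrm{MP}} f(x) \;:=\; \big\{\, \xi \in X^* \;:\;
\langle \xi, h \rangle_X \le d_{\mathrm{MP}} f(x; h)
\;\text{ for all } h \in X \,\big\}.
```

When f(·) is Fréchet-differentiable at x, the difference quotient converges to ⟨Df(x), h⟩ independently of e, whence ∂ MP f(x) = {Df(x)}. As a worked example, for f = |·| on R one computes d MP f(0; h) = sup over e of (|h + e| − |e|) = |h|, so that ∂ MP f(0) = [−1, 1], which coincides with the Clarke subdifferential in this case.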

Definition 2.10 (Calm functions). A map f : X → R is calm at x ∈ X provided that

liminf y→x ( f (y) − f (x) ) / ‖y − x‖ X > −∞.
We end this introductory section by stating a Lusin-type lemma for vector-valued functions and a derivative-preserving continuous extension result that will both prove to be useful in the sequel. We refer the reader e.g. to [28] for notions on Bochner integrals and abstract integration in Banach spaces.
Proof. This result is a consequence of [

Proof. We adapt here a simple proof that can be found e.g. in [51, Lemma 2.11]. Define the map

g : e ∈ R N + \{0} → ( f (e) − f (0) − D e f (0)e ) / |e|.

By definition, g(·) is continuous over R N + \{0} and can be extended to R N + by imposing that g(0) = 0, since f (·) is differentiable at e = 0 relatively to R N + . Invoking Dugundji's extension theorem (see [29]), we can define a continuous extension g̃(·) of g(·) on the whole of R N .
We now define the auxiliary map f̃ : e ∈ R N → f (0) + D e f (0)e + |e| g̃(e). By construction, f̃(·) is continuous and coincides with f (·) over R N + . Moreover, one has for any e ∈ R N that

|f̃(e) − f (0) − D e f (0)e| = |e| |g̃(e)| = o(|e|) as e → 0,

so that f̃(·) is Fréchet-differentiable at e = 0 with D e f̃(0) = D e f (0).

Proof of the main result
In this section we prove the main result of this article, that is Theorem 3.1 below. We recall that this result is a first-order necessary optimality condition of Pontryagin-type for the general Wasserstein optimal control problem (P ) defined in the Introduction.
Theorem 3.1 (Pontryagin Maximum Principle for (P )). Let (u * (·), µ * (·)) be an optimal pair control-trajectory for (P ), and assume that the set of hypotheses (H) below holds. Then, the following assertions hold.
(i) The map t → ν * (t) is a solution of the forward-backward Hamiltonian system of continuity equations

(ii) For any l ∈ {1, . . . , r}, the map t ∈ [0, T ] → ζ * l (t) ∈ R + denotes the cumulated state-constraints multiplier associated with ̟ l , defined by ζ * l (t) = ̟ l ([0, t]).

(iii) The Pontryagin maximization condition holds along the optimal pair control-trajectory.

Remark 3 (On the regularity hypothesis (H1)). One of the distinctive features of continuity equations in Wasserstein spaces, compared to other families of PDEs, is that they require Cauchy-Lipschitz assumptions on the driving velocity fields in order to be classically well-posed for arbitrary initial data. Even though the existence theory has gone far beyond this basic setting, notably through the DiPerna-Lions-Ambrosio theory (see [27,6] or [9, Section 8.2]), such extensions come at the price of losing the strict micro-macro correspondence of the solutions embodied by the underlying flow structure. Therefore, from a mean-field control-theoretic viewpoint, it seemed more meaningful to the author to work in a setting where classical well-posedness holds for the optimal trajectory. Furthermore, the proof of Theorem 3.1 relies heavily on the geometric flow-of-diffeomorphisms structure of the underlying characteristic system of ODEs, both forward and backward in time. For this reason, the Lipschitz-regularity assumption (H1) is instrumental in our argument.

Remark 2 (The Gamkrelidze Maximum Principle).
Let it be remarked however that there exist common examples of Wasserstein optimal control problems for which the optimal control is C 1 -smooth in space. Such a situation is given e.g. by controlled vector fields of the form u(t, x) = Σ m k=1 u k (t) X k (x), where X 1 , . . . , X m ∈ C 1 (R d , R d ) and u 1 , . . . , u m ∈ L ∞ ([0, T ], R), or by non-linear controlled vector fields depending smoothly on (t, x, µ, u). We divide the proof of Theorem 3.1 into four steps. In Step 1, we introduce the concept of packages of needle-like variations of an optimal control and compute the corresponding perturbations induced on the optimal trajectory. In Step 2, we apply the non-smooth Lagrange multiplier rule of Theorem 2.3 to a sequence of finite-dimensional optimization problems formulated on the lengths of the needle variations, to obtain a family of finite-dimensional optimality conditions at time T . We introduce in Step 3 a suitable notion of costate, allowing us to propagate this family of optimality conditions backward in time, yielding the PMP with a relaxed maximization condition restricted to a countable subset of needle parameters. The full result is then recovered in Step 4 through a limiting procedure combined with several approximation arguments.
Step 1: Packages of needle-like variations. We start by considering an optimal pair control-trajectory (u * (·), µ * (·)) ∈ U × Lip([0, T ], P(B(0, R T ))), where R T > 0 is given by Theorem 2.2, and we denote by T ⊂ [0, T ] the set of Lebesgue points of the optimal dynamics in the sense of Bochner's integral (see e.g. [28, Theorem 9]). This set has full Lebesgue measure in [0, T ], and Lemma 2.1 yields the existence of two subsets A , M ⊂ T , having respectively null and full Lebesgue measure, such that for any τ ∈ M , there exists (τ k ) ⊂ A converging towards τ and along which the Lebesgue-point property holds. We further denote by U D a countable and dense subset of the set of admissible control values U , which is compact and separable in the C 0 -topology as a consequence of (H1).
This class of variations is known in the literature of control theory to generate admissible perturbations of the optimal control without any assumption on the structure of the control set U , while allowing for an explicit and tractable computation of the relationship between the perturbed and optimal states (see e.g. [15]).
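Concretely, given base times τ 1 < · · · < τ N in M, needle values ω 1 , . . . , ω N ∈ U D and lengths e = (e 1 , . . . , e N ) ∈ [0, ǭ N ] N , an N-package of needle-like variations of u*(·) can be sketched as follows (our reconstruction of the standard definition, consistent with the notation (3.7)):

```latex
\tilde u_e(t, \cdot) \;:=\;
\begin{cases}
\omega_k & \text{if } t \in [\tau_k - e_k,\, \tau_k] \text{ for some } k \in \{1, \dots, N\}, \\[4pt]
u^*(t, \cdot) & \text{otherwise},
\end{cases}
```

so that ũ 0 (·, ·) = u*(·, ·), and the perturbed trajectories coincide with µ*(·) at e = 0.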
In the following lemma, we make use of the geometric structure of solutions to non-local transport equations presented in Theorem 2.2, together with some notations borrowed from Proposition 2.6, to express the perturbed curve µ̃ e (t) as a function of µ * (t) for all times t ∈ [0, T ]. In the sequel, we denote by Φ v,u (s,t) [µ(s)](·) the flow map generated by the non-local velocity field v[·](·, ·) + u(·, ·) between times s and t, defined as in (2.15).

(iii) There exists a uniform constant C > 0 such that the corresponding estimates hold for any e ∈ [0, ǭ N ] N .

(iv) The map e → µ̃ e (t) is Fréchet-differentiable at e = 0 with respect to the C 0 (B(0, R T ), R d )-norm, uniformly with respect to t ∈ [0, T ]. The corresponding Taylor expansion can be written explicitly as in (3.9), where each map F ω k ,τ k t (·) is the unique solution of the non-local Cauchy problem (3.10).

Proof. The proof of this result is similar to that of [14, Lemma 5], with some extra technicalities arising from the induction argument performed on the non-local terms. By definition of a package of needle-like variations, the perturbed controls ũ e (·, ·) generate well-defined flows of diffeomorphisms. Hence, items (i), (ii) and (iii) hold for any e ∈ [0, ǭ N ] N . We focus our attention on the proof by induction of (iv). Let t ∈ [0, T ] be such that ι(t) = 1. By (3.7), the perturbed control ũ e coincides with u * outside of the interval [τ 1 − e 1 , τ 1 ]. Invoking Lebesgue's Differentiation Theorem (see e.g. [7, Corollary 2.23]) along with the continuity of e → v[µ̃ e (t)](t, ·) in the C 0 -norm topology, we obtain a pair of first-order expansions in e 1 . Chaining these two expansions, we can then compute the induced first-order expansion on the non-local flows Φ v,u * (τ 1 ,t) [µ̃ e (τ 1 )](·), where w 1 (·, ·) is defined as in Proposition 2.6, and where we used the fact that the map e → D x Φ v,u * (τ 1 ,·) [µ̃ e (·)](·) is continuous as a consequence of hypotheses (H1)-(H2). Introducing for all times t ∈ [τ 1 , T ] the map F ω 1 ,τ 1 t (·), and invoking the statements of Proposition 2.5 and Proposition 2.6, we conclude that both (3.9) and (3.10) hold for any e 1 ∈ [0, ǭ N ] and all times t ∈ [0, T ] such that ι(t) = 1.
Let us now assume that (3.9) and (3.10) hold for all times t ∈ [0, T ] such that ι(t) = k − 1, for e ∈ [0, ǭ N ] N . By definition (3.7) of an N -package of needle-like variations, the perturbed control acts on the interval [τ k − e k , τ k ] through the needle value ω k . As in the initialization step, we can write a first-order expansion using Lebesgue's Differentiation Theorem. Furthermore, invoking the induction hypothesis (3.11) and the results of Proposition 2.4, we obtain the expansion (3.13), where the maps (w l (·, ·)) 1≤l≤k−1 are defined as in Proposition 2.6 with F l (·) ≡ F ω l ,τ l , and are solutions of (3.10) on [τ k−1 , τ k ] with initial condition F ω l ,τ l τ k−1 (·) at time τ k−1 for any l ∈ {1, . . . , k − 1}. By Cauchy-Lipschitz uniqueness, we can therefore extend the definition of the maps t → F ω l ,τ l t (x) to the whole of [τ l , τ k ] for any l ∈ {1, . . . , k − 1}.
Chaining the expansions (3.12) and (3.13) along with our previous extension argument, we obtain that both (3.9) and (3.10) hold up to time τ k for any e ∈ [0, ǭ N ] N . Performing yet another coupled Taylor expansion of the same form on the expression µ̃ e (·), and invoking the same extension argument, yields the full induction step for all times t ∈ [0, T ] such that ι(t) = k. Hence, we have proven that item (iv) holds for all e ∈ [0, ǭ N ] N . Using Lemma 2.2, we can now extend the map e ∈ [0, ǭ N ] N → µ̃ e (t) in a continuous and bounded way, uniformly with respect to t ∈ [0, T ], while preserving its differential at e = 0.
In the sequel, we drop the explicit dependence of the flow maps on their starting measures and adopt the simplified notation Φ v,u (s,t) (x) ≡ Φ v,u (s,t) [µ(s)](x) for clarity and conciseness.
Step 2: First-order optimality conditions. In Lemma 3.1, we derived the analytical expression of the first-order perturbation induced by an N -package of needle-like variations on the solution of a controlled non-local continuity equation. By the very definition of an N -package of needle-like variations, we know that e = 0 is a local solution of the finite-dimensional optimization problem (P N ). In the following lemma, we check that the functionals involved in (P N ) meet the requirements of the Lagrange multiplier rule stated in Theorem 2.3. We also compute their first-order variations induced by the package of needle-like variations at e = 0.

We recall that the topological dual of (C 0 ([0, T ], R), ‖·‖ C 0 ) can be identified, by Riesz' Theorem, with the set of Borel regular measures on [0, T ].
Invoking again Proposition 2.4 and Lemma 3.1, we can write the differential of e → Λ(t, µ̃ e (t)) at e = 0, evaluated in a direction σ ∈ [0, ǭ N ] N . Since all the functions involved in the subdifferential inclusion (S) are calm, we can use the summation rule of Proposition 2.7-(b) along with the characterization of MP-subdifferentials for Fréchet-differentiable functionals stated in Proposition 2.7-(a). By combining the expressions of the gradients (3.14), (3.15) and the MP-derivative (3.16) derived in Lemma 3.2, along with the composition rule of Proposition 2.7-(c) for MP-subdifferentials, we obtain a family of inequalities in which the relevant quantity is defined as in (3.4). By choosing particular vectors σ ∈ [0, ǭ N ] N which have all their components except one equal to 0, this family of inequalities can be rewritten in a decoupled form.

Step 3: Backward dynamics and partial Pontryagin maximization condition
The next step of our proof is to introduce a suitable notion of state-costate variable, transporting the family of inequalities (3.17) derived at time T to the base points (τ 1 , . . . , τ N ) of the needle-like variations while generating a Hamiltonian dynamical structure. To this end, we build for all N ≥ 1 a curve ν * N (·) ∈ Lip([0, T ], P c (R 2d )) solution of the forward-backward system of continuity equations (3.18). Here, the non-local velocity field V * N [·](·, ·, ·) is given by the expression (3.19). Notice that the transport equation (3.18) does not satisfy the classical hypotheses of Theorem 2.2. Following a methodology introduced in our previous work [14], it is possible to circumvent this difficulty by building explicitly a solution of (3.18), relying on the cascade structure of the equations.

Lemma 3.3 (Definition and well-posedness of solutions of (3.18)). Let (u * (·), µ * (·)) be an optimal pair control-trajectory for (P ). For µ * (T )-almost every x ∈ R d , we consider the family of backward flows (Ψ x,N (T,t) (·)) t≤T solution of the non-local Cauchy problems (3.20), and the associated curves of measures σ * x,N (·).
Proof. Let us denote by Ω ⊂ R^{2d} a suitably chosen compact set. Such a set exists since the maps (∇_µ S_N(µ*(T))(·)) are continuous by (H4), as well as uniformly bounded as a consequence of the non-triviality condition (NT) on the Lagrange multipliers (λ^N_0, . . . , λ^N_n, η^N_1, . . . , η^N_m). The existence and uniqueness of the maps (t, x, r) ↦ w_x(t, r) solving the family of non-local Cauchy problems (3.20) can be obtained under hypotheses (H), as a consequence of Banach's fixed-point theorem in the spirit of [14, Proposition 5]. In this context, the Banach space under consideration is that of all maps w : [0, T] × Ω → R^d endowed with a suitable uniform norm. By an application of Grönwall's Lemma to (3.20), it holds that (t, r) ∈ [0, T] × π^2(Ω) ↦ Ψ^{x,N}_{(T,t)}(r) is bounded by a positive constant, uniformly with respect to x ∈ supp(µ*(T)) and N ≥ 1. This follows in particular from the uniform boundedness of the sequences of multipliers (λ^N_0) and (ζ*_N(·)). Therefore, there exists a uniform constant R'_T > 0 such that the corresponding bound holds for all t ∈ [0, T]. This in turn implies that the right-hand side of (3.20) is uniformly bounded, so that the maps t ∈ [0, T] ↦ Ψ^{x,N}_{(T,t)}(r) are Lipschitz, uniformly with respect to (x, r) ∈ Ω and N ≥ 1. By applying again Grönwall's Lemma to the difference |Ψ^{x,N}_{(T,t)}(r_2) − Ψ^{x,N}_{(T,t)}(r_1)| with r_1, r_2 ∈ π^2(Ω), we further obtain that (t, r) ∈ [0, T] × π^2(Ω) ↦ Ψ^{x,N}_{(T,t)}(r) is also Lipschitz regular, uniformly with respect to x ∈ supp(µ*(T)) and N ≥ 1. It can be checked by leveraging Kantorovich duality in the spirit of [14, Lemma 6] that this in turn yields the Lipschitz regularity of t ∈ [0, T] ↦ σ*_{x,N}(t), uniformly with respect to x ∈ supp(µ*(T)) and N ≥ 1.
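The Grönwall steps above follow a standard pattern; as a schematic illustration (with C > 0 a generic constant depending only on the data, introduced here for convenience and not appearing in the original estimates), the boundedness argument reads:

```latex
% Schematic Gronwall bound for the backward flows (sketch only): if
% the right-hand side of (3.20) grows at most linearly, i.e.
%   |\partial_t \Psi^{x,N}_{(T,t)}(r)|
%     \le C\bigl(1 + |\Psi^{x,N}_{(T,t)}(r)|\bigr),
% then integrating backward from time T yields
\[
\bigl|\Psi^{x,N}_{(T,t)}(r)\bigr|
 \,\le\, \bigl(|r| + C(T-t)\bigr)\,e^{C(T-t)}
 \,\le\, R'_T ,
\]
% uniformly with respect to x \in \mathrm{supp}(\mu^*(T)), r in
% compact sets, and N \ge 1.
```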
An application of Proposition 2.3 to ν*_N(·), combined with the uniform Lipschitz regularity of (t, x) ∈ [0, T] × π^1(Ω) ↦ Φ^{v,u*}_{(T,t)}(x), provides the existence of a uniform constant L'_T > 0 such that the corresponding Lipschitz estimate holds. In order to prove that ν*_N(·) is indeed a solution of (3.18), take ξ ∈ C^∞_c(R^{2d}) and compute the time derivative of t ↦ ∫ ξ dν*_N(t), using Fubini's Theorem along the way. This can in turn be reformulated into a more concise expression which, by (2.12), precisely corresponds to the fact that ν*_N(·) is a solution of (3.18).
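For orientation, the forward-backward system (3.18) has the generic structure of a continuity equation on the product space R^{2d}; the following display is a simplified sketch of that structure and is not meant to reproduce (3.18) term by term:

```latex
% Schematic structure of a forward-backward continuity system on
% \mathbb{R}^{2d} = \mathbb{R}^d_x \times \mathbb{R}^d_r (sketch):
\[
\partial_t \nu^*_N(t)
+ \nabla_{(x,r)} \cdot
\bigl( V^*_N[\nu^*_N(t)](t,x,r)\,\nu^*_N(t) \bigr) \,=\, 0 ,
\]
% where the first component of the velocity transports the optimal
% state \mu^*(t) forward in time, while the second component
% encodes the backward costate dynamics.
```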

Remark 4 (Wasserstein and classical costates).
In the finite-dimensional proof of the Gamkrelidze PMP, the optimal costates p*_N(·) are defined as the solutions of the backward equations (3.21), where H_{λ^N_0}(·, ·, ·, ·, ·) and S_N(·) are the counterparts of the maps H_{λ^N_0}(·, ·, ·, ·) and S_N(·) associated with the finite-dimensional optimal control problem. In our statement of the PMP, one should think of π^2_#ν*(·) as being concentrated on the characteristic curves of the backward costate dynamics. Indeed, in Lemma 3.3 the curves σ*_{x,N}(·) are concentrated on the unique characteristic of the linearized backward non-local dynamics (3.20), and ν*_N(·) can then be seen as a Lagrangian superposition of integral curves of (3.21) depending on the starting point of the curve in supp(µ*(T)).

Now that we have built a suitable notion of solution for (3.18), let us prove that ν*_N(·) is such that the PMP holds with a relaxed maximization condition (3.22) formulated over the collection of needle parameters, for L^1-almost every t ∈ [0, T] and every (x, r) ∈ B_{2d}(0, R'_T). For k ∈ {1, . . . , N}, we introduce the collection of maps K^N_{ω_k,τ_k}(·) defined as in (3.23). By construction, the maps K^N_{ω_k,τ_k}(·) satisfy (3.24), since it can be checked that the evaluation of K^N_{ω_k,τ_k}(·) at T coincides with the left-hand side of (3.17), which has been shown to be non-positive for all k ∈ {1, . . . , N}. Moreover, the evaluation of the maps K^N_{ω_k,τ_k}(·) at τ_k can be written explicitly as in (3.25). We now aim at showing that the maps K^N_{ω_k,τ_k}(·) are constant over [τ_k, T]. By definition, these functions are in BV([0, T], R), and therefore admit a distributional derivative in the form of finite Borel regular measures (see e.g. [7, Chapter 3]). A simple computation of the time derivatives of the last two terms in (3.23) shows that the non-absolutely continuous parts of the derivatives of the maps K^N_{ω_k,τ_k}(·) cancel each other out, since the weak derivatives of the maps ζ*_{N,l}(·) are such that dζ*_{N,l} = −ϖ^N_l. Hence, the maps K^N_{ω_k,τ_k}(·) are absolutely continuous, and therefore differentiable L^1-almost everywhere.
One can then compute their derivative at L 1 -almost every t ∈ [τ k , T ] as follows.
The time-derivatives of the summands of the last term can be computed as follows, using Proposition 2.4 and the geometric structure (2.15) of solutions of (2.14) associated with the non-local velocity field v[µ*(t)](t, ·) + u*(t, ·).
by applying Fubini's Theorem and identifying the analytical expressions of the Wasserstein gradients of the summands C_l(t, µ*(t), u*(t)) of C(t, µ*(t), ζ*(t), u*(t)) derived in Proposition B.1. Plugging this expression into (3.26), along with the characterization (3.10) of ∂_t F^{ω_k,τ_k}_t(·) derived in Lemma 3.1, we obtain the desired identity. In the first line we used Fubini's Theorem, where the corresponding non-local term is defined as in (3.19). Recalling the definition of the vector field V*_N[·](·, ·, ·) given in (3.18), we therefore observe that (d/dt) K^N_{ω_k,τ_k}(t) = 0 for L^1-almost every t ∈ [τ_k, T], so that K^N_{ω_k,τ_k}(·) is constant over this time interval. Merging this fact with (3.24) and (3.25) yields (3.22) and concludes the proof of our claim.
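In summary, the constancy argument for K^N_{ω_k,τ_k}(·) combines the cancellation of singular parts in BV with the vanishing of the absolutely continuous density; a schematic rendering (with g_l a placeholder name, introduced only for illustration, for the integrand multiplying the l-th multiplier term in (3.23)) is:

```latex
% Schematic constancy argument for K^N_{\omega_k,\tau_k} (sketch):
\[
\mathrm{d}K^{N}_{\omega_k,\tau_k}
 \,=\, \dot K^{N}_{\omega_k,\tau_k}\,\mathcal{L}^1
 \,+\, \sum_{l=1}^{r} g_l\,\bigl(\mathrm{d}\zeta^{*}_{N,l}
        + \varpi^{N}_{l}\bigr)
 \,=\, \dot K^{N}_{\omega_k,\tau_k}\,\mathcal{L}^1 ,
\]
% since \mathrm{d}\zeta^*_{N,l} = -\varpi^N_l by construction, while
% the computation above shows \dot K^{N}_{\omega_k,\tau_k}(t) = 0
% for \mathcal{L}^1-a.e. t \in [\tau_k, T], whence
% K^{N}_{\omega_k,\tau_k}(T) = K^{N}_{\omega_k,\tau_k}(\tau_k).
```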
Step 4: Limiting procedure

The PMP for absolutely continuous state constraint multipliers

In Step 3, we have built for any N ≥ 1 a suitable state-costate curve ν*_N(·) solution of the Hamiltonian system (3.1), such that the relaxed Pontryagin maximization condition (3.22) holds on an N-dimensional subset of needle parameters. The last step in the proof of Theorem 3.1 is to take the limit as N goes to infinity of the previous optimality conditions in order to recover the PMP formulated on the whole set of needle parameters.
We now prove that there exists an accumulation point ν*(·) of (ν*_N(·)) which solves the system of equations (3.1) associated with the limit multipliers (λ_0, . . . , λ_n, η_1, . . . , η_m, ϖ_1, . . . , ϖ_r). To this end, we start by making an extra simplifying assumption on the state constraint multipliers. We shall see in the sequel how this extra assumption can be lifted, at the price of an extra approximation argument by absolutely continuous measures. Let ν*(·) ∈ Lip([0, T], P(B_{2d}(0, R'_T))) be an accumulation point of (ν*_N(·)) along a suitable subsequence. As a direct consequence of the convergence of the scalar Lagrange multipliers, one recovers the uniform convergence of the final gradient maps. This implies by standard convergence results for pushforwards of measures (see e.g. [9, Lemma 5.2.1]) that ν*(·) satisfies the corresponding boundary condition. Moreover, the weak-∗ convergence of (ϖ^N_1, . . . , ϖ^N_r) towards (ϖ_1, . . . , ϖ_r) along with (H7) implies by Proposition 2.1 the corresponding convergence for all times t ∈ [0, T]. By definition (2.11) of distributional solutions to transport equations, the fact that ν*_N(·) is a solution of (3.18) can be written as (3.27). Since all the functionals involved in the definition of the Wasserstein gradient of the augmented infinite-dimensional Hamiltonian are continuous and bounded, the corresponding convergence holds uniformly with respect to t ∈ [0, T], as a by-product of the convergence of the Lagrange multipliers. By using this fact along with the uniform equi-compactness of the supports of (ν*_N(·)), we can take the limit as N → +∞ in (3.27) and apply Lebesgue's Dominated Convergence Theorem. Hence, the accumulation point ν*(·) of (ν*_N(·)) in the C^0-topology is a solution of the Hamiltonian flow (3.18) associated with the limit multipliers (λ_0, . . . , λ_n, η_1, . . . , η_m, ϖ_1, . . . , ϖ_r).
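The passage to the limit in the weak formulation can be sketched as follows; the display is a simplified rendering of the distributional identity (2.11), and is not meant to reproduce (3.27) verbatim:

```latex
% Schematic weak formulation and limit (notation simplified): for
% every test function \xi \in C^\infty_c(\mathbb{R}^{2d}),
\[
\int \xi \,\mathrm{d}\nu^*_N(t) - \int \xi \,\mathrm{d}\nu^*_N(0)
 \,=\, \int_0^t \!\!\int
 \bigl\langle \nabla\xi(x,r),\,
 V^*_N[\nu^*_N(s)](s,x,r) \bigr\rangle
 \,\mathrm{d}\nu^*_N(s)(x,r)\,\mathrm{d}s .
\]
% The uniform convergence of the velocity fields, the narrow
% convergence of \nu^*_N(t), and dominated convergence then allow
% one to pass to the limit term by term, yielding the same identity
% for the accumulation point \nu^*(\cdot).
```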

A Examples of functionals satisfying hypotheses (H)
In this Appendix, we show that the rather long list of hypotheses (H) is not too restrictive, and that a good number of relevant functionals for applications fit into the framework of Theorem 3.1. This list of examples is partly borrowed from our previous work [14]. For instance, the functional ϕ : µ ↦ ∫ W(x_1, . . . , x_n) dµ^⊗n(x_1, . . . , x_n) with µ^⊗n = µ × · · · × µ satisfies (H4) of Theorem 3.1, and its Wasserstein gradient at some µ ∈ P(K) is given by ∇_µϕ(µ)(x_1, . . . , x_n) = Σ_{j=1}^n ∇_{x_j} W(x_1, . . . , x_n).
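A cleaner display of this n-fold interaction example, under the assumption (stated here only for illustration) that W is smooth on a neighbourhood of K^n:

```latex
% Interaction energy over n-fold product measures (illustration):
\[
\phi(\mu) \,=\, \int_{(\mathbb{R}^d)^n}
  W(x_1,\dots,x_n)\,\mathrm{d}\mu^{\otimes n}(x_1,\dots,x_n),
\qquad
\mu^{\otimes n} \,=\, \mu \times \dots \times \mu ,
\]
% whose Wasserstein gradient, written on the product space, reads
\[
\nabla_\mu \phi(\mu)(x_1,\dots,x_n)
 \,=\, \sum_{j=1}^{n} \nabla_{x_j} W(x_1,\dots,x_n).
\]
```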

B Wasserstein differential of the running constraint penalization
In this Section, we give the analytical expression of the Wasserstein differential of the running constraint penalization map (t, µ, ζ, ω) → C (t, µ, ζ, ω) defined in (3.3).