Finite state N-agent and mean field control problems

We examine mean field control problems (MFCP) on a finite state space, in continuous time and over a finite time horizon. We characterize the value function of the MFCP as the unique viscosity solution of an HJB equation in the simplex. In the absence of any convexity assumption, we exploit this characterization to prove convergence, as $N$ grows, of the value functions of the centralized $N$-agent optimal control problem to the limit MFCP value function, with a convergence rate of order $1/\sqrt{N}$. Then, assuming convexity, we show that the limit HJB equation admits a smooth solution and establish propagation of chaos, i.e. convergence of the $N$-agent optimal trajectories to the unique limiting optimal trajectory, with an explicit rate.


Introduction
Mean field control problems (MFCP), also called control of McKean-Vlasov equations, can be interpreted as limits of cooperative N-agent games, as the number of players tends to infinity. Such agents have a common cost to minimize and the minimizers are also called Pareto equilibria; alternatively, we could think of a social planner that minimizes an average cost.
We investigate N-agent optimization and mean field control problems in continuous time over a finite time horizon, with dynamics belonging to a finite state space {1, . . . , d}. More precisely, the N agents X = (X^1, . . . , X^N) follow a controlled Markov chain dynamics, where µ^N is the empirical measure and the controls (here in feedback form) β = (β^1, . . . , β^N) are chosen in order to minimize a common cost. Assuming that controls depend only on the empirical measure, the usual propagation of chaos arguments suggest that the limit of this N-agent optimization, as N grows, consists of a single player which evolves according to the limiting dynamics and aims at minimizing
$$J(\alpha) := \mathbb{E}\left[\int_0^T f\big(t, X_t, \alpha(t, X_t), \mathrm{Law}(X_t)\big)\,dt + g\big(X_T, \mathrm{Law}(X_T)\big)\right].$$
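Schematically, the N-agent dynamics and the empirical measure can be sketched as follows (a standard form consistent with the surrounding discussion; the notation µ^N_x and the o(h) expansion are assumptions here):

```latex
% Each agent X^k jumps from its current state x_k to j \ne x_k at a
% controlled rate; the empirical measure records the proportion of
% agents in each state.
\mathbb{P}\big(X^k_{t+h} = j \mid X_t = x\big)
   = Q_{x_k,\,j}\big(t,\beta^k(t,x),\mu^N_x\big)\,h + o(h),
\qquad
\mu^N_t := \frac{1}{N}\sum_{k=1}^N \delta_{X^k_t}.
```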
Our goal is to study in detail the N-agent optimization and the mean field control problem and thus prove convergence of the former to the latter, the main result being an explicit convergence rate. Such problems have so far been studied mainly for continuous state spaces and diffusion-based dynamics. In that situation, the limiting MFCP can be analyzed in two different ways, by considering either open-loop or (Markovian) feedback controls. In the first case, a version of the Pontryagin principle is derived in [11], which leads to the study of a forward-backward system of (Itô-type) SDEs of McKean-Vlasov type. The case of feedback controls is analyzed in [34,35], where the MFCP is reformulated in a deterministic way as the optimal control of the Fokker-Planck equation, which permits one to derive a dynamic programming principle and then a Hamilton-Jacobi-Bellman (HJB) equation for the value function, written on the Wasserstein space of probability measures. See also [5,31] for earlier ideas in this direction and [4,21] for more general versions of the dynamic programming principle. We refer to Chapter 6 of [12] for a comparison of the two approaches. Mean field control problems also arise in the study of potential mean field games, see e.g. [6,10], where it is shown that the mean field game system represents the necessary conditions for optimality of a suitable mean field control problem.
The question of convergence of the N-agent optimization to the MFCP, still in the diffusion setting, has been analyzed mainly in two ways. The first consists in showing that the set of (relaxed open-loop) optimizers of the N-agent optimization is precompact and then that the limit points are supported on optimizers of the MFCP. We remark that the optimizer is non-unique in general, but is unique under additional convexity assumptions; we return to this point below. This strategy was employed first in [29], then in [25] for deterministic dynamics and, more recently, in [22] for more general dynamics with common noise and in [20] for problems with interaction also through the law of the control. Clearly, a convergence rate for the value functions cannot be obtained from compactness arguments. The other way is to prove convergence via the system of FBSDEs, in case the limit solution is unique. In [11], the value functions are shown to converge, with a suitable convergence rate, assuming that the cost f is convex in (a, x, m) and g is convex in (x, m); see also Section 6.1.3 of [13]. Moreover, a propagation of chaos property is also proved, i.e. the prelimit (unique) optimal trajectories are shown to converge to the (unique) limit optimal trajectory, with a convergence rate (such a result is not stated in this way, but can be immediately derived from the proof of Theorem 6.1 therein). More recently, this method has been applied to problems with interaction through the law of the control in [32]. In both cases, as a consequence of the convergence of the value functions, an optimal control for the MFCP is shown to be ε_N-optimal for the N-agent optimization, with lim_N ε_N = 0, with the same limitations just explained (no convergence rate in the first case and convexity required in the second).
Here, in the finite state setting, we first analyze the MFCP with feedback controls. We rewrite it as a deterministic control problem for the Fokker-Planck equation and show that its value function V is the unique viscosity solution of the corresponding HJB equation, stated on the d-dimensional simplex. Then we examine the convergence problem; we stress that convergence can be understood both in terms of value functions and of optimal trajectories. Our main result shows that the value function V^N of the N-agent optimization converges to V, with a convergence rate of order 1/√N, in the absence of any convexity assumption. As explained above, a result of this type is not available for diffusion-based models. As a consequence of this convergence, we also prove that any optimal control for the MFCP is C/√N-optimal for the N-agent optimization. A similar result is proved in [27] with different methods and without interpreting the prelimit model as an N-agent optimization; see Remark 2.12 for the details. In the discrete time setting, the MFCP and the related convergence of the N-agent problem are investigated in full generality in [33].
Our main novelty is the method of proof of the main result, which differs from the two approaches described above and which we believe to be of independent interest. It is based on the viscosity solution characterization of V: indeed, we will see that the ODE satisfied by V^N can be viewed as a finite difference scheme for the HJB equation satisfied by V. Notably, V is not differentiable in general, as we do not assume any convexity of the costs, in either a or m. Finite difference schemes for viscosity solutions have been investigated by many authors in recent decades. Since a complete bibliography on the subject is impossible, we mention two papers which inspired our proof of convergence and which established, in particular, a rate of convergence. The first is [8], which studied a semidiscrete approximation scheme for the HJB equation of an infinite horizon control problem with discount; see also the book [1], Sect. VI.1. The second is [38], which analyzed a finite difference scheme for a general time-dependent Hamilton-Jacobi equation; see also [18].
We also study the propagation of chaos property for the optimal trajectories, in case the limit is unique. If the value function V is sufficiently smooth, i.e. in C^{1,1}, then the MFCP is uniquely solvable and we prove that the prelimit (unique) optimal empirical measures converge to the limiting deterministic optimal flow, with a suitable convergence rate. We also give sufficient conditions under which V ∈ C^{1,1}: these are the standard convexity assumptions. We remark that these smoothness and propagation of chaos results also seem to be new in the study of MFCP. Moreover, it is worth noting that we treat here neither problems with common noise nor problems with interaction through the law of the control; these are left to future work.
Finally, we mention that convergence results have also been obtained for the opposite regime of mean field games. In that case, players are non-cooperative in the prelimit N-player game and the notion of optimality is that of Nash equilibrium, which depends heavily on the set of admissible strategies that is considered. This makes the convergence analysis more difficult, especially in case the limiting mean field game solutions are non-unique; some references are [9,19,23,28,30] for diffusion-based models and [2,3,14,17,26] for finite state spaces.
The rest of the paper is organized as follows. In Section 2, we collect our main results: after introducing the notation and assumptions that will be in force, we properly define the N-agent optimization and show the equivalence with the mean field formulation; then we present the MFCP with its well-posedness result (Thm. 2.9) and state the convergence theorems. In Section 3, we examine the MFCP and prove first well-posedness of viscosity solutions and then, under additional convexity assumptions, well-posedness of classical solutions. Notably, we establish a comparison principle (Thm. 3.4) for viscosity solutions on the interior of the simplex -- without boundary conditions, by exploiting the invariance of the domain -- which is a new result in the theory. Finally, Section 4 contains the proofs of the convergence results (Thms. 2.10 and 2.13): first the convergence of the value functions via viscosity solutions and then, assuming V ∈ C^{1,1}, the propagation of chaos property.

Notation
We denote ⟦d⟧ = {1, . . . , d} and let S_d = {m ∈ R^d : m^i ≥ 0, ∑_{i=1}^d m^i = 1} be the (d − 1)-dimensional simplex, endowed with the Euclidean norm | · | of R^d. We denote by ⟨·, ·⟩ the scalar product in R^d and, for a matrix (Q_{i,j})_{i,j∈⟦d⟧}, we let Q_{i,·} be its i-th row. We denote the elements of the simplex by m, while µ denotes processes with values in the simplex.
The simplex can be viewed as a subset of R^{d−1} by expressing the last coordinate as m_d = 1 − ∑_{l=1}^{d−1} m_l. We express the simplex via this particular local chart (m_1, . . . , m_{d−1}), whose image we denote by Ŝ_d; when we refer to the interior of the simplex, denoted by Int(S_d), we mean the image of Int(Ŝ_d) under the above chart. Via this local chart, a function v defined on the simplex is equivalently written as a function v̂ defined on Ŝ_d. Thus we say that v ∈ C^1(S_d) if v̂ ∈ C^1(Ŝ_d), with derivatives that extend continuously up to the boundary. In the interior of the simplex, derivatives are allowed only along the directions (δ_j − δ_i)_{i,j∈⟦d⟧}, which are denoted by ∂_{m_j−m_i} v(m); we collect them in the vector D^i v(m). The empirical measure takes values in the discretized simplex, while for a function u defined on the discretized simplex we use the finite-difference operator D^{N,i}u.
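In formulas, a sketch of the directional derivative and its discrete analogue (the explicit form of D^{N,i} is an assumption here, consistent with its use as a finite difference in Section 4):

```latex
D^i v(m) := \big(\partial_{m_j - m_i} v(m)\big)_{j \in \llbracket d\rrbracket},
\qquad
\big[D^{N,i} u(m)\big]_j := N\Big[u\Big(m + \tfrac{\delta_j - \delta_i}{N}\Big) - u(m)\Big],
\quad m \in S_d^N := S_d \cap \tfrac{1}{N}\mathbb{Z}^d .
```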

Assumptions
We provide three sets of assumptions. The first is the weakest and gives convergence of the value functions, with a convergence rate. Note in particular that we do not assume convexity, in either a or m.
Assumption A. (A1) The action space (A, d) is a compact metric space. (A2) The transition rate Q_{i,j} is continuous on [0, T] × A × S_d (thus uniformly continuous and bounded) and Lipschitz-continuous in (t, m), uniformly in a. We then denote, for z ∈ R^d such that z_i = 0, a ∈ A and m ∈ S_d, the pre-Hamiltonian and the Hamiltonian. The second assumption is a linear-convex assumption, very common in control theory, which, together with the existence of a classical solution to the limiting problem, gives convergence of the optimal trajectories. Assumption B. Assumption A holds and, in addition: the running cost f is continuously differentiable in A, ∇_a f is Lipschitz-continuous with respect to m, and f is uniformly convex in A, i.e. there exists λ > 0 such that (2.9) holds. Under this assumption, thanks to Proposition 1 in [26], there exists a unique maximizer of H, which we denote by a*(t, i, m, z), and, further, a* is Lipschitz-continuous with respect to m and z, as in (2.10). We will consider feedback controls α : [0, T] × ⟦d⟧ → A (or equivalently α : [0, T] → A^d); thus, when (B1) holds, we denote α_{i,j}(t) := α_j(t, i).
The last is a convexity assumption in the couple (α, m) that is needed to prove smoothness of the value function of the MFCP.
Assumption C. Assumption B holds and, in addition: Assumption (C2) may seem strange at this stage, but it will be clarified in Section 3.2.
Remark 2.1. The reader will notice that not all of the conditions of Assumption A (Lipschitz-continuity in (t, m)) are necessary in all the statements in which A is assumed. The same is true for Assumption B. We choose to make only three assumptions for the sake of definiteness. We stress again that the main difference among the three assumptions is that in A nothing is convex, while in B we assume convexity in a and in C in (a, m).
We remark that A allows us to also treat the case of directed graphs, in which some transitions are forbidden: if E = {e_1, . . . , e_d} is the set of nodes and E^+(e_i), for each i ∈ ⟦d⟧, is the subset of E \ {e_i} of nodes e_j for which there exists a directed edge from e_i to e_j, then the transition matrix Q is required to satisfy Q_{i,j}(t, a, m) = 0 whenever e_j ∉ E^+(e_i). We also remark that we do not assume that f splits into a function of (i, a) plus a function of (i, m), as it would in the case of potential mean field games, which are described in Section 3.3.
We conclude this part with a natural example for which B and/or C are satisfied.
Example 2.2. As a natural running cost for which (2.9) holds, we could take a cost of the form (2.11), in which case the minimizer a*(r), for r ∈ R, is explicit. In fact, we remark that our Assumptions B and C are more general, since they allow for a running cost which does not split as in (2.11). If f is as in (2.11), then (C2) is satisfied if the function m ↦ ∑_i m_i f_i^0(m) is convex; indeed, one can verify that the relevant function is jointly convex in (w, m).

N -agent optimization
Consider N players, X = (X^1, . . . , X^N), with X^i_t ∈ ⟦d⟧, evolving in continuous time over a finite horizon T. Agents can choose controls β = (β^1, . . . , β^N) in feedback form, i.e. any β^k is a measurable function of time and of the state of all players. The dynamics is given as a Markov chain whose transition rates, for j ≠ x_k, are determined by Q as h → 0^+. Agents are cooperative and aim at minimizing the common cost (2.13). The cost coefficients f and g depend on the empirical measure. The initial conditions (X^1_0, . . . , X^N_0) are assumed to be i.i.d. with Law(X^1_0) = m_0. This can be seen as a single optimization problem for the process X, governed by the generator (2.15), for any ϕ : ⟦d⟧^N → R. The value function v^N of this control problem solves the HJB equation which, by the definition of the Hamiltonian in (2.8), gives (2.16). Proof. This is a standard verification theorem (see e.g. [24], Thm. III.8.1), and existence of a solution is given by the Lipschitz continuity of the Hamiltonian in (2.16), which follows from the continuity of the coefficients and the compactness of A. These properties also yield existence of a maximizer in (2.7) and hence existence of an optimal feedback.
Remark 2.4. The optimal control is not unique and there might exist non-exchangeable optimizers. We recall that a vector of stochastic processes is said to be exchangeable if its joint law is invariant under permutations. Under Assumption B the optimal control is unique.
Remark 2.5. For problem (2.13)-(2.15), the choice of controls in Markovian feedback form is made for convenience and is the most natural setup for this type of control problem. We could consider more general open-loop controls: in this case, the strategy vector (π^1_t, . . . , π^N_t)_{t∈[0,T]} is a vector of predictable A-valued processes and the dynamics of the state process X can be defined as the solution of the controlled martingale problem related to the generator (2.15), in which β^k is replaced by the stochastic process π^k. Notably, the value function of this more general control problem still solves equation (2.16), which admits a unique solution, and thus the two control problems are equivalent.

Mean field formulation
We give another formulation of the N-agent optimal control problem, obtained by restricting the class of admissible controls. Nevertheless, we will prove that the two formulations are equivalent, in the sense that the value function, and thus the infimum of the cost, is the same.
Assume then that the control is the same for every player and is given by a measurable feedback function α^N : [0, T] × ⟦d⟧ × S^N_d → A: we make the mean field assumption that the control depends on the private state and on the state of the other players only through the empirical measure µ^N_t of the entire system. Namely, we assume (2.17) for any k ∈ ⟦N⟧, which determines the transition rates of each player as h → 0^+. The aim of the players is to choose α^N in order to minimize the cost in (2.13), which, assuming now (2.17), can be rewritten in terms of the empirical measure. Therefore the N-agent mean field control problem can be seen as a single optimization problem for the empirical measure, which is a time-inhomogeneous Markov chain on S^N_d, for any m ∈ S^N_d and i ≠ j ∈ ⟦d⟧. The control (in feedback form) is now a vector-valued measurable function. The HJB equation for the value function of this problem is then (2.22), an ODE indexed by m ∈ S^N_d. Proposition 2.6. Under Assumption A, the HJB equation (2.22) admits a unique solution, and there exists an optimal feedback. V^N satisfies the Lipschitz property, and the control problems (2.13)-(2.15) and (2.19)-(2.20) are equivalent. Proof. The first claim follows from the Lipschitz continuity of H and a standard verification theorem, an optimal feedback being a measurable function that attains the maximum in (2.7), which exists by the compactness of A and the continuity of the coefficients. The Lipschitz continuity of V^N is proved in Section 4.1, Lemma 4.1. If w^N is the function defined by the r.h.s. of (2.24), then w^N solves (2.16). Hence, by uniqueness of the solution to (2.16), we have w^N = v^N and thus (2.24) is satisfied.
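Concretely, when one agent in state i moves to state j, the empirical measure jumps by (δ_j − δ_i)/N; since N m_i agents sit in state i, a natural sketch of the transition rates of the chain µ^N (an assumed form, consistent with the surrounding text) is:

```latex
\mu^N:\quad m \ \longmapsto\ m + \frac{\delta_j - \delta_i}{N}
\quad\text{at rate}\quad
N\, m_i\, Q_{i,j}\big(t, \alpha^N(t,i,m), m\big),
\qquad i \ne j,\ \ m \in S^N_d .
```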
Remark 2.7. We could restrict the class of admissible controls to the set of α^N : [0, T] → A^d that are deterministic functions only of time and of the private state i ∈ ⟦d⟧. This control problem is equivalent to the one considered here because its value function W^N is clearly Lipschitz in time, hence absolutely continuous, and then, by the dynamic programming principle, it is easy to show that W^N solves (2.22) at any point of differentiability. Thus, by uniqueness of the solution to (2.22) (defined in the sense of Carathéodory in the class of absolutely continuous functions) we get V^N = W^N, which means that the costs have the same infimum. We could further restrict the class of admissible controls, for instance, to the set of piecewise constant (deterministic) functions of time. This setting is the one considered in [27].
If Assumption B holds, then the optimal control is unique, since the maximizer in (2.7) is unique (see e.g. [16], Thm. 5). The control is the transition rate α^N(t, i, m) ∈ [0, M]^d, which we represent as a transition matrix. In that case, (2.18) simplifies and the dynamics of µ^N is then given by (2.27). Note that the values of α_{i,i} never enter the dynamics. We then have: Proposition 2.8. Under Assumption B, there exists a unique optimal control. It is given by (2.28).

Mean field control problem
In the limit, there is a single player whose state X evolves as a Markov chain with transition rates determined by Q, and Law(X_0) = m_0. The control here is a deterministic measurable function α : [0, T] → A^d, or equivalently α : [0, T] × ⟦d⟧ → A, which is indeed a feedback function of the state X_t, denoted by i ∈ ⟦d⟧. As a particular case, this set includes controls that are deterministic functions of time only. The reference player aims at minimizing the cost (2.31). The problem can be recast as a deterministic control problem for the dynamics of the law of X, thanks to the fact that we consider Markovian feedback controls only. Indeed, denoting µ_t = Law(X_t), i.e. µ^i_t = P(X_t = i), the cost can be written in terms of µ, where µ solves the ODE (2.32), indexed by i ∈ ⟦d⟧. The HJB equation for the value function V of this problem is then (2.33). The peculiarity of this first-order equation is that it is stated in the simplex, which is a bounded domain, but there are no boundary conditions. This is explained by the fact that the simplex is invariant for the dynamics (2.32), and so is its interior. The first-order HJB equation has no classical solutions in general, and for this reason viscosity solutions are introduced. These are defined properly in the next section, by viewing the simplex as a subset of R^{d−1} instead of R^d. Viscosity solutions can be defined either on S_d or on Int(S_d), depending on the boundary regularity of the test functions involved. One problem with defining viscosity solutions on S_d is that it is not clear that a classical solution is a viscosity solution on S_d; however, this definition is the one we will use to prove convergence of V^N to V. Exploiting the fact that Int(S_d) is invariant for the state dynamics, which results in a property of the subdifferential of H, it is possible to show uniqueness of viscosity solutions on Int(S_d); see the comparison principle below (Thm. 3.4).
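As a numerical illustration of the deterministic reformulation, the following self-contained sketch integrates a Fokker-Planck ODE of the type (2.32) by an explicit Euler scheme, for a toy two-state model with a constant rate matrix (the specific Q, horizon and step count are assumptions for illustration; in the paper Q may depend on (t, a, m)). Mass conservation and positivity of the iterates illustrate the invariance of the simplex and of its interior.

```python
import numpy as np

def euler_fokker_planck(mu0, Q, T=1.0, steps=1000):
    """Explicit Euler scheme for d/dt mu^i = sum_j mu^j Q_{j,i},
    with Q a conservative rate matrix (rows sum to 0, off-diagonal >= 0)."""
    mu = np.array(mu0, dtype=float)
    dt = T / steps
    for _ in range(steps):
        mu = mu + dt * (mu @ Q)  # row vector times rate matrix
    return mu

# Toy 2-state example: rate 1.0 from state 1 to 2, rate 2.0 from 2 to 1.
Q = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])
mu_T = euler_fokker_planck([0.5, 0.5], Q, T=5.0, steps=5000)
# mu_T remains a probability vector and approaches the stationary law (2/3, 1/3)
```

Since the rows of Q sum to zero, the scheme conserves total mass exactly, and for a sufficiently small step the components stay positive, mirroring the invariance argument of Section 3.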
If B holds, we denote as above α_{i,j}(t) = α_j(t, i), which is the transition rate matrix. In the next section, we prove the following:

Theorem 2.9. Let V be the value function of the deterministic control problem (2.31)-(2.32):
1. if Assumption A holds, then V is the unique viscosity solution of (2.33) on S_d, and V is Lipschitz-continuous in (t, m);
2. if Assumption A holds, then V is the unique viscosity solution of (2.33) on Int(S_d), and, if B holds, there exists an optimal control;
3. if Assumption C holds, then V ∈ C^{1,1}([0, T] × S_d) and is the unique classical solution of (2.33);
4. if V ∈ C^{1,1} and B holds, then the control given by the feedback is the unique optimal control, in the sense that any optimal control α : [0, T] → [0, M]^{d×d}, with related optimal process µ, is such that α(t) = α*(t, µ_t) for dt-a.e. t ∈ [0, T].

Convergence results
We state here the results about the convergence, as N → ∞, of the value function V^N of the N-agent optimization (2.19)-(2.20) to the value function V of the mean field control problem (2.31)-(2.32), with a convergence rate of order 1/√N. We recall that V^N is the classical solution to the ODE (2.22), while V is the viscosity solution to the PDE (2.33). The following is our main result, Theorem 2.10, which is proved in Section 4.1. As announced in the Introduction, we exploit the characterization of V as the viscosity solution to (2.33) in order to prove the result. In fact, the ODE (2.22) can be seen as a finite difference scheme for the PDE (2.33), even if time is still continuous. Indeed, the argument D^{N,i}V of the Hamiltonian in (2.22) converges, at least formally, to D^iV appearing in (2.33), as N → ∞. This result also permits us to construct quasi-optimal controls for the N-agent optimization, starting from quasi-optimal controls for the MFCP, with an explicit rate of approximation. Theorem 2.11. Assume A and fix ε > 0 and N ∈ N. Let α : [0, T] → A^d be an ε-optimal control for the MFCP. Then α is (ε + C/√N)-optimal for the N-agent optimization. This is also proved in Section 4.1. Here, J^N(α) is understood as applying the control α^N(t, m) = α(t), which is independent of m. Recall that the infimum over controls α^N is the same as the infimum over controls β, depending on the states of all the players, by (2.25), and is also equal to the infimum over controls not depending on m (like the α considered), by Remark 2.7.
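The formal consistency of the scheme can be sketched as follows: for a smooth function V, a first-order Taylor expansion along the admissible directions gives (a sketch under the C^{1,1} smoothness assumption, which our main theorem precisely avoids):

```latex
N\Big[V\Big(t,\, m + \tfrac{\delta_j - \delta_i}{N}\Big) - V(t, m)\Big]
  \;=\; \partial_{m_j - m_i} V(t, m) \;+\; O(1/N),
```

so each component of D^{N,i}V is, formally, a first-order approximation of the corresponding component of D^iV.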
Remark 2.12. In [27], Kolokoltsov proved a result similar to Theorem 2.10, but assuming in addition that F, G and Q are C^{1,1} w.r.t. m (similarly to Assumption (C1)). He analyzes an N-agent mean field optimization as in (2.19)-(2.21), but allows only controls that are piecewise constant functions of time, which are the same controls he considers in the limiting deterministic control problem (2.31)-(2.32). However, we explained in Remark 2.7 that considering this smaller class is not restrictive, as the value of the N-agent optimization is the same as the one treated here, i.e. over controls that may also depend on m. Then, by applying the convergence of the generator (2.21) to the limiting dynamics (2.32), he shows convergence of the value functions with a stronger convergence rate (Thm. 2 therein). As a matter of fact, from his method of proof (basically, the same set of controls is considered for the prelimit and the limit optimization problems), it is also possible to derive estimate (2.35). Indeed, by applying standard arguments in propagation of chaos, one can get a convergence rate of order 1/√N, assuming that the costs and the transition rate are just Lipschitz-continuous w.r.t. m, and not C^{1,1}. Therefore, what we propose in this paper is a new method for proving the convergence in (2.35), based on the theory of viscosity solutions, which we believe can be of interest.
In case V is smooth, the optimal control of the MFCP is unique and then, if B holds, we are also able to establish a propagation of chaos result; that is, we prove convergence of the optimal trajectory of the N-agent optimization to the unique optimal trajectory of the MFCP, with a suitable convergence rate.
Denote then by α^N the unique optimal feedback control for the N-agent optimization defined by (2.28), and by µ^N the corresponding optimal process satisfying (2.27). Also, let α* be the unique optimal feedback control for the MFCP defined by (2.34) and µ the corresponding optimal trajectory given by (2.32). We stress that α^N and α* are functions of t and m. The propagation of chaos result can be stated also for the vector of processes X related to the optimal control α^N and optimal empirical measure µ^N, that is, X given by (2.12) assuming (2.17). For N fixed, denote by X̃ the i.i.d. process (given by (2.12)) in which all players choose the same local control α(t, i) := α*(t, i, µ_t), depending only on the private state, i.e. β^k(t, x) = α(t, x_k). The propagation of chaos consists in proving convergence of X to the i.i.d. process X̃.

Mean field control problem
The aim here is to examine in detail the mean field control problem (2.31)-(2.32) in order to prove Theorem 2.9. We first rewrite the state dynamics and the cost in terms of the local chart, so that we may apply standard results about deterministic control problems on Euclidean spaces; see e.g. [1,7,24]. The dynamics becomes (3.1), while the cost (2.31) is written as (3.2). It is clear that the value function V̂ of (3.1)-(3.2), defined on Ŝ_d, is equal to the value function V of (2.31)-(2.32), defined on S_d, by setting V(t, x) = V̂(t, x). The HJB equation for V̂ in Ŝ_d is then (3.3), where the modified Hamiltonian is defined, for z ∈ R^{d−1}, by (3.4). We will use several times the fact that Int(S_d) is invariant for the dynamics (2.32), or equivalently that Int(Ŝ_d) is invariant for (3.1). Indeed, by Assumption A it follows that Q is bounded (as it is continuous on a compact set): let M be a bound for its entries, and recall that Q_{i,j} ≥ 0 for i ≠ j. Then (2.32) gives a lower bound on the derivative of each µ^i which, by Gronwall's inequality, implies that µ_t ∈ Int(S_d) whenever µ_0 ∈ Int(S_d).
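A sketch of the Gronwall step (with M an assumed bound on the off-diagonal entries of Q, so that Q_{i,i} ≥ −M(d−1)): since Q_{j,i} ≥ 0 for j ≠ i, dropping the nonnegative terms in (2.32) yields

```latex
\frac{d}{dt}\mu^i_t \;=\; \sum_{j} \mu^j_t\, Q_{j,i}
   \;\ge\; \mu^i_t\, Q_{i,i} \;\ge\; -M(d-1)\,\mu^i_t ,
\qquad\text{hence}\qquad
\mu^i_t \;\ge\; \mu^i_0\, e^{-M(d-1)t} \;>\; 0
```

for every i ∈ ⟦d⟧, so µ_t remains in Int(S_d) whenever µ_0 does.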

Viscosity solution
The value function is not C^1 in general and thus does not solve (2.33) in the classical sense, unless the convexity Assumption C holds; see the next subsection. Hence we present the two definitions of viscosity solutions we make use of: one on S_d and one on Int(S_d). The notion of viscosity solution is the usual one, but we prefer to state the definitions to avoid confusion, because the use of test functions defined on a closed set is not standard. By our definition of C^1(S_d), viscosity solutions can be defined in two equivalent ways, since it is equivalent to consider test functions on S_d or on Ŝ_d: a function u is called (i) a viscosity subsolution of (2.33) on S_d (resp. on Int(S_d)) if the usual inequality holds at any local maximum of u − ψ, and (ii) a viscosity supersolution of (2.33) on S_d (resp. on Int(S_d)) if the reverse inequality holds at any local minimum. Let us remark that, in the above definition, by solutions to (2.33) we clearly mean solutions to the first line of (2.33). As an example of test functions in C^1([0, T) × S_d), we may consider continuously differentiable functions defined on an open subset of R^{d−1} containing Ŝ_d. With this remark, it is straightforward to obtain the following result, which proves point (1) in Theorem 2.9: Proposition 3.2. Under Assumption A, the value function V of the MFCP is the unique viscosity solution of (2.33) on S_d. Proof. The Lipschitz-continuity and the viscosity solution property of V follow from standard results in deterministic control theory, see Thms. 7.4.10 and 7.4.14 of [7], by considering the dynamics in R^{d−1} and using the fact that Ŝ_d is invariant for the dynamics (3.1). Uniqueness of viscosity solutions on S_d follows by the usual proof of uniqueness, see for instance [24], Thm. II.9.1, by observing that -- if the minimizers are at boundary points -- we can use the fact that the test functions constructed in the proof are quadratic and thus defined on the whole of R^{d−1}, in particular belonging to C^1([0, T] × S_d). We believe that there is no need to rewrite the proof here.
This notion of viscosity solution on S_d will be used in the proof of the convergence result, Theorem 2.10; see Subsection 4.1. Actually, uniqueness of viscosity solutions on S_d can also be derived as a consequence of the proof therein. The problem with the definition on the closed set S_d is that it is not clear whether a classical solution is a viscosity solution on S_d. Indeed, if the value function V is smooth, then V is certainly a viscosity solution on Int(S_d), but if a maximizer of V − ψ lies on the boundary of S_d, then it is not clear a priori that (3.7) holds. We could prove this fact, but we prefer instead to show uniqueness of viscosity solutions on Int(S_d), which implies in particular that V ∈ C^1([0, T] × S_d) is a viscosity solution on S_d, but is more general and, we believe, of interest in itself. The following result then proves points (2) and (4) in Theorem 2.9: 1. there exists an optimal control for the MFCP; 2. if V ∈ C^{1,1}([0, T] × S_d), then the control given by the feedback (2.34) is the unique optimal control, in the sense that any optimal control α : [0, T] → [0, M]^{d×d}, with related optimal process µ, is such that α(t) = α*(t, µ_t) for dt-a.e. t ∈ [0, T]. Proof. Existence of an optimal control follows from [7], Thm. 7.4.5, using the convexity of the cost in a.
If V ∈ C^{1,1}([0, T] × S_d), then it is the unique classical solution of (2.33), solving the equation also at boundary points. Under Assumption B, using in particular the strict convexity of F in a, a* is the unique optimizer of the pre-Hamiltonian (2.7) and thus the optimal control defined by (2.34) is unique. Note that the dynamics (2.32) is well-posed under the feedback α*, because D^iV is Lipschitz-continuous.
It remains to show the comparison principle for viscosity solutions on Int(S_d); to this end, it turns out to be better to consider the dynamics in R^{d−1}, and thus we state the equivalent result on Int(Ŝ_d). In the absence of boundary conditions in space, we must rely on the invariance of Int(Ŝ_d) for the dynamics (3.1). The following result extends what we presented in [15], Thm. 6.2, to a more general dynamics, and borrows ideas from the proofs of Theorem 3.8 and Proposition 7.3 in [36]. We stress that here we require neither differentiability of the Hamiltonian nor convexity of the costs (in a or m). Proof. The idea is to define a supersolution v_h that dominates u at points near the boundary, for any h, and then use the comparison principle and pass to the limit in h. The parameter h is needed to force v_h to be infinite at the boundary of the simplex. Since the simplex has corners, the distance to the boundary is not a smooth function, so the first step is to construct a smooth function that goes to 0 as x approaches the boundary. Roughly speaking, we consider the product of the distances to the faces of the simplex, and then take its logarithm.
Step 1. Let ρ_i(x), for x ∈ Int(Ŝ_d), be the distance of x from the hyperplane {y ∈ R^{d−1} : y_i = 0}, for i ∈ ⟦d−1⟧, and let ρ_d(x) be the distance from {y ∈ R^{d−1} : ∑_{l=1}^{d−1} y_l = 1}, where we recall that x_d = 1 − ∑_{l∈⟦d−1⟧} x_l. Clearly, ρ_i ∈ C^∞(Ŝ_d), with explicit derivatives for j ∈ ⟦d−1⟧. Let us denote, for x ∈ Int(Ŝ_d) and z ∈ R^{d−1}, the Hamiltonian in (3.3), where H_i, for i ∈ ⟦d⟧, are defined by (3.4); note that the Hamiltonian is convex in the gradient variable.
Step 2. For any h > 0, let v_h(t, x) := v(t, x) - h² Σ_{i∈{1,...,d}} log(ρ_i(x)) + h(T - t). We claim that v_h is a viscosity supersolution of (3.3) on Int(S̃_d). Considering the test function ϕ_h given by ϕ_h(t, x) = ϕ(t, x) + h² Σ_{i∈{1,...,d}} log(ρ_i(x)) - h(T - t), we get that (t, x) is a local minimum of v - ϕ_h, and thus v satisfies the supersolution inequality (3.10) with test function ϕ_h. We denote w = D_x ϕ(t, x), w̃_i = (w_1 - w_i, ..., w_{d-1} - w_i, -w_i) ∈ R^d for i ∈ {1, ..., d-1} and w̃_d = (w, 0), y_j = Dρ_j(x) for j ∈ {1, ..., d} and similarly ỹ_{i,j}, and ℓ_i = w̃_i + h² Σ_{j∈{1,...,d}} ỹ_{i,j}/ρ_j(x), for i ∈ {1, ..., d}. We apply the following property, which is an immediate consequence of the definition of the Hamiltonian in (2.8), where we used the bound 0 ≤ Q_{i,j} ≤ M, for any i ≠ j ∈ {1, ..., d}, and the definition of ρ_i in the last two lines. The latter inequality, applied in (3.10), implies that v_h is a viscosity supersolution of (3.3) on Int(S̃_d) if h is small enough.
Step 3. As ρ_i ≤ 1, we have v_h(t, x) ≥ v(t, x) for any (t, x) ∈ [0, T] × Int(S̃_d). In particular, v_h(T, x) ≥ v(T, x) ≥ u(T, x) for any x ∈ Int(S̃_d). We denote ρ(x) = Π_{i=1}^d ρ_i(x). Since u and v are bounded, we find that for any h > 0 there exists η > 0 (which may depend on h) such that v_h ≥ u on the set {ρ(x) ≤ η}; set O_η := {x ∈ Int(S̃_d) : ρ(x) > η} and Γ_η := {ρ(x) = η}, and note that O_η is a smooth domain. Thus v_h(t, x) ≥ u(t, x) for any t ∈ [0, T] and x ∈ O_η^c, in particular for any x ∈ Γ_η. Therefore we can apply the comparison principle (see [24], Thm. II.9.1) in [0, T] × O_η, obtaining u ≤ v_h therein. Finally, we obtain u ≤ v on [0, T] × Int(S̃_d) by sending h to 0, as lim_{h→0} v_h(t, x) = v(t, x) for any (t, x) ∈ [0, T] × Int(S̃_d), and then the inequality u ≤ v can be extended up to the boundary of S̃_d by continuity.

Classical solution
Here we give a sufficient condition for the value function of the MFCP to belong to C^{1,1}([0, T] × S_d). This is the convexity Assumption C: we hence prove point (3) in Theorem 2.9.
Theorem 3.5. Under Assumption C, the value function is in C^{1,1}([0, T] × S_d), and thus it is the unique classical solution to (2.33) (it solves the equation also at boundary points).

We recall that a function u is semiconcave (in time and space) with constant c if

u(t + h, m + p) + u(t - h, m - p) - 2u(t, m) ≤ c(h² + |p|²)

for any t ∈ [0, T], m ∈ Int(S_d), h with t ± h ∈ [0, T], and p with m ± p ∈ Int(S_d); u is semiconvex if -u is semiconcave.
Proof. We show that the value function is both semiconcave and semiconvex, in time and space, globally in Int(S_d), with a constant c; clearly, it is equivalent to prove these properties either for V defined on S_d or for its local-chart version Ṽ defined on S̃_d. Recall again that Int(S_d) is invariant for dynamics (2.32). Hence Corollary 3.3.8 in [7] ensures that V ∈ C^{1,1}([0, T] × Int(S_d)) and the Lipschitz constant of D_i V is c, for every i ∈ {1, ..., d}. Thus in particular V can be uniquely extended to a function in C^{1,1}([0, T] × S_d), and then it solves (2.33) also at boundary points. The classical solution to (2.33) is unique because any solution is the value function, by a standard application of the verification theorem.
The value function is semiconcave on [0, T] × Int(S_d) by Theorem 7.4.11 in [7], thanks to Assumption (C1); as above, to apply this result, set on a Euclidean space, we just have to consider the equivalent formulation of the control problem on Int(S̃_d). To prove that V is semiconvex, we rewrite the MFCP in an equivalent formulation, with a control w, such that the cost is convex in (m, w) and the dynamics is linear in (m, w). Consider hence the problem of minimizing the corresponding cost, where the couple (µ, w) satisfies the ODE and is subject to the constraints w^{i,j}_t ≥ 0 for any t ∈ [0, T]. This control problem is well-defined only for µ_0 ∈ Int(S_d), in which case µ^i_t > 0 for every t, so that the cost is well-defined; it is seen to be equivalent to the MFCP (2.32)-(2.31) by setting w^{i,j} = µ^i α^{i,j}, meaning that the value function is the same. The advantage of this new formulation is that the dynamics is now linear in w and the set of (µ, w) satisfying the constraints is convex. Moreover, the running cost is convex in (m, w) and the terminal cost is convex in m by Assumption (C2). Thus we can apply ([7], Thm. 7.4.13), which says that V(t, m) is a convex function of m ∈ Int(S_d), for any t ∈ [0, T]. Then V is semiconvex in time and space on [0, T] × Int(S_d), again by ([7], Thm. 7.4.13), but using the original formulation of the MFCP in the proof therein, for which the coefficients are globally Lipschitz in Int(S_d), thus yielding global semiconvexity, while in the new formulation the cost is only locally Lipschitz in m ∈ Int(S_d).
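The substitution w^{i,j} = µ^i α^{i,j} can be checked numerically. The sketch below assumes the standard forward (Kolmogorov) form of the dynamics, µ̇^j = Σ_{i≠j}(µ^i α^{i,j} - µ^j α^{j,i}), which is our reading of (2.32); it verifies on a toy example that the two formulations produce the same trajectory and conserve mass:

```python
import numpy as np

def step_alpha(mu, alpha, dt):
    # one Euler step of the forward equation in the original controls alpha^{i,j}
    flow = mu[:, None] * alpha            # w^{i,j} = mu^i alpha^{i,j}
    np.fill_diagonal(flow, 0.0)
    return mu + dt * (flow.sum(axis=0) - flow.sum(axis=1))

def step_w(mu, w, dt):
    # same equation, now linear in the new control w
    w = w.copy()
    np.fill_diagonal(w, 0.0)
    return mu + dt * (w.sum(axis=0) - w.sum(axis=1))

rng = np.random.default_rng(0)
d, dt, T = 3, 1e-3, 1.0
alpha = rng.uniform(0.0, 1.0, size=(d, d))    # hypothetical constant feedback rates
mu_a = mu_w = np.array([0.5, 0.3, 0.2])
for _ in range(int(T / dt)):
    w = mu_w[:, None] * alpha                 # the substitution w^{i,j} = mu^i alpha^{i,j}
    mu_a, mu_w = step_alpha(mu_a, alpha, dt), step_w(mu_w, w, dt)

assert np.allclose(mu_a, mu_w)                # the two formulations coincide
assert abs(mu_a.sum() - 1.0) < 1e-10          # mass is conserved
```

The rate matrix and initial condition are arbitrary placeholders; the point is only that the w-dynamics is linear while reproducing the α-dynamics.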

Further properties and potential mean field games
We collect, for reference, other results concerning the mean field control problem and its relation, in some cases, with a mean field game. These are not used here, but might be useful for future works. They derive directly from the results of Section 7.4 of [7] about deterministic control problems in a Euclidean space. Thus we consider the problem (3.1)-(3.2) defined on S̃_d, whose value function is denoted by Ṽ. As before, for a function G defined on S_d, we denote by G̃ its version in local chart, i.e. G̃(x) = G(x). Proposition 3.6. Assume A, for the Hamiltonian H defined by (3.9). If α is an optimal control and x the corresponding optimal trajectory, then there exists w ∈ C^1(0, T; R^{d-1}) such that w_t belongs to the space superdifferential of Ṽ(t, x_t) for any time. Moreover, if H(t, x, ·) is strictly convex for any t ∈ [0, T] and x ∈ Int(S̃_d), and the costs F and G are semiconcave w.r.t. m, then, assuming that the control problem starts at (t_0, x_0) ∈ [0, T] × Int(S̃_d): -Ṽ is differentiable at (t, x_t) for any t ∈ (t_0, T], for any optimal trajectory x; -Ṽ is differentiable at (t_0, x_0) if and only if there exists a unique optimal trajectory x; in such case the adjoint process w satisfies w_t = DṼ(t, x_t). The Hamiltonian is strictly convex w.r.t. w, for x in the interior, in case e.g. the running cost is given by (2.11), as observed in [15]. We recall that the value function is (time-space) Lipschitz continuous, and thus differentiable almost everywhere with respect to the 1-dimensional Lebesgue measure in time and the (d-1)-dimensional Lebesgue measure in space. Further, Ṽ is shown to be (time-space) semiconcave in the proof of Theorem 3.5, in case only B and (C1) hold. The first assertion is instead the Pontryagin principle, which holds also under weaker assumptions.
In case the cost splits into a part depending on the control and a part f^i_0 depending on the measure, the Hamiltonian H^i in (2.8) splits as H^i(t, m, z) = H^i_0(t, z) - f^i_0(t, m), and then (3.13) becomes the equation with F_0(m) = Σ_{i∈{1,...,d}} m^i f^i_0(m) and H̃ defined as in (3.4). The above equation is in fact equivalent to the HJB equation in the MFG system, first analyzed in [26], for a given running cost f and terminal cost g, in case B holds, so that the transition rates in the coupled equation for µ in (2.32) are given by α^{i,j}_t = a*_j(t, i, (u^j_t - u^i_t)_{j∈{1,...,d}}). Equivalence holds if the functions f and g are such that (3.17) holds; for instance, the latter holds true by defining them so that all the occurrences of u_i - u_j are replaced by w_i - w_j if j ∈ {1, ..., d-1} and by w_i if j = d. We remark that (3.13) cannot be interpreted as a mean field game if the cost does not split as in (3.14). In case the cost splits, we have shown that, if an optimal control for the MFCP exists, then it gives rise to a solution of a particular mean field game, with costs determined by (3.17), and therefore this can provide more information on the optimal control and on the corresponding optimal trajectory of the MFCP.
In general, a mean field game, in which hence the costs (f^i)_{i∈{1,...,d}} and (g^i)_{i∈{1,...,d}} are given, is said to be potential if (3.17) holds and the costs f^i_0 and g^i defining the MFCP do not depend on i, say they are equal to f_0 and g, so that F_0 = f_0 and G = g. Thus the mean field game system represents the necessary conditions for optimality of the deterministic MFCP; we refer to [15] for a detailed study of potential mean field games and the corresponding MFCP, in particular for the interpretation of (3.17).

Convergence results
We prove here the main convergence results: Theorems 2.10 and 2.13. Throughout this section, V^N denotes the value function of the mean field N-agent optimization (2.19)-(2.20), while V denotes the value function of the mean field control problem (2.31)-(2.32). We recall that V^N is the classical solution to ODE (2.22), while V is the viscosity solution to PDE (2.33).

Convergence of value functions
Here we prove Theorem 2.10; hence assume that Assumption A is in force. We exploit the characterization of V as the unique viscosity solution to (2.33) on the closed set S_d; see Definition 3.1 and Proposition 3.2.
We first need to show that V^N is time-space Lipschitz-continuous, uniformly in N; this is (2.23) in Proposition 2.6. At this point, the compactness of the control set A is required (Assumption (A1)). Lemma 4.1. If A holds, then for every t, s ∈ [0, T] and m, p ∈ S^N_d,

|V^N(t, m) - V^N(s, p)| ≤ C(|t - s| + |m - p|), (4.1)

for a constant C independent of N.
Proof. We represent the dynamics of µ^N, given by (2.20), as an SDE with respect to a Poisson random measure, as we did in [16]. We restrict attention to controls α : [0, T] → A^d that are just functions of time and of the private state i ∈ {1, ..., d}. Recall that, by Remark 2.7, the value function over this smaller class is the same V^N. Fix N ∈ N and denote by N(dt, dθ) a standard Poisson random measure on [0, T] × [0, M]^{d×d}, with intensity measure ν on [0, M]^{d×d}, where M is the maximum of Q (which is continuous over a compact set). Let ν be defined as the sum of the measures of the intersections with the axes, i.e. ν(E) = Σ_{i,j∈{1,...,d}} Leb(E ∩ Θ_{i,j}), where Θ_{i,j} := {θ ∈ [0, M]^{d×d} : θ_{i',j'} = 0 for all (i', j') ≠ (i, j)}, each such set being viewed as a subset of R, and Leb is the Lebesgue measure on R. Hence the dynamics of µ^N can be written as an SDE driven by N(dt, dθ). This equation is well-posed because µ^N takes values in S^N_d, which is finite, and then (2.20) follows by equation (2.34) of [16]. Observe that (1/N)(δ_j - δ_i) represents the increment of µ^N, while the indicator function gives the transition rate.
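The jump mechanism behind this representation can be illustrated with a minimal simulation sketch (the constant rate matrix Q and every parameter below are hypothetical placeholders, not the paper's data): each agent in state i jumps to state j at rate Q_{i,j}(µ^N), and every jump shifts the empirical measure by (δ_j - δ_i)/N.

```python
import numpy as np

def simulate_empirical(N, Q, mu0, T, rng):
    """Gillespie-type simulation of the empirical measure mu^N: each of the
    N agents in state i jumps to j at rate Q(mu)[i, j], and every jump moves
    mu^N by (delta_j - delta_i)/N, matching the jump term of the SDE."""
    d = len(mu0)
    counts = (N * np.asarray(mu0)).astype(int)      # agents per state
    t = 0.0
    while True:
        rates = counts[:, None] * Q(counts / N)     # aggregate rate of each (i, j) jump
        np.fill_diagonal(rates, 0.0)
        total = rates.sum()
        t += rng.exponential(1.0 / total)
        if t > T:
            return counts / N
        i, j = np.unravel_index(rng.choice(d * d, p=(rates / total).ravel()), (d, d))
        counts[i] -= 1                               # increment (delta_j - delta_i)/N
        counts[j] += 1

rng = np.random.default_rng(1)
Q = lambda m: np.ones((3, 3))                        # hypothetical constant rate matrix
mu_T = simulate_empirical(300, Q, [1/3, 1/3, 1/3], 1.0, rng)
assert abs(mu_T.sum() - 1.0) < 1e-12 and np.all(mu_T >= 0)
```

The simulation only illustrates that µ^N stays in the simplex grid S^N_d; the actual controlled rates in (2.20) depend on the feedback α.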
Let µ^N start at m and ρ^N start at p, at a fixed time t ∈ [0, T), with the same control α. Let ε > 0 and α : [t, T] → A^d be an ε-optimal control for the problem starting at (t, m), i.e. J^N(t, m, α) ≤ V^N(t, m) + ε,
with an obvious notation for J^N(t, m, α). Then, by Lemma 3 of [16] and the Lipschitz-continuity of Q, Gronwall's lemma yields sup_{s∈[t,T]} E|µ^N_s - ρ^N_s| ≤ C|m - p|. (4.3) Then, applying the Lipschitz-continuity of F and G (defined by (2.3)) and (4.3), we get V^N(t, p) - V^N(t, m) ≤ J^N(t, p, α) - J^N(t, m, α) + ε ≤ C|m - p| + ε. Taking the limit as ε vanishes, we get V^N(t, p) - V^N(t, m) ≤ C|m - p|, and then, exchanging the roles of m and p, we obtain the opposite inequality, which provides |V^N(t, m) - V^N(t, p)| ≤ C|m - p|. (4.4) To prove the Lipschitz-continuity in time, note that (4.4) implies |D^{N,i} V^N(t, m)| ≤ C for any t ∈ [0, T] and m ∈ S^N_d; thus, recalling that V^N is C^1 in time, from the HJB equation (2.22) we derive that |d/dt V^N(t, m)| ≤ C for any t and m, where we used the Lipschitz-continuity of H w.r.t. z. Therefore sup_m |V^N(t, m) - V^N(s, m)| ≤ C|t - s|, which, together with (4.4), yields (4.1).
We now turn to the proof of Theorem 2.10. As a first step, it is required to extend the definition of V^N outside S^N_d. One way to do this, similarly to [8], would be to consider the same control problem (2.19)-(2.20), but starting at any point in the simplex, not only on S^N_d. However, in this way, the dynamics of µ^N would also exit the simplex, and thus we prefer not to follow this strategy. Instead, we use a piecewise constant interpolation of V^N, and thus we have to pay a price in order to define it carefully at boundary points, so that the maximum in the usual doubling of variables argument (see (4.6) below) is attained.
Proof of Theorem 2.10.
We first prove that E^+_N := sup_{[0,T]×S^N_d} (V^N - V) ≤ C/√N. (4.5) If E^+_N ≤ 0 then (4.5) trivially holds, thus we assume that E^+_N > 0. Since V^N is defined on the grid S^N_d only, we have to construct a piecewise constant extension V̄^N defined on the whole simplex: denote by n(N, d) the number of elements of S^N_d, and cover the simplex by closed cells (Γ_k)_{k=1}^{n(N,d)}, centered at points of S^N_d and invariant by translations. Note that the cells centered at points on the boundary also cover points outside the simplex. Define V̄^N(p) = V^N(p_k) if p ∈ Int(Γ_k) and p_k is the center of Γ_k. It remains to define V̄^N at the boundaries of the cells Γ_k.
Step 1. We exploit the usual argument of doubling the variables, which prompts us to consider the function Φ in (4.6) below, defined on [0, T]² × S²_d, where ε is a parameter to be fixed later in terms of N. Then the value of V̄^N at the boundaries of the cells has to be chosen such that this function admits a maximum. Let us first give the idea of our construction. The strategy is to define V̄^N constant on every closed cell, then perform the maximization on each closed cell, in which a maximum of Φ exists, and then take the maximum of the values so obtained, so that a maximum point for Φ exists. If this maximum point lies in the interior of a cell, then there is no problem and the value of V̄^N at the boundaries does not matter. The critical situation is when the maximum point belongs to the boundary of a cell. In that situation, we have to define V̄^N carefully at the boundary of the neighboring cells in order to verify equality (4.7) below. This is required because we want to exploit the ODE (2.22), and it is indeed the main reason for considering a piecewise constant interpolation. We give below an example of our construction in case d = 2, the generalization to d > 2 being not difficult. Now, more precisely, for a cell Γ_k centered at p_k ∈ S^N_d, k ∈ {1, ..., n(N, d)}, let V̄^N_k(p) := V^N(p_k) for any p ∈ Γ_k, thus also on the boundary of Γ_k. Then define Φ_k as in (4.6), but with p ∈ Γ_k and V̄^N therein replaced by V̄^N_k. Such a maximum exists and is attained at a point (t_k, s_k, m_k, p_k), which might be non-unique and such that p_k may belong to the boundary of the cell Γ_k; we then let γ = max_{k=1,...,n(N,d)} Φ_k(t_k, s_k, m_k, p_k), attained at some index k, with corresponding maximum point (t, s, m, p). We can now define V̄^N so that V̄^N(s, p) := V̄^N_k(p) = V^N(s, p_k), the last equality holding by the definition above, where we recall that p_k ∈ S^N_d is the center of the cell Γ_k. Note that p may belong to the boundary of Γ_k, in which case we have defined V̄^N at the boundary of Γ_k. In addition, we require that (4.7) holds for any i, j ∈ {1, ..., d}. Note that this is automatically satisfied if p ∈ Int(S_d), while it gives the definition of V̄^N on a part of the boundary of the cells bordering Γ_k, if p belongs to the boundary of Γ_k. Lastly, V̄^N can be defined arbitrarily right or left continuous at the other boundary points.
Hence this construction guarantees that Φ in (4.6) admits a maximum on [0, T]² × S²_d at (t, s, m, p). We stress that V̄^N is defined also outside the simplex, so that the RHS of (4.7) is meaningful in case p_k + (1/N)(δ_j - δ_i) belongs to the boundary of the simplex. Note that V̄^N is not continuous in space, but it remains Lipschitz in time, uniformly in space and w.r.t. N.
Before proceeding with the rest of the proof, let us give an example of the above construction in case d = 2. In this case the simplex is one-dimensional and, if projected on [0, 1], we have S^N_2 = {k/N : k = 0, 1, ..., N}, the increments are ±1/N, and the cells are Γ_k = [k/N - 1/(2N), k/N + 1/(2N)], for k = 0, ..., N. Let us assume that we are in the critical situation, that is, p belongs to the boundary of Γ_k; for instance, we can assume that p = k/N + 1/(2N). The construction of V̄^N (omitting the time dependence) is then as follows: if V̄^N is left continuous at p, then it is defined to be left continuous also at p ± 1/N, so that (4.7) holds. This latter property is crucial in the proof, see (4.10) below. We remark again that V̄^N can be defined arbitrarily right or left continuous at the other boundary points. It is not difficult to generalize this construction to d > 2. We stress that (4.8) and (4.9) hold also in case (t, s, m, p) belongs to the (time and space) boundary of [0, T]² × S²_d. We fix now ε = 1/√N.
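The d = 2 construction can be sketched in code. The snippet below implements one admissible left-continuous piecewise constant extension on [0, 1]; the tie-breaking rule and the test grid function are our own illustrative choices, not the paper's.

```python
import numpy as np

def make_Vbar(VN, N):
    """Piecewise constant extension of a grid function V^N on S^N_2 = {k/N}:
    constant on the cell of radius 1/(2N) around each grid point, and left
    continuous at cell boundaries (a sketch of one admissible choice)."""
    def Vbar(p):
        # the 1e-12 nudge enforces left continuity at cell boundaries
        # despite binary floating-point rounding
        k = int(np.floor(p * N + 0.5 - 1e-12))
        return VN[min(max(k, 0), N)]
    return Vbar

N = 10
VN = np.array([(k / N) ** 2 for k in range(N + 1)])   # hypothetical grid values
Vbar = make_Vbar(VN, N)

b = 3 / N + 1 / (2 * N)                               # a cell boundary point
assert Vbar(b) == VN[3]                               # boundary value from the left cell
assert Vbar(b - 1e-9) == Vbar(b)                      # left continuity at b
assert Vbar(b + 1 / N) == Vbar(b + 1 / N - 1e-9)      # same convention at b + 1/N
```

The last assertion mirrors the requirement that the chosen one-sided continuity is propagated to the cells at p ± 1/N.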
Step 3. In order to prove (4.5), we consider the three cases: either t = T , or s = T , or t, s < T .
First case: t = T and s ∈ [0, T]. The inequality Φ(t, s, m, p) ≥ Φ(t, t, m, m), for any (t, m) ∈ [0, T] × S^N_d, exploiting the Lipschitz-continuity in time of V^N and that of G on S_d, gives the desired bound: taking the supremum on the l.h.s. and applying (4.8) and (4.9), we obtain (4.5). Second case: s = T and t ∈ [0, T]. This can be treated as the first case, by using the Lipschitz-continuity in time of V.
Third case: t ∈ [0, T) and s ∈ [0, T). Let p_k be, as above, the point in S^N_d such that V̄^N(s, p) = V^N(s, p_k). Here we use the piecewise constant construction of V̄^N and in particular (4.7). Hence, from (2.22) we obtain (4.10), where a*_i = a*_i(s, p_k) is a point in A which attains the maximum in (2.7). We recall that V̄^N is defined also in case p + (1/N)(δ_j - δ_i) is outside the simplex, which could happen if p_k + (1/N)(δ_j - δ_i) belongs to the boundary; while p_k + (1/N)(δ_j - δ_i) can be outside S_d only if p^i_k = 0, in which case the corresponding term in (4.10) is zero. Since V̄^N - ϕ has a minimum at (s, p), we get d/dt V̄^N(s, p) ≥ ∂_t ϕ(s, p), with equality if s > 0, and thus (4.11) follows. On the other hand, as V - ψ has a maximum at (t, m), since V is a viscosity subsolution of (2.33) on the entire S_d (see again Def. 3.1) and ψ is indeed defined on R^d, we get the subsolution inequality, which, by definition of the Hamiltonian, gives (4.12). The inequality Φ(t, s, m, p) ≥ Φ(t, s, m, p + (1/N)(δ_j - δ_i)) gives, for any i and j in {1, ..., d}, a bound on the increments of V̄^N which, applied in (4.11), yields (4.13), as Q_{i,j} ≥ 0. Summing (4.12) and (4.13), and using Assumption A (i.e. the boundedness of Q and the Lipschitz-continuity of Q and F w.r.t. (t, m)), we obtain an estimate which provides (4.14), using (4.8), (4.9) and that |p - p_k| ≤ 1/N by construction. Step 4. The opposite inequality can be proved in the same way, by exchanging the roles of V^N and V and using instead the viscosity supersolution property of V. Finally, (4.5) and (4.14) give (2.35).
Remark 4.2. Similarly to [8], see also ([1], Sect. VI.1), it might be possible to establish a stronger convergence rate of order 1/N, if V^N is semiconcave uniformly in N, i.e. with a constant c independent of N. Such an estimate should hold if the costs F and G are semiconcave in space. This is left to future work.
As a consequence, we can also prove Theorem 2.11. In the proof, as well as in the next section, we use several times the following basic estimate: if ξ = (ξ_1, ..., ξ_N) is a vector of N i.i.d. random variables with values in {1, ..., d}, such that Law(ξ_1) = m ∈ S_d, then the empirical measure µ^N_ξ satisfies E|µ^N_ξ - m| ≤ C/√N. (4.15) Indeed, (N µ^{N,i}_ξ)_{i∈{1,...,d}} has a multinomial distribution with parameters N and m, and thus the variance is E|N µ^{N,i}_ξ - N m^i|² = N m^i(1 - m^i). Hence (4.15) follows by the Cauchy-Schwarz inequality, recalling that we are using the Euclidean norm.
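Estimate (4.15) is easy to verify by Monte Carlo. Since E|µ^N_ξ - m|² = Σ_i m^i(1 - m^i)/N ≤ 1/N, we expect E|µ^N_ξ - m| ≤ 1/√N and a halving of the error when N is multiplied by 4; the following sketch (with an arbitrary m) checks both:

```python
import numpy as np

def mean_dist(N, m, samples, rng):
    # Monte Carlo estimate of E|mu^N - m| for the empirical measure
    # of N i.i.d. samples from the law m (Euclidean norm)
    counts = rng.multinomial(N, m, size=samples)
    return np.linalg.norm(counts / N - m, axis=1).mean()

rng = np.random.default_rng(2)
m = np.array([0.5, 0.3, 0.2])

# E|mu^N - m|^2 = sum_i m_i (1 - m_i) / N <= 1/N, so E|mu^N - m| <= 1/sqrt(N)
for N in (100, 400):
    assert mean_dist(N, m, 20_000, rng) <= 1.0 / np.sqrt(N)

# the error halves when N is multiplied by 4, i.e. rate 1/sqrt(N)
r = mean_dist(100, m, 20_000, rng) / mean_dist(400, m, 20_000, rng)
assert 1.8 < r < 2.2
```

The constant C in (4.15) depends only on the dimension; here 1 already suffices since Σ_i m^i(1 - m^i) ≤ 1.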
Proof of Theorem 2.11. Let α_N(t, i, m) be an optimal feedback control for the N-agent optimization and α(t, i) be ε-optimal for the MFCP, i.e.
Recall that the dynamics start at time 0 and the initial conditions X^k_0 are assumed to be i.i.d. with Law(X^1_0) = m_0. We then estimate the difference of the costs, where we use (2.35), (4.16), and the Lipschitz-continuity of V. As µ^N_0 is the empirical measure of an i.i.d. sequence, (4.15) permits to bound E|µ^N_0 - m_0|. Thus, in order to prove (2.37), it remains to show the convergence of the controlled empirical measures. This follows by standard arguments in propagation of chaos. Let ρ^N be the process given by dynamics (2.20), when using the decentralized control α(t, i). Assumption (A3) gives the corresponding cost estimate (denoting α = (α^1, ..., α^d) and using definition (2.3)). Hence, to prove (4.17), we have to show the convergence of ρ^N to the limit flow. We use the SDE representation of the dynamics introduced in (4.2); thus ρ^N solves the corresponding SDE. This process is not the empirical measure of a vector of independent processes, because of the dependence of Q on m; thus we introduce also the process µ̃^N in which the empirical measure is replaced by the limit deterministic flow µ: µ̃^N solves the analogous SDE and represents the empirical measure associated to N i.i.d. copies of the limit process X̃ satisfying (2.29), with Law(X̃_t) = µ_t. Again (4.15) gives sup_t E|µ̃^N_t - µ_t| ≤ C/√N. Thus it remains to estimate the distance between ρ^N and µ̃^N. As in the proof of Lemma 4.1, the representation in terms of SDEs yields the claim, by exploiting the Lipschitz-continuity of Q w.r.t. the variable m.

Propagation of chaos
Here we prove Theorem 2.13. Throughout this subsection, we hence assume that V ∈ C^{1,1}([0, T] × S_d) and that Assumptions B and (C1) are in force.
We first show the following simple result.
Lemma 4.3. There exists a constant C such that, for any i ∈ {1, ..., d}, t ∈ [0, T] and m ∈ S^N_d, |D^{N,i} V(t, m) - D_i V(t, m)| ≤ C/N. Proof. By definition, for each i, j ∈ {1, ..., d} and any m ∈ S^N_d (we omit the time in the notation), using Taylor's formula and the Lipschitz-continuity of D_i V, we obtain the claimed bound. As in the statement of Theorem 2.13, let α_N be the unique optimal feedback control for the N-agent optimization defined by (2.28), and µ^N be the corresponding optimal process satisfying (2.27). Also, let α* be the unique optimal feedback control for the MFCP defined by (2.34) and µ the corresponding optimal trajectory given by (2.32). We stress that α_N and α* are functions of t and m. Since Assumption (C1) is in force, as explained in Remark 2.12, we can assume here that the convergence of the value functions holds with the stronger rate given by (2.38).
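The O(1/N) proximity of the finite-difference gradient D^{N,i} (our reading: its j-th entry is N(V(t, m + (δ_j - δ_i)/N) - V(t, m))) to D_i V can be illustrated on a smooth surrogate for V; the quadratic below is a hypothetical stand-in, not the actual value function:

```python
import numpy as np

# smooth surrogate value function on the simplex (hypothetical stand-in for V)
V = lambda m: np.sum(m ** 2)
DV = lambda m: 2 * m                      # its Euclidean gradient

def DN(V, m, i, N):
    # finite-difference gradient D^{N,i}: j-th entry N(V(m + (e_j - e_i)/N) - V(m))
    d = m.size
    return np.array([N * (V(m + (np.eye(d)[j] - np.eye(d)[i]) / N) - V(m))
                     for j in range(d)])

m, i = np.array([0.5, 0.3, 0.2]), 0
for N in (10, 100, 1000):
    # the relevant comparison is with the centered differences (D_j V - D_i V)
    err = np.max(np.abs(DN(V, m, i, N) - (DV(m) - DV(m)[i])))
    assert err <= 2.0 / N + 1e-9          # O(1/N), as in Lemma 4.3
```

For this quadratic the error is exactly 2/N on the off-diagonal entries, which matches the Taylor-expansion argument in the proof: the first-order terms cancel and the Lipschitz constant of DV controls the remainder.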
As an intermediate step, we consider the process ρ^N satisfying (2.27) with the limiting feedback control α*; this intermediate process is needed to prove convergence, and it describes the empirical measure of a standard mean field interacting particle system, in the sense that the transition rate function α* is the same for any N (and depends on the empirical distribution ρ^N), while in µ^N the rate α_N depends on N. Thus we first show the proximity of µ^N and ρ^N, and then prove convergence of ρ^N to µ. We assume that ρ^N and µ^N start at time zero at the same point of S^N_d. The proof of the following proposition is the point where Assumption (B3) is required. Proposition 4.4. We have
Step 1. We compute the limit value function V along µ^N. Dynkin's formula and then the HJB equation (2.33) give an identity whose l.h.s. is exactly J_N(α_N), as defined by (2.19). We introduce also the process Y, related to the empirical measure ρ^N, in which the transition rate function is given in terms of the limiting optimal feedback α*(t, i, m), independent of N. In order to prove the propagation of chaos result (Thm. 2.13), we first show the proximity of X and Y and then prove propagation of chaos for the process Y, which is a standard mean field interacting particle system. The proof is rather standard and the arguments are the same as those we used in [17], based on a probabilistic representation of the dynamics; thus we do not give all the details below. We recall that the initial conditions are fixed and i.i.d. The following is the counterpart of Proposition 4.4 for the N trajectories. We are finally in the position to prove Theorem 2.13. Recall that X̃ is the decoupled process, made of independent copies of the limit dynamics, in which player X̃^k uses the decentralized strategy α*(t, X̃^k_t, µ(t)), where α* is the optimal control and µ is the optimal trajectory for the MFCP, and we have Law(X̃^k_t) = µ(t) for any t ∈ [0, T] and k ∈ N.

Proof of this result can be found in [37].
on S_d or on S̃_d, by the relation ṽ(x) = v(x). Thus we prefer to state only the definitions of solutions to (2.33) on S_d or Int(S_d); the definitions of solutions to (3.3) on S̃_d or Int(S̃_d) being equivalent. Definition 3.1. A function v ∈ C([0, T) × S_d) is said to be: (i) a viscosity subsolution of (2.33) on S_d (resp. on Int(S_d)) if, for any test function ϕ, the subsolution inequality holds at every (t, m) ∈ [0, T) × S_d (resp. in [0, T) × Int(S_d)) which is a local maximum of v - ϕ on [0, T) × S_d (resp. on [0, T) × Int(S_d)); (ii) a viscosity supersolution of (2.33) on S_d (resp. on Int(S_d)) if, for any test function ϕ, the supersolution inequality holds at every (t, m) ∈ [0, T) × S_d (resp. in [0, T) × Int(S_d)) which is a local minimum of v - ϕ on [0, T) × S_d (resp. on [0, T) × Int(S_d)); (iii) a viscosity solution of (2.33) on S_d (resp. on Int(S_d)) if it is both a viscosity subsolution and a viscosity supersolution of (2.33) on S_d (resp. on Int(S_d)).

Proposition 3.3. Under Assumption A, V is the unique viscosity solution of (2.33) on Int(S_d) in C([0, T] × S_d) satisfying the terminal condition. Moreover, if B holds:

Theorem 3.4 (Comparison Principle on Int(S̃_d)). Assume A and let u, v ∈ C([0, T] × S̃_d), with u a viscosity subsolution and v a viscosity supersolution of (3.3) on Int(S̃_d). If u(T, x) ≤ v(T, x) for any x ∈ S̃_d, then u(t, x) ≤ v(t, x) for any t ∈ [0, T] and x ∈ S̃_d.

(3.3) on Int(S̃_d). Let then ϕ ∈ C^1([0, T) × Int(S̃_d)), and (t, x) ∈ [0, T) × Int(S̃_d) be a local minimum of v_h - ϕ on [0, T) × Int(S̃_d). Since v is a viscosity supersolution of (3.3) on Int(S̃_d), consider the test function ϕ_h ∈ C^1([0, T) × Int(S̃_d)).

and f_d(t, x) = 0. It is easy to see that (3.15) and (3.16) are equivalent: given a solution u to (3.16), it suffices to let w_j = u_j - u_d, j ∈ {1, ..., d-1}; conversely, given a solution w to (3.15), it suffices to solve (3.16).

S_d. Denote by n(N, d) the number of elements in S^N_d, and cover the simplex by closed cells (Γ_k)_{k=1}^{n(N,d)}.

Lemma 4.5. For any N ∈ N and k ∈ {1, ..., N}, it holds that E sup_{s∈[0,T]} |X^k_s - Y^k_s| ≤ C/√N. (4.24)

Proof. By exploiting the representation of X and Y in terms of SDEs driven by Poisson random measures (similar to the representation for µ^N used above, see [17] for the details), we obtain

ϕ(t) := E sup_{s∈[0,t]} |X^k_s - Y^k_s| ≤ C E ∫_0^t [ |α_N(s, X^k_s, µ^N_s) - α*(s, Y^k_s, ρ^N_s)| + |X^k_s - Y^k_s| ] ds ≤ C E ∫_0^t [ |α_N(s, X^k_s, µ^N_s) - α*(s, X^k_s, ρ^N_s)| + |X^k_s - Y^k_s| ] ds,

where we used that any bounded function is Lipschitz with respect to i ∈ {1, ..., d}, the state space being finite. Using the exchangeability of the processes (X, Y), we can rewrite

E ∫_0^t |α_N(s, X^k_s, µ^N_s) - α*(s, X^k_s, ρ^N_s)| ds = E ∫_0^t Σ_i 1_{{X^k_s = i}} |α_N(s, i, µ^N_s) - α*(s, i, ρ^N_s)| ds ≤ E ∫_0^t Σ_i |α_N(s, i, µ^N_s) - α*(s, i, ρ^N_s)| ds ≤ C/√N,

the latter bound following from the proof of the previous proposition. Therefore we derive ϕ(t) ≤ C/√N + C ∫_0^t ϕ(s) ds, which gives (4.24) by Gronwall's lemma.
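The final Gronwall step, from ϕ(t) ≤ a + ∫_0^t ϕ(s) ds to ϕ(t) ≤ a e^t (with a playing the role of C/√N), can be checked on the extremal discrete case, where the integral inequality is saturated:

```python
import numpy as np

# discrete check of the Gronwall step: phi(t) <= a + int_0^t phi(s) ds
# implies phi(t) <= a * e^t; here a = 0.1 stands in for C/sqrt(N)
a, T, K = 0.1, 1.0, 100_000
dt = T / K
phi = a
for _ in range(K):                       # extremal case: equality in the bound
    phi += dt * phi                      # phi_{k+1} = phi_k (1 + dt) <= phi_k e^{dt}

assert phi <= a * np.exp(T)              # the Gronwall bound holds
assert abs(phi - a * np.exp(T)) < 1e-4   # and it is tight in the extremal case
```

Since (1 + dt)^{T/dt} ≤ e^T for every dt > 0, the discrete iterates stay below the exponential bound for any step size; refining dt only makes the bound tighter.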
where a*(t, i, m, z) ∈ A is an argmax of (2.7), which might not be unique. (This is equivalent to saying that -Q_{i,·}(t, a*(t, i, m, z), m) belongs to the subdifferential of the convex function H_i(t, m, z).) Choosing a maximizer a*(t, i, x, ℓ_i) for any i ∈ {1, ..., d}, and applying (3.11) and (3.4), we obtain the desired estimate (omitting the dependence on (t, x)). The inequality Φ(t, s, m, p) ≥ Φ(t, s, p, p) gives a bound which, together with (4.19), provides (4.18) and concludes the proof.
H_i(t, µ^N_t, D_i V(t, µ^N_t)) + ⟨α^i_N(t, µ^N_t), D^{N,i} V(t, µ^N_t)⟩ dt, which equals H_i(t, a*(t, i, µ^N_t, D_i V(t, µ^N_t)), µ^N_t, D_i V(t, µ^N_t)) - H_i(t, α^i_N(t, µ^N_t), µ^N_t, D_i V(t, µ^N_t)) - f^i(t, α^i_N(t, µ^N_t), µ^N_t) + ⟨α^i_N(t, µ^N_t), D^{N,i} V(t, µ^N_t) - D_i V(t, µ^N_t)⟩ dt. Since a*(t, i, m, z) maximizes the pre-Hamiltonian H_i(t, m, z) in (2.7), using (2.9) we obtain, for any a ∈ [0, M]^d,

H_i(t, a*(t, i, m, z), m, z) - H_i(t, a, m, z) ≥ λ|a*(t, i, m, z) - a|². (4.22)

This inequality, together with (4.20) and the uniform boundedness of α_N, yields the claimed estimate, whose l.h.s. is equal to E[V^N(0, µ^N_0)].

Step 2. Considering now the process ρ^N, and applying the SDE representation as in the proof of Lemma 4.1, we obtain the estimate for any t > 0; applying Jensen's inequality, the boundedness and Lipschitz-continuity of the feedback function α* (verified as V is C^{1,1} and α* is given by (2.34)), and then (4.23), we get the claim.

Proof of Theorem 2.13. Thanks to (4.24), claim (2.40) is proved if we show (4.25). Let µ̃^N_t be the empirical measure related to the i.i.d. process X̃. Again (4.15) gives the bound, where in the last inequality we used the exchangeability of the processes. Thus Gronwall's inequality yields (4.25). Finally, observe that, if (C1) is not in force, and thus just (2.35) is satisfied and not (2.38), then (4.23) holds with N replaced by √N and hence, as a consequence of the above proofs, estimates (4.21) and (4.25) hold with √N replaced by N^{1/4}.

Moreover, α* is Lipschitz w.r.t. m because so is DV. We use this fact and also the inequality

|µ^N_x - µ^N_y| ≤ C (1/N) Σ_{k=1}^N |x_k - y_k|, (4.27)

which follows from the definition of the 1-Wasserstein distance, which is equivalent to the Euclidean distance in finite dimension. As in the proof of the previous lemma, we have ϕ(t) := E sup_{s∈[0,t]} |Y^k_s - X̃^k_s| ≤ C E ∫_0^t |α*(s, Y^k_s, ρ^N_s) - α*(s, Y^k_s, µ_s)| ds + C ∫_0^t ϕ(s) ds.