A policy iteration method for Mean Field Games

The policy iteration method is a classical algorithm for solving optimal control problems. In this paper, we introduce a policy iteration method for Mean Field Games systems, and we study the convergence of this procedure to a solution of the problem. We also introduce suitable discretizations to numerically solve both stationary and evolutive problems. We show the convergence of the policy iteration method for the discrete problem and we study the performance of the proposed algorithm on some examples in dimension one and two.

The policy iteration method is usually attributed to Bellman [8] and Howard [24], and it has played a pivotal role in the numerical solution of deterministic and stochastic control problems, both in discrete and continuous settings. It can be interpreted as a linearization method for an intrinsically nonlinear problem, and its global convergence in the finite dimensional case was proved in [24]. Moreover, Puterman and Brumelle [35] observed that the policy iteration method can also be seen as a Newton's method for the nonlinear control problem; therefore, if the initial guess lies in a neighborhood of the true solution, the convergence is quadratic. For continuous control problems, assuming that the control set is bounded, the convergence of the method was obtained by Fleming [22] and Puterman [33,34], who used this procedure to give a constructive proof of the existence of classical and weak solutions to quasilinear parabolic differential equations arising in the control of non-degenerate diffusion processes. By contrast, for deterministic control problems with continuous state space, although the method is widely used to compute the value function and the optimal control, no general convergence result is known. For recent results about the policy iteration method and its applications, see [5,10,26,36].
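For readers less familiar with the classical algorithm, here is a minimal sketch of Howard's policy iteration for a finite-state, finite-action discounted control problem; this is a toy illustration of the classical scheme, with data P, c and discount factor of our own choosing, not taken from the paper.

```python
import numpy as np

def policy_iteration(P, c, gamma, max_iter=100):
    """Howard's policy iteration for min_a { c(x,a) + gamma * sum_y P(y|x,a) V(y) }.
    P: (A, S, S) transition matrices per action, c: (A, S) running costs."""
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)           # initial guess: action 0 everywhere
    for _ in range(max_iter):
        # Policy evaluation: solve the linear system (I - gamma * P_pi) V = c_pi.
        P_pi = P[policy, np.arange(S), :]     # (S, S) rows of the frozen policy
        c_pi = c[policy, np.arange(S)]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
        # Policy improvement: greedy step (the "Newton" linearization).
        Q = c + gamma * (P @ V)               # (A, S) action values
        new_policy = Q.argmin(axis=0)
        if np.array_equal(new_policy, policy):
            break                             # greedy w.r.t. its own value: optimal
        policy = new_policy
    return V, policy
```

At the fixed point, the returned value function satisfies the Bellman equation exactly, which is why the outer loop terminates in finitely many steps for a finite problem.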
In this paper, we consider the following policy iteration algorithm for the MFG system (1.1). Let L(q) be the Lagrangian associated to the Hamiltonian H. Fix R > 0 and a bounded, measurable initial vector field q^{(0)} with |q^{(0)}| ≤ R. At the k-th step, with the policy q^{(k)} frozen, we first update m^{(k)} by means of the forward FP equation (1.2); we then plug the new distribution of agents into (1.3), computing the corresponding value function u^{(k)}; lastly, we determine the new policy q^{(k+1)} corresponding to the value function u^{(k)}. If the coupling cost F is independent of the density m, steps (ii) and (iii) of the previous algorithm coincide with the classical policy iteration method for the HJB equation in (1.1). In our first result, see Theorem 3.1, we prove convergence (up to a subsequence) of the policy iteration method for the MFG system (1.1) assuming that the Hamiltonian is convex and globally Lipschitz, hence in a setting similar to [22,33,34]. Our second result, see Theorem 3.2, deals with Hamiltonians having polynomial growth and states that, for R sufficiently large, the sequence (u^{(k)}, m^{(k)}) given by (1.2)-(1.4) converges (up to a subsequence) to a solution of (1.1). Since this result assumes neither the existence of a solution to (1.1) nor monotonicity, it can also be seen as a constructive proof of the existence of solutions to (1.1). As in [22,33,34], our approach relies on a priori estimates for the solutions of the linear problems (1.2), (1.3) in spaces of maximal regularity, and on compactness properties of the functional spaces where the solution of the (nonlinear) problem is defined. With respect to the former papers, we face two additional difficulties: the method is applied to a system of PDEs instead of a single equation; moreover, in the second result, the Hamiltonian has polynomial gradient growth and therefore the control variable ranges over the whole R^d.
The latter point is addressed by observing that, via an a priori gradient estimate from [19] for the solution to the HJB equation, obtained via duality arguments, the behaviour of H only matters in a sufficiently large ball B(0, R). Hence, one can truncate the Hamiltonian, note that the solution of (1.1) and that of the corresponding truncated problem coincide and, finally, solve the latter problem via the policy iteration method to obtain an approximation of the former one.
In Section 4, we also briefly discuss the stationary counterpart of (1.1), introduced by Lasry and Lions [28]; i.e., we implement a policy iteration algorithm for the ergodic MFG system, where λ stands for the ergodic constant. As is well known, this system describes the long-time average asymptotics of solutions to (1.1), and it is widely analyzed in the literature, see e.g. [15,32] and the references therein. In this case, the convergence of the policy iteration algorithm is proved in Theorem 4.2.
Finally, we introduce suitable discretizations for both stationary and evolutive MFGs, and we employ the policy iteration method to numerically solve the corresponding discrete systems. We show the convergence of the policy iteration method for the discrete problem and we explain that it can be interpreted as a quasi-Newton method applied to the discrete MFG system. Some numerical tests in dimension one and two complete the presentation, including a performance comparison with a full Newton method.
The paper is organized as follows. In Section 2 we collect definitions and some technical lemmas needed to prove the convergence results for the parabolic problem, to which Section 3 is devoted. Section 4 describes the policy iteration method for the stationary ergodic MFG system. Section 5 contains the numerical approximation and the convergence of the policy iteration for the discrete problem, while in Section 6 we present some tests.

Notations and preliminary results
In this section we introduce some functional spaces and state some preliminary results we need in the forthcoming sections.
We denote by L^r(T^d) the space of all measurable and periodic functions on R^d belonging to L^r_loc(R^d), equipped with the norm ‖u‖_r = ‖u‖_{L^r((0,1)^d)}. For µ ∈ (0,1), r ≥ 1, we denote by W^{µ,r}(T^d) the standard fractional Sobolev space of periodic functions u ∈ L^r(T^d) such that the seminorm

[u]_{W^{µ,r}(T^d)} = ( ∫∫ |u(x) − u(y)|^r / dist(x,y)^{d+µr} dx dy )^{1/r}

is finite, endowed with the natural norm ‖·‖_{W^{µ,r}(T^d)} = ‖·‖_r + [·]_{W^{µ,r}(T^d)}. When µ > 1 is non-integer, one writes µ = k + σ, with k ∈ N and σ ∈ (0,1), and W^{µ,r}(T^d) comprises those functions f ∈ W^{k,r}(T^d) (the standard integer-order Sobolev space on the torus) whose distributional derivatives D^α f, |α| = k, belong to the space W^{σ,r}(T^d) defined above. We refer the reader to [37] for a treatment of fractional spaces on the torus, as well as to [18,30] for the definitions via real interpolation in Banach spaces, see also the references therein. For any r ≥ 1, we denote by W^{2,1}_r(Q) the space of functions u such that ∂_t^δ D_x^β u ∈ L^r(Q) for all multi-indices β and δ with |β| + 2δ ≤ 2, endowed with the norm

‖u‖_{W^{2,1}_r(Q)} = ( ∫_Q Σ_{|β|+2δ≤2} |∂_t^δ D_x^β u|^r dx dt )^{1/r}.

We recall that, by classical results in interpolation theory, the sharp space of initial (or terminal) traces of W^{2,1}_r(Q) is the fractional Sobolev class W^{2−2/r,r}(T^d), cf. Corollary 1.14 of [30]. To treat problems with divergence-type terms, we first define W^{1,0}_s(Q) as the space of functions for which the norm ‖u‖_{W^{1,0}_s(Q)} = ‖u‖_{L^s(Q)} + ‖Du‖_{L^s(Q)} is finite. Then, we denote by H^1_s(Q) the space of those functions u ∈ W^{1,0}_s(Q) with ∂_t u ∈ (W^{1,0}_s(Q))', equipped with the natural norm ‖u‖_{H^1_s(Q)} = ‖u‖_{W^{1,0}_s(Q)} + ‖∂_t u‖_{(W^{1,0}_s(Q))'}. For α ∈ (0,1), we define the classical parabolic Hölder space C^{α,α/2}(Q) as the space of functions u ∈ C(Q) such that the seminorm

[u]_{C^{α,α/2}(Q)} = sup_{(x,t)≠(y,s)} |u(x,t) − u(y,s)| / ( dist(x,y)^α + |t − s|^{α/2} )

is finite, where dist(x,y) stands for the geodesic distance from x to y in T^d. If s > d + 2, then H^1_s(Q) is continuously embedded into C^{δ,δ/2}(Q) for some δ ∈ (0,1), see Appendix A of [31].
We now recall some standard parabolic regularity results we will use in the sequel.
Lemma 2.1. For a bounded, measurable drift, the Fokker-Planck equation (1.2) has a unique solution m ∈ H^1_2(Q), which is a.e. non-negative on Q.
Proof. The well-posedness and positivity of m are standard and can be found in [27], while integrability estimates, even under weaker assumptions on the drift, can be found in [9]. When m_0 ∈ W^{1,s}(T^d), the estimate in H^1_s can be obtained following the arguments in Proposition 2.2 of [19], although the regularity result holds even when m_0 ∈ W^{1−2/s,s}(T^d), via maximal regularity.
Lemma 2.2. The linear problem (1.3), with bounded, measurable coefficients b, f and terminal datum u_T ∈ W^{2−2/r,r}(T^d), admits a unique solution u ∈ W^{2,1}_r(Q), and the estimate

‖u‖_{W^{2,1}_r(Q)} ≤ C ( ‖f‖_{L^r(Q)} + ‖u_T‖_{W^{2−2/r,r}(T^d)} ), (2.1)

holds, where C depends on the norm of the coefficients as well as on r, d, T, and remains bounded for bounded values of T. Furthermore, we have Du ∈ C^{α,α/2}(Q) for some α ∈ (0,1). Finally, if the coefficients b, f belong to C^{α,α/2}(Q) and u_T ∈ C^{2+α}(T^d), then u ∈ C^{2+α,1+α/2}(Q).

Convergence of the policy iteration method: the evolutive problem
In this section, we prove the convergence of the policy iteration method for the evolutive problem. Concerning the Hamiltonian, we focus on two different settings:
(i) H is differentiable, convex and globally Lipschitz continuous, i.e. there exists a constant β > 0 such that |D_p H(p)| ≤ β for every p ∈ R^d. (3.1)
We define the Lagrangian L : R^d → R ∪ {+∞} as the Legendre transform of H, i.e. L(ν) = sup_{p ∈ R^d} {p · ν − H(p)}. In particular, if (3.1) holds, then L(ν) = +∞ for |ν| > β. Note that, if (3.1) holds, one can write H(p) = sup_{|q| ≤ β} {p · q − L(q)}, and therefore in this case we may assume that the set of controls is bounded. Concerning the coupling cost, we consider bounded local couplings, assuming that F is continuous and bounded. (3.3) Finally, we suppose that the data u_T and m_0 satisfy suitable regularity assumptions. (3.4)
Our first result concerns the case of a globally Lipschitz Hamiltonian, and extends to MFG systems the works by Fleming and Puterman.
Theorem 3.1. Let assumptions (3.1), (3.3) and (3.4) be in force. Then the sequence (u^{(k)}, m^{(k)}) generated by the policy iteration algorithm converges, up to a subsequence, to a (strong) solution (u, m) ∈ W^{2,1}_r(Q) × H^1_s(Q) of (1.1). Moreover, if the monotonicity assumption (3.5) holds, then the whole sequence converges to the unique solution of (1.1).
Proof. Due to assumption (3.1), the policies q^{(k)} are uniformly bounded. Moreover, by Lemma 2.2 and (3.3), there exists a unique strong solution u^{(k)} ∈ W^{2,1}_r(Q), bounded in that space with a constant C depending only on β. Since r > d + 2, by parabolic Sobolev embeddings and by the hypotheses on H, H(Du^{(k)}) is space-time Hölder continuous. Since the supremum in (3.6) is attained at D_p H(Du^{(k)}), we have q^{(k+1)} = D_p H(Du^{(k)}), so that |q^{(k+1)}| ≤ β. In view of (3.7) and the continuous embedding of H^1_s(Q) into C^{δ,δ/2}(Q) for some δ ∈ (0,1), there exists a subsequence, still denoted by m^{(k)}, which converges uniformly to a continuous function m. By (3.8) and (3.9), there exists a subsequence, still denoted by u^{(k)}, and a function u such that u^{(k)}, Du^{(k)} converge uniformly to u, Du. Consider the subsequence (u^{(k)}, m^{(k)}) obtained by first extracting a subsequence m^{(k)} converging to m and then a subsequence u^{(k)} converging to u. Then, passing to the limit in the weak formulation of (1.1) by means of the aforementioned convergences, one finds that the limit (u, m) ∈ W^{2,1}_r(Q) × H^1_s(Q) is a solution of (1.1). Finally, if assumption (3.5) holds, by a classical argument in [28] (see [18], Thm. 5.1, with s = 1, and observe that it is only necessary to have sufficiently smooth solutions to run the arguments), the system (1.1) has a unique solution (u, m). Hence, since any converging subsequence of the policy iteration method converges to the same limit, the whole sequence (u^{(k)}, m^{(k)}) converges to the unique solution of (1.1).
In the proof of the previous result, we also obtain the uniform convergence of the policy q^{(k)} = D_p H(Du^{(k−1)}) to the optimal control of the limit problem, q = D_p H(Du). We now consider the case of a Hamiltonian of the form H(p) = |p|^γ, for γ > 1. (3.2) Our second main result is the following.
Theorem 3.2. Let assumptions (3.2), (3.3) and (3.4) be in force. Then, for R sufficiently large in (1.4), the sequence (u^{(k)}, m^{(k)}) generated by the policy iteration algorithm converges, up to a subsequence, to a solution (u, m) ∈ W^{2,1}_r(Q) × H^1_s(Q) of (1.1). Moreover, if (3.5) holds, then the whole sequence converges to the unique solution of (1.1).
In this case, the main ingredient of the proof is an a priori gradient estimate, recently obtained in [19], for strong solutions to Hamilton-Jacobi equations with H as in (3.2) and unbounded right-hand sides, which is stated in the next lemma for bounded source terms.
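For the model Hamiltonian H(p) = |p|^γ in (3.2), the Lagrangian can be computed in closed form; the following standard derivation is included for the reader's convenience:

```latex
L(\nu) = \sup_{p \in \mathbb{R}^d} \{\, p \cdot \nu - |p|^{\gamma} \,\}.
% The supremum is attained at p aligned with \nu, with |p| = (|\nu|/\gamma)^{1/(\gamma-1)},
% which gives, with \gamma' = \gamma/(\gamma-1) the conjugate exponent,
L(\nu) = (\gamma - 1)\, \gamma^{-\frac{\gamma}{\gamma-1}}\, |\nu|^{\gamma'},
\qquad \gamma' = \frac{\gamma}{\gamma-1}.
% Sanity check: for \gamma = 2 this yields L(\nu) = |\nu|^2/4, as expected for H(p) = |p|^2.
```

In particular, L has superlinear growth of order γ' conjugate to γ, which is consistent with the polynomial-growth setting of Theorem 3.2.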
Lemma 3.3. Let u ∈ W^{2,1}_r(Q) be a strong solution of the HJB equation in (1.1), with H as in (3.2) and bounded source term f. Then there exists a constant C, depending only on the data and on the structural constants c_H, C_H of the Hamiltonian, such that ‖Du‖_{L^∞(Q)} ≤ C. (3.10)
Proof. The proof of this result, based on the Bernstein gradient estimate and the nonlinear adjoint method, can be found in [19], Theorem 1.3, noting that, since f ∈ L^∞(Q), then f ∈ L^q for every q > 1, and that assumption (3.4) implies u_T ∈ W^{1,∞}(T^d) by standard Sobolev embeddings. We emphasize that the Bernstein procedure can be applied to strong solutions in W^{2,1}_r, arguing as in [20], without the need to differentiate the equation.
We are now in a position to prove Theorem 3.2. In the proof, we first introduce a truncated Hamiltonian, which is globally Lipschitz continuous. This is possible because the solution of the first equation in (1.1) satisfies the gradient bound in Lemma 3.3, which readily implies that the behaviour of H matters only for p ∈ B(0, R), with R ∼ ‖Du‖_{L^∞}; therefore, for R large enough in (1.4), a solution of the truncated problem is also a solution of the original one. As a result, we can apply the convergence result proved in Theorem 3.1 to the MFG system with H given by (3.2).
Proof of Theorem 3.2. Owing to the bound (3.10) and following e.g. [3], one introduces a truncated Hamiltonian H^S, obtained by cutting off H outside the ball B(0, S), and the corresponding truncated MFG system. (3.11) We observe that H^S satisfies the assumptions of Theorem 3.1. Given a solution (u^S, m^S) of the system (3.11), repeating the proof of [19], one first proves the bound ∫_Q ρ^S min{|Du|^γ, S^γ} dxdt ≤ C, with C independent of S, where ρ^S is the solution to the adjoint problem with terminal datum ρ_τ ∈ C^∞(T^d), ‖ρ_τ‖_1 = 1. Then, using the previous estimate, one gets a bound on ‖Du^S(·, τ)‖_{L^∞(T^d)} independent of S. So, if we take S large enough, a solution of (3.11) is also a solution of (1.1). Finally, since H^S is globally Lipschitz continuous, the convergence of the policy iteration method to a solution of (3.11), and therefore of (1.1), follows from Theorem 3.1.
Some comments on the previous result and its proof are in order. Remark 3.5. In the case of a regularizing coupling F and a regular final datum u_T ∈ C^{2+α}(T^d), the convergence results for the policy iteration method in Theorems 3.1 and 3.2 hold in the space C^{2+α,1+α/2}(Q) × H^1_s(Q). Indeed, in this case, it is possible to exploit the Schauder-type estimate in Lemma 2.2, since Du ∈ C^{α,α/2}(Q) by parabolic Sobolev embeddings; therefore the linear HJ equation can be regarded as a linear problem with space-time Hölder coefficients.

The ergodic problem
We consider the stationary MFG system (4.1). For fixed R > 0 and given a bounded, measurable function q^{(0)} with ‖q^{(0)}‖_{L^∞(T^d)} ≤ R, a policy iteration method for (4.1) is given by the steps (4.2)-(4.4). We have the following convergence theorem for the stationary case.
Theorem 4.2. The sequence generated by the policy iteration algorithm (4.2)-(4.4) converges, up to a subsequence, to a solution of (4.1). Moreover, if the monotonicity assumption (3.5) holds, then the whole sequence converges to the unique solution of (4.1).
Proof. The proof is similar to that of the parabolic case, and we do not give the details. To obtain the stationary analogue of Lemma 3.3, it is enough to adapt the proof in [19], considering the dual equation with datum ψ ∈ C^∞(T^d), ‖ψ‖_1 = 1, which plays the same role as the initial datum of the parabolic adjoint problem in [19]. As an alternative, one can use the integral Bernstein gradient estimate in [29], Theorem III.1 (see also [20]), as a counterpart of that in Lemma 3.3.

Numerical approximation
In this section, we present some details on the numerical approximation of the stationary/evolutive MFG systems, and we prove the convergence of the corresponding discrete policy iteration method in a simple setting. We consider the reference case of the Eikonal-diffusion HJB equation, whose Hamiltonian involves a given bounded potential V, and we focus on the stationary ergodic problem (4.1).
We define a grid G on T d , the vectors U, M approximating respectively u, m at the grid nodes, and the number Λ approximating the ergodic cost λ. Then, we approximate (4.1) by the following nonlinear problem on G, where, in order to avoid cumbersome notation, we use the symbol to denote suitable discretizations of the linear differential operators, evaluations of functions at the grid nodes, and quadrature rules for the integrals. Typical choices on uniform grids are centered second order finite differences for the discrete Laplacian, and simple rectangular quadrature rules for the integral terms, whereas the Hamiltonian and the divergence term in the FP equation are both computed via the Engquist-Osher numerical flux for conservation laws. For instance, in dimension d = 1, given a uniform discretization of T d with I nodes x i , for i = 0, . . . , I − 1, and space step h = 1/I, we have where the index operator [·] = {(· + I) mod I} accounts for the periodic boundary conditions. Moreover, using the notation (·) + = max {·, 0} and (·) − = min {·, 0}, we have We refer the reader to [1,4,13] and the references therein, for further details on the discretization of MFG systems.
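The one-sided differences and the Engquist-Osher numerical Hamiltonian described above can be sketched in a few lines; this is a minimal illustration assuming the quadratic model Hamiltonian |u_x|^2 in dimension one (the precise Hamiltonian, potential and scaling used in the paper may differ):

```python
import numpy as np

def one_sided_gradients(U, h):
    """Left/right difference quotients on a uniform 1D periodic grid.
    Indices wrap around, mirroring the operator [.] = (. + I) mod I."""
    DL = (U - np.roll(U, 1)) / h      # backward difference: D_L U_i = (U_i - U_{i-1})/h
    DR = (np.roll(U, -1) - U) / h     # forward  difference: D_R U_i = (U_{i+1} - U_i)/h
    return DL, DR

def engquist_osher_hamiltonian(U, h):
    """Upwind (Engquist-Osher) discretization of the quadratic Hamiltonian |u_x|^2:
    only the 'inflowing' parts (D_L U)^+ and (D_R U)^- enter the scheme."""
    DL, DR = one_sided_gradients(U, h)
    return np.maximum(DL, 0.0)**2 + np.minimum(DR, 0.0)**2
```

The same pair (D_L U, D_R U) is what the policy update of the discrete algorithm uses, which is why the discrete policy naturally has 2d components.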
It is well known that the two-sided gradient D is designed to approximate viscosity solutions of Hamilton-Jacobi equations and, for first order equations, to correctly catch possible kinks in the solution U. It is worth noting that, at a formal level, D U acts in the scheme as a vector field with 2d components, double the dimension d of the problem. This suggests a natural way to approximate the policy q in (4.2)-(4.3) when building the policy iteration algorithm. Indeed, given an initial guess Q = (Q_L, Q_R) : G → R^{2d} and using the notation Q^± = (Q_L^+, Q_R^-), we set Q^{(0)} = Q and we iterate on k ≥ 0 the following steps: (i) solve the discrete FP equation for M^{(k)} with the policy Q^{(k)} frozen; (ii) solve the linearized discrete HJB equation for (U^{(k)}, Λ^{(k)}); (iii) update the policy, setting Q^{(k+1)} = D U^{(k)}. In the following theorem, we prove the convergence of the above discrete policy iteration in the case of a quadratic Hamiltonian and in dimension one, but the argument can be extended with similar techniques to any dimension and to more general Hamiltonians.
Theorem 5.1. For R in (5.2) sufficiently large, the sequence (U (k) , Λ (k) , M (k) ), generated by the policy iteration algorithm, converges, up to a subsequence, to a solution (U, Λ, M ) of (5.1). Moreover, if (3.5) holds, then all the sequence converges to the unique solution of (5.1).
Proof. We first show that the policy iteration algorithm is well defined. To this end, we begin with the discrete FP equation in (i), namely we consider the matrix A(Q), for a given Q = (Q_L, Q_R) : G → R^2 such that |Q| ≤ R for some R > 0, whose non-zero entries come from the centered discrete Laplacian and the upwind discretization of the divergence term. We claim that A(Q) is singular (e.g., for Q = 0, we simply get the discrete Laplacian with periodic boundary conditions, which has a zero eigenvalue). More precisely, we show that dim(ker(A(Q))) = 1. By the Fredholm alternative, dim(ker(A(Q))) = dim(ker(A^T(Q))), where the transpose matrix is exactly the linear operator appearing in the linearized HJ equation in (ii) (indeed, this duality is exploited in [1], Remark 1, to define the discrete divergence operator div). Since the Hamiltonian is continuous, non-decreasing with respect to p_1, non-increasing with respect to p_2 and convex, it can be proved that the equation A^T(Q)U = 0 admits only constant solutions (C, ..., C) ∈ R^{|G|}, C ∈ R, see step 1 of [1], Theorem 1. We conclude that dim(ker(A^T(Q))) = 1, and the claim is proved. We now build a solution M ∈ R^{|G|} of the discrete FP equation A(Q)M = 0 satisfying M ≥ 0 and Σ_i M_i = 1. To this end, we recall that |Q| ≤ R and we observe that A(Q) has a non-negative diagonal and non-positive off-diagonal entries, since by definition Q_L^+ ≥ 0 and Q_R^- ≤ 0. This implies that, for µ > 0 sufficiently large, the matrix µI + A(Q) is a non-singular M-matrix; hence the following iterations on s ≥ 0 are well defined: (µI + A(Q))W^{(s+1)} = µW^{(s)}.
Moreover, if we choose W^{(0)} ∈ R^{|G|} \ {0} such that W^{(0)} ≥ 0 and Σ_i W_i^{(0)} = 1, the same properties hold for every s ≥ 0, respectively by the monotonicity of the M-matrix and by the definition of A(Q). In particular, the sequence W^{(s)} is bounded; therefore it converges, up to a subsequence, to a vector M ≥ 0 satisfying Σ_i M_i = 1 and A(Q)M = 0. Since dim(ker(A(Q))) = 1, it follows that M is the unique solution of the discrete FP equation satisfying Σ_i M_i = 1. In particular, the whole sequence W^{(s)} is convergent, and it provides a constructive way to compute M. Remarkably, the condition Σ_i M_i = 1 alone is enough to prevent a change of sign in M; hence, a posteriori, the condition M ≥ 0 can be omitted. Summing up, we proved that, for every Q^{(k)} : G → R^2 such that |Q^{(k)}| ≤ R, there exists a unique solution to the problem in step (i) of the policy iteration.
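The M-matrix regularization used in this argument can be reproduced numerically. The sketch below assembles an illustrative 1D periodic matrix A(Q) with the sign pattern described above (non-negative diagonal, non-positive off-diagonals, columns summing to zero, so that constants lie in ker(A^T)) and runs the iterations (µI + A(Q))W^{(s+1)} = µW^{(s)}; the exact entries are our reconstruction, not the paper's formulas:

```python
import numpy as np

def fp_matrix(QL, QR, eps, h):
    """Illustrative 1D periodic discretization A(Q) of the stationary FP operator
    -eps*Laplacian - div(. Q) with upwinded divergence (a reconstruction)."""
    I = QL.size
    QLp = np.maximum(QL, 0.0)         # Q_L^+ >= 0
    QRm = np.minimum(QR, 0.0)         # Q_R^- <= 0
    A = np.zeros((I, I))
    for i in range(I):
        A[i, i] = 2*eps/h**2 + (QLp[i] - QRm[i])/h       # non-negative diagonal
        A[i, (i - 1) % I] = -eps/h**2 - QLp[(i - 1) % I]/h  # non-positive
        A[i, (i + 1) % I] = -eps/h**2 + QRm[(i + 1) % I]/h  # non-positive
    return A

def invariant_density(A, mu=1.0, iters=200):
    """(mu*I + A) W^{s+1} = mu * W^s; the total mass is conserved because the
    columns of A sum to zero, and nonnegativity is preserved by the M-matrix."""
    I = A.shape[0]
    W = np.full(I, 1.0 / I)           # nonnegative initial guess of unit mass
    B = mu * np.eye(I) + A
    for _ in range(iters):
        W = np.linalg.solve(B, mu * W)
    return W
```

Note that with this sign pattern any µ > 0 already makes µI + A(Q) strictly column diagonally dominant, hence a non-singular M-matrix, so the iterations are well defined.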
We now consider the problem in step (ii). Using again the Hamiltonian (5.3), with Q^{(k)} in place of Q, it can be proved (as in step 1 of [1], Thm. 1) that the problem admits a unique solution (U^{(k)}, Λ^{(k)}) satisfying the normalization condition Σ_i U_i^{(k)} = 0, together with bounds on |Λ^{(k)}| and on the discrete gradient of U^{(k)}, for two positive constants C_1, C_2 depending on R, max_G |V| and max_G |F|. Hence the sequence (U^{(k)}, Λ^{(k)}) is also bounded. Therefore, up to a subsequence, we find that, as k → ∞, (U^{(k)}, Λ^{(k)}, M^{(k)}) converges to (U, Λ, M) ∈ R^{|G|} × R × R^{|G|} and Q^{(k)} converges to some Q : G → R^2. Moreover, passing to the limit in (i)-(iii), (U, Λ, M) satisfies the system (5.4), which coincides with (5.1) up to the truncation of the policy at level R. By [1], Theorem 3, (5.1) has a solution and, since the problem is discrete, it trivially satisfies a bound on the discrete gradient of U, max_G |D U| ≤ C, (5.5) for some constant C depending on h. Hence, for R sufficiently large, solutions to (5.4) are also solutions to (5.1) and therefore, for such R, we get the convergence of the policy iteration method. Moreover, if (3.5) holds, then the solution of (5.1) is unique (see [1], Theorem 3), and therefore the whole sequence converges.
Remark 5.2. As observed, the estimate (5.5) is not uniform in h. However, since we study the convergence of the policy iteration method for fixed h, this is not an issue at this level. Estimates on the discrete gradient which are uniform in h are provided in [1,2], also for more general Hamiltonians. They are important for studying the convergence of the discrete problem to the continuous one, but we do not consider this point here.
For the sake of comparison, we consider here the direct method for stationary MFGs introduced in [13], which is based on a Newton-like algorithm applied to the full system (5.1), rewritten as a multidimensional root-finding problem. More precisely, performing a linearization of (5.1) with respect to (U, M, Λ), along a direction (W_U, W_M, W_Λ), and starting from an initial guess (U^{(0)}, M^{(0)}, Λ^{(0)}), we get the Newton iterations (5.6) for k ≥ 0, where, denoting by |G| the number of nodes of G, the map F : R^{2|G|+1} → R^{2|G|+2} collects the discrete HJB and FP equations together with the normalization conditions, while the Jacobian matrix J contains the corresponding derivative blocks, with 0 = (0, ..., 0)^T ∈ R^{|G|} and 1 = (1, ..., 1)^T ∈ R^{|G|}. Note that, for each k ≥ 0, the above linear system consists of 2|G| + 2 equations in the 2|G| + 1 unknowns (W_U, W_M, W_Λ), and its solution is meant in a least-squares sense, see [13] for further details. Rewriting (W_U, W_M, W_Λ) in terms of successive iterations, and since both U^{(k)} and M^{(k)} are expected to converge, we can neglect, for k large, the two terms F'(M^{(k)})(M^{(k+1)} − M^{(k)}) and div(M^{(k)} D(U^{(k+1)} − U^{(k)})). This completely decouples the system and yields exactly the policy iteration algorithm, by setting Q^{(k)} = D U^{(k)} at each iteration. Thus, we can reinterpret the policy iteration as a quasi-Newton method for the system (5.1), obtained by dropping the two corresponding off-diagonal blocks in the Jacobian matrix J.
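The effect of dropping the off-diagonal coupling blocks can be illustrated on a toy coupled system of our own devising (unrelated to the MFG discretization): a full Newton method, versus a quasi-Newton method that keeps only the diagonal blocks of the Jacobian. The latter still converges, though only linearly:

```python
import numpy as np

def F(x):
    """Toy coupled system: u + m^2/2 = 1, m + u^2/2 = 1 (symmetric root u = m)."""
    u, m = x
    return np.array([u + 0.5*m**2 - 1.0, m + 0.5*u**2 - 1.0])

def newton(x, tol=1e-12, it=100):
    for k in range(it):
        u, m = x
        J = np.array([[1.0, m], [u, 1.0]])   # full Jacobian, with coupling blocks
        x = x - np.linalg.solve(J, F(x))
        if np.linalg.norm(F(x)) < tol:
            return x, k + 1
    return x, it

def quasi_newton(x, tol=1e-12, it=1000):
    for k in range(it):
        J = np.eye(2)                        # off-diagonal coupling dropped
        x = x - np.linalg.solve(J, F(x))
        if np.linalg.norm(F(x)) < tol:
            return x, k + 1
    return x, it
```

Both iterations reach the same root u = m = sqrt(3) − 1, but the full Newton method does so in far fewer iterations, mirroring the comparison between the direct method of [13] and the policy iteration.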

Numerical simulations
Here, we provide some details on the implementation of the policy iteration method. Then we present a comparison with the direct Newton method (5.6) for a stationary MFG system in dimension one, and a test in dimension two for the evolutive case.
Concerning the stationary case, at each iteration k, the solution of the discrete FP equation in step (i) is obtained by the M-matrix regularization discussed in Theorem 5.1: starting from an initial guess W^{(0)} ∈ R^{|G|} \ {0}, with W^{(0)} ≥ 0 and Σ_i W_i^{(0)} = 1, we solve iteratively on s ≥ 0 (µI + A(Q^{(k)}))W^{(s+1)} = µW^{(s)}, namely a sequence of linear systems of size |G| × |G|. Note that this introduces an additional (inner) iteration in the policy iteration algorithm. Moreover, by rewriting each linear system in the form W^{(s+1)} = W^{(s)} − (1/µ) A(Q^{(k)}) W^{(s+1)}, we can reinterpret the regularization as an implicit gradient descent scheme with step 1/µ for finding the zeros of A(Q^{(k)}), via minimization of its associated quadratic form. It is clear that, as we increase µ to recover the M-matrix property, we dramatically lose the advantage of an implicit scheme, slowing down the convergence of W^{(s)}. In practice, we can tune the parameter µ for the specific test and perform only a fixed number of inner iterations.
For the remaining steps of the policy iteration algorithm, we observe that step (ii) corresponds to the solution of a square linear system of size (|G| + 1) × (|G| + 1) in the unknowns (U (k) , Λ (k) ), while the policy update in step (iii) is explicit due to the particular choice for the Hamiltonian. In the general case, according to (4.4), a point-wise optimization on G is needed to obtain the new policy.
On the other hand, each iteration of the direct Newton method (5.6) requires the solution of a system of size (2|G| + 2) × (2|G| + 1) in a least-squares sense. Both algorithms are implemented in the C language, employing the free library SuiteSparseQR [21] for solving the linear systems via QR factorization. To check convergence, given a tolerance τ > 0, we rely on the 2-norm of the residual of the discrete nonlinear system (5.7), requiring ‖F(U^{(k)}, M^{(k)}, Λ^{(k)})‖_2 < τ.
In the following test, we set the problem in dimension d = 1, with τ = 10^{-8}, ε = 0.3, V(x) = sin(2πx) + cos(4πx) and F(m) = m^2. In particular, the choice of the coupling cost satisfies the monotonicity assumption (3.5), ensuring uniqueness of solutions for the MFG system. Moreover, we set the initial guess for the Newton method as U^{(0)} ≡ 0, M^{(0)} ≡ 1 on G and Λ^{(0)} = 0, while we take the initial policy Q^{(0)} ≡ (0, 0) on G for the policy iteration algorithm. Finally, for the inner M-matrix iterations, we set µ = 10^{-3} and s = 1, with W^{(0)} ≡ 1 for k = 0 and W^{(0)} = M^{(k-1)} for k ≥ 1. Figure 1 shows the solution computed by the policy iteration algorithm on a grid with |G| = 200 nodes, while in Figure 2 we compare the performance of the two methods. More precisely, in Figure 2a, we show the residuals of the two methods against the number of iterations needed to reach the given tolerance τ. The Newton method converges in just 5 iterations, while the policy iteration requires 24 iterations. Similarly, in Figure 2b-c-d we show the differences between the solutions of the two methods in the discrete L^2 norm, against the number of iterations. Due to the particular choice of the initial guess, at the first iteration the two methods compute the same solution, but the policy iteration algorithm requires more iterations to reach the same accuracy for the residual. This is expected, since the Newton method employs the descent direction associated to the full Jacobian matrix J. Nevertheless, as reported in Table 1, the policy iteration exhibits better performance as the number of grid nodes increases, due to the reduced size of the corresponding linear systems (see the average CPU times per iteration). We must observe that the comparison is not entirely fair, since the update step of the policy iteration is explicit in this example, with a negligible computational cost.
However, in the general case, we expect that the significant speed-up of the proposed algorithm on large grids can compensate for the effort of the optimization process (4.4), since the latter is a point-wise procedure that can be completely parallelized. Now, let us consider the evolutive MFG system (1.1), again in the special case of the Eikonal-diffusion HJB equation, but in dimension d = 2. Spatial discretization is performed in both dimensions as in the one-dimensional case, while, for time discretization, we employ an implicit Euler method for both the time-forward FP equation and the time-backward HJB equation. To this end, we introduce a uniform grid on the interval [0, T] with N + 1 nodes t_n = n dt, for n = 0, ..., N, and time step dt = T/N. Then, we denote by U_n, M_n and Q_n the vectors on G approximating respectively the solution and the policy at time t_n. In particular, we set on G the initial condition M_0 = m_0(·) and the final condition U_N = u_T(·). The policy iteration algorithm for the fully discretized system proceeds as in the stationary case: given an initial guess Q_n^{(0)} for n = 0, ..., N, at each iteration we solve the discrete FP equation forward in time, the linearized HJB equation backward in time, and then update the policy at every time node. Note that each iteration of the algorithm now requires the solution of 2N linear systems of size |G| × |G|.
In the following test, we choose I = 50 nodes for each space dimension and N = 100 nodes in time, corresponding to 200 linear systems of size 2500 × 2500 per iteration. We set the final time T = 1, the diffusion coefficient ε = 0.3, the coupling cost F(m) = m^2 and the potential V(x_1, x_2) = −|sin(2πx_1) sin(2πx_2)|. Moreover, to check convergence, we rely on the discrete L^2 squared distance between policies at successive iterations, i.e. we stop the algorithm when max_n |Q_n^{(k+1)} − Q_n^{(k)}|^2 < τ, setting the tolerance τ = 10^{-8}. Finally, we take the initial policy Q^{(0)} ≡ 0, while the data m_0 and u_T are chosen as two Gaussians with opposite signs, centered at the point (1/2, 1/2), with C > 0 such that ∫_{T^2} m_0(x) dx = 1. The algorithm requires 58 iterations to reach convergence up to τ, with an average CPU time per iteration of 7.3 seconds and a total CPU time of 423 seconds. In Figure 3, we report some relevant frames of the time evolution by plotting, for fixed n, the density M_n in gray scale and superimposing the optimal dynamics for the FP equation, obtained by merging the two-sided components of Q_n, namely (Q_{n,L}^1 + Q_{n,R}^1, Q_{n,L}^2 + Q_{n,R}^2). We remark that, by definition, the absolute minimum of the potential V is achieved at the points (1/4, 1/4), (3/4, 1/4), (1/4, 3/4), (3/4, 3/4). We observe that the optimal dynamics readily splits the density symmetrically into four parts, pushing them to concentrate around these minimizers, while, in the final part of the time interval [0, T], it forces the density to merge again and concentrate exactly around the point (1/2, 1/2) (i.e. the absolute minimizer of u_T), in order to satisfy the final condition for the HJB equation. This configuration corresponds to the so-called turnpike phenomenon [32].
Roughly speaking, it turns out that the solution of the evolutive problem approaches the solution of the corresponding stationary ergodic problem, staying close to this equilibrium for as long as possible before moving again towards u_T.