EXTENDED MCKEAN-VLASOV OPTIMAL STOCHASTIC CONTROL APPLIED TO SMART GRID MANAGEMENT

Abstract. We study the mathematical modeling of the energy management system of a smart grid, related to an aggregated consumer equipped with renewable energy production (e.g. PV panels).


Introduction
The energy sector is currently facing major changes because of the rising concern about climate change, the search for energy efficiency and the need to reduce the carbon footprint. In particular, the share of renewable energy (RE for short) production has increased in most industrialized countries over the last few years, and further efforts are needed to limit the temperature increase to well below 2 °C by 2100, as targeted by the 2015 Paris agreement. However, even if these renewable energies allow a huge reduction of the carbon footprint during the energy production phase, they raise a major issue: the amount of energy produced is intermittent and uncertain, in contrast with more conventional energy production units (coal/gas-fired units, or nuclear power plants).
Since electricity production has to meet consumption at all spatial and time scales, load balancing operations become harder in this uncertain context; this leads to higher operating costs for the whole electricity system. Furthermore, it sometimes leads to ecologically catastrophic solutions, such as the use of coal units to compensate for the deficit of clean energy production. See [19] for an overview of how to integrate renewables in electricity markets. Therefore, a major challenge is to smooth the electricity consumption by better predicting RE production and better managing the energy system. We address the latter in the context of a consumer equipped with his own RE production (e.g. PV panels), and formalize the problem as a stochastic control problem of McKean-Vlasov (MKV for short) type, which we solve theoretically and numerically. More specifically, we study a decentralized mechanism aimed at reducing the variability of the residual consumption on the electricity network; thus, operating the network could be done at lower cost and with a lower carbon footprint. In this mechanism, a consumer has to commit in advance (say T = one day ahead, to match the usual working of day-ahead markets) to a predefined load profile; then, he has to control his system optimally and dynamically according to his stochastic consumption/production. Both the optimal load profile and the optimal control are outputs of the stochastic control problem described below. The above model is a simplified prototype of a smart grid (as defined by the European Commission 1): our so-called consumer is an association of small consumers, with possibly individual RE production and individual storage facilities, that we aggregate and consider as a whole.
We take the point of view of a consumer supplied with energy by his own intermittent sources (PV panels for instance) and by the electrical public grid. We consider the situation where the non-flexible consumption and the intermittent production are exogenous and cannot be predicted perfectly: a stochastic model should be used for both of them. See [4] for a recent methodology deriving probabilistic forecasts of solar irradiance (and thus of PV production). To smooth his residual consumption, the consumer can take advantage of storage facilities (for instance conventional batteries, electrical vehicle batteries, heating network, flywheel, etc.), which we consider as a whole. At time t, his control is denoted by u_t, the level of storage is represented by X^u_t, and his net consumption on the electrical public grid is P^{grid,u}_t. The (deterministic) committed load profile is the curve (P^{grid,com.}_t : 0 ≤ t ≤ T). Optimal control of a single micro-grid has already been considered in the literature, without the optimal committed load profile. A popular approach, yet without theoretical optimality guarantees, is Model Predictive Control [26]. In discrete-time settings, Stochastic Dynamic Programming [18,29] and Stochastic Dual Dynamic Programming [20] are popular approaches with theoretical optimality guarantees. Long-term aging of the battery equipping a micro-grid is taken into account by a two-time-scale decomposition in [12]. Continuous-time optimal control problems are considered in [15] in a deterministic setting, and in [16] in a stochastic environment. By jointly optimizing over the profile P^{grid,com.}, we change the nature of the stochastic control problem compared to these works. We shall consider general filtrations with processes possibly exhibiting jumps, to account for sudden variations of solar irradiance or consumption, for instance.
In short, in a simplified setting, the optimization criterion takes the form of a cost functional minimized over admissible controls (u_t)_t. The first term in this cost functional is the cost of buying electricity from the electrical public grid, at a price C_t which can be random. The second term accounts for a penalization of the use of the storage (e.g. aging cost in the case of a battery). The third and fifth terms penalize the deviation from the desired state of charge of the storage, which we set to 1/2 by convention. The fourth term is a penalization (through a convex loss function l_1) of the deviation of the power P^{grid,u} supplied by the electrical public grid from the commitment profile P^{grid,com.}. The latter is chosen in such a way that the variability of the residual consumption on the electrical public grid is minimized in a consistent way. On the side of the electricity supplier on the electrical public grid, since the consumption is close to a deterministic profile chosen in advance, the operating costs are lower and the use of fossil-fueled generation units can likely be avoided. We highlight that good loss functions l_1 should presumably penalize consumption exceedance more than consumption deficit: indeed, exceedance possibly requires the use of extra production units with a high carbon footprint, which should clearly be avoided as often as possible. A typical example of loss function is given in (1.1); see Figure 1 for an example with α_+ = 1, α = 1. This choice is somehow related to generalized risk measures accounting for both left and right tails of the distribution, such as expectiles [6]. If P^{grid,com.} were exogenously given, the problem would take the form of a standard stochastic control problem.
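The displayed cost functional and the loss (1.1) were lost in extraction; a form consistent with the five terms just described (our sketch, with hypothetical weights ν_t, γ_t, γ_T, to be checked against the original) would be:

```latex
J(u) = \mathbb{E}\Big[ \int_0^T \Big( C_t\, P^{grid,u}_t + \nu_t\, u_t^2
     + \gamma_t \big( X^u_t - \tfrac12 \big)^2
     + l_1\big( P^{grid,u}_t - P^{grid,com.}_t \big) \Big)\,\mathrm{d}t
     + \gamma_T \big( X^u_T - \tfrac12 \big)^2 \Big],
\qquad
l_1(x) = \alpha\, x^2 + \alpha_+\, (x_+)^2,
```

with α, α_+ ≥ 0 and x_+ := max(x, 0), so that an exceedance (x > 0) is penalized with weight α + α_+ while a deficit is penalized with weight α.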
In our model, it is endogenous and its optimal value at time t is obtained by solving the stochastic optimization problem (1.2). In general, only a numerical solution of this stochastic optimization problem is available. In addition, there is one such problem for each t ∈ [0, T], hence an uncountable family of such problems. Moreover, these stochastic optimization problems depend on u, which is in turn the optimal solution of a stochastic control problem parameterized by the solutions P^{grid,com.} of the stochastic optimization problems. Hence, one could try to employ a fixed-point iterative procedure, consisting in:
- given P^{grid,com.}_t for all t ∈ [0, T], solving a stochastic control problem to find and update u;
- given u, solving the infinite family of stochastic optimization problems (1.2) to update P^{grid,com.};
- starting over again until convergence is reached.
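On a one-period, discretized toy version of this loop (the scenarios, the quadratic choices and all numbers below are our own illustrative assumptions, not the paper's algorithm), the fixed-point iteration can be sketched as:

```python
import numpy as np

# Toy one-period version of the fixed-point loop (illustrative assumptions):
# given the commitment pcom, the consumer minimizes, scenario by scenario,
#   nu * u^2 + (P_load - u - pcom)^2       (storage penalty + quadratic l_1),
# then the commitment is updated to the mean residual consumption, which is
# the minimizer of E[l_1(P_grid - P)] over P for the quadratic l_1.
rng = np.random.default_rng(0)
P_load = rng.normal(1.0, 0.3, size=10_000)   # scenarios of net load (hypothetical)
nu = 0.5                                     # storage-use penalty weight

def best_u(pcom):
    # first-order condition in u: 2*nu*u - 2*(P_load - u - pcom) = 0
    return (P_load - pcom) / (1.0 + nu)

pcom = 0.0
for _ in range(200):                         # contraction with rate 1/(1 + nu)
    u = best_u(pcom)
    pcom = float(np.mean(P_load - u))        # update commitment to the mean

# at the fixed point, pcom = E[P_load]: the battery only absorbs fluctuations
assert abs(pcom - P_load.mean()) < 1e-9
```

Here the quadratic l_1 makes both subproblems explicit; with an asymmetric l_1, each step would itself require a numerical solver, which is exactly the difficulty discussed above.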
For simplification, we employ a different approach by setting (1.3). This amounts to setting P^{grid,com.} heuristically to a consistent (though in general no longer optimal) value, depending endogenously on u, which yields a different but closely related optimization problem for finding good values of u and P^{grid,com.}. This choice is inspired by the quadratic case for l_1: indeed, solving (1.2) for l_1 : x ↦ x² leads to (1.3), as the reader can easily check. Doing so, we obtain a stochastic control problem of MKV type with scalar interactions, see below. As mentioned earlier, with the choice of loss function (1.1) with α_+ > 0, the choice (1.3) is no longer an optimal solution of (1.2) in general. However, one can estimate the resulting error: choosing P^{grid,com.} as in (1.3) is a proxy for solving (1.2) with l_1 given by (1.1), and it remains accurate as long as α_+ is small compared to α. Moreover, note that the presence of the expectation of the controlled process P^{grid,u} in the criterion arises directly from the aim of controlling its probability distribution (by choosing P^{grid,com.} as in (1.3)). Therefore, the underlying reason for using a McKean-Vlasov formulation here is conceptually different from another usual application of such models, namely the asymptotic behavior of a large number of interacting agents; see for instance [2].
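The displays (1.2) and (1.3) did not survive extraction; a sketch consistent with the surrounding discussion (our reconstruction, to be checked against the original) reads:

```latex
% (1.2): commitment as the minimizer of the expected deviation loss
P^{grid,com.}_t \in \operatorname*{arg\,min}_{P \in \mathbb{R}}\;
    \mathbb{E}\big[\, l_1\big( P^{grid,u}_t - P \big) \big], \qquad t \in [0,T];
% (1.3): the heuristic choice actually used in the sequel
P^{grid,com.}_t := \mathbb{E}\big[ P^{grid,u}_t \big].
% For the quadratic loss l_1(x) = x^2, (1.3) solves (1.2), since
\mathbb{E}\big[ (X - P)^2 \big] = \operatorname{Var}(X) + \big( \mathbb{E}[X] - P \big)^2 .
```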
Another point to stress is the need to account for jumps in the production/consumption dynamics: the consumption might have discontinuities as appliances/devices are switched on/off, and the power production of a solar panel might suddenly drop to zero if a cloud hides the sun. To summarize, in order to fit application needs, we shall consider non-quadratic loss functions and a probabilistic setting with a general filtration (allowing jumps).
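As a purely illustrative sketch (our own toy model, not the paper's), such a jump-affected PV production path can be simulated by toggling a clear-sky profile through Poisson-arriving cloud events:

```python
import math
import random

# Toy simulation of PV production with cloud-induced jumps (illustrative only):
# a clear-sky bell profile over the day, multiplied by an on/off state that
# toggles at Poisson-arriving cloud events, producing sudden drops to zero.
random.seed(1)
T_hours, n = 24.0, 2400
h = T_hours / n
lam = 0.5                    # cloud arrival/clearing rate per hour (hypothetical)

sunny, pv = True, []
for k in range(n):
    if random.random() < lam * h:       # Poisson toggle over a small step h
        sunny = not sunny
    t = k * h
    clear_sky = max(0.0, math.sin(math.pi * (t - 6.0) / 12.0))  # daylight 6h-18h
    pv.append(clear_sky if sunny else 0.0)

assert len(pv) == n and all(0.0 <= p <= 1.0 for p in pv)
```

Paths of this kind are discontinuous, which is why the analysis below avoids the Brownian-filtration assumption.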
We embed the previous example in a more general setting (1.5). The functions l, g, ψ, k, φ depend on time, control, state variable and on the ambient randomness ω; precise assumptions are given later. Note that the control only appears in the drift of the state variable: we could also have considered a more general model X^u_t = x + ∫_0^t φ(s, ω, u_s, X^u_s) ds + Z_t, where Z is a càdlàg semi-martingale (independent of u), but actually this extended model is equivalent to the current one by taking X̃^u_t := X^u_t − Z_t as a new state variable and adjusting the (already random) coefficients. Besides, note that the above dynamics for X^u is compatible with usual battery dynamics [17], for example models of the form d(State of charge)/dt = constant · Battery power. (1.6) The problem (1.5) is of McKean-Vlasov (MKV) type, since the distribution of (u, X^u) enters the cost functional. But since it does so through generalized moments via the functions g and k, the interactions are so-called scalar, which avoids using the notion of derivatives with respect to probability measures, while maintaining some interesting flexibility. For a full account of the control of Stochastic Differential Equations (SDE for short) of MKV type and the link with Mean Field Games, see the recent books [11], and in particular Chapter 6 of Volume I. However, in the above reference, only the distribution of the state enters the coefficients, not that of the control as in our setting. We refer to this more general setting as extended MKV stochastic optimal control. Studies in such an extended framework are quite unusual in the literature. In [23], the general discrete case is studied. In [31], and very recently in [5], the probability distributions of both the state and control variables appear in the dynamics of the state and in the cost function, but only through their first and second order moments (Linear-Quadratic problems, LQ for short).
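The display (1.5) itself was lost in extraction; judging from the roles of l, g, ψ, k, φ described above and in Section 2, it presumably has a form along the lines of (our sketch):

```latex
J(u) = \mathbb{E}\Big[ \int_0^T l\big( t, u_t, X^u_t,\ \mathbb{E}[g(t, u_t, X^u_t)] \big)\,\mathrm{d}t
       \;+\; \psi\big( X^u_T,\ \mathbb{E}[k(X^u_T)] \big) \Big]
\;\longrightarrow\; \min_u,
\qquad
X^u_t = x + \int_0^t \varphi\big( s, \omega, u_s, X^u_s \big)\,\mathrm{d}s .
```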
In [24], the cost functional and the dynamics both depend on the joint probability distribution of the state and control variables, but the authors consider closed-loop controls, which allows them to consider the probability distribution of the state variable only: in our setting, we do not make any Markovian assumption for the characterization of the optimal control. During the preparation of this work (started in 2016), we became aware of the recent preprint [1], which also deals with extended MKV stochastic optimal control, with fully non-linear interaction and Markovian dynamics, in the case of a Brownian filtration.
In contrast with the previous references, we do not restrict ourselves to the LQ setting, we deal with extended MKV stochastic optimal control without Markovian assumptions, and we do not assume that the underlying filtration is Brownian (thus allowing jump processes). Besides, apart from "expected" results about existence/uniqueness, we provide numerical approximations by using a perturbation analysis around the LQ case. We stress that MKV stochastic control is a very recent field and numerical methods are still in their infancy; see [3] for a scheme based on tree methods for solving some MKV Forward-Backward SDEs (FBSDE for short) that characterize optimal stochastic controls. Our perturbation approach is different from theirs. As a consequence, we design an effective numerical scheme to address the problem raised by the optimal management of storage facilities able to reduce the variability of residual electricity consumption on the electrical public grid, in the context of uncertain production/consumption of an aggregated consumer. This presumably opens the door to a wider use of these approaches in real smart grid applications. Now let us go into the details of the mathematical/computational arguments. For characterizing the optimal control, we follow a quite standard methodology (see e.g. [10]), although the details are quite different. This is done in three steps: necessary first order conditions, which become sufficient under additional convexity assumptions, and existence of solutions to the first order equations. The derivation of the first order conditions follows the stochastic Pontryagin principle, see for instance [7,10,22]. This is achieved for general running and terminal cost functions. In particular, to account for jumps in the production/consumption dynamics, our mathematical analysis is performed in the context of a general filtration. It gives rise to an optimality system (see Thms. 2.2 and 2.3), composed of a forward degenerate SDE and of a backward SDE (the adjoint equation), with a possibly discontinuous martingale term, and an optimality condition linking the values and probability laws of the state and control variables with the adjoint variable.
In Section 2.4, we establish that this system of equations has a unique solution under some regularity conditions, an invertibility assumption and for a small time horizon T (see Thm. 2.4). The condition on T is quite explicit from the proof, which makes its verification on practical examples easy. Here the proof has to be specific and restricted to small time because of the non-Brownian filtration and the non-Markovian dynamics: indeed, we can invoke neither a drift monotonicity condition, as in [21], nor a non-degeneracy condition, as in [13]. In Section 2.5, we discuss how the unique solution to the first order condition may or may not be the optimal solution; we provide a counter-example (Prop. 2.6), which is interesting in its own right; we believe that this kind of situation is already known, but we could not find an appropriate reference.
Then we show in Section 2.7 that the necessary optimality conditions established in Theorem 2.3 become sufficient if we assume some convexity conditions on the Hamiltonian and the terminal cost. We highlight that the usual Hamiltonian [10] (when the distribution of the control is not optimized) does not fit our framework; instead, we define a version in expectation (Lem. 2.9). The final optimality result is stated in Theorem 2.10.
In Section 3, we apply our study to the toy model presented in the introduction, motivated by practical applications to smart grid management. To get a tractable and effective solution, we perform a perturbation approach around the LQ case. We establish error bounds and, as an approximation, we select the expansion up to second order error terms. Numerical experiments illustrate the performance and accuracy of the method, as well as the behavior of the optimally controlled system. Long and technical proofs are postponed to Section 4 in order to smooth the reading.
We list the most common notations used throughout this work. Numbers, vectors, matrices. R, N, N* denote respectively the sets of real numbers, non-negative integers and positive integers. The notation |x| stands for the Euclidean norm of a vector x, without further reference to its dimension. For a given matrix A ∈ R^p ⊗ R^d, A^⊤ refers to its transpose. Its norm is that induced by the Euclidean norm, i.e. |A| := sup_{x ∈ R^d, |x|=1} |Ax|. Recall that |A^⊤| = |A|. For p ∈ N*, Id_p stands for the identity matrix of size p × p.
Functions, derivatives. When a function (or a process) ψ depends on time, we write indifferently ψ t (z) or ψ(t, z) for the value of ψ at time t, where z represents all other arguments of ψ.
For a smooth function g : R^q → R^p, g_x represents the Jacobian matrix of g with respect to x, i.e. the matrix (∂_{x_j} g_i)_{i,j} ∈ R^p ⊗ R^q. However, a subscript in x_t refers to the value of a process x at time t (and not to a partial derivative with respect to t). Probability. To model the random uncertainty on the time interval [0, T] (T > 0 fixed), we consider a complete filtered probability space (Ω, F, {F_t}_{0≤t≤T}, P); we assume that the filtration {F_t}_{0≤t≤T} is right-continuous and augmented with the P-null sets. For a vector/matrix-valued random variable V, its conditional expectation with respect to the sigma-field F_t is denoted by E_t[V] = E[V | F_t]. Denote by P the σ-field of predictable sets of [0, T] × Ω.
All the quantities impacted by the control u are upper-indexed by u, like Z^u for instance. As usual, càdlàg processes stand for processes that are right-continuous with left-hand limits. All martingales are considered in their càdlàg modifications.
Let p, q ≥ 1. The Banach space of R^k-valued random variables X such that E[|X|^p] < +∞ is denoted by L^p(Ω, R^k), or simply L^p_Ω; since the arrival space R^k is unimportant, we will omit the reference to it in the notation and in the related norms, as it will be clear from the context. We shall most often consider p = q = 2.
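The norm displays for the process spaces were lost in extraction; a standard convention consistent with the use of H^{2,2} and H^{∞,2} below (this is our assumption, to be checked against the original) is:

```latex
\|Z\|_{H^{p,q}} := \Big( \mathbb{E}\Big[ \Big( \int_0^T |Z_t|^q\,\mathrm{d}t \Big)^{p/q} \Big] \Big)^{1/p},
\qquad
\|Z\|_{H^{\infty,q}} := \Big( \mathbb{E}\Big[ \sup_{0 \le t \le T} |Z_t|^q \Big] \Big)^{1/q},
```

with H^{p,q} (resp. H^{∞,q}) the space of processes with finite corresponding norm; in particular H^{∞,2} ⊂ H^{2,2}, since ∫_0^T |Z_t|² dt ≤ T sup_t |Z_t|².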

Stochastic control and MKV-FBSDEs
The aim is to analyze the control problem of minimizing (1.5). We first discuss the smart grid setting and the class of admissible controls u; second, we derive the first-order condition (Pontryagin principle), which writes as a MKV-FBSDE; third, we derive sufficient conditions for existence and uniqueness of solutions to the latter; fourth, in the absence of convexity conditions, we provide a counter-example to optimality; last, under suitable convexity assumptions, we establish that the MKV-FBSDE solution characterizes the optimal control.

Stochastic model and smart grid framework
As explained in the introduction, (1.5) may describe the optimal energy management of an aggregated consumer, with storage facilities (e.g. a battery), with his own RE production (e.g. a building equipped with solar panels), and with a connection to the electrical public grid. The management horizon T is typically short, e.g. 24 hours, for reasons explained in the introduction.
The control is made through an R^d-valued vector process u = (u_t : 0 ≤ t ≤ T), d ∈ N*. We consider u as an F_t-predictable process in H^{2,2}_P: the intuition is that decisions occurring at time t have to be made in accordance with the information available up to that time. This is coherent with the smart grid application. In particular, there has to be a slight delay between sudden events and the decisions taken by the controller, whence the predictability assumption.
The dynamics of the system is represented by an R^p-valued state variable, denoted by X, which satisfies the ODE (2.1). This state variable includes information about all controlled processes in the smart grid application, like the state of charge of a battery (see (1.6) or [2]) or the temperature of a water heater, as in [27] or [28], whose dynamics can be modeled by first-order ordinary differential equations. Note that, by an appropriate transformation of the cost functional and a change of variables, our results remain valid if the state process takes the form X^u = (X^{u,c}, X^{nc}), with X^{u,c} having a controlled dynamics of the above form and X^{nc} being a general uncontrolled semi-martingale (i.e., independent of u). Indeed, by an appropriate transformation of the (random) running and terminal costs, it is possible to consider only the controllable state X^{u,c} as state variable. We are thus free to use very general models for the exogenous stochastic processes impacting the system. For instance, in our energy-related application, such exogenous stochastic processes X^{nc} include the electricity spot price, the local electricity production by photovoltaic panels, the inflexible electricity consumption of a household, or the impact of (exogenous) random water consumption on the temperature of a hot water tank. The cost functional is described by J(u), given in (1.5). In the smart grid application, Markovian-type costs would take the form, for instance, l(t, ω, u, x, ḡ) = l̃(t, Z_t(ω), u, x, ḡ), where Z would represent a multidimensional stochastic factor modeling the evolution of the exogenous uncontrolled variables (weather, consumption...), but we also allow non-Markovian models. In the sequel, we omit ω in the terms inside J(u) and X^u, since it is now clear that we deal with random coefficients. All in all, the optimal control problem we study is (1.5). Last, we summarize the coefficients of the toy example described in the introduction.
Example 2.1 (Smart grid toy example). Let P^load be the difference between the instantaneous local consumption of the consumer and his RE production; we assume it is a process in H^{2,2}. The control u corresponds to the power supplied by the battery, while the state X^u corresponds to the normalized state of charge of the battery, whose dynamics is linear with respect to the control u, see [15]. If P^{grid,u} is the power supplied by the electrical public grid, the power balance imposes P^{grid,u}_t = P^{load}_t − u_t. Then set d = p = 1. The time-dependent coefficients μ_t and ν_t give the flexibility to include hourly effects in the management. We recall that the convex loss function l_1 may take the form (1.1). Considering the left-hand limit t− in the above definitions is a technicality needed to fulfill the following assumptions.
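As a minimal numerical sketch of these dynamics (the time grid, the coefficient μ, the load curve and the naive discharge rule are all our own illustrative assumptions), an Euler discretization could read:

```python
import math

# Euler discretization of the toy battery model (illustrative assumptions only):
# state of charge:  dX_t = -mu * u_t dt  (discharging at power u_t lowers X),
# power balance:    P_grid = P_load - u.
T, n = 24.0, 240
h = T / n
mu = 0.2                      # hypothetical conversion coefficient
x = 0.5                       # normalized state of charge, starts at the target 1/2
path, energy_from_grid = [x], 0.0
for k in range(n):
    t = k * h
    p_load = 1.0 + 0.5 * math.cos(2.0 * math.pi * t / 24.0)  # toy net load curve
    u = 0.3 * (p_load - 1.0)          # naive rule: discharge when the load is high
    p_grid = p_load - u               # power drawn from the electrical public grid
    energy_from_grid += h * p_grid
    x = min(1.0, max(0.0, x - h * mu * u))   # state of charge stays in [0, 1]
    path.append(x)

assert len(path) == n + 1 and all(0.0 <= s <= 1.0 for s in path)
assert energy_from_grid > 0.0
```

The optimal rule derived later replaces the naive proportional rule above, but the state equation and power balance are the same.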

Standing assumptions
From now on, we assume the following hypotheses hold. When we refer to a constant, we mean a finite deterministic constant.
a.s., for some constant C and some random process. (H.g) g(·, ·, 0, 0) ∈ H^{2,1}; g is continuously differentiable in (u, x), and there exist constants C_{g,u} and C_{g,x} bounding its partial derivatives in u and x, respectively.

(H.k) and the growth condition holds for any (x, k̄) ∈ R^p × R^r a.s., for some constant C and some random variable. It is easy to check these conditions in Example 2.1.
As a consequence of (H.φ), the dynamics of X^u in (2.1) writes as an ODE with Lipschitz-continuous stochastic coefficient: existence and uniqueness stem from the Cauchy-Lipschitz theorem for ODEs, applied ω by ω. In addition, we easily derive an a priori estimate, whose second inequality comes from Gronwall's lemma. It follows that, since u and φ(·, 0, 0) are in H^{2,2}, X^u is in H^{∞,2} ⊂ H^{2,2}. Then, a careful inspection of the assumptions (H.l)-(H.g)-(H.ψ)-(H.k) shows that the cost J(u) is finite.
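The displayed estimate was lost; under the Lipschitz condition (H.φ), the Gronwall step presumably runs along the following lines (our sketch):

```latex
|X^u_t| \;\le\; |x| + \int_0^t \Big( |\varphi(s,0,0)| + C\,|u_s| + C\,|X^u_s| \Big)\,\mathrm{d}s
\quad\Longrightarrow\quad
\sup_{0 \le t \le T} |X^u_t| \;\le\; \Big( |x| + \int_0^T |\varphi(s,0,0)|\,\mathrm{d}s
   + C \int_0^T |u_s|\,\mathrm{d}s \Big)\, e^{C T},
```

the implication being Gronwall's lemma applied ω by ω; squaring, taking expectations and using Cauchy-Schwarz in time then bounds the H^{∞,2} norm of X^u in terms of the H^{2,2} norms of u and φ(·, 0, 0).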

Necessary condition for optimality
For admissible controls u and v, we now provide a representation of the directional derivative J̇(u, v) using an adjoint process Y^u.
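For definiteness, recall the notion of directional derivative used here (a standard Gateaux-type definition; the paper's exact display did not survive extraction):

```latex
\dot{J}(u, v) \;:=\; \lim_{\varepsilon \downarrow 0}
   \frac{J(u + \varepsilon v) - J(u)}{\varepsilon},
\qquad u, v \in H^{2,2}_{\mathcal{P}},
```

i.e. the derivative of ε ↦ J(u + εv) at ε = 0+.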
Then L^u is invertible and its inverse satisfies the bound of Lemma 4.1. Define also L̃^u := ((L^u)^{-1})^⊤. The following R^p-valued process Y^u is then well defined as a càdlàg process in H^{∞,2}. Besides, for any u, v ∈ H^{2,2}_P, the directional derivative J̇(u, v) exists and admits an explicit representation. The proof is postponed to Section 4.1. At the optimal control u (whenever it exists), the derivative J̇(u, v) must vanish in every direction v ∈ H^{2,2}_P; choosing suitable directions v yields a point-wise optimality condition. Assume that (H.ψ) holds and that there exist constants C_{ψ,*}, where the placeholder * stands for x or k̄, bounding the corresponding derivatives. Observe again that this set of conditions is consistent with Example 2.1. We now aim at establishing the solvability of the system composed of (2.1), (2.5) and (2.6). We are going to show that this system has a unique solution for a small enough time horizon T, hence the existence and uniqueness of a solution to the optimal control problem under the sufficient conditions of Theorem 2.10.
The map Θ is well defined and Lipschitz continuous. If moreover a suitable smallness condition holds, then for T small enough, Θ is a contraction and therefore has a unique fixed point u^*. In that case, there exists a unique u ∈ H^{2,2}_P satisfying (2.1)-(2.5)-(2.6), and u = u^*. The proof is available in Section 4.2. Regarding the proof of a fixed point when the time interval [0, T] is arbitrarily large, observe that, in contrast with [21] and [13] for instance, in our setting we can rely neither on a monotonicity condition on the drifts, nor on a non-degeneracy condition. This is why we restrict to a small-time condition.

Existence and uniqueness of critical point do not necessarily imply existence of a minimum
If there exists a unique solution to the first order optimality condition (a unique critical point), and under other assumptions like continuity and growth properties, it is tempting to conclude that this point is a minimum. However, this is not necessarily the case in infinite dimension. This section aims at clarifying this fact by providing an example 2 where continuity, coercivity and uniqueness of the critical point are ensured, but no minimum exists. Therefore, extra conditions are necessary to get the existence of a minimum; see the discussion in Section 2.6 below. Proposition 2.6. One can construct a function F on an L² space satisfying the following properties:
- Continuity: F is continuous;
- Coercivity: F(u) tends to +∞ when ‖u‖_{L²} tends to +∞;
- Existence and uniqueness of critical point: F is Gateaux-differentiable and has a unique critical point.
However, F does not have a minimum.
The proof is postponed to Section 4.3. The function F defined in this example cannot be quasi-convex (and a fortiori F cannot be convex), since it would then have a minimum, as stated in the next section.

Existence of an optimal control
We now give sufficient conditions for the existence of an optimal control, i.e. existence of a minimizer of J . In such a favorable case, and if the necessary optimality conditions (2.1)-(2.4)-(2.6) have a unique solution u * , then u * is the unique minimum of J . We start with a general result.
Theorem 2.7. Let E be a reflexive Banach space, let F : E → R be a lower semi-continuous, quasi-convex function which satisfies the coercivity condition lim u E →+∞ F (u) = +∞. Then F has a minimum on E.
Proof. We adapt the arguments of Corollary 3.23, p. 71 in [9], where the operator considered is assumed to be continuous and convex. However, the hypotheses can be relaxed to lower semi-continuity and quasi-convexity of the function F, since we only need closedness and convexity of the sub-level sets of F. Let us add a few comments. In the finite dimensional case, any lower semi-continuous and coercive function has a minimum (since any closed and bounded set is compact). In the infinite dimensional case, the example in Section 2.5 illustrates that this may not be the case without the quasi-convexity assumption. Besides, note that without the coercivity condition, the existence of a minimum may fail, even in finite dimension (take E = R and F(x) = exp(x)). Moreover, without the lower semi-continuity of F, the result may fail as well (take F : (−∞, 0] → R defined by F(x) = |x| 1_{x<0} + 1_{x=0}, which is coercive and convex).
Apply the previous result with E = H^{2,2} and F = J: E is a Hilbert space, thus a reflexive Banach space. The functional J is continuous, hence lower semi-continuous. Therefore, we have proved the following.
2 Such examples might exist in the literature, but we have not been able to find a reference for this.
- The mapping I, the integrated Hamiltonian in expectation, is convex.
Lemma 2.9. Under (Conv), J is convex. If furthermore I is strictly convex in u, or I is strictly convex in X and φ_u has full column rank (which implies p ≥ d) for almost every t in [0, T], then J is strictly convex.
Proof. Under the assumption on φ, u ↦ X^u is affine-linear. The first result follows, since the composition of a convex function with an affine-linear map is convex. If I is strictly convex in u, then so is J. If φ_u has full column rank, u ↦ X^u is an affine-linear injection; if besides I is strictly convex in X, we get that J is strictly convex.
Let us emphasize the difference with the usual stochastic maximum principle (when distributions do not enter the cost functions). In that case, i.e. without the dependence of the running cost on E[g(t, u_t, X_t)] and of the terminal cost on E[k(X_T)], the sufficient optimality conditions are the affine-linearity of φ in (u, x), the point-wise convexity of (u, x) ↦ l(t, u, x) for any t, and the point-wise convexity of ψ in x.
In the current MKV setting, it would be tempting to require the Hamiltonian to be convex in (u, X) ∈ L²_Ω × L²_Ω for any t, ω-wise, and the terminal cost to be convex in X ∈ L²_Ω. However, even for a simple linear-quadratic example with d = p = q = 1 and parameter κ > 0, this fails to be true. Indeed, denote ζ(u) = ξ(t, u, x), and take u₁ a Bernoulli random variable with parameter 1/2 and u₂ = −u₁: on the set {ω : u₁(ω) = u₂(ω) = 0}, which has positive probability, the quantity ζ((u₁+u₂)/2) − (ζ(u₁)+ζ(u₂))/2 equals κ/4 > 0, which violates the convexity condition for these ω. On the contrary, E[ζ((u₁+u₂)/2) − (ζ(u₁)+ζ(u₂))/2] ≤ 0 for κ ≥ 0, and it is easy to see that E[ζ(u)] is convex in u for such κ. This discussion clarifies why the correct convexity condition for the integrated Hamiltonian I, or the point-wise one for H, is in expectation and not ω-wise, as stated in (Conv).
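Since the display defining ξ was lost, here is a numerical check of the same phenomenon with a stand-in choice ζ(u) = u² − κ(u − E[u])² (our assumption, chosen to reproduce the κ/4 gap; not necessarily the paper's ξ):

```python
# Omega-wise convexity fails while convexity in expectation holds, for the
# illustrative functional  zeta(u)(w) = u(w)^2 - kappa * (u(w) - E[u])^2.
kappa = 1.0
probs = [0.5, 0.5]            # two equally likely outcomes omega_0, omega_1
u1 = [1.0, 0.0]               # Bernoulli(1/2)
u2 = [-1.0, 0.0]              # u2 = -u1

def E(v):
    return sum(p * x for p, x in zip(probs, v))

def zeta(v):
    m = E(v)
    return [x ** 2 - kappa * (x - m) ** 2 for x in v]

mid = [(a + b) / 2 for a, b in zip(u1, u2)]   # the midpoint control (= 0 here)

# omega-wise convexity is violated on {u1 = u2 = 0} (outcome omega_1):
gap = zeta(mid)[1] - (zeta(u1)[1] + zeta(u2)[1]) / 2
assert abs(gap - kappa / 4) < 1e-12 and gap > 0

# ...but the convexity inequality in expectation does hold:
assert E(zeta(mid)) <= (E(zeta(u1)) + E(zeta(u2))) / 2
```

The two assertions make the point of the paragraph concrete: the pointwise midpoint inequality fails on a set of positive probability, while it holds after taking expectations.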
We now summarize all the results giving existence and uniqueness of an optimal stochastic control in Theorem 2.10; this is one of the main results of this section. Proof. 1. This is a direct consequence of Theorem 2.8 and Lemma 2.9. 2. If (u^*, X^{u^*}, Y^{u^*}) satisfies (2.1)-(2.5)-(2.6), then J̇(u^*, v) = 0 for any v ∈ H^{2,2}_P, according to Theorem 2.2. Besides, under our assumptions, J is convex; therefore, for all v ∈ H^{2,2}_P and t ∈ (0, 1], (J(u^* + t(v − u^*)) − J(u^*))/t ≤ J(v) − J(u^*). By taking the limit as t → 0, we obtain J(v) − J(u^*) ≥ J̇(u^*, v − u^*) = 0, hence the optimality of u^*. The direct implication ⇒ has been established in Theorem 2.3. Proposition 2.11. Assume that φ is affine-linear and that l : (t, ω, u, x, ḡ) ↦ l(t, ω, u, x, ḡ) is jointly convex with respect to (u, x, ḡ) and strongly convex with respect to u, dP ⊗ dt-a.e. Then J is strongly convex, hence coercive.
Remark 2.12. The strong convexity of l in its second argument x, together with the full column rank of φ_u, is not sufficient to get the coercivity of J, nor its strong convexity. Indeed, one can define J so that, along a suitable family of controls (u^{(ε)})_ε with norms going to infinity, the integral ∫ u^{(ε)}_s ds remains bounded uniformly in ε, and hence J(u^{(ε)}) is uniformly bounded in ε. Therefore, J is not coercive. As a consequence, it cannot be strongly convex either, by arguments similar to those in the proof of Proposition 2.11.

Model/context
For simplicity, we assume one-dimensional processes (p = q = r = 1), but the results can easily be extended to any dimension, since the arguments are based on the solution of Linear-Quadratic FBSDEs, which are well known (see [30]). Let us consider the toy problem of the introduction, with the same interpretation. We consider the following hypotheses: (Toy) - The parameters μ, ν, γ are deterministic and satisfy μ > 0, ν ≥ 0, γ ≥ 0.
- The mapping l is deterministic, convex, continuously differentiable, with the growth condition |l′(x)| ≤ C_{l,x}(1 + |x|) for all x, for some constant C_{l,x} > 0.
- P^load ∈ H^{2,2} and C ∈ H^{2,2} are F-adapted and càdlàg.
Under assumptions (Toy), there exists a unique optimal control u ∈ H^{2,2}_P. Besides, there exist unique processes X^u ∈ H^{∞,2} and Y^u ∈ H^{∞,2} such that (u, X^u, Y^u) satisfies the McKean-Vlasov Forward-Backward SDE (3.1). Although we can derive specific results for the control problem under assumption (Toy) (see Props. 3.1 and 3.4), solving the system (3.1) explicitly remains difficult for a general convex l. To get approximation results, we consider a specific form of l.
(ToyBis) The mapping l is an asymmetric quadratic loss with asymmetry parameter ε. From the application point of view, recall that we want to penalize consumption excess (compared to the commitment) more than consumption deficit; the asymmetry parameter ε should thus be taken non-negative. Under assumptions (Toy) and (ToyBis), the last equation in (3.1) takes a specific form, and we now provide a first-order expansion of the solution of this problem with respect to the parameter ε as ε → 0.

Preliminary result
The computation of a first order expansion of the solution of the MKV FBSDE (3.1) will rely extensively on the following result.
Proposition 3.2. Let a, b, c, e, f, g be deterministic real parameters with a > 0, g > 0, b ≥ 0 and e ≥ 0. Let (h_t)_t be a stochastic process in H^{2,2}_P and let x_0 ∈ L²(Ω) be F_0-measurable. Define x, y and v by (3.5); then (x, y, v) is a solution in H^{∞,2} × H^{∞,2} × H^{2,2}_P of the forward-backward system (3.6). Besides, for T small enough, this solution to (3.6) is the unique one in H^{∞,2} × H^{∞,2} × H^{2,2}_P. The proof is postponed to Section 4.4.

Remark 3.3.
Uniqueness of the solution of the FBSDE (3.6) could be proved for an arbitrary time horizon T, using the fact that (3.6) characterizes the solution of a (linear-quadratic) stochastic control problem, which has a unique solution since the associated cost function is continuous, convex and coercive ([9], Cor. 3.23, p. 71).

Average processes
We introduce notation for the averages (in the sense of expectation) of the solutions of (3.1). By taking expectations in (3.1), we immediately get the following simple but remarkable result: the average processes solve the FBSDE (3.7), which does not involve l. In particular, (ū, X̄, Ȳ) does not depend on l.
Note that the FBSDE (3.7) is explicitly solvable, as a particular case of equation (3.6) with suitable input parameters (see Tab. 1).
We shall emphasize that all terms in these expansions are solutions of FBSDEs of the form (3.6) for different input parameters (see Tab. 1) and thus they are explicitly solvable.
For other problems with more regularity (notice that x ↦ (x⁺)² is not twice continuously differentiable), the previous approach could be extended to a second-order, or even higher-order, expansion, but it would lead to more and more nested FBSDEs. On the mathematical side, there is no hard obstacle to deriving these equations under appropriate regularity conditions; the concerns are rather computational, since the nesting would require larger and larger computational time.
Remark 3.8. Theorem 3.7 is not specific to the form of the loss function l given in (ToyBis): it could be extended to any loss function l, provided it is continuously differentiable with Lipschitz-continuous derivative.

Models for random uncertainties
We assume the electricity price C is constant (C = C and C^∆ = 0), and we suppose P^load is given by P^load = P^cons − P^sun, where P^cons and P^sun are two independent scalar SDEs³, representing respectively the consumption and the photo-voltaic power production. For the consumption P^cons, we use the jump process (3.12), where N^cons is a compensated Poisson process with intensity λ^cons. Regarding the PV production, we follow [4] by setting P^sun = P^sun,max X^sun, where P^sun,max : [0, T] → R is a deterministic function (the clear-sky model) and X^sun solves a Fisher-Wright-type SDE (3.13) with α, β ≥ 1/2. As proved in [4], the above SDE has a strong solution, and the solution X^sun takes values in [0, 1]. Since the drifts are affine-linear, the conditional expectation of the solution is known in closed form (this property is intensively used in [8]): for s ≥ t, this allows us to speed up the computation of the conditional expectations E_t[P^load_s] required when deriving the optimal control. Proposition 3.2 is repeatedly used to solve the affine-linear FBSDEs (ū, X̄, Ȳ), (u^{∆,(0)}, X^{∆,(0)}, Y^{∆,(0)}) and (u̇, Ẋ, Ẏ) arising in the first-order expansion of the optimal control w.r.t. ε (see Thm. 3.7). In Algorithm 1 we give the pseudo-code of the scheme used to compute solutions of FBSDEs of the form (3.6).
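The exact dynamics (3.12)-(3.13) are not reproduced in this extraction; the sketch below assumes, purely for illustration, a mean-reverting consumption with compensated Poisson jumps and a Fisher-Wright-type SDE for X^sun (in the spirit of [4]), simulated by an Euler scheme clipped to [0, 1] so the square-root diffusion stays well-defined. All parameter names are hypothetical.

```python
import numpy as np

def simulate_xsun(x0, a, sigma, xbar, tau, n_steps, rng):
    """Euler scheme for an assumed Fisher-Wright-type SDE
    dX = -a (X - xbar(t)) dt + sigma * sqrt(X (1 - X)) dW,
    clipped to [0, 1] at each step."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        drift = -a * (x[k] - xbar(k * tau))
        diff = sigma * np.sqrt(max(x[k] * (1.0 - x[k]), 0.0))
        x[k + 1] = x[k] + drift * tau + diff * np.sqrt(tau) * rng.standard_normal()
        x[k + 1] = min(max(x[k + 1], 0.0), 1.0)
    return x

def simulate_pcons(p0, a, pbar, jump_size, lam, tau, n_steps, rng):
    """Assumed mean-reverting consumption with compensated Poisson jumps:
    dP = -a (P - pbar) dt + jump_size * d(N_t - lam * t)."""
    p = np.empty(n_steps + 1)
    p[0] = p0
    for k in range(n_steps):
        jumps = rng.poisson(lam * tau)  # number of jumps on [k*tau, (k+1)*tau]
        p[k + 1] = p[k] - a * (p[k] - pbar) * tau + jump_size * (jumps - lam * tau)
    return p
```

A load trajectory is then obtained as P^load_t = simulate_pcons(...) − P^sun,max(t) · simulate_xsun(...), with the two drivers sampled independently.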
- For the computation of (u^{∆,(0)}, X^{∆,(0)}, Y^{∆,(0)}), the conditional expectations (E_{nτ}[h_s])_{nτ ≤ s ≤ T} are given by affine-linear combinations of P^cons_{nτ} and P^sun_{nτ} with deterministic coefficients depending on s and n, by assumption on our models for P^cons and P^sun (see (3.14)-(3.15)). Therefore, π(nτ) is also given by an affine-linear combination of P^cons_{nτ} and P^sun_{nτ} with deterministic coefficients. This allows us to speed up Steps 4 and 5 in Algorithm 1.
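For a mean-reverting (affine-drift) model, the closed-form conditional expectation that underlies this speed-up takes the following generic shape; the actual coefficients in (3.14)-(3.15) depend on the specific models (3.12)-(3.13), so the parameter names below are placeholders.

```python
import numpy as np

def cond_exp_affine(x_t, a, xbar, s_minus_t):
    """E_t[X_s] for a process with affine drift dX = -a (X - xbar) dt
    plus noise with zero conditional mean:
        E_t[X_s] = xbar + (X_t - xbar) * exp(-a (s - t)),  s >= t.
    This is affine-linear in X_t with deterministic coefficients,
    which is exactly what makes Steps 4-5 cheap."""
    return xbar + (x_t - xbar) * np.exp(-a * np.asarray(s_minus_t))
```

E_t[P^load_s] is then an affine-linear combination of the two such conditional expectations, evaluated without any inner simulation.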
- For the computation of (u̇, Ẋ, Ẏ), the conditional expectations E_{nτ}[P^{load,∆}_s] in Step 4 are estimated by Monte-Carlo methods; the procedure for doing so is given in Algorithm 2. Step 4 has a complexity of order O((N_T − n)M_0), which makes it the most costly step in the loop of Algorithm 1; hence sampling (u̇, Ẋ, Ẏ) has a computational cost of order O(N_T² M_0).
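The nested structure behind the O((N_T − n)M_0) count can be sketched as follows: at each outer grid time, M_0 inner paths are resimulated from the current state. The `step_fn`/`payoff` interfaces are hypothetical stand-ins for the model transition and the integrand of Step 4.

```python
import numpy as np

def nested_cond_exp(state, step_fn, payoff, n_remaining, tau, m0, rng):
    """Crude Monte-Carlo estimate of a conditional expectation
    E_n[ payoff(X_{N_T}) ] by resimulating m0 inner paths from `state`.
    Cost: O(n_remaining * m0) transition evaluations."""
    total = 0.0
    for _ in range(m0):
        x = state
        for _ in range(n_remaining):
            x = step_fn(x, tau, rng)  # one inner time-step of the model
        total += payoff(x)
    return total / m0
```

Summing the per-step cost over the outer loop n = 0, …, N_T recovers the overall O(N_T² M_0) complexity quoted above.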

Numerical values of parameters
We report the values chosen for the next experiments.
Parameters for smart grid. We consider the following values for the time horizon, the size of the storage system and the initial value of its normalized state of charge.
The time horizon is T = 24 h, the storage size is 200 kWh, and the initial value of the normalized state of charge is 0.5.

Parameters for uncertain consumption/production. The values of the parameters used in the modeling of the underlying exogenous stochastic processes impacting the system are gathered in a table (not reproduced here). The inner sampling step of Algorithm 2 draws (P^cons_{kτ}, P^sun_{kτ}) conditionally on (P^cons_{(k−1)τ}, P^sun_{(k−1)τ}) using (3.12)-(3.13), independently from all other random variables simulated so far.

Time discretization. The average processes (ū, X̄) are computed explicitly (up to numerical integration), while the recentered processes (u^{∆,(0)}, X^{∆,(0)}) and the first-order correction processes (u̇, Ẋ) are computed using the discretization schemes detailed in Algorithms 1 and 2, with a time-step equal to 0.5 h.
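For reference, the experiment parameters quoted above can be gathered in one place (the values are taken from this section; the dictionary keys are ours):

```python
# Experiment parameters quoted in this section (units in comments).
PARAMS = {
    "T": 24.0,           # time horizon (hours)
    "storage_kwh": 200,  # size of the storage system (kWh)
    "soc_init": 0.5,     # initial normalized state of charge
    "tau": 0.5,          # discretization time-step (hours)
}
N_T = int(PARAMS["T"] / PARAMS["tau"])  # number of grid steps
```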
Monte-Carlo simulations. To compute the first-order correction, we need Monte-Carlo estimations, as explained in Algorithm 2; we choose M_0 = 4000. To assess the statistical performance of the optimal control associated to the symmetric loss function (ε = 0), we consider M_1 = 100000 macro-runs. Among those M_1 trajectories, only the first M_2 = 4000 are used for the computation of the first-order corrections associated to ε = 0.2.

Reduction of fluctuations. We plot the time-evolution of the quantiles (see Fig. 3) of the power supplied by the network in three cases: no flexibility; optimal control of the battery with ε = 0; and the approximated optimal control associated to ε = 0.2. Comparing the first graph with the other two shows that the quantiles are much closer to each other when storage is used, meaning that the variability of the power supplied by the grid is greatly reduced, as expected. However, the difference between the optimal controls with symmetric and asymmetric loss functions is hardly visible on these plots.
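The quantile curves of Figure 3 can be computed from the macro-runs as follows (a sketch: `paths` is assumed to hold the simulated power supplied by the network, one row per macro-run).

```python
import numpy as np

def quantile_fan(paths, levels=(0.01, 0.05, 0.20, 0.50, 0.80, 0.95, 0.99)):
    """Time-evolution of empirical quantiles of simulated paths.
    paths: array of shape (n_paths, n_times).
    Returns an array of shape (len(levels), n_times), one row per level."""
    return np.quantile(paths, levels, axis=0)
```

The spread between the 99% and 1% curves is the visual measure of variability compared across the three cases.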

Results from the experiments
Impact of first order correction. Overall, the effect of the first-order correction u̇ (which theoretically has zero mean) is to lower the probability of large upward deviations of P^grid from its expectation. This is quite visible when plotting the time-evolution of the quantiles of the deviations P^grid(t) − E[P^grid(t)]: in Figure 4, the case ε = 0 is shown in green (referred to as "LQ") and the case ε = 0.2 in red (referred to as "First Order Correction"). From top to bottom, Figure 4 represents the quantiles of P^grid(t) − E[P^grid(t)] at levels 99%, 95%, 80%, 50%, 20%, 5% and 1%. We observe that the empirical estimates of the lower quantiles are left unchanged, while the upper quantiles (99% and 95%) are notably decreased, which was precisely the effect sought.

Simulation-based bound on approximation error of first order expansion. Following the proof of Proposition 3.6, with our choice of parameters we obtain an upper bound on the error in the approximation of the optimal control u^{(ε)}. To bound the relative error committed, the triangle inequality yields the upper bound
\[
\frac{4\varepsilon^{2}\,\|(P^{\mathrm{load},\Delta}-u^{\Delta,(0)})^{+}\|_{H^{2,2}}}{(1-\alpha(T))(1-\alpha(T)-2\varepsilon)\,\|\bar u+u^{\Delta,(0)}+\varepsilon\dot u\|_{H^{2,2}} - 4\varepsilon^{2}\,\|(P^{\mathrm{load},\Delta}-u^{\Delta,(0)})^{+}\|_{H^{2,2}}}.
\]
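The relative-error bound above can be evaluated numerically once the norms have been estimated by simulation; in the sketch below, `alpha_T`, `norm_pos_part` and `norm_first_order` are placeholders for those estimated quantities.

```python
def relative_error_bound(eps, alpha_T, norm_pos_part, norm_first_order):
    """Upper bound on the relative approximation error of u^(eps):
      num = 4 eps^2 ||(P^{load,D} - u^{D,(0)})^+|| / ((1-a)(1-a-2 eps)),
      bound = num / (||ubar + u^{D,(0)} + eps udot|| - num),
    matching the triangle-inequality bound stated in the text
    (a = alpha_T; norms in H^{2,2})."""
    num = 4.0 * eps**2 * norm_pos_part / ((1.0 - alpha_T) * (1.0 - alpha_T - 2.0 * eps))
    return num / (norm_first_order - num)
```

The bound is only meaningful when the denominator is positive, i.e. when ε is small enough relative to the estimated norms.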
In the proof, C denotes a generic deterministic constant (whose value may change from line to line); it does not depend on u, v, ε, but only on T > 0 and on the bounds from the assumptions (H.x)-(H.g)-(H.l)-(H.k)-(H.ψ)-(H.φ). At this point of the proof, it is more convenient to work with Jacobian matrices than with gradients (as in the statement); only at the very end do we make the link with Y^u and go back to the gradient notation. Set θ^u_t := (t, u_t, X^u_t) and let Ẋ^{u,v}_t be the solution of the associated linear equation; it can be noticed that L^u is the unique solution of the corresponding homogeneous equation, using Lemma 4.1 and ∇_x φ = (φ_x)^T. Note, whenever useful, that Ẋ^{u,v}_t is the derivative of X^{u+εv}_t at ε = 0. To justify this, we proceed in a few steps. First, the Taylor formula gives, for smooth ϕ, an exact first-order expansion; applying it to φ_x and φ_u, we obtain a representation where L^{u,v,ε} is the unique solution of a linear equation. By hypothesis on φ_x, L^u and (L^u)^{-1} are dt × dP-a.e. uniformly bounded by a constant. Besides, (t, ω) ↦ φ_x(t, u_t + εv_t, [X^u_t, X^{u+εv}_t]) satisfies the hypotheses of Lemma 4.1 with a constant C independent of ε. Therefore L^{u,v,ε} is invertible, and L^{u,v,ε} and (L^{u,v,ε})^{-1} are dt × dP-a.e. uniformly bounded by a constant independent of ε. From (4.2)-(4.3), and using the fact that φ_u and φ_u(·, [u, u + εv], X^u) are dt × dP-a.e. bounded by a constant independent of ε (by hypothesis on φ_u), we derive an estimate with a constant C independent of ε. With similar arguments, we can represent the residual error RẊ^{u,v,ε}. The formula is also valid for Y^u_{t−} since the jumps of Y^u are countable. Theorem 2.2 is proved.

Proof of Theorem 2.4
In the proof, T ≤ 1, and C denotes a generic (deterministic) constant which depends only on the bounds in the hypotheses (and not on T). For u^(1) and u^(2) in H^{2,2}, if a process or variable F^u depends on u, we write F^(1) := F^{u^(1)} and F^(2) := F^{u^(2)}. Besides, for any function, operator or process F which depends on u, X^u, . . . , we write ∆F := F^(2) − F^(1).
The proof is decomposed into several steps.
- First, notice that by our assumptions on φ, L^u (resp. L̃^u = ((L^u)^{-1})^T, defined in Thm. 2.2) is independent of u; we therefore simply write L (resp. L̃). Using Lemma 4.1, L and L̃ are bounded by constants. - Consider the map Θ^(X). It is well-defined, since we have already seen that X^u ∈ H^{∞,2} whenever u ∈ H^{2,2}_P. We want to show that Θ^(X) is Lipschitz continuous with a suitably small Lipschitz constant, using assumption (H.φ.2) and computations as in (4.2). By construction, ũ is predictable. Besides, for u ∈ H^{2,2}_P, we have the integral estimate below.
Using Minkowski's inequality, the right-hand side is finite since g(·, 0, 0) ∈ H^{2,1} and h(·, 0, 0, 0, 0), X^u, Y^u and u are in H^{2,2}; whence the well-posedness of Θ. Similar computations, again with Minkowski's inequality, together with ‖·‖_{H^{2,2}} ≤ √T ‖·‖_{H^{∞,2}} and our estimates on ‖∆X‖_{H^{∞,2}} and ‖∆Y‖_{H^{∞,2}}, show that Θ is Lipschitz continuous with a controlled Lipschitz constant. - Under assumption (2.8), for T small enough, Θ is a contraction on the complete space H^{2,2}_P and therefore has a unique fixed point u in H^{2,2}_P. - To conclude, notice that (2.1)-(2.4)-(2.6) are satisfied by (u, X^u, Y^u) with X^u = Θ^(X)(u) and Y^u = Θ^(Y)(u) if and only if u is a fixed point of Θ.
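The fixed-point step above is the Banach fixed-point theorem, whose constructive side is Picard iteration. A minimal numerical illustration (with a toy linear contraction, not the actual map Θ) can be sketched as:

```python
import numpy as np

def picard_fixed_point(theta, u0, tol=1e-10, max_iter=1000):
    """Iterate u_{k+1} = theta(u_k) for a contraction theta.
    Converges geometrically to the unique fixed point
    (Banach fixed-point theorem)."""
    u = u0
    for _ in range(max_iter):
        u_next = theta(u)
        if np.linalg.norm(u_next - u) < tol:
            return u_next
        u = u_next
    return u
```

In the proof, the contraction property of Θ holds only for T small enough, which is exactly what the Lipschitz-constant estimate quantifies.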

Proof of Proposition 2.6
The continuity and coercivity of F are obvious. Computations similar to those in the proof of Theorem 2.2 show that F is Gateaux-differentiable, with Gateaux derivative of F at u in direction v given by the stated formula. Let us identify the critical points u* ∈ L²_1: for such an element, we must have Ḟ(u*, L(u*)) = ∫₀¹ |F(u*)_t|² dt = 0. For the sequence u^(n), we compute ∫₀¹ |u^{(n)}_t|² (n + 1) t dt = 1/(2(n + 1)), therefore F(u^(n)) = 1/(2(n+1)) → 0, as was sought. Last, we prove that the minimum is not attained. Assume the contrary: there exists u* ∈ L²_1 s.t. F(u*) = 0. We must then have ‖u*‖_{L²_1} = 1 and t|u*_t|² = 0 a.e. on [0, 1]; the second condition forces u* = 0, which is incompatible with the first. We are done: F does not attain its minimum.
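A quick numerical check of the computation above, assuming, as the value 1/(2(n+1)) suggests, the (hypothetical reconstruction) u^{(n)}_t = √(n+1) · 1_{t ≤ 1/(n+1)}, which has unit L² norm while F(u^{(n)}) = ∫₀¹ t |u^{(n)}_t|² dt = 1/(2(n+1)) → 0:

```python
import numpy as np

def F(u, t):
    """F(u) = int_0^1 t |u_t|^2 dt, left Riemann sum on a uniform grid."""
    dt = t[1] - t[0]
    return float(np.sum(t * u**2) * dt)

def u_n(n, t):
    """Assumed minimizing sequence: u^(n)_t = sqrt(n+1) * 1_{t <= 1/(n+1)}.
    Unit L^2 norm on [0,1]; F(u^(n)) = 1/(2(n+1))."""
    return np.sqrt(n + 1.0) * (t <= 1.0 / (n + 1.0))
```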

Proof of Proposition 3.2
Usual results about the solutions to affine-linear FBSDEs are stated for the Brownian filtration, see [30] for instance. Here we consider more general filtrations, but the arguments are quite similar; for the sake of completeness, we give the proof.
The function θ is the unique solution of an affine-linear second-order ODE. Define the associated BSDE, which has a unique solution in H^{∞,2} × H^{∞,2} (see [14], Thm. 5.1 with p = 2) in our context of general filtrations.
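As a numerical sanity check, a generic affine-linear second-order ODE of this type can be integrated by a standard RK4 scheme; the coefficients c1, c2, c3 below are placeholders, since the actual ODE for θ depends on the parameters a, g, . . . of (3.6).

```python
import numpy as np

def solve_theta(c1, c2, c3, theta0, dtheta0, T, n_steps):
    """RK4 integration of theta'' = c1 * theta' + c2 * theta + c3
    (placeholder coefficients), reduced to a first-order system
    y = (theta, theta'). Returns the grid values, shape (n_steps+1, 2)."""
    h = T / n_steps
    y = np.array([theta0, dtheta0], dtype=float)
    out = [y.copy()]

    def f(y):
        return np.array([y[1], c1 * y[1] + c2 * y[0] + c3])

    for _ in range(n_steps):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(y.copy())
    return np.array(out)
```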
By the integration by parts formula applied to π_t exp(∫_t^T θ_τ dτ), where we used the definitions of p and π, we deduce that the process π also has the following representation:
\[ \pi_t = \mathbb{E}_t\Big[\pi_T - \int_t^T (a g p_s \pi_s + a p_s h_s - c)\,ds\Big]. \tag{4.8} \]
Since θ and p are bounded on [0, T], we easily prove that π ∈ H^{∞,2}. From that and our assumptions on the data of the problem, it is clear that (x, y, v) as defined in (3.5) belongs to H^{∞,2} × H^{∞,2} × H^{2,2}_P. In particular, v is predictable since x is continuous by construction.
We now prove that (x, y, v) defined by (3.5) solves (3.6). By definition of y and v in (3.5), we can check that: v t = gy t− + h t .
Define x̃, the unique solution of the following affine-linear ODE:
\[ \tilde x_t := x_0 - \int_0^t (a g p_s \tilde x_s + a g \pi_s + a h_s)\,ds; \tag{4.9} \]
hence x is the unique solution of (4.9). Since π has countably many jumps, and a Lebesgue integral is unchanged when the integrand is modified at countably many points, we then get the first equation in (3.6) by definition of v. It remains to show that the second equation in (3.6) is verified. Using that p is a solution of (4.7), π verifies (4.8) and x is a solution of (4.9), we get:
\[ y_t = p_t x_t + \pi_t = \mathbb{E}_t\Big[ p_T x_T - \int_t^T \Big( \frac{dp_s}{ds} x_s + \frac{dx_s}{ds} p_s \Big) ds + \pi_T - \int_t^T (a g p_s \pi_s + a p_s h_s - c)\,ds \Big], \]
which is the second equation in (3.6).

Boundedness of solutions to linear ODEs with bounded stochastic coefficients
The following result is used in the proofs of Theorems 2.2 and 2.4. Proof. A direct computation shows that, dt × dP-a.e., d(L_t R_t)/dt = 0, thus for all t ∈ [0, T], L_t R_t = L_0 R_0 = Id_p. Therefore R and L are invertible with R = L^{-1}. Let us now turn to the uniform boundedness. For v ∈ R^p, we have
\[ \frac{d|L_t v|^2}{dt} = v^\top L_t^\top (A_t + A_t^\top) L_t v \le |A_t + A_t^\top|\,|L_t v|^2 \le 2C\,|L_t v|^2, \quad dt \times d\mathbb{P}\text{-a.e.} \]
Therefore, by integration, |L_t v|² ≤ |v|² exp(2CT) for t ∈ [0, T], which yields sup_{0≤t≤T} |L_t|² ≤ exp(2CT). This proves exp(CT) ≥ sup_{0≤t≤T} |L_t| = sup_{0≤t≤T} |L_t^T|, whence the announced bound for L. For bounding |R|, start from d|R_t v|²/dt and proceed similarly.
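The bound sup_t |L_t| ≤ exp(CT) can be checked numerically: Euler integration of dL/dt = A_t L with |A_t| ≤ C satisfies ‖L_{k+1}‖ ≤ (1 + C dt)‖L_k‖ ≤ exp(C k dt), so the discrete norms stay below the lemma's bound. The random matrices below are an illustrative choice with C = 1.

```python
import numpy as np

def fundamental_solution(A_path, dt):
    """Euler integration of dL/dt = A_t L, L_0 = Id, along a path of
    matrices A_t. Returns the final L and the spectral norms along the way."""
    p = A_path[0].shape[0]
    L = np.eye(p)
    norms = [1.0]
    for A in A_path:
        L = L + dt * (A @ L)
        norms.append(np.linalg.norm(L, 2))  # spectral norm
    return L, np.array(norms)
```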

Conclusion
In this work, we have identified the optimal control of the storage facilities of a smart grid under uncertain consumption/production, in order to reduce the stochastic fluctuations of the residual consumption on the public electricity grid. This was made possible by the resolution of a new extended McKean-Vlasov stochastic control problem, using the Pontryagin principle and forward-backward stochastic differential equations. For situations where the costs are close to quadratic functions, we have derived quasi-explicit formulas for the control using perturbation arguments.
In further works, we will consider subsequent issues such as: more realistic dynamics of the battery flow, accounting for aging and boundary effects; the sizing of the smart grid and of the storage/production capacities; the risk aggregation of optimized smart grids with dependent solar productions; and the impact of model mis-specification on the optimal solution (model risk).