Mean Field approach to stochastic control with partial information

The classical stochastic control problem under partial information can be formulated as a control problem for the Zakai equation, whose solution is the unnormalized conditional probability distribution of the state of the system. The Zakai equation is a stochastic Fokker-Planck equation, so the problem to be solved is similar to the one met in Mean Field Control theory. Since Mean Field Control theory was developed much later than stochastic control with partial information, the tools, techniques, and concepts obtained in the last decade for Mean Field Games and Mean Field Type Control theory have not been used for the control of the Zakai equation. Our objective is to connect the two theories: we gain the power of new tools, and we gain new insights into the problem of stochastic control with partial information. For mean field theory, we obtain interesting new applications, but also new problems. Indeed, Mean Field Control theory leads to very complex equations, such as the Master equation, a nonlinear infinite-dimensional PDE for which general theorems are hardly available, although active research in this direction is being performed. Direct methods are useful to obtain regularity results. We develop in detail the LQ regulator problem, but since we cannot restrict ourselves to the Gaussian case, well-known results such as the separation principle are not directly available. An important result in the literature, due to A. Makowski, describes the solution of the Zakai equation for linear systems with a general (non-Gaussian) initial condition. We show that the separation principle can be extended to quadratic pay-off functionals, but the Kalman filter is much more complex than in the Gaussian case. Finally, we compare our work with that of E. Bandini et al. and show that the example they provide does not cover ours: our system remains nonlinear in their setting.


Introduction
With the development of nonlinear filtering, stochastic control with partial information has become one of the most important and well-established topics in control theory; there, the underlying controlled state dynamics is also influenced by unobservable uncertainties. With full information, the optimal control can often be taken in a Markovian feedback form, as a function of the state at the present time. In contrast, under partial information the state of the system is unobservable, yet the Markov property can still be reinstated by taking the conditional distribution of the state as a new, infinite-dimensional state variable. This viewpoint is fruitful for the general theory, but does not lead to the explicit solution we obtain in this paper. Roughly speaking, the problem considered here can be reduced to the one investigated in an interesting recent alternative formulation in [2], which is nevertheless fundamentally different from our proposed framework; indeed, their set up leads to a conditional probability (hence normalized) solution of a linear stochastic PDE, which is another special case of the Duncan-Mortensen-Zakai equations, but different from the one we obtain here. They also provide an example with linear equations which, we believe, does not cover the linear quadratic case with an arbitrary initial distribution dealt with extensively in the present work; in particular, our system remains nonlinear in the framework of [2], a consequence of the complication of their approach. The organization of this article is as follows. Section 2 formulates the stochastic control problem with partial information and introduces the corresponding controlled Zakai equation. The key toolkit of the mean field approach is introduced in Section 3, and we then obtain a weak solution of the stochastic control problem by solving the Zakai equation under a weak formulation in Section 4.
In Section 5, we present the solution of the linear quadratic problem with an arbitrary initial distribution using the mean field approach, and hence demonstrate the effectiveness of the mean field theory. Finally, in Section 6, we compare our approach with the recent one in [2].

The problem
We first describe the problem formally, without making the assumptions precise. The state of the system x(t) ∈ R^n is the solution of a diffusion: dx = g(x, v)dt + σ(x)dw, (2.1) and we assume that there exists a probability space (Ω, A, P) on which a random variable ξ and a standard R^n-valued Wiener process w(·), independent of ξ, are constructed. There is a control v(t) in the drift term, with values in R^m. Since we cannot access the state x(t), which is not observable, the control can neither be defined by a feedback on the state, nor be adapted to the state. Formally, we have an observation equation: dz = h(x)dt + db(t), (2.2) in which z(t), with values in R^d, represents the observation and b(t) is another Wiener process, independent of the pair (ξ, w(·)). The function h(x) corresponds to the measurement of the state x, and b(t) captures a measurement error. So the control v(t) should be adapted to the process z(t); it is not a feedback, of course. It is well known that this construction is ill-posed: the control is adapted to the observation, which in turn depends on the state, which depends on the control. It is a chicken-and-egg matter, usually resolved by the Girsanov theorem, at the price of constructing the Wiener process b(t) appropriately. In practice, we construct on (Ω, A, P) three objects: ξ, w(·) and z(·). The processes w(·) and z(·) are independent Wiener processes on R^n and R^d, respectively, while ξ is independent of these two processes. We set F_t = σ(ξ, w(s), z(s), s ≤ t) and Z_t = σ(z(s), s ≤ t), the filtrations on (Ω, A, P) generated by (ξ, w(·), z(·)) and by z(·), respectively. The process z(·) is the observation process, but it is defined exogenously. We can then choose the control v(·) as a process with values in R^m which is adapted to the filtration Z_t. It is thus perfectly well defined, as is the process x(·), the solution of (2.1).
In fact, in (2.1), v(·) is fixed, like ξ and w(·), and we assume that we can solve the SDE (2.1) in the strong sense, so x(·) is well defined. Here comes the Girsanov theorem. We define the scalar (P, F_t)-martingale η(t) as the solution of the equation: dη(t) = η(t) h(x(t)) · dz(t), η(0) = 1. (2.3) This martingale allows us to define a new probability on (Ω, A), denoted by P^{v(·)} to emphasize the fact that it depends on the control v(·). It is given by the Radon-Nikodym derivative: dP^{v(·)}/dP |_{F_t} = η(t). (2.4) Finally, we define the process b^{v(·)}(t) = z(t) − ∫_0^t h(x(s)) ds, (2.5) which also depends on the control decision. We take a finite horizon T in the rest of this article. Making the change of probability from P to P^{v(·)} and considering the probability space (Ω, F_T, P^{v(·)}), b^{v(·)} appears as a standard Wiener process, independent of w(·) and ξ. Therefore, (2.5) plays the role of (2.2) as far as probability laws are concerned. We can then rigorously define the control problem (without the chicken-and-egg matter): J(v(·)) = E^{v(·)} [∫_0^T f(x(t), v(t)) dt + f_T(x(T))], (2.6) in which the functions f(x, v) and f_T(x) represent the running cost and the final cost, respectively, contributing to the pay-off functional to be minimized. The notation E^{v(·)} refers to the expected value with respect to the probability law P^{v(·)}.
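As a quick numerical illustration of this change of measure, the sketch below simulates the exponential martingale η(t) = exp(∫_0^t h(x(s))·dz(s) − ½∫_0^t |h(x(s))|² ds) for a toy scalar model and checks the martingale property E[η(T)] = 1. The dynamics dx = −x dt + dw and the bounded observation function h(x) = tanh(x) are our own illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 200, 20000
dt = T / n_steps

# State x(t): dx = -x dt + dw (a simple ergodic diffusion, chosen for illustration).
# Observation drift h(x) = tanh(x), bounded, as the theory requires.
x = np.zeros(n_paths)
log_eta = np.zeros(n_paths)
for _ in range(n_steps):
    dz = rng.normal(0.0, np.sqrt(dt), n_paths)   # exogenous observation noise under P
    h = np.tanh(x)
    # Doleans-Dade exponential: d(log eta) = h dz - 0.5 h^2 dt
    log_eta += h * dz - 0.5 * h**2 * dt
    dw = rng.normal(0.0, np.sqrt(dt), n_paths)
    x += -x * dt + dw

eta_T = np.exp(log_eta)
print(eta_T.mean())   # martingale property: E[eta(T)] should be close to 1
```

Note that the exponential form keeps η(t) strictly positive, as a Radon-Nikodym density must be.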
Remark 2.1. The previous presentation, which is currently the common one to formalize stochastic control problems with partial information, has a slight drawback, in comparison with the description of the problem with full information. With full information, there is no Z t and the underlying filtration

Control of Zakai equation
Note first that the functional (2.6) can be written as J(v(·)) = E [∫_0^T η(t) f(x(t), v(t)) dt + η(T) f_T(x(T))]. (2.7) This is obtained by using the Radon-Nikodym derivative (2.4) and the martingale property of η(t). We next recall the classical result of nonlinear filtering theory. Let Ψ(x) be any bounded continuous function. We want to express the conditional expectation E^{v(·)}[Ψ(x(t))|Z_t] of the random variable Ψ(x(t)) with respect to the σ-algebra Z_t, on the probability space (Ω, A, P^{v(·)}). We have the basic result of nonlinear filtering theory: E^{v(·)}[Ψ(x(t))|Z_t] = ∫_{R^n} Ψ(x) q(x, t) dx / ∫_{R^n} q(x, t) dx, (2.8) where q(x, t) is called the unnormalized conditional probability density of the random variable x(t) with respect to the σ-algebra Z_t. One can directly read off from (2.8) that the conditional probability density itself is q(x, t) / ∫_{R^n} q(ξ, t) dξ. The function q(x, t) is a random field adapted to the filtration Z_t. It is the solution of a stochastic PDE: dq = A* q dt + q h(x) · dz(t), q(x, 0) = q_0(x), (2.9) in which A* is the second order differential operator A* q = Σ_{i,j} ∂²/(∂x_i ∂x_j)(a_{ij}(x) q) − div(g(x, v) q), which is the dual of A φ = Σ_{i,j} a_{ij}(x) ∂²φ/(∂x_i ∂x_j) + g(x, v) · Dφ, with a(x) = ½ (σσ*)(x). The initial condition q_0(x) is the probability density of ξ; we suppose that ξ has a probability density. The random field q(x, t) depends on v(·) and is thus denoted by q^{v(·)}(x, t). From (2.8) and (2.7), we can write the pay-off J(v(·)) as J(v(·)) = E [∫_0^T ∫_{R^n} f(x, v(t)) q^{v(·)}(x, t) dx dt + ∫_{R^n} f_T(x) q^{v(·)}(x, T) dx]. (2.10) The minimization of J(v(·)) is a stochastic control problem for a dynamic system whose evolution is governed by the stochastic PDE (2.9).
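To make the Zakai equation (2.9) concrete, here is a minimal one-dimensional finite-difference sketch. All model choices (drift g(x) = −x, σ = 1, h(x) = x, the grid, and the operator-splitting scheme) are illustrative assumptions, not taken from the text. For this linear-Gaussian choice, the conditional mean computed from the unnormalized density via (2.8) can be compared with the Kalman-Bucy filter driven by the same observation path:

```python
import numpy as np

rng = np.random.default_rng(1)
T, dt = 0.5, 1e-3
n_steps = int(T / dt)
xg = np.arange(-5.0, 5.0 + 1e-9, 0.05)          # spatial grid
dxg = xg[1] - xg[0]

# True state and observation increments: dx = -x dt + dw, dz = x dt + db
x_true = 0.5
q = np.exp(-0.5 * xg**2) / np.sqrt(2 * np.pi)   # initial density N(0, 1)
xhat, P = 0.0, 1.0                              # Kalman-Bucy filter state

for _ in range(n_steps):
    dz = x_true * dt + rng.normal(0.0, np.sqrt(dt))
    # Fokker-Planck part of Zakai: A*q = 0.5 q_xx + (x q)_x, explicit Euler
    qxx = (np.roll(q, -1) - 2 * q + np.roll(q, 1)) / dxg**2
    flux = (np.roll(xg * q, -1) - np.roll(xg * q, 1)) / (2 * dxg)
    q = q + dt * (0.5 * qxx + flux)
    # Observation part of Zakai: multiplicative update q <- q exp(h dz - 0.5 h^2 dt)
    q *= np.exp(xg * dz - 0.5 * xg**2 * dt)
    q[0] = q[-1] = 0.0                          # truncate at the domain boundary
    # Kalman-Bucy filter fed with the same observation increment
    xhat += -xhat * dt + P * (dz - xhat * dt)
    P += (-2 * P + 1 - P**2) * dt
    # advance the (unobserved) true state
    x_true += -x_true * dt + rng.normal(0.0, np.sqrt(dt))

mean_zakai = np.sum(xg * q) / np.sum(q)         # conditional mean from (2.8)
print(mean_zakai, xhat)                         # the two estimates should agree
```

Note that q(x, t) is never normalized during the evolution; the normalization only enters when conditional statistics are extracted, exactly as in (2.8).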
Remark 2.2. We can elaborate more on the difference between feedback controls and open-loop controls, as addressed in Remark 2.1, by considering equation (2.9) describing the evolution of the state q(x, t). In this equation v(t) is a stochastic process adapted to the filtration Z t , so it is fixed with respect to the space variable x.

Preliminaries
We define the value function and, following the main concept of Dynamic Programming, we embed this value function into a family parametrized by initial conditions q and t, where q denotes an unnormalized probability density on R^n. We also make precise the choice of the functional space in which the function q(x) lies. To fix ideas, we take q ∈ L²(R^n) ∩ L¹(R^n) with q(x) ≥ 0, and we shall assume this throughout. Considering functionals Ψ(q) on L²(R^n), we say that Ψ is Gâteaux differentiable, with Gâteaux derivative ∂Ψ/∂q(q)(x), if for every q̃ ∈ L²(R^n) the function t → Ψ(q + t q̃) is differentiable, with d/dt Ψ(q + t q̃)|_{t=0} = ∫_{R^n} ∂Ψ/∂q(q)(x) q̃(x) dx. We shall assume that the map (q, x) → ∂Ψ/∂q(q)(x) is continuous and satisfies a growth bound with a constant c(q) that is continuous and bounded on bounded subsets of L²(R^n). We also need the concept of second order Gâteaux derivative. The second order Gâteaux derivative is a functional ∂²Ψ/∂q²(q)(ξ, η) such that the function t → Ψ(q + t q̃) is twice differentiable in t, with d²/dt² Ψ(q + t q̃)|_{t=0} = ∫∫ ∂²Ψ/∂q²(q)(ξ, η) q̃(ξ) q̃(η) dξ dη. Moreover, the function (q, ξ, η) → ∂²Ψ/∂q²(q)(ξ, η) is continuous and satisfies a similar growth bound, with c(q) continuous and bounded on bounded subsets of L²(R^n). From formula (3.5), it is clear that we can choose ∂²Ψ/∂q²(q)(ξ, η) to be symmetric in (ξ, η). Set f(t) = Ψ(q + t q̃). Combining the above assumptions (3.2), (3.4) and (3.6), we can assert that f is C². Therefore, we have the identity f(1) = f(0) + f'(0) + ∫_0^1 (1 − t) f''(t) dt, which leads to the formula: Ψ(q + q̃) = Ψ(q) + ∫_{R^n} ∂Ψ/∂q(q)(x) q̃(x) dx + ∫_0^1 (1 − t) ∫∫ ∂²Ψ/∂q²(q + t q̃)(ξ, η) q̃(ξ) q̃(η) dξ dη dt.
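As a sanity check of these definitions, the sketch below takes the quadratic functional Ψ(q) = ∫∫ k(x, y) q(x) q(y) dx dy with a Gaussian kernel k (an illustrative choice of ours), for which ∂Ψ/∂q(q)(x) = 2∫ k(x, y) q(y) dy and ∂²Ψ/∂q²(q)(ξ, η) = 2 k(ξ, η), and verifies the first-order Gâteaux formula by a finite difference on a grid:

```python
import numpy as np

xg = np.linspace(-4.0, 4.0, 161)
w = xg[1] - xg[0]                             # quadrature weight (rectangle rule)
K = np.exp(-(xg[:, None] - xg[None, :])**2)   # kernel k(x, y)

q = np.exp(-0.5 * xg**2)                      # an unnormalized density
qt = np.exp(-0.5 * (xg - 1.0)**2)             # direction q-tilde

def Psi(q):
    # Psi(q) = int int k(x, y) q(x) q(y) dx dy, discretized
    return w**2 * q @ K @ q

dPsi_dq = 2 * w * K @ q                       # Gateaux derivative, a function of x
exact_dir = w * np.sum(dPsi_dq * qt)          # int dPsi/dq(q)(x) q~(x) dx

eps = 1e-6
fd_dir = (Psi(q + eps * qt) - Psi(q)) / eps   # finite-difference directional derivative
print(exact_dir, fd_dir)
```

The second derivative 2k(ξ, η) is symmetric and bounded, so this Ψ satisfies all the assumptions above.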

Bellman equation
We consider the control problem with initial conditions q(x) at time t, or (q, t) for short: and define the value function: Assuming that the value function has derivatives ∂Φ/∂t(q, t), ∂Φ/∂q(q, t)(x), ∂²Φ/∂q²(q, t)(ξ, η), then, by standard arguments, we can check formally that Φ(q, t) is the solution of the Bellman equation: The optimal open-loop control is obtained by achieving the infimum in (3.11). We derive a functional v̂(q, t), which is a feedback in q but not in x. We can then feed the Zakai equation (3.8) with this feedback to get the optimal state equation: Once we solve this stochastic PDE, we obtain the optimal state q̂(s) := q̂(x, s). We then define the control v̂(s) = v̂(q̂(s), s), which is indeed adapted to the filtration Z_t^s = σ(z(τ) − z(t), t ≤ τ ≤ s). This is the optimal open-loop control.

The master equation
The functional v̂(q, t) defined at the very end of the last subsection depends on the function (3.13), in which we omit writing the argument t explicitly, for simplicity. Bellman equation (3.11) can be rewritten as: (3.14) It is also convenient to set U(x, q, t) = ∂Φ/∂q(q, t)(x). (3.15) Therefore, Bellman equation reads (3.16). The Master equation is an equation for U(x, q, t). It is obtained by differentiating (3.16) with respect to q.
We therefore obtain, formally: which is symmetric in the arguments (x, ξ, η).

System of HJB-FP equations
In the Mean Field Theory approach, the Master equation is the key equation. However, it is an infinite-dimensional nonlinear PDE, and direct approaches are very limited. The most convenient approach is to use ideas similar to the classical method of characteristics. This amounts to solving a system of forward-backward finite-dimensional stochastic PDEs. Since the system is forward-backward, the initial conditions matter. We shall consider that the initial time is 0, for convenience; the same argument applies for any time t ∈ [0, T]. This system is called Hamilton-Jacobi-Bellman for the backward equation and Fokker-Planck for the forward one. The Fokker-Planck equation is the Zakai equation in which we insert the optimal feedback v̂(q, U). So we get The functional U(x, q, t) used in (3.19) is the solution of the master equation (3.17). We simply call q(t) the solution of (3.19). We then set We use the notation v̂(q_t, u_t) to represent the functional v̂(q, U) in which the arguments q, U are replaced by q(·, t) and U(·, q(·, t), t) = u(·, t). We write q_t = q(·, t), u_t = u(·, t) to simplify. The functional v̂(q_t, u_t) achieves the corresponding infimum. The next step is to obtain the equation for u(x, t). It is a long and tedious calculation, obtained by taking the Itô differential of the random field defined by (3.20). We omit the details and only give the result as follows: where K(x, t) is defined by the formula: In fact, we do not need to compute K(x, t) by formula (3.23), which would require the knowledge of V(q_t, t)(x, ξ), and thus solving the master equation. From the theory of backward stochastic PDEs, the random field K(x, t) is determined by the adaptedness condition on u(x, t). So the solution of (3.22) is not just u(x, t) but the pair (u(x, t), K(x, t)), and we can expect uniqueness. Equation (3.19) can then be rewritten as (3.24), recalling also (3.21). So the pair (3.22), (3.24) is the pair of HJB-FP equations.
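The forward-backward structure can be illustrated on a deterministic toy system, entirely of our own making and far simpler than (3.22), (3.24): a forward equation for q coupled to a backward equation for u, solved by a Picard (fixed-point) iteration in the spirit of the method of characteristics:

```python
import numpy as np

# Toy forward-backward system (illustrative only):
#   forward:  q'(t) = -q(t) + u(t),  q(0) = 1
#   backward: u'(t) =  u(t) - q(t),  u(T) = 0
T, n = 1.0, 1000
dt = T / n
u = np.zeros(n + 1)                      # initial guess for the backward unknown

for it in range(60):
    # forward sweep with the current guess of u
    q = np.empty(n + 1); q[0] = 1.0
    for k in range(n):
        q[k + 1] = q[k] + dt * (-q[k] + u[k])
    # backward sweep with the forward solution q frozen
    u_new = np.empty(n + 1); u_new[-1] = 0.0
    for k in range(n, 0, -1):
        u_new[k - 1] = u_new[k] - dt * (u_new[k] - q[k])
    change = np.max(np.abs(u_new - u))
    u = u_new
    if change < 1e-12:
        break

print(change)   # the iteration contracts on this short horizon
```

On a short horizon the forward-backward map is a contraction and the iteration converges; exactly as in the text, the coupling of the two sweeps is what makes the initial (and terminal) conditions matter.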
Since q(x, 0) = q(x), we can assert that Therefore, we can compute U(x, q, 0) by solving the system of HJB-FP equations and using formula (3.25); of course u(x, t) = U(x, q(t), t). To compute U(x, q, t), we have to write the system (3.22), (3.24) on the interval (t, T) instead of (0, T). In that sense, the system of HJB-FP equations (3.22), (3.24) is a method of characteristics for solving the master equation (3.17). Besides, the optimal feedback v̂(q, U(·, q, t), t) can be derived from the system of HJB-FP equations; indeed, v̂(q, U(·, q, 0), 0) = v̂(q, u(·, 0)), and setting the initial condition of the system of HJB-FP equations at t instead of 0 yields v̂(q, U(·, q, t), t). To compute the value function, we have to rely on Bellman equation (3.16). Let us compute ∂Φ/∂t(q, 0) by using (3.16). The only term which is not known can be evaluated from the solution of the HJB-FP system. Collecting results, we can write the formula: In a similar way, we can define ∂Φ/∂t(q, t), for any t and any q. Since we know Φ(q, T), we obtain Φ(q, t) for any t. So solving the system of HJB-FP equations provides all the information on the value function and on the optimal feedback.

Weak formulation and linear dynamics
In this section, we consider the Zakai equation in the form (4.1); testing against smooth functions yields (4.2), which is the weak formulation of the Zakai equation. Note that the strong form (4.1) and the weak form (4.2) are not equivalent: we may have a weak solution but not a strong solution.

Linear system and linear observation
We want to solve the Zakai equation in the following special case: In general, this case is associated with an initial probability density q(x) which is Gaussian. In our approach, we cannot take a special q(x): it must remain general, because it is an argument of the value function and of the solution of the master equation. When we solve the system of HJB-FP equations, we can take q(x) Gaussian, but then we cannot use this method to obtain the solution of the master equation or of the Bellman equation. For a given control v(t), a process adapted to Z_t, the Zakai equation reads: where a = ½ σσ*. Makowski [12] has shown that this equation has an explicit solution, which we now describe in weak form. We first need some notation. We introduce the matrix Σ(t), solution of the Riccati equation: (4.5) We then define the matrix solution Φ(t) of the differential equation: and We then introduce stochastic processes β(t) and ρ(t), adapted to the filtration Z_t, defined by the following respective equations The process β(t) is the Kalman filter for the linear system (4.3) with a deterministic initial condition equal to 0. If we set we obtain the Kalman filter for the same linear dynamic system, with an initial condition x. It satisfies the equation: Finally, we introduce the martingale θ(x, t) defined by: θ(x, 0) = 1, (4.12) whose solution is the Doléans-Dade exponential
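The precise form of the Riccati equation (4.5) is not reproduced above. As an illustration of how such an equation is handled numerically, the sketch below integrates the standard filtering Riccati equation dΣ/dt = FΣ + ΣFᵀ + σσᵀ − ΣHᵀHΣ (a common form, assumed here; the matrices are arbitrary illustrative choices, not the paper's data) and checks that Σ(t) settles at a stationary solution of the corresponding algebraic Riccati equation:

```python
import numpy as np

# Illustrative data (not from the paper)
F = np.array([[0.0, 1.0], [-1.0, -0.5]])
H = np.array([[1.0, 0.0]])
Q = 0.2 * np.eye(2)          # plays the role of sigma sigma^T
Sigma = np.eye(2)            # Sigma(0): covariance of the initial state

dt, T = 1e-3, 20.0
for _ in range(int(T / dt)):
    riccati = F @ Sigma + Sigma @ F.T + Q - Sigma @ H.T @ H @ Sigma
    Sigma = Sigma + dt * riccati

residual = F @ Sigma + Sigma @ F.T + Q - Sigma @ H.T @ H @ Sigma
print(np.linalg.norm(residual))   # near 0: Sigma has reached the stationary regime
```

The Euler update preserves the symmetry of Σ exactly, and for an observable pair (F, H) the solution converges to the unique stabilizing stationary covariance.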

Formulas
We can state the following result, due to Makowski [12], whose proof can be found in [3].
Proposition 4.1. For any test function ψ(x, t), we have (4.14) Proof. Equality (4.14) is true for t = 0. Let us set According to (4.2), it is thus sufficient to show that This can be done through some tedious calculation, whose details can be found in [3].
We shall derive from (4.14) a more analytic formula. We first set Hence, from (4.12), But from (4.14), we see that Therefore, where we have set x̂(t) as in (4.18). Referring to (2.8), we see that x̂(t) is the conditional mean of the process x(t) defined by (see (2.1)): dx = (F x(t) + G v(t)) dt + σ dw, with respect to the filtration Z_t on the probability space (Ω, A, P^{v(·)}). It is thus the Kalman filter in this probabilistic set up. We shall derive the form of its evolution in the sequel. Now, from (4.17), we can assert that: recalling that, see (4.15), ν(0) = ∫_{R^n} q(x) dx. Next, from (4.13) and (4.10), we have: and, recalling the definition of S(t) and ρ(t), see (4.7) and (4.9), from (4.15) we obtain: Combining results, we can assert that: (4.21) Next, using (4.16) and (4.10), we have: therefore, from (4.18), we also obtain x̂(t): Let us introduce the deterministic function Γ(ρ, t) of the arguments ρ ∈ R^n and t: We can finally state the main formula for the unnormalized conditional probability q(x, t).
Theorem 4.2. The unnormalized conditional probability q(x, t) is given by:
We also have the following interpretation of Γ(ρ(t), t) as the conditional variance of the process x(t): Proposition 4.4. We have the formula: Proof. We use (4.25) to write: which is (4.36), as desired.

The Gaussian case
We begin by giving the characteristic function (Fourier transform) of the unnormalized probability density, denoted by: (4.37) The Gaussian case corresponds to an initial value of the system (4.19) which is Gaussian: where we have assumed the initial variance P_0 to be invertible, to simplify calculations. Using (4.23), we obtain: Therefore, from (4.27), we obtain: which is independent of ρ. An easy calculation shows that P(t) is the solution of the Riccati equation: and x̂(t) is then the classical Kalman filter: To obtain q(x, t), we use the characteristic function (4.37). An easy calculation yields: which is the characteristic function of a Gaussian random variable with mean x̂(t) and variance P(t). Recall that it is a conditional probability density given Z_t.
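In the Gaussian case the filter can be checked by direct simulation. The sketch below (scalar, with illustrative coefficients of our own choosing) runs the Kalman-Bucy filter along many independent paths and verifies that the empirical mean square estimation error matches the variance P(t) produced by the Riccati equation:

```python
import numpy as np

rng = np.random.default_rng(2)
F, H, sig = -1.0, 1.0, 1.0          # illustrative scalar coefficients
T, dt, n_paths = 2.0, 1e-2, 5000
P0 = 1.0

x = rng.normal(0.0, np.sqrt(P0), n_paths)   # Gaussian initial state
xhat = np.zeros(n_paths)                    # filter started at the prior mean
P = P0
for _ in range(int(T / dt)):
    db = rng.normal(0.0, np.sqrt(dt), n_paths)
    dz = H * x * dt + db                    # observation increments
    # Kalman-Bucy update driven by the innovation dz - H xhat dt
    xhat += F * xhat * dt + P * H * (dz - H * xhat * dt)
    P += (2 * F * P + sig**2 - P**2 * H**2) * dt
    # advance the true (unobserved) state
    x += F * x * dt + sig * rng.normal(0.0, np.sqrt(dt), n_paths)

mse = np.mean((x - xhat)**2)
print(mse, P)   # empirical error variance vs. Riccati prediction
```

In continuous time the identity E[(x(t) − x̂(t))²] = P(t) is exact; the discrepancy here is only discretization and Monte Carlo error.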

Setting of the problem
We want to apply the theory developed in Section 3 to the linear dynamics and linear observation (4.3), with a quadratic cost: in which M and M_T are n × n symmetric positive semi-definite matrices and N is an m × m symmetric positive definite matrix. We want to solve the control problem (2.9), (2.10) in this case. We write it as follows: In the sequel, we shall drop the index v(·) in q(x, t) when no ambiguity arises.

Application of mean field theory
We begin by finding the function v̂(q, U) defined by (3.13). We have to solve the minimization problem: We consider the value function: and we write Bellman equation (3.16) specifically for the linear quadratic case as follows: (5.8) in which we recall the notation: We can next write the Master equation (3.17), which is the equation for U(x, q, t). We get:

System of HJB-FP equations
We now write the system of HJB-FP equations (3.22) and (3.24). We look for a pair (u(x, t), q(x, t)) of adapted random fields solving the coupled system: q(x, 0) = q(x); (5.11) The random field K(x, t) can be expressed as: The key result is that we can solve this system of equations explicitly and obtain the optimal control. We introduce the matrix π(t), solution of the Riccati equation: We next introduce the function Z(x, ρ, t), solution of the deterministic linear PDE: x + 2tr(aπ(t)) = 0, Z(x, ρ, T) = 0.
(5.14) We next introduce the pair of adapted processes (x(t), ρ(t)), the solution of the system of SDEs: They are built on a convenient probability space on which z(t) is a standard Wiener process with values in R^d. We associate to the pair (x(t), ρ(t)) the unnormalized conditional probability q(x, t) defined by the Zakai equation: We next define the random field We now state the main result of our paper: Theorem 5.1. We have the property: , ρ(t), t) q(x, t) dx = 0, a.s., for any t, (5.19) and u(x, t) and q(x, t), defined by (5.18) and (5.17) respectively, are solutions of (5.10) and (5.11). The optimal control v̂ is given by (5.20). Proof. We first prove (5.19). We differentiate (5.14) with respect to x in order to obtain: We next consider q(x, t) defined by (5.17). A long calculation then shows that: From this relation, it follows that is a Z_t martingale. Since it vanishes at T, it is 0 at any t, a.e. Hence (5.19) is obtained. Consider u(x, t) given by formula (5.18); therefore, using (4.18) and (5.19), we get: To check (5.10), we have to check: Note that the final condition can be trivially verified. We can check (5.24) by direct calculation. We also obtain the value of K(x, t): So we have proven that (u(x, t), q(x, t)) is the solution of the system of HJB-FP equations (5.10) and (5.11).
The result (5.20) is an immediate consequence of (5.23). The proof is complete.
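The precise form of the Riccati equation (5.13) is not displayed above. To illustrate the structure of the resulting feedback, the sketch below integrates backward the standard LQ control Riccati equation dπ/dt + πF + Fᵀπ − πG N⁻¹Gᵀπ + M = 0, π(T) = M_T (a common form, assumed here, with illustrative scalar data of our own), and forms the separated control −N⁻¹Gᵀπ(t) x̂(t):

```python
import numpy as np

# Illustrative scalar data (not the paper's)
F, G, M, N, M_T = 0.0, 1.0, 1.0, 1.0, 0.0
T, dt = 10.0, 1e-3
n = int(T / dt)

pi = np.empty(n + 1)
pi[n] = M_T
# integrate the control Riccati equation backward from T:
#   -dpi/dt = pi F + F pi - pi G (1/N) G pi + M,  pi(T) = M_T
for k in range(n, 0, -1):
    dpi = -(pi[k] * F + F * pi[k] - pi[k] * G * (1.0 / N) * G * pi[k] + M)
    pi[k - 1] = pi[k] - dt * dpi

def feedback(xhat, k):
    # separation principle: feed back the Kalman estimate, not the state
    return -(1.0 / N) * G * pi[k] * xhat

print(pi[0])   # approaches the stationary gain sqrt(M N)/G = 1 for this data
```

The gain π(t) is computed offline from the cost data alone, while x̂(t) comes from the filter; their product is the whole content of the separation principle in this sketch.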

Complements
The result (5.20) is important: it shows that the optimal control of the problem (5.2), (5.3) follows the celebrated "Separation Principle". Recall that in the deterministic case, the optimal control, which is necessarily an open-loop control, can be obtained by a linear feedback on the state; open-loop and feedback controls are equivalent. The separation principle states that, in the partially observable case, the optimal open-loop control (adapted to the observation process) can be obtained by the same feedback as in the deterministic case, replacing the unobservable state by its best estimate, the Kalman filter. The fact that the separation principle holds is well known when the initial state follows a Gaussian distribution. We have proven that it holds in general. What drives the separation principle is the linearity of the dynamics and of the observation, and the fact that the cost is quadratic; the Gaussian assumption does not play any role. A significant simplification does occur in the Gaussian case, regarding the computation of the Kalman filter: in the Gaussian case, the Kalman filter solves a single equation, whereas in general it is coupled to another sufficient statistic ρ(t), and the pair (x(t), ρ(t)) must be obtained simultaneously.

Objectives
In this section, we compare our work with another recent approach, that of Bandini et al. [2]. They consider a general problem of stochastic control with partial information, to which our problem can be reduced. Their set up leads to a conditional probability (hence normalized) solution of a linear stochastic PDE, which they call the DMZ equation (for Duncan-Mortensen-Zakai equation). They formulate a control problem for this infinite-dimensional state equation, for which they write a Bellman equation. The solution is a functional on the Wasserstein space of probability measures, since the state is indeed a probability. When we formulate our problem in their set up, our Zakai equation cannot be their DMZ equation, since we do not have a probability but an unnormalized probability. To make the comparison easy, we keep our model but follow the set up of [2]. We explain the difference between the two equations, and also the difference between our Bellman equation and theirs. Although our problem can appear as a particular case of [2], this comes at the price of complicating it, which turns out not to be suitable; the discussion will explain the reasons. [2] provides an example with linear equations, which does not cover ours: in the set up of [2], our system remains nonlinear, which is also a consequence of the complication of the approach. We remain formal in our presentation, since we want to discuss the concepts and compare the methods.

Use of the set up of [2]
In the set up of [2], we consider the pair (x(t), η(t)), the solution of the system: in which w(·) and z(·) are independent Wiener processes and x_0 and η_0 are random variables independent of w(·) and z(·). We observe only the process z(·). The DMZ equation introduced by [2] is the equation for the conditional probability of the pair (x(t), η(t)) given the σ-algebra Z_t = σ(z(s), s ≤ t). In (6.1), the control v(t) is simply adapted to Z_t. If ϕ(η, x, t) is a deterministic function on R^{n+1} × R_+, we are interested in the process ρ(ϕ)(t) = E[ϕ(η(t), x(t), t)|Z_t]; it is the solution of the DMZ equation. We use the notation A_x ϕ(η, x, t) for the second order operator acting in the variable x, with a(x) = ½ σ(x)σ*(x). We next define the operators: Then the DMZ equation is (6.7), with initial condition ρ(ϕ)(0) = Eϕ(η_0, x_0, 0).
In the sequel, we assume the existence of a density p(η, x, t), the joint conditional probability density of (η(t), x(t)) given Z_t. It is defined by: ρ(ϕ)(t) = ∫∫ p(η, x, t) ϕ(η, x, t) dη dx. (6.8) The conditional probability density is the solution of the stochastic PDE: dp + [A*_x p(η, x, t) + div(g(x, v(t)) p(η, x, t)) − ½ |h(x)|² ∂²/∂η² (η² p(η, x, t))] dt = −h(x) ∂/∂η (η p(η, x, t)) · dz(t), p(η, x, 0) = p_0(η, x). (6.9) It is easy to check that q(x, t) = ∫ η p(η, x, t) dη (6.10) is the solution of Zakai equation (2.9), provided that the initial condition satisfies q_0(x) = ∫ η p_0(η, x) dη. (6.11) It is then clear that p(η, x, t) is indeed a probability density, while q(x, t) is not. Conversely, if we start with q_0(x) and want to solve Zakai equation (2.9), we can use (6.10) by looking for p(η, x, t), the solution of the DMZ equation (6.9). We need to take an initial condition of product form: p_0(η, x) = δ(η − ∫ q_0(ξ) dξ) ⊗ (q_0(x) / ∫ q_0(ξ) dξ). (6.12) This is not a probability density (the Dirac mass is a measure), so we need to use the weak formulation to proceed. We get some kind of interesting quandary: using the set up of [2], we can use probability measures, the Wasserstein topology, and the lifting method of P.L. Lions, but the price to pay is to increase the dimension by 1 and to introduce a nonlinearity. If we stay with the traditional set up, we have to work with unnormalized probability densities. If we can work with densities, this is not a serious drawback; otherwise, we have to find an alternative to the Wasserstein space and the lifting procedure, and it is not clear how to proceed. We can, of course, consider the Kushner equation instead of the Zakai equation, whose solution is a probability. But the Kushner equation is nonlinear, in contrast with the Zakai equation.
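The consistency of (6.10) with the product initial condition (6.12) can be checked directly: with p_0(η, x) = δ(η − ∫q_0) ⊗ q_0(x)/∫q_0, the η-integral ∫ η p_0(η, x) dη returns q_0(x). The sketch below verifies this on a grid, representing the Dirac mass as a point mass; the unnormalized q_0 is an arbitrary illustrative choice:

```python
import numpy as np

xg = np.linspace(-5.0, 5.0, 401)
dx = xg[1] - xg[0]

q0 = 2.0 * np.exp(-0.5 * xg**2) / np.sqrt(2 * np.pi)   # unnormalized: total mass 2
mass = np.sum(q0) * dx                                  # int q0 (approximately 2)

# p0(eta, x) = delta(eta - mass) (x) q0(x)/mass : a point mass in eta at eta = mass
eta_atom = mass
p0_x = q0 / mass                 # the x-marginal, a genuine probability density

# int eta p0(eta, x) deta = eta_atom * p0_x(x), which should reproduce q0(x)
q_rec = eta_atom * p0_x
print(np.max(np.abs(q_rec - q0)))   # 0 up to floating-point error
```

This makes explicit how the extra variable η carries exactly the missing normalization mass of q, which is the "dimension increased by 1" mentioned above.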

(6.16)
If we compare with the Bellman equation (3.11), rewritten in the current notation (with an unnormalized probability as argument), we obtain:

The linear case
If we go back to the linear case (4.3), we get: dx = (F x + G v) dt + σ dw, dη = η H x · dz, (6.20) therefore, in the set up of [2], we still have a nonlinear system, because of the product η H x. Therefore, we cannot use the linear case of [2]. This explains why our formulas are completely different. The fact that we have an explicit solution of the system of HJB-FP equations does not imply that we have an explicit solution of the Bellman equation; this is consistent with the spirit of the method of characteristics.