A GENERAL MAXIMUM PRINCIPLE FOR PROGRESSIVE OPTIMAL STOCHASTIC CONTROL PROBLEMS WITH MARKOV REGIME-SWITCHING

Abstract. In this paper, we give a general maximum principle for optimal controls of stochastic systems driven by Markov chains. The control is allowed to enter both the diffusion and jump terms, and the control domain is not necessarily convex. We apply a new spike variation and the stochastic integral of progressive processes to obtain the main result.


Introduction
Markov regime-switching models have been widely used in finance and stochastic optimal control problems in the past few years. In these models, the system is modulated by a continuous-time finite-state Markov chain, each state representing a regime of the system or the level of an economic indicator; the system thus depends on a market mode that switches among a finite number of states. For example, in the stock market, the up-trend volatility of a stock tends to be smaller than its down-trend volatility; it is therefore reasonable to describe the market trends by a two-state Markov chain. For details, the readers can refer to [13].
The maximum principle, a necessary condition for optimal control, is one of the central results in stochastic control theory. There is a very extensive literature on stochastic maximum principles for various types of optimal control problems. Peng [7] proved the general maximum principle for a forward stochastic control system without jumps, using a second-order variation equation to overcome the difficulty that arises when the control domain is nonconvex and the control enters the diffusion term. Donnelly [3] studied the sufficient maximum principle in a regime-switching model. Zhang et al. [14] developed a sufficient stochastic maximum principle for a forward system driven by Markov regime switching and a Poisson random measure. Lv and Wu [6] obtained a sufficient stochastic maximum principle for a forward-backward Markov regime-switching jump-diffusion system. In Tao and Wu [11], a stochastic maximum principle for a forward-backward system was obtained. Tang and Li [10] proved the maximum principle for a forward control system driven by a Poisson random measure where the control variable is allowed to enter both the diffusion and jump coefficients. Song et al. [9] fixed the deficiencies of [10] by introducing a new form of variation and allowing the control to be progressive instead of predictable. As is well known, Markov chains are pure jump processes that are quasi-left continuous. Under this condition, the counting process associated with a Markov chain (the process V in this paper) shares many properties with Poisson processes. When we consider stochastic integrals of continuous martingales, there is no difference between the integrand being predictable or progressive; for example, we usually assume the integrand to be progressive when we consider the stochastic integral with respect to Brownian motion. However, for stochastic integrals of martingales with jumps, the difference is substantial.
Therefore, there is a fundamental difference between the "predictable" and "progressive" assumptions on controls when the system is driven by Markov chains. Zhang et al. [15] proved the maximum principle for a system driven by a Markov chain and a Poisson random measure with mean-field terms. They assumed that all admissible controls are predictable and cited some main estimates of [10]. As in [10], the flawed estimates may cause problems.
In this paper, we prove the maximum principle for a system driven by Markov chains with all admissible controls being progressive. Compared with [15], our model, which overcomes the main difficulty caused by the jump term, is mathematically more rigorous and subtle. The rest of this paper is organized as follows. In Section 2, we give some preliminaries on Markov chains and introduce the stochastic integral of progressive processes with respect to $\tilde V$. In Sections 3-5, we give a rigorous proof of the maximum principle. To make the approach in [7] effective, we introduce a new spike variation in Section 4. Since the new variation is not necessarily predictable, our admissible control set is a set of progressive processes; that is why we need the stochastic integral of progressive processes. In Section 5, we give our main result. It has a form similar to the result in [7], but the two are fundamentally different because the adjoint equations are different. Our result also differs from that in [15]. In the Appendix, we prove the existence and uniqueness of the solutions of the SDEs and give an $L^p$ estimate of the solutions.

Preliminaries
Suppose we are given a complete filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$. Let $\mathcal{P}$ be the predictable $\sigma$-field and $\mathcal{G}$ be the progressive $\sigma$-field.

Continuous-Time Markov Chains
We suppose that $E$ is a finite set with $n$ elements; set $E = \{1, 2, \dots, n\}$ without loss of generality, equipped with the discrete topology. For the readers' convenience, we give the following definitions, which are from Section 6.5 of [5]. Let $\alpha_t$ be a continuous-time Markov chain and $\mathscr{L}$ its generator. Since the state space $E$ is finite, it is clear that the domain of $\mathscr{L}$ is $\mathbb{R}^E$. For each $i \in E$, let $S(i)$ be the first jump time of $\alpha$ with initial value $i$; then $S(i)$ is exponentially distributed with parameter $q(i)$. In this paper, we assume that $q(i) > 0$ for each $i$, i.e., there is no absorbing state. Now for each $i \in E$, we define a function $f_i : E \to \mathbb{R}$, $f_i(j) = I_{\{i\}}(j)$. Clearly $f_i \in D(\mathscr{L})$ and $f_i(\alpha_t)$ has the semimartingale decomposition
$$ f_i(\alpha_t) = f_i(\alpha_0) + \int_0^t L(\alpha_s, i)\,ds + M^i_t, $$
where $L$ is the transition intensity matrix of $\alpha$ and $M^i$ is a martingale. Since $q(i) > 0$ for each $i \in E$, we have $L(i,i) < 0$ for each $i$.
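As a quick sanity check of these generator facts, the following sketch builds the transition intensity matrix of the two-state market chain mentioned in the Introduction (the rates below are illustrative assumptions, not values from the paper) and verifies that each row sums to zero and that $L(i,i) = -q(i) < 0$:

```python
import numpy as np

# Hypothetical two-state regime-switching example (state 1 = up-trend,
# state 2 = down-trend); the jump intensities q(i) are illustrative only.
q = {1: 0.5, 2: 1.2}          # q(i) > 0: no absorbing state

# Transition intensity matrix L: off-diagonal entries L(i, j) >= 0,
# each row sums to zero, and L(i, i) = -q(i), since the chain leaves
# state i at total rate q(i).
L = np.array([[-q[1],  q[1]],
              [ q[2], -q[2]]])

# Basic generator properties used in the text.
assert np.allclose(L.sum(axis=1), 0.0)      # rows of a generator sum to 0
assert all(L[i, i] < 0 for i in range(2))   # q(i) > 0  =>  L(i, i) < 0
```

The same checks apply to any finite-state generator: only the size of the matrix and the rates change.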
If $i \neq j$, then for each $(\omega, s)$ at most one of $f_i(\alpha_{s-})$, $f_j(\alpha_{s-})$ can be nonzero. Let $V^{ij}_t$ be the counting process which counts the number of jumps from $i$ to $j$ up to time $t$, and let $V_t$ count all jumps of $\alpha$ up to time $t$, with intensity $r_t$. If $\alpha_t(\omega) = i_0$, it is easy to verify that $r_t(\omega) = q(i_0)$, i.e., $r_t = q(\alpha_t)$. Therefore $r_t$ can take only finitely many values; hence $r_t$ is bounded from above and below (away from zero).
Since $V$ is a counting process, we have
$$ [V]_t = V_t. $$
Since all $f_i$ are bounded, $M_t := V_t - \int_0^t r_s\,ds$ is a martingale. Then the compensator (i.e., dual predictable projection) of $V$ is $\int_0^t r_s\,ds$; in symbols, $V^p_t = \int_0^t r_s\,ds$, where "$p$" stands for the compensator. Observing that the compensator of $V$ is continuous, we know that $V$ is quasi-left continuous. From now on, we use the new notation $\tilde V_t = V_t - \int_0^t r_s\,ds$ to represent $M_t$. By (2.1), it is obvious that
$$ [\tilde V]_t = V_t, \qquad\qquad (2.2) $$
$$ \langle \tilde V \rangle_t = \int_0^t r_s\,ds. \qquad\qquad (2.3) $$
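The identity "the quadratic variation of $\tilde V$ over $[0,T]$ equals $V_T$" can be illustrated numerically on one simulated path. The sketch below is our own construction with illustrative rates: it simulates a two-state chain, builds $V$ and its compensator $\int_0^t q(\alpha_s)\,ds$, and checks that the discretized quadratic variation of $\tilde V$ matches the jump count up to mesh error.

```python
import numpy as np

rng = np.random.default_rng(42)
T, q = 10.0, [0.5, 1.2]   # illustrative jump intensities q(i) > 0

# One path of a two-state chain: alternating exponential holding times.
jump_times, t, state = [], 0.0, 0
while True:
    t += rng.exponential(1.0 / q[state])
    if t > T:
        break
    jump_times.append(t)
    state = 1 - state

def V(t):
    """V_t: number of jumps of the chain up to time t."""
    return sum(1 for s in jump_times if s <= t)

def compensator(t):
    """int_0^t r_s ds with r_s = q(alpha_s), integrated piecewise."""
    out, prev, st = 0.0, 0.0, 0
    for s in jump_times:
        if s > t:
            break
        out += q[st] * (s - prev)
        prev, st = s, 1 - st
    return out + q[st] * (t - prev)

# Discretized quadratic variation of Vtilde = V - compensator; the jump
# times are inserted as breakpoints so each cell contains at most one jump.
grid = np.unique(np.concatenate([np.linspace(0.0, T, 20001), jump_times]))
vt = np.array([V(s) - compensator(s) for s in grid])
qv = float(np.sum(np.diff(vt) ** 2))

# Every jump of Vtilde has size 1 and the compensator is continuous, so
# the quadratic variation of Vtilde on [0, T] equals V_T (mesh error aside).
assert abs(qv - V(T)) < 0.05
```

The continuous compensator contributes nothing in the mesh-zero limit; only the unit-size jumps survive, which is exactly why the quadratic variation of a compensated counting process is the counting process itself.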

Stochastic integral with progressive integrand
Fix $T > 0$. The measure $\upsilon$ generated by $(V_t)_{t \le T}$ is a measure on $(\Omega \times [0,T], \mathcal{F} \otimes \mathcal{B}([0,T]))$ defined by
$$ \upsilon(A) = E\Big[\int_0^T I_A(\omega, s)\,dV_s\Big]. $$
Then
$$ \upsilon(\Omega \times [0,T]) = E[V_T] = E\Big[\int_0^T r_s\,ds\Big] < \infty, $$
which means that $\upsilon$ is a finite measure. For any $\upsilon$-integrable process $X$, set $\mathbf{E}[X] := \int X\,d\upsilon$, and let $\mathbf{E}[X \mid \mathcal{P}]$ be the Radon-Nikodym derivative with respect to the $\sigma$-field $\mathcal{P}$. Note that $\mathbf{E}$ is not an expectation (for $\upsilon$ is not a probability measure), though it has properties similar to those of an expectation.
Suppose that $(H_t)_{t \le T}$ is a progressive process satisfying $\mathbf{E}[|H|] < \infty$. Then the stochastic integral $\int_0^t H_s\,d\tilde V_s$ is well-defined, and it is a martingale; for further details, the readers can refer to ([4], no. 2, Chap. 9). Next we give some important properties of this stochastic integral without proof, noting the compensator identity (2.5).
Remark 2.3. By the above proposition, when $H$ is predictable we have $\mathbf{E}[H \mid \mathcal{P}] = H$, and the integral reduces to the usual compensated form. Since every Feller process is quasi-left continuous ([1], Thm. 5.40, Chap. 9), $\alpha_t$ is quasi-left continuous. Noticing that $V_t$ has the same jump times as $\alpha_t$, $V_t$ is also quasi-left continuous. Therefore, by Corollary 9.9 of [12], we have properties (2.6) and (2.7). For processes with jumps, we use the Itô formula given in Theorems 32, 33 of [8] or Theorem 9.35 of [4].
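A plausible explicit form of the progressive-integrand integral, consistent with the objects $\mathbf{E}[\cdot \mid \mathcal{P}]$ and $r$ defined above (our hedged reconstruction, not a quotation of the paper's displayed formulas): for a progressive $H$ with $\mathbf{E}[|H|] < \infty$,

```latex
\int_0^t H_s \,\mathrm{d}\tilde V_s
  \;=\; \int_0^t H_s \,\mathrm{d}V_s
  \;-\; \int_0^t \mathbf{E}[H \mid \mathcal{P}]_s \, r_s \,\mathrm{d}s ,
\qquad\text{with}\qquad
\mathbf{E}[H \mid \mathcal{P}] = H \ \text{ when } H \text{ is predictable.}
```

Under this reading, for predictable integrands the integral coincides with the classical compensated integral, while for merely progressive integrands the compensating drift uses the projection $\mathbf{E}[H \mid \mathcal{P}]$ rather than $H$ itself.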

Example
Indeed, the difference between progressive controls and predictable controls may be significant. Let us consider the following stochastic control problem driven by a Markov chain.
and two admissible control sets. First let us find an optimal control in $U_1$ that minimizes $J$. Define the following. For any $u \in U_1$, applying Itô's formula we have the identity below; since $u$ is predictable, it simplifies further. Let $\bar X$ be the solution of the following equation. Then it is obvious that $\hat u$ is progressive but not predictable, since $V$ is quasi-left continuous. By (2.5), the original equation becomes the one displayed. Therefore we have found a control in $U_2$ that is strictly better than the optimal control in $U_1$. This example shows that too few controls are taken into consideration when the admissible controls are required to be predictable. On the other hand, we cannot ensure that $\hat u$ is an optimal control in $U_2$, and cannot even ensure the existence of optimal controls in $U_2$.

Statement of the problem
On $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{0 \le t \le T}, P)$, we are given an $\{\mathcal{F}_t\}_{0 \le t \le T}$ Markov chain $(\alpha_t)_{0 \le t \le T}$ and an $\{\mathcal{F}_t\}_{0 \le t \le T}$ Brownian motion $(B_t)_{0 \le t \le T}$; $\alpha_t$ and $B_t$ are independent. We also assume that $\{\mathcal{F}_t\}_{0 \le t \le T}$ is the completion of the filtration generated by $\alpha_t$ and $B_t$. Let $U$ be a nonempty subset of $\mathbb{R}^l$. We define the admissible control set accordingly. We consider the following progressive system with jumps, along with the cost functional. The control problem is to find an element $u \in U_{ad}$ minimizing the cost over $U_{ad}$. We aim at finding necessary conditions for an optimal control in $U_{ad}$. We need the following assumptions.
Assumption H:
- $b, \sigma, c$ are twice continuously differentiable w.r.t. $x$ with bounded first- and second-order derivatives. In addition, there is a constant $C$ such that the corresponding growth condition holds.
- $f, g$ are twice continuously differentiable w.r.t. $x$ with bounded second-order derivatives. In addition, there is a constant $C$ such that the corresponding growth condition holds.
Under these assumptions, we show in the Appendix that there exists a unique solution of (3.1) for any admissible control.
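For orientation, a controlled regime-switching system of the kind considered here typically takes the following form (a hedged sketch using the coefficients named in Assumption H; the paper's own display (3.1) may differ in details):

```latex
\begin{aligned}
\mathrm{d}X_t &= b(t, X_t, u_t)\,\mathrm{d}t + \sigma(t, X_t, u_t)\,\mathrm{d}B_t
               + c(t, X_t, u_t)\,\mathrm{d}\tilde V_t, \qquad X_0 = x_0,\\
J(u) &= E\Big[\int_0^T f(t, X_t, u_t)\,\mathrm{d}t + g(X_T)\Big],
\qquad J(u^{*}) = \inf_{u \in U_{ad}} J(u).
\end{aligned}
```

Here the jump term is driven by $\tilde V$, the compensated counting process of Section 2, which is what makes progressive (rather than predictable) integrands necessary.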

Variation
Since $U$ is not necessarily convex, we employ spike variations. Suppose $u \in U_{ad}$ is the optimal control. Let $\{T_n\}_{n \ge 1}$ be the jump times of $V_t$. For any $\bar t \in [0, T]$, the spike variation $u^{\varepsilon}$ of $u$ is defined below, where $[\![T_n]\!] := \{(\omega, t) \in \Omega \times [0,T] \mid T_n(\omega) = t\}$ is the graph of $T_n$ and $v$ is a bounded $\mathcal{F}_{\bar t}$-measurable function that takes values in $U$. Since $T_n$ is a stopping time, $[\![T_n]\!]$ is a progressive set. Therefore the spike variation $u^{\varepsilon}$ is progressive; it is then easy to show that $u^{\varepsilon}$ is in $U_{ad}$.
The method of variation is shown in Figure 1. Fix $\omega$ and consider one path each of $u$ and $u^{\varepsilon}$. The difference between the new method and the traditional one lies in the treatment of jumps in $(\bar t, \bar t + \varepsilon]$: for example, as Figure 1 shows, if $T_1(\omega)$ lies in $(\bar t, \bar t + \varepsilon]$, then the value of $u^{\varepsilon}$ at $T_1(\omega)$ is equal to $u$ rather than $v$.
Remark 4.1. As we know, $T_n$ is not a predictable time, so $[\![T_n]\!]$ is not predictable, which means that $u^{\varepsilon}$ is not predictable. That is the reason why we need the integrand of the stochastic integral to be progressive. In fact, the $T_n$ are totally inaccessible times.
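The variation just described can be written compactly as follows (our sketch, consistent with Figure 1: the perturbation value $v$ is applied on $(\bar t, \bar t + \varepsilon]$ except at the jump times of $V$):

```latex
u^{\varepsilon}_s \;=\;
\begin{cases}
v, & s \in (\bar t,\; \bar t + \varepsilon\,] \setminus \bigcup_{n \ge 1} [\![\, T_n \,]\!],\\[4pt]
u_s, & \text{otherwise}.
\end{cases}
```

Removing the graphs $[\![T_n]\!]$ from the perturbation interval is what prevents the jump term from spoiling the order of the variation, at the price of losing predictability.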
Let $X$ (resp. $X^{\varepsilon}$) be the trajectory of $u$ (resp. $u^{\varepsilon}$). Since $(\mathrm{Leb} \times P)(\bigcup_n [\![T_n]\!]) = 0$, by the SDE estimate we obtain the bound below. Since there is no jump on the perturbation set, we have the next estimate, which means the jump term does not influence the order of the variation. In fact, if we do not subtract the jump times, the corresponding term is always of order $O(\varepsilon)$ no matter how large $p$ is. By this new variation, we can use the method in [7] to get the desired conclusion. Now we introduce the variation equations. The first equation is that for $\hat X$, and the second one is that for $\hat Y$, where $\delta\phi = \phi(s, X_s, u^{\varepsilon}_s) - \phi(s, X_s, u_s)$ and $\delta\phi_x = \phi_x(s, X_s, u^{\varepsilon}_s) - \phi_x(s, X_s, u_s)$ for $\phi = b, \sigma$. We treat $\phi_{xx}(s, X_s, u_s)$ as a symmetric bilinear map from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}^n$. It is easy to show that (4.3) and (4.4) have unique solutions by Theorem A.2 in the Appendix. Now we give some basic estimates for $\hat X$ and $\hat Y$.
Lemma 4.2. For p ≥ 2, we have the following estimates,

Proof. By Theorem A.3 in the Appendix, for $\hat X$ we have
For $\hat Y$, by the boundedness of $b_{xx}, \sigma_{xx}, c_{xx}$, Lemma A.1 and Theorem A.3 in the Appendix, we have the stated bound. Proof. First we find the equation satisfied by $X^{\varepsilon}_t - (X_t + \hat X_t + \hat Y_t)$.
Here the second-order coefficient is a symmetric bilinear map from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}^n$. Then by Theorem A.3 we obtain the desired estimate, which shows the result. Now we derive the variation of the cost functional. We have the expansion below, where $\delta f = f(s, X_s, u^{\varepsilon}_s) - f(s, X_s, u_s)$. We treat $f_{xx}, g_{xx}$ as symmetric bilinear maps from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}$; indeed, $f_{xx}, g_{xx}$ can also be treated as matrices, in which case we have the corresponding matrix form. Then we have the following lemma. Proof. By the same method, we can show the remaining estimate.

Adjoint equations and the maximum principle
We introduce the first order and second order adjoint equations.
The first-order equation is (5.1) and the second-order equation is (5.2). Here we treat $f_{xx}$ and $g_{xx}(X_T)$ as matrices; we treat $b_{xx}, \sigma_{xx}, c_{xx}$ as linear maps from $\mathbb{R}^n$ to $\mathbb{R}^{n \times n}$. For the existence and uniqueness of the two BSDEs above, we refer to [2]: since $\phi_x, \phi_{xx}$ are bounded, there exists a unique solution of (5.1) and (5.2), with norm
$$ \|Z\|^2 = E\Big[\int_0^T |Z_s|^2\,ds\Big]. $$
Applying Itô's formula (2.8) to $\langle p_t, \hat X_t \rangle$, $\langle p_t, \hat Y_t \rangle$ and $\langle P_t \hat X_t, \hat X_t \rangle$, and using (2.6), (2.7), we get (5.3), (5.4) and (5.5), where $L$ is an $n \times 1$ matrix such that $L^i_t = \sum_{j=1}^n [\hat X^i, P^{ij}]_t$. In (5.5), we use the fact that $\Delta V_t = 1$ or $0$, from which the second equality follows. From (5.3), (5.4) and (5.5), we can get the form of $E\big[g_x(X_T)(\hat X_T + \hat Y_T)\big]$, since $E\big[g_{xx}(X_T)(\hat X_T, \hat X_T)\big] = E\langle P_T \hat X_T, \hat X_T \rangle$. Then we have
$$ \hat J^{\varepsilon} = E\int_0^T \Big[ \langle \delta b, p_t \rangle + \langle \delta\sigma, q_t \rangle + \delta f + \tfrac12 \langle P_t\, \delta\sigma, \delta\sigma \rangle \Big]\,dt + o(\varepsilon), \qquad (5.6) $$
where the $o(\varepsilon)$ term contains
$$ E\int_0^T \Big[ \langle q_t, \delta\sigma_x \hat X_t \rangle + \langle P_t\, \sigma_x \hat X_t, \delta\sigma \rangle + \langle P_t \hat X_t, \delta b \rangle + \langle Q_t \hat X_t, \delta\sigma \rangle \Big]\,dt. $$
Theorem 5.1. Suppose Assumption H is satisfied. Let $u$ be the optimal control and $X$ the trajectory of $u$. If $(p, q)$ satisfies (5.1) and $P$ satisfies (5.2), then for any $w \in U$ we have the following inequality, a.e., a.s.:
$$ H(\bar t, X_{\bar t}, w, p_{\bar t}, q_{\bar t}) - H(\bar t, X_{\bar t}, u_{\bar t}, p_{\bar t}, q_{\bar t}) + \tfrac12 \big(\sigma(\bar t, X_{\bar t}, w) - \sigma(\bar t, X_{\bar t}, u_{\bar t})\big)^{\top} P_{\bar t}\, \big(\sigma(\bar t, X_{\bar t}, w) - \sigma(\bar t, X_{\bar t}, u_{\bar t})\big) \ge 0. \qquad (5.7) $$
Proof. Noticing that $\bigcup_{n=1}^{\infty} [\![T_n]\!]$ is negligible under $P \times \mathrm{Leb}$, by (5.6) we can obtain the corresponding expansion. Dividing both sides by $\varepsilon$ and letting $\varepsilon$ tend to $0$, we obtain, for a.e. $\bar t$,
$$ E\Big[ H(\bar t, X_{\bar t}, v, p_{\bar t}, q_{\bar t}) - H(\bar t, X_{\bar t}, u_{\bar t}, p_{\bar t}, q_{\bar t}) + \tfrac12 \big(\sigma(\bar t, X_{\bar t}, v) - \sigma(\bar t, X_{\bar t}, u_{\bar t})\big)^{\top} P_{\bar t}\, \big(\sigma(\bar t, X_{\bar t}, v) - \sigma(\bar t, X_{\bar t}, u_{\bar t})\big) \Big] \ge 0. $$
Then for any $A \in \mathcal{F}_{\bar t}$ and $w \in U$, letting $v = w I_A + u I_{A^c}$, we conclude that (5.7) holds.
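The Hamiltonian implicit in (5.6) and in the inequality above is, in our reading of the expansion (the pairings match the terms $\delta b\,p_t + \delta\sigma\,q_t + \delta f$ appearing there):

```latex
H(t, x, w, p, q) \;=\; \langle p,\, b(t, x, w) \rangle
                  \;+\; \langle q,\, \sigma(t, x, w) \rangle
                  \;+\; f(t, x, w).
```

Note that $c$ does not appear: the jump coefficient drops out of the Hamiltonian, which is exactly the point discussed in Remark 5.2 below.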
Remark 5.2. Compared with the results in [15], (5.7) is independent of the jump coefficient $c$. In other words, (5.7) only describes the optimal control on the region where $V$ does not jump. The behavior of the optimal control at the jump times $\bigcup_{n=1}^{\infty} [\![T_n]\!]$ remains unsolved. In our progressive model, we need to characterize the optimal control not only on the continuous part $(\bigcup_{n=1}^{\infty} [\![T_n]\!])^c$ but also on the jump part $\bigcup_{n=1}^{\infty} [\![T_n]\!]$, while in the predictable model of [15] only the "continuous part" needs to be treated. This is because the behavior of predictable processes on the jump part is similar to that on the continuous part. Therefore, our model gives a more comprehensive and detailed description of the optimal control.

Conclusion
In this paper, we give the maximum principle for systems driven by $\tilde V$, a martingale derived from a Markov chain. To apply the approach in [7], the key issue is the order of the last term of (4.2); the same issue appears in [9]. To fix this problem, we introduce the spike variation (4.1), which makes the last term of (4.2) vanish.
However, another problem follows: the new control $u^{\varepsilon}$ is not predictable, since $\tilde V$ is quasi-left continuous. We deal with this problem by introducing the stochastic integral of progressive processes. Compared with our previous work [9], we apply a similar method in this paper, because Poisson processes and $V$ are both quasi-left continuous counting processes. The quasi-left continuity ensures that the stochastic integral of progressive processes makes sense, and being counting processes ensures that their quadratic variation processes equal the processes themselves, which is an important property. However, systems driven by Markov chains are more widely used than systems driven by Poisson random measures. Also, because of this similarity, the flawed estimate in [10] may reappear in estimates for systems driven by $\tilde V$: specifically, if one uses the predictable quadratic variation (2.3) instead of the quadratic variation (2.2) when applying the BDG inequality, the issue in (4.2) seems to disappear, and the approach in [7] seems to be right; but that use of the BDG inequality is not valid.
The maximum principle in this paper is obtained in a clear and concise framework and lays a solid foundation for further theoretical and applied research. Our future work is to give a characterization of the optimal control at the jump times of $V$.

Appendix A. The existence and uniqueness of the solutions
We are given the following SDE. We introduce the Banach space
$$ S^p[0,T] := \big\{ X \mid X \text{ has càdlàg paths, is adapted, and } E\big[\sup_{0 \le t \le T} |X_t|^p\big] < \infty \big\} $$
with norm $\|X\|_p^p = E\big[\sup_{0 \le t \le T} |X_t|^p\big]$.
Lemma A.1. Suppose that $X_t$ is an adapted process with càdlàg paths. Then for any $p > 0$ we have
$$ E\Big[\Big(\int_0^T |X_{s-}|\,dV_s\Big)^p\Big] \le C\, E\big[\sup_{0 \le s \le T} |X_s|^p\big], $$
where $C$ is a constant depending only on $p$.
Proof. We may suppose $E\big[\sup_{0 \le s \le T} |X_s|^p\big] < \infty$; otherwise the conclusion is obvious. Set $A_t = \int_0^t |X_{s-}|\,dV_s$. Since $V_t$ is a counting process, $A_t$ is a pure jump process. Notice that each jump time of $A_t$ is also a jump time of $V_t$ and the jump size of $V_t$ is always equal to $1$, so we obtain the stated bound. For any $k \ge 1$, since $A_{\cdot-}$, $X_{\cdot-}$ and $I_{[0, T_k]}$ are predictable, we can pass to compensators. Since $E\big[A^p_{s \wedge T_k}\big] \le k\, E\big[\sup_{0 \le t \le T} |X_t|^p\big]$, by Gronwall's inequality we obtain the bound for the stopped process. Letting $k$ tend to infinity, by Fatou's lemma we obtain the result. We make the following assumptions.
Assumption H1: the coefficients satisfy the stated Lipschitz and growth conditions, where $C$ is a constant depending only on $p$. Choose $T$ small enough that $C\big(T^p + T^{p/2} + e^{CT} T\big) < 1$; then the solution map is a contraction.
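The Gronwall step in the proof uses the standard integral form of the inequality; for reference:

```latex
\phi(t) \;\le\; a + C \int_0^t \phi(s)\,\mathrm{d}s \ \ \text{for all } t \in [0,T]
\quad\Longrightarrow\quad
\phi(t) \;\le\; a\, e^{C t}, \qquad t \in [0,T].
```

Applied here with $\phi(s) = E\big[A^p_{s \wedge T_k}\big]$, the a priori bound by $k\, E\big[\sup_t |X_t|^p\big]$ guarantees that $\phi$ is finite, so the implication is legitimate.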
For arbitrary $T$, we can split $[0, T]$ into finitely many small pieces, obtain a unique solution on each piece, and concatenate them. Now we give the $L^p$ estimate of the solution.