Strong Rates of Convergence for Space-Time Discretization of the Backward Stochastic Heat Equation, and of a Linear-Quadratic Control Problem for the Stochastic Heat Equation

We introduce a time-implicit, finite-element based space-time discretization scheme for the backward stochastic heat equation, and for the forward-backward stochastic heat equation from stochastic optimal control, and prove strong rates of convergence. The fully discrete version of the forward-backward stochastic heat equation is then used within a gradient descent algorithm to approximately solve the linear-quadratic control problem for the stochastic heat equation driven by additive noise.

We consider problem SLQ as a prototype example of a (linear-quadratic) stochastic optimal control problem involving a stochastic PDE, for which corresponding numerical analyses so far are rare in the existing literature; see e.g.[23,8].This is in contrast to the deterministic counterpart problem LQ which involves a linear PDE, where optimal rates of convergence are available for (finite element based) space-time discretization of related optimality conditions (see e.g.[17,16,21,18,11]), which may then be used as part of a gradient descent algorithm with step size control [12] to approximate the minimizing tuple (X * , U * ), which here consists of deterministic state and control functions.If compared to problem LQ, problem SLQ owns some distinctive characters and additional difficulties caused by the driving Wiener process in the SPDE (1.2), which make the generalization of the numerical results for the deterministic control problem to SLQ a non-trivial task.For example, a crucial difficulty consists in solving the adjoint equation in the context of SLQ, which here is a backward stochastic PDE (BSPDE for short) of the form having a solution tuple (Y, Z) ∈ L 2 F Ω; C([0, T ]; H 1 0 ) ∩ L 2 (0, T ; H 1 0 ∩ H 2 ) × L 2 F Ω; L 2 (0, T ; H 1 0 ) ; cf.[7].The adjoint variable Y is then related to the optimal control by Pontryagin's maximum principle, which in the case of problem SLQ is (1.4) 0 = U * (t) − Y (t) ∀ t ∈ (0, T ) .
The convergence analysis of space-time discretization BSPDEs is a recent research subject, and available results are rare as well.A first error analysis for an (abstract) time-space discretization based on the implicit Euler method for the above BSPDE (1.3) is [23], where the error depends on the ratio of time discretization and Galerkin parameters.That work heavily draws conclusions with the help of Malliavin calculus, and the (space-)time regularity of solutions to the underlying BS(P)DE.In [8], the authors derive rates of convergence for a conforming finite element semidiscretization, and discuss its actual implementation.The proofs in [8] use simple variational arguments, resting on improved regularity properties of the variational solution, Itô's formula, and approximation results for the finite element method.However, the interplay of spatial and temporal discretization errors is left open in [8], in particular the relevant question regarding unconditional convergence rates, which allow discretization parameters w.r.t.time and space to independently tend to zero, and general regular space-time meshes.
We address this issue in Section 3 as our first goal in this work: its derivation requires to study the time regularity of the solution (Y, Z) to BSPDE (2.10) in particular, which seems not available in the literature so far, and uses Malliavin calculus for that matter for the solution component Z, in particular.For this purpose, we borrow related arguments from [26,13] where BSDEs are studied, and variational arguments.
The second goal in this work is addressed in Section 4, where strong error estimates for a space-time discretization of the coupled forward-backward SPDE (1.2)-(1.3)(FBSPDE for short) are shown.These results extend available ones (cf.[8]) in the literature in several aspects: the obtained strong convergence rates for the used finite element based space-time discretization (4.15) holds for arbitrary times T -and is not only a semi-discretization in space where optimal rates are obtained in [8] for small times T via a contraction argument.The numerical analysis uses variational arguments to first bound the error in the optimal controls, and here exploits the unique solvability of (the discretization of) problem SLQ, as well as the related sufficient and necessary optimality conditions; in a second step, error bounds for the optimal state, and its adjoint are based on stability properties of the state equation, and the adjoint equation available from Section 3.
To solve a BSPDE computationally requires huge computational resources (see [8]), and it is even more computationally demanding (in terms of computational storage requirements and computational times) to solve the coupled FBSPDE.Consequently, an alternative numerical strategy to the space-time discretization of FBSPDE is useful to make accurate computations feasible, which decouples the computation of (approximating iterates of) the solution parts from a SPDE from that of a BSPDE per iteration.A simple fixed-point method on the level of optimality conditions to accomplish this goal is known to converge only for small times T > 0 (cf.[8]); instead, we may again return to the fully discretized problem SLQ hτ (4.13)-(4.14)and exploit its character as a minimization problem to initiate a gradient descent method to successively determine approximations of the optimal control; this method is detailed in Section 5, and a convergence order is shown for this iteration which is the final goal in this work.
The rest of this paper is organized as follows.In Section 2, we introduce notations, and review relevant properties of the problems BSPDE (2.10) and FBSPDE considered in this work.In Section 3, we prove strong error estimates for a space-time discretization of BSPDE.By virtue of the obtained error estimates, in Section 4, we prove a convergence rate for a space-time discretization of FBSPDE, which is related to problem SLQ.Convergence of the related iterative gradient descent method towards the minimizer U * of SLQ is shown in Section 5.

Preliminaries
2.1 Notation -involved processes and the finite element method • H 2 respectively.Let (Ω, F, F, P) be a complete filtered probability space, where F is the filtration generated by the R m -valued Wiener process W , which is augmented by all the P-null sets.Below, we set m = 1 for simplicity.The space of all F-adapted processes F Ω; C([0, T ]; K) .We partition the bounded domain D ⊂ R d via a regular triangulation T h into elements K with maximum mesh size h := max{diam(K) : K ∈ T h }, and consider spaces where P i (K) denotes the space of polynomials of degree i (i = 0, 1).The L 2 -projection Π i h : h .We use approximation estimates for the projection Π 1 h , and an inverse estimate (cf.[5]) to conclude that We denote by For simplicity, we choose a uniform partition, i.e. τ = T /N and τ ≤ 1.The results in this work still hold for quasi-uniform partitions.

The stochastic heat equation -strong convergence rates for a space-time discretization
For a given U ∈ L 2 F Ω; L 2 (0, T ; L 2 ) , and X 0 ∈ H 1 0 in (1.2), there exist a strong solution X ∈ L 2 F Ω; C([0, T ]; H 1 0 ) ∩ L 2 (0, T ; H 1 0 ∩ H 2 ) solving P-a.s. for all t ∈ [0, T ] (2.2) A finite element discretization of (2.2) then reads: For all t ∈ [0, T ], find ) such that P-a.s. and for all times t ∈ [0, T ] (2.4) Equation (2.4) may be recast into the following SDE system, (2.5) The derivation of a strong error estimate is standard, and uses the improved (spatial) regularity properties of the strong variational solution, (2.6) sup We now consider a time-implicit discretization of (2.4) on a partition I τ of [0, T ].The problem then reads: For every 0 where ∆ n+1 W := W (t n+1 ) − W (t n ).The verification of the error estimate (see [24]) rests on stability properties of the implicit Euler, as well as the bound (2.9) which requires additional regularity properties of involved data, i.e., X 0 ∈ H 1 0 ∩ H 2 , as well as and the H 1 -stability of the L 2 -projection Π 1 h ; cf.[6,4].

The backward stochastic heat equation -a finite element based spatial discretization
such that P-a.s. for all times t ∈ [0, T ] (2.11) and there exists a constant C ≡ C(D, T ) > 0 such that (2.12) E sup The existence of a strong solution to (2.10) in the above sense, as well as its uniqueness are shown in [7].
We now consider a finite element discretization of the Equation (2.13) is equivalent to the following system of BSDEs: The existence and uniqueness of a solution tuple (Y h , Z h ) e.g.follows from [9, Theorem 2.1].Moreover, there exists C ≡ C(f, T ) > 0 such that -The following result is taken from [8, Theorem 3.2], whose proof exploits the bounds (2.12).
Let (Y, Z) be the solution to (2.11), and (Y h , Z h ) solve (2.13).There exists Choosing Y T,h = Π 1 h Y T thus leads to an error estimate for the spatial semi-discretization (2.14).

Temporal discretization of the backward stochastic heat equation -the role of the Malliavin derivative
The numerical analysis of a temporal discretization of (2.14) requires Malliavin calculus to bound temporal increments We therefore recall the definition of the Malliavin derivative of processes, and the crucial connection between the Malliavin derivative of Y h and Z h from (2.14).For further details, we refer to [20,9].Let us recall that F T = σ{W (t); 0 ≤ t ≤ T }, and that K denotes a separable Hilbert space.We define the Itô isometry W : For ℓ ∈ N, we denote by C ∞ p (R ℓ ) the space of all smooth functions g : R ℓ → R such that g and all of its partial derivatives have polynomial growth.Let P be the set of R-valued random variables of the form (2.16) In general, we can define the k-th iterated derivative of F by D k F = D(D k−1 F ), for any k ∈ N. Now we extend the derivative operator to K-valued variables.For any k ∈ N, and u in the set of K-valued variables: we can define the k-th iterated derivative of u by , and and its Malliavin derivative 3 Strong rates of convergence for a space-time discretization of the BSPDE (2.10) In this section, we introduce the time discretization scheme (3.6) to approximate the solution The main results are Theorems 3.4 and 3.6 in Subsection 3.2.Their derivation crucially hinges on the time regularity of the solution (Y h , Z h ) to (2.14), which is provided in the subsequent Subsection 3.1.

14)
We start with the derivation of uniform estimates for Y h which control its temporal increments.We note again that all involved generic constants C > 0 do not depend on h.
Here, the constant C > 0 only depends on Y T,h , f and T .
Proof.We only prove (i).The other statements can be proved in a similar vein.By BSDE (2.14), we get 14), we see Then (i) can be deduced by the above estimates.
Then, it holds that Proof.By Lemma 2.2, we know that Z h (t) = D t Y h (t) for all 0 ≤ t ≤ T , and therefore, for In what follows, we estimate the two terms on the right-hand side of (3.2) independently. Step . By (2.17), we have the BSDE Itô's formula and Poincaré's inequality lead to Taking θ 2 = s and θ 1 = t and using (3.1) then lead to Step 2. By (2.17), Itô's isometry together with Poincaré's inequality, Inserting (3.5) and (3.4) into (3.2) then settles the proof of the lemma.

A time-implicit space-time discretization of the BSPDE (2.10)
We use an implicit time discretization on the mesh I τ to approximate BSPDE h (2.14); we refer to it as BSPDE hτ , and the discretization reads as follows: For every 0 We introduce an auxiliary BSDE system on each time interval [t n , t n+1 ] for the convergence analysis of (3.6 ).Note that the integrand in the drift is evaluated with the help of the solution part Proof.The first identity is immediate; the second follows from multiplication of (3.7) with the admissible We may now prove a strong error estimate for the first component of (Y h , Z h ) that solves (2.14).
Proof.Consider (Y h , Z h ) from (3.7), and define Fixing one realization ω ∈ Ω, testing with the admissible e n (ω) ∈ V 1 h , using binomial formula, and then taking expectation, and Poincare's and Young's inequality lead to (3.10) Subsequently, the discrete Gronwall inequality leads to Then, summing up over all steps of (3.10) yields (3.12) Then, (3.11), (3.12) together with (ii) of Lemma 3.1, and 3.3 lead to the desired estimate.
By Theorems 2.1, 3.4 and Lemma 3.1 (i), we thus get the following convergence rate for the approximation {Y n h } N n=0 of the first solution component Y to (2.11) via the space-time discretization scheme (3.6), (3.13) max Remark 3.5.If the drift term of (2.10) is −∆Y (t, x) + f t, x, Y (t, x) , with a Lipschitz nonlinearity f , we may apply a similar procedure to get the above convergence rate.However, the above strategy is not clear to be successful if Z appears in the drift term.
We now derive estimates for the approximation {Z n h } N −1 n=0 of the second solution component Z to (2.11), which uses the characterization Z h (t) = D t Y h (t), and (2.17).
Theorem 3.6.Let (Y h , Z h ) solve (2.13), where data satisfy the assumptions in Lemma 3.1 (ii), Lemma 3.2, as well as The proof begins with an estimate for Z h − Z h , which exploits time regularity properties of the solution (Y h , Z h ) in stronger norms; cf.Lemma 3.1, (ii).Moreover, the following technical result is needed; see also [23].
Lemma 3.7.For any ϕ ∈ L 2 F (0, T ; K) and 0 ≤ s < t ≤ T , define For any ξ ∈ L 2 Fs (Ω; K) there holds Proof.Let {φ i } ∞ i=1 be an orthonormal basis of K, and Π n be the projection from K to span{φ i : Since, ϕ 0 and ξ are F s -measurable, the last term vanishes, i.e., Therefore, By letting n ↑ ∞, we may therefore conclude which completes the proof.
Proof of Theorem 3.6.
Step 1. Claim: theres exists a constant C, which is independent of h, τ , such that We recall the definition of {e n } N −1 n=0 in the proof of Theorem 3.4, as well as equation (3.9), which we recast into the form Taking squares and afterwards expectations on both sides, by binomial formula, Itô isometry, and Young's inequality, we arrive at By the discrete version of Gronwall's inequality, and Lemma 3.1, (iii), the right-hand side is bounded by Cτ .Hence, (3.14) is proved.
Step 2. We use the triangle inequality twice to deduce By the definition of Z n h , on taking ξ = Z h (t n ) in Lemma 3.7 we may further estimate by We use (3.14) to bound the first term, and Lemma 3.2 is utilized to bound the last term.

Strong rates of convergence for a space-time discretization of SLQ
In this part, we discretize the original problem SLQ within two steps, starting with its semidiscretization in space (which is referred to as SLQ h ), which is then followed by a discretization in space and time (which is referred to as SLQ hτ ).Our goal is to prove strong convergence rates in both cases.By [15], the SLQ problem is uniquely solvable, and its solution (X * , U * ) may be characterized by the following FBSPDE with the unique solution (X * , Y, Z, U * ), ( We remark that by (4.1) 1 , X * may be written as X * = S(U * ), where ) is the bounded 'control-to-state' map.Moreover, we introduce the reduced functional The first component of the solution to equation (4.1) 2 may be written as Y = T (X * ), where T T : which is also bounded.For every U ∈ L 2 F Ω; L 2 (0, T ; L 2 ) , the Gateaux derivative D J (U ) is also a bounded operator on L 2 F Ω; L 2 (0, T ; L 2 ) and takes the form (4.3) D J (U ) = U − T S(U ) .

Problem SLQ h : Semi-discretization in space
We begin with a spatial semi-discretization SLQ h of the problem SLQ stated in the introduction, which reads: Find an optimal pair (X subject to the equation The existence of a unique optimal pair (X * h , U * h ) follows from [25], as well as its characterization via Pontryagin's maximum principle, i.e., (4.6) where the adjoint In [8], optimal error estimates have been obtained for (X * h , Y h , Z h ) with the help of a fixed point argument -which crucially exploits T > 0 to be sufficiently small.The goal in this section is to derive corresponding estimates for (X * h , U * h , Y h , Z h ) for arbitrary T > 0 via a variational argument which exploits properties of the reduced functional J ≡ J (u) that is now defined: once an estimate for L 2 ds has been obtained, we use the convergence analysis from Section 3 to derive estimates for X * − X * h , as well as Y − Y h and Z − Z h .By the unique solvability property of (4.5), we associate to this equation the bounded solution operator , which allows to introduce the reduced functional (4.8) The first solution component to equation (4.7) may be written as Y h = T h (X * h ), where For every F Ω; L 2 (0, T ; V 0 h ) at U h , and has the form (4.9) F Ω; L 2 (0, T ; V 0 h ) be arbitrary; it is due to the quadratic structure of the reduced functional (4.8) that

As a consequence, on putting
Note that D J h (U * h ) = 0 by (4.6), as well as D J (U * ) = 0 by (4.2), such that the last line equals We use (4.3) to bound I as follows, , where . By Poincaré's inequality, and a stability bound (see also (2.12)) for the backward stochastic heat equation (2.10), as well as for the stochastic heat equation (2.2) (see also (2.3)), By optimality condition (4.2), and the regularity properties of the solution to BSPDE (2.10), we know that already U * ∈ L2 F Ω; C([0, T ]; H 1 0 ) ∩ L 2 (0, T ; H 1 0 ∩ H 2 ) ; as a consequence, the right-hand side of (4.12) may be bounded by Ch 2 .
We use the representation (4.9) and properties of the projection Π 0 h to bound II via where . We split II 2 a into two terms and In order to bound II 2 a,1 , we use stability properties for BSPDE (2.10), in combination with the error estimate (2.6) for (2.5) to conclude In order to bound II 2 a,2 , we use the error estimate in Theorem 2.1 for BSPDE (2.10), in combination with stability properties of (2.5), and again the error estimate (2.6) for (2.5) to find We now insert these estimates into (4.11)resp.(4.10) to obtain the bound By arguing as below (4.12), this settles part (i) of the following Theorem 4.1.Let (X * , Y, Z, U * ) be the solution to problem SLQ, and (X * h , Y h , Z h , U * h ) be the solution to problem SLQ h .There exists C ≡ C(X 0 , T ) > 0 such that Proof.Since U * ∈ L 2 F (Ω; L 2 (0, T ; H 1 0 )), and (i), the first estimate of (ii) can be deduced as (2.6).Assertion (iii) now follows accordingly as Theorem 2.1, thanks to (ii).

Problem SLQ hτ : Discretization in space and time
In this part, we provide the temporal discretization of problem SLQ h which was analyzed in Section 4.1.For this purpose, we use a mesh I τ covering [0, T ], and consider processes ( where and for any X ∈ X hτ , U ∈ U hτ , . Problem SLQ hτ then reads as follows: Find an optimal pair (X * hτ , U * hτ ) ∈ X hτ ×U hτ which minimizes the cost functional (4.13) subject to the difference equation The following result states the Pontryagin maximum principle for problem SLQ hτ , which is later used to verify convergence rates for the solution to problem SLQ hτ towards the solution to SLQ.
Theorem 4.2.Problem SLQ hτ admits a unique minimizer together with By (4.16), we can see that U * hτ is càdlàg, and then U * hτ ∈ U hτ .Inserting (4.16) into (4.15) 1 leads to a coupled problem for , where (4.15) 2 is similar to (3.6).Note that no Z-component appears explicitly in (4.15) 2 , where the conditional expectation is used to compute the Y -component.It is in particular due to the need to compute conditional expectations in (4.15) 2 that the optimality system (4.15)-(4.16) is still not amenable to an actual implementation, but serves as a key step towards a practical method which approximately solves SLQ hτ -which is proposed and studied in Section 5.
Proof.We divide the proof into three steps.
Step 1.Let A 0 := (1 − τ ∆ h ) −1 .For any U hτ ∈ U hτ , by equation (4.15) 1 , we have (4.17) Hence, by iteration we arrive at Here, Γ : V 1 h → X hτ and L : U hτ → X hτ are bounded operators.Below, we use the abbreviations Claim: For any ξ ∈ X hτ , and any where (Y 0 , Z 0 ) solves the following backward stochastic difference equation: Proof of Claim: The existence and the uniqueness of solutions to (4.21) are obvious.Note that With the similar procedure as that in (4.17), we conclude from (4.22) and (4.21) 2 , ( Let U hτ ∈ U hτ be arbitrary.By the definition of L, (4.23) and the fact Since the second argument is V 1 h -valued, we may skip the projection operator in the first argument, and may continue instead Because of (4.23), the latter equals which is the first part of the claim.
The remaining part can be deduced from the fact that and the following calculation: Step 2. By (4.18) and (4.19), we can rewrite J τ (X hτ , U hτ ) as follows: , Rearranging terms then leads to Therefore, for any U hτ ∈ U hτ such that U hτ = U * hτ , which means that U * hτ is the unique optimal control, and (X * hτ , U * hτ ) is the unique optimal pair.Step 3. By the definition of N, H, L * , L * , and properties (4.20) and (4.18), we can get which is (4.16).This completes the proof.
We are now ready to verify strong rates of convergence for the solution to SLQ hτ ; it is as in Section 4.1 that the reduced cost functional J hτ : U hτ → R is used, which is defined via where S hτ : U hτ → X hτ is the solution operator to the forward equation (4.15) 1 .Moreover, we use the solution operator T hτ : X hτ → X hτ for the first solution component of the backward equation (4.15) 2 .
Theorem 4.3.Suppose that X(0) ∈ H 1 0 ∩ H 2 , and Let (X * h , Y h , Z h , U * h ) be the solution to problem SLQ h , and (X * hτ , Y hτ , U * hτ ) be the solution to problem SLQ hτ .There exists C ≡ C(X 0 , T ) > 0 such that Proof.We divide the proof into three steps.
Step 1.We follow the argumentation in the proof of Theorem 4.1.For every U hτ , R hτ ∈ U hτ , the first Gateaux derivative D J hτ (U hτ ), and the second Gateaux derivative D 2 J hτ (U hτ ) satisfy Define the (piecewise constant) operator Π τ : 26), and applying the fact Therefore, We use (4.9) and (4.6), and stability properties of the projection Π 0 h to bound I ′ as follows, (4.29) By stability properties of solutions to BSPDE h (2.14), and the discretization (2.5) of BSPDE, we obtain . By the optimality condition (4.6), estimate (2.15), and Theorem 4.1 (i) we have Next, we turn to II ′ , for which we use the representations (4.9), (4.26) and the stability property of Π 0 h to conclude In order to bound II ′ a,1 , we use stability properties for SPDE h (2.5), BSPDE h (4.7), in combination with the error estimate (2.8) for (2.5) to conclude To bound II ′ a,2 , it is easy to see By (4.9), Lemma 3.1 (i), stable property of (4.15) 1 , we can get Here, we apply the representation of X hτ (4.18), the fact X ∈ L 2 (0, T ; H 1 0 ), and condition (4.25).Now we insert above estimates into (4.28) to obtain assertion (i).
Step 2. For all k = 0, 1, • • • , N , we define e k X = X * h (t k ) − X * hτ (t k ).Subtracting (4.14) from (4.5) leads to Testing with e k+1 X , and using binomial formula, Poincaré's inequality, independence, and absorption lead to By taking the sum over all 0 ≤ k ≤ n and 0 ≤ k ≤ N − 1, and noting that e 0 X = 0, we find that max By (4.28), the first term on the right-hand side is bounded by Cτ .We use Itô isometry for the second term, and Hölder regularity in time of σ to bound it equally.Adopting the method in (4.31), we can bound the third term by . We use (4.31) to bound the last term by Cτ .That is assertion (ii).
Step 3. Firstly, we introduce an auxiliary BSDE (4.32) With the same argument as that in the proof of Theorem 3.4, we can deduce Applying Lemma 3.1 (ii), the first integral term is bounded by It remains to estimate the second integral term, which is bounded by Assertion (iii) now follows from the above three statements and conditions on X 0 , σ, X.
5 The gradient descent method to solve SLQ hτ By Theorem 4.2, solving minimization problem SLQ hτ is equivalent to solving the system of coupled forward-backward difference equations (4.15) and (4.16).We may exploit the variational character of problem SLQ hτ to construct a gradient descent method SLQ grad hτ where approximate iterates of the optimal control U * hτ in the Hilbert space U hτ are obtained; see also [19,14].hτ ∈ U hτ as follows: hτ (T ) = −α X  .
Note that Steps 1 and 2 are now decoupled: the first step requires to solve a space-time discretization (2.7) of SPDE (2.2), while the second requires to solve the space-time discretization (3.6) 1 of the BSPDE (4.1) 2 .We refer to related works on how to approximate conditional expectations [3,10,1,22]; a similar method to SLQ grad hτ to solve problem SLQ hτ has been proposed in [8].
We want to show convergence of SLQ grad hτ for κ > 0 sufficiently large and ℓ ↑ ∞.For this purpose, we recall the notations S hτ , T hτ , J hτ introduced in Section 4.2.For this purpose, we first recall Lipschitz continuity of D J hτ : since where operators L, L are defined in (4.18), we find K := 1 + L * L + α L * L L(U hτ ;U hτ ) , such that Since SLQ grad hτ is the gradient descent method for SLQ hτ , we have the following result.Proof.We know that D J hτ is Lipschitz continuous with constant K > 0. Also, J hτ is strongly convex.Hence, the gradient descent method in abstract form is the following iteration (see Algorithm where L, L, Γ, Γ, f are defined in (4.18) and (4.19).Via (4.20), we have that Π 0 h T hτ S hτ (U hτ , the solution of Step 2 in Algorithm 5.1.Therefore, (5.1) is consistent with the gradient descent method SLQ grad hτ .The desired error estimates now follow by standard estimates for the gradient descent method (see, e.g.[19,Theorem 1.2.4]).