KERNEL REPRESENTATION OF KALMAN OBSERVER AND ASSOCIATED H-MATRIX BASED DISCRETIZATION

Abstract. In deterministic estimation, applying a Kalman filter to a dynamical model based on partial differential equations is theoretically appealing, but solving the associated Riccati equation leads to a so-called curse of dimensionality in its numerical implementation. In this work, we propose to entirely revisit the theory of Kalman filters for parabolic problems, where additional regularity results prove that the Riccati equation solution belongs to the class of Hilbert-Schmidt operators. The regularity of the associated kernel then allows us to proceed to the numerical analysis of the full space-time discretization of the Kalman filter in adapted norms, hence justifying the implementation of the related Kalman filter numerical algorithm with H-matrices, as typically developed for the discretization of integral equations.

the estimator - also called observer - dynamics, even in the deterministic context, and with additional subtlety in the stochastic context, see [7] or the survey [19]. Moreover, refined questions of regularity arise, in particular concerning the Riccati operator. These questions have been widely studied in the literature, see [9,21,34] and references therein, with new results in the class of Hilbert-Schmidt operators [14,15,20] obtained recently for certain classes of parabolic equations. Building on these new results, we here propose a complete vision of Kalman theory for PDEs, from its deterministic definition and its essential properties up to its complete full space-time numerical analysis in the space of Hilbert-Schmidt operators, completing in this sense the initial works of [14,26]. Moreover, under adequate regularity assumptions, we aim at proving that the continuous Riccati operator is associated with a regular kernel which can be discretized using hierarchical matrix algebra [10,30], as is well known when discretizing integral equations [2] and as was also experimented with for Algebraic Riccati Equations [27].
Here, we mathematically justify, in the context of the Kalman filter, previous numerical experiments [38], while proposing numerical improvements in the H-matrix treatment of the Riccati operators. This allows us to overcome - here for parabolic problems - the classical curse of dimensionality faced by the Kalman filter for PDEs, which sometimes limits its use compared with least squares minimization using adjoint-based methods [43]. In fact, with our approach, we can consider discretizations with millions of degrees of freedom and approximate the corresponding Riccati operators, whereas classical research directions are more inclined to follow reduced basis model approximation methods [5,17,42,44] or reduced covariance strategies [16,35,46] to be computationally effective.

A general parabolic initial value problem
Let Z and V be two separable Hilbert spaces with V dense in Z. Denoting Z' and V' their respective topological dual spaces, we assume a compact injection i : V → Z, so that we have the identifications V ⊂ Z ≡ Z' ⊂ V', and we denote ‖·‖_Z and ‖·‖_V the norms with associated scalar products (·,·)_Z and (·,·)_V. By contrast, the duality pairing on V' × V is denoted by ⟨·,·⟩_{V',V}.
We consider an operator A ∈ L(V, V') associated with a continuous and coercive bilinear form a : V × V → R, ∀(v, w) ∈ V × V, ⟨Av, w⟩_{V',V} = a(v, w). With a slight abuse of notation, we also consider A as an unbounded operator from D(A) to Z. We denote A*, of domain D(A*), the adjoint operator. The fractional powers A^ρ, 0 ≤ ρ ≤ 1, are defined following [33] - see also Section II-1.4 of [9] - and we further restrict our study to the classical case D(A^{1/2}) = D(A^{*1/2}) = V, studied and illustrated in [41]. Given T > 0, we consider a dynamical system in Z represented by the following dynamics ż + Az = Bν in (0, T), z(0) = z_0, (1.1) where we use the short notation ż = dz/dt. The quantity (0, T) ∋ t ↦ ν(t) ∈ U is typically an unknown contribution to the dynamics - namely a so-called model error - and U is also assumed to be a Hilbert space with associated norm ‖·‖_U. For the sake of simplicity, we consider a model-error operator B ∈ L(U, Z).
The operator (A, D(A)) is obviously maximal accretive, hence - by the Lumer-Phillips theorem [45] - it is the generator of a C⁰-semigroup of contractions Φ. Moreover, Φ is analytic - see Theorem 3.6.1 of [49], as well as Definition 2.3 and Theorem 2.11 of [9].
For the sake of completeness, we recall the various notions of solution of the non-homogeneous linear evolution equation (1.1) that will be used in this work - see for instance ([9], Part II).
Proof. We here aggregate several classical results. For (i), we refer for instance to II-1 Proposition 3.3 of [9]; for (ii), we refer to II-1 Proposition 3.2 of [9]; and for (iii), we refer to II-2 Theorem 1.1 of [9].

Uncertainties and observation modeling
We are now going to introduce some model uncertainties in a deterministic estimation framework. Let us consider that we are interested in the prediction of a natural system that follows the dynamics (1.1). We denote the target trajectory {ž(t), t ∈ [0, T]}, obtained from (1.1) with unknown initial condition and model error ν. More precisely, concerning the initial condition, we separate the unknown part ζ from the known part ẑ_0 of z(0), so that ž(0) = ẑ_0 + ζ.
We typically assume to have an a priori bound on the level of uncertainty in ζ, namely ‖ζ‖_Z = O(α) with α known. Another choice could be to assume, as is done for ill-posed inverse problems, that the initial condition belongs to a more regular space V_s, typically with the injection i_s : V_s → V at least continuous. In this case, the estimation procedure should benefit from knowing that ‖ζ‖_{V_s} = O(α).
The source error is typically assumed to belong to L²((0,T); U), or more strongly to L∞((0,T); U). To circumvent this lack of information on the system, we assume to observe the given target trajectory, hence we expect to estimate the associated initial condition and the model error from the available measurements. We model the measurement procedure by an observation operator C, such that a given measurement y ∈ Y is modeled from the application of C on a given z ∈ Z, namely (1.5). For the sake of simplicity, we restrict ourselves to bounded observation operators. The available noisy measurements are y^δ; they are a perturbation of the unavailable perfect measurements y = Cž, such that η = y^δ − y belongs to L²((0,T); Y), or more strongly to L∞((0,T); Y). We recall that compensating the lack of knowledge on (ζ, ν) by the known data y^δ consists in being able to invert the operator Ψ. The operator Ψ can be injective but is not surjective, as A generates an analytic semigroup, hence Φ is regularizing. Therefore, inverting Ψ is ill-posed.

The advection-diffusion example
As an illuminating example throughout this article, we consider an advection-diffusion problem. We introduce a bounded domain Ω ⊂ R^d with C² boundary, on which we will define solutions of an advection-diffusion equation with an unknown source term error. We introduce a strictly positive known continuous function f ∈ C⁰(Ω̄), with f(x) > 0 for all x ∈ Ω̄, but a potentially unknown time-dependent ν ∈ L²(0,T), where b ∈ H¹(Ω) ∩ L∞(Ω) is a given velocity field such that ∇·b = 0. The model defined by the dynamics (1.7) enters the framework introduced in Section 1.2.1, with the model error operator and its corresponding adjoint operator given below. Note in particular that D(A^{1/2}) = D(A^{*1/2}) = V, as studied in [41]. About the measurements, we typically consider observing the system over a subdomain ω. In this case, Ψ - defined by (1.6) - is injective. Indeed, let us consider (ζ, ν) ∈ Z × L²(0,T) such that y = Ψ(ζ, ν) ≡ 0. We first use the equation (1.7) satisfied in ω ⊂ Ω; as f is strictly positive, we deduce that ν ≡ 0. Then, from classical observability inequalities for parabolic equations from interior measurements - see [28] and references therein - we obtain z(T) = 0. Finally, by backward uniqueness of the solution, z(T) = 0 ⇒ ζ = 0.

Least squares criterion
A classical estimation approach consists in formulating the estimation problem as an optimal control problem, hence minimizing a least squares criterion balancing the uncertainties. We thus introduce, for γ ∈ R⁺, the criterion J_T, where we denote by z_{|ζ,ν}(s) a trajectory of (1.1) for a corresponding initial condition z(0) = ẑ_0 + ζ.
On the one hand, the bilinear and symmetric form a_s is a penalization term aiming at controlling the regularity of the estimated initial condition of this ill-posed problem. Moreover, a_s will be assumed to be coercive and bounded in V_s, with typically the existence of 0 < ε < α⁻¹ such that the corresponding bounds hold. Denoting A_s the Friedrichs extension of the triple (Z, V_s, a_s), we introduce - again with a slight abuse of notation - the operator Π_0 = A_s⁻¹, which can be considered either as a bounded application from Z to D(A_s) or from V_s' to V_s. The term in which ⟨·,·⟩_{V_s} stands for the duality product is a generalized Tikhonov regularization term [23] enforcing a regularity in V_s ⊂ Z. The operator Π_0 will be called the a priori initial covariance operator, as we typically expect the target trajectory to satisfy the corresponding bound. On the other hand, the parameter γ is a positive scaling parameter balancing the uncertainty in the data against the uncertainty in the source and in the initial condition. In practice, for a given Π_0⁻¹ and κ, γ is adjusted with respect to the estimated observation noise scale δ.
In this setting, our objective is typically to find the trajectory associated with min_{ζ ∈ V_s, ν ∈ L²((0,T);U)} J_T(ζ, ν), and we emphasize that minimizing J_T must be understood as a minimization under the constraint that z_{|ζ,ν} follows the dynamics (1.1).
Let us now introduce, for all z ∈ L²((0,T); Z) and y ∈ L²((0,T); Y), the adjoint dynamics, which is also well posed as it is considered backward in time. Namely, we have q_T ∈ C⁰([0,T], Z) from Theorem 1.2. The adjoint variable allows us to easily compute the Fréchet derivatives with respect to ζ and ν, which we find for a given (ζ, ν). At the extremum, we obtain the Euler equation associated with the minimization, where q̄_T is the adjoint variable associated with the optimal trajectory z̄_T = z_{|ζ̄_T,ν̄_T} and the available measurements y^δ. This leads to the so-called two-ends problem defining the optimal dynamics of the estimator: in (0,T), z̄_T(0) = ẑ_0 + Π_0 q̄_T(0), q̄_T(T) = 0.
(1.13) Remark 1.4. Note that here, the proof is compatible with the semigroup theory solutions, whereas a slightly different approach based on the notion of variational solution could have been possible. In particular, the variational framework allows us to introduce the Lagrangian associated with the dynamics constraint, hence to illuminate the definition of the adjoint equation. We refer to the first chapters of [7], which adapt to observation theory the variational evolution equation control framework initially developed in [39].
Remark 1.5. In general, (1.13) is solved using a gradient descent approach. Defining the gradient from Riesz' representation theorem in the space V_s × L²((0,T); U), the gradient descent approach can be proved to be convergent for an adequate relaxation parameter ρ < 1, small enough. This gradient descent consists in solving, from (ζ⁰, ν⁰) = (0, 0) and for k ≥ 0, the weakly coupled system (1.14a)-(1.14b). Note that the existence of a solution of (1.13) can be understood as the limit of the well-posed dynamics (1.14a)-(1.14b).
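To make the descent of Remark 1.5 concrete, here is a minimal finite-dimensional sketch: a discrete flow F stands in for the dynamics, the gradient is assembled by a backward adjoint sweep, and a fixed relaxation parameter ρ drives the iteration. All names and numerical values (F, C, γ, Π_0⁻¹, ρ) are illustrative choices of ours, not quantities from the paper.

```python
import numpy as np

n_steps, dim = 20, 2
th = 0.3
F = 0.9 * np.array([[np.cos(th), -np.sin(th)],
                    [np.sin(th),  np.cos(th)]])  # stable discrete flow
C = np.eye(dim)                                   # full observation
gamma = 10.0                                      # data-fit weight
P0inv = 0.01 * np.eye(dim)                        # prior precision Pi_0^{-1}
zhat0 = np.array([1.0, 0.0])                      # known part of z(0)
zeta_true = np.array([0.5, -0.3])                 # unknown part to recover

# Synthetic noise-free measurements from the target trajectory.
z = zhat0 + zeta_true
ys = []
for _ in range(n_steps + 1):
    ys.append(C @ z)
    z = F @ z

def J_and_grad(zeta):
    """Criterion J(zeta) and its gradient via a backward adjoint sweep."""
    traj, z = [], zhat0 + zeta
    for _ in range(n_steps + 1):
        traj.append(z)
        z = F @ z
    res = [C @ zk - yk for zk, yk in zip(traj, ys)]
    J = 0.5 * zeta @ P0inv @ zeta + 0.5 * gamma * sum(r @ r for r in res)
    q = np.zeros(dim)     # adjoint recursion: q_k = F^T q_{k+1} + gamma C^T r_k
    for r in reversed(res):
        q = F.T @ q + gamma * C.T @ r
    return J, P0inv @ zeta + q

zeta, rho = np.zeros(dim), 0.02   # relaxation parameter, small enough
for _ in range(500):
    J, g = J_and_grad(zeta)
    zeta = zeta - rho * g
```

With full, noise-free observations the iteration nearly recovers ζ; the small residual bias comes from the Tikhonov term Π_0⁻¹.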

Singular value decomposition
In this section, we give one example of a possible choice of Π_0 among many others. In this respect, let us consider the compact operator T = A⁻¹ : Z → Z. We introduce (e_n)_{n∈N} and (f_n)_{n∈N}, the orthonormal bases associated with the diagonalization of T*T and TT* respectively. We denote (μ_n)_{n∈N} the associated sequence of positive singular values, which decrease to 0.
We recall the following decomposition, with (e_m, e_n)_Z = δ_{mn}, (f_m, f_n)_Z = δ_{mn}, and T f_n = μ_n e_n, T* e_n = μ_n f_n. We can then define the associated scale of spaces and, given s > 0, we introduce V_s. Note that we have in particular V_{1/4} = V, as we initially restrict our study to the case where D(A^{1/2}) = V. We further assume that there exists s_0 ≥ 1/2 such that the corresponding summability holds. Then, we propose to define Π_0 accordingly. If s ≥ 0, Π_0 is compact. Moreover, if s ≥ s_0, Π_0 is a Hilbert-Schmidt operator (see the definition recalled in Sect. 2.1). Finally, this ensures that a_s is bounded and coercive in V_s.
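As a discrete sanity check of this construction, the sketch below builds T = A⁻¹ for a 1D finite-difference Dirichlet Laplacian (self-adjoint, so e_n = f_n) and forms a hypothetical prior covariance Π_0 = α² Σ_n μ_n^{2s} e_n ⊗ e_n. This explicit power-law choice is our illustrative stand-in, not the paper's exact formula.

```python
import numpy as np

# Hypothetical SVD-based prior covariance (our stand-in formula):
# Pi_0 = alpha^2 * sum_n mu_n^(2s) e_n e_n^T, built from T = A^{-1}.
N, s, alpha = 50, 1.0, 0.1
h = 1.0 / (N + 1)
A = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h**2        # 1D Dirichlet Laplacian
T = np.linalg.inv(A)                               # compact, self-adjoint
mu, E = np.linalg.eigh(T)                          # mu_n > 0, eigenvectors e_n
Pi0 = alpha**2 * E @ np.diag(mu**(2 * s)) @ E.T

hs_norm = np.linalg.norm(Pi0, 'fro')               # discrete Hilbert-Schmidt norm
eigs = np.linalg.eigvalsh(Pi0)                     # spectrum of the prior
```

Larger s gives faster spectral decay, i.e. a smoother (more regularizing) prior; the Frobenius norm plays the role of the Hilbert-Schmidt norm at the discrete level.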

The Kalman sequential estimator
The principle of optimal sequential estimation [7] is to avoid solving (1.13) by decoupling the corresponding two-ends dynamics. In this respect, we will find a Cauchy problem formulation of the so-called optimal sequential estimator - also called optimal observer - defined as follows. Definition 1.6 (The optimal sequential estimator). For all times t > 0, considering the optimal trajectory z̄_t associated with the minimizer of J_t, the optimal sequential estimator ẑ is defined by ∀t ≥ 0, ẑ(t) = z̄_t(t). (1.16) We are going to prove that the optimal sequential estimator is in fact given by the estimator proposed by [32], often called the Kalman-Bucy estimator or simply the Kalman estimator, here generalized to PDE formulations [7]. Yet, we first need to introduce the so-called Riccati operator, solution to a Riccati dynamics, before characterizing the dynamics of ẑ.

Riccati dynamics
We introduce the space S(Z) of linear self-adjoint bounded operators and the cone S⁺(Z) in S(Z). We then consider the following Riccati dynamics (1.17), for which we seek a solution in C⁰([0,T], S⁺(Z)), the set of all continuous mappings from [0,T] to S⁺(Z), endowed with the topology of pointwise convergence. As for the evolution equation (1.2), we face several types of solutions of (1.17), listed in the next Definition 1.7. Before that, we need to introduce an operator over the set of bounded symmetric operators and its associated domain. Let us denote Υ the operator associated, for any Q ∈ S(Z), with the corresponding bilinear form, so that Υ is defined with the domain D(Υ). Note that, going back to our definition of Π_0 from the singular value decomposition of A in Section 1.5.2, we have Π_0 ∈ D(Υ) when s ≥ 1/2; indeed, this is easily verified in that case. Definition 1.7 (Notions of Riccati solution). We list three different notions of solution to Problem (1.17). Then we have the following existence results for the Riccati operator Π, also called the covariance operator for its interpretation in the stochastic filtering framework [7,19].
Theorem 1.8 (Existence of Riccati solutions). We list two cases of existence of a solution to Problem (1.17): (i) Assuming that B and C are bounded and Π_0 ∈ S⁺(Z), the Riccati dynamics (1.17) admits one and only one weak solution Π ∈ C⁰([0,T], S⁺(Z)), which is also a mild solution in the sense of (1.18). (ii) Assuming moreover that Π_0 ∈ D(Υ), the Riccati dynamics (1.17) admits one and only one strict solution. Proof. The existence and uniqueness of a mild solution is justified in IV-1 Theorem 2.2 of [9]. The fact that this solution is also a weak solution is given in IV-1 Proposition 2.1 of [9]. Finally, the existence of a strict solution is proved for variational operators in IV-1 Proposition 3.2 of [9].
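A finite-dimensional illustration of such a Riccati solution can be obtained by integrating the differential Riccati equation and comparing its long-time limit with the algebraic Riccati solution. Note a sign convention caveat: the paper's dynamics (1.17) are written for the generator −A, while below we use the standard textbook filter form dΠ/dt = AΠ + ΠAᵀ − ΠCᵀR⁻¹CΠ + BBᵀ with a stable matrix A of our choosing.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[-1.0, 0.5], [0.0, -1.5]])   # stable generator (our convention)
B = np.array([[1.0], [0.5]])               # model-error operator
C = np.array([[1.0, 0.0]])                 # observation operator
R = np.array([[1.0]])                      # observation weight
Q = B @ B.T

def riccati_rhs(P):
    return A @ P + P @ A.T - P @ C.T @ np.linalg.solve(R, C @ P) + Q

# RK4 integration of the differential Riccati equation from Pi(0) = 0.
P = np.zeros((2, 2))
dt, T = 1e-3, 20.0
for _ in range(int(T / dt)):
    k1 = riccati_rhs(P)
    k2 = riccati_rhs(P + 0.5 * dt * k1)
    k3 = riccati_rhs(P + 0.5 * dt * k2)
    k4 = riccati_rhs(P + dt * k3)
    P = P + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# The long-time limit should match the algebraic (filter) Riccati solution.
P_inf = solve_continuous_are(A.T, C.T, Q, R)
```

The transposed arguments in `solve_continuous_are` convert SciPy's control-side convention into the filter-side equation solved here.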
Remark 1.9 (Time-dependent observation and control operators). Theorem 1.8 directly extends to cases with a time-dependent observation operator - resp. a time-dependent control operator - as soon as suitable uniform bounds hold. We finally conclude this section by recalling the benefit of relying on variational semigroups to impose additional regularity properties on the Riccati operator Π, as proved in IV-1 Theorem 3.3 of [9] using an initial result from [24].

Comparison principle
Fundamental properties of the Riccati operator come from comparison principles. We have already seen that, for all t ≥ 0, Π(t) ∈ S(Z). Moreover, Theorem 1.8 also gives Π(t) ≥ 0, a property which can be easily understood from the recast dynamics leading to the mild solution, where Φ(s,t), 0 ≤ s ≤ t ≤ T, is the evolution operator associated with the perturbed operator A + γ²Π(t)C*C - see IV-1 Theorem 2.1 and I-1 §3.5 of [9]. From (1.20) indeed, we directly infer that Π(t) ≥ 0.
Then from (1.18), we infer the corresponding bound, where the order relation in S⁺(Z) is the obvious one. We now recall the classical comparison result for Riccati operators proved in IV-1 Proposition 2.2 of [9]. Proposition 1.11. Consider, for i ∈ {1, 2}, the two Riccati equations. By a direct use of Proposition 1.11, we obtain the following comparison, where Π_c - given our choice of Π_0 - is the strict solution of (1.22). Let us now specify Π_c, which is given by the following decomposition.
Proposition 1.12. The unique strict solution of (1.22) is given by the following decomposition. Proof. On the one hand, we introduce the operator Λ, solution of (1.24). As Π_0 is a bounded operator, Λ ∈ C¹([0,T], S⁺(Z)) is the strict solution - see Theorem 1.8 and Remark 1.9 - of the Riccati equation (1.24). Moreover, as a mild solution, Λ satisfies the associated Duhamel identity. On the other hand, we introduce U, and we find U⁻¹ = Λ in [0,T] by uniqueness of the strict solution of (1.24). This ensures, by uniqueness of the mild solution of (1.22), the stated decomposition. To summarize, we have found in this section that, for all t ≥ 0, Π(t) is controlled by Π_c(t), hence controlling the asymptotic behavior of Π.
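The comparison principle can also be observed numerically: dropping the (nonpositive) observation term from the Riccati dynamics yields an upper bound Π_c ≥ Π in the Loewner order. The small experiment below checks this on a 2×2 system; the sign convention and all matrices are our illustrative choices rather than the paper's exact (1.21)-(1.22).

```python
import numpy as np

# Two Riccati flows sharing A, Q = B B*, Pi_0: with and without observation.
A = np.array([[-1.0, 0.3], [0.0, -2.0]])
C = np.array([[1.0, 1.0]])
Q = np.eye(2)                  # plays the role of B B*
Pi0 = 0.5 * np.eye(2)

def step(P, with_obs, dt=1e-3):
    rhs = A @ P + P @ A.T + Q
    if with_obs:
        rhs = rhs - P @ C.T @ C @ P   # observation term, nonpositive in rhs
    return P + dt * rhs               # explicit Euler, small step

Pi, Pi_c = Pi0.copy(), Pi0.copy()
for _ in range(5000):                 # integrate up to t = 5
    Pi = step(Pi, True)
    Pi_c = step(Pi_c, False)

# Loewner order: Pi_c - Pi should be positive semidefinite.
gap_eigs = np.linalg.eigvalsh(Pi_c - Pi)
```

Observation strictly reduces the covariance here, so the gap eigenvalues come out positive.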

Estimator dynamics
Once we have clarified the sense given to the covariance operator Π, solution to the Riccati dynamics (1.17), we are able to give (1.16) a closed-loop dynamics, as recalled in the next theorems. Theorem 1.13. Let A be the generator of a C⁰-semigroup Φ. Let C be a bounded operator and Π a mild solution of (1.17). There exists one and only one mild solution in C⁰([0,T]; Z) of the dynamics (1.27), in the sense of (1.28). Moreover, this solution is the unique weak solution in the sense that (1) ẑ ∈ L²((0,T); Z), (2) for all q ∈ D(A*), ⟨q, ẑ(·)⟩ belongs to H¹(0,T), and (3) the weak formulation holds for almost all t ∈ (0,T), with ẑ_0 ∈ Z and β ∈ L²((0,T); Z). From II-1 Proposition 3.4 of [9], Problem (1.30) admits a unique weak solution in H¹((0,T); D(A*)') ∩ C⁰([0,T]; Z), which coincides with the mild solution in the sense of (1.28).
Note that we can also deduce that ẑ is the unique variational solution, in the sense that (1) ẑ ∈ L²((0,T); V), (2) dẑ/dt ∈ L²((0,T); V'), and (3) the variational formulation holds for almost all t ∈ (0,T). Theorem 1.14. The Kalman observer defined by (1.16) is the unique solution of (1.27). Moreover, assuming that Π_0 ∈ D(Υ), we have the fundamental identity ∀t ∈ [0,T], z̄_T(t) = ẑ(t) + Π(t)q̄_T(t). (1.32) Proof. From Theorem 1.3, we have the existence of a weak solution of the two-ends problem (1.13). From Theorem 1.8, we have the existence of a strict solution for the covariance operator. Additionally, a weak solution of the Kalman estimator ẑ exists from Theorem 1.13. Now let us introduce η = ẑ − z̄_T + Π q̄_T and v ∈ D(A*), and compute the corresponding identity. Moreover, Π is a weak solution of (1.17) for all t ≥ 0 and v ∈ D(A*), and q̄_T is a weak solution of (1.13). Gathering (1.33), (1.34) and (1.35), we finally obtain, by Theorem 1.13, η = 0 in [0,T], which concludes the proof. We directly deduce the optimality from (1.32). As a consequence, the Kalman-Bucy estimator, solution of (1.27), is the optimal estimator in the sense of Definition 1.6.
Remark 1.15. The condition Π_0 ∈ D(Υ) is a technical assumption facilitating the chain rule computations. It can be relaxed by using only mild solutions and Duhamel formulae. However, the proof of Theorem 1.14 is simpler under this condition, and the next sections will rely on such a regularity condition on Π_0.

Kernel representation of continuous-time infinite dimensional Riccati solutions
We are now going to show an additional regularity property of the Riccati solution, which, for Π_0 regularizing enough, can be of Hilbert-Schmidt type, as sometimes highlighted for control problems - see for instance [20] and references therein - or [13-15] for observation problems. The benefit of such a property is the existence of an associated kernel, regular enough to be numerically approximated and efficiently computed.

Hilbert-Schmidt Riccati solutions and Kernel representation
We denote by J₂(Z) the space of Hilbert-Schmidt operators over the separable Hilbert space Z. We recall that J₂(Z) ⊂ K(Z), where K(Z) is the space of compact operators, the Hilbert-Schmidt norm being computed through any Hilbert basis (e_n)_{n≥0} of Z.
We point out that this definition can be shown to be independent of the choice of the Hilbert basis. Moreover, if Π ∈ J₂(Z) ∩ S⁺(Z), then ‖Π‖ ≤ ‖Π‖₂. We have the following result, taken from Theorem 3.6 of [14] - see also [13,15] and the seminal work [50]. We now recall the following kernel theorems for Hilbert-Schmidt operators, see Theorem 12.6.2 and Theorem 12.7.2 of [4]. Theorem 2.2 (Kernel theorem in L²). An operator of the form (2.1) is a Hilbert-Schmidt operator if and only if it is associated with a kernel π ∈ L²(Ω × Ω). Therefore in our framework, as we proved that Π(t) ∈ L(V', V) for t ∈ (0,T), we can expect additional regularity for the kernel π associated with Π. This will be specified in the next section for the advection-diffusion case.
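A discrete counterpart of this kernel correspondence is worth recording: for a matrix acting as an integral operator on a grid, the Hilbert-Schmidt norm is the Frobenius norm of the kernel samples, equivalently the root sum of squared singular values. The check below uses a random symmetric matrix as a stand-in kernel.

```python
import numpy as np

# Discrete kernel theorem: Hilbert-Schmidt norm = Frobenius norm of the
# kernel matrix = sqrt(sum of squared singular values).
rng = np.random.default_rng(1)
M = rng.standard_normal((40, 40))
K = M + M.T                                    # a self-adjoint "kernel" matrix
sing = np.linalg.svd(K, compute_uv=False)      # singular values of the operator
hs_from_svd = np.sqrt(np.sum(sing**2))
hs_frobenius = np.linalg.norm(K, 'fro')
```

This is exactly the identity behind ‖Π‖₂ = ‖π‖_{L²(Ω×Ω)}, with quadrature weights absorbed into the matrix entries.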

Kernel representation of the Kalman estimator for an advection-diffusion problem
Considering our specific advection-diffusion example of dynamics defined in (1.7), we have the following representation theorem.
Theorem 2.4. The Riccati dynamics (1.17) associated with the model (1.7) admits one and only one mild solution Π in the sense of (1.18), which is associated with a kernel π such that, for all t ∈ (0,T), (Δ_x + Δ_{x'})π(t) ∈ L²(Ω × Ω), and π is a solution to the dynamics (2.2), where π_0 ∈ H¹(Ω × Ω) is the kernel associated with the initial covariance operator Π_0.

In particular, h_n = √λ_n u_n. We know that (h_n)_{n∈N} is a Hilbert basis of V' = H⁻¹(Ω) and (g_n)_{n∈N} is a Hilbert basis of V = H¹₀(Ω). We thus have a bound which implies that Π(t) is a Hilbert-Schmidt operator from H⁻¹(Ω) to H¹₀(Ω). Therefore this time, from Theorem 2.3, Π(t) admits a kernel representation π(t) ∈ H¹(Ω × Ω). In particular, π admits a trace at the boundary. Let us now characterize this kernel π more specifically. First, we have for all t ∈ [0,T], Π(t) ∈ S⁺(Z). Second, we recall from Theorem 1.10 that additional regularity holds for t > 0 and z ∈ D(A^{1/2}) = H¹₀(Ω); therefore, for all t ∈ (0,T), by density and by symmetry, the corresponding identities extend. Third, let us consider (z₁, z₂) ∈ D(Ω) × D(Ω), and use Fubini together with the boundary property of π; identical computations treat the remaining terms. Now, we recall that for all t ∈ (0,T), Π(t) ∈ D(Υ). Therefore, there exists a constant c such that the associated bound holds; as π(t) ∈ H¹(Ω × Ω), this implies the analogous bound on the kernel.
Additionally, we have the complementary identities. As Π is a weak solution of the Riccati equation in the sense of (1.19), and D(Ω) ⊗ D(Ω) is dense in D(Ω × Ω), the identity holds for all ψ ∈ D(Ω × Ω), ending the justification of (2.2).
Remark 2.5. Such a non-linear integro-differential equation associated with the Riccati kernel was already mentioned in Chapter 3, Section 5 of [39], but without justification of the regularity conditions necessary to write (2.2). In [50], the regularity question was fully treated, however with different arguments and a slightly different Riccati equation.
From the kernel representation of the covariance operator, we can now specify the weak and strong forms of the optimal estimator in the context of an advection-diffusion problem. We directly find the weak form from the weak solution associated with (1.27). Then, using the symmetry of Π - and of its associated kernel π - we obtain the strong form of the Kalman estimator for our advection-diffusion example. Here, we recognize a nonlinear integro-differential equation which we proved defines a well-posed problem.
Remark 2.6. For clarity, we have limited the kernel representation calculations to the initial advection-diffusion example. However, an analogous representation could be derived for many other types of parabolic equations, provided that Π is a Hilbert-Schmidt operator and, furthermore, Π ∈ L(V', V). This last property is justified by Theorem 1.10, unfortunately - to our knowledge - under the restrictive condition that D(A^{1/2}) = D(A^{*1/2}).

Discretization of the direct model
We consider a spatial discretization of the model (1.1) using a Ritz-Galerkin method, for instance a finite element method. To be more specific with our example of the advection-diffusion equation (1.7), we consider a Lagrange finite element discretization in a finite dimensional space V_h ≅ R^{N_Z}. The orthogonal projection from Z to Z_h is denoted by P_h. We now introduce the operator A_h, and we proceed identically with the model error and the observations. For the sake of simplicity, we restrict the present article to the case where only the state needs to be discretized; our discretized observation operator is hence given accordingly. For the model error, the same reasoning applies, but as in our specific example dim(U) is finite, there is no need to introduce a specific notation for its spatial discretization. Of note, in a very general configuration, a full numerical analysis should also take into account the observation sampling and discretization, and the model error discretization. We now proceed to the full space-time discretization using a backward Euler time scheme. We consider a uniform discretization of the time interval [0,T] and denote the corresponding one-step operator, which is clearly invertible as A_h is maximal monotone. Moreover, we recall the following estimate ([25], Thm. 2.7). Finally, we also introduce ẑ_0^h = P_h ẑ_0.
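As a minimal sketch of this space-time discretization, the snippet below assembles a 1D finite-difference advection-diffusion operator A_h (an illustrative stand-in for the Galerkin matrices; the grid, profile and velocity are our choices) and performs the backward Euler step z^{n+1} = (I + τA_h)⁻¹(z^n + τB_h ν^n). Since A_h + A_hᵀ is positive definite here, each step is nonexpansive in the discrete norm.

```python
import numpy as np

N, tau = 100, 0.01
h = 1.0 / (N + 1)
x = np.linspace(h, 1 - h, N)
b = 0.5                                           # constant velocity (our choice)
D2 = (np.diag(2 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
      - np.diag(np.ones(N - 1), -1)) / h**2       # -Laplacian, Dirichlet BCs
D1 = (np.diag(np.ones(N - 1), 1) - np.diag(np.ones(N - 1), -1)) / (2 * h)
Ah = D2 + b * D1                                  # advection-diffusion operator
Bh = np.sin(np.pi * x)                            # model-error profile (stand-in for f)
M = np.eye(N) + tau * Ah                          # backward Euler matrix I + tau A_h

z = np.exp(-100 * (x - 0.3) ** 2)                 # initial condition
norms = [np.linalg.norm(z) * np.sqrt(h)]
for n in range(50):
    nu = 0.0                                      # no source error in this run
    z = np.linalg.solve(M, z + tau * Bh * nu)     # one backward Euler step
    norms.append(np.linalg.norm(z) * np.sqrt(h))
```

The skew-symmetric centered advection matrix D1 contributes nothing to the symmetric part of A_h, so the monotone decay of the discrete norm comes entirely from diffusion, mirroring the maximal monotonicity argument above.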

Discretization of the estimator
Using a quadrature rule to approximate the integrals in (1.8), we obtain the following time-discretized criterion, together with the associated notation. Proof. We recall that we are minimizing a strictly convex quadratic functional in finite dimension under a linear discrete-time dynamics constraint, hence we have one and only one minimizer. We then introduce the following Lagrangian, and minimizing J_n^{h,τ} in the finite dimensional Hilbert space Z_h × Uⁿ under the constraint of the discrete-time dynamics (3.4) is equivalent to finding the saddle point of L_n^{h,τ}. Differentiating L_n^{h,τ}, we obtain in particular, for 0 < k < n, the stationarity conditions. When applied to the stationary point (z̄^{h,τ}_{|n}, q̄^{h,τ}_{|n}, ν̄^τ_{|n}), they give (3.6c) for 0 < k < n − 1, while k = n leads to (3.6d). For k = 0, we get (3.6b) at the stationary point, as soon as we extend the Lagrange multiplier definition up to q̄^{h,τ}_{0|n} in (3.6c). Finally, the derivative with respect to the initial condition gives (3.6a) at the stationary point.
The analogous version of Theorems 1.13-1.14 in the fully discrete case is stated as follows. Theorem 3.2. We have the identity (3.8). This recursively ensures that Π^{h,τ−}_n is well defined and positive definite for all n.
The discrete-time dynamics (3.9) defines the discrete-time optimal sequential estimator, as we clearly follow the definition (3.14) from the identity (3.8). The equations of (3.9) constitute what is called the prediction/correction algorithm for the discrete-time Kalman estimator, as initially developed by R.E. Kalman in his seminal paper [31]: (3.9) consists in the prediction step of the observer and filter (3.10b), which is followed (or preceded) by the correction step given by (3.9b). This prediction/correction algorithm can be seen as a splitting algorithm for the continuous-time dynamics (1.27). Indeed, we find that ẑ^{h,τ+}_{n+1} follows the one-step time scheme
which is a consistent time scheme of order 1 in τ for (1.27), as soon as the discrete covariance operator converges to the continuous covariance operator, as will be proved in Section 3.3.
We also want to point out that a simple computation gives, for all n > 0, a closed formula. An alternative formula for the Kalman gain. Finally, note that from the Sherman-Morrison-Woodbury identity, we also have an alternative formula for Π^{h,τ−}_{n+1} in which the inversions are performed in the observation space, for all n > 0. Proof of Theorem 3.2. We proceed by induction. From (3.6b), we obtain exactly the formula (3.8) for k = 0. Let us assume that the formula (3.8) is satisfied for all i ∈ {0, 1, ..., k}. Taking into account the equations giving ẑ^{h,τ+}_k, we deduce the corresponding identity from the induction hypothesis. The equation (3.10b) then gives ẑ^{h,τ−}_{k+1} from ẑ^{h,τ+}_k, and replacing q̄^{h,τ}_{k|n} using (3.6c), we obtain the desired relation. Therefore, we have proved that z̄^{h,τ}_{k+1|n} = ẑ^{h,τ−}_{k+1} + Π^{h,τ−}_{k+1} q̄^{h,τ}_{k+1|n}, namely the formula (3.8) is satisfied at step k + 1, which ends the proof by induction.
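The equivalence between the information-form correction and this observation-space (Sherman-Morrison-Woodbury) formula can be checked directly in a few lines; P_minus, C and R below are generic stand-ins for the discrete covariance, observation matrix and observation weight.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 8, 3                                    # state dim >> observation dim
L = rng.standard_normal((n, n))
P_minus = L @ L.T + np.eye(n)                  # symmetric positive definite prior
C = rng.standard_normal((m, n))                # observation matrix
R = np.eye(m)                                  # observation weight

# Information form: invert in the (large) state space.
P_plus_info = np.linalg.inv(np.linalg.inv(P_minus) + C.T @ np.linalg.solve(R, C))

# Woodbury form: invert only the small m x m innovation matrix.
S = R + C @ P_minus @ C.T
P_plus_smw = P_minus - P_minus @ C.T @ np.linalg.solve(S, C @ P_minus)
```

When m is much smaller than n, the Woodbury form replaces an n × n inversion by an m × m one, which is exactly why the observation-space formula matters for large discretizations.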
From Theorem 3.2, we see that the discrete-time Kalman filter is exactly the discrete-time optimal sequential estimator in the sense of the following definition. Definition 3.3. The discrete-time optimal sequential estimator is defined analogously to Definition 1.6. This definition then allows us to justify the discrete-time Kalman filter optimality in a purely deterministic framework, far from the original stochastic setting envisioned by R.E. Kalman in his seminal work [31].
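Putting the pieces together, here is a minimal matrix-form prediction/correction loop in the spirit of (3.9)-(3.10), on a small illustrative system (Φ, C, and the weights are our stand-ins for Φ₁^{h,τ}, C_h, etc.); an uncorrected open-loop copy is kept for comparison.

```python
import numpy as np

n_steps, dim = 100, 2
Phi = np.array([[1.0, 0.05], [-0.05, 0.95]])     # one-step discrete flow
C = np.array([[1.0, 0.0]])                        # observe first component only
Qn = 1e-4 * np.eye(dim)                           # model-error covariance
Rn = 1e-2 * np.eye(1)                             # observation weight

z_true = np.array([1.0, -1.0])                    # unknown target state
z_hat, P = np.zeros(dim), np.eye(dim)             # estimator and covariance
z_open = np.zeros(dim)                            # uncorrected open-loop run
errs = [np.linalg.norm(z_true - z_hat)]
err_open = [np.linalg.norm(z_true - z_open)]
for _ in range(n_steps):
    z_true = Phi @ z_true
    y = C @ z_true                                # noise-free measurement
    # prediction step
    z_hat, P = Phi @ z_hat, Phi @ P @ Phi.T + Qn
    z_open = Phi @ z_open
    # correction step (Kalman gain via the innovation matrix)
    S = Rn + C @ P @ C.T
    K = P @ C.T @ np.linalg.inv(S)
    z_hat = z_hat + K @ (y - C @ z_hat)
    P = (np.eye(dim) - K @ C) @ P
    errs.append(np.linalg.norm(z_true - z_hat))
    err_open.append(np.linalg.norm(z_true - z_open))
```

The second, unobserved component is recovered through the coupling in Φ, which is the discrete trace of the observability discussion of Section 1.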

Space-time convergence analysis
From the prediction-correction splitting time scheme (3.10), we can deduce a one-step recursive formula for the discretized covariance operator Π^{h,τ−}_n. Indeed, by combining (3.13) and (3.12), we obtain for all n ∈ N a recursion which ensures that the discretized covariance operator Π^{h,τ+}_n satisfies a discrete-time version of (1.18). In fact, (3.16) mixes Π^{h,τ+}_n and Π^{h,τ−}_n. However, using (3.9c) or (3.13), we will prove - see the proof of the next proposition - a variant of (3.16), and identically from (3.9c), Π^{h,τ+}_n > 0. We can now propose a full space-time numerical analysis of the Riccati solution, in the spirit of what was done in [26] for a space discretization only.
Proof. Our proof is inspired by Theorem 3 of [26], with the additional treatment of the time discretization. We first compute the error decomposition, in which several families of terms are combined with a summation. We now proceed to the estimation of each term. Using Lemma 3 of [26], we first bound the semigroup terms. Identically, we treat the terms arising from B^{h,τ} = τ Φ₁^{h,τ} P_h B. The term associated with the observation operator is more intricate: we first decompose it, then pass to the limit. Following the exact same computation as in the proof of Theorem 3 of [26], the last term tends to 0 from Lemma 3 of [26], whereas the second term goes to 0 with h. Finally, let us analyze the error e^τ_quad introduced by replacing the integral form with a quadrature rule. We introduce Γ_t, which belongs to C⁰([0,t], S⁺); indeed, it is the mild solution of Γ̇_t(s) + AΓ_t(s) + Γ_t(s)A* + E_t(s) = 0, s ∈ (0,t), and Ξ = Π ∈ C⁰([0,T], S⁺) since in our case Π ∈ C¹([0,T], S⁺). Furthermore, taking (z₁, z₂) ∈ Z², we have chosen Π_0 ∈ D(Υ), hence υ_{Π_0} is bounded, hence Γ(0) ∈ D(Υ), which implies that Γ_t ∈ C¹([0,t], S⁺). From Peano's kernel theorem [22], the quadrature error involving Γ_{t_n} is then controlled.
Combining all the estimations, we finally get that there exists ε_{h,τ} satisfying lim_{h,τ→0} ε_{h,τ} = 0 and a constant c such that the final bound holds, which yields the theorem by Grönwall's inequality.
Note that the convergence analysis of Riccati problems has been widely studied, in particular for space discretization [34], even with more general unbounded observation and control operators. Our result, in the wake of [14,26], gives convergence in the class of Hilbert-Schmidt operators. This validates our choice of numerical tools, namely H-matrices, adapted to such operators with kernels. Moreover, our choice of time discretization corresponds to the discrete-time Kalman filter, hence fills the gap between the deterministic approach and the Kalman filter as understood in a stochastic framework when the time-sampling is studied; see for instance the recent analysis in [1].

Matrix-based algorithm
We consider a spatial discretization using P_k Lagrange finite elements and denote by (φ_i)_{1≤i≤N_z} the associated basis functions, where N_z = dim(V_h). Typically, for a given v_h ∈ V_h, we associate the vector of degrees of freedom. Note that here we consider only the degrees of freedom associated with an actual unknown of the problem, namely after elimination of the Dirichlet conditions for instance. The discretization of the bilinear form reads similarly. We further introduce the invertible matrix associated with the scheme, where dim(U) < ∞. Moreover, the operator Q^τ is already finite dimensional, leading to Q^τ = κ⁻²τ⁻¹ 1.
Finally, for the observations, since we assume they are available at each discretization node, C_h is simply the matrix selecting the observed nodes, with associated observation mass matrix M_h^{obs}, and R_{h,τ} is given by R_{h,τ} = γτ M_h^{obs}. We now proceed to the matrix-based formulation of the discrete-time Kalman filter, following the initial algorithm introduced by [32]. It starts with an initialization step and then alternates between a correction step and a prediction step. Note that the above algorithm is well defined because Φ^{h,τ}_1 is here an invertible matrix. This implies, indeed, that Π^{h,τ,-}_{n+1} and Π^{h,τ,+}_{n+1} are invertible for all n. By combining the prediction step (4.3) and the correction step (4.2), we deduce a one-step time scheme for the covariance matrix, n ≥ 0, (4.4), where we have used the Sherman-Morrison-Woodbury formula [48].
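As a minimal illustration of the scheme built from (4.2)-(4.3), the covariance update can be sketched in dense format. This is a sketch under our own naming conventions; dense NumPy arrays stand in for the H-matrix algebra described below.

```python
import numpy as np

def covariance_step(Pi_plus, Phi, Q, C, R):
    """One prediction + correction cycle for the covariance matrix,
    in dense format. Pi_plus is the corrected covariance at step n,
    Phi the one-step transition matrix, Q and R the model and
    observation noise covariances, C the node-selection matrix."""
    # Prediction: propagate the covariance through the dynamics
    Pi_minus = Phi @ Pi_plus @ Phi.T + Q
    # Correction: standard covariance update with the Kalman gain
    S = C @ Pi_minus @ C.T + R                # innovation covariance
    K = Pi_minus @ C.T @ np.linalg.inv(S)     # Kalman gain
    Pi_plus_next = Pi_minus - K @ C @ Pi_minus
    return Pi_minus, Pi_plus_next
```

By the Sherman-Morrison-Woodbury formula, `Pi_plus_next` equals (Π⁻)^{-1} + C^T R^{-1} C inverted, which is the mechanism behind the one-step scheme (4.4).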
Let us now compute, with the obtained matrices, the hierarchical representation of a sparse operator and its inverses. We consider a spatial partition of the degrees of freedom with a binary tree, leading to the classical 2 × 2 block partition. For a sparse matrix, during the subdivision steps, far interactions lead to empty leaves (without data) and close interactions lead to sparse leaves. Only the sparse leaves are subdivided recursively down to the deepest level of the tree. This choice allows us to limit the size of the full leaves that appear during the inversion of the sparse leaves, and to optimize the hierarchical storage of the inverse matrix. For example, in Figure 4, a hierarchical sparse matrix amounts to a renumbering with hierarchical permutations, and the inverse is well compressed: only the diagonal leaves are full (red), and the rank of the extra-diagonal leaves is low (blue), regardless of the size of the block.
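To fix ideas, the recursive 2 × 2 subdivision can be sketched as follows. This is an illustrative toy using a truncated SVD as compression criterion; production H-matrix codes such as those of [10, 30] rely on geometric admissibility conditions instead of an SVD of the full block.

```python
import numpy as np

def build_hmatrix(A, leaf_size=8, tol=1e-6):
    """Recursive 2 x 2 block partition of A. Blocks that admit an
    accurate low-rank factorization are compressed; the others are
    subdivided until the leaf size is reached."""
    m, n = A.shape
    if min(m, n) <= leaf_size:
        return ("full", A.copy())
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    if r <= min(m, n) // 4:               # compressible block
        return ("lowrank", U[:, :r] * s[:r], Vt[:r, :])
    i, j = m // 2, n // 2                 # otherwise subdivide 2 x 2
    return ("block",
            build_hmatrix(A[:i, :j], leaf_size, tol),
            build_hmatrix(A[:i, j:], leaf_size, tol),
            build_hmatrix(A[i:, :j], leaf_size, tol),
            build_hmatrix(A[i:, j:], leaf_size, tol))

def to_dense(node):
    """Reassemble the dense matrix from the block tree."""
    tag = node[0]
    if tag == "full":
        return node[1]
    if tag == "lowrank":
        return node[1] @ node[2]
    _, a, b, c, d = node
    return np.block([[to_dense(a), to_dense(b)],
                     [to_dense(c), to_dense(d)]])
```

On a smooth kernel matrix, off-diagonal blocks compress to low rank while diagonal blocks are recursively subdivided, reproducing the red/blue pattern of Figure 4.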
To satisfy the second condition C2, we develop an iterative strategy that follows the evolution of the H-matrix representation through the prediction and the correction, so as to preserve the hierarchical structure over time. In particular, the correction requires special treatment. Considering Π⁻_{n+1}, an H-matrix representing the covariance matrix at the unknown degrees of freedom, and C, a sparse matrix representing the reduction matrix from the unknown degrees of freedom to the measurement degrees of freedom, Kalman filtering must first calculate the reduction product CΠ⁻_{n+1}C^⊤. This operation is performed in the hierarchical domain, converting the sparse restriction matrix into an H-matrix. However, if the number of measurements is much smaller than the number of unknowns, the standard algebra may no longer be efficient enough to move from one space to the other. This is because the transition matrices C become rectangular, which has a large impact on memory and computational cost, as the low-rank representation loses effectiveness. For example, using a low-rank representation P = XY^⊤ with P ∈ M_{m,n}, X ∈ M_{m,r} and Y ∈ M_{n,r}, the case m ≪ n leads to r ≈ m, which is clearly inefficient. To circumvent this limitation in this particular case, we develop a special H-matrix builder, with a recursive construction directly from the result of CΠ⁻_{n+1}C^⊤ instead of successive algebraic operations. Finally, the H-matrix algebra is used at each step of the algorithm, constantly transitioning from state space to measurement space and vice versa. This allows us to maintain a maximal compression rate of the operators throughout the process, enabling the computation of large systems. We note that our approach has an analogy with the much more comprehensive H-matrix study [27], but the structure of our matrix equation differs, as we consider the H-matrix version of the discrete-time Kalman filter.
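When C is a pure node-selection matrix, the reduction product CΠ⁻_{n+1}C^⊤ discussed above amounts, in dense form, to extracting a submatrix; a minimal sketch (names ours):

```python
import numpy as np

def restrict_covariance(Pi, obs_idx):
    """Dense-format counterpart of the reduction product C Pi C^T
    when C merely selects the observed degrees of freedom: the
    result is the submatrix of Pi at the observed indices."""
    return Pi[np.ix_(obs_idx, obs_idx)]
```

The special H-matrix builder mentioned above reproduces this operation directly in hierarchical format, avoiding the inefficient rectangular low-rank products.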

The 1D heat example
We consider a 1D heat equation on the domain Ω = (0, 1) with homogeneous Dirichlet conditions. The state space is therefore Z = L²(0, 1), whereas V = H¹₀(0, 1). The observation domain is ω = (0.3, 0.6), namely Y = L²(0.3, 0.6). Therefore, C is the restriction operator to ω and C^* extends a function defined in ω by 0 in Ω∖ω. Additionally, we define B ∈ L(ℝ, Z) such that, for all x ∈ [0, L] and ν ∈ ℝ, (Bν)(x) = ν1_Ω(x). Concerning the initial condition of the covariance operator, we can choose in this 1D case Π₀ = -Δ₀^{-1}. Indeed, the eigenvalues associated with the Laplacian on (0, 1) with Dirichlet boundary conditions are given by λ_i = π²i², and the functions u_i : x ↦ sin(iπx) form a Hilbert basis of L²(0, 1). Therefore, by choosing Π₀ = -Δ₀^{-1}, we have Π₀ ∈ J₂ and obviously Π₀ ∈ D(Υ). From a numerical point of view, we discretize the problem with P₁ finite elements and choose grids from N = 10³ to N = 10⁴ elements. We have N_z = N - 2, and the initial covariance Π^{h,τ,+}_0 = M_h(K_h)^{-1}M_h can be stored in a full format. Then, Π^{h,τ,-}_n and Π^{h,τ,+}_n can be computed in full format, following the algorithm built on (4.2)-(4.3) and implemented as in the script presented in Table 1 (top); see the illustrative time steps of the kernel evolution presented in Figure 1.
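The dense-format setup of this example can be sketched as follows. This is a sketch under our own conventions (here N denotes the number of elements and the two Dirichlet nodes are eliminated); the paper's actual implementation is the Matlab/Gypsilab script of Table 1.

```python
import numpy as np

def heat_matrices_1d(N):
    """P1 mass (M) and stiffness (K) matrices on (0, 1), with the two
    homogeneous Dirichlet boundary nodes eliminated; N is the number
    of elements, leaving interior degrees of freedom only."""
    h = 1.0 / N
    n = N - 1
    M = np.zeros((n, n))
    K = np.zeros((n, n))
    for i in range(n):
        M[i, i] = 2.0 * h / 3.0
        K[i, i] = 2.0 / h
        if i + 1 < n:
            M[i, i + 1] = M[i + 1, i] = h / 6.0
            K[i, i + 1] = K[i + 1, i] = -1.0 / h
    return M, K

# Initial covariance in full format, as in the text:
# Pi_0^{h,tau,+} = M_h (K_h)^{-1} M_h, the discrete counterpart
# of the inverse Dirichlet Laplacian Pi_0 = -Delta_0^{-1}.
M, K = heat_matrices_1d(100)
Pi0 = M @ np.linalg.solve(K, M)
```

As a sanity check, the largest eigenvalue of the discrete inverse Laplacian K^{-1}M approaches 1/λ₁ = 1/π², consistently with the spectrum λ_i = π²i² recalled above.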
This allows us to proceed to the numerical verification of Theorem 3.4. In this respect, we consider a fine discretization h₁ = 10⁻⁴, τ₁ = 10⁻³ and coarser discretizations 10⁻⁴ ≤ h₂ ≤ 10⁻³, 10⁻³ < τ₂ < 10⁻², with τ₂/τ₁ ∈ ℕ. We then compute the error for t_n = nτ₂ = n(τ₂/τ₁)τ₁ with 0 ≤ n ≤ T/τ₂, where P^{h₂}_{h₁} ∈ L(Z_{h₁}, Z_{h₂}) is the orthogonal projection from Z_{h₁} to Z_{h₂} and (e^{h₂}_i)_{1≤i≤N_y} is the finite element basis of Z_{h₂}. In terms of matrices, P^{h₂}_{h₁} = (M_{h₂})⁻¹I^{h₂}_{h₁}, where I^{h₂}_{h₁} ∈ ℝ^{N₂×N₁} is the interpolation matrix from the grid of step h₁ to the grid of step h₂, with its corresponding adjoint. A remaining numerical question is whether a low-rank strategy based on Riccati operator reduction [35, 46] could have given the same type of results directly, without relying on the H-matrix machinery. We could typically have thought of directly projecting the initial covariance matrix on E_m = span(e₀, e₁, ..., e_m), where (e_i)_{1≤i≤m} = (u_i)_{1≤i≤m} are the first m eigenvectors of the Laplacian with Dirichlet conditions and e₀ = (1 - τA)⁻¹w ∈ D(A) ⊂ V is defined from the vector w = 1_Ω used to construct B^*. Indeed, in Figure 3 we recognize e₀ as very close to the first vector w₁ in the low-rank representation of Π̂^{h,τ,ε,-}_{N_T}. This is natural with respect to the Duhamel-like formula (1.18) and the comparison principle in Section 1.6.2. Therefore, by solving two eigenvalue problems, we easily compute numerically a space-similarity index between 0 and 1, as plotted in Figure 3 (right). We see here that the first 3 vectors could have been envisioned a priori in the decomposition of Π̂^{h,τ,ε,-}_{N_T}, whereas many more eigenvectors are needed to represent the additional 3 vectors. Our H-matrix formulation therefore offers a genuinely new point of view with respect to reduced-basis strategies with a priori reduction of the covariance operator.
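The space-similarity index between subspaces mentioned above can be sketched via principal angles. This is a standard stand-in, written with our own names; the precise quantity introduced in [47] may differ in normalization.

```python
import numpy as np

def similarity_index(E, W):
    """Index in [0, 1] between the column spans of E and W: the mean
    of the singular values of Qe^T Qw, i.e. of the cosines of the
    principal angles between the two subspaces (1 = identical spans,
    0 = orthogonal spans)."""
    Qe, _ = np.linalg.qr(E)   # orthonormal basis of span(E)
    Qw, _ = np.linalg.qr(W)   # orthonormal basis of span(W)
    s = np.linalg.svd(Qe.T @ Qw, compute_uv=False)
    return float(np.mean(s))
```

Applied to E_m and W₆ = span(w₁, ..., w₆) for increasing m, such an index yields a curve of the kind plotted in Figure 3 (right).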

where ∥·∥_F denotes the Frobenius matrix norm. This last identity allows us to investigate the convergence of the Kalman filter algorithm in dense format when h and τ tend to 0. The convergence results are presented in Figure 2 (top), hence illustrating Theorem 3.4. In the same Figure 2 (bottom), we also numerically illustrate the approximation by an H-matrix Π̂^{h,τ,ε,-}_n of the covariance operator Π^{h,τ,-}_n as the tolerance ε tends to 0. At final time T = 1 and ε = 10⁻⁶, the H-matrix Π̂^{h,τ,ε,-}_{N_T} is represented by a low-rank matrix of only d = 6 vectors (w_k)_{1≤k≤d}, plotted in Figure 3; namely, Π̂^{h,τ,ε,-}_{N_T} = Σ_{k=1}^{d} w_k w_k^⊤ ≈ Π^{h,τ,-}_{N_T}.

Figure 2. Convergence results with respect to the space discretization, the time discretization and the H-matrix representation precision.

To investigate this question numerically, we can compute a sort of distance between the vector space E_m and the space W₆ = span(w₁, ..., w₆) as m increases. In this respect, we use the quantity introduced in [47].

Figure 3. (Left) Final normalized basis of vectors generating the low-rank approximation Π̂^{h,τ,ε,-}_{N_T}. (Right) Dissimilarity index between the subspaces W_{h,k} and E_{h,k}.

Figure 4. H-matrix evolution for the heat equation.

Figure 5. Geometry, mesh and initial condition of the advection-diffusion problem.

Figure 6. H-matrix evolution for the advection-diffusion equation.

Figure 7. Direct solution and corresponding state estimator for the advection-diffusion problem: first time steps.

Figure 8. Direct solution and corresponding state estimator for the advection-diffusion problem: larger time steps.

Table 1. Matlab code listing for a simple Kalman filter implementation and its corresponding H-matrix version in Gypsilab.