REGULARITY PROPERTIES OF THE SCHRÖDINGER COST

The Schrödinger problem is an entropy minimisation problem on the space of probability measures. Its optimal value is a cost between two probability measures. In this article we investigate some regularity properties of this cost: continuity with respect to the marginals and time derivative of the cost along curves valued in the space of probability measures.

Here $C(\mu, \nu)$ is the entropic cost defined by (1.1), where the infimum is taken over every $(\mu_s, v_s)_{0 \le s \le 1}$ such that $(\mu_s)_{0 \le s \le 1}$ is an absolutely continuous path with respect to the Wasserstein distance which connects $\mu$ to $\nu$ and satisfies, in a weak sense, for every $s \in [0, 1]$,
$$v_s \in L^2(\mu_s), \qquad \partial_s \mu_s = -\nabla \cdot (\mu_s v_s).$$
In this paper we investigate regularity properties of the functions $(\mu, \nu) \mapsto \mathrm{Sch}(\mu, \nu)$ and $(\mu, \nu) \mapsto C(\mu, \nu)$. To my knowledge, regularity properties of the Schrödinger cost as a function over probability measures have not been investigated yet, but the stability of the optimizers has been investigated in [24] and more recently in [2,14]. We give an overview of the main contributions of this paper, leaving precise statements to the other sections.
-In Section 3 we investigate continuity properties of the cost function $C$. In Theorem 3.1 we show that $\lim_{k \to \infty} C(\mu_k, \nu_k) = C(\mu, \nu)$ if $W_2(\mu_k, \mu) \to 0$ as $k \to \infty$ (resp. $W_2(\nu_k, \nu) \to 0$), under additional hypotheses on the entropy and the Fisher information along the sequences.
-In Section 4 we provide a few applications of the preceding continuity properties. The main result of this section is that, using the continuity properties of $\mathrm{Sch}$ and $C$, we are able to show that the Benamou-Brenier-Schrödinger formula (1) is valid assuming that both measures have finite entropy, finite Fisher information and locally bounded densities. To my knowledge this is a new result, because no compactness assumptions are needed for the two marginals.
-In Section 5 we investigate the differentiability of the functions $t \mapsto \mathrm{Sch}(\mu_t, \nu_t)$ and $t \mapsto C(\mu_t, \nu_t)$, where $(\mu_t)_{t \ge 0}$ and $(\nu_t)_{t \ge 0}$ are curves in the Wasserstein space. These results extend the existing ones for the Wasserstein distance, see Theorem 8.4.7 of [1] and Theorem 23.9 of [25]. We prove that the derivative of the entropic cost is given, for almost every $t$, by
where $(\mu_s^t)_{s \in [0,1]}$ is the minimizer of problem (1.1) from $\mu_t$ to $\nu_t$ and $\dot\mu_s^t$ is the velocity of the path $s \mapsto \mu_s^t$, defined in a later section. Such minimizers are called entropic interpolations. Note that this is exactly the formula which holds for the Wasserstein distance, with the Wasserstein geodesics replaced by the entropic interpolations. For technical reasons we prove this formula in the case where $N = \mathbb{R}^n$ and $L$ is the classical Laplacian operator.

Markov semigroups
Let $(N, g)$ be a smooth, connected and complete Riemannian manifold. We denote by $dx$ the Riemannian measure and by $\langle \cdot, \cdot \rangle$ the Riemannian metric (we omit $g$ for simplicity). Let $\nabla$ denote the gradient operator associated with $(N, g)$ and $\nabla \cdot$ the associated divergence, so that for every smooth function $f$ and vector field $\zeta$
$$\int \langle \nabla f, \zeta \rangle \, dx = - \int f \, \nabla \cdot \zeta \, dx.$$
Hence the Laplace-Beltrami operator can be defined as $\Delta = \nabla \cdot \nabla$. We consider a differential generator $L := \Delta - \langle \nabla W, \nabla \rangle$ for some smooth function $W : N \to \mathbb{R}$. We define the carré du champ operator for all smooth functions $f$ and $g$ by
$$\Gamma(f, g) := \frac{1}{2} \big( L(fg) - f L g - g L f \big).$$
Under our current hypotheses we have $\Gamma(f) := \Gamma(f, f) = |\nabla f|^2$, which is the squared length of $\nabla f$ with respect to the Riemannian metric $g$. Let $Z := \int e^{-W} dx$. If $Z < \infty$, the reversible probability measure associated with $L$ is given by
$$dm := \frac{e^{-W}}{Z} \, dx.$$
If $Z = \infty$, the reversible measure associated with $L$ is $dm := e^{-W} dx$, which has infinite mass. Following the work of [3], we define the iterated carré du champ operator by
$$\Gamma_2(f, g) := \frac{1}{2} \big( L \Gamma(f, g) - \Gamma(L f, g) - \Gamma(f, L g) \big)$$
for any smooth functions $f$ and $g$, and we denote $\Gamma_2(f) := \Gamma_2(f, f)$. We say that the operator $L$ satisfies the $CD(\rho, n)$ curvature-dimension condition with $\rho \in \mathbb{R}$ and $n \in (0, \infty]$ if for every smooth function $f$
$$\Gamma_2(f) \ge \rho \, \Gamma(f) + \frac{1}{n} (L f)^2.$$
For instance, $\mathbb{R}^n$ endowed with the classical Laplacian operator satisfies the $CD(0, n)$ curvature-dimension condition. With the Ornstein-Uhlenbeck operator, $\mathbb{R}^n$ satisfies the $CD(1, \infty)$ curvature-dimension condition. More generally, a Riemannian manifold of dimension $n \in \mathbb{N}$ with Ricci tensor bounded from below by $\rho \in \mathbb{R}$, endowed with its Laplace-Beltrami operator, satisfies the $CD(\rho, n)$ curvature-dimension condition. We assume that $L$ is the generator of a Markov semigroup $(P_t)_{t \ge 0}$; this is for example the case when a $CD(\rho, \infty)$ curvature-dimension condition holds for some $\rho \in \mathbb{R}$.
For every $f \in L^2(m)$ the family $(P_t f)_{t \ge 0}$ is defined as the unique solution of the Cauchy system
$$\partial_t u = L u, \qquad u(0, \cdot) = f.$$
Under the $CD(\rho, \infty)$ curvature-dimension condition this Markov semigroup admits a probability kernel $p_t(x, dy)$ with density $p_t(x, y)$, that is, for every $t \ge 0$ and $f \in L^2(m)$,
$$\forall x \in N, \quad P_t f(x) = \int f(y) \, p_t(x, dy) = \int f(y) \, p_t(x, y) \, dm(y);$$
for the existence of the kernel see Theorem 7.7 of [17]. We also define the dual semigroup $(P_t^*)_{t \ge 0}$, which acts on probability measures. Given a probability measure $\mu$, the family $(P_t^* \mu)_{t \ge 0}$ is given by the equation
$$\int f \, dP_t^* \mu = \int P_t f \, d\mu$$
for every $t \ge 0$ and every test function $f$. When $\mu \ll m$, the density of $P_t^* \mu$ with respect to $dx$ is a solution of the following Fokker-Planck type equation
$$\partial_t \nu_t = L^* \nu_t \qquad (2.1)$$
with initial value $\frac{d\mu}{dx}$. Here $L^*$ is the dual operator of $L$ in $L^2(dx)$.
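As a sanity check of the two Euclidean examples of curvature-dimension conditions mentioned above, the following sketch computes $\Gamma_2$ for $N = \mathbb{R}^n$ and $L = \Delta - \langle \nabla W, \nabla \rangle$. It assumes the usual normalizations $\Gamma(f, g) = \frac{1}{2}(L(fg) - fLg - gLf)$ and $\Gamma_2(f) = \frac{1}{2} L \Gamma(f) - \Gamma(f, Lf)$; the notations $\nabla^2$ (Euclidean Hessian) and $\|\cdot\|_{HS}$ (Hilbert-Schmidt norm) are introduced here only for this computation.

```latex
% Sketch under the assumptions N = R^n, L = \Delta - <\nabla W, \nabla>.
\begin{align*}
\tfrac{1}{2} L |\nabla f|^2
  &= \|\nabla^2 f\|_{HS}^2 + \langle \nabla f, \nabla \Delta f \rangle
     - \nabla^2 f(\nabla W, \nabla f), \\
\Gamma(f, L f)
  &= \langle \nabla f, \nabla \Delta f \rangle
     - \nabla^2 W(\nabla f, \nabla f) - \nabla^2 f(\nabla W, \nabla f), \\
\Gamma_2(f)
  &= \|\nabla^2 f\|_{HS}^2 + \nabla^2 W(\nabla f, \nabla f).
\end{align*}
% Case W = 0 (classical Laplacian): by the Cauchy-Schwarz inequality on the trace,
%   \|\nabla^2 f\|_{HS}^2 \ge \frac{1}{n} (\Delta f)^2,  which is CD(0, n).
% Case W(x) = |x|^2 / 2 (Ornstein-Uhlenbeck): \nabla^2 W = Id, hence
%   \Gamma_2(f) \ge |\nabla f|^2 = \Gamma(f),            which is CD(1, \infty).
```

On a general Riemannian manifold the same computation produces an additional Ricci term, which is why a lower bound on the Ricci tensor enters the general statement.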

Wasserstein space and absolutely continuous curves
The set $P_2(N)$ of probability measures on $N$ with finite second order moment can be endowed with the Wasserstein distance, given for every $\mu, \nu \in P_2(N)$ by
$$W_2^2(\mu, \nu) := \inf_{\pi} \int d(x, y)^2 \, d\pi(x, y),$$
where the infimum runs over all $\pi \in P(N \times N)$ with $\mu$ and $\nu$ as marginals and $d$ is the Riemannian distance on $(N, g)$. Recall that a path $(\mu_t)_{t \in [0,1]} \subset P_2(N)$ is absolutely continuous with respect to the Wasserstein distance $W_2$ if and only if
$$|\dot\mu_t| := \limsup_{s \to t} \frac{W_2(\mu_s, \mu_t)}{|t - s|} \in L^1([0, 1]).$$
In this case, there exists a unique vector field $(V_t)_{t \in [0,1]}$ such that $V_t \in L^2(\mu_t)$ and $|\dot\mu_t| = \|V_t\|_{L^2(\mu_t)}$. Furthermore, this vector field can be characterized as the solution of the continuity equation $\partial_t \mu_t + \nabla \cdot (\mu_t V_t) = 0$ with minimal norm in $L^2(\mu_t)$. We denote $\dot\mu_t = V_t$, and $(\dot\mu_t)_{t \in [0,1]}$ is called the velocity vector field of $(\mu_t)_{t \in [0,1]}$, or the velocity for short. Sometimes we also use the notation $d_t \mu_t = \dot\mu_t$.
In the famous paper [5], Benamou and Brenier showed that the Wasserstein distance admits a dynamical formulation
$$W_2^2(\mu, \nu) = \inf \int_0^1 \|\dot\mu_s\|_{L^2(\mu_s)}^2 \, ds, \qquad (2.2)$$
where the infimum runs over all absolutely continuous paths which connect $\mu$ to $\nu$ in $P_2(N)$. In his article [20], Felix Otto gave birth to a theory which allows us to consider $(P_2(N), W_2)$, at least heuristically, as an infinite-dimensional Riemannian manifold. This theory was later named "Otto calculus" by Cédric Villani.
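A minimal worked example of the dynamical formulation, assuming $N = \mathbb{R}^n$ and two Gaussian measures with identity covariance (the means $m_0, m_1 \in \mathbb{R}^n$ and the identity matrix $I_n$ are illustrative choices): the translation path below solves the continuity equation with a constant velocity field, and its action matches the lower bound given by the Jensen inequality, so it is optimal.

```latex
% Sketch: \mu = N(m_0, I_n), \nu = N(m_1, I_n) on R^n (illustrative assumption).
\begin{align*}
\mu_s &:= \mathcal{N}\big((1 - s) m_0 + s m_1, \, I_n\big),
  \qquad v_s(x) := m_1 - m_0, \\
\partial_s \mu_s + \nabla \cdot (\mu_s v_s) &= 0, \\
\int_0^1 \|v_s\|_{L^2(\mu_s)}^2 \, ds &= |m_1 - m_0|^2 .
\end{align*}
% Lower bound: for every coupling \pi of \mu and \nu, the Jensen inequality gives
%   \int |x - y|^2 d\pi(x, y) \ge \Big| \int (x - y) \, d\pi(x, y) \Big|^2 = |m_0 - m_1|^2 ,
% so that W_2(\mu, \nu) = |m_1 - m_0| and the translation path is a minimizer.
```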
For every $\mu \in P_2(N)$ the tangent space of $P_2(N)$ at $\mu$ can be defined as
$$T_\mu P_2(N) := \overline{\{ \nabla \varphi, \ \varphi \in C_c^\infty(N) \}}^{L^2(\mu)},$$
and the Riemannian metric is induced by the scalar product $\langle \cdot, \cdot \rangle_{L^2(\mu)}$, see for instance Section 1.4 of [15] or Section 3.2 of [11]. As in the Riemannian case, the acceleration of a curve can be defined as the covariant derivative of the velocity field along the curve itself. If $(\mu_t)_{t \in [0,1]}$ is an absolutely continuous curve in $P_2(N)$ and $(v_t)_{t \in [0,1]}$ is a vector field along $(\mu_t)_{t \in [0,1]}$, for every $t \in [0, 1]$ we denote by $D_t v_t$ the covariant derivative of $v_t$ along $(\mu_t)_{t \in [0,1]}$ defined in Section 3.3 of [11]. It turns out that when the velocity field of $(\mu_t)_{t \in [0,1]}$ has the form $(\nabla \varphi_t)_{t \in [0,1]}$, the acceleration of $(\mu_t)_{t \in [0,1]}$ is given by
$$\ddot\mu_t := D_t \dot\mu_t = \nabla \Big( \partial_t \varphi_t + \frac{1}{2} |\nabla \varphi_t|^2 \Big),$$
see Section 3.3 of [11]. Covariant derivative and acceleration can be defined in a more general framework, see Section 5.1 of [15].

Schrödinger problem
Here we introduce the Schrödinger problem through its modern definition, following the two seminal papers [10,18]. The first object of interest is the relative entropy of two measures. The relative entropy of a probability measure $p$ with respect to a measure $r$ is loosely defined by
$$H(p|r) := \int \log \frac{dp}{dr} \, dp$$
if $p \ll r$, and $+\infty$ otherwise. This definition is meaningful when $r$ is a probability measure, but not necessarily when $r$ is unbounded. Assuming that $r$ is $\sigma$-finite, there exists a function $W : M \to [1, \infty)$ such that $z_W := \int e^{-W} dr < \infty$. Hence we can define a probability measure $r_W := z_W^{-1} e^{-W} r$, and for every measure $p$ such that $\int W \, dp < \infty$ we set
$$H(p|r) := H(p|r_W) - \int W \, dp - \log z_W.$$
For $\mu, \nu \in P(N)$ we define the Schrödinger cost from $\mu$ to $\nu$ by
$$\mathrm{Sch}(\mu, \nu) := \inf_{\gamma} H(\gamma|R_{01}),$$
where the infimum runs over all $\gamma \in P(N \times N)$ with $\mu$ and $\nu$ as marginals, and $R_{01}$ is the joint law of the initial and final positions of the Markov process associated with $L$ starting from $m$, which is given by
$$dR_{01}(x, y) = p_1(x, y) \, dm(x) \, dm(y).$$
To ensure the existence and uniqueness of the minimizer, more hypotheses are needed. Namely, we assume that there exist two non-negative measurable functions $A, B : N \to \mathbb{R}$ such that
We define the set
If $\mu, \nu \in P_2^*(N)$, it is proven that the Schrödinger cost $\mathrm{Sch}(\mu, \nu)$ is finite and admits a unique minimizer, which takes the form
$$d\gamma = f \otimes g \, dR_{01} \qquad (2.4)$$
for two measurable non-negative functions $f$ and $g$, see Proposition 4.1.5 of [24]. Another fundamental result about the Schrödinger problem is a formula analogous to (2.2) for the Schrödinger cost.
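The extension of the relative entropy to an unbounded reference measure can be motivated by a one-line formal computation. The sketch below assumes $p \ll r$ and $\int W \, dp < \infty$; the concrete choice of $r$ and $W$ in the comment is an illustrative assumption, not taken from the text.

```latex
% Formal computation behind the definition of H(p|r) for unbounded r.
\begin{align*}
\frac{dp}{dr} &= \frac{dp}{dr_W} \cdot \frac{dr_W}{dr}
               = \frac{dp}{dr_W} \cdot \frac{e^{-W}}{z_W}, \\
\int \log \frac{dp}{dr} \, dp
  &= H(p|r_W) - \int W \, dp - \log z_W .
\end{align*}
% Illustration (assumption): r = Lebesgue measure on R and W(x) = 1 + x^2 / 2.
% Then z_W = e^{-1} \sqrt{2\pi} < \infty and r_W is the standard Gaussian N(0, 1).
```

Since $r_W$ is a probability measure, $H(p|r_W) \ge 0$ is well defined, and the right-hand side makes sense as soon as $W$ is $p$-integrable.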
Theorem 2.1 (Benamou-Brenier-Schrödinger formula). Let $\mu, \nu \in P_2^*(N)$ be two compactly supported probability measures with bounded densities with respect to $m$. Then the following formula holds
where $C(\mu, \nu)$ is the entropic cost between $\mu$ and $\nu$ given by
Here the infimum runs over every absolutely continuous path $(\mu_s)_{s \in [0,1]}$ which connects $\mu$ to $\nu$ in $P_2(N)$, and $F$ is defined as
$$F(\mu) := H(\mu|m).$$
Different versions of this theorem have been obtained under various hypotheses, see [6,11,12,16]. The functional $F : P_2(N) \to [0, \infty]$ is central in this work. Its gradient can be identified by the equation
$$\frac{d}{dt} F(\mu_t) = \langle \mathrm{grad}_{\mu_t} F, \dot\mu_t \rangle_{\mu_t}$$
and is given, for every $\mu \in P_2(N)$ with smooth density against $m$, by
$$\mathrm{grad}_\mu F = \nabla \log \frac{d\mu}{dm}.$$
These definitions allow us to see the Fokker-Planck type equation (2.1) as the gradient flow equation of $F$. Indeed, every solution $(\nu_t)_{t \ge 0}$ of this equation satisfies
$$\dot\nu_t = -\mathrm{grad}_{\nu_t} F,$$
see Section 3.2 of [12]. With Otto calculus, we can also introduce the notions of Hessian and covariant derivative. A remarkable fact is that the Hessian of $F$ can be expressed in terms of $\Gamma_2$, indeed
$$\forall \mu \in P_2(N), \ \forall \, \nabla \varphi, \nabla \psi \in T_\mu P_2(N), \quad \mathrm{Hess}_\mu F(\nabla \varphi, \nabla \psi) = \int \Gamma_2(\varphi, \psi) \, d\mu,$$
see Section 3.3 of [11]. The quantity
$$I(\mu) := \Big\| \nabla \log \frac{d\mu}{dm} \Big\|_{L^2(\mu)}^2,$$
which appears in the previous definition, is central in this work; it is called the Fisher information. According to the Otto calculus formalism, the Fisher information admits the nice interpretation
$$I(\mu) = |\mathrm{grad}_\mu F|_\mu^2.$$
Minimizers of the entropic cost $C(\mu, \nu)$ are called entropic interpolations and take the form
$$\mu_t = P_t f \, P_{1-t} g \, dm,$$
where $f$ and $g$ are the two positive functions which appear in equation (2.4). Due to this particular structure, the velocity and the acceleration of entropic interpolations can be explicitly computed. It holds that for every $t \in [0, 1]$
But the most important fact is that entropic interpolations are solutions of the following Newton equation
which can be rewritten in the Otto calculus formalism as
This equation was first derived in Theorem 1.2 of [9], see also Section 3.3, Proposition 3.5 of [12].
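The structure of the Benamou-Brenier-Schrödinger formula (a boundary term involving $F$ plus a kinetic term plus a Fisher information term) can be guessed from a classical heuristic computation on path space. The sketch below is not the proof used in the references; it assumes $N = \mathbb{R}^n$, $L = \Delta$, $m$ the Lebesgue measure, $F(\mu) = H(\mu|m)$, and enough smoothness and integrability to apply the Girsanov formula and integrate by parts. The symbols $P$, $R$, $b_s$, $\rho_s$ and $B$ are introduced here for illustration. How the constants $\frac{1}{2}$ and $\frac{1}{4}$ are shared between the prefactors and the definition of $C(\mu, \nu)$ depends on the normalization chosen for $C$.

```latex
% Heuristic sketch. P: law of dX_s = b_s(X_s) ds + \sqrt{2} dB_s with X_0 ~ \mu and
% time marginals \mu_s = \rho_s dx, \mu_1 = \nu.
% R: law of the reference process (b = 0, generator \Delta) started from m.
\begin{align*}
H(P|R) &= F(\mu) + \frac{1}{4} \int_0^1 \|b_s\|_{L^2(\mu_s)}^2 \, ds
  && \text{(Girsanov)}, \\
\partial_s \rho_s &= -\nabla \cdot (\rho_s b_s) + \Delta \rho_s
   = -\nabla \cdot (\rho_s v_s),
  && v_s := b_s - \nabla \log \rho_s, \\
\frac{1}{4} \|b_s\|_{L^2(\mu_s)}^2
  &= \frac{1}{4} \Big( \|v_s\|_{L^2(\mu_s)}^2 + \|\nabla \log \rho_s\|_{L^2(\mu_s)}^2 \Big)
   + \frac{1}{2} \langle v_s, \nabla \log \rho_s \rangle_{L^2(\mu_s)}, \\
\int_0^1 \langle v_s, \nabla \log \rho_s \rangle_{L^2(\mu_s)} \, ds
  &= \int_0^1 \frac{d}{ds} F(\mu_s) \, ds = F(\nu) - F(\mu), \\
H(P|R) &= \frac{F(\mu) + F(\nu)}{2}
   + \frac{1}{4} \int_0^1 \Big( \|v_s\|_{L^2(\mu_s)}^2 + I(\mu_s) \Big) \, ds .
\end{align*}
```

Minimizing the last line over admissible pairs $(\mu_s, v_s)$ gives a dynamical expression of Benamou-Brenier type in which the Fisher information plays the role of a potential energy.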

Flow maps
In this subsection we follow ([15], Sect. 2.1). We need this result only in the Euclidean framework, hence in this subsection we take $N = \mathbb{R}^n$ for simplicity. A crucial ingredient of the proof of Theorem 5.2 is, given a path $(\mu_t)_{t \in [0,1]}$, the existence of a family of maps $(T_{t \to s})_{t, s \in [0,1]}$ such that for every $s, t \in [0, 1]$
$$(T_{t \to s})_\# \mu_t = \mu_s.$$
These maps are called the flow maps associated with $(\mu_s)_{s \in [0,1]}$. The existence of such maps can be guaranteed by some regularity assumptions on the path. Before the statement we recall the definition of the Lipschitz constant of a vector field proposed by Gigli in [15].
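Definition 2.2 (Lipschitz constant of a vector field). For every smooth compactly supported vector field $\zeta$ on $\mathbb{R}^n$ we define
$$L(\zeta) := \sup_{x \in \mathbb{R}^n} \|\nabla \zeta(x)\|_{op}.$$
Then for every $\mu \in P_2(\mathbb{R}^n)$ and every $v \in T_\mu P_2(\mathbb{R}^n)$ we define
$$L(v) := \inf \liminf_{n \to \infty} L(\zeta_n),$$
where the infimum is taken over sequences $(\zeta_n)_{n \in \mathbb{N}}$ of smooth compactly supported vector fields which converge to $v$ in $L^2(\mu)$ when $n \to \infty$.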
Note that in the case where $v$ is smooth and compactly supported, $L(v)$ is the Lipschitz constant of $v$.

Hypotheses on the heat kernel
Here is a summary of all the hypotheses needed throughout the paper.
The first hypothesis (H1) is needed to define Markov semigroups as introduced in [4]. The second hypothesis (H2) is needed to ensure existence and uniqueness of minimizers of the Schrödinger problem. For instance, these hypotheses hold true when $N = \mathbb{R}^n$ is equipped with the classical Laplacian operator or the Ornstein-Uhlenbeck operator, or when $N$ is compact.

Continuity of the entropic cost
Here we are interested in the continuity of the function $(\mu, \nu) \mapsto C(\mu, \nu)$, where $C(\mu, \nu)$ is defined as an infimum over all absolutely continuous paths connecting $\mu$ to $\nu$.
Theorem 3.1 (Continuity of the entropic cost). Let $\mu, \nu \in P_2^*(N)$ and let $(\mu_k)_{k \in \mathbb{N}}, (\nu_k)_{k \in \mathbb{N}} \subset P_2^*(N)$ be two sequences such that $\mu_k$ converges toward $\mu$ with respect to the Wasserstein distance (resp. $\nu_k$ toward $\nu$). We also assume that for every $k \in \mathbb{N}$ there exists an entropic interpolation from $\mu_k$ to $\nu_k$ (resp. from $\mu$ to $\nu$) and
Proof. To begin, we show that $\limsup_{k \to \infty} C(\mu_k, \nu_k) \le C(\mu, \nu)$.
Using Theorem 8.3.1 of [1], for every $t \in (\varepsilon/2 - \delta, \varepsilon/2 + \delta)$ we have
Finally, using the $CD(\rho, \infty)$ contraction property ([4], Thm. 9.7.2) we obtain
We have shown
A similar estimate holds for the integral from $1 - \varepsilon$ to $1$ and we obtain
Finally, letting, in this order, $k$ tend to $\infty$, $\delta$ tend to $0$ and $\varepsilon$ tend to $0$, we obtain the desired inequality.
To obtain the $\liminf$ inequality, we consider the same path but swap the roles of $\mu_k$ and $\mu$ (resp. $\nu_k$ and $\nu$). Using the fact that $1 - 2\varepsilon < \frac{1}{1 - 2\varepsilon}$, we obtain for every $k \in \mathbb{N}$, $\varepsilon \in (0, 1/2)$ and $\delta \in (0, \varepsilon)$
Letting $k$ tend to $\infty$, $\delta$ tend to $0$ and $\varepsilon$ tend to $0$, we obtain

Benamou-Brenier-Schrödinger formula
As mentioned before, the Benamou-Brenier-Schrödinger formula has been obtained under various hypotheses. Here we show that the result holds true when the two measures are not compactly supported, assuming instead that they have finite Fisher information, finite entropy and locally bounded densities. The proof uses the continuity properties of the cost proved before and existing results. Recall that, in the existing literature, this formula is proved assuming that the two measures have bounded supports and densities, see Theorem 4.3 of [16].
Proposition 4.1 (Benamou-Brenier-Schrödinger formula). Let $\mu, \nu \in P_2^*(N)$ be two measures with locally bounded densities with respect to $m$ such that $I(\mu), I(\nu) < \infty$. Furthermore, assume that there exists an entropic interpolation from $\mu$ to $\nu$. Then
Notice that the hypothesis of existence of entropic interpolations is not very restrictive. Indeed, if $N = \mathbb{R}^n$, entropic interpolations always exist for measures in $P_2^*(N)$, see Proposition 4.1 of [18].
Proof. Let $x \in N$. For every $n \in \mathbb{N}$, we define
where $\alpha_n$ is a normalization constant. Analogously we can define a sequence $(\nu_n)_{n \in \mathbb{N}}$ which converges to $\nu$ when $n \to \infty$. As $\mu_n$ and $\nu_n$ are compactly supported, we can apply the Benamou-Brenier-Schrödinger formula, namely
It can be easily shown that $W_2(\mu_n, \mu) \to$ 2), for every $n \in \mathbb{N}$ the optimal transport plan for the Schrödinger problem from $\mu_n$ to $\nu_n$ is given for every measurable set $A$ of $N \times N$ by
where $\gamma$ is the optimal transport plan for the Schrödinger problem from $\mu$ to $\nu$. Hence
$$\mathrm{Sch}(\mu_n, \nu_n) = H(\gamma_n|R_{01}) \to_{n \to \infty} H(\gamma|R_{01}) = \mathrm{Sch}(\mu, \nu),$$
and the result is proved.

Longtime properties of the entropic cost
The entropic cost $C(\mu, \nu)$ can be defined in greater generality using a parameter $T > 0$. For $\mu, \nu \in P_2(N)$ and $T > 0$ we define
In Theorem 3.6 of [7] and Theorem 1.4 of [9], estimates are provided for large values of $T$, but only in the case where both measures are compactly supported and smooth. Using Theorem 3.1 we are able to extend these estimates to the non-compactly supported and non-smooth case. The following lemma will be very useful; it is proved as Lemma 3.1 in [13].

Lemma 4.2 (Approximation by compactly supported measures). Let $\mu \in P_2(N)$ be a probability measure such that $F(\mu) < \infty$ and $I(\mu) < \infty$. Then there exists a sequence $(\mu_k)_{k \in \mathbb{N}} \subset P_2(N)$ such that
Using this lemma and Theorem 3.1 we can easily extend the estimates provided in Theorem 1.4 of [9] and Theorem 3.6 of [7]. Note that in [9] the author has already extended the estimate which holds under the $CD(\rho, \infty)$ curvature-dimension condition to the non-compact case, but we believe this is a pertinent example to illustrate the utility of Theorem 3.1. To my knowledge, the validity of the $CD(0, n)$ estimate for non-compactly supported measures is a new result.
Corollary 4.3 (Talagrand type inequality for the entropic cost). Let $\mu, \nu \in P_2(N)$ be two probability measures with finite entropy and finite Fisher information. Assume that there exists an entropic interpolation from $\mu$ to $\nu$. Then, if the $CD(\rho, \infty)$ curvature-dimension condition holds for some $\rho > 0$,
If the $CD(0, n)$ curvature-dimension condition holds for some $n > 0$, then
These estimates are very useful; for instance, they are fundamental to show the long-time convergence of entropic interpolations, see [7].

Differentiability of the Schrödinger cost
In this section, we take $N = \mathbb{R}^n$ for some $n \in \mathbb{N}$ and $L = \Delta$ is the classical Laplacian operator. In this case the heat semigroup $(P_t)_{t \ge 0}$ is given by the density
$$\forall x, y \in \mathbb{R}^n, \ t > 0, \quad p_t(x, y) = \frac{1}{(4\pi t)^{n/2}} e^{-\frac{|x - y|^2}{4t}},$$
and the reversible measure $m$ is the Lebesgue measure. Notice that in this case, the functions $A$ and $B$ which appear in hypotheses (i) to (iv) in Section 2.3 can be chosen as

Hence in this case
A natural question is the following: given a probability measure $\nu$, can we find a formula for the derivative of the function $t \mapsto C(\mu_t, \nu)$, where $(\mu_t)_{t \in [0,1]}$ is a smooth curve in $P_2(N)$? From a formal point of view, we can easily find an answer. Here we use the notation $d_t \mu_s^t$ (resp. $d_s \mu_s^t$) for the velocity of a given path $(\mu_s^t)_{(s,t) \in [0,1] \times [0,1]}$ with respect to $t$ (resp. with respect to $s$), to avoid confusion between the two variables. For every $t \in [0, 1]$, let $(\mu_s^t)_{s \in [0,1]}$ be the entropic interpolation from $\mu_t$ to $\nu$, then
Here we have used ([12], Lem. 20) to exchange the derivatives. Noticing that
and using the Newton equation (2.6) we have
Unfortunately we do not see how to turn this proof into a rigorous one.
From another point of view, we can try to differentiate the static formulation of the Schrödinger problem. Once again, we can easily guess a formula from a heuristic point of view. Indeed, let $(\mu_t)_{t \in [0,1]}$ be a smooth curve in $P_2(N)$. For every $t \in [0, 1]$ we denote by $\gamma_t = f_t \otimes g_t \, dR_{01}$ the optimal transport plan for the Schrödinger problem from $\mu_t$ to $\nu$. Then
Using the fact that $\gamma_t$ is a transport plan from $\mu_t$ to $\nu$, it can be easily shown that $\langle \dot\gamma_t, \nabla \log \gamma_t \rangle_{\gamma_t} = \langle \dot\mu_t, \nabla \log f_t \rangle_{\mu_t}$. Hence we obtain
Note that this is equivalent to equation (5.1) thanks to the Benamou-Brenier-Schrödinger formula. This proof is not rigorous because we do not have the regularity properties needed for $\gamma_t$. To prove our results, we follow the idea of Villani in Theorem 23.9 of [25], where he computes the derivative of the Wasserstein distance along curves. Before the statement of our main theorem a technical lemma is needed. This lemma is an easy corollary of the proof of Theorem 4.2.3 in [24].
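Lemma 5.1. Let $(\mu_k)_{k \in \mathbb{N}}, (\nu_k)_{k \in \mathbb{N}} \subset P_2^*(\mathbb{R}^n)$ and $\mu, \nu \in P_2^*(\mathbb{R}^n)$ be such that $\mu_k$ converges toward $\mu$ with respect to the Wasserstein distance when $k \to \infty$ (resp. $\nu_k$ to $\nu$). For every $k \in \mathbb{N}$, we denote by $\gamma_k = f_k \otimes g_k \, dR_{01}$ the optimal transport plan for the Schrödinger problem from $\mu_k$ to $\nu_k$ and by $\gamma = f \otimes g \, dR_{01}$ the optimal transport plan for the Schrödinger problem from $\mu$ to $\nu$. Assume that $(\frac{d\mu_k}{dm})_{k \in \mathbb{N}}$ and $(\frac{d\nu_k}{dm})_{k \in \mathbb{N}}$ are uniformly bounded on compact sets. Then for every compact set $K \subset N$, up to extraction, $(f_k)_{k \in \mathbb{N}}$ and $(g_k)_{k \in \mathbb{N}}$ are uniformly bounded in $L^\infty(K, m)$. Furthermore
$$f_k \rightharpoonup^* f \quad \text{and} \quad g_k \rightharpoonup^* g \quad \text{as } k \to \infty,$$
where the weak star convergence is understood in $L^\infty(K, m)$.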
In addition to this lemma, the following fact is central in our proof. Given two probability measures $p, r$ on $\mathbb{R}^n$ and a smooth enough function $\varphi : \mathbb{R}^n \to \mathbb{R}^n$, we have
where $|\det J_\varphi|$ is the Jacobian determinant of $\varphi$. We often refer to this result as the Monge-Ampère equation or the Jacobian equation, see Theorem 11.1 in [25] or Lemma 5.5.3 in [1]. Using this equation, we obtain
Given a curve $(\mu_t)_t \subset P_2(N)$ and a measure $\nu \in P_2(N)$, the idea of the following proof is to apply equation (5.2) with $r = R_{01}$, with $p = \gamma_t$ the optimal transport plan for the Schrödinger problem from $\mu_t$ to $\nu$, and with $\varphi = T_{t \to s} \times \mathrm{Id}$, to bound $\mathrm{Sch}(\mu_s, \nu)$ from above, and then let $s \to t$.
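The Jacobian equation and its consequence for the entropy can be checked by hand on a one-dimensional affine map. The sketch below assumes $p = \mathcal{N}(0, 1)$ on $\mathbb{R}$ and $\varphi(x) = ax + b$ with $a \ne 0$ (the constants $a, b$ are illustrative), and it is written for densities with respect to the Lebesgue measure, which may differ in form from equation (5.2).

```latex
% Sketch: p = N(0, 1), \varphi(x) = a x + b with a \ne 0, so that \varphi_\# p = N(b, a^2)
% and |\det J_\varphi| = |a|.
\begin{align*}
\frac{d\varphi_\# p}{dx}(\varphi(x)) \, |\det J_\varphi(x)|
  &= \frac{e^{-x^2/2}}{\sqrt{2\pi} \, |a|} \, |a| = \frac{dp}{dx}(x), \\
\int \log \frac{d\varphi_\# p}{dx} \, d\varphi_\# p
  &= -\frac{1}{2} \log(2\pi e) - \log |a|
   = \int \log \frac{dp}{dx} \, dp - \int \log |\det J_\varphi| \, dp .
\end{align*}
```

The second line is the mechanism used with $\varphi = T_{t \to s} \times \mathrm{Id}$: pushing a plan forward changes its entropy by the integral of the logarithm of the Jacobian determinant.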
Then the map $t \mapsto \mathrm{Sch}(\mu_t, \nu_t)$ is differentiable almost everywhere and we have, for almost every $t \in (t_1, t_2)$,
Furthermore, for almost every $t \in (t_1, t_2)$ this equality can be rewritten as
where $(\mu_s^t)_{s \in [0,1]}$ is the entropic interpolation from $\mu_t$ to $\nu_t$.
Proof. To begin we want to show
for every $\nu \in P_2^*(\mathbb{R}^n)$ such that $\frac{d\nu}{dm} \in L^\infty(m)$.
For every $t \in [0, 1]$, $\gamma_t$ denotes the optimal transport plan in the Schrödinger problem from $\mu_t$ to $\nu$. Let $t \in [0, 1]$ be fixed. Then for every $s$ small enough, by the very definition of the cost,
$$\mathrm{Sch}(\mu_{t+s}, \nu) \le H\big( (T_{t \to t+s} \times \mathrm{Id})_\# \gamma_t \,\big|\, R_{01} \big),$$
where $(T_{t_1 \to t_2})_{t_1, t_2 \in [0,1]}$ are the flow maps associated with $(\mu_s)_{s \in [0,1]}$ defined in Section 2.4. Applying equation (5.2) with $r = R_{01}$, $p = \gamma_t$ and $\varphi = T_{t \to t+s} \times \mathrm{Id}$ we obtain
As noticed in equation (23.11) of [25], by the hypothesis (5.2) there exists a constant $C$ such that for every $y \in \mathbb{R}^n$ and $s_1, s_2 \in [0, 1]$
For every $x, y \in \mathbb{R}^n$, we have
$$\log p_1(T_{t \to t+s} x, y) = -\frac{|T_{t \to t+s} x - y|^2}{4} - \frac{n}{2} \log(4\pi)$$
and
for some constant $C > 0$. Hence we can differentiate under the integral at $s = 0$ to find
$$\int \log p_1 \, d(T_{t \to t+s} \times \mathrm{Id})_\# \gamma_t = \int \log p_1 \, d\gamma_t + s \int \langle \dot\mu_t(x), \nabla_x \log p_1(x, y) \rangle \, d\gamma_t(x, y) + o(s). \qquad (5.4)$$
Notice that thanks to the Monge-Ampère equation we have
Combining this with equation (5.4), we have
Observe that using the hypothesis (5.2) we have
Hence we obtain
For the reverse inequality we use the same kind of estimates. By definition we have
$$\mathrm{Sch}(\mu_t, \nu) \le H\big( (T_{t+s \to t} \times \mathrm{Id})_\# \gamma_{t+s} \,\big|\, R_{01} \big).$$
Applying equation (5.2) we have
As already noticed we have
$$\int \log |\det J_{T_{t+s \to t}}| \, d\mu_{t+s} = F(\mu_{t+s}) - F(\mu_t) = s \langle \dot\mu_t, \nabla \log \mu_t \rangle_{L^2(\mu_t)} + o(s).$$
Now we have to deal with a more complicated term. We want to show that
Notice that using (5.3) we have for every $s > 0$
for some $C > 0$. For every $s \in \mathbb{R}$ small enough, we denote
$$v_s(x, y) = \frac{|T_{t+s \to t} x - y|^2 - |x - y|^2}{s} \quad \text{and} \quad v(x, y) = -2 \langle x - y, \dot\mu_t(x) \rangle.$$
Of course, for every $x, y \in N$, we have
$$\lim_{s \to 0} v_s(x, y) = v(x, y).$$
Let $\chi_R$ be the product function $\chi_R = 1_{B(0,R)} \otimes 1_{B(0,R)}$. By Lemma 5.1, for every $R > 0$ there exists a sequence $(s_k^R)_{k \in \mathbb{N}}$ which tends to zero when $k$ tends to $\infty$ such that the sequences $(f_{t + s_k^R})_{k \in \mathbb{N}}$, $(g_{t + s_k^R})_{k \in \mathbb{N}}$ are uniformly bounded in $L^\infty(B(0, R), m)$ and
$$f_{t + s_k^R} \rightharpoonup^* f_t \quad \text{and} \quad g_{t + s_k^R} \rightharpoonup^* g_t \quad \text{as } k \to \infty,$$
where the weak star convergence is understood in $L^\infty(K_R, m)$.
Now for simplicity we denote $s_k^R = s_k$ and
To obtain the desired estimate we are going to pass to the limsup in $k$, then let $R$ tend to $+\infty$. The third term is independent of $k$ and, by the dominated convergence theorem, it tends to $0$ when $R \to \infty$. Things are trickier for the second term. Denote
and $\varphi(x) := \int v(x, y) g_t(y) p_1(x, y) \, dm(y)$. Then
The second term tends to zero thanks to the weak star convergence of $f_{t + s_k^R}$ toward $f_t$ when $k \to \infty$. Furthermore, the same kind of computation gives for every $k \in \mathbb{N}$
Again the second term tends to zero thanks to the weak star convergence of $g_{t + s_k^R}$. Using the upper bound
we have by the dominated convergence theorem
Hence $\varphi_k \to_{k \to \infty} \varphi$ pointwise. Noticing that for every $x \in K_R$, we have
$$\int P(x, y) p_1(x, y) \, dm(y) \in L^1(K_R, m).$$
By the dominated convergence theorem,
Thus the second term in (5.6) tends to zero when $k \to +\infty$. For the first term, notice that for $R \ge 1$
thus it converges to zero, see Definition 6.8 and Theorem 6.9 of [25]. Hence, for every $R > 0$, by letting $k$ tend to $+\infty$ and then $R$ tend to $+\infty$ in
we obtain
$$\lim_{s \to 0} \frac{\mathrm{Sch}(\mu_t, \nu) - \mathrm{Sch}(\mu_{t+s}, \nu)}{s} \le \int \langle \nabla \log p_1(x, y), \dot\mu_t(x) \rangle \, d\gamma_t(x, y) + \Big\langle \dot\mu_t, \nabla \log \frac{d\mu_t}{dm} \Big\rangle_{\mu_t}.$$
This is enough to conclude as in the previous case and obtain
$$\lim_{s \to 0} \frac{\mathrm{Sch}(\mu_{t+s}, \nu) - \mathrm{Sch}(\mu_t, \nu)}{s} \ge \langle \dot\mu_t, \nabla \log f_t \rangle_{\mu_t}.$$
This ends the case where $\nu_t = \nu$ is constant. Now we need to use a "doubling of variables" technique. Let $s, s', t \in [0, 1]$ and let $\gamma_{s,t}$ (resp. $\gamma_{s',t}$) be the optimal transport plan for the Schrödinger problem from $\mu_s$ (resp. $\mu_{s'}$) to $\nu_t$. Then, using the same tricks as before, we have
$$H(\gamma_{s',t}|R_{01}) - H(\gamma_{s,t}|R_{01}) \le F(\mu_{s'}) - F(\mu_s) + \frac{1}{4} \int \big( |x - y|^2 - |T_{s \to s'} x - y|^2 \big) \, d\gamma_{s,t}(x, y).$$
Now, using (5.5), the fact that $s \mapsto F(\mu_s)$ is Lipschitz continuous and the fact that the second order moments of both curves are locally bounded, there exists a constant $C > 0$ such that
$$H(\gamma_{s',t}|R_{01}) - H(\gamma_{s,t}|R_{01}) \le C |s - s'|.$$
By symmetry we can take absolute values in this inequality, and it follows that the function $(s, t) \mapsto \mathrm{Sch}(\mu_s, \nu_t)$ is locally absolutely continuous in $s$ uniformly in $t$ (and also absolutely continuous in $t$ uniformly in $s$). Hence, by Lemma 23.28 in [25], the desired result follows.
Considering the measures $\mu := \mathcal{N}(m_0, 1)$ and $\nu := \mathcal{N}(m_1, 1)$, it follows from Section A.2 in [7] that the curves $(P_t^* \mu)_{t \ge 0}$ and $(P_t^* \nu)_{t \ge 0}$ satisfy the hypotheses of Theorem 5.2. If we denote by $(\mu_s^t)_{s \in [0,1]}$ the entropic interpolation from $P_t^* \mu$ to $P_t^* \nu$, applying Theorem 5.2 we obtain for almost every $t > 0$
$$\frac{d}{dt} C(P_t^* \mu, P_t^* \nu) = -\big\langle \mathrm{grad}_{P_t^* \nu} \mathrm{Ent}, \, d_s \mu_s^t|_{s=1} \big\rangle_{L^2(\nu_t)} + \big\langle \mathrm{grad}_{P_t^* \mu} \mathrm{Ent}, \, d_s \mu_s^t|_{s=0} \big\rangle_{L^2(\mu_t)}.$$
Then, by the Jensen inequality and neglecting the second term, we obtain for every $t \ge 0$
$$\int_0^1 \frac{d^2}{ds^2} \mathrm{Ent}(\mu_s^t) \, ds \ge \big( \mathrm{Ent}(P_t^* \mu) - \mathrm{Ent}(P_t^* \nu) \big)^2.$$
Hence we recover the $(0, 1)$-contraction property of the entropic cost proved in Theorem 37 of [12]. The result could, of course, be proven in $\mathbb{R}^n$ for $n \ge 1$.
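For the Gaussian curves of this example, the heat flow and the gradient flow identity $\frac{d}{dt} F(P_t^* \mu) = -I(P_t^* \mu)$ can be checked explicitly. The sketch below assumes $n = 1$, $L = \Delta$ with the kernel $p_t(x, y) = (4\pi t)^{-1/2} e^{-|x - y|^2 / 4t}$, and writes $F$ for the entropy with respect to the Lebesgue measure (taken here to coincide with the functional denoted $\mathrm{Ent}$ above, which is an assumption on the notation).

```latex
% Sketch: \mu = N(m_0, 1) on R; the kernel p_t is the density of N(0, 2t), so the heat
% flow acts by convolution with a centred Gaussian of variance 2t.
\begin{align*}
P_t^* \mu &= \mathcal{N}(m_0, 1 + 2t), \\
F(P_t^* \mu) &= -\frac{1}{2} \log\big( 2\pi e (1 + 2t) \big),
  \qquad I(P_t^* \mu) = \frac{1}{1 + 2t}, \\
\frac{d}{dt} F(P_t^* \mu) &= -\frac{1}{1 + 2t} = -I(P_t^* \mu).
\end{align*}
```

The same computation applies to $\nu = \mathcal{N}(m_1, 1)$, the two curves differing only by a translation of the mean.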