Improved Regularity Assumptions for Partial Outer Convexification of Mixed-Integer PDE-Constrained Optimization Problems

Partial outer convexification is a relaxation technique for mixed-integer optimal control problems (MIOCPs) constrained by time-dependent differential equations. Sum-Up-Rounding algorithms allow one to approximate feasible points of the relaxed, convexified continuous problem with binary ones that are feasible up to an arbitrarily small δ > 0. We show that this approximation property holds for ODEs and semilinear PDEs under mild regularity assumptions on the nonlinearity and the solution trajectory of the PDE. In particular, the requirements of differentiability and uniformly bounded derivatives on the involved functions from previous work are not necessary to show convergence of the method.

regularity assumptions such that they are fulfilled for a broader class of problems and can be checked more easily. However, this prevents us from deriving a priori estimates, which currently require those regularity assumptions.
In particular, we are dealing with the following MIOCPs, which include a potentially unbounded operator $A$:
where we assume $A$ to be the generator of a $C_0$-semigroup on a real Banach space $X$, $J \in C(C([0,T],X) \times L^1((0,T),U), \mathbb{R})$, $x \in C([0,T],X)$ (i.e. $x$ is a mild solution of the semilinear equation), $u \in L^1((0,T),U)$ for a real Banach space $U$, and $v \in L^\infty((0,T),\mathbb{R}^{n_v})$ with $v(t) \in V$ a.e., where $V \subset \mathbb{R}^{n_v}$ and $|V| < \infty$. We assume the function $f : [0,T] \times X \times U \times V \to X$ to be uniformly continuous in the first argument and Lipschitz continuous in the second and third arguments. In particular, we assume that the integer control is not distributed in space. We assume the constraint function $c : X \times U \to Y$, for some Banach space $Y$, to be Lipschitz continuous in the first argument.
Problems of the type (MIOCP) can be equivalently reformulated by means of partial outer convexification; see the publications by Berkovitz [2], Cesari [3], and Sager [17,18,19]. These proofs were developed for ODEs but carry over without any modification to semilinear PDEs as in (MIOCP). The partial outer convexification of (MIOCP), whose constraints are imposed for a.e. $t \in [0,T]$, is denoted by (RC). To describe the relationship between feasible points of (RC) and feasible points of (BC$_\delta$) for a small $\delta > 0$ that are constructed by rounding, we introduce the following definition.
Definition 1.1. Let $(\varphi_n)_n \subset L^\infty((0,T),\mathbb{R})$ be bounded and satisfy $\sup_{t \in [0,T]} \left| \int_0^t \varphi_n(s)\,ds \right| \to 0$ as $n \to \infty$. Then, we call $(\varphi_n)_n$ a sequence of vanishing integrality gap.
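To illustrate why vanishing integrality gap is a much weaker notion than norm convergence, consider the following standard example on $[0,1]$ (our illustration, not taken from the cited works). It assumes the property as it is used in Proposition 1.3 and Lemma 3.1: a sequence bounded in $L^\infty$ whose antiderivatives converge to zero uniformly. The antiderivative of $\varphi_n$ is a triangle wave that rises on each interval $[k/n, k/n + 1/(2n)]$ and falls back to zero on the following half period.

```latex
\begin{aligned}
\varphi_n(t) &:= \operatorname{sgn}\bigl(\sin(2\pi n t)\bigr), \qquad t \in [0,1],\\
\sup_{t \in [0,1]} \Bigl| \int_0^t \varphi_n(s)\,ds \Bigr| &= \frac{1}{2n} \;\longrightarrow\; 0 \qquad (n \to \infty),\\
\|\varphi_n\|_{L^p((0,1))} &= 1 \qquad \text{for all } p \in [1,\infty] \text{ and all } n \in \mathbb{N}.
\end{aligned}
```

Hence $(\varphi_n)_n$ has vanishing integrality gap although it does not converge to zero in any $L^p$-norm; this is exactly the behavior of the chattering deviations $\alpha - \beta_n$ produced by rounding.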
The Sum-Up-Rounding algorithm mentioned above is given below.
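Definition 1.2 (Sum-Up-Rounding algorithm, [17,19]). Let $0 = t_0 < \dots < t_N = T$ be a discretization grid of $[0,T]$ with maximum discretization width $\Delta t := \max_{i \in \{0,\dots,N-1\}} (t_{i+1} - t_i)$. For $i = 0, \dots, N-1$, $j = 1, \dots, |V|$ and $t \in [t_i, t_{i+1})$, the binary control $\omega$ computed from $\alpha$ is given by
$$\omega_j(t) := \begin{cases} 1 & \text{if } j = \arg\max_{k \in \{1,\dots,|V|\}} \left( \int_0^{t_{i+1}} \alpha_k(\tau)\,d\tau - \int_0^{t_i} \omega_k(\tau)\,d\tau \right), \\ 0 & \text{otherwise}. \end{cases}$$
If the maximum is ambiguous, exactly one of the maximizing indices has to be chosen by $\arg\max$.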
The algorithm can be summarized as follows. For the time intervals indexed by $i = 0, \dots, N-1$, rounded controls are computed one after another. The index $j \in \{1, \dots, |V|\}$ identifies the discrete control choices. On the first interval $[0, t_1]$, the component of $\omega$ corresponding to the largest component of $\int_0^{t_1} \alpha$ is set to one; the others are set to zero. For all subsequent intervals indexed by $i$, the algorithm computes the integrated difference between $\alpha$ and $\omega$ up to the time point $t_i$, the so-called integrated control deviation. The obtained vector is added to $\int_{t_i}^{t_{i+1}} \alpha$, and the rounding is then computed using the maximizing index of this sum. In this way, the outcomes of previous rounding decisions are taken into account. This enables the following proposition, which is due to Sager and states that Sum-Up-Rounding indeed yields sequences of vanishing integrality gap.
Proposition 1.3 (Sum-Up-Rounding yields Vanishing Integrality Gap, [19]). Let $\alpha \in L^\infty((0,T),\mathbb{R}^{|V|})$ solve (RC), and let $\beta_n$ denote the binary control computed from $\alpha$ by means of Sum-Up-Rounding with maximum discretization width $\frac{1}{n}$. Then, the sequence of control deviations $\varphi_n := \alpha - \beta_n$ satisfies
$$\sup_{t \in [0,T]} \left\| \int_0^t \varphi_n(s)\,ds \right\|_\infty \le \frac{C}{n}$$
for a constant $C > 0$. That is, each coordinate sequence of $(\varphi_n)_n$ is of vanishing integrality gap.
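The rounding rule of Definition 1.2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that $\alpha$ is piecewise constant on the rounding grid and is passed as its mean values per interval, so that only interval integrals are needed, and that ties are broken towards the smallest index, which is what `numpy.argmax` does. The hypothetical helper `max_integrated_deviation` evaluates the quantity bounded in Proposition 1.3; since both controls are piecewise constant on the grid, it suffices to inspect the grid points.

```python
import numpy as np


def sum_up_rounding(t, alpha_avg):
    """Sum-Up-Rounding on the grid t[0] < t[1] < ... < t[N].

    alpha_avg[i, j] is the mean of alpha_j on [t_i, t_{i+1}]; every row is
    assumed to lie in the unit simplex. Returns omega with omega[i, j] in {0, 1}
    and exactly one 1 per row: the binary control on [t_i, t_{i+1}).
    """
    dt = np.diff(t)
    omega = np.zeros_like(alpha_avg, dtype=float)
    deviation = np.zeros(alpha_avg.shape[1])  # integral of alpha - omega over [0, t_i]
    for i, width in enumerate(dt):
        score = deviation + alpha_avg[i] * width  # add the integral of alpha over [t_i, t_{i+1}]
        j = int(np.argmax(score))  # on ties, np.argmax returns the smallest maximizing index
        omega[i, j] = 1.0
        deviation = score - omega[i] * width
    return omega


def max_integrated_deviation(t, alpha_avg, omega):
    """sup over t of the max-norm of int_0^t (alpha - omega) for piecewise constant controls.

    Both controls are constant on each grid cell, so the integral is piecewise
    linear in t and its extreme values are attained at grid points.
    """
    increments = (alpha_avg - omega) * np.diff(t)[:, None]
    return float(np.abs(np.cumsum(increments, axis=0)).max())


rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    t = np.linspace(0.0, 1.0, n + 1)
    alpha = rng.random((n, 3))
    alpha /= alpha.sum(axis=1, keepdims=True)  # rows in the unit simplex, |V| = 3
    omega = sum_up_rounding(t, alpha)
    print(n, max_integrated_deviation(t, alpha, omega), (3 - 1) / n)
```

For random relaxed controls with three choices, the printed deviation should stay below $(|V| - 1)\Delta t$, a bound of this form being known from the Sum-Up-Rounding literature; it shrinks linearly with the grid width, in line with the $C/n$ behavior in Proposition 1.3.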
Remark 1.4. Note that it is necessary to relax the algebraic constraint by an arbitrarily small $\delta > 0$ in (BC$_\delta$) to avoid the situation of a degenerate feasible set (cf. [3], Chap. 18.7).
Remark 1.5. In work under review [13,15], the authors extend the theory to additional "combinatorial" constraints of the form $0 \le c(x(t), u(t), v(t))$. Some of the results presented there can be included in the PDE setting here without any problems. We do not elaborate on that here and just note that the reformulation there shows that vanishing constraints $0 \le \beta_{n,i}(t)\, c(x(t), u(t), v_i)$ can be taken care of by a Sum-Up-Rounding variant described in [13,15], and that the claim of Proposition 1.3 still holds.
If the involved initial value problem (IVP) provides enough regularity, the vanishing integrality gap implies $y_n \to x$, and one obtains a result of the following form.
Proposition 1.6. Let $\alpha$, $u$ and the corresponding solution $x$ of the IVP be feasible for (RC), and let $(\beta_n)_n$ be constructed such that $(\varphi_n)_n$ is of vanishing integrality gap. Furthermore, let additional regularity assumptions on the IVP hold (see below). Then, the sequence of state vectors $(y_n)_n$ corresponding to $(\beta_n)_n$ and $u$ satisfies $y_n \to x$.
Furthermore, by continuity of $J$ and $c$, we obtain $J(y_n, u) \to J(x, u)$ and $c(y_n, u) \to c(x, u)$.
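The convergence $y_n \to x$ from Proposition 1.6 can be observed numerically in the simplest setting $A = 0$, $X = \mathbb{R}$. The sketch below is a hypothetical toy problem chosen only for illustration: $V = \{-1, 1\}$, $f(x, v) = v - |x|$ (Lipschitz but not differentiable in $x$), the smooth relaxed control $\alpha_1(t) = 0.5 + 0.4\sin(6t)$ with $\alpha_2 = 1 - \alpha_1$, and explicit Euler time stepping on a fine grid. Binary controls are obtained with Sum-Up-Rounding on coarser grids, and the maximal state deviation is recorded.

```python
import numpy as np

V = np.array([-1.0, 1.0])  # hypothetical discrete control set {v_1, v_2}


def f(x, v):
    # hypothetical right-hand side: Lipschitz in x, not differentiable at x = 0
    return v - np.abs(x)


def sum_up_rounding(alpha_int, width):
    """alpha_int[i, j]: integral of alpha_j over rounding interval i (all of length width)."""
    omega = np.zeros_like(alpha_int)
    deviation = np.zeros(alpha_int.shape[1])  # integrated control deviation up to t_i
    for i in range(alpha_int.shape[0]):
        score = deviation + alpha_int[i]
        omega[i, np.argmax(score)] = 1.0
        deviation = score - omega[i] * width
    return omega


def explicit_euler(weights, x0, h):
    """Explicit Euler for x' = sum_j weights[k, j] * f(x, V[j]) on time step k."""
    x = np.empty(weights.shape[0] + 1)
    x[0] = x0
    for k in range(weights.shape[0]):
        x[k + 1] = x[k] + h * np.dot(weights[k], f(x[k], V))
    return x


T, steps = 1.0, 4096
h = T / steps
t_mid = (np.arange(steps) + 0.5) * h
a1 = 0.5 + 0.4 * np.sin(6.0 * t_mid)  # relaxed control alpha_1; alpha_2 = 1 - alpha_1
alpha = np.column_stack([a1, 1.0 - a1])
x = explicit_euler(alpha, 0.0, h)  # state for the relaxed control

errors = []
for n_round in (16, 64, 256):
    per = steps // n_round  # Euler steps per rounding interval
    alpha_int = alpha.reshape(n_round, per, 2).sum(axis=1) * h
    omega = sum_up_rounding(alpha_int, T / n_round)
    beta = np.repeat(omega, per, axis=0)  # binary control on the Euler grid
    y = explicit_euler(beta, 0.0, h)
    errors.append(float(np.abs(x - y).max()))
    print(n_round, errors[-1])
```

Because $f(x, v_1) - f(x, v_2) = v_1 - v_2$ does not depend on $x$ here, the state deviation is driven directly by the integrated control deviation, which keeps the toy example transparent; the error should shrink roughly in proportion to the rounding grid width.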
In the absence of the constraint $c$, Proposition 1.6 implies that the infimal value of (BC$_0$) coincides with the minimal value of (RC) and that a sequence of feasible points exists along which the infimal value is approached. This holds regardless of whether a minimizer of (BC$_0$) exists. In the presence of the constraint, Proposition 1.6 implies that a sequence of points exists such that for all $\delta > 0$ there exists $n_\delta$ such that all subsequent elements of the sequence are feasible for (BC$_\delta$) and the objective converges to the minimal value of (RC). In this case, convergence to the infimal value of (BC$_0$) cannot be guaranteed, as the constraint can lead to a degenerate feasible set; see an example by Cesari [3].
As indicated by the requirements of Proposition 1.6, some regularity assumptions on the PDE are needed to obtain the convergence $y_n \to x$. Sufficient regularity assumptions have been provided for ODEs by Sager in [19] and for semilinear PDEs by Hante and Sager in [11], Theorem 1. In particular, they require a bound with some constant $C > 0$ that holds a.e. on $0 < s < t < T$ and for $i \in \{1, \dots, |V|\}$, where $(T(t))_{t \ge 0}$ denotes the semigroup generated by $A$. In [10], the results are extended to a class of hyperbolic PDEs, where regularity conditions involving differentiability of the mapping $x \mapsto f(x, u, v_i)$ and piecewise smooth controls are required to prove the result; see Hypothesis 3 and the results thereafter in [10].

Contribution
We generalize the existing results on the ability to approximate solution trajectories of (RC) with binary-valued ones that are feasible for (BC$_\delta$) and computed with Sum-Up-Rounding to a class of semilinear PDEs. In particular, we consider the IVPs (1.1) and (1.2) under the regularity assumption $f(\cdot, x(\cdot), u(\cdot), v_i) \in L^1((0,T),X)$ for $i \in \{1, \dots, |V|\}$, with regard to Definition 1.1. In particular, Lipschitz continuity of $f$ in $x$ and $u$ together with the availability of mild solutions suffices. Furthermore, we characterize the convergence of $(\beta_n)_n$ to $\alpha$ by means of weak(*) topologies in $L^p$-spaces in Theorem 3.4.

Structure of the remainder
In Section 2, we state our main result together with a setup, comprising a broad class of PDEs and corresponding control problems, for which it holds. Furthermore, we point out its consequences for the existing theory of Sum-Up-Rounding and partial outer convexification. In Section 3, we prove the aforementioned approximation result. To this end, we combine the convergence of $\beta_n$ to $\alpha$ in a weak sense with a compactness result provided by semigroup theory and the findings in [20]. We demonstrate the results on a computational example in Section 4. Finally, in Section 5, we summarize our results in relation to the literature discussed above. Furthermore, we put the results into the context of the Filippov-Ważewski theorem, where related questions were studied outside the mixed-integer optimization context several decades ago.

Main statement and consequences
As mentioned above, mild solutions are the solution concept for semilinear PDEs with which we work in the remainder. We therefore recall their definition together with their existence and uniqueness.
Then, the function $x \in C([0,T],X)$ defined by means of the variation of constants formula
Now, we state our main result, which will be proven as Theorem 3.7 in Section 3.
Proposition 2.3 (Extension of Thm. 2 in [19]). Let $\alpha \in L^\infty((0,T),\mathbb{R}^{|V|})$ with $\|\alpha\|_{L^\infty} \le 1$, and let $(\beta_n)_n$ be binary-valued functions such that the coordinate sequences of $(\varphi_n)_n$ defined by $\varphi_n := \alpha - \beta_n$ are of vanishing integrality gap. Let $x$ and $y_n$, $n \in \mathbb{N}$, be the unique mild solutions of (1.1) and (1.2), respectively, and let $f_i(s) := f(s, x(s), u(s), v_i)$ be in $L^1((0,T),X)$ for $i \in \{1, \dots, |V|\}$. Then, $\|x - y_n\|_{C([0,T],X)} \to 0$ as $n \to \infty$.
Below, we point out what is gained by Proposition 2.3.
Remark 2.4. In particular, we have strengthened the results from the literature as follows.
(1) For the ODE case, the regularity assumptions (6c) in Theorem 2 and (17) in Corollary 6 in [19] are no longer necessary; this is a trivial corollary with the choice $A := 0$ and $X = \mathbb{R}^n$.
(2) For semilinear PDEs whose differential operator generates a $C_0$-semigroup $(T(t))_{t \ge 0}$, the prerequisite H2 in (Thm. 1 of [11]), that for all $t \in [0,T]$ the function $s \mapsto T(t-s) f(s, y(s), u(s))$ is a piecewise $H^1$-function with a uniformly bounded derivative, is no longer necessary.
Feasible setups for the IVPs can be validated by checking the prerequisites of Corollary 2.6.
To keep the article self-contained, we state and prove the following proposition, which summarizes the relationship between (RC) and (BC$_\delta$). It follows from a continuity argument.
Let $(\beta_n)_n$ be binary-valued functions such that the coordinate sequences of $(\varphi_n)_n$ defined by $\varphi_n := \bar{\alpha} - \beta_n$ are of vanishing integrality gap. Then, for every $\delta > 0$, there exists $(y_\delta, \bar{u}, \beta_\delta)$ that is feasible for (BC$_\delta$) such that
Proof. By continuity of $J$ and $c$, there exists $\varepsilon > 0$ such that
By Proposition 2.3, there exist $C_r > 0$ and $n_0 \in \mathbb{N}$ such that $\|x - y_n\|_{C([0,T],X)} < \min\{\delta, \varepsilon\}$ holds for all $n \ge n_0$. We choose $\beta_\delta := \beta_{n_0}$ and $y_\delta := y_{n_0}$, and the claim follows.
Now, we establish a broad setting in which (1.3) holds and which can be checked more easily.
Corollary 2.6. Let $\alpha \in L^\infty((0,T),\mathbb{R}^{|V|})$, let $(\beta_n)_n$ be binary-valued functions such that the coordinate sequences of $(\varphi_n)_n$ defined by $\varphi_n := \alpha - \beta_n$ are of vanishing integrality gap, let $u \in L^1((0,T),U)$, and let $f : [0,T] \times X \times U \times V \to X$ be continuous in the first argument and uniformly Lipschitz continuous in the second and third arguments. Then, (1.3) holds.
Proof. First, we note that plugging an $L^1((0,T))$-function into a uniformly Lipschitz continuous function yields another $L^1((0,T))$-function. We observe that $x$, given by the variation of constants formula (2.1), is the mild solution of (1.1) and that the $y_n$, given by (2.2), are the mild solutions of (1.2). Then, we apply Proposition 2.3.
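The first step of this proof can be made explicit. Assuming, for illustration, that $L_f$ denotes a Lipschitz constant of $f$ with respect to $(x, u)$ that is uniform in $t$ and $v$, and leaving questions of measurability aside, one obtains for a.e. $s \in (0,T)$ and every $i \in \{1, \dots, |V|\}$:

```latex
\begin{aligned}
\|f(s, x(s), u(s), v_i)\|_X
  &\le \|f(s, 0, 0, v_i)\|_X + L_f \bigl( \|x(s)\|_X + \|u(s)\|_U \bigr), \\
\int_0^T \|f(s, x(s), u(s), v_i)\|_X \, ds
  &\le T \max_{s \in [0,T]} \|f(s, 0, 0, v_i)\|_X
     + L_f \bigl( T \, \|x\|_{C([0,T],X)} + \|u\|_{L^1((0,T),U)} \bigr) < \infty.
\end{aligned}
```

The maximum exists because $s \mapsto f(s, 0, 0, v_i)$ is continuous on the compact interval $[0,T]$. Together with $x \in C([0,T],X)$ and $u \in L^1((0,T),U)$, this gives $f_i \in L^1((0,T),X)$, which is the regularity assumption used for the main result (cf. Theorem 3.7).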

Proof of Proposition 2.3
We approach the main statement in several steps. First, we show that $(\varphi_n)_n$ being of vanishing integrality gap implies $\int_0^t \varphi_n f \to 0$ uniformly in $t$ for $f \in L^1((0,T),X)$. Combining this insight with compactness arguments, we show (1.3) for a broad class of semilinear PDEs under mild regularity assumptions. Finally, we generalize the result from continuous functions to piecewise continuous ones.
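3.1. Vanishing integrality gap for $L^1((0,T),X)$-functions
By means of an approximation argument, we show the following result, which enables us to relax assumptions made in the proofs of previous results that relied on the direct applicability of an integration by parts formula.
Lemma 3.1. Let $X$ be a Banach space, let $(\varphi_n)_n \subset L^\infty((0,T),\mathbb{R})$ be of vanishing integrality gap, and let $f \in L^1((0,T),X)$. Then, $\sup_{t \in [0,T]} \left\| \int_0^t \varphi_n(s) f(s)\,ds \right\|_X \to 0$ as $n \to \infty$.
Proof. Let $C_\varphi := \sup_{n \in \mathbb{N}} \|\varphi_n\|_{L^\infty}$, which exists by assumption, and let $\varepsilon > 0$. We use the fact that $C^\infty([0,T],X)$ is dense in $L^1((0,T),X)$ and choose $g \in C^\infty([0,T],X)$ with $C_\varphi \|f - g\|_{L^1} < \frac{\varepsilon}{2}$. With $\Phi_n(t) := \int_0^t \varphi_n(s)\,ds$, integration by parts gives
$$\int_0^t \varphi_n(s) g(s)\,ds = \Phi_n(t) g(t) - \int_0^t \Phi_n(s) g'(s)\,ds.$$
Due to the convergence of $\Phi_n$, there exists $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$, we have $\|\Phi_n\|_{C([0,T])} \left( \|g\|_{C([0,T],X)} + \|g'\|_{L^1((0,T),X)} \right) < \frac{\varepsilon}{2}$. Putting the estimates together, we arrive at
$$\left\| \int_0^t \varphi_n(s) f(s)\,ds \right\|_X \le C_\varphi \|f - g\|_{L^1} + \left\| \int_0^t \varphi_n(s) g(s)\,ds \right\|_X < \varepsilon$$
for all $t \in [0,T]$ and all $n \ge n_0$.
Example 3.2. We consider a Weierstraß function, which is nowhere differentiable. Furthermore, we consider a sequence of functions $\varphi_n : [0, 2\pi] \to [-1, 1]$ with an equidistant discretization step width $2\pi / 2^n$, which makes this example straightforward.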
The sequence $\varphi_n$ was chosen such that the following cancellations occur. If $k \ge n$, the sin terms oscillate inside the constant segments of $f_n$ and cancel each other there. If $k \le n - 2$, $f_n$ oscillates and cancels itself within segments where sin has the same sign and is symmetric with respect to the extreme point in this segment. By means of Lebesgue's dominated convergence theorem, we obtain the desired convergence.
With Lemma 3.1 at hand, the ODE case can now be proven quite easily, similar to the reasoning in [19]. However, as we have promised a more general result that also works for semilinear PDEs, we are going to invest some extra effort.
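Lemma 3.1 can be checked numerically for an integrand that lies in $L^1$ but is unbounded. The sketch below uses the hypothetical choices $T = 1$, $f(s) = s^{-1/2}$ and $\varphi_n(s) = \operatorname{sgn}(\sin(2\pi n s))$, a bounded sequence whose antiderivatives are bounded by $1/(2n)$. Since $f$ is decreasing, the partial integrals form an alternating series with decreasing terms, so the supremum equals the integral over the first half period, $\sqrt{2/n}$; the midpoint-rule values should be close to it and decrease as $n$ grows.

```python
import numpy as np


def f(s):
    # hypothetical integrand: in L^1(0, 1) but unbounded near s = 0
    return 1.0 / np.sqrt(s)


def sup_weighted_integral(n, func, m=2**16):
    """sup over t of |int_0^t phi_n(s) func(s) ds| on [0, 1], phi_n(s) = sign(sin(2 pi n s)).

    Midpoint rule on m cells; m must be a multiple of 2n so that the sign
    changes of phi_n fall on cell boundaries.
    """
    assert m % (2 * n) == 0
    h = 1.0 / m
    s = (np.arange(m) + 0.5) * h
    phi = np.sign(np.sin(2.0 * np.pi * n * s))
    return float(np.abs(np.cumsum(phi * func(s) * h)).max())


for n in (4, 16, 64):
    print(n, sup_weighted_integral(n, f), np.sqrt(2.0 / n))
```

The slow decay like $n^{-1/2}$ is specific to this unbounded $f$; Lemma 3.1 only asserts convergence, without a rate.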
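Using the proof of Lemma 3.1, we can characterize the convergence of $(\beta_n)_n$ and $(\varphi_n)_n$ by means of weak topologies, which is done in Theorem 3.4 below.
Theorem 3.4. Let $\alpha \in L^\infty((0,T),\mathbb{R}^{|V|})$, and let $(\beta_n)_n$ be binary-valued functions such that the coordinate sequences of $(\varphi_n)_n$, $\varphi_n := \alpha - \beta_n$, are of vanishing integrality gap. Then, $\varphi_n \overset{*}{\rightharpoonup} 0$ and $\beta_n \overset{*}{\rightharpoonup} \alpha$ in $L^\infty((0,T),\mathbb{R}^{|V|})$, and $\varphi_n \rightharpoonup 0$ and $\beta_n \rightharpoonup \alpha$ in $L^p((0,T),\mathbb{R}^{|V|})$ for $p \in [1,\infty)$.
Proof. We employ Lemma 3.1 with $X = \mathbb{R}$ to obtain $\varphi_n \overset{*}{\rightharpoonup} 0$ in $L^\infty((0,T),\mathbb{R}^{|V|})$. This implies $\beta_n \overset{*}{\rightharpoonup} \alpha$ in $L^\infty((0,T),\mathbb{R}^{|V|})$. The other claims follow immediately, as we have tested with $L^1$-functions and $L^p \subset L^1$ for $p > 1$ on finite measure spaces.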
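The gap between weak(*) and norm convergence in Theorem 3.4 is easy to see numerically. For the hypothetical choice $\alpha \equiv 0.5$ with $|V| = 2$ on $[0,1]$, Sum-Up-Rounding on a uniform grid with $n$ intervals produces the alternating control $\beta_{n,1} = 1, 0, 1, 0, \dots$ (ties broken towards the first index). The sketch evaluates $\int_0^1 (\beta_{n,1} - \alpha_1)\, g$ for the test function $g(t) = e^t$, which tends to zero, whereas $\|\beta_{n,1} - \alpha_1\|_{L^2} = 0.5$ for every $n$.

```python
import numpy as np

m = 2**14  # midpoint-rule cells on [0, 1]
h = 1.0 / m
s = (np.arange(m) + 0.5) * h
g = np.exp(s)  # test function g in L^1(0, 1)

results = []
for n in (8, 64, 512):
    # alternating binary control: 1 on even-numbered, 0 on odd-numbered rounding intervals
    beta = (np.floor(s * n) % 2 == 0).astype(float)
    weak = abs(np.sum((beta - 0.5) * g) * h)  # |int (beta - alpha) g|
    strong = np.sqrt(np.sum((beta - 0.5) ** 2) * h)  # ||beta - alpha||_{L^2}
    results.append((n, weak, strong))
    print(n, weak, strong)
```

The tested integrals decay roughly like $1/n$, while the $L^2$-distance stays constant, so $(\beta_n)_n$ converges weakly but not strongly.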

Approximation error of binary controls generated by Sum-Up-Rounding
Before we can prove our result, we need the following two preparatory lemmata. The first transforms a pointwise convergence into a uniform one.
Lemma 3.5. Let $X$ be a Banach space, $(T(t))_{t \ge 0}$ be a $C_0$-semigroup on $X$, and $f \in L^1((0,T),X)$. Then,
Proof. We note that $t \mapsto \|T(t)\|_{op}$ is dominated by an exponential function on compact intervals, a standard result, e.g.
An application of Lebesgue's dominated convergence theorem finishes the proof.
The second shows that a certain sequence of functions in $C([0,T],X)$ is relatively compact. Then, the set $\{\nu_n : n \in \mathbb{N}\}$ is relatively compact in $L^p((0,T),X)$ for $p \in [1,\infty)$ and in $C([0,T],X)$ with respect to the norm topology.
Proof. Again, we set $C := \sup_{t \in [0,T]} \|T(t)\|_{op}$. Due to the absolute continuity of the Bochner integral, we know $(\nu_n)_n \subset C([0,T],X)$. Note that the uniform boundedness of $(\varphi_n)_n$ and the boundedness of $T(t)$ on compact intervals, already used in the proof of Lemma 3.5, imply the uniform boundedness of $(\nu_n)_n$. We prove the claim by employing Theorem 1 in [20] by Simon, which is a practical application and extension of the Arzelà-Ascoli theorem.
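Hence, using Theorem 1 of [20], we have to verify the following two conditions:
$$B_{t_1,t_2} := \left\{ \int_{t_1}^{t_2} \nu_n(t)\,dt : n \in \mathbb{N} \right\} \subset\subset X \quad \text{for all } 0 < t_1 < t_2 < T, \tag{3.1}$$
and the uniform time-continuity condition (3.2) of Theorem 1 in [20], i.e. $\nu_n(\cdot + h) - \nu_n \to 0$ as $h \downarrow 0$, uniformly in $n \in \mathbb{N}$, in the norms of the respective spaces.
Regarding (3.1), we use that $\nu_n(t) \to 0$ pointwise and that $(\nu_n)_n$ is uniformly bounded. Thus, we can employ Lebesgue's dominated convergence theorem, which yields $\left\| \int_{t_1}^{t_2} \nu_n(t)\,dt \right\|_X \to 0$ for all $0 < t_1 < t_2 < T$. Hence, $B_{t_1,t_2}$ consists of the elements of a Cauchy sequence and is therefore relatively compact in $X$.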
To show (3.2), we split $\nu_n(t+h) - \nu_n(t)$ into two terms. For the integrand of the first term, we obtain an estimate that converges to zero for $h \downarrow 0$ by Lebesgue's dominated convergence theorem, independently of the specific choice of $\varphi_n$. For the second term, we estimate its norm in a similar way, and two applications of Lebesgue's dominated convergence theorem show that it converges to zero as well.
Equipped with Lemmas 3.1 and 3.6, we are able to generalize the approximation result (1.3) from the settings in [13] and [11] to mild solutions of semilinear PDEs whose differential operators generate $C_0$-semigroups. This is the statement of Theorem 3.7 below, which implies Proposition 2.3.
Theorem 3.7. Let $X$ be a real Banach space and $A$ be the generator of a $C_0$-semigroup $(T(t))_{t \ge 0}$. Let $\alpha \in L^\infty((0,T),\mathbb{R})$ with $0 \le \alpha \le 1$ a.e., let $(\beta_n)_n \subset L^\infty((0,T),\mathbb{R})$ be binary-valued functions, and let $u \in L^1((0,T),U)$ be such that $x$ is the unique mild solution of (1.1), the $y_n$ are the unique mild solutions of (1.2) for $n \in \mathbb{N}$, $(\varphi_n)_n$ with $\varphi_n := \alpha - \beta_n$ is of vanishing integrality gap, and $f_i(s) := f(s, x(s), u(s), v_i)$ is in $L^1((0,T),X)$.
Furthermore, let $\varepsilon > 0$. Then, there exists $n_0 \in \mathbb{N}$ such that $\|x - y_n\|_{C([0,T],X)} < \varepsilon$ for all $n \ge n_0$.
Proof. Let $t \in [0,T]$. As the mild solutions $x$ and $y_n$ are continuous, we can evaluate them pointwise and use the variation of constants formulas (2.1) and (2.2) to compute their difference. As done in [19], we insert a zero and obtain an estimate of this difference that holds for all $n \ge n_0$ and some $n_0 \in \mathbb{N}$. The application of Grönwall's inequality finishes the proof.
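The structure of this final estimate can be sketched as follows. This is our illustration under assumptions that are not spelled out in the text above: the vector-valued setting in which $\beta_n(t)$ is a unit vector a.e., mild solutions of (1.1) and (1.2) given by the variation of constants formula with the same initial value, $M := \sup_{t \in [0,T]} \|T(t)\|_{op}$, and $L_f$ a Lipschitz constant of $f$ in its second argument. The zero that is inserted is $\beta_{n,i}(s) f_i(s) - \beta_{n,i}(s) f_i(s)$.

```latex
\begin{aligned}
x(t) - y_n(t)
 &= \sum_{i=1}^{|V|} \int_0^t T(t-s) \bigl[ \alpha_i(s) f_i(s)
    - \beta_{n,i}(s) f(s, y_n(s), u(s), v_i) \bigr] \, ds \\
 &= \sum_{i=1}^{|V|} \int_0^t \varphi_{n,i}(s) \, T(t-s) f_i(s) \, ds
  + \sum_{i=1}^{|V|} \int_0^t \beta_{n,i}(s) \, T(t-s)
    \bigl[ f_i(s) - f(s, y_n(s), u(s), v_i) \bigr] \, ds, \\
\|x(t) - y_n(t)\|_X
 &\le \varepsilon_n + M L_f \int_0^t \|x(s) - y_n(s)\|_X \, ds,
 \qquad
 \varepsilon_n := \sup_{\tau \in [0,T]} \Bigl\| \sum_{i=1}^{|V|} \int_0^\tau \varphi_{n,i}(s) \, T(\tau - s) f_i(s) \, ds \Bigr\|_X, \\
\|x - y_n\|_{C([0,T],X)}
 &\le \varepsilon_n \, e^{M L_f T}.
\end{aligned}
```

The last line follows from Grönwall's inequality, so it remains to show $\varepsilon_n \to 0$, which is where Lemmas 3.1 and 3.6 enter; no differentiability of $s \mapsto T(t-s) f_i(s)$ is needed in this step.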

Computational example
We provide an example that demonstrates our findings computationally, and consider a problem for which differentiability cannot be assumed. In more detail, we consider the IVP (4.1) with the initial value $x(0) \equiv 0.5$ in one spatial dimension, i.e. on an interval $\Omega$ whose right end point is $r$. We assume a constant influx of 0 on the left side of the domain and do not impose any condition at the right boundary of the domain, which can be interpreted as a free outflow. It is well known that $-\partial_s$ generates the right translation semigroup; see Example 2.9 of [16]. In particular, the translation semigroup does not provide smoothing properties like the heat semigroup or other semigroups associated with parabolic equations do. We choose $f_1$ to be a nowhere differentiable Weierstraß function in time (see Example 3.2) multiplied by a constant function in space, and $f_2(t) \equiv 0$. As Sum-Up-Rounding does not require optimality of the (forward) solution it approximates, we may choose $\alpha$ somewhat arbitrarily for the purpose of demonstration. We use $\alpha_1 = \alpha_2 \equiv 0.5$. Sum-Up-Rounding then produces a chattering control that approximates the constant function 0.5 weakly. We have visualized this in Figure 1 for a coarse and a finer rounding grid. We discretize the time horizon $[0, 10]$ and $\Omega$ into 4096 intervals each and solve with the Lax-Friedrichs scheme for hyperbolic conservation laws; see, for example, LeVeque's monograph [14] for the details. To compute the value of the right-most cell with the Lax-Friedrichs scheme, we add a ghost cell with zero-order extrapolation (see Sect. 7.2.1 of [14]). Let $\omega^{(1)}, \dots, \omega^{(6)}$ denote the sequence of Sum-Up-Rounding approximations of $\alpha$ with $N^{(1)} = 128, \dots, N^{(6)} = 4096$ rounding intervals, and let $y^{(1)}, \dots, y^{(6)}$ denote the corresponding solutions of (4.1) with $\omega^{(n)}$ instead of $\alpha$. We have computed the relative errors $d^{(n)}$ for $n = 1, \dots, 6$. As the Weierstraß function cannot be evaluated exactly, we have approximated it by including $k = 1, 10, 100$ summands of its defining cosine series.
The convergence of $d^{(n)}$ is very similar for the three choices of $k$. When only one summand of the cosine series is included, i.e. when the smoothness is highest, the convergence is slightly faster than when more summands are included. In numbers, we have $d^{(6)} = 1.6642 \times 10^{-3}$ for $k = 1$, $d^{(6)} = 2.1519 \times 10^{-3}$ for $k = 10$, and $d^{(6)} = 2.1521 \times 10^{-3}$ for $k = 100$. The full convergence history of $d^{(n)}$ is visualized in Figure 2. To illustrate the lack of differentiability of the right-hand side, we have plotted the approximants of the Weierstraß function used for our computations in Figure 3.
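The setup of this section can be mimicked on a much coarser scale. The following sketch is not the code behind Figures 1 to 3, and all numbers in it are assumptions: $\Omega = [0, 10]$ with 256 cells, 1024 time steps on $[0, 10]$, rounding grids with 16, 128 and 1024 intervals, the Weierstraß-type partial sums $\sum_{j<k} a^j \cos(b^j \pi t)$ with the hypothetical parameters $a = 0.5$, $b = 3$, and a relative error measured in the maximum norm over all time levels and cells. The model solved is the transport equation $x_t + x_s = \omega_1(t) f_1(t)$ (with $f_2 \equiv 0$), discretized with the Lax-Friedrichs scheme, an inflow value of 0 in a left ghost cell, and zero-order extrapolation in a right ghost cell.

```python
import numpy as np


def weierstrass(t, k, a=0.5, b=3):
    """Partial sum with k summands of sum_j a**j * cos(b**j * pi * t) (hypothetical a, b)."""
    return sum(a**j * np.cos(b**j * np.pi * t) for j in range(k))


def sum_up_rounding(alpha_int, width):
    """Uniform rounding grid; alpha_int[i, j] = integral of alpha_j over interval i."""
    omega = np.zeros_like(alpha_int)
    deviation = np.zeros(alpha_int.shape[1])
    for i in range(alpha_int.shape[0]):
        score = deviation + alpha_int[i]
        omega[i, np.argmax(score)] = 1.0
        deviation = score - omega[i] * width
    return omega


def lax_friedrichs(q, x0, nx, length, horizon):
    """Solve x_t + x_s = q(t) on [0, length] with inflow 0 on the left and free outflow.

    q[k] is the (spatially constant) source on time step k. Returns all time levels.
    """
    nt = len(q)
    dx, dt = length / nx, horizon / nt
    lam = dt / dx
    assert lam <= 1.0, "CFL condition violated"
    u = np.full(nx, x0)
    out = [u]
    for k in range(nt):
        um = np.concatenate(([0.0], u[:-1]))  # left neighbours; ghost cell holds inflow value 0
        up = np.concatenate((u[1:], u[-1:]))  # right neighbours; zero-order extrapolation
        u = 0.5 * (um + up) - 0.5 * lam * (up - um) + dt * q[k]
        out.append(u)
    return np.array(out)


T, L, nt, nx = 10.0, 10.0, 1024, 256  # hypothetical horizon, domain length, grid sizes
tk = np.arange(nt) * (T / nt)  # left end points of the time steps

errors = {}
for k in (1, 10):
    w = weierstrass(tk, k)  # f_1 in time (times a constant in space); f_2 = 0
    x = lax_friedrichs(0.5 * w, 0.5, nx, L, T)  # relaxed control alpha_1 = alpha_2 = 0.5
    errors[k] = []
    for n_round in (16, 128, 1024):
        width = T / n_round
        omega = sum_up_rounding(np.full((n_round, 2), 0.5 * width), width)
        omega1 = np.repeat(omega[:, 0], nt // n_round)  # omega_1 on the time steps
        y = lax_friedrichs(omega1 * w, 0.5, nx, L, T)
        errors[k].append(float(np.abs(x - y).max() / np.abs(x).max()))
    print(k, errors[k])
```

The computed errors should decrease as the rounding grid is refined for both values of $k$, even though the translation semigroup provides no smoothing; they are not expected to reproduce the values $d^{(6)}$ reported above.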

Conclusion
As mentioned before, previous proofs applied integration by parts directly to $\int_0^t \varphi_{n,i}(s) T(t-s) f_i(s)\,ds$. As $\varphi_{n,i}$ is in general not differentiable, these proofs inherently required a certain amount of differentiability of $s \mapsto T(t-s) f_i(s)$. Lemma 3.1 allowed us to shift the integration by parts to a smooth approximation of the $L^1$-function. However, we would like to stress that the approximation argument in Lemma 3.1, which allows us to carry out our proof without the previous differentiability assumptions, currently prevents us from deriving a priori estimates on the approximation error such as those available in [10,11,13,19]. The compactness argument in Lemma 3.6 allowed us to deduce strong convergence from weak convergence. This is particularly valuable because the requirement of continuously differentiable solution trajectories might not be very restrictive for ODEs but can be quite restrictive for PDEs.
Our findings can be interpreted as a constructive and algorithmic complement to the Filippov-Ważewski theorem [7,21], which states, under similar conditions, that the set of solutions of a differential inclusion with a set-valued nonlinear term is dense in the set of solutions of the differential inclusion with the convexified nonlinear term; see [4,8] for the case of semilinear evolution equations based on $C_0$-semigroups. The same idea was pursued by Gamkrelidze in [9], who discovered that the infimal value of an optimal control problem (OCP) can be approximated with trajectories emanating from feasible controls even when no feasible limiting control exists. He calls the optimal state trajectory an optimal sliding state.
With respect to the numerics, we point out that refining the grids on which the IVPs are solved can lead to a loss of (piecewise) differentiability in the limit if the IVP does not have a differentiable solution but the discretizations do. As far as the convergence of the MIOCP approximation process as $n \to \infty$ is concerned, our results show that this loss of (piecewise) differentiability is no longer a cause for concern.
Remark A.3. The existence of the bilinear mapping in Proposition A.2 ensures that the integration of step functions w.r.t. $\mu$ can be defined properly with sums. Then, $\mu$-integrable functions $f$ are those for which
Proof. The scalar case can be found in many analysis textbooks. For the vector-valued case, one can, e.g., apply Lemma 1.3.3 from [1] to obtain $f * \rho_n \to f$ in $\|\cdot\|_{L^1}$ for $f \in L^1(\mathbb{R}, X)$ and a mollifier $(\rho_n)_n$. The smooth mollifier that yields $f * \rho_n \in C^\infty$ can be chosen as in the scalar-valued case. Extending $f \in L^1((0,T),X)$ to $L^1(\mathbb{R},X)$ by setting it to zero on $\mathbb{R} \setminus (0,T)$ allows the application of the convolution.