Optimal control of anisotropic Allen-Cahn equations

This paper is concerned with the numerical solution of an optimal control problem governed by an anisotropic Allen-Cahn equation. To this end we first prove the Fréchet differentiability of a time-discretized parabolic control problem under certain assumptions on the involved quasilinearity and formulate the first order necessary conditions. As a next step, since the anisotropies are in general not smooth enough, the convergence behavior of the optimal controls is studied for a sequence of (smooth) approximations of the former quasilinear term. In addition, the simultaneous limit in the approximation and the time step size is considered. For a class covering a large variety of anisotropies we introduce a suitable regularization and verify that it satisfies the previously formulated requirements. Finally, a trust region Newton solver is applied to various anisotropies and configurations, and numerical evidence for mesh-independent behavior and for convergence with respect to the regularization is presented.


Introduction
The goal of this paper is to study the optimal control of anisotropic phase field models describing interface evolution. These models are successfully applied, e.g., to anisotropic solidification processes like crystal growth (see also [13] and the references therein). The defining equations are given by a gradient flow of a Ginzburg-Landau energy. Several ansatzes exist to incorporate anisotropy. In the pioneering paper [18] the author considers convex anisotropies in order to obtain a well-posed problem. In [14,20,21,23,27] various approaches are taken to extend this also to non-convex anisotropies by adding regularization terms or changing the structure of the energy functional. We define the Ginzburg-Landau functional resembling [18], i.e.,

E(y) := ∫_Ω εA(∇y) + ε^{-1} ψ(y) dx, (1.1)

but use a different class of anisotropy functions A : R^d → R, where A(p) = (1/2)|γ(p)|² with a so-called density function γ : R^d → R_{≥0} (see, e.g., [15]). The first part of the functional represents the surface energy, while the potential ψ drives the order parameter y to the pure phases given by the local minimizers ±1. The interface thickness is proportional to the parameter ε > 0. The scaled L²-gradient flow of (1.1) yields the anisotropic Allen-Cahn equation, which serves as the state equation of our control problem. The latter reads

min J(y, u) := (1/2)‖y(T) − y_Ω‖²_{L²(Ω)} + (λ/(2ε))‖u‖²_{L²(Q)} (1.2)

subject to the quasilinear parabolic state equation with potentially nonsmooth A', given in weak form by

∫_Q ε∂_t y η + εA'(∇y)·∇η + ε^{-1}ψ'(y)η dx dt = ∫_Q u η dx dt, (1.3)

where Ω ⊂ R^d is a bounded Lipschitz domain, Q := [0, T] × Ω denotes the space-time cylinder, Σ := [0, T] × ∂Ω its lateral boundary, and the target function y_Ω ∈ L²(Ω) as well as the initial state y_0 ∈ H¹(Ω) are given. The weak formulation implies the boundary condition A'(∇y)^T ν = 0 on Σ, where ν is the outer normal. Furthermore, note that the weight of the control cost is divided by ε since from practical observations one expects contributions only in the vicinity of the interface.
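For orientation, the strong form of the state equation can be sketched as the scaled L²-gradient flow of (1.1) with the control u entering on the right-hand side; this display is a reading aid consistent with the boundary condition stated above, not a quotation of (1.3):

```latex
\varepsilon \partial_t y
  \;=\; \varepsilon\, \nabla\!\cdot A'(\nabla y) \;-\; \varepsilon^{-1}\psi'(y) \;+\; u
  \quad \text{in } Q, \qquad
A'(\nabla y)^{T}\nu = 0 \ \text{on } \Sigma, \qquad
y(0,\cdot) = y_0 .
```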
Usually the density function γ in A(p) = (1/2)|γ(p)|² is assumed to be a positively 1-homogeneous function in C²(R^d \ {0}) ∩ C(R^d) (see, e.g., [4,13,15,16]), which renders A positively 2-homogeneous. Consequently A'' is positively 0-homogeneous and therefore does not exist at the origin unless γ is an energy norm. Hence the control-to-state operator may fail to be differentiable. Since numerical methods for nonsmooth optimal control problems are still in their infancy, this is problematic for efficient solvers. For a nonsmooth quasilinear elliptic control problem a semismooth Newton method is applied to a relaxed optimality system in [11]. To the best of our knowledge, globally convergent methods for parabolic equations without extra regularity requirements do not exist. To circumvent this problem, our approach is to consider a regularized A obtained by modifying the function γ. We give the details later, once we have introduced the specific form of A, which was first proposed in [3,4]. Furthermore, while for the numerical experiments we use the smooth double-well potential ψ(y) = (1/4)(y² − 1)², the analysis holds for more general ψ. The same applies to A. The functions A and ψ shall fulfill at least the assumptions from [6] listed below under a. to guarantee the existence of an optimal control as well as the existence and convergence of the time-discretized optimization problem. The assumptions are then further restricted to obtain differentiability of the reduced cost functional. These conditions are met by the regularized A, as can be seen in Section 4.

Assumption 1.1.
a. Let ψ ∈ C¹(R) be bounded from below and such that it can be approximated as stated above. Furthermore, for the time discretization the restriction max_j τ_j < ε²/C_ψ on the time steps τ_j := t_j − t_{j−1} is assumed.
b. Assume in addition A ∈ C²(R^d) with bounded A'', and let ψ ∈ C²(R) be such that the Nemytskii operator given by ψ'' is continuous from H¹(Ω) to L^q(Ω) for some q > max{d/2, 1}.
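For the double-well potential used in the numerical part these quantities can be made explicit; the value C_ψ = 1 below is the one stated in Section 5:

```latex
\psi(y) = \tfrac14 (y^2-1)^2, \qquad
\psi'(y) = y^3 - y, \qquad
\psi''(y) = 3y^2 - 1 \;\ge\; -1 \;=\; -C_\psi .
```

In particular ψ is smooth and bounded from below, and ψ'' is a quadratic polynomial, which is the structure used when verifying the Nemytskii property in the next paragraph.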
Let us mention that one can always find p > 2 with H¹(Ω) ↪ L^p(Ω) and 1/q + 2/p < 1; e.g., for d ∈ {1, 2} choose some p ∈ (2q/(q−1), ∞), and for d ≥ 3 choose p = 2d/(d−2). Such a p will be used in the following. Note that the assumptions imply that A'' is uniformly positive definite and that ψ'' ≥ −C_ψ holds. Furthermore, the double-well potential fulfills the condition if d ≤ 3, since ψ'' induces a continuous Nemytskii operator from L^{2q}(Ω) to L^q(Ω) (see, e.g., [30], Prop. 26.6) and the imbedding H¹(Ω) ↪ L^{2q}(Ω) holds.

The outline of this paper is as follows. In Section 2 we study, under the above assumptions, the Fréchet differentiability of the reduced cost functional for equations (1.5)-(1.6). As a first step we analyze the differentiability of the state equation in one time step. Then, due to the implicit discretization, one can successively prove differentiability of the control-to-state operator and of the reduced cost functional. The corresponding time-discrete adjoint equation is deduced rigorously. Subsequently, in Section 3 we give sufficient conditions on the regularization A_δ of A such that the corresponding states converge to the solution of the originally given state equation. Furthermore, the convergence of a subsequence of global minimizers u^δ_τ with respect to the regularization parameter δ and the time discretization coarseness τ is addressed. While these results hold under the above assumptions on A, in the subsequent section we study the class of anisotropies given in [3,5]. We introduce a regularization for this class by adjusting γ. Furthermore we show that A fulfills Assumption 1.1.a, and that in addition Assumption 1.1.b as well as the conditions in Section 3 hold for the regularizations. In the final section we first set up formally the linearized equations needed for a trust region Newton solver applied to the control problem discretized in time only. We provide numerical evidence for convergence with respect to the regularization and for iteration numbers independent of the discretization level. Finally, numerical results for various facets of the anisotropy and different configurations are presented.

Fréchet differentiability of the reduced cost functional for the time discretized problem
In this section we investigate the Fréchet differentiability of the reduced cost functional j_τ(u_τ) := J(y_τ(u_τ), u_τ) for the time-discretized optimal control problem, reduced to the control u_τ, when Assumption 1.1 holds. With this, the first order optimality system can be derived rigorously. Furthermore, the derivative is needed for the numerical optimization solver.
As a first step, the Fréchet differentiability of the discrete control-to-state operator S_τ : U_τ → Y_τ of equation (1.6) is shown. The idea is to prove it for a single time step and then to apply the chain rule. Let us recall that the solution operator S_τ : U_τ → Y_τ of equation (1.6) is given by mapping u_τ, correspondingly (u_1, ..., u_N), to y_τ = S_τ(u_τ) determined by (y_1, ..., y_N) with y_j = S(g_j), g_j := (1/ε)u_j + (1/τ_j)y_{j−1}. Here S : L²(Ω) → H¹(Ω), g ↦ y, is defined as the solution operator of the quasilinear elliptic problem

(A'(∇y), ∇ϕ) + (ζ(y), ϕ) = (g, ϕ) for all ϕ ∈ H¹(Ω). (2.2)

Note that under Assumption 1.1.a the left-hand side defines a strongly monotone operator. In [6] we have shown the unique existence of the solution. Let us mention that with the restriction on the space dimension d ≤ 3 a result from [10] for quasilinear elliptic equations with controls on the Neumann boundary provides solutions in L^∞(Ω) if in addition A'(0) = 0. Here the restriction on d is due to the use of Stampacchia's method.
The next auxiliary lemma is obtained by subtracting the defining equations, testing with y − ỹ and using the strong monotonicity of A' and ζ.
Lemma 2.1. Let Assumption 1.1.a hold. Then the solution operator S for equation (2.2) is Lipschitz continuous with a constant independent of τ for small enough τ; more precisely, for g, g̃ ∈ L²(Ω) and y = S(g), ỹ = S(g̃) the corresponding estimate holds. Let us mention that for y_τ = S_τ(u_τ) and ỹ_τ = S_τ(ũ_τ) an analogous estimate also holds (see Theorem 4 of [6]). Due to difficulties related to a required norm gap for the differentiability of the A'-term, the implicit function theorem is not directly applicable (cf. [24]). Since it is more illuminating, we follow the approach in [7,9,10] to show Gâteaux differentiability. Some additional work afterwards upgrades this to Fréchet differentiability.
Theorem 2.2. Let Assumption 1.1 hold. Then the solution operator S : L²(Ω) → H¹(Ω) of equation (2.2) is Gâteaux differentiable, and the directional derivative S'(g)v = z is given, with y = S(g), by the solution z ∈ H¹(Ω) of equation (2.6). Furthermore there exists a C independent of g ∈ L²(Ω) and τ such that the estimate (2.7) holds.

Proof. Due to Assumption 1.1 the bilinear form defined by the left-hand side of (2.6) is elliptic with an ellipticity constant independent of y = S(g), and it is continuous. Hence the Lax-Milgram theorem provides existence and uniqueness of the solution of equation (2.6) for v ∈ L²(Ω), and, using ζ'(s) ≥ −(1/ε²)C_ψ + 1/τ ≥ c > 0 for small enough τ, the estimate (2.7) holds for the solutions independently of g and τ. For v ∈ L²(Ω) and ρ > 0 let us consider

(A'(∇y_ρ), ∇ϕ) + (ζ(y_ρ), ϕ) = (g + ρv, ϕ). (2.8)

Subtracting the equation with ρ = 0 and dividing by ρ, we obtain (2.9). Lemma 2.1 yields for z_ρ := (1/ρ)(y_ρ − y) the uniform bound (2.10) for all ρ. Therefore there exists a subsequence with z_{ρ_n} ⇀ z in H¹(Ω). We now show that z solves equation (2.6) by passing to the limit in equation (2.9). This implies that z is in fact the desired Gâteaux derivative. Passing to the limit in the first term shows that z fulfills equation (2.6). Since the limit is uniquely determined by the latter equation, the whole family (z_ρ)_{ρ>0} converges weakly to z in H¹(Ω).
It remains to show the strong convergence in H¹(Ω). Due to the compact imbedding into L²(Ω), only the part ∇z_ρ → ∇z in (L²(Ω))^d is left. For this we consider the sequence {L_ρ^T ∇z_ρ}_{ρ>0}, where L_ρ is the Cholesky decomposition of the s.p.d. matrix A''(w_ρ). From the boundedness and uniform positive definiteness of A'' we obtain c ≤ |L_ρ(x)| ≤ C with constants independent of ρ and x. Since A''(w_ρ) → A''(∇y) in (L²(Ω))^{d×d}, from the resulting almost everywhere convergence and the just stated boundedness one can verify by dominated convergence that L_ρ converges to L, where L is the Cholesky decomposition of A''(∇y). Furthermore, a uniform bound follows using (2.9) in the intermediate value formulation and (2.10). So we can extract from the sequence {L_ρ^T ∇z_ρ}_{ρ>0} a weakly convergent subsequence whose limit is L^T ∇z, due to the strong convergence of L_ρ. By uniqueness of the limit, the whole sequence converges weakly in L²(Ω)^d. Furthermore, there exists some p̃ < p for which the corresponding estimate holds, and with that we can even deduce L_ρ^T ∇z_ρ → L^T ∇z in L²(Ω)^d. Moreover there exists a dominating function m ∈ L²(Ω) with |L_ρ^T ∇z_ρ| ≤ m. Finally, from the pointwise relations we obtain by dominated convergence that ∇z_ρ → ∇z in L²(Ω).
The following theorem upgrades the last result to Fréchet differentiability.
Subtracting the defining equations for z_n and z, testing with (z_n − z) ∈ H¹(Ω) and inserting zero terms yields, using the ellipticity of A''(s) and ζ'(s) with constants independent of s, together with the estimate (2.7), that the left-hand side tends to 0 as n → ∞. Hence S is Fréchet differentiable.
We now consider the solution operator S_τ : U_τ → Y_τ of equation (1.6). On each time interval I_j we have y_j = S(g_j). Using the previously shown result for S as well as the chain rule we obtain a recursion for the derivatives z_j, j = 1, ..., N, with z_0 := 0, and by induction we can state the following theorem.
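Written out, the recursion obtained from the chain rule reads as follows (a sketch consistent with the definition of S and with the g_j used for the adjoint computation below):

```latex
z_j \;=\; S'(g_j)\Bigl(\tfrac{1}{\varepsilon}\, v_j + \tfrac{1}{\tau_j}\, z_{j-1}\Bigr),
\qquad
g_j \;=\; \tfrac{1}{\varepsilon}\, u_j + \tfrac{1}{\tau_j}\, y_{j-1},
\qquad j = 1,\dots,N, \quad z_0 := 0 .
```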
Theorem 2.4. Let Assumption 1.1 hold. Then the operator S_τ : U_τ → Y_τ is Fréchet differentiable, and consequently so is the reduced cost functional j_τ : U_τ → R. Its derivative involves z_N, which is given by the solution of the following sequence with z_0 := 0:

(ε/τ_j)(z_j − z_{j−1}, ϕ) + (εA''(∇y_j)∇z_j, ∇ϕ) + ε^{−1}(ψ''(y_j)z_j, ϕ) = (v_j, ϕ) for all ϕ ∈ H¹(Ω). (2.13)

We note that the z_j satisfy a linearized state equation, given by z_τ = S'_τ(u_τ)v_τ ∈ Y_τ. Furthermore, equation (2.13) is the dG(0) discretization (as used for the state equation) of the time-continuous linearized state equation

ε∂_t z − ε∇·(A''(∇y)∇z) + ε^{−1}ψ''(y)z = v, z(0, ·) = 0. (2.14)

Given Assumption 1.1, the unique solvability of equation (2.14) is guaranteed by standard results on parabolic equations.
Let us mention that, while for the forward problem it may be more efficient to use the semi-implicit scheme of [5], where A'(∇y_j) is approximated by M(∇y_{j−1})∇y_j, showing Fréchet differentiability with the above technique of applying the solution operator S recursively requires higher regularity. In particular, to the best of our knowledge, the gradient of the previous time step solution has to be bounded in L^∞(Ω), as it appears in the ellipticity coefficient [28].
Let us now define, for given y, the adjoint equation in the time-continuous setting:

−ε∂_t p − ε∇·(A''(∇y)^T ∇p) + ε^{−1}ψ''(y)p = 0, p(T) = y(T) − y_Ω. (2.15)

After the substitution t → −t, as for the linearized equation (2.14), the existence of a unique adjoint p as a solution of equation (2.15) follows. In analogy to the discretization of the state equation, but taking into account the backward-in-time nature, we use the piecewise constant time-discrete p_τ ∈ P_τ, constant on the intervals Î_j := [t_{j−1}, t_j), and use the notation p_{N+1} := p(T). The Galerkin scheme

(ε/τ_j)(p_j − p_{j+1}, ϕ) + (εA''(∇y_j)∇ϕ, ∇p_j) + ε^{−1}(ψ''(y_j)ϕ, p_j) = 0 for all ϕ ∈ H¹(Ω), (2.16)

starting with p_{N+1} = y_N − y_Ω, then determines the approximation p_τ of p. Given (2.6) and the symmetry of S'(g_j) with g_j = (1/ε)u_j + (1/τ_j)y_{j−1} and y_j = S(g_j), together with (2.12), this leads, for j = N, ..., 1, to a representation of the derivative. Altogether, we have shown

Corollary 2.5. Under Assumption 1.1 the reduced cost functional j_τ : U_τ → R is Fréchet differentiable and the derivative can be represented via p_τ, where p_τ is the solution of the discrete adjoint equation (2.16).

Convergence with respect to a regularization of A
In the previous section A had to fulfill Assumption 1.1.a and b. However, as mentioned in the beginning, anisotropy functions A typically fulfill only Assumption 1.1.a. In order to guarantee Fréchet differentiability for the numerical approach, we regularize such an A to A_δ so that in addition Assumption 1.1.b holds. An example of such a regularization is given and discussed in equation (4.6).
In this section we consider the dependence on δ of the solutions of the time-discretized optimization problem (1.5). To obtain convergence as δ → 0 we need A'_δ → A'. However, the results of this section do not yet require Fréchet differentiability, so that Assumption 1.1.a on A_δ is sufficient. We denote by y_τ ∈ Y_τ the solution of equation (1.6) with A, while y^δ_τ ∈ Y_τ is given as the solution of the regularized equation (3.1) with y^δ_τ(0, ·) = y_0. As before we define the reduced cost functional; to not overload the notation, j_δ is used in place of j_{τ,δ} as long as it is clear from the context that τ is considered fixed. The goal of this section is to derive a convergence result for minimizers of a sequence of j_{τ,δ} to minimizers of j_{τ,0} in the limit δ → 0, and to minimizers of j when additionally τ → 0 holds. Therefore some convergence behavior of the δ-dependent solution y^δ_τ is needed, which is then combined with results concerning τ → 0 from [6]. This is covered by the following two auxiliary results.

Theorem 3.1. Let Assumption 1.1.a hold, and in addition let A'_δ be strongly monotone with a constant C_A independent of δ and let |A'_δ(p) − A'(p)| ≤ η(δ) for all p ∈ R^d. Then, for fixed y_0, u_τ and max_j τ_j, the solutions y_τ(u_τ) and y^δ_τ(u_τ) of equations (1.6) and (3.1), respectively, satisfy the estimate (3.3).

Proof. We denote differences by a prefixed ∆, e.g.
We note that here the constant C_{A,ψ,T} depends exponentially on the interface thickness ε, as can be seen in equation (3.6). When studying the dependence on ε, which is not the subject of this paper, a more careful analysis in terms of ε, possibly not based on the Gronwall lemma, would be necessary.
Combining the estimate (3.3) with results in [6] we obtain:

Corollary 3.2. Let the assumptions of Theorem 3.1 be fulfilled and let u_τ, ũ_τ ∈ U_τ be given. Then the estimate (3.7) holds.

Proof. Estimate (3.7) follows by inserting y_τ(ũ_τ), the triangle inequality, and estimating the resulting terms by Theorem 3.1 and Theorem 4 of [6], respectively equation (2.5).

The regularization of a class of anisotropies
Before we continue with simulations for the optimal control of anisotropic phase field models we have to specify the anisotropy function A. As mentioned in the introduction, this function typically is 2-homogeneous. This, however, in general conflicts with the requirement that A be twice continuously differentiable. Therefore this section's goal is to specify the employed A, to introduce an appropriate regularization A_δ, and to show that A satisfies Assumption 1.1.a while A_δ fulfills in addition Assumption 1.1.b. This guarantees that the results from [6] and the preceding chapters can be applied.

First, recall from the introduction that A can be written as A(p) = (1/2)|γ(p)|², (4.1), where the so-called density function γ : R^d → R_{≥0} shall be positively 1-homogeneous. The terminology 'density function' goes back to the study of sharp interface models, where the surface energy of the interface between, say, a solid and a liquid phase is given by |Γ|_γ = ∫_Γ γ(ν(s)) ds. In the isotropic case γ(p) = |p| this reduces to the area of the interface, |Γ|_γ = |Γ|. The authors of [1,15] show for the Allen-Cahn equation (1.3), with A defined as in equation (4.1), that in the limit ε → 0 the zero level sets converge to a sharp interface Γ moving with velocity V = γ(ν)κ_γ if u = 0. While there exist several approaches to define γ, as e.g., in [18] or [13], we restrict ourselves to a class of anisotropies for which the density function γ is introduced in [3]. The corresponding phase field ansatz is studied, e.g., in [5]. In the following these are referred to as BGN-anisotropies. They allow for the modeling and approximation of a large class of common anisotropies. They are also well suited to model crystal growth, since crystals build characteristic facets. The basic observation is that, for the metric (·, ·)_G defined by a symmetric positive definite G, the surface area element can be expressed in terms of such densities (see [4]). This motivates the choice of the class of density functions γ given by

γ(p) = Σ_{l=1}^{L} γ_l(p), γ_l(p) := √(p^T G_l p), (4.2)

where the G_l ∈ R^{d×d} are symmetric and positive definite. Note that for p ≠ 0
the derivative of A can then be computed as

A'(p) = γ(p) Σ_{l=1}^{L} (G_l p)/γ_l(p), (4.3)

and A' is continuous also at p = 0 with A'(0) = 0. The second derivative exists for p ≠ 0 and is given by (4.4). We note that A'' : R^d \ {0} → R^{d×d} is continuous and A''(p) is positive definite with constants independent of p. Moreover, we have Lipschitz continuity and strong monotonicity of A'. These properties follow from results in [16], where the authors need, in addition to the given properties of γ (continuity on R^d, twice continuous differentiability on R^d \ {0}, positive homogeneity of degree one, γ(p) > 0 for p ≠ 0), the following relation for all p, q ∈ R^d with p^T q = 0 and |p| = 1.
The latter can be shown by an application of the Cauchy-Schwarz inequality, where equality does not hold for p^T q = 0, and the compactness of the set given by p^T q = 0, |p| = |q| = 1. Our goals for regularizing A are that A_δ ∈ C²(R^d) shall fulfill the requirements for the existence of an optimal control, that the derivative shall be simple to evaluate, and that the influence on the interfacial region shall be small (the regularization mainly acts where ∇y ≈ 0). Our approach is to modify the γ_l, but one could also think of regularizing, e.g., the quotient appearing in the sum in equation (4.3). Among the various choices we considered, the most promising was to alter the functions γ_l by a small shift δ > 0, i.e.,

γ_{l,δ}(p) := √(p^T G_l p + δ), δ > 0. (4.6)

This we use in the following and denote the resulting regularizations by A_δ and γ_δ. Both are now in C^∞(R^d). A very convenient property of this choice is that γ_δ(p) = γ̃((p, √δ)^T), where γ̃ is defined employing the block-diagonal matrices G̃_l := diag(G_l, 1). Hence one can also view the regularized anisotropy A_δ on R^d as an unregularized BGN-anisotropy Ã on R^{d+1}, for which the above properties hold.
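The shift regularization and its reinterpretation in d+1 dimensions can be sketched in a few lines. This is an illustrative sketch only; the matrices G_l below are arbitrary s.p.d. examples chosen for demonstration, not taken from the paper.

```python
import numpy as np

def gamma(p, Gs):
    """BGN density gamma(p) = sum_l sqrt(p^T G_l p)."""
    return sum(np.sqrt(p @ G @ p) for G in Gs)

def gamma_delta(p, Gs, delta):
    """Regularized density: each gamma_l receives a shift delta under the root."""
    return sum(np.sqrt(p @ G @ p + delta) for G in Gs)

def extend(G):
    """Block-diagonal extension diag(G, 1): interprets the regularization
    as an unregularized BGN density in d+1 dimensions."""
    d = G.shape[0]
    Gt = np.eye(d + 1)
    Gt[:d, :d] = G
    return Gt

Gs = [np.array([[2.0, 0.3], [0.3, 1.0]]), np.eye(2)]
p, delta = np.array([0.4, -1.2]), 1e-7

# Key identity: gamma_delta(p) = gamma~((p, sqrt(delta))^T)
lhs = gamma_delta(p, Gs, delta)
rhs = gamma(np.append(p, np.sqrt(delta)), [extend(G) for G in Gs])
print(abs(lhs - rhs))  # agreement up to rounding
```

For δ = 0 the regularized density coincides with the original one, consistent with the shorthand A_{δ=0} := A used below.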
The derivatives still have the same structure as in equations (4.3) and (4.4), though in the regularized version they hold for all p ∈ R^d. Note that for L = 1, i.e., for A(p) = (1/2)p^T G p, which includes the isotropic case, it holds A'_δ = A'. This is particularly convenient, as in this case A is already smooth by itself and hence there is no need for regularization anyway. Due to the relation between A'_δ and Ã' stated in (4.9), for δ > 0 the Lipschitz continuity and strong monotonicity of Ã' provide these properties for A'_δ with constants independent of δ. Moreover, since Ã induces uniformly equivalent norms on R^{d+1}, the same holds for A_δ; hence A''_δ(p) is bounded independently of p. Using A_δ(0) ≥ A(0) = 0 we obtain c|p|² ≤ A_δ(p) with c > 0. This inequality holds also for A due to the 2-homogeneity. Moreover, all constants can be chosen independently of δ; the only exception is the upper bound in the growth condition A_δ(p) ≤ A_δ(0) + C|p|², due to A_δ(0). Finally, Hölder continuity of A'_δ(p) with respect to δ also follows from the formulation (4.9) and the Lipschitz continuity of Ã'. Summarized, we can state

Lemma 4.1. The mappings A_δ : R^d → R for δ ≥ 0 (with A_{δ=0} := A as shorthand notation) have the following properties:
a) The A_δ fulfill the growth condition c|p|² ≤ A_δ(p) ≤ C_δ + C|p|² for all p with positive constants c, C, C_δ, where only C_δ may depend on δ.
b) The A'_δ are Lipschitz continuous and strongly monotone on R^d with constants independent of δ, and A'_δ(0) = 0.
c) The A_δ induce uniformly equivalent norms on R^d for δ > 0, i.e., there exist constants c_0, C such that the corresponding two-sided bounds hold. Furthermore, for δ = 0 the same holds true for all p ≠ 0.
d) A'_{(·)}(p) is Hölder continuous in δ with exponent 1/2 and with a constant independent of p. In particular, it holds

|A'_δ(p) − A'(p)| ≤ C δ^{1/2} for all p ∈ R^d. (4.10)
In particular, Assumption 1.1 is fulfilled if δ > 0, Assumption 1.1.a holds for A, and the convergence assumption with respect to δ → 0 in Theorem 3.3 holds.

Numerical results
In the last part of this paper we report some numerical findings. Whenever δ is fixed in this chapter, we drop the corresponding index in the relevant quantities to keep the presentation lucid. We also consistently use the smooth double-well potential ψ(s) = (1/4)(1 − s²)², which yields C_ψ = 1. Our numerical approach for solving the regularized optimization problem is to first discretize in time and then apply an optimization algorithm to this semi-discretized problem. The arising equations and first order condition have been rigorously analyzed in the previous chapters. Finally, each step in the algorithm is discretized also in space, where we use globally continuous, piecewise linear finite element approximations.
Preliminary numerical results were obtained by a line search method based on the gradient ∇j_τ. However, since we have not seen any relevant differences in the computed controls and states, we only present results using second order information, which is formally derived in the following, to gain efficiency in the solver. As the algorithm for computing a local minimizer we apply the trust region Newton method [12] to the reduced cost functional j_τ : U_τ → R. It is a common globalization of Newton's method, which is needed due to the non-convexity of the problem. The main idea is to determine at each iterate u_τ an approximate solution δū_τ of the quadratic subproblem

min_{‖δu_τ‖ ≤ σ} j_τ(u_τ) + (∇j_τ(u_τ), δu_τ) + (1/2)(∇²j_τ(u_τ)δu_τ, δu_τ), (5.1)

where σ > 0 parameterizes the size of a trust region in U_τ on which this model is considered sufficiently valid.
The proper choice of σ is controlled by the trust region method. The solution of equation (5.1) is determined by the Steihaug-CG method [22], which iteratively applies CG steps to the first order condition of the unconstrained version of the subproblem, ∇²j_τ(u_τ)δu_τ = −∇j_τ(u_τ), and additionally handles the cases where a CG iterate δũ_τ exceeds the trust region boundary or where j_τ has nonpositive curvature at u_τ in the direction δũ_τ, i.e., (∇²j_τ(u_τ)δũ_τ, δũ_τ) ≤ 0. Finally, the new trust region iterate is set to u_τ + δū_τ. Let us summarize for convenience the formulas of the previous sections: y_τ = S_τ(u_τ) is given by the state equation (1.6), and p_τ = (S'_τ(u_τ))*(y_τ(T) − y_Ω) is given by the time-discrete adjoint equation (2.16). The derivative of y_τ in the direction δu_τ we denote by δy_τ, i.e., δy_τ solves the linearized state equation (2.13) with v = δu_τ. This derivative is employed to compute the Hessian, as can be seen below.
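The Steihaug-CG iteration just described can be sketched in a few lines. This is a generic finite-dimensional version with the Euclidean norm and ad hoc tolerances, not the paper's implementation; the Hessian enters only through matrix-vector products, as in the Hessian-times-vector setting above.

```python
import numpy as np

def steihaug_cg(H, g, radius, tol=1e-6, maxit=200):
    """Approximately solve  min_{||s|| <= radius}  g^T s + 0.5 s^T H s
    by truncated CG, stopping on the trust region boundary when a step
    would leave it or when nonpositive curvature is detected."""
    s = np.zeros_like(g)
    r = -g.copy()            # residual of the Newton system H s = -g
    d = r.copy()
    if np.linalg.norm(r) < tol:
        return s
    for _ in range(maxit):
        Hd = H @ d
        curv = d @ Hd
        if curv <= 0.0:
            # nonpositive curvature: follow d to the boundary
            return s + _to_boundary(s, d, radius) * d
        alpha = (r @ r) / curv
        if np.linalg.norm(s + alpha * d) >= radius:
            # step would exceed the trust region: stop on the boundary
            return s + _to_boundary(s, d, radius) * d
        s = s + alpha * d
        r_new = r - alpha * Hd
        if np.linalg.norm(r_new) < tol * np.linalg.norm(g):
            return s
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d
        r = r_new
    return s

def _to_boundary(s, d, radius):
    # positive tau with ||s + tau * d|| = radius
    a, b, c = d @ d, 2.0 * (s @ d), s @ s - radius**2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

# Interior case: the unconstrained Newton step is returned.
H = np.diag([2.0, 3.0])
g = np.array([-2.0, -3.0])
print(steihaug_cg(H, g, 10.0))  # approx. [1. 1.]
```

In the setting of the paper, `H @ d` corresponds to one evaluation of the Hessian action, i.e., one linearized state solve plus one additional adjoint solve per CG step.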
The second derivative, required only here for the Newton approach (see subproblem (5.1)), in particular its action on an L²-function, we deduce formally. Here, for the given solution p_τ(u_τ) of equation (2.16), the derivative δp_τ fulfills the so-called additional adjoint equation. In time-discrete form it is given by δp_{N+1} := δy_N and relations containing the terms (ψ'''(y_j)ϕ δy_j, p_j) for j = 1, ..., N (5.5). As for equation (2.16), the unique existence of the solution is guaranteed. The time-continuous counterpart is given by (5.6), and the two are related by the discontinuous Galerkin time discretization as for the adjoint equation. Note that each of the discretized equations has a time-continuous counterpart (cf. equations (1.3), (2.15), (2.14) and (5.6)). In fact, equations (2.15), (2.14) and (5.6) are the equations one would expect to obtain as the adjoint and corresponding linearized equations for equation (1.3). Thus, at least on a formal level, the approaches first-discretize-then-optimize and first-optimize-then-discretize commute for the implicit time discretization in the sense of chapters 3.2.2 and 3.2.3 in [17]. Discretization and optimization are also interchangeable for the spatial discretization if one chooses the same ansatz spaces for y and p. Consequently, one expects the optimization solver to show iteration numbers independent of the discretization level. This is supported by the numerical observations in Section 5.2. Note that in general the 3-tensor A'''_δ appearing in equation (5.6) is not symmetric, so we have to take care of the order in the corresponding term. In the implementation, A'''_δ is determined by automatic differentiation; stating an explicit formula gives no new insight. Moreover, since ∇y = 0 in the pure phases, one is particularly interested in the behavior of A'''_δ(p) as p → 0.
However, Ã'''(√δ (p^T, 1)^T) behaves like 1/√δ due to the 2-homogeneity of Ã (see (4.9)). Hence lim_{p→0} A'''_δ(p) cannot be bounded independently of δ. Numerically we observe this problem for values δ < 10^{−13}.
To keep the computational cost moderate we set d = 2 in all experiments. Furthermore, throughout this section we use the square Ω = (−1, 1)² as spatial domain and the time horizon [0, T] with T = 1.625·10^{−2}; we set the parameters ε = 1/(14π) and λ = 0.01 and, if not mentioned otherwise, the regularization parameter δ = 10^{−7}. We choose the constant time step size τ = 1.625·10^{−4}, which fulfills the condition τ ≤ ε²/C_ψ, and Ω is uniformly discretized with 129 × 129 grid points. Moreover, by numerical evidence we know that the interface is sufficiently resolved with 6-14 mesh points across the interface. Each computation is started with u^{(0)} ≡ 0.
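A quick check (our own arithmetic, not part of the paper) that the chosen time step satisfies the restriction from Assumption 1.1.a with the stated parameter values:

```python
import math

eps = 1.0 / (14.0 * math.pi)   # interface parameter
C_psi = 1.0                    # from the double-well potential
tau = 1.625e-4                 # constant time step size

bound = eps**2 / C_psi
print(bound)                   # approx. 5.17e-4, so tau satisfies the bound
```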
We have looked at various set-ups that mainly vary in y_0, y_Ω and the final time T. For the anisotropies determined by the G_l we used the three different choices listed in the following.
1. Isotropic case: γ(p) = |p|₂, corresponding to the choice L = 1 with G₁ the identity. Note that in this case regularization is not necessary. In addition, the regularization would cancel out, as can be seen in equation (4.6) and as discussed in the corresponding text below it.
2. Regularized l¹-norm: a BGN approximation of the l¹-norm with some small parameter that we set to 0.01 (not to be confused with the interface parameter ε). For the parameter equal to 0 this reduces to γ(p) = |p|₁.
3. Hexagonal anisotropy: built from rotations by the angles α_l = (π/3)l, l = 1, 2, 3, with matrices as before. Note that, in contrast to the choices in [4], we divide the matrices by their total number L. By this scaling the costs between the different anisotropies become more comparable, since by numerical observation the velocity of the shrinkage is approximately equal. This can also be seen from the Wulff shapes, which are defined as

W := { q ∈ R^d : sup_{p ∈ R^d \ {0}} (p^T q)/γ(p) ≤ 1 },

see [29], and which are visualized for the above choices of γ in Figure 1. Without the rescaling, the Wulff shape of the hexagonal anisotropy would extend approximately to the label 2.0 on each axis.
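The membership test defining W can be evaluated numerically by sampling the supremum over unit directions (by the 1-homogeneity of γ it suffices to take the sup over |p| = 1). The density `gamma_iso` below is the isotropic choice, used as a sanity check since then the dual quantity equals |q|; any of the densities above could be substituted.

```python
import numpy as np

def dual_norm(q, gamma, n=2000):
    """Approximate  sup_{|p|=1} p^T q / gamma(p)  by sampling the unit
    circle (d = 2); q belongs to the Wulff shape W iff this value <= 1."""
    thetas = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    ps = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    return max((p @ q) / gamma(p) for p in ps)

gamma_iso = lambda p: np.linalg.norm(p)

q = np.array([0.3, -0.4])          # |q| = 0.5
val = dual_norm(q, gamma_iso)
print(val)                         # approx. 0.5, hence q lies inside W
```

For the isotropic density the Wulff shape is thus the unit disk, matching the leftmost shape in Figure 1.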
As computing framework we used FEniCS [2] or rather its C++ interface DOLFIN [19].The simulations were carried out on an HP EliteDesk 800 G4 workstation containing an Intel Core i7-8700 CPU with 12 cores à 3.20GHz and 16 GB of RAM.
In the following subsections we support the convergence result of Theorem 3.1 concerning the regularization, give numerical evidence for mesh-independent behavior of the solution process, and present optimal control results for different anisotropies and different desired states, including star-like objects and necessary topology changes.

Dependence on the regularization parameter δ
First we analyze numerically the dependence of the solution of the state equation on the parameter δ. As a setting we start from a circle of radius 0.5 and look at the evolution of the state with u = 0. A plot containing the results for both anisotropy choices is given in Figure 2. We have plotted the difference of the states in L²(Ω) as well as in H¹(Ω) at the end time T. Since errors accumulate during the time evolution, the errors at T should be a good metric for comparison. With the additionally plotted function f(x) = x^{1/2} the figure clearly exhibits the convergence order 1/2, which is expected according to equations (3.3) and (4.10), i.e., it equals the approximation order of A'_δ to A'.
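The order read off from Figure 2 can be quantified by the experimental order of convergence (EOC) between two values of δ. The error values below are synthetic, generated from e(δ) = Cδ^{1/2} purely to illustrate the formula; they are not the measured errors of the paper.

```python
import math

def eoc(e1, e2, d1, d2):
    """Experimental order of convergence from two (parameter, error) pairs."""
    return math.log(e1 / e2) / math.log(d1 / d2)

C = 3.7                       # arbitrary constant, cancels in the EOC
d1, d2 = 1e-4, 1e-6
e1, e2 = C * d1**0.5, C * d2**0.5
print(eoc(e1, e2, d1, d2))    # 0.5, the order expected from (4.10)
```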
When considering the numerical solution of the optimization problem, in our experience the numbers of trust region and of Steihaug-CG iterations depend only weakly on δ. They stay nearly the same as in Tables 1 and 2 and are therefore not listed here. If δ is so small that rounding errors accumulate for A_δ, and even more for its derivatives (and consequently for the solutions of the respective equations), the algorithm may fail to converge. However, for δ ≥ 10^{−10} the algorithm was always robust.

Mesh independent behavior
In this section we numerically investigate the mesh dependence of the solver. More concretely, we look at the number of trust region iterations, called TR steps in the following tables, as well as at the number of Steihaug-CG steps needed to solve the quadratic subproblems. Since the Steihaug algorithm contains early stopping criteria controlled by the trust region algorithm, the number of its steps might change drastically during the progress of the algorithm. Therefore we rather look at the average number of steps, called mean CG in the following tables, needed to decrease the residuum by 6 orders of magnitude, taken over those trust region steps where this kind of measurement is possible. In addition we take the maximum number, called max CG, as an indicator. These CG iteration counts reflect the conditioning of the linear systems corresponding to the quadratic subproblems. As final time we choose in these experiments T = 2·10^{−3}; the remaining parameters are left unchanged. We inspect the dependence on the space discretization by fixing τ = 10^{−4} and varying h = 2/N. For analyzing the dependence on the step size τ we fix the spatial mesh size using N = 128.
As model problem for the isotropic case, we consider the control of a circle from radius r = 0.5 to r = 0.55; the results can be found in Table 1. One cannot observe a clear tendency suggesting a dependence of the maximal or mean number of CG iterations or of the trust region steps on the granularity, in line with the discussion in the introduction of this section. Only the total computing time increases with the number of unknowns. Let us mention that in case of τ = 10^{-4} and N = 516 the reduced optimization problem has around 5.4 million unknowns, given by the number of discretization points of u. If τ = 10^{-6} and N = 128, the number of unknowns is roughly 33.3 million. Due to the parallelization of the algorithm and the non-associativity of floating point operations, the results might vary slightly among runs sharing the same configuration.
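The quoted problem sizes are consistent with counting one control value per spatial grid node and per time step. A back-of-the-envelope check, under our assumption of an (N+1) × (N+1) nodal grid (consistent with h = 2/N on a square domain of side length 2) and T/τ time levels, reads:

```python
T = 2e-3  # final time used in these experiments

def control_unknowns(N, tau):
    # Assumed count: (N+1)^2 nodes per time level, T/tau time levels.
    return (N + 1) ** 2 * round(T / tau)

print(control_unknowns(128, 1e-6))  # 33,282,000, i.e. roughly 33.3 million
print(control_unknowns(516, 1e-4))  # 5,345,780, close to the quoted 5.4 million
```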
Next we perform the same analysis for the anisotropic Allen-Cahn equation with the regularized l 1 -norm. Here we choose y 0 and y Ω to be the same, i.e., we try to keep a square constant. The outcomes are listed in Table 2. Again, almost no dependence on the discretization parameters is observed. The numbers of average Steihaug steps as well as of total trust region steps rather seem to improve for more accurate computations. Finally, we list in Table 3 the results for the control of a circle to a star with four fingers, as can be seen in Figure 3. Here more control is necessary and the solution process takes significantly more trust region steps. Also the number of CG iterations is increased compared to Table 2. While this indicates a dependence on the control configuration, the results concerning the CG iterations still show a behavior independent of the discretization level. A slight increase in the number of trust region steps is present.

Numerical examples for different desired states and anisotropies
Finally we present solutions for three different objectives: the evolution to star-like structures, the splitting of geometries, and the merging of geometries. For the presented figures, which show the evolution of the control u, we employed a color scaling adjusted to the values at t ≈ T/2. This allows one to see where in Ω the control is present, although its values may be clipped in some images. To see how strongly the system is controlled at which time, we additionally include figures showing the L 2 (Ω)-norm of the control over time.

Evolution to star-like structures
In the first experiment we start from a circle of radius 0.5 and try to steer it to a star-like structure with 4 or 6 fingers, respectively. The images of the time evolution of the corresponding states and controls can be found in Figures 3 and 4. The state is given by the Allen-Cahn equation with the regularized l 1 -norm anisotropy for the '4-star' target, and with the 'hexagon' anisotropy for the '6-star' target. In addition we present the results for the isotropic evolution equation in both cases. In practice the choice of the Allen-Cahn equation is dictated by the physical model and not by the desired state.
Qualitatively the main observation is that in all cases the control acts in a neighborhood of the interface. Moreover, the evolution is controlled essentially in the second half of the time interval. This can in particular be seen in Figure 5, where the L 2 (Ω)-norms of the control are plotted over the time t. On the t-axis we indicate the times at which the states and controls were sampled for Figures 3 and 4. While for the isotropic case the middle part seems to grow nearly linearly, for the regularized l 1 -norm and the hexagon anisotropy one observes larger jumps interspersed with approximately constant parts. The first plateau comes from the fact that in the first part the evolution follows the nearly uncontrolled Allen-Cahn flow to reach a square-like, respectively hexagon-like, shape. Only then does the control truly enter to initiate the development of the fingers with their strongly non-convex parts. From that point onwards more control is needed for the anisotropic cases than for the isotropic case, but towards the end they approximately overlap. The last peak arises from the fact that much of the control is spent to form the details of the fingers in the last few time steps.
In Table 4 the computed (local) minima are listed together with their individual constituents: the difference of the optimal state to the desired state, j 1 = ||y(T) − y Ω || 2 L 2 (Ω) , as well as the contribution of the control, j 2 = λ/(2ε) ||u|| 2 L 2 (Q) . We note that the (local) optima for the isotropic case are slightly below their anisotropic counterparts.
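As a plausibility check (not part of the paper's computations), one can sum the j 1 and j 2 entries printed in Table 4 and compare the resulting (local) optimal values. The pairing of the four columns into isotropic/anisotropic runs is our assumption, not stated in the table itself; under it the isotropic sums indeed come out slightly smaller:

```python
# j1 + j2 pairs as printed in Table 4, in column order; the pairing of
# columns into iso/aniso runs is an assumption, not stated in the table.
pairs = [(0.0115987, 0.0905854), (0.0108916, 0.0964865),
         (0.0122366, 0.102798), (0.0156595, 0.106821)]
j = [j1 + j2 for j1, j2 in pairs]
print(j)
print(j[0] < j[1] and j[2] < j[3])  # True under the assumed pairing
```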

Splitting and merging geometries
Finally we consider examples where topology changes are necessary to reach the target. In Figure 6 we present the results for splitting a circle, a square and a hexagon into two of each, respectively.

Table 5. Values of the cost functional for splitting geometries.
            iso                       l 1                        hexa
j(u)        0.103955                  0.0562286                  0.0884921
j 1 + j 2   0.00409254 + 0.0998625    0.000728494 + 0.0555001    0.00115402 + 0.0873381

Table 6. Values of the cost functional for merging geometries.
            iso                       l 1                        hexa
j(u)        0.0666414                 0.0496374                  0.0588482
j 1 + j 2   0.00274715 + 0.0638943    0.0010941 + 0.0485433      0.000905395 + 0.0579428

While the hexagon is split by squeezing it together vertically, the circle is controlled to first develop a hole in the middle and then to enlarge the hole until the split is complete. The square is divided along the whole middle line simultaneously, as far as we could judge visually. The controls are largest at times where they force topology changes, as can be observed in Figure 8.
Considering the examples for 'merging' (see Fig. 7), we observe a very similar behavior of the states as for the 'splitting' solutions, but backwards in time. Less control is necessary, which is indicated by the values of j 2 in Tables 5 and 6, where the cost functionals are given as for the star-like examples before. For the isotropic and the hexagon example the splitting cost is higher by a factor of approximately 1.5. That the difference is not bigger is probably due to the short time interval, which forces the evolution of the gradient flow to be accelerated to reach the target in time, a phenomenon present for splitting as well as for merging, with comparable impact. This also leads to the nearly constant behavior over a long period, as can be seen in Figure 8.
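The factor mentioned above can be read off directly from Tables 5 and 6; the following snippet (values copied from those tables) confirms it:

```python
# j1 + j2 entries of Tables 5 and 6; the sums reproduce the listed j(u)
# values, and splitting costs roughly 1.5 times as much as merging for
# the isotropic and hexagon examples.
split = {"iso": (0.00409254, 0.0998625),
         "l1": (0.000728494, 0.0555001),
         "hexa": (0.00115402, 0.0873381)}
merge = {"iso": (0.00274715, 0.0638943),
         "l1": (0.0010941, 0.0485433),
         "hexa": (0.000905395, 0.0579428)}
j_split = {k: j1 + j2 for k, (j1, j2) in split.items()}
j_merge = {k: j1 + j2 for k, (j1, j2) in merge.items()}
print(round(j_split["iso"] / j_merge["iso"], 2))    # 1.56
print(round(j_split["hexa"] / j_merge["hexa"], 2))  # 1.5
```

Note that for the regularized l 1 -norm the ratio is markedly smaller, which is why the statement is restricted to the isotropic and hexagon examples.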
Altogether these examples have demonstrated that it is possible to steer to a variety of shapes even if their topology differs from that of the initial state. Also targeting pronounced crystal-like structures is possible, which might find use in materials science or chemical applications.

Assumption 1.1. a. Assume A ∈ C 1 (R d ) with A′ strongly monotone and fulfilling the growth condition |A′(p)| ≤ C|p|.

Figure 3. 'circle to 4-star' solutions: states in the first and third row and controls in the second and fourth row.

Figure 4. 'circle to 6-star' solutions: states in the first and third row and controls in the second and fourth row.

Figure 6. States (above) and corresponding controls (below) for the solutions of splitting geometries.

Figure 7. States (above) and corresponding controls (below) for the solutions of merging geometries.

Table 4. Values of the cost functional j 1 + j 2 .
0.0115987 + 0.0905854   0.0108916 + 0.0964865   0.0122366 + 0.102798   0.0156595 + 0.106821

The underlying model equation uses the corresponding (an-)isotropy. In Figure 7 the solutions of merging two of these objects into one are given. Here the target objects are the initial states of the splitting examples and vice versa. The norms of the corresponding controls over time can be seen in Figure 8. To avoid potential confusion, we point out that the scales of the ordinates are adapted to better fit the plots.

Table 2. Dependence on N and τ for the regularized l 1 -norm where y Ω = y 0 .