Policy gradient methods for discrete time linear quadratic regulator with random parameters | ESAIM: Control, Optimisation and Calculus of Variations (ESAIM: COCV)

Open Access

Issue		ESAIM: COCV Volume 30, 2024


Article Number		26
Number of page(s)		64
DOI		https://doi.org/10.1051/cocv/2024014
Published online		09 April 2024

