Issue | ESAIM: COCV, Volume 30, 2024
---|---
Article Number | 26
Number of page(s) | 64
DOI | https://doi.org/10.1051/cocv/2024014
Published online | 09 April 2024
Policy gradient methods for discrete time linear quadratic regulator with random parameters
Shanghai Center for Mathematical Sciences, Fudan University, Shanghai 200433, PR China
* Corresponding author: lidy21@m.fudan.edu.cn
Received: 29 March 2023
Accepted: 27 February 2024
This paper studies an infinite-horizon optimal control problem for a discrete-time linear system and quadratic criteria, both with random parameters that are independent and identically distributed with respect to time. In this general setting, we apply the policy gradient method, a reinforcement learning technique, to search for the optimal control without requiring knowledge of the statistical information of the parameters. We investigate the sub-Gaussianity of the state process and establish a global linear convergence guarantee for this approach under assumptions that are weaker and easier to verify than those of existing results. Numerical experiments are presented to illustrate our result.
Mathematics Subject Classification: 49N10 / 68W40 / 93E35
Key words: Linear quadratic optimal control / random parameters / reinforcement learning / model-free policy gradient method / sub-Gaussianity
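To make the idea in the abstract concrete, the following is a minimal scalar sketch of a model-free policy gradient loop, not the algorithm analysed in the paper. It assumes a linear feedback u_t = -k x_t, a truncated horizon, and hypothetical parameter distributions (Gaussian dynamics coefficients, uniform cost weights). The names `rollout_cost`, `estimate_gradient`, and `policy_gradient` are illustrative only. The learner uses sampled costs alone and never reads the distributions' means or variances.

```python
import random


def rollout_cost(k, rng, horizon=30, x0=1.0):
    """Truncated cost of the feedback u_t = -k * x_t on one sampled trajectory."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        # Hypothetical i.i.d. random parameters (assumed for illustration).
        a = rng.gauss(1.0, 0.1)    # state coefficient A_t
        b = rng.gauss(1.0, 0.1)    # control coefficient B_t
        q = rng.uniform(0.5, 1.5)  # state cost weight Q_t
        r = rng.uniform(0.5, 1.5)  # control cost weight R_t
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost


def estimate_gradient(k, rng, radius=0.05, samples=100):
    """Two-point zeroth-order estimate of dC/dk built from cost samples only."""
    total = 0.0
    for _ in range(samples):
        # Reuse the same noise for both perturbed gains to reduce variance.
        seed = rng.getrandbits(32)
        plus = rollout_cost(k + radius, random.Random(seed))
        minus = rollout_cost(k - radius, random.Random(seed))
        total += (plus - minus) / (2.0 * radius)
    return total / samples


def policy_gradient(k0, steps=40, step_size=0.02, seed=0):
    """Plain gradient descent on the gain k using the estimated gradient."""
    rng = random.Random(seed)
    k = k0
    for _ in range(steps):
        k -= step_size * estimate_gradient(k, rng)
    return k


if __name__ == "__main__":
    print("learned gain:", policy_gradient(0.2))
```

Starting from a stabilizing gain such as `k0 = 0.2`, the iterate in this toy setting should move toward the minimizer of the expected cost. The step size, smoothing radius, and sample count are untuned guesses and would need care in higher dimensions.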
© The authors. Published by EDP Sciences, SMAI 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.