Acta Mathematica Scientia ›› 2022, Vol. 42 ›› Issue (2): 594-604.



Asymptotic Optimality of Quantized Stationary Policies in Continuous-Time Markov Decision Processes with Polish Spaces

Xiao Wu1, Yinying Kong2,*, Zhenbin Guo3

  1. School of Mathematics and Statistics, Zhaoqing University, Zhaoqing, Guangdong 526061
    2. School of Intelligence Financial & Accounting Management, Guangdong University of Finance and Economics, Guangzhou 510320
    3. Development Research Center, GF Securities Co., Ltd., Shanghai 200120
  • Received: 2021-03-04 Online: 2022-04-26 Published: 2022-04-18
  • Contact: Yinying Kong E-mail: kongcoco@hotmail.com
  • Supported by:
    the National Natural Science Foundation of China (11961005); the Opening Project of Guangdong Province Key Laboratory of Computational Science at Sun Yat-sen University (2021021); the Guangdong University (New Generation Information Technology) Key Field Project (2020ZDZX3019); the Guangzhou Science and Technology Plan Project (202102080420)


Abstract:

In this paper, we study the asymptotic optimality of quantized stationary policies for continuous-time Markov decision processes (CTMDPs) with Polish spaces and state-dependent discount factors. First, the discounted optimality equation (DOE) is established, and the existence and uniqueness of its solution are proved. Second, the existence of optimal deterministic stationary policies is shown under suitable conditions. Furthermore, to discretize the action space, a sequence of quantized policies is constructed so that policies with finite action spaces approximate the optimal stationary policies of the discounted CTMDPs on a general (Polish) state space. Finally, an example illustrates the asymptotic approximation results of this paper.
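For orientation, a DOE for discounted CTMDPs with a state-dependent discount factor typically takes the following form (a standard formulation, with state space $X$, admissible action sets $A(x)$, reward rate $r(x,a)$, transition rates $q(\mathrm{d}y \mid x,a)$, and discount factor $\alpha(x)$; the paper's exact assumptions and notation may differ):

```latex
% Discounted optimality equation (DOE), standard form for CTMDPs
% with a state-dependent discount factor \alpha(x):
\alpha(x)\, u^{*}(x)
  \;=\; \sup_{a \in A(x)} \Bigl\{\, r(x,a)
  \;+\; \int_{X} u^{*}(y)\, q(\mathrm{d}y \mid x, a) \Bigr\},
  \qquad x \in X.
```

Under suitable drift and continuity-compactness conditions, this equation has a unique solution $u^{*}$ equal to the optimal discounted value, and a measurable selector attaining the supremum yields an optimal deterministic stationary policy.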

Key words: Continuous-time Markov decision processes, State-dependent discount factors, Discounted criterion, Quantized stationary policies, Asymptotic optimality
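The quantization idea in the abstract, approximating an optimal policy over a general action space by policies supported on finite action grids, can be illustrated with a discrete-time, finite-state toy analog. Everything below (the reward, the transition kernel, the grid sizes) is hypothetical and chosen only to exhibit the approximation phenomenon; it is not the paper's construction.

```python
import numpy as np

def quantize_actions(n):
    """Uniform n-point quantization grid on the action interval [0, 1]."""
    return np.linspace(0.0, 1.0, n)

def value_iteration(actions, beta=0.9, n_states=5, tol=1e-10):
    """Discounted value iteration restricted to a finite action grid.

    The reward and transition model are hypothetical toys: they only
    serve to show how values under quantized policies approach the
    value over the full action interval as the grid is refined.
    """
    states = np.arange(n_states, dtype=float)
    V = np.zeros(n_states)
    while True:
        Q = np.empty((n_states, len(actions)))
        for j, a in enumerate(actions):
            # toy reward: state-dependent gain minus a quadratic action cost
            r = states * a - 0.5 * a ** 2
            # toy transition: stay put with prob. 1-a, uniform reset with prob. a
            P = (1.0 - a) * np.eye(n_states) \
                + a * np.full((n_states, n_states), 1.0 / n_states)
            Q[:, j] = r + beta * (P @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Nested grids: each refinement contains the previous grid, so the
# optimal value over the quantized actions increases monotonically
# toward the value over the whole interval.
V_coarse = value_iteration(quantize_actions(5))   # actions {0, 1/4, ..., 1}
V_fine = value_iteration(quantize_actions(65))    # actions {0, 1/64, ..., 1}
```

Because the 5-point grid is contained in the 65-point grid, the coarse value function is dominated pointwise by the fine one; the gap between successive refinements shrinks, which is the finite-action analog of the asymptotic optimality established in the paper.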

CLC Number:

  • O211.6