Multi-period Asset-liability Management with Reinforcement Learning in a Regime-Switching Market

Gao, Zhongqin; Chen, Ping; Li, Xun; Lv, Yan; Zhang, Wenhao


arXiv:2509.03251 (math)

[Submitted on 3 Sep 2025]

Title: Multi-period Asset-liability Management with Reinforcement Learning in a Regime-Switching Market

Authors: Zhongqin Gao, Ping Chen, Xun Li, Yan Lv, Wenhao Zhang

Abstract: This paper explores the mean-variance portfolio selection problem in a multi-period financial market characterized by regime-switching dynamics and uncontrollable liabilities. To address the uncertainty in the decision-making process within the financial market, we incorporate reinforcement learning (RL) techniques. Specifically, the study examines an exploratory mean-variance (EMV) framework where investors aim to minimize risk while maximizing returns under incomplete market information, influenced by shifting economic regimes. The market model includes risk-free and risky assets, with liability dynamics driven by a Markov regime-switching process. To align with real-world scenarios where financial decisions are made over discrete time periods, we adopt a multi-period dynamic model. We present an optimal portfolio strategy, derived using RL techniques, that adapts to these market conditions. The proposed solution addresses the inherent time inconsistency in classical mean-variance models by integrating a pre-committed strategy formulation. Furthermore, we incorporate partial market observability, employing stochastic filtering techniques to estimate unobservable market states. Numerical simulations and empirical tests on real financial data demonstrate that our method achieves superior returns, lower risk, and faster convergence compared to traditional models. These findings highlight the robustness and adaptability of our RL-based solution in dynamic and complex financial environments.

Comments:	40 pages, 5 figures
Subjects:	Optimization and Control (math.OC); Probability (math.PR)
MSC classes:	91B28, 93E11, 93E20
Cite as:	arXiv:2509.03251 [math.OC]
	(or arXiv:2509.03251v1 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2509.03251

Submission history

From: Zhongqin Gao
[v1] Wed, 3 Sep 2025 12:10:49 UTC (542 KB)
