Scalable regret for learning to control network-coupled subsystems with unknown dynamics

Sudhakara, Sagar; Mahajan, Aditya; Nayyar, Ashutosh; Ouyang, Yi

arXiv:2108.07970 (eess)

[Submitted on 18 Aug 2021]

Abstract: We consider the problem of controlling an unknown linear quadratic Gaussian (LQG) system consisting of multiple subsystems connected over a network. Our goal is to minimize and quantify the regret (i.e., loss in performance) of our strategy with respect to an oracle who knows the system model. Viewing the interconnected subsystems globally and directly applying existing LQG learning algorithms to the global system results in a regret that increases super-linearly with the number of subsystems. Instead, we propose a new Thompson-sampling-based learning algorithm that exploits the structure of the underlying network. We show that the expected regret of the proposed algorithm is bounded by $\tilde{\mathcal{O}} \big( n \sqrt{T} \big)$, where $n$ is the number of subsystems, $T$ is the time horizon, and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $n$ and $T$. Thus, the regret scales linearly with the number of subsystems. We present numerical experiments to illustrate the salient features of the proposed algorithm.
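
As a rough illustration of the regret notion in the abstract, the sketch below simulates Thompson sampling for a single scalar LQ subsystem ($n = 1$). It does not exploit any network structure and is not the authors' algorithm. It assumes one common definition of regret, $R(T) = \sum_{t=1}^{T} c(x_t, u_t) - T\,J(\theta)$ with $c(x, u) = q x^2 + r u^2$, where $J(\theta) = \sigma^2 P$ is the oracle's optimal average cost per step and $P$ solves the scalar Riccati equation; the abstract's bound concerns the expectation of such a quantity. Everything else is hypothetical: the function names, the true system $x_{t+1} = 0.9 x_t + 0.5 u_t + w_t$, the Gaussian prior truncated to a box (sampled by rejection), and the fixed doubling episode schedule.

```python
import math
import random

# Hypothetical compact parameter box for theta = (a, b). It contains the example's true
# parameters and is chosen so that, for this example, every sampled gain keeps the true
# system stable (an assumption standing in for the usual stability conditions).
A_RANGE = (0.5, 1.0)
B_RANGE = (0.3, 0.7)


def scalar_dare(a, b, q, r, iters=500):
    """Value iteration for P = q + a^2 P - (a b P)^2 / (r + b^2 P); returns (P, K), u = -K x."""
    p = q
    for _ in range(iters):
        m = a * b * p
        p = q + a * a * p - m * m / (r + b * b * p)
    return p, a * b * p / (r + b * b * p)


def sample_posterior(lam, h, rng, max_tries=10_000):
    """Draw (a, b) from N(lam^{-1} h, lam^{-1}) truncated to the box, by rejection."""
    det = lam[0][0] * lam[1][1] - lam[0][1] * lam[1][0]
    s11, s12, s22 = lam[1][1] / det, -lam[0][1] / det, lam[0][0] / det  # covariance
    mu = (s11 * h[0] + s12 * h[1], s12 * h[0] + s22 * h[1])            # posterior mean
    c11 = math.sqrt(s11)                                                 # 2x2 Cholesky
    c21 = s12 / c11
    c22 = math.sqrt(max(s22 - c21 * c21, 0.0))
    for _ in range(max_tries):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a, b = mu[0] + c11 * e1, mu[1] + c21 * e1 + c22 * e2
        if A_RANGE[0] <= a <= A_RANGE[1] and B_RANGE[0] <= b <= B_RANGE[1]:
            return a, b
    # Fallback: project the last draw onto the box so the sketch always terminates.
    return (min(max(a, A_RANGE[0]), A_RANGE[1]), min(max(b, B_RANGE[0]), B_RANGE[1]))


def thompson_sampling_lq(a_true=0.9, b_true=0.5, q=1.0, r=1.0, sigma=1.0,
                         horizon=2000, seed=0):
    """Simulate TS with doubling episodes on x' = a x + b u + w; return empirical regret."""
    rng = random.Random(seed)
    # Information-form Gaussian prior: precision lam and h = lam @ mean.
    # Hypothetical prior: mean (0.75, 0.5), standard deviation 0.5 per coordinate.
    lam = [[4.0, 0.0], [0.0, 4.0]]
    h = [4.0 * 0.75, 4.0 * 0.5]
    x, t, total_cost, episode_len = 0.0, 0, 0.0, 1
    while t < horizon:
        a_s, b_s = sample_posterior(lam, h, rng)  # one posterior sample per episode
        _, k = scalar_dare(a_s, b_s, q, r)         # optimal gain for the sampled model
        for _ in range(min(episode_len, horizon - t)):
            u = -k * x
            total_cost += q * x * x + r * u * u
            x_next = a_true * x + b_true * u + rng.gauss(0.0, sigma)
            # Bayesian linear-regression update: regressor z = (x, u), target x_next.
            z = (x, u)
            for i in range(2):
                h[i] += z[i] * x_next / sigma ** 2
                for j in range(2):
                    lam[i][j] += z[i] * z[j] / sigma ** 2
            x = x_next
            t += 1
        episode_len *= 2
    p_true, _ = scalar_dare(a_true, b_true, q, r)
    # Oracle's optimal average cost per step is sigma^2 * P for the true parameters.
    return total_cost - horizon * sigma ** 2 * p_true


if __name__ == "__main__":
    for T in (500, 2000, 8000):
        print(T, round(thompson_sampling_lq(horizon=T, seed=1), 1))
```

Averaging the returned value over many seeds would estimate the expected regret. For a single subsystem the abstract's $\tilde{\mathcal{O}}(n\sqrt{T})$ bound suggests growth on the order of $\sqrt{T}$ up to logarithmic factors, although this toy simulation does not verify the bound.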

Comments:	12 pages
Subjects:	Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Cite as:	arXiv:2108.07970 [eess.SY]
	(or arXiv:2108.07970v1 [eess.SY] for this version)
	https://doi.org/10.48550/arXiv.2108.07970

Submission history

From: Aditya Mahajan
[v1] Wed, 18 Aug 2021 04:45:34 UTC (538 KB)