Learning Based Control Policy and Regret Analysis for Online Quadratic Optimization with Asymmetric Information Structure

Tan, Cheng; Wong, Wing Shing

arXiv:1811.00729 (math)

[Submitted on 2 Nov 2018]

Abstract: In this paper, we propose a learning approach to analyze dynamic systems with an asymmetric information structure. Instead of adopting a game-theoretic setting, we investigate an online quadratic optimization problem driven by system noises with unknown statistics. Due to the information asymmetry, neither the classic Kalman filter nor classic optimal control strategies can be used for such systems. It is therefore necessary and beneficial to develop a robust approach that learns the probability statistics as time goes forward. Motivated by online convex optimization (OCO) theory, we introduce the notion of regret, defined as the cumulative performance loss, i.e., the difference between the optimal offline cost under known statistics and the optimal online cost under unknown statistics. By utilizing dynamic programming and the linear minimum mean square unbiased estimate (LMMSUE), we propose a new type of online state feedback control policy and characterize the behavior of the regret in the finite-time regime. The regret is shown to be sub-linear and bounded by O(ln T). Moreover, we address an online optimization problem with output feedback control policies.
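To give a feel for how learning noise statistics online can lead to logarithmic regret, the following is a minimal sketch on a toy problem that is much simpler than the setting of the paper and is not the authors' method. It assumes a scalar, memoryless problem: at each step t the controller picks u_t to minimize the expected value of (u_t + w_t)^2, where the noises w_t are i.i.d. with unknown mean mu and standard deviation sigma. The offline policy knows mu and plays u_t = -mu; the online policy plays minus the sample mean of the noises seen so far. The function names and the problem itself are illustrative assumptions.

```python
import math
import random


def expected_regret(T, mu, sigma):
    """Closed-form expected regret of the sample-mean policy (toy problem)."""
    # Step 1 has no data, so u_1 = 0 and the expected excess cost is mu^2.
    # Step t >= 2 pays sigma^2 / (t - 1), the variance of the sample mean
    # built from the t - 1 noises observed so far.
    return mu ** 2 + sigma ** 2 * sum(1.0 / (t - 1) for t in range(2, T + 1))


def simulate_regret(T, mu, sigma, seed=0):
    """One sample path of the realized regret, with Gaussian noise (assumed)."""
    rng = random.Random(seed)
    total = 0.0   # running sum of observed noises
    regret = 0.0
    for t in range(1, T + 1):
        # Online estimate of the mean from past noises only (0 if none yet).
        est = total / (t - 1) if t > 1 else 0.0
        w = rng.gauss(mu, sigma)
        online_cost = (w - est) ** 2    # cost of u_t = -est
        offline_cost = (w - mu) ** 2    # cost of u_t = -mu (known statistics)
        regret += online_cost - offline_cost
        total += w
    return regret


if __name__ == "__main__":
    for T in (10, 100, 1000, 10000):
        bound = 1.0 + 1.0 * (1.0 + math.log(T - 1))
        print(T, round(expected_regret(T, 1.0, 1.0), 3), "<=", round(bound, 3))
```

In this toy case the expected regret equals mu^2 plus sigma^2 times a harmonic sum, which grows like ln T, so the average regret per step vanishes. The realized regret on a single sample path can deviate from this expectation.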

Subjects:	Optimization and Control (math.OC)
Cite as:	arXiv:1811.00729 [math.OC]
	(or arXiv:1811.00729v1 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.1811.00729

Submission history

From: Cheng Tan
[v1] Fri, 2 Nov 2018 04:07:22 UTC (194 KB)
