Search | arXiv e-print repository

Leveraging Offline Data from Similar Systems for Online Linear Quadratic Control

Authors: Shivam Bajaj, Prateek Jaiswal, Vijay Gupta

Abstract: The "Sim2real gap", in which the system learned in simulation is not an exact representation of the real system, can lead to loss of stability and performance when controllers learned using data from the simulated system are used on the real system. In this work, we address this challenge in the linear quadratic regulator (LQR) setting. Specifically, we consider an LQR problem for a system with unknown system matrices. Along with the state-action pairs from the system to be controlled, a trajectory of length $S$ of state-action pairs from a different unknown system is available. Our proposed algorithm is built upon Thompson sampling and utilizes the mean as well as the uncertainty of the dynamics of the system from which the trajectory of length $S$ is obtained. We establish that the algorithm achieves $\tilde{\mathcal{O}}(f(S,M_\delta)\sqrt{T/S})$ Bayes regret after $T$ time steps, where $M_\delta$ characterizes the dissimilarity between the two systems and $f(S,M_\delta)$ is a function of $S$ and $M_\delta$. When $M_\delta$ is sufficiently small, the proposed algorithm achieves $\tilde{\mathcal{O}}(\sqrt{T/S})$ Bayes regret and outperforms a naive strategy which does not utilize the available trajectory.

Submitted 13 May, 2025; originally announced May 2025.

arXiv:2504.10437 [pdf, other]

Online Model Order Reduction of Linear Systems via $(γ,δ)$-Similarity

Authors: Shivam Bajaj, Carolyn L. Beck, Vijay Gupta

Abstract: Model order reduction aims to determine a low-order approximation of high-order models with least possible approximation errors. For application to physical systems, it is crucial that the reduced order model (ROM) is robust to any disturbance that acts on the full order model (FOM) -- in the sense that the output of the ROM remains a good approximation of that of the FOM, even in the presence of such disturbances. In this work, we present a framework for online model order reduction for a class of continuous-time linear systems that ensures this property for any $\mathcal{L}_2$ disturbance. Apart from robustness to disturbances in this sense, the proposed framework also displays other desirable properties for model order reduction: (1) a provable bound on the error defined as the $L_2$ norm of the difference between the output of the ROM and FOM, (2) preservation of stability, (3) compositionality properties and a provable error bound for arbitrary interconnected systems, (4) a provable bound on the output of the FOM when the controller designed for the ROM is used with the FOM, and finally, (5) compatibility with existing approaches such as balanced truncation and moment matching. Property (4) does not require computation of any gap metric and property (5) is beneficial as existing approaches can also be equipped with some of the preceding properties. The theoretical results are corroborated on numerical case studies, including on a building model.

Submitted 2 May, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

arXiv:2403.12346 [pdf, other]

Shortest Trajectory of a Dubins Vehicle with a Controllable Laser

Authors: Shivam Bajaj, Bhargav Jha, Shaunak D. Bopardikar, Alexander Von Moll, David W. Casbeer

Abstract: We formulate a novel planar motion planning problem for a Dubins-Laser system that consists of a Dubins vehicle with an attached controllable laser. The vehicle moves with unit speed and the laser, having a finite range, can rotate in a clockwise or anti-clockwise direction with a bounded angular rate. From an arbitrary initial position and orientation, the objective is to steer the system in minimum time so that a given static target is within the range of the laser and the laser is oriented at it. We characterize multiple properties of the optimal trajectory and establish that the optimal trajectory for the Dubins-Laser system is one of 16 candidates. Finally, we provide numerical insights that illustrate the properties characterized in this work.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2402.08747 [pdf, other]

Rationality of Learning Algorithms in Repeated Normal-Form Games

Authors: Shivam Bajaj, Pranoy Das, Yevgeniy Vorobeychik, Vijay Gupta

Abstract: Many learning algorithms are known to converge to an equilibrium for specific classes of games if the same learning algorithm is adopted by all agents. However, when the agents are self-interested, a natural question is whether agents have a strong incentive to adopt an alternative learning algorithm that yields them greater individual utility. We capture such incentives as an algorithm's rationality ratio, which is the ratio of the highest payoff an agent can obtain by deviating from a learning algorithm to its payoff from following it. We define a learning algorithm to be $c$-rational if its rationality ratio is at most $c$ irrespective of the game. We first establish that popular learning algorithms such as fictitious play and regret matching are not $c$-rational for any constant $c\geq 1$. We then propose and analyze two algorithms that are provably $1$-rational under mild assumptions, and have the same properties as (a generalized version of) fictitious play and regret matching, respectively, if all agents follow them. Finally, we show that if an assumption of perfect monitoring is not satisfied, there are games for which $c$-rational algorithms do not exist, and illustrate our results with numerical case studies.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2302.02186 [pdf, other]

Perimeter Defense using a Turret with Finite Range and Service Times

Authors: Shivam Bajaj, Shaunak D. Bopardikar, Alexander Von Moll, Eric Torng, David W. Casbeer

Abstract: We consider a perimeter defense problem in a planar conical environment comprising a single turret that has a finite range and non-zero service time. The turret seeks to defend a concentric perimeter against $N\geq 2$ intruders. Upon release, each intruder moves radially towards the perimeter with a fixed speed. To capture an intruder, the turret's angle must be aligned with the intruder's angle and the turret must spend a specified service time at that orientation. We address offline and online versions of this optimization problem. Specifically, in the offline version, we establish that in general parameter regimes, this problem is equivalent to solving a Travelling Repairperson Problem with Time Windows (TRP-TW). We then identify specific parameter regimes in which there is a polynomial time algorithm that maximizes the number of intruders captured. In the online version, we present a competitive analysis technique in which we establish a fundamental guarantee on the existence of at best $(N-1)$-competitive algorithms. We also design two online algorithms that are provably $1$- and $2$-competitive in specific parameter regimes.

Submitted 4 February, 2023; originally announced February 2023.

arXiv:2110.04667 [pdf, other]

Competitive Perimeter Defense of Conical Environments

Authors: Shivam Bajaj, Eric Torng, Shaunak D. Bopardikar, Alexander Von Moll, Isaac Weintraub, Eloy Garcia, David W. Casbeer

Abstract: We consider a perimeter defense problem in a planar conical environment in which a single vehicle, having a finite capture radius, aims to defend a concentric perimeter from mobile intruders. The intruders are arbitrarily released at the circumference of the environment and they move radially toward the perimeter with fixed speed. We present a competitive analysis approach to this problem by measuring the performance of multiple online algorithms for the vehicle against arbitrary inputs, relative to an optimal offline algorithm that has information about the entire input sequence in advance. In particular, we establish two necessary conditions on the parameter space to guarantee (i) finite competitiveness of any algorithm and (ii) a competitive ratio of at least 2 for any algorithm. We then design and analyze three online algorithms and characterize parameter regimes in which they have finite competitive ratios. Specifically, our first two algorithms are provably 1- and 2-competitive, respectively, whereas our third algorithm exhibits different competitive ratios in different regimes of problem parameters. Finally, we provide a numerical plot in the parameter space to reveal additional insights into the relative performance of our algorithms.

Submitted 29 March, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

Comments: Version 2 has additional images

arXiv:2108.04663 [pdf]

doi 10.1111/coin.12476

Deep Learning for Breast Cancer Classification: Enhanced Tangent Function

Authors: Ashu Thapa, Abeer Alsadoon, P. W. C. Prasad, Simi Bajaj, Omar Hisham Alsadoon, Tarik A. Rashid, Rasha S. Ali, Oday D. Jerew

Abstract: Background and Aim: Recently, deep learning using convolutional neural networks has been used successfully to classify images of breast cells accurately. However, the accuracy of manual classification of those histopathological images is comparatively low. This research aims to increase the accuracy of the classification of breast cancer images by utilizing a Patch-Based Classifier (PBC) along with a deep learning architecture. Methodology: The proposed system consists of a Deep Convolutional Neural Network (DCNN) that helps in enhancing and increasing the accuracy of the classification process. This is done by the use of the Patch-Based Classifier (PBC). The CNN has several different layers: images are first fed through convolutional layers using the hyperbolic tangent function together with max-pooling layers, dropout layers, and a SoftMax function for classification. Further, the output obtained is fed to a patch-based classifier that consists of patch-wise classification output followed by majority voting. Results: The results are obtained throughout the classification stage for breast cancer images that are collected from breast-histology datasets. The proposed solution improves the accuracy of classifying images as normal, benign, in-situ, or invasive carcinoma from 87% to 94%, with a decrease in processing time from 0.45 s to 0.2 s on average. Conclusion: The proposed solution focused on increasing the accuracy of classifying cancer in the breast by enhancing the image contrast and reducing the vanishing gradient. Finally, implementing the Contrast Limited Adaptive Histogram Equalization (CLAHE) technique and a modified tangent function helps in increasing the accuracy.

Submitted 1 July, 2021; originally announced August 2021.

Comments: 19

Journal ref: Computational Intelligence, 2021

arXiv:2106.10514 [pdf, other]

Cooperative Evasion by Translating Targets with Variable Speeds

Authors: Shivam Bajaj, Eloy Garcia, Shaunak D. Bopardikar

Abstract: We consider a problem of cooperative evasion between a single pursuer and multiple evaders in which the evaders are constrained to move in the positive Y direction. The evaders are slower than the vehicle and can choose their speeds from a bounded interval. The pursuer aims to intercept all evaders in a given sequence by executing a Manhattan pursuit strategy of moving parallel to the X axis, followed by moving parallel to the Y axis. The aim of the evaders is to cooperatively pick their individual speeds so that the total time to intercept all evaders is maximized. We first obtain conditions under which evaders should cooperate in order to maximize the total time to intercept as opposed to each moving greedily to optimize its own intercept time. Then, we propose and analyze an algorithm that assigns evasive strategies to the evaders in two iterations as opposed to performing an exponential search over the choice of evader speeds. We also characterize a fundamental limit on the total time taken by the pursuer to capture all evaders when the number of evaders is large. Finally, we provide numerical comparisons against random sampling heuristics.

Submitted 5 July, 2021; v1 submitted 19 June, 2021; originally announced June 2021.

arXiv:2103.11787 [pdf, other]

Competitive Perimeter Defense on a Line

Authors: Shivam Bajaj, Eric Torng, Shaunak D. Bopardikar

Abstract: We consider a perimeter defense problem in which a single vehicle seeks to defend a compact region from intruders in a one-dimensional environment parameterized by the perimeter size and the intruder-to-vehicle speed ratio. The intruders move inward with fixed speed and direction to reach the perimeter. We provide both positive and negative worst-case performance results over the parameter space using competitive analysis. We first establish fundamental limits by identifying the most difficult parameter combinations that admit no $c$-competitive algorithms for any constant $c\geq 1$ and slightly easier parameter combinations in which every algorithm is at best $2$-competitive. We then design three classes of algorithms and prove they are $1$-, $2$-, and $4$-competitive, respectively, for increasingly difficult parameter combinations. Finally, we present numerical studies that provide insights into the performance of these algorithms against stochastically generated intruders.

Submitted 22 March, 2021; originally announced March 2021.

arXiv:1909.01855 [pdf, other]

Dynamic Boundary Guarding Against Radially Incoming Targets

Authors: Shivam Bajaj, Shaunak D. Bopardikar

Abstract: We introduce a dynamic vehicle routing problem in which a single vehicle seeks to guard a circular perimeter against radially inward moving targets. Targets are generated uniformly as per a Poisson process in time with a fixed arrival rate on the boundary of a circle with a larger radius and concentric with the perimeter. Upon generation, each target moves radially inward toward the perimeter with a fixed speed. The aim of the vehicle is to maximize the capture fraction, i.e., the fraction of targets intercepted before they enter the perimeter. We first obtain a fundamental upper bound on the capture fraction which is independent of any policy followed by the vehicle. We analyze several policies in the regimes of low and high arrival rates of target generation. For low arrival rates, we propose and analyze a First-Come-First-Served and a Look-Ahead policy based on repeated computation of the path that passes through the maximum number of unintercepted targets. For high arrival rates, we design and analyze a policy based on repeated computation of the Euclidean Minimum Hamiltonian path through a fraction of existing targets and show that it is within a constant factor of the optimal. Finally, we provide a numerical study of the performance of the policies in parameter regimes beyond the scope of the analysis.

Submitted 4 September, 2019; originally announced September 2019.

Showing 1–10 of 10 results for author: Bajaj, S