-
A Networked Multi-Agent System for Mobile Wireless Infrastructure on Demand
Authors:
Miguel Calvo-Fullana,
Mikhail Gerasimenko,
Daniel Mox,
Leopoldo Agorio,
Mariana del Castillo,
Vijay Kumar,
Alejandro Ribeiro,
Juan Andres Bazerque
Abstract:
Despite the prevalence of wireless connectivity in urban areas around the globe, there remain numerous and diverse situations where connectivity is insufficient or unavailable. To address this, we introduce mobile wireless infrastructure on demand, a system of UAVs that can be rapidly deployed to establish an ad-hoc wireless network. This network has the capability of reconfiguring itself dynamica…
▽ More
Despite the prevalence of wireless connectivity in urban areas around the globe, there remain numerous and diverse situations where connectivity is insufficient or unavailable. To address this, we introduce mobile wireless infrastructure on demand, a system of UAVs that can be rapidly deployed to establish an ad-hoc wireless network. This network has the capability of reconfiguring itself dynamically to satisfy and maintain the required quality of communication. The system optimizes the positions of the UAVs and the routing of data flows throughout the network to achieve this quality of service (QoS). By these means, task agents using the network simply request a desired QoS, and the system adapts accordingly while allowing them to move freely. We have validated this system both in simulation and in real-world experiments. The results demonstrate that our system effectively offers mobile wireless infrastructure on demand, extending the operational range of task agents and supporting complex mobility patterns, all while ensuring connectivity and being resilient to agent failures.
△ Less
Submitted 16 September, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Mission-Aware Value of Information Censoring for Distributed Filtering
Authors:
Miguel Calvo-Fullana,
Jonathan P. How
Abstract:
In this paper, we study the problem of distributed estimation with an emphasis on communication-efficiency. The proposed algorithm is based on a windowed maximum a posteriori (MAP) estimation problem, wherein each agent in the network locally computes a Kalman-like filter estimate that approximates the centralized MAP solution. Information sharing among agents is restricted to their neighbors only…
▽ More
In this paper, we study the problem of distributed estimation with an emphasis on communication-efficiency. The proposed algorithm is based on a windowed maximum a posteriori (MAP) estimation problem, wherein each agent in the network locally computes a Kalman-like filter estimate that approximates the centralized MAP solution. Information sharing among agents is restricted to their neighbors only, with guarantees on overall estimate consistency provided via logarithmic opinion pooling. The problem is efficiently distributed using the alternating direction method of multipliers (ADMM), whose overall communication usage is further reduced by a value of information (VoI) censoring mechanism, wherein agents only transmit their primal-dual iterates when deemed valuable to do so. The proposed censoring mechanism is mission-aware, enabling a globally efficient use of communication resources while guaranteeing possibly different local estimation requirements. To illustrate the validity of the approach we perform simulations in a target tracking scenario.
△ Less
Submitted 20 November, 2022;
originally announced November 2022.
-
Distributed Filtering with Value of Information Censoring
Authors:
Miguel Calvo-Fullana,
Jonathan P. How
Abstract:
This work presents a distributed estimation algorithm that efficiently uses the available communication resources. The approach is based on Bayesian filtering that is distributed across a network by using the logarithmic opinion pool operator. Communication efficiency is achieved by having only agents with high Value of Information (VoI) share their estimates, and the algorithm provides a tunable…
▽ More
This work presents a distributed estimation algorithm that efficiently uses the available communication resources. The approach is based on Bayesian filtering that is distributed across a network by using the logarithmic opinion pool operator. Communication efficiency is achieved by having only agents with high Value of Information (VoI) share their estimates, and the algorithm provides a tunable trade-off between communication resources and estimation error. Under linear-Gaussian models the algorithm takes the form of a censored distributed Information filter, which guarantees the consistency of agent estimates. Importantly, consistent estimates are shown to play a crucial role in enabling the large reductions in communication usage provided by the VoI censoring approach. We verify the performance of the proposed method via complex simulations in a dynamic network topology and by experimental validation over a real ad-hoc wireless communication network. The results show the validity of using the proposed method to drastically reduce the communication costs of distributed estimation tasks.
△ Less
Submitted 1 April, 2022;
originally announced April 2022.
-
Distributed Riemannian Optimization with Lazy Communication for Collaborative Geometric Estimation
Authors:
Yulun Tian,
Amrit Singh Bedi,
Alec Koppel,
Miguel Calvo-Fullana,
David M. Rosen,
Jonathan P. How
Abstract:
We present the first distributed optimization algorithm with lazy communication for collaborative geometric estimation, the backbone of modern collaborative simultaneous localization and mapping (SLAM) and structure-from-motion (SfM) applications. Our method allows agents to cooperatively reconstruct a shared geometric model on a central server by fusing individual observations, but without the ne…
▽ More
We present the first distributed optimization algorithm with lazy communication for collaborative geometric estimation, the backbone of modern collaborative simultaneous localization and mapping (SLAM) and structure-from-motion (SfM) applications. Our method allows agents to cooperatively reconstruct a shared geometric model on a central server by fusing individual observations, but without the need to transmit potentially sensitive information about the agents themselves (such as their locations). Furthermore, to alleviate the burden of communication during iterative optimization, we design a set of communication triggering conditions that enable agents to selectively upload a targeted subset of local information that is useful to global optimization. Our approach thus achieves significant communication reduction with minimal impact on optimization performance. As our main theoretical contribution, we prove that our method converges to first-order critical points with a global sublinear convergence rate. Numerical evaluations on bundle adjustment problems from collaborative SLAM and SfM datasets show that our method performs competitively against existing distributed techniques, while achieving up to 78% total communication reduction.
△ Less
Submitted 29 July, 2022; v1 submitted 1 March, 2022;
originally announced March 2022.
-
Constrained Learning with Non-Convex Losses
Authors:
Luiz F. O. Chamon,
Santiago Paternain,
Miguel Calvo-Fullana,
Alejandro Ribeiro
Abstract:
Though learning has become a core component of modern information processing, there is now ample evidence that it can lead to biased, unsafe, and prejudiced systems. The need to impose requirements on learning is therefore paramount, especially as it reaches critical applications in social, industrial, and medical domains. However, the non-convexity of most modern statistical problems is only exac…
▽ More
Though learning has become a core component of modern information processing, there is now ample evidence that it can lead to biased, unsafe, and prejudiced systems. The need to impose requirements on learning is therefore paramount, especially as it reaches critical applications in social, industrial, and medical domains. However, the non-convexity of most modern statistical problems is only exacerbated by the introduction of constraints. Whereas good unconstrained solutions can often be learned using empirical risk minimization, even obtaining a model that satisfies statistical constraints can be challenging. All the more so, a good one. In this paper, we overcome this issue by learning in the empirical dual domain, where constrained statistical learning problems become unconstrained and deterministic. We analyze the generalization properties of this approach by bounding the empirical duality gap -- i.e., the difference between our approximate, tractable solution and the solution of the original (non-convex) statistical problem -- and provide a practical constrained learning algorithm. These results establish a constrained counterpart to classical learning theory, enabling the explicit use of constraints in learning. We illustrate this theory and algorithm in rate-constrained learning applications arising in fairness and adversarial robustness.
△ Less
Submitted 19 October, 2022; v1 submitted 8 March, 2021;
originally announced March 2021.
-
Towards Safe Continuing Task Reinforcement Learning
Authors:
Miguel Calvo-Fullana,
Luiz F. O. Chamon,
Santiago Paternain
Abstract:
Safety is a critical feature of controller design for physical systems. When designing control policies, several approaches to guarantee this aspect of autonomy have been proposed, such as robust controllers or control barrier functions. However, these solutions strongly rely on the model of the system being available to the designer. As a parallel development, reinforcement learning provides mode…
▽ More
Safety is a critical feature of controller design for physical systems. When designing control policies, several approaches to guarantee this aspect of autonomy have been proposed, such as robust controllers or control barrier functions. However, these solutions strongly rely on the model of the system being available to the designer. As a parallel development, reinforcement learning provides model-agnostic control solutions but in general, it lacks the theoretical guarantees required for safety. Recent advances show that under mild conditions, control policies can be learned via reinforcement learning, which can be guaranteed to be safe by imposing these requirements as constraints of an optimization problem. However, to transfer from learning safety to learning safely, there are two hurdles that need to be overcome: (i) it has to be possible to learn the policy without having to re-initialize the system; and (ii) the rollouts of the system need to be in themselves safe. In this paper, we tackle the first issue, proposing an algorithm capable of operating in the continuing task setting without the need of restarts. We evaluate our approach in a numerical example, which shows the capabilities of the proposed approach in learning safe policies via safe exploration.
△ Less
Submitted 24 February, 2021;
originally announced February 2021.
-
State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards
Authors:
Miguel Calvo-Fullana,
Santiago Paternain,
Luiz F. O. Chamon,
Alejandro Ribeiro
Abstract:
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds. In this class of problems, we show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards. Hence, there exist constrained reinforcement learning problems for which neither regularized nor classical…
▽ More
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds. In this class of problems, we show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards. Hence, there exist constrained reinforcement learning problems for which neither regularized nor classical primal-dual methods yield optimal policies. This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods as the portion of the dynamics that drives the multipliers evolution. This approach provides a systematic state augmentation procedure that is guaranteed to solve reinforcement learning problems with constraints. Thus, as we illustrate by an example, while previous methods can fail at finding optimal policies, running the dual dynamics while executing the augmented policy yields an algorithm that provably samples actions from the optimal policy.
△ Less
Submitted 21 September, 2023; v1 submitted 23 February, 2021;
originally announced February 2021.
-
The empirical duality gap of constrained statistical learning
Authors:
Luiz F. O. Chamon,
Santiago Paternain,
Miguel Calvo-Fullana,
Alejandro Ribeiro
Abstract:
This paper is concerned with the study of constrained statistical learning problems, the unconstrained version of which are at the core of virtually all of modern information processing. Accounting for constraints, however, is paramount to incorporate prior knowledge and impose desired structural and statistical properties on the solutions. Still, solving constrained statistical problems remains c…
▽ More
This paper is concerned with the study of constrained statistical learning problems, the unconstrained version of which are at the core of virtually all of modern information processing. Accounting for constraints, however, is paramount to incorporate prior knowledge and impose desired structural and statistical properties on the solutions. Still, solving constrained statistical problems remains challenging and guarantees scarce, leaving them to be tackled using regularized formulations. Though practical and effective, selecting regularization parameters so as to satisfy requirements is challenging, if at all possible, due to the lack of a straightforward relation between parameters and constraints. In this work, we propose to directly tackle the constrained statistical problem overcoming its infinite dimensionality, unknown distributions, and constraints by leveraging finite dimensional parameterizations, sample averages, and duality theory. Aside from making the problem tractable, these tools allow us to bound the empirical duality gap, i.e., the difference between our approximate tractable solutions and the actual solutions of the original statistical problem. We demonstrate the effectiveness and usefulness of this constrained formulation in a fair learning application.
△ Less
Submitted 12 February, 2020;
originally announced February 2020.
-
Safe Policies for Reinforcement Learning via Primal-Dual Methods
Authors:
Santiago Paternain,
Miguel Calvo-Fullana,
Luiz F. O. Chamon,
Alejandro Ribeiro
Abstract:
In this paper, we study the learning of safe policies in the setting of reinforcement learning problems. This is, we aim to control a Markov Decision Process (MDP) of which we do not know the transition probabilities, but we have access to sample trajectories through experience. We define safety as the agent remaining in a desired safe set with high probability during the operation time. We theref…
▽ More
In this paper, we study the learning of safe policies in the setting of reinforcement learning problems. This is, we aim to control a Markov Decision Process (MDP) of which we do not know the transition probabilities, but we have access to sample trajectories through experience. We define safety as the agent remaining in a desired safe set with high probability during the operation time. We therefore consider a constrained MDP where the constraints are probabilistic. Since there is no straightforward way to optimize the policy with respect to the probabilistic constraint in a reinforcement learning framework, we propose an ergodic relaxation of the problem. The advantages of the proposed relaxation are threefold. (i) The safety guarantees are maintained in the case of episodic tasks and they are kept up to a given time horizon for continuing tasks. (ii) The constrained optimization problem despite its non-convexity has arbitrarily small duality gap if the parametrization of the policy is rich enough. (iii) The gradients of the Lagrangian associated with the safe-learning problem can be easily computed using standard policy gradient results and stochastic approximation tools. Leveraging these advantages, we establish that primal-dual algorithms are able to find policies that are safe and optimal. We test the proposed approach in a navigation task in a continuous domain. The numerical results show that our algorithm is capable of dynamically adapting the policy to the environment and the required safety levels.
△ Less
Submitted 12 January, 2022; v1 submitted 20 November, 2019;
originally announced November 2019.
-
Constrained Reinforcement Learning Has Zero Duality Gap
Authors:
Santiago Paternain,
Luiz F. O. Chamon,
Miguel Calvo-Fullana,
Alejandro Ribeiro
Abstract:
Autonomous agents must often deal with conflicting requirements, such as completing tasks using the least amount of time/energy, learning multiple tasks, or dealing with multiple opponents. In the context of reinforcement learning~(RL), these problems are addressed by (i)~designing a reward function that simultaneously describes all requirements or (ii)~combining modular value functions that encod…
▽ More
Autonomous agents must often deal with conflicting requirements, such as completing tasks using the least amount of time/energy, learning multiple tasks, or dealing with multiple opponents. In the context of reinforcement learning~(RL), these problems are addressed by (i)~designing a reward function that simultaneously describes all requirements or (ii)~combining modular value functions that encode them individually. Though effective, these methods have critical downsides. Designing good reward functions that balance different objectives is challenging, especially as the number of objectives grows. Moreover, implicit interference between goals may lead to performance plateaus as they compete for resources, particularly when training on-policy. Similarly, selecting parameters to combine value functions is at least as hard as designing an all-encompassing reward, given that the effect of their values on the overall policy is not straightforward. The later is generally addressed by formulating the conflicting requirements as a constrained RL problem and solved using Primal-Dual methods. These algorithms are in general not guaranteed to converge to the optimal solution since the problem is not convex. This work provides theoretical support to these approaches by establishing that despite its non-convexity, this problem has zero duality gap, i.e., it can be solved exactly in the dual domain, where it becomes convex. Finally, we show this result basically holds if the policy is described by a good parametrization~(e.g., neural networks) and we connect this result with primal-dual algorithms present in the literature and we establish the convergence to the optimal solution.
△ Less
Submitted 29 October, 2019;
originally announced October 2019.
-
Random Access Communication for Wireless Control Systems with Energy Harvesting Sensors
Authors:
Miguel Calvo-Fullana,
Carles Antón-Haro,
Javier Matamoros,
Alejandro Ribeiro
Abstract:
In this paper, we study wireless networked control systems in which the sensing devices are powered by energy harvesting. We consider a scenario with multiple plants, where the sensors communicate their measurements to their respective controllers over a shared wireless channel. Due to the shared nature of the medium, sensors transmitting simultaneously can lead to packet collisions. In order to d…
▽ More
In this paper, we study wireless networked control systems in which the sensing devices are powered by energy harvesting. We consider a scenario with multiple plants, where the sensors communicate their measurements to their respective controllers over a shared wireless channel. Due to the shared nature of the medium, sensors transmitting simultaneously can lead to packet collisions. In order to deal with this, we propose the use of random access communication policies and, to this end, we translate the control performance requirements to successful packet reception probabilities. The optimal scheduling decision is to transmit with a certain probability, which is adaptive to plant, channel and battery conditions. Moreover, we provide a stochastic dual method to compute the optimal scheduling solution, which is decoupled across sensors, with only some of the dual variables needed to be shared between nodes. Furthermore, we also consider asynchronicity in the values of the variables across sensor nodes and provide theoretical guarantees on the stability of the control systems under the proposed random access mechanism. Finally, we provide extensive numerical results that corroborate our claims.
△ Less
Submitted 30 January, 2018;
originally announced January 2018.
-
Stochastic Routing and Scheduling Policies for Energy Harvesting Communication Networks
Authors:
Miguel Calvo-Fullana,
Carles Antón-Haro,
Javier Matamoros,
Alejandro Ribeiro
Abstract:
In this paper, we study the joint routing-scheduling problem in energy harvesting communication networks. Our policies, which are based on stochastic subgradient methods on the dual domain, act as an energy harvesting variant of the stochastic family of backpresure algorithms. Specifically, we propose two policies: (i) the Stochastic Backpressure with Energy Harvesting (SBP-EH), in which a node's…
▽ More
In this paper, we study the joint routing-scheduling problem in energy harvesting communication networks. Our policies, which are based on stochastic subgradient methods on the dual domain, act as an energy harvesting variant of the stochastic family of backpresure algorithms. Specifically, we propose two policies: (i) the Stochastic Backpressure with Energy Harvesting (SBP-EH), in which a node's routing-scheduling decisions are determined by the difference between the Lagrange multipliers associated to their queue stability constraints and their neighbors'; and (ii) the Stochastic Soft Backpressure with Energy Harvesting (SSBP-EH), an improved algorithm where the routing-scheduling decision is of a probabilistic nature. For both policies, we show that given sustainable data and energy arrival rates, the stability of the data queues over all network nodes is guaranteed. Numerical results corroborate the stability guarantees and illustrate the minimal gap in performance that our policies offer with respect to classical ones which work with an unlimited energy supply.
△ Less
Submitted 2 November, 2017;
originally announced November 2017.