Search | arXiv e-print repository

Reinforcement Learning with an Abrupt Model Change

Authors: Wuxia Chen, Taposh Banerjee, Jemin George, Carl Busart

Abstract: The problem of reinforcement learning is considered where the environment or the model undergoes a change. An algorithm is proposed that an agent can apply in such a problem to achieve the optimal long-time discounted reward. The algorithm is model-free and learns the optimal policy by interacting with the environment. It is shown that the proposed algorithm has strong optimality properties. The effectiveness of the algorithm is also demonstrated using simulation results. The proposed algorithm exploits a fundamental reward-detection trade-off present in these problems and uses a quickest change detection algorithm to detect the model change. Recommendations are provided for faster detection of model changes and for smart initialization strategies.

Submitted 22 April, 2023; originally announced April 2023.

arXiv:2211.01338 [pdf, other]

Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

Authors: Anusha Prakash, Arun Kumar, Ashish Seth, Bhagyashree Mukherjee, Ishika Gupta, Jom Kuriakose, Jordan Fernandes, K V Vikram, Mano Ranjith Kumar M, Metilda Sagaya Mary, Mohammad Wajahat, Mohana N, Mudit Batra, Navina K, Nihal John George, Nithya Ravi, Pruthwik Mishra, Sudhanshu Srivastava, Vasista Sai Lodagala, Vandan Mujadia, Kada Sai Venkata Vineeth, Vrunda Sukhadia, Dipti Sharma, Hema Murthy, Pushpak Bhattacharya, et al. (2 additional authors not shown)

Abstract: Cross-lingual dubbing of lecture videos requires the transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of text using target language rhythm, text-to-speech synthesis followed by isochronous lipsyncing to the original video. This task becomes challenging when the source and target languages belong to different language families, resulting in differences in generated audio duration. This is further compounded by the original speaker's rhythm, especially for extempore speech. This paper describes the challenges in regenerating English lecture videos in Indian languages semi-automatically. A prototype is developed for dubbing lectures into 9 Indian languages. A mean-opinion-score (MOS) is obtained for two languages, Hindi and Tamil, on two different courses. The output video is compared with the original video in terms of MOS (1-5) and lip synchronisation with scores of 4.09 and 3.74, respectively. The human effort also reduces by 75%.

Submitted 1 November, 2022; originally announced November 2022.

arXiv:2201.04962 [pdf, other]

Distributed Cooperative Multi-Agent Reinforcement Learning with Directed Coordination Graph

Authors: Gangshan Jing, He Bai, Jemin George, Aranya Chakrabortty, Piyush K. Sharma

Abstract: Existing distributed cooperative multi-agent reinforcement learning (MARL) frameworks usually assume undirected coordination graphs and communication graphs while estimating a global reward via consensus algorithms for policy evaluation. Such a framework may induce expensive communication costs and exhibit poor scalability due to the requirement of global consensus. In this work, we study MARLs with directed coordination graphs, and propose a distributed RL algorithm where the local policy evaluations are based on local value functions. The local value function of each agent is obtained by local communication with its neighbors through a directed learning-induced communication graph, without using any consensus algorithm. A zeroth-order optimization (ZOO) approach based on parameter perturbation is employed to achieve gradient estimation. By comparing with existing ZOO-based RL algorithms, we show that our proposed distributed RL algorithm guarantees high scalability. A distributed resource allocation example is shown to illustrate the effectiveness of our algorithm.

Submitted 9 January, 2022; originally announced January 2022.

arXiv:2107.12416 [pdf, other]

doi 10.1109/TAC.2024.3386061

Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent

Authors: Gangshan Jing, He Bai, Jemin George, Aranya Chakrabortty, Piyush K. Sharma

Abstract: Recently introduced distributed zeroth-order optimization (ZOO) algorithms have shown their utility in distributed reinforcement learning (RL). Unfortunately, in the gradient estimation process, almost all of them require random samples with the same dimension as the global variable and/or require evaluation of the global cost function, which may induce high estimation variance for large-scale networks. In this paper, we propose a novel distributed zeroth-order algorithm by leveraging the network structure inherent in the optimization objective, which allows each agent to estimate its local gradient by local cost evaluation independently, without use of any consensus protocol. The proposed algorithm exhibits an asynchronous update scheme, and is designed for stochastic non-convex optimization with a possibly non-convex feasible domain based on the block coordinate descent method. The algorithm is later employed as a distributed model-free RL algorithm for distributed linear quadratic regulator design, where a learning graph is designed to describe the required interaction relationship among agents in distributed learning. We provide an empirical validation of the proposed algorithm to benchmark its performance on convergence rate and variance against a centralized ZOO algorithm.

Submitted 2 May, 2024; v1 submitted 26 July, 2021; originally announced July 2021.

Comments: The arxiv version contains proofs of Lemma 3 and Lemma 5, which are missing in the published version

arXiv:2103.04480 [pdf, other]

Learning Distributed Stabilizing Controllers for Multi-Agent Systems

Authors: Gangshan Jing, He Bai, Jemin George, Aranya Chakrabortty, Piyush K. Sharma

Abstract: We address the problem of model-free distributed stabilization of heterogeneous multi-agent systems using reinforcement learning (RL). Two algorithms are developed. The first algorithm solves a centralized linear quadratic regulator (LQR) problem without knowing any initial stabilizing gain in advance. The second algorithm builds upon the results of the first algorithm, and extends it to distributed stabilization of multi-agent systems with predefined interaction graphs. Rigorous proofs are provided to show that the proposed algorithms achieve guaranteed convergence if specific conditions hold. A simulation example is presented to demonstrate the theoretical results.

Submitted 7 March, 2021; originally announced March 2021.

Comments: This paper proposes model-free RL algorithms for deriving stabilizing gains of continuous-time multi-agent systems

arXiv:2010.08615 [pdf, other]

Decomposability and Parallel Computation of Multi-Agent LQR

Authors: Gangshan Jing, He Bai, Jemin George, Aranya Chakrabortty

Abstract: Individual agents in a multi-agent system (MAS) may have decoupled open-loop dynamics, but a cooperative control objective usually results in coupled closed-loop dynamics, thereby making the control design computationally expensive. The computation time becomes even higher when a learning strategy such as reinforcement learning (RL) needs to be applied to deal with the situation when the agent dynamics are not known. To resolve this problem, we propose a parallel RL scheme for a linear quadratic regulator (LQR) design in a continuous-time linear MAS. The idea is to exploit the structural properties of two graphs embedded in the $Q$ and $R$ weighting matrices in the LQR objective to define an orthogonal transformation that can convert the original LQR design to multiple decoupled smaller-sized LQR designs. We show that if the MAS is homogeneous then this decomposition retains closed-loop optimality. Conditions for decomposability, an algorithm for constructing the transformation matrix, a parallel RL algorithm, and robustness analysis when the design is applied to non-homogeneous MAS are presented. Simulations show that the proposed approach can guarantee significant speed-up in learning without any loss in the cumulative value of the LQR cost.

Submitted 7 March, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

Comments: This paper contains proofs of all the theorems in the conference paper "Decomposability and Parallel Computation of Multi-Agent LQR"

arXiv:2008.06604 [pdf, other]

Model-Free Optimal Control of Linear Multi-Agent Systems via Decomposition and Hierarchical Approximation

Authors: Gangshan Jing, He Bai, Jemin George, Aranya Chakrabortty

Abstract: Designing the optimal linear quadratic regulator (LQR) for a large-scale multi-agent system (MAS) is time-consuming since it involves solving a large-size matrix Riccati equation. The situation is further exacerbated when the design needs to be done in a model-free way using schemes such as reinforcement learning (RL). To reduce this computational complexity, we decompose the large-scale LQR design problem into multiple smaller-size LQR design problems. We consider the objective function to be specified over an undirected graph, and cast the decomposition as a graph clustering problem. The graph is decomposed into two parts, one consisting of independent clusters of connected components, and the other containing edges that connect different clusters. Accordingly, the resulting controller has a hierarchical structure, consisting of two components. The first component optimizes the performance of each independent cluster by solving the smaller-size LQR design problem in a model-free way using an RL algorithm. The second component accounts for the objective coupling different clusters, which is achieved by solving a least squares problem in one shot. Although suboptimal, the hierarchical controller adheres to a particular structure as specified by inter-agent couplings in the objective function and by the decomposition strategy. Mathematical formulations are established to find a decomposition that minimizes the number of required communication links or reduces the optimality gap. Numerical simulations are provided to highlight the pros and cons of the proposed designs.

Submitted 16 March, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

Comments: This paper proposes a hierarchical learning and control framework for model-free LQR of heterogeneous linear multi-agent systems

arXiv:2008.05853 [pdf]

Massively Parallel Amplitude-Only Fourier Neural Network

Authors: Mario Miscuglio, Zibo Hu, Shurui Li, Jonathan George, Roberto Capanna, Philippe M. Bardet, Puneet Gupta, Volker J. Sorger

Abstract: Machine-intelligence has become a driving factor in modern society. However, its demand outpaces the underlying electronic technology due to limitations given by fundamental physics such as capacitive charging of wires, but also by system architecture of storing and handling data, both driving recent trends towards processor heterogeneity. Here we introduce a novel amplitude-only Fourier-optical processor paradigm capable of processing large-scale ~(1,000 x 1,000) matrices in a single time-step and 100 microsecond-short latency. Conceptually, the information-flow direction is orthogonal to the two-dimensional programmable-network, which leverages 10^6-parallel channels of display technology, and enables a prototype demonstration performing convolutions as pixel-wise multiplications in the Fourier domain reaching peta operations per second throughputs. The required real-to-Fourier domain transformations are performed passively by optical lenses at zero-static power. We exemplary realize a convolutional neural network (CNN) performing classification tasks on 2-Megapixel large matrices at 10 kHz rates, which latency-outperforms current GPU and phase-based display technology by one and two orders of magnitude, respectively. Training this optical convolutional layer on image classification tasks and utilizing it in a hybrid optical-electronic CNN, shows classification accuracy of 98% (MNIST) and 54% (CIFAR-10). Interestingly, the amplitude-only CNN is inherently robust against coherence noise in contrast to phase-based paradigms and features an over 2 orders of magnitude lower delay than liquid crystal-based systems. Beyond contributing to novel accelerator technology, scientifically this amplitude-only massively-parallel optical compute-paradigm can be far-reaching as it de-validates the assumption that phase-information outweighs amplitude in optical processors for machine-intelligence.

Submitted 15 August, 2020; v1 submitted 13 August, 2020; originally announced August 2020.

arXiv:2007.14186 [pdf, ps, other]

Hierarchical Control of Multi-Agent Systems using Online Reinforcement Learning

Authors: He Bai, Jemin George, Aranya Chakrabortty

Abstract: We propose a new reinforcement learning based approach to designing hierarchical linear quadratic regulator (LQR) controllers for heterogeneous linear multi-agent systems with unknown state-space models and separated control objectives. The separation arises from grouping the agents into multiple non-overlapping groups, and defining the control goal as two distinct objectives. The first objective aims to minimize a group-wise block-decentralized LQR function that models group-level mission. The second objective, on the other hand, tries to minimize an LQR function between the average states (centroids) of the groups. Exploiting this separation, we redefine the weighting matrices of the LQR functions in a way that they allow us to decouple their respective algebraic Riccati equations. Thereafter, we develop a reinforcement learning strategy that uses online measurements of the agent states and the average states to learn the respective controllers based on the approximate Riccati equations. Since the first controller is block-decentralized and, therefore, can be learned in parallel, while the second controller is reduced-dimensional due to averaging, the overall design enjoys a significantly reduced learning time compared to centralized reinforcement learning.

Submitted 28 July, 2020; originally announced July 2020.

arXiv:2006.08925 [pdf, other]

Improving the Performance of Deep Learning for Wireless Localization

Authors: Ramdoot Pydipaty, Johnu George, Krishna Selvaraju, Amit Saha

Abstract: Indoor localization systems are most commonly based on Received Signal Strength Indicator (RSSI) measurements of either WiFi or Bluetooth-Low-Energy (BLE) beacons. In such systems, the two most common techniques are trilateration and fingerprinting, with the latter providing higher accuracy. In the fingerprinting technique, Deep Learning (DL) algorithms are often used to predict the location of the receiver based on the RSSI measurements of multiple beacons received at the receiver. In this paper, we address two practical issues with applying Deep Learning to wireless localization -- transfer of solution from one wireless environment to another and small size of labelled data set. First, we apply automatic hyperparameter optimization to a deep neural network (DNN) system for indoor wireless localization, which makes the system easy to port to new wireless environments. Second, we show how to augment a typically small labelled data set using the unlabelled data set. We observed improved performance in DL by applying the two techniques. Additionally, all relevant code has been made freely available.

Submitted 16 June, 2020; originally announced June 2020.

arXiv:1911.02511 [pdf]

doi 10.1002/adpr.202000033

Electronic Bottleneck Suppression in Next-generation Networks with Integrated Photonic Digital-to-analog Converters

Authors: Jiawei Meng, Mario Miscuglio, Jonathan K. George, Aydin Babakhani, Volker J. Sorger

Abstract: Digital-to-analog converters (DAC) are indispensable functional units in signal processing instrumentation and wide-band telecommunication links for both civil and military applications. Since photonic systems are capable of high data throughput and low latency, an increasingly found system limitation stems from the required domain-crossing such as digital-to-analog, and electronic-to-optical. A photonic DAC implementation, in contrast, enables a seamless signal conversion with respect to both energy efficiency and short signal delay, often require bulky discrete optical components and electric-optic transformation hence introducing inefficiencies. Here, we introduce a novel coherent parallel photonic DAC concept along with an experimental demonstration capable of performing this digital-to-analog conversion without optic-electric-optic domain crossing. This design hence guarantees a linear intensity weighting among bits operating at high sampling rates, yet at a reduced footprint and power consumption compared to other photonic alternatives. Importantly, this photonic DAC could create seamless interfaces of next-generation data processing hardware for data-centers, task-specific compute accelerators such as neuromorphic engines, and network edge processing applications.

Submitted 22 December, 2019; v1 submitted 3 November, 2019; originally announced November 2019.

Journal ref: Advanced Photonics Research 2020, 2000033

arXiv:1909.10556 [pdf, ps, other]

Multi-Agent Coordination for Distributed Transmit Beamforming

Authors: Jemin George, Anjaly Parayil, He Bai

Abstract: This paper presents the formulation and analysis of a two time-scale optimization algorithm for multi-agent coordination for the purpose of distributed beamforming. Each agent is assumed to be randomly positioned with respect to each other with random phase offsets and amplitudes. Agents are tasked with coordinating among themselves to position themselves and adjust their phase offset and amplitude such that they can construct a desired directed beam. Here we propose a two time-scale optimization algorithm that consists of a fast time-scale algorithm to solve for the amplitude and phase and a slow time-scale algorithm to solve for the control required to re-position the agents. The numerical results given here indicate that the proposed two time-scale approach is able to reconstruct a desired beam pattern.

Submitted 23 September, 2019; originally announced September 2019.

arXiv:1908.06693 [pdf, ps, other]

Distributed Stochastic Gradient Method for Non-Convex Problems with Applications in Supervised Learning

Authors: Jemin George, Tao Yang, He Bai, Prudhvi Gurram

Abstract: We develop a distributed stochastic gradient descent algorithm for solving non-convex optimization problems under the assumption that the local objective functions are twice continuously differentiable with Lipschitz continuous gradients and Hessians. We provide sufficient conditions on step-sizes that guarantee the asymptotic mean-square convergence of the proposed algorithm. We apply the developed algorithm to a distributed supervised-learning problem, in which a set of networked agents collaboratively train their individual neural nets to recognize handwritten digits in images. Results indicate that all agents report similar performance that is also comparable to the performance of a centrally trained neural net. Numerical results also show that the proposed distributed algorithm allows the individual agents to recognize the digits even though the training data corresponding to all the digits is not locally available to each agent.

Submitted 19 August, 2019; originally announced August 2019.

arXiv:1805.08633 [pdf, ps, other]

The right way to teach the FFT

Authors: Jithin Donny George

Abstract: The algorithm behind the Fast Fourier Transform has a simple yet beautiful geometric interpretation that is often lost in translation in a classroom. This article provides a visual perspective which aims to capture the essence of it.

Submitted 22 June, 2018; v1 submitted 21 May, 2018; originally announced May 2018.

Comments: Corrected typos and added more details

arXiv:1711.02500 [pdf]

Integrated All-Optical Fast Fourier Transform: Design and Sensitivity Analysis

Authors: Hani Nejadriahi, David HillerKuss, Jonathan K. George, Volker J. Sorger

Abstract: The fast Fourier transform, FFT, is a useful and prevalent algorithm in signal processing. It characterizes the spectral components of a signal, or is used in combination with other operations to perform more complex computations such as filtering, convolution, and correlation. Digital FFTs are limited in speed by the necessity of moving charge within logic gates. An analog temporal FFT in fiber optics has been demonstrated with highest data bandwidth. However, the implementation with discrete fiber optic FFT components is bulky. Here, we present and analyze a design of an optical FFT in Silicon photonics and evaluate its performance with respect to variations in phase and amplitude. We discuss the impact of the deployed devices on the FFT's transfer function quality as defined by the transmission output power as a function of frequency, detuning phase, optical delay, and loss.

Submitted 31 October, 2017; originally announced November 2017.

Showing 1–15 of 15 results for author: George, J